R Programming | ProgrammingTipsForStatisticians

Google's R style guide

Handouts and Online Books

Common R Programming Techniques That Can Be Improved

Attaching Data Frames

Attaching a data frame makes it easy to reference its variables, but things get confusing when two data frames are attached at once, and users frequently forget to detach an attached object. It is better to use data= (when calling functions that use the statistical modeling language), with, or within. within is an alternative to transform and to upData (from the Hmisc package). Unlike with, within lets you change or add variables in the referenced data frame: it returns a modified copy, which you assign back to a storable object (e.g., a named data frame, not a subscripted expression).

xyplot(y ~ x | g, data=mine)   # xyplot is in the lattice package
with(mine, {
   plot(x, y)
   plot(x, z)
})
mine <- within(mine, {
   y <- 2*y
   x <- x-1
   new <- x + y
})
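
As a minimal sketch of the difference between with and within, the example below uses a small made-up data frame (the values, and the names total and mine2, are assumptions for illustration only). It shows that with only reads the variables, while within returns a changed copy that must be assigned to take effect.

```r
# Hypothetical toy data frame, used only for illustration
mine <- data.frame(x = c(1, 2, 3, 4),
                   y = c(2, 4, 6, 8))

# with() evaluates an expression using the data frame's columns
# and returns the value; 'mine' itself is not modified
total <- with(mine, sum(x + y))

# within() returns a modified copy; assign it to keep the changes
mine2 <- within(mine, {
   y   <- 2 * y      # doubled y
   x   <- x - 1      # shifted x
   new <- x + y      # computed from the updated x and y
})

print(total)
print(mine2)
```

Assigning the result to a new name here keeps the original mine intact for comparison; in practice the result is usually assigned back to the same name.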

Logical Operations

R can subscript using integer vectors consisting of those subscripts meeting a certain condition, or using logical TRUE/FALSE vectors whose lengths are the lengths of the original objects tested. The latter usually leads to more readable and reliable code. Instead of

male <- which(sex=='male')
al <- which(state=='AL')
mean(x[intersect(male,al)])  # mean of male Alabamians

use

mean(x[sex=='male' & state=='AL'])  # or:
maleal <- sex=='male' & state=='AL'
mean(x[maleal])
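
One way to see the reliability point is to exclude a subset that turns out to be empty. The sketch below uses made-up vectors (all values, and the names tx, keep_logical and keep_which, are assumptions for illustration). Negating a logical vector keeps every observation, whereas negating the result of which silently drops them all, because a zero-length integer subscript selects nothing.

```r
# Hypothetical toy vectors, for illustration only
sex   <- c('male', 'female', 'male', 'female')
state <- c('AL', 'AL', 'TN', 'TN')
x     <- c(10, 20, 30, 40)

# Both approaches select the same observation here
m1 <- mean(x[intersect(which(sex == 'male'), which(state == 'AL'))])
m2 <- mean(x[sex == 'male' & state == 'AL'])

# Excluding a subset that happens to be empty
tx <- state == 'TX'             # all FALSE: nobody is from Texas
keep_logical <- x[!tx]          # keeps all four values
keep_which   <- x[-which(tx)]   # zero-length result: -integer(0) selects nothing

print(c(m1, m2))
print(length(keep_logical))
print(length(keep_which))
```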

Repetitive Statistical Analyses

Instead of

spearman2(y ~ x1 + x2, data=mydata, subset=sex=='male')
spearman2(y ~ x1 + x2, data=mydata, subset=sex=='female')

use

for(sx in levels(mydata$sex)) {
   cat('\n------------------------------\n', sx, '\n\n')
   s <- spearman2(y ~ x1 + x2, data=mydata, subset=sex==sx)
   print(s)   # results are not auto-printed inside a loop
}

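
The same looping pattern can also collect results for later use rather than only printing them. The sketch below is a stand-in, not the spearman2 analysis itself: it assumes a small made-up data frame and swaps in base R's cor with method = 'spearman' so that it runs without the Hmisc package. The names results and d are hypothetical.

```r
# Hypothetical toy data, for illustration only
mydata <- data.frame(
   sex = factor(c('female', 'female', 'female', 'male', 'male', 'male')),
   x1  = c(1, 2, 3, 1, 2, 3),
   y   = c(2, 4, 9, 9, 5, 1)
)

results <- list()
for(sx in levels(mydata$sex)) {
   d <- mydata[mydata$sex == sx, ]   # rows for the current level
   # Base-R stand-in for spearman2: plain Spearman rank correlation
   results[[sx]] <- cor(d$x1, d$y, method = 'spearman')
}
print(results)
```

Storing each result under the level name makes it easy to look up one group later, e.g. results[['male']].
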
Topic revision: r4 - 13 Feb 2014, JoAnnAlvarez

Copyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.