R Programming
Handouts and Online Books
Common R Programming Techniques That Can Be Improved
Attaching Data Frames
Attaching a data frame makes it easy to reference its variables, but things get confusing when two data frames are attached at once, and users frequently forget to detach an attached object. It is better to use the data= argument (when calling functions that use the statistical modeling language), with, or within. within is an alternative to transform and upData. Unlike with, within lets you change or add variables in the referenced data frame. Because within returns the modified data frame rather than changing it in place, the changes are kept only when the result is assigned to a storable object (e.g., not a subscripted expression).
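xyplot(y ~ x | g, data=mine)   # xyplot is in the lattice package
with(mine,
     {
       plot(x, y)
       plot(x, z)
     })
mine <- within(mine,           # assign the result to keep the changes
     {
       y <- 2*y
       x <- x-1
       new <- x + y            # uses the updated x and y
     })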
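The difference between with and within can be checked on a small example. The following is a minimal sketch in base R, assuming a toy data frame whose values are invented for illustration; the names ratio and mine2 are hypothetical.

```r
# Toy data frame, invented for illustration
mine <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))

# with() evaluates an expression using the columns of 'mine';
# 'mine' itself is not modified
ratio <- with(mine, y / x)

# within() returns a modified copy of the data frame;
# 'mine' stays unchanged unless the result is assigned back to it
mine2 <- within(mine, {
  y   <- 2 * y
  x   <- x - 1
  new <- x + y   # computed from the already-updated x and y
})

print(names(mine))   # original columns only
print(mine2)
```

Storing the result under a new name (mine2) keeps the original data frame available for comparison; assigning back to mine would overwrite it instead.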
Logical Operations
R can subscript either with an integer vector holding the positions of the elements that meet a condition, or with a logical TRUE/FALSE vector of the same length as the object being tested. The latter usually leads to more readable and reliable code. Instead of
male <- which(sex=='male')
al <- which(state=='AL')
mean(x[intersect(male,al)]) # mean of male Alabamians
use
mean(x[sex=='male' & state=='AL']) #or:
maleal <- sex=='male' & state=='AL'
mean(x[maleal])
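One way to see why logical vectors can be more reliable is to look at the complement of a condition that no element satisfies. The sketch below uses toy vectors invented for illustration (the state 'GA' is deliberately absent); it is an illustration of one known pitfall of integer subscripts, not necessarily the reason the original text had in mind.

```r
# Toy data, invented for illustration
sex   <- c('male', 'female', 'male', 'male', 'female')
state <- c('AL', 'AL', 'TN', 'AL', 'TN')
x     <- c(10, 20, 30, 40, 50)

# A logical vector can be counted, negated, and reused
maleal      <- sex == 'male' & state == 'AL'
n.maleal    <- sum(maleal)       # number of male Alabamians
mean.maleal <- mean(x[maleal])   # their mean
mean.other  <- mean(x[!maleal])  # mean of everyone else

# Pitfall with integer subscripts: nobody is from 'GA', so which()
# returns integer(0), and x[-integer(0)] selects nothing at all
ga         <- which(state == 'GA')
by.integer <- x[-ga]                  # empty vector
by.logical <- x[!(state == 'GA')]     # all five values, as intended

print(length(by.integer))
print(by.logical)
```

One difference to keep in mind: which() drops missing values, whereas a logical subscript containing NA produces NA elements in the result, so data with missing values may need an explicit !is.na() term in the condition.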
Repetitive Statistical Analyses
Instead of
spearman2(y ~ x1 + x2, data=mydata, subset=sex=='male')
spearman2(y ~ x1 + x2, data=mydata, subset=sex=='female')
consider
for(sx in levels(mydata$sex))
{
  cat('\n------------------------------\n', sx, '\n\n')
  s <- spearman2(y ~ x1 + x2, data=mydata, subset=sex==sx)
  print(s)
}
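The same looping pattern can also collect the results in a named list for later use. The sketch below is self-contained base R: it swaps in cor(..., method='spearman') for spearman2 so that no extra package is needed, and it assumes a toy mydata whose values are invented for illustration. As in the loop above, sex must be a factor for levels() to return its categories.

```r
# Toy data, invented for illustration; sex is a factor
mydata <- data.frame(
  sex = factor(c('female', 'female', 'female', 'male', 'male', 'male')),
  x1  = c(1, 2, 3, 1, 2, 3),
  y   = c(2, 4, 9, 9, 5, 1)
)

results <- list()
for (sx in levels(mydata$sex)) {
  d <- mydata[mydata$sex == sx, ]   # rows for this level only
  # Base R Spearman correlation, standing in for spearman2
  results[[sx]] <- cor(d$y, d$x1, method = 'spearman')
  cat('\n------------------------------\n', sx, '\n\n')
  print(results[[sx]])
}
```

Keeping the results in a list indexed by level name makes it easy to compare groups afterwards, for example with results$female and results$male.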