R Programming Gotchas
R Gotchas
R has several 'features' that can trip up those who are not aware of them.
Non-Value Values
R contains several values that are treated specially. Vectors of every basic type (logical, integer, numeric, character) have an NA value. List elements can be NULL. Numeric vectors have Inf, -Inf, and NaN values in addition to the NA value. All of these values are distinct from each other.
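As a rough illustration of how these values differ, the sketch below applies the base R predicates to a small vector. The variable name x is only illustrative.

```r
# Each special value has its own predicate in base R.
x <- c(1, NA, NaN, Inf, -Inf)

is.na(x)        # TRUE for NA and also for NaN
is.nan(x)       # TRUE only for NaN
is.infinite(x)  # TRUE only for Inf and -Inf
is.finite(x)    # TRUE only for the ordinary number 1

# NULL is not a vector element at all: it has length zero.
length(NULL)    # 0
is.null(NULL)   # TRUE
```

Note that is.na() does not distinguish NA from NaN, so is.nan() is needed when the difference matters.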
The NA Value
Some functions take a parameter for ignoring NA in the computation, for example sum(..., na.rm=FALSE). Specifying na.rm=TRUE yields a non-NA result.
sum(c(1,5,10,NA))
[1] NA
sum(c(1,5,10,NA), na.rm=TRUE)
[1] 16
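The same na.rm parameter exists on several other base summary functions. The sketch below, which reuses the vector from the example above under the illustrative name x, also shows why is.na() is the way to test for missing values.

```r
x <- c(1, 5, 10, NA)

mean(x)                 # NA, because one element is missing
mean(x, na.rm = TRUE)   # mean of the three non-missing values
max(x, na.rm = TRUE)    # 10

# NA == NA is itself NA, so test for missingness with is.na().
NA == NA
sum(is.na(x))           # number of missing values: 1
```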
The NULL Value
- NULL is a special value meaning essentially 'has no value'.
- Comparing NULL to anything with == does not return TRUE or FALSE; it returns a zero-length logical vector. To check whether a value is NULL, the is.null() function must be used.
- An element of a list that is equal to NULL means that this element contains no data.
- Assigning the NULL value to an element of a list removes that element from the list.
Trying to access a list element by a name that does not exist also results in NULL, but trying to access an index greater than the length of the list with [[ ]] is a 'subscript out of bounds' error.
l <- list()
l[[1]] <- 4
l$x
NULL
l[[2]]
Error in l[[2]] : subscript out of bounds
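The points about comparing and assigning NULL can be sketched as follows. The list l and its element names are made up for illustration.

```r
l <- list(a = 1, b = 2, c = 3)

# Comparing with == gives a zero-length result, not TRUE or FALSE.
length(l$missing == NULL)   # 0

# is.null() gives a usable answer.
is.null(l$missing)          # TRUE
is.null(l$a)                # FALSE

# Assigning NULL removes the element and shortens the list.
l$b <- NULL
names(l)                    # "a" "c"
length(l)                   # 2
```

Because a zero-length logical cannot be used as the condition of if(), a test such as if (x == NULL) fails with an error rather than quietly taking one branch.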
The Inf, -Inf, and NaN Values
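As a minimal sketch of where these values typically come from, the lines below use only base R arithmetic. The examples are illustrative and not taken from the original page.

```r
1 / 0         # Inf
-1 / 0        # -Inf
0 / 0         # NaN
Inf - Inf     # NaN
Inf > 1e308   # TRUE: Inf compares greater than any finite number

# na.rm=TRUE drops NaN as well as NA.
mean(c(1, NaN), na.rm = TRUE)   # 1
```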
Environments
Object Name Confusion
Object name confusion occurs when the variable that you think you are using is not the same as the variable that you are actually using.
- R looks up any name it cannot find inside a function in the enclosing environments. This means that a typo can produce a function that runs but uses the wrong value from a parent environment instead of throwing an error.
- If the attach(), with(), or within() functions are used, it can become unclear which objects are being referenced.
- Let's say we want to add the object 'mod' to column 'a' in the data frame 'junk'.
mod <- 15
junk <- data.frame(a = 1:10)
with(junk, a + mod)
[1] 16 17 18 19 20 21 22 23 24 25
- What happens if there is a column in 'junk' named 'mod'?
mod <- 15
junk <- data.frame(a = 1:10, mod = 6:15)
with(junk, a + mod)
[1] 7 9 11 13 15 17 19 21 23 25
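The typo case mentioned in the first bullet can be sketched as below. The function scale_values() and the variable rate are hypothetical names invented for this illustration. The last lines show one possible way to sidestep the with() ambiguity by naming the data frame explicitly.

```r
rate <- 0.5   # a global variable that happens to exist

# Hypothetical function: the argument is 'rates', but the body says 'rate'.
scale_values <- function(values, rates) {
  values * rate   # typo: silently picks up the global 'rate'
}

scale_values(1:3, rates = 2)   # 0.5 1.0 1.5, not 2 4 6

# Naming the data frame explicitly leaves no doubt about which 'mod' is used.
mod  <- 15
junk <- data.frame(a = 1:10, mod = 6:15)
junk$a + mod        # uses the global 'mod': 16 17 ... 25
junk$a + junk$mod   # uses the column 'mod': 7 9 ... 25
```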
TRUE and FALSE vs. T and F
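A short sketch of the usual concern here: TRUE and FALSE are reserved words, while T and F are ordinary variables that merely start out holding those values. The variable name answer is illustrative.

```r
# T and F start out as aliases for TRUE and FALSE...
identical(T, TRUE)   # TRUE

# ...but they are ordinary variables and can be overwritten.
T <- 0
answer <- if (T) "yes" else "no"
answer               # "no", because T is now 0

rm(T)                # remove the override; T refers to TRUE again
identical(T, TRUE)   # TRUE
```

Attempting the same assignment to TRUE itself is an error, which is why spelling out TRUE and FALSE is the safer habit.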
&& vs &
- Logical operations come in two forms: vectorized and non-vectorized. The single-character version ('&') is the vectorized version, and the double-character version ('&&') is the non-vectorized version.
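The difference can be sketched with two small logical vectors; the names a, b, and x are illustrative. Depending on the R version, passing vectors longer than one to '&&' either uses only the first element or signals an error, so the sketch only gives '&&' single values.

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

a & b   # element-wise: TRUE FALSE FALSE
a | b   # element-wise: TRUE TRUE TRUE

# && and || work on single values and short-circuit: the right-hand
# side is not evaluated when the left side already decides the result.
x <- NULL
!is.null(x) && x > 0   # FALSE; 'x > 0' is never evaluated

# Use any() or all() to reduce a vector to one value for if().
if (all(a | b)) "every position has a TRUE"
```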
Partial Argument Matching
R performs partial argument matching on all function calls. It attempts to match names of the arguments passed to the function with the names of defined arguments of the function.
- Partial argument matching can cause difficulties when attempting to pass arguments through the '...' argument.
test1 <- function(x=5, b=2) {
b*x-5
}
test2 <- function(f, bob="Hi", ...) {
print(bob)
f(...)
}
test2(test1)
[1] "Hi"
[1] 5
test2(test1, b=8)
[1] 8
[1] 5
- We would expect b=8 to alter the value returned by the function, but it does not. Looking closer at the arguments of test2() reveals why: test2() has an argument 'bob' that comes before the '...' argument. Partial argument matching maps b=8 to bob=8, so no unmatched arguments remain to be passed through the '...' argument.
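One possible way around this, sketched below, relies on the rule that arguments placed after '...' in a function definition are only matched by their full names. The function test3() is a hypothetical variant of test2() written for this illustration.

```r
test1 <- function(x = 5, b = 2) {
  b * x - 5
}

# Hypothetical variant of test2(): 'bob' now comes after '...',
# so it can only be matched by its full name.
test3 <- function(f, ..., bob = "Hi") {
  print(bob)
  f(...)
}

test3(test1, b = 8)               # prints "Hi", returns 8 * 5 - 5 = 35
test3(test1, b = 8, bob = "Yo")   # prints "Yo", returns 35
```

The trade-off is that callers must always spell out bob= in full, which is usually acceptable for a wrapper whose main job is to forward arguments.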
The read.table() Function
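- The read.table() and data.frame() functions convert string vectors into factor vectors when stringsAsFactors=TRUE. This was the default before R 4.0.0; since then the default is stringsAsFactors=FALSE.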
- NA value conversion
- By default read.table() converts the string value "NA" to the R NA value.
- For non-character vectors the zero length string value "" is converted to the R NA value.
- For character vectors the zero length string value is kept as is.
cat('A,B
1,a
,b
5,
NA,"NA"
6,NA
', file="tmp.csv")
read.table(file="tmp.csv", sep=',', header=TRUE, stringsAsFactors=TRUE)
   A    B
1  1    a
2 NA    b
3  5     
4 NA <NA>
5  6 <NA>
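If empty fields should count as missing in character columns too, the na.strings parameter of read.table() can be extended. The sketch below assumes that is the desired behaviour and reads from an in-memory string through the text parameter instead of a file; the names csv and d are illustrative.

```r
csv <- 'A,B
1,a
,b
5,
6,NA
'

# Assumption: empty fields should be treated as missing in every column.
d <- read.table(text = csv, sep = ",", header = TRUE,
                na.strings = c("NA", ""), stringsAsFactors = FALSE)

is.na(d$A)   # FALSE TRUE FALSE FALSE
is.na(d$B)   # FALSE FALSE TRUE TRUE
class(d$B)   # "character"
```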
- By default read.table() treats "#" as a comment character. It ignores all text from the "#" to the end of the line.
cat('A,B
7,Patient #5
8,Patient #8
', file='tmp.csv')
read.table(file="tmp.csv", sep=',', header=TRUE, stringsAsFactors=TRUE)
A B
1 7 Patient
2 8 Patient
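Comment handling can be switched off with the comment.char parameter of read.table(). The sketch below reads the same data from an in-memory string; the names csv and d are illustrative.

```r
csv <- 'A,B
7,Patient #5
8,Patient #8
'

# comment.char = "" disables comment handling entirely.
d <- read.table(text = csv, sep = ",", header = TRUE,
                comment.char = "", stringsAsFactors = FALSE)

d$B   # "Patient #5" "Patient #8"
```

read.csv() already uses comment.char = "" as its default, so it is not affected by this particular gotcha.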
--
CharlesDupont - 23 Jun 2009