R Programming, Tips and Gotchas
R Gotchas
R has several 'features' that can trip up those who are not aware of them.
Non-Value Values.
R contains several values that are treated specially. Every atomic vector type except raw has an NA value. Lists can contain NULL elements. Numeric vectors have Inf, -Inf, and NaN values in addition to the NA value. All of these values are distinct from each other.
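As a small illustration of how these values differ, the following sketch uses only base R test functions. The arithmetic used to produce Inf, -Inf, and NaN is just one convenient way to obtain them.

```r
# Produce the special numeric values through ordinary arithmetic.
pos_inf <- 1 / 0    # Inf
neg_inf <- -1 / 0   # -Inf
not_num <- 0 / 0    # NaN

# Each kind of special value has its own test function.
is.na(NA)            # TRUE
is.na(not_num)       # TRUE: is.na() also reports NaN as missing
is.nan(NA)           # FALSE: NA is not NaN
is.nan(not_num)      # TRUE
is.infinite(pos_inf) # TRUE
is.finite(neg_inf)   # FALSE
is.null(NULL)        # TRUE
is.null(NA)          # FALSE: NA is a value of length 1, NULL has length 0
length(NA)           # 1
length(NULL)         # 0
```

Note that is.na() cannot tell NA and NaN apart, so is.nan() is needed when the difference matters.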
The NA Value
- NA stands for 'Not Available' and marks a missing value. It is best thought of as a placeholder for an unknown value that could be any possible value. For this reason almost any operation applied to an NA will return NA.
- NA + 5 is equivalent to (any possible value) + 5, which is still any possible value, so the result is NA.
NA + 5
[1] NA
- A[NA,] asks for an unknown set of rows. The logical NA is recycled to the number of rows, so it returns a matrix of NAs representing all possible values for those columns.
A <- matrix(1:25, ncol=5)
A[NA,]
[,1] [,2] [,3] [,4] [,5]
[1,] NA NA NA NA NA
[2,] NA NA NA NA NA
[3,] NA NA NA NA NA
[4,] NA NA NA NA NA
[5,] NA NA NA NA NA
- Two operations that do not always return NA when applied to an NA are the AND and OR operations.
- A & FALSE must be false. There is no possible value for A which would make this statement true. Therefore NA & FALSE is equal to FALSE.
NA & FALSE
[1] FALSE
- A | TRUE must be true. There is no possible value for A which would make this statement false. Therefore NA | TRUE is equal to TRUE.
NA | TRUE
[1] TRUE
- An NA cannot be compared directly to anything: a comparison such as x == NA itself returns NA. To check whether a value is NA, the is.na() function must be used.
is.na(NA)
[1] TRUE
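The points above can be tried out on a short vector. This is a minimal sketch; the vector x is made up for illustration, and the na.rm argument of sum() is shown only as one common way of skipping missing values.

```r
x <- c(1, NA, 3)

# Comparing with == never answers TRUE or FALSE for a missing value.
x == NA      # NA NA NA
NA == NA     # NA

# is.na() is the reliable test.
is.na(x)     # FALSE  TRUE FALSE
x[!is.na(x)] # 1 3

# NA propagates through arithmetic unless it is explicitly dropped.
sum(x)               # NA
sum(x, na.rm = TRUE) # 4

# The logical operators return NA when the answer depends on the unknown value.
NA & TRUE    # NA
NA | FALSE   # NA
```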
The NULL Value
- NULL is a special value meaning essentially 'has no value'.
- Comparing NULL to anything does not give a usable answer: NULL == x returns a zero-length logical vector rather than TRUE or FALSE. To check whether a value is NULL, the is.null() function must be used.
- An element of a list that is equal to NULL contains no data.
- Assigning the NULL value to an element of a list removes that element from the list; the data presently residing in that location is discarded.
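A short sketch of how NULL behaves in comparisons and in lists follows. The list x and its element names are invented for the example. The last step shows the usual base R idiom for storing an actual NULL in a list, which the text above does not cover.

```r
is.null(NULL)      # TRUE
NULL == 1          # logical(0): neither TRUE nor FALSE
length(NULL == 1)  # 0

x <- list(a = 1, b = "two", c = 3)

# Assigning NULL removes the element entirely.
x$b <- NULL
names(x)           # "a" "c"
length(x)          # 2

# To keep the slot but store NULL in it, assign a list containing NULL.
x["c"] <- list(NULL)
names(x)           # "a" "c"
is.null(x$c)       # TRUE
```

Because a zero-length logical cannot be used as a condition, if (value == NULL) fails with an error; if (is.null(value)) is the form to use.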
The Inf, -Inf, and NaN Values
Environments
Object Name Confusion
Object name confusion occurs when the variable that you think you are using is not the same as the variable that you are actually using.
- In functions, typos can lead to code that runs without an error but silently uses values from the parent environment instead of the intended ones.
- If the attach(), with() or within() functions are used, there is a chance of confusion on the user's part about which objects are being referenced.
- Let's say we want to add the object mod to column a in the data frame junk.
mod <- 15
junk <- data.frame(a = 1:10)
with(junk, a + mod) #[1] 16 17 18 19 20 21 22 23 24 25
- What happens if there is a column in 'junk' named 'mod'?
mod <- 15
junk <- data.frame(a = 1:10, mod = 6:15)
with(junk, a + mod) #[1] 7 9 11 13 15 17 19 21 23 25
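The first point in this list, a typo inside a function quietly picking up a value from the parent environment, can be sketched as follows. The names total, totl, and add_five are hypothetical and chosen only to make the typo visible.

```r
total <- 100   # left over in the workspace from earlier work

add_five <- function(totl) {
  # Typo: the argument is named 'totl' but the body refers to 'total'.
  # R finds 'total' in the parent environment, so no error is raised.
  total + 5
}

wrong <- add_five(50)   # 105, not the intended 55

# Once the stray object is removed, the same call fails as it should.
rm(total)
outcome <- tryCatch(add_five(50), error = function(e) "error")
outcome                 # "error"
```

Running functions in a fresh session, or checking them with codetools::findGlobals(), are two ways to catch this kind of accidental dependency.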
TRUE and FALSE vs. T and F
&& vs &
- Logical operations come in two forms: vectorized and non-vectorized. The single-character version ('&') is vectorized and works element by element; the double-character version ('&&') is not vectorized and expects a single logical value on each side. The same distinction holds for '|' and '||'.
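The difference can be seen in a small sketch. The vectors a and b are invented for illustration. The short-circuit behaviour of '&&' and '||' shown here is standard R but is an addition to the text above.

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

# Vectorized: one result per element.
a & b            # TRUE FALSE FALSE
a | b            # TRUE  TRUE  TRUE

# Non-vectorized: a single result from single values.
TRUE && FALSE    # FALSE

# '&&' and '||' stop as soon as the answer is known,
# so the right-hand side below is never evaluated.
FALSE && stop("never evaluated")   # FALSE
TRUE || stop("never evaluated")    # TRUE

# For an if() condition, collapse a vector with all() or any() first.
every_pos <- all(a | b)   # TRUE
some_pos <- any(a & b)    # TRUE
```

Passing vectors longer than one to '&&' is best avoided: depending on the R version it either uses only the first element or raises an error.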
--
CharlesDupont - 23 Jun 2009