R Programming, Tips and Gotchas

R Gotchas

R has several 'features' that can trip up those who are not aware of them.

Non-Value Values

R contains several values that are treated specially. Logical, integer, numeric, and character vectors each have an NA value. Lists can hold the NULL value. Numeric vectors have Inf, -Inf, and NaN values in addition to the NA value. All of these values are distinct from each other.

The NA Value

  • NA is short for 'Not Available' and marks a missing value. It is best thought of as a stand-in for any possible value. For this reason almost any operation applied to an NA returns NA.
    • NA + 5 is equivalent to "some unknown value + 5", which is still unknown.
         NA + 5
         [1] NA
      
    • A[NA,] asks for an unknown selection of rows. The logical NA is recycled across all rows, so the result is a matrix with the same dimensions as A in which every entry is NA.
      A <- matrix(1:25, ncol=5)
      A[NA,]
           [,1] [,2] [,3] [,4] [,5]
      [1,]   NA   NA   NA   NA   NA
      [2,]   NA   NA   NA   NA   NA
      [3,]   NA   NA   NA   NA   NA
      [4,]   NA   NA   NA   NA   NA
      [5,]   NA   NA   NA   NA   NA
      
  • Two operations that do not always return NA when applied to an NA are the AND and OR operations.
    • A & FALSE must be false. There is no possible value for A which would make this statement true. Therefore NA & FALSE is equal to FALSE.
      NA & FALSE
      [1] FALSE
      
    • A | TRUE must be true. There is no possible value for A which would make this statement false. Therefore NA | TRUE is equal to TRUE.
      NA | TRUE
      [1] TRUE
      
  • It is impossible to directly compare an NA to anything: any comparison with ==, including NA == NA, returns NA. In order to check whether a value is NA, the is.na() function must be used.
    is.na(NA)
    [1] TRUE
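
As a minimal sketch of the point above, the following compares == with is.na() on a small vector and shows the na.rm argument that many summary functions accept. The vector x and the result names are made up for illustration.

```r
x <- c(1, NA, 3)

# Comparing with == propagates NA instead of answering the question.
cmp <- x == NA

# is.na() gives a usable logical vector, one element per value.
missing_flags <- is.na(x)

# sum() returns NA if any element is missing, unless na.rm = TRUE
# asks it to drop missing values first.
total_all <- sum(x)
total_obs <- sum(x, na.rm = TRUE)

print(cmp)            # NA NA NA
print(missing_flags)  # FALSE TRUE FALSE
print(total_all)      # NA
print(total_obs)      # 4
```

Whether dropping missing values is appropriate depends on the analysis; na.rm = TRUE only hides the NA, it does not explain it.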
    

The NULL Value

  • NULL is a special value meaning essentially 'has no value'.
  • Comparing NULL to anything is an invalid question. In order to check whether a value is NULL, the is.null() function must be used.
    • The A == B statement asks: is the value A the same as the value B?
    • The A == NULL statement asks: is the value A the same as the value which has no value? There is no value to compare against, so the operation returns a logical vector of length 0.
      A <- 5
      A == NULL
      logical(0)
      
  • An element of a list that is equal to NULL means that this element contains no data.
  • Assigning the NULL value to an element of a list removes that element: the data presently residing in that location is forgotten and the list becomes one element shorter.
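
The list behaviour described above can be sketched as follows. The list name settings and its contents are hypothetical, chosen only for illustration.

```r
settings <- list(a = 1, b = "two", c = 3)

# Assigning NULL removes the element entirely; the list gets shorter.
settings$b <- NULL
len_after_removal <- length(settings)      # 2
names_after_removal <- names(settings)     # "a" "c"

# A missing element reads back as NULL, so test with is.null(), not ==.
b_is_null <- is.null(settings$b)           # TRUE

# To keep a slot whose content is NULL, assign list(NULL) with single brackets.
settings["b"] <- list(NULL)
len_with_null_slot <- length(settings)     # 3
```

Note that is.null(settings$b) is TRUE both when the element is absent and when it is present but holds NULL; checking "b" %in% names(settings) distinguishes the two cases.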

The Inf, -Inf, and NaN Values

  • Inf, -Inf, and NaN are special numeric values.
  • Inf and -Inf represent positive and negative infinity and behave accordingly.
    A <- 1/0
    B <- -1/0
    A
    [1] Inf
    B
    [1] -Inf
    5 * A
    [1] Inf
    
  • NaN is short for Not a Number. It is the result of a mathematically undefined operation, such as 0/0.
    0/0
    [1] NaN
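
A short sketch of how the base R predicate functions tell these special values apart; the vector vals is made up for illustration. Note that is.na() is TRUE for NaN as well as NA, while is.nan() is TRUE only for NaN.

```r
vals <- c(1, Inf, -Inf, NaN, NA)

finite_flags   <- is.finite(vals)    # TRUE  FALSE FALSE FALSE FALSE
infinite_flags <- is.infinite(vals)  # FALSE TRUE  TRUE  FALSE FALSE
nan_flags      <- is.nan(vals)       # FALSE FALSE FALSE TRUE  FALSE
na_flags       <- is.na(vals)        # FALSE FALSE FALSE TRUE  TRUE

# Subtracting infinity from infinity is undefined, so the result is NaN.
undefined <- Inf - Inf
```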
    

Environments

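  • Environments are the locations where objects are stored.
  • Every environment except the empty environment has a parent environment. The parent of the global environment is the most recently attached package.
  • Each function call is executed in its own environment, whose parent is the environment in which the function was defined (not necessarily the environment it was called from).
  • Values stored in a parent environment are visible to the child.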
    a <- 1
    test <- function() {
       print(a)
       invisible(NULL)
    }
    test()
    [1] 1
    
  • However, assigning to a name with <- inside the child environment creates a new local object instead of changing the original. Any modifications to values done in the child environment will not propagate to the parent environment.
    a <- 1
    test <- function() {
       print(a)
       a <- 2
       print(a)
       invisible(NULL)
    }
    test();print(a)
    [1] 1
    [1] 2
    [1] 1
    
  • A parent environment cannot access any values from the child environment.
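
The sketch below illustrates two of the scoping points: local assignment with <- versus the superassignment operator <<-, and the fact that a function looks up free variables where it was defined rather than where it was called. All function and variable names (counter, bump_local, bump_outer, show_x, caller) are hypothetical.

```r
counter <- 0

bump_local <- function() {
   counter <- counter + 1    # creates a new local 'counter'; the outer one is untouched
   counter
}

bump_outer <- function() {
   counter <<- counter + 1   # assigns to the existing 'counter' in an enclosing environment
   invisible(counter)
}

local_result <- bump_local()   # 1
after_local  <- counter        # still 0
bump_outer()
after_outer  <- counter        # now 1

# Lexical scoping: show_x() sees the x from where it was defined,
# not the x that exists inside caller().
x <- "defining environment"
show_x <- function() x
caller <- function() {
   x <- "calling environment"
   show_x()
}
seen <- caller()               # "defining environment"
```

<<- is best used sparingly; returning the new value and reassigning it in the caller is usually easier to follow.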

Object Name Confusion

Object name confusion occurs when the variable that you think you are using is not the same as the variable that you are actually using.
  • In functions, typos can lead to functions that work but use values from the parent environment instead of throwing an error.
    • This is the function we want to write.
      test1 <- function(cat) {
         5 + cat
      }
      
      test1(5)
      [1] 10
      
    • This is the function we actually wrote.
      test1 <- function(cat) {
         5 + car     ## Typo should be cat
      }
      
      test1(5)
      Error in test1(5) : object 'car' not found
      
    • What happens if the object 'car' exists in your working environment?
      car <- 2
      test1(5)
      [1] 7
      
  • If the attach(), with(), or within() functions are used, there is a chance of confusion on the user's part about which objects are being referenced.
    • Let's say we want to add the object mod to column a in the data frame junk.
      mod <- 15
      junk <- data.frame(a = 1:10)
      
      with(junk, a + mod)     #[1] 16 17 18 19 20 21 22 23 24 25
      
    • What happens if there is a column in 'junk' named 'mod'? The column masks the object mod.
      mod <- 15
      junk <- data.frame(a = 1:10, mod = 6:15)
      
      with(junk, a + mod)     #[1]  7  9 11 13 15 17 19 21 23 25
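
One way to reduce both kinds of confusion, sketched under the assumption that the recommended codetools package is installed (hence the requireNamespace() guard): refer to data frame columns explicitly with $, and ask codetools::findGlobals() which names a function takes from outside itself.

```r
mod <- 15
junk <- data.frame(a = 1:10, mod = 6:15)

# Explicit references leave no doubt about which 'mod' is meant.
with_object <- junk$a + mod        # 16 17 ... 25 (the free-standing object)
with_column <- junk$a + junk$mod   # 7 9 ... 25   (the data frame column)

# The function with the typo from the example above.
test1 <- function(cat) {
   5 + car     ## Typo: should be cat
}

# findGlobals() lists names the function does not define itself,
# which makes the stray 'car' visible before it causes trouble.
if (requireNamespace("codetools", quietly = TRUE)) {
   stray <- codetools::findGlobals(test1, merge = FALSE)$variables
   print(stray)   # "car"
}
```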
      

TRUE and FALSE vs. T and F

  • Always use TRUE and FALSE. They are reserved words and cannot be reassigned, whereas T and F are ordinary variables that can be assigned other values.
    T <- 0
    TRUE == T
    [1] FALSE
    
    TRUE <- 0
    Error in TRUE <- 0 : invalid (do_set) left-hand side to assignment
    

&& vs &

  • Logical operations come in two forms: vectorized and non-vectorized. The single character version ('&') is the vectorized version and the double character version ('&&') is the non-vectorized version.
    • The vectorized operation works across all elements of the arguments, returning one result for each element.
      a <- c(1:10)
      b <- c(1:10)
      a < 7 & b > 3
      [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
      
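    • The non-vectorized operation returns a single value. In R versions before 4.3.0 it compared only the first elements of the arguments, as in the output below; since R 4.3.0, passing an argument of length greater than one is an error.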
      a <- c(1:10)
      b <- c(1:10)
      a < 7 && b > 3
      [1] FALSE
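
When a single TRUE or FALSE is needed from vector comparisons (for example in an if() condition), a version-independent approach is to combine with & and then reduce explicitly with any() or all(). The sketch below reuses a and b from above; ok, x, and verdict are illustrative names.

```r
a <- 1:10
b <- 1:10

ok <- a < 7 & b > 3       # element-wise result, length 10

any_ok <- any(ok)         # TRUE: at least one element satisfies both conditions
all_ok <- all(ok)         # FALSE: not every element does

# && suits length-one conditions and short-circuits: the right-hand
# side is never evaluated when the left-hand side is FALSE.
x <- NULL
verdict <- if (!is.null(x) && x > 0) "positive" else "not usable"
```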
      

Programming Tips For Statisticians

-- CharlesDupont - 23 Jun 2009