You are here: Vanderbilt Biostatistics Wiki>Main Web>RCookBook (revision 3)EditAttach

R Cook Book

This page is dedicated to the little R functions that we write to do various things. Please feel free to add to the list or make improvements to current functions. If you do add (or change) something, please put your code into a (existing or new) category and include a small description that explains what your code does (for extra points include an example or two as well). Thanks!

String Manipulation

Splitting into segments

Here's a function to split a string into segments of length n. The default value of n is 80, since most terminals have rows that are 80 characters long.

strSegs <- function(x, n=80) {
  retval <- c()
  while(nchar(x) > n) {
    segment <- substr(x, 1, n)
    x <- substr(x, n+1, nchar(x))

    retval <- c(retval, segment)
  }
  retval <- c(retval, x)
}
Example:

myString <- "This is a string that I want to split into segments."
segments <- strSegs(myString, 10)
cat(segments, sep="\n")

Data Frame Manipulation

Multisort

Here's a function that sorts each column of a data frame in decreasing or increasing order (the normal sort method only does either decreasing or increasing for all columns). I took the idea for this function from Cole Beck. You give the function your data frame and a series of name/value pairs, where each name is the name of a column, and each value is either TRUE or FALSE (TRUE for a decreasing sort and FALSE for an increasing sort.) The sort precedence goes from left to right. You do not have to specify all columns, but you must specify at least one.

multisort <- function(x, ...) {
  # given the data frame x, and a list of name/value pairs, sort
  pairList <- list(...)

  # make sure names from ... are valid
  tmp <- intersect(names(pairList), names(x))
  if (length(tmp) == 0 || tmp != names(pairList)) {
    stop("Invalid column names.")
  }

  retval <- x
  for (i in length(pairList):1) {
    name <- names(pairList[i])
    if (pairList[[i]] != TRUE && pairList[[i]] != FALSE) {
      # make sure values are TRUE or FALSE only
      stop(paste("Invalid value (", pairList[[i]], ") for ", name,
           " (should be TRUE or FALSE).", sep=""))
    }

    retval <- retval[order(retval[,name], decreasing=pairList[[i]]),]
  }
  return(retval)
}

Example:

x <- data.frame(uno=sample(c("a","b","c"), 10, replace=TRUE), 
                dos=sample(c("x","y","z"), 10, replace=TRUE), 
                tres=sample(1:10, 10))
multisort(x, uno=TRUE, dos=FALSE)
multisort(x, dos=FALSE, uno=TRUE, tres=FALSE)

order.data.frame

The multisort function is inefficient because it has to rearrange the data several times. This function produces the order necessary to sort a data.frame or matrix based on a set of columns, how to handle NAs for each, and whether the column(s) should be decreasing or increasing. The sort precedence goes from left to right.

order.data.frame <- function(x, na.last=TRUE, decreasing=FALSE)
{
    len <- dim(x)[2]
    len2 <- length(na.last)
    len3 <- length(decreasing)
    if (len < len2)
        na.last <- na.last[1 : len]
    if (len > len2)
        na.last[(len2 + 1) : len] <- TRUE
    if (len < len3)
        decreasing <- decreasing[1 : len]
    if (len > len3)
        decreasing[(len3 + 1) : len] <- FALSE
    ox <- 1:dim(x)[1]
    for (i in len:1)
    {
        ox <- order(x[,i], order(ox) * (1 - 2 * decreasing[i]),
                    na.last=na.last[i], decreasing=decreasing[i])
    }
    return(ox)
}

Example:

x <- data.frame(uno=sample(c("a","b","c"), 10, replace=TRUE), 
                dos=sample(c("x","y","z"), 10, replace=TRUE), 
                tres=sample(1:10, 10))
x[order.data.frame(x[c("uno", "dos")],
                   decreasing=c(TRUE, FALSE)), ]
x[order.data.frame(x[c("dos", "uno", "tres")],
                   decreasing=c(FALSE, TRUE, FALSE)), ]

Printing (small) data.frames

I wanted print.data.frame to look more like the results of a query in MySql. So I overwrote it with this:

setMethod("print", signature(x="data.frame"),
    function(x, ...) {
        size=c()
#         for each column
        for(j in 1:dim(x)[2]) {
            len=nchar(names(x)[j])
            for(i in 1:dim(x)[1]) {
                if(nchar(as.character(x[i,j])) > len) {
                    len<-nchar(as.character(x[i,j]))
                }
            }
            size<-append(size, len+2)
        }
#         size is the number of '-' characters to display
        newrow=""
        for(s in size) {
            newrow<-paste(c(newrow,"+",rep("-",s)),collapse="")
        }
        newrow<-paste(c(newrow,"+\n"),collapse="")
#         newrow is an empty row to print out
        header=""
        for(j in 1:dim(x)[2]) {
            diff=size[j]-nchar(names(x)[j])-1
            header<-paste(c(header,"| ",names(x)[j],rep(" ",diff)),collapse="")
        }
        header<-paste(c(header,"|\n"),collapse="")
        cat(newrow)
        cat(header)
        cat(newrow)
        for(i in 1:dim(x)[1]) {
            row=""
            for(j in 1:dim(x)[2]) {
                diff=size[j]-nchar(as.character(x[i,j]))-1
                row<-paste(c(row,"| ",as.character(x[i,j]),rep(" ",diff)),collapse="")
            }
            row<-paste(c(row,"|\n"),collapse="")
            cat(row)
        }
        cat(newrow)
    }
)
This may not look very good for a data.frame that has many columns. (someone want to extend that functionality?)

Here's an example:

data.frame(Car=c(16, 22, 07, 31, 2, 5, 8, 99, 7, 24, 66, 11, 29,
88, 48, 9, 17, 32, 43, 14, 6, 19, 26, 42, 01, 12, 45, 96, 10,
38, 21, 41, 20, 40, 1, 25, 55, 4, 18),
Driver=c("Greg Biffle", "Dave Blaney", "Clint Bowyer",
"Jeff Burton", "Kurt Busch", "Kyle Busch", "Dale Earnhardt",
"Carl Edwards", "Robby Gordon", "Jeff Gordon", "Jeff Green",
"Denny Hamlin", "Kevin Harvick", "Dale Jarrett",
"Jimmie Johnson", "Kasey Kahne", "Matt Kenseth",
"Travis Kvapil", "Bobby Labonte", "Sterling Marlin",
"Mark Martin", "Jeremy Mayfield", "Jamie McMurray",
"Casey Mears", "Joe Nemechek", "Ryan Newman", "Kyle Petty",
"Tony Raines", "Scott Riggs", "Elliott Sadler", "Ken Schrader",
"Reed Sorenson", "Tony Stewart", "David Stremme",
"Martin Truex", "Brian Vickers", "Michael Waltrip",
"Scott Wimmer", "J.J. Yeley"),
Crew_Chief=c("Doug Richert", "Kevin Hamlin", "Gil Martin",
"Scott Miller", "Roy McCauley", "Alan Gustafson",
"Tony Eury Jr.", "Wally Brown", "Greg Erwin", "Steve Letarte",
"Bootie Barker", "Mike Ford", "Todd Berrier", "Slugger Labbe",
"Chad Knaus", "Kenny Francis", "Robbie Reiser", "Gary Putnam",
"Todd Parrott", "Doug Randolph", "Pat Tryson", "Chris Andrews",
"Bob Osborne", "Donnie Wingo", "Ryan Pemberton", "Matt Borland",
"Paul Andrews", "Philippe Lopez", "Rodney Childers",
"Tommy Baldwin Jr.", "David Hyder", "Jimmy Elledge",
"Greg Zipadelli", "Steven Lane", "Kevin Manion", "Lance McGrew",
"Joe Shear Jr.", "Chris Carrier", "Steve Addington"))

And here's the output:
+-----+-----------------+-------------------+
| Car | Driver          | Crew_Chief        |
+-----+-----------------+-------------------+
| 16  | Greg Biffle     | Doug Richert      |
| 22  | Dave Blaney     | Kevin Hamlin      |
| 7   | Clint Bowyer    | Gil Martin        |
| 31  | Jeff Burton     | Scott Miller      |
| 2   | Kurt Busch      | Roy McCauley      |
| 5   | Kyle Busch      | Alan Gustafson    |
| 8   | Dale Earnhardt  | Tony Eury Jr.     |
| 99  | Carl Edwards    | Wally Brown       |
| 7   | Robby Gordon    | Greg Erwin        |
| 24  | Jeff Gordon     | Steve Letarte     |
| 66  | Jeff Green      | Bootie Barker     |
| 11  | Denny Hamlin    | Mike Ford         |
| 29  | Kevin Harvick   | Todd Berrier      |
| 88  | Dale Jarrett    | Slugger Labbe     |
| 48  | Jimmie Johnson  | Chad Knaus        |
| 9   | Kasey Kahne     | Kenny Francis     |
| 17  | Matt Kenseth    | Robbie Reiser     |
| 32  | Travis Kvapil   | Gary Putnam       |
| 43  | Bobby Labonte   | Todd Parrott      |
| 14  | Sterling Marlin | Doug Randolph     |
| 6   | Mark Martin     | Pat Tryson        |
| 19  | Jeremy Mayfield | Chris Andrews     |
| 26  | Jamie McMurray  | Bob Osborne       |
| 42  | Casey Mears     | Donnie Wingo      |
| 1   | Joe Nemechek    | Ryan Pemberton    |
| 12  | Ryan Newman     | Matt Borland      |
| 45  | Kyle Petty      | Paul Andrews      |
| 96  | Tony Raines     | Philippe Lopez    |
| 10  | Scott Riggs     | Rodney Childers   |
| 38  | Elliott Sadler  | Tommy Baldwin Jr. |
| 21  | Ken Schrader    | David Hyder       |
| 41  | Reed Sorenson   | Jimmy Elledge     |
| 20  | Tony Stewart    | Greg Zipadelli    |
| 40  | David Stremme   | Steven Lane       |
| 1   | Martin Truex    | Kevin Manion      |
| 25  | Brian Vickers   | Lance McGrew      |
| 55  | Michael Waltrip | Joe Shear Jr.     |
| 4   | Scott Wimmer    | Chris Carrier     |
| 18  | J.J. Yeley      | Steve Addington   |
+-----+-----------------+-------------------+
Edit | Attach | Print version | History: r10 | r4 < r3 < r2 < r1 | Backlinks | View wiki text | Edit WikiText | More topic actions...
Topic revision: r3 - 25 Apr 2007, WillGray
 

This site is powered by FoswikiCopyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Vanderbilt Biostatistics Wiki? Send feedback