A Way to Import SAS Data to R With All the Formats and Labels

If you have a SAS .sas7bdat file with your data and want to open it in R, this is one way to do it.

Write a separate file with your formats.

The formats need to be in a separate .sas7bdat or .sas7bcat file. In SAS, use proc format with the cntlout option to write the formats out to a data set.

Use Stat/Transfer to create the .Rdata file.

You will need Stat/Transfer version 9 to make a .Rdata file. To get the formats, go to the Options (3) tab and look under "Reading SAS value labels." Enter the path and filename of the formats file you created in the previous step.

Load the new data frame into R.

In R, type load("newfilename.rdata"). This loads the data frame into memory. The name of the data frame is whatever you entered in Stat/Transfer, on the Transfer tab in the Table field. If you don't remember the name, type ls() (with empty parentheses) after loading to get a list of the objects currently in memory.
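A small sketch of this step follows. It uses a throwaway file in a temporary directory rather than a real Stat/Transfer export, and the data frame name mydata is made up for illustration. It also relies on the fact that load() invisibly returns the names of the objects it restores, which can save a search through the output of ls().

```r
# Hypothetical stand-in for a file produced by Stat/Transfer.
mydata <- data.frame(id = 1:3, score = c(10.5, 8.2, 9.9))
rdata_file <- file.path(tempdir(), "newfilename.rdata")
save(mydata, file = rdata_file)
rm(mydata)

# load() restores the saved objects and invisibly returns their names,
# so capturing the result tells you what the data frame is called.
loaded_names <- load(rdata_file)
print(loaded_names)

# ls() lists everything currently in the workspace, including the new data frame.
print(ls())
```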

Label the variables with the labels you had in the SAS file and apply the formats.

You will notice that the new data frame does not show the variable labels or formats. However, if you look at your data frame's attributes, you will see that Stat/Transfer did pull in this information. To apply it, you will need the upData function in the Hmisc package, so install the package if you don't already have it. Load the Hmisc package by entering library(Hmisc).

Thomas Dupont wrote the following code to assign the variable labels. In this example, the .rdata file is called "plfirstyear.rdata," and the actual data frame is called plfirstyear:

#############################################################################################################
## importpl.R Import sas data 
## 1.) Use stat transfer to read in data.
## 2.) Assign the variable labels and formats.
#############################################################################################################
setwd("~/cepidata/Projects/PrincipalLeadership/DataR/dataprocessing")

#This puts together all the commands and options we need to give to Stat/Transfer.
#"ex batch" runs the command file batch.stcmd, "copy" does the transfer, and "quit" closes Stat/Transfer.
input <- c("ex batch",
   "copy '..\\..\\Data_SAS\\Merge\\MergedDatasets\\plfirstyearmerge.sas7bdat' importraw\\plfirstyear.rdata /Y",
   "quit")

#This opens Stat/Transfer under wine. (I'm running on a Linux machine.)
system(paste("wine", shQuote("/home/ruddjm/.wine/drive_c/Program Files/StatTransfer9/st.exe")), input=input)
cat("\n")


## 2.) Assign the variable labels.
load("importraw/plfirstyear.rdata")
library(Hmisc)


attrs <- attributes(plfirstyear)

#names(plfirstyear) is a vector of all the column (variable) names.
#attrs$var.labels is where all the variable labels from SAS are stored.
#Initially, attrs$var.labels has no names, so names(attrs$var.labels) is empty.
#The next line of code names attrs$var.labels (and attrs$label.table) so that we can use them in the upData function, which requires named arguments.

names(attrs$label.table) <- names(attrs$var.labels) <- names(plfirstyear)

#attrs$label.table is where all the formats from the SAS formats file are stored.
#It is a list with length equal to the number of columns in the new data frame.

factor.levels <- lapply(attrs$label.table,
      function(l) {
         if(is.null(l)) NULL else {
            as.list(l)
         }
      }
)   
   
#Many of the elements in the list are NULL, since many of the variables in the data frame are not factors and thus don't need value labels.
#The next line of code drops the NULL elements from the list of formats and also drops the formats for the character variables.
   
factor.levels <- factor.levels[!sapply(factor.levels, is.null) & !sapply(plfirstyear, is.character)]

plfirstyear <- upData(plfirstyear, labels = attrs$var.labels, levels = factor.levels)

save("plfirstyear", file="plfirstyear.rda")   
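To see what the naming and filtering steps in the script do, here is a sketch on a toy data frame in base R. The data frame toy, its columns, and its labels are all made up. The attribute layout (an unnamed var.labels vector and an unnamed label.table list with NULL for unformatted columns) is an assumption modeled on how the script above reads them, not a documented description of Stat/Transfer output.

```r
# Toy data frame standing in for the Stat/Transfer import (all names are made up).
toy <- data.frame(
  sex   = c(1, 2, 1),
  age   = c(34, 51, 47),
  state = c("MI", "TN", "MI"),
  stringsAsFactors = FALSE
)

# Assumed attribute layout: one entry per column, NULL where there is no format.
attr(toy, "var.labels")  <- c("Sex of respondent", "Age in years", "State")
attr(toy, "label.table") <- list(c(Male = 1, Female = 2), NULL,
                                 c(Michigan = "MI", Tennessee = "TN"))

attrs <- attributes(toy)

# Give both attribute objects the column names so they can be matched to variables.
names(attrs$label.table) <- names(attrs$var.labels) <- names(toy)

# Turn each set of value labels into a list; leave NULL entries as NULL.
factor.levels <- lapply(attrs$label.table,
                        function(l) if (is.null(l)) NULL else as.list(l))

# Keep only variables that have value labels and are not character columns.
keep <- !sapply(factor.levels, is.null) & !sapply(toy, is.character)
factor.levels <- factor.levels[keep]

print(attrs$var.labels)
print(names(factor.levels))
```

In this toy case only sex survives the filter: age has no value labels, and state is a character column.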
 

How to Run Stat/Transfer in Batch Mode

As an alternative to the Windows interface, you can run Stat/Transfer in batch mode using the Stat/Transfer command processor. This is useful when you have many data sets to transfer or when your data sets may change.

Here is the file batch.stcmd, which the code above executes through its "ex batch" line. It is written in the Stat/Transfer command processor language.
log using "stattransferlog"
// Pull in the formats
set read-sas-fmts Y
set read-fmt-name "..\\..\\Data_SAS\\Formats\\textformats.sas7bdat"
drop tdhelp, tdchange, pdca16s, pdhelp, pdchange // These are variable names that I want to drop.
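When there are several data sets, the list of commands passed to Stat/Transfer can be generated in R instead of typed out. The sketch below only builds and prints the commands; it follows the pattern of the single copy command in the script above. The second data set name, plsecondyear, is hypothetical, and whether the settings in batch.stcmd carry over to every copy is something to confirm in the Stat/Transfer manual.

```r
# Hypothetical data set names; replace with your own.
datasets <- c("plfirstyear", "plsecondyear")

# Build one "copy" command per data set, using the same path pattern
# as the single copy command in the import script.
copy_commands <- sprintf(
  "copy '..\\..\\Data_SAS\\Merge\\MergedDatasets\\%smerge.sas7bdat' importraw\\%s.rdata /Y",
  datasets, datasets
)

# As in the import script: run batch.stcmd first, then the copies, then quit.
input <- c("ex batch", copy_commands, "quit")
cat(input, sep = "\n")
```

The resulting input vector can then be passed to system() through its input argument, exactly as in the import script.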

Tips

  • In SAS, a format can apply to only some values and leave the unspecified values as they are. In R, any value that is not listed in a factor's levels becomes missing (NA), so check for unformatted values before converting a variable to a factor.
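The tip above can be seen in a few lines of base R. The codes and labels here are made up, and the workaround in the second half is just one possible approach, not part of the original script.

```r
# Codes 1 and 2 have value labels; code 9 has none (made-up data).
codes <- c(1, 2, 9, 1)
value_labels <- c(Yes = 1, No = 2)

# Only the labeled codes are listed as levels, so code 9 turns into NA.
labelled <- factor(codes, levels = unname(value_labels),
                   labels = names(value_labels))
print(labelled)
print(is.na(labelled))

# One possible workaround: add the unlabeled codes as levels of their own,
# using the code itself as the label.
extra <- setdiff(unique(codes), value_labels)
kept <- factor(codes,
               levels = c(unname(value_labels), extra),
               labels = c(names(value_labels), as.character(extra)))
print(kept)
```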
Topic revision: r6 - 27 Jan 2010, JoAnnAlvarez
 
