How to export data from SAS and import to R

Here are a few ways to get data from SAS to R with a focus on preserving metadata (labels and formats/factor levels).

Exporting from SAS to Stata

If you can export the SAS data to a Stata .dta file, you can load the foreign package in R and use read.dta, or use the stata.get function in Hmisc. The stata.get route attaches the variable labels to the individual variables.
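As a rough illustration of the Stata route, the sketch below reads a .dta file with read.dta and, if Hmisc is installed, with stata.get. Since no real export is available here, the file example.dta and its contents are made-up stand-ins written with write.dta; in practice the file would come from SAS.

```r
library(foreign)  # provides read.dta and write.dta

# Hypothetical stand-in for a .dta file exported from SAS
dta_file <- file.path(tempdir(), "example.dta")
example <- data.frame(id = 1:3, sex = factor(c("F", "M", "F")))
# write.dta uses a "var.labels" attribute, when present, as the variable labels
attr(example, "var.labels") <- c("Subject ID", "Sex of subject")
write.dta(example, dta_file)

# Route 1: foreign::read.dta keeps the labels as an attribute of the data frame
d <- read.dta(dta_file)
print(attr(d, "var.labels"))
print(levels(d$sex))  # value labels come back as factor levels

# Route 2: Hmisc::stata.get attaches a label to each variable
if (requireNamespace("Hmisc", quietly = TRUE)) {
  d2 <- Hmisc::stata.get(dta_file)
  print(Hmisc::label(d2$sex))
}
```

Note that read.dta only understands older Stata file versions, so the export may need to target one of those.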

Using Stat/Transfer

You can use Stat/Transfer to convert any SAS dataset to Stata or SPSS format. Converting to Stata probably works best: it allows long variable names and does not put blanks at the end of variable labels. Import the Stata file into R using stata.get; for an SPSS file, use spss.get. Stat/Transfer can also export directly to R. It is available on Linux and Windows computers and is installed on the computer in the big conference room. Some tips on getting the attributes on the data frame and running Stat/Transfer in batch mode are here.

Using the SAS Viewer

The SAS Viewer can read sas7bdat files and export a dataset to a CSV file. See http://biostat.mc.vanderbilt.edu/twiki/bin/view/Restricted/RestrictedSoftware for more information about getting the viewer and using it under Linux. It may be possible to use the SAS Viewer to export SAS metadata along with the data and to use both files when importing the SAS data into R.
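If the data and the metadata can both be exported as CSV, they might be combined in R along the following lines. This is only a sketch: the file names, the data values, and the layout of the metadata file (columns name and label) are assumptions made for illustration, not what the SAS Viewer actually produces.

```r
# Hypothetical stand-ins for the exported data and metadata files
data_file <- file.path(tempdir(), "visits.csv")
meta_file <- file.path(tempdir(), "visits_meta.csv")
writeLines(c("id,sbp", "1,120", "2,135"), data_file)
writeLines(c("name,label",
             "id,Subject ID",
             "sbp,Systolic blood pressure (mmHg)"), meta_file)

visits <- read.csv(data_file)
meta   <- read.csv(meta_file, stringsAsFactors = FALSE)

# Store each label in the "label" attribute, which Hmisc::label() reads
for (i in seq_len(nrow(meta))) {
  v <- meta$name[i]
  if (v %in% names(visits)) {
    attr(visits[[v]], "label") <- meta$label[i]
  }
}

print(attr(visits$sbp, "label"))
```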

Using SAS exportlib Macro with R sasxport.get Function

This approach requires access to SAS.

Creating a SAS Data Library Suitable for Importing into R

LIBNAME d engine "directoryname";  /* Use LIBNAME d SASV5XPT "foo.xpt"; to create transport files */
DATA d.dataset1; ....
DATA d.dataset2; ...
/* If the datasets are already created but you only want to export a
   subset of them do something like the following instead            */
LIBNAME old "olddirectoryname";
PROC COPY IN=old OUT=work; SELECT data1 data2 data3; RUN;

PROC FORMAT CNTLOUT=d.formats; RUN;  /* Needed if you used any PROC FORMAT ...; VALUE ...; */
* Can use CNTLOUT=formats if writing to work area;
* work can be the first argument given to the macro;
Note: If you created the dataset with a newer version of SAS and used variable names longer than 8 characters, specify the following option so that SAS truncates the names to 8 characters when creating a version 5 export file: OPTIONS VALIDVARNAME=V6;

Running SAS Job to Create csv Files

%INCLUDE "foo\exportlib.sas";    * Define macro;
LIBNAME d ...;                   * E.g. LIBNAME d SASV5XPT "foo.xpt";
/* To use regular SAS datasets (non-transport files) use for example
   LIBNAME d "olddirectory"; or LIBNAME d "."; (current working directory);*/

%exportlib(d, outdir, tempdir);
* Default outdir is . (current working directory);
* Default tempdir is C:/WINDOWS/TEMP;
This creates a .csv file in outdir for every SAS dataset in d (including the PROC FORMAT output, if any), plus a file called _contents_.csv containing PROC CONTENTS output for all datasets combined. _contents_.csv lets the SAS data import know about variable labels, formats, and types (including date, time, and date/time variables).

Under Windows, this SAS job will run much faster if you store the SAS commands in a file such as exportsas.sas, left click on exportsas.sas, and then click on BATCH SUBMISSION. After the job finishes you will see the file exportsas.log in the same directory as exportsas.sas. The only error messages you should see are related to missing formats; ignore these.

Here are simple examples in which a single SAS dataset is exported to directory C:my/sascsv and there are no PROC FORMAT value labels. First consider the case where the dataset is the only dataset in the permanent data library.
LIBNAME d ".";  * SAS datasets are in current working directory;
%exportlib(d, C:my/sascsv);  
If the permanent data library has more than one SAS dataset but you only want to export one of them, say ds1, use for example
LIBNAME d "projects/mydatasets";  * SAS datasets are somewhere else;
DATA ds1; SET d.ds1; RUN;
%exportlib(work, C:my/sascsv);

Importing Data into R

d <- sasxport.get(file, method='csv')
# file is name of directory containing all the .csv files created by exportlib
This produces a single data frame d if only one .csv file exists. Otherwise it produces a list of data frames whose elements are named by lower case versions of the SAS dataset names, with underscores replaced by periods.
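Because the result can be either one data frame or a list of them, downstream code may want to handle both cases uniformly. The sketch below shows the naming rule described above and a small helper for this; the helper as_dataset_list, the dataset names, and the stand-in objects are hypothetical and are not part of Hmisc.

```r
# Naming rule: lower case, underscores replaced by periods
sas_names <- c("DEMOG", "LAB_RESULTS")  # hypothetical SAS dataset names
r_names <- gsub("_", ".", tolower(sas_names), fixed = TRUE)
print(r_names)

# Hypothetical helper: always return a named list of data frames
as_dataset_list <- function(d, name = "data") {
  if (is.data.frame(d)) setNames(list(d), name) else d
}

# Stand-ins for the two kinds of result sasxport.get can return
single <- data.frame(id = 1:2)
many   <- list(demog = data.frame(id = 1:2),
               lab.results = data.frame(id = 1:3))

# The same code now works for both shapes
rows_single <- sapply(as_dataset_list(single), nrow)
rows_many   <- sapply(as_dataset_list(many), nrow)
print(rows_single)
print(rows_many)
```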
See also http://www.oview.co.uk/dsread and JrSAStoR
Topic revision: r14 - 28 May 2013, JoAnnAlvarez
 
