How to export data from SAS and import to R
Here are a few ways to get data from SAS to R with a focus on preserving metadata (labels and formats/factor levels).
Exporting from SAS to Stata
If you can export the SAS data to a Stata .dta file, you can load the foreign package in R and use read.dta, or use the stata.get function in Hmisc. With stata.get, each variable carries its own variable label.
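A minimal sketch of the foreign route, assuming only the foreign package (a recommended package that ships with R). Because no SAS-derived .dta file is at hand, the sketch first writes a small stand-in file with write.dta; the file and the demo data frame are hypothetical. It shows that value labels come back as factor levels, while read.dta keeps variable labels as an attribute of the whole data frame rather than on each variable.

```r
library(foreign)  # provides read.dta() and write.dta()

# Hypothetical stand-in for a .dta file produced from SAS (e.g. by Stat/Transfer):
# a factor is included so the file contains Stata value labels.
dta_file <- tempfile(fileext = ".dta")
demo <- data.frame(id  = 1:3,
                   sex = factor(c("female", "male", "female")))
write.dta(demo, dta_file)

# convert.factors = TRUE (the default) turns Stata value labels into factor levels
d <- read.dta(dta_file)
str(d)

# read.dta stores variable labels in one data-frame attribute;
# Hmisc::stata.get(dta_file) would attach them to the individual variables instead.
print(attr(d, "var.labels"))
```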
Using Stat/Transfer
You can use Stat/Transfer to convert any SAS dataset to Stata (probably the best option, since it allows long variable names and does not pad variable labels with trailing blanks; import into R using stata.get) or to SPSS (use spss.get in R). Stat/Transfer is available on Linux and Windows computers and is installed on the computer in the big conference room. It can also export directly to R. Some tips on getting the attributes onto the data frame and running Stat/Transfer in batch mode are here.
Using the SAS Viewer
The SAS Viewer can read sas7bdat files and export a dataset to a csv file. See http://biostat.mc.vanderbilt.edu/twiki/bin/view/Restricted/RestrictedSoftware for more information about getting the viewer and using it under Linux. It may be possible to use the SAS Viewer to export SAS metadata along with the data and to use both files when importing the SAS data into R.
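If the SAS Viewer route does yield a separate metadata file, the labels could be reattached in R roughly as follows. This is a minimal base-R sketch under a loud assumption: the metadata layout (a csv with columns NAME and LABEL, one row per variable) is hypothetical, as are both demo files and the helper attach_labels. Labels are stored in the "label" attribute, the attribute Hmisc's label() function reads.

```r
# Hypothetical stand-ins for SAS Viewer output: the data as csv, plus an
# ASSUMED metadata csv with columns NAME and LABEL (one row per variable).
data_csv <- tempfile(fileext = ".csv")
meta_csv <- tempfile(fileext = ".csv")
writeLines(c("ID,AGE", "1,34", "2,51"), data_csv)
writeLines(c("NAME,LABEL",
             "ID,Subject identifier",
             "AGE,Age at entry (years)"), meta_csv)

# Hypothetical helper: copy each LABEL onto the matching variable
attach_labels <- function(d, meta) {
  for (i in seq_len(nrow(meta))) {
    v <- meta$NAME[i]
    if (v %in% names(d)) attr(d[[v]], "label") <- meta$LABEL[i]
  }
  d
}

d <- read.csv(data_csv)
meta <- read.csv(meta_csv, stringsAsFactors = FALSE)
d <- attach_labels(d, meta)
print(attr(d$AGE, "label"))
```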
You must have access to SAS for this
Creating a SAS Data Library Suitable for Importing into R
LIBNAME d engine "directoryname"; /* Use LIBNAME d SASV5XPT "foo.xpt"; to create transport files */
DATA d.dataset1; ....
DATA d.dataset2; ...
/* If the datasets are already created but you only want to export a
subset of them do something like the following instead */
LIBNAME old "olddirectoryname";
PROC COPY IN=old OUT=work; SELECT data1 data2 data3; RUN;
PROC FORMAT CNTLOUT=d.formats; RUN; /* Only if you used PROC FORMAT ...; VALUE ...; */
* Can use CNTLOUT=formats if writing to work area;
* work can be the first argument given to the macro;
Note: If you used a newer version of SAS to create the dataset and used variable names longer than 8 characters, you'll need to specify the following option to SAS to allow truncation of names to 8 characters for creating a version 5 export file:
OPTIONS VALIDVARNAME=V6;
Running a SAS Job to Create csv Files
%INCLUDE "foo\exportlib.sas"; * Define macro;
LIBNAME d ...; * E.g. LIBNAME d SASV5XPT "foo.xpt";
/* To use regular SAS datasets (non-transport files) use for example
LIBNAME d "olddirectory"; or LIBNAME d "."; (current working directory);*/
%exportlib(d, outdir, tempdir);
* Default outdir is . (current working directory);
* Default tempdir is C:/WINDOWS/TEMP;
This creates a .csv file in outdir for every SAS dataset in d (including the PROC FORMAT output, if any), plus a file called _contents_.csv containing the PROC CONTENTS output for all datasets combined.
_contents_.csv lets the R import know about variable labels, formats, and types (including date, time, and date/time variables). Under Windows, this SAS job runs much faster if you store the SAS commands in a file such as exportsas.sas, right-click on exportsas.sas, and choose BATCH SUBMISSION. After the job finishes you will see the file exportsas.log in the same directory as exportsas.sas. The only error messages you should see are related to missing formats; ignore these.
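Before running the R import, it can help to confirm that the export directory actually contains _contents_.csv, since that file carries the labels, formats, and types. A minimal base-R sketch; check_export_dir is a hypothetical helper (not part of Hmisc), and the demo directory holds empty stand-in files rather than real exportlib output.

```r
# Hypothetical helper (not part of Hmisc): confirm that exportlib's output
# directory contains _contents_.csv and return the per-dataset .csv files.
check_export_dir <- function(dir) {
  csvs <- list.files(dir, pattern = "\\.csv$")
  if (!"_contents_.csv" %in% csvs)
    stop("_contents_.csv not found in ", dir,
         "; labels, formats and types would not be available")
  setdiff(csvs, "_contents_.csv")
}

# Demo with empty stand-in files in a temporary directory
demo_dir <- file.path(tempdir(), "sascsv_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("_contents_.csv", "ds1.csv"))))
dataset_files <- check_export_dir(demo_dir)
print(dataset_files)
```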
Here are simple examples in which a single SAS dataset is exported to directory C:my/sascsv and there are no PROC FORMAT value labels. First consider the case where the dataset is the only dataset in the permanent data library.
LIBNAME d "."; * SAS datasets are in current working directory;
%exportlib(d, C:my/sascsv);
If the permanent data library has more than one SAS dataset but you only want to export one of them, say ds1, use for example
LIBNAME d "projects/mydatasets"; * SAS datasets are somewhere else;
DATA ds1; SET d.ds1; RUN;
%exportlib(work, C:my/sascsv);
Importing Data into R
d <- sasxport.get(file, method='csv')
# file is name of directory containing all the .csv files created by exportlib
This produces a single data frame d if only one .csv file exists; otherwise it produces a list of data frames whose elements are named after the SAS datasets, converted to lower case with underscores replaced by periods.
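Because sasxport.get returns a data frame in one case and a list in the other, code that must handle both can normalize the result. A minimal base-R sketch; r_element_name and as_dataset_list are hypothetical helpers that mirror the naming rule described above, not Hmisc functions. The sasxport.get call is left commented out because it needs Hmisc and a real exportlib output directory (csv_dir is a placeholder).

```r
# Hypothetical helper: apply the naming rule described above
# (lower case, underscores replaced by periods) to a SAS dataset name.
r_element_name <- function(sas_name) {
  gsub("_", ".", tolower(sas_name), fixed = TRUE)
}

# Hypothetical helper: always return a named list of data frames, wrapping
# the single-data-frame case so downstream code can index by dataset name.
as_dataset_list <- function(x, sas_name) {
  if (is.data.frame(x)) setNames(list(x), r_element_name(sas_name)) else x
}

# Typical use (requires Hmisc and the directory written by exportlib):
# library(Hmisc)
# d <- as_dataset_list(sasxport.get(csv_dir, method = "csv"), "DS1")
# str(d$ds1)
```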
See also http://www.oview.co.uk/dsread and JrSAStoR.