(24 Jan 2008, FrankHarrell)
---+ Methods of Retrieving Datasets

Most of the datasets on this site are in the S =data.dump= format (file suffix =.sdd=) and the R compressed =save()= file format (suffix =.sav=). Some datasets are also available in Excel or ASCII (=csv=) format.

   * To download and install a dataset manually, right-click the file's link and save it to a temporary disk location, e.g., a directory such as =\windows\temp= or =/tmp=.
   * In S-Plus you can import the datasets using the =File ... Import ... S-Plus Transport File= dialog.
   * Alternatively, use the S-Plus or R command =data.restore('/mydir/file.sdd')=.
   * In R, =data.restore= is found in the =foreign= package, but the binary =save= files are much better to use.
   * If you have issued =library(Hmisc)= in R or S-Plus, you can download and =load()= a dataset by just typing =getHdata(dataset name)=. To list the available dataset names, type =getHdata()=. Type =?getHdata= to see other options, including ones to browse a dataset's =html(contents())= file or its description file (if available) on our web site. Here's an example:
<verbatim>
getHdata(prostate)
attach(prostate)
...
</verbatim>
   If you are using S-Plus and your system does not have the =wget= executable, you must install it for =getHdata= or =download.file= to work. Windows users may obtain =wget.exe= [[http://www.physionet.org/physiotools/utilities/wget/wget-1.8.1b.zip][here]]. Download the zip file to a temporary location, unzip it, and put =wget.exe= in the same directory in which Windows stores =ftp.exe=.
   * In S-Plus 5 or later, if you are not using =getHdata=, you will need to run imported data frames through the =Hmisc= library's =cleanup.import= function, e.g., =pbc <- cleanup.import(pbc)=. This removes object classes that are not allowed in Version 4 of the S language because of its inability to handle multiple inheritance. If you use =.sdd= files in R, you may also want to run them through =cleanup.import= to store them more efficiently (=save= files are already stored efficiently). When you use =getHdata= in S-Plus, it runs =cleanup.import= for you automatically.
   * In R you can also =load()= a dataset directly from the web using =load(url('http://biostat.mc.vanderbilt.edu/...foo.sav'))=.
   * It is best to use the datasets with the =Hmisc= library loaded. Among other things, this allows you to use the =Hmisc= functions =describe= and =contents= to obtain documentation about the variables.
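The manual route (save the file to a temporary location, then =load()= it) can also be scripted in R. The following is a minimal sketch under stated assumptions, not the implementation of =getHdata=: the helper name =fetch_sav= is hypothetical, and the code uses only base R (=download.file=, =load=, =save=, =tempfile=). To keep it runnable offline, the demonstration saves a small made-up data frame and retrieves it through a =file://= URL; with network access you would pass the full URL of a =.sav= file on this site instead.

```r
## Minimal sketch, assuming base R only. fetch_sav() is a hypothetical helper,
## not part of Hmisc; getHdata() does its own downloading.
fetch_sav <- function(src, envir = new.env()) {
  dest <- tempfile(fileext = ".sav")   # temporary disk location for the download
  on.exit(unlink(dest))                # delete the temporary copy when done
  ## mode = "wb" writes in binary mode so the file is not corrupted on Windows
  status <- download.file(src, destfile = dest, mode = "wb", quiet = TRUE)
  if (status != 0) stop("download failed: ", src)
  loaded <- load(dest, envir = envir)  # load() returns the names of restored objects
  list(names = loaded, envir = envir)
}

## Offline demonstration: save() a small made-up data frame, then retrieve it
## through a file:// URL as if it were hosted on a web server.
demo <- data.frame(id = 1:3, age = c(64, 71, 58))
src_file <- tempfile(fileext = ".sav")
save(demo, file = src_file)
res <- fetch_sav(paste0("file://", normalizePath(src_file, winslash = "/")))
print(res$names)
```

Loading into a fresh environment rather than the global workspace keeps the retrieved objects from silently overwriting existing objects with the same names; because =load()= returns the names of the objects it restored, the sketch can report what the file contained.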
Topic revision: r2 - 24 Jan 2008, FrankHarrell
Copyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.