Methods of Retrieving Datasets
Most of the datasets on this site are available both in S dumpdata format (file suffix .sdd) and in R compressed save() format (file suffix .sav). Some datasets are also available in Excel or ASCII formats.
- To download and install a dataset manually, right-click on a file and save it to a temporary location, e.g., a directory such as \windows\temp or /tmp.
- In S-Plus 2000 or 6.x you can import the datasets using the File ... Import ... S-Plus Transport File dialog.
- Alternatively, use the S-Plus or R command data.restore('/mydir/file.sdd').
- In R, data.restore is found in the foreign package, but the .sav files created by save() are easier to use: just load() them.
- If you have version 1.4-1 or later of the Hmisc package for R, you can download and load() a dataset by typing getHdata(dataset name). To list the available dataset names, type getHdata(). Type ?getHdata to see other options, including ones to browse a dataset's html(contents()) file or its description file (if available) on our web site. Here's an example:
getHdata(prostate)
attach(prostate)
...
- getHdata is also available in Hmisc for recent releases of S-Plus. If your Windows system does not have the wget executable, you must install it for getHdata or download.file to work. You can obtain wget.exe here. Download it to a temporary file, unzip it, and put wget.exe in the same directory in which Windows stores ftp.exe.
- In S-Plus 5.x or 6.x, if you are not using getHdata, you will need to run imported data frames through the Hmisc library's cleanup.import function, e.g., pbc <- cleanup.import(pbc), to remove classes that are not allowed in Version 4 of the S language because of its inability to handle multiple inheritance. If you use .sdd files in R, you may also want to run them through cleanup.import to store them more efficiently (.sav files are already stored efficiently). When you use getHdata in S-Plus, it runs cleanup.import for you automatically.
- It is best to use the datasets with the Hmisc library loaded. Among other things, this lets you use the Hmisc describe and contents functions to obtain documentation about the variables.
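If you download a file by hand into a temporary directory and are working in R without getHdata, the restore step can be sketched as a small helper that picks the reader from the file suffix. This is only a sketch under stated assumptions: load_dataset_file is a hypothetical name, not an Hmisc function; the .sav branch uses base R load(), and the .sdd branch assumes the foreign package is installed and that its data.restore() accepts an env argument. The usage lines round-trip a small made-up data frame rather than one of the site's datasets.

```r
# Hypothetical helper (not part of Hmisc): restore every object stored in a
# downloaded dataset file into a fresh environment and return them as a list.
load_dataset_file <- function(path) {
  env <- new.env()
  suffix <- tolower(tools::file_ext(path))
  if (suffix == "sav") {
    # R compressed save() file: load() restores the saved objects by name
    load(path, envir = env)
  } else if (suffix == "sdd") {
    # S dumpdata file: data.restore() lives in the foreign package
    # (assumption: foreign is installed and data.restore() takes 'env')
    if (!requireNamespace("foreign", quietly = TRUE))
      stop("the foreign package is needed to read .sdd files")
    foreign::data.restore(path, env = env)
  } else {
    stop("unsupported file suffix: ", suffix)
  }
  as.list(env)
}

# Usage: round-trip a small made-up data frame through a compressed .sav file
demo <- data.frame(id = 1:3, age = c(61, 58, 70))
path <- tempfile(fileext = ".sav")
save(demo, file = path, compress = TRUE)
objs <- load_dataset_file(path)
str(objs$demo)
```

Returning a list instead of writing into the global environment keeps the restored objects from silently overwriting ones already in your workspace; with getHdata available, that function remains the simpler route.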
--
FrankHarrell - 24 Jan 2004