Methods of Retrieving Datasets

Most of the datasets on this site are in the S dumpdata format (file suffix of .sdd) and R compressed save() file format (suffix of .sav). Some datasets are available in Excel or ASCII formats.
  • To manually download and install a dataset, right-click the file's link and save it to a temporary disk location, e.g., a directory such as \windows\temp or /tmp.
  • In S-Plus 2000 or 6.x you can import the datasets using the File ... Import ... S-Plus Transport File dialog
  • Alternatively, use the S-Plus or R command data.restore('/mydir/file.sdd')
  • In R, data.restore is found in the foreign package, but the save files are easier to use.
  • If you have version 1.4-1 or later of the Hmisc package for R you can download and load() a dataset by just typing getHdata(dataset name). To list available dataset names just type getHdata(). Type ?getHdata to see other options including ones to browse a dataset's html(contents()) file or its description file (if available) on our web site. Here's an example:
     getHdata(prostate)
     attach(prostate)
     ...
  • getHdata is also available in Hmisc for recent releases of S-Plus. If your Windows system does not have the wget executable, you must install it for getHdata or download.file to work. You may obtain wget.exe here. Download it to a temporary file, unzip it, and put wget.exe in the same directory in which Windows stores ftp.exe.
  • In S-Plus 5.x or 6.x, if you are not using getHdata you will need to run imported data frames through the Hmisc library's cleanup.import function, e.g., pbc <- cleanup.import(pbc), to remove classes that are not allowed in Version 4 of the S language due to its inability to handle multiple inheritance. If using .sdd files in R you may also want to run them through cleanup.import to store them more efficiently (save files are already stored efficiently). When you use getHdata in S-Plus, it runs cleanup.import for you automatically.
  • It is best to use the datasets with the Hmisc library in effect. Among other things, this will allow you to use the Hmisc describe and contents functions to obtain documentation about the variables.
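
As a rough illustration of the manual route in R, the sketch below loads a file that has already been saved to disk, choosing load() for a .sav file and data.restore from the foreign package for a .sdd file. The helper name read_dataset, the demo data frame, and the demo.sav file are hypothetical and are not part of Hmisc or of this site; the file is created locally only so the example runs without a download. The final step assumes Hmisc may or may not be installed and is skipped if it is not.

```r
# Minimal sketch: read one dataset from a file already saved to disk.
# read_dataset() is a hypothetical helper, not an Hmisc or foreign function.
read_dataset <- function(path) {
  ext <- tolower(tools::file_ext(path))
  env <- new.env()
  if (ext == "sav") {
    # R save() file: load() restores the saved objects into env
    load(path, envir = env)
  } else if (ext == "sdd") {
    # S dump file: in R, data.restore() comes from the foreign package
    foreign::data.restore(path, env = env)
  } else {
    stop("unsupported file suffix: ", ext)
  }
  objs <- ls(env)
  if (length(objs) != 1L) stop("expected exactly one object in ", path)
  get(objs, envir = env)
}

# Stand-in for a downloaded file (assumption: made-up data, not a site dataset)
demo <- data.frame(age = c(61, 70, 58), stage = factor(c(3, 4, 3)))
sav_path <- file.path(tempdir(), "demo.sav")
save(demo, file = sav_path, compress = TRUE)

d <- read_dataset(sav_path)
str(d)

# Optional: with Hmisc installed, tidy storage modes and list the variables
if (requireNamespace("Hmisc", quietly = TRUE)) {
  cleaned <- Hmisc::cleanup.import(d)
  print(Hmisc::contents(cleaned))
}
```

For a real dataset, replace sav_path with the location where the downloaded file was saved. When getHdata is available it remains the shorter route, since it downloads and loads the dataset in one call.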

-- FrankHarrell - 24 Jan 2004

Topic revision: 24 Jan 2004, WikiGuest
 