Final Project Analysis File

By now you must have completed importing, annotating, and recoding the data you will use in your final project, and turned in an HTML report that shows the code you used to do this as well as a nicely formatted data dictionary (contents(d)) and descriptive statistics (describe(d)) output. Make sure you point out the dependent variable(s) and look closely at their raw distributions.
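As a minimal sketch of the two outputs named above, the following builds a small made-up data frame and runs contents() and describe() on it. The variable names and values are hypothetical, and sbp is only assumed to be the dependent variable for illustration.

```r
library(Hmisc)

# Hypothetical data frame standing in for your annotated analysis file
d <- data.frame(
  age = c(34, 51, 47, 62, 29),
  sex = factor(c('female', 'male', 'male', 'female', 'female')),
  sbp = c(118, 135, 128, 142, 110)
)

dd   <- contents(d)   # data dictionary: names, labels, units, levels, storage
desc <- describe(d)   # descriptive statistics for every variable
print(dd)
print(desc)

# Raw distribution of the (assumed) dependent variable sbp
hist(d$sbp, xlab = 'sbp', main = '')
```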

Use good R programming practices to the extent possible. The upData function can be used to recode, label, and provide units of measurement all at once.
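A short sketch of recoding, labeling, and assigning units in a single upData call might look like the following. The data frame, variable names, codes, and the pounds-to-kilograms conversion are all made up for illustration.

```r
library(Hmisc)

# Hypothetical raw data; names and values are assumptions for this sketch
d <- data.frame(
  id  = 1:4,
  sex = c(1, 2, 2, 1),            # numeric codes to be recoded
  wt  = c(150, 182, 127, 201),    # weight recorded in pounds
  sbp = c(120, 135, 118, 142)
)

d <- upData(d,
  sex    = factor(sex, levels = 1:2, labels = c('male', 'female')),  # recode
  wt     = wt / 2.2046,                                              # pounds to kg
  labels = c(wt = 'Weight', sbp = 'Systolic Blood Pressure'),
  units  = c(wt = 'kg', sbp = 'mmHg'),
  print  = FALSE                  # suppress the change log upData normally prints
)
```

Leaving print at its default lets upData report each change it made, which can be useful to show in the report.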

There are capabilities in the Hmisc package to help you manage variable names and labels in special ways, for example if you are importing a spreadsheet that contains variable labels and you need to add short variable names.
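One possible way to handle that spreadsheet case, not necessarily the specific capability referred to above, is to save the imported column headers as labels and then assign short names. The data and the short names below are hypothetical.

```r
library(Hmisc)

# Hypothetical import whose column headers are long labels, not usable names
raw <- data.frame(
  'Age in years'            = c(34, 51, 47),
  'Systolic blood pressure' = c(118, 135, 128),
  check.names = FALSE       # keep the headers exactly as imported
)

labs       <- names(raw)        # keep the long headers to use as labels
names(raw) <- c('age', 'sbp')   # short variable names (chosen for this sketch)

# Attach each long header as the label of its short-named variable
d <- upData(raw, labels = structure(labs, names = names(raw)), print = FALSE)
```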

See this for in-depth information about analysis file creation.

You may want to keep the code for creating the analysis file in a separate script, and have it store the annotated data ready for analysis. Here is an outline of an approach using this technique.

# Import data from the web or from a local file on your computer
# Annotate with upData etc.
# Suppose the dataset is named d and the project is named alpha
saveRDS(d, 'alpha.rds')    # store data frame or data table in compact R binary format
...
# To read the data into your final project analysis script:
d <- readRDS('alpha.rds')
Topic revision: r1 - 01 Mar 2023, FrankHarrell
 
