Binary Response, Random Sample of 1000 Patients from the SUPPORT Study, Missing Data
Analyze the support dataset available at http://biostat.mc.vanderbilt.edu/twiki/pub/Main/DataSets/support.sav (an R save file that can also be downloaded and loaded using the Hmisc getHdata function) to develop a model to predict the probability that a patient dies in the hospital. Consider the following predictors: age, sex, dzgroup, num.co, scoma, race, meanbp, hrt, temp, pafi, alb. As part of your analysis do the following:
- Make a single chart showing proportions of deaths stratified by each of the other variables listed above
- Characterize patterns of missing values in the predictors by plotting missingness tendencies of single predictors and jointly of two predictors at a time, and by using recursive partitioning to determine what kind of patients tended to have a higher proportion of missing measurements for the predictor that is missing most often
- Impute missing lab data using "most normal" values; impute race using the most frequent category (hint: see the Hmisc impute function)
- Initially estimate marginal relationships between continuous predictors and outcome using a nonparametric smoother
- Use marginal potential predictive discrimination of predictors to decide on how to spend degrees of freedom
- Fit a multivariable model with minimal observations deleted due to NAs
- Test partial effects of all predictors
- Graphically interpret the model three distinct ways
- Validate the model for discrimination and calibration ability
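
The imputation step in the list above can be sketched on a small toy data frame. This is only an illustration under stated assumptions, not a solution to the exercise: the data frame d and its values are made up, and the "most normal" fill-in values (3.5 for alb, 333.3 for pafi) are assumed here and should be checked against the SUPPORT dataset documentation before use. The sketch relies on two documented behaviors of the Hmisc impute function: a constant passed in place of a function is used as the fill-in value, and a factor is filled with its most frequent category.

```r
library(Hmisc)

# Hypothetical toy stand-in for the support data frame (not real patient data)
d <- data.frame(
  alb  = c(2.9, NA, 3.1, NA, 4.0),
  pafi = c(250, 180, NA, 400, NA),
  race = factor(c("white", "black", NA, "white", "white"))
)

# Assumed "most normal" fill-in values; verify against the dataset documentation
normal <- c(alb = 3.5, pafi = 333.3)

# A constant given instead of a function is inserted wherever the value is NA
for (v in names(normal)) {
  d[[v]] <- impute(d[[v]], normal[[v]])
}

# For a factor, impute() fills NAs with the most frequent category
d$race <- impute(d$race)

# Count remaining NAs per column
sapply(d, function(x) sum(is.na(x)))
```

On the real data the same calls would be applied to the corresponding columns of the support data frame. Imputed vectors keep track of which entries were filled in, so is.imputed() can later be used to identify them.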