Multiple imputation and data reduction

UVA Biostatistics Discussion Board: Regression Modeling Strategies: Multiple imputation and data reduction
By Osman Al-Radi on Tuesday, May 06, 2003 - 11:50 am:

Q: How to incorporate one or more PC1 scores generated by princomp() as predictors in a model being fit by fit.mult.impute()?

Details:
I have a data set of 276 observations (67 deaths) and 29 variables. The number of missing values per variable ranges from 0 to 160 out of 276.

I used aregImpute with n.impute=10 to impute the missing values.

I generated PC1 for 4 different clusters of variables with princomp(....,na.action=na.exclude). I used na.exclude because impute() does not work on aregImpute objects.

How could I use fit.mult.impute() to fit a cph() model to this data and also use the PC1 variables?

Thanks

Osman

By Frank E Harrell Jr (Feh3k) on Tuesday, May 06, 2003 - 01:54 pm:

Here is one idea for an approximate solution.

  1. Determine clusters using subject matter knowledge and using variable clustering based on pairwise deletion of missing values.
  2. Run aregImpute on all the original variables (or run it separately on the set of variables in each cluster plus the response variable and do more customized programming).
  3. Invoke impute(aregImputeobject,...) in a for loop over the multiple imputations. Each time compute the first PC of each cluster and fit a Cox model.
  4. Average the fits and compute the between-fit variance-covariance matrix to obtain the final imputation-adjusted covariance matrix.
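
A rough sketch of steps 2 to 4 follows, under several assumptions that are not part of the original post: the data are simulated, the two clusters (x1, x2) and (x3, x4) are hypothetical, coxph from the survival package stands in for cph to keep dependencies small, and the fits are pooled with the usual Rubin's rules (within-imputation variance plus (1 + 1/m) times the between-imputation variance). The impute.transcan call follows the pattern shown in the Hmisc help for aregImpute. The sign of each PC1 is pinned by making the loading of the first variable in the cluster positive, which is only one possible convention.

```r
library(Hmisc)     # aregImpute, impute.transcan
library(survival)  # Surv, coxph

set.seed(1)
n  <- 200
z1 <- rnorm(n); z2 <- rnorm(n)
# Simulated (hypothetical) data: two clusters of two correlated variables
d <- data.frame(
  x1 = z1 + rnorm(n, sd = 0.5), x2 = z1 + rnorm(n, sd = 0.5),
  x3 = z2 + rnorm(n, sd = 0.5), x4 = z2 + rnorm(n, sd = 0.5)
)
d$time   <- rexp(n, rate = exp(0.5 * z1))
d$status <- rbinom(n, 1, 0.7)
d$x2[sample(n, 30)] <- NA
d$x4[sample(n, 40)] <- NA

clusters <- list(a = c("x1", "x2"), b = c("x3", "x4"))  # assumed clusters
m <- 5

# Step 2: impute using all original variables plus the response
imp <- aregImpute(~ x1 + x2 + x3 + x4 + time + status,
                  data = d, n.impute = m, pr = FALSE)

coefs <- vector("list", m)
vcovs <- vector("list", m)
for (i in seq_len(m)) {
  # Step 3: fill in the i-th set of imputed values
  filled <- impute.transcan(imp, imputation = i, data = d,
                            list.out = TRUE, pr = FALSE, check = FALSE)
  comp <- d
  comp[names(filled)] <- lapply(filled, as.numeric)

  # First PC of each cluster, sign fixed by an assumed convention
  for (k in names(clusters)) {
    pc <- princomp(comp[clusters[[k]]], cor = TRUE)
    s  <- sign(unclass(pc$loadings)[1, 1])
    comp[[paste0("pc1_", k)]] <- s * pc$scores[, 1]
  }

  fit <- coxph(Surv(time, status) ~ pc1_a + pc1_b, data = comp)
  coefs[[i]] <- coef(fit)
  vcovs[[i]] <- vcov(fit)
}

# Step 4: pool the m fits with Rubin's rules
Q     <- do.call(rbind, coefs)      # m x p matrix of coefficients
qbar  <- colMeans(Q)                # averaged coefficients
W     <- Reduce(`+`, vcovs) / m     # average within-imputation covariance
B     <- cov(Q)                     # between-imputation covariance
Total <- W + (1 + 1 / m) * B        # imputation-adjusted covariance

pooled_se <- sqrt(diag(Total))
```

Because the principal components are recomputed inside the loop, the imputation uncertainty propagates into the PC scores as well as into the Cox coefficients.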


What worries me about this algorithm is that first PCs are determined only up to sign changes, etc., so in some cases they do not have the same meaning across imputations. This would need to be carefully studied. Another possible option is to compute the first PCs once on the average of all the multiply imputed values, but this will not work for categorical predictors.
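
To illustrate the sign issue only, here is a small base-R sketch. The helper align_pc1 and its ref argument are hypothetical names: the idea is to flip a PC1 whenever its loadings point away from a reference loading vector (for example, the loadings from the first imputation). This addresses sign flips but not other ways in which PCs can differ across imputations, so it is not a full answer to the concern above.

```r
# Compute PC1 of x and, if a reference loading vector is supplied,
# flip the sign so that the loadings agree in direction with it.
align_pc1 <- function(x, ref = NULL) {
  pc    <- princomp(x, cor = TRUE)
  load  <- unclass(pc$loadings)[, 1]
  score <- pc$scores[, 1]
  if (!is.null(ref) && sum(load * ref) < 0) {
    load  <- -load
    score <- -score
  }
  list(loadings = load, scores = score)
}

set.seed(2)
z <- rnorm(100)
x <- cbind(a = z + rnorm(100, sd = 0.3),
           b = z + rnorm(100, sd = 0.3))

first <- align_pc1(x)                             # sign as returned by princomp
again <- align_pc1(x, ref = -first$loadings)      # forced to the opposite sign
```

Here again$scores is the mirror image of first$scores: the same component, with only the direction changed, which is exactly the ambiguity that can make coefficients incomparable across imputations.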

Someday I'm going to fully implement the derived parameter in fit.mult.impute to do away with the need to program your own loop.

If single conditional mean imputation works OK for your purposes, transcan may simplify things.

Concerning the comment about impute not working on aregImpute objects, I assume you are referring to a bug related to S-Plus 6 not allowing an 'impute' class to be added to a variable. The next release of Hmisc will get around that for S-Plus 6 by not attaching an 'impute' class in impute.transcan (which works on aregImpute objects and transcan objects alike). Anyone needing this fix can contact me.