Questions for Discussion

From FrankHarrell

  1. Methods based on filtering out "winners" result in effect estimates that are biased high. Likewise, stepwise variable selection results in regression coefficient estimates in the final model that are biased away from zero. It has been demonstrated (e.g., by Tibshirani's lasso technique, which does simultaneous variable selection and shrinkage) that shrinkage (discounting of estimates to account for overfitting) results in estimates that are more likely to validate. Have you considered incorporating shrinkage into your procedure?
  2. The Wilcoxon two-sample rank test has many advantages over parametric tests such as the t-test. What about making it one of the tests you are "averaging over"?
  3. The Kolmogorov-Smirnov test would provide information that is somewhat orthogonal to the information in the tests you are now combining. Would using it help enough to be worth the effort?
  4. Some test statistics are more redundant with each other than others are. Taking into account the covariance of the various test statistics might cut down on "double counting" redundant tests, at some computational expense. In principle, would this be a viable option?
  5. Using many individual t-tests effectively relies on individual estimates of the mean squared error, which are inefficient. Would there be an advantage to getting some kind of pooled variance estimate?
  6. Leave-out-one cross-validation has been shown by Efron and Gong not to perform correctly when the model-building procedure involves variable selection. This is because the n samples of size n-1 overlap so heavily that the same variables are selected in virtually all of them. So when one is not just fitting a pre-specified model, the leave-out-one approach does not penalize sufficiently for data mining. Why is leave-out-one still used in such situations?
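
The selection bias raised in question 1 can be illustrated with a toy simulation. The sketch below is not the procedure under discussion: the function name `simulate_winners`, the number of features, the sample size per feature, and the number of "winners" are all assumptions chosen for illustration. Every feature has a true effect of zero, yet the features picked for having the largest observed means show an average estimate well above zero.

```python
import random
import statistics


def simulate_winners(n_features=1000, n_per_feature=20, n_winners=10,
                     true_effect=0.0, seed=0):
    """Toy illustration (hypothetical setup): all features share the same
    true effect, and we keep the n_winners largest observed means."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_features):
        # Observed effect = sample mean of noisy measurements (unit variance).
        sample = [rng.gauss(true_effect, 1.0) for _ in range(n_per_feature)]
        estimates.append(statistics.fmean(sample))

    # "Filtering out winners": keep only the largest observed effects.
    winners = sorted(estimates, reverse=True)[:n_winners]
    return statistics.fmean(estimates), statistics.fmean(winners)


overall_mean, winner_mean = simulate_winners()
print(f"mean estimate, all features: {overall_mean:.3f}")
print(f"mean estimate, winners only: {winner_mean:.3f}")
```

Because the true effect is zero for every feature here, the whole of the winners' average is selection bias. That excess is what a shrinkage step would be expected to discount.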
Topic revision: r1 - 14 Sep 2004, FrankHarrell