Changes and Additions Planned for the Second Edition of Regression Modeling Strategies

  • Add many new references published from 2001 to 2007
  • Add information on the hazards of dichotomizing continuous variables, as described in CatContinuous. Include the point that people who dichotomize predictors because they doubt the linearity assumption are actually assuming far more about the true relationship: that it is piecewise flat, with a single jump at the cutpoint.
  • Add a reference to the American Journal of Human Genetics, Dec 2001, 69:1357-1369, with a brief discussion of points raised by Terry Therneau (and perhaps a brief discussion of some of the poor work now being published in gene expression analysis and molecular marker research, citing the editorial by David Ransohoff in Nature Reviews):
    • "For a genome-wide scan looking for QTL's (gene locations that are associated with disease), on the order of 1000 sites will be tested. The authors show that for a study of 1000 families (parents + 2 children), with complete data and no errors, that the final maximum estimated beta is almost a constant, when plotted versus the true largest effect size. This appears to be an extreme form of what you have preached for years, concerning coefficient inflation in stepwise regression."
  • Add general GLM references, e.g., Nelder
  • Change most references to S-Plus to S (meaning that the material also applies to R)
  • Use the new URL for the book's web page
  • In all examples that use datasets from our web site, include the getHdata( ) command at the start of the code
  • Improve the data reduction case studies, adding emphasis on pure redundancy analysis, in which an attempt is made to predict each predictor from all of the other predictors
  • Remove mention of the use of transcan for multiple imputation
  • Write up the aregImpute algorithm and recommend it for multiple imputation
    • Discuss confidence interval coverage and a potential problem with predictive mean matching (PMM)
  • Use aregImpute in multiple imputation examples
  • In the general modeling strategy chapter add:
    • A general statement about procedures that can hurt the model, e.g., a procedure that reduces the complexity of an apparently weak predictor by removing the most important part of how it is expressed in the model (such as its nonlinear effects)
    • Describe a strategy that uses each predictor's partial chi-square statistic minus its d.f. to specify the complexity of that predictor in the model, without partitioning the statistic into linear and nonlinear components and without examining the individual regression coefficients. This is an alternative to the generalized Spearman rho^2 approach currently described in the book.
  • When discussing the 15:1 rule, clarify that the number of parameters in the model (the model's d.f.) is not the same as the number of predictors
  • In the text, reference the simulation studies available on the web
  • Re-run the ols fit in Section 7.7 to see whether the tol argument needs to be specified in ols( )
  • Make a correction reported by Mark Grant (markg@uic.edu): "on page 173 the single imputation for $wt of obs 193 (right column) should be 104 not 262."
  • Add a chapter on generalized least squares for analyzing serial response data. Many statisticians are using mixed models and GEE for analyzing serial data, but for the common case of hierarchical models with a single level of clustering (the subject), generalized least squares is quite elegant and its assumptions are easy to understand.
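The coefficient-inflation point in Terry Therneau's comment can be reproduced in a much simpler setting. The sketch below is hypothetical and does not follow the design of the cited paper (no families, no linkage statistics): 999 sites have no true effect, one site has true effect `true_top`, each estimate equals its true value plus normal error with standard error 0.1, and the largest estimate is recorded. Averaged over replications, the maximum estimate exceeds the true largest effect and changes far less than the true largest effect does as it moves from 0 to 0.3. This is the same selection bias that inflates coefficients chosen by stepwise regression.

```python
import random
import statistics


def mean_max_estimate(true_top, n_sites=1000, se=0.1, reps=200, seed=2):
    """Average, over replications, of the largest estimated effect.

    Hypothetical setup: one site has true effect `true_top`, the other
    n_sites - 1 sites have true effect 0, and every estimate is its true
    effect plus N(0, se) sampling error.
    """
    rng = random.Random(seed)
    maxima = []
    for _ in range(reps):
        estimates = [rng.gauss(0.0, se) for _ in range(n_sites - 1)]
        estimates.append(rng.gauss(true_top, se))
        # The "winner" is whichever site has the largest estimate
        maxima.append(max(estimates))
    return statistics.mean(maxima)


truths = [0.0, 0.1, 0.2, 0.3]
results = {t: mean_max_estimate(t) for t in truths}
for t, m in results.items():
    print(f"true largest effect {t:.1f} -> mean max estimate {m:.3f}")
```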
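The partial chi-square minus d.f. strategy in the general modeling strategy item amounts to a simple ranking. In this sketch the statistics are hypothetical numbers standing in for partial chi-square tests assumed to come from a fit in which every predictor was given generous d.f.; how the scores are then translated into knots or parameters is not specified here and is left as an assumption. Subtracting d.f. is natural because a chi-square statistic with d.f. degrees of freedom has expected value equal to its d.f. when the predictor is unrelated to the response, so scores near or below zero suggest little signal.

```python
# Hypothetical (partial chi-square, d.f.) pairs, one per predictor.
partial = {
    "age": (48.2, 4),
    "blood_pressure": (9.1, 3),
    "cholesterol": (3.5, 3),
    "disease_class": (22.7, 3),
}


def rank_by_chisq_minus_df(stats):
    """Return (name, chi2 - df) pairs sorted from largest to smallest.

    Subtracting d.f. removes the chi-square's expected value under no
    association, without splitting it into linear and nonlinear parts.
    """
    scores = {name: chi2 - df for name, (chi2, df) in stats.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = rank_by_chisq_minus_df(partial)
for name, score in ranking:
    print(f"{name:15s} {score:6.1f}")
```

Predictors near the top of such a ranking would be candidates for more flexibility, while those with scores near zero would be candidates for a simple linear term.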
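A worked count shows why the 15:1 rule has to be applied to parameters rather than predictors. The sketch assumes the usual parameter counts (a restricted cubic spline with k knots uses k - 1 parameters, and a categorical predictor with c levels uses c - 1) and a binary outcome, whose limiting sample size is the smaller of the two outcome counts. The predictor names, knot counts, and event counts are hypothetical. Here five predictors consume 12 d.f., more than the roughly 10 d.f. that 150 events support at 15:1.

```python
from dataclasses import dataclass


@dataclass
class Predictor:
    name: str
    kind: str      # "linear", "spline", or "categorical"
    size: int = 1  # number of knots for "spline", levels for "categorical"

    def df(self) -> int:
        if self.kind == "linear":
            return 1
        if self.kind == "spline":
            # Restricted cubic spline with k knots: k - 1 parameters
            return self.size - 1
        if self.kind == "categorical":
            # c levels coded with c - 1 indicator variables
            return self.size - 1
        raise ValueError(f"unknown kind: {self.kind}")


def limiting_sample_size(events: int, non_events: int) -> int:
    """Binary outcome: the smaller of the two outcome counts."""
    return min(events, non_events)


predictors = [
    Predictor("age", "spline", 5),
    Predictor("blood_pressure", "spline", 4),
    Predictor("sex", "categorical", 2),
    Predictor("disease_class", "categorical", 4),
    Predictor("cholesterol", "linear"),
]
model_df = sum(p.df() for p in predictors)
m = limiting_sample_size(events=150, non_events=850)
print(f"{len(predictors)} predictors, {model_df} model d.f.")
print(f"15:1 rule allows about {m // 15} d.f.")
```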
Topic revision: r4 - 17 Mar 2007, FrankHarrell
 

This site is powered by Foswiki. Copyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.