
Vanderbilt Biostatistics Wiki, 24 May 2016, FrankHarrell

RMS Short Course Supplemental Material

`rms` Package

Purpose

Make everyday statistical modeling easier to do
Make modern statistical methods easy to incorporate into everyday work
Make it easy to use the bootstrap to validate models
Provide "model presentation graphics"
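Bootstrap validation can be sketched with the optimism bootstrap, one common scheme:
- Refit the model on each bootstrap sample.
- Measure how much better the refitted model performs on that sample than on the original data.
- Subtract the average of that optimism from the apparent (same-data) performance.

The code below is a minimal Python sketch under stated assumptions, not the `rms` implementation (`rms` is an R package). It uses a one-predictor least-squares fit, R² as the only performance index, and simulated data. The function names `fit_slr`, `r_squared` and `optimism_corrected_r2` are hypothetical.

```python
import random


def fit_slr(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(coef, xs, ys):
    """R^2 = 1 - SSE/SST of a fitted line evaluated on the data (xs, ys)."""
    intercept, slope = coef
    my = sum(ys) / len(ys)
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1 - sse / sst


def optimism_corrected_r2(xs, ys, n_boot=200, seed=1):
    """Apparent R^2, estimated optimism, and bootstrap optimism-corrected R^2."""
    rng = random.Random(seed)
    n = len(xs)
    apparent = r_squared(fit_slr(xs, ys), xs, ys)
    optimism = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample subjects with replacement
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        coef = fit_slr(bx, by)  # refit the model on the bootstrap sample
        # performance on the sample used for fitting minus performance on the original data
        optimism += r_squared(coef, bx, by) - r_squared(coef, xs, ys)
    optimism /= n_boot
    return apparent, optimism, apparent - optimism


# Simulated data (hypothetical): 40 subjects, modest true slope.
data_rng = random.Random(42)
xs = [data_rng.gauss(0, 1) for _ in range(40)]
ys = [0.5 * x + data_rng.gauss(0, 1) for x in xs]
apparent, optimism, corrected = optimism_corrected_r2(xs, ys)
print(f"apparent R2 {apparent:.3f}, optimism {optimism:.3f}, corrected R2 {corrected:.3f}")
```

With a fixed seed the result is reproducible. Other performance indexes (e.g., a calibration slope) could be plugged into the same loop in place of `r_squared`.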

Chapter 2: Why Regression?

Regression Can Do ...

Prediction, capitalizing on efficient estimation methods such as maximum likelihood and on the predominance of additive effects in many problems
- E.g.: effects of age, smoking, and air quality add to predict lung capacity
- When effects are predominantly additive, or when there aren't too many interactions and one knows the likely interacting variables in advance, regression can beat machine learning techniques that assume interaction effects are likely to be as strong as main effects
Separate effects of variables (especially exposure and treatment)
Hypothesis testing
Deep understanding of uncertainties associated with all model components
- Simplest example: confidence interval for the slope of a predictor
- Confidence intervals for predicted values; simultaneous confidence intervals for a series of predicted values
  - E.g.: confidence band for Y over a series of X values
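A minimal Python sketch on simulated data shows the slope confidence interval and the difference between pointwise and simultaneous intervals for predicted values. It assumes simple linear regression. The Python standard library has no t or F quantile functions, so it uses large-sample approximations:
- The normal quantile stands in for t.
- The Working-Hotelling simultaneous multiplier sqrt(2 F(1 - alpha; 2, n - 2)) is replaced by its limit sqrt(-2 ln alpha), the 1 - alpha quantile of a chi-square with 2 degrees of freedom.

The function name `fit_with_bands` is hypothetical.

```python
import math
import random
from statistics import NormalDist


def fit_with_bands(xs, ys, grid, alpha=0.05):
    """Simple linear regression: slope CI plus pointwise and simultaneous half-widths.

    Large-sample approximations (assumption): normal quantile instead of t, and
    sqrt(-2 ln alpha) instead of the Working-Hotelling sqrt(2 F(1 - alpha; 2, n - 2)).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b0 = my - b1 * mx
    s2 = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)  # residual variance
    z = NormalDist().inv_cdf(1 - alpha / 2)  # pointwise multiplier
    w = math.sqrt(-2 * math.log(alpha))      # simultaneous multiplier (large n)
    se_b1 = math.sqrt(s2 / sxx)
    slope_ci = (b1 - z * se_b1, b1 + z * se_b1)
    bands = []
    for x0 in grid:
        fit = b0 + b1 * x0
        se_fit = math.sqrt(s2 * (1 / n + (x0 - mx) ** 2 / sxx))  # SE of the predicted mean
        bands.append((x0, fit, z * se_fit, w * se_fit))
    return b1, slope_ci, bands


# Simulated data (hypothetical): true intercept 1, true slope 0.5.
rng = random.Random(7)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [1.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]
b1, slope_ci, bands = fit_with_bands(xs, ys, grid=[0, 2.5, 5, 7.5, 10])
print(f"slope {b1:.3f}, 95% CI ({slope_ci[0]:.3f}, {slope_ci[1]:.3f})")
for x0, fit, hw_point, hw_simul in bands:
    print(f"x = {x0:4.1f}: fit {fit:.2f}, pointwise ±{hw_point:.2f}, simultaneous ±{hw_simul:.2f}")
```

At alpha = 0.05 the simultaneous band uses about 2.45 standard errors instead of 1.96 at every X. This is the price of covering the whole regression line at once rather than one predicted value at a time.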

Alternative: Stratification

Cross-classify subjects on the basis of the Xs and estimate a property of Y for each stratum
Only handles a small number of Xs
Does not handle continuous X
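The limit on the number of Xs follows from counting cells: with k binary Xs there are 2^k strata. The sketch below is a minimal Python illustration on simulated data, assuming ten hypothetical binary Xs, 1000 subjects, and a hypothetical helper `stratum_means`. By k = 10 there are more possible strata than subjects.

```python
import random
from collections import defaultdict


def stratum_means(records, keys):
    """Cross-classify records on `keys`; return {stratum: (mean of y, count)}."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in records:
        cell = tuple(r[k] for k in keys)
        sums[cell] += r["y"]
        counts[cell] += 1
    return {cell: (sums[cell] / counts[cell], counts[cell]) for cell in sums}


# Simulated data (hypothetical): 1000 subjects, ten binary Xs, additive effect on y.
rng = random.Random(3)
n = 1000
records = []
for _ in range(n):
    r = {f"x{j}": rng.randint(0, 1) for j in range(10)}
    r["y"] = sum(r.values()) + rng.gauss(0, 1)  # computed before "y" is added to r
    records.append(r)

for k in (2, 5, 10):
    keys = [f"x{j}" for j in range(k)]
    strata = stratum_means(records, keys)
    smallest = min(count for _, count in strata.values())
    print(f"{k:2d} binary Xs: {2 ** k:4d} possible strata, {len(strata):4d} non-empty, "
          f"smallest non-empty stratum has {smallest} subject(s)")
```

A continuous X is worse still. Nearly every subject has a distinct value and so forms a stratum of one unless X is first binned.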

Alternative: Single Trees (recursive partitioning/CART)

Interpretable because they are over-simplified and usually wrong
Cannot separate effects
Find spurious interactions
Require huge sample sizes
Do not handle continuous X effectively; this results in very heterogeneous nodes because of incomplete conditioning
Tree structure is unstable so insights are fragile
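Tree instability can be illustrated with the simplest possible tree, a single split. The minimal Python sketch below works as follows:
- A hypothetical helper, `best_split`, finds the cutpoint on X that minimizes the within-node sum of squares, as in a one-split regression tree.
- The search is repeated on bootstrap resamples of simulated data that have a smooth linear relationship and no true threshold.

The chosen cutpoint moves from resample to resample.

```python
import random


def best_split(xs, ys):
    """Cutpoint on x that minimizes total within-node squared error (one-split regression tree)."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total = sum(y for _, y in pairs)
    total_sq = sum(y * y for _, y in pairs)
    best_sse, best_cut = float("inf"), None
    left_sum = left_sq = 0.0
    for i in range(1, n):
        x_prev, y_prev = pairs[i - 1]
        left_sum += y_prev  # left node holds pairs[0:i]
        left_sq += y_prev * y_prev
        if pairs[i][0] == x_prev:
            continue  # cannot cut between tied x values
        right_sum, right_sq = total - left_sum, total_sq - left_sq
        sse = (left_sq - left_sum ** 2 / i) + (right_sq - right_sum ** 2 / (n - i))
        if sse < best_sse:
            best_sse, best_cut = sse, (x_prev + pairs[i][0]) / 2
    return best_cut


# Simulated data (hypothetical): smooth linear relationship, no true threshold.
rng = random.Random(11)
n = 200
xs = [rng.uniform(0, 10) for _ in range(n)]
ys = [0.3 * x + rng.gauss(0, 1) for x in xs]

cuts = []
for _ in range(25):
    idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample of subjects
    cuts.append(best_split([xs[i] for i in idx], [ys[i] for i in idx]))
print(f"full-data cut {best_split(xs, ys):.2f}; bootstrap cuts range "
      f"{min(cuts):.2f} to {max(cuts):.2f}")
```

A deeper tree compounds this: every later split is conditional on an earlier one that may itself have moved.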

Alternative: Machine Learning

E.g., random forests, bagging, boosting, support vector machines, neural networks
Allows for high-order interactions and does not require pre-specification of interaction terms
Almost automatic; can save analyst time and do the analysis in one step (at the cost of long computing time)
Uninterpretable black box
Effects of individual predictors are not separable
Interaction effects (e.g., differential treatment effect = precision medicine = personalized medicine) are not available
Because it does not use prior information about the dominance of additivity, can require 200 events per candidate predictor when Y is binary
- Logistic regression may require 20 events per candidate predictor
- This can create a demand for "big data" in settings where additive statistical models would work on moderate-size data
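The events-per-candidate-predictor figures translate directly into sample-size demands. Below is a back-of-the-envelope Python sketch. It assumes a hypothetical study with 10 candidate predictors and a 10% event rate; both numbers are illustrative and not from the course notes. The helper `required_sample_size` is also hypothetical. `Fraction` keeps the arithmetic exact.

```python
import math
from fractions import Fraction


def required_sample_size(n_candidates, events_per_candidate, event_rate):
    """Events and total subjects implied by an events-per-candidate-predictor rule of thumb."""
    events = n_candidates * events_per_candidate
    return events, math.ceil(events / event_rate)


# Hypothetical study: 10 candidate predictors, 10% of subjects have the event.
for label, epp in (("logistic regression", 20), ("machine learning", 200)):
    events, n = required_sample_size(10, epp, Fraction(1, 10))
    print(f"{label}: {events} events, about {n} subjects")
```

For these inputs the two rules give 200 events (about 2,000 subjects) versus 2,000 events (about 20,000 subjects).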

Topic revision: r6 - 24 May 2016, FrankHarrell

Copyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.