Make modern statistical methods easy to incorporate into everyday work
Make it easy to use the bootstrap to validate models
Provide "model presentation graphics"
Chapter 2: Why Regression?
Regression Can Do ...
Prediction, capitalizing on efficient estimation methods such as maximum likelihood and on the predominant additivity of effects in a variety of problems
E.g.: effects of age, smoking, and air quality add to predict lung capacity
When effects are predominantly additive, or when there aren't too many interactions and one knows the likely interacting variables in advance, regression can beat machine learning techniques that assume interaction effects are likely to be as strong as main effects
Separate effects of variables (especially exposure and treatment)
Hypothesis testing
Deep understanding of uncertainties associated with all model components
Simplest example: confidence interval for the slope of a predictor
Confidence intervals for predicted values; simultaneous confidence intervals for a series of predicted values
E.g.: confidence band for y over a series of x's (see the sketch after this list)
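To make these points concrete, here is a minimal sketch in Python using statsmodels on simulated data: an additive fit for the lung-capacity example, a confidence interval for one slope, and a pointwise confidence band over a series of ages. The variable names (fev, age, smoker, pm25) and effect sizes are invented for illustration, and the band shown is pointwise; a simultaneous band requires a further adjustment not shown here.

```python
# A minimal sketch on simulated data, assuming statsmodels is available.
# Variable names (fev, age, smoker, pm25) and effect sizes are invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "age":    rng.uniform(20, 80, n),
    "smoker": rng.binomial(1, 0.3, n),
    "pm25":   rng.uniform(5, 35, n),   # hypothetical air-quality measure
})
# Additive data-generating mechanism: the three effects simply add
df["fev"] = (4.5 - 0.03 * df["age"] - 0.6 * df["smoker"] - 0.02 * df["pm25"]
             + rng.normal(0, 0.4, n))

fit = smf.ols("fev ~ age + smoker + pm25", data=df).fit()

# Confidence interval for the slope of a single predictor
print(fit.conf_int().loc["age"])

# Pointwise confidence band for the predicted mean of y over a series of ages,
# holding the other predictors fixed
grid = pd.DataFrame({"age": np.linspace(20, 80, 25), "smoker": 0, "pm25": 15.0})
band = fit.get_prediction(grid).summary_frame(alpha=0.05)
print(band[["mean", "mean_ci_lower", "mean_ci_upper"]].head())
```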
Alternative: Stratification
Cross-classify subjects on the basis of the Xs and estimate a property of Y (e.g., its mean) within each stratum
Only handles a small number of Xs
Does not handle continuous X without arbitrary binning (see the sketch after this list)
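A minimal sketch of stratification on the same kind of simulated data (variables invented as above): continuous Xs must be binned before cross-classifying, and even three predictors already produce 18 strata, many containing few subjects. The bin boundaries are arbitrary choices made only for illustration.

```python
# A minimal sketch of stratification on simulated data; bin boundaries are
# arbitrary and chosen only for illustration.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "age":    rng.uniform(20, 80, n),
    "smoker": rng.binomial(1, 0.3, n),
    "pm25":   rng.uniform(5, 35, n),
})
df["fev"] = (4.5 - 0.03 * df["age"] - 0.6 * df["smoker"] - 0.02 * df["pm25"]
             + rng.normal(0, 0.4, n))

# Continuous Xs must first be binned (arbitrarily) before cross-classifying
df["age_grp"]  = pd.cut(df["age"],  bins=[20, 40, 60, 80])
df["pm25_grp"] = pd.cut(df["pm25"], bins=[5, 15, 25, 35])

# Estimate a property of Y (here its mean) within each stratum
strata = (df.groupby(["age_grp", "pm25_grp", "smoker"], observed=True)["fev"]
            .agg(["mean", "count"]))
print(strata)  # already 3 x 3 x 2 = 18 strata; each additional X multiplies this
```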
Alternative: Single Trees (recursive partitioning/CART)
Interpretable because they are over-simplified and usually wrong
Cannot separate effects
Find spurious interactions
Require huge sample sizes
Do not handle continuous X effectively; incomplete conditioning results in very heterogeneous nodes
Tree structure is unstable, so insights are fragile (see the sketch after this list)
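A minimal sketch of the instability point, assuming scikit-learn's DecisionTreeRegressor as a stand-in for any recursive-partitioning implementation: refit a depth-limited tree on several bootstrap resamples of simulated additive data and print the root split each time. The root cutpoint, and possibly the variable chosen, typically shifts from resample to resample.

```python
# A minimal sketch of tree instability on simulated additive data, assuming
# scikit-learn's DecisionTreeRegressor as a stand-in for CART.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
n = 300
X = pd.DataFrame({
    "age":    rng.uniform(20, 80, n),
    "smoker": rng.binomial(1, 0.3, n),
    "pm25":   rng.uniform(5, 35, n),
})
y = (4.5 - 0.03 * X["age"] - 0.6 * X["smoker"] - 0.02 * X["pm25"]
     + rng.normal(0, 0.4, n))

# Refit a depth-limited tree on bootstrap resamples and print the root split
for b in range(5):
    idx = rng.integers(0, n, n)                 # bootstrap resample
    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    tree.fit(X.iloc[idx], y.iloc[idx])
    var = X.columns[tree.tree_.feature[0]]      # variable chosen at the root
    cut = tree.tree_.threshold[0]               # cutpoint chosen at the root
    print(f"resample {b}: root splits on {var} at {cut:.1f}")
```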
Alternative: Machine Learning
E.g. random forests, bagging, boosting, support vector machines, neural networks
Allows for high-order interactions and does not require pre-specification of interaction terms
Almost automatic; can save analyst time by doing the analysis in one step (at the cost of long computing time)
Uninterpretable black box
Effects of individual predictors are not separable
Estimates of interaction effects (e.g., differential treatment effects, i.e., precision or personalized medicine) are not available
Because these methods do not use prior information about the dominance of additive effects, they can require 200 events per candidate predictor when Y is binary
Logistic regression, by contrast, may require only 20 events per candidate predictor
Can create a demand for "big data" where additive statistical models work well on moderate-sized data (see the sketch after this list)
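A brief sketch contrasting the two approaches on simulated binary data (variable names, sample size, and effect sizes all invented): the additive logistic model returns a separable, interpretable coefficient and confidence interval for each predictor, while the random forest returns flexible predictions plus only aggregate variable importances.

```python
# A minimal sketch on simulated binary data (names, n, and effects invented),
# assuming statsmodels and scikit-learn are available.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
n = 1000
X = pd.DataFrame({
    "age":    rng.uniform(20, 80, n),
    "smoker": rng.binomial(1, 0.3, n),
    "pm25":   rng.uniform(5, 35, n),
})
lp = -3 + 0.04 * X["age"] + 0.8 * X["smoker"] + 0.03 * X["pm25"]  # additive logit
y = rng.binomial(1, 1 / (1 + np.exp(-lp)))

# Additive logistic regression: a separable log-odds effect and confidence
# interval for each predictor
logit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
print(logit.conf_int())

# Random forest: flexible predictions that allow arbitrary interactions, but
# only aggregate variable importances, not separable per-predictor effects
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(dict(zip(X.columns, rf.feature_importances_)))
```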