BIOS 362: Advanced Statistical Learning & Inference
Instructor
Teaching Assistants
Dates, Time, and Location
- First meeting: Tue. Jan. 10, 2023; last meeting: Thu. Apr. 20, 2023
- Tuesday and Thursday, 10:30 AM-12:00 PM
- Location: Room 111139, 11th floor, 2525 West End Ave., Nashville, TN
- Office hours: by appointment
- We will follow the Graduate School Academic Calendar.
- We will not have classes the week of March 13-17, 2023, for spring break.
- ENAR occurs March 19-22, so we will not meet Tuesday, March 21.
Textbook
The textbook for this course is listed below and is free to download in PDF format from the book's webpage:
Hastie, Tibshirani, Friedman. (2009) The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, 2nd edition. In the course outline and class schedule, the textbook is abbreviated "HTF", often followed by chapter or page references ("Ch. X-Y" or "pp. X-Y", respectively). The BibTeX entry for the book is as follows:
@book{HTF2009,
  author    = {Hastie, Trevor and Tibshirani, Robert and Friedman, Jerome},
  title     = {The elements of statistical learning: data mining, inference and prediction},
  url       = {http://www-stat.stanford.edu/~tibs/ElemStatLearn/},
  publisher = {Springer},
  year      = 2009,
  edition   = 2
}
The wide margins of the linked PDF version of the book make it difficult to read on small screens (e.g., an iPhone). The margins can be removed using the following Ghostscript command on Linux, substituting the appropriate file names for "output.pdf" and "input.pdf". Please see Dr. Shotwell for help with this.
gs -o output.pdf -sDEVICE=pdfwrite -c "[/CropBox [130 140 460 685] /PAGES pdfmark" -f input.pdf
Other Resources
Course Topics
- Overview of Supervised Learning and Review of Linear Methods: HTF Ch. 2-4
- Splines and Kernel Methods: HTF Ch. 5-6
- Model Assessment, Selection, and Inference: HTF Ch. 7-8
- Neural Networks: HTF Ch. 11
- Support Vector Machines: HTF Ch. 12
- Unsupervised Learning: HTF Ch. 14
Other information
- Unless otherwise stated, homework is due one week after it is assigned.
- Students are encouraged to work together on homework problems, but they must turn in their own write-ups.
- Class participation is encouraged.
- Please bring a laptop to class.
Grading
- Homework: 40%
- Take-home Midterm Exam: 30%
- Take-home Final Exam: 30%
Schedule of Topics
| Date | Reading (before class) | Homework | Topic/Content | Presentation |
|---|---|---|---|---|
| 1/10/23 | none | none | Syllabus, introduction | Intro.pdf |
| 1/12/23 | HTF Ch. 1 and Ch. 2.1, 2.2, and 2.3 | See below: Homework 1 | Least squares, nearest neighbors | lecture-1.pdf, mixture-data-lin-knn.R |
| 1/17/23 | HTF Ch. 2.4 | none | Decision theory | lecture-2.pdf |
| 1/19/23 | none | none | Loss functions in practice | lecture-2a.pdf, prostate-data-lin.R |
| 1/24/23 | HTF Ch. 2.7, 2.8, and 2.9 | See below: Homework 2 | Structured regression | lecture-3.pdf, ex-1.R, ex-2.R, ex-3.R |
| 1/26/23 | HTF Ch. 3.1, 3.2, 3.3, and 3.4 | none | Linear methods: subset selection, ridge, and lasso | lecture-4a.pdf, linear-regression-examples.R, lecture-5.pdf, lasso-example.R |
| 1/31/23 | none | none | Linear methods for regression (cont.) | Suggested supplemental reading: HTF Ch. 3.6, 3.7, 3.8, and 3.9. Suggested supplemental exercises: Ex. 3.12 and 3.18 |
| 2/2/23 | HTF Ch. 3.5 and 3.6 | See below: Homework 3 | Linear methods: principal components regression | lecture-6.pdf, pca-regression-example.R, lec7.pdf, lec8.pdf, pca-and-g-inverses.html |
| 2/7/23 | HTF Ch. 4.1, 4.2, and 4.3 | none | Linear methods: linear discriminant analysis | lecture-8.pdf, simple-LDA-3D.R |
| 2/9/23 | HTF Ch. 5.1 and 5.2 | none | Basis expansions: piecewise polynomials & splines | lecture-11.pdf, splines-example.R, mixture-data-complete.R |
| 2/14/23 | HTF Ch. 6.1-6.5 | none | Kernel methods | lecture-13.pdf, mixture-data-knn-local-kde.R, kernel-methods-examples-mcycle.R |
| 2/16/23 | HTF Ch. 7.1, 7.2, 7.3, and 7.4 | none | Model assessment: Cp, AIC, BIC | lecture-14.pdf, effective-df-aic-bic-mcycle.R |
| 2/21/23 | HTF Ch. 7.10 | See below: Homework 4 | Cross-validation | lecture-15.pdf, kNN-CV.R, Income2.csv |
| 2/23/23 | HTF Ch. 9.2 | none | Classification and regression trees | lecture-21.pdf, mixture-data-rpart.R |
| 2/28/23 | HTF Ch. 8.7, 8.8, and 8.9 | none | Bagging | lecture-18.pdf, mixture-data-rpart-bagging.R, nonlinear-bagging.html |
| 3/2/23 | HTF Ch. 15.1 and 15.2 | none | Random forests | lecture-25.pdf, random-forest-example.R |
| 3/7/23 | none | none | Random forests (cont.) | lecture-25.pdf, random-forest-example.R |
| 3/9/23 | none | none | Class cancelled | |
| 3/14/23 | HTF Ch. 10.1 | none | Boosting and AdaBoost.M1 (part 1) | lecture-22.pdf, boosting-trees.R |
| 3/16/23 | HTF Ch. 10.1 | none | Boosting and AdaBoost.M1 (part 1, cont.) & midterm review | lecture-22.pdf, boosting-trees.R |
| 3/21/23 | none | none | Class cancelled for ENAR | |
| 3/23/23 | HTF Ch. 10.2-10.9 | none | Boosting and AdaBoost.M1 (part 2) | lecture-23.pdf |
Homework/Laboratory (other than problems listed in HTF)
Homework assignments should be completed in a GitHub repository using the R language (unless otherwise noted). Make sure to add the TA(s) as collaborators on your repo. Any reproducible format that renders natively in GitHub is acceptable. In R Markdown, using the 'github_document' or 'md_document' output type in the header will produce a markdown (.md) file that can be rendered within GitHub, e.g.:
---
title: "Homework 1"
author: Student
date: January 15, 2023
output: github_document
---
Make sure to include the raw code (.Rmd), the rendered file (.md), and any plots in your repo. Jupyter notebooks (.ipynb) using the R language are also acceptable.
Resubmissions are allowed only if the initial submission was made on time. A resubmission is due within one week of receiving feedback from the TA, and there is a maximum of two resubmissions per assignment.
Homework 1
Using the R Markdown/knitr/GitHub mechanism, implement the following tasks by extending the example R script mixture-data-lin-knn.R:
- Paste the code from the mixture-data-lin-knn.R file into the homework template knitr document.
- Read the help file for R's built-in linear regression function 'lm'.
- Re-write the functions fit_lc and predict_lc using 'lm' and the associated predict method for 'lm' objects (see the sketch after this list).
- Consider making the linear classifier more flexible by adding squared terms for x1 and x2 to the linear model.
- Describe how this more flexible model affects the bias-variance tradeoff.
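For concreteness, a minimal sketch of the 'lm'-based rewrite is given below. It assumes the data layout of the HTF mixture example (a binary outcome y and a two-column input matrix x); the function names mirror those in mixture-data-lin-knn.R, but treat this as a starting point rather than a complete solution.
# Sketch: fit_lc/predict_lc via lm (assumes y in {0,1}, x an n-by-2 matrix)
fit_lc <- function(y, x) {
  dat <- data.frame(y = y, x1 = x[, 1], x2 = x[, 2])
  lm(y ~ x1 + x2, data = dat)
}

predict_lc <- function(x, fit) {
  # the predict method for lm objects computes the linear combination
  predict(fit, newdata = data.frame(x1 = x[, 1], x2 = x[, 2]))
}

# More flexible variant with squared terms; I() protects the arithmetic
# inside the model formula
fit_lc2 <- function(y, x) {
  dat <- data.frame(y = y, x1 = x[, 1], x2 = x[, 2])
  lm(y ~ x1 + x2 + I(x1^2) + I(x2^2), data = dat)
}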
Homework 2
Using the R Markdown/knitr/GitHub mechanism, implement the following tasks by extending the example R script (prostate-data-lin.R):
- Write functions that implement the L1 loss and tilted absolute loss functions (see the sketch after this list).
- Create a figure that shows lpsa (x-axis) versus lcavol (y-axis). Add and label (using the 'legend' function) the linear model predictors associated with L2 loss, L1 loss, and tilted absolute value loss for tau = 0.25 and 0.75.
- Write functions to fit and predict from a simple nonlinear model with three parameters, defined by 'beta[1] + beta[2]*exp(-beta[3]*x)'. Hint: make copies of 'fit_lin' and 'predict_lin', and modify them to fit the nonlinear model. Use c(-1.0, 0.0, -0.3) as 'beta_init'.
- Create a figure that shows lpsa (x-axis) versus lcavol (y-axis). Add and label (using the 'legend' function) the nonlinear model predictors associated with L2 loss, L1 loss, and tilted absolute value loss for tau = 0.25 and 0.75.
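As a starting point, the two loss functions and the nonlinear fit might look like the sketch below. The tilted absolute loss is the "pinball" loss used in quantile regression; the use of 'optim' assumes that 'fit_lin' in the course script minimizes its loss numerically (an assumption, so adapt as needed).
# L1 loss: mean absolute error
L1_loss <- function(y, yhat) mean(abs(y - yhat))

# Tilted absolute loss with parameter tau in (0, 1): positive residuals
# are weighted by tau, negative residuals by (tau - 1)
tilted_loss <- function(y, yhat, tau) {
  e <- y - yhat
  mean(ifelse(e > 0, tau * e, (tau - 1) * e))
}

# Nonlinear model and a fit function patterned after 'fit_lin'
predict_nonlin <- function(x, beta) beta[1] + beta[2] * exp(-beta[3] * x)
fit_nonlin <- function(y, x, loss = L1_loss, beta_init = c(-1.0, 0.0, -0.3)) {
  err <- function(beta) loss(y, predict_nonlin(x, beta))
  optim(beta_init, err)$par
}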
Homework 3
Using the R Markdown/knitr/GitHub mechanism, implement the following tasks:
- Use the prostate cancer data.
- Use the 'cor' function to reproduce the correlations listed in HTF Table 3.1, page 50.
- Treat 'lcavol' as the outcome, and use all other variables in the data set as predictors.
- With the training subset of the prostate data, train a least-squares regression model with all predictors using the 'lm' function.
- Use the testing subset to compute the test error (average squared-error loss) of the fitted least-squares regression model.
- Train a ridge regression model using the 'glmnet' function, and tune the value of lambda (i.e., use guess and check to find the value of lambda that approximately minimizes the test error).
- Create a figure that shows the training and test error associated with ridge regression as a function of lambda.
- Create a path diagram of the ridge regression analysis, similar to HTF Figure 3.8 (see the sketch after this list).
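The ridge portion might be sketched as follows; this assumes the prostate data frame is already loaded with the HTF column layout (including a logical 'train' indicator) and that 'lcavol' is the outcome, so adjust to match your data.
library(glmnet)

# split on the 'train' indicator; all columns except lcavol and train
# serve as predictors (assumes the HTF prostate data layout)
x_tr <- as.matrix(subset(prostate,  train, select = -c(lcavol, train)))
y_tr <- subset(prostate,  train)$lcavol
x_te <- as.matrix(subset(prostate, !train, select = -c(lcavol, train)))
y_te <- subset(prostate, !train)$lcavol

lam <- 10^seq(1, -3, length.out = 50)               # candidate lambda grid
fit <- glmnet(x_tr, y_tr, alpha = 0, lambda = lam)  # alpha = 0 gives ridge

# test error (average squared-error loss) at each lambda in the grid
test_err <- colMeans((y_te - predict(fit, newx = x_te))^2)
lam[which.min(test_err)]  # lambda that approximately minimizes test error

# coefficient path versus lambda, similar in spirit to HTF Figure 3.8
plot(fit, xvar = "lambda")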
Homework 4
- Complete HTF exercises 7.4 and 7.6.
- This homework should be submitted using the GitHub mechanism. However, you may complete the homework on paper and scan an image to upload, or you may use LaTeX-style markup in an R Markdown document.
Links
RStudio/Knitr