This page is for a past edition of this course. To view the current course description, visit https://www.vanderbilt.edu/biostatistics-graduate/course-descriptions-2023-2024/

This edition was taught in Spring 2023 by Amir Asiaee.

BIOS 362: Advanced Statistical Learning & Inference

Instructor

Teaching Assistants

Dates, Time, and Location

  • First meeting: Tue. Jan. 10, 2023; Last meeting: Thu. Apr. 20, 2023
  • Tuesday, Thursday 10:30AM-12:00PM
  • Location: 111139, 11th floor, 2525 West End Ave. Nashville, TN
  • Office hours: By appointment.
  • We will use the Graduate School Academic Calendar
  • We will not have classes the week of March 13-17, 2022, for spring break.
  • ENAR occurs March 19-22; thus, we will not meet on Tue. March 21.

Textbook

The book for this course is listed below and is free to download in PDF format from the book webpage: Hastie, Tibshirani, Friedman (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2nd edition. In the course outline and class schedule, the textbook is abbreviated "HTF", often followed by chapter or page references ("Ch. X-Y" or "pp. X-Y", respectively). The BibTeX entry for the book is as follows:
@book{HTF2009,
  author = {Hastie, Trevor and Tibshirani, Robert and Friedman, Jerome},
  title = {The elements of statistical learning: data mining, inference and prediction},
  url = {http://www-stat.stanford.edu/~tibs/ElemStatLearn/},
  publisher = {Springer},
  year = 2009,
  edition = 2
}

The wide margins of the linked PDF version of the book make it difficult to read on smart devices (e.g., an iPhone). The margins may be removed with the following Ghostscript command on Linux, substituting the appropriate file names for "output.pdf" and "input.pdf". Please see Dr. Shotwell for help with this.
gs -o output.pdf -sDEVICE=pdfwrite -c "[/CropBox [130 140 460 685] /PAGES pdfmark" -f input.pdf

Other Resources

Course Topics

  • Overview of Supervised Learning and Review of Linear Methods: HTF Ch. 2-4
  • Splines and Kernel Methods: HTF Ch. 5-6
  • Model Assessment, Selection, and Inference: HTF Ch. 7-8
  • Neural Networks: HTF Ch. 11
  • Support Vector Machines: HTF Ch. 12
  • Unsupervised Learning: HTF Ch. 14

Other information

  • Unless otherwise stated, assigned homework is due in one week.
  • Students are encouraged to work together on homework problems, but they must turn in their own write-ups.
  • Class participation is encouraged.
  • Please bring a laptop to class.

Grading

  • Homework: 40%
  • Take-home Midterm Exam: 30%
  • Take-home Final Exam: 30%

Schedule of Topics

Date | Reading (before class) | Homework | Topic/Content | Presentation
1/10/23 | none | none | Syllabus, introduction | Intro.pdf
1/12/23 | HTF Ch. 1 and Ch. 2.1, 2.2, and 2.3 | See below: Homework 1 | Least-squares, nearest-neighbors | lecture-1.pdf, mixture-data-lin-knn.R
1/17/23 | HTF Ch. 2.4 | none | Decision theory | lecture-2.pdf
1/19/23 | none | none | Loss functions in practice | lecture-2a.pdf, prostate-data-lin.R
1/24/23 | HTF Ch. 2.7, 2.8, and 2.9 | See below: Homework 2 | Structured regression | lecture-3.pdf, ex-1.R, ex-2.R, ex-3.R
1/26/23 | HTF Ch. 3.1, 3.2, 3.3, and 3.4 | none | Linear methods, subset selection, ridge, and lasso | lecture-4a.pdf, linear-regression-examples.R, lecture-5.pdf, lasso-example.R
1/31/23 | none | none | Linear methods for regression (cont.) | Suggested supplemental reading: HTF Ch. 3.6, 3.7, 3.8, and 3.9. Suggested supplemental exercises: Ex. 3.12, 3.18
2/2/23 | HTF Ch. 3.5 and 3.6 | See below: Homework 3 | Linear methods: principal components regression | lecture-6.pdf, pca-regression-example.R, lec7.pdf, lec8.pdf, pca-and-g-inverses.html
2/7/23 | HTF Ch. 4.1, 4.2, and 4.3 | none | Linear methods: linear discriminant analysis | lecture-8.pdf, simple-LDA-3D.R
2/9/23 | HTF Ch. 5.1 and 5.2 | none | Basis expansions: piecewise polynomials and splines | lecture-11.pdf, splines-example.R, mixture-data-complete.R
2/14/23 | HTF Ch. 6.1-6.5 | none | Kernel methods | lecture-13.pdf, mixture-data-knn-local-kde.R, kernel-methods-examples-mcycle.R
2/16/23 | HTF Ch. 7.1, 7.2, 7.3, and 7.4 | none | Model assessment: Cp, AIC, BIC | lecture-14.pdf, effective-df-aic-bic-mcycle.R
2/21/23 | HTF Ch. 7.10 | See below: Homework 4 | Cross-validation | lecture-15.pdf, kNN-CV.R, Income2.csv
2/23/23 | HTF Ch. 9.2 | none | Classification and regression trees | lecture-21.pdf, mixture-data-rpart.R
2/28/23 | HTF Ch. 8.7, 8.8, and 8.9 | none | Bagging | lecture-18.pdf, mixture-data-rpart-bagging.R, nonlinear-bagging.html
3/2/23 | HTF Ch. 15.1 and 15.2 | none | Random Forest | lecture-25.pdf, random-forest-example.R
3/7/23 | none | none | Random Forest (cont.) | lecture-25.pdf, random-forest-example.R
3/9/23 | none | none | Class cancelled |
3/14/23 | HTF Ch. 10.1 | none | Boosting and AdaBoost.M1 (part 1) | lecture-22.pdf, boosting-trees.R
3/16/23 | HTF Ch. 10.1 | none | Boosting and AdaBoost.M1 (part 1, continued) and midterm review | lecture-22.pdf, boosting-trees.R
3/21/23 | none | none | Class cancelled for ENAR |
3/23/23 | HTF Ch. 10.2-10.9 | none | Boosting and AdaBoost.M1 (part 2) | lecture-23.pdf
3/28/23 | none | none | Midterm discussion |
3/30/23 | HTF Ch. 10.10 and 10.13 | none | Boosting and AdaBoost.M1 (part 3) | lecture-24.pdf, gradient-boosting-example.R
4/4/23 | HTF Ch. 11.1, 11.2, 11.3, 11.4, and 11.5 | none | Introduction to neural networks | lecture-31.pdf, nnet.R
4/6/23 | HTF Ch. 11.1, 11.2, 11.3, 11.4, and 11.5 | none | Introduction to neural networks (cont.) | lecture-31.pdf, nnet.R
4/11/23 | HTF Ch. 14.5 | none | Principal curves and surfaces | lecture-28.pdf, principal-curves.R
4/13/23 | HTF Ch. 14.8 | none | Multidimensional scaling | lecture-30.pdf, MDS-examples.R
4/18/23 | HTF Ch. 14.5.3 | none | k-means, hierarchical, and spectral clustering | lecture-29.pdf, spectral-clustering.R
4/20/23 | none | none | Clustering with mixtures | lecture-32.pdf, normal-mixture-examples.R

Homework/Laboratory (other than problems listed in HTF)

Homework assignments should be completed in a GitHub repository using the R language (unless otherwise noted). Make sure to add the TA(s) as collaborators on your repo. Any reproducible format that renders natively in GitHub is acceptable. In R Markdown, setting the output type in the header to 'github_document' or 'md_document' will produce a Markdown (.md) file that GitHub can render, e.g.:

---
title: "Homework 1"
author: Student
date: January 15, 2023
output: github_document
---

Make sure to include the raw code (.Rmd), the rendered file (.md), and any plots in your repo. Jupyter notebooks (.ipynb) using the R language are also ok.

Resubmissions are allowed only if the initial submission was made on time. A resubmission is due within one week of receiving feedback from the TA, and there is a maximum of two resubmissions per assignment.

Homework 1

Using the RMarkdown/knitr/github mechanism, implement the following tasks by extending the example R script mixture-data-lin-knn.R:

  • Paste the code from the mixture-data-lin-knn.R file into the homework template Knitr document.
  • Read the help file for R's built-in linear regression function lm.
  • Re-write the functions fit_lc and predict_lc using lm and the associated predict method for lm objects.
  • Consider making the linear classifier more flexible by adding squared terms for x1 and x2 to the linear model.
  • Describe how this more flexible model affects the bias-variance tradeoff.
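A minimal sketch of the lm-based rewrite, assuming the training data sit in a data frame with a 0/1 outcome y and inputs x1 and x2. The simulated data are a hypothetical stand-in for the mixture data, and the signatures of fit_lc and predict_lc below are assumptions that may differ from those in mixture-data-lin-knn.R.

```r
# Hypothetical stand-in for the mixture data: two overlapping Gaussian clouds
set.seed(1)
n <- 100
dat <- data.frame(
  x1 = c(rnorm(n, mean = 0), rnorm(n, mean = 1.5)),
  x2 = c(rnorm(n, mean = 0), rnorm(n, mean = 1.5)),
  y  = rep(c(0, 1), each = n)
)

# Fit a linear classifier by least squares on the 0/1 outcome, using lm()
fit_lc <- function(dat, formula = y ~ x1 + x2) {
  lm(formula, data = dat)
}

# Predicted scores via the predict() method for lm objects
predict_lc <- function(fit, newdata) {
  predict(fit, newdata = newdata)
}

# Linear model, and a more flexible model that adds squared terms
fit_lin  <- fit_lc(dat)
fit_quad <- fit_lc(dat, y ~ x1 + x2 + I(x1^2) + I(x2^2))

# Classify by thresholding the score at 0.5; training misclassification rate
train_err <- function(fit) mean((predict_lc(fit, dat) > 0.5) != dat$y)
train_err(fit_lin)
train_err(fit_quad)
```

Because the quadratic model nests the linear one, its training residual sum of squares can only go down; whether test error improves depends on how much variance the extra terms add.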

Homework 2

Using the RMarkdown/knitr/github mechanism, implement the following tasks by extending the example R script (prostate-data-lin.R):

  • Write functions that implement the L1 loss and tilted absolute loss functions.
  • Create a figure that shows lpsa (x-axis) versus lcavol (y-axis). Add and label (using the 'legend' function) the linear model predictors associated with L2 loss, L1 loss, and tilted absolute value loss for tau = 0.25 and 0.75.
  • Write functions to fit and predict from a simple nonlinear model with three parameters defined by 'beta[1] + beta[2]*exp(-beta[3]*x)'. Hint: make copies of 'fit_lin' and 'predict_lin' and modify them to fit the nonlinear model. Use c(-1.0, 0.0, -0.3) as 'beta_init'.
  • Create a figure that shows lpsa (x-axis) versus lcavol (y-axis). Add and label (using the 'legend' function) the nonlinear model predictors associated with L2 loss, L1 loss, and tilted absolute value loss for tau = 0.25 and 0.75.
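One way to set this up is sketched below, assuming a fit/predict pattern built on base R's optim(). The function names L1_loss, tilted_loss, fit_nonlin, and predict_nonlin are hypothetical, and the simulated x and y stand in for lpsa and lcavol from the prostate training data. The tilted absolute loss is written in its usual quantile-regression (check function) form, rho_tau(u) = u * (tau - I(u < 0)) with u = y - yhat.

```r
# Simulated stand-in for the prostate data (x plays lpsa, y plays lcavol)
set.seed(2)
x <- runif(100, min = 0, max = 5)
y <- 2 - 3 * exp(-0.6 * x) + rnorm(100, sd = 0.5)

# Loss functions: squared, absolute, and tilted absolute (check) loss
L2_loss <- function(y, yhat) (y - yhat)^2
L1_loss <- function(y, yhat) abs(y - yhat)
tilted_loss <- function(y, yhat, tau) {
  u <- y - yhat
  u * (tau - (u < 0))  # tau * u if u >= 0, (tau - 1) * u if u < 0
}

# Three-parameter nonlinear predictor: beta[1] + beta[2] * exp(-beta[3] * x)
predict_nonlin <- function(x, beta) beta[1] + beta[2] * exp(-beta[3] * x)

# Minimize the average loss with Nelder-Mead (optim's default method)
fit_nonlin <- function(y, x, loss = L2_loss, beta_init = c(-1.0, 0.0, -0.3)) {
  err <- function(beta) mean(loss(y, predict_nonlin(x, beta)))
  optim(par = beta_init, fn = err)
}

fit_l2  <- fit_nonlin(y, x)
fit_l1  <- fit_nonlin(y, x, loss = L1_loss)
fit_q25 <- fit_nonlin(y, x, loss = function(y, yhat) tilted_loss(y, yhat, tau = 0.25))
fit_q75 <- fit_nonlin(y, x, loss = function(y, yhat) tilted_loss(y, yhat, tau = 0.75))

# Fitted curves on a grid, one column per loss, ready for plotting
x_grid <- seq(min(x), max(x), length.out = 100)
curves <- sapply(list(L2 = fit_l2, L1 = fit_l1, tau25 = fit_q25, tau75 = fit_q75),
                 function(f) predict_nonlin(x_grid, f$par))
```

After plot(x, y), a call such as matlines(x_grid, curves) followed by legend() adds the four labelled predictors. The tau = 0.25 and 0.75 curves should sit near the conditional 25th and 75th percentiles, while the L2 and L1 fits target the conditional mean and median.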

Homework 3

Using the RMarkdown/knitr/github mechanism, implement the following tasks:
  • Use the prostate cancer data.
  • Use the cor function to reproduce the correlations listed in HTF Table 3.1, page 50.
  • Treat lcavol as the outcome, and use all other variables in the data set as predictors.
  • With the training subset of the prostate data, train a least-squares regression model with all predictors using the lm function.
  • Use the testing subset to compute the test error (average squared-error loss) using the fitted least-squares regression model.
  • Train a ridge regression model using the glmnet function, and tune the value of lambda (i.e., use guess and check to find the value of lambda that approximately minimizes the test error).
  • Create a figure that shows the training and test error associated with ridge regression as a function of lambda.
  • Create a path diagram of the ridge regression analysis, similar to HTF Figure 3.8.
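Since glmnet is an add-on package, the sketch below instead computes ridge regression in closed form with base R, beta(lambda) = (X'X + lambda I)^(-1) X'y on standardized training predictors with an unpenalized intercept, which can help check intuition about the error curves and the coefficient path. The simulated data and the names ridge_fit and ridge_predict are hypothetical stand-ins for the prostate analysis. glmnet parameterizes the penalty differently (its objective averages the squared error over N), so the lambda values here will not match glmnet's.

```r
# Simulated stand-in for the prostate data: 97 rows, 8 predictors
set.seed(3)
n <- 97; p <- 8
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% c(0.6, 0.3, -0.2, 0, 0, 0.1, 0, 0.2)) + rnorm(n, sd = 0.8)
train <- sample(c(TRUE, FALSE), n, replace = TRUE, prob = c(0.7, 0.3))
Xtr <- X[train, , drop = FALSE]
Xte <- X[!train, , drop = FALSE]

# Closed-form ridge fit on standardized predictors; the intercept is the
# training mean of y and is not penalized
ridge_fit <- function(X, y, lambda) {
  mx <- colMeans(X)
  sx <- apply(X, 2, sd)
  Xs <- scale(X, center = mx, scale = sx)
  my <- mean(y)
  beta <- solve(crossprod(Xs) + lambda * diag(ncol(X)), crossprod(Xs, y - my))
  list(mx = mx, sx = sx, my = my, beta = drop(beta))
}

# Predict new rows using the training centering and scaling
ridge_predict <- function(fit, X) {
  drop(scale(X, center = fit$mx, scale = fit$sx) %*% fit$beta) + fit$my
}

# Training and test mean squared error over a grid of lambda values
lambdas <- c(0, 10^seq(-2, 3, length.out = 30))
errs <- t(sapply(lambdas, function(l) {
  fit <- ridge_fit(Xtr, y[train], l)
  c(train = mean((y[train] - ridge_predict(fit, Xtr))^2),
    test  = mean((y[!train] - ridge_predict(fit, Xte))^2))
}))
best_lambda <- lambdas[which.min(errs[, "test"])]

# Coefficient path and effective degrees of freedom df(lambda) (HTF eq. 3.50)
paths <- sapply(lambdas, function(l) ridge_fit(Xtr, y[train], l)$beta)
d <- svd(scale(Xtr))$d
df <- sapply(lambdas, function(l) sum(d^2 / (d^2 + l)))
```

For the figures, matplot(log10(lambdas[-1]), errs[-1, ], type = "l") traces both error curves (dropping lambda = 0 on the log scale), and matplot(df, t(paths), type = "l") gives a path diagram in the style of HTF Figure 3.8, which plots the coefficients against df(lambda).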

Homework 4

  • Complete HTF exercises 7.4 and 7.6
  • This homework should be submitted using the GitHub mechanism. However, you may complete it on paper and upload a scanned image, or use LaTeX-style math markup in an R Markdown document.

Homework 5

Goal: Get started using Keras to construct simple neural networks

  1. Read through the "Image Classification" tutorial on the RStudio Keras website.
  2. Use the Keras library to create a convolutional neural network similar to (or more sophisticated than) "Net-5" described during lecture on 4/4 and also described in the ESL book section 11.7. See the ConvNet tutorial on the RStudio Keras website.
  3. Fit the CNN to the zipcode data from the authors' website and create a figure similar to the one in the slides that shows test error as a function of training epochs.

Links

RStudio/Knitr

Topic attachments
Attachment | Size | Date | Who | Comment
2016-midterm-phoneme.R | 3.7 K | 25 Mar 2016 - 08:33 | MattShotwell | Code for solution to 2016 midterm.
HW10.pdf | 44.3 K | 09 Mar 2015 - 09:51 | MattShotwell | Homework 10
Income2.csv | 1.6 K | 16 Mar 2021 - 08:40 | MattShotwell |
Intro.pdf | 781.9 K | 10 Jan 2023 - 08:46 | MattShotwell |
LA_Examples_DS_Bootcamp.html | 2374.0 K | 05 Feb 2020 - 11:04 | MattShotwell |
LAozone.R | 3.1 K | 14 Mar 2018 - 10:57 | MattShotwell |
LagrangeMultipliers-Bishop-PatternRecognitionMachineLearning.pdf | 1574.4 K | 06 Apr 2016 - 17:53 | MattShotwell | Lagrange Multipliers; Bishop; Pattern Recognition and Machine Learning
MCB-20121115.pdf | 676.5 K | 17 Dec 2014 - 10:32 | MattShotwell | The Matrix Cookbook (version 15 November 2012)
MDS-examples.R | 2.0 K | 15 Apr 2020 - 09:51 | MattShotwell |
airquality-EM-mixture.R | 2.2 K | 11 Apr 2016 - 10:59 | MattShotwell | EM algorithm with finite normal mixture
airquality-agnes.R | 1.3 K | 13 Apr 2016 - 11:22 | MattShotwell | [Ag]glomerative [nes]ting (clustering) with airquality data
boosting-trees.R | 5.5 K | 01 Apr 2021 - 09:00 | MattShotwell | Boosting a tree stump with the AdaBoost.M1 algorithm
bootstrap-calibration.R | 3.2 K | 23 Feb 2018 - 11:43 | MattShotwell |
df-stepwise.RData | 2.5 K | 17 Feb 2016 - 16:43 | MattShotwell |
df-stepwise.Rmd | 5.5 K | 12 Feb 2017 - 20:50 | MattShotwell |
df-stepwise.html | 737.8 K | 12 Feb 2017 - 20:50 | MattShotwell |
effective-df-aic-bic-mcycle.R | 3.9 K | 09 Mar 2020 - 10:58 | MattShotwell |
gradient-boosting-example.R | 8.5 K | 08 Apr 2021 - 08:44 | MattShotwell |
kNN-CV.R | 4.0 K | 16 Mar 2021 - 08:39 | MattShotwell |
kernel-manipulate-example.R | 1.2 K | 15 Jan 2020 - 10:20 | MattShotwell |
kernel-methods-examples-mcycle.R | 3.6 K | 24 Feb 2020 - 10:32 | MattShotwell |
lab1.pdf | 226.6 K | 12 Jan 2015 - 14:04 | GuanhuaChen | BIOS362_lab1
lab2.pdf | 1901.4 K | 21 Jan 2015 - 11:20 | GuanhuaChen | slides from Dr. Jojic (UNC)'s Machine learning class
lasso-example.R | 5.4 K | 23 Feb 2021 - 08:42 | MattShotwell |
lecture-1.pdf | 408.0 K | 12 Jan 2023 - 08:37 | MattShotwell |
lecture-10.Rmd | 3.8 K | 10 Feb 2020 - 11:01 | MattShotwell |
lecture-10.pdf | 170.9 K | 10 Feb 2020 - 11:17 | MattShotwell |
lecture-11.pdf | 285.1 K | 12 Feb 2020 - 08:42 | MattShotwell |
lecture-12.pdf | 473.8 K | 26 Feb 2020 - 10:48 | MattShotwell |
lecture-13.pdf | 363.7 K | 14 Feb 2023 - 11:43 | MattShotwell |
lecture-14.pdf | 382.7 K | 09 Mar 2020 - 10:22 | MattShotwell |
lecture-15.pdf | 354.6 K | 16 Mar 2021 - 08:36 | MattShotwell |
lecture-16.pdf | 240.5 K | 28 Feb 2020 - 10:50 | MattShotwell |
lecture-17.pdf | 372.7 K | 20 Feb 2019 - 11:05 | MattShotwell |
lecture-18.pdf | 190.7 K | 28 Feb 2018 - 10:43 | MattShotwell |
lecture-2.pdf | 243.6 K | 17 Jan 2023 - 08:52 | MattShotwell |
lecture-20.pdf | 143.4 K | 14 Mar 2018 - 10:57 | MattShotwell |
lecture-21.pdf | 456.8 K | 20 Mar 2020 - 09:10 | MattShotwell |
lecture-22.pdf | 499.9 K | 27 Mar 2020 - 10:33 | MattShotwell |
lecture-23.pdf | 292.3 K | 30 Mar 2020 - 09:50 | MattShotwell |
lecture-24.pdf | 494.2 K | 01 Apr 2020 - 09:45 | MattShotwell |
lecture-25.pdf | 410.4 K | 25 Mar 2020 - 12:54 | MattShotwell |
lecture-26.pdf | 569.1 K | 27 Mar 2019 - 11:01 | MattShotwell |
lecture-27.pdf | 190.7 K | 29 Mar 2019 - 10:59 | MattShotwell |
lecture-28.pdf | 330.7 K | 10 Apr 2020 - 10:58 | MattShotwell |
lecture-29.pdf | 955.6 K | 13 Apr 2020 - 09:34 | MattShotwell |
lecture-2a.pdf | 97.9 K | 19 Jan 2023 - 07:22 | MattShotwell |
lecture-3.pdf | 564.7 K | 24 Jan 2023 - 08:54 | MattShotwell |
lecture-30.pdf | 626.1 K | 15 Apr 2020 - 10:04 | MattShotwell |
lecture-31.pdf | 4059.4 K | 08 Apr 2020 - 10:17 | MattShotwell |
lecture-32.pdf | 165.5 K | 17 Apr 2020 - 10:03 | MattShotwell |
lecture-4.pdf | 175.7 K | 14 Jan 2019 - 10:58 | MattShotwell |
lecture-4a.pdf | 152.8 K | 17 Jan 2020 - 10:11 | MattShotwell |
lecture-5.pdf | 578.2 K | 22 Jan 2020 - 10:18 | MattShotwell |
lecture-6.pdf | 97.9 K | 25 Feb 2021 - 09:04 | MattShotwell |
lecture-7.pdf | 136.6 K | 23 Jan 2019 - 11:18 | MattShotwell |
lecture-8.pdf | 596.0 K | 31 Jan 2020 - 10:18 | MattShotwell |
lecture-9.pdf | 1199.6 K | 05 Feb 2020 - 11:05 | MattShotwell |
linear-regression-examples.R | 5.2 K | 15 Feb 2021 - 21:09 | MattShotwell |
linear-spline-manipulate-example.R | 1.2 K | 15 Jan 2020 - 10:20 | MattShotwell |
mLR-delta.Rmd | 4.7 K | 12 Feb 2020 - 09:41 | MattShotwell |
medExtractR_lecture.pdf | 5878.6 K | 27 Feb 2020 - 14:26 | HannahWeeks | medExtractR_lecture
mixture-data-complete.R | 5.7 K | 10 Feb 2015 - 09:12 | MattShotwell | splines regression, local regression, and kernel density classification of the mixture data
mixture-data-knn-local-kde.R | 8.4 K | 09 Mar 2021 - 10:27 | MattShotwell |
mixture-data-knn-local.R | 4.7 K | 17 Jan 2018 - 10:25 | MattShotwell |
mixture-data-lin-knn.R | 4.2 K | 12 Jan 2023 - 10:10 | MattShotwell |
mixture-data-rpart-bagging.R | 3.7 K | 23 Mar 2020 - 07:44 | MattShotwell |
mixture-data-rpart.R | 2.6 K | 23 Mar 2021 - 08:53 | MattShotwell |
mixture-data-svm.R | 3.3 K | 07 Apr 2017 - 12:31 | MattShotwell | SVM with mixture data; 3D graphic
mixture-data.R | 2.1 K | 30 Jan 2015 - 09:31 | MattShotwell | Lab 3; demo code for mixture data
mnist-convnet.R | 2.6 K | 08 Apr 2019 - 11:09 | MattShotwell |
multivariate-KDE.html | 862.9 K | 24 Feb 2020 - 10:31 | MattShotwell |
nlls_v2.R | 3.2 K | 19 Jan 2018 - 10:47 | MattShotwell |
nnet.R | 3.0 K | 05 Apr 2020 - 16:09 | MattShotwell |
nonlinear-bagging.csv | 0.5 K | 29 Feb 2016 - 11:09 | MattShotwell | nonlinear bagging example data
nonlinear-bagging.html | 656.0 K | 23 Mar 2020 - 10:46 | MattShotwell |
normal-mixture-examples.R | 1.8 K | 17 Apr 2020 - 10:03 | MattShotwell |
pca-and-g-inverses.Rmd | 2.4 K | 03 Feb 2020 - 08:43 | MattShotwell |
pca-and-g-inverses.html | 1507.4 K | 03 Feb 2020 - 08:43 | MattShotwell |
pca-regression-example.R | 3.4 K | 25 Feb 2021 - 08:28 | MattShotwell |
presentation.pdf | 333.3 K | 11 Mar 2019 - 10:41 | MattShotwell |
principal-curves.R | 3.7 K | 10 Apr 2020 - 10:09 | MattShotwell | Principal curves example
prostate-data-lin.R | 2.5 K | 31 Jan 2023 - 07:51 | MattShotwell |
prostate.R | 2.9 K | 26 Jan 2015 - 09:45 | MattShotwell | least-squares, ridge, and principal components regression with prostate data.
random-forest-example.R | 1.1 K | 30 Mar 2021 - 08:51 | MattShotwell |
simple-LDA-3D.R | 2.7 K | 31 Jan 2020 - 10:18 | MattShotwell |
simple-neural-network.R | 3.8 K | 28 Mar 2017 - 09:43 | MattShotwell | Neural network with one hidden layer, 20 units, fully connected
smooth-splines-manipulate-example.R | 1.0 K | 15 Jan 2020 - 10:20 | MattShotwell |
smoothing-splines-example.R | 1.1 K | 26 Feb 2020 - 08:22 | MattShotwell |
spectral-clustering.R | 3.1 K | 13 Apr 2020 - 09:34 | MattShotwell |
sphered-and-canonical-inputs.R | 6.3 K | 07 Feb 2020 - 11:24 | MattShotwell |
splines-example.R | 3.9 K | 09 Feb 2023 - 11:45 | MattShotwell | Splines example
vowel-data-LR.Rmd | 3.2 K | 10 Feb 2020 - 12:19 | MattShotwell |
yuying_1.pdf | 953.4 K | 13 Feb 2015 - 15:30 | GuanhuaChen |
yuying_2.pdf | 2283.6 K | 13 Feb 2015 - 15:31 | GuanhuaChen |
Topic revision: r404 - 04 Dec 2023, PegDuthie
 

This site is powered by Foswiki. Copyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.