BIOS 362: Advanced Statistical Inference (Statistical Learning)

Instructor

Teaching Assistants

Dates, Time, and Location

  • First meeting: Tue. Jan. 26, 2021; Last meeting: Thu. Apr. 29, 2021
  • Tuesday, Thursday 9:00AM-10:30AM
  • Virtually using Zoom via Brightspace
  • Office hours: By appointment initially; a regular schedule will be determined.
  • We will use the Graduate School Academic Calendar.

Textbook

The book for this course is listed below and is free to download in PDF format from the book webpage: Hastie, Tibshirani, Friedman. (2009) The elements of statistical learning: data mining, inference and prediction. Springer, 2nd edition. In the course outline and class schedule, the textbook is abbreviated "HTF", often followed by chapter or page references ("Ch. X-Y" or "pp. X-Y", respectively). The BibTeX entry for the book is as follows:

@book{HTF2009,
  author = {Hastie, Trevor and Tibshirani, Robert and Friedman, Jerome},
  title = {The elements of statistical learning: data mining, inference and prediction},
  url = {http://www-stat.stanford.edu/~tibs/ElemStatLearn/},
  publisher = {Springer},
  year = 2009,
  edition = 2
}

The wide margins of the linked PDF version of the book make it difficult to read on small screens (e.g., an iPhone). The margins may be removed using the following Ghostscript command in Linux, where "output.pdf" and "input.pdf" are replaced with the appropriate file names. Please see Dr. Shotwell for help with this.

gs -o output.pdf -sDEVICE=pdfwrite -c "[/CropBox [130 140 460 685] /PAGES pdfmark" -f input.pdf
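
To see what the CropBox values in the command do, the short R sketch below converts them to a page size. It assumes the usual PDF rectangle convention (lower-left x, lower-left y, upper-right x, upper-right y) and the default unit of 1/72 inch; the variable names are illustrative only.

```r
# CropBox values copied from the Ghostscript command above.
# Assumption: order is lower-left x/y, then upper-right x/y, in points.
crop_box <- c(llx = 130, lly = 140, urx = 460, ury = 685)

points_per_inch <- 72  # default PDF user-space unit

# Width and height of the visible region after cropping.
size_pt <- c(width  = crop_box[["urx"]] - crop_box[["llx"]],
             height = crop_box[["ury"]] - crop_box[["lly"]])
size_in <- size_pt / points_per_inch

print(size_pt)            # 330 x 545 points
print(round(size_in, 2))  # roughly 4.58 x 7.57 inches
```

If the text is clipped or margins remain for a different PDF, the four numbers can be adjusted and the resulting size checked the same way.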

Other Resources

Course Topics

  • Overview of Supervised Learning and Review of Linear Methods: HTF Ch. 2-4
  • Splines and Kernel Methods: HTF Ch. 5-6
  • Model Assessment, Selection, and Inference: HTF Ch. 7-8
  • Neural Networks: HTF Ch. 11
  • Support Vector Machines: HTF Ch. 12
  • Unsupervised Learning: HTF Ch. 14

Other information

  • Unless otherwise stated, assigned homework is due in one week.
  • Students are encouraged to work together on homework problems, but they must turn in their own write-ups.
  • Class participation is encouraged.
  • Please bring a laptop to class.

Grading

  • Homework: 40%
  • Take-home Midterm Exam: 30%
  • Take-home Final Exam: 30%

Schedule of Topics

Date | Reading (before class) | Homework | Topic/Content | Presentation
Tue. 1/26 | none | none | Syllabus, introduction | Intro.pdf
Thu. 1/28 | HTF Ch. 1 and Ch. 2.1, 2.2, and 2.3 | See below: Thu. 1/28 | Least-squares, nearest-neighbors | lecture-1.pdf mixture-data-lin-knn.R
Tue. 2/2 | none | none | Least-squares, nearest-neighbors code | mixture-data-lin-knn.R
Thu. 2/4 | HTF Ch. 2.4 | none | Decision theory | lecture-2.pdf
Tue. 2/9 | none | See below: Tue. 2/9 | Loss functions in practice | lecture-2a.pdf prostate-data-lin.R
Thu. 2/11 | HTF Ch. 2.7, 2.8, and 2.9 | none | Structured regression | lecture-3.pdf ex-1.R ex-2.R ex-3.R
Tue. 2/16 | HTF Ch. 3.1, 3.2, 3.3, 3.4 | none | Linear methods, subset selection, ridge, and lasso | lecture-4a.pdf linear-regression-examples.R lecture-5.pdf lasso-example.R
Thu. 2/18 | none | See below: Tue. 2/18 | No Class. Reading day focused on linear methods for regression. Suggested supplemental reading: HTF Ch. 3.6, 3.7, 3.8, and 3.9. Suggested supplemental exercises: Ex. 3.12, 3.18 |
Tue. 2/23 | none | none | Linear methods, subset selection, ridge, and lasso (cont.) | lecture-5.pdf lasso-example.R
Thu. 2/25 | HTF Ch. 3.5 and 3.6 | none | Linear methods: principal components regression | lecture-6.pdf pca-regression-example.R lec7.pdf lec8.pdf pca-and-g-inverses.html
Tue. 3/2 | HTF Ch. 4.1, 4.2, and 4.3 | See below: Tue. 3/2 | Linear methods: Linear discriminant analysis | lecture-8.pdf simple-LDA-3D.R
Thu. 3/4 | HTF Ch. 5.1 and 5.2 | none | Basis expansions: piecewise polynomials & splines | lecture-11.pdf splines-example.R mixture-data-complete.R
Tue. 3/9 | HTF Ch. 6.1-6.5 | none | Kernel methods | lecture-13.pdf mixture-data-knn-local-kde.R kernel-methods-examples-mcycle.R
Thu. 3/11 | HTF Ch. 7.1, 7.2, 7.3, 7.4 | See below: Thu. 3/11 | Model assessment: Cp, AIC, BIC | lecture-14.pdf effective-df-aic-bic-mcycle.R
Tue. 3/16 | HTF Ch. 7.10 | none | Cross validation | lecture-15.pdf kNN-CV.R Income2.csv
Thu. 3/18 | none | none | Midterm Review | none
Tue. 3/23 | HTF Ch. 9.2 | none | Classification and Regression Trees | lecture-21.pdf mixture-data-rpart.R
Thu. 3/25 | HTF Ch. 8.7, 8.8, 8.9 | none | Bagging | lecture-18.pdf mixture-data-rpart-bagging.R nonlinear-bagging.html
Tue. 3/30 | HTF Ch. 15.1, 15.2 | Tue. 3/30 (below) | Random Forest | lecture-25.pdf random-forest-example.R
Thu. 4/1 | HTF Ch. 10.1 | none | Boosting and AdaBoost.M1 (part 1) | lecture-22.pdf boosting-trees.R
Tue. 4/6 | HTF Ch. 10.2-10.9 | Work through this nice GBM tutorial | Boosting and AdaBoost.M1 (part 2) | lecture-23.pdf
Thu. 4/8 | HTF Ch. 10.10, 10.13 | none | Boosting and AdaBoost.M1 (part 3) | lecture-24.pdf gradient-boosting-example.R
Tue. 4/13 | HTF Ch. 11.1, 11.2, 11.3, 11.4, 11.5 | none | Introduction to Neural networks | lecture-31.pdf nnet.R
Thu. 4/15 | HTF Ch. 11.1, 11.2, 11.3, 11.4, 11.5 | Thu. 4/14 (below) | Introduction to Neural networks (cont.) | lecture-31.pdf nnet.R
Tue. 4/20 | HTF Ch. 14.5 | none | Principal curves and surfaces | lecture-28.pdf principal-curves.R
Thu. 4/22 | HTF Ch. 14.8 | none | Multidimensional scaling | lecture-30.pdf MDS-examples.R
Tue. 4/27 | HTF Ch. 14.5.3 | none | k-means, hierarchical, and spectral clustering | lecture-29.pdf spectral-clustering.R
Thu. 4/29 | none | none | Distribute final exam. Last day of class |

Homework/Laboratory (other than problems listed in HTF)

Thu. 1/28

Using the RMarkdown/knitr/GitHub mechanism, implement the following tasks by extending the example R script mixture-data-lin-knn.R:

  • Paste the code from the mixture-data-lin-knn.R file into the homework template Knitr document.
  • Read the help file for R's built-in linear regression function lm.
  • Rewrite the functions fit_lc and predict_lc using lm and the associated predict method for lm objects.
  • Consider making the linear classifier more flexible by adding squared terms for x1 and x2 to the linear model.
  • Describe how this more flexible model affects the bias-variance tradeoff.
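
As a rough illustration of the lm-based approach, the sketch below fits a linear classifier with and without squared terms. It uses simulated data rather than the mixture data, and it does not reproduce fit_lc or predict_lc from mixture-data-lin-knn.R; the data frame dat, the helper classify, and the 0.5 threshold are assumptions made for this example.

```r
set.seed(362)

# Simulated two-class data (NOT the course's mixture data):
# class 1 lies inside a circle, so the true boundary is nonlinear.
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- as.numeric(dat$x1^2 + dat$x2^2 < 1.5)

# Linear model on the 0/1 outcome, then the same model with squared terms.
# I() protects the arithmetic inside the formula.
fit_linear <- lm(y ~ x1 + x2, data = dat)
fit_quad   <- lm(y ~ x1 + x2 + I(x1^2) + I(x2^2), data = dat)

# Hypothetical helper: classify as 1 when the fitted value exceeds 0.5.
classify <- function(fit, newdata) {
  as.numeric(predict(fit, newdata = newdata) > 0.5)
}

# Training misclassification rates for the two models.
err_linear <- mean(classify(fit_linear, dat) != dat$y)
err_quad   <- mean(classify(fit_quad, dat) != dat$y)
print(c(linear = err_linear, quadratic = err_quad))
```

Because the quadratic model nests the linear one, its training residual sum of squares cannot be larger; whether the extra flexibility helps on new data is the bias-variance question the last task asks about.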

Tue. 2/9

Using the RMarkdown/knitr/GitHub mechanism, implement the following tasks by extending the example R script (prostate-data-lin.R):

  • Write functions that implement the L1 loss and tilted absolute loss functions.
  • Create a figure that shows lpsa (x-axis) versus lcavol (y-axis). Add and label (using the 'legend' function) the linear model predictors associated with L2 loss, L1 loss, and tilted absolute value loss for tau = 0.25 and 0.75.
  • Write functions to fit and predict from a simple nonlinear model with three parameters defined by 'beta[1] + beta[2]*exp(-beta[3]*x)'. Hint: make copies of 'fit_lin' and 'predict_lin' and modify them to fit the nonlinear model. Use c(-1.0, 0.0, -0.3) as 'beta_init'.
  • Create a figure that shows lpsa (x-axis) versus lcavol (y-axis). Add and label (using the 'legend' function) the nonlinear model predictors associated with L2 loss, L1 loss, and tilted absolute value loss for tau = 0.25 and 0.75.
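
For reference, the loss functions named above can be sketched as follows. This uses the standard "check" loss from quantile regression as the tilted absolute loss; the definition used in lecture-2a.pdf is not reproduced here and may be parameterized differently. The function names are illustrative and are not taken from prostate-data-lin.R.

```r
# Squared-error (L2) and absolute (L1) loss as functions of y and a prediction.
l2_loss <- function(y, yhat) (y - yhat)^2
l1_loss <- function(y, yhat) abs(y - yhat)

# Tilted absolute loss (assumed definition): positive residuals are
# weighted by tau, negative residuals by (1 - tau).
tilted_abs_loss <- function(y, yhat, tau) {
  r <- y - yhat
  ifelse(r > 0, tau * r, (tau - 1) * r)
}

# Small check: with tau = 0.5 the best constant prediction is the median,
# which is insensitive to the outlying value 100.
y <- c(1, 2, 3, 4, 100)
grid <- seq(0, 100, by = 0.5)
risk <- sapply(grid, function(c0) mean(tilted_abs_loss(y, c0, tau = 0.5)))
print(grid[which.min(risk)])  # 3, the sample median
```

Changing tau to 0.25 or 0.75 moves the minimizer toward the corresponding sample quantile, which is why the fitted curves for these values sit below and above the L1 fit.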

Tue. 2/18

Using the RMarkdown/knitr/GitHub mechanism, implement the following tasks:
  • Use the prostate cancer data.
  • Use the cor function to reproduce the correlations listed in HTF Table 3.1, page 50.
  • Treat lcavol as the outcome, and use all other variables in the data set as predictors.
  • With the training subset of the prostate data, train a least-squares regression model with all predictors using the lm function.
  • Use the testing subset to compute the test error (average squared-error loss) using the fitted least-squares regression model.
  • Train a ridge regression model using the glmnet function, and tune the value of lambda (i.e., use guess and check to find the value of lambda that approximately minimizes the test error).
  • Create a figure that shows the training and test error associated with ridge regression as a function of lambda.
  • Create a path diagram of the ridge regression analysis, similar to HTF Figure 3.8.
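
The training/test error comparison can be sketched in base R as below. This is not the glmnet workflow the homework asks for: it uses simulated data instead of the prostate data and the closed-form ridge solution, so the lambda values are not on the same scale as glmnet's. All names (ridge_fit, mse, errors) are made up for this example.

```r
set.seed(1)

# Simulated regression data (NOT the prostate data).
n <- 100; p <- 5
x <- matrix(rnorm(n * p), n, p)
beta <- c(2, -1, 0, 0, 0.5)
y <- drop(x %*% beta) + rnorm(n)
train <- seq_len(n) <= 60  # first 60 rows train, remaining 40 test

# Center with training means so the intercept is not penalized.
x_mean <- colMeans(x[train, ])
y_mean <- mean(y[train])
x_tr <- sweep(x[train, ], 2, x_mean)
x_te <- sweep(x[!train, ], 2, x_mean)

# Closed-form ridge coefficients: (X'X + lambda I)^{-1} X'y.
ridge_fit <- function(x, y, lambda) {
  solve(crossprod(x) + lambda * diag(ncol(x)), crossprod(x, y))
}

# Average squared-error loss.
mse <- function(y, yhat) mean((y - yhat)^2)

lambdas <- c(0, 0.1, 1, 10, 100)
errors <- t(sapply(lambdas, function(lambda) {
  b <- ridge_fit(x_tr, y[train] - y_mean, lambda)
  c(lambda = lambda,
    train  = mse(y[train],  y_mean + drop(x_tr %*% b)),
    test   = mse(y[!train], y_mean + drop(x_te %*% b)))
}))
print(errors)
```

Training error can only increase as lambda grows, while test error may first decrease and then increase; plotting the two columns against lambda gives the kind of figure requested above.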

Tue. 3/2

Goal: Understand and implement reduced rank LDA in R. This homework covers new material that we will not cover in class.

Using the RMarkdown/knitr/GitHub mechanism, implement the following tasks:
  • Retrieve the vowel data (training and testing) from the HTF website or R package.
  • Review HTF section 4.3.3 and (optionally): LA Examples and example.R
  • Implement reduced-rank LDA using the vowel training data. Check your work by plotting the first two discriminant variables as in HTF Figure 4.4. Hint: Center the 10 training predictors before implementing LDA. See the built-in R function 'scale'. The singular value decomposition or eigendecomposition may be computed using the built-in R functions 'svd' or 'eigen', respectively.
  • Use the vowel testing data to estimate the expected prediction error (assuming zero-one loss), varying the number of canonical variables used for classification.
  • Plot the EPE as a function of the number of discriminant variables, and compare this with HTF Figure 4.10.
  • (Optional) Reproduce HTF Figure 4.11. Note: The reproduction need not be exact. However, the information content should be preserved.

Thu. 3/11

  • Complete HTF exercises 7.4 and 7.6
  • This homework should be submitted using the GitHub mechanism. However, you may complete the homework on paper and upload a scanned image, or you may use LaTeX-style markup in an RMarkdown document.

Tue. 3/30

  • Complete HTF Ex. 15.4.

Thu. 4/14

Goal: Get started using Keras to construct simple neural networks.

  1. Work through the "Image Classification" tutorial on the RStudio Keras website.
  2. Use the Keras library to re-implement the simple neural network discussed during lecture for the mixture data (see nnet.R). Use a single 10-node hidden layer; fully connected.
  3. Create a figure to illustrate that the predictions are (or are not) similar using the 'nnet' function versus the Keras model.
  4. (optional extra credit) Convert the neural network described in the "Image Classification" tutorial to a network that is similar to one of the convolutional networks described during lecture on 4/15 (i.e., Net-3, Net-4, or Net-5) and also described in HTF section 11.7. See the ConvNet tutorial on the RStudio Keras website.

Links

RStudio/Knitr

Topic attachments
Attachment | Size | Date | Who | Comment
2016-midterm-phoneme.R | 3.7 K | 25 Mar 2016 - 08:33 | MattShotwell | Code for solution to 2016 midterm.
HW10.pdf | 44.3 K | 09 Mar 2015 - 09:51 | MattShotwell | Homework 10
Income2.csv | 1.6 K | 16 Mar 2021 - 08:40 | MattShotwell |
Intro.pdf | 781.2 K | 06 Jan 2020 - 08:21 | MattShotwell |
LA_Examples_DS_Bootcamp.html | 2374.0 K | 05 Feb 2020 - 11:04 | MattShotwell |
LAozone.R | 3.1 K | 14 Mar 2018 - 10:57 | MattShotwell |
LagrangeMultipliers-Bishop-PatternRecognitionMachineLearning.pdf | 1574.4 K | 06 Apr 2016 - 17:53 | MattShotwell | Lagrange Multipliers; Bishop; Pattern Recognition and Machine Learning
MCB-20121115.pdf | 676.5 K | 17 Dec 2014 - 10:32 | MattShotwell | The Matrix Cookbook (version 15 November 2012)
MDS-examples.R | 2.0 K | 15 Apr 2020 - 09:51 | MattShotwell |
airquality-EM-mixture.R | 2.2 K | 11 Apr 2016 - 10:59 | MattShotwell | EM algorithm with finite normal mixture
airquality-agnes.R | 1.3 K | 13 Apr 2016 - 11:22 | MattShotwell | [Ag]glomerative [nes]ting (clustering) with airquality data
boosting-trees.R | 5.5 K | 01 Apr 2021 - 09:00 | MattShotwell | Boosting a tree stump with the AdaBoost.M1 algorithm
bootstrap-calibration.R | 3.2 K | 23 Feb 2018 - 11:43 | MattShotwell |
df-stepwise.RData | 2.5 K | 17 Feb 2016 - 16:43 | MattShotwell |
df-stepwise.Rmd | 5.5 K | 12 Feb 2017 - 20:50 | MattShotwell |
df-stepwise.html | 737.8 K | 12 Feb 2017 - 20:50 | MattShotwell |
effective-df-aic-bic-mcycle.R | 3.9 K | 09 Mar 2020 - 10:58 | MattShotwell |
gradient-boosting-example.R | 8.5 K | 08 Apr 2021 - 08:44 | MattShotwell |
kNN-CV.R | 4.0 K | 16 Mar 2021 - 08:39 | MattShotwell |
kernel-manipulate-example.R | 1.2 K | 15 Jan 2020 - 10:20 | MattShotwell |
kernel-methods-examples-mcycle.R | 3.6 K | 24 Feb 2020 - 10:32 | MattShotwell |
lab1.pdf | 226.6 K | 12 Jan 2015 - 14:04 | GuanhuaChen | BIOS362_lab1
lab2.pdf | 1901.4 K | 21 Jan 2015 - 11:20 | GuanhuaChen | slides from Dr. Jojic (UNC)'s Machine learning class
lasso-example.R | 5.4 K | 23 Feb 2021 - 08:42 | MattShotwell |
lecture-1.pdf | 408.0 K | 08 Jan 2020 - 09:08 | MattShotwell |
lecture-10.Rmd | 3.8 K | 10 Feb 2020 - 11:01 | MattShotwell |
lecture-10.pdf | 170.9 K | 10 Feb 2020 - 11:17 | MattShotwell |
lecture-11.pdf | 285.1 K | 12 Feb 2020 - 08:42 | MattShotwell |
lecture-12.pdf | 473.8 K | 26 Feb 2020 - 10:48 | MattShotwell |
lecture-13.pdf | 376.9 K | 12 Feb 2018 - 10:31 | MattShotwell |
lecture-14.pdf | 382.7 K | 09 Mar 2020 - 10:22 | MattShotwell |
lecture-15.pdf | 354.6 K | 16 Mar 2021 - 08:36 | MattShotwell |
lecture-16.pdf | 240.5 K | 28 Feb 2020 - 10:50 | MattShotwell |
lecture-17.pdf | 372.7 K | 20 Feb 2019 - 11:05 | MattShotwell |
lecture-18.pdf | 190.7 K | 28 Feb 2018 - 10:43 | MattShotwell |
lecture-2.pdf | 243.4 K | 10 Jan 2020 - 09:36 | MattShotwell |
lecture-20.pdf | 143.4 K | 14 Mar 2018 - 10:57 | MattShotwell |
lecture-21.pdf | 456.8 K | 20 Mar 2020 - 09:10 | MattShotwell |
lecture-22.pdf | 499.9 K | 27 Mar 2020 - 10:33 | MattShotwell |
lecture-23.pdf | 292.3 K | 30 Mar 2020 - 09:50 | MattShotwell |
lecture-24.pdf | 494.2 K | 01 Apr 2020 - 09:45 | MattShotwell |
lecture-25.pdf | 410.4 K | 25 Mar 2020 - 12:54 | MattShotwell |
lecture-26.pdf | 569.1 K | 27 Mar 2019 - 11:01 | MattShotwell |
lecture-27.pdf | 190.7 K | 29 Mar 2019 - 10:59 | MattShotwell |
lecture-28.pdf | 330.7 K | 10 Apr 2020 - 10:58 | MattShotwell |
lecture-29.pdf | 955.6 K | 13 Apr 2020 - 09:34 | MattShotwell |
lecture-2a.pdf | 97.6 K | 13 Jan 2020 - 12:02 | MattShotwell |
lecture-3.pdf | 569.0 K | 15 Jan 2020 - 10:20 | MattShotwell |
lecture-30.pdf | 626.1 K | 15 Apr 2020 - 10:04 | MattShotwell |
lecture-31.pdf | 4059.4 K | 08 Apr 2020 - 10:17 | MattShotwell |
lecture-32.pdf | 165.5 K | 17 Apr 2020 - 10:03 | MattShotwell |
lecture-4.pdf | 175.7 K | 14 Jan 2019 - 10:58 | MattShotwell |
lecture-4a.pdf | 152.8 K | 17 Jan 2020 - 10:11 | MattShotwell |
lecture-5.pdf | 578.2 K | 22 Jan 2020 - 10:18 | MattShotwell |
lecture-6.pdf | 97.9 K | 25 Feb 2021 - 09:04 | MattShotwell |
lecture-7.pdf | 136.6 K | 23 Jan 2019 - 11:18 | MattShotwell |
lecture-8.pdf | 596.0 K | 31 Jan 2020 - 10:18 | MattShotwell |
lecture-9.pdf | 1199.6 K | 05 Feb 2020 - 11:05 | MattShotwell |
linear-regression-examples.R | 5.2 K | 15 Feb 2021 - 21:09 | MattShotwell |
linear-spline-manipulate-example.R | 1.2 K | 15 Jan 2020 - 10:20 | MattShotwell |
mLR-delta.Rmd | 4.7 K | 12 Feb 2020 - 09:41 | MattShotwell |
medExtractR_lecture.pdf | 5878.6 K | 27 Feb 2020 - 14:26 | HannahWeeks | medExtractR_lecture
mixture-data-complete.R | 5.7 K | 10 Feb 2015 - 09:12 | MattShotwell | splines regression, local regression, and kernel density classification of the mixture data
mixture-data-knn-local-kde.R | 8.4 K | 09 Mar 2021 - 10:27 | MattShotwell |
mixture-data-knn-local.R | 4.7 K | 17 Jan 2018 - 10:25 | MattShotwell |
mixture-data-lin-knn.R | 4.0 K | 28 Jan 2021 - 08:47 | MattShotwell |
mixture-data-rpart-bagging.R | 3.7 K | 23 Mar 2020 - 07:44 | MattShotwell |
mixture-data-rpart.R | 2.6 K | 23 Mar 2021 - 08:53 | MattShotwell |
mixture-data-svm.R | 3.3 K | 07 Apr 2017 - 12:31 | MattShotwell | SVM with mixture data; 3D graphic
mixture-data.R | 2.1 K | 30 Jan 2015 - 09:31 | MattShotwell | Lab 3; demo code for mixture data
mnist-convnet.R | 2.6 K | 08 Apr 2019 - 11:09 | MattShotwell |
multivariate-KDE.html | 862.9 K | 24 Feb 2020 - 10:31 | MattShotwell |
nlls_v2.R | 3.2 K | 19 Jan 2018 - 10:47 | MattShotwell |
nnet.R | 3.0 K | 05 Apr 2020 - 16:09 | MattShotwell |
nonlinear-bagging.csv | 0.5 K | 29 Feb 2016 - 11:09 | MattShotwell | nonlinear bagging example data
nonlinear-bagging.html | 656.0 K | 23 Mar 2020 - 10:46 | MattShotwell |
normal-mixture-examples.R | 1.8 K | 17 Apr 2020 - 10:03 | MattShotwell |
pca-and-g-inverses.Rmd | 2.4 K | 03 Feb 2020 - 08:43 | MattShotwell |
pca-and-g-inverses.html | 1507.4 K | 03 Feb 2020 - 08:43 | MattShotwell |
pca-regression-example.R | 3.4 K | 25 Feb 2021 - 08:28 | MattShotwell |
presentation.pdf | 333.3 K | 11 Mar 2019 - 10:41 | MattShotwell |
principal-curves.R | 3.7 K | 10 Apr 2020 - 10:09 | MattShotwell | Principal curves example
prostate-data-lin.R | 2.5 K | 09 Feb 2021 - 08:46 | MattShotwell |
prostate.R | 2.9 K | 26 Jan 2015 - 09:45 | MattShotwell | least-squares, ridge, and principal components regression with prostate data.
random-forest-example.R | 1.1 K | 30 Mar 2021 - 08:51 | MattShotwell |
simple-LDA-3D.R | 2.7 K | 31 Jan 2020 - 10:18 | MattShotwell |
simple-neural-network.R | 3.8 K | 28 Mar 2017 - 09:43 | MattShotwell | Neural network with one hidden layer, 20 units, fully connected
smooth-splines-manipulate-example.R | 1.0 K | 15 Jan 2020 - 10:20 | MattShotwell |
smoothing-splines-example.R | 1.1 K | 26 Feb 2020 - 08:22 | MattShotwell |
spectral-clustering.R | 3.1 K | 13 Apr 2020 - 09:34 | MattShotwell |
sphered-and-canonical-inputs.R | 6.3 K | 07 Feb 2020 - 11:24 | MattShotwell |
splines-example.R | 3.8 K | 14 Feb 2020 - 11:02 | MattShotwell | Splines example
vowel-data-LR.Rmd | 3.2 K | 10 Feb 2020 - 12:19 | MattShotwell |
yuying_1.pdf | 953.4 K | 13 Feb 2015 - 15:30 | GuanhuaChen |
yuying_2.pdf | 2283.6 K | 13 Feb 2015 - 15:31 | GuanhuaChen |
Topic revision: r347 - 27 Apr 2021, MattShotwell
 
