BIOS 362: Advanced Statistical Inference (Statistical Learning)
Instructor
Teaching Assistants
Dates, Time, and Location
- First meeting: Tue. Jan. 26, 2021; Last meeting: Thu. Apr. 29, 2021
- Tuesday, Thursday 9:00AM-10:30AM
- Virtually using Zoom via Brightspace
- Office hours: By appointment, initially; a regular schedule will be determined.
- We will use the Graduate School Academic Calendar.
Textbook
The book for this course is listed below and is free to download in PDF format from the book webpage:
Hastie, Tibshirani, Friedman. (2009) The elements of statistical learning: data mining, inference and prediction. Springer, 2nd edition. In the course outline and class schedule, the textbook is abbreviated "HTF", often followed by chapter or page references "Ch. X-Y" or "pp. X-Y", respectively. The BibTeX entry for the book is as follows:
@book{HTF2009,
author = {Hastie, Trevor and Tibshirani, Robert and Friedman, Jerome},
title = {The elements of statistical learning: data mining, inference and prediction},
url = {http://www-stat.stanford.edu/~tibs/ElemStatLearn/},
publisher = {Springer},
year = 2009,
edition = 2
}
The wide margins of the linked PDF version of the book make it difficult to read on smart devices (e.g., an iPhone). The margins may be removed using the following Ghostscript command in Linux, where "output.pdf" and "input.pdf" should be substituted for the appropriate file names. Please see Dr. Shotwell for help with this.
gs -o output.pdf -sDEVICE=pdfwrite -c "[/CropBox [130 140 460 685] /PAGES pdfmark" -f input.pdf
Other Resources
Course Topics
- Overview of Supervised Learning and Review of Linear Methods: HTF Ch. 2-4
- Splines and Kernel Methods: HTF Ch. 5-6
- Model Assessment, Selection, and Inference: HTF Ch. 7-8
- Neural Networks: HTF Ch. 11
- Support Vector Machines: HTF Ch. 12
- Unsupervised Learning: HTF Ch. 14
Other Information
- Unless otherwise stated, assigned homework is due in one week.
- Students are encouraged to work together on homework problems, but they must turn in their own write-ups.
- Class participation is encouraged.
- Please bring a laptop to class.
Grading
- Homework: 40%
- Take-home Midterm Exam: 30%
- Take-home Final Exam: 30%
Schedule of Topics
Date | Reading (before class) | Homework | Topic/Content | Presentation
--- | --- | --- | --- | ---
Tue. 1/26 | none | none | Syllabus, introduction | Intro.pdf
Thu. 1/28 | HTF Ch. 1 and Ch. 2.1, 2.2, and 2.3 | See below: Thu. 1/28 | Least-squares, nearest-neighbors | lecture-1.pdf, mixture-data-lin-knn.R
Tue. 2/2 | none | none | Least-squares, nearest-neighbors code | mixture-data-lin-knn.R
Thu. 2/4 | HTF Ch. 2.4 | none | Decision theory | lecture-2.pdf
Tue. 2/9 | none | See below: Tue. 2/9 | Loss functions in practice | lecture-2a.pdf, prostate-data-lin.R
Thu. 2/11 | HTF Ch. 2.7, 2.8, and 2.9 | none | Structured regression | lecture-3.pdf, ex-1.R, ex-2.R, ex-3.R
Tue. 2/16 | HTF Ch. 3.1, 3.2, 3.3, 3.4 | none | Linear methods, subset selection, ridge, and lasso | lecture-4a.pdf, linear-regression-examples.R, lecture-5.pdf, lasso-example.R
Thu. 2/18 | none | See below: Thu. 2/18 | No class; reading day focused on linear methods for regression | Suggested supplemental reading: HTF Ch. 3.6, 3.7, 3.8, and 3.9. Suggested supplemental exercises: Ex. 3.12, 3.18
Tue. 2/23 | none | none | Linear methods, subset selection, ridge, and lasso (cont.) | lecture-5.pdf, lasso-example.R
Thu. 2/25 | HTF Ch. 3.5 and 3.6 | none | Linear methods: principal components regression | lecture-6.pdf, pca-regression-example.R, lec7.pdf, lec8.pdf, pca-and-g-inverses.html
Tue. 3/2 | HTF Ch. 4.1, 4.2, and 4.3 | See below: Tue. 3/2 | Linear methods: linear discriminant analysis | lecture-8.pdf, simple-LDA-3D.R
Thu. 3/4 | HTF Ch. 5.1 and 5.2 | none | Basis expansions: piecewise polynomials & splines | lecture-11.pdf, splines-example.R, mixture-data-complete.R
Tue. 3/9 | HTF Ch. 6.1-6.5 | none | Kernel methods | lecture-13.pdf, mixture-data-knn-local-kde.R, kernel-methods-examples-mcycle.R
Thu. 3/11 | HTF Ch. 7.1, 7.2, 7.3, 7.4 | See below: Thu. 3/11 | Model assessment: Cp, AIC, BIC | lecture-14.pdf, effective-df-aic-bic-mcycle.R
Tue. 3/16 | HTF Ch. 7.10 | none | Cross-validation | lecture-15.pdf, kNN-CV.R, Income2.csv
Thu. 3/18 | none | none | Midterm review | none
Tue. 3/23 | HTF Ch. 9.2 | none | Classification and regression trees | lecture-21.pdf, mixture-data-rpart.R
Thu. 3/25 | HTF Ch. 8.7, 8.8, 8.9 | none | Bagging | lecture-18.pdf, mixture-data-rpart-bagging.R, nonlinear-bagging.html
Tue. 3/30 | HTF Ch. 15.1, 15.2 | See below: Tue. 3/30 | Random forest | lecture-25.pdf, random-forest-example.R
Thu. 4/1 | HTF Ch. 10.1 | none | Boosting and AdaBoost.M1 (part 1) | lecture-22.pdf, boosting-trees.R
Tue. 4/6 | HTF Ch. 10.2-10.9 | Work through this nice GBM tutorial | Boosting and AdaBoost.M1 (part 2) | lecture-23.pdf
Thu. 4/8 | HTF Ch. 10.10, 10.13 | none | Boosting and AdaBoost.M1 (part 3) | lecture-24.pdf, gradient-boosting-example.R
Tue. 4/13 | HTF Ch. 11.1, 11.2, 11.3, 11.4, 11.5 | none | Introduction to neural networks | lecture-31.pdf, nnet.R
Thu. 4/15 | HTF Ch. 11.1, 11.2, 11.3, 11.4, 11.5 | See below: Thu. 4/15 | Introduction to neural networks (cont.) | lecture-31.pdf, nnet.R
Homework/Laboratory (other than problems listed in HTF)
Thu. 1/28
Using the RMarkdown/knitr/github mechanism, implement the following tasks by extending the example R script mixture-data-lin-knn.R (a sketch follows this list):
- Paste the code from the mixture-data-lin-knn.R file into the homework template Knitr document.
- Read the help file for R's built-in linear regression function lm.
- Re-write the functions fit_lc and predict_lc using lm and the associated predict method for lm objects.
- Consider making the linear classifier more flexible by adding squared terms for x1 and x2 to the linear model.
- Describe how this more flexible model affects the bias-variance tradeoff.
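Below is a minimal sketch of the lm-based rewrite, assuming the mixture data are available as a two-column matrix x and a 0/1 outcome y, as in mixture-data-lin-knn.R; the exact signatures of fit_lc and predict_lc in the course script may differ.
## fit the linear classifier by regressing the 0/1 outcome on x1 and x2
fit_lc <- function(y, x) {
  dat <- data.frame(y = y, x1 = x[, 1], x2 = x[, 2])
  lm(y ~ x1 + x2, data = dat)
}
## predict at new inputs using the predict method for lm objects
predict_lc <- function(fit, x) {
  predict(fit, newdata = data.frame(x1 = x[, 1], x2 = x[, 2]))
}
## more flexible variant with squared terms: lower bias, higher variance
fit_lc2 <- function(y, x) {
  dat <- data.frame(y = y, x1 = x[, 1], x2 = x[, 2])
  lm(y ~ x1 + x2 + I(x1^2) + I(x2^2), data = dat)
}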
Tue. 2/9
Using the RMarkdown/knitr/github mechanism, implement the following tasks by extending the example R script prostate-data-lin.R (a sketch of the loss functions follows this list):
- Write functions that implement the L1 loss and tilted absolute loss functions.
- Create a figure that shows lpsa (x-axis) versus lcavol (y-axis). Add and label (using the 'legend' function) the linear model predictors associated with L2 loss, L1 loss, and tilted absolute value loss for tau = 0.25 and 0.75.
- Write functions to fit and predict from a simple nonlinear model with three parameters defined by 'beta[1] + beta[2]*exp(-beta[3]*x)'. Hint: make copies of 'fit_lin' and 'predict_lin' and modify them to fit the nonlinear model. Use c(-1.0, 0.0, -0.3) as 'beta_init'.
- Create a figure that shows lpsa (x-axis) versus lcavol (y-axis). Add and label (using the 'legend' function) the nonlinear model predictors associated with L2 loss, L1 loss, and tilted absolute value loss for tau = 0.25 and 0.75.
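A minimal sketch of the two loss functions and a tilted-loss fit, assuming vectors x and y from the prostate data as in prostate-data-lin.R; the function names here are illustrative, not from the course script.
## L1 (absolute) loss
L1_loss <- function(y, yhat) abs(y - yhat)
## tilted absolute (quantile) loss with tilt parameter tau in (0, 1)
tilted_abs_loss <- function(y, yhat, tau) {
  d <- y - yhat
  ifelse(d > 0, tau * d, (tau - 1) * d)
}
## fit a linear predictor by minimizing mean tilted loss, e.g., tau = 0.75
fit_tilted <- function(x, y, tau, beta_init = c(0, 0)) {
  err <- function(beta) mean(tilted_abs_loss(y, beta[1] + beta[2] * x, tau))
  optim(beta_init, err)$par
}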
Thu. 2/18
Using the RMarkdown/knitr/github mechanism, implement the following tasks (a sketch of the ridge regression steps follows this list):
- Use the prostate cancer data.
- Use the cor function to reproduce the correlations listed in HTF Table 3.1, page 50.
- Treat lcavol as the outcome, and use all other variables in the data set as predictors.
- With the training subset of the prostate data, train a least-squares regression model with all predictors using the lm function.
- Use the testing subset to compute the test error (average squared-error loss) using the fitted least-squares regression model.
- Train a ridge regression model using the glmnet function, and tune the value of lambda (i.e., use guess and check to find the value of lambda that approximately minimizes the test error).
- Create a figure that shows the training and test error associated with ridge regression as a function of lambda.
- Create a path diagram of the ridge regression analysis, similar to HTF Figure 3.8.
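A minimal sketch of the glmnet steps, assuming the prostate data are loaded as a data frame prostate with a logical train indicator, as in the version distributed with HTF; variable names are illustrative.
library(glmnet)
## split into training and testing subsets, dropping the indicator column
data_train <- subset(prostate, train, select = -train)
data_test <- subset(prostate, !train, select = -train)
x_train <- as.matrix(subset(data_train, select = -lcavol))
x_test <- as.matrix(subset(data_test, select = -lcavol))
## alpha = 0 selects the ridge penalty; vary lambda by guess and check
fit <- glmnet(x_train, data_train$lcavol, alpha = 0, lambda = 0.05)
## test error: average squared-error loss on the held-out subset
pred <- predict(fit, newx = x_test)
mean((data_test$lcavol - pred)^2)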
Tue. 3/2
Goal: Understand and implement reduced-rank LDA in R. This homework introduces new material that we will not cover in class.
Using the RMarkdown/knitr/github mechanism, implement the following tasks (a sketch of the decomposition steps follows this list):
- Retrieve the vowel data (training and testing) from the HTF website or R package.
- Review HTF section 4.3.3 and (optionally) LA Examples and example.R.
- Implement reduced-rank LDA using the vowel training data. Check your work by plotting the first two discriminant variables as in HTF Figure 4.4. Hint: Center the 10 training predictors before implementing LDA; see the built-in R function 'scale'. The singular value or eigen decompositions may be computed using the built-in R functions 'svd' or 'eigen', respectively.
- Use the vowel testing data to estimate the expected prediction error (assuming zero-one loss), varying the number of canonical variables used for classification.
- Plot the EPE as a function of the number of discriminant variables, and compare this with HTF Figure 4.10.
- (Optional) Reproduce HTF Figure 4.11. Note: The reproduction need not be exact. However, the information content should be preserved.
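A minimal sketch of the centering and decomposition steps, assuming the vowel training data are loaded as a data frame vowel_train with outcome column y followed by predictors x.1 through x.10; this outline uses unweighted class means, which is reasonable here because the vowel classes are balanced.
## center the predictors; compute class means and the pooled
## within-class covariance
x <- scale(as.matrix(vowel_train[, -1]), center = TRUE, scale = FALSE)
y <- vowel_train$y
mu <- apply(x, 2, function(col) tapply(col, y, mean))
W <- Reduce(`+`, lapply(split(as.data.frame(x), y), function(g) {
  gc <- scale(as.matrix(g), center = TRUE, scale = FALSE)
  t(gc) %*% gc
})) / (nrow(x) - length(unique(y)))
## sphere the class means using the eigendecomposition of W, then take
## the SVD of the centered, sphered means for the discriminant directions
eW <- eigen(W)
W_inv_sqrt <- eW$vectors %*% diag(1 / sqrt(eW$values)) %*% t(eW$vectors)
mu_sphered <- scale(mu %*% W_inv_sqrt, center = TRUE, scale = FALSE)
sv <- svd(mu_sphered)
discrim <- W_inv_sqrt %*% sv$v  ## columns are discriminant directions
## first two discriminant (canonical) variables, as in HTF Figure 4.4
z <- x %*% discrim[, 1:2]
plot(z, col = y, xlab = "Coordinate 1", ylab = "Coordinate 2")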
Thu. 3/11
- Complete HTF exercises 7.4 and 7.6
- This homework should be submitted using the Github mechanism. However, you may complete the homework on paper and upload a scanned image, or use LaTeX-style markup in an RMarkdown document.
Tue. 3/30
- Complete HTF Ex. 15.4.
Thu. 4/15
Goal: Get started using Keras to construct simple neural networks.
- Work through the "Image Classification" tutorial on the RStudio Keras website.
- Use the Keras library to re-implement the simple neural network discussed during lecture for the mixture data (see nnet.R). Use a single, fully connected hidden layer with 10 nodes (a sketch follows this list).
- Create a figure to illustrate that the predictions are (or are not) similar using the 'nnet' function versus the Keras model.
- (optional extra credit) Convert the neural network described in the "Image Classification" tutorial to a network that is similar to one of the convolutional networks described during lecture on 4/15 (i.e., Net-3, Net-4, or Net-5) and also described in HTF section 11.7. See the ConvNet tutorial on the RStudio Keras website.
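A minimal sketch of the Keras re-implementation, assuming the R keras package is installed and the mixture data are available as a two-column matrix x with 0/1 labels y; the optimizer and training settings are illustrative, and the sigmoid hidden activation mirrors the logistic units used by 'nnet'.
library(keras)
## single fully connected hidden layer with 10 nodes
model <- keras_model_sequential() %>%
  layer_dense(units = 10, activation = "sigmoid", input_shape = 2) %>%
  layer_dense(units = 1, activation = "sigmoid")
model %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = "accuracy"
)
model %>% fit(x, y, epochs = 50, batch_size = 32)
## predicted probabilities, for comparison with the 'nnet' fit
p_keras <- predict(model, x)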
Links
RStudio/Knitr