DSI 5640: Modeling & Machine Learning I
Instructor
Teaching Assistants
Dates, Time, and Location
- First meeting: Tue. Jan. 26, 2021; Last meeting: Thu. Apr. 29, 2021
- Tuesday, Thursday 11:10AM-12:25PM
- Sony Building 2001-A, and virtually using Zoom via Brightspace
- Office hours: By appointment, initially. Will determine a regular schedule.
- We will use the Graduate School Academic Calendar.
Textbook
The main textbook for this course is listed below, and is free to download in PDF format from the book webpage:
Hastie, Tibshirani, Friedman. (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2nd edition. In the course outline and class schedule, the textbook is abbreviated "HTF", often followed by chapter or page references ("Ch. X-Y" or "pp. X-Y", respectively).
Other Resources
Course Topics
- Overview of Supervised Learning and Review of Linear Methods: HTF Ch. 2-4
- Splines and Kernel Methods: HTF Ch. 5-6
- Model Assessment, Selection, and Inference: HTF Ch. 7-8
- Neural Networks: HTF Ch. 11
- Support Vector Machines: HTF Ch. 12
- Unsupervised Learning: HTF Ch. 14
Other information
- Unless otherwise stated, assigned homework is due in one week. Late homework will be subject to a penalty of 20% for each day late.
- Students are encouraged to work together on homework problems, but must turn in their own write-ups.
- Class participation is encouraged.
- Please bring a laptop to class, when classes are held in-person.
Grading
- Homework: 40%
- Midterm Exam: 30%
- Final Exam: 30%
Letter Grade | Lowest Score
--- | ---
A+ | 96.5
A | 93.5
A- | 90.0
B+ | 86.5
B | 83.5
B- | 80.0
C | 70.0
F | 0.0
Schedule of Topics
Date | Reading (before class) | Homework | Topic/Content | Presentation
--- | --- | --- | --- | ---
Tue. 1/26 | none | none | Syllabus, introduction | Intro.pdf
Thu. 1/28 | HTF Ch. 1 and Ch. 2.1, 2.2, and 2.3 | none | Least-squares, nearest-neighbors | lecture-1.pdf mixture-data-lin-knn.R
Tue. 2/2 | none | See below: Tue. 2/2 | Least-squares, nearest-neighbors code | mixture-data-lin-knn.R
Thu. 2/4 | HTF Ch. 2.4 | none | Decision theory | lecture-2.pdf
Tue. 2/9 | none | See below: Tue. 2/9 | Loss functions in practice | lecture-2a.pdf prostate-data-lin.R
Thu. 2/11 | HTF Ch. 2.7, 2.8, and 2.9 | none | Structured regression | lecture-3.pdf ex-1.R ex-2.R ex-3.R
Tue. 2/16 | HTF Ch. 3.1, 3.2, 3.3, 3.4 | none | Linear methods, subset selection, ridge, and lasso | lecture-4a.pdf linear-regression-examples.R lecture-5.pdf lasso-example.R
Thu. 2/18 | none | See below: Thu. 2/18 | No class: reading day focused on linear methods for regression. | Suggested supplemental reading: Introduction to Statistical Learning Ch. 3 and laboratory (section 3.6).
Tue. 2/23 | none | none | Linear methods, subset selection, ridge, and lasso (cont.) | lecture-5.pdf lasso-example.R
Thu. 2/25 | HTF Ch. 3.5 and 3.6 | none | Linear methods: principal components regression | lecture-6.pdf pca-regression-example.R
Tue. 3/2 | HTF Ch. 4.1, 4.2, and 4.3 | See below: Tue. 3/2 | Linear methods: linear discriminant analysis | lecture-8.pdf simple-LDA-3D.R
Thu. 3/4 | HTF Ch. 5.1 and 5.2 | none | Basis expansions: piecewise polynomials & splines | lecture-11.pdf splines-example.R mixture-data-complete.R
Tue. 3/9 | HTF Ch. 6.1-6.5 | none | Kernel methods | lecture-13.pdf mixture-data-knn-local-kde.R kernel-methods-examples-mcycle.R
Thu. 3/11 | HTF Ch. 7.1, 7.2, 7.3, 7.4 | See below: Thu. 3/11 | Model assessment: Cp, AIC, BIC | lecture-14.pdf effective-df-aic-bic-mcycle.R
Tue. 3/16 | HTF Ch. 7.10 | none | Cross validation | lecture-15.pdf kNN-CV.R Income2.csv
Thu. 3/18 | none | none | Midterm review | none
Tue. 3/23 | HTF Ch. 9.2 | none | Classification and regression trees | lecture-21.pdf mixture-data-rpart.R
Thu. 3/25 | HTF Ch. 8.7, 8.8, 8.9 | none | Bagging | lecture-18.pdf mixture-data-rpart-bagging.R nonlinear-bagging.html
Tue. 3/30 | HTF Ch. 15.1, 15.2 | See below: Tue. 3/30 | Random forest | lecture-25.pdf random-forest-example.R
Thu. 4/1 | HTF Ch. 10.1 | none | Boosting and AdaBoost.M1 (part 1) | lecture-22.pdf boosting-trees.R
Tue. 4/6 | HTF Ch. 10.2-10.9 | Work through this nice GBM tutorial | Boosting and AdaBoost.M1 (part 2) | lecture-23.pdf
Thu. 4/8 | HTF Ch. 10.10, 10.13 | none | Boosting and AdaBoost.M1 (part 3) | lecture-24.pdf gradient-boosting-example.R
Tue. 4/12 | HTF Ch. 10.10, 10.13 | none | Boosting and AdaBoost.M1 (part 3; cont.) | lecture-24.pdf gradient-boosting-example.R
Thu. 4/14 | HTF Ch. 11.1, 11.2, 11.3, 11.4, 11.5 | none | Introduction to neural networks | lecture-31.pdf nnet.R
Tue. 4/20 | HTF Ch. 11.1, 11.2, 11.3, 11.4, 11.5 | See below: Thu. 4/14 | Introduction to neural networks (cont.) | lecture-31.pdf nnet.R
Thu. 4/22 | HTF Ch. 11.1, 11.2, 11.3, 11.4, 11.5 | none | Introduction to neural networks (cont.) | lecture-31.pdf nnet.R
Homework/Laboratory (other than problems listed in HTF)
Homework assignments should be completed in a GitHub repository using the R language (unless otherwise noted). Make sure to add the instructor and TAs as collaborators on your repo. Any reproducible format that renders natively in GitHub is acceptable. In R Markdown, using the 'github_document' or 'md_document' output type in the header will produce a markdown (.md) file that can be rendered within GitHub, e.g.:
```yaml
---
title: "Homework 1"
author: DS Student
date: January 15, 2020
output: github_document
---
```
Make sure to include the raw code (.Rmd), the rendered file (.md), and any plots in your repo. Jupyter notebooks (.ipynb) using the R language are also acceptable.
Resubmissions are only allowed if the initial submission was made on time. A resubmission is due within one week of receiving feedback from the TA, and at most two resubmissions are allowed per assignment.
Tue. 2/2
Using the RMarkdown/knitr/github mechanism, implement the following tasks by extending the example R script mixture-data-lin-knn.R:
- Paste the code from the mixture-data-lin-knn.R file into the homework template knitr document.
- Read the help file for R's built-in linear regression function, lm.
- Re-write the functions fit_lc and predict_lc using lm, and the associated predict method for lm objects.
- Consider making the linear classifier more flexible by adding squared terms for x1 and x2 to the linear model.
- Describe how this more flexible model affects the bias-variance tradeoff.
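A minimal sketch of how the lm-based rewrite with squared terms might look. The function signatures here are assumptions (match them to those used in mixture-data-lin-knn.R, where x is a two-column matrix and y is a 0/1 class indicator):

```r
## Hypothetical sketch; signatures assumed to mirror mixture-data-lin-knn.R
fit_lc <- function(y, x) {
  dat <- data.frame(y = y, x1 = x[, 1], x2 = x[, 2])
  ## I(x1^2) and I(x2^2) add the squared terms that make the
  ## linear classifier more flexible
  lm(y ~ x1 + x2 + I(x1^2) + I(x2^2), data = dat)
}

predict_lc <- function(fit, x) {
  newdat <- data.frame(x1 = x[, 1], x2 = x[, 2])
  predict(fit, newdata = newdat)
}
```

The fitted values can then be thresholded at 0.5 to recover class labels, as in the original script.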
Tue. 2/9
Using the RMarkdown/knitr/github mechanism, implement the following tasks by extending the example R script (prostate-data-lin.R):
- Write functions that implement the L1 loss and tilted absolute loss functions.
- Create a figure that shows lpsa (x-axis) versus lcavol (y-axis). Add and label (using the 'legend' function) the linear model predictors associated with L2 loss, L1 loss, and tilted absolute value loss for tau = 0.25 and 0.75.
- Write functions to fit and predict from a simple nonlinear model with three parameters defined by 'beta[1] + beta[2]*exp(-beta[3]*x)'. Hint: make copies of 'fit_lin' and 'predict_lin' and modify them to fit the nonlinear model. Use c(-1.0, 0.0, -0.3) as 'beta_init'.
- Create a figure that shows lpsa (x-axis) versus lcavol (y-axis). Add and label (using the 'legend' function) the nonlinear model predictors associated with L2 loss, L1 loss, and tilted absolute value loss for tau = 0.25 and 0.75.
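The loss functions themselves can be written in a few lines; this is a sketch (the function names are illustrative), following the usual tilted (quantile) loss definition:

```r
## L1 (absolute-error) loss
L1_loss <- function(y, yhat) abs(y - yhat)

## tilted absolute loss: positive residuals are weighted by tau and
## negative residuals by (tau - 1), so tau = 0.5 recovers 0.5 * L1 loss
tilted_abs_loss <- function(y, yhat, tau) {
  e <- y - yhat
  ifelse(e > 0, tau * e, (tau - 1) * e)
}
```

Minimizing the average tilted loss with tau = 0.25 or 0.75 shifts the fitted line toward the corresponding conditional quantile, which is what the figure should illustrate.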
Thu. 2/18
Using the RMarkdown/knitr/github mechanism, implement the following tasks:
- Use the prostate cancer data.
- Use the cor function to reproduce the correlations listed in HTF Table 3.1, page 50.
- Treat lcavol as the outcome, and use all other variables in the data set as predictors.
- With the training subset of the prostate data, train a least-squares regression model with all predictors using the lm function.
- Use the testing subset to compute the test error (average squared-error loss) using the fitted least-squares regression model.
- Train a ridge regression model using the glmnet function, and tune the value of lambda (i.e., use guess and check to find the value of lambda that approximately minimizes the test error).
- Create a figure that shows the training and test error associated with ridge regression as a function of lambda.
- Create a path diagram of the ridge regression analysis, similar to HTF Figure 3.8.
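One way the ridge tuning step might be shaped, assuming the prostate data have already been split into training and testing design matrices and responses (x.train, y.train, x.test, y.test are illustrative names, not from the course files):

```r
library(glmnet)  # assumed installed

## alpha = 0 gives ridge regression; here lambda is supplied directly
## rather than letting glmnet choose its own sequence
fit_ridge <- function(x, y, lam)
  glmnet(x, y, alpha = 0, lambda = lam)

## average squared-error loss on a held-out subset
test_error <- function(fit, x, y)
  mean((y - predict(fit, newx = x))^2)

## guess-and-check over a grid of lambda values
lams <- 10^seq(-3, 1, length.out = 20)
errs <- sapply(lams, function(l)
  test_error(fit_ridge(x.train, y.train, l), x.test, y.test))
best_lam <- lams[which.min(errs)]
```

Plotting errs (and the analogous training errors) against lams gives the requested figure, and coef() evaluated along the lambda grid gives the coefficient paths for the HTF Figure 3.8 analogue.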
Tue. 3/2
Using the RMarkdown/knitr/github mechanism, complete the following exercises from chapter 4, section 4.7 (beginning p. 168) of An Introduction to Statistical Learning (https://www.statlearning.com/):
- Exercise 4: "When the number of features p is large, there tends to be a deterioration in the performance of KNN and other local approaches that perform prediction using only observations..." Please type your solutions within your R Markdown document. No R coding is required for this exercise.
- Exercise 10: "This question should be answered using the Weekly data set, which is part of the ISLR package. This data is similar..." This exercise requires R coding.
Thu. 3/11
Goal: Understand and implement various ways to approximate test error.
In the ISLR book, read section 6.1.3 "Choosing the Optimal Model" and section 5.1 "Cross-Validation". Extend and convert the attached effective-df-aic-bic-mcycle.R R script into an R markdown file that accomplishes the following tasks.
- Randomly split the mcycle data into training (75%) and validation (25%) subsets.
- Using the mcycle data, consider predicting the mean acceleration as a function of time. Use the Nadaraya-Watson method with the k-NN kernel function to create a series of prediction models by varying the tuning parameter over a sequence of values. (hint: the script already implements this)
- With the squared-error loss function, compute and plot the training error, AIC, BIC, and validation error (using the validation data) as functions of the tuning parameter.
- For each value of the tuning parameter, perform 5-fold cross-validation using the combined training and validation data. This results in 5 estimates of test error per tuning-parameter value.
- Plot the CV-estimated test error (average of the five estimates from each fold) as a function of the tuning parameter. Add vertical line segments to the figure (using the segments function in R) that represent one "standard error" of the CV-estimated test error (standard deviation of the five estimates from each fold).
- Interpret the resulting figures and select a suitable value for the tuning parameter.
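The 5-fold CV bookkeeping can be sketched in base R. Here fit_and_err is a placeholder name for a function that trains the Nadaraya-Watson k-NN model on one subset and returns the squared-error loss on another:

```r
## Sketch of 5-fold CV over a grid of tuning-parameter values (base R only).
## 'fit_and_err(k, train, test)' is a placeholder: it should fit the model
## with tuning parameter k on 'train' and return the error on 'test'.
cv5 <- function(dat, k_values, fit_and_err) {
  ## randomly assign each row to one of 5 folds
  folds <- sample(rep(1:5, length.out = nrow(dat)))
  ## one column per k; one row per held-out fold
  sapply(k_values, function(k)
    sapply(1:5, function(f)
      fit_and_err(k, train = dat[folds != f, , drop = FALSE],
                     test = dat[folds == f, , drop = FALSE])))
}
```

Each column of the result holds the 5 error estimates for one tuning-parameter value; plot the column means against k_values, and use segments() with the column standard deviations for the one-standard-error bars.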
Tue. 3/30
Goal: Understand and implement a random forest classifier.
Using the "vowel.train" data and the "randomForest" function in the R package "randomForest", develop a random forest classifier for the vowel data by doing the following:
- Convert the response variable in the “vowel.train” data frame to a factor variable prior to training, so that “randomForest” does classification rather than regression.
- Review the documentation for the “randomForest” function.
- Fit the random forest model to the vowel data using all 11 features and the default values of the tuning parameters.
- Use 5-fold CV and tune the model by performing a grid search for the following tuning parameters: 1) the number of variables randomly sampled as candidates at each split; consider values 3, 4, and 5, and 2) the minimum size of terminal nodes; consider a sequence (1, 5, 10, 20, 40, and 80).
- With the tuned model, make predictions using the majority vote method, and compute the misclassification rate using the ‘vowel.test’ data.
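A sketch of how the grid search might be organized. This assumes the vowel.train data frame is already loaded (e.g., from the ElemStatLearn package, an assumption) and that the randomForest package is installed:

```r
library(randomForest)   # assumed installed
library(ElemStatLearn)  # assumed source of 'vowel.train'

## convert the response to a factor so randomForest does classification
vowel.train$y <- factor(vowel.train$y)

## grid over mtry (variables sampled per split) and nodesize
grid <- expand.grid(mtry = c(3, 4, 5),
                    nodesize = c(1, 5, 10, 20, 40, 80))
folds <- sample(rep(1:5, length.out = nrow(vowel.train)))

cv_err <- apply(grid, 1, function(g) {
  mean(sapply(1:5, function(f) {
    fit <- randomForest(y ~ ., data = vowel.train[folds != f, ],
                        mtry = g["mtry"], nodesize = g["nodesize"])
    pred <- predict(fit, vowel.train[folds == f, ])  # majority vote
    mean(pred != vowel.train$y[folds == f])          # misclassification rate
  }))
})
grid[which.min(cv_err), ]  # tuning values with lowest CV error
```

Refit with the selected tuning values on all of vowel.train, then compute the misclassification rate on vowel.test (after converting its response to a factor as well).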
Thu. 4/14
Goal: Get started using Keras to construct simple neural networks
Due: Tuesday, April 27.
- Work through the "Image Classification" tutorial on the RStudio Keras website.
- Use the Keras library to re-implement the simple neural network discussed during lecture for the mixture data (see nnet.R). Use a single 10-node hidden layer; fully connected.
- Create a figure to illustrate that the predictions are (or are not) similar using the 'nnet' function versus the Keras model.
- (optional extra credit) Convert the neural network described in the "Image Classification" tutorial to a network that is similar to one of the convolutional networks described during lecture on 4/15 (i.e., Net-3, Net-4, or Net-5) and also described in the ESL book section 11.7. See the ConvNet tutorial on the RStudio Keras website.
Links
RStudio/Knitr