GRADES

Your grade in this class is calculated from your scores on the following items:
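| Assignment    | Points              |
|---------------|---------------------|
| Final Project | 20 pts              |
| Paper Review  | 20 pts              |
| Daily HW      | 4 pts x 15 = 60 pts |
|---------------|---------------------|
| Total         | 100 pts             |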

The instructors may provide opportunities to earn additional points. Your final grade will be calculated from the final score as follows:
| Score    | Grade |
|----------|-------|
| 90 - 100 | A     |
| 80 - 89  | B     |
| 70 - 79  | C     |
| 60 - 69  | D     |
| 0 - 59   | F     |
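
As a rough illustration of the arithmetic, the sketch below adds up three component scores and looks up the letter grade. The scores are hypothetical, and the code assumes whole-number totals, since the table does not say how a score such as 89.5 is handled. Any additional points offered by the instructors would simply be added to the total.

```stata
* Hypothetical scores for illustration only
* Final Project (out of 20), Paper Review (out of 20), Daily HW (out of 60)
local project = 17
local review  = 16
local hw      = 52
local total   = `project' + `review' + `hw'

* Look up the letter grade using the cutoffs in the table
* (assumes a whole-number total)
local grade = cond(`total' >= 90, "A", cond(`total' >= 80, "B", cond(`total' >= 70, "C", cond(`total' >= 60, "D", "F"))))
display "Total = `total', grade = `grade'"
```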




ASSIGNMENTS

Final Project
Overview:
You will plan, execute, and present a data analysis on a question of your own choosing. We encourage you to choose a question from your own research.
Due dates:
Analysis plan will be submitted as a HW - see schedule
Presentations will occur in class - see schedule
Presentation slides (if any) will be shared with the entire class (via slack)
Final reports are due on the last day of class at 11:59 PM.
Name the report as follows: lastname-firstname-finalproject. For example, stewart-thomas-finalproject.pdf
Use a header so that each page prints your name in the top margin.
Instructions:
  • Choose a research question so that development of a predictive model makes sense.
  • Select a dataset that includes predictors of various types (both continuous and categorical).
  • Incorporate methods discussed in the course. (For example, transformations, splines, missing data methods, bootstrap, etc.)
  • Address model selection and fit.
  • Use graphical displays in both the report and the presentation.
  • Plan for a 15-minute presentation and 5 minutes of questions.
  • Prepare a statistical analysis plan for your final project. The following is a suggested outline for the analysis plan.
    1. Introduction. This is a statistical analysis plan, so a (very) short introduction of the scientific / biological content is sufficient. Give the reader enough content to orient themselves.
    2. Context. List earlier studies that form the foundation of your current research project. As a statistical analysis plan, focus on the statistical aspects of the earlier studies, i.e., study design, patient population, analysis method, type of hypothesis test. Indicate how the project adds to or replicates earlier work.
    3. Research questions.
    4. Analysis plan. For each research question, provide information on the following:
      • Study design
      • Study population, inclusion/exclusion criteria
      • Variables
      • Statistical methods
      • Missing Data
      • Sensitivity analyses
      • Output. Provide mock-ups of tables and figures that summarize the results of the analyses. Mock-ups should be as complete as possible. Think how you might generate tables and figures that compare results from earlier research with the results from the present project.



Paper Review
Overview:
You will present to the class a critique of a published paper from a medical journal.
Due dates:
Discussion in class - see schedule

Presentation slides (if any) will be shared with the entire class (via slack)
Use a header so that each page prints your name in the top margin.
Instructions:
  • Select a paper from the list of papers on slack (or add a paper of your choice)
  • Review the paper, focusing on the statistical analysis.
    • Identify strengths and weaknesses.
    • If there are issues with the analysis, explain what the issue is, how it may affect the analysis, and how you would analyze the data to avoid the issue.
  • Be prepared to discuss your critique and lead a discussion in class.
  • Read two other papers from the list of papers so that you may participate in the discussions during class



Daily Homework
Due:
The next class period after the question is assigned.
Include code if the problem requires STATA computation
Problems:
  1. Read the description of the data in ABD chapter 17 problem 1. The dataset is available from ABD Datasets as 17-q-01.
    1. Plot the data in a scatter plot. Plot the estimated linear regression line with the scatter plot.
    2. Report the parameter estimates and confidence intervals of the regression line as one might find them in a medical journal.
    3. Write the parameter estimates of the regression line in the form of an equation.
    4. Interpret each parameter in the model.
    5. What do the confidence intervals communicate?
    6. Create a scatter plot with the regression estimate and confidence band.
  2. Do ABD chapter 17 problem 3. Skip part (b). Instead of calculating values by hand, use STATA.
    *For part (c), the notation in ABD differs from the notation we use in class. Here is the question using the notation from class:
    (c) What is the t-statistic for the hypothesis that beta_1 (the coefficient of the predictor) is zero?
    *For part (d):
    display invttail( degrees-of-freedom, alpha/2 )
    *For part (f):
    display 2*ttail( degrees-of-freedom, value-of-t-statistic )
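
    To show how the placeholders are filled in, here is a sketch with made-up values: 18 degrees of freedom (for example, 20 observations and one predictor), alpha = 0.05, and a t-statistic of 2.5. These numbers are not taken from the ABD problem.

    ```stata
    * Example values only: df = 18, alpha = 0.05, t = 2.5
    * Two-sided critical value of the t distribution
    display invttail(18, 0.05/2)
    * Two-sided p-value for the observed t-statistic
    display 2*ttail(18, 2.5)
    ```

    The first command should print a critical value of about 2.10. The second should print a p-value between 0.02 and 0.05.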
    
  3. Do ABD chapter 17 problem 8
  4. Do ABD chapter 17 problem 16
  5. Do ABD chapter 17 problem 20. Data available in ABD Datasets.
  6. Do ABD chapter 17 problem 21. Data available in ABD Datasets.
  7. Do ABD chapter 17 problem 28. Data available in ABD Datasets.
  8. Do ABD chapter 18 problem 10
  9. Do ABD chapter 18 problem 11. Data available in ABD Datasets.
  10. Do ABD chapter 18 problem 12 with the following modifications:
    (a) Generate an ANOVA and partial effect table as described in the BBR notes.  The table will look like this:
    
    | Effect         | DF | F | P |
    |----------------|----|---|---|
    | Species        | xx | x | x |
    |   interactions | xx | x | x |
    |----------------|----|---|---|
    | Mass           | xx | x | x |
    |   interactions | xx | x | x |
    |----------------|----|---|---|
    | Regression     | xx | x | x |
    |----------------|----|---|---|
    | Total          | xx |   |   |
    
    
    (b) How does the model with SPECIES*MASS term differ from the model without the term?
    (c) As in the book
    (d) As in the book
    (e) As in the book
    (f) Generate a scatter plot of the data.  Underlay the scatter plot with a plot of the model and confidence bands.
    
    Data available in ABD Datasets: 18-q-12
  11. Do ABD chapter 18 problem 15. Data available in ABD Datasets. Reproduce the figure in the question.
  12. Do ABD chapter 18 problem 18. Data available in ABD Datasets.
  13. Do RMS chapter 2 problem 1.
  14. Do RMS chapter 2 problem 2.
  15. Do RMS chapter 2 problem 3.
  16. Do RMS chapter 2 problem 4.
  17. Do RMS chapter 2 problem 5. Data available with command
    getvdata sat
    
  18. With the original data from ABD chapter 19 problem 11 (First 18 obs of 19-q-11):
    1. Fit a simple linear model with spermstored as outcome and shellvolume as predictor.
      E[spermstored] = b0 + b1 shellvolume
    2. Plot the data and the regression model.
    3. Calculate the slope coefficient (b1).
    4. Use bootstrapping to generate a confidence interval for the slope coefficient (b1).
    5. Explain the advantages of the bootstrap confidence interval in this situation.
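
    One possible way to bootstrap a slope in STATA is sketched below. It uses the auto dataset that ships with STATA as a stand-in, with price and weight playing the roles of outcome and predictor. The number of replications and the seed are arbitrary choices.

    ```stata
    * Stand-in data: the auto dataset shipped with STATA
    sysuse auto, clear

    * Ordinary least-squares fit, for comparison
    regress price weight

    * Resample observations with replacement and refit the model each time
    bootstrap _b[weight], reps(1000) seed(12345): regress price weight

    * Report the percentile-based confidence interval
    estat bootstrap, percentile
    ```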
  19. With the data from ABD chapter 19 problem 13 (19-q-13):
    1. Provide a bootstrap estimate of the 95% confidence interval of the median.
    2. Provide a bootstrap estimate of the 95% confidence interval of the 75th percentile.
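
    The same prefix command can bootstrap percentiles. The sketch below again uses the auto dataset as a stand-in and bootstraps the median and 75th percentile of mpg. The names med and p75 are labels chosen for this example.

    ```stata
    * Stand-in data: the auto dataset shipped with STATA
    sysuse auto, clear

    * summarize, detail stores the median in r(p50) and the 75th percentile in r(p75)
    bootstrap med = r(p50) p75 = r(p75), reps(1000) seed(12345): summarize mpg, detail

    * Percentile-based 95% confidence intervals
    estat bootstrap, percentile
    ```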
  20. Using the linear model in ABD chapter 17 problem 1,
    1. Generate the following diagnostic plots:
      1. Q-Q plot of the residuals
      2. Residuals by predictor
    2. What aspect of model fit does the Q-Q plot communicate?
    3. What does the Q-Q plot look like if the model fits well?
    4. What does the Q-Q plot look like if the model fits poorly?
    5. What aspect of model fit does the residual by predictor plot communicate?
    6. What does the residual by predictor plot look like if the model fits well?
    7. What does the residual by predictor plot look like if the model fits poorly?
    8. Generate the predicted outcome by predictor plot with confidence bands appropriate for forecasting the outcome for a single, previously unseen observation.
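
    The plots named above could be produced along the following lines. This is a sketch on the auto dataset as a stand-in, not on the ABD data. The variable name resid and the graph names are choices made for this example.

    ```stata
    * Stand-in data: the auto dataset shipped with STATA
    sysuse auto, clear
    regress price weight

    * Q-Q plot of the residuals against the normal distribution
    predict resid, residuals
    qnorm resid, name(qq, replace)

    * Residuals plotted against the predictor
    rvpplot weight, yline(0) name(rvp, replace)

    * Fitted line with a forecast band for a single new observation
    * (stdf uses the standard error of the forecast)
    twoway (lfitci price weight, stdf) (scatter price weight), name(forecast, replace)
    ```

    Dropping the stdf option from lfitci gives the narrower band for the population mean instead.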
  21. In section 10.10.3 of BBR, there is a one-way analysis of covariance of Rosner's lead data. Using the same data (getvdata lead):
    1. Write the linear model relating maxfwt and group using reference group parameterization
    2. Interpret the model parameters in the context of the scientific question of the analysis
    3. Fit the model in STATA and report the parameter estimates
    4. Write the linear model relating maxfwt and group using group means parameterization
    5. Interpret the model parameters in the context of the scientific question of the analysis
    6. Fit the model in STATA and report the parameter estimates
    7. Write the linear model relating maxfwt with group and age using reference group parameterization
    8. Interpret the model parameters in the context of the scientific question of the analysis
    9. Fit the model in STATA and report the parameter estimates
    10. Replicate the predicted outcome plot shown in section 10.10.3 of BBR
    11. Replicate the ANOVA table with age and group in section 10.10.3 of BBR
    12. Interpret the hypothesis test represented by "group" in the context of the scientific question
  22. Using the 2.20.Framingham data, fit the following model
    (a)  Model E[sbp | sex bmi age scl] = 
       const + MALE  
     + rcs(bmi,5) + rcs(bmi,5) × MALE 
     + rcs(age,5) + rcs(age,5) × MALE
     + rcs(scl,5) + rcs(scl,5) × MALE 
    
    (b) Generate the following plots:
    *   Plot E[sbp|-] over bmi, for males and females
    *   Plot E[sbp|-] over age, for males and females
    
    (c) Perform a chunk test to determine if there is a
       common or sex specific "slope" of 
       bmi, age, and scl.
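
    A chunk test is a joint test of several coefficients at once. The sketch below shows one way to set this up with built-in STATA commands, using the auto dataset as a stand-in: a restricted cubic spline of weight interacted with foreign, followed by a joint test of the interaction terms. It uses one continuous predictor and 4 knots to keep the example small. The course notes may use different commands for splines.

    ```stata
    * Stand-in data: the auto dataset shipped with STATA
    sysuse auto, clear

    * Restricted cubic spline basis for weight: 4 knots give wt1, wt2, wt3
    mkspline wt = weight, cubic nknots(4)

    * Group-specific intercept and group-specific spline coefficients
    regress price i.foreign##c.(wt1 wt2 wt3)

    * Chunk test: are all spline-by-group interaction coefficients zero?
    testparm i.foreign#c.wt1 i.foreign#c.wt2 i.foreign#c.wt3
    ```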
    
  23. Using the nhgh data, do the following
    (a) Create a variable cluster plot of wt ht bmi leg arml armc waist tri sub age sex re
    
    (b) Generate the first principal component of the variables leg, arml, and ht. Report the internal coefficients of the principal component.
    
  24. Using the support data, do the following
    (a) Create the potential predictive power plot and the variable cluster plot of the following variables:
    
    sex num_co age meanbp hrt resp bili crea sod slos dzgroup hospdead when predicting ln(totcst)
    
    (b) Based on the output in part (a), decide if you need to combine some variables and decide how many knots to allocate to continuous variables.  Decide on a model that includes all of the variables and include a meaningful interaction of a continuous and categorical variable.  (Hint: interact slos and dzgroup).  Write the model in the following way (replacing ?? with the desired number of knots).
    
    E[ln totcst|-] = sex + num_co + rcs(age, ??) 
         + rcs(meanbp, ??) + rcs(hrt, ??) 
         + ...
         + ...
         + rcs(slos, ??) + rcs(slos, ??)*dzgroup 
         + dzgroup + hospdead
    
    (c) Generate residual diagnostic plots (QQ, residual by y_hat)
    
    (d) Generate the calibration plot (y by y_hat).  Add the line of identity or the scatter plot of y by y.
    
    (e) Generate the regression ANOVA table with tests of no association, tests of interaction, and tests of linearity
    
    (f) Generate the DFFITS influence plot
    
    (g) Generate the leverage plot.
    
  25. This question is a continuation of the previous question
    (h) Calculate the mean absolute prediction error of your model.
    
    (i) Calculate a measure of rank discrimination.
    
    (j) Calculate the R^2 optimism.
    
  26. Using the 2.20.Framingham data, do the following
    1. Calculate the effective sample size. Determine the available degrees of freedom under the 15:1 rule.
    2. Perform a variable cluster analysis, and identify two variables to combine into a single principal component.
    3. Fit a Cox regression model in which the continuous variables are included as restricted cubic splines with 3 knots. Allow the effect of age to differ between males and females.
    4. Examine the proportional hazards assumption graphically by creating 6 groups of patients based on the estimated log hazards
    5. Calculate and report a rank discrimination measure (a concordance probability)
    6. Describe how you might examine the model for overly influential observations. (You do not have to report anything from this step, other than to describe what you might do.)
    7. Generate the partial effect plot of age
    8. Generate the ANOVA table of predictor effects.
    getvdata 2.20.Framingham
    drop month id
    
  27. Using the titanic3 data, do the following
    1. Calculate the effective sample size. Determine the available degrees of freedom under the 15:1 rule.
    2. Fit the model reported on page 277 (section 12.3) of the RMS course notes.
    3. Calculate one rank discrimination measure and one discrimination measure. (See output table reported in RMS notes.)
    4. Generate the calibration plot.
    5. Generate the ANOVA table of predictor effects.
    6. Replicate the partial effect plots reported on page 279 of RMS course notes.
    getvdata titanic3
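
    As a sketch of the 15:1 arithmetic for a binary outcome, the effective sample size is taken here to be the smaller of the number of events and the number of non-events, and the degrees-of-freedom budget is that number divided by 15. For a time-to-event outcome, the number of events plays the same role. The example uses the auto dataset as a stand-in, treating foreign as the binary outcome.

    ```stata
    * Stand-in data: the auto dataset shipped with STATA
    sysuse auto, clear

    * Count observations in each outcome category
    quietly count if foreign == 1
    local n1 = r(N)
    quietly count if foreign == 0
    local n0 = r(N)

    * Effective sample size for a binary outcome: the smaller category
    local m = min(`n1', `n0')
    display "Effective sample size: `m'"

    * Degrees of freedom available under the 15:1 rule
    display "Available df under 15:1: " floor(`m'/15)
    ```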
    
  28. Using the MoleRat data (18-e-4), run the STATA code below which fits a regression with caste-specific intercepts and caste-specific slopes. After running the regression,
    1. Write the regression model.
      Use this notation: 
      E[lnenergy|lnmass, worker] = ...  
      V[lnenergy|lnmass, worker] = ...
      
    2. Generate a plot of the model with predicted lnenergy on the Y-axis and lnmass on the X-axis. Add a line and (population mean) confidence band for each caste.
    3. Interpret the beta coefficients and the sigma parameter.
    4. Generate the forecasted lnenergy and confidence interval for a lazy mole rat and a worker mole rat, each with the median lnmass.
    getvdata 18-e-4
    generate worker = 1*(caste == "worker")
    regress lnenergy worker lnmass i.worker#c.lnmass
    
