Final Exam


  • This exam is to be taken under the conditions of the Vanderbilt honor code. Do not ask anyone except the instructors for help with the questions. We will reply through the discussion board so that everyone has an equal opportunity to benefit from our responses.
  • Include the statement of the Vanderbilt honor code on your work: "I pledge on my honor that I have neither given nor received unauthorized aid on this assignment."
  • Exams are due by Monday, 4/28, at 9 AM, either by email (to Chris) or handed directly to me in my office (T-2319, MCN). I will be in my office from 7 to 9 AM, but I have a meeting at 9:05 on Monday and Frank will be out of town, so exams may not be turned in late.

Problem 1

Consider the following general research question: Does increasing fiber consumption have health benefits?
  1. Refine this general research question into a specific question you believe would be feasible, interesting, ethical, and relevant to study. Write the question in a single sentence that describes the predictor, outcomes, and population you want to study.
  2. Describe the optimal study design for answering your specific research question.
  3. Explain why your chosen design is better than other designs. Make specific reference to at least one other potential study design.
  4. Describe the type of preliminary information you would need to collect in order to perform a power/sample size calculation for your study. How would you get this information?

Note that there are many possible answers to each of these questions. More credit will be given for answers that demonstrate a better understanding of study design issues discussed in the course.

Problem 2

  1. Show that if $\operatorname{logit}(\pi) = X\beta$, then $\pi = \frac{1}{1 + \exp(-X\beta)}$.
  2. Why do we use the logit transformation to model binary outcomes?
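A quick numerical sanity check of the identity in part 1 follows. It is not a proof. Applying the inverse logit $1/(1+\exp(-\eta))$ to $\operatorname{logit}(\pi) = \log(\pi/(1-\pi))$ should recover $\pi$ for any $\pi$ strictly between 0 and 1. The function names `logit` and `inv_logit` are illustrative only. The sketch also shows that a linear predictor taking any real value is always mapped into (0, 1).

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(eta: float) -> float:
    """Inverse logit: maps any real linear predictor eta = X*beta into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Round trip: inv_logit(logit(p)) should recover p up to floating-point error.
for p in (0.01, 0.25, 0.5, 0.9, 0.999):
    assert math.isclose(inv_logit(logit(p)), p, rel_tol=1e-12)

# Any real-valued linear predictor lands strictly inside (0, 1).
for eta in (-10.0, -1.0, 0.0, 2.5, 10.0):
    assert 0.0 < inv_logit(eta) < 1.0

print(inv_logit(0.0))  # 0.5
```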

Problem 3

An investigator first computes the percent change in a measurement from baseline to steady state for each animal. She then computes the mean percent change over animals and uses a two-sample t-test to compare the mean percent changes in two groups of 20 animals each.
  1. Name at least three things the investigator did wrong.
  2. Describe the strategy you would use to develop a good response variable, and specify an analysis to test for a difference in response between the two groups of animals.
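The sketch below illustrates one well-known issue with percent change as a response. It is not a complete answer to part 1. Percent change is asymmetric: a doubling is +100% but a halving is only -50%, so reciprocal changes do not average out. The log of the ratio final/baseline does treat them symmetrically. The values and function names are hypothetical.

```python
import math

def percent_change(baseline: float, final: float) -> float:
    """Percent change from baseline: 100 * (final - baseline) / baseline."""
    return 100.0 * (final - baseline) / baseline

def log_ratio(baseline: float, final: float) -> float:
    """Natural log of final/baseline; reciprocal changes give opposite signs."""
    return math.log(final / baseline)

# Hypothetical animals: one doubles, one halves (made-up values).
up = (10.0, 20.0)
down = (20.0, 10.0)

pc = [percent_change(*up), percent_change(*down)]
lr = [log_ratio(*up), log_ratio(*down)]

# Mean percent change suggests a net increase of 25% ...
print(sum(pc) / len(pc))  # 25.0
# ... while the mean log ratio is (up to rounding) zero: the changes cancel.
print(sum(lr) / len(lr))
```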

Problem 4

Use the attached dataset to answer the following questions.

  1. Perform a descriptive analysis of the variables in the raw dataset provided. Turn in one table, one figure, and an accompanying paragraph that describes the dataset. Identify any extreme outlying values that you feel should be removed from subsequent analyses.
  2. Is there a statistically significant difference in the tumor rate between knockout and wild-type mice? Provide a one-sentence answer that includes a P value, the test used, and the tumor percentage for each group. Note: wild=1 for wild-type and wild=0 for knockout; tumor 0=none, 1=one or more.
  3. Is there a statistically significant difference between the baseline size of those randomized to Drug A and those randomized to Drug B? Provide 95% confidence intervals and a P value. Note SIZE1=baseline, Drug A=1, B=2.
    • Hint: You may find the dataset easier to analyze in your software package if you rearrange it. See the attached dataset of baseline size on drugs A and B, where SIZEA = baseline size, Drug A and SIZEB = baseline size, Drug B. Similar rearrangements may be helpful for other questions.
  4. Is there a significant change in the size from baseline to the size at 30 days after baseline in this study? Note: size1=baseline, size2=at 30 days. Provide 95% CI for the change in size, a P value, and the test used.
  5. Is the LOS statistically different between wild-type and knockout mice? Provide the median for each group, a P value, and the test used. Note: LOS = a continuous marker of disease severity.
  6. What is the odds ratio for developing a tumor in wild-type vs. knockout mice? Provide a P value and a 95% confidence interval for the odds ratio.
  7. Is there a statistically significant association between age and baseline size (size1)? Provide a P value and describe the test used.
  8. Based on these preliminary results, a larger study is being planned. A new drug has been developed that is thought to reduce tumors by 50% relative to Drug A in this study. How many mice would be required in each of 2 groups of a randomized trial to have 90% power to detect a statistically significant difference at the 0.05 level, assuming one group is expected to have the tumor percentage observed with Drug A in this study and the other is expected to have half that percentage? Provide sufficient information in your answer so that a biostatistician could reproduce it.
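The sketch below illustrates two of the calculations above under stated assumptions. It is not the intended solution. `odds_ratio_wald` gives an odds ratio with a Woolf (log-scale Wald) 95% CI and a Wald P value from a 2x2 table. `n_per_group_two_proportions` uses the common normal-approximation formula for two proportions, with no continuity correction, equal group sizes, and a two-sided test. Both function names are hypothetical. The 2x2 counts are made up. `P_DRUG_A` is a placeholder, not the observed Drug A tumor proportion, which must come from the dataset.

```python
import math
from statistics import NormalDist

def odds_ratio_wald(a: int, b: int, c: int, d: int, level: float = 0.95):
    """Odds ratio with a Woolf (log-scale Wald) CI and a two-sided Wald P value.

    a = wild-type with tumor,  b = wild-type without tumor
    c = knockout with tumor,   d = knockout without tumor
    Assumes all four cells are nonzero.
    """
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # 1.96 for level = 0.95
    lower = math.exp(math.log(or_hat) - z_crit * se_log)
    upper = math.exp(math.log(or_hat) + z_crit * se_log)
    z = math.log(or_hat) / se_log
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return or_hat, (lower, upper), p_value

def n_per_group_two_proportions(p1: float, p2: float,
                                alpha: float = 0.05, power: float = 0.90) -> int:
    """Per-group n to compare two proportions (normal approximation,
    no continuity correction, equal allocation, two-sided alpha)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

# Made-up counts for illustration only; NOT the exam data.
print(odds_ratio_wald(12, 8, 5, 15))

# Placeholder: substitute the tumor proportion actually observed with Drug A.
P_DRUG_A = 0.40
print(n_per_group_two_proportions(P_DRUG_A, P_DRUG_A / 2))
```

With small cell counts, a Fisher exact test is often preferred for the P value. Software that applies a continuity correction will typically report a somewhat larger sample size than this formula.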

Problem 5

An experimenter who believes in the power of biostatistics carefully designs an experiment, avoiding bias. Thirty opossums specially bred to be well-behaved were treated in accordance with the guidelines of the National Opossum Society and were randomized into three groups, A, B, and C (n=10 per group). The design called for the animals' systolic blood pressure (SBP) to be measured at baseline and at weeks 1, 2, and 4. All of the animals' SBPs were measured at baseline (pre-intervention), but despite the best intentions of the investigator, not all animals were measured at all three post-randomization time points. Some animals did not have SBP measured at 4 weeks, and some did not have it measured at 2 weeks. No animal had a missing SBP at more than one time point. The missing values are due purely to technical problems; that is, the automated blood pressure monitor malfunctioned on certain days.

The experimenter is interested in estimating and testing two quantities: (1) differences in the rate of change of SBP over time across groups A, B, C and (2) differences in week 4 SBP across A, B, C. The experimenter and her statistician colleague could not decide whether baseline SBP should be treated as a baseline covariate or as part of the response variable (the time-zero measurement). The statistician pointed out that the baseline covariate approach is usually better, but that it would require a specialized regression model whose assumptions are difficult to check in small samples. For that reason, they seek a simple summary-measure approach that results in a single statistical test for differences among the 3 groups.
  1. Design a graphic that would be useful for checking the assumptions of either of the analytic approaches (rate of change or week 4 response). Specify what to look for in the plot.
  2. Design a valid and fairly powerful analysis for the rate-of-change question and two such analyses for the week 4 question. State an assumption that is required for a valid analysis. Include in your answer how you would handle missing data.
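The following is a minimal sketch of one summary-measure building block for the rate-of-change question. It is not the only valid design. It computes each animal's ordinary least-squares slope of SBP on time from whichever of its four scheduled measurements are available. The per-animal slopes could then be compared across A, B, and C with a single test, for example one-way ANOVA or Kruskal-Wallis. `sbp_slope`, `WEEKS`, and the SBP values are hypothetical. Using all available points relies on the missingness being unrelated to SBP, which the technical cause described above makes plausible. Slopes estimated from three points are less precise than slopes from four.

```python
from typing import Optional, Sequence

WEEKS = (0.0, 1.0, 2.0, 4.0)  # scheduled times: baseline, weeks 1, 2, 4

def sbp_slope(sbp: Sequence[Optional[float]],
              times: Sequence[float] = WEEKS) -> float:
    """OLS slope of SBP on time (mmHg per week) for one animal,
    using only non-missing measurements (None marks a missing value)."""
    pts = [(t, y) for t, y in zip(times, sbp) if y is not None]
    if len(pts) < 2:
        raise ValueError("need at least two measurements to estimate a slope")
    n = len(pts)
    t_mean = sum(t for t, _ in pts) / n
    y_mean = sum(y for _, y in pts) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in pts)
    sxx = sum((t - t_mean) ** 2 for t, _ in pts)
    return sxy / sxx

# Hypothetical animals (made-up SBP values, mmHg).
complete = [120.0, 122.0, 124.0, 128.0]     # rises exactly 2 mmHg/week
missing_wk4 = [118.0, 121.0, 124.0, None]   # rises exactly 3 mmHg/week on weeks 0-2
print(sbp_slope(complete), sbp_slope(missing_wk4))  # 2.0 3.0
```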
Topic attachments
  • IGP Final exam full dataset (113.0 K, 18 Apr 2008, ChrisSlaughter)
  • IGP final exam dataset of baseline size on drugs A and B (117.0 K, 18 Apr 2008, ChrisSlaughter)
Topic revision: r6 - 27 Aug 2009, ChrisSlaughter
