Homework 5

Problem 1

200 plants (Tulipa armena) were randomly sampled to examine the association between stem length and flower color. The following data were collected:
  • 50 Short and Red
  • 50 Tall and Red
  • 40 Short and Yellow
  • 60 Tall and Yellow

  1. What is the probability of being
    • Yellow given short
    • Yellow given tall
    • Yellow
    • Short given yellow
  2. Test the null hypothesis that there is no association between flower color and stem length. Provide the name of the test used, the test statistic, degrees of freedom (if appropriate), p-value, and decision to reject or fail to reject the null hypothesis using a significance level of 0.05.
  3. Calculate the odds ratio for a plant being red given short relative to red given tall. Also provide a 95% confidence interval for the odds ratio and an interpretation of your findings. What similar information is provided by the confidence interval for the odds ratio and the significance test conducted in (2)?
  4. If two additional flower colors are also observed (e.g. blue and green), what test would you use to determine if there is any association between color and stem length? What would be the degrees of freedom of this test?
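
The hand calculations behind questions 1 and 2 can be sketched in Python (an assumption; the course itself points to R and R-commander). In the spirit of the solutions file, which works a different dataset, the counts below are made up and are not the tulip data. The sketch assumes a 2×2 table and the Pearson chi-square statistic without continuity correction; the function names are hypothetical.

```python
import math

# Hypothetical 2x2 counts (NOT the homework data):
# rows = group (A, B), columns = outcome (yes, no).
table = [[30, 20],
         [10, 40]]

def margins(table):
    """Return row totals, column totals, and the grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return row_totals, col_totals, sum(row_totals)

def pearson_chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    Returns (statistic, degrees of freedom, p-value). The p-value uses the
    identity P(chi2_1 > x) = erfc(sqrt(x / 2)), which holds only for 1 df.
    """
    row_totals, col_totals, n = margins(table)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, 1, math.erfc(math.sqrt(stat / 2))

row_totals, col_totals, n = margins(table)
p_yes_given_a = table[0][0] / row_totals[0]   # P(yes | A): condition on a row
p_a_given_yes = table[0][0] / col_totals[0]   # P(A | yes): condition on a column
p_yes = col_totals[0] / n                     # marginal P(yes)

chi2, df, p_value = pearson_chi_square_2x2(table)
print(p_yes_given_a, p_a_given_yes, p_yes)
print(chi2, df, p_value)
```

Note that P(yes | A) and P(A | yes) share a numerator but divide by different margins. The erfc shortcut for the p-value does not carry over to tables with more than 1 degree of freedom.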

Problem 2

Patient number   % reticulocytes   Lymphocytes (per mm2)
1                2.6               1700
2                3.0               3078
3                1.3               1820
4                0.7               2706
5                0.3               2086
6                4.0               2299
7                0.2               676
8                1.5               2088
9                2.3               2013

  1. Create a scatter plot of percentage of reticulocytes (x-axis) versus number of lymphocytes (y-axis).
  2. Fit a regression line relating the percentage of reticulocytes (x) to the number of lymphocytes (y).
  3. Test for the statistical significance of this regression line using the F-test.
  4. What is $R^2$ for this problem and what is its interpretation?
  5. What is the value of $s^{2}_{y.x}$ and what is its interpretation?
  6. Test for the statistical significance of the regression line using the t test.
  7. What are the standard errors of the slope and intercept for the regression line?
  8. Obtain an approximate 0.95 confidence interval for the population slope, then obtain an exact confidence interval, both assuming normality and constant variance of the residuals.
  9. Estimate E(lymphocytes | % reticulocytes = 3) and compute 0.95 confidence intervals for this expected (mean) value.
  10. Estimate the lymphocyte count for an individual with 3% reticulocytes and compute 0.95 confidence limits corresponding to this individual's estimate.
  11. Estimate the conditional $\sigma$ (the standard deviation of lymphocytes across patients with the same % reticulocytes).
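
The "manual" least-squares quantities named in questions 2 through 7 can be sketched in Python (an assumption; the solutions file uses R). The data below are made up for illustration and are not the patient table above, and the function name and dictionary keys are hypothetical. The sketch covers the slope, intercept, $R^2$, $s^{2}_{y.x}$, the F and t statistics, and the two standard errors, but not the p-values or confidence intervals.

```python
import math

# Hypothetical data (NOT the reticulocyte/lymphocyte table).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def simple_linear_regression(x, y):
    """Least-squares fit of y on x with the usual ANOVA summaries."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual sum of squares computed from the fitted values.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ssr = syy - sse                      # regression sum of squares
    s2_yx = sse / (n - 2)                # residual mean square on n - 2 df
    se_slope = math.sqrt(s2_yx / sxx)
    se_intercept = math.sqrt(s2_yx * (1 / n + x_bar ** 2 / sxx))

    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": ssr / syy,
        "s2_yx": s2_yx,
        "F": ssr / s2_yx,                # F statistic on 1 and n - 2 df
        "t_slope": slope / se_slope,     # t statistic on n - 2 df
        "se_slope": se_slope,
        "se_intercept": se_intercept,
    }

fit = simple_linear_regression(x, y)
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

With a single predictor the F statistic equals the square of the slope's t statistic, which is a handy check on hand calculations. P-values need tail areas of the F and t distributions, which the Python standard library does not provide; in R, `summary(lm(y ~ x))` reports them.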

Problem 3

Name two substantially different statistical tests that would be useful for each of the following hypotheses, assuming that needed assumptions hold.
  1. The population mean systolic blood pressure for treated and untreated patients is the same.
  2. The population mean systolic blood pressure for patients on placebo, drug A, and drug B are all equivalent.
  3. There is no association between systolic blood pressure and total serum cholesterol.

Problem 4

  1. Rosner 12.81
  2. Rosner 12.82

Problem 5

An investigator first computes the percent change in a measurement from baseline to steady state, for each animal. Then she computes the mean percent change over animals and uses a two-sample t-test to compare mean percent changes in two groups, each containing 20 animals.
  1. Name at least three things the investigator did wrong.
  2. Write the strategy you would use for developing a good response variable, and specify an analysis to test for a difference in response between the two groups of animals.

Problem 6

Examine the following Figure in which 9 total mice were studied in 3 groups (3 animals per group). Identify as many mistakes as you can and indicate how you would correct each problem.

  • Figure for Problem 6: Ratio of gene expression by age with n = 3 different animals in each age group. * p<0.05 different from 2-day old using 2-sample t-test. fig1dyna.jpg

Notes

  • Using R-commander, the output from Fisher's exact test automatically provides an estimate of a type of odds ratio with a 95% confidence interval. However, just like Fisher's test, this odds ratio is calculated by making the strange, restrictive assumption that the marginal totals are fixed. When calculating the odds ratio and confidence interval for Problem 1, question 3, do not use the odds ratio from Fisher's test. Instead, calculate it by hand using formulas from your notes or the Rosner text.
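
One common hand formula is the sample odds ratio ad/bc with a log-based (Woolf) confidence interval. Whether this matches the formula in the course notes or Rosner exactly is an assumption. A minimal Python sketch, using made-up counts rather than the tulip data and a hypothetical function name:

```python
import math
from statistics import NormalDist

def odds_ratio_woolf(a, b, c, d, level=0.95):
    """Sample odds ratio ad/bc with a log-based (Woolf) confidence interval.

    Cell layout (hypothetical labels):
                 outcome   no outcome
        group 1     a          b
        group 2     c          d
    Requires all four counts to be positive.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR): square root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for 95%
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper

# Made-up counts, NOT the tulip data from Problem 1.
or_hat, lower, upper = odds_ratio_woolf(30, 20, 10, 40)
print(f"OR = {or_hat:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The interval is built on the log scale and then exponentiated, so it is not symmetric around the estimate. Unlike the conditional estimate reported with Fisher's exact test, this calculation does not treat the marginal totals as fixed.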

  • The file solutions contains information on how to solve question 2 for a different dataset. Note: The computer code provided in the solutions is meant just to show how to do certain "manual" calculations in R. You can do these calculations any way you want.

  • For problem 2, question 8, first think about how you would calculate the exact confidence interval. Then think about how you could approximate that interval. Your approximation should get closer to the exact confidence interval as n increases. The approximation involves not penalizing for having to estimate the slope and intercept from the data.
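
One reading of this note, written as formulas (a sketch, not necessarily the notation used in the course notes): let $b$ be the estimated slope, $\widehat{se}(b)$ its standard error, and $n$ the number of patients. The exact interval uses a t quantile on $n-2$ degrees of freedom, and the approximation swaps in the standard normal quantile.

```latex
\text{exact:}\quad b \pm t_{n-2,\,0.975}\,\widehat{se}(b)
\qquad\qquad
\text{approximate:}\quad b \pm z_{0.975}\,\widehat{se}(b), \quad z_{0.975} \approx 1.96
```

Because $t_{n-2,\,0.975}$ decreases toward $z_{0.975}$ as $n$ grows, the two intervals agree more closely in larger samples.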
Topic revision: r3 - 04 May 2009, WikiGuest
 
