General Instructions

For this homework assignment, and all future homework assignments, provide only the results requested. Statistical programs are notorious for generating lots of output (some are worse than others), but it is up to the researcher to decide which results are useful in their analysis. If computer output is requested, it should be formatted and interpreted, not simply cut-and-pasted from the statistics program.

Questions

  1. Using the FEV dataset from the course web page or from the Rosner CD:
    1. Summarize the variables age, fev, height, sex and smoke by choosing appropriate numerical summary statistics (don't report all possible statistics). Present your summaries in a small table suitable for submission to a scientific journal. Include a brief, one-paragraph narrative describing the five variables in your table.
    2. Create a box plot of fev by sex. Give the x-axis and y-axis appropriate labels. Include this plot in your homework.
    3. Using the box plot, describe in words the distribution of fev for females and males. Do females or males have a larger median fev? Do males or females have a larger interquartile range? Could the spread of fev in males be adequately described using the standard deviation? Why or why not?
    4. Calculate the (a) standard deviation and (b) standard error of the mean of fev for current smokers and non-current smokers. Explain why non-current smokers have a much lower standard error of the mean than current smokers in spite of the fact that the standard deviation (or variance) is larger among non-current smokers.
  2. Name six (6) or more problems with the "spreadsheet from hell."
  3. Suppose you were interested in summarizing the income of the residents of Bellevue, Washington (home of Bill Gates) or Omaha, Nebraska (home of Warren Buffett).
    1. Which would be a more appropriate measure of central tendency, the mean income or the median income? Why?
    2. What statistic(s) would you use to describe the spread of the income distribution? Why?
  4. An investigator examining the relationship of litter size with genetics and a treatment drug asked the following question:
    • "Please clarify a statistic principle for me: I thought that units of defined counting should be cited as medians. In other words, a mouse cannot have 3.6 pups, even though that is the average litter size."
    • Write a response to the investigator explaining why the mean is the appropriate summary measure.
  5. Rosner 2.1, 2.2, and 2.3. For question 2.2, also compute the interquartile range (IQR) and briefly indicate why the IQR is a better measure of spread than the range.
  6. Rosner 2.35, 2.36, 2.37
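
Several of the questions above (1.4, 3, and 5) depend on how summary statistics respond to sample size and to extreme values. The Python sketch below is an illustration only, not a model answer: all numbers are made up (they are not the FEV data or real incomes), and the helper names `sem` and `iqr` are hypothetical. It shows that the standard error of the mean is the sample standard deviation divided by the square root of the sample size, and that a single extreme value moves the mean and the range far more than the median and the IQR.

```python
import math
import statistics


def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def value_range(values):
    """Range: maximum minus minimum."""
    return max(values) - min(values)


# Made-up values for illustration only (not the FEV data).
small_group = [2.0, 3.0, 4.0, 5.0]           # 4 observations
large_group = [1.0, 3.0, 4.0, 6.0] * 25      # 100 observations, more spread

print("SD :", statistics.stdev(small_group), statistics.stdev(large_group))
print("SEM:", sem(small_group), sem(large_group))

# Made-up incomes (in thousands) with one extreme value appended.
incomes = [30, 35, 40, 45, 50, 55, 60]
incomes_with_outlier = incomes + [100000]

for label, data in [("without outlier", incomes),
                    ("with outlier", incomes_with_outlier)]:
    print(label,
          "mean:", statistics.mean(data),
          "median:", statistics.median(data),
          "range:", value_range(data),
          "IQR:", iqr(data))
```

In this toy example the larger group has the larger standard deviation but the smaller standard error, because the divisor grows with the sample size. Quartile conventions differ slightly between software packages, so IQR values computed here may not match those from R exactly.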

Optional questions (ungraded, do not turn in)

  1. Rosner 2.4-2.7
  2. Rosner 2.12-2.18
  3. Rosner 2.31, 2.32

Notes

  • An R dataset and a .CSV file are available for download from the course web page. The R dataset can be loaded using the "Data... Load data set" menus in R Commander, or the .CSV file can be imported as was shown in class. Alternatively, import the SPSS version of the dataset on the Rosner CD.
  • For question 1.1, you may wish to use histograms, box plots, or dot plots (aka strip charts or strip plots) to decide which numerical summaries are appropriate, but do not include these plots in your homework.
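
For anyone who wants to check numerical summaries outside R Commander, the following is a minimal Python sketch, under stated assumptions, of reading a .CSV file and computing the five values a box plot is drawn from. The column names are taken from question 1.1, but the codes used for sex and smoke, and the in-memory rows below, are made up for illustration and may not match the course file. The function names `fev_by_group` and `five_number_summary` are hypothetical.

```python
import csv
import io
import statistics


def fev_by_group(csv_file, group_column):
    """Collect fev values per level of group_column from an open CSV file."""
    groups = {}
    for row in csv.DictReader(csv_file):
        groups.setdefault(row[group_column], []).append(float(row["fev"]))
    return groups


def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum: the values behind a box plot."""
    q1, med, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, med, q3, max(values)


# Made-up rows standing in for the real file (assumed column names and codes).
sample = io.StringIO(
    "age,fev,height,sex,smoke\n"
    "9,1.0,57.0,female,no\n"
    "10,2.0,60.0,female,no\n"
    "12,3.0,63.0,female,yes\n"
    "14,4.0,65.0,female,no\n"
    "9,1.5,58.0,male,no\n"
    "11,2.5,62.0,male,no\n"
    "13,3.5,66.0,male,yes\n"
    "15,5.5,70.0,male,no\n"
)

groups = fev_by_group(sample, "sex")
for level, values in sorted(groups.items()):
    print(level, "n =", len(values), five_number_summary(values))

# With a downloaded file, replace `sample` with an open file handle, e.g.
#   with open("fev.csv", newline="") as f:   # hypothetical file name
#       groups = fev_by_group(f, "sex")
```

Passing "smoke" instead of "sex" as the grouping column gives the same summaries by smoking status.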
Topic revision: r10 - 29 Jan 2009, ChrisSlaughter
 
