Problems will be added until the last class before the due date.
No electronic submission unless explicitly allowed.
Use your own words when answering questions. Copying from other sources (including textbooks, handouts, my blog) is strongly discouraged, because it often indicates you don't understand your answer.
Describe the difference between a continuous variable and a categorical variable, and discuss the advantages and/or disadvantages of categorizing a continuous variable.
Describe what sensitivity analysis is and what scenarios a sensitivity analysis can lead to.
Describe Simpson's paradox. A helpful reading is here.
Nashville's December 2005 daily mean temperatures had mean 37.7 degrees (Fahrenheit) and standard deviation 7.0 degrees. The conversion formula between Fahrenheit and Celsius is F = C * 9/5 + 32. Do you have enough information to get the mean and SD in Celsius? If yes, what are they? If no, what else do you need?
A binary outcome can take only two possible values. Examples include coin flipping, sex of newborn babies, having a type of cancer or not, etc. We can always denote one outcome as "1" and the other as "0". Let the probability of having "1" be p. Then 0 < p < 1 and q = 1 - p is the probability of having "0". Suppose there are n independent outcomes. The number x of outcomes equal to "1" can vary from 0 to n, with varying probabilities. These possible values of x together with their associated probabilities are called a binomial distribution. The parameter p can be estimated by x/n. The coefficient of variation (CV) of this estimator, that is, its standard deviation divided by its mean, is √[(1 - p)/(np)] × 100%.
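The CV formula above can be checked numerically. The following is a minimal sketch, not part of the assignment: the function names binomial_cv and simulated_cv, the values n = 50 and p = 0.2, the number of replications, and the random seed are all illustrative assumptions. It evaluates the formula directly and compares it with the CV of x/n observed over many simulated samples.

```python
import math
import random


def binomial_cv(n, p):
    """CV (in percent) of the estimator x/n, using sqrt((1 - p) / (n * p))."""
    if n <= 0 or not 0 < p < 1:
        raise ValueError("need n > 0 and 0 < p < 1")
    return math.sqrt((1 - p) / (n * p)) * 100


def simulated_cv(n, p, reps=20000, seed=0):
    """Empirical CV (in percent) of x/n over `reps` simulated samples.

    reps and seed are arbitrary choices made for this illustration.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # Count how many of the n binary outcomes equal "1".
        x = sum(1 for _ in range(n) if rng.random() < p)
        estimates.append(x / n)
    mean = sum(estimates) / reps
    variance = sum((e - mean) ** 2 for e in estimates) / (reps - 1)
    return math.sqrt(variance) / mean * 100


if __name__ == "__main__":
    n, p = 50, 0.2  # illustrative values only
    print(f"formula:    {binomial_cv(n, p):.2f}%")
    print(f"simulation: {simulated_cv(n, p):.2f}%")
```

The two printed values should be close but not identical, since the simulated CV is itself an estimate that varies with the seed and the number of replications.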
Calculate the CV for n = 10, 100, 1000 and p = .01, .05, .1, .3, .5.
Comment on how CV changes as n increases with p fixed and as p changes with n fixed.
Suppose you want to estimate a cancer rate with accuracy measured as CV < 10%. What sample size do you need if the real rate is about 10%? What sample size do you need if the real rate is about 1%?
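As a hint for the sample size question, the CV requirement can be rearranged into a condition on n. The sketch below assumes the CV formula from the binomial problem; the symbol c is introduced here only for illustration and stands for the target CV written as a fraction rather than a percent. Both sides of each inequality are positive, so squaring preserves the direction.

```latex
\[
\sqrt{\frac{1-p}{np}} < c
\;\Longleftrightarrow\;
\frac{1-p}{np} < c^{2}
\;\Longleftrightarrow\;
n > \frac{1-p}{p\,c^{2}}
\]
```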