Comprehensive Introduction to Clinical Investigation
Division of Biostatistics and Epidemiology
Department of Health Evaluation Sciences
July 2001
Frank Harrell PhD
James Patrie MS
Jennifer Gibson MS
Mark Conaway PhD
Rosner, B. Fundamentals of Biostatistics, 5th Edition. Pacific Grove, CA, 2000.
Series of short articles about statistical concepts by JM Bland and DG Altman et al. appearing in British Medical Journal (provided in course pack).
Cohn V: A perspective from the press: how to help reporters tell the truth (sometimes). Stat in Med 2001; 20:1341-1346.
Matthews JNS, et al.: Analysis of serial measurements in medical research. British Med J 1990; 300:230-235.

1   Section Description

This section introduces biostatistical concepts that are useful in clinical research; the topics are listed in the session outlines below. There are seven 3-hour sessions, some of which are split into multiple topics.

2   Session Format

Sessions will be a combination of lecture and discussion. Lectures will be informal; questions and discussion are encouraged at any time. Sessions will rely heavily on participants reading assigned sections of the text and several short articles in advance. Copies of slides will be made available to participants at the start of each session.

3   Assignments and Exercises

There will be a few short assignments and quizzes to stimulate discussion and to identify weak areas. These will not be graded.

4   Session Outlines

In what follows Rn refers to Chapter or Section n in Rosner, BAn refers to number n in the series of occasional notes on medical statistics by Bland and Altman, and ABn refers to an article in which Altman is the first author. MAn refers to articles written by Matthews and Altman.

4.1   Session One     3 July 2001

Instructor: Frank Harrell
Topics and Readings
General Overview (R1)
Descriptive Statistics and Graphics (AB8, R2)
Objectives: To
  1. understand the role of biostatistics as a science, and biostatistical methods as tools of scientific inquiry
  2. understand the meaning of description, estimation, hypothesis testing, and prediction
  3. understand what is meant by random variable
  4. know advantages of using continuous variables and of preserving their continuous nature in the analysis
  5. understand distributions of random variables
  6. know characteristics of distributions: central tendency, variability (spread, variance), and quantiles or percentiles
  7. be able to choose graphs that are useful for depicting data distributions
  8. be able to choose graphs that are useful for summarizing results of studies
  9. be able to make informative tables

4.2   Session Two     5 July 2001

Instructor: Jennifer Gibson
Topics and Readings
Probability (R3.1-3.6)
Estimation (R6.1-6.2,6.4-6.7.1)
Objectives: To
  1. understand the meaning of probability
  2. understand what it means to say that two events are independent
  3. be able to compute the probability of the union of two events
  4. be able to compute the probability of the intersection of two independent events
  5. understand conditional probability
  6. know the meaning of population and a sample from that population
  7. know how to estimate population quantities such as mean, median and other quantiles, and standard deviation from sample values
  8. obtain an initial understanding of interval estimates and how to construct a confidence interval for the unknown mean of a normal-shaped population
  9. understand how to estimate a population probability from a sample of events and non-events
  10. know a simple approximate formula for a confidence interval for an unknown population probability
  11. memorize and understand the 3/n rule
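As a rough illustration of objectives 10 and 11, the sketch below assumes that the "simple approximate formula" is the usual normal-approximation (Wald) interval, and that the 3/n rule refers to the approximate upper 95% confidence limit for a probability when zero events are observed in n trials. The function names are illustrative and do not come from the course materials.

```python
import math

def wald_interval(events, n, z=1.96):
    """Approximate 95% confidence interval for a population probability.

    Assumes the normal approximation: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).
    """
    p_hat = events / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def rule_of_three_upper(n):
    """Approximate upper 95% limit for p when 0 events are seen in n trials."""
    return 3 / n

def exact_upper_zero_events(n, alpha=0.05):
    """Exact upper limit for p with 0 events: solve (1 - p)**n = alpha for p."""
    return 1 - alpha ** (1 / n)

if __name__ == "__main__":
    print(wald_interval(20, 100))        # interval around 0.20
    print(rule_of_three_upper(100))      # 0.03
    print(exact_upper_zero_events(100))  # close to 0.03
```

The 3/n value tracks the exact limit closely for moderate n, which is why it works as a quick mental check. The Wald interval is a poor approximation when the estimated probability is near 0 or 1.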

4.3   Session Three     10 July 2001

Instructor: Frank Harrell
Topics and Readings
Hypothesis Testing: One-sample inference (R7(except 7.4.1,7.8,7.9.2,7.10),BA8,AB1)
Two-sample inference (R8(except 8.6,8.7,8.9,8.11),MA25)
Objectives: To
  1. understand the fundamentals of hypothesis testing and assembling evidence using classical statistics
  2. know the meanings of type I and II errors, P-values, and power
  3. know the general structure of a t statistic
  4. know one basis for estimating the required sample size
  5. understand the construction and interpretation of a confidence interval for an unknown mean from a normal population
  6. know the relationship between confidence intervals and P-values
  7. know how to carry out and interpret a one-sample t-test for paired (R8.2) or unpaired data from a normal distribution
  8. understand how P-values are "backwards" and how to avoid errors in interpreting them
  9. learn how to compute and interpret confidence intervals for the difference in two population means when the data are normal
  10. understand the setup for a two-sample problem
  11. be able to carry out a two-sample (unpaired) t-test for normally distributed data
  12. be able to construct and interpret a confidence interval for the difference in two means
  13. know how to compute power or the sample size to achieve a given power for comparing two means
  14. know how to compute the sample size to achieve a given precision for estimating a probability, a mean, and a difference in two means
  15. understand pitfalls in interpreting P-values
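Several of these objectives rest on one pattern: a t statistic is an estimate minus its hypothesized value, divided by the standard error of the estimate. The sketch below shows that pattern for the equal-variance two-sample t-test, along with a common normal-approximation formula for the sample size per group needed to compare two means. The function names and the choice of the pooled-variance form are assumptions made for illustration, not the course's own code.

```python
import math
from statistics import NormalDist, mean, variance

def t_statistic(estimate, hypothesized, standard_error):
    """General structure: (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / standard_error

def two_sample_t(x, y):
    """Pooled-variance (equal-variance) two-sample t statistic and its d.f."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t_statistic(mean(x) - mean(y), 0.0, se), n1 + n2 - 2

def n_per_group(delta, sigma, alpha=0.05, power=0.9):
    """Approximate n per group to detect a mean difference delta (two-sided test).

    Uses the normal approximation n = 2 * ((z_alpha/2 + z_beta) * sigma / delta)**2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

if __name__ == "__main__":
    t, df = two_sample_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
    print(t, df)
    print(n_per_group(delta=1.0, sigma=1.0, power=0.9))
```

The sample-size formula uses normal quantiles in place of t quantiles, so it slightly understates the required n when groups are small.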

4.4   Session Four     11 July 2001

Instructor: Frank Harrell
Topics and Readings
Comparing two proportions (R10.1-10.2,10.5.1)
Nonparametric methods (R9.1,9.3-9.6)
Hypothesis testing review (R7,R8)
Objectives: To
  1. learn how to do an approximate test for the difference in two proportions by hand
  2. learn to use approximate methods for computing sample size or power for comparing two population probabilities
  3. learn the advantages of nonparametric tests, which analyze continuous responses without assuming a distribution
  4. understand the nonparametric counterpart of the one-sample t-test, the Wilcoxon signed-rank test
  5. understand the nonparametric counterpart of the two-sample t-test, the Wilcoxon-Mann-Whitney two-sample rank-sum test
  6. review "big picture" concepts of hypothesis testing and interval estimation
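The "by hand" test in objective 1 can be sketched as follows, assuming it is the usual normal-approximation z-test that pools the two samples to estimate the standard error under the null hypothesis. The function name is illustrative.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(events1, n1, events2, n2):
    """Approximate z-test for H0: p1 = p2, using the pooled proportion.

    Returns the z statistic and the two-sided P-value.
    """
    p1 = events1 / n1
    p2 = events2 / n2
    pooled = (events1 + events2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value

if __name__ == "__main__":
    # Hypothetical data: 15 of 50 events in one group, 30 of 50 in the other.
    z, p = two_proportion_z_test(15, 50, 30, 50)
    print(round(z, 3), round(p, 4))
```

The approximation is reasonable when the expected counts of events and non-events in each group are not small.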

4.5   Session Five     17 July 2001

Instructor: Jim Patrie
Topics and Readings
Regression and Correlation (R11.1-11.7,11.9-11.10)
Objectives: To
  1. understand in detail the simple linear regression model and how its slope and intercept are estimated
  2. understand interval estimation of the slope and of a prediction
  3. know the assumptions made by regression
  4. understand multiple regression, especially interpreting regression coefficients and what it means to adjust for the effects of certain variables
  5. know what the linear correlation coefficient measures
  6. understand the correspondence between testing for nonzero correlation and testing for nonzero slope in simple regression
  7. be able to interpret R-squared (R^2)
  8. know the assumptions made by standard linear multiple regression

4.6   Session Six     18 July 2001

Instructor: Frank Harrell
Topics and Readings
Regression Review (R11)
Rank correlation (R11.12)
One-way analysis of variance and the Kruskal-Wallis test (R12.1,AB20,R12.7)
Heterogeneity of effects (BA23,MA24,MA25,MA26,R12.6)
Analysis of covariance (R12.5.3)
Multiple significance tests (BA10)
Objectives: To
  1. further understand the most important issues related to regression analysis, and hazards of multiple regression
  2. know how to estimate the sample size needed to estimate a correlation coefficient to a certain precision
  3. know the advantages of the nonparametric counterpart to the linear correlation coefficient and test
  4. understand principles involved in comparing k groups using analysis of variance
  5. know a method for pairwise comparisons of means
  6. understand how the Kruskal-Wallis test generalizes the Wilcoxon test from 2 to k samples
  7. understand advantages of the Kruskal-Wallis test over parametric analysis of variance
  8. know when a two-way ANOVA is appropriate
  9. be introduced to methods for assessing differential treatment effects
  10. know the purpose of analysis of covariance
  11. be introduced to methods (such as Bonferroni) for keeping the probability of a false positive result at an acceptable level when many hypotheses are tested
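The Bonferroni method named in objective 11 can be sketched in a few lines: with k hypotheses and an overall level alpha, each P-value is compared with alpha/k. The function name and the example P-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one True/False per hypothesis: True if P <= alpha / k."""
    k = len(p_values)
    threshold = alpha / k
    return [p <= threshold for p in p_values]

if __name__ == "__main__":
    # Three tests at overall alpha = 0.05, so each is judged against 0.05 / 3.
    print(bonferroni_reject([0.001, 0.02, 0.04]))
```

The rule keeps the chance of at least one false positive at or below alpha, whatever the dependence among the tests, at the cost of reduced power as k grows.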

4.7   Session Seven     19 July 2001

Instructors: Mark Conaway and Frank Harrell
Topics, Readings, and Presenter
Measuring change (Harrell) (TBD)
Repeated Measurements (Conaway) (BA1,BA12,BA13,Matthews et al.)
Experimental Design (Conaway)
Objectives: To
  1. know problems with percent change
  2. understand one basis for choosing a measure of change
  3. understand some of the most common experimental designs used in experiments to compare therapies
  4. be introduced to factorial designs and their advantages and disadvantages
  5. know why multiple measurements from the same patient cannot be analyzed as if they were measurements from separate patients
  6. be introduced to simple methods for analyzing such serial data
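One commonly cited problem with percent change (objective 1) is its asymmetry: a rise followed by a return to the starting value does not give equal and opposite percent changes. The sketch below contrasts it with the log ratio, which is symmetric. This is an illustration under that assumption and may not be the specific argument made in the session; the function names are hypothetical.

```python
import math

def percent_change(before, after):
    """Percent change from before to after, relative to the before value."""
    return 100.0 * (after - before) / before

def log_ratio(before, after):
    """Natural log of after/before; reversing the change flips only the sign."""
    return math.log(after / before)

if __name__ == "__main__":
    print(percent_change(100, 150), percent_change(150, 100))  # +50 vs about -33.3
    print(log_ratio(100, 150), log_ratio(150, 100))            # equal and opposite
```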

"Absence of evidence" paper