Comprehensive Introduction to Clinical Investigation
Biostatistics
Division of Biostatistics and Epidemiology
Department of Health Evaluation Sciences
924-8712
July 2001

Instructors:
Frank Harrell PhD
fharrell@virginia.edu
James Patrie MS
jpatrie@virginia.edu
Jennifer Gibson MS
jjgibson@virginia.edu
Mark Conaway PhD
mconaway@virginia.edu
Text:
Rosner, B. Fundamentals of Biostatistics, 5th Edition. Pacific Grove, CA, 2000.
Articles:
Series of short articles about statistical concepts by JM Bland and DG Altman et al. appearing in British Medical Journal (provided in course pack).
Cohn V: A perspective from the press: how to help reporters tell the truth (sometimes). Stat in Med 2001; 20:1341-1346.
Matthews JNS, et al.: Analysis of serial measurements in medical research. British Med J 1990; 300:230-235.

1 Section Description

This section introduces biostatistical concepts that are useful in clinical research, including the following.

Experimental design
Various types of random variables
Data distributions and descriptive statistics
Graphical presentation of data and results
Probability
Data analysis for description, estimation, hypothesis testing, and prediction
Linear regression models
Dealing with repeated measurements in one patient and how to measure change
Avoiding pitfalls in interpreting statistical analyses

There are seven 3-hour sessions, some of which are split into multiple topics.

2 Session Format

Sessions will be a combination of lecture and discussion. Lectures will be informal; questions and discussion are encouraged at any time. Sessions will rely heavily on participants reading assigned sections of the text and several short articles in advance. Copies of slides will be made available to participants at the start of each session.

3 Assignments and Exercises

There will be a few short assignments and quizzes to stimulate discussion and to identify weak areas. These will not be graded.

4 Session Outlines

In what follows Rn refers to Chapter or Section n in Rosner, BAn refers to number n in the series of occasional notes on medical statistics by Bland and Altman, and ABn refers to an article in which Altman is the first author. MAn refers to articles written by Matthews and Altman.

4.1 Session One 3 July 2001

Presenter

: Frank Harrell

Topics and Readings

:
General Overview (R1)
Descriptive Statistics and Graphics (AB8, R2)

Objectives

: To

understand the role of biostatistics as a science, and biostatistical methods as tools of scientific inquiry
understand the meaning of description, estimation, hypothesis testing, and prediction
understand what is meant by random variable
know advantages of using continuous variables and of preserving their continuous nature in the analysis
understand distributions of random variables
know characteristics of distributions: central tendency, variability or spread (variance), and quantiles or percentiles
be able to choose graphs that are useful for depicting data distributions
be able to choose graphs that are useful for summarizing results of studies
be able to make informative tables

4.2 Session Two 5 July 2001

Presenter

: Jennifer Gibson

Topics and Readings

:
Probability (R3.1-3.6)
Estimation (R6.1-6.2,6.4-6.7.1)

Objectives

: To

understand the meaning of probability
understand what it means to say that two events are independent
be able to compute the probability of the union of two events
be able to compute the probability of the intersection of two independent events
understand conditional probability
know the meaning of population and a sample from that population
know how to estimate population quantities such as mean, median and other quantiles, and standard deviation from sample values
obtain an initial understanding of interval estimates and how to construct a confidence interval for the unknown mean of a normally distributed population
understand how to estimate a population probability from a sample of events and non-events
know a simple approximate formula for a confidence interval for an unknown population probability
memorize and understand the 3/n rule
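
The last two objectives can be illustrated with a short sketch. It assumes the "simple approximate formula" is the usual normal-approximation (Wald) interval, which may differ from the formula presented in the session, and it takes the 3/n rule to mean that when no events are seen in n trials, 3/n is an approximate upper 95% confidence limit for the event probability. The function names are illustrative only.

```python
import math

def wald_interval(events, n, z=1.96):
    """Approximate 95% confidence interval for a population probability
    (normal approximation; an assumed choice of formula)."""
    p_hat = events / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def rule_of_three_upper(n):
    """Approximate upper 95% limit when 0 events are observed in n trials."""
    return 3.0 / n

def exact_upper_zero_events(n, confidence=0.95):
    """Exact upper limit for 0 events in n trials: solves (1 - p)^n = 1 - confidence."""
    return 1 - (1 - confidence) ** (1.0 / n)

if __name__ == "__main__":
    print(wald_interval(20, 100))        # roughly (0.12, 0.28)
    print(rule_of_three_upper(100))      # 0.03
    print(exact_upper_zero_events(100))  # slightly below 0.03
```

For n = 100 the exact limit and 3/n agree to within about 0.0005, which is why the rule is easy to use by hand.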

4.3 Session Three 10 July 2001

Presenter

: Frank Harrell

Topics and Readings

:
Hypothesis Testing: One-sample inference (R7(except 7.4.1,7.8,7.9.2,7.10),BA8,AB¹)
Two-sample inference (R8(except 8.6,8.7,8.9,8.11),MA25)

Objectives

: To

understand the fundamentals of hypothesis testing and assembling evidence using classical statistics
know the meanings of type I and II errors, P-values, and power
know the general structure of a t statistic
know one basis for estimating the required sample size
understand the construction and interpretation of a confidence interval for an unknown mean from a normal population
know the relationship between confidence intervals and P-values
know how to carry out and interpret a one-sample t-test for paired (R8.2) or unpaired data from a normal distribution
understand how P-values are "backwards" and how to avoid errors in interpreting them
learn how to compute and interpret confidence intervals for the difference in two population means when the data are normal
understand the setup for a two-sample problem
be able to carry out a two-sample (unpaired) t-test for normally distributed data
be able to construct and interpret a confidence interval for the difference in two means
know how to compute power or the sample size to achieve a given power for comparing two means
know how to compute the sample size to achieve a given precision for estimating a probability, a mean, and a difference in two means
understand pitfalls in interpreting P-values
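
As a rough illustration of the two-sample objectives, the sketch below computes a pooled-variance t statistic and an approximate sample size per group for comparing two means. The sample-size formula uses normal quantiles in place of t quantiles, a common approximation that may differ slightly from the method in the text. All function and parameter names are assumptions made for this example.

```python
import math
from statistics import NormalDist, mean, variance

def pooled_t_statistic(x, y):
    """Two-sample (unpaired) t statistic assuming equal variances.
    Returns (t, degrees_of_freedom): difference in means over its standard error."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / std_err, n1 + n2 - 2

def n_per_group(delta, sigma, alpha=0.05, power=0.90):
    """Approximate sample size per group to detect a difference in means of
    delta, with common standard deviation sigma (two-sided test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (sigma * (z_alpha + z_power) / delta) ** 2)

if __name__ == "__main__":
    t, df = pooled_t_statistic([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
    print(t, df)                           # t is about -2 on 8 degrees of freedom
    print(n_per_group(delta=5, sigma=10))  # 85
```

Halving the detectable difference roughly quadruples the required sample size, because delta enters the formula squared.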

4.4 Session Four 11 July 2001

Presenter

: Frank Harrell

Topics and Readings

:
Comparing two proportions (R10.1-10.2,10.5.1)
Nonparametric methods (R9.1,9.3-9.6)
Hypothesis testing review (R7,R8)

Objectives

: To

learn how to do an approximate test for the difference in two proportions by hand
learn to use approximate methods for computing sample size or power for comparing two population probabilities
learn the advantages of nonparametric tests for continuous responses without assuming a distribution
understand the nonparametric counterpart of the one-sample t-test, the Wilcoxon signed-rank test
understand the nonparametric counterpart of the two-sample t-test, the Wilcoxon-Mann-Whitney two-sample rank-sum test
review "big picture" concepts of hypothesis testing and interval estimation
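
The hand calculation for comparing two proportions can be sketched as follows. This version uses the pooled-proportion z test without a continuity correction; the text may present a corrected variant, so treat the details as an assumption. Names are illustrative.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Approximate test of equal proportions.
    x1, x2 are event counts; n1, n2 are group sizes.
    Returns (z, two_sided_p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # estimate of the common proportion under H0
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

if __name__ == "__main__":
    z, p = two_proportion_z_test(30, 100, 20, 100)
    print(round(z, 3), round(p, 3))  # z is about 1.633, P is about 0.10
```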

4.5 Session Five 17 July 2001

Presenter

: Jim Patrie

Topics and Readings

:
Regression and Correlation (R11.1-11.7,11.9-11.10)

Objectives

: To

understand in detail the simple linear regression model and how its slope and intercept are estimated
understand interval estimation of the slope and of a prediction
know the assumptions made by regression
understand multiple regression, especially interpreting regression coefficients and what it means to adjust for the effects of certain variables
know what the linear correlation coefficient measures
understand the correspondence between testing for nonzero correlation and testing for nonzero slope in simple regression
be able to interpret R²
know the assumptions made by standard linear multiple regression
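
To make the first objectives concrete, here is a minimal sketch of ordinary least squares for simple linear regression, together with R², which in the single-predictor case equals the squared linear correlation coefficient. The function name and the toy data are hypothetical.

```python
import math
from statistics import mean

def simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x.
    Returns (slope, intercept, r_squared)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)  # linear correlation coefficient
    return slope, intercept, r ** 2

if __name__ == "__main__":
    print(simple_linear_regression([1, 2, 3, 4], [1, 3, 2, 4]))
    # slope about 0.8, intercept about 0.5, R^2 about 0.64
```

Because the slope is sxy/sxx and r is sxy/sqrt(sxx * syy), the slope is zero exactly when the correlation is zero, which is the correspondence between the two tests noted in the objectives.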

4.6 Session Six 18 July 2001

Presenter

: Frank Harrell

Topics and Readings

:
Regression Review (R11)
Rank correlation (R11.12)
One-way analysis of variance and the Kruskal-Wallis test (R12.1,AB20,R12.7)
Heterogeneity of effects (BA23,AM24,MA25,MA26,R12.6)
Analysis of covariance (R12.5.3)
Multiple significance tests (BA10)

Objectives

: To

further understand the most important issues related to regression analysis, and hazards of multiple regression
know how to estimate the sample size needed to estimate a correlation coefficient to a certain precision
know the advantages of the nonparametric counterpart to the linear correlation coefficient and test
understand principles involved in comparing k groups using analysis of variance
know a method for pairwise comparisons of means
understand how the Kruskal-Wallis test generalizes the Wilcoxon test from 2 to k samples
understand advantages of the Kruskal-Wallis test over parametric analysis of variance
know when a two-way ANOVA is appropriate
be introduced to methods for assessing differential treatment effects
know the purpose of analysis of covariance
be introduced to methods (such as Bonferroni) for keeping the probability of a false positive result at an acceptable level when many hypotheses are tested
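
The final objective can be illustrated with a small sketch of the Bonferroni adjustment: each of m tests is judged against alpha/m. The second function shows why an adjustment is needed, under the simplifying assumption that the tests are independent. Names are illustrative.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where a P-value is significant
    after comparing it with the Bonferroni threshold alpha / m."""
    m = len(p_values)
    threshold = alpha / m
    return [p <= threshold for p in p_values]

def familywise_error_rate(m, alpha=0.05):
    """Chance of at least one false positive among m independent tests,
    each carried out at level alpha with all null hypotheses true."""
    return 1 - (1 - alpha) ** m

if __name__ == "__main__":
    print(bonferroni_reject([0.001, 0.02, 0.04]))  # [True, False, False]
    print(round(familywise_error_rate(10), 3))     # 0.401
```

With ten unadjusted tests at the 0.05 level, the chance of at least one false positive is about 40%, not 5%.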

4.7 Session Seven 19 July 2001

Presenters

: Mark Conaway and Frank Harrell

Topics, Readings, and Presenter

:
Measuring change (Harrell) (TBD)
Repeated Measurements (Conaway) (BA1,BA12,BA13,Matthews et al.)
Experimental Design (Conaway)

Objectives

: To

know problems with percent change
understand one basis for choosing a measure of change
understand some of the most common experimental designs used in experiments to compare therapies
be introduced to factorial designs and their advantages and disadvantages
know why multiple measurements from the same patient cannot be analyzed as if they were measurements from separate patients
be introduced to simple methods for analyzing such serial data
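
One commonly cited problem with percent change is its asymmetry, which a few lines of arithmetic make visible. The sketch below contrasts it with the log ratio, one alternative measure of change. The session may emphasize other issues or other measures, so this is offered only as an assumed example with hypothetical function names.

```python
import math

def percent_change(before, after):
    """Percent change relative to the baseline value."""
    return 100.0 * (after - before) / before

def log_ratio(before, after):
    """Natural log of the ratio after/before; symmetric around zero."""
    return math.log(after / before)

if __name__ == "__main__":
    print(percent_change(100, 150))  # 50.0
    print(percent_change(150, 100))  # about -33.3
    print(log_ratio(100, 150), log_ratio(150, 100))  # equal size, opposite sign
```

An increase from 100 to 150 followed by a return to 100 gives +50% and then about -33%, so the two percent changes do not cancel, whereas the two log ratios do.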

1: "Absence of evidence" paper