# IGP 304: Statistics for Biomedical Research

Instructor: Frank Harrell
Course page: http://biostat.mc.vanderbilt.edu/StatBiomedRes

## Course Goals

1. understand basic concepts, ideas, and techniques often used in statistics, especially biostatistics;
2. develop an appreciation of (i) variation, (ii) the importance of design to the overall quality of a study, (iii) the impact of assumptions on data analysis and interpretation, and (iv) artifacts and caveats in data analysis and interpretation;
3. carry out simple exploratory, graphical, formal, and diagnostic analyses;
4. know when and where to seek statisticians’ help; and
5. focus on concepts and the interpretation of statistical results rather than on software.

Statistics is best learned by analyzing real data. Please bring your problems to class. They can concern study design, data analysis, or the interpretation of results, and they can come from your own research, papers you read, news reports, etc. Remove any subject identifiers before presenting to the class or sending data to the instructors. See also DataTransmissionProcedures.

Your grade will be based on homework assignments, a mid-term exam, quizzes, a final project, and quality participation in class discussions. Before each lecture, read the relevant chapters in Rosner and any other assigned readings, and come prepared to discuss them.

## Other Useful Books (not required; an asterisk marks highly recommended reading)

• Kirkwood BR, Sterne JAC (2003) Essential Medical Statistics, 2nd ed. Blackwell Publishers. \$58.95. ISBN: 0865428719. Corrections and four datasets are available at the book's website, http://www.blackwellpublishing.com/essentialmedstats.
• Altman DG (1990) Practical Statistics for Medical Research. Chapman & Hall/CRC. [A very good book on medical statistics. The second edition should come out soon.]
• Altman DG, Machin D, Bryant TN, Gardner MJ (2000) Statistics with Confidence, 2nd ed. Blackwell Publishers. [Short essays on the advantages of using confidence intervals. The book comes with software Confidence Interval Analysis (CIA).]
• Armitage P, Berry G, Matthews JNS (2001) Statistical Methods in Medical Research, 4th ed. Blackwell Publishers. [This book is quite comprehensive, covering more material than a one-semester course. It may serve as a reference book, but it is definitely not a cookbook.]
• Bland M (2000) An Introduction to Medical Statistics, 3rd ed. Oxford University Press. [Another popular introductory book on medical statistics.]
• Motulsky H (1995) Intuitive Biostatistics. Oxford University Press. [This book covers basic material in biostatistics and explains the basic concepts very well.]
• Rosner B (2005) Fundamentals of Biostatistics, 6th ed. Duxbury Press. [Old style of teaching statistics. Lots of examples from many medical fields.]
• *Freedman D, Pisani R, Purves R (1997) Statistics, 3rd ed. W. W. Norton & Company. [This is a very good introduction to statistics, without being technical.]
• Moore DS, Notz WI (2005) Statistics: Concepts and Controversies, 6th ed. W. H. Freeman. [Another very good non-technical introduction to statistics.]

## Software

• You can use any software you want other than Excel. We will be demonstrating R and giving out R code.
• R (optional, free from http://www.r-project.org): Powerful, versatile, and actively maintained and updated. It may have a steeper learning curve than Stata and SPSS, but the effort will pay off later on. To get a feel, look at one of the following: 1 (and try the commands in "A Sample Session") 2 3. The Department of Biostatistics has a free R Clinic every Thursday. Print the R reference card to get a list of the most commonly used commands.
• Under R you can use a menu to install new packages. Install the `Rcmdr` package, which provides a simple SPSS-like menu system for interacting with the R language. Load `Rcmdr` and the main menu will appear. See the links below for more information. The first time you load `Rcmdr`, it will ask whether you want to download and install the packages that `Rcmdr` depends on. Answer affirmatively, and specify `CRAN` as the source. This will take a few minutes but only needs to be done the first time you use `Rcmdr`.
• Installation and usage notes from Robert Schaefer
• `Rcmdr` installation instructions especially for Mac
• To fix a bug that prevents the boxplot menu from appearing, install the R package `aplpack`.
• To update `Rcmdr` to a version newer than the one on `CRAN`, run `install.packages("Rcmdr", repos="http://R-Forge.R-project.org")` in the command window.
• Many users may want to use R through RExcel. Go to http://rcom.univie.ac.at and see the video demonstration. Download RAndFriends from that site to get R, the Excel setup, and R Commander. This assumes that you already have Excel installed and have applied all available updates to your version of Excel. See RExcelPackage for installation instructions.
• See SoftwareRecs for more software recommendations
• Chun Li's Stata notes, Need-to-know commands, and R notes
• Leena Choi's Stata notes for classes, Stata Lab info, and R notes for classes
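
The `Rcmdr` setup described above can also be done from the R console instead of the package menu. The following is a minimal sketch, not an official course script: `missing_packages` is a hypothetical helper name introduced here for illustration, and the CRAN mirror URL is an assumption (any CRAN mirror should work). It installs only the packages named in the notes above that are not already present, and only when R is running interactively.

```r
# Packages named in the installation notes above.
course_pkgs <- c("Rcmdr", "aplpack")

# Hypothetical helper: return the packages in `pkgs` that are not yet installed.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(course_pkgs)

# Install only what is missing, and only in an interactive session.
# The mirror URL below is an assumption; substitute your preferred CRAN mirror.
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install,
                   repos = "https://cloud.r-project.org",
                   dependencies = TRUE)
}

# Once installation has finished, start the menu system with:
# library(Rcmdr)
```

Running the block a second time is harmless, because `to_install` is empty once both packages are in place.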

### Comments on Other Software Packages

• Stata: Powerful, with good graphics and an SPSS-like menu system. A good support site is at http://www.ats.ucla.edu/stat/stata. Cost is \$89 for a year and \$145 for life, through GradPlan. Buy Small Stata for \$45 if you have to pay for it yourself. Stata is also available at the College of Arts & Science Microcomputer Labs.
• SAS: The oldest survivor, with a strong legacy. Hard to learn and extend, with an outdated structure and the worst graphics of any major package.
• SPSS: Has "standard" methods and a good graphical user interface. However, it is difficult to extend beyond the "standard" methods.
• Honorable mention: Epi Info (free from CDC), S-Plus.
• There is a long list of reasons not to use Excel. See ExcelProblems.