The purpose of this text is to enable biomedical researchers to use a number of advanced statistical methods that have proven valuable in medical research. The past thirty years have seen an explosive growth in the development of biostatistics. As with so many aspects of our world, this growth has been strongly influenced by the development of inexpensive, powerful computers and the sophisticated software that has been written to run them. This has allowed the development of computationally intensive methods that can effectively model complex biomedical data sets. It has also made it easy to explore these data sets, to discover how variables are interrelated and to select appropriate statistical models for analysis. Indeed, just as the microscope revealed new worlds to the eighteenth century, modern statistical software permits us to see interrelationships in large complex data sets that would have been missed in previous eras. Also, modern statistical software has made it vastly easier for investigators to perform their own statistical analyses. Although very sophisticated mathematics underlies modern statistics, it is not necessary to understand this mathematics to properly analyze your data with modern statistical software. What is necessary is to understand the assumptions required by each method, how to determine whether these assumptions are adequately met for your data, how to select the best model, and how to interpret the results of your analyses. The goal of this text is to allow investigators to effectively use some of the most valuable multivariate methods without requiring an understanding of more than high school algebra. Much mathematical detail is avoided by focusing on the use of a specific statistical software package.

This text grew out of the second-semester course in biostatistics that I teach in our Master of Public Health program at the Vanderbilt University Medical School. All of the students take introductory courses in biostatistics and epidemiology prior to mine. Although this text is self-contained, I strongly recommend that readers acquire good introductory texts in biostatistics and epidemiology as companions to this one. Many excellent texts are available on these topics. At Vanderbilt we are currently using Pagano and Gauvreau (2000) for biostatistics and Hennekens and Buring (1987) for epidemiology.

The statistical software used in this text is Stata (2001). It was chosen for the breadth and depth of its statistical methods, for its ease of use, and for its excellent documentation. There are several other excellent packages available on the market. However, the aim of this text is to teach biostatistics through a specific software package, and length restrictions make it impractical to use more than one package. If you have not yet invested a lot of time learning a different package, Stata is an excellent choice for you to consider. If you are already attached to a different package, you may still find it easier to learn Stata than to master or teach the material covered here from other textbooks.

The topics covered in this text are linear regression, logistic regression, Poisson regression, survival analysis, and analysis of variance. Each topic is covered in two chapters: one introduces the topic with simple univariate examples and the other covers more complex multivariate models. The text makes extensive use of a number of real data sets, all of which may be downloaded from my web site. This site also contains complete log files of all analyses discussed in this text.

I would like to thank Gordon R. Bernard, Jeffrey Brent, Norman E. Breslow, Graeme Eisenhofer, Cary P. Gross, Daniel Levy, Steven M. Greenberg, Fritz F. Parl, Paul Sorlie, Wayne A. Ray, and Alastair J. J. Wood for allowing me to use their data to illustrate the methods described in this text. I am grateful to William Gould and the employees of Stata Corporation for publishing their elegant and powerful statistical software and for providing excellent documentation. I would also like to thank the students in our Master of Public Health program who have taken my course. Their energy, intelligence and enthusiasm have greatly enhanced my enjoyment in preparing this material. Their criticisms and suggestions have profoundly influenced this work. I am grateful to David L. Page, my friend and colleague of 24 years, with whom I have learnt much about the art of teaching epidemiology and biostatistics to clinicians. My appreciation goes to Sarah K. Meredith for introducing me to Cambridge University Press, to William Schaffner, my chairman, who encouraged and facilitated my spending the time needed to complete this work, to W. Dale Plummer for technical support, to Patrick G. Arbogast for proofreading the entire manuscript, and to my mother and sisters for their support during six critical months of this project. Finally, I am especially grateful to my wife and family for their love and support, and for their cheerful tolerance of the countless hours that I spent on this project.

W.D.D.
Lac des Seize Îles, Quebec, Canada
2001
Topic revision: r1 - 21 May 2004, DalePlummer
