LeenaStataNotes (23 Feb 2007, LeenaChoi)
<font size="3"><span style="font-family: times new roman,times,serif;">
---+++!! Stata Notes for Classes
   * Some of the Stata code used in class is listed below.
   * [EMS] refers to [[http://www.blackwellpublishing.com/essentialmedstats][Essential Medical Statistics]].
   * Some of the materials were copied or modified from course materials for Biostatistics in the M.P.H. program at Vanderbilt University and from the course textbook, "Statistical Modeling for Biomedical Researchers", 2nd Ed., _in press_, by William Dupont. [WD] refers to William Dupont's book.
<!--- * Notes for Biostatistics II, M.P.H. program, instructed by Patrick Arbogast --->
%TOC%

---++++ Some basics for Stata *%RED% need to know* %BLACK%
   * four windows: *Results*, *Command*, *Review*, *Variables*
   * command line interface
   * pulldown menus
   * log file: keeps a record of what you are doing
      * go to menus, File --> Log --> Begin, and save as .log
      * or use the icon in the toolbar
   * create/open a dataset:
      * use =input x y= --> enter data --> =end=
      * open the *Data Editor* --> enter data
      * use the menus or the =infile= command to import a data file
   * explore the dataset: *Data Browser* and *Data Editor*
   * basic commands:
      * =list=, =codebook=, =describe=, =summarize=
      * =set memory=
   * graphs:
      * use menus
      * use commands: a very good summary of Stata commands can be found in [WD]
   * save and exit Stata: menus --> File --> Save (or Save As), then File --> Exit
   * getting help: syntax
      * example: graph box _fev1_, over(_respsymptoms_)
      * *varlist* and *options*: a comma must separate the varlist (here _fev1_, plus any =if= or =in= qualifiers) from the first option (here over(_respsymptoms_))
      * *command prefix*: precedes the command and is separated from it by a colon, e.g. =by group: egen avg = mean(dbp)= (the data must first be sorted by =group=; =bysort group:= sorts and runs the command in one step)
   * abbreviations: the minimum abbreviation is underlined in the Stata reference manuals and in Help
   * do file: lets you rerun previous analyses
      * go to menus, File --> Do, and save as .do
      * or use the icon in the toolbar
      * or save the contents of the *Review* window as a .do file

---++++ [EMS] Chapter 3 Displaying the data
*Frequencies (categorical variables)*: *%RED% need to know* %BLACK%
   * Table 3.1 [[%ATTACHURL%/delivery.dta][Stata data format]] [[%ATTACHURL%/delivery.txt][ASCII format]]
   * label the data: =label data "The method of delivery recorded for 600 births in a hospital"=
   * make notes: first note =notes: "Data from EMS Table 3.1"=; second note =notes: "edited on Jan. 15, 2007"= <!-- delete notes =notes drop _dta= * notes on a variable: =notes delivery: "category of delivery method"= -->
   * define value labels: =label define deliverylab 1 "Normal" 2 "Forceps" 3 "Caesarean section"=
   * attach the value labels to the variable: =label values delivery deliverylab=
   * generate a frequency table: =tabulate delivery=
   * Fig. 3.1 Bar chart: either type the counts into a new, empty dataset (=clear=, =input Normal Forceps Caesarean=, =478 65 57=, =end=, then =graph hbar Normal Forceps Caesarean=), or, with the delivery data loaded, =gen y = 1= then =graph hbar (count) y, over(delivery)=
   * Fig. 3.2 Pie chart (delivery data, using =y= from above): =graph pie y, over(delivery)=
*Frequency distributions (numerical variables)*:
   * Table 3.2 [[%ATTACHURL%/haemoglobin.txt][ASCII format]]
      * =infile id hemo using "C:\Teaching\IGP\data\haemoglobin.txt", clear= (change the path to the folder where you saved the file)
   * Table 3.2 (b):
      * =egen hemocat = cut(hemo), at(8, 9, 10, 11, 12, 13, 14, 15, 16)=, or equivalently =egen hemocat = cut(hemo), at(8(1)16)=
      * =tabulate hemocat=
   * =stem hemo, lines(1)=
   * Fig. 3.3 Histogram: =histogram hemo, width(1) start(8) frequency xtitle("Haemoglobin level (g/100ml)")=
*%RED% need to know* %BLACK%

*Shapes of frequency distributions*: [[LeenaRNotes][R notes for classes]]

*Cumulative frequency distributions, quantiles and percentiles*: *%RED% need to know* %BLACK%
   * Fig. 3.8 Boxplot: =graph box hemo=
   * =codebook hemo= and =summarize hemo= (=summarize hemo, detail= also lists the percentiles)

*Displaying the association between two variables*: *%RED% need to know* %BLACK%
   * Table 3.4 [[%ATTACHURL%/water.dta][Stata data format]]: =tabulate village source [weight=freq]=; add the options =row= and =col= for row and column percentages
   * Fig. 3.9 - 3.12: Peru lung study data, which can be obtained from the [[http://www.blackwellpublishing.com/essentialmedstats/datasets.htm][EMS official web site]] under "perulung_ems".
      * Fig. 3.9 Scatter plot: =twoway (scatter fev1 age), ylabel(0(1)3) ytick(0 1 2 3) ymtick(0(0.5)3) ytitle("FEV1 (litres)")=
      * Fig. 3.10 Scatter plot: =twoway (scatter fev1 respsymptoms)=
      * Fig. 3.11 Scatter plot with jitter: =twoway (scatter fev1 respsymptoms, jitter(10))=
      * Fig. 3.12 Box and whiskers plot: =graph box fev1, over(respsymptoms)=
      * another way: =dotplot fev1, over(respsymptoms) median center=

*Displaying time trends*:
   * Fig. 3.13 [[%ATTACHURL%/timetrend.dta][Time Trend data Stata format]] and [[%ATTACHURL%/timetrend.xls][Time Trend data Excel format]]: bonus point for HW1

---++++ [EMS] Chapter 4 Means, standard deviations and standard errors
   * Calculating means and standard deviations: [[%ATTACHURL%/plasmaVolume.dta][Plasma volume data]]
<verbatim>
* load the data first, e.g. use plasmaVolume.dta, clear
* mean: stored in a new variable, then displayed
egen meanvol = mean(volume)
display meanvol
* deviations from the mean and their squares
gen dev = volume - meanvol
gen dev2 = dev^2
* sum of squared deviations via the shortcut formula: sum(x^2) - (sum x)^2/n
gen vol2 = volume^2
egen volsum = total(volume)
egen vol2sum = total(vol2)
display vol2sum - volsum^2/_N
* the same sum of squares computed directly from the squared deviations
egen dev2sum = total(dev2)
di _N
di dev2sum
* standard deviation: sqrt(sum of squared deviations / (n - 1))
di sqrt(dev2sum/(_N-1))
* check against Stata's built-in commands
summarize volume
* note: collapse replaces the data in memory with the summary statistics
collapse (mean) mean_vol=volume (sd) sd_volume=volume
list mean_vol sd_volume
</verbatim>
   * Sampling variation and standard errors:
      * Example 4.4 [[LeenaRNotes][R notes for classes]]
      * %T% Read, read and read [EMS] page 41

---++++ [EMS] Chapter 5 The normal distribution
   * Normal distributions and standard normal distributions: [[LeenaRNotes][R notes for classes]]
   * Calculating the area under the curve (AUC) of the normal distribution and finding percentage points (z-scores) of the normal distribution
<verbatim>
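* A minimal sketch of the step that comes before normal(): a value x on the original
* scale is converted to a z-score, z = (x - mu)/sigma. Assumptions: mu = 171.5 and
* sigma = 6.5 are the values used in the tail-area example below; x = 180 is a
* hypothetical value chosen only for illustration.
* z-score of x = 180
di (180 - 171.5)/6.5
* proportion of the distribution below x = 180
di normal((180 - 171.5)/6.5)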
help density functions
* AUC of the standard normal density function
* probability (area) below the specified z-score
di normal(1.31)
* AUC in the upper tail, above z = 1.31
di 1-normal(1.31)
* AUC in the upper tail above z = 1.77; by symmetry this equals the lower-tail area below z = -1.77
di 1-normal(1.77)
* AUC between two z values
di normal(0.54) - normal(-1)
* value corresponding to a specified tail area: mu + z*sigma
* clear: start from an empty dataset so that input creates a single observation
clear
input mu sigma z
171.5 6.5 1.64
end
di mu + z*sigma
drop mu sigma z
* percentage points of the standard normal: the z value with the given proportion below it
di invnormal(.95)
di invnormal(.975)
</verbatim>

---++++ [EMS] Chapter 6 Confidence interval for a mean
   * Section 6.2 Large sample case (normal distribution): Example 6.1
<verbatim>
clear
input mu sd n
24.2 5.9 100
end
* two-sided 5% percentage point: z = invnormal(.975), about 1.96
gen z = invnormal(.975)
gen se = sd/sqrt(n)
gen l_ci = mu - z*se
gen u_ci = mu + z*se
list
* immediate command cii #obs #mean #sd takes numbers, not variable names
* (Stata 14 and later: cii means 100 24.2 5.9); cii uses the t distribution,
* so its limits are slightly wider than the z-based limits above
cii 100 24.2 5.9
drop mu-u_ci
* 10% and 1% percentage points: invnormal(.95); invnormal(.995)
</verbatim>
   * Section 6.3 Interpretation of confidence interval: [[LeenaRNotes][R notes for classes]]
   * Section 6.4 Smaller samples:
      * [[LeenaRNotes][R notes for normal vs. _t_ distribution]]
      * Confidence interval using the _t_ distribution:
<verbatim>
* invttail(df, p) returns the t value with upper-tail area p on df degrees of freedom;
* for a sample of size n, df = n - 1
clear
input mu sd n
24.2 5.9 7
end
gen t = invttail(n - 1, .025)
gen se = sd/sqrt(n)
gen l_ci = mu - t*se
gen u_ci = mu + t*se
list
* check: cii 7 24.2 5.9 (Stata 14 and later: cii means 7 24.2 5.9)
</verbatim>

---++++ [EMS] Chapter 7 Comparison of two means: confidence intervals, hypothesis tests and p-values
   * Section 7.4 [[%ATTACHURL%/birthweight.dta][Table 7.2 data]]
   * Section 7.6 [[%ATTACHURL%/sleepingdrug.dta][Table 7.3 data]]
   * [[%ATTACHURL%/chapter7.two.sample.means.CI.do][Chapter 7 Stata do file]]
   * [[%ATTACHURL%/chapter7HW1help.do][Stata do file to help with the Chapter 7 part of HW1]]
*%RED% need to know* %BLACK%

---++++ [EMS] Chapter 9 Analysis of variance
   * Section 9.2 [[%ATTACHURL%/hemoANOVA.dta][Table 9.1 data]]
   * [[%ATTACHURL%/chapter9anova.do][Chapter 9 ANOVA Stata do file]]
*%RED% need to know* %BLACK%

---++++ [EMS] Chapter 10 Linear regression and correlation
   * Section 10.2 [[%ATTACHURL%/plasmaVolume.dta][Table 10.1 data]]
   * [[%ATTACHURL%/chapter10simple.do][Chapter 10 Stata do file]]
*%RED% need to know* %BLACK%

---++++ [EMS] Chapter 11 Multiple regression
   * Use the following two data sets: [[%ATTACHURL%/perulung.dta][Peru lung data]] and [[%ATTACHURL%/hemoANOVA.dta][Table 9.1 data]]
   * [[%ATTACHURL%/chapter11multiple.do][Chapter 11 Stata do file]]
*%RED% need to know* %BLACK%

---++++ [EMS] Chapter 12 Goodness of fit and regression diagnostics
   * Use the following two data sets: [[%ATTACHURL%/haemoglobin.dta][Haemoglobin data in Table 3.2]] and [[%ATTACHURL%/cookD.dta][Table 12.2 data]]
   * [[%ATTACHURL%/chapter12diagnostics.do][Chapter 12 Stata do file]]
*%RED% need to know* %BLACK%

---++++ [EMS] Chapter 13 Transformation
   * Use the following data set: [[%ATTACHURL%/betaTG.dta][beta-TG data in Table 13.1]]
   * [[%ATTACHURL%/chapter13transformation.do][Chapter 13 Stata do file]]
*%RED% need to know* %BLACK%

---++++ [EMS] Chapter 16-17
   * Use the following data sets: [[%ATTACHURL%/ex1chapter16.dta][Example 1]] and [[%ATTACHURL%/ex2chapter17.dta][Example 3]]
   * [[%ATTACHURL%/chapter16_17chisqaure.do][Chapter 16-17 Stata do file]]
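
A minimal sketch of a chi-squared test for a 2x2 table, using Stata's immediate command =tabi=. This command takes the cell counts typed after it rather than variables in a dataset. The counts (20 and 80 in row 1, 35 and 65 in row 2) are hypothetical and chosen only for illustration; they are not the Example 1 or Example 3 data. The scalar names are also our own. The hand calculation uses the shortcut formula for the Pearson statistic of a 2x2 table, chi2 = N(ad - bc)^2 / (r1 r2 c1 c2). Here a, b, c, d are the four cell counts, r1 and r2 are the row totals, and c1 and c2 are the column totals. The result should agree with the Pearson chi2 line of the =tabi= output.
<verbatim>
* clear: start from an empty dataset so the scalar names below cannot be
* mistaken for (abbreviations of) variable names
clear
* hypothetical 2x2 table, NOT taken from the EMS examples
* options: row = row percentages, expected = expected counts,
*          chi2 = Pearson chi-squared test, exact = Fisher's exact test
tabi 20 80 \ 35 65, row expected chi2 exact
* check the Pearson statistic by hand: N(ad - bc)^2 / (r1*r2*c1*c2)
scalar n11 = 20
scalar n12 = 80
scalar n21 = 35
scalar n22 = 65
scalar ntot = n11 + n12 + n21 + n22
scalar chi2_hand = ntot*(n11*n22 - n12*n21)^2 / ((n11+n12)*(n21+n22)*(n11+n21)*(n12+n22))
display "Pearson chi2 = " chi2_hand
* p-value: upper-tail area of the chi-squared distribution with 1 d.f.
display "p-value = " chi2tail(1, chi2_hand)
</verbatim>

Because =tabi= works from typed counts, it is handy for checking a table printed in a book. With individual-level data, the same test is =tabulate rowvar colvar, chi2=, where rowvar and colvar are placeholder names.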
*%RED% need to know* %BLACK% </span></font>
Topic revision: r34 - 23 Feb 2007, LeenaChoi
Copyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.