
Vanderbilt Biostatistics Wiki > Main Web > Education > IntroBiostatCourse2007 > LeenaStataNotes (23 Feb 2007, LeenaChoi)
<font size="3"><span style="font-family: times new roman,times,serif;">
---+++!! Stata Notes for Classes
   * Selected Stata code for the classes is listed below.
   * [EMS] refers to [[http://www.blackwellpublishing.com/essentialmedstats][Essential Medical Statistics]].
   * Some of the materials were copied or modified from the Biostatistics course materials of the M.P.H. program at Vanderbilt University and from the course textbook, "Statistical Modeling for Biomedical Researchers", 2nd Ed., _in press_, by William Dupont. [WD] refers to William Dupont's book.
<!---   * Notes for Biostatistics II, M.P.H. program, instructed by Patrick Arbogast --->
%TOC%

---++++ Some basics for Stata *%RED% need to know*  %BLACK% 
   * four windows: *Results* , *Command* , *Review* , *Variables*
   * command line interface
   * pulldown menus
   * log file: keeps track of what you are doing
      * go to menus, File --> Log --> Begin, save as .log
      * use icon in the tool bar
   * create/open a dataset:
      * use =input x y= --> enter data --> =end=
      * open *Data Editor* --> enter data
      * use the menus or the =infile= command to import a data file
   * explore the dataset: *Data Browser* and *Data Editor*
   * basic commands:
      * =list=, =codebook=, =describe=, =summarize=
      * =set memory=
   * graphs:
      * use menus
      * use commands: a very good summary of Stata commands can be found in [WD]
   * exit Stata: save the data first (menus --> File --> Save or Save As), then choose File --> Exit
   * getting help: syntax
      * example: graph box _fev1_, over(_respsymptoms_)
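      * *varlist/qualifiers* and *options*: a comma must separate the main part of the command (here the variable _fev1_; =if= and =in= qualifiers also go before the comma) from the first option (over(_respsymptoms_))
      * *command prefix*: precedes the command and is separated from it by a colon, e.g. =bysort group: egen avg = mean(dbp)= (=by group:= also works, but only if the data are already sorted by =group=)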
      * abbreviations: the minimum abbreviation is underlined in Stata reference manuals or Help
   * do file: rerun previous analyses
      * go to menus, File --> Do --> save as .do
      * use icon in the tool bar
      * save the contents of the *Review* window as a .do file
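The following is a minimal do-file sketch that ties these basics together. It opens a log and loads Stata's bundled =auto= dataset, which is a stand-in and not a class dataset. It places an =if= qualifier before the comma and the options after it, and it uses the =bysort= prefix. The file names basics.do and basics.log are hypothetical.

<verbatim>
* sketch of a do-file; save it as e.g. basics.do (hypothetical name) and run: do basics.do
* close any log that is already open, then start a new plain-text log
capture log close
log using basics.log, replace text
* stand-in data: Stata's bundled auto dataset (not a class dataset)
sysuse auto, clear
describe
summarize price mpg
* varlist, then the if qualifier, then a comma, then the options
summarize price if foreign == 1, detail
* command prefix: bysort sorts by foreign, then egen runs within each group
bysort foreign: egen avgmpg = mean(mpg)
list make foreign mpg avgmpg in 1/5
log close
</verbatim>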
---++++ [EMS] Chapter 3 Displaying the data
*Frequencies (categorical variables)*:  *%RED% need to know*  %BLACK%
   * Table 3.1 [[%ATTACHURL%/delivery.dta][Stata data format]] [[%ATTACHURL%/delivery.txt][ASCII format]]
   * label data: =label data "The method of delivery recorded for 600 births in a hospital"=
   * add notes: first note =notes: "Data from EMS Table 3.1"= ; second note =notes: "Edited on Jan. 15, 2007"=   
    <!-- delete notes =notes drop _dta= * notes on a variable: =notes delivery: "category of delivery method"= -->
   * define label: =label define deliverylab 1 "Normal" 2 "Forceps" 3 "Caesarean section"=
   * put label: =label values delivery deliverylab=
   * generate table: =tabulate delivery=
   * Fig. 3.1 Bar chart: either type the counts into a new, empty dataset (=input Normal Forceps Caesarean=, =478 65 57=, =end=, then =graph hbar Normal Forceps Caesarean=), or, with the delivery data in memory, use =gen y = 1= and then =graph hbar (count) y, over(delivery)=
   * Fig. 3.2 Pie chart: =graph pie y, over(delivery)=
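When only the category counts are available, as in Table 3.1, you can reproduce the same table and charts with one row per category plus a frequency weight. This sketch assumes the codes 1 to 3 and the value label =deliverylab= defined above. The variable name =freq= is hypothetical.

<verbatim>
* Table 3.1 as counts: one row per delivery method, freq = number of births
clear
input delivery freq
1 478
2 65
3 57
end
label define deliverylab 1 "Normal" 2 "Forceps" 3 "Caesarean section", replace
label values delivery deliverylab
* fweight counts each row freq times (600 births in total)
tabulate delivery [fweight=freq]
* one row per category, so the sum of freq within a category is its count
graph hbar (sum) freq, over(delivery)
graph pie freq, over(delivery)
</verbatim>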
*Frequency distributions (numerical variables)*:
   * Table 3.2 [[%ATTACHURL%/haemoglobin.txt][ASCII format]]
   * =infile id hemo using "C:\Teaching\IGP\data\haemoglobin.txt", clear=
   * Table 3.2 (b):
      * =egen hemocat = cut(hemo), at(8, 9, 10, 11, 12, 13, 14, 15, 16)=, or =egen hemocat = cut(hemo), at(8(1)16)=
      * =tabulate hemocat=
      * =stem hemo, lines(1)=
   * Fig. 3.3 Histogram: =histogram hemo, width(1) start(8) frequency xtitle("Haemoglobin level (g/100ml)")= *%RED% need to know*  %BLACK%
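=egen ... cut()= codes each value by the lower limit of its interval. Each interval includes its lower limit but not its upper one, and values outside the range given in =at()= become missing. =histogram= with a matching =width()= and =start()= draws the same grouping. This sketch uses =mpg= from Stata's bundled =auto= data (range 12 to 41) as a stand-in for the haemoglobin data. The break points are chosen only for illustration.

<verbatim>
* stand-in data: Stata's bundled auto dataset (not the haemoglobin data)
sysuse auto, clear
* intervals [10,15), [15,20), ..., [40,45), coded by their lower limits
egen mpgcat = cut(mpg), at(10(5)45)
tabulate mpgcat
* histogram with the same bins: width 5, starting at 10
histogram mpg, width(5) start(10) frequency xtitle("Mileage (mpg)")
</verbatim>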
*Shapes of frequency distributions* [[LeenaRNotes][R notes for classes]] 

*Cumulative frequency distributions, quantiles and percentiles*:  *%RED% need to know*  %BLACK%
   * Fig. 3.8 Boxplot: =graph box hemo=
   * =codebook hemo=, and =summarize hemo=
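Three commands cover percentiles and cumulative distributions:
   * =summarize= shows percentiles only when you add the =detail= option.
   * =centile= reports chosen percentiles with confidence intervals. Its default percentile definition may give slightly different values from =summarize=.
   * =cumul= builds the cumulative distribution.

This sketch uses Stata's bundled =auto= data as a stand-in for the haemoglobin data. The new variable and scalar names are hypothetical.

<verbatim>
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear
* detail adds percentiles (1, 5, 10, 25, 50, 75, 90, 95, 99) to the output
summarize mpg, detail
scalar q1 = r(p25)
scalar med = r(p50)
scalar q3 = r(p75)
display "median = " scalar(med) ", interquartile range = " scalar(q3) - scalar(q1)
* chosen percentiles with confidence intervals
centile mpg, centile(25 50 75)
* cumulative relative frequency, plotted as a cumulative distribution curve
cumul mpg, generate(cummpg)
line cummpg mpg, sort
</verbatim>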
*Displaying the association between two variables*:  *%RED% need to know*  %BLACK%
   * Table 3.4 [[%ATTACHURL%/water.dta][Stata data format]]: =tabulate village source [fweight=freq]=; add the =row= and =col= options for row and column percentages
   * Fig. 3.9 - 3.12 use the Peru lung study data, which can be obtained from the [[http://www.blackwellpublishing.com/essentialmedstats/datasets.htm][EMS official web site]] under "perulung_ems".
   * Fig. 3.9 Scatter plots: =twoway (scatter fev1 age), ylabel(0(1)3) ytick(0 1 2 3) ymtick(0(0.5)3) ytitle("FEV1 (litres)")=
   * Fig. 3.10 Scatter plots: =twoway (scatter fev1 respsymptoms)=
   * Fig. 3.11 Scatter plots: =twoway (scatter fev1 respsymptoms, jitter(10))=
   * Fig. 3.12 Box and whiskers plots: =graph box fev1, over(respsymptoms)=
   * another way: use =dotplot fev1, over(respsymptoms) median center=
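A two-way table leaves out observations with a missing value in either variable unless you give the =missing= option. The =row= and =col= percentages therefore refer only to complete pairs. This sketch uses Stata's bundled =auto= data, where =rep78= has missing values. It is a stand-in, not the water or Peru lung data.

<verbatim>
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear
* two-way table with row and column percentages (complete pairs only)
tabulate foreign rep78, row col
* missing treats missing rep78 as a separate category
tabulate foreign rep78, missing
* numerical variable against a categorical one, jittered to separate tied points
twoway (scatter mpg rep78, jitter(5))
* box plots of mpg for each repair-record group
graph box mpg, over(rep78)
</verbatim>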
*Displaying time trends*:
   * Fig. 3.13 [[%ATTACHURL%/timetrend.dta][Time Trend data Stata format]] and  [[%ATTACHURL%/timetrend.xls][Time Trend data Excel format]]: bonus point for HW1

---++++ [EMS] Chapter 4 Means, standard deviations and standard errors
   * Calculating means and standard deviations: [[%ATTACHURL%/plasmaVolume.dta][Plasma volume data]] 
<verbatim>
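* mean of plasma volume, stored in every observation of meanvol
egen meanvol = mean(volume)
display meanvol
* deviations from the mean and squared deviations
gen dev = volume - meanvol
gen dev2 = dev^2
* squared observations
gen vol2 = volume^2

* sums of the observations and of their squares
egen volsum= total(volume)
egen vol2sum= total(vol2)

* sum of squared deviations by the shortcut formula: sum(x^2) - (sum x)^2/n
display vol2sum - volsum^2/_N

* sum of squared deviations computed directly
egen dev2sum = total(dev2)

* n, sum of squared deviations, and SD = sqrt(sum of squared deviations/(n-1))
di _N
di dev2sum
di sqrt(dev2sum/(_N-1))

* the same mean and SD from built-in commands
summarize volume

* collapse replaces the data in memory with the summary statistics;
* preserve and restore bring the original data back afterwards
preserve
collapse (mean) mean_vol=volume (sd) sd_volume=volume
list mean_vol sd_volume
restore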
</verbatim>
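The shortcut formula used above, sum(x^2) - (sum x)^2/n, gives the same sum of squared deviations as computing the deviations directly. sqrt(SS/(n - 1)) then reproduces the standard deviation reported by =summarize=. This check is sketched on Stata's bundled =auto= data in case the plasma volume file is not at hand. The scalar and variable names are hypothetical.

<verbatim>
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear
* n, mean, SD and sum stored by summarize in r()
quietly summarize mpg
scalar nobs = r(N)
scalar xbar = r(mean)
scalar sd_mpg = r(sd)
scalar sumx = r(sum)
* direct sum of squared deviations: sum (x - xbar)^2
generate double dev2 = (mpg - scalar(xbar))^2
quietly summarize dev2
scalar ss_direct = r(sum)
* shortcut formula: sum x^2 - (sum x)^2/n
generate double x2 = mpg^2
quietly summarize x2
scalar ss_short = r(sum) - scalar(sumx)^2/scalar(nobs)
* s = sqrt(SS/(n - 1)) should agree with the SD from summarize
display "SS direct = " scalar(ss_direct) ", SS shortcut = " scalar(ss_short)
display "s = " sqrt(scalar(ss_direct)/(scalar(nobs) - 1)) ", summarize SD = " scalar(sd_mpg)
</verbatim>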
   * Sampling variations and standard errors:
      * Example 4.4 [[LeenaRNotes][R notes for classes]]
      * %T% Read, Read and Read [EMS] page 41
---++++ [EMS] Chapter 5 The normal distribution
   * Normal distributions and standard normal distributions: [[LeenaRNotes][R notes for classes]]
   * Calculating the area under the normal curve and finding percentage points (z values) of the normal distribution
<verbatim>
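help density functions
* AUC of the standard normal density function:
* normal(z) returns the area below (to the left of) z
di normal(1.31)
* AUC in the upper tail of the distribution (area above z = 1.31)
di 1-normal(1.31)
* AUC in the lower tail of the distribution (area below z = -1.77);
* by symmetry this equals the upper-tail area above z = 1.77
di normal(-1.77)
di 1-normal(1.77)
* AUC between two z values (area between z = -1 and z = 0.54)
di normal(0.54) - normal(-1)
* value corresponding to a specified tail area: x = mu + z*sigma
* (z = 1.64 cuts off approximately the upper 5%)
input mu sigma z
171.5 6.5 1.64
end
di mu + z*sigma
drop mu sigma z
* percentage points of the standard normal distribution
* (z value with the given area below it)
di invnormal(.95)
di invnormal(.975)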
</verbatim>
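=normal()= and =invnormal()= are inverse functions, and the standard normal curve is symmetric about 0. A few one-line checks therefore confirm the results above. The last line uses the unrounded upper 5% point in place of 1.64 for the example with mean 171.5 and SD 6.5.

<verbatim>
* normal() and invnormal() undo each other
display invnormal(normal(1.31))
* symmetry about 0: area below -z equals area above z
display normal(-1.77)
display 1 - normal(1.77)
* area within 1.96 SDs of the mean, about 0.95
display normal(1.96) - normal(-1.96)
* x with 5% of the distribution above it, using the unrounded z instead of 1.64
display 171.5 + invnormal(.95)*6.5
</verbatim>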
---++++ [EMS] Chapter 6 Confidence interval for a mean
   * Section 6.2 Large sample case (normal distribution): Example 6.1 
<verbatim>
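* Example 6.1: sample mean, SD and size
input mu sd n
24.2 5.9 100
end
* z value for a two-sided 95% CI (upper 2.5% percentage point, about 1.96)
gen z = invnormal(.975)
gen se = sd/sqrt(n)
gen l_ci = mu - z*se
gen u_ci = mu + z*se
list
* immediate command takes numbers, not variables: cii #obs #mean #sd
* (Stata 14 and later: cii means #obs #mean #sd); cii uses the t distribution,
* so its limits differ slightly from the z-based limits above
cii 100 24.2 5.9
drop mu-u_ci
* for 90% and 99% CIs use invnormal(.95) and invnormal(.995)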
</verbatim>
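Here is the same calculation at the three usual confidence levels, using the Example 6.1 summary statistics (mean 24.2, SD 5.9, n = 100). The interval widens as the confidence level rises. This is a sketch, and the scalar names are hypothetical.

<verbatim>
* Example 6.1 summary statistics: mean 24.2, SD 5.9, n = 100
scalar xbar61 = 24.2
scalar se61 = 5.9/sqrt(100)
foreach level in 90 95 99 {
    * z cuts off (1 - level/100)/2 in each tail
    scalar z`level' = invnormal(1 - (1 - `level'/100)/2)
    scalar lo`level' = scalar(xbar61) - scalar(z`level')*scalar(se61)
    scalar hi`level' = scalar(xbar61) + scalar(z`level')*scalar(se61)
    display "`level'% CI: z = " %5.3f scalar(z`level') ", " %5.2f scalar(lo`level') " to " %5.2f scalar(hi`level')
}
</verbatim>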
   * Section 6.3 Interpretation of confidence interval: [[LeenaRNotes][R notes for classes]]
   * Section 6.4 Smaller samples:
      * [[LeenaRNotes][R notes for normal vs. _t_ distribution]]
      * Confidence interval using _t_ distributions:
<verbatim>
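* small-sample example: same mean and SD as in Example 6.1, but n = 7
* (clear removes the variables left over from the previous block)
clear
input mu sd n
24.2 5.9 7
end
* invttail(df, p) returns the t value with upper-tail area p;
* for a CI from n observations the degrees of freedom are n - 1
gen t = invttail(n-1, .025)
gen se = sd/sqrt(n)
gen l_ci = mu - t*se
gen u_ci = mu + t*se
list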
</verbatim>
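The two-sided 95% percentage point of the _t_ distribution is larger than 1.96 when the degrees of freedom are small. It approaches 1.96 as the degrees of freedom grow, which is why small-sample intervals are wider. A quick sketch:

<verbatim>
* two-sided 95% t percentage points for increasing degrees of freedom
foreach df in 5 10 30 100 1000 {
    display "df = `df': t = " %6.3f invttail(`df', .025)
}
* the normal distribution value they approach
display "normal: z = " %6.3f invnormal(.975)
</verbatim>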
---++++ [EMS] Chapter 7 Comparison of two means: confidence intervals, hypothesis tests and p-values 
   * Section 7.4 [[%ATTACHURL%/birthweight.dta][Table 7.2 data]]
   * Section 7.6 [[%ATTACHURL%/sleepingdrug.dta][Table 7.3 data]]
   * [[%ATTACHURL%/chapter7.two.sample.means.CI.do][Chapter 7 Stata do file]]
   * [[%ATTACHURL%/chapter7HW1help.do][Stata do file to help with the Chapter 7 part of HW1]] *%RED% need to know*  %BLACK%
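This sketch shows a two-sample comparison of means with =ttest=. It uses Stata's bundled =auto= data (mileage of domestic vs foreign cars) as a stand-in for the birthweight data. The 95% confidence interval for the difference is rebuilt from the results =ttest= stores in =r()=. Equal variances are assumed unless you add the =unequal= option.

<verbatim>
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear
* two-sample t test of mean mpg: group 1 = domestic (foreign == 0), group 2 = foreign
ttest mpg, by(foreign)
* 95% CI for the difference in means (group 1 minus group 2) from stored results
scalar diff = r(mu_1) - r(mu_2)
scalar lo = scalar(diff) - invttail(r(df_t), .025)*r(se)
scalar hi = scalar(diff) + invttail(r(df_t), .025)*r(se)
display "difference = " %6.3f scalar(diff) ", 95% CI " %6.3f scalar(lo) " to " %6.3f scalar(hi)
</verbatim>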
---++++ [EMS] Chapter 9 Analysis of variance
   * Section 9.2 [[%ATTACHURL%/hemoANOVA.dta][Table 9.1 data]]
   * [[%ATTACHURL%/chapter9anova.do][Chapter 9 ANOVA Stata do file]] *%RED% need to know*  %BLACK%
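This sketch shows one-way analysis of variance with =oneway= and =anova=. It uses Stata's bundled =auto= data (mileage across the repair-record groups of =rep78=) as a stand-in for the Table 9.1 data. The F statistic is the between-group mean square divided by the within-group mean square.

<verbatim>
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear
* one-way ANOVA of mpg across rep78 groups, with a table of group means and SDs
oneway mpg rep78, tabulate
* the same model with anova (rep78 treated as categorical)
anova mpg rep78
* F = between-group mean square / within-group mean square
display "F = " %6.3f (e(mss)/e(df_m)) / (e(rss)/e(df_r))
</verbatim>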
---++++ [EMS] Chapter 10 Linear regression and correlation
   * Section 10.2 [[%ATTACHURL%/plasmaVolume.dta][Table 10.1 data]]
   * [[%ATTACHURL%/chapter10simple.do][Chapter 10 Stata do file]] *%RED% need to know*  %BLACK%
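This sketch shows correlation and simple linear regression on Stata's bundled =auto= data (mileage against weight), as a stand-in for the plasma volume data. In simple regression the least-squares slope equals r times (SD of y)/(SD of x), which the last line checks. The scalar and variable names are hypothetical.

<verbatim>
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear
* Pearson correlation between mpg and weight
correlate mpg weight
scalar rho = r(rho)
* least-squares line: mpg = b0 + b1*weight
regress mpg weight
* fitted values and residuals (observed minus fitted)
predict fit, xb
predict res, residuals
* check: slope = r * (SD of mpg) / (SD of weight)
quietly summarize mpg
scalar sd_y = r(sd)
quietly summarize weight
scalar sd_x = r(sd)
display "slope = " _b[weight] ", r*sd_y/sd_x = " scalar(rho)*scalar(sd_y)/scalar(sd_x)
</verbatim>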
---++++ [EMS] Chapter 11 Multiple regression
   * Use the following two data sets [[%ATTACHURL%/perulung.dta][Peru lung data]] and [[%ATTACHURL%/hemoANOVA.dta][Table 9.1 data]]
   * [[%ATTACHURL%/chapter11multiple.do][Chapter 11 Stata do file]] *%RED% need to know*  %BLACK%
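This sketch fits a multiple regression with =regress= and follows it with =test=, a joint F test that several coefficients are zero. Stata's bundled =auto= data stand in for the Peru lung data, and the choice of predictors is only illustrative.

<verbatim>
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear
* multiple regression of mpg on weight, length and foreign (0/1 indicator)
regress mpg weight length foreign
* joint F test that the weight and length coefficients are both zero
test weight length
</verbatim>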
---++++ [EMS] Chapter 12 Goodness of fit and regression diagnostics
   * Use the following two data sets [[%ATTACHURL%/haemoglobin.dta][Haemoglobin data in Table 3.2]] and [[%ATTACHURL%/cookD.dta][Table 12.2 data]]
   * [[%ATTACHURL%/chapter12diagnostics.do][Chapter 12 Stata do file]] *%RED% need to know*  %BLACK%
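This sketch shows common regression diagnostics after =regress=:
   * standardized residuals, leverage and Cook's distance from =predict=;
   * a residual-versus-fitted plot.

Stata's bundled =auto= data stand in for the Table 12.2 data. The 4/n cut-off for Cook's distance is one common rule of thumb, not a fixed rule.

<verbatim>
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear
regress mpg weight
* standardized residuals, leverage and Cook's distance for each observation
predict rstd, rstandard
predict lev, leverage
predict cook, cooksd
* residual-versus-fitted plot to check linearity and constant variance
rvfplot, yline(0)
* observations with Cook's distance above the 4/n rule of thumb
list make mpg weight cook if cook > 4/e(N) & cook < .
</verbatim>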
---++++ [EMS] Chapter 13 Transformation
   * Use the following data set [[%ATTACHURL%/betaTG.dta][beta-TG  data in Table 13.1]]
   * [[%ATTACHURL%/chapter13transformation.do][Chapter 13 Stata do file]] *%RED% need to know*  %BLACK%
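This sketch applies a log transformation to a right-skewed variable. It uses =price= from Stata's bundled =auto= data as a stand-in for the beta-TG data. Back-transforming the mean of the logs gives the geometric mean, which is never larger than the arithmetic mean.

<verbatim>
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear
* price is right-skewed; compare the skewness before and after taking logs
summarize price, detail
generate lnprice = ln(price)
summarize lnprice, detail
histogram lnprice, frequency
* geometric mean = exp(mean of the logs), back in the original units
quietly summarize lnprice
scalar gmean = exp(r(mean))
quietly summarize price
scalar amean = r(mean)
display "geometric mean = " scalar(gmean) ", arithmetic mean = " scalar(amean)
</verbatim>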
---++++ [EMS] Chapters 16-17
   * Use the following data sets [[%ATTACHURL%/ex1chapter16.dta][Example 1]] and [[%ATTACHURL%/ex2chapter17.dta][Example 3]]
   * [[%ATTACHURL%/chapter16_17chisqaure.do][Chapter 16-17 Stata do file]] *%RED% need to know*  %BLACK%
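This sketch shows the chi-squared test for a two-way table in two forms. =tabulate= with the =chi2=, =expected= and =exact= options works on individual records. The immediate command =tabi= takes counts typed in directly, with rows separated by a backslash. The =auto= records, the =highmpg= variable and the counts 10 20 / 30 40 are hypothetical illustrations, not the chapter's examples.

<verbatim>
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear
* hypothetical binary variable for illustration: more than 20 miles per gallon
generate highmpg = (mpg > 20)
* 2x2 table: Pearson chi-squared test, expected counts and Fisher's exact test
tabulate foreign highmpg, chi2 expected exact
scalar x2_stat = r(chi2)
scalar p_x2 = r(p)
* immediate form with hypothetical counts typed in; rows are separated by \
tabi 10 20 \ 30 40, chi2
</verbatim>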

</span></font>
Topic revision: r34 - 23 Feb 2007, LeenaChoi
Copyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.