********************************************************************************** *** This program samples observations from FILEPATH and returns the mean and count *** of VAR for REPNUM samples of size SAMPSIZE ********************************************************************************** **** Change these to try on other datasets/variables/sample sizes/number of samples global FILEPATH "Y:\Slides\Courses\IntroBiostat\Datasets\EMS\perulung_ems.dta" global VAR "fev1" global SAMPSIZE 40 local REPNUM=1000 *** NO NEED TO CHANGE ANYTHING BELOW UNLESS YOU KNWO WHAT YOU ARE DOING *** *** Obtain the true population mean and SD use $FILEPATH, clear summarize $VAR local realmean=r(mean) local realsd=r(sd) *** Define program mymean capture program drop mymean program define mymean, rclass syntax [varname] use $FILEPATH, clear sample $SAMPSIZE, count return local varname 'varlist' quietly { summarize $VAR return scalar mean = r(mean) return scalar sd = r(sd) return scalar N = r(N) } end *** Start simulation *** simulate "mymean $VAR" mean=r(mean) sd=r(sd) N=r(N), reps(`REPNUM') dots *** Summarize simulation results *** label variable mean "Mean" label variable sd "SD" label variable N "Number of observations drawn for each sampling replication" histogram mean, frequency xlabel(1.2(.1)2) ylabel(#10) /// ytitle(, margin(small)) xtitle(, alignment(middle) margin(medsmall)) /// title(Samping Variation) normal name(Histogram, replace) summarize mean, detail *** 95% CIs *** gen lower=mean - invttail($SAMPSIZE-1, 0.025) * sd / sqrt($SAMPSIZE) gen upper=mean + invttail($SAMPSIZE-1, 0.025) * sd / sqrt($SAMPSIZE) scatter sd mean if (lower <= `realmean' & upper >= `realmean'), /// xline(`realmean') yline(`realsd') name(MEANvsSD, replace) /// legend(label(1 "inside") label(2 "outside")) || /// scatter sd mean if (lower > `realmean' | upper < `realmean'), mcolor(red) **********************************************************************************