version 9.2
log using C:\Teaching\IGP\log\10.simple.log, replace

* Chapter 10 Linear regression and correlation
* modified on Feb. 04, 2007 by Leena Choi

* data from [EMS] Table 10.1
use "C:\Teaching\IGP\data\plasmaVolume.dta", clear

* label the variables
label variable weight "Body weight (kg)"
label variable volume "Plasma volume (liters)"

* do exploratory analysis first
* draw a scatter plot of volume vs. weight to check for a linear relationship
scatter volume weight, xlabel(55(5)75) ylabel(2.5(.5)3.5)

* make the y-axis labels horizontal, and change the default xtitle
scatter volume weight, xlabel(55(5)75) ylabel(2.5(.5)3.5, angle(0)) xtitle("x, Body weight (kg)")

* regress volume on weight
regress volume weight

* center weight at its mean
egen weightavg = mean(weight)
gen weightcen = weight - weightavg

* regress volume on centered weight
regress volume weightcen

* compare the intercept with the mean of volume in the summary of the variables
summarize weight volume

* calculate F-value using the MS : F = MSmodel/MSresidual
display .390684335/.047877614

* calculate r-squared using the SS : r-squared = SSmodel/SStotal
display .390684335/.677950016

* get predicted values of volume
predict yhat, xb

* draw the scatter plot with the fitted line (two different ways of making the same plot)
scatter volume weight || line yhat weight, xlabel(55(5)75) ylabel(2.5(.5)3.5, angle(0))
twoway (scatter volume weight) (line yhat weight), xlabel(55(5)75) ylabel(2.5(.5)3.5, angle(0))

* the same graph can be generated without calculating yhat
* use lfit instead of line
scatter volume weight || lfit volume weight, xlabel(55(5)75) ylabel(2.5(.5)3.5, angle(0))

* add 95% confidence interval bands
* use twoway lfitci
* compare the following two plots (the later plot is drawn on top of the earlier one)
twoway lfitci volume weight || scatter volume weight, xlabel(55(5)75) ylabel(2.5(.5)3.5, angle(0))
twoway scatter volume weight || lfitci volume weight, xlabel(55(5)75) ylabel(2.5(.5)3.5, angle(0))

* add 95% prediction interval bands for the response of a new subject
* use twoway lfitci and option stdf
twoway lfitci volume weight, stdf || scatter volume weight, xlabel(55(5)75) ylabel(2.5(.5)3.5, angle(0))

* ciplot(rline) causes the interval to be drawn as two lines (range lines) rather than a shaded region
* color(gray) sets the line color to gray, and lpattern(dash) sets the line pattern to dashed
* legend(off) turns the legend off
twoway lfitci volume weight, stdf ciplot(rline) color(gray) lpattern(dash) ///
    || lfitci volume weight ///
    || scatter volume weight ///
    , xlabel(55(5)75) ylabel(2.5(.5)3.5, angle(0)) legend(off)

log close
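
* Optional sketch, not part of the original session: the F-value and r-squared
* that were computed above from typed-in numbers can also be obtained from the
* results that regress stores in e(). This assumes plasmaVolume.dta is still in
* memory; run these lines before the log is closed if the output should be logged.
quietly regress volume weight

* F = MSmodel/MSresidual, using stored sums of squares and degrees of freedom
display (e(mss)/e(df_m)) / (e(rss)/e(df_r))

* r-squared = SSmodel/SStotal, where SStotal = SSmodel + SSresidual
display e(mss)/(e(mss) + e(rss))

* the same two quantities as stored directly by regress
display e(F)
display e(r2)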