version 9.2
log using C:\Teaching\IGP\log\12.diagnostics.log, replace

* Chapter 12: Goodness of fit and regression diagnostics
* modified on Feb. 11, 2007 by Leena Choi

***************************************************************************************
* Section 12.2
***************************************************************************************

*** hemoglobin data in Table 3.2
use "C:\Teaching\IGP\data\haemoglobin.dta", clear

* histogram with a superimposed normal curve
hist hemo, normal

* inverse normal (normal quantile) plot
qnorm hemo

* check skewness and kurtosis
summarize hemo, detail

* Shapiro-Wilk test for normality
swilk hemo

***************************************************************************************
* Section 12.3
***************************************************************************************

use "C:\Teaching\IGP\data\EMS\perulung.dta", clear
regress fev1 age height male

* get residuals from the previous regression
predict r, residual

* histogram of residuals with normal curve
hist r, normal

* inverse normal plot of residuals
qnorm r

* Shapiro-Wilk test for residuals
swilk r

* scatter plot of residuals against fitted values
predict yhat, xb
scatter r yhat, yline(0)

* the same plot can be drawn without first calculating residuals and
* fitted values, using the rvfplot command
rvfplot, yline(0)

* to get standardized residuals, use:
predict rs, rstandard

* Fig. 12.4, Table 12.2
use "C:\Teaching\IGP\data\cookD.dta", clear
regress y x
scatter y x || lfit y x
predict r, residual

* influence statistics: calculate Cook's distance (D)
predict cook, cooksd

* how big is big? a common rule of thumb flags observations with Cook's D > 4/n
list id y x cook r

* sensitivity analysis: compare the fit with and without observation id 10
regress y x
regress y x if id!=10

log close
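
* ---------------------------------------------------------------------------
* Supplementary sketch (not part of the original handout): apply the
* rule-of-thumb cut-off Cook's D > 4/n to the cookD.dta example, listing
* only the observations that exceed it.
* Assumptions: the data path used above is still valid, and n is taken as
* the number of observations used in the fit, e(N). The log is reopened
* with -append- so that this output lands in the same log file.
* ---------------------------------------------------------------------------
log using C:\Teaching\IGP\log\12.diagnostics.log, append
use "C:\Teaching\IGP\data\cookD.dta", clear
quietly regress y x

* Cook's distance for each observation in the estimation sample
predict cook, cooksd

* cut-off 4/n, with n = number of observations used in the regression
local cutoff = 4/e(N)
display "Cook's D cut-off 4/n = " %6.4f `cutoff'

* observations above the cut-off; missing values are excluded explicitly
* because Stata treats missing as larger than any number
list id y x cook if cook > `cutoff' & !missing(cook)
log close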