**********************************
* Import data and only keep 1995 *
**********************************
clear
set memory 1g
use "http://biostat.mc.vanderbilt.edu/wiki/pub/Main/CourseBios312/salary.dta"
table year
keep if year==95

**************************************************
* Convert string variables to numeric indicators *
**************************************************
* Make a male indicator variable; Female is reference group
gen male=.
replace male=1 if sex=="M"
replace male=0 if sex=="F"
table sex male

************************************
* Salary by Sex: Unadjusted models *
************************************
graph box salary, by(sex)
regress salary male

* Fitted values and residuals
predict fitted, xb
predict epsilon, resid
list salary sex male fitted epsilon if _n <=5
sdtest epsilon, by(sex)

* Residuals versus fitted (predicted) values
rvfplot, jitter(20)
graph box epsilon, by(sex)

* Robust standard errors
regress salary male
regress salary male, robust
ttest salary, by(male) unequal

* Log-transformed salary
gen lnsalary = log(salary)
regress lnsalary male, robust
lincom male, eform

********************************
* Salary versus year of degree *
********************************
scatter salary yrdeg || lfit salary yrdeg
lowess salary yrdeg, bwidth(.2) addplot((lfit salary yrdeg))
pwcorr salary yrdeg, sig
regress salary yrdeg

* Fitted values and residuals
predict fitted2, xb
predict epsilon2, resid
list salary yrdeg fitted2 epsilon2 if _n <=5
summ epsilon2

* Residuals versus fitted (predicted) values
rvfplot, yline(10)

* Residuals versus predictor yrdeg
rvpplot yrdeg, yline(10)

* Classical regression and regression with robust standard errors
regress salary yrdeg
regress salary yrdeg, robust
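
********************************************************
* Side-by-side comparison of classical and robust fits *
********************************************************
* A minimal sketch, assuming the 1995 salary data are still in memory.
* The stored-estimate names classical and robust_se are illustrative.
quietly regress salary yrdeg
estimates store classical
quietly regress salary yrdeg, robust
estimates store robust_se
* The coefficients are identical; only the standard errors differ
estimates table classical robust_se, b(%9.3f) se(%9.3f)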