**********************************
* Import data and only keep 1995 *
**********************************
clear
* set memory is only needed in Stata 11 and earlier; later versions ignore it
set memory 1g
use "http://biostat.mc.vanderbilt.edu/wiki/pub/Main/CourseBios312/salary.dta"
table year
keep if year==95

**************************************************
* Convert string variables to numeric indicators *
**************************************************
* Make a male indicator variable; female is the reference group
gen male=.
replace male=1 if sex=="M"
replace male=0 if sex=="F"
table sex male

************************************************
* Analysis of dichotomized salary: Male gender *
************************************************
summ salary, de
* High-salary indicator. Stata treats missing values as larger than any number,
* so "salary > 7600" alone would code a missing salary as 1; keep those missing.
gen salhigh = salary > 7600 if !missing(salary)

* The 2x2 table analysis
tabulate salhigh male, chi2 lrchi2
* Odds ratio (male vs. female) from the 2x2 cell counts
di 368*365 / (41*823)
* Standard error of the log odds ratio
di (1/368 + 1/823 + 1/41 + 1/365)^.5
* Proportion with high salary among males, then among females
di 365/1188
di 41/409

* Results obtained using logistic regression
logit salhigh male
* Fitted probabilities: females (intercept only), then males (intercept + male coefficient)
di exp(-2.194511) / (1 + exp(-2.194511))
di exp(-2.194511+1.381452) / (1 + exp(-2.194511+1.381452))
* Square the Wald z statistic for male to compare with the chi-square tests from tabulate
di 7.84^2
predict phat, pr
list male phat if _n <= 5
lincom male, eform
logistic salhigh male

* Female gender
gen female = 1 - male
logit salhigh female

***************************************************
* Analysis of dichotomized salary: Year of degree *
***************************************************
tabstat salhigh, by(yrdeg)
logit salhigh yrdeg
lincom yrdeg, eform
logistic salhigh yrdeg

* Years since degree, as of 1995
gen yrsince95 = 95-yrdeg
logit salhigh yrsince95
lincom yrsince95, or

* Fitted probability, odds, and log odds plotted against years since degree
predict phat2, pr
gen odds2 = phat2 / (1 - phat2)
gen lnodds2 = log(odds2)
scatter lnodds2 yrsince95
scatter odds2 yrsince95
scatter phat2 yrsince95
logistic salhigh yrsince95

* Likelihood ratio test by hand: twice the difference between the model
* and intercept-only log likelihoods
logit salhigh
di (-794.28018 + 905.38906)*2

* The same likelihood ratio test using stored estimates
logit salhigh yrdeg
estimates store model1
logit salhigh
estimates store model2
lrtest model1 model2
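
******************************************************
* Check: odds ratio and Woolf CI from the 2x2 counts *
******************************************************
* A minimal sketch, assuming the 2x2 counts used in the di lines above
* (high salary = "case", male = "exposed"): 365 high-salary males,
* 41 high-salary females, 823 other males, 368 other females.
* cci is Stata's immediate case-control command, used here only as an odds
* ratio calculator; its arguments are exposed cases, unexposed cases,
* exposed controls, unexposed controls. The woolf option requests the
* Woolf interval, exp(ln OR +/- z * SE), matching the SE computed by hand.
cci 365 41 823 368, woolf

* The same Woolf 95% CI computed by hand (scalar names are illustrative)
scalar lnor = ln(368*365 / (41*823))
scalar se_lnor = (1/368 + 1/823 + 1/41 + 1/365)^.5
di "OR = " exp(scalar(lnor))
di "95% CI: " exp(scalar(lnor) - invnormal(0.975)*scalar(se_lnor)) " to " exp(scalar(lnor) + invnormal(0.975)*scalar(se_lnor))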
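
************************************************
* Check: fitted probabilities with invlogit() *
************************************************
* A short sketch: invlogit(x) = exp(x)/(1 + exp(x)), so these lines reproduce
* the two hand-computed probabilities after "logit salhigh male". The female
* (reference) probability uses the intercept alone and the male probability
* adds the male coefficient; both should match the observed proportions
* 41/409 and 365/1188. After "logit salhigh male" the same values are also
* available as invlogit(_b[_cons]) and invlogit(_b[_cons] + _b[male]).
di invlogit(-2.194511)
di invlogit(-2.194511 + 1.381452)
* exp() of the male coefficient is the odds ratio from the 2x2 table
di exp(1.381452)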
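
************************************************
* Check: Wald z statistic versus chi-square(1) *
************************************************
* A small sketch, assuming 7.84 is the Wald z statistic for male (as squared
* by hand above): a two-sided normal p-value for z equals the upper-tail
* chi-square(1) probability at z^2, which is why z^2 is the quantity to set
* beside the chi-square statistics from tabulate.
di 2*normal(-7.84)
di chi2tail(1, 7.84^2)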
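
************************************************
* Check: p-value for the likelihood ratio test *
************************************************
* A minimal sketch, assuming -794.28018 is the log likelihood of the model with
* yrsince95 (equivalently yrdeg) and -905.38906 that of the intercept-only
* model, as in the hand calculation above. The statistic is compared with
* chi-square on 1 df because the larger model adds one term. After fitting
* each model, its log likelihood is also stored in e(ll), which avoids
* retyping the numbers. The scalar name lrstat is illustrative.
scalar lrstat = 2*(-794.28018 - (-905.38906))
di "LR chi2(1) = " scalar(lrstat) ",  p = " chi2tail(1, scalar(lrstat))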