log using SplinesWithInteraction.log, replace *SplinesWithInteraction.log * * This program illustrates how to fit a restricted cubic spline model * with interaction for two continuous variables and how to * plot the expected value of the response variable for specific values * of one of these variables. We use Stata's auto data set that is * available on the web. * set more on sysuse auto, clear * * Regress price agaist mpg with a 3-knot model * mkspline _Smpg = mpg, cubic nknots(3) displayknots regress price _S* predict yhat sort mpg * * Graph a scatter plot of price by mpg together with the regression line. * This model appears reasonable. * twoway (scatter price mpg) (line yhat mpg) /// , ytitle(Price) name(price_mpg, replace) more graph save price_mpg price_mpg, replace * * To get a feel for the data lets fit separate models of price against mpg * for cars above and below the median weight, respectively * centile weight, centile(25 50 75) drop _S* mkspline _Smpg = mpg if weight < 3190, cubic nknots(3) regress price _S* if weight < 3190 predict yhat_low drop _S* mkspline _Smpg = mpg if weight >=3190, cubic nknots(3) regress price _S* if weight >= 3190 predict yhat_high * * Plot these regression lines with color-coded observed values * twoway (scatter price mpg if weight < 3190, mcolor(blue) ) /// (scatter price mpg if weight >= 3190, msymbol(circle_hollow) mcolor(red) ) /// (line yhat_low mpg, lcolor(blue)) (line yhat_high mpg, lcolor(red)) /// , legend(order(1 "Weight < 3190 lb" 2 "Weight >= 3190 lb")) ytitle(Price) /// name(price_mpgByWeight, replace) /// note(Separate spline models fit to cars above and below the median weight) more graph save price_mpgByWeight price_mpgByWeight, replace drop _S* mkspline _Smpg = mpg, cubic nknots(3) displayknots * * Lets look at an additive model of weight and splined mpg. * regress price _Smpg* weight * * To investigate this model graphically it makes sense to graph it at different values * of weight -- say at the 25th and 75th percentile. But the following list command * indicates that we do not have any observations with these weights. * list if weight == 3617.5 | weight == 2237.5 * * To fix this we need to define these values in new records. We will leave price * missing so that if we re-run our models they will not be affected by these * extranious values of weight and bmi. * The following code creats 31 new records with weight = 3617.5 and 31 new records * with weight = 2237.5. for each value of weight the values of bmi run from 12 to 42. * display _N set obs 105 replace mpg = 11 + _n -74 if _n > 74 replace weight = 2237.5 if _n > 74 set obs 136 replace mpg = 11 + _n -105 if _n > 105 replace weight = 3617.5 if _n > 105 drop _S* * * In regenerating the spline covariates we use the previous knot values. The knot * values don't make too much difference, but we do not want them to be affected * by our new extranious values, which would be the case if we used the default * knot locations. * mkspline _Smpg = mpg, cubic knots( 14 20 29.5 ) * * The following list command shows the new values of weight mpg and the spline * covariates that we have added to the model. * list price weight mpg _S* if _n > 74 * * Rerun the additive model. Note that the extranious values have not changed anything * regress price _Smpg* weight drop yhat predict yhat * * Plot the expected price as a function of mpg for weights at the 25th and 75th * percentile. Our model requires that these curves be parrellel. * twoway (scatter price mpg if weight < 3190, mcolor(blue) ) /// (scatter price mpg if weight >= 3190, msymbol(circle_hollow) mcolor(red)) /// (line yhat mpg if weight ==2237.5 , lcolor(blue)) /// (line yhat mpg if weight ==3617.5 , lcolor(red)) /// , legend(order(1 "Weight < 3190 lb" 2 "Weight >= 3190 lb" /// 3 "Weight = 2237.5 lb" 4 "Weight = 3617.5" )) /// ytitle(Price) name(additive, replace) /// note(Additive model of the effects of weight and mpg on price) more graph save additive additive, replace * * Add interaction terms into the model. The simplest way to do this is as follows * regress price weight _Smpg* c.weight#c._Smpg1 c.weight#c._Smpg2 * * The following test does not reject the hypothesis that the interaction parameters * are simultaniously zero. * test c.weight#c._Smpg1 c.weight#c._Smpg2 * * If we don't want to use the c. notation we can fit the same model calculating the * interaction terms explicitly as follows * gen wtmpg1 = weight*_Smpg1 gen wtmpg2 = weight*_Smpg2 regress price weight _Smpg* wtmpg* drop yhat predict yhat * * Plot the expected price as a function of mpg for cars at the 25th and 75th percentile * by weight. * twoway (scatter price mpg if weight < 3190, mcolor(blue) ) /// (scatter price mpg if weight >= 3190, msymbol(circle_hollow) mcolor(red)) /// (line yhat mpg if weight ==2237.5 , lcolor(blue)) /// (line yhat mpg if weight ==3617.5 , lcolor(red)) /// , legend(order(1 "Weight < 3190 lb" 2 "Weight >= 3190 lb")) ytitle(Price) more * * The preceding graph extrapolates the price-mpg curve for heavy cars excessively * Lets plot more reasonable ranges * sum mpg if weight >= 3190 & _n <75 sum mpg if weight < 3190 & _n <75 twoway (scatter price mpg if weight < 3190, mcolor(blue)) /// (scatter price mpg if weight >= 3190, msymbol(circle_hollow) mcolor(red)) /// (line yhat mpg if weight ==2237.5 & mpg>=17 , lcolor(blue)) /// (line yhat mpg if weight ==3617.5 & mpg <=28 , lcolor(red)) /// , legend(order(1 "Weight < 3190 lb" 2 "Weight >= 3190 lb" /// 3 "Weight =2237.5" 4 "Weight = 3617.5" )) /// ytitle(Price) name(interaction, replace) /// note(Simple model with interaction terms) graph save interaction interaction, replace * * The preceding graph should be compared with the separate spline models fitted * to cars above and below the median weight. In this data set there is no * evidence to justify a model with interaction terms. That being said, the fit * of these two models is remarkably similar. log close