You are here:
Vanderbilt Biostatistics Wiki
>
Main Web
>
Seminars
>
RClinic
>
CategorizingContinuousCovariates
(15 Nov 2006,
TheresaScott
)
(raw view)
E
dit
A
ttach
---+ A plot that illustrates the effect of categorizing continuous covariates in a logistic regression model The idea behind this problem is the fact that fitting a categorized continuous variable in a logistic regression model can produce very misleading estimates when compared to fitting the original continuous variable. To illustrate this, a plot was generated that plotted the effects of a specific continuous variable from two separate logistic regresison models: (1) fitting the categorized version of the continuous variable in the model; and (2) keeping the continuous variable, continuous. Let's first read in the data set (=data.csv=). The data set is a simulated data set that was designed for our first Puzzler --- the data was simulated such that there is a nonlinear interaction between sex and age. * [[%ATTACHURL%/data.csv][data.csv]] <highlight> library(Design) x <- csv.get("data.csv", header=TRUE) </highlight> Now let's define the logistic regression model in which the continuous variables are kept continuous (and are fitted with restricted cubic splines). <highlight> ff <- lrm(dz ~ sex*rcs(age, 4) + rcs(sysbp, 4), data=x) anova(ff) # Total D.F. = 10 # sex*age interaction is significant (p < 0.05) </highlight> Now let's created categorized versions of our continuous variables in order to fit the second model. <highlight> x <- upData(x, agecut = cut(age, breaks=quantile(subset(age, dz==0), probs=c(0, 0.25, 0.50, 0.75, 1), na.rm=TRUE)), sysbpcut = cut(sysbp, breaks=quantile(subset(sysbp, dz==0), probs=c(0, 0.25, 0.50, 0.75, 1), na.rm=TRUE)) ) fcut <- lrm(dz ~ sex*agecut + sysbpcut, data=x) anova(fcut) # TOTAL D.F. = 10 # sex*age interaction no longer significant (p > 0.05) </highlight> Just to see the results of each model, let's plot the predicted effect of age across sex in each model. <highlight> dd <- datadist(x) options (datadist = 'dd') par(mfrow=c(1,2)) plot(ff, age=NA, sex=NA, conf.int=FALSE) plot(fcut, age=NA, sex=NA, conf.int=FALSE) </highlight> We can use the =Hmisc= package's =print.plot.Design()= function to return what was plotted: <highlight> print.plot.Design(plot(ff, age=NA, sex=NA, conf.int=FALSE)) # Adjusted to medians: sysbp = 120 print.plot.Design(plot(fcut, age=NA, sex=NA, conf.int=FALSE)) # Adjusted to most frequent level: sysbpcut=(117, 128] </highlight> It is important to make sure that both model are adjusted to the same things. For example, if =fcut= wasn't adjusted to the quantile that spanned 120, we would use: <highlight> par(mfrow=c(1,2)) plot(ff, age=NA, sex=NA, conf.int=FALSE) plot(fcut, age=NA, sex=NA, conf.int=FALSE, sysbpcut="(117,128]") </highlight> Now let's put the two plots together. <highlight> plff <- plot(ff, age=NA, sex=NA, conf.int=FALSE) plfcut <- plot(fcut, age=NA, sex=NA, conf.int=FALSE) par (mfrow =c(1,1), mar=c(5,5,4,2)+0.1, lwd=2) # First plot the spline regression results: age across sex plot(ff, age=NA, sex=NA, conf.int=FALSE, # ylim = range(c(range(plff$x.xbeta[["log odds"]]), # range(plfcut$x.xbeta[["log odds"]]))), ylim = c(-2, 2), label.curves=FALSE, lty=1, col=c("gray", "black")) # Now plot the categorical regression results: age across sex Fstep <- stepfun(x=with(x, quantile(subset(age, dz==0), prob=c(0.25, 0.50, 0.75), na.rm=TRUE)), y = subset(plfcut$x.xbeta, sex=="female")$"log odds") par(new=TRUE, col="gray", lty=2) lines(Fstep, xlim=c(0, 150)) Mstep <- stepfun(x=with(x, quantile(subset(age, dz==0), prob=c(0.25, 0.50, 0.75), na.rm=TRUE)), y = subset(plfcut$x.xbeta, sex=="male")$"log odds") par(new=TRUE, col="black", lty=2) lines(Mstep, xlim=c(0, 150)) par(new=TRUE, col="black", lty=1, lwd=1) legend(x=40, y=2, # y=max(c(range(plff$x.xbeta[["log odds"]]), # range(plfcut$x.xbeta[["log odds"]]))), xjust=0, yjust=1, legend = c("Female - Spline Regression", "Male - Spline Regression", "Female - Categorical Regression", "Male - Categorical Regression"), lty=rep(1:2, each=2), col = rep(c("gray", "black"), 2), lwd=2) </highlight>
E
dit
|
A
ttach
|
P
rint version
|
H
istory
: r2
<
r1
|
B
acklinks
|
V
iew topic
|
Edit
w
iki text
|
M
ore topic actions
Topic revision: r2 - 15 Nov 2006,
TheresaScott
Main
Department Home Page
Biostatistics Graduate Program
Vanderbilt University Medical Center
Main Web
Main Web Home
Search
Recent Changes
Changes
Topic list
Biostatistics Webs
Archive
Main
Sandbox
System
Register
|
Log In
Copyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Vanderbilt Biostatistics Wiki?
Send feedback