Generating multiple regression models in a for
loop
Often, we wish to generate multiple regression models that are all similar, but all different. For example, all of the models have the same outcome and main covariate, but each has a different second covariate.
We could assign each individual model, copying and pasting the model expression and appropriately change the name of the second covariate, but this could take a lot of time and typing.
Instead, consider assigning the models using a
for
loop, where the loop iterates over the values of the second covariate.
For this example, let's use our old standby -- the
samplefile.txt
data file.
Let's first load the necessary packages, read in our data file, and make some changes to our data frame -- add some variable labels, level labels, and units.
library(Hmisc)
library(Design)
x<-read.table("samplefile.txt", header=TRUE)
x<-upData(x,
labels=c(age="Age", race="Race", sex="Sex",
weight="Weight", visits="No. of Visits",
tx="Treatment"),
levels=list(sex=c("Female", "Male"),
race=c("Black", "Caucasian", "Other"),
tx=c("Drug X", "Placebo")),
units=c(age="years", weight="lbs."))
contents(x)
Now let's create a binary outcome based on the number of visits --- we're interested in what predicts a patient will have more than 10 visits.
x$gt10visits <- factor(ifelse(x$visits <= 10, 0, 1),
labels = c("<= 10 visits", "> 10 visits"))
describe(x$gt10visits)
In this example, each model will have the same basic form:
lrm(gt10visits ~ tx + covariate2, data = x)
, where
covariate2
can have the value
age
,
race
,
sex
, and
weight
. We would also like to keep the model separate by incorporating the value of
covariate2
into the name.
We can do this using the following code:
varnames <- Cs(age, race, sex, weight)
for(i in varnames) {
modelformula <- paste("gt10visits ~ tx +", i)
assign(x = paste("m", i, sep = "."), value = lrm(as.formula(modelformula), data = x))
}
In this code, we supply the name of the second covariate as a text string (e.g.,
"age"
). Because of this, we can use the
paste()
function to past together our formula, which is in the form of a character string. Unfortunately, we can not supply this character representation into the
Design
package's
lrm()
function; it won't recognize it. Instead, we use the
as.formula()
function to coerce the pasted together formula character string (e.g.,
"gt10visits ~ tx + age"
) to a formula class, which the
lrm()
function will recognize.
Another problem with character strings arises when we try to assign the resulting model to a name. Normally, we assign a name to an object using the assignment operator
<-
. For example,
m1 <- lrm(gt10visits ~ tx + age, data = x)
. However, in this example, we want to incorporate the value of the second covariate into the name of the object (e.g.,
m.age
). We can generate these unique model names using
paste("m", i, sep = ".")
, but if you tried to evaluate the following code, you would get an error:
paste("m", i, sep = ".") <- lrm(as.formula(modelformula), data = x)
The reason you would get an error is because the assignment operator
<-
will not accept a character string on its left hand side. An alternative is to use the
assign()
function, which the assignment operator is a shortcut for. With the
assign()
function, the name of the object is given as a character string.
Now that we've figured out how to correctly generate and assign each of the models, let's think ahead to what we might want to do with them. Most likely, we'll want to generate a summary and/or plot of each model. Based on the
for
loop we used above, we now have N defined
lrm
objects (model fits), where N is the number of variables in
varnames
. We can see this using the
ls()
function. So, if we want to manipulate all of the defined
lrm
objects, we'll need to reference them all, most likely, in another
for
loop.
An alternative would be to generate an object that contains each of the model fits. In R, the easiest way to do this would be to generate a
list, where each component of the list would be one of the fitted models --- that is, we would have a list of lists (the result of the
lrm
function is a list). Having a list of lists would allow us to use one the
apply
functions to generate a summary and/or plot of each component of the list (i.e., each fitted model).
To do this, we can use the following code as an alternative to the code above:
varnames <- Cs(age, race, sex, weight)
modelfits <- vector(length(varnames), mode = "list")
names(modelfits) <- varnames
for(i in varnames) {
modelformula <- paste("gt10visits ~ tx +", i)
modelfitsi <- lrm(as.formula(modelformula), data = x)
}
In this code, we generate a list with the correct number of components with the
vector()
function. We then name each component of the list, so we can assign the corect model fit to each slot in the
for
loop. We can use the
names()
function and
str()
(structure) function to look at what
modelfits
contains --- BE AWARE, the
str()
function returns a lot of output.
names(modelfits)
str(modelfits"age")
str(modelfits)