Survival analysis with the survival
and Design
packages
The following are some examples of how to perform survival analysis using the
survival
and
Design
packages.
The survival
package
Let's go through some examples using the
survival
package. Therefore, let's
load the necessary packages using the
library()
function -- make sure all of the packages have already been
installed. The
ISwR
package contains the data sets and R code for text examples and exercises in Peter Dalgaard's '_Introductory Statistics with R_' book. We need to load the
ISwR
package because we'll be using the
melanom
built-in data set in all of the examples, which we load with the
data()
function. We'll also load the
Hmisc
package, which contains some useful functions (e.g., the
contents()
and
upData()
functions) that we'll want to use.
library(ISwR)
library(survival)
library(Hmisc)
data(melanom)
The
melanom
data frame contains the following columns:
-
no
: patient ID
-
status
: survival status (1
: dead from melanoma, 2
: alive, 3
: dead from other cause)
-
days
: observation time in days
-
ulc
: ulceration (1
: present, 2
: absent)
-
thick
: tumor thickness (1/100 mm).
-
sex
: gender (1
: female, 2
: male)
Let's update the
melanom
data frame by adding appropriate labels and units of measurements to the variables. Let's also modify the categorical variables currently coded with numeric values to be
factors with character
level labels.
REMEMBER, order matters when defining the level labels.
melanom <- upData(melanom,
labels=c(status="Patient's Status",
days="Observation time",
ulc="Tumor Ulceration",
thick="Tumor Thickness",
sex="Gender"),
levels=list(status=c("Dead from Malignant Melanoma",
"Alive on Jan 1, 1978", "Dead from Other Causes"),
ulc=c("Present", "Absent"), sex=c("Female", "Male")),
units=c(days="days", thick="1/100 mm"))
contents(melanom)
Now let's generate Kaplan-Meier (K-M) estimates from our
melanom
data frame. The
Surv()
function defines our survival outcome, based on the time-to-event and the event of interest (in this case, dead from malignant melanoma). We can generate overall K-M estimates, and estimates across gender.
# Overall estimates
surv.all <- with(melanom, survfit(Surv(days, status=="Dead from Malignant Melanoma")))
summary(surv.all)
summary(surv.all, censored=TRUE)
plot(surv.all, xlab = "Days", ylab = "Survival Probability",
main = "K-M plot for melanoma data")
# Estimates across gender
surv.bysex <- with(melanom,
survfit(Surv(days, status=="Dead from Malignant Melanoma") ~ sex))
summary(surv.bysex)
summary(surv.bysex, censored=TRUE)
plot(surv.bysex)
plot(surv.bysex, conf.int=TRUE, col=c("black", "gray"),
xlab = "Days", ylab = "Survival Probability",
main = "K-M plot for melanoma data, grouped by gender")
legend(250, 0, lty = 1, col = c("black", "gray"), legend = levels(melanom$gender),
xjust = 0, yjust = 0)
Notice that tick-marks are shown on the K-M plots, which represent
censored observation times. The bands give approximate confidence intervals. Also, you noticed that, by default, the different curves are not labeled.
Now let's generate a Cox proportional hazards (PH) model using the
coxph()
function. The
strata()
identifies stratification variables in the model. In this example, we stratify across the levels of tumor ulceration (
ulc
).
coxph.m1 <- coxph(Surv(days, status=="Dead from Malignant Melanoma") ~
log(thick) + sex + strata(ulc), data=melanom)
plot(survfit(coxph.m1), conf.int=TRUE, col=c("black", "gray"),
xlab = "Days", ylab = "Survival Probability",
main = paste("Baseline suvival curves (ulcerated and nonulcerated tumors)",
"in stratified Cox regression", sep = "\n"))
legend(100, 0, lty = 1, col = c("black", "gray"), legend = rev(levels(melanom$ulc)),
xjust = 0, yjust = 0)
The Design
package
BE AWARE, when loaded, the
Design
package
overwrites the
Surv()
and
survfit()
(and
cox.zph()
) functions from the
survival
package.
In turn, we build our expression to calculate our K-M in the same way as we did with the
survival
package. We do have to modify where we place the
with()
function, though.
# Overall estimates
surv.all <- survfit(with(melanom, Surv(days, status=="Dead from Malignant Melanoma")))
# Estimates across gender
surv.bysex <- survfit(with(melanom,
Surv(days, status=="Dead from Malignant Melanoma") ~ sex))
The main difference between the
Design
and
survival
packages for K-M estimates is the function you use to plot them. With the
survival
package, you are actually using the
class-specific plot.survfit()
function. On the other hand, with the
Design
package, you use the
survplot()
function, which actually calls the
class-specific survplot.survfit()
function when you plot a
survfit()
object. By default, confidence intervals are displayed using bars (i.e.,
conf = "bars"
), but they can be modified to the bands we saw with the
plot.survfit()
function by specifying
conf = "bands"
, and they can be suppressed by specifying
conf = "none"
. Unlike the output of a
plot.survfit()
funciton invocation, the
survplot.survfit()
function does not display censored times using tick-marks, but it does automatically label the individual survival curves if the
survfit()
object contains a stratifying variable.
survplot(surv.all, conf = "bands",
xlab = "Days", ylab = "Survival Probability")
title(main = "K-M plot for melanoma data")
survplot(surv.bysex, conf = "none",
xlab = "Days", ylab = "Survival Probability")
title(main = "K-M plot for melanoma data, grouped by gender")
Two other useful arguments of the
survplot.survfit()
function to know about are the
n.risk=
and
time.inc=
arguments. Specify
n.risk = TRUE
to add the number of subjects at risk for each curve to the plot. The
time.inc=
argument allows you to modify the time increment used for labeling the x-axis.
Now let's generate a Cox PH model using the
Design
package's
cph()
function. Notice, that we use the
strat()
function instead of the
strata()
function to define the stratifying variable. We alsp specify
time.inc = 2*365.25
,
surv = TRUE
, and
y = TRUE
in order to specify that the survival estimates should be displayed every two years, to (invisibly) return the calculated survival estimates.
f <- cph(Surv(days, status=="Dead from Malignant Melanoma") ~ log(thick) + sex +
strat(ulc), data=melanom, time.inc=2*365.25, surv=TRUE)
Unlike the
survival
package, in order to graphically display the results of the Cox PH model that has been fit using the
Design
package's
cph()
function, we have to calculate and define the '_data distribution_' of the variables in the
melanom
data frame. The
Design
package's model plotting functions use these defined '_data distrutions_' (i.e., variable labels, units of measurement, summary statistics, and/or level labels) to automatically define things such as plotting ranges and labels.
dd<-datadist(melanom)
options(datadist="dd")
Now we can plot the results of our Cox regression. As before, we use the generic
survplot()
function, but this time we will be actually using the
class-specific survplot.Design()
function. With the
survplot.Design()
function, we specify that we want to produce two survival curves, one for each level of tumor ulceration, using
ulc = NA
.
survplot(f, ulc=NA)
# A lot more fancy graph
survplot(f, ulc=NA, ylim=c(0.5, 1.0),
xlab="Observation Time (days)",
ylab="Cumulative Proportion Without Death from Malignant Melanoma",
lwd=2, time.inc=2*365.25,
n.risk=TRUE, cex.n.risk=0.85, adj.n.risk=0.5)
box(bty="l")
We can also generate a plot of the calculated hazard ratios with 95% confidence intervals from our Cox regression model, using the
class-specific plot.summary.Design()
. In order to do this, we need to redefine our model by removing the
strat()
function from the tumor ulceration. Like before, we need to make sure that the 'data distribution' of our data frame has been calculated and defined.
op <- par(read.only = TRUE)
f1 <- cph(Surv(days, status=="Dead from Malignant Melanoma") ~ log(thick) + sex + ulc,
data=melanom)
summary(f1)
par(mar=c(5, 4, 4, 2)+0.1)
plot(summary(f1), q = 0.95, col = "gray")
options(datadist=NULL)
par(op)