DesignSurvivalAnalysis < Main < Vanderbilt Biostatistics Wiki

You are here: Vanderbilt Biostatistics Wiki>Main Web>Seminars>RClinic>DesignSurvivalAnalysis (10 Nov 2006, TheresaScott) (raw view)EditAttach
---+ Survival analysis with the =survival= and =Design= packages

The following are some examples of how to perform survival analysis using the =survival= and =Design= packages.

---++ The =survival= package

Let's go through some examples using the =survival= package.  Therefore, let's _load_ the necessary packages using the =library()= function -- make sure all of the packages have already been _installed_.  The =ISwR= package contains the data sets and R code for text examples and exercises in Peter Dalgaard's '_Introductory Statistics with R_' book.  We need to load the =ISwR= package because we'll be using the =melanom= _built-in_ data set in all of the examples, which we load with the =data()= function.  We'll also load the =Hmisc= package, which contains some useful functions (e.g., the =contents()= and =upData()= functions) that we'll want to use.
<highlight>
library(ISwR)
library(survival)
library(Hmisc)

data(melanom)
</highlight>

The =melanom= data frame contains the following columns:
   * =no=: patient ID
   * =status=: survival status (=1=: dead from melanoma, =2=: alive, =3=: dead from other cause)
   * =days=: observation time in days
   * =ulc=: ulceration (=1=: present, =2=: absent)
   * =thick=: tumor thickness (1/100 mm).
   * =sex=: gender (=1=: female, =2=: male)
<highlight>
contents(melanom)
</highlight>

Let's update the =melanom= data frame by adding appropriate labels and units of measurements to the variables.  Let's also modify the categorical variables currently coded with numeric values to be _factors_ with character _level labels_. _REMEMBER_, order matters when defining the level labels.
<highlight>
melanom <- upData(melanom,
   labels=c(status="Patient's Status",
      days="Observation time",
      ulc="Tumor Ulceration",
      thick="Tumor Thickness",
      sex="Gender"),
   levels=list(status=c("Dead from Malignant Melanoma",
      "Alive on Jan 1, 1978", "Dead from Other Causes"),
      ulc=c("Present", "Absent"), sex=c("Female", "Male")),
   units=c(days="days", thick="1/100 mm"))
contents(melanom)
</highlight>

Now let's generate Kaplan-Meier (K-M) estimates from our =melanom= data frame.  The =Surv()= function defines our survival outcome, based on the time-to-event and the event of interest (in this case, dead from malignant melanoma).  We can generate overall K-M estimates, and estimates across gender.
<highlight>
# Overall estimates
surv.all <- with(melanom, survfit(Surv(days, status=="Dead from Malignant Melanoma")))
summary(surv.all)
summary(surv.all, censored=TRUE)
plot(surv.all, xlab = "Days", ylab = "Survival Probability",
   main = "K-M plot for melanoma data")

# Estimates across gender
surv.bysex <- with(melanom, 
   survfit(Surv(days, status=="Dead from Malignant Melanoma") ~ sex))
summary(surv.bysex)
summary(surv.bysex, censored=TRUE)
plot(surv.bysex)
plot(surv.bysex, conf.int=TRUE, col=c("black", "gray"), 
   xlab = "Days", ylab = "Survival Probability",
   main = "K-M plot for melanoma data, grouped by gender")
legend(250, 0, lty = 1, col = c("black", "gray"), legend = levels(melanom$gender),
   xjust = 0, yjust = 0)
</highlight>
Notice that tick-marks are shown on the K-M plots, which represent _censored_ observation times.  The bands give approximate confidence intervals.  Also, you noticed that, by default, the different curves are not labeled.

Now let's generate a Cox proportional hazards (PH) model using the =coxph()= function.  The =strata()= identifies stratification variables in the model.  In this example, we stratify across the levels of tumor ulceration (=ulc=).
<highlight>
coxph.m1 <- coxph(Surv(days, status=="Dead from Malignant Melanoma") ~ 
   log(thick) + sex + strata(ulc), data=melanom)
plot(survfit(coxph.m1), conf.int=TRUE, col=c("black", "gray"), 
   xlab = "Days", ylab = "Survival Probability",
   main = paste("Baseline suvival curves (ulcerated and nonulcerated tumors)",
      "in stratified Cox regression", sep = "\n"))
legend(100, 0, lty = 1, col = c("black", "gray"), legend = rev(levels(melanom$ulc)),
   xjust = 0, yjust = 0)
</highlight>

---++ The =Design= package

_BE AWARE_, when loaded, the =Design= package _overwrites_ the =Surv()= and =survfit()= (and =cox.zph()=) functions from the =survival= package.
<highlight>
library(Design)
</highlight>

In turn, we build our expression to calculate our K-M in the same way as we did with the =survival= package.  We do have to modify where we place the =with()= function, though.
<highlight>
# Overall estimates
surv.all <- survfit(with(melanom, Surv(days, status=="Dead from Malignant Melanoma")))
# Estimates across gender
surv.bysex <- survfit(with(melanom, 
   Surv(days, status=="Dead from Malignant Melanoma") ~ sex))
</highlight>

The main difference between the =Design= and =survival= packages for K-M estimates is the function you use to plot them.  With the =survival= package, you are actually using the _class-specific_ =plot.survfit()= function.  On the other hand, with the =Design= package, you use the =survplot()= function, which actually calls the _class-specific_ =survplot.survfit()= function when you plot a =survfit()= object.  By default, confidence intervals are displayed using bars (i.e., =conf = "bars"=), but they can be modified to the bands we saw with the =plot.survfit()= function by specifying =conf = "bands"=, and they can be suppressed by specifying =conf = "none"=.  Unlike the output of a =plot.survfit()= funciton invocation, the =survplot.survfit()= function does not display censored times using tick-marks, but it does automatically label the individual survival curves if the =survfit()= object contains a stratifying variable.
<highlight>
survplot(surv.all, conf = "bands",
   xlab = "Days", ylab = "Survival Probability")
title(main = "K-M plot for melanoma data")

survplot(surv.bysex, conf = "none",
   xlab = "Days", ylab = "Survival Probability")
title(main = "K-M plot for melanoma data, grouped by gender")
</highlight>
Two other useful arguments of the =survplot.survfit()= function to know about are the =n.risk== and =time.inc== arguments.  Specify =n.risk = TRUE= to add the number of subjects at risk for each curve to the plot.  The =time.inc== argument allows you to modify the time increment used for labeling the x-axis.

Now let's generate a Cox PH model using the =Design= package's =cph()= function.  Notice, that we use the =strat()= function instead of the =strata()= function to define the stratifying variable.  We alsp specify =time.inc = 2*365.25=, =surv = TRUE=, and =y = TRUE= in order to specify that the survival estimates should be displayed every two years, to (invisibly) return the calculated survival estimates. 
<highlight>
f <- cph(Surv(days, status=="Dead from Malignant Melanoma") ~ log(thick) + sex + 
   strat(ulc), data=melanom, time.inc=2*365.25, surv=TRUE)
</highlight>

Unlike the =survival= package, in order to graphically display the results of the Cox PH model that has been fit using the =Design= package's =cph()= function, we have to calculate and define the '_data distribution_' of the variables in the =melanom= data frame.  The =Design= package's model plotting functions use these defined '_data distrutions_' (i.e., variable labels, units of measurement, summary statistics, and/or level labels) to automatically define things such as plotting ranges and labels.
<highlight>
dd<-datadist(melanom)
options(datadist="dd")
</highlight>

Now we can plot the results of our Cox regression.  As before, we use the generic =survplot()= function, but this time we will be actually using the _class-specific_ =survplot.Design()= function.  With the =survplot.Design()= function, we specify that we want to produce two survival curves, one for each level of tumor ulceration, using =ulc = NA=.
<highlight>
survplot(f, ulc=NA)

# A lot more fancy graph
survplot(f, ulc=NA, ylim=c(0.5, 1.0), 
   xlab="Observation Time (days)",
   ylab="Cumulative Proportion Without Death from Malignant Melanoma",
   lwd=2, time.inc=2*365.25,
   n.risk=TRUE, cex.n.risk=0.85, adj.n.risk=0.5)
box(bty="l")
</highlight>

We can also generate a plot of the calculated hazard ratios with 95% confidence intervals from our Cox regression model, using the _class-specific_ =plot.summary.Design()=.  In order to do this, we need to redefine our model by removing the =strat()= function from the tumor ulceration.  Like before, we need to make sure that the 'data distribution' of our data frame has been calculated and defined.
<highlight>
op <- par(read.only = TRUE)

f1 <- cph(Surv(days, status=="Dead from Malignant Melanoma") ~ log(thick) + sex + ulc,
   data=melanom)
summary(f1)
par(mar=c(5, 4, 4, 2)+0.1)
plot(summary(f1), q = 0.95, col = "gray")

options(datadist=NULL)
par(op)
</highlight>
Topic revision: r2 - 10 Nov 2006, TheresaScott
Main
Department Home Page
Biostatistics Graduate Program
Vanderbilt University Medical Center
Biostatistics Webs
- Archive
- Main
- Sandbox
- System
Copyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Vanderbilt Biostatistics Wiki? Send feedback