You are here:
Vanderbilt Biostatistics Wiki
>
Main Web
>
Seminars
>
RClinic
>
DesignSurvivalAnalysis
(10 Nov 2006,
TheresaScott
)
(raw view)
E
dit
A
ttach
---+ Survival analysis with the =survival= and =Design= packages The following are some examples of how to perform survival analysis using the =survival= and =Design= packages. ---++ The =survival= package Let's go through some examples using the =survival= package. Therefore, let's _load_ the necessary packages using the =library()= function -- make sure all of the packages have already been _installed_. The =ISwR= package contains the data sets and R code for text examples and exercises in Peter Dalgaard's '_Introductory Statistics with R_' book. We need to load the =ISwR= package because we'll be using the =melanom= _built-in_ data set in all of the examples, which we load with the =data()= function. We'll also load the =Hmisc= package, which contains some useful functions (e.g., the =contents()= and =upData()= functions) that we'll want to use. <highlight> library(ISwR) library(survival) library(Hmisc) data(melanom) </highlight> The =melanom= data frame contains the following columns: * =no=: patient ID * =status=: survival status (=1=: dead from melanoma, =2=: alive, =3=: dead from other cause) * =days=: observation time in days * =ulc=: ulceration (=1=: present, =2=: absent) * =thick=: tumor thickness (1/100 mm). * =sex=: gender (=1=: female, =2=: male) <highlight> contents(melanom) </highlight> Let's update the =melanom= data frame by adding appropriate labels and units of measurements to the variables. Let's also modify the categorical variables currently coded with numeric values to be _factors_ with character _level labels_. _REMEMBER_, order matters when defining the level labels. <highlight> melanom <- upData(melanom, labels=c(status="Patient's Status", days="Observation time", ulc="Tumor Ulceration", thick="Tumor Thickness", sex="Gender"), levels=list(status=c("Dead from Malignant Melanoma", "Alive on Jan 1, 1978", "Dead from Other Causes"), ulc=c("Present", "Absent"), sex=c("Female", "Male")), units=c(days="days", thick="1/100 mm")) contents(melanom) </highlight> Now let's generate Kaplan-Meier (K-M) estimates from our =melanom= data frame. The =Surv()= function defines our survival outcome, based on the time-to-event and the event of interest (in this case, dead from malignant melanoma). We can generate overall K-M estimates, and estimates across gender. <highlight> # Overall estimates surv.all <- with(melanom, survfit(Surv(days, status=="Dead from Malignant Melanoma"))) summary(surv.all) summary(surv.all, censored=TRUE) plot(surv.all, xlab = "Days", ylab = "Survival Probability", main = "K-M plot for melanoma data") # Estimates across gender surv.bysex <- with(melanom, survfit(Surv(days, status=="Dead from Malignant Melanoma") ~ sex)) summary(surv.bysex) summary(surv.bysex, censored=TRUE) plot(surv.bysex) plot(surv.bysex, conf.int=TRUE, col=c("black", "gray"), xlab = "Days", ylab = "Survival Probability", main = "K-M plot for melanoma data, grouped by gender") legend(250, 0, lty = 1, col = c("black", "gray"), legend = levels(melanom$gender), xjust = 0, yjust = 0) </highlight> Notice that tick-marks are shown on the K-M plots, which represent _censored_ observation times. The bands give approximate confidence intervals. Also, you noticed that, by default, the different curves are not labeled. Now let's generate a Cox proportional hazards (PH) model using the =coxph()= function. The =strata()= identifies stratification variables in the model. In this example, we stratify across the levels of tumor ulceration (=ulc=). <highlight> coxph.m1 <- coxph(Surv(days, status=="Dead from Malignant Melanoma") ~ log(thick) + sex + strata(ulc), data=melanom) plot(survfit(coxph.m1), conf.int=TRUE, col=c("black", "gray"), xlab = "Days", ylab = "Survival Probability", main = paste("Baseline suvival curves (ulcerated and nonulcerated tumors)", "in stratified Cox regression", sep = "\n")) legend(100, 0, lty = 1, col = c("black", "gray"), legend = rev(levels(melanom$ulc)), xjust = 0, yjust = 0) </highlight> ---++ The =Design= package _BE AWARE_, when loaded, the =Design= package _overwrites_ the =Surv()= and =survfit()= (and =cox.zph()=) functions from the =survival= package. <highlight> library(Design) </highlight> In turn, we build our expression to calculate our K-M in the same way as we did with the =survival= package. We do have to modify where we place the =with()= function, though. <highlight> # Overall estimates surv.all <- survfit(with(melanom, Surv(days, status=="Dead from Malignant Melanoma"))) # Estimates across gender surv.bysex <- survfit(with(melanom, Surv(days, status=="Dead from Malignant Melanoma") ~ sex)) </highlight> The main difference between the =Design= and =survival= packages for K-M estimates is the function you use to plot them. With the =survival= package, you are actually using the _class-specific_ =plot.survfit()= function. On the other hand, with the =Design= package, you use the =survplot()= function, which actually calls the _class-specific_ =survplot.survfit()= function when you plot a =survfit()= object. By default, confidence intervals are displayed using bars (i.e., =conf = "bars"=), but they can be modified to the bands we saw with the =plot.survfit()= function by specifying =conf = "bands"=, and they can be suppressed by specifying =conf = "none"=. Unlike the output of a =plot.survfit()= funciton invocation, the =survplot.survfit()= function does not display censored times using tick-marks, but it does automatically label the individual survival curves if the =survfit()= object contains a stratifying variable. <highlight> survplot(surv.all, conf = "bands", xlab = "Days", ylab = "Survival Probability") title(main = "K-M plot for melanoma data") survplot(surv.bysex, conf = "none", xlab = "Days", ylab = "Survival Probability") title(main = "K-M plot for melanoma data, grouped by gender") </highlight> Two other useful arguments of the =survplot.survfit()= function to know about are the =n.risk== and =time.inc== arguments. Specify =n.risk = TRUE= to add the number of subjects at risk for each curve to the plot. The =time.inc== argument allows you to modify the time increment used for labeling the x-axis. Now let's generate a Cox PH model using the =Design= package's =cph()= function. Notice, that we use the =strat()= function instead of the =strata()= function to define the stratifying variable. We alsp specify =time.inc = 2*365.25=, =surv = TRUE=, and =y = TRUE= in order to specify that the survival estimates should be displayed every two years, to (invisibly) return the calculated survival estimates. <highlight> f <- cph(Surv(days, status=="Dead from Malignant Melanoma") ~ log(thick) + sex + strat(ulc), data=melanom, time.inc=2*365.25, surv=TRUE) </highlight> Unlike the =survival= package, in order to graphically display the results of the Cox PH model that has been fit using the =Design= package's =cph()= function, we have to calculate and define the '_data distribution_' of the variables in the =melanom= data frame. The =Design= package's model plotting functions use these defined '_data distrutions_' (i.e., variable labels, units of measurement, summary statistics, and/or level labels) to automatically define things such as plotting ranges and labels. <highlight> dd<-datadist(melanom) options(datadist="dd") </highlight> Now we can plot the results of our Cox regression. As before, we use the generic =survplot()= function, but this time we will be actually using the _class-specific_ =survplot.Design()= function. With the =survplot.Design()= function, we specify that we want to produce two survival curves, one for each level of tumor ulceration, using =ulc = NA=. <highlight> survplot(f, ulc=NA) # A lot more fancy graph survplot(f, ulc=NA, ylim=c(0.5, 1.0), xlab="Observation Time (days)", ylab="Cumulative Proportion Without Death from Malignant Melanoma", lwd=2, time.inc=2*365.25, n.risk=TRUE, cex.n.risk=0.85, adj.n.risk=0.5) box(bty="l") </highlight> We can also generate a plot of the calculated hazard ratios with 95% confidence intervals from our Cox regression model, using the _class-specific_ =plot.summary.Design()=. In order to do this, we need to redefine our model by removing the =strat()= function from the tumor ulceration. Like before, we need to make sure that the 'data distribution' of our data frame has been calculated and defined. <highlight> op <- par(read.only = TRUE) f1 <- cph(Surv(days, status=="Dead from Malignant Melanoma") ~ log(thick) + sex + ulc, data=melanom) summary(f1) par(mar=c(5, 4, 4, 2)+0.1) plot(summary(f1), q = 0.95, col = "gray") options(datadist=NULL) par(op) </highlight>
E
dit
|
A
ttach
|
P
rint version
|
H
istory
: r2
<
r1
|
B
acklinks
|
V
iew topic
|
Edit
w
iki text
|
M
ore topic actions
Topic revision: r2 - 10 Nov 2006,
TheresaScott
Main
Department Home Page
Biostatistics Graduate Program
Vanderbilt University Medical Center
Main Web
Main Web Home
Search
Recent Changes
Changes
Topic list
Biostatistics Webs
Archive
Main
Sandbox
System
Register
|
Log In
Copyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Vanderbilt Biostatistics Wiki?
Send feedback