Predictive Accuracy and Model Validation

UVA Biostatistics Discussion Board: Regression Modeling Strategies: Predictive Accuracy and Model Validation
By Frank E Harrell Jr (Feh3k) on Friday, May 02, 2003 - 08:22 am:

Brier score: Arie Dayagi writes:

I'm a PhD student at Tel Aviv University in Israel. My research deals with survival analysis using fully parametric models. While looking for a criterion for comparing models, I noticed your S-Plus function val.prob. I wonder if you can help me with the following issues:

  1. What is the correct expression for Brier's score in the case of fully parametric models?
  2. Could (and how) the Gini coefficient be used for evaluating fully parametric models?
  3. How can I get the code for the function?


Answers:
  1. val.prob is for external validation of models for binary responses, and it does not handle censoring. A new experimental function val.surv may be useful for external validation of survival models with right censoring. Brier's score (provided by val.prob) is only valid for binary responses. Calculation of Brier's score does not need to take into account the type of parametric model used (e.g., logit, probit, recursive partitioning); it is just the mean squared error in predicted probability, or one minus this if you want larger scores to be good.
  2. I do not have experience with the Gini measure.
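
As a small illustration of answer 1, the Brier score can be computed by hand as the mean squared difference between predicted probabilities and observed 0/1 outcomes. The sketch below is not the val.prob code; the function name brier_score and the data are made up for illustration.

```r
# Brier score: mean squared error in predicted probability.
# brier_score is a hypothetical helper, not part of any package.
brier_score <- function(p, y) {
  stopifnot(length(p) == length(y), all(y %in% c(0, 1)))
  mean((p - y)^2)
}

# Made-up predicted probabilities and observed binary outcomes
p <- c(0.9, 0.2, 0.7, 0.4)
y <- c(1, 0, 1, 1)

b <- brier_score(p, y)   # smaller is better
b_flipped <- 1 - b       # "one minus this" so that larger is better
print(c(Brier = b, OneMinusBrier = b_flipped))
```

Nothing in the calculation depends on how the probabilities were produced, which is the point made in the answer: any model that yields a predicted probability for a binary outcome can be scored this way.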

By Osman Al-Radi on Thursday, May 15, 2003 - 06:57 pm:

1- What is the interpretation of a negative Dxy?
2- What is an acceptable value of the Dxy, U, D, and Q statistics?
3- Does the interpretation change for stratified models or models with time-dependent covariates (created with the large data set in (start, stop] form)?

I am referring to the Dxy, U, D, and Q statistics in the last two chapters of Dr. Harrell's book on regression modeling strategies.

By Frank E Harrell Jr (Feh3k) on Friday, May 16, 2003 - 09:49 am:

Dxy is 2*(C-0.5), i.e., the probability that predictions are concordant with outcomes minus the probability that they are discordant.
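
To make the definition concrete, here is a minimal base R sketch for a binary outcome (the censored survival case needs more care and is not covered here). The helper name dxy_binary and the data are hypothetical. It counts concordant and discordant pairs directly and shows that a negative Dxy simply means the predictions are discordant with the outcomes more often than they are concordant.

```r
# Concordance index C and Somers' Dxy for a binary outcome,
# computed over all (event, non-event) pairs.
# dxy_binary is a hypothetical helper written for this illustration.
dxy_binary <- function(p, y) {
  d <- outer(p[y == 1], p[y == 0], "-")  # event prediction minus non-event prediction
  conc <- mean(d > 0)                    # proportion of concordant pairs
  disc <- mean(d < 0)                    # proportion of discordant pairs
  ties <- mean(d == 0)                   # tied predictions count half toward C
  c(C = conc + 0.5 * ties, Dxy = conc - disc)
}

# Made-up predictions and outcomes
p <- c(0.8, 0.3, 0.6, 0.5, 0.2)
y <- c(1, 0, 0, 1, 0)

res <- dxy_binary(p, y)
res_reversed <- dxy_binary(1 - p, y)  # predictions pointing the wrong way
print(res)
print(res_reversed)

# Dxy agrees with 2 * (C - 0.5)
stopifnot(isTRUE(all.equal(unname(res["Dxy"]), 2 * (unname(res["C"]) - 0.5))))
```

Reversing the predictions flips the sign of Dxy and leaves its magnitude unchanged.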

There are no acceptable values for any of the measures; it's all relative to what you are trying to do. For example, predicting day of death is very difficult (thank goodness), so you expect small Dxy values for that. For U (unreliability index) you want values close to zero or less than zero, but the calibration plot is more interpretable.

For stratified Cox models, predictions have to be the probability of survival at a fixed time, to be able to compute indexes such as Dxy. In other words, log relative hazard can't be used because strata are "subtracted out" of the calculations.

To my knowledge no one has worked out indexes when there are time-dependent covariables.

By Osman Al-Radi on Friday, May 16, 2003 - 05:03 pm:

Thanks, Prof. Harrell. That was simple enough for a non-statistician like me to understand.

I have a follow-up question:

For a stratified Cox model, could I still use Dxy and the calibration plots, but not D, U, and Q?

For a Cox model with a time-dependent covariate, would the calibration plot from plot(calibrate object) be valid?

By Frank E Harrell Jr (Feh3k) on Saturday, May 17, 2003 - 04:00 pm:

Correct. calibrate works because you must specify a time point when the analysis is stratified, and calibrate checks the reliability of the predicted probability of surviving past that time point. Estimation of the underlying survival curves for all strata is part of what is being checked in the calibration plot.

The S calibrate function does not work when time-dependent covariates are present.

By Osman Al-Radi on Tuesday, May 27, 2003 - 10:14 am:

Dear Prof. Harrell,

In an effort to validate a cph() model with a time-dependent covariate, using the new resample library (beta version) and the formulas for the discrimination statistics from your book, I wrote the following code:

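discrimination <- function(fit) {
  # fit$loglik holds the log likelihoods of the null and the fitted model
  lr  <- -2 * (fit$loglik[1] - fit$loglik[2])  # likelihood ratio chi-square
  LL0 <- -2 * fit$loglik[1]                    # -2 log likelihood of the null model
  D   <- (lr - 1) / LL0                        # discrimination index, (LR - 1) / L0
  U   <- -2 / LL0                              # unreliability index
  Q   <- D - U                                 # quality index
  R2  <- unname(fit$stats[8])                  # R2, taken by position in fit$stats
  # names containing spaces must be quoted
  c(R2 = R2, "Likelihood ratio" = lr, "Discrimination index" = D,
    "Unreliability index" = U, "Quality index" = Q)
}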
# subject allows resampling all rows from a single patient as a single unit.

boot.obj<-bootstrap(fit2, discrimination, subject=ptid, B=1000)


One can print and plot the boot.object (I will send the output and graph to your email)

I would like to know what your thoughts are about this method.

Osman
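
As a quick arithmetic check of the indexes discussed in this post, the calculation can be run on made-up log-likelihood values. The sketch assumes the definitions D = (LR - 1)/L0, U = -2/L0 and Q = D - U, where LR is the likelihood ratio chi-square and L0 is -2 times the log likelihood of the null model; the numbers are hypothetical and do not come from any fitted model.

```r
# Hypothetical log likelihoods: null model, then fitted model
loglik <- c(-500, -480)

lr  <- -2 * (loglik[1] - loglik[2])  # likelihood ratio chi-square: 40
LL0 <- -2 * loglik[1]                # -2 log likelihood under the null: 1000

D <- (lr - 1) / LL0                  # discrimination index
U <- -2 / LL0                        # unreliability index
Q <- D - U                           # quality index

print(c(LR = lr, D = D, U = U, Q = Q))
```

With these values D is 0.039, U is -0.002 and Q is 0.041, so all three indexes are on the same scale, relative to the null -2 log likelihood.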

By Frank E Harrell Jr (Feh3k) on Wednesday, May 28, 2003 - 01:00 pm:

You have to be very careful about in-sample and out-of-sample predictions when validating. I don't know if your method handles that correctly.

It may be possible for validate.cph to work properly when there are time-dependent covariates; I just haven't checked it. calibrate will definitely not work in that situation.
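
To illustrate the in-sample versus out-of-sample distinction raised above, here is a minimal base R sketch of an optimism-corrected bootstrap. It is only an illustration under simplifying assumptions: a logistic model on simulated data rather than a Cox model, the Brier score as the accuracy index, and plain row resampling with no subject-level clustering or time-dependent covariates. All names and data are made up.

```r
set.seed(1)

# Simulated data: one predictor, binary outcome
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.8 * x))
dat <- data.frame(x = x, y = y)

# Accuracy index: Brier score of a fitted model evaluated on a given data set
brier <- function(fit, data) {
  p <- predict(fit, newdata = data, type = "response")
  mean((p - data$y)^2)
}

# Apparent (in-sample) accuracy of the model fitted to all the data
full_fit <- glm(y ~ x, family = binomial, data = dat)
apparent <- brier(full_fit, dat)

B <- 200
optimism <- numeric(B)
for (b in seq_len(B)) {
  boot_dat <- dat[sample(n, replace = TRUE), ]
  boot_fit <- glm(y ~ x, family = binomial, data = boot_dat)
  # in-sample: bootstrap model scored on the sample it was fitted to
  # out-of-sample: the same bootstrap model scored on the original data
  optimism[b] <- brier(boot_fit, boot_dat) - brier(boot_fit, dat)
}

# Subtract the average optimism from the apparent index
corrected <- apparent - mean(optimism)
print(c(apparent = apparent, optimism = mean(optimism), corrected = corrected))
```

The key step is that each bootstrap model is refitted and then scored twice, once on its own bootstrap sample and once on the original data. Computing an index only on the bootstrap fits themselves describes its sampling variability but does not separate in-sample from out-of-sample performance.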