Department of Biostatistics Seminar/Workshop Series
Assessment of the P-values for Survival Risk Score Derived from High Dimensional Data
Heidi Chen, PhD
Assistant in Biostatistics, Department of Biostatistics, Vanderbilt University School of Medicine
Wednesday, October 26, 1:30-2:30pm, MRBIII Conference Room 1220
The goal of many high-dimensional clinical studies is to develop a model that predicts patient disease outcome. In general, building a prediction model takes three steps: (1) feature selection, (2) model building, and (3) model validation. The feature selection step identifies the features that best distinguish the clinical outcome of interest. A common and intuitive approach to model building is a risk score, a weighted linear combination of the intensities of the selected features, which is then used to predict the outcome of a future observation. The final step, model validation, evaluates the generalizability of the prediction model. Applying the prediction model to an independent test set is the gold standard for validating the predictive ability of the selected features, but preliminary validation usually uses the training set to assess the statistical significance of the association between the risk score and clinical outcome. In this talk, I focus on the prediction of survival outcome. The likelihood ratio test and the log-rank test are two conventional methods used to assess whether the risk score is significantly associated with survival. The p-values calculated from the reference distributions of these two tests, however, are not valid on the training set, because the risk score is built from the same survival times and therefore depends on them. Claiming significant findings from the training set without external validation on an independent test set can lead to seriously flawed conclusions. This talk will address the problem with p-values calculated from the naïve test procedures and illustrate the procedure needed to calculate correct p-values without inflating the type I error.
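The dependence problem described above can be illustrated with a small simulation. The sketch below is not the procedure presented in the talk, which the abstract does not specify; it uses a generic permutation scheme as one possible way to obtain a reference distribution that accounts for feature selection. All data are simulated under the null (no feature is related to survival), and several choices are simplifying assumptions made only for illustration: features are ranked by correlation with log survival time (ignoring censoring), the weights are plus or minus one, and patients are split at the median risk score before a two-group log-rank test.

```python
import math
import random


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))


def logrank_stat(times, events, groups):
    """Two-group log-rank chi-square statistic (groups coded 0/1)."""
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, ei in zip(times, events) if ei and ti == t)
        d1 = sum(g for ti, ei, g in zip(times, events, groups) if ei and ti == t)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / var if var > 0 else 0.0


def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def risk_score_stat(features, times, events, k):
    """Select k features, build a risk score, and test it, all on the same data."""
    log_t = [math.log(t) for t in times]
    # Assumption: rank features by correlation with log time, ignoring censoring.
    weights = [corr(f, log_t) for f in features]
    top = sorted(range(len(features)), key=lambda j: -abs(weights[j]))[:k]
    n = len(times)
    # Higher score should mean higher risk (shorter survival), hence the minus sign.
    score = [sum(-math.copysign(1.0, weights[j]) * features[j][i] for j in top)
             for i in range(n)]
    cut = sorted(score)[n // 2]
    groups = [1 if s >= cut else 0 for s in score]
    return logrank_stat(times, events, groups)


def permutation_p(features, times, events, k, n_perm, rng):
    """Permutation p-value that repeats feature selection in every permutation."""
    observed = risk_score_stat(features, times, events, k)
    idx = list(range(len(times)))
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(idx)  # break the link between features and outcome
        t_perm = [times[i] for i in idx]
        e_perm = [events[i] for i in idx]
        if risk_score_stat(features, t_perm, e_perm, k) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Simulated null data: 40 patients, 100 noise features, roughly 20% censoring.
rng = random.Random(0)
n, p, k, n_perm = 40, 100, 5, 100
features = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
times = [rng.expovariate(1.0) for _ in range(n)]
events = [1 if rng.random() < 0.8 else 0 for _ in range(n)]

observed, perm_p = permutation_p(features, times, events, k, n_perm, rng)
naive_p = chi2_sf_1df(observed)  # treats the risk score as if it were fixed in advance
print(f"log-rank statistic: {observed:.2f}")
print(f"naive p-value: {naive_p:.4g}")
print(f"permutation p-value: {perm_p:.4g}")
```

The key design choice is that feature selection and risk score construction are repeated inside every permutation, so the reference distribution reflects the full model-building procedure rather than only the final test. Comparing the naïve and permutation p-values across many simulated null data sets would show how often each one falsely declares significance.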