### Department of Biostatistics Seminar/Workshop Series

# Comparison of internal validation methods on the procedure of constructing a survival prediction model for proteomic studies

## Chiu-Lan "Heidi" Chen, PhD

### Assistant in Biostatistics, Department of Biostatistics, Cancer Division

Vanderbilt University Medical Center

### Wednesday, March 12, 2008, 1:30-2:30pm, MRBIII Conference Room 1220

### Intended Audience: Persons interested in applied statistics, statistical theory, epidemiology, health services research, clinical trials methodology, statistical computing, statistical graphics, R users or potential users

MALDI-TOF mass spectrometry is one of the leading techniques in proteomics research. It allows direct measurement of the protein signatures of tissue, blood, and other biological samples. One goal of studies using mass spectrometry data is to build a prediction model for future outcomes based on features extracted from such data. In general, there are three steps to building a prediction model: feature selection, model building, and model validation. While external and internal validation are both common approaches to model validation, in this research we focus on internal validation to assess the predictive power of a prediction model for survival outcomes. Measuring predictive accuracy can be difficult for survival data in the presence of censoring. To address this, the C-index measures the probability of concordance (agreement) between the predicted and observed outcomes, in terms of length of survival, for any two subjects. We use the C-index to measure predictive accuracy. With a focus on prediction assessment, we conducted Monte Carlo simulation studies to compare several internal validation methods, such as split-sample, bootstrap, and k-fold cross-validation, in terms of how well they estimate the true predictive accuracy.
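To make the concordance idea concrete, here is a minimal Python sketch of a Harrell-style C-index. It is an illustration only, not necessarily the computation used in the talk. The function name `c_index`, the argument names, the convention that a higher risk score predicts shorter survival, and the handling of ties (tied survival times are skipped, tied risk scores count as one half) are all assumptions made for this example.

```python
from itertools import combinations


def c_index(times, events, risk_scores):
    """Harrell-style concordance index (illustrative sketch).

    times       -- observed follow-up times
    events      -- 1 if the event was observed, 0 if the subject was censored
    risk_scores -- model predictions; higher score = shorter predicted survival
                   (an assumed convention for this example)
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # Let a be the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            # The shorter time is censored, so we cannot tell which
            # subject actually survived longer; the pair is not usable.
            continue
        usable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # prediction agrees with the observed order
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half (assumption)
    if usable == 0:
        raise ValueError("no usable pairs: cannot compute the C-index")
    return concordant / usable


# Toy data (made up for illustration): the third subject is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = c_index(times, events, [0.9, 0.7, 0.5, 0.1])
reversed_order = c_index(times, events, [0.1, 0.5, 0.7, 0.9])
uninformative = c_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

Under these conventions, a value of 1 means every usable pair is ordered correctly, 0.5 corresponds to uninformative predictions, and 0 means every usable pair is ordered incorrectly. An internal validation method such as k-fold cross-validation would apply a function like this to held-out subjects rather than to the data used to fit the model.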