Predictive Accuracy and Model Validation Discussion Board
How to validate the predictive accuracy of an ordinal logistic regression model?
Colin Robertson (colinr23@gmail.com) asked the following on 5Jan07:
I am trying to assess the prediction accuracy of an ordinal model fit with lrm in the Design package. I used predict.lrm to predict on an independent dataset and am now attempting to assess the accuracy of these predictions.
From what I have read, the AUC is well suited to this because it is threshold independent. I obtained the AUC for the fitted model from the C index in the lrm output (C = 0.78). For the predicted values on the independent data, I used the ROCR functions to get the AUC at each level of the response (i.e., for the probabilities that y>=class1, y>=class2, y>=class3, etc.) and plotted the ROC curve for each. These AUC values (0.80 to 0.93) are all higher than the value I got from the fitted model in lrm.
I am not sure whether I have misinterpreted the use of the AUC for ordinal
models or whether the prediction results are actually better than the model
results.
Reply: Unless the independent dataset and the training dataset are both huge, splitting the data is inefficient and gives a low-precision estimate of predictive accuracy compared with bootstrapping or 50 repeats of 10-fold cross-validation.
lrm computes a quick approximate AUC, which you can confirm by running rcorr.cens(predict(fit), Y) (from the Hmisc package) and using Dxy = 2(C - 0.5). The C index printed by lrm is for predicting all categories of Y; it is easier to predict whether Y>=j for a given j than to predict an ordinal Y over the whole set of categories, so higher AUCs for the individual cutoffs are to be expected. Note that Somers' D and the AUC (C) do not penalize for ties in Y.
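To make the relation between the overall C index, Dxy, and the per-cutoff AUCs concrete, here is a minimal sketch in plain Python. It is not the Hmisc or lrm code: the function names c_index and somers_dxy and the six toy observations are made up for illustration. It assumes the usual pairwise definition of concordance, where pairs tied on Y are simply left out (not penalized) and pairs tied on the prediction count one half.

```python
from itertools import combinations


def c_index(pred, y):
    """Concordance probability over all pairs whose y values differ.

    Pairs tied on y are skipped entirely (no penalty);
    pairs tied on the prediction count as half concordant.
    """
    concordant = 0.0
    usable = 0
    for (p1, y1), (p2, y2) in combinations(zip(pred, y), 2):
        if y1 == y2:
            continue  # tie in Y: pair is not usable
        usable += 1
        if p1 == p2:
            concordant += 0.5
        elif (p1 < p2) == (y1 < y2):
            concordant += 1.0
    return concordant / usable


def somers_dxy(c):
    """Somers' Dxy rank correlation from C: Dxy = 2(C - 0.5)."""
    return 2.0 * (c - 0.5)


# Hypothetical toy data: one predicted score per subject, 3-level ordinal Y.
pred = [0.1, 0.4, 0.3, 0.8, 0.6, 0.9]
y = [1, 1, 2, 2, 3, 3]

# C over the whole ordinal scale (the analogue of the C printed for the fit).
overall_c = c_index(pred, y)
overall_dxy = somers_dxy(overall_c)

# C for each dichotomization Y >= j (the analogue of a per-cutoff ROC area).
cutoff_c = {j: c_index(pred, [int(v >= j) for v in y]) for j in (2, 3)}

print("overall C:", overall_c, "Dxy:", overall_dxy)
print("per-cutoff C:", cutoff_c)
```

On this toy dataset the per-cutoff values come out above the overall C, which is the same pattern as in the question. That is only an illustration on six made-up points, not a general proof.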
For independent model validation you can use the val.prob function for each Y-cutoff j.
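The per-cutoff idea can be sketched as follows, again in plain Python rather than with val.prob itself. For each cutoff j the ordinal response is turned into the binary event Y>=j and compared with the predicted probability of that event. The helper brier_score, the dictionary prob_ge, and all of the numbers are hypothetical; in practice the probabilities would come from the fitted model applied to the independent data, and val.prob would report many more indexes than the two shown here.

```python
def brier_score(prob, event):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - e) ** 2 for p, e in zip(prob, event)) / len(prob)


# Hypothetical independent data: observed 3-level ordinal Y ...
y = [1, 1, 2, 2, 3, 3]

# ... and made-up predicted probabilities of Y >= j for each cutoff j.
prob_ge = {
    2: [0.2, 0.5, 0.4, 0.9, 0.8, 0.9],
    3: [0.1, 0.2, 0.1, 0.6, 0.4, 0.8],
}

summary = {}
for j, probs in prob_ge.items():
    event = [int(v >= j) for v in y]  # dichotomize: 1 if Y >= j, else 0
    summary[j] = {
        "brier": brier_score(probs, event),
        "mean_predicted": sum(probs) / len(probs),
        "observed": sum(event) / len(event),
    }

for j, stats in summary.items():
    print("cutoff", j, stats)
```

Comparing mean_predicted with observed gives a crude check of calibration-in-the-large at each cutoff, while the Brier score combines calibration and discrimination in one number.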