Binary Logistic Regression

UVA Biostatistics Discussion Board: Regression Modeling Strategies: Binary Logistic Regression
By Frank E Harrell Jr (Feh3k) on Thursday, October 24, 2002 - 10:34 am:

This topic is for discussions related specifically to the binary logistic regression model.

By Frank E Harrell Jr (Feh3k) on Thursday, October 24, 2002 - 10:49 am:

Omnibus goodness of fit test


Eric Rescorla posted the following question related to the goodness of fit test provided by the residuals function for logistic models:

Using the original Hosmer and Lemeshow data from the example in the Design documentation I get:

> resid(f, 'gof')
Sum of squared errors   36.90136426
Expected value|H0       36.45215657
SD                       0.26054626
Z                        1.72409955
P                        0.08468987

I'm assuming that what I'm interested in here is the P-value. Should it be interpreted as in the ordinary Hosmer-Lemeshow test, that is, does a high P mean a good fit?

Part of my confusion stems from the following observation. Using Agresti's Death Penalty data [1], I see:

> resid(lrm(PENALTY~VIC*DEF,x=T,y=T),'gof')
Sum of squared errors    3.133912e+01
Expected value|H0        3.134142e+01
SD                       1.387697e-14
Z                       -1.654873e+11
P                        0.000000e+00

> resid(lrm(PENALTY~DEF+VIC,x=T,y=T),'gof')
Sum of squared errors   31.35431961
Expected value|H0       31.33009679
SD                       0.03952735
Z                        0.61281159
P                        0.54000093

This may be a stupid question, but I don't understand why the saturated model would have a worse P statistic than the unsaturated model. I must be missing something, but I'm not sure what it is.

[1] Agresti, A. (1990). Categorical Data Analysis. Wiley.

Answer: In general, P-values for goodness of fit tests may be high for any of three reasons: (1) insufficient sample size; (2) the fit is good; or (3) the test lacks power to detect the particular aspect of lack of fit that is present. In your situation, though, the standard deviation of the test statistic appears to be estimating zero when the model is saturated; a saturated model has to fit perfectly. My guess is that the paper by Hosmer, Hosmer, Le Cessie, and Lemeshow (Stat in Med 16:965-980, 1997) did not study the test statistic for saturated models.

We also need to clarify what we mean by a saturated model. This usually means that there is a parameter for each possible cross-classified cell in the space of the predictors. Here there may in addition be a cell in which all values of the response are constant, which may cause other problems in the computation of the GOF statistic.
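To see how the Z and P in the output relate to the other printed quantities: Z is simply the standardized sum of squared errors, (SSE - E[SSE|H0]) / SD, and P is its two-sided normal tail probability. A minimal sketch (in Python rather than S-Plus, purely for illustration; the function name gof_z_p is my own) reproducing the Z and P from the first output:

```python
from scipy.stats import norm

def gof_z_p(sse, expected, sd):
    """Standardize the unweighted sum of squared errors and return
    the two-sided normal Z and P, as printed by resid(f, 'gof')."""
    z = (sse - expected) / sd
    p = 2 * norm.sf(abs(z))  # two-sided upper-tail probability
    return z, p

# quantities printed for the Hosmer-Lemeshow example above
z, p = gof_z_p(36.90136426, 36.45215657, 0.26054626)
print(z, p)  # approximately 1.724 and 0.0847, matching the output
```

This also makes the saturated-model behavior concrete: when SD is numerically zero (1.39e-14 above), the standardization divides a tiny rounding error by a near-zero SD, producing a meaningless huge Z and a P of zero.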

Note that the P-value for a GOF test may in general be small because the model is imperfect, and not necessarily because the model is bad.