Lecture 16: Logistic Regression: Goodness of Fit Information Criteria ROC analysis

### Lecture 16: Logistic Regression: Goodness of Fit, Information Criteria, ROC Analysis

BMTRY 701: Biostatistical Methods II

Goodness of Fit
• A test of how well the model explains the data
• Applies to linear models and generalized linear models
• How to do it?
• It is simply a comparison of the “current” model to a perfect model
• What would the estimated likelihood function be in a perfect model?
• What would the estimated log-likelihood function be in a perfect model?
Set up as a hypothesis test
• H0: current model
• H1: perfect model
• Recall the G² statistic comparing models:

G² = Dev(0) - Dev(1)

where Dev(0) and Dev(1) are the deviances of the models under H0 and H1.

• How many parameters are there in the null model?
• How many parameters are there in the perfect model?
Goodness of Fit test
• Perfect model: Assumed to be ‘saturated’ in most cases
• That is, there is a parameter for each combination of predictors
• In our model, c is likely to be close to N because of the continuous predictors
• Define c = number of parameters in saturated model
• Deviance goodness of fit: Dev(0)
Goodness of Fit test
• Deviance goodness of fit: Dev(0)
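• Compare Dev(H0) with χ²(c - p, 1 - α), the 1 - α quantile of a χ² distribution with c - p degrees of freedom (p = number of parameters in the current model)
• If Dev(H0) < χ²(c - p, 1 - α), conclude H0
• If Dev(H0) > χ²(c - p, 1 - α), conclude H1
• Why aren't we subtracting deviances?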
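One way to see why no subtraction appears, as a sketch: a model's deviance is already measured relative to the saturated model, so the saturated model's own deviance is zero. The symbol for the maximized log-likelihood and the label "sat" are notation introduced here, not taken from the slides.

```latex
\begin{align*}
\mathrm{Dev}(M) &= -2\,\bigl[\hat{\ell}(M) - \hat{\ell}(\mathrm{sat})\bigr],
  \qquad \mathrm{Dev}(\mathrm{sat}) = 0 \\
G^2 &= \mathrm{Dev}(H_0) - \mathrm{Dev}(\mathrm{sat}) = \mathrm{Dev}(H_0),
  \qquad \mathrm{df} = c - p
\end{align*}
```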
GoF test for Prostate Cancer Model

> mreg1 <- glm(cap.inv ~ gleason + log(psa) + vol + factor(dpros),
+              family=binomial)
> mreg0 <- glm(cap.inv ~ gleason + log(psa) + vol, family=binomial)
> mreg1

Coefficients:
   (Intercept)         gleason        log(psa)             vol
      -8.31383         0.93147         0.53422        -0.01507
factor(dpros)2  factor(dpros)3  factor(dpros)4
       0.76840         1.55109         1.44743

Degrees of Freedom: 378 Total (i.e. Null);  372 Residual
  (1 observation deleted due to missingness)
Null Deviance:     511.3
Residual Deviance: 377.1    AIC: 391.1

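Test statistic: 377.1 ~ χ²(380 - 7)

Threshold: χ²(373, 1 - α) = 419.0339 for α = 0.05

p-value = 0.43

Since 377.1 < 419.03, we conclude H0: there is no evidence of lack of fit.

Note: the fit used 379 observations (one was deleted for missingness), so R reports 372 residual degrees of freedom rather than 373; the conclusion is the same with either value.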
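The threshold and p-value above can be checked roughly with a short sketch. Python is used here only for illustration (the lecture itself uses R and Stata), and because the standard library has no chi-square functions, the Wilson-Hilferty normal approximation is swapped in for the exact distribution. The helper names `chi2_quantile_wh` and `chi2_sf_wh` are hypothetical, and the degrees of freedom (380 - 7 = 373) are taken as used on the slide.

```python
from statistics import NormalDist

def chi2_quantile_wh(prob, df):
    """Approximate chi-square quantile (Wilson-Hilferty cube-root approximation)."""
    v = 2.0 / (9.0 * df)
    z = NormalDist().inv_cdf(prob)
    return df * (1.0 - v + z * v ** 0.5) ** 3

def chi2_sf_wh(x, df):
    """Approximate upper-tail probability P(chi-square_df > x), same approximation."""
    v = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - v)) / v ** 0.5
    return 1.0 - NormalDist().cdf(z)

residual_deviance = 377.1   # Dev(H0), from the R output for mreg1
df = 380 - 7                # c - p as used on the slide
alpha = 0.05                # assumed significance level

threshold = chi2_quantile_wh(1 - alpha, df)
p_value = chi2_sf_wh(residual_deviance, df)

print(f"threshold ~ {threshold:.2f}")
print(f"p-value   ~ {p_value:.2f}")
print("conclude H0" if residual_deviance < threshold else "conclude H1")
```

The approximation is accurate for large degrees of freedom such as these; for small df an exact chi-square routine would be the safer choice.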

More Goodness of Fit
• There are a lot of options!
• Deviance GoF is just one
• Pearson Chi-square
• Hosmer-Lemeshow
• etc
• Principles, however, are essentially the same
• GoF tests are not commonly reported in medical research because they are rarely of central importance
Information Criteria
• An information criterion (IC) is a measure of the goodness of fit of an estimated statistical model.
• It is grounded in the concept of entropy:
• it offers a relative measure of the information lost when the model is used to describe the data
• it describes the tradeoff between the precision and the complexity of the model.
• An IC is not a test of the model in the sense of hypothesis testing;
• it is a tool for model selection.
• Given a data set, several competing models may be ranked according to their IC
• The model with the lowest IC is chosen as the “best”
Information Criteria
• IC rewards goodness of fit, but also includes a penalty that is an increasing function of the number of estimated parameters.
• This penalty discourages overfitting.
• The IC methodology attempts to find the model that best explains the data with a minimum of free parameters.
• More traditional approaches such as LRT start from a null hypothesis.
• IC judges a model by how close its fitted values tend to be to the true values.
• The AIC value assigned to a model is only meant to rank competing models: it tells you which is best among the given alternatives, not whether that model is good in an absolute sense.
Akaike Information Criterion (AIC)

Akaike, Hirotugu (1974). "A new look at the statistical model identification". IEEE Transactions on Automatic Control 19 (6): 716–723.

Bayesian Information Criterion (BIC)

Schwarz, Gideon E. (1978). "Estimating the dimension of a model". Annals of Statistics 6 (2): 461–464.

AIC versus BIC
• BIC and AIC are similar
• Different penalty for number of parameters
• The BIC penalizes free parameters more strongly than does the AIC.
• Implications: BIC tends to choose smaller models
• The larger the N, the more likely that AIC and BIC will disagree on model selection
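As a rough sketch of the two criteria: AIC = -2 log L + 2k and BIC = -2 log L + k log(n), where k is the number of estimated parameters and n the number of observations. The Python below (chosen here for illustration; the lecture itself uses R and Stata) applies them to the mreg1 fit, assuming that for ungrouped binary outcomes the residual deviance equals -2 log L. The function names `aic` and `bic` are hypothetical helpers.

```python
import math

def aic(neg2_loglik, k):
    """AIC: -2 log L plus a penalty of 2 per estimated parameter."""
    return neg2_loglik + 2 * k

def bic(neg2_loglik, k, n):
    """BIC: -2 log L plus a penalty of log(n) per estimated parameter."""
    return neg2_loglik + k * math.log(n)

neg2_loglik = 377.1  # residual deviance of mreg1 (assumed equal to -2 log L)
k = 7                # intercept + gleason + log(psa) + vol + 3 dpros indicators
n = 379              # observations used in the fit

aic_value = aic(neg2_loglik, k)
bic_value = bic(neg2_loglik, k, n)

print(f"AIC = {aic_value:.1f}")  # agrees with the AIC of 391.1 in the R output
print(f"BIC = {bic_value:.1f}")
```

Because log(379) is about 5.9, each parameter costs roughly three times as much under BIC as under AIC here, which is why BIC tends to prefer smaller models.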
Prostate cancer models
• We looked at different forms for volume:
A: volume as continuous
B: volume as binary (detectable vs. undetectable)
C: 4 categories of volume
D: 3 categories of volume
E: linear + squared term for volume

ROC curve analysis
• Receiver Operating Characteristic Curve Analysis
• Traditionally, looks at the sensitivity and specificity of a ‘model’ for predicting an outcome
• Question: based on our model, can we accurately predict if a prostate cancer patient has capsular penetration?
ROC curve analysis
• An association between predictors and outcome is not enough
• Prediction requires a ‘stronger’ relationship
• Classic interpretation of sensitivity (sens) and specificity (spec):
• a binary test and a binary outcome
• sensitivity = P(test + | true disease)
• specificity = P(test - | true no disease)
• What is test + in our dataset?
• What does the model provide for us?
Fitted probabilities
• The fitted probabilities are the probability that a NEW patient with the same ‘covariate profile’ will be a “case” (e.g., capsular penetration, disease, etc.)
• We select a probability ‘threshold’ to determine whether a patient is defined as a case or not
• Some options:
• high sensitivity (e.g., cancer screens)
• high specificity (e.g., PPD skin test for TB)
• maximize the sum of sens and spec
ROC curve

. xi: logit capsule i.dpros detected gleason logpsa
i.dpros           _Idpros_1-4         (naturally coded; _Idpros_1 omitted)

Iteration 0:   log likelihood = -255.62831
Iteration 1:   log likelihood = -193.51543
Iteration 2:   log likelihood = -188.23598
Iteration 3:   log likelihood = -188.04747
Iteration 4:   log likelihood =  -188.0471

Logistic regression                               Number of obs   =        379
                                                  LR chi2(6)      =     135.16
                                                  Prob > chi2     =     0.0000
Log likelihood =  -188.0471                       Pseudo R2       =     0.2644

------------------------------------------------------------------------------
     capsule |      Coef.  Std. Err.        z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Idpros_2 |   .7801903   .3573241     2.18   0.029      .079848    1.480533
   _Idpros_3 |   1.606646   .3744828     4.29   0.000     .8726729    2.340618
   _Idpros_4 |   1.504732   .4495287     3.35   0.001     .6236723    2.385793
    detected |  -.5719155   .2570359    -2.23   0.026    -1.075697   -.0681344
     gleason |   .9418179   .1648245     5.71   0.000     .6187677    1.264868
      logpsa |   .5152153   .1547649     3.33   0.001     .2118817    .8185488
       _cons |  -8.275811   1.056036    -7.84   0.000     -10.3456   -6.206018
------------------------------------------------------------------------------
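To make the link between these coefficients and a fitted probability concrete, here is a minimal sketch in Python (used for illustration; the lecture itself works in Stata and R). The coefficients are copied from the output above; the patient profile and the helper name `fitted_probability` are hypothetical.

```python
import math

# Coefficients copied from the Stata logit output
coef = {
    "_cons": -8.275811,
    "_Idpros_2": 0.7801903,
    "_Idpros_3": 1.606646,
    "_Idpros_4": 1.504732,
    "detected": -0.5719155,
    "gleason": 0.9418179,
    "logpsa": 0.5152153,
}

def fitted_probability(profile):
    """Inverse-logit of the linear predictor for one covariate profile."""
    linear_predictor = coef["_cons"] + sum(
        coef[name] * value for name, value in profile.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical patient: dpros = 2, detected = 0, Gleason score 7, log(PSA) = 2
patient = {"_Idpros_2": 1, "detected": 0, "gleason": 7, "logpsa": 2.0}
p_hat = fitted_probability(patient)
print(f"fitted probability of capsular penetration: {p_hat:.3f}")
```

Comparing `p_hat` with a chosen threshold is what turns the model into a binary "test" with a sensitivity and a specificity.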

How to interpret?
• Each point on the ROC curve corresponds to one or more patients in the dataset
• Question: if we use that person’s fitted probability as the threshold, what are the sens and spec values?
• The curve is empirical: it is built from the fitted probabilities
• Choosing the threshold:
• high sens or spec
• maximize both? Choose the point on the ROC curve closest to the upper left corner
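The threshold choice can be sketched as follows, using each observed fitted probability as a candidate cutoff. The fitted probabilities and outcomes below are hypothetical toy values, not the prostate data, and `sens_spec` is a helper name introduced here. Two rules are compared: maximizing sens + spec, and taking the ROC point closest to the upper left corner.

```python
def sens_spec(probs, labels, threshold):
    """Sensitivity and specificity when 'test +' means fitted prob >= threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 0)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical fitted probabilities and true outcomes (1 = case, 0 = non-case)
probs = [0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1]

# One (threshold, sens, spec) row per observed fitted probability
rows = [(t, *sens_spec(probs, labels, t)) for t in sorted(set(probs))]

# Rule 1: maximize sens + spec
best_sum = max(rows, key=lambda r: r[1] + r[2])
# Rule 2: minimize distance to the corner (1 - spec = 0, sens = 1)
closest_corner = min(rows, key=lambda r: (1 - r[1]) ** 2 + (1 - r[2]) ** 2)

for t, sens, spec in rows:
    print(f"threshold {t:.2f}: sens {sens:.2f}, spec {spec:.2f}")
print("max sens + spec at threshold", best_sum[0])
print("closest to upper left corner at threshold", closest_corner[0])
```

On this toy data the two rules choose different cutoffs (0.35 and 0.60), so the criterion should be fixed before looking at the results.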
AUC of ROC curve
• AUC = Area Under the Curve
• 0.5 ≤ AUC ≤ 1
• AUC = 1 if the model is perfect
• AUC = 0.50 if the model is no better than chance
• “Good” AUC?
• context specific
• for some outcomes, good diagnostic measures already exist, so the AUC would need to be very high
• for others, where few diagnostic tools exist, even an AUC of 0.70 would be useful.
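One common way to compute the AUC, sketched here under stated assumptions: the AUC equals the proportion of (case, non-case) pairs in which the case has the higher fitted probability, with ties counted as one half. The data below are hypothetical toy values and `auc_by_pairs` is a name introduced for illustration.

```python
def auc_by_pairs(probs, labels):
    """AUC as the share of (case, non-case) pairs ranked correctly; ties count 0.5."""
    cases = [p for p, y in zip(probs, labels) if y == 1]
    controls = [p for p, y in zip(probs, labels) if y == 0]
    wins = 0.0
    for case_prob in cases:
        for control_prob in controls:
            if case_prob > control_prob:
                wins += 1.0
            elif case_prob == control_prob:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Hypothetical fitted probabilities and true outcomes (1 = case, 0 = non-case)
probs = [0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1]

auc = auc_by_pairs(probs, labels)
print(f"AUC = {auc:.2f}")
```

Perfect separation of cases from non-cases gives 1, and identical fitted probabilities for everyone give 0.5, matching the two reference values above.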
Utility in model selection
• If the goal of the modeling is prediction, AUC can be used to determine the ‘best’ model
• A variable may be associated with the outcome, but not add much in terms of prediction
• Example:
• Model 1: gleason + logPSA + detectable + dpros
• Model 2: gleason + logPSA + detectable
• Model 3: gleason + logPSA
Sensitivity and Specificity
• To use the model in practice, you need to choose a cutoff.
• The AUC of the ROC curve summarizes the model's predictive ability across all cutoffs
• But it does not translate directly into the ‘accuracy’ at a given threshold
phat = 0.50 cutoff

Logistic model for capsule

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |       100            39  |        139
     -     |        53           187  |        240
-----------+--------------------------+-----------
   Total   |       153           226  |        379

Classified + if predicted Pr(D) >= .5
True D defined as capsule != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)   65.36%
Specificity                     Pr( -|~D)   82.74%
Positive predictive value       Pr( D| +)   71.94%
Negative predictive value       Pr(~D| -)   77.92%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)   17.26%
False - rate for true D         Pr( -| D)   34.64%
False + rate for classified +   Pr(~D| +)   28.06%
False - rate for classified -   Pr( D| -)   22.08%
--------------------------------------------------
Correctly classified                        75.73%
--------------------------------------------------

phat = 0.25 cutoff

Logistic model for capsule

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |       137            96  |        233
     -     |        16           130  |        146
-----------+--------------------------+-----------
   Total   |       153           226  |        379

Classified + if predicted Pr(D) >= .25
True D defined as capsule != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)   89.54%
Specificity                     Pr( -|~D)   57.52%
Positive predictive value       Pr( D| +)   58.80%
Negative predictive value       Pr(~D| -)   89.04%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)   42.48%
False - rate for true D         Pr( -| D)   10.46%
False + rate for classified +   Pr(~D| +)   41.20%
False - rate for classified -   Pr( D| -)   10.96%
--------------------------------------------------
Correctly classified                        70.45%
--------------------------------------------------
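The summary rates in both classification tables follow directly from the four cell counts. A minimal Python sketch (for illustration; the tables themselves are Stata output) that recomputes them, with `classification_summary` as a hypothetical helper name:

```python
def classification_summary(tp, fp, fn, tn):
    """Rates derived from a 2x2 table of classified (+/-) by true status (D/~D)."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),   # Pr(+ | D)
        "specificity": tn / (tn + fp),   # Pr(- | ~D)
        "ppv": tp / (tp + fp),           # Pr(D | +)
        "npv": tn / (tn + fn),           # Pr(~D | -)
        "accuracy": (tp + tn) / total,   # correctly classified
    }

# Cell counts copied from the two tables above
cut_050 = classification_summary(tp=100, fp=39, fn=53, tn=187)
cut_025 = classification_summary(tp=137, fp=96, fn=16, tn=130)

for label, summary in (("0.50", cut_050), ("0.25", cut_025)):
    print(f"cutoff {label}")
    for name, value in summary.items():
        print(f"  {name:12s} {100 * value:6.2f}%")
```

Lowering the cutoff from 0.50 to 0.25 raises sensitivity from about 65% to about 90% while specificity falls from about 83% to about 58%, which is the tradeoff the ROC curve traces out.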