### Validation of predictive regression models

Ewout W. Steyerberg, PhD

Clinical epidemiologist

Frank E. Harrell, PhD

Biostatistician

Personal background

- Ewout Steyerberg: Erasmus MC, Rotterdam, the Netherlands
- Frank Harrell: Health Evaluation Sciences, Univ of Virginia, Charlottesville, VA, USA

“Validation of predictions from regression models is of paramount importance”

Learning objectives: knowledge of

- common types of regression models
- fundamental assumptions of regression models
- performance criteria of predictive models
- principles of different types of validation

Performance objectives

- To be able to explain why validation is necessary for predictive models
- To be able to judge the adequacy of a validation procedure

Predictive models provide quantitative estimates of an outcome, e.g.

- Quality of life one year after surgery
- Death at 30 days after surgery
- Long term survival

Predictive models are often based on regression analysis

- y ~ a + sum(bi*xi)
  - y: outcome variable
  - a: intercept
  - bi: regression coefficient for predictor i
  - xi: predictor variable i
  - i runs from 1 to the number of predictors, usually 2 to 20
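
The regression form above can be sketched in a few lines of Python. This is only an illustration: the function name `linear_predictor` and all numeric values are assumptions, not taken from the presentation.

```python
def linear_predictor(intercept, coefficients, predictors):
    """Return a + sum(b_i * x_i) for one subject."""
    if len(coefficients) != len(predictors):
        raise ValueError("need exactly one coefficient per predictor")
    return intercept + sum(b * x for b, x in zip(coefficients, predictors))


# Hypothetical values, chosen only to make the arithmetic easy to follow
a = 1.0
b = [0.5, -2.0]
x = [4.0, 1.0]
print(linear_predictor(a, b, x))  # 1.0 + 0.5*4.0 - 2.0*1.0 = 1.0
```

How the result is interpreted depends on the model type: directly as the predicted outcome in linear regression, or as a log odds in logistic regression.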

3 examples of regression

- Quality of life one year after surgery: continuous outcome, linear regression
- Death at 30 days after surgery: binary outcome, logistic regression
- Long term survival: time-to-outcome, Cox regression

Predictive models make assumptions

- Distribution
- Linearity of continuous variables
- Additivity of effects

Example: a simple logistic regression model

- 30day mortality ~ a + b1*sex + b2*age

Assumptions:

- Distribution of 30day mortality is binomial
- Age has a linear effect
- The effects of sex and age can be added
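
As a rough sketch of what this example model computes, the snippet below turns sex and age into a predicted 30-day mortality risk. The coefficient values and the 0/1 coding of sex are hypothetical and serve only to show the mechanics.

```python
import math


def predicted_mortality(sex, age, a, b1, b2):
    """Predicted 30-day mortality risk from the logistic model
    logit(risk) = a + b1*sex + b2*age.
    sex is a 0/1 indicator (coding is an assumption), age is in years."""
    lp = a + b1 * sex + b2 * age          # linear predictor (log odds)
    return 1.0 / (1.0 + math.exp(-lp))    # inverse logit gives a probability


# Hypothetical coefficients, not estimates from the presentation
A, B1, B2 = -7.0, 0.4, 0.07

risk_60 = predicted_mortality(sex=1, age=60, a=A, b1=B1, b2=B2)
risk_80 = predicted_mortality(sex=1, age=80, a=A, b1=B1, b2=B2)
print(f"risk at 60: {risk_60:.3f}, risk at 80: {risk_80:.3f}")
```

The linearity assumption shows up here as a constant change of `b2` in the log odds per year of age; additivity means the age effect is the same for both values of `sex`.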

Assessing model assumptions

- Examine model residuals
- Perform specific tests
  - add nonlinear terms, e.g. age + age^2
  - add interaction terms, e.g. sex*age
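
One way to write these tests down, as a sketch: extend the example model for 30-day mortality with a squared age term and a sex-by-age interaction, then test whether the added coefficients differ from zero. The symbols b_3, b_4 and p (the probability of 30-day mortality) are notation introduced here for illustration.

```latex
\operatorname{logit}(p) = a + b_1\,\mathrm{sex} + b_2\,\mathrm{age}
  + b_3\,\mathrm{age}^2 + b_4\,(\mathrm{sex}\times\mathrm{age})

H_0 : b_3 = 0 \quad \text{(age has a linear effect)}
\qquad
H_0 : b_4 = 0 \quad \text{(effects of sex and age are additive)}
```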

Model assumptions and predictions

- Better predictions if assumptions are met
- Some violation inherent in empirical data
- Evaluate predictions in new data

Evaluation of predictions

- Calibration
  - average of predictions correct?
  - low and high predictions correct?
- Discrimination
  - distinguish low risk from high risk patients?
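
A minimal sketch of two such measures for a binary outcome follows: the difference between the mean observed outcome and the mean prediction (calibration of the average), and the area under the ROC curve, or c-statistic (discrimination). The function names and the toy data are assumptions; checking whether low and high predictions are correct would need an additional measure, such as a calibration slope, which is not shown.

```python
def calibration_in_the_large(predictions, outcomes):
    """Mean observed outcome minus mean predicted risk; 0 is ideal."""
    n = len(outcomes)
    return sum(outcomes) / n - sum(predictions) / n


def c_statistic(predictions, outcomes):
    """Area under the ROC curve by pairwise comparison: the proportion of
    (event, non-event) pairs in which the event has the higher prediction.
    Tied predictions count as one half."""
    events = [p for p, y in zip(predictions, outcomes) if y == 1]
    non_events = [p for p, y in zip(predictions, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical predictions and observed outcomes for five patients
preds = [0.1, 0.2, 0.3, 0.6, 0.8]
obs = [0, 0, 1, 0, 1]
print(calibration_in_the_large(preds, obs))  # close to 0: average is right
print(c_statistic(preds, obs))               # 5 of 6 pairs are concordant
```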

3 types of validation

- Apparent: performance on sample used to develop model
- Internal: performance on population underlying the sample
- External: performance on related but slightly different population

Apparent validity

- Easy to calculate
- Results in optimistic performance estimates

Apparent estimates optimistic since same data used for:

- Definition of model structure: e.g. selection and coding of variables
- Estimation of model parameters: e.g. regression coefficients
- Evaluation of model performance: e.g. calibration and discrimination

Internal validity

- More difficult to calculate
- Test model in new data, random from underlying population

Why internal validation?

- Honest estimate of performance should be obtained, at least for a population similar to the development sample
- Internally validated performance sets an upper limit to what may be expected in other settings (external validity)

External validity

- Moderately easy to calculate when new data are available
- Test model in new data, different from development population

Why external validation?

- Various factors may differ from development population, including
  - different selection of patients
  - different definitions of variables
  - different diagnostic or therapeutic procedures

Internal validation techniques

- Split-sample:
  - development / validation
- Cross-validation:
  - alternating development / validation
  - extreme: n-1 develop / 1 validate (‘jack-knife’)
- Bootstrap
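
The first two techniques differ only in how patients are assigned to development and validation sets. The sketch below generates such assignments as lists of patient indices; the function names, the 50/50 default split and the seeds are assumptions made for illustration.

```python
import random


def split_sample(n, validation_fraction=0.5, seed=0):
    """Randomly split indices 0..n-1 once into development and validation."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_val = int(round(n * validation_fraction))
    return idx[n_val:], idx[:n_val]  # (development, validation)


def cross_validation_folds(n, k, seed=0):
    """Yield (development, validation) index lists for k-fold
    cross-validation; every patient is used for validation exactly once.
    Setting k = n gives the extreme case: n-1 develop / 1 validate."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        development = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield development, folds[i]


dev, val = split_sample(10)
print(len(dev), len(val))  # 5 5
for development, validation in cross_validation_folds(10, k=5):
    print(sorted(validation))
```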

Bootstrap is the preferred internal validation technique

- bootstrap sample for model development: n patients drawn with replacement
- original sample for validation: n patients
- difference: optimism
- efficiency: development and validation on n patients
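
The steps above can be sketched for the area under the ROC curve as follows. This is an illustration under stated assumptions, not the authors' implementation: the data are simulated, the logistic model is fitted with a deliberately simple gradient ascent (so predictors are assumed to be on a roughly unit scale), and all function names are made up for this example.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, steps=200, rate=0.5):
    """Fit a logistic model by plain gradient ascent on the mean
    log-likelihood. Returns [intercept, b1, b2, ...]."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for row, yi in zip(X, y):
            lp = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
            resid = yi - sigmoid(lp)
            grad[0] += resid
            for j, x in enumerate(row):
                grad[j + 1] += resid * x
        beta = [b + rate * g / n for b, g in zip(beta, grad)]
    return beta


def predict(beta, X):
    """Predicted risks for every row of X."""
    return [sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))
            for row in X]


def c_statistic(pred, y):
    """Area under the ROC curve by pairwise comparison; ties count 1/2."""
    events = [p for p, yi in zip(pred, y) if yi == 1]
    non_events = [p for p, yi in zip(pred, y) if yi == 0]
    score = sum(1.0 if e > m else 0.5 if e == m else 0.0
                for e in events for m in non_events)
    return score / (len(events) * len(non_events))


def bootstrap_validate(X, y, n_boot=200, seed=0):
    """Return (apparent area, estimated optimism, optimism-corrected area)."""
    rng = random.Random(seed)
    n = len(y)
    apparent = c_statistic(predict(fit_logistic(X, y), X), y)
    optimism_terms = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # n drawn with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # skip resamples that contain only one outcome class
        beta_b = fit_logistic(Xb, yb)                    # develop in bootstrap sample
        auc_boot = c_statistic(predict(beta_b, Xb), yb)  # performance in bootstrap sample
        auc_orig = c_statistic(predict(beta_b, X), y)    # performance in original sample
        optimism_terms.append(auc_boot - auc_orig)       # difference: optimism
    optimism = sum(optimism_terms) / len(optimism_terms)
    return apparent, optimism, apparent - optimism


# Simulated data: standardized age and a 0/1 sex indicator (hypothetical)
rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0), float(rng.random() < 0.5)] for _ in range(80)]
y = [int(rng.random() < sigmoid(-1.0 + 0.8 * age_z + 0.5 * sex))
     for age_z, sex in X]

# Only 20 resamples to keep the demo fast; the slides' example uses 200
apparent, optimism, corrected = bootstrap_validate(X, y, n_boot=20)
print(f"apparent {apparent:.3f}, optimism {optimism:.3f}, corrected {corrected:.3f}")
```

Each bootstrap model is evaluated twice, once on its own bootstrap sample and once on the original sample, and the average difference is subtracted from the apparent performance.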

Example: bootstrap results for logistic regression model

- 30-day mortality ~ a + b1*sex + b2*age

Apparent area under the ROC curve: 0.77

Mean area of 200 bootstrap samples: 0.772

Mean area of 200 tests in original: 0.762

Optimism in apparent performance: 0.01

Optimism-corrected area: 0.76
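
The arithmetic behind these figures, written out using only the numbers listed above:

```latex
\begin{aligned}
\text{optimism} &= 0.772 - 0.762 = 0.010 \\
\text{optimism-corrected area} &= 0.77 - 0.01 = 0.76
\end{aligned}
```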

External validation techniques

- Temporal validation: same investigators, validate in recent years
- Spatial validation (other place): same investigators, cross-validate in centers
- Fully external: other investigators, other centers

Example: external validity of logistic regression model

- 30-day mortality ~ a + b1*sex + b2*age

Apparent area in 785 patients: 0.77

Tested in 20,318 other patients: 0.74

Tested by other investigators: ?

Summary

- Apparent validity gives an optimistic estimate of model performance
- Internal validity may be estimated by bootstrapping
- External validity should be determined in other populations

Key references

- tutorial and book on multivariable models (Harrell 1996, Stat Med 15:361-87; Harrell, Regression Modeling Strategies, Springer 2001)
- empirical evaluations of strategies (Steyerberg 2000, Stat Med 19:1059-79)
- internal validation (Steyerberg 2001, JCE 54:774-81)
- external validation (Justice 1999, Ann Intern Med 130:515-24; Altman 2000, Stat Med 19:453-73)

Links

- Interactive text book on predictive modeling: http://www.neri.org/symptom/mockup/Chapter_8/
- Harrell’s Regression modeling strategies: http://hesweb1.med.virginia.edu/biostat/rms/
