
### Validation of predictive regression models

Ewout W. Steyerberg, PhD

Clinical epidemiologist

Frank E. Harrell, PhD

Biostatistician

Personal background

- Ewout Steyerberg: Erasmus MC, Rotterdam, the Netherlands
- Frank Harrell: Health Evaluation Sciences, Univ of Virginia, Charlottesville, VA, USA

“Validation of predictions from regression models is of paramount importance”

Learning objectives: knowledge of

- common types of regression models
- fundamental assumptions of regression models
- performance criteria of predictive models
- principles of different types of validation

Performance objectives

- To be able to explain why validation is necessary for predictive models
- To be able to judge the adequacy of a validation procedure

Predictive models provide quantitative estimates of an outcome, e.g.

- Quality of life one year after surgery
- Death at 30 days after surgery
- Long term survival

Predictive models are often based on regression analysis

- y ~ a + sum(b_i * x_i)
  - y: outcome variable
  - a: intercept
  - b_i: regression coefficient i
  - x_i: predictor variable i
  - i in [1, many], usually 2 to 20
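
As a minimal sketch of the formula above, the prediction for one subject is the intercept plus the sum of each coefficient times its predictor value. The function name and all numbers below are hypothetical and chosen only for illustration.

```python
def linear_predictor(intercept, coefficients, predictors):
    """Return a + b1*x1 + b2*x2 + ... for one subject."""
    if len(coefficients) != len(predictors):
        raise ValueError("need exactly one coefficient per predictor")
    return intercept + sum(b * x for b, x in zip(coefficients, predictors))


# Hypothetical intercept, coefficients, and predictor values.
a = 1.0
b = [0.5, -2.0]
x = [4.0, 1.0]

y_hat = linear_predictor(a, b, x)  # 1.0 + 0.5*4.0 + (-2.0)*1.0
```

For linear regression this value is the predicted outcome itself; for logistic and Cox regression it is transformed further before it is read as a risk.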

3 examples of regression

- Quality of life one year after surgery: continuous outcome, linear regression
- Death at 30 days after surgery: binary outcome, logistic regression
- Long term survival: time-to-outcome, Cox regression

Predictive models make assumptions

- Distribution
- Linearity of continuous variables
- Additivity of effects

Example: a simple logistic regression model

- 30-day mortality ~ a + b1*sex + b2*age

Assumptions:

- Distribution of 30-day mortality is binomial
- Age has a linear effect
- The effects of sex and age can be added
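
A small sketch of how such a model turns a patient's sex and age into a predicted probability, assuming the usual logit link of logistic regression. The coefficient values are hypothetical and are not taken from the presentation.

```python
import math


def predicted_mortality(a, b1, b2, sex, age):
    """Predicted 30-day mortality from a logistic model with sex and age."""
    # Additivity: the effects of sex and age are summed on the logit scale.
    # Linearity: each extra year of age adds the same amount, b2.
    logit = a + b1 * sex + b2 * age
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical coefficients, for illustration only.
A, B1, B2 = -7.0, 0.4, 0.07

p_younger = predicted_mortality(A, B1, B2, sex=0, age=50)
p_older = predicted_mortality(A, B1, B2, sex=1, age=80)
```

With these made-up coefficients the older patient gets the higher predicted risk, because both b1 and b2 are positive.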

Assessing model assumptions

- Examine model residuals
- Perform specific tests
  - add nonlinear terms, e.g. age + age^2
  - add interaction terms, e.g. sex*age
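
One way to carry out such tests is to refit the model with the extra terms as additional predictor columns and check whether they improve the fit. The sketch below only builds those columns for one patient; the function name is hypothetical, and the fitting itself is left to whatever regression software is in use.

```python
def expand_predictors(sex, age):
    """Return the predictor row [sex, age, age^2, sex*age] for one patient."""
    return [
        sex,        # main effect of sex
        age,        # linear effect of age
        age ** 2,   # nonlinear (quadratic) term: tests linearity of age
        sex * age,  # interaction term: tests additivity of sex and age
    ]


row = expand_predictors(sex=1, age=60)
```

If the coefficient of the squared term or of the interaction term is clearly different from zero in the extended model, the linearity or additivity assumption of the simple model is in doubt.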

Model assumptions and predictions

- Better predictions if assumptions are met
- Some violation inherent in empirical data
- Evaluate predictions in new data

Evaluation of predictions

- Calibration
  - average of predictions correct?
  - low and high predictions correct?
- Discrimination
  - distinguish low-risk from high-risk patients?
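
The three questions above can be sketched for a binary outcome as follows. This is an illustration, not the authors' code: the function names and the six example patients are hypothetical, and the c-statistic (equal to the area under the ROC curve) is used as the discrimination measure.

```python
def mean_calibration(y, p):
    """Observed event rate minus mean predicted probability (0 is ideal)."""
    return sum(y) / len(y) - sum(p) / len(p)


def calibration_by_group(y, p, n_groups=2):
    """(mean prediction, observed rate) per risk group, lowest risk first."""
    pairs = sorted(zip(p, y))
    size = len(pairs) // n_groups
    table = []
    for g in range(n_groups):
        start = g * size
        stop = (g + 1) * size if g < n_groups - 1 else len(pairs)
        chunk = pairs[start:stop]
        mean_pred = sum(pi for pi, _ in chunk) / len(chunk)
        observed = sum(yi for _, yi in chunk) / len(chunk)
        table.append((mean_pred, observed))
    return table


def c_statistic(y, p):
    """Share of (event, non-event) pairs in which the event has the higher
    predicted probability; tied predictions count as one half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(nonevents))


# Hypothetical outcomes (1 = died) and predicted probabilities.
y = [0, 0, 1, 0, 1, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.6, 0.8]

average_gap = mean_calibration(y, p)     # average of predictions correct?
low_high = calibration_by_group(y, p)    # low and high predictions correct?
auc = c_statistic(y, p)                  # low risk vs high risk separated?
```

In this toy example 8 of the 9 event/non-event pairs are ordered correctly, so discrimination is good, while the mean prediction (0.4) lies below the observed event rate (0.5), so the predictions are on average too low.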

Example: predicted probabilities (figure not included in the transcript)

3 types of validation

- Apparent: performance on sample used to develop model
- Internal: performance on population underlying the sample
- External: performance on related but slightly different population

Apparent validity

- Easy to calculate
- Results in optimistic performance estimates

Apparent estimates are optimistic since the same data are used for:

- Definition of model structure: e.g. selection and coding of variables
- Estimation of model parameters: e.g. regression coefficients
- Evaluation of model performance: e.g. calibration and discrimination

Internal validity

- More difficult to calculate
- Test model in new data, random from underlying population

Why internal validation?

- An honest estimate of performance should be obtained, at least for a population similar to the development sample
- Internally validated performance sets an upper limit to what may be expected in other settings (external validity)

External validity

- Moderately easy to calculate when new data are available
- Test model in new data, different from development population

Why external validation?

- Various factors may differ from the development population, including
  - different selection of patients
  - different definitions of variables
  - different diagnostic or therapeutic procedures

Internal validation techniques

- Split-sample:
  - development / validation
- Cross-validation:
  - alternating development / validation
  - extreme: n-1 develop / 1 validate (‘jack-knife’)
- Bootstrap
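
The three techniques differ only in how patients are assigned to development and validation. A sketch of the index bookkeeping, using hypothetical function names and the Python standard library only:

```python
import random


def split_sample(n, validation_fraction, seed=0):
    """One random split: returns (development indices, validation indices)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_val = int(round(n * validation_fraction))
    return idx[n_val:], idx[:n_val]


def cross_validation_folds(n, k, seed=0):
    """k alternating splits; every patient is used for validation once.
    With k == n each split develops on n-1 patients and validates on 1."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [(sorted(set(idx) - set(fold)), sorted(fold)) for fold in folds]


def bootstrap_sample(n, seed=0):
    """n indices drawn with replacement; some patients repeat, some are absent."""
    rng = random.Random(seed)
    return [rng.randrange(n) for _ in range(n)]


development, validation = split_sample(10, 0.3)
five_fold = cross_validation_folds(10, 5)
leave_one_out = cross_validation_folds(10, 10)
resample = bootstrap_sample(10)
```

Split-sample and cross-validation develop each model on fewer than n patients, whereas a bootstrap sample always contains n patients.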

Bootstrap is the preferred internal validation technique

- bootstrap sample for model development: n patients drawn with replacement
- original sample for validation: n patients
- difference: optimism
- efficiency: development and validation on n patients
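
The procedure above can be sketched as a loop: develop the model on a bootstrap sample, measure its performance in that sample and in the original sample, and average the difference. To stay self-contained, the sketch uses simulated data and a deliberately simple stand-in model (the observed event rate per category) instead of logistic regression; all names and data are hypothetical.

```python
import random


def c_statistic(y, p):
    """Area under the ROC curve; tied predictions count as one half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    score = sum(1.0 if pe > pn else 0.5 if pe == pn else 0.0
                for pe in events for pn in nonevents)
    return score / (len(events) * len(nonevents))


def fit_category_rates(rows):
    """Stand-in model: event rate per category from rows of (category, outcome)."""
    overall = sum(y for _, y in rows) / len(rows)
    counts = {}
    for x, y in rows:
        n, events = counts.get(x, (0, 0))
        counts[x] = (n + 1, events + y)
    rates = {x: events / n for x, (n, events) in counts.items()}
    # Categories absent from the development data get the overall rate.
    return lambda x: rates.get(x, overall)


def performance(model, rows):
    return c_statistic([y for _, y in rows], [model(x) for x, _ in rows])


def bootstrap_validate(rows, fit, perf, n_boot=200, seed=0):
    """Return (apparent, optimism, optimism-corrected) performance."""
    rng = random.Random(seed)
    n = len(rows)
    apparent = perf(fit(rows), rows)
    total = 0.0
    for _ in range(n_boot):
        sample = [rows[rng.randrange(n)] for _ in range(n)]  # with replacement
        model = fit(sample)                                  # develop on bootstrap sample
        total += perf(model, sample) - perf(model, rows)     # test in original sample
    optimism = total / n_boot
    return apparent, optimism, apparent - optimism


# Simulated patients: a category 0..9 and a binary outcome whose risk
# rises with the category. Purely illustrative data.
data_rng = random.Random(1)
rows = []
for _ in range(150):
    x = data_rng.randrange(10)
    rows.append((x, 1 if data_rng.random() < 0.1 + 0.05 * x else 0))

apparent, optimism, corrected = bootstrap_validate(rows, fit_category_rates, performance)
```

Passing the fitting and performance functions as arguments keeps the loop independent of the model type, so the same loop could wrap a logistic regression fit. Any variable selection or coding decisions would have to be repeated inside `fit` for the optimism estimate to be honest.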

Example: bootstrap results for logistic regression model

- 30-day mortality ~ a + b1*sex + b2*age

Apparent area under the ROC curve: 0.77

Mean area of 200 bootstrap samples: 0.772

Mean area of 200 tests in original: 0.762

Optimism in apparent performance: 0.772 - 0.762 = 0.01

Optimism-corrected area: 0.77 - 0.01 = 0.76

External validation techniques

- Temporal validation: same investigators, validate in recent years
- Spatial validation (other place): same investigators, cross-validate in centers
- Fully external: other investigators, other centers
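
For the first two techniques, the investigators' own data are split by calendar time or by center rather than at random. A sketch of both splits, assuming each patient record is a dictionary with hypothetical `year` and `center` fields:

```python
def temporal_split(records, cutoff_year):
    """Develop on patients before cutoff_year, validate on more recent ones."""
    development = [r for r in records if r["year"] < cutoff_year]
    validation = [r for r in records if r["year"] >= cutoff_year]
    return development, validation


def leave_one_center_out(records):
    """For each center: develop on all other centers, validate on that center."""
    centers = sorted({r["center"] for r in records})
    return {
        c: ([r for r in records if r["center"] != c],
            [r for r in records if r["center"] == c])
        for c in centers
    }


# Hypothetical records; real ones would also carry predictors and outcome.
records = [
    {"year": 1998, "center": "A"},
    {"year": 1999, "center": "B"},
    {"year": 2000, "center": "A"},
    {"year": 2001, "center": "C"},
]

early, recent = temporal_split(records, cutoff_year=2000)
by_center = leave_one_center_out(records)
```

Fully external validation cannot be produced by splitting: it requires data collected by other investigators at other centers.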

Example: external validity of logistic regression model

- 30-day mortality ~ a + b1*sex + b2*age

Apparent area in 785 patients: 0.77

Tested in 20,318 other patients: 0.74

Tested by other investigators: ?

Example: external validation (figure not included in the transcript)

Summary

- Apparent validity gives an optimistic estimate of model performance
- Internal validity may be estimated by bootstrapping
- External validity should be determined in other populations

Key references

- tutorial and book on multivariable models (Harrell 1996, Stat Med 15:361-87; Harrell, Regression Modeling Strategies, Springer 2001)
- empirical evaluations of strategies (Steyerberg 2000, Stat Med 19:1059-79)
- internal validation (Steyerberg 2001, JCE 54:774-81)
- external validation (Justice 1999, Ann Intern Med 130:515-24; Altman 2000, Stat Med 19:453-73)

Links

- Interactive text book on predictive modeling: http://www.neri.org/symptom/mockup/Chapter_8/
- Harrell’s Regression modeling strategies: http://hesweb1.med.virginia.edu/biostat/rms/
