Presentation Transcript

Validation of predictive regression models

Ewout W. Steyerberg, PhD

Clinical epidemiologist

Frank E. Harrell, PhD

Biostatistician

Personal background
  • Ewout Steyerberg: Erasmus MC, Rotterdam, the Netherlands
  • Frank Harrell: Health Evaluation Sciences, Univ of Virginia, Charlottesville, VA, USA

“Validation of predictions from regression models is of paramount importance”

Learning objectives: knowledge of
  • common types of regression models
  • fundamental assumptions of regression models
  • performance criteria of predictive models
  • principles of different types of validation
Performance objectives
  • To be able to explain why validation is necessary for predictive models
  • To be able to judge the adequacy of a validation procedure
Predictive models provide quantitative estimates of an outcome, e.g.
  • Quality of life one year after surgery
  • Death at 30 days after surgery
  • Long term survival
Predictive models are often based on regression analysis
  • y ~ a + sum(bi*xi)

y: outcome variable

a: intercept

bi: regression coefficient i

xi: predictor variable i

i: 1 to many, usually 2 to 20
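As a minimal sketch of this notation in Python (the intercept, coefficient values, and patient data below are invented for illustration), the prediction is simply the intercept plus the weighted sum of the predictor values:

```python
import numpy as np

# Hypothetical intercept and coefficients for two predictors (sex, age)
a = -5.0
b = np.array([0.5, 0.08])

# One patient's predictor values: sex = 1, age = 60
x = np.array([1.0, 60.0])

# Linear predictor: a + sum over i of bi * xi
y_hat = a + np.dot(b, x)
print(y_hat)  # -5.0 + 0.5*1 + 0.08*60 = 0.3
```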

3 examples of regression
  • Quality of life one year after surgery:
    • continuous outcome, linear regression
  • Death at 30 days after surgery:
    • binary outcome, logistic regression
  • Long term survival:
    • time-to-outcome, Cox regression
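A sketch of the three model types on simulated data rather than the examples above; scikit-learn and the third-party lifelines package are illustrative choices, not the software used in the lecture:

```python
# Sketch: the three outcome types and matching regression models, on simulated data.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({"sex": rng.integers(0, 2, n), "age": rng.normal(60, 10, n)})

# Continuous outcome (quality of life): linear regression
qol = 70 - 0.3 * X["age"] + rng.normal(0, 5, n)
LinearRegression().fit(X, qol)

# Binary outcome (death at 30 days): logistic regression
death30 = rng.binomial(1, 1 / (1 + np.exp(-(-7 + 0.08 * X["age"]))))
LogisticRegression().fit(X, death30)

# Time-to-event outcome (long term survival): Cox regression
surv = X.assign(time=rng.exponential(5, n), event=rng.binomial(1, 0.7, n))
CoxPHFitter().fit(surv, duration_col="time", event_col="event")
```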

Predictive models make assumptions
  • Distribution
  • Linearity of continuous variables
  • Additivity of effects
Example: a simple logistic regression model
  • 30-day mortality ~ a + b1*sex + b2*age

Assumptions:

  • Distribution of 30-day mortality is binomial
  • Age has a linear effect
  • The effects of sex and age can be added
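A minimal sketch of fitting this two-predictor logistic model with statsmodels; the data frame and its column names (death30, sex, age) are simulated and hypothetical, used only so the snippet runs:

```python
# Sketch: fitting 30-day mortality ~ a + b1*sex + b2*age with statsmodels.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({"sex": rng.integers(0, 2, n), "age": rng.normal(60, 10, n)})
p = 1 / (1 + np.exp(-(-7 + 0.3 * df["sex"] + 0.08 * df["age"])))
df["death30"] = rng.binomial(1, p)

# Binomial outcome modelled through the logit link; age enters linearly and
# the sex and age effects are added on the log-odds scale.
fit = smf.logit("death30 ~ sex + age", data=df).fit(disp=0)
print(fit.params)   # intercept a, coefficient b1 for sex, b2 for age
```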
Assessing model assumptions
  • Examine model residuals
  • Perform specific tests
    • add nonlinear terms, e.g. age + age^2
    • add interaction terms, e.g. sex*age
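One way such checks might look in practice: add the extra terms to the model formula and compare fits with likelihood-ratio tests. The data below are simulated and the column names hypothetical:

```python
# Sketch: checking linearity and additivity by adding terms to the formula and
# comparing fits with likelihood-ratio tests.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import chi2

rng = np.random.default_rng(2)
n = 500
df = pd.DataFrame({"sex": rng.integers(0, 2, n), "age": rng.normal(60, 10, n)})
p = 1 / (1 + np.exp(-(-7 + 0.3 * df["sex"] + 0.08 * df["age"])))
df["death30"] = rng.binomial(1, p)

base   = smf.logit("death30 ~ sex + age", data=df).fit(disp=0)
nonlin = smf.logit("death30 ~ sex + age + I((age - 60)**2)", data=df).fit(disp=0)  # nonlinear term (centred for stability)
inter  = smf.logit("death30 ~ sex * age", data=df).fit(disp=0)                     # interaction term

# Likelihood-ratio test: twice the gain in log-likelihood, 1 extra parameter each
for label, bigger in [("age^2 term", nonlin), ("sex*age term", inter)]:
    lr = 2 * (bigger.llf - base.llf)
    print(label, "LR =", round(lr, 2), "p =", round(chi2.sf(lr, 1), 3))
```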
Model assumptions and predictions
  • Better predictions if assumptions are met
  • Some violation inherent in empirical data
  • Evaluate predictions in new data
Evaluation of predictions
  • Calibration
    • average of predictions correct?
    • low and high predictions correct?
  • Discrimination
    • distinguish low risk from high risk patients?
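A sketch of both criteria for a binary outcome; the predicted risks and outcomes below are invented stand-ins for real model output. Calibration compares mean predicted risk with the observed event rate (overall and within risk groups), and discrimination is summarized by the area under the ROC curve (c statistic):

```python
# Sketch: calibration and discrimination of predicted risks for a binary outcome.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
p_pred = rng.uniform(0.01, 0.5, 1000)     # hypothetical predicted risks
y_true = rng.binomial(1, p_pred)          # observed 0/1 outcomes

# Calibration-in-the-large: is the average prediction correct?
print("mean predicted:", round(p_pred.mean(), 3), "observed:", round(y_true.mean(), 3))

# Calibration over the range: observed vs predicted within quartiles of risk
groups = np.digitize(p_pred, np.quantile(p_pred, [0.25, 0.5, 0.75]))
for g in range(4):
    m = groups == g
    print("risk group", g,
          "predicted", round(p_pred[m].mean(), 3),
          "observed", round(y_true[m].mean(), 3))

# Discrimination: can the model separate low from high risk patients? (c statistic)
print("c statistic:", round(roc_auc_score(y_true, p_pred), 3))
```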
3 types of validation
  • Apparent: performance on sample used to develop model
  • Internal: performance on population underlying the sample
  • External: performance on related but slightly different population
Apparent validity
  • Easy to calculate
  • Results in optimistic performance estimates
Apparent estimates optimistic since same data used for:
  • Definition of model structure: e.g. selection and coding of variables
  • Estimation of model parameters: e.g. regression coefficients
  • Evaluation of model performance: e.g. calibration and discrimination
Internal validity
  • More difficult to calculate
  • Test model in new data, randomly drawn from the underlying population
Why internal validation?
  • Honest estimate of performance should be obtained, at least for a population similar to the development sample
  • Internally validated performance sets an upper limit to what may be expected in other settings (external validity)
External validity
  • Moderately easy to calculate when new data are available
  • Test model in new data, different from development population
Why external validation?
  • Various factors may differ from the development population, including
    • different selection of patients
    • different definitions of variables
    • different diagnostic or therapeutic procedures
Internal validation techniques
  • Split-sample:
    • development / validation
  • Cross-validation:
    • alternating development / validation
    • extreme: n-1 develop / 1 validate (‘jack-knife’)
  • Bootstrap
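A sketch of the first two techniques on simulated data (scikit-learn is an illustrative choice): split-sample validation holds out part of the data, while cross-validation alternates the development and validation roles across folds:

```python
# Sketch: split-sample and cross-validated estimates of the c statistic (AUC).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split, cross_val_score

rng = np.random.default_rng(4)
n = 1000
X = np.column_stack([rng.integers(0, 2, n), rng.normal(60, 10, n)])   # sex, age
y = rng.binomial(1, 1 / (1 + np.exp(-(-7 + 0.3 * X[:, 0] + 0.08 * X[:, 1]))))

# Split-sample: develop on one half, validate on the other half
X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.5, random_state=0)
model = LogisticRegression().fit(X_dev, y_dev)
print("split-sample AUC:", round(roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]), 3))

# 10-fold cross-validation: development and validation roles alternate over the folds
cv_auc = cross_val_score(LogisticRegression(), X, y, cv=10, scoring="roc_auc")
print("cross-validated AUC:", round(cv_auc.mean(), 3))
```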
Bootstrap is the preferred internal validation technique
  • bootstrap sample for model development: n patients drawn with replacement
  • original sample for validation: n patients
  • difference: optimism
  • efficiency: development and validation on n patients
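A sketch of this optimism-correction procedure for the c statistic (AUC), on simulated data and with scikit-learn as an illustrative choice; a full implementation would also repeat any variable selection inside each bootstrap loop:

```python
# Sketch: bootstrap optimism correction of the apparent c statistic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)
n = 785
X = np.column_stack([rng.integers(0, 2, n), rng.normal(60, 10, n)])   # sex, age
y = rng.binomial(1, 1 / (1 + np.exp(-(-7 + 0.3 * X[:, 0] + 0.08 * X[:, 1]))))

# Apparent performance: develop and evaluate on the same original sample
apparent = roc_auc_score(y, LogisticRegression().fit(X, y).predict_proba(X)[:, 1])

optimism = []
for _ in range(200):                               # 200 bootstrap repetitions
    idx = rng.integers(0, n, n)                    # n patients drawn with replacement
    boot = LogisticRegression().fit(X[idx], y[idx])                     # develop on bootstrap sample
    auc_boot = roc_auc_score(y[idx], boot.predict_proba(X[idx])[:, 1])  # evaluate on bootstrap sample
    auc_orig = roc_auc_score(y, boot.predict_proba(X)[:, 1])            # evaluate on original sample
    optimism.append(auc_boot - auc_orig)           # difference = optimism

print("apparent AUC:", round(apparent, 3))
print("optimism-corrected AUC:", round(apparent - float(np.mean(optimism)), 3))
```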
Example: bootstrap results for logistic regression model
  • 30-day mortality ~ a + b1*sex + b2*age

  • Apparent area under the ROC curve: 0.77
  • Mean area in 200 bootstrap samples: 0.772
  • Mean area of the 200 bootstrap models tested in the original sample: 0.762
  • Optimism in apparent performance: 0.772 - 0.762 = 0.01
  • Optimism-corrected area: 0.77 - 0.01 = 0.76

External validation techniques
  • Temporal validation: same investigators, validate in recent years
  • Spatial validation (other place): same investigators, cross-validate in centers
  • Fully external: other investigators, other centers
Example: external validity of logistic regression model
  • 30-day mortality ~ a + b1*sex + b2*age

  • Apparent area under the ROC curve in 785 patients: 0.77
  • Tested in 20,318 other patients: 0.74
  • Tested by other investigators: ?
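A sketch of what such an external test might look like: the development model is frozen and its predictions are evaluated in a separate sample, checking discrimination with the c statistic and calibration with the intercept and slope of a logistic recalibration model. Everything below is simulated; the sample sizes only echo the example above:

```python
# Sketch: external validation of a frozen development model.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def simulate(n, seed, intercept=-7.0):
    rng = np.random.default_rng(seed)
    X = np.column_stack([rng.integers(0, 2, n), rng.normal(60, 10, n)])   # sex, age
    p = 1 / (1 + np.exp(-(intercept + 0.3 * X[:, 0] + 0.08 * X[:, 1])))
    return X, rng.binomial(1, p)

X_dev, y_dev = simulate(785, seed=6)                      # development sample
X_ext, y_ext = simulate(20318, seed=7, intercept=-7.5)    # external sample, lower baseline risk

model = LogisticRegression().fit(X_dev, y_dev)            # model is frozen after development
p_ext = model.predict_proba(X_ext)[:, 1]

# Discrimination in the external data
print("external AUC:", round(roc_auc_score(y_ext, p_ext), 3))

# Calibration: regress the external outcomes on the linear predictor;
# ideal calibration gives intercept 0 and slope 1
lp = np.log(p_ext / (1 - p_ext))
recal = sm.Logit(y_ext, sm.add_constant(lp)).fit(disp=0)
print("calibration intercept and slope:", recal.params.round(3))
```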

Summary
  • Apparent validity gives an optimistic estimate of model performance
  • Internal validity may be estimated by bootstrapping
  • External validity should be determined in other populations
Key references
  • Tutorial and book on multivariable models (Harrell 1996, Stat Med 15:361-87; Harrell: Regression Modeling Strategies, Springer 2001)
  • Empirical evaluations of strategies (Steyerberg 2000, Stat Med 19:1059-79)
  • Internal validation (Steyerberg 2001, J Clin Epidemiol 54:774-81)
  • External validation (Justice 1999, Ann Intern Med 130:515-24; Altman 2000, Stat Med 19:453-73)
Links
  • Interactive text book on predictive modeling: http://www.neri.org/symptom/mockup/Chapter_8/
  • Harrell’s Regression Modeling Strategies: http://hesweb1.med.virginia.edu/biostat/rms/