Validation of predictive regression models

Ewout W. Steyerberg, PhD

Clinical epidemiologist

Frank E. Harrell, PhD

Biostatistician


Personal background

  • Ewout Steyerberg: Erasmus MC, Rotterdam, the Netherlands

  • Frank Harrell: Health Evaluation Sciences, Univ of Virginia, Charlottesville, VA, USA

    “Validation of predictions from regression models is of paramount importance”


Learning objectives: knowledge of

  • common types of regression models

  • fundamental assumptions of regression models

  • performance criteria of predictive models

  • principles of different types of validation


Performance objectives

  • To be able to explain why validation is necessary for predictive models

  • To be able to judge the adequacy of a validation procedure


Predictive models provide quantitative estimates of an outcome, e.g.

  • Quality of life one year after surgery

  • Death at 30 days after surgery

  • Long term survival


Predictive models are often based on regression analysis

  • y ~ a + sum(bi*xi)

    y: outcome variable

    a: intercept

    bi: regression coefficient i

    xi: predictor variable i

    i in [1,many], usually 2 to 20


3 examples of regression

  • Quality of life one year after surgery:

    continuous outcome, linear regression

  • Death at 30 days after surgery:

    binary outcome, logistic regression

  • Long term survival:

    time-to-outcome, Cox regression
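
For reference, the three model types can be written with the same linear predictor a + sum(bi*xi) used on the previous slide. These are the standard textbook forms, not formulas taken from the presentation; the symbols p (probability of the outcome), h (hazard), h0 (baseline hazard) and the error term are added here for illustration.

```latex
\begin{align*}
\text{Linear:}   \quad & y = a + \sum_i b_i x_i + \varepsilon \\
\text{Logistic:} \quad & \log\frac{p}{1-p} = a + \sum_i b_i x_i, \qquad p = P(y = 1 \mid x) \\
\text{Cox:}      \quad & h(t \mid x) = h_0(t)\,\exp\Bigl(\sum_i b_i x_i\Bigr)
\end{align*}
```

In the Cox model there is no separate intercept a; it is absorbed into the baseline hazard h0(t).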


Predictive models make assumptions

  • Distribution

  • Linearity of continuous variables

  • Additivity of effects


Example: a simple logistic regression model

  • 30-day mortality ~ a + b1*sex + b2*age

    Assumptions:

  • Distribution of 30-day mortality is binomial

  • Age has a linear effect

  • The effects of sex and age can be added
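
A minimal sketch of how such a model turns patient characteristics into a predicted risk. The function name `predicted_risk` and the coefficient values are hypothetical and chosen only for illustration; the presentation does not report fitted coefficients. Sex is assumed to be coded 0/1.

```python
import math

def predicted_risk(sex, age, a, b1, b2):
    """Predicted 30-day mortality from logit(p) = a + b1*sex + b2*age."""
    linear_predictor = a + b1 * sex + b2 * age
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical coefficients for illustration only (not from the presentation).
A, B1, B2 = -7.0, 0.3, 0.06

risk_60 = predicted_risk(sex=1, age=60, a=A, b1=B1, b2=B2)
risk_80 = predicted_risk(sex=1, age=80, a=A, b1=B1, b2=B2)
print(f"age 60: {risk_60:.3f}, age 80: {risk_80:.3f}")
```

Linearity and additivity hold on the log-odds scale: in this sketch every extra year of age multiplies the odds of death by exp(b2), regardless of sex.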


Assessing model assumptions

  • Examine model residuals

  • Perform specific tests

    • add nonlinear terms, e.g. age + age^2

    • add interaction terms, e.g. sex*age
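
As an illustration, adding both kinds of terms to the simple model for 30-day mortality gives the extended model below. The coefficient names b3 and b4 are introduced here and do not appear in the presentation.

```latex
\[
\log\frac{p}{1-p} = a + b_1\,\mathrm{sex} + b_2\,\mathrm{age}
                  + b_3\,\mathrm{age}^2 + b_4\,(\mathrm{sex}\times\mathrm{age})
\]
```

Testing b3 = 0 checks the linearity assumption for age, and testing b4 = 0 checks the additivity of the effects of sex and age.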


Model assumptions and predictions

  • Better predictions if assumptions are met

  • Some violation inherent in empirical data

  • Evaluate predictions in new data


Evaluation of predictions

  • Calibration

    • average of predictions correct?

    • low and high predictions correct?

  • Discrimination

    • distinguish low risk from high risk patients?
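
Both criteria can be computed directly from predicted risks and observed outcomes. The sketch below is one simple way to do so, not the authors' implementation: it compares mean predicted risk with the observed event rate (overall and within risk groups) for calibration, and computes the area under the ROC curve (c-statistic) for discrimination. The function names and the six-patient data set are made up for illustration.

```python
def mean_calibration(predictions, outcomes):
    """Return (mean predicted risk, observed event rate)."""
    n = len(outcomes)
    return sum(predictions) / n, sum(outcomes) / n

def calibration_by_group(predictions, outcomes, n_groups=2):
    """Sort by predicted risk, split into groups, and return
    (mean predicted risk, observed event rate) for each group."""
    pairs = sorted(zip(predictions, outcomes))
    if n_groups < 1 or n_groups > len(pairs):
        raise ValueError("n_groups must be between 1 and the sample size")
    size = len(pairs) // n_groups
    groups = []
    for g in range(n_groups):
        start = g * size
        stop = (g + 1) * size if g < n_groups - 1 else len(pairs)
        chunk = pairs[start:stop]
        groups.append((sum(p for p, _ in chunk) / len(chunk),
                       sum(y for _, y in chunk) / len(chunk)))
    return groups

def c_statistic(predictions, outcomes):
    """Area under the ROC curve: the fraction of (event, non-event) pairs
    in which the event has the higher prediction; ties count one half."""
    events = [p for p, y in zip(predictions, outcomes) if y == 1]
    non_events = [p for p, y in zip(predictions, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))

# Made-up predictions and outcomes (1 = died within 30 days), illustration only.
preds = [0.05, 0.10, 0.20, 0.30, 0.60, 0.80]
died = [0, 0, 1, 0, 1, 1]

mean_pred, event_rate = mean_calibration(preds, died)
groups = calibration_by_group(preds, died, n_groups=2)
auc = c_statistic(preds, died)
print(mean_pred, event_rate, groups, auc)
```

A c-statistic of 0.5 means the predictions do not separate low-risk from high-risk patients at all; 1.0 means perfect separation.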



3 types of validation

  • Apparent: performance on sample used to develop model

  • Internal: performance on population underlying the sample

  • External: performance on related but slightly different population


Apparent validity

  • Easy to calculate

  • Results in optimistic performance estimates


Apparent estimates optimistic since same data used for:

  • Definition of model structure: e.g. selection and coding of variables

  • Estimation of model parameters: e.g. regression coefficients

  • Evaluation of model performance: e.g. calibration and discrimination


Internal validity

  • More difficult to calculate

  • Test model in new data, randomly drawn from the underlying population


Why internal validation?

  • Honest estimate of performance should be obtained, at least for a population similar to the development sample

  • Internally validated performance sets an upper limit to what may be expected in other settings (external validity)


External validity

  • Moderately easy to calculate when new data are available

  • Test model in new data, different from development population


Why external validation?

  • Various factors may differ from development population, including

    • different selection of patients

    • different definitions of variables

    • different diagnostic or therapeutic procedures


Internal validation techniques

  • Split-sample:

    • development / validation

  • Cross-validation:

    • alternating development / validation

    • extreme: n-1 develop / 1 validate (‘jack-knife’)

  • Bootstrap
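
The first two techniques differ only in how patients are assigned to development and validation sets. A minimal sketch of the index bookkeeping, with hypothetical function names and an arbitrary sample size of 12 patients:

```python
import random

def split_sample(n, validation_fraction, rng):
    """Randomly divide indices 0..n-1 into one development and one validation set."""
    indices = list(range(n))
    rng.shuffle(indices)
    n_val = int(round(n * validation_fraction))
    return indices[n_val:], indices[:n_val]

def cross_validation_folds(n, k, rng):
    """Yield (development, validation) index lists; every patient is used for
    validation exactly once. With k == n: n-1 develop, 1 validate."""
    indices = list(range(n))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        development = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield development, validation

rng = random.Random(2024)  # fixed seed so the sketch is reproducible
dev, val = split_sample(n=12, validation_fraction=1 / 3, rng=rng)
folds = list(cross_validation_folds(n=12, k=4, rng=rng))
leave_one_out = list(cross_validation_folds(n=12, k=12, rng=rng))
print(len(dev), len(val), len(folds), len(leave_one_out))
```

In the split-sample approach the model is developed on only part of the data, whereas cross-validation rotates the validation role so that each patient is used for both purposes.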


Bootstrap is the preferred internal validation technique

  • bootstrap sample for model development: n patients drawn with replacement

  • original sample for validation: n patients

  • difference: optimism

  • efficiency: development and validation on n patients
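
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the data are simulated, the "model" is a toy stand-in for a regression fit (the observed event rate within each category of a single predictor), and performance is measured with the c-statistic. All function names are hypothetical.

```python
import random

def c_statistic(predictions, outcomes):
    """Area under the ROC curve by pairwise comparison; ties count one half."""
    events = [p for p, y in zip(predictions, outcomes) if y == 1]
    non_events = [p for p, y in zip(predictions, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    wins = sum(1.0 if e > n else 0.5 if e == n else 0.0
               for e in events for n in non_events)
    return wins / (len(events) * len(non_events))

def fit_group_rates(sample):
    """Toy stand-in for a regression fit: risk = observed event rate per category."""
    counts = {}
    for x, y in sample:
        n, events = counts.get(x, (0, 0))
        counts[x] = (n + 1, events + y)
    overall = sum(y for _, y in sample) / len(sample)
    rates = {x: events / n for x, (n, events) in counts.items()}
    # A category absent from the development sample gets the overall rate.
    return lambda x: rates.get(x, overall)

def auc_of(model, sample):
    """Performance of a fitted model on a sample of (x, y) pairs."""
    return c_statistic([model(x) for x, _ in sample], [y for _, y in sample])

def bootstrap_validate(data, fit, performance, n_boot, rng):
    """Estimate optimism with the bootstrap and subtract it from apparent performance."""
    apparent = performance(fit(data), data)
    total_optimism = 0.0
    completed = 0
    while completed < n_boot:
        # Bootstrap sample for development: n patients drawn with replacement.
        boot = [rng.choice(data) for _ in data]
        if len({y for _, y in boot}) < 2:
            continue  # c-statistic undefined without both outcomes; redraw
        model = fit(boot)
        # Optimism: performance in the bootstrap sample minus performance
        # of the same model in the original sample.
        total_optimism += performance(model, boot) - performance(model, data)
        completed += 1
    optimism = total_optimism / n_boot
    return {"apparent": apparent, "optimism": optimism,
            "corrected": apparent - optimism}

# Simulated data: 80 patients, one predictor with 5 categories (illustration only).
rng = random.Random(1)
data = []
for _ in range(80):
    x = rng.randrange(5)
    y = 1 if rng.random() < 0.1 + 0.1 * x else 0
    data.append((x, y))

result = bootstrap_validate(data, fit_group_rates, auc_of, n_boot=200, rng=rng)
print(result)
```

In a real analysis every modeling step, including any selection and coding of variables, would be repeated inside `fit` for each bootstrap sample, so that the optimism estimate reflects the whole model-building process.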


Example: bootstrap results for logistic regression model

  • 30-day mortality ~ a + b1*sex + b2*age

    Apparent area under the ROC curve: 0.77

    Mean area of 200 bootstrap samples: 0.772

    Mean area of 200 tests in original: 0.762

    Optimism in apparent performance: 0.01

    Optimism-corrected area: 0.76
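
The arithmetic behind these figures, written out (AUC denotes the area under the ROC curve):

```latex
\begin{align*}
\text{optimism} &= \overline{\mathrm{AUC}}_{\text{bootstrap}} - \overline{\mathrm{AUC}}_{\text{original}}
                 = 0.772 - 0.762 = 0.010 \\
\text{optimism-corrected AUC} &= \mathrm{AUC}_{\text{apparent}} - \text{optimism}
                 = 0.77 - 0.01 = 0.76
\end{align*}
```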


External validation techniques

  • Temporal validation: same investigators, validate in recent years

  • Spatial validation (other place): same investigators, cross-validate in centers

  • Fully external: other investigators, other centers


Example: external validity of logistic regression model

  • 30-day mortality ~ a + b1*sex + b2*age

    Apparent area in 785 patients: 0.77

    Tested in 20,318 other patients: 0.74

    Tested by other investigators: ?



Summary

  • Apparent validity gives an optimistic estimate of model performance

  • Internal validity may be estimated by bootstrapping

  • External validity should be determined in other populations


Key references

  • tutorial and book on multivariable models (Harrell 1996, Stat Med 15: 361-87; Harrell, Regression Modeling Strategies, Springer 2001)

  • empirical evaluations of strategies (Steyerberg 2000, Stat Med 19: 1059-79)

  • internal validation (Steyerberg 2001, JCE 54: 774-81)

  • external validation (Justice 1999, Ann Intern Med 130: 515-24; Altman 2000, Stat Med 19: 453-73)


Links

  • Interactive textbook on predictive modeling: http://www.neri.org/symptom/mockup/Chapter_8/

  • Harrell’s Regression Modeling Strategies: http://hesweb1.med.virginia.edu/biostat/rms/

