
Bootstrap and Model Validation


Presentation Transcript


  1. Bootstrap and Model Validation

  2. Outline • Introduction: model validation • Bootstrap method • Predictive performance • Using bootstrap and other methods for model validation

  3. Regression Analysis Demonstrating association: evaluating the relationship between an outcome and covariates. e.g., Association Between Helicopter vs Ground Emergency Medical Services and Survival for Adults With Major Trauma. JAMA. 2012;307(15):1602-1610.

  4. Association We are interested in the beta coefficient of the regression model, e.g., In the multivariable regression model, for patients transported to level I trauma centers, helicopter transport was associated with an improved odds of survival compared with ground transport (odds ratio [OR], 1.16; 95% CI, 1.14-1.17; P < .001).

  5. Regression Analysis Prediction and forecasting: e.g., Risk Stratification for In-Hospital Mortality in Acutely Decompensated Heart Failure: Classification and Regression Tree Analysis. JAMA. 2005;293(5):572-580.

  6. Predictive score Predictive score construction: e.g., a score (H) is generally based on the results of a regression model: H = (β1 × covariate A) + (β2 × covariate B) + (β3 × covariate C), and so on, where β1, β2, and β3 denote the estimates of the beta coefficients for covariates A, B, and C, obtained by fitting the regression model for the outcome of interest.
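The score arithmetic above translates directly into code. A minimal sketch; the coefficient values and covariate names below are hypothetical, purely for illustration:

```python
# Hypothetical beta coefficients and one patient's covariate values
# (made up for illustration; not from any fitted model).
betas = {"age_gt_65": 0.62, "diabetes": 0.35, "shock": 1.40}
patient = {"age_gt_65": 1, "diabetes": 0, "shock": 1}

# H = (beta1 x covariate A) + (beta2 x covariate B) + (beta3 x covariate C)
H = sum(betas[k] * patient[k] for k in betas)
```

For a logistic model, H is the linear predictor; applying the inverse-logit to it (plus the intercept) gives the predicted risk.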

  7. Model validation Model validation is applied to regression models built for prediction purposes.

  8. Model Validation MODEL VALIDATION in general has at least two parts: 1. Model selection: to choose the best model based on model performance. 2. Model assessment: to estimate performance for a final chosen model.

  9. Model Validation • Here we study various methods for model assessment (how well does the model predict a future outcome?). Internal validation of predictive models: Efficiency of some procedures for logistic regression analysis. E. W. Steyerberg et al., Journal of Clinical Epidemiology 54 (2001) 774–781.

  10. What is bootstrap? Random sampling, with replacement, from an original dataset, used to obtain statistical inferences.

  11. Example: 95% CI of Sample mean

  12. Bootstrap Bootstrap theory says that the distance between the population mean and the sample mean is similar to the distance between the sample mean and the bootstrap-sample means.
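The 95% CI example above can be sketched as a percentile bootstrap: resample with replacement, recompute the mean each time, and read off the empirical quantiles. The dataset and resampling count below are illustrative:

```python
import random
import statistics

def bootstrap_ci_mean(data, n_boot=2000, alpha=0.05, seed=42):
    """Percentile bootstrap CI for the mean: resample with replacement,
    recompute the mean each time, take the empirical alpha/2 quantiles."""
    rng = random.Random(seed)
    n = len(data)
    boot_means = []
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in range(n)]  # resample WITH replacement
        boot_means.append(statistics.mean(sample))
    boot_means.sort()
    lo = boot_means[int((alpha / 2) * n_boot)]
    hi = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

data = [4.2, 5.1, 6.3, 4.8, 5.9, 5.5, 4.4, 6.1, 5.2, 5.0]
lo, hi = bootstrap_ci_mean(data)
```

The same resample-and-recompute pattern applies unchanged to the correlation coefficient, CV, AUC, or median listed on the next slide: only the statistic computed inside the loop changes.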

  13. Examples for bootstrap: 95% CI for: • Correlation coefficient • CV = SD/mean • AUC of ROC • Median

  14. Model Validation External validation: use a training (derivation) dataset to build the model and a test (validation) dataset to validate the model. Examples: old vs new patients, one dataset vs another. Internal validation: use the same dataset for model building and validation.

  15. Model Validation • We use regression analysis to construct the predictive model to provide an estimate of patient outcome. • The apparent performance of the model on this training set will be better than the performance in another dataset, even if the latter test set consists of patients from the same population (this is called optimism).

  16. Statistical Inference

  17. Model Validation • Data: the GUSTO-I dataset gives 30-day mortality in patients with acute myocardial infarction. This dataset consists of 40,830 patients, of whom 2,851 (7.0%) had died at 30 days. • Response (Y): 30-day mortality • Predictors (X): age > 65 years, high risk (anterior infarct location or previous MI), diabetes, shock, hypotension (systolic blood pressure < 100 mmHg), tachycardia (pulse > 80), relief of chest pain > 1 hr, female gender.

  18. Predictive performance: optimism • Produce a training set and test set based on the GUSTO-I data (EPV: events per variable) • Example: EPV = 5, 7% event rate => training dataset: 5 × 8 = 40 deaths out of 571 patients => test dataset: 2,811 deaths out of 40,259 patients
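The EPV arithmetic above can be reproduced in a few lines (the 8 predictors, 2,851 total deaths, and 7% event rate come from the GUSTO-I description on the previous slide):

```python
# EPV = events per variable; training set sized so that
# events = EPV * (number of predictors), at the cohort's 7% event rate.
n_predictors = 8
epv = 5
event_rate = 0.07

n_events_train = epv * n_predictors            # 5 * 8 = 40 deaths
n_train = round(n_events_train / event_rate)   # 40 / 0.07 ~= 571 patients
n_events_test = 2851 - n_events_train          # 2,811 deaths remain for testing
```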

  19. Predictive performance: 1. concordance: the c statistic. For binary outcomes, c is identical to the area under the receiver operating characteristic (ROC) curve; c varies between 0.5 and 1.0 for sensible models (the higher the better)
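A minimal sketch of the c statistic as the proportion of concordant (event, non-event) pairs, with ties counted as one half; the toy outcomes and predictions are made up:

```python
def c_statistic(y, p):
    """Concordance: among all (event, non-event) pairs, the proportion in
    which the event subject has the higher predicted risk (ties count 1/2).
    For binary outcomes this equals the area under the ROC curve."""
    pairs = concordant = 0.0
    for yi, pi in zip(y, p):
        for yj, pj in zip(y, p):
            if yi == 1 and yj == 0:   # one event subject vs one non-event subject
                pairs += 1
                if pi > pj:
                    concordant += 1
                elif pi == pj:
                    concordant += 0.5
    return concordant / pairs

y = [1, 1, 0, 0, 0]            # observed outcomes (illustrative)
p = [0.9, 0.4, 0.5, 0.2, 0.1]  # predicted risks (illustrative)
c = c_statistic(y, p)
```

The O(n²) pair loop is fine for a sketch; production code would sort by prediction and count ranks instead.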

  20. Predictive performance: 2. The calibration slope is the regression coefficient b in a logistic model with the predictive score as the only covariate: logit(mortality) = a + b × predictive score. Well-calibrated models have a slope of 1, while models providing too extreme predictions have a slope less than 1.
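One way to sketch the calibration slope is to fit the univariate logistic model above by Newton-Raphson. In the simulation below (all made up for illustration) the outcomes are generated so that the true logit equals the score itself, so the fitted slope should land near 1:

```python
import math
import random

def calibration_slope(score, y, iters=50):
    """Fit logit(P(y=1)) = a + b*score by Newton-Raphson.
    b is the calibration slope (1.0 = well calibrated)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        # Gradient g = X'(y - p) and Hessian-like matrix X'WX for the 2x2 system
        ga = gb = haa = hab = hbb = 0.0
        for s, yi in zip(score, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * s)))
            w = p * (1.0 - p)
            ga += yi - p
            gb += (yi - p) * s
            haa += w
            hab += w * s
            hbb += w * s * s
        det = haa * hbb - hab * hab
        a += (hbb * ga - hab * gb) / det   # solve (X'WX) delta = g
        b += (haa * gb - hab * ga) / det
    return a, b

# Simulate: outcome probability = inverse-logit of the score itself.
rng = random.Random(0)
score = [rng.gauss(0, 1.5) for _ in range(2000)]
y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-s)) else 0 for s in score]
a_hat, b_hat = calibration_slope(score, y)
```

A slope clearly below 1 on new data would signal that the original model's predictions are too extreme and should be shrunk toward the mean risk.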

  21. Predictive performance: 3. The Brier score (or average prediction error) is calculated as Σ(y_i − p_i)² / n, where y_i denotes the observed outcome and p_i the prediction for subject i in the dataset of n subjects. 4. D is a scaled version of the model chi-square, which is a function of the log-likelihood. 5. R² as a measure of explained variation.
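The Brier score formula translates directly; the toy outcomes and predictions are illustrative:

```python
def brier_score(y, p):
    """Mean squared difference between the 0/1 outcome and the
    predicted risk: sum((y_i - p_i)^2) / n. Lower is better."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)

y = [1, 0, 0, 1]          # observed outcomes (illustrative)
p = [0.8, 0.1, 0.4, 0.6]  # predicted risks (illustrative)
bs = brier_score(y, p)
```

A useless model that always predicts the overall event rate of a 50/50 sample scores 0.25, which is the usual benchmark to beat.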

  22. Internal Validation Method • A few methods to estimate model performance (Table 1) • Split sample: randomly split the training data in two parts: one to develop the model and another to measure its performance. The split was made once and at random.

  23. Cross-Validation • cross-validation: With split-half cross-validation, the model is developed on one randomly drawn half and tested on the other and vice versa. The average is taken as estimate of performance. Other fractions of subjects may be left out (e.g., 10% to test a model developed on 90% of the sample). This procedure is repeated 10 times, such that all subjects have once served to test the model.

  24. Cross Validation To improve the stability of the cross-validation, the whole procedure can be repeated several times, taking new random subsamples. The most extreme cross-validation procedure is to leave one subject out at a time, which is equivalent to the jack-knife technique.
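The cross-validation procedure on the two slides above can be sketched generically: each subject serves in a test fold exactly once, and the fold scores are averaged. The toy "model" below (predict the training-fold mean, score by squared error) is only a stand-in for a fitted regression model:

```python
import random

def k_fold_cv(data, k, fit, evaluate, seed=0):
    """Generic k-fold cross-validation: split the data into k random folds,
    train on k-1 folds, test on the held-out fold, average the k scores."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]     # k disjoint folds
    scores = []
    for test_idx in folds:
        test_set = set(test_idx)
        train = [data[i] for i in idx if i not in test_set]
        test = [data[i] for i in test_idx]
        scores.append(evaluate(fit(train), test))
    return sum(scores) / len(scores)

data = [2.0, 4.0, 6.0, 8.0, 3.0, 5.0, 7.0, 9.0, 4.0, 6.0]
cv_err = k_fold_cv(
    data, k=5,
    fit=lambda tr: sum(tr) / len(tr),                            # "model" = mean
    evaluate=lambda m, te: sum((x - m) ** 2 for x in te) / len(te))  # MSE
```

Setting k = 2 gives split-half cross-validation, and k = len(data) gives the leave-one-out (jack-knife) extreme; repeating the whole call with new seeds and averaging improves stability, as the slide notes.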

  25. Bootstrap • Bootstrapping replicates the process of sample generation from an underlying population by drawing samples with replacement from the original data set, of the same size as the original data set. Models may be developed in bootstrap samples and tested in the original sample.

  26. Bootstrap • Regular bootstrap: the model as estimated in the bootstrap sample is evaluated both in the bootstrap sample and in the original sample. The performance in the bootstrap sample represents the apparent performance, and the performance in the original sample represents the test performance. The difference between these performances is an estimate of the optimism in the apparent performance.

  27. Bootstrap • This difference is averaged over the bootstrap replicates to obtain a stable estimate of the optimism, which yields the internally validated performance: • optimism = average(bootstrap performance − test performance) • estimated performance = apparent performance − optimism
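The regular-bootstrap optimism correction described above can be sketched as follows. The mean "model" and negative-MSE performance measure are stand-ins for a fitted regression model and a measure such as the c statistic (where higher is also better):

```python
import random

def optimism_corrected(data, fit, perform, n_boot=200, seed=1):
    """Regular bootstrap validation: refit in each bootstrap sample, compare
    its performance there (apparent) vs on the original data (test); the
    averaged difference is the optimism, subtracted from the full-data
    apparent performance."""
    rng = random.Random(seed)
    apparent = perform(fit(data), data)
    n = len(data)
    opt = 0.0
    for _ in range(n_boot):
        boot = [data[rng.randrange(n)] for _ in range(n)]  # resample with replacement
        m = fit(boot)
        opt += perform(m, boot) - perform(m, data)
    opt /= n_boot
    return apparent - opt, opt

data = [3.0, 7.0, 2.0, 9.0, 5.0, 6.0, 4.0, 8.0, 1.0, 10.0]
fit = lambda d: sum(d) / len(d)                                  # "model" = mean
perform = lambda m, d: -sum((x - m) ** 2 for x in d) / len(d)    # negative MSE
corrected, opt = optimism_corrected(data, fit, perform)
```

Because the model is fit and judged on the same bootstrap sample, its apparent performance there is systematically flattering; the estimated optimism is positive, and the corrected estimate sits below the apparent one.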

  28. Model Validation • Performance: (Fig 2 and Table 2) • Conclusion: • The split-sample approach tends to produce larger differences between estimated performance and test performance, unless a very large sample is available. • However, with a large sample size (e.g., EPV > 40), optimism is small and the apparent estimates of model performance are attractive because of their stability. • Regular bootstrapping provides better estimates of the internal validity of logistic regression models constructed in smaller samples (e.g., EPV = 10).

  29. Review What is bootstrap? Model validation Internal vs external model validation Optimism in internal validation Using bootstrap and other methods to correct optimism

  30. Reference 1. Efron and Tibshirani (1993), An Introduction to the Bootstrap, Chapman & Hall/CRC. 2. Internal validation of predictive models: Efficiency of some procedures for logistic regression analysis. E. W. Steyerberg et al., Journal of Clinical Epidemiology 54 (2001) 774–781.
