
Cross-validation for the selection of statistical models








  1. Cross-validation for the selection of statistical models Simon J. Mason Michael K. Tippett IRI

  2. The Model Selection Problem • Given: Family of models M_a and observations. • Question: Which model to use? • Goals: • Maximize predictive ability given limited observations. • Accurately estimate predictive ability. • Example: Linear regression: • Observations (n = 50); • Candidate predictors sorted by correlation with the predictand; • M1 uses the first predictor, M2 the first two predictors, etc.
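A minimal sketch of the nested-model setup on this slide; the synthetic data, pool size, and function names below are illustrative assumptions, not taken from the talk.

```python
# Sketch: rank a pool of candidate predictors by |correlation| with the
# predictand and define nested models M1, M2, ... (illustrative data only).
import numpy as np

rng = np.random.default_rng(0)
n, n_pool = 50, 20                       # 50 observations, 20 candidate predictors (assumed pool size)
X = rng.standard_normal((n, n_pool))
y = 0.8 * X[:, 0] + 0.4 * X[:, 1] + rng.standard_normal(n)   # toy "truth" uses two predictors

# Sort predictors by absolute correlation with y, largest first.
corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_pool)])
order = np.argsort(-np.abs(corr))

def design_matrix(a):
    """Design matrix (with intercept) for the nested model M_a: the first a sorted predictors."""
    return np.column_stack([np.ones(n), X[:, order[:a]]])
```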

  3. Estimating predictive ability Wrong way: calibrate each model with all of the data and choose the model that best fits the data; in-sample fit overstates out-of-sample skill.

  4. In-sample skill estimates • Akaike information criterion (AIC): AIC = -2 log(L) + 2p • Asymptotic estimate of expected out-of-sample error. • Minimizing Mallows' Cp is equivalent to minimizing AIC (linear regression). • Bayesian information criterion (BIC): BIC = -2 log(L) + p log(n) • Difference between models approximates the Bayes factor. • L = likelihood, p = # parameters, n = # samples. Maximize fit, penalize complexity.
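The slide's formulas can be evaluated for a Gaussian linear model fit by least squares, where -2 log(L) = n log(RSS/n) up to additive constants; counting p as the number of regression coefficients is an assumption (some conventions add one for the error variance).

```python
# Hedged sketch of AIC and BIC for an OLS fit, using the slide's formulas.
import numpy as np

def aic_bic(y, Xa):
    """AIC and BIC for an ordinary-least-squares fit of y on design matrix Xa."""
    n, p = Xa.shape                          # p = number of fitted coefficients (convention assumed)
    beta, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    rss = np.sum((y - Xa @ beta) ** 2)
    neg2logL = n * np.log(rss / n)           # -2 log(L) with additive constants dropped
    aic = neg2logL + 2 * p                   # AIC = -2 log(L) + 2p
    bic = neg2logL + p * np.log(n)           # BIC = -2 log(L) + p log(n)
    return aic, bic
```

Smaller values are better; comparing AIC and BIC across M1, M2, ... shows the log(n) penalty of BIC favouring smaller models once n > 7 or so.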

  5. AIC and BIC AIC = -2 log(L) + 2p BIC = -2 log(L) + p log(n) • BIC tends to select simpler models. • AIC is asymptotically (many observations) inconsistent; BIC is consistent. • For models of the same size, both criteria pick the best fit. • A large pool of predictors leads to over-fitting.

  6. Out-of-sample skill estimates Calibrate and validate models using independent data sets. • Split data into calibration and validation data sets. • Repeatedly divide data. • Leave-1-out cross-validation; • Leave-k-out cross-validation. Properties?
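A minimal numpy sketch of one leave-k-out scheme: withhold a block of k consecutive cases around each target, fit on the rest, and predict the withheld centre. The exact deletion scheme is an assumption, not specified on the slide; k = 1 recovers leave-1-out.

```python
# Leave-k-out cross-validated predictions for a linear model (sketch).
import numpy as np

def leave_k_out_predictions(y, Xa, k=1):
    """CV predictions of y from design matrix Xa, deleting a block of k cases per target."""
    n = len(y)
    half = k // 2
    yhat = np.empty(n)
    for i in range(n):
        # Withhold a block of (up to) k cases centred on i; blocks truncate at the ends.
        left, right = max(0, i - half), min(n, i - half + k)
        keep = np.r_[0:left, right:n]                     # training indices
        beta, *_ = np.linalg.lstsq(Xa[keep], y[keep], rcond=None)
        yhat[i] = Xa[i] @ beta
    return yhat
```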

  7. Leave-k-out CV is biased • Single predictor and predictand: CV underestimates the correlation; increasing k reduces (increases) the bias for low (high) correlations (Barnston & van den Dool 1993). • Multivariate linear regression: CV overestimates the RMS error, with a bias ~ k/[n(n-k)] (Burman 1989). For a given model with significant skill, large k underestimates skill.
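For a sense of scale, the quoted Burman (1989) bias factor can be evaluated directly; the n and k values below are illustrative only, not the talk's experiments.

```python
# How the RMS-error bias factor k/[n(n-k)] grows with k for n = 50.
n = 50
for k in (1, 5, 10, 25):
    print(f"k={k}: bias factor = {k / (n * (n - k)):.4f}")
```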

  8. On the other hand … Selection bias “If one begins with a very large collection of rival models, then we can be fairly sure that the winning model will have an accidentally high maximum likelihood term.” (Forster). • True predictive skill likely to be overestimated. • Impacts goals of • optimal model choice • accurate skill estimate. Ideally use an independent data set to estimate skill.
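An illustrative simulation of the quoted point (the numbers are assumptions, not from the talk): with a large pool of pure-noise predictors and few observations, the winning predictor's in-sample correlation is accidentally high even though its true skill is zero.

```python
# Selection bias: the best of many noise predictors looks skilful in-sample.
import numpy as np

rng = np.random.default_rng(1)
n, n_pool = 50, 200
y = rng.standard_normal(n)
X = rng.standard_normal((n, n_pool))          # no real relationship to y

corrs = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_pool)])
print("best in-sample |correlation|:", corrs.max())   # well above zero despite zero true skill
```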

  9. In-sample and CV estimates • Leave-1-out cross-validation is asymptotically equivalent to AIC (and Mallows’ Cp; Stone 1979). • Leave-k-out cross-validation is asymptotically equivalent to BIC for well-chosen k. • Increasing k tends to select simpler models. • CV with large k penalizes complex models by requiring them to estimate many parameters from little data.

  10. Leave-k-out cross-validation • Leaving more out tends to select simpler models. • The choice of metric matters: correlation and RMS error are not simply related. • RMS error selects simpler models in numerical experiments.
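The two metrics compared on this slide can be computed from the same vector of cross-validated predictions (for example, the output of the leave-k-out sketch above); they need not rank the candidate models the same way.

```python
# Correlation and RMS error between observations and cross-validated predictions.
import numpy as np

def cv_skill(y, yhat):
    """Return (correlation, RMS error) for CV predictions yhat of observations y."""
    corr = np.corrcoef(y, yhat)[0, 1]
    rmse = np.sqrt(np.mean((y - yhat) ** 2))
    return corr, rmse
```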

  11. Impact on skill estimates • Leaving more out reduces skill estimate biases in numerical experiments.

  12. Better model selected? • If the “true” model is simple, leaving out more selects a better model.

  13. Conclusions • Increasing the pool of predictors increases the chance of over-fitting and of over-estimating skill. • AIC and BIC balance data fit against model complexity; BIC chooses simpler models. • Leave-k-out cross-validation also penalizes model complexity (leave-1-out is asymptotically equivalent to AIC). • Leaving more out • selects simpler models • reduces skill-estimate bias.
