
A comparison of K-fold and leave-one-out cross-validation of empirical keys



Presentation Transcript


  1. A comparison of K-fold and leave-one-out cross-validation of empirical keys Alan D. Mead, IIT mead@iit.edu

  2. What is “Keying”? • Many selection tests do not have demonstrably correct answers • Biodata, SJTs, some simulations, etc. • Keying is the construction of a valid key • What the “best” people answered is probably “correct” • Most approaches use a correlation, or something similar

  3. Correlation approach • Create 1-0 indicator variables for each response option • Correlate the indicators with a criterion (e.g., job performance) • If r > .01, key = 1 • If r < -.01, key = -1 • Else, key = 0 • Little validity is lost by using a 1, 0, -1 key
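A minimal sketch of this correlation keying approach (not from the original slides). The array names `responses` and `criterion`, the four-option item format, and the ±.01 threshold default are illustrative assumptions:

```python
import numpy as np

def build_key(responses, criterion, n_options=4, threshold=0.01):
    """Return an (items x options) key of +1 / 0 / -1 weights.

    responses: integer array (people x items) of chosen options (0..n_options-1)
    criterion: vector of criterion scores (e.g., job performance)
    """
    n_people, n_items = responses.shape
    key = np.zeros((n_items, n_options), dtype=int)
    for item in range(n_items):
        for option in range(n_options):
            indicator = (responses[:, item] == option).astype(float)  # 1-0 indicator
            if indicator.std() == 0:          # option never (or always) chosen
                continue
            r = np.corrcoef(indicator, criterion)[0, 1]
            if r > threshold:
                key[item, option] = 1
            elif r < -threshold:
                key[item, option] = -1
    return key

def score(responses, key):
    """Sum the keyed weights of each person's chosen options."""
    item_idx = np.arange(responses.shape[1])
    return key[item_idx, responses].sum(axis=1)
```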

  4. How valid is my key? • Now that I have a key, I want to compute a validity… • But I based my key on the responses of my “best” test-takers • Can/should I compute a validity in this sample? • No! Cureton (1967) showed that very high validities will result even for invalid keys • What shall I do?

  5. Validation Approaches • Charge ahead! • “Sure, .60 is an over-estimate; there will be shrinkage. But even half would still be substantial” • Split my sample into “calibration” and “cross-validation” samples • Fine if you have a large N… • Resample

  6. LOOCV procedure • Leave-one-out cross-validation (LOOCV) resembles Tukey’s jackknife resampling procedure • Hold out person 1 • Compute a key on the remaining N-1 • Score the held-out person • Repeat with person 2, 3, 4, … • Produces N scores that do not capitalize on chance • Correlate the N scores with the criterion • (But use the total-sample key for scoring)
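A minimal sketch of the LOOCV validity estimate, assuming the hypothetical `build_key` and `score` helpers from the earlier sketch:

```python
import numpy as np

def loocv_validity(responses, criterion):
    """Correlation between held-out LOOCV scores and the criterion."""
    n = responses.shape[0]
    loo_scores = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                             # leave person i out
        key_i = build_key(responses[keep], criterion[keep])  # key on remaining N-1
        loo_scores[i] = score(responses[i:i + 1], key_i)[0]  # score the held-out person
    # The N held-out scores do not capitalize on chance; the key actually
    # used for operational scoring is still the one built on the total sample.
    return np.corrcoef(loo_scores, criterion)[0, 1]
```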

  7. Mead & Drasgow, 2003 • Simulated test responses & criterion • Three approaches • Charge ahead • LOOCV • True cross-validation • Varying sample sizes: • N=50,100,200,500,1000

  8. LOOCV Results

  9. LOOCV Conclusions • LOOCV was much better than simply “charging ahead” • But consistently slightly worse than actual cross-validation • LOOCV has a large standard error • An elbow appeared at N=200

  10. K-fold keying • LOOCV is like using cross-validation samples of N=1 • Break the sample into K groups • E.g., N=200 and k=10 • Compute the key 10 times • Each calibration sample N=180 • Each cross-validation sample N=20 • Does not capitalize on chance • Potentially much more stable results
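A minimal sketch of K-fold cross-validation of an empirical key, again reusing the hypothetical `build_key` and `score` helpers; random fold assignment is an assumption, not necessarily the procedure used in the study:

```python
import numpy as np

def kfold_validity(responses, criterion, k=10, seed=0):
    """Correlation between fold-held-out scores and the criterion."""
    rng = np.random.default_rng(seed)
    n = responses.shape[0]
    fold = rng.permutation(n) % k                 # randomly assign each person to a fold
    cv_scores = np.empty(n)
    for f in range(k):
        holdout = fold == f
        # Calibrate the key on the other k-1 folds, score the held-out fold
        key_f = build_key(responses[~holdout], criterion[~holdout])
        cv_scores[holdout] = score(responses[holdout], key_f)
    return np.corrcoef(cv_scores, criterion)[0, 1]
```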

  11. Present study • Simulation study • Four levels of sample size • N=50, 100, 200, 500 • Several levels of K • K=2, 5, 10, 25, 50, 100, 200, 500 • K=2 is double cross-validation • True validity = 0.40 • 35-item test with four responses

  12. Main Effect of Sample Size (results table; values shown as Mean (Standard Error))

  13. Effect of k, N=50

  14. Effect of k, N=100

  15. Effect of k, N=200

  16. Effect of k, N=500

  17. Summary • N=50 is really too small a sample for empirical keying • Using a k that produces hold-out samples of 4-5 seemed best • N=100, k=20 • N=200, k=50 • N=500, k=100 • Traditional double cross-validation was almost as good for N>100
