
American Lessons on Designing Reliable Impact Evaluations, from Studies of WIA and Its Predecessor Programs


Presentation Transcript


  1. American Lessons on Designing Reliable Impact Evaluations, from Studies of WIA and Its Predecessor Programs. Larry L. Orr, Independent Consultant; Stephen H. Bell, Abt Associates; Jacob A. Klerman, Abt Associates

  2. The early evaluations • 1960s: MDTA (pre/post comparisons) • 1970s: • YEDPA (400+ studies; various methods) • CETA (comparison groups drawn from national survey samples) • 1980s: • A National Academy review of the YEDPA studies found “little reliable information on the effectiveness of the programs” and recommended random assignment • More than a dozen CETA evaluations produced widely divergent impact estimates from essentially the same data (Barnow, 1987) • A DOL-convened expert panel recommended random assignment for the evaluation of the new Job Training Partnership Act (JTPA)

  3. Evaluating the econometric evaluations • LaLonde (1986) and Fraker and Maynard (1987) applied a variety of nonexperimental methods to data from a randomized trial and were unable to replicate the experimental estimates • Since then, a number of replication studies have been conducted (see summaries in Glazerman et al., 2003; Bloom et al., 2005; and Pirog et al., 2009). No nonexperimental method has consistently replicated experimental results.
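
One way to picture what these replication ("within-study") comparisons do: take data in which a randomized trial provides a benchmark impact estimate, build a nonexperimental estimate from a comparison group of non-participants, and ask whether the two agree. The sketch below is not from the presentation; the covariates, coefficients, and selection rule are all invented for illustration. It shows only the logic of the test, and why selection on unobservables can leave a large bias even after adjusting for observed characteristics.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000

# Hypothetical within-study comparison. All numbers are made up.
educ   = rng.normal(12, 2, n)   # observed covariate
disadv = rng.normal(0, 1, n)    # unobserved disadvantage: drives both
                                # program entry and later earnings

# More-disadvantaged workers are more likely to apply to the program.
applicants = disadv + rng.normal(0, 1, n) > 0.5
treat      = rng.random(n) < 0.5          # random assignment among applicants
enrolled   = applicants & treat

true_impact = 1000
earnings = (8000 + 1500 * educ - 3000 * disadv
            + true_impact * enrolled + rng.normal(0, 3000, n))

# Experimental benchmark: treatment vs. control among applicants.
experimental = (earnings[applicants & treat].mean()
                - earnings[applicants & ~treat].mean())

# Nonexperimental estimate: enrollees vs. a "survey" comparison group of
# non-applicants, adjusting only for the observed covariate.
keep = enrolled | ~applicants
X = np.column_stack([np.ones(n), educ, enrolled.astype(float)])[keep]
beta, *_ = np.linalg.lstsq(X, earnings[keep], rcond=None)
nonexperimental = beta[2]

print(f"true impact        : {true_impact:8.0f}")
print(f"experimental (RCT) : {experimental:8.0f}")
print(f"nonexperimental    : {nonexperimental:8.0f}  (biased by unobserved selection)")
```

In the actual studies the benchmark came from real experiments (LaLonde used the National Supported Work Demonstration) and the nonexperimental estimators were far more sophisticated, but the test has the same structure: a method passes only if it reproduces the experimental answer.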

  4. The current consensus • No known nonexperimental method can reliably produce unbiased estimates of the impact of training programs – this means that you can never know ex post whether you have a good estimate or not • Randomized trials are the strongly preferred method of estimating training program impacts on technical grounds • Randomized trials are also more intuitively understandable to policy makers than complex econometric methods • Nonexperimental studies frequently give rise to technical controversy that detracts from their credibility and acceptance, whereas randomized trials are generally accepted by both evaluators and policy makers

  5. Why is it so hard to obtain reliable results from nonexperimental studies? • “Impact” is the difference between trainees’ actual outcomes (e.g., earnings) and what those outcomes would have been without training • The fundamental problem of evaluation is to estimate what the trainees’ outcomes would have been without training • To see how difficult that task is, consider the time path of earnings for the JTPA control group – individuals who were just like the trainees except that they didn’t get JTPA services…
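
The counterfactual problem described on this slide can be made concrete with a small simulation. The sketch below is not taken from the study; the earnings process and all dollar figures are hypothetical. It illustrates why a simple pre/post comparison (the 1960s MDTA design from slide 2) credits the program with earnings growth that would have occurred anyway, while a randomized control group recovers the true impact.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical earnings process: applicants' earnings dip just before program
# entry and then partly recover even with no training ("Ashenfelter's dip").
pre_entry   = rng.normal(20000, 5000, n) - 6000   # earnings in the dip year
recovery    = 4000                                # happens with or without training
true_impact = 1500                                # what training actually adds

treated = rng.random(n) < 0.5                     # random assignment
post = pre_entry + recovery + true_impact * treated + rng.normal(0, 2000, n)

# Experimental estimate: treated vs. control AFTER the program.
experimental = post[treated].mean() - post[~treated].mean()

# Naive pre/post estimate for trainees only: attributes the natural recovery
# from the dip to the program.
pre_post = post[treated].mean() - pre_entry[treated].mean()

print(f"true impact        : {true_impact:8.0f}")
print(f"experimental (RCT) : {experimental:8.0f}")
print(f"naive pre/post     : {pre_post:8.0f}")
```

The "recovery" term stands in for the well-documented pre-program dip in applicants' earnings; any comparison strategy that ignores it will misstate what trainees' outcomes would have been without training.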

  6. Time path of earnings, control group, National JTPA Study

  7. [Figure: earnings paths for the treatment group and the control group] What is the margin for error?

  8. Time path of earnings, program and comparison groups, from Heinrich et al.

  9. Our Conclusions/Recommendations (1) • Random assignment is the only safe way to estimate the impacts of training programs • Different nonexperimental approaches yield widely varying results • In dozens of replication studies, nonexperimental methods have almost never satisfactorily replicated the experimental estimates • The stakes are too high to take the kind of risk and uncertainty entailed in nonexperimental methods • Nonexperimental evaluations inevitably shift the debate from substance to method

  10. Our Conclusions/Recommendations (2) • If the ESF does decide to use nonexperimental methods: • Pay close attention to the timing of job loss and the pre-program dynamics of earnings when matching the comparison group (a necessary, but not sufficient, condition; see the sketch below) • Before adopting any nonexperimental method, it should be demonstrated that the method replicates multiple experimental results (Note that what should be tested is an algorithm that can be applied in other evaluations, not a set of estimates that are unique to a single evaluation.)
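
A minimal sketch of the first bullet: match each trainee to the comparison case with the same quarter of job loss and the most similar pre-program earnings path. Everything here is an assumption for illustration (the function name match_comparisons, the nearest-neighbor distance, and the toy data layout); the slide does not prescribe a particular matching algorithm, and exact matching on job-loss quarter plus Euclidean distance on pre-period earnings is only one simple choice.

```python
import numpy as np

rng = np.random.default_rng(2)

def match_comparisons(trainee_pre, trainee_loss_q, pool_pre, pool_loss_q):
    """For each trainee, return the index of the nearest comparison-pool case.

    trainee_pre, pool_pre       : (n, k) earnings in the k pre-program quarters
    trainee_loss_q, pool_loss_q : (n,) quarter in which the job was lost
    """
    matches = np.empty(len(trainee_pre), dtype=int)
    for i, (path, q) in enumerate(zip(trainee_pre, trainee_loss_q)):
        # Exact match on the timing of job loss ...
        eligible = np.where(pool_loss_q == q)[0]
        if eligible.size == 0:
            eligible = np.arange(len(pool_pre))   # fall back if no exact match
        # ... then nearest neighbor on the pre-program earnings path.
        dist = np.linalg.norm(pool_pre[eligible] - path, axis=1)
        matches[i] = eligible[np.argmin(dist)]
    return matches

# Toy data: 8 pre-program quarters of earnings plus a job-loss quarter.
trainee_pre  = rng.normal(5000, 1500, (200, 8)).clip(min=0)
trainee_loss = rng.integers(0, 8, 200)
pool_pre     = rng.normal(6000, 2000, (2000, 8)).clip(min=0)
pool_loss    = rng.integers(0, 8, 2000)

idx = match_comparisons(trainee_pre, trainee_loss, pool_pre, pool_loss)
print("first five matched comparison-pool indices:", idx[:5])
```

Whatever algorithm is chosen, the second bullet still applies: it is the algorithm itself, not a one-off set of estimates, that should be validated against multiple experimental benchmarks before it is trusted.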

  11. Our Conclusions/Recommendations (3) Learn from our mistakes – don’t spend 40 years repeating them!

  12. For copies of these slides, contact… larry.orr@comcast.net
