
When Is a Program Ready for Rigorous Impact Analysis?


Presentation Transcript


  1. "[G]overnment should be seeking out creative, results-oriented programs like the ones here today and helping them replicate their efforts across America." President Barack Obama, 6/30/2009, http://www.nationalservice.gov/about/newsroom/statements_detail.asp?tbl_pr_id=1828

  2. When Is a Program Ready for Rigorous Impact Analysis? Diana Epstein (Center for American Progress) and Jacob Alex Klerman (Abt Associates), APPAM/HSE Conference "Improving the Quality of Public Services", Moscow, June 2011

  3. Outline • The Basic Argument • On Logic Models • Some Examples • Some Broader Implications • Discussion

  4. The Goal • Identify program ideas that can successfully address pressing social problems • Roll them out nationally [Diagram: Program Idea → Broad Rollout]

  5. Require Rigorous Impact Evaluation • Many apparently plausible programs "don't work" • Many that work in one site don't work in another site • So, require an Impact Evaluation "tollgate" • Usually random assignment • Saving money • This is the "New Orthodoxy" • Coalition for Evidence-Based Policy • OMB (2009) [Diagram: Program Idea → Efficacy Trial → Replication → Effectiveness Trial → Broad Rollout]

  6. Many Programs Fail RA, So Pilot • We argue that a rush to "random assignment evaluation" has two problems • Some programs clearly will not pass the Impact Evaluation "tollgate" • Some of those programs would pass the Impact Evaluation with more "development" • A "pilot" would help with both problems • i.e., run the program for a while • Then, if the program is promising … • Start the Impact Analysis [Diagram: Program Idea → Pilot → Efficacy Trial → Replication → Effectiveness Trial → Broad Rollout]

  7. Formative Evaluation/Process Evaluation • Formative Evaluation to improve the program • Process Evaluation to screen out programs that are unlikely to show impact [Diagram: Program Idea → Formative Evaluation → Process Evaluation → Efficacy Trial → Replication → Effectiveness Trial → Broad Rollout]

  8. Formative Evaluation/Process Evaluation • Formative Evaluation to improve the program • Process Evaluation to screen out programs that are unlikely to show impact • But, how do you do that? • New Orthodoxy: Only random assignment can reliably detect impact • So, how can a Process Evaluation screen? [Diagram: Program Idea → Formative Evaluation → Process Evaluation → Efficacy Trial → Replication → Effectiveness Trial → Broad Rollout]

  9. Outline • The Basic Argument • On Logic Models • Some Examples • Some Broader Implications • Discussion

  10. "Falsifiable Logic Models" Can Screen [Diagram: Program Idea → Require Falsifiable Logic Model → Formative Evaluation → Process Evaluation → Efficacy Trial → Replication → Effectiveness Trial → Broad Rollout]

  11. "Falsifiable Logic Models" Can Screen [Diagram: as in slide 10, adding a feedback loop from Formative Evaluation: "Revise program and Falsifiable Logic Model"]

  12. "Falsifiable Logic Models" Can Screen [Diagram: as in slide 11, adding a gate at Process Evaluation: "Only proceed if program satisfies its own Falsifiable Logic Model"]
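
To make the screening step concrete, here is a minimal sketch (Python; the benchmark claims, targets, and observed values are invented for illustration, not taken from the paper) of a Falsifiable Logic Model as a set of benchmarks committed to in advance and checked by the Process Evaluation using treatment-group data alone:

```python
# A hypothetical rendering of a "Falsifiable Logic Model" as a set of
# pre-committed, machine-checkable benchmarks. All claims, targets, and
# observed values below are invented for illustration.
from dataclasses import dataclass

@dataclass
class Benchmark:
    step: str        # logic-model step (resources, recruitment, ...)
    claim: str       # the falsifiable claim the program commits to up front
    target: float    # threshold promised before the pilot
    observed: float  # value measured in the treatment group during the pilot

    def passed(self) -> bool:
        return self.observed >= self.target

benchmarks = [
    Benchmark("resources", "staff retained through the pilot year", 0.80, 0.55),
    Benchmark("recruitment", "share of target enrollment filled", 0.90, 0.35),
    Benchmark("participation", "share completing 75%+ of sessions", 0.70, 0.72),
]

def process_evaluation_tollgate(benchmarks: list[Benchmark]) -> bool:
    """Pass only if the program satisfies its own logic model."""
    failures = [b for b in benchmarks if not b.passed()]
    for b in failures:
        print(f"FAIL [{b.step}] {b.claim}: {b.observed:.2f} < {b.target:.2f}")
    return not failures

if process_evaluation_tollgate(benchmarks):
    print("Proceed to efficacy trial")
else:
    print("Revise the program and its logic model, or stop")
```

Note that nothing here requires a control group or random assignment: every quantity is observable in the treatment group during the pilot.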

  13. Why Might this Work? • Logic Models explicate the path from resources to impacts • All but the "impact" step occur • In the treatment group • During (or at the end of) treatment [Diagram: Program Idea → Formative Evaluation → Process Evaluation → Efficacy Trial → Replication → Effectiveness Trial → Broad Rollout]

  14. Why Might this Work? • Logic Models explicate the path from resources to impacts • All but the "impact" step occur • In the treatment group • During (or at the end of) treatment • So, verifying the logic model does not require • Random assignment • Or even a control group • Long program follow-up • And expensive post-program survey tracking efforts [Diagram: Program Idea → Formative Evaluation → Process Evaluation → Efficacy Trial → Replication → Effectiveness Trial → Broad Rollout]

  15. Outline • The Basic Argument • On Logic Models • Some Examples • Some Broader Implications • Discussion

  16. But Will this Screen? Need Examples of … • Intermediate benchmarks • Resources/inputs, Activities, Outputs, Outcomes • … that were (should have been/could have been) specified in a Falsifiable Logic Model • And, that could be detected • Using only the treatment group • Without an expensive follow-up survey • Before (or perhaps shortly after) the end of treatment • Here goes …

  17. Forms of Logic Model Failures: 1-3 • Acquire Resources: Form partnerships, acquire and retain staff (with target qualifications) • Salem ERA: Very high staff turnover • Recruit Cases: Fill the program • Portland ERA: Recruited only a third of target enrollees • Sustain Participation • Rural WTW Strategies Evaluation; SC Moving Up ERA; Cleveland Achieve ERA

  18. Forms of Logic Model Failures: 4-5 • Implement with Fidelity • Mathematica Supplemental Reading Evaluation; Abt Mentoring Evaluation • Pre/Post Progress • MDRC NEWWS HCD program’s academic testing (but see GED)
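
One of these failure modes, Pre/Post Progress, is checkable with nothing more than interim testing of the treatment group. A sketch with simulated scores standing in for real program data (the gain, spread, and sample size are arbitrary):

```python
# Hypothetical "Pre/Post Progress" benchmark check: did treatment-group
# members improve on an interim measure during the program? No control
# group or post-program follow-up survey is required. A pre/post gain is
# not an impact estimate, but a program whose participants show no gain
# at all is unlikely to pass a later random-assignment trial.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pre = rng.normal(50, 10, size=200)         # simulated intake test scores
post = pre + rng.normal(1.0, 8, size=200)  # simulated end-of-program scores

gain = post - pre
t, p = stats.ttest_1samp(gain, 0.0)        # H0: mean gain is zero
print(f"mean gain = {gain.mean():.2f}, t = {t:.2f}, p = {p:.3f}")
```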

  19. Outline • The Basic Argument • On Logic Models • Some Examples • Some Broader Implications • Discussion

  20. Inducing "Truth Telling" • Currently, program developers have an incentive to over-promise • More likely to be funded • But, underpowered Impact Evaluations and null results

  21. Inducing "Truth Telling" • Currently, program developers have an incentive to over-promise • More likely to be funded • But, underpowered Impact Evaluations and null results • A Process Evaluation tollgate gives an incentive to under-promise • More likely to pass the Process Evaluation tollgate, but • Less likely to fund the Pilot • And, less likely to fund the Impact Evaluation

  22. Inducing "Truth Telling" • Currently, program developers have an incentive to over-promise • More likely to be funded • But, underpowered Impact Evaluations and null results • A Process Evaluation tollgate gives an incentive to under-promise • More likely to pass the Process Evaluation tollgate, but • Less likely to fund the Pilot • And, less likely to fund the Impact Evaluation • And if developing a Falsifiable Logic Model forces program developers to think through their program models more thoroughly and realistically, that's good too!
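
The "underpowered" point is just a power calculation. A worked illustration with hypothetical effect sizes (a standard two-arm, two-sided z-test approximation; 0.40 and 0.15 are invented numbers): size the sample for the promised effect, then ask what power remains against the true, smaller effect.

```python
# Why over-promising yields underpowered evaluations: the trial is sized
# for the promised effect, so power against the true effect collapses.
# Effect sizes are standardized mean differences; all numbers are invented.
from scipy.stats import norm

def n_per_arm(delta: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Sample size per arm for a two-arm, two-sided z-test."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 2 * (z / delta) ** 2

def achieved_power(delta: float, n: float, alpha: float = 0.05) -> float:
    """Power of that test when the true effect is delta."""
    return float(norm.cdf(delta * (n / 2) ** 0.5 - norm.ppf(1 - alpha / 2)))

promised, true = 0.40, 0.15
n = n_per_arm(promised)
print(f"n per arm, sized for the promised effect: {n:.0f}")            # ~98
print(f"power against the true effect: {achieved_power(true, n):.2f}")  # ~0.18
```

An evaluation with 18 percent power will usually return a null result even for a genuinely effective program, which is the "null results" problem the slide describes.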

  23. Key Innovation: Separate Contracts • For the Program Operator • Otherwise, an implicit expectation of proceeding • E.g., ED i3, CNCS SIF, Orszag (2009) • For the Evaluator (and probably different contractors) • Otherwise, an implicit expectation of proceeding • And contractual considerations lean towards doing so

  24. Key Innovation: Separate Contracts • For the Program Operator • Otherwise, an implicit expectation of proceeding • E.g., ED i3, CNCS SIF, Orszag (2009) • For the Evaluator (and probably different contractors) • Otherwise, an implicit expectation of proceeding • And contractual considerations lean towards doing so • Current practice often runs the Process Evaluation simultaneously with the Impact Evaluation

  25. Outline • The Basic Argument • On Logic Models • Some Examples • Some Broader Implications • Discussion

  26. Approach Seems Infeasible: Timeline • Evaluation timelines are already long • Inconsistent with • Pressing problems • Short-term attention to (and funding for) specific problems • This approach would make evaluation timelines much longer • Additional piloting • Additional contracting between the steps

  27. Approach Seems Infeasible: Willingness • Implicit assumption: programs are willing to subject themselves to • A long and burdensome evaluation • The possibility (likelihood) of failure • Plausible if • The program's goal is broad-scale rollout • Rigorous evaluation is the only way to get there • Programs are confident of "passing" • Some positive examples (Nurse-Family Partnership; Teen Pregnancy Prevention Program; Orszag, 2009) • But they are the exception rather than the rule

  28. Outline • The Basic Argument • On Logic Models • Some Examples • Some Broader Implications • Discussion

  29. When Is a Program Ready for Rigorous Impact Analysis? Diana Epstein (Center for American Progress) and Jacob Alex Klerman (Abt Associates), APPAM/HSE Conference "Improving the Quality of Public Services", Moscow, June 2011 [Alternate title on slide: When Is a Program Ready for Rigorous Impact Evaluation?]

  30. The Need for Impact Evaluation • Many apparently plausible programs "don't work" • So, require an Impact Evaluation "tollgate" • Usually random assignment • Saving money [Diagram: Program Idea → Random Assignment Trial → Broad Rollout]

  31. Efficacy Trial/Replication/Effectiveness Trial • Some programs that work in one site don't work in other sites • So: • Efficacy Evaluation (small trial, ideal conditions) • Replicate to other (and more) sites • Effectiveness trial at the replicated sites (larger trial, real-world conditions) [Diagram: Program Idea → Efficacy Trial → Replication → Effectiveness Trial → Broad Rollout]

  32. This Is Hardly New • It's the "New Orthodoxy" • Coalition for Evidence-Based Policy • OMB (2009) [Diagram: Program Idea → Efficacy Trial → Replication → Effectiveness Trial → Broad Rollout]

  33. This Is Hardly New • It's the "New Orthodoxy" • Coalition for Evidence-Based Policy • OMB (2009) • And, we think that's a problem [Diagram: Program Idea → Efficacy Trial → Replication → Effectiveness Trial → Broad Rollout]

  34. Random Assignment Has Lots of Problems • Random assignment fits Winston Churchill's description of "democracy" • "The worst form of government [evaluation], except for all the others that have been tried from time to time." • Random assignment • Is expensive • Has long timelines • Subjects people to programs that don't work [Diagram: Program Idea → Efficacy Trial → Replication → Effectiveness Trial → Broad Rollout]

  35. Random Assignment Has Lots of Problems • Random assignment fits Winston Churchill's description of "democracy" • "The worst form of government [evaluation], except for all the others that have been tried from time to time." • Random assignment • Is expensive • Has long timelines • Subjects people to programs that don't work • Can we do better? • Avoid evaluating programs with no impact • Improve programs so that they will have impact [Diagram: Program Idea → Efficacy Trial → Replication → Effectiveness Trial → Broad Rollout]

  36. Thus, Tollgate Is Implementable • Acquire Resources • Recruit Cases • Sustain Participation • Implement with Fidelity • Pre/Post Progress • Falsifiable and specifiable in a Logic Model • Measured in the Treatment Group only • No expensive follow-up survey needed • Occurs during or shortly after program activities

  37. Thus, Tollgate Is Implementable • Acquire Resources • Recruit Cases • Sustain Participation • Implement with Fidelity • Pre/Post Progress • Falsifiable and specifiable in a Logic Model • Measured in the Treatment Group only • No expensive follow-up survey needed • Occurs during or shortly after program activities • … with a Pilot Implementation and a Process Evaluation

  38. When Is a Program Ready for Rigorous Impact Analysis? Diana Epstein (Center for American Progress) and Jacob Alex Klerman (Abt Associates), APPAM/HSE Conference "Improving the Quality of Public Services", Moscow, June 2011 [Alternate title on slide: When Is a Program Ready for Rigorous Impact Evaluation?]

  39. Logic Model Definition “The program logic model is defined as a picture of how your organization does its work – the theory and assumptions underlying the program. A program logic model links outcomes (both short- and long-term) with program activities/processes and the theoretical assumptions/principles of the program.” Source: W.K. Kellogg Foundation Logic Model Guide http://www.wkkf.org/~/media/6E35F79692704AA0ADCC8C3017200208.ashx

  40. Logic Model: Your Planned Work … • YOUR PLANNED WORK describes what resources you think you need to implement your program and what you intend to do. • 1. Resources include the human, financial, organizational, and community resources a program has available to direct toward doing the work. Sometimes this component is referred to as Inputs. • 2. Program Activities are what the program does with the resources. Activities are the processes, tools, events, technology, and actions that are an intentional part of the program implementation. These interventions are used to bring about the intended program changes or results. • YOUR INTENDED RESULTS include all of the program's desired results (outputs, outcomes, and impact).

  41. Logic Model: Your Intended Results … • YOUR PLANNED WORK describes what resources you think you need to implement your program and what you intend to do. • YOUR INTENDED RESULTS include all of the program's desired results (outputs, outcomes, and impact). • 3. Outputs are the direct products of program activities and may include types, levels, and targets of services to be delivered by the program. • 4. Outcomes are the specific changes in program participants' behavior, knowledge, skills, status, and level of functioning. Short-term outcomes should be attainable within 1 to 3 years, while longer-term outcomes should be achievable within a 4 to 6 year timeframe. • 5. Impact is the fundamental intended or unintended change occurring in organizations, communities, or systems as a result of program activities within 7 to 10 years.
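
For readers who like data structures, the five Kellogg components map onto a simple record type; the tutoring program below is invented purely for illustration:

```python
# The Kellogg Foundation's five logic-model components as a record type.
# The example entries are hypothetical.
from dataclasses import dataclass, field

@dataclass
class LogicModel:
    resources: list[str] = field(default_factory=list)   # 1. inputs
    activities: list[str] = field(default_factory=list)  # 2. what the program does
    outputs: list[str] = field(default_factory=list)     # 3. direct products of activities
    outcomes: list[str] = field(default_factory=list)    # 4. participant changes, 1-6 years
    impact: list[str] = field(default_factory=list)      # 5. system-level change, 7-10 years

tutoring = LogicModel(
    resources=["3 FTE tutors", "school-district partnership"],
    activities=["twice-weekly small-group tutoring sessions"],
    outputs=["120 students served per semester"],
    outcomes=["one grade-level reading gain within 2 years"],
    impact=["higher district graduation rates within a decade"],
)
```

A Falsifiable Logic Model, in the paper's sense, adds measurable targets and deadlines to each of the first four components so that the Process Evaluation has something to check.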

  42. "If you don't know where you're going, how are you gonna know when you get there?" Yogi Berra, New York Yankees player and manager (1925–) • "Happy families are all alike; every unhappy family is unhappy in its own way." Leo Tolstoy (1828–1910), Anna Karenina, Chapter 1, first line

  43. Paper Status • The paper is nights-and-weekends work • In reaction to evaluation experience, positive and negative • Has been presented and read internally (Abt JASG) and externally (N. Campbell/ACF, Burt Barnow, Demetra Nightingale; we hope soon B. Kelly/ACF) • Probably going to try to present at ACF • Your comments, on the presentation, on the paper, and on the ideas, are much appreciated • In particular, more and better examples • (We hope) to submit to a journal "soon"

  44. Goal: Effective Programs

  45. Question: How to Get There?

  46. Random Assignment is Necessary • Most rigorously evaluated programs "fail" • Even programs that pass an initial efficacy trial often "fail" the follow-on effectiveness trial • And the more rigorous the evaluation, the more likely is "failure" • => Evaluate before roll-out • Otherwise, implement ineffective programs

  47. Suggests a Random Assignment “Tollgate”

  48. Random Assignment is Necessary, but Expensive • Most rigorously evaluated programs "fail" • Even programs that pass an initial efficacy trial often "fail" the follow-on effectiveness trial • And the more rigorous the evaluation, the more likely is "failure" • => Evaluate before roll-out • Otherwise, implement ineffective programs • Expensive • In dollars • In calendar time • In the lives of clients/participants who waste time in programs that don't work

  49. Random Assignment is Necessary, but Expensive • Most rigorously evaluated programs "fail" • Even programs that pass an initial efficacy trial often "fail" the follow-on effectiveness trial • And the more rigorous the evaluation, the more likely is "failure" • => Evaluate before roll-out • Otherwise, implement ineffective programs • Expensive • In dollars • In calendar time • In the lives of clients/participants who waste time in programs that don't work • => Don't evaluate programs that will fail. Duh!

