What’s involved in “rigorous impact evaluation”?

What’s involved in “rigorous impact evaluation”? IOCE proposes more holistic perspectives Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011

Join me in a review the basics of:1. Evaluation Design2. Logic models3. Counterfactuals 4. Context (simple-complicated-complex)5. Evaluation Implementation

1. Evaluation Design

scale of major impact indicator An introduction to various evaluation designs Illustrating the need for quasi-experimental longitudinal time series evaluation design Project participants Comparison group baseline end of project evaluation post project evaluation 4

OK, let’s stop the action to identify each of the major types of evaluation (research) design … … one at a time, beginning with the most rigorous design. 5

X = Intervention (treatment), I.e. what the project does in a community O = Observation event (e.g. baseline, mid-term evaluation, end-of-project evaluation) P (top row): Project participants C (bottom row): Comparison (control) group First of all: the key to the traditional symbols: 6

Design #1: Longitudinal Quasi-experimental P1 X P2 X P3 P4 C1 C2 C3 C4 Project participants Comparison group baseline midterm end of project evaluation post project evaluation 7

Design #2: Quasi-experimental (pre+post, with comparison) P1 X P2 C1 C2 Project participants Comparison group baseline end of project evaluation 8

Design #2+: Typical Randomized Control Trial P1 X P2 C1 C2 Project participants Research subjects randomly assigned either to project or control group. Control group baseline end of project evaluation 9

Design #3: Truncated QED X P1 X P2 C1 C2 Project participants Comparison group midterm end of project evaluation 10

Design #4: Pre+post of project; post-only comparison P1 X P2 C Project participants Comparison group baseline end of project evaluation 11

Design #5: Post-test only of project and comparison X P C Project participants Comparison group end of project evaluation 12

Design #6: Pre+post of project; no comparison P1 X P2 Project participants baseline end of project evaluation 13

Design #7: Post-test only of project participants X P Project participants • Need to fill in missing data through other means: • What change occurred during the life of the project? • What would have happened without the project (counterfactual)? • How sustainable is that change likely to be? end of project evaluation 14

Note: These 7 evaluation designs are described in the RealWorld Evaluation book

What kinds of evaluation designs are actually used in the real world (of international development)? Findings from meta-evaluations of 336evaluation reports of an INGO.

Even proponents of RCTs have acknowledged that RTCs are only appropriate for perhaps 5% of development interventions. An empirical study by Forss and Bandstein, examining evaluations in the OECD/DAC DEReC database by bilateral and multilateral organisations found only 5% used even a counterfactual design. • While we recognize that experimental and quasi experimental designs have a place in the toolkit for impact evaluations, we think that more attention needs to be paid to the roughly 95% of situations where these designs would not be possible or appropriate.

2. Logic Models

Institutional and operational context Economic context in which the project operates Political context in which the project operates Socio-economic and cultural characteristics of the affected populations One form of Program Theory (Logic) Model Outputs Outcomes Design Inputs Implementation Process Impacts Sustainability Note: The orange boxes are included in conventional Program Theory Models. The addition of the blue boxes provides the recommended more complete analysis.

Consequences Consequences Consequences PROBLEM PRIMARY CAUSE 1 PRIMARY CAUSE 2 PRIMARY CAUSE 3 Secondary cause 2.3 Secondary cause 2.1 Secondary cause 2.2 Tertiary cause 2.2.1 Tertiary cause 2.2.2 Tertiary cause 2.2.3

Consequences Consequences Consequences DESIRED IMPACT OUTCOME 1 OUTCOME 2 OUTCOME 3 OUTPUT 2.3 OUTPUT 2.1 OUTPUT 2.2 Intervention 2.2.1 Intervention 2.2.2 Intervention 2.2.3

High infant mortality rate Children are malnourished Insufficient food Diarrheal disease Poor quality of food Need for improved health policies Contaminated water Unsanitary practices Flies and rodents Do not use facilities correctly People do not wash hands before eating

Reduction in poverty Women empowered Economic opportunities for women Women in leadership roles Young women educated Improved educational policies Curriculum improved Female enrollment rates increase Parents persuaded to send girls to school School system hires and pays teachers Schools built

To have synergy and achieve impact all of these need to address the same target population. Program Goal: Young women educated Advocacy Project Goal:Improved educational policies enacted Teacher Education Project Goal:Improve quality of curriculum Construction Project Goal:More classrooms built ASSUMPTION (that others will do this) OUR project PARTNER will do this Program goal at impact level

We need to recognize which evaluative process is most appropriate for measurement at various levels • Impact • Outcomes • Output • Activities • Inputs PROGRAMEVALUATION PROJECT EVALUATION PERFORMANCE MONITORING

The “Rosetta Stone of Logical Frameworks”

3. Alternative Counterfactuals

How do we know if the observed changes in the project participants or communities income, health, attitudes, school attendance, etc. are due to the implementation of the project credit, water supply, transport vouchers, school construction, etc. or to other unrelated factors? changes in the economy, demographic movements, other development programs, etc. Attribution and counterfactuals

What change would have occurred in the relevant condition of the target population if there had been no intervention by this project? The Counterfactual

Control group and comparison group • Control group = randomized allocation of subjects to project and non-treatment group • Comparison group = separate procedure for sampling project and non-treatment groups that are as similar as possible in all aspects except the treatment (intervention)

2003 2006 J-PAL is best understood as a network of affiliated researchers … united by their use of the randomized trial methodology… 2008 2010 Some recent developments in impact evaluation in international development 2009

So, are Randomized Control Trials (RCTs) are the Gold Standard and should they be used in most if not all program impact evaluations? Yes or no? Why or why not? If so, under what circumstances should they be used? If not, under what circumstances would they not be appropriate?

Evidence-based policy for simple interventions (or simple aspects): when RCTs may be appropriate

Complicated, complex programs where there are multiple interventions by multiple actors Projects working in evolving contexts (e.g. countries in transition, conflicts, natural disasters) Projects with multiple layered logic models, or unclear cause-effect relationships between outputs and higher level “vision statements” (as is often the case in the real world of international development projects) When might rigorous evaluations of higher-level “impact” indicators require muchmore than a simple RCT?

Reliable secondary data that depicts relevant trends in the population Longitudinal monitoring data (if it includes non-reached population) Qualitative methods to obtain perspectives of key informants, participants, neighbors, etc. There are other methods for assessing the counterfactual

A conventional statistical counterfactual (with random selection into treatment and control groups) is often not possible/appropriate: • When conducting the evaluation of complex interventions • When the project involves a number of interventions which may be used in different combinations in different locations • When each project location is affected by a different set of contextual factors • When it is not possible to use standard implementation procedures for all project locations • When many outcomes involve complex behavioral changes • When many outcomes are multidimensional or difficult to measure through standardized quantitative indicators. There are situations in which a statistical counterfactual is not appropriate – even when budget and time are not constraints

Some of the alternative approaches for constructing a counterfactual A: Theory based approaches Program theory / logic models Realistic evaluation Process tracing Venn diagrams and many other PRA methods Historical methods Forensic detective work Compilation of a list of plausible alternative causes … (for more details see www.RealWorldEvaluation.org)

Some of the alternative approaches for constructing a counterfactual B: Quantitatively oriented approaches Pipeline design Natural variations Creative uses of secondary data Creative creation of comparison groups Comparison with other programs Comparing different types of interventions Cohort analysis … (for more details see www.RealWorldEvaluation.org)

Some of the alternative approaches for constructing a counterfactual C: Qualitatively oriented approaches Concept mapping Creative use of secondary data Many PRA techniques Process tracing Compiling a book of possible causes Comparisons between different projects Comparisons among project locations with different combinations and levels of treatment (for more details see www.RealWorldEvaluation.org)

4. Context

Different lenses needed for different situations in the RealWorld

What’s a conscientious evaluator to do when facing such a complex world?

Consequences Consequences Consequences DESIRED IMPACT OUTCOME 1 OUTCOME 2 OUTCOME 3 A more comprehensive design OUTPUT 2.3 OUTPUT 2.1 OUTPUT 2.2 A Simple RCT Intervention 2.2.1 Intervention 2.2.2 Intervention 2.2.3

Expanding the results chain for multi-donor, multi-component program Increased rural H/H income Increased political participation Improved education performance Improved health Impacts Increased production Access to off-farm employment Increased school enrolment Intermediate outcomes Increased use of health services Outputs Credit for small farmers Health services Rural roads Schools Inputs Donor Government Other donors Attribution gets very difficult! Consider plausible contributionseach makes.

5. Evaluation Implementation

OECD-DAC (2002: 24) defines impact as “the positive and negative, primary and secondary long-term effects produced by a development intervention, directly or indirectly, intended or unintended. These effects can be economic, sociocultural, institutional, environmental, technological or of other types”. Definition of impact evaluation Is it limited to direct attribution? Or point to the need for counterfactuals or Randomized Control Trials (RCTs)?

Direct cause-effect relationship between one output (or a very limited number of outputs) and an outcome that can be measured by the end of the research project?  Pretty clear attribution. … OR … Changes in higher-level indicators of sustainable improvement in the quality of life of people, e.g. the MDGs (Millennium Development Goals)?  More significant. But assessing plausible contribution is more feasible than assessing unique direct attribution. So what should be included in a “rigorous impact evaluation”?

Rigorous impact evaluation should include (but is not limited to): thorough consultation with and involvement by a variety of stakeholders, articulating a comprehensive logic model that includes relevant external influences, getting agreement on desirable ‘impact level’ goals and indicators, adapting evaluation design as well as data collection and analysis methodologies to respond to the questions being asked, …

Rigorous impact evaluation should include (but is not limited to): 5) adequately monitoring and documenting the process throughout the life of the program being evaluated, 6) using an appropriate combination of methods to triangulate evidence being collected, 7) being sufficiently flexible to account for evolving contexts, …

What’s involved in “rigorous impact evaluation”?

What’s involved in “rigorous impact evaluation”?

Presentation Transcript

Documents

An introduction to Impact Evaluation

An introduction to Impact Evaluation

Impact Evaluation Methods: Randomization, IV, Regression Discontinuity

CIVITAS-ELAN 3.2. Study of congestion charging and dialogue on pricing

Tech Prep students are involved in rigorous high school programs

Impact Evaluation and the Project Cycle

Impact Evaluation: Presentation to DAC Evaluation Working Group, Paris, June 2, 2005

Business Registration Impact Evaluation (BRIE) IN MalaWI

2013 Statewide BIP Load Impact Evaluation

Impact Evaluation in the European Commission

Impact Evaluation: Presentation to DAC Evaluation Working Group, Paris, June 2, 2005

Issues in impact evaluation in design

Why we got involved

Impact Evaluation in Ethiopia

Impact evaluation Applying Analytical Methods: Microfinance Case Study

International impact evaluation conference, Cairo, April 2 nd 2009

Quality Impact Evaluation: an introductory workshop

Today: An introduction to impact evaluation Readings (recommended only):

ENHANCING POLICY EFFECTIVENESS: THE ROLE OF IMPACT EVALUATION

Impact Evaluation Conference, Cairo, Egypt April 2, 2009 Nyamwaya Munthali