
Module 3: Impact Evaluation for TTLs



Presentation Transcript


  1. Module 3: Impact Evaluation for TTLs Paul J. Gertler Chief Economist, HDN Sebastian Martinez Impact Evaluation Cluster, AFTRL HD Learning Week Washington DC November 2006 Slides by Paul Gertler and Sebastian Martinez

  2. Measuring Impact What makes a good impact evaluation?

  3. Motivation • “Traditional” M&E: • Is the program being implemented as designed? • Could the operations be more efficient? • Are the benefits getting to those intended? • Monitoring trends • Are indicators moving in the right direction? •  NO inherent Causality • Impact Evaluation: • What was the effect of the program on outcomes? • Because of the program, are people better off? • What would happen if we changed the program? •  Causality

  4. Motivation • Objective in evaluation is to estimate the CAUSAL effect of intervention X on outcome Y • What is the effect of a cash transfer on household consumption? • For causal inference we must understand the data generation process • For impact evaluation, this means understanding the behavioral process that generates the data • how benefits are assigned

  5. Causation versus Correlation • Recall: correlation is NOT causation • Necessary but not sufficient condition • Correlation: X and Y are related • A change in X is related to a change in Y • And… • A change in Y is related to a change in X • Causation: if we change X, how much does Y change? • A change in X is related to a change in Y • Not necessarily the other way around

  6. Causation versus Correlation • Three criteria for causation: • Independent variable precedes the dependent variable. • Independent variable is related to the dependent variable. • There are no third variables that could explain why the independent variable is related to the dependent variable • External validity • Generalizability: causal inference to generalize outside the sample population or setting

  7. Motivation • The word cause is not in the vocabulary of standard probability theory. • Probability theory: two events are mutually correlated, or dependent  if we find one, we can expect to encounter the other. • Example: age and income • For impact evaluation, we supplement the language of probability with a vocabulary for causality.

  8. Statistical Analysis & Impact Evaluation • Statistical analysis: Typically involves inferring the causal relationship between X and Y from observational data • Many challenges & complex statistics • Impact Evaluation: • Retrospectively: • same challenges as statistical analysis • Prospectively: • we generate the data ourselves through the program’s design  evaluation design • makes things much easier!

  9. How to assess impact • What is the effect of a cash transfer on household consumption? • Formally, program impact is: α = (Y | P=1) - (Y | P=0) • Compare same individual with & without programs at same point in time • So what’s the Problem?
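
The formula above hides the fundamental problem, which is worth seeing concretely. Below is a minimal sketch in Python (an editorial addition, not part of the original slides) that simulates both potential outcomes for the same households; all numbers are hypothetical, and in real data only one of the two states is ever observed.

```python
# Minimal sketch of the evaluation problem, using simulated potential outcomes.
# All numbers are hypothetical and for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

y0 = rng.normal(100, 10, n)   # consumption per capita WITHOUT the transfer (P=0)
true_effect = 15              # assumed effect of the cash transfer
y1 = y0 + true_effect         # consumption per capita WITH the transfer (P=1)

# The individual impact alpha = (Y | P=1) - (Y | P=0):
alpha = y1 - y0               # computable here only because we simulated both states

# In reality each household is observed in exactly ONE of the two states,
# so alpha is never directly observable -- hence the need for a counterfactual.
print(alpha.mean())           # 15.0
```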

  10. Solving the evaluation problem • Problem: we never observe the same individual with and without program at same point in time • Need to estimate what would have happened to the beneficiary if he or she had not received benefits • Counterfactual: what would have happened without the program • Difference between treated observation and counterfactual is the estimated impact

  11. Finding a good counterfactual • The treated observation and the counterfactual: • have identical factors/characteristics, except for benefiting from the intervention • No other explanations for differences in outcomes between the treated observation and counterfactual • The only reason for the difference in outcomes is due to the intervention

  12. Measuring Impact Tool belt of Impact Evaluation Design Options: • Randomized Experiments • Quasi-experiments • Regression Discontinuity • Difference in difference – panel data • Other (using Instrumental Variables, matching, etc) • In all cases, these will involve knowing the rule for assigning treatment

  13. Choosing your design • For impact evaluation, we will identify the “best” possible design given the operational context • Best possible design is the one that has the fewest risks for contamination • Omitted Variables (biased estimates) • Selection (results not generalizable)

  14. Case Study • Effect of cash transfers on consumption • Estimate impact of cash transfer on consumption per capita • Make sure: • Cash transfer comes before change in consumption • Cash transfer is correlated with consumption • Cash transfer is the only thing changing consumption • Example based on Oportunidades

  15. Oportunidades • National anti-poverty program in Mexico (1997) • Cash transfers and in-kind benefits conditional on school attendance and health care visits • Transfers given preferably to the mother of beneficiary children • Large program with large transfers: • 5 million beneficiary households in 2004 • Transfers capped at: • $95 USD for HH with children through junior high • $159 USD for HH with children in high school

  16. Oportunidades Evaluation • Phasing in of intervention • 50,000 eligible rural communities • Random sample of 506 eligible communities in 7 states – evaluation sample • Random assignment of benefits by community: • 320 treatment communities (14,446 households) • First transfers distributed April 1998 • 186 control communities (9,630 households) • First transfers November 1999

  17. Oportunidades Example

  18. “Counterfeit” Counterfactual Number 1 • Before and after: • Assume we have data on • Treatment households before the cash transfer • Treatment households after the cash transfer • Estimate “impact” of cash transfer on household consumption: • Compare consumption per capita before the intervention to consumption per capita after the intervention • Difference in consumption per capita between the two periods is “treatment”

  19. Case 1: Before and After • Compare Y before and after intervention: αi = (CPCi,t | T=1) - (CPCi,t-1 | T=0) • Estimate of counterfactual: (CPCi,t | T=0) = (CPCi,t-1 | T=0) • “Impact” = A - B [Figure: consumption per capita (CPC) over time, point B at t-1 (before) and point A at t (after)]

  20. Case 1: Before and After

  21. Case 1: Before and After • Compare Y before and after intervention: αi = (CPCi,t | T=1) - (CPCi,t-1 | T=0) • Estimate of counterfactual: (CPCi,t | T=0) = (CPCi,t-1 | T=0) • “Impact” = A - B • Does not control for time-varying factors • Recession: Impact = A - C • Boom: Impact = A - D [Figure: CPC over time; B at t-1 and A at t, with possible counterfactuals C (recession, below B) and D (boom, above B)]
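
A small simulation makes this bias visible. The sketch below (an addition, not from the original deck) assumes a hypothetical recession that lowers consumption between the two periods:

```python
# Sketch of why before-after fails when other factors vary over time.
# Simulated data; the trend and effect sizes are assumptions.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
true_effect = 15
trend = -8   # a recession lowers consumption between t-1 and t

cpc_before = rng.normal(100, 10, n)            # CPC_{i,t-1}, before the transfer
cpc_after = cpc_before + true_effect + trend   # CPC_{i,t}, after the transfer

# Before-after "impact" uses the pre-period as the counterfactual:
impact = cpc_after.mean() - cpc_before.mean()
print(impact)   # about 7 (= 15 - 8): the recession biases the estimate downward
```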

  22. “Counterfeit” Counterfactual Number 2 • Enrolled/Not Enrolled • Voluntary enrollment in the program • Assume we have a cross-section of post-intervention data on: • Households that did not enroll • Households that enrolled • Estimate “impact” of cash transfer on household consumption: • Compare consumption per capita of those who did not enroll to consumption per capita of those who enrolled • Difference in consumption per capita between the two groups is “treatment”

  23. Case 2: Enrolled/Not Enrolled

  24. Those who did not enroll… • Impact estimate: αi = (Yi,t | P=1) - (Yj,t | P=0) • Counterfactual: (Yj,t | P=0) ≠ (Yi,t | P=0) • Examples: • Those who choose not to enroll in the program • Those who were not offered the program • Conditional cash transfer • Job training program • Cannot control for all the reasons why some choose to sign up and others didn’t • Reasons could be correlated with outcomes • We can control for observables… • But are still left with the unobservables
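
To see how selection on unobservables contaminates this comparison, here is a hedged sketch added for this transcript, using simulated data; the "motivation" variable and its correlation with consumption are assumptions for illustration:

```python
# Sketch of selection bias in an enrolled/not-enrolled comparison.
# Simulated data; 'motivation' stands in for any unobservable that
# drives both enrollment and outcomes.
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
true_effect = 15

motivation = rng.normal(0, 1, n)                 # unobserved by the evaluator
enrolled = motivation + rng.normal(0, 1, n) > 0  # self-selection into the program

# Consumption depends on the unobservable AND on the transfer:
y = 100 + 5 * motivation + true_effect * enrolled + rng.normal(0, 10, n)

naive = y[enrolled].mean() - y[~enrolled].mean()
print(naive)   # around 20, well above the true 15: selection inflates the estimate
```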

  25. Impact Evaluation Example: Two counterfeit counterfactuals • What is going on? • Which of these do we believe? • Problem with Before-After: • Cannot control for other time-varying factors • Problem with Enrolled-Not Enrolled: • Do not know why the treated are treated and the others not

  26. Possible Solutions… • We need to understand the data generation process • How beneficiaries are selected and how benefits are assigned • Guarantee comparability of treatment and control groups, so ONLY difference is the intervention

  27. Measuring Impact • Experimental design/randomization • Quasi-experiments • Regression Discontinuity • Double differences (diff in diff) • Other options

  28. Choosing the methodology….. • Choose the most robust strategy that fits the operational context • Use program budget and capacity constraints to choose a design, i.e. pipeline: • Universe of eligible individuals typically larger than available resources at a single point in time • Fairest and most transparent way to assign benefit may be to give all an equal chance of participating  randomization

  29. Randomization • The “gold standard” in impact evaluation • Give each eligible unit the same chance of receiving treatment • Lottery for who receives benefit • Lottery for who receives benefit first
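
As a concrete illustration (not from the original slides), a lottery can be implemented as a random permutation of the eligible units; the 320/186 split mirrors the Oportunidades evaluation sample described earlier:

```python
# Minimal lottery sketch: every eligible unit gets the same chance of treatment.
import numpy as np

rng = np.random.default_rng(3)
eligible = np.arange(506)            # e.g. the 506 evaluation communities

drawn = rng.permutation(eligible)    # a random ordering is a fair lottery
treatment = drawn[:320]              # first 320 drawn receive benefits first
control = drawn[320:]                # remaining 186 receive benefits later
print(len(treatment), len(control))  # 320 186
```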

  30. [Diagram: Population → (randomization) → Sample → (randomization) → Treatment Group / Control Group]

  31. External & Internal Validity • The purpose of the first-stage is to ensure that the results in the sample will represent the results in the population within a defined level of sampling error (external validity). • The purpose of the second-stage is to ensure that the observed effect on the dependent variable is due to some aspect of the treatment rather than other confounding factors (internal validity).

  32. Case 3: Randomization • Randomized treatment/controls • Community level randomization • 320 treatment communities • 186 control communities • Pre-intervention characteristics well balanced
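
Balance on pre-intervention characteristics can be checked with simple two-sample tests. The sketch below is an illustration with simulated data; a real check would use the baseline survey and account for community-level clustering:

```python
# Sketch of a baseline balance check between treatment and control.
# Simulated data; a real check would use the pre-intervention survey.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
treat = rng.normal(100, 10, 320)   # e.g. baseline consumption, treatment communities
ctrl = rng.normal(100, 10, 186)    # e.g. baseline consumption, control communities

t_stat, p_value = stats.ttest_ind(treat, ctrl)
print(f"difference = {treat.mean() - ctrl.mean():.2f}, p-value = {p_value:.2f}")
# Under successful randomization, baseline differences should be small
# and statistically insignificant.
```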

  33. Baseline characteristics

  34. Case 3: Randomization

  35. Impact Evaluation Example: No Design vs. Randomization

  36. Measuring Impact • Experimental design/randomization • Quasi-experiments • Regression Discontinuity • Double differences (diff in diff) • Other options

  37. Case 4: Regression Discontinuity • Assignment to treatment is based on a clearly defined index or parameter with a known cutoff for eligibility • RD is possible when units can be ordered along a quantifiable dimension which is systematically related to the assignment of treatment • The effect is measured at the discontinuity – estimated impact around the cutoff may not generalize to entire population

  38. Indexes are common in targeting of social programs • Anti-poverty programs  targeted to households below a given poverty index • Pension programs  targeted to population above a certain age • Scholarships  targeted to students with high scores on standardized test • CDD Programs  awarded to NGOs that achieve highest scores

  39. Example: effect of cash transfer on consumption • Target transfer to poorest households • Construct poverty index from 1 to 100 with pre-intervention characteristics • Households with a score <=50 are poor • Households with a score >50 are non-poor • Cash transfer to poor households • Measure outcomes (i.e. consumption) before and after transfer
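
The assignment rule on this slide translates directly into code. A minimal sketch (an addition, with simulated index scores; only the 1–100 range and the cutoff of 50 come from the example) follows:

```python
# Sketch of the RD assignment rule: poverty index 1-100, cutoff at 50.
# The index values are simulated for illustration.
import numpy as np

rng = np.random.default_rng(5)
n = 5_000
poverty_index = rng.uniform(1, 100, n)   # built from pre-intervention characteristics

treated = poverty_index <= 50            # poor households receive the cash transfer
print(f"{treated.mean():.0%} of households classified poor and treated")
```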

  40. [Figure: pre-intervention outcome against the poverty index, with poor households (score ≤ 50) to the left of the cutoff and non-poor households to the right]

  41. [Figure: Treatment Effect – post-intervention outcome against the poverty index, with the impact measured as the discontinuity at the cutoff]

  42. Case 4: Regression Discontinuity • Oportunidades assigned benefits based on a poverty index, where: • Treatment = 1 if score ≤ 750 • Treatment = 0 if score > 750
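
A hedged sketch of how an impact could be estimated around this cutoff, using a simple comparison of means in a narrow bandwidth. The data are simulated; the bandwidth, slope, and effect size are assumptions, not figures from the evaluation:

```python
# Sketch of an RD estimate at the cutoff (score = 750). All parameters assumed.
import numpy as np

rng = np.random.default_rng(6)
n = 20_000
score = rng.uniform(500, 1000, n)
treated = score <= 750
# Outcome rises smoothly with the score, plus a jump of 15 at the cutoff:
y = 50 + 0.05 * score + 15 * treated + rng.normal(0, 5, n)

h = 25                                         # bandwidth around the cutoff
below = y[(score > 750 - h) & (score <= 750)]  # just below the cutoff: treated
above = y[(score > 750) & (score < 750 + h)]   # just above the cutoff: untreated
print(below.mean() - above.mean())  # near 15, minus a small slope bias (~1.25)
```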

  43. Case 4: Regression Discontinuity [Figure: Baseline – no treatment; outcomes are smooth through the cutoff]

  44. Case 4: Regression Discontinuity [Figure: Treatment period]

  45. Potential Disadvantages of RD • Local average treatment effects – not always generalizable • Power: effect is estimated at the discontinuity, so we generally have fewer observations than in a randomized experiment with the same sample size • Specification can be sensitive to functional form: make sure the relationship between the assignment variable and the outcome variable is correctly modeled, including: • Nonlinear Relationships • Interactions
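
One way to probe the functional-form concern (an added sketch, not from the deck) is to fit a flexible polynomial on each side of the cutoff and compare the predictions at the cutoff; the quadratic data-generating process here is an assumption used to create nonlinearity:

```python
# Sketch of a functional-form check: one-sided polynomial fits at the cutoff
# instead of an assumed linear relationship. Simulated data.
import numpy as np

rng = np.random.default_rng(7)
n = 20_000
cutoff = 750
score = rng.uniform(500, 1000, n)
treated = score <= cutoff
y = (50 + 0.05 * score + 1e-4 * (score - cutoff) ** 2
     + 15 * treated + rng.normal(0, 5, n))

def predict_at_cutoff(x, yy, deg=2):
    """Predict the outcome at the cutoff from a one-sided polynomial fit."""
    coefs = np.polyfit(x - cutoff, yy, deg)
    return np.polyval(coefs, 0.0)

below = score <= cutoff
rd = predict_at_cutoff(score[below], y[below]) - predict_at_cutoff(score[~below], y[~below])
print(rd)   # close to the true jump of 15, despite the nonlinearity
```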

  46. Advantages of RD for Evaluation • RD yields an unbiased estimate of the treatment effect at the discontinuity • Can often take advantage of known rules for assigning benefits, which are common in the design of social policy • No need to “exclude” a group of eligible households/individuals from treatment

  47. Measuring Impact • Experimental design/randomization • Quasi-experiments • Regression Discontinuity • Double differences (Diff in diff) • Other options

  48. Case 5: Diff in diff • Compare the change in outcomes between treatment and non-treatment groups • Impact is the difference in the change in outcomes • Impact = (Yt1 - Yt0) - (Yc1 - Yc0)
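
The double difference is one line of arithmetic. The sketch below (an addition, with simulated data and an assumed common time trend) shows how a shared trend differences out even when baseline levels differ:

```python
# Minimal diff-in-diff sketch: Impact = (Yt1 - Yt0) - (Yc1 - Yc0).
# Simulated data; the common trend and effect size are assumptions.
import numpy as np

rng = np.random.default_rng(8)
n = 10_000
true_effect, trend = 15, -8   # both groups share the same time trend

yc0 = rng.normal(100, 10, n)                            # control, baseline
yc1 = yc0 + trend + rng.normal(0, 5, n)                 # control, follow-up
yt0 = rng.normal(105, 10, n)                            # treatment, baseline
yt1 = yt0 + trend + true_effect + rng.normal(0, 5, n)   # treatment, follow-up

did = (yt1.mean() - yt0.mean()) - (yc1.mean() - yc0.mean())
print(did)   # about 15: the shared trend differences out
```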
