
Evaluation Notes Bergen COURSE Spring, 2010

Petra Todd, University of Pennsylvania, Department of Economics. The Evaluation Problem: these notes study econometric methods for evaluating the effects of active labor market programs, such as employment, training, and job search assistance programs.


Presentation Transcript


  1. Evaluation Notes Bergen COURSE Spring, 2010 Petra Todd University of Pennsylvania Department of Economics

  2. The Evaluation Problem • Will study econometric methods for evaluating effects of active labor market programs • Employment, training and job search assistance programs • School subsidy programs • Health interventions

  3. Key questions • Do program participants benefit from the program? • Do program benefits exceed costs? • What is the social return to the program? • Would an alternative program yield greater impact at the same cost?

  4. Goals • Understand the identifying assumptions needed to justify application of different estimators • Statistical assumptions • Behavioral assumptions • Assumptions with regard to heterogeneity in how people respond to a program intervention

  5. Potential Outcomes • Y0 – outcome without treatment • Y1 – outcome with treatment • D=1 if receive treatment, else D=0 • Observed outcome Y=D Y1+(1-D) Y0 • Treatment Effect Δ= Y1-Y0 • Δ is not directly observed for any individual, because only one of Y0 and Y1 is ever observed (a missing data problem)

  6. Parameters of Interest • Average impact of treatment on the treated (TT) E(Y1-Y0|D=1,X) • Average treatment effect (ATE) E(Y1-Y0|X) • Average effect of treatment on the untreated (UT) E(Y1-Y0|D=0,X) • ATE=Pr(D=1|X)TT+(1-Pr(D=1|X))UT
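The relationship among TT, UT, and ATE can be checked numerically. The sketch below uses hypothetical toy data in which both potential outcomes are listed for every person, which real data never provide. The names `people`, `tt`, `ut`, and `ate` are illustrative assumptions, and conditioning on X is ignored for simplicity.

```python
# Hypothetical toy data: each tuple is (D, Y0, Y1).
# Listing both Y0 and Y1 is for illustration only; in practice one is missing.
people = [
    (1, 10.0, 14.0),
    (1, 12.0, 15.0),
    (0, 11.0, 12.0),
    (0, 9.0, 9.0),
    (0, 8.0, 10.0),
]


def mean(values):
    """Arithmetic mean of any iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Observed outcome Y = D*Y1 + (1-D)*Y0
observed = [d * y1 + (1 - d) * y0 for d, y0, y1 in people]

# Average effects over the treated, the untreated, and everyone
tt = mean(y1 - y0 for d, y0, y1 in people if d == 1)
ut = mean(y1 - y0 for d, y0, y1 in people if d == 0)
ate = mean(y1 - y0 for d, y0, y1 in people)

# Share treated, the sample analogue of Pr(D=1)
p_treated = mean(d for d, _, _ in people)

# ATE is the participation-weighted average of TT and UT
weighted = p_treated * tt + (1 - p_treated) * ut
print(tt, ut, ate, weighted)
```

In this toy sample TT exceeds UT, so ATE lies between the two, weighted by the share treated.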

  7. Other parameters of interest • Proportion of people benefiting from the program Pr(Y1>Y0|D=1)=Pr(Δ>0|D=1) • Distribution of treatment effects F(Δ|D=1,X) • Selected quantile inf{Δ:F(Δ|D=1,X)>q}

  8. Model for potential outcomes with and without treatment • Model: Y1=Xβ1+U1 Y0=Xβ0+U0 E(U1|X)=E(U0|X)=0 • Observed outcome: Y=Y0+D(Y1-Y0), so that Y= Xβ0+D(Xβ1- Xβ0)+U0+D(U1-U0)

  9. Distinction between TT and ATE • TT=E(Δ|D=1,X)=Xβ1- Xβ0+E(U1-U0|D=1,X) • ATE= E(Δ|X)=Xβ1- Xβ0 • TT depends on structural parameters as well as means of unobservables • Parameters are the same if either • (A1) U1=U0 or • (A2) E(U1-U0|D=1,X)=0 • Condition (A2) means that D is uninformative about U1-U0, i.e. there is ex post heterogeneity but it is not acted on ex ante

  10. Three Commonly Made Assumptions from least to most general • Coefficient on D is fixed (given X) and is the same for everyone (most restrictive) • U1=U0 • Y=Xβ+Dα(X)+U • E(Y1-Y0|X,D)= α(X)

  11. Coefficient on D is random given X, but U1-U0 does not help predict participation in the program Pr(D=1| U1-U0 ,X)=Pr(D=1|X) which implies E(U1-U0 |D=1,X)= E(U1-U0 |X) • Coefficient on D is random given X and U1-U0 helps predict program participation (least restrictive) E(U1-U0 |D=1,X)≠E(U1-U0 |X)

  12. How Can Randomization Solve the Evaluation Problem? • Comparison group selected using a randomization device to randomly exclude some fraction of program applicants from the program • Main advantage – increases comparability between program participants and nonparticipants • Have same distribution of observables and of unobservables • Satisfy program eligibility criteria

  13. What problems can arise in social experiments? • Randomization bias – occurs when introducing randomization changes the way the program operates • Greater recruitment needs may lead to change in acceptance standards • Individuals may decide not to apply if they know they will be subject to randomization

  14. Contamination bias – occurs when control group members seek alternative forms of treatment • Ethical considerations – there may be opposition to the experiment and some sites may refuse to participate, which poses a threat to external validity • Dropout – some of the treatment group members may drop out before completing the program • Sample attrition – may have differential attrition between the treatment and control groups

  15. At what stage should randomization be applied? • Randomization after acceptance into the program • Randomization of eligibility • Let R=1 if randomized in (treatment group), • R=0 if randomized out (control group) • Let Y1* and Y0* denote outcomes • Let D*=1 denote someone who applies to the program and is subject to randomization

  16. From treatment group, get E(Y1*|X,D*=1,R=1) • From control group, get E(Y0*|X,D*=1,R=0) • No randomization bias and random assignment imply E(Y1*|X,D*=1,R=1)=E(Y1|X,D=1) E(Y0*|X,D*=1,R=0)=E(Y0|X,D=1) • Thus, the experiment gives TT=E(Y1-Y0|X,D=1)

  17. How does program dropout affect experiments? • Can define treatment as “intent-to-treat” or “offer of treatment,” in which case dropout is not a problem • If dropout occurs prior to receiving the program (i.e. dropouts do not get treatment), then could treat it like randomization on eligibility.

  18. Randomization on eligibility • Let e=1 if eligible, e=0 if not eligible • Let D=1 denote would-be participants if program were made available E(Y|X,e=1)=Pr(D=1|X,e=1)E(Y1|X,e=1,D=1) + Pr(D=0|X,e=1)E(Y0|X,e=1,D=0) E(Y|X,e=0)=Pr(D=1|X,e=0)E(Y0|X,e=0,D=1) + Pr(D=0|X,e=0)E(Y0|X,e=0,D=0)

  19. Because eligibility is randomized, Pr(D=1|X,e=1)=Pr(D=1|X,e=0) Pr(D=0|X,e=1)=Pr(D=0|X,e=0) E(Y0|X,e,D=1)= E(Y0|X,D=1) E(Y1|X,e,D=1)= E(Y1|X,D=1) • Thus, the D=0 terms cancel and the difference between the previous two equations gives E(Y|X,e=1)-E(Y|X,e=0)=Pr(D=1|X,e=1){E(Y1|X,D=1)-E(Y0|X,D=1)} • Dividing by Pr(D=1|X,e=1) recovers TT=E(Y1-Y0|X,D=1)
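When eligibility is randomized, the difference in mean outcomes between eligibles and ineligibles, divided by the take-up rate Pr(D=1|X,e=1), isolates TT. The following is a minimal sketch of that calculation on hypothetical toy records. The function name `tt_from_randomized_eligibility` and the record layout are assumptions for illustration, and X is ignored.

```python
def mean(values):
    """Arithmetic mean of any iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def tt_from_randomized_eligibility(records):
    """Estimate TT from (e, d, y) records.

    e: 1 if randomized to be eligible, 0 otherwise.
    d: 1 if the person participated, 0 if not; None when e == 0,
       because participation is not observed for ineligibles.
    y: observed outcome.
    Raises ZeroDivisionError if no eligible person participates.
    """
    eligible = [(d, y) for e, d, y in records if e == 1]
    ineligible_y = [y for e, _, y in records if e == 0]
    take_up = mean(d for d, _ in eligible)  # Pr(D=1 | e=1)
    diff = mean(y for _, y in eligible) - mean(ineligible_y)
    return diff / take_up


# Hypothetical data: half of the eligibles take up the program.
records = [
    (1, 1, 15.0), (1, 1, 13.0),   # eligible participants: observe Y1
    (1, 0, 8.0), (1, 0, 10.0),    # eligible nonparticipants: observe Y0
    (0, None, 11.0), (0, None, 11.0),  # ineligible: always observe Y0
    (0, None, 8.0), (0, None, 10.0),
]

estimate = tt_from_randomized_eligibility(records)
print(estimate)
```

The toy data are constructed so that would-be participants have mean Y1 of 14 and mean Y0 of 11, which the ratio recovers.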

  20. What about control group contamination? • Not necessarily a problem if willing to define benchmark state as being excluded from the program

  21. What about sample attrition? • Attrition is a problem that is common to both experimental and nonexperimental studies • Attrition occurs when some people are not followed in the data (maybe due to nonresponse) • If attrition is nonrandom with respect to treatment, then attrition requires the use of nonexperimental evaluation methods

  22. Sources of bias in estimating E(Δ|X), E(Δ|X,D=1)

  23. Traditional (Simple) Regression Estimators • Cross-section • Before-after • Difference-in-differences

  24. “Ashenfelter’s Dip” [Figure: mean Y plotted for the D=1 and D=0 groups, with T=0 marked]

  25. Before-after estimators

  26. Drawbacks and Advantages of before-after approach • Drawbacks • Identification breaks down in the presence of time-specific intercepts • Can be sensitive to choice of time periods because of the Ashenfelter's Dip pattern • Advantage • Minimal data requirements: only requires data on participants.

  27. Cross-section estimators

  28. Difference-in-difference estimators

  29. Advantages • Allows for time-specific intercepts that are common across groups • Consistent under fixed effect error structure – therefore allows for time-invariant unobservables to affect participation decisions and program outcomes
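The three traditional estimators can be compared on a tiny two-period panel. The formulas on the corresponding slides did not survive in the transcript, so the sketch below uses the standard textbook forms as an assumption: before-after compares participants over time, cross-section compares participants with nonparticipants after the program, and difference-in-differences subtracts the nonparticipants' change from the participants' change. All data and names are hypothetical.

```python
def mean(values):
    """Arithmetic mean of any iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Hypothetical panel: (D, outcome before program, outcome after program)
panel = [
    (1, 10.0, 15.0),
    (1, 12.0, 16.0),
    (0, 11.0, 13.0),
    (0, 13.0, 14.0),
]

treated_before = mean(b for d, b, a in panel if d == 1)
treated_after = mean(a for d, b, a in panel if d == 1)
control_before = mean(b for d, b, a in panel if d == 0)
control_after = mean(a for d, b, a in panel if d == 0)

# Before-after: uses participants only
before_after = treated_after - treated_before

# Cross-section: post-program comparison across groups
cross_section = treated_after - control_after

# Difference-in-differences: removes a common time effect and
# time-invariant group differences
diff_in_diff = (treated_after - treated_before) - (control_after - control_before)

print(before_after, cross_section, diff_in_diff)
```

Here the three estimates differ because the toy data contain both a common time trend and a pre-program gap between groups.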

  30. Matching Estimators • Assume access to data on treated and untreated individuals (D=1 and D=0) • Assume also access to a set of X variables whose distribution is not affected by D: F(X|D,YP)=F(X|YP), where YP=(Y0,Y1) are the “potential outcomes”

  31. Matching estimators pair treated individuals with observably similar untreated individuals • Usually assumed that (Y0,Y1) ╨ D | X (M-1) or Pr(D=1|X, Y0,Y1) = Pr(D=1|X) and 0<Pr(D=1|X)<1 (M-2) • To justify this assumption, individuals cannot select into the program based on anticipated treatment impact

  32. Assumption (M-1) implies F(Y0|D=1,X)=F(Y0|D=0,X)=F(Y0|X) F(Y1|D=1,X)=F(Y1|D=0,X)=F(Y1|X) also E(Y0|D=1,X)=E(Y0|D=0,X)=E(Y0|X) E(Y1|D=1,X)=E(Y1|D=0,X)=E(Y1|X) • Under assumptions that justify matching, can estimate TT, ATE, and UT

  33. Let n denote number of observations in the treatment group • A typical matching estimator for TT takes the form:

  34. is an estimator for the matched no treatment outcome Recall, that (M-1) implies
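The estimator's formula is missing from the transcript. A commonly used form that fits the surrounding notation is written out below as a reconstruction under assumptions, not as the slide's own expression: the hat denotes an estimate built from untreated observations, and the second line is the mean-independence implication of (M-1) that licenses using untreated outcomes to fill in the missing Y0 of the treated.

```latex
\widehat{TT} \;=\; \frac{1}{n} \sum_{i \in \{D_i = 1\}}
  \Big[\, Y_{1i} - \hat{E}\big(Y_{0i} \mid D_i = 1, X_i\big) \Big]

E(Y_0 \mid D = 1, X) \;=\; E(Y_0 \mid D = 0, X)
```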

  35. How does matching compare to a randomized experiment? • Distribution of observables will by construction be the same in the matched control group as in the treatment group • However, distribution of unobservables is not necessarily balanced across groups • Experiment has full support (M-2), but with matching there can be a failure of the common support condition (when matches cannot be found)

  36. Even though matching methods assume E(Y1-Y0|D=1,X)=E(Y1-Y0|X) Could still potentially have E(Y1-Y0|D=1)≠E(Y1-Y0) E(Δ|D=1)=∫E(Δ|D=1,X)f(X|D=1)dX E(Δ)=∫E(Δ|X)f(X)dX
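A small numeric case shows how the unconditional parameters can differ even when the conditional ones agree: the same E(Δ|X) is averaged over two different distributions of X. The numbers and names below are hypothetical, with X taking only the values 0 and 1.

```python
# Conditional effects, assumed identical for treated and for everyone:
# E(Δ | D=1, X) = E(Δ | X)
effect_given_x = {0: 1.0, 1: 3.0}

# Distribution of X among the treated, f(X | D=1), and overall, f(X)
share_x_treated = {0: 0.25, 1: 0.75}
share_x_all = {0: 0.5, 1: 0.5}


def average_over(shares):
    """Discrete analogue of integrating E(Δ|X) against a density of X."""
    return sum(effect_given_x[x] * shares[x] for x in shares)


tt_unconditional = average_over(share_x_treated)   # E(Δ | D=1)
ate_unconditional = average_over(share_x_all)      # E(Δ)

print(tt_unconditional, ate_unconditional)
```

Because the treated are concentrated where the effect is larger, the unconditional TT exceeds the unconditional ATE in this example.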

  37. If interest centers on TT, (M-1) can be replaced by weaker assumption E(Y0|X,D=1)=E(Y0|X,D=0)=E(Y0|X) • The weaker assumption allows selection into the program to depend on Y1 and allows E(Y1-Y0|X,D)≠E(Y1-Y0|X) • Only require Pr(D=1|X,Y0,Y1)=Pr(D=1|X,Y1)

  38. Practical problems in Matching • How to construct a match when X is of high dimension • How to choose the set of X variables • What to do if Pr(D=1|X)=1 for some X (violation of common support condition (M-2))

  39. Rosenbaum and Rubin (1983) Theorem • Provide a solution to the problem of constructing a match when X is of high dimension • Show that (Y0,Y1) ╨ D | X Implies (Y0,Y1) ╨ D | Pr(D=1|X) • Reduces the matching problem to a univariate problem, provided Pr(D=1|X) can be parametrically estimated • Pr(D=1|X) is known as the propensity score

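  40. Proof of RR theorem • Let P(X)=Pr(D=1|X) • E(D|Y0,P(X))=E(E(D|Y0,X)|Y0,P(X)) = E(P(X)|Y0,P(X)) =P(X) • The first equality holds by iterated expectations, because X is finer than P(X) • The second equality holds because (M-1) gives E(D|Y0,X)=E(D|X)=P(X) • Since D is binary, this shows Pr(D=1|Y0,P(X))=P(X), so D is independent of Y0 given P(X); the same argument applies to Y1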

  41. Matching can be implemented in two steps • Step 1: estimate a model for program participation, estimate the propensity score P(Xi) for each person • Step 2: Select matches based on the estimated propensity score

  42. Ways of constructing matched outcomes • Define a neighborhood C(Pi) for each person i ∈{Di=1} • Neighbors are persons in {Dj=0} for whom Pj ∈ C(Pi) • Set of persons matched to i is Ai={j ∈{Dj=0} such that Pj ∈ C(Pi)}

  43. Nearest Neighbor Matching • C(Pi)=min_j || Pi-Pj ||, j ∈{Dj=0} => Ai is a singleton set • Caliper matching: matches are only made if || Pi-Pj ||<ε for some prespecified tolerance ε (tries to avoid bad matches)
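Nearest neighbor and caliper matching on the propensity score can be sketched in a few lines. The code below assumes propensity scores have already been estimated, matches with replacement, and uses hypothetical names (`match_nearest`, `tt_nearest_neighbor`) and toy data. It illustrates the idea and is not taken from the slides.

```python
def match_nearest(p_i, control_scores, caliper=None):
    """Index of the control whose score is closest to p_i.

    Returns None if a caliper is given and the closest control is not
    strictly within that distance.
    """
    j = min(range(len(control_scores)),
            key=lambda k: abs(control_scores[k] - p_i))
    if caliper is not None and abs(control_scores[j] - p_i) >= caliper:
        return None
    return j


def tt_nearest_neighbor(treated, controls, caliper=None):
    """Average of Y1_i minus the matched control's Y0 over matched treated.

    treated:  list of (propensity score, Y1)
    controls: list of (propensity score, Y0)
    Matching is with replacement. Returns None if nobody is matched.
    """
    control_scores = [p for p, _ in controls]
    gaps = []
    for p_i, y1 in treated:
        j = match_nearest(p_i, control_scores, caliper)
        if j is not None:
            gaps.append(y1 - controls[j][1])
    return sum(gaps) / len(gaps) if gaps else None


# Hypothetical scores and outcomes
treated = [(0.30, 12.0), (0.62, 15.0), (0.95, 20.0)]
controls = [(0.28, 10.0), (0.60, 11.0), (0.40, 9.0)]

tt_nn = tt_nearest_neighbor(treated, controls)
tt_caliper = tt_nearest_neighbor(treated, controls, caliper=0.05)
print(tt_nn, tt_caliper)
```

With the caliper, the treated person with score 0.95 has no acceptable match and is dropped, which changes the estimate and also changes the population it refers to.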

  44. Kernel Matching • Estimate matched outcomes by nonparametric regression
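The slide's formula is not in the transcript. One standard way to build the matched outcome by nonparametric regression is a kernel-weighted average of untreated outcomes, with weights that decline in the distance between propensity scores. The sketch below assumes a Gaussian kernel and a user-chosen bandwidth. Both choices, and all names, are illustrative assumptions.

```python
import math


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the constant cancels in the ratio."""
    return math.exp(-0.5 * u * u)


def kernel_matched_outcome(p_i, controls, bandwidth):
    """Kernel-weighted average of control outcomes around score p_i.

    controls: list of (propensity score, Y0).
    A very small bandwidth can make every weight underflow to zero,
    in which case the division below fails.
    """
    weights = [gaussian_kernel((p_j - p_i) / bandwidth) for p_j, _ in controls]
    total = sum(weights)
    return sum(w * y0 for w, (_, y0) in zip(weights, controls)) / total


def tt_kernel(treated, controls, bandwidth):
    """Mean gap between each treated Y1 and its kernel-matched outcome."""
    gaps = [y1 - kernel_matched_outcome(p_i, controls, bandwidth)
            for p_i, y1 in treated]
    return sum(gaps) / len(gaps)


# Hypothetical data: two controls placed symmetrically around 0.5
controls = [(0.4, 10.0), (0.6, 14.0)]
matched = kernel_matched_outcome(0.5, controls, bandwidth=0.1)
tt_hat = tt_kernel([(0.5, 15.0)], controls, bandwidth=0.1)
print(matched, tt_hat)
```

Unlike nearest neighbor matching, every control contributes to each matched outcome, with the bandwidth governing how quickly distant controls are discounted.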

  45. Local Linear Regression Matching
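Only the slide title survives here. As a hedged illustration, local linear matching can be sketched as a kernel-weighted least squares regression of untreated outcomes on the score difference Pj-Pi, with the fitted intercept serving as the matched outcome at Pi. The Gaussian kernel, the bandwidth, and the function name below are assumptions.

```python
import math


def local_linear_matched_outcome(p_i, controls, bandwidth):
    """Intercept of a kernel-weighted linear fit of Y0 on (Pj - p_i).

    controls: list of (propensity score, Y0); needs at least two
    distinct scores so the weighted regression is identified.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for p_j, y0 in controls:
        x = p_j - p_i
        w = math.exp(-0.5 * (x / bandwidth) ** 2)  # Gaussian kernel weight
        s0 += w
        s1 += w * x
        s2 += w * x * x
        t0 += w * y0
        t1 += w * x * y0
    denom = s0 * s2 - s1 * s1
    if denom == 0:
        raise ValueError("need at least two distinct control scores")
    # Closed-form weighted least squares intercept
    return (s2 * t0 - s1 * t1) / denom


# Hypothetical controls lying exactly on the line Y0 = 2 + 10 * P
controls = [(0.2, 4.0), (0.5, 7.0), (0.9, 11.0)]
fitted = local_linear_matched_outcome(0.6, controls, bandwidth=0.2)
print(fitted)
```

When the untreated outcomes are exactly linear in the score, the local linear fit reproduces the line regardless of how the controls are spread around Pi.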
