1 / 45

Instrumental Variables Estimation (with Examples from Criminology)

Instrumental Variables Estimation (with Examples from Criminology). Robert Apel, Ph.D. School of Criminal Justice University at Albany. Center for Social and Demographic Analysis University at Albany May 5 & 7, 2009. Vital Statistics. Ph.D., Criminology and Criminal Justice, 2004

Pat_Xavi
Download Presentation

Instrumental Variables Estimation (with Examples from Criminology)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Instrumental Variables Estimation (with Examples from Criminology) Robert Apel, Ph.D. School of Criminal Justice University at Albany Center for Social and Demographic Analysis University at Albany May 5 & 7, 2009

  2. Vital Statistics • Ph.D., Criminology and Criminal Justice, 2004 • University of Maryland • Coursework in Department of Economics • Dissertation used instrumental variables • State child labor laws as instrumental variables for the causal effect of youth employment on antisocial behavior

  3. Topics That Will Be Covered in this Workshop • Why use IV? • Discussion of endogeneity bias • Statistical motivation for IV • What is an IV? • Identification issues • Statistical properties of IV estimators • How is an IV model estimated? • Software and data examples • Diagnostics: IV relevance, IV exogeneity, Hausman

  4. Review of the Linear Model • Population model: Y = α + βX + ε • Assume that the true slope is positive, so β > 0 • Sample model: Y = a + bX + e • Least squares (LS) estimator of β: bLS= (X′X)–1X′Y = Cov(X,Y) / Var(X) • Under what conditions can we speak of bLS as a causal estimate of the effect of X on Y?

  5. Review of the Linear Model • Key assumption of the linear model: E(X′e) = Cov(X,e) = E(e | X) = 0 • Exogeneity assumption = X is uncorrelated with the unobserved determinants of Y • Important statistical property of the LS estimator under exogeneity: E(bLS) =β + Cov(X,e) / Var(X) plim(bLS) =β + Cov(X,e) / Var(X) Second terms 0, so bLS unbiased and consistent

  6. Endogeneity and the Evaluation Problem • When is the exogeneity assumption violated? • Measurement error → Attenuation bias • Instantaneous causation → Simultaneity bias • Omitted variables → Selection bias • Selection bias is the problem in observational research that undermines causal inference • Measurement error and instantaneous causation can be posed as problems of omitted variables

  7. e X Y u v When Is the Exogeneity Assumption Violated? (1)Measurement error in X (u) that is correlated with M.E. in Y (v) or with the model error (e) • Classical M.E. leads to attenuation, 0 < E(bLS) < β, but non-random M.E. (or correlation between M.E. and X, Y, V, and/or e) introduces unknown biases And, if there are multiple X’s, bias contaminates the whole model, not just the coefficient on the X measured with error (a.k.a. “smearing”)

  8. X Y When Is the Exogeneity Assumption Violated? (2)Instantaneous causation of Y on X • Direction of the bias depends on what the sign is for the feedback effect, Y → X • If positive, E(bLS) > β, so overestimate true effect • If negative, E(bLS) < β, so underestimate true effect and in severe cases can even flip the sign so that E(bLS) < 0 even though β > 0 This non-recursivity complicates the relationship between price and quantity in economics

  9. X Y W When Is the Exogeneity Assumption Violated? (3) Omitted variable (W) that is correlated with both X and Y • Classic problem of omitted variables bias • Coefficient on X will absorb the indirect path through W, whose sign depends on Cov(X,W) and Cov(W,Y) Things more complicated in applied settings because there are bound to be many W’s, not to mention that the “smearing” problem applies in this context also

  10. Example #1: Police Hiring • Measurement error • Mobilization of sworn officers (M.E. in X) as well as differential victim reporting or crime recording (M.E. in Y) may be correlated with police size • Instantaneous causation • More police might be hired during a crime wave • Omitted variables • Large departments may differ in fundamental ways difficult to measure (e.g., urban, heterogeneous)

  11. Example #2: Sanction Perceptions • Measurement error • Measures of perceived sanction risk are probably “noisy” (M.E. in X), resulting in attenuation at best • Instantaneous causation • Perceptions are sensitive to the success/failure of criminal behavior, so feedback is negative • Omitted variables • Perceived risk probably correlated with unobserved determinants of crime (e.g., intelligence)

  12. Example #3: Delinquent Peers • Measurement error • Highly delinquent youth probably overestimate the delinquency of their peers (M.E. in X), and likely underestimate their own delinquency (M.E. in Y) • Instantaneous causation • If there is influence/imitation, then it is bidirectional • Omitted variables • High-risk youth probably select themselves into delinquent peer groups (“birds of a feather”)

  13. Regression EstimationIgnoring Omitted Variables • Suppose we estimate treatment effect model: Y = α + βX + ε • Let’s assume without loss of generality that X is a binary “treatment” (= 1 if treated; = 0 if untreated) • Least squares estimator: bLS = Cov(X,Y) / Var(X) = E(Y | X = 1) – E(Y | X = 0) • Simply the difference in means between “treated” units (X = 1) and “untreated” units (X = 0)

  14. Regression EstimationIgnoring Omitted Variables • But suppose the population treatment effect model is instead: Y = α + βX + (δW + ω) • Now the residual conveys information about W • Consider a plausible example • Y = crime, X = marriage, W = “marriageability” • “Marriageability” can be broadly construed to encompass earnings potential, desire for children, willingness to compromise, faithfulness, verbal communication skills,... • Including “signals” that individuals emit about these qualities

  15. Impact of marriage-ability on crime (–) Difference in marriageability between married and unmarried (+) True impact of marriage on crime (–) Regression EstimationIgnoring Omitted Variables • What does LS estimate when W is omitted? bLS = [C(X,Y)/V(X)] + [C(W,Y)/V(W)] × [C(X,W)/V(X)] = β+ δ × [E(W | X = 1) – E(W | X = 0)] • Marriage effect on crime will be overestimated • IMPORTANT: Even if β = 0, bLS < 0

  16. Regression EstimationIgnoring Omitted Variables • So... bLS = β+ δ × [E(W | X = 1) – E(W | X = 0)] • Estimate of β is unbiased if and only if 1. Marriageability is uncorrelated with crime δ = 0 or... 2. Marriageability is “balanced” (i.e., equivalent) between married and unmarried subjects E(W | X = 1) = E(W | X = 0)

  17. Omitted Variables in Criminological Research • What variables of interest to criminologists are surely endogenous? • Micro = Employment, education, marriage, military service, fertility, conviction, family structure,.... • Macro = Poverty, unemployment rate, collective efficacy, immigrant concentration,.... • Basically, EVERYTHING! • (I’m sorry to be the one to break it to you)

  18. Traditional Strategies to Deal with Omitted Variables • Randomization (physical control) • Achieves balance (in expectation) on any and all potential W’s • Control variables are technically unnecessary • Covariate adjustment (statistical control) • Control for potential W’s in a regression model • But...we have no idea how many W’s there are, so model misspecification is still a real problem here

  19. Quasi-Experimental Strategies to Deal with Omitted Variables • Difference in differences (fixed-effects model) • Requires panel data • Propensity score matching • Requires a lot of measured background variables • Similar to covariate adjustment, but only the treated and untreated cases which are “on support” are utilized • Instrumental variables estimation • Requires an exclusion restriction

  20. Z e X Y W Instrumental Variables Estimation Is a Viable Approach • An “instrumental variable” for X is one solution to the problem of omitted variables bias • Requirements for Z to be a valid instrument for X • Relevant = Correlated with X • Exogenous = Not correlated with Y but through its correlation with X

  21. Important Point about Instrumental Variables Models • I often hear...“A good instrument should not be correlated with the dependent variable” • WRONG!!! • Z has to be correlated with Y, otherwise it is useless as an instrument • It can only be correlated with Y through X • A good instrument must not be correlated with the unobserved determinants of Y

  22. X Y Z Important Point about Instrumental Variables Models • Not all of the available variation in X is used • Only that portion of X which is “explained” by Z is used to explain Y X = Endogenous variable Y = Response variable Z = Instrumental variable

  23. X Y Z X Y Z Important Point about Instrumental Variables Models Best-case scenario: A lot of X is explained by Z, and most of the overlap between X and Y is accounted for Realistic scenario: Very little of X is explained by Z, or what is explained does not overlap much with Y

  24. Important Point about Instrumental Variables Models • The IV estimator is BIASED • In other words, E(bIV) ≠β (finite-sample bias) • The appeal of IV derives from its consistency • “Consistency” is a way of saying that E(b) → β as N → ∞ • So…IV studies often have very large samples • But with endogeneity, E(bLS) ≠β and plim(bLS) ≠β anyway • Asymptotic behavior of IV plim(bIV) =β + Cov(Z,e) / Cov(Z,X) • If Z is truly exogenous, then Cov(Z,e) = 0

  25. ω ε α1 β1 Z X Y ξ δ1 Z Y Instrumental Variables Terminology • Three different models to be familiar with • First stage: X = α0 + α1Z + ω • Structural model: Y = β0 + β1X + ε • Reduced form: Y = δ0 + δ1Z + ξ • An interesting equality: δ1 = α1×β1 so… β1 = δ1 / α1

  26. Different Types of Instrumental Variables Estimators • Wald estimator for binary instrument: bWald= [E(Y | Z = 1) – E(Y | Z = 0)] / [E(X | Z = 1) – E(X | Z = 0)] • Difference in response ÷ Difference in treatment • Instrumental variables (IV) estimator: bIV= (Z′X)–1Z′Y = Cov(Z,Y) / Cov(Z,X) • Shows that bIV can be recovered from two samples • Two-stage least squares (2SLS) estimator: b2SLS= (X̃′X̃)–1X̃′Y = Cov(X̃,Y) / Var(X̃) • X̃ represents “fitted” value from first-stage model

  27. Different Types of Instrumental Variables Estimators • Single binary instrument and no control variables... bWald = bIV = b2SLS • Single instrument (binary or continuous) with or without control variables... bIV = b2SLS • Multiple instruments (binary or continuous) with or without control variables... b2SLS

  28. More on the Method of Two-Stage Least Squares (2SLS) • Step 1: X = a0 + a1Z1 + a2Z2 +  + akZk + u • Obtain fitted values (X̃) from the first-stage model • Step 2: Y = b0 + b1X̃ + e • Substitute the fitted X̃ in place of the original X • Note: If done manually in two stages, the standard errors are based on the wrong residual e = Y – b0 – b1X̃ when it should be e = Y – b0 – b1X • Best to just let the software do it for you

  29. Including Control Variables in an IV/2SLS Model • Control variables (W’s) should be entered into the model at both stages • First stage: X = a0 + a1Z + a2W + u • Second stage: Y = b0 + b1X̃ + b2W + e • Control variables are considered “instruments,” they are just not “excluded instruments” • They serve as their own instrument

  30. Functional Form Considerations with IV/2SLS • Binary endogenous regressor (X) • Consistency of second-stage estimates do not hinge on getting first-stage functional form correct • Binary response variable (Y) • IV probit (or logit) is feasible but is technically unnecessary • In both cases, linear model is tractable, easily interpreted, and consistent • Although variance adjustment is well advised

  31. Functional Form Considerations with IV/2SLS • Quadratic second stage with a continuous endogenous regressor • Entering first-stage fitted values and their square into second-stage model leads to inconsistency • The square of a linear projection is not equivalent to a linear projection on a quadratic • Squares and cross-products of IV’s should be treated as additional instruments • Kelejian (1971) • Linear and squared X’s are treated as two different endogenous regressors

  32. Technical Conditions Required for Model Identification • Order condition = At least the same # of IV’s as endogenous X’s • Just-identified model: # IV’s = # X’s • Overidentified model: # IV’s > # X’s • Rank condition = At least one IV must be significant in the first-stage model • Number of linearly independent columns in a matrix • E(X | Z,W) cannot be perfectly correlated with E(X | W)

  33. Statistical Inference with IV • Variance estimation σ2βLS = σ2ε / SSTX σ2βIV= σ2ε / (SSTX R2X,Z) where… ε = Y – β0 – β1X • NOTICE: Because R2X,Z < 1  sbIV > sbLS • IV standard errors tend to be large, especially when R2X,Z is very small, which can lead to type II errors

  34. Instrumental Variables and Randomized Experiments • Imperfect compliance in randomized trials • Some individuals assigned to treatment group will not receive Tx, and some assigned to control group will receive Tx • Assignment error; subject refusal; investigator discretion • Some individuals who receive Tx will not change their behavior, and some who do not receive Tx will change their behavior • A problem in randomized job training studies and other social experiments (e.g., housing vouchers)

  35. Instrumental Variables and Randomized Experiments • Two different measures of treatment (X) • Treatment assigned = Exogenous • Intention-to-treat (ITT) analysis • Reduced-formmodel: Y = δ0 + δ1Z + ξ • Often leads to underestimation of treatment effect • Treatment delivered = Endogenous • Individuals who do not comply probably differ in ways that can undermine the study • Self-selection  bias and inconsistency

  36. Angrist (2006), J.E.C. • Minneapolis D.V. experiment • Sherman and Berk (1984) • Cases of male-on-female misdemeanor assault in two high-density precincts, in which both parties present at scene • Random assignment of arrest-mediation-separation • But...treatment assigned was not treatment delivered • Fidelity vis-à-vis arrest, but many subjects (~25%) assigned to mediation/separation were arrested • “Upgrading” was more likely when suspect was rude, suspect assaulted officer, weapons were involved, victim persistently demanded arrest, and incident violated restraining order

  37. + – Treatment Assigned (Arrest) Treatment Delivered (Arrest) Recidivism + + Violence Proneness Angrist (2006), J.E.C.

  38. Angrist (2006), J.E.C. • Estimates of effect of arrest (vs. mediate or separate) on D.V. recividism (Tables 2, 3) • OLS: b = –.070 (s.e. = .038) • ITT: b = –.108 (s.e. = .041) • 2SLS: b = –.140 (s.e. = .053) • Deterrent effect of arrest is twice as large in 2SLS as opposed to OLS • In this context, 2SLS is known as a “local average treatment effect” (I’ll come back to this)

  39. Sexton and Hebel (1984), J.A.M.A. • Maternal smoking and birth weight • Sexton and Hebel (1984) • Sample of pregnant women who were confirmed smokers, recruited from prenatal care registrants • At least 10 cigarettes per day and not past 18th week • Random assignment of staff assistance in a smoking cessation program • Personal visits; telephone and mail contacts • But...some smokers in treatment group did not quit and some smokers in control group did quit

  40. – Smoking Intervention Smoking Frequency Birth Weight + – – – Smoking Propensity Difficult Pregnancy Sexton and Hebel (1984), J.A.M.A.

  41. Sexton and Hebel (1984), J.A.M.A. (1) First-stage model Mean cigarettes smoked: Treatment = 6.4 Control = 12.8 First-stage effect: bFS = –6.4 (2) Reduced-form model Mean birth weight: Treatment = 3,278g Control = 3,186g Reduced-form effect: bRF = 92 (3) Structural model Effect of smoking frequency on mean birth weight: bIV = 92 / –6.4 = –14.4g Each cigarette reduces birth weight by 14.4 grams

  42. Sexton and Hebel (1984), J.A.M.A. • As an interesting aside, it’s also possible to estimate the effect of continuing smoking (vs. quitting) from the data • First stage: bFS = –0.23 (57% vs. 80% smokers) • Reduced form: bRF = 92g • Structural: bIV = 92 / –0.23 = –400g • Women who kept smoking by the 8th month of pregnancy bore children who were 400 grams lighter, on average

  43. Permutt and Hebel (1989), Biometrics • Estimates of the effect of smoking frequency (in 8th month) on birth weight • OLS: b = 2g (s.e. not reported) • 2SLS: b = –14g (s.e. = 7g) • Here as well, 2SLS yields the “local average treatment effect” of smoking on birth weight

  44. Instrumental Variables and Local Average Treatment Effects • Definition of a L.A.T.E. • The average treatment effect for individuals “who can be induced to change [treatment] status by a change in the instrument” • Imbens and Angrist (1994, p. 470) • The average causal effect of X on Y for “compliers,” as opposed to “always takers” or “never takers” • Not a particularly well-defined (sub)population • L.A.T.E. is instrument-dependent, in contrast to the population A.T.E.

  45. L.A.T.E. in the Previous Two Examples • In the D.V. study... • For men who were arrested as per the experimental protocol, arrest resulted in a mean 14-point decline in the probability of recidivism compared to non-arrest interventions • In the maternal smoking study... • For women who reduced their smoking frequency because they were assigned to the intervention, each one-cigarette reduction resulted in a 14-gram increase in birth weight (from mean 11 cigarettes)

More Related