Instrumental Variables Estimation (with Examples from Criminology)

Robert Apel, Ph.D.
School of Criminal Justice, University at Albany
Center for Social and Demographic Analysis, University at Albany
May 5 & 7, 2009

Vital Statistics • Ph.D., Criminology and Criminal Justice, 2004 • University of Maryland • Coursework in Department of Economics • Dissertation used instrumental variables • State child labor laws as instrumental variables for the causal effect of youth employment on antisocial behavior

Topics That Will Be Covered in this Workshop • Why use IV? • Discussion of endogeneity bias • Statistical motivation for IV • What is an IV? • Identification issues • Statistical properties of IV estimators • How is an IV model estimated? • Software and data examples • Diagnostics: IV relevance, IV exogeneity, Hausman

Review of the Linear Model • Population model: Y = α + βX + ε • Assume that the true slope is positive, so β > 0 • Sample model: Y = a + bX + e • Least squares (LS) estimator of β: $b_{LS} = (X'X)^{-1}X'Y = \mathrm{Cov}(X,Y)/\mathrm{Var}(X)$ • Under what conditions can we speak of bLS as a causal estimate of the effect of X on Y?

Review of the Linear Model • Key assumption of the linear model: E(e | X) = 0, which implies E(X′e) = Cov(X,e) = 0 • Exogeneity assumption = X is uncorrelated with the unobserved determinants of Y • Important statistical property of the LS estimator: $E(b_{LS}) = \beta + \mathrm{Cov}(X,e)/\mathrm{Var}(X)$ and $\mathrm{plim}(b_{LS}) = \beta + \mathrm{Cov}(X,e)/\mathrm{Var}(X)$ • Under exogeneity the second terms equal 0, so bLS is unbiased and consistent

Endogeneity and the Evaluation Problem • When is the exogeneity assumption violated? • Measurement error → Attenuation bias • Instantaneous causation → Simultaneity bias • Omitted variables → Selection bias • Selection bias is the problem in observational research that undermines causal inference • Measurement error and instantaneous causation can be posed as problems of omitted variables

When Is the Exogeneity Assumption Violated? (1) Measurement error in X (u) that is correlated with M.E. in Y (v) or with the model error (e) [Path diagram: X → Y with model error e, measurement error u in X, and measurement error v in Y] • Classical M.E. leads to attenuation, 0 < E(bLS) < β, but non-random M.E. (or correlation between M.E. and X, Y, v, and/or e) introduces unknown biases • And, if there are multiple X's, the bias contaminates the whole model, not just the coefficient on the X measured with error (a.k.a. "smearing")
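A small simulation makes the attenuation result concrete. It is a sketch under assumed values (β = 2, Var(X*) = 1, Var(u) = 1, normal errors; none of these come from the workshop). With classical M.E. in X only, the LS slope converges to β × Var(X*) / [Var(X*) + Var(u)], here 2 × 0.5 = 1.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 500_000
beta = 2.0                                   # assumed true slope
x_true = rng.normal(size=n)                  # true X*, Var(X*) = 1
y = beta * x_true + rng.normal(size=n)       # Y depends on the TRUE X*
x_obs = x_true + rng.normal(scale=1.0, size=n)  # classical M.E. u, Var(u) = 1

# LS slope using the error-ridden X: Cov(X, Y) / Var(X)
b_ls = np.cov(x_obs, y)[0, 1] / np.var(x_obs, ddof=1)
reliability = 1.0 / (1.0 + 1.0)              # Var(X*) / [Var(X*) + Var(u)]
print(round(b_ls, 3), beta * reliability)    # b_ls lands near 1.0, well below beta
```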

When Is the Exogeneity Assumption Violated? (2) Instantaneous causation of Y on X [Path diagram: X ⇄ Y] • Direction of the bias depends on what the sign is for the feedback effect, Y → X • If positive, E(bLS) > β, so overestimate true effect • If negative, E(bLS) < β, so underestimate true effect and in severe cases can even flip the sign so that E(bLS) < 0 even though β > 0 • This non-recursivity complicates the relationship between price and quantity in economics
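To see how the sign of the feedback Y → X drives the bias, the sketch below solves a hypothetical two-equation system, Y = βX + e and X = γY + u, for its equilibrium values and runs LS of Y on X. The value β = 0.5, the independent unit-variance errors, and the three γ's are illustrative assumptions. Under them the LS slope converges to (γ + β) / (γ² + 1): about 0.8 for γ = 0.5, 0.5 for γ = 0, and –0.25 for γ = –1, the last one a sign flip.

```python
import numpy as np

def simulate_feedback(beta, gamma, n=500_000, seed=0):
    """Y = beta*X + e and X = gamma*Y + u, solved for the equilibrium (X, Y).
    Requires beta*gamma != 1. Returns the LS slope of Y on X."""
    rng = np.random.default_rng(seed)
    e, u = rng.normal(size=n), rng.normal(size=n)
    x = (gamma * e + u) / (1 - beta * gamma)   # reduced-form solution for X
    y = beta * x + e                           # structural equation for Y
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

for gamma in (0.5, 0.0, -1.0):   # positive, no, and strongly negative feedback
    print(gamma, round(simulate_feedback(beta=0.5, gamma=gamma), 2))
# slopes come out close to 0.8, 0.5 and -0.25 (true beta is 0.5 throughout)
```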

When Is the Exogeneity Assumption Violated? (3) Omitted variable (W) that is correlated with both X and Y [Path diagram: W → X, W → Y, X → Y] • Classic problem of omitted variables bias • Coefficient on X will absorb the indirect path through W, whose sign depends on Cov(X,W) and Cov(W,Y) • Things are more complicated in applied settings because there are bound to be many W's, not to mention that the "smearing" problem applies in this context also

Example #1: Police Hiring • Measurement error • Mobilization of sworn officers (M.E. in X) as well as differential victim reporting or crime recording (M.E. in Y) may be correlated with police size • Instantaneous causation • More police might be hired during a crime wave • Omitted variables • Large departments may differ in fundamental ways difficult to measure (e.g., urban, heterogeneous)

Example #2: Sanction Perceptions • Measurement error • Measures of perceived sanction risk are probably “noisy” (M.E. in X), resulting in attenuation at best • Instantaneous causation • Perceptions are sensitive to the success/failure of criminal behavior, so feedback is negative • Omitted variables • Perceived risk probably correlated with unobserved determinants of crime (e.g., intelligence)

Example #3: Delinquent Peers • Measurement error • Highly delinquent youth probably overestimate the delinquency of their peers (M.E. in X), and likely underestimate their own delinquency (M.E. in Y) • Instantaneous causation • If there is influence/imitation, then it is bidirectional • Omitted variables • High-risk youth probably select themselves into delinquent peer groups (“birds of a feather”)

Regression Estimation Ignoring Omitted Variables • Suppose we estimate treatment effect model: Y = α + βX + ε • Let's assume without loss of generality that X is a binary "treatment" (= 1 if treated; = 0 if untreated) • Least squares estimator: bLS = Cov(X,Y) / Var(X) = E(Y | X = 1) – E(Y | X = 0) • Simply the difference in means between "treated" units (X = 1) and "untreated" units (X = 0)
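A quick numerical check of this equality on simulated data (the effect of –0.3 and the 40% treatment share are arbitrary assumptions): for a binary X, the LS slope Cov(X,Y)/Var(X) and the raw difference in group means coincide up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x = rng.binomial(1, 0.4, size=n).astype(float)   # binary "treatment"
y = 0.5 - 0.3 * x + rng.normal(size=n)           # assumed effect of -0.3

b_ls = np.cov(x, y)[0, 1] / np.var(x, ddof=1)    # Cov(X,Y) / Var(X), same ddof top and bottom
diff_means = y[x == 1].mean() - y[x == 0].mean() # E(Y|X=1) - E(Y|X=0)
print(b_ls, diff_means)                          # identical apart from rounding noise
```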

Regression Estimation Ignoring Omitted Variables • But suppose the population treatment effect model is instead: Y = α + βX + (δW + ω) • Now the residual conveys information about W • Consider a plausible example • Y = crime, X = marriage, W = "marriageability" • "Marriageability" can be broadly construed to encompass earnings potential, desire for children, willingness to compromise, faithfulness, verbal communication skills,... • Including "signals" that individuals emit about these qualities

Regression Estimation Ignoring Omitted Variables • What does LS estimate when W is omitted? bLS = β + δ × [Cov(X,W) / Var(X)] = β + δ × [E(W | X = 1) – E(W | X = 0)] • β = true impact of marriage on crime (–); δ = impact of marriageability on crime (–); E(W | X = 1) – E(W | X = 0) = difference in marriageability between married and unmarried (+) • So the marriage effect on crime will be overestimated in magnitude (more negative than β) • IMPORTANT: Even if β = 0, bLS < 0

Regression Estimation Ignoring Omitted Variables • So... bLS = β + δ × [E(W | X = 1) – E(W | X = 0)] • Estimate of β is unbiased if and only if: 1. Marriageability is uncorrelated with crime, δ = 0, or... 2. Marriageability is "balanced" (i.e., equivalent) between married and unmarried subjects, E(W | X = 1) = E(W | X = 0)
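The decomposition can be checked on simulated data. The sketch assumes, purely for illustration, that marriage has no effect on crime (β = 0), that marriageability lowers crime (δ = –0.5), and that people with more W are more likely to marry; the helper `ols` is a hypothetical name. In the sample, the short-regression coefficient equals the long-regression coefficient on X plus the long-regression coefficient on W times the married/unmarried gap in mean W, so LS reports a negative "marriage effect" that is pure selection.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
w = rng.normal(size=n)                                # "marriageability" (unobserved in practice)
x = (w + rng.normal(size=n) > 0).astype(float)        # marriage more likely when W is high
beta, delta = 0.0, -0.5                               # assumed: NO true marriage effect
y = 1.0 + beta * x + delta * w + rng.normal(size=n)   # Y = crime

def ols(cols, y):
    """LS coefficients [intercept, slopes...] of y on the given columns."""
    X = np.column_stack([np.ones(len(y))] + cols)
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_short = ols([x], y)[1]                  # W omitted
_, b_x_long, d_long = ols([x, w], y)      # W included
gap_w = w[x == 1].mean() - w[x == 0].mean()
print(b_short, b_x_long + d_long * gap_w) # equal: the sample omitted-variables identity
```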

Omitted Variables in Criminological Research • What variables of interest to criminologists are surely endogenous? • Micro = Employment, education, marriage, military service, fertility, conviction, family structure,.... • Macro = Poverty, unemployment rate, collective efficacy, immigrant concentration,.... • Basically, EVERYTHING! • (I’m sorry to be the one to break it to you)

Traditional Strategies to Deal with Omitted Variables • Randomization (physical control) • Achieves balance (in expectation) on any and all potential W’s • Control variables are technically unnecessary • Covariate adjustment (statistical control) • Control for potential W’s in a regression model • But...we have no idea how many W’s there are, so model misspecification is still a real problem here

Quasi-Experimental Strategies to Deal with Omitted Variables • Difference in differences (fixed-effects model) • Requires panel data • Propensity score matching • Requires a lot of measured background variables • Similar to covariate adjustment, but only the treated and untreated cases which are “on support” are utilized • Instrumental variables estimation • Requires an exclusion restriction

Instrumental Variables Estimation Is a Viable Approach [Path diagram: Z → X → Y, with W affecting X and Y, and error e in Y] • An "instrumental variable" for X is one solution to the problem of omitted variables bias • Requirements for Z to be a valid instrument for X • Relevant = Correlated with X • Exogenous = Not correlated with Y except through its correlation with X

Important Point about Instrumental Variables Models • I often hear...“A good instrument should not be correlated with the dependent variable” • WRONG!!! • Z has to be correlated with Y, otherwise it is useless as an instrument • It can only be correlated with Y through X • A good instrument must not be correlated with the unobserved determinants of Y

Important Point about Instrumental Variables Models • Not all of the available variation in X is used • Only that portion of X which is "explained" by Z is used to explain Y [Venn diagram of overlapping X, Y, and Z: X = endogenous variable; Y = response variable; Z = instrumental variable]

Important Point about Instrumental Variables Models [Two Venn diagrams of X, Y, and Z] • Best-case scenario: A lot of X is explained by Z, and most of the overlap between X and Y is accounted for • Realistic scenario: Very little of X is explained by Z, or what is explained does not overlap much with Y

Important Point about Instrumental Variables Models • The IV estimator is BIASED • In other words, E(bIV) ≠ β (finite-sample bias) • The appeal of IV derives from its consistency • "Consistency" means that b converges in probability to β as N → ∞, written plim(b) = β • So… IV studies often have very large samples • But with endogeneity, E(bLS) ≠ β and plim(bLS) ≠ β anyway • Asymptotic behavior of IV: plim(bIV) = β + Cov(Z,e) / Cov(Z,X) • If Z is truly exogenous, then Cov(Z,e) = 0, so plim(bIV) = β
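A sketch of the consistency argument under an assumed data-generating process (one confounder W, one instrument Z, β = 1; all coefficients are illustrative). As N grows, bLS settles near β + Cov(X,e)/Var(X) = 1 + 1/2.25 ≈ 1.44, while bIV = Cov(Z,Y)/Cov(Z,X) settles near β. In small samples bIV is noticeably noisier.

```python
import numpy as np

def simulate(n, beta=1.0, seed=0):
    """Return (b_LS, b_IV) from one simulated sample of size n."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)                  # omitted confounder
    z = rng.normal(size=n)                  # instrument: moves X, unrelated to W
    x = 0.5 * z + w + rng.normal(size=n)
    y = beta * x + w + rng.normal(size=n)   # W sits in the error term of Y on X
    b_ls = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    b_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
    return b_ls, b_iv

for n in (100, 10_000, 1_000_000):
    print(n, simulate(n))
```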

Instrumental Variables Terminology [Path diagrams: Z → X (α1, error ω) → Y (β1, error ε); reduced form Z → Y (δ1, error ξ)] • Three different models to be familiar with • First stage: X = α0 + α1Z + ω • Structural model: Y = β0 + β1X + ε • Reduced form: Y = δ0 + δ1Z + ξ • An interesting equality: δ1 = α1 × β1, so… β1 = δ1 / α1
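Because δ1 = α1 × β1, β1 can be recovered by dividing the reduced-form slope by the first-stage slope (sometimes called indirect least squares). The sketch uses simulated data with assumed values α1 = 0.8 and β1 = –0.5; the helper `slope` is a hypothetical name. With one instrument and no controls, the ratio is numerically identical to Cov(Z,Y)/Cov(Z,X).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
z = rng.normal(size=n)
w = rng.normal(size=n)                      # unobserved confounder
x = 0.8 * z + w + rng.normal(size=n)        # first stage, alpha1 = 0.8
y = -0.5 * x + w + rng.normal(size=n)       # structural model, beta1 = -0.5

def slope(a, b):
    """LS slope of b on a (with intercept)."""
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)

a1 = slope(z, x)     # first-stage estimate
d1 = slope(z, y)     # reduced-form estimate
b1 = d1 / a1         # structural estimate: beta1 = delta1 / alpha1
print(a1, d1, b1, a1 * b1)   # a1 * b1 reproduces d1
```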

Different Types of Instrumental Variables Estimators • Wald estimator for binary instrument: bWald = [E(Y | Z = 1) – E(Y | Z = 0)] / [E(X | Z = 1) – E(X | Z = 0)] • Difference in response ÷ Difference in treatment • Instrumental variables (IV) estimator: $b_{IV} = (Z'X)^{-1}Z'Y = \mathrm{Cov}(Z,Y)/\mathrm{Cov}(Z,X)$ • Shows that bIV can be recovered from two samples, since the numerator needs only Z and Y and the denominator only Z and X • Two-stage least squares (2SLS) estimator: $b_{2SLS} = (\tilde{X}'\tilde{X})^{-1}\tilde{X}'Y = \mathrm{Cov}(\tilde{X},Y)/\mathrm{Var}(\tilde{X})$ • X̃ represents the "fitted" value from the first-stage model

Different Types of Instrumental Variables Estimators • Single binary instrument and no control variables... bWald = bIV = b2SLS • Single instrument (binary or continuous) with or without control variables... bIV = b2SLS • Multiple instruments (binary or continuous) with or without control variables... b2SLS
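With one binary instrument and no controls, the three formulas can be computed side by side on simulated data (all parameter values below are arbitrary assumptions). They agree to floating-point precision.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3000
z = rng.binomial(1, 0.5, size=n).astype(float)   # binary instrument
w = rng.normal(size=n)                           # unobserved confounder
x = 0.6 * z + 0.5 * w + rng.normal(size=n)       # endogenous regressor
y = 2.0 * x - w + rng.normal(size=n)

# Wald: difference in response / difference in treatment
b_wald = (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())

# IV: Cov(Z,Y) / Cov(Z,X)
b_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

# 2SLS: regress X on Z, then Y on the fitted X
Z = np.column_stack([np.ones(n), z])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
b_2sls = np.cov(x_hat, y)[0, 1] / np.var(x_hat, ddof=1)

print(b_wald, b_iv, b_2sls)   # all three agree
```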

More on the Method of Two-Stage Least Squares (2SLS) • Step 1: X = a0 + a1Z1 + a2Z2 + … + akZk + u • Obtain fitted values (X̃) from the first-stage model • Step 2: Y = b0 + b1X̃ + e • Substitute the fitted X̃ in place of the original X • Note: If done manually in two stages, the standard errors are based on the wrong residual, e = Y – b0 – b1X̃, when they should be based on e = Y – b0 – b1X • Best to just let the software do it for you
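The residual problem in the manual approach can be seen directly. The sketch below, on simulated data with two instruments and assumed coefficients, runs both stages by hand with NumPy and computes the conventional (homoskedastic) standard error of b1 twice: once with residuals built from X̃ (what a naive second-stage regression reports) and once with residuals built from X (the 2SLS formula). The helper `se_slope` is a hypothetical name. The point estimates are identical; only the standard errors differ, and the direction of the difference depends on the data.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000
z1, z2 = rng.normal(size=n), rng.normal(size=n)   # two excluded instruments
w = rng.normal(size=n)                            # unobserved confounder
x = 0.5 * z1 + 0.3 * z2 + w + rng.normal(size=n)
y = 1.0 + 0.7 * x + w + rng.normal(size=n)

# Step 1: first stage, fitted values X~
Z = np.column_stack([np.ones(n), z1, z2])
x_tilde = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]

# Step 2: regress Y on X~
Xt = np.column_stack([np.ones(n), x_tilde])
b0, b1 = np.linalg.lstsq(Xt, y, rcond=None)[0]
XtX_inv = np.linalg.inv(Xt.T @ Xt)

def se_slope(resid):
    """Homoskedastic SE of b1 from a given residual vector (k = 2 parameters)."""
    sigma2 = resid @ resid / (n - 2)
    return np.sqrt(sigma2 * XtX_inv[1, 1])

se_naive = se_slope(y - b0 - b1 * x_tilde)   # wrong: residual built from X~
se_2sls = se_slope(y - b0 - b1 * x)          # right: residual built from X
print(b1, se_naive, se_2sls)
```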

Including Control Variables in an IV/2SLS Model • Control variables (W's) should be entered into the model at both stages • First stage: X = a0 + a1Z + a2W + u • Second stage: Y = b0 + b1X̃ + b2W + e • Control variables are considered "instruments"; they are just not "excluded instruments" • They serve as their own instrument
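A sketch of 2SLS with an observed control, written in matrix form as b = (X′PZX)⁻¹X′PZY, where PZ projects onto the columns of the instrument matrix. The control W appears in both X and Z, so the first stage reproduces it exactly and it serves as its own instrument. The function name `tsls` and all simulated coefficients are illustrative assumptions, not any package's API; real analyses should use a dedicated IV routine that also reports correct standard errors.

```python
import numpy as np

def tsls(y, X, Z):
    """2SLS point estimates b = (X' Pz X)^{-1} X' Pz y.
    X = [1, endogenous, controls]; Z = [1, excluded instruments, controls]."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first stage for every column of X
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]    # second stage on the fitted columns

rng = np.random.default_rng(6)
n = 4000
wc = rng.normal(size=n)                     # observed control W
u = rng.normal(size=n)                      # unobserved confounder
z = 0.5 * wc + rng.normal(size=n)           # instrument may correlate with W
x = 0.7 * z + 0.4 * wc + u + rng.normal(size=n)
y = 1.5 * x - 0.8 * wc + u + rng.normal(size=n)

ones = np.ones(n)
X = np.column_stack([ones, x, wc])
Z = np.column_stack([ones, z, wc])          # W appears in Z: it instruments itself
b = tsls(y, X, Z)
print(b)   # close to the assumed [0, 1.5, -0.8]
```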

Functional Form Considerations with IV/2SLS • Binary endogenous regressor (X) • Consistency of second-stage estimates does not hinge on getting the first-stage functional form correct • Binary response variable (Y) • IV probit (or logit) is feasible but is technically unnecessary • In both cases, the linear model is tractable, easily interpreted, and consistent • Although a variance adjustment is well advised

Functional Form Considerations with IV/2SLS • Quadratic second stage with a continuous endogenous regressor • Entering first-stage fitted values and their square into the second-stage model leads to inconsistency • The square of a linear projection is not equivalent to a linear projection on a quadratic • Squares and cross-products of IV's should be treated as additional instruments • Kelejian (1971) • Linear and squared X's are treated as two different endogenous regressors

Technical Conditions Required for Model Identification • Order condition = At least as many IV's as endogenous X's • Just-identified model: # IV's = # X's • Overidentified model: # IV's > # X's • Rank condition = At least one excluded IV must have a nonzero coefficient in the first-stage model (in practice, a significant one) • Rank = number of linearly independent columns in a matrix • E(X | Z,W) cannot be perfectly correlated with E(X | W)

Statistical Inference with IV • Variance estimation: $\sigma^2_{b_{LS}} = \sigma^2_{\varepsilon} / SST_X$ and $\sigma^2_{b_{IV}} = \sigma^2_{\varepsilon} / (SST_X \cdot R^2_{X,Z})$, where ε = Y – β0 – β1X • NOTICE: Because $R^2_{X,Z} < 1$, $s_{b_{IV}} > s_{b_{LS}}$ • IV standard errors tend to be large, especially when $R^2_{X,Z}$ is very small, which can lead to type II errors
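Holding σ²ε fixed, the ratio of the two variance formulas reduces to 1/R²X,Z (SSTX cancels), so the IV standard error is the LS one multiplied by 1/√R²X,Z. The helper below is a hypothetical illustration of that arithmetic only; in practice the IV and LS residual variances also differ.

```python
import math

def iv_se_inflation(r2_xz):
    """Factor by which the IV standard error exceeds the LS one when the error
    variance is held fixed: sqrt[(s2/SST_X * R2)^-1 ... ] simplifies to 1/sqrt(R2)."""
    if not 0 < r2_xz <= 1:
        raise ValueError("R-squared of X on Z must lie in (0, 1]")
    return 1 / math.sqrt(r2_xz)

for r2 in (0.5, 0.1, 0.04, 0.01):
    print(r2, round(iv_se_inflation(r2), 2))
# prints: 0.5 1.41 / 0.1 3.16 / 0.04 5.0 / 0.01 10.0
```

An instrument that explains only 4% of the variation in X thus yields standard errors about five times as large as LS.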

Instrumental Variables and Randomized Experiments • Imperfect compliance in randomized trials • Some individuals assigned to treatment group will not receive Tx, and some assigned to control group will receive Tx • Assignment error; subject refusal; investigator discretion • Some individuals who receive Tx will not change their behavior, and some who do not receive Tx will change their behavior • A problem in randomized job training studies and other social experiments (e.g., housing vouchers)

Instrumental Variables and Randomized Experiments • Two different measures of treatment (X) • Treatment assigned = Exogenous • Intention-to-treat (ITT) analysis • Reduced-form model: Y = δ0 + δ1Z + ξ • Often leads to underestimation of treatment effect • Treatment delivered = Endogenous • Individuals who do not comply probably differ in ways that can undermine the study • Self-selection → bias and inconsistency
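A stylized simulation of imperfect compliance, in which every number is an assumption: unobserved "proneness" v makes some subjects take treatment regardless of assignment and others refuse it, and the treatment effect is a constant –0.5. The ITT contrast is diluted toward zero by the roughly 77% compliance rate, the treatment-delivered contrast is contaminated by self-selection (here it even turns positive), and dividing the ITT by the first stage (the Wald estimator) recovers –0.5.

```python
import numpy as np

def contrast(a, g):
    """Mean of a where g == 1 minus mean of a where g == 0."""
    return a[g == 1].mean() - a[g == 0].mean()

rng = np.random.default_rng(7)
n = 200_000
z = rng.binomial(1, 0.5, size=n)                 # random assignment (Z)
v = rng.normal(size=n)                           # unobserved proneness
always = v > 1.0                                 # hypothetical: take Tx whatever Z is
never = v < -1.5                                 # hypothetical: refuse Tx whatever Z is
x = np.where(always, 1, np.where(never, 0, z))   # everyone else follows assignment
beta = -0.5                                      # assumed constant treatment effect
y = beta * x + v + rng.normal(size=n)            # v also raises Y directly

itt = contrast(y, z)               # reduced form / intention to treat
as_treated = contrast(y, x)        # treatment delivered vs. not delivered
wald = itt / contrast(x, z)        # ITT divided by first stage
print(round(itt, 3), round(as_treated, 3), round(wald, 3))
```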

Angrist (2006), J.E.C. • Minneapolis D.V. experiment • Sherman and Berk (1984) • Cases of male-on-female misdemeanor assault in two high-density precincts, in which both parties were present at the scene • Random assignment to arrest, mediation, or separation • But...treatment assigned was not treatment delivered • Fidelity vis-à-vis arrest, but many subjects (~25%) assigned to mediation/separation were arrested • "Upgrading" was more likely when the suspect was rude, the suspect assaulted an officer, weapons were involved, the victim persistently demanded arrest, or the incident violated a restraining order

Angrist (2006), J.E.C. [Path diagram: Treatment Assigned (Arrest) → Treatment Delivered (Arrest) (+) → Recidivism (–); Violence Proneness → Treatment Delivered (+) and Violence Proneness → Recidivism (+)]

Angrist (2006), J.E.C. • Estimates of the effect of arrest (vs. mediate or separate) on D.V. recidivism (Tables 2, 3) • OLS: b = –.070 (s.e. = .038) • ITT: b = –.108 (s.e. = .041) • 2SLS: b = –.140 (s.e. = .053) • Deterrent effect of arrest is twice as large in 2SLS as opposed to OLS • In this context, 2SLS estimates what is known as a "local average treatment effect" (I'll come back to this)
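Because 2SLS is the ITT (reduced-form) effect divided by the first-stage effect, the two reported estimates imply a first stage of roughly –.108 / –.140 ≈ 0.77, i.e., assignment to arrest raised the probability of actual arrest by about 77 percentage points, roughly in line with about a quarter of the non-arrest group being upgraded. This is a back-of-the-envelope calculation that assumes the ITT and 2SLS estimates come from comparable specifications; it is not a figure taken from the paper.

```python
ols, itt, b_2sls = -0.070, -0.108, -0.140   # estimates as quoted on the slide

# beta1 = delta1 / alpha1  rearranges to  alpha1 = delta1 / beta1
implied_first_stage = itt / b_2sls
print(round(implied_first_stage, 2))   # 0.77
print(round(b_2sls / ols, 1))          # 2.0 ("twice as large")
```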

Sexton and Hebel (1984), J.A.M.A. • Maternal smoking and birth weight • Sexton and Hebel (1984) • Sample of pregnant women who were confirmed smokers, recruited from prenatal care registrants • At least 10 cigarettes per day and not past 18th week • Random assignment of staff assistance in a smoking cessation program • Personal visits; telephone and mail contacts • But...some smokers in treatment group did not quit and some smokers in control group did quit

Sexton and Hebel (1984), J.A.M.A. [Path diagram: Smoking Intervention → Smoking Frequency → Birth Weight, with additional paths involving Smoking Propensity and Difficult Pregnancy]

Sexton and Hebel (1984), J.A.M.A. • (1) First-stage model: Mean cigarettes smoked: Treatment = 6.4, Control = 12.8; First-stage effect: bFS = –6.4 • (2) Reduced-form model: Mean birth weight: Treatment = 3,278g, Control = 3,186g; Reduced-form effect: bRF = 92g • (3) Structural model: Effect of smoking frequency on mean birth weight: bIV = 92 / –6.4 = –14.4g • Each cigarette reduces birth weight by 14.4 grams

Sexton and Hebel (1984), J.A.M.A. • As an interesting aside, it’s also possible to estimate the effect of continuing smoking (vs. quitting) from the data • First stage: bFS = –0.23 (57% vs. 80% smokers) • Reduced form: bRF = 92g • Structural: bIV = 92 / –0.23 = –400g • Women who kept smoking by the 8th month of pregnancy bore children who were 400 grams lighter, on average
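Both calculations are simple Wald ratios of the reduced-form contrast to the first-stage contrast, reproduced here from the figures on the slides (the `wald` helper is just a hypothetical name for that division).

```python
def wald(reduced_form, first_stage):
    """Wald / indirect-least-squares estimate: reduced-form effect over first-stage effect."""
    return reduced_form / first_stage

# Figures as given on the slides (birth weight in grams)
per_cigarette = wald(3278 - 3186, 6.4 - 12.8)   # 92 / -6.4 cigarettes per day
continuing = wald(92, 0.57 - 0.80)              # 92 / -0.23 share still smoking
print(round(per_cigarette, 1), round(continuing))   # -14.4 -400
```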

Permutt and Hebel (1989), Biometrics • Estimates of the effect of smoking frequency (in 8th month) on birth weight • OLS: b = 2g (s.e. not reported) • 2SLS: b = –14g (s.e. = 7g) • Here as well, 2SLS yields the “local average treatment effect” of smoking on birth weight

Instrumental Variables and Local Average Treatment Effects • Definition of a L.A.T.E. • The average treatment effect for individuals “who can be induced to change [treatment] status by a change in the instrument” • Imbens and Angrist (1994, p. 470) • The average causal effect of X on Y for “compliers,” as opposed to “always takers” or “never takers” • Not a particularly well-defined (sub)population • L.A.T.E. is instrument-dependent, in contrast to the population A.T.E.
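A stylized simulation of the L.A.T.E. result, with every number an assumption: 60% compliers whose effect is –1, 20% always takers with effect 0, and 20% never takers whose (never realized) effect would be +2, and no defiers. The Wald ratio recovers the complier effect (about –1), not the population A.T.E. (about –0.2).

```python
import numpy as np

rng = np.random.default_rng(8)
n = 400_000
z = rng.binomial(1, 0.5, size=n)     # randomized instrument
# Hypothetical compliance types (monotonicity assumed: no defiers)
kind = rng.choice(["complier", "always", "never"], size=n, p=[0.6, 0.2, 0.2])
effect = np.where(kind == "complier", -1.0, np.where(kind == "always", 0.0, 2.0))
x = np.where(kind == "always", 1, np.where(kind == "never", 0, z))
y0 = rng.normal(size=n) + (kind == "always")   # untreated outcome; always takers differ
y = y0 + effect * x                             # observed outcome

wald = (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())
ate = effect.mean()                             # population average treatment effect
print(round(wald, 2), round(ate, 2))
```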

L.A.T.E. in the Previous Two Examples • In the D.V. study... • For men whose arrest status was determined by the random assignment (compliers), arrest resulted in a mean 14-percentage-point decline in the probability of recidivism compared to non-arrest interventions • In the maternal smoking study... • For women who reduced their smoking frequency because they were assigned to the intervention, each one-cigarette reduction resulted in a 14-gram increase in birth weight (from a mean of 11 cigarettes)
