
Instrumental Variables Estimation (with Examples from Criminology)


Robert Apel, Ph.D.

School of Criminal Justice

University at Albany

Center for Social and Demographic Analysis

University at Albany

May 5 & 7, 2009

- Ph.D., Criminology and Criminal Justice, 2004
- University of Maryland

- Coursework in Department of Economics
- Dissertation used instrumental variables
- State child labor laws as instrumental variables for the causal effect of youth employment on antisocial behavior

- Why use IV?
- Discussion of endogeneity bias
- Statistical motivation for IV

- What is an IV?
- Identification issues
- Statistical properties of IV estimators

- How is an IV model estimated?
- Software and data examples
- Diagnostics: IV relevance, IV exogeneity, Hausman

- Population model: Y = α + βX + ε
- Assume that the true slope is positive, so β > 0

- Sample model: Y = a + bX + e
- Least squares (LS) estimator of β:
bLS = (X′X)⁻¹X′Y = Cov(X,Y) / Var(X)
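A minimal sketch, on simulated data, of the two equivalent ways to write the LS slope. The intercept, slope, sample size, and seed are arbitrary assumptions chosen for illustration; they are not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)   # assumed alpha = 1.0, beta = 0.5

# Matrix form: b = (X'X)^{-1} X'Y, with a column of ones for the intercept.
X = np.column_stack([np.ones(n), x])
b_matrix = np.linalg.solve(X.T @ X, X.T @ y)

# Scalar form for the slope: Cov(X, Y) / Var(X).
slope_cov = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

print(b_matrix[1], slope_cov)            # the two slope estimates coincide
```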

- Under what conditions can we speak of bLS as a causal estimate of the effect of X on Y?

- Key assumption of the linear model:
E(X′e) = Cov(X,e) = E(e | X) = 0

- Exogeneity assumption = X is uncorrelated with the unobserved determinants of Y

- Important statistical property of the LS estimator under exogeneity:
E(bLS) = β + Cov(X,e) / Var(X)

plim(bLS) = β + Cov(X,e) / Var(X)

Under exogeneity the second term in each expression is 0, so bLS is unbiased and consistent

- When is the exogeneity assumption violated?
- Measurement error → Attenuation bias
- Instantaneous causation → Simultaneity bias
- Omitted variables → Selection bias

- Selection bias is the problem in observational research that undermines causal inference
- Measurement error and instantaneous causation can be posed as problems of omitted variables

[Path diagram: X and Y with model error e, measurement error u in X, and measurement error v in Y]

(1) Measurement error in X (u) that is correlated with M.E. in Y (v) or with the model error (e)

- Classical M.E. leads to attenuation, 0 < E(bLS) < β, but non-random M.E. (or correlation between M.E. and X, Y, V, and/or e) introduces unknown biases

And, if there are multiple X’s, bias contaminates the whole model, not just the coefficient on the X measured with error (a.k.a. “smearing”)
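Attenuation under classical measurement error can be seen in a short simulation. This is a sketch under assumed values (true slope 1.0, error variance equal to the variance of the true X); with those choices the well-known reliability ratio Var(X) / [Var(X) + Var(u)] is 0.5, so the LS slope on the noisy X should land near half the true slope.

```python
import numpy as np

def ls_slope(x, y):
    """LS slope written as Cov(X, Y) / Var(X)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

rng = np.random.default_rng(1)
n = 100_000
beta = 1.0                                        # assumed true slope
x_true = rng.normal(0.0, 1.0, n)                  # X without measurement error
y = 2.0 + beta * x_true + rng.normal(0.0, 1.0, n)
x_obs = x_true + rng.normal(0.0, 1.0, n)          # classical M.E.: unrelated to X and e

b_true = ls_slope(x_true, y)                      # close to beta
b_obs = ls_slope(x_obs, y)                        # attenuated toward zero
reliability = 1.0 / (1.0 + 1.0)                   # Var(X) / (Var(X) + Var(u))

print(b_true, b_obs, beta * reliability)
```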

[Path diagram: X and Y, with causation running in both directions]

(2) Instantaneous causation of Y on X

- Direction of the bias depends on what the sign is for the feedback effect, Y → X
- If positive, E(bLS) > β, so overestimate true effect
- If negative, E(bLS) < β, so underestimate true effect and in severe cases can even flip the sign so that E(bLS) < 0 even though β > 0

This non-recursivity complicates the relationship between price and quantity in economics

[Path diagram: X, Y, and omitted variable W]

(3) Omitted variable (W) that is correlated with both X and Y

- Classic problem of omitted variables bias
- Coefficient on X will absorb the indirect path through W, whose sign depends on Cov(X,W) and Cov(W,Y)

Things are more complicated in applied settings because there are bound to be many W’s, not to mention that the “smearing” problem applies in this context also

- Measurement error
- Mobilization of sworn officers (M.E. in X) as well as differential victim reporting or crime recording (M.E. in Y) may be correlated with police size

- Instantaneous causation
- More police might be hired during a crime wave

- Omitted variables
- Large departments may differ in fundamental ways difficult to measure (e.g., urban, heterogeneous)

- Measurement error
- Measures of perceived sanction risk are probably “noisy” (M.E. in X), resulting in attenuation at best

- Instantaneous causation
- Perceptions are sensitive to the success/failure of criminal behavior, so feedback is negative

- Omitted variables
- Perceived risk probably correlated with unobserved determinants of crime (e.g., intelligence)

- Measurement error
- Highly delinquent youth probably overestimate the delinquency of their peers (M.E. in X), and likely underestimate their own delinquency (M.E. in Y)

- Instantaneous causation
- If there is influence/imitation, then it is bidirectional

- Omitted variables
- High-risk youth probably select themselves into delinquent peer groups (“birds of a feather”)

- Suppose we estimate treatment effect model:
Y = α + βX + ε

- Let’s assume without loss of generality that X is a binary “treatment” (= 1 if treated; = 0 if untreated)

- Least squares estimator:
bLS = Cov(X,Y) / Var(X) = E(Y | X = 1) – E(Y | X = 0)

- Simply the difference in means between “treated” units (X = 1) and “untreated” units (X = 0)
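The equality between the LS slope and the difference in means holds exactly in any sample with a binary X. A quick check on simulated data (the coefficients and seed below are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.integers(0, 2, n)                       # binary treatment indicator
y = 3.0 - 0.8 * x + rng.normal(size=n)          # assumed alpha = 3.0, beta = -0.8

b_ls = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # Cov(X, Y) / Var(X)
diff_means = y[x == 1].mean() - y[x == 0].mean()

print(b_ls, diff_means)                         # identical up to rounding error
```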

- But suppose the population treatment effect model is instead:
Y = α + βX + (δW + ω)

- Now the residual conveys information about W

- Consider a plausible example
- Y = crime, X = marriage, W = “marriageability”
- “Marriageability” can be broadly construed to encompass earnings potential, desire for children, willingness to compromise, faithfulness, verbal communication skills,...
- Including “signals” that individuals emit about these qualities

- True impact of marriage on crime: (–)
- Impact of marriageability on crime: (–)
- Difference in marriageability between married and unmarried: (+)

- What does LS estimate when W is omitted?
bLS = Cov(X,Y) / Var(X)

= β + δ × [Cov(X,W) / Var(X)]

= β + δ × [E(W | X = 1) – E(W | X = 0)]

- Marriage effect on crime will be overestimated
- IMPORTANT: Even if β = 0, bLS < 0

- So...
bLS = β+ δ × [E(W | X = 1) – E(W | X = 0)]

- Estimate of β is unbiased if and only if
1. Marriageability is uncorrelated with crime

δ = 0

or...

2. Marriageability is “balanced” (i.e., equivalent) between married and unmarried subjects

E(W | X = 1) = E(W | X = 0)
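The bias formula can be checked numerically. The sketch below simulates the marriage example with an assumed true effect of zero (β = 0) and an assumed δ = –1; all numbers are hypothetical. The "short" regression that omits W produces a negative marriage coefficient, and the sample decomposition b(short) = b(long) + d(long) × [difference in mean W] holds exactly.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
beta, delta = 0.0, -1.0                          # assumed: marriage has no true effect
w = rng.normal(size=n)                           # "marriageability" (unobserved in practice)
x = (w + rng.normal(size=n) > 0).astype(float)   # marriage is likelier when W is high
y = 1.0 + beta * x + delta * w + rng.normal(size=n)

def ols(columns, y):
    """LS coefficients with an intercept; `columns` is a list of regressors."""
    X = np.column_stack([np.ones(len(y))] + columns)
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_short = ols([x], y)[1]                         # W omitted
_, b_long, d_long = ols([x, w], y)               # W included
gap_w = w[x == 1].mean() - w[x == 0].mean()      # imbalance in W by marital status

print(b_short, b_long + d_long * gap_w)          # same number
```

Setting `delta = 0.0`, or drawing `x` independently of `w`, removes the bias, which matches the two conditions listed above.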

- What variables of interest to criminologists are surely endogenous?
- Micro = Employment, education, marriage, military service, fertility, conviction, family structure, ...
- Macro = Poverty, unemployment rate, collective efficacy, immigrant concentration, ...

- Basically, EVERYTHING!
- (I’m sorry to be the one to break it to you)

- Randomization (physical control)
- Achieves balance (in expectation) on any and all potential W’s
- Control variables are technically unnecessary

- Covariate adjustment (statistical control)
- Control for potential W’s in a regression model
- But...we have no idea how many W’s there are, so model misspecification is still a real problem here

- Difference in differences (fixed-effects model)
- Requires panel data

- Propensity score matching
- Requires a lot of measured background variables
- Similar to covariate adjustment, but only the treated and untreated cases which are “on support” are utilized

- Instrumental variables estimation
- Requires an exclusion restriction

[Path diagram: instrument Z, X, Y, omitted variable W, and error e]

- An “instrumental variable” for X is one solution to the problem of omitted variables bias

- Requirements for Z to be a valid instrument for X
- Relevant = Correlated with X
- Exogenous = Correlated with Y only through its correlation with X

- I often hear...“A good instrument should not be correlated with the dependent variable”
- WRONG!!!

- Z has to be correlated with Y, otherwise it is useless as an instrument
- It can only be correlated with Y through X

- A good instrument must not be correlated with the unobserved determinants of Y

[Diagram: X, Y, and Z]

- Not all of the available variation in X is used
- Only that portion of X which is “explained” by Z is used to explain Y

X = Endogenous variable

Y = Response variable

Z = Instrumental variable

Best-case scenario: A lot of X is explained by Z, and most of the overlap between X and Y is accounted for

Realistic scenario: Very little of X is explained by Z, or what is explained does not overlap much with Y

- The IV estimator is BIASED
- In other words, E(bIV) ≠ β (finite-sample bias)
- The appeal of IV derives from its consistency
- “Consistency” is a way of saying that bIV converges in probability to β as N → ∞, i.e., plim(bIV) = β
- So…IV studies often have very large samples

- But with endogeneity, E(bLS) ≠ β and plim(bLS) ≠ β anyway

- Asymptotic behavior of IV
plim(bIV) = β + Cov(Z,e) / Cov(Z,X)

- If Z is truly exogenous, then Cov(Z,e) = 0
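The asymptotic expression also shows why a weak instrument is dangerous: a small Cov(Z,e) is divided by Cov(Z,X), so the weaker the first stage, the more a minor exogeneity violation is magnified. The sketch below uses simulated data with an assumed, deliberately small violation (Cov(Z,e) ≈ 0.05) and two hypothetical first-stage strengths. In-sample, bIV equals β plus the ratio of sample covariances exactly.

```python
import numpy as np

def cov(a, b):
    """Sample covariance between two vectors."""
    return np.cov(a, b)[0, 1]

rng = np.random.default_rng(4)
n = 200_000
beta = 1.0                                        # assumed true slope
z = rng.normal(size=n)
e = rng.normal(size=n) + 0.05 * z                 # slight violation of exogeneity

results = {}
for strength in (1.0, 0.1):                       # strong vs. weak first stage
    x = strength * z + 0.8 * e + rng.normal(size=n)   # X is endogenous (depends on e)
    y = beta * x + e
    b_iv = cov(z, y) / cov(z, x)
    bias_term = cov(z, e) / cov(z, x)             # the second term in the plim
    results[strength] = (b_iv, bias_term)

for strength, (b_iv, bias_term) in results.items():
    print(strength, b_iv, beta + bias_term)
```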

[Path diagrams: Z → X → Y with coefficients α1 and β1 and errors ω and ε; Z → Y with coefficient δ1 and error ξ]

- Three different models to be familiar with
- First stage: X = α0 + α1Z + ω
- Structural model: Y = β0 + β1X + ε
- Reduced form: Y = δ0 + δ1Z + ξ

- An interesting equality:
δ1 = α1×β1

so…

β1 = δ1 / α1
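The equality follows from substituting the first stage into the structural model. A short derivation using the three models above (no new symbols are introduced):

```latex
\begin{aligned}
Y &= \beta_0 + \beta_1 X + \varepsilon \\
  &= \beta_0 + \beta_1(\alpha_0 + \alpha_1 Z + \omega) + \varepsilon \\
  &= \underbrace{(\beta_0 + \beta_1\alpha_0)}_{\delta_0}
     + \underbrace{\alpha_1\beta_1}_{\delta_1}\, Z
     + \underbrace{(\beta_1\omega + \varepsilon)}_{\xi}
\end{aligned}
```

Matching terms with the reduced form gives δ1 = α1 × β1, and β1 = δ1 / α1 is defined only when α1 ≠ 0, which is the relevance requirement.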

- Wald estimator for binary instrument:
bWald = [E(Y | Z = 1) – E(Y | Z = 0)] / [E(X | Z = 1) – E(X | Z = 0)]

- Difference in response ÷ Difference in treatment

- Instrumental variables (IV) estimator:
bIV = (Z′X)⁻¹Z′Y = Cov(Z,Y) / Cov(Z,X)

- The ratio form shows that bIV can be recovered from two samples: one to estimate Cov(Z,Y) and another to estimate Cov(Z,X)

- Two-stage least squares (2SLS) estimator:
b2SLS = (X̃′X̃)⁻¹X̃′Y = Cov(X̃,Y) / Var(X̃)

- X̃ represents “fitted” value from first-stage model

- Single binary instrument and no control variables...
bWald = bIV = b2SLS

- Single instrument (binary or continuous) with or without control variables...
bIV = b2SLS

- Multiple instruments (binary or continuous) with or without control variables...
b2SLS
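The first equivalence can be verified directly. The sketch below simulates a single binary instrument with no control variables; the data-generating values (true slope 2.0, a confounder W that pushes LS downward) are assumptions made up for the illustration.

```python
import numpy as np

def cov(a, b):
    """Sample covariance between two vectors."""
    return np.cov(a, b)[0, 1]

rng = np.random.default_rng(5)
n = 20_000
z = rng.integers(0, 2, n).astype(float)             # binary instrument
w = rng.normal(size=n)                              # unobserved confounder
x = 0.5 * z + 0.7 * w + rng.normal(size=n)          # endogenous regressor
y = 1.0 + 2.0 * x - 1.5 * w + rng.normal(size=n)    # assumed true slope = 2.0

b_ls = cov(x, y) / cov(x, x)                        # biased by the omitted W

# Wald: difference in response / difference in treatment.
b_wald = (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())

# IV: Cov(Z, Y) / Cov(Z, X).
b_iv = cov(z, y) / cov(z, x)

# 2SLS: regress Y on the first-stage fitted values of X.
a1 = cov(z, x) / cov(z, z)
a0 = x.mean() - a1 * z.mean()
x_fit = a0 + a1 * z
b_2sls = cov(x_fit, y) / cov(x_fit, x_fit)

print(b_ls, b_wald, b_iv, b_2sls)
```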

- Step 1: X = a0 + a1Z1 + a2Z2 + ... + akZk + u
- Obtain fitted values (X̃) from the first-stage model

- Step 2: Y = b0 + b1X̃ + e
- Substitute the fitted X̃ in place of the original X
- Note: If done manually in two stages, the standard errors are based on the wrong residual
e = Y – b0 – b1X̃ when it should be e = Y – b0 – b1X

- Best to just let the software do it for you
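To see why the manual standard errors are off, the sketch below runs both steps by hand on simulated data and computes the slope's standard error twice: once with the residual built from the fitted X (what a naive second-stage regression reports) and once with the residual built from the original X. The data-generating values are assumptions for illustration, and the direction of the discrepancy depends on the data.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5_000
z = rng.normal(size=n)                           # instrument
w = rng.normal(size=n)                           # unobserved confounder
x = 0.6 * z + w + rng.normal(size=n)             # endogenous regressor
y = 1.0 + 0.5 * x + w + rng.normal(size=n)       # assumed true slope = 0.5

def ols(X, y):
    """LS coefficients for a design matrix that already contains the intercept."""
    return np.linalg.solve(X.T @ X, X.T @ y)

ones = np.ones(n)
Z = np.column_stack([ones, z])
x_fit = Z @ ols(Z, x)                            # Step 1: first-stage fitted values
X_fit = np.column_stack([ones, x_fit])
X_raw = np.column_stack([ones, x])
b = ols(X_fit, y)                                # Step 2: 2SLS coefficients

def slope_se(resid):
    """Standard error of the slope given a residual vector."""
    s2 = resid @ resid / (n - 2)
    return np.sqrt(s2 * np.linalg.inv(X_fit.T @ X_fit)[1, 1])

se_naive = slope_se(y - X_fit @ b)               # residual uses fitted X (wrong)
se_2sls = slope_se(y - X_raw @ b)                # residual uses original X (right)

print(b[1], se_naive, se_2sls)
```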

- Control variables (W’s) should be entered into the model at both stages
- First stage: X = a0 + a1Z + a2W + u
- Second stage: Y = b0 + b1X̃ + b2W + e

- Control variables are considered “instruments,” they are just not “excluded instruments”
- They serve as their own instrument

- Binary endogenous regressor (X)
- Consistency of second-stage estimates does not hinge on getting first-stage functional form correct

- Binary response variable (Y)
- IV probit (or logit) is feasible but is technically unnecessary

- In both cases, linear model is tractable, easily interpreted, and consistent
- Although variance adjustment is well advised

- Quadratic second stage with a continuous endogenous regressor
- Entering first-stage fitted values and their square into second-stage model leads to inconsistency
- The square of a linear projection is not equivalent to a linear projection on a quadratic

- Squares and cross-products of IV’s should be treated as additional instruments
- Kelejian (1971)

- Linear and squared X’s are treated as two different endogenous regressors

- Order condition = At least the same # of IV’s as endogenous X’s
- Just-identified model: # IV’s = # X’s
- Overidentified model: # IV’s > # X’s

- Rank condition = At least one IV must be significant in the first-stage model
- Number of linearly independent columns in a matrix
- E(X | Z,W) cannot be perfectly correlated with E(X | W)


- Variance estimation
σ²_bLS = σ²_ε / SST_X

σ²_bIV = σ²_ε / (SST_X × R²_X,Z)

where…

ε = Y – β0 – β1X

- NOTICE: Because R²_X,Z < 1, s_bIV > s_bLS
- IV standard errors tend to be large, especially when R²_X,Z is very small, which can lead to type II errors
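Dividing the two variance formulas gives a simple rule of thumb: holding the error variance and SST of X fixed, the IV standard error is 1/√R² times the LS standard error, where R² is from the first-stage regression of X on Z. A small sketch with hypothetical R² values (the function name is made up for this illustration):

```python
import math

def iv_se_inflation(r2_xz: float) -> float:
    """Ratio of IV to LS standard errors implied by the two variance formulas."""
    if not 0.0 < r2_xz <= 1.0:
        raise ValueError("first-stage R-squared must be in (0, 1]")
    return 1.0 / math.sqrt(r2_xz)

# Hypothetical first-stage R-squared values, from fairly strong to very weak.
inflation = {r2: iv_se_inflation(r2) for r2 in (0.25, 0.04, 0.01)}

for r2, ratio in inflation.items():
    print(f"R2 = {r2:.2f}: IV standard error is {ratio:.1f} times the LS standard error")
```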

- Imperfect compliance in randomized trials
- Some individuals assigned to treatment group will not receive Tx, and some assigned to control group will receive Tx
- Assignment error; subject refusal; investigator discretion

- Some individuals who receive Tx will not change their behavior, and some who do not receive Tx will change their behavior
- A problem in randomized job training studies and other social experiments (e.g., housing vouchers)


- Two different measures of treatment (X)
- Treatment assigned = Exogenous
- Intention-to-treat (ITT) analysis
- Reduced-form model: Y = δ0 + δ1Z + ξ

- Often leads to underestimation of treatment effect

- Treatment delivered = Endogenous
- Individuals who do not comply probably differ in ways that can undermine the study
- Self-selection bias and inconsistency

- Minneapolis D.V. experiment
- Sherman and Berk (1984)
- Cases of male-on-female misdemeanor assault in two high-density precincts, in which both parties present at scene

- Random assignment of arrest-mediation-separation
- But...treatment assigned was not treatment delivered
- Fidelity vis-à-vis arrest, but many subjects (~25%) assigned to mediation/separation were arrested
- “Upgrading” was more likely when suspect was rude, suspect assaulted officer, weapons were involved, victim persistently demanded arrest, and incident violated restraining order


[Path diagram: Treatment Assigned (Arrest), Treatment Delivered (Arrest), Recidivism, and Violence Proneness, with signed paths]

- Estimates of effect of arrest (vs. mediate or separate) on D.V. recidivism (Tables 2, 3)
- OLS: b = –.070 (s.e. = .038)
- ITT: b = –.108 (s.e. = .041)
- 2SLS: b = –.140 (s.e. = .053)

- Deterrent effect of arrest is twice as large in 2SLS as in OLS
- In this context, 2SLS is known as a “local average treatment effect” (I’ll come back to this)

- Maternal smoking and birth weight
- Sexton and Hebel (1984)
- Sample of pregnant women who were confirmed smokers, recruited from prenatal care registrants
- At least 10 cigarettes per day and not past 18th week

- Random assignment of staff assistance in a smoking cessation program
- Personal visits; telephone and mail contacts

- But...some smokers in treatment group did not quit and some smokers in control group did quit


[Path diagram: Smoking Intervention, Smoking Frequency, Birth Weight, Smoking Propensity, and Difficult Pregnancy, with signed paths]

(1) First-stage model

Mean cigarettes smoked:

Treatment = 6.4

Control = 12.8

First-stage effect: bFS = –6.4

(2) Reduced-form model

Mean birth weight:

Treatment = 3,278g

Control = 3,186g

Reduced-form effect: bRF = 92

(3) Structural model

Effect of smoking frequency on mean birth weight:

bIV = 92 / –6.4 = –14.4g

Each cigarette reduces birth weight by 14.4 grams

- As an interesting aside, it’s also possible to estimate the effect of continuing smoking (vs. quitting) from the data
- First stage: bFS = –0.23 (57% vs. 80% smokers)
- Reduced form: bRF = 92g
- Structural: bIV = 92 / –0.23 = –400g

- Women who were still smoking in the 8th month of pregnancy bore children who were 400 grams lighter, on average
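Both structural estimates are Wald ratios and can be reproduced from the group means reported above. The helper name `wald` is just a label for this sketch; the inputs are the figures from the slides.

```python
def wald(y_treat, y_control, x_treat, x_control):
    """Difference in response divided by difference in treatment."""
    return (y_treat - y_control) / (x_treat - x_control)

# Smoking frequency: mean birth weight (g) and mean cigarettes smoked by group.
per_cigarette = wald(3278, 3186, 6.4, 12.8)

# Continuing to smoke: same birth weights, proportion still smoking by group.
still_smoking = wald(3278, 3186, 0.57, 0.80)

print(round(per_cigarette, 1), round(still_smoking))
```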

- Estimates of the effect of smoking frequency (in 8th month) on birth weight
- OLS: b = 2g (s.e. not reported)
- 2SLS: b = –14g (s.e. = 7g)

- Here as well, 2SLS yields the “local average treatment effect” of smoking on birth weight

- Definition of a L.A.T.E.
- The average treatment effect for individuals “who can be induced to change [treatment] status by a change in the instrument”
- Imbens and Angrist (1994, p. 470)

- The average causal effect of X on Y for “compliers,” as opposed to “always takers” or “never takers”
- Not a particularly well-defined (sub)population

- L.A.T.E. is instrument-dependent, in contrast to the population A.T.E.
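A simulation can make the distinction concrete. The sketch below is entirely hypothetical: it assumes three compliance types with different treatment effects and different baselines, a randomized binary instrument, and no "defiers". Under those assumptions the Wald ratio should land near the compliers' effect rather than the population average.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000

# Hypothetical compliance types: 0 = never taker, 1 = complier, 2 = always taker.
kind = rng.choice([0, 1, 2], size=n, p=[0.25, 0.50, 0.25])
effect = np.array([1.0, 2.0, 5.0])[kind]      # assumed type-specific treatment effects
baseline = np.array([0.0, 1.0, 3.0])[kind]    # types also differ at baseline (selection)

z = rng.integers(0, 2, n)                     # randomized instrument (assignment)
# Treatment received: always takers take it, compliers follow z, never takers do not.
x = np.where(kind == 2, 1, np.where(kind == 1, z, 0))
y = baseline + effect * x + rng.normal(size=n)

wald = (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())
ate = effect.mean()                           # population average treatment effect
naive = y[x == 1].mean() - y[x == 0].mean()   # treated vs. untreated comparison

print(wald, ate, naive)
```

Changing the share or the effect size of the compliers changes what the Wald ratio recovers, which is one way to read the statement that the L.A.T.E. is instrument-dependent.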

- In the D.V. study...
- For men who were arrested as per the experimental protocol, arrest resulted in a mean 14-percentage-point decline in the probability of recidivism compared to non-arrest interventions

- In the maternal smoking study...
- For women who reduced their smoking frequency because they were assigned to the intervention, each one-cigarette reduction resulted in a 14-gram increase in birth weight (from mean 11 cigarettes)