Instrumental Variables Estimation (with Examples from Criminology)

Robert Apel, Ph.D.

School of Criminal Justice

University at Albany

Center for Social and Demographic Analysis

University at Albany

May 5 & 7, 2009

Vital Statistics
  • Ph.D., Criminology and Criminal Justice, 2004
    • University of Maryland
  • Coursework in Department of Economics
  • Dissertation used instrumental variables
    • State child labor laws as instrumental variables for the causal effect of youth employment on antisocial behavior
Topics That Will Be Covered in this Workshop
  • Why use IV?
    • Discussion of endogeneity bias
    • Statistical motivation for IV
  • What is an IV?
    • Identification issues
    • Statistical properties of IV estimators
  • How is an IV model estimated?
    • Software and data examples
    • Diagnostics: IV relevance, IV exogeneity, Hausman
Review of the Linear Model
  • Population model: Y = α + βX + ε
    • Assume that the true slope is positive, so β > 0
  • Sample model: Y = a + bX + e
    • Least squares (LS) estimator of β:

$b_{LS} = (X'X)^{-1} X'Y = \mathrm{Cov}(X,Y) / \mathrm{Var}(X)$

  • Under what conditions can we speak of bLS as a causal estimate of the effect of X on Y?
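As a small numerical check of the two forms of the LS estimator above, the following sketch simulates data and computes the slope both ways. The sample size, seed, and the values α = 1.0 and β = 0.5 are illustrative assumptions, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)   # assumed values: alpha = 1.0, beta = 0.5

# Matrix form: add a column of ones for the intercept, then solve (X'X) b = X'Y.
X = np.column_stack([np.ones(n), x])
b_matrix = np.linalg.solve(X.T @ X, X.T @ y)

# Scalar form: sample covariance over sample variance (same ddof in both).
b_ratio = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

print(b_matrix[1], b_ratio)   # the two slope estimates coincide
```

Here X is exogenous by construction, so the slope also lands near the assumed β; the slides that follow ask what happens when that is not the case.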
Review of the Linear Model
  • Key assumption of the linear model:

E(X′e) = Cov(X,e) = E(e | X) = 0

    • Exogeneity assumption = X is uncorrelated with the unobserved determinants of Y
  • Important statistical property of the LS estimator under exogeneity:

E(bLS) =β + Cov(X,e) / Var(X)

plim(bLS) =β + Cov(X,e) / Var(X)

Under exogeneity both second terms equal 0, so bLS is unbiased and consistent

Endogeneity and the Evaluation Problem
  • When is the exogeneity assumption violated?
    • Measurement error → Attenuation bias
    • Instantaneous causation → Simultaneity bias
    • Omitted variables → Selection bias
  • Selection bias is the problem in observational research that undermines causal inference
    • Measurement error and instantaneous causation can be posed as problems of omitted variables
When Is the Exogeneity Assumption Violated?

(1) Measurement error in X (u) that is correlated with M.E. in Y (v) or with the model error (e)

  • Classical M.E. leads to attenuation, 0 < E(bLS) < β, but non-random M.E. (or correlation between M.E. and X, Y, V, and/or e) introduces unknown biases

And, if there are multiple X’s, bias contaminates the whole model, not just the coefficient on the X measured with error (a.k.a. “smearing”)
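The attenuation result for classical measurement error can be illustrated with a short simulation. This is a sketch under assumed values (β = 1, Var(X) = Var(u) = 1), for which the textbook attenuation factor Var(X) / [Var(X) + Var(u)] equals 0.5; it does not cover the non-random M.E. case.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta = 1.0                                # assumed true slope
x_true = rng.normal(size=n)               # Var(X) = 1
y = beta * x_true + rng.normal(size=n)
u = rng.normal(size=n)                    # classical M.E.: independent of X and e
x_obs = x_true + u                        # X as the analyst actually observes it

def slope(x, y):
    """LS slope, Cov(x, y) / Var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

b_true = slope(x_true, y)                 # close to beta
b_obs = slope(x_obs, y)                   # shrunk toward zero
reliability = 1.0 / (1.0 + 1.0)           # Var(X) / (Var(X) + Var(u))
print(b_true, b_obs, beta * reliability)
```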

When Is the Exogeneity Assumption Violated?

(2) Instantaneous causation of Y on X

  • Direction of the bias depends on what the sign is for the feedback effect, Y → X
    • If positive, E(bLS) > β, so overestimate true effect
    • If negative, E(bLS) < β, so underestimate true effect and in severe cases can even flip the sign so that E(bLS) < 0 even though β > 0

This non-recursivity complicates the relationship between price and quantity in economics
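The direction of simultaneity bias can be sketched with a two-equation system, Y = βX + e and X = γY + v, where γ is the feedback effect. The function name, the error variances of 1, and the values β = 0.5, γ = 0.5 and γ = –1 are assumptions chosen for illustration. Solving the system gives plim(bLS) = (β + γ) / (1 + γ²) under these variances.

```python
import numpy as np

def simulate_ls(beta, gamma, n=200_000, seed=2):
    """LS slope of Y on X when Y = beta*X + e and X = gamma*Y + v hold jointly."""
    rng = np.random.default_rng(seed)
    e = rng.normal(size=n)
    v = rng.normal(size=n)
    denom = 1.0 - beta * gamma
    x = (gamma * e + v) / denom           # solved (reduced-form) equation for X
    y = (beta * v + e) / denom            # solved (reduced-form) equation for Y
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

b_pos = simulate_ls(beta=0.5, gamma=0.5)   # positive feedback: overestimate
b_neg = simulate_ls(beta=0.5, gamma=-1.0)  # strong negative feedback: sign flips
print(b_pos, b_neg)
```

With positive feedback the estimate is near 0.8 rather than 0.5; with γ = –1 it is near –0.25 even though β > 0.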

When Is the Exogeneity Assumption Violated?

(3) Omitted variable (W) that is correlated with both X and Y

  • Classic problem of omitted variables bias
    • Coefficient on X will absorb the indirect path through W, whose sign depends on Cov(X,W) and Cov(W,Y)

Applied settings are more complicated: there are bound to be many W’s, and the “smearing” problem applies in this context as well

Example #1: Police Hiring
  • Measurement error
    • Mobilization of sworn officers (M.E. in X) as well as differential victim reporting or crime recording (M.E. in Y) may be correlated with police size
  • Instantaneous causation
    • More police might be hired during a crime wave
  • Omitted variables
    • Large departments may differ in fundamental ways difficult to measure (e.g., urban, heterogeneous)
Example #2: Sanction Perceptions
  • Measurement error
    • Measures of perceived sanction risk are probably “noisy” (M.E. in X), resulting in attenuation at best
  • Instantaneous causation
    • Perceptions are sensitive to the success/failure of criminal behavior, so feedback is negative
  • Omitted variables
    • Perceived risk probably correlated with unobserved determinants of crime (e.g., intelligence)
Example #3: Delinquent Peers
  • Measurement error
    • Highly delinquent youth probably overestimate the delinquency of their peers (M.E. in X), and likely underestimate their own delinquency (M.E. in Y)
  • Instantaneous causation
    • If there is influence/imitation, then it is bidirectional
  • Omitted variables
    • High-risk youth probably select themselves into delinquent peer groups (“birds of a feather”)
Regression Estimation Ignoring Omitted Variables
  • Suppose we estimate treatment effect model:

Y = α + βX + ε

    • Let’s assume without loss of generality that X is a binary “treatment” (= 1 if treated; = 0 if untreated)
  • Least squares estimator:

bLS = Cov(X,Y) / Var(X) = E(Y | X = 1) – E(Y | X = 0)

    • Simply the difference in means between “treated” units (X = 1) and “untreated” units (X = 0)
Regression Estimation Ignoring Omitted Variables
  • But suppose the population treatment effect model is instead:

Y = α + βX + (δW + ω)

    • Now the residual conveys information about W
  • Consider a plausible example
    • Y = crime, X = marriage, W = “marriageability”
      • “Marriageability” can be broadly construed to encompass earnings potential, desire for children, willingness to compromise, faithfulness, verbal communication skills,...
        • Including “signals” that individuals emit about these qualities
Regression Estimation Ignoring Omitted Variables
  • What does LS estimate when W is omitted?

bLS = C(X,Y)/V(X) = β + δ × [C(X,W)/V(X)]

= β + δ × [E(W | X = 1) – E(W | X = 0)]

    • β = True impact of marriage on crime
    • δ = Impact of marriageability on crime
    • E(W | X = 1) – E(W | X = 0) = Difference in marriageability between married and unmarried
  • Marriage effect on crime will be overestimated
    • Assuming marriageability lowers crime (δ < 0) and is higher among the married, the bias term is negative
    • IMPORTANT: Even if β = 0, bLS < 0
Regression Estimation Ignoring Omitted Variables
  • So...

bLS = β+ δ × [E(W | X = 1) – E(W | X = 0)]

  • Estimate of β is unbiased if and only if at least one of the following holds:

1. Marriageability is uncorrelated with crime

δ = 0

2. Marriageability is “balanced” (i.e., equivalent) between married and unmarried subjects

E(W | X = 1) = E(W | X = 0)
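The bias expression bLS = β + δ × [E(W | X = 1) – E(W | X = 0)] can be checked on simulated data. The sketch below assumes a hypothetical world with no marriage effect at all (β = 0), a crime-reducing marriageability trait (δ = –1), and marriage that is more likely when W is high; none of these values come from the slides.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
beta, delta = 0.0, -1.0                        # assumed: no true marriage effect
w = rng.normal(size=n)                         # "marriageability" (unobserved)
x = (w + rng.normal(size=n) > 0).astype(int)   # married = 1, likelier when W is high
y = beta * x + delta * w + rng.normal(size=n)  # crime

b_ls = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
diff_y = y[x == 1].mean() - y[x == 0].mean()   # difference in means of Y
diff_w = w[x == 1].mean() - w[x == 0].mean()   # imbalance in W
predicted = beta + delta * diff_w              # what the bias formula implies

print(b_ls, diff_y, predicted)
```

The LS slope equals the difference in means, is clearly negative despite β = 0, and matches the formula up to sampling noise in ω.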

Omitted Variables in Criminological Research
  • What variables of interest to criminologists are surely endogenous?
    • Micro = Employment, education, marriage, military service, fertility, conviction, family structure, ...
    • Macro = Poverty, unemployment rate, collective efficacy, immigrant concentration, ...
  • Basically, EVERYTHING!
    • (I’m sorry to be the one to break it to you)
Traditional Strategies to Deal with Omitted Variables
  • Randomization (physical control)
    • Achieves balance (in expectation) on any and all potential W’s
    • Control variables are technically unnecessary
  • Covariate adjustment (statistical control)
    • Control for potential W’s in a regression model
    • But...we have no idea how many W’s there are, so model misspecification is still a real problem here
Quasi-Experimental Strategies to Deal with Omitted Variables
  • Difference in differences (fixed-effects model)
    • Requires panel data
  • Propensity score matching
    • Requires a lot of measured background variables
      • Similar to covariate adjustment, but only the treated and untreated cases which are “on support” are utilized
  • Instrumental variables estimation
    • Requires an exclusion restriction
Instrumental Variables Estimation Is a Viable Approach
  • An “instrumental variable” for X is one solution to the problem of omitted variables bias
  • Requirements for Z to be a valid instrument for X
    • Relevant = Correlated with X
    • Exogenous = Correlated with Y only through its correlation with X
Important Point about Instrumental Variables Models
  • I often hear...“A good instrument should not be correlated with the dependent variable”
    • WRONG!!!
  • Z has to be correlated with Y, otherwise it is useless as an instrument
    • It can only be correlated with Y through X
  • A good instrument must not be correlated with the unobserved determinants of Y
Important Point about Instrumental Variables Models
  • Not all of the available variation in X is used
    • Only that portion of X which is “explained” by Z is used to explain Y

X = Endogenous variable

Y = Response variable

Z = Instrumental variable

Important Point about Instrumental Variables Models

Best-case scenario: A lot of X is explained by Z, and most of the overlap between X and Y is accounted for

Realistic scenario: Very little of X is explained by Z, or what is explained does not overlap much with Y

Important Point about Instrumental Variables Models
  • The IV estimator is BIASED
    • In other words, E(bIV) ≠ β (finite-sample bias)
    • The appeal of IV derives from its consistency
      • “Consistency” is a way of saying that bIV converges in probability to β as N → ∞
      • So…IV studies often have very large samples
    • But with endogeneity, E(bLS) ≠ β and plim(bLS) ≠ β anyway
  • Asymptotic behavior of IV

plim(bIV) = β + Cov(Z,e) / Cov(Z,X)

    • If Z is truly exogenous, then Cov(Z,e) = 0
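The asymptotic expression above can be illustrated by comparing LS, IV with a valid instrument, and IV with an instrument that is contaminated by the omitted variable. Everything in this sketch (coefficients, seed, and the contaminated instrument `z_bad`) is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
beta, delta = 1.0, 1.0                   # assumed values
z = rng.normal(size=n)                   # instrument, independent of W
w = rng.normal(size=n)                   # omitted variable
x = 0.5 * z + w + rng.normal(size=n)     # endogenous regressor
e = delta * w + rng.normal(size=n)       # composite error (delta*W + omega)
y = beta * x + e

def cov(a, b):
    """Sample covariance of two vectors."""
    return np.cov(a, b)[0, 1]

b_ls = cov(x, y) / cov(x, x)             # inconsistent: Cov(X, e) != 0
b_iv = cov(z, y) / cov(z, x)             # consistent: Cov(Z, e) = 0

z_bad = z + 0.5 * w                      # hypothetical instrument correlated with e
b_iv_bad = cov(z_bad, y) / cov(z_bad, x)
predicted_bad = beta + cov(z_bad, e) / cov(z_bad, x)

print(b_ls, b_iv, b_iv_bad, predicted_bad)
```

The valid instrument recovers a value near β, while the contaminated one is off by exactly Cov(Z,e) / Cov(Z,X), the second term in the plim.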
Instrumental Variables Terminology
  • Three different models to be familiar with
    • First stage: X = α0 + α1Z + ω
    • Structural model: Y = β0 + β1X + ε
    • Reduced form: Y = δ0 + δ1Z + ξ
  • An interesting equality:

δ1 = α1 × β1  ⇒  β1 = δ1 / α1
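The equality follows from substituting the first stage into the structural model. A short derivation in the slide's own symbols, where the grouping labels under the braces are added here for readability:

```latex
\begin{aligned}
Y &= \beta_0 + \beta_1 X + \varepsilon \\
  &= \beta_0 + \beta_1(\alpha_0 + \alpha_1 Z + \omega) + \varepsilon \\
  &= \underbrace{(\beta_0 + \beta_1\alpha_0)}_{\delta_0}
   + \underbrace{\alpha_1\beta_1}_{\delta_1}\, Z
   + \underbrace{(\beta_1\omega + \varepsilon)}_{\xi}
\end{aligned}
```

Dividing the reduced-form slope by the first-stage slope therefore returns β1, provided α1 ≠ 0.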

Different Types of Instrumental Variables Estimators
  • Wald estimator for binary instrument:

bWald = [E(Y | Z = 1) – E(Y | Z = 0)] / [E(X | Z = 1) – E(X | Z = 0)]

    • Difference in response ÷ Difference in treatment
  • Instrumental variables (IV) estimator:

$b_{IV} = (Z'X)^{-1} Z'Y = \mathrm{Cov}(Z,Y) / \mathrm{Cov}(Z,X)$

    • Shows that bIV can be recovered from two samples
  • Two-stage least squares (2SLS) estimator:

$b_{2SLS} = (\tilde{X}'\tilde{X})^{-1} \tilde{X}'Y = \mathrm{Cov}(\tilde{X},Y) / \mathrm{Var}(\tilde{X})$

    • X̃ represents “fitted” value from first-stage model
Different Types of Instrumental Variables Estimators
  • Single binary instrument and no control variables...

bWald = bIV = b2SLS

  • Single instrument (binary or continuous) with or without control variables...

bIV = b2SLS

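  • Multiple instruments (binary or continuous) with or without control variables...

b2SLS (the simple IV formula requires as many instruments as endogenous X’s, so 2SLS is the estimator to use)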
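The first equivalence (single binary instrument, no controls) can be verified numerically. The data-generating values below are assumptions for illustration; the point is only that the three formulas return the same number on the same sample.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5_000
z = rng.integers(0, 2, size=n)               # binary instrument
w = rng.normal(size=n)                       # unobserved confounder
x = 0.8 * z + w + rng.normal(size=n)         # endogenous regressor
y = 2.0 * x - 1.5 * w + rng.normal(size=n)   # assumed true slope of 2.0

# Wald: difference in mean response over difference in mean treatment.
b_wald = (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())

# IV: Cov(Z,Y) / Cov(Z,X).
b_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

# 2SLS: regress X on Z, then regress Y on the fitted values.
Z = np.column_stack([np.ones(n), z])
x_fit = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
b_2sls = np.cov(x_fit, y)[0, 1] / np.var(x_fit, ddof=1)

print(b_wald, b_iv, b_2sls)   # identical up to floating-point error
```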


More on the Method of Two-Stage Least Squares (2SLS)
  • Step 1: X = a0 + a1Z1 + a2Z2 + … + akZk + u
    • Obtain fitted values (X̃) from the first-stage model
  • Step 2: Y = b0 + b1X̃ + e
    • Substitute the fitted X̃ in place of the original X
    • Note: If done manually in two stages, the standard errors are based on the wrong residual

e = Y – b0 – b1X̃ when it should be e = Y – b0 – b1X

  • Best to just let the software do it for you
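To see why the manual two-step standard errors are off, the sketch below runs both stages by hand and computes the slope's standard error twice: once from the second-stage residual (built from X̃) and once from the residual built from the original X. The simulated design and all coefficient values are assumptions; in this particular design the naive standard error comes out too large, but the direction can differ in other data.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000
z = rng.normal(size=n)                       # instrument
w = rng.normal(size=n)                       # unobserved confounder
x = 0.8 * z + w + rng.normal(size=n)
y = 2.0 * x - 1.5 * w + rng.normal(size=n)   # assumed true slope of 2.0

def ols(design, target):
    """Least-squares coefficients for target regressed on design."""
    return np.linalg.lstsq(design, target, rcond=None)[0]

ones = np.ones(n)
a = ols(np.column_stack([ones, z]), x)       # Step 1: first stage
x_fit = a[0] + a[1] * z
b = ols(np.column_stack([ones, x_fit]), y)   # Step 2: second stage

resid_naive = y - b[0] - b[1] * x_fit        # what a manual second stage uses
resid_right = y - b[0] - b[1] * x            # what the 2SLS variance calls for

sst_fit = np.sum((x_fit - x_fit.mean()) ** 2)
se_naive = np.sqrt(resid_naive @ resid_naive / (n - 2) / sst_fit)
se_right = np.sqrt(resid_right @ resid_right / (n - 2) / sst_fit)
print(b[1], se_naive, se_right)
```

The point estimate is the same either way; only the residual, and therefore the standard error, changes.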
Including Control Variables in an IV/2SLS Model
  • Control variables (W’s) should be entered into the model at both stages
    • First stage: X = a0 + a1Z + a2W + u
    • Second stage: Y = b0 + b1X̃ + b2W + e
  • Control variables are considered “instruments,” they are just not “excluded instruments”
    • They serve as their own instrument
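One way to see that a control variable "serves as its own instrument" is to write 2SLS in matrix form, with the control in both the instrument matrix and the regressor matrix. This is a sketch on simulated data with assumed coefficients; `q` stands for an unobserved confounder introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50_000
z = rng.normal(size=n)          # excluded instrument
w = rng.normal(size=n)          # observed control variable
q = rng.normal(size=n)          # unobserved confounder (hypothetical)
x = 0.7 * z + 0.5 * w + q + rng.normal(size=n)
y = 1.0 * x + 0.8 * w - q + rng.normal(size=n)   # assumed b1 = 1.0, b2 = 0.8

ones = np.ones(n)
Z = np.column_stack([ones, z, w])   # all instruments: excluded Z plus the control
X = np.column_stack([ones, x, w])   # second-stage regressors

# First stage for every column of X at once: fitted values are projections on Z.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]

# Second stage: regress Y on the fitted regressors; plain OLS for comparison.
b_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print(b_2sls, b_ols)
```

Because W is a column of Z, its fitted values reproduce W exactly, which is the sense in which it instruments for itself.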
Functional Form Considerations with IV/2SLS
  • Binary endogenous regressor (X)
    • Consistency of second-stage estimates does not hinge on getting first-stage functional form correct
  • Binary response variable (Y)
    • IV probit (or logit) is feasible but is technically unnecessary
  • In both cases, linear model is tractable, easily interpreted, and consistent
    • Although variance adjustment is well advised
Functional Form Considerations with IV/2SLS
  • Quadratic second stage with a continuous endogenous regressor
    • Entering first-stage fitted values and their square into second-stage model leads to inconsistency
      • The square of a linear projection is not equivalent to a linear projection on a quadratic
    • Squares and cross-products of IV’s should be treated as additional instruments
      • Kelejian (1971)
    • Linear and squared X’s are treated as two different endogenous regressors
Technical Conditions Required for Model Identification
  • Order condition = At least the same # of IV’s as endogenous X’s
    • Just-identified model: # IV’s = # X’s
    • Overidentified model: # IV’s > # X’s
  • Rank condition = At least one IV must be significant in the first-stage model
    • Number of linearly independent columns in a matrix
      • E(X | Z,W) cannot be perfectly correlated with E(X | W)
Statistical Inference with IV
  • Variance estimation

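$\sigma^2_{\beta_{LS}} = \sigma^2_{\varepsilon} / SST_X$

$\sigma^2_{\beta_{IV}} = \sigma^2_{\varepsilon} / (SST_X \cdot R^2_{X,Z})$

$\varepsilon = Y - \beta_0 - \beta_1 X$

  • NOTICE: Because $R^2_{X,Z} < 1$, $s_{b_{IV}} > s_{b_{LS}}$
    • IV standard errors tend to be large, especially when $R^2_{X,Z}$ is very small, which can lead to type II errors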
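Taking the two variance formulas at face value (same error variance in both), the ratio of the IV to the LS standard error is 1 / √R²X,Z. The helper below is a hypothetical name introduced only to tabulate that ratio for a few first-stage R² values.

```python
import math

def iv_se_inflation(r2_xz: float) -> float:
    """Ratio s_bIV / s_bLS implied by the two variance formulas."""
    if not 0.0 < r2_xz <= 1.0:
        raise ValueError("first-stage R-squared must lie in (0, 1]")
    return 1.0 / math.sqrt(r2_xz)

for r2 in (0.50, 0.25, 0.10, 0.01):
    print(f"R2 = {r2:.2f}  ->  IV s.e. is {iv_se_inflation(r2):.2f} times the LS s.e.")
```

A first-stage R² of 0.01, for example, implies a standard error ten times larger than under LS, which is why weak first stages make type II errors likely.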
Instrumental Variables and Randomized Experiments
  • Imperfect compliance in randomized trials
    • Some individuals assigned to treatment group will not receive Tx, and some assigned to control group will receive Tx
      • Assignment error; subject refusal; investigator discretion
    • Some individuals who receive Tx will not change their behavior, and some who do not receive Tx will change their behavior
      • A problem in randomized job training studies and other social experiments (e.g., housing vouchers)
Instrumental Variables and Randomized Experiments
  • Two different measures of treatment (X)
    • Treatment assigned = Exogenous
      • Intention-to-treat (ITT) analysis
        • Reduced-form model: Y = δ0 + δ1Z + ξ
      • Often leads to underestimation of treatment effect
    • Treatment delivered = Endogenous
      • Individuals who do not comply probably differ in ways that can undermine the study
      • Self-selection → bias and inconsistency
Angrist (2006), J.E.C.
  • Minneapolis D.V. experiment
    • Sherman and Berk (1984)
      • Cases of male-on-female misdemeanor assault in two high-density precincts, in which both parties present at scene
    • Random assignment of arrest-mediation-separation
    • But...treatment assigned was not treatment delivered
      • Fidelity vis-à-vis arrest, but many subjects (~25%) assigned to mediation/separation were arrested
        • “Upgrading” was more likely when suspect was rude, suspect assaulted officer, weapons were involved, victim persistently demanded arrest, and incident violated restraining order
Angrist (2006), J.E.C.
  • Estimates of effect of arrest (vs. mediate or separate) on D.V. recidivism (Tables 2, 3)
    • OLS: b = –.070 (s.e. = .038)
    • ITT: b = –.108 (s.e. = .041)
    • 2SLS: b = –.140 (s.e. = .053)
  • Deterrent effect of arrest is twice as large in 2SLS as opposed to OLS
    • In this context, 2SLS is known as a “local average treatment effect” (I’ll come back to this)
Sexton and Hebel (1984), J.A.M.A.
  • Maternal smoking and birth weight
    • Sexton and Hebel (1984)
      • Sample of pregnant women who were confirmed smokers, recruited from prenatal care registrants
        • At least 10 cigarettes per day and not past 18th week
    • Random assignment of staff assistance in a smoking cessation program
      • Personal visits; telephone and mail contacts
    • But...some smokers in treatment group did not quit and some smokers in control group did quit
Sexton and Hebel (1984), J.A.M.A.

  • (1) First-stage model: mean cigarettes smoked
    • Treatment = 6.4
    • Control = 12.8
    • First-stage effect: bFS = 6.4 – 12.8 = –6.4
  • (2) Reduced-form model: mean birth weight
    • Treatment = 3,278g
    • Control = 3,186g
    • Reduced-form effect: bRF = 3,278 – 3,186 = 92g
  • (3) Structural model: effect of smoking frequency on mean birth weight
    • bIV = 92 / –6.4 = –14.4g
    • Each cigarette reduces birth weight by 14.4 grams

Sexton and Hebel (1984), J.A.M.A.
  • As an interesting aside, it’s also possible to estimate the effect of continuing smoking (vs. quitting) from the data
    • First stage: bFS = –0.23 (57% vs. 80% smokers)
    • Reduced form: bRF = 92g
    • Structural: bIV = 92 / –0.23 = –400g
  • Women who were still smoking in the 8th month of pregnancy bore children who were 400 grams lighter, on average
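Both Sexton and Hebel estimates are Wald ratios: reduced-form difference divided by first-stage difference. The arithmetic can be reproduced directly from the group means reported on the slides; the function and variable names are introduced here for illustration.

```python
def wald(reduced_form: float, first_stage: float) -> float:
    """Wald/IV estimate: difference in response over difference in treatment."""
    return reduced_form / first_stage

# Group means as reported on the slides (treatment minus control).
rf_birth_weight = 3278 - 3186        # grams
fs_cigarettes = 6.4 - 12.8           # mean cigarettes smoked
fs_still_smoking = 0.57 - 0.80       # share still smoking

b_per_cigarette = wald(rf_birth_weight, fs_cigarettes)
b_still_smoking = wald(rf_birth_weight, fs_still_smoking)

print(round(b_per_cigarette, 1), round(b_still_smoking))
```

The same reduced form yields very different structural estimates because the treatment is scaled differently in each first stage.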
Permutt and Hebel (1989), Biometrics
  • Estimates of the effect of smoking frequency (in 8th month) on birth weight
    • OLS: b = 2g (s.e. not reported)
    • 2SLS: b = –14g (s.e. = 7g)
  • Here as well, 2SLS yields the “local average treatment effect” of smoking on birth weight
Instrumental Variables and Local Average Treatment Effects
  • Definition of a L.A.T.E.
    • The average treatment effect for individuals “who can be induced to change [treatment] status by a change in the instrument”
      • Imbens and Angrist (1994, p. 470)
    • The average causal effect of X on Y for “compliers,” as opposed to “always takers” or “never takers”
      • Not a particularly well-defined (sub)population
  • L.A.T.E. is instrument-dependent, in contrast to the population A.T.E.
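A simulation can make the "compliers" interpretation concrete. The sketch below assumes hypothetical population shares (60% compliers, 20% always takers, 20% never takers), type-specific treatment effects, and type-specific baselines; none of these numbers come from the studies discussed. Only the compliers' effect (set to –2 here) should be recovered by the Wald ratio.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
# Assumed shares of each compliance type.
kind = rng.choice(["complier", "always", "never"], size=n, p=[0.6, 0.2, 0.2])
z = rng.integers(0, 2, size=n)                       # randomized assignment
# Treatment received: always takers take it, never takers do not, compliers follow Z.
x = np.where(kind == "always", 1, np.where(kind == "never", 0, z))

# Assumed treatment effects and baselines by type (selection on type).
effect = np.select([kind == "complier", kind == "always"], [-2.0, -5.0], default=0.0)
baseline = np.select([kind == "always", kind == "never"], [3.0, -1.0], default=0.0)
y = baseline + effect * x + rng.normal(size=n)

itt = y[z == 1].mean() - y[z == 0].mean()            # reduced form (intention to treat)
first_stage = x[z == 1].mean() - x[z == 0].mean()    # share of compliers
late = itt / first_stage                             # Wald / IV estimate
naive = y[x == 1].mean() - y[x == 0].mean()          # as-treated comparison

print(itt, first_stage, late, naive)
```

The intention-to-treat estimate is diluted by noncompliance, the as-treated comparison is contaminated by selection, and the Wald ratio lands near the compliers' effect. A different instrument would define a different set of compliers, which is the sense in which the L.A.T.E. is instrument-dependent.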
L.A.T.E. in the Previous Two Examples
  • In the D.V. study...
    • For men who were arrested as per the experimental protocol, arrest resulted in a mean 14-percentage-point decline in the probability of recidivism compared to non-arrest interventions
  • In the maternal smoking study...
    • For women who reduced their smoking frequency because they were assigned to the intervention, each one-cigarette reduction resulted in a 14-gram increase in birth weight (from mean 11 cigarettes)