Identification: Instrumental Variables

Ziyodullo Parpiev, PhD
Delivered at Summer School 2017, Tashkent, Uzbekistan, June 16, 2017

Why do we need IV?
Internal Validity Problems
• Independent variables are correlated with the error term.
• Three types relevant here:
  • Errors-in-variables
  • Omitted variable bias
  • These two are usually solved by adding the omitted variable or correcting the measurement error, but what if no additional data are available?
  • Simultaneous causality (endogeneity)
    • When X → Y AND Y → X
    • Simple OLS picks up both effects and produces a biased estimate of the causal effect.

What is the IV Technique?
• When you have an endogeneity problem, you want to separate out the part of the independent variable that is correlated with the error term.
• Once that part is separated out, you can get a consistent estimate of the causal effect of the “uncorrelated portion” of the independent variable on the dependent variable of interest.

IV: basic idea
Consider the following regression model:
$y_i = \beta_0 + \beta_1 X_i + e_i$
Variation in the endogenous regressor $X_i$ has two parts:
• the part that is uncorrelated with the error (“good” variation)
• the part that is correlated with the error (“bad” variation)
The basic idea behind instrumental variables regression is to isolate the “good” variation and disregard the “bad” variation.
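A minimal simulation sketch of the "good" versus "bad" variation idea. Everything here is an assumption made for illustration: the data are simulated, the true slope is set to 2, and the names `good`, `bad`, and `cov` are hypothetical helpers, not part of the original slides.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

random.seed(0)      # fixed seed so the run is reproducible
n = 20_000
true_beta1 = 2.0    # assumed causal effect of X on y

good = [random.gauss(0, 1) for _ in range(n)]  # part of X unrelated to the error
bad = [random.gauss(0, 1) for _ in range(n)]   # part of X shared with the error
x = [g + b for g, b in zip(good, bad)]
e = [b + random.gauss(0, 1) for b in bad]      # the error contains the "bad" part
y = [1.0 + true_beta1 * xi + ei for xi, ei in zip(x, e)]

ols_slope = cov(x, y) / cov(x, x)          # uses all variation in X
bias = cov(x, e) / cov(x, x)               # what the "bad" variation adds
good_slope = cov(good, y) / cov(good, x)   # uses only the "good" variation

print(f"OLS slope:              {ols_slope:.3f}")
print(f"true slope + bias term: {true_beta1 + bias:.3f}")
print(f"'good'-variation slope: {good_slope:.3f}")
```

In this setup the OLS slope equals the true slope plus cov(X, e) / var(X), so it overshoots, while the slope built only from the "good" variation lands near the true value. In real data the "good" part is not observed directly, which is why an instrument is needed.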

IV: conditions for a valid instrument
The first step is to identify a valid instrument.
A variable $Z_i$ is a valid instrument for the endogenous regressor $X_i$ if it satisfies two conditions:
1. Relevance: $\mathrm{corr}(Z_i, X_i) \neq 0$
2. Exogeneity: $\mathrm{corr}(Z_i, e_i) = 0$
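A short derivation sketch of why these two conditions are enough to pin down the slope, assuming the single-regressor model $y_i = \beta_0 + \beta_1 X_i + e_i$ with one instrument. It is a standard textbook argument added here for illustration, not a slide from the original deck.

```latex
\begin{align*}
\operatorname{cov}(Z_i, y_i)
  &= \operatorname{cov}(Z_i,\ \beta_0 + \beta_1 X_i + e_i) \\
  &= \beta_1 \operatorname{cov}(Z_i, X_i) + \operatorname{cov}(Z_i, e_i) \\
  &= \beta_1 \operatorname{cov}(Z_i, X_i)
     && \text{(exogeneity: } \operatorname{cov}(Z_i, e_i) = 0\text{)} \\
\Rightarrow\quad
\beta_1 &= \frac{\operatorname{cov}(Z_i, y_i)}{\operatorname{cov}(Z_i, X_i)}
     && \text{(relevance: the denominator is nonzero)}
\end{align*}
```

Exogeneity removes the error term from the numerator, and relevance keeps the ratio well defined.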

IV: two-stage least squares
The most common IV method is two-stage least squares (2SLS).
Stage 1: Decompose $X_i$ into the component that can be predicted by $Z_i$ and the problematic component:
$X_i = \pi_0 + \pi_1 Z_i + \nu_i$
Stage 2: Use the predicted value of $X_i$ from the first-stage regression to estimate its effect on $y_i$:
$y_i = \beta_0 + \beta_1 \hat{X}_i + \varepsilon_i$
Note: software packages like Stata perform the two stages in a single regression, producing the correct standard errors.
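The two stages can be sketched by hand on simulated data. This is an illustration under stated assumptions: one instrument, one endogenous regressor, a true slope of 2, and a hypothetical helper `simple_ols`. It reproduces only the point estimate. Standard errors taken from a manually run second stage are not correct, which is why dedicated IV routines are used in practice.

```python
import random

def simple_ols(x, y):
    """Return (intercept, slope) from regressing y on a constant and x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

random.seed(1)   # fixed seed so the run is reproducible
n = 20_000
z = [random.gauss(0, 1) for _ in range(n)]   # instrument (assumed valid)
u = [random.gauss(0, 1) for _ in range(n)]   # unobserved confounder
x = [0.5 + zi + ui for zi, ui in zip(z, u)]  # endogenous regressor
y = [1.0 + 2.0 * xi + ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

# Stage 1: regress X on Z and keep the fitted values.
pi0, pi1 = simple_ols(z, x)
x_hat = [pi0 + pi1 * zi for zi in z]

# Stage 2: regress y on the fitted values from stage 1.
beta0, beta1 = simple_ols(x_hat, y)

# Naive OLS of y on X, for comparison.
_, ols_beta1 = simple_ols(x, y)

print(f"2SLS slope: {beta1:.3f}")
print(f"OLS slope:  {ols_beta1:.3f}")
```

With a single instrument the 2SLS slope is numerically the same as cov(Z, y) / cov(Z, X).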

Z as an instrument for X

Clear?

Evaluating Instruments
• Two conditions:
  • Instrument Relevance: the IV is correlated with the problematic independent variable:
    • $\mathrm{corr}(Z_i, X_i) \neq 0$
  • Instrument Exogeneity: the IV is NOT correlated with the error term:
    • $\mathrm{corr}(Z_i, e_i) = 0$

Evaluating Instruments
• # POLICE → CRIME (Steven Levitt 1997)
• Simple OLS gives a positive result: increase the number of police, increase crime
• Why?
• Instrument: Was there a mayoral election in the year the measurements were taken?
• IV regression gives the expected negative result: increase the number of police, decrease crime
• Why is this a good instrument?

IV: example
Two-stage least squares:
Stage 1: Decompose police hires into the component that can be predicted by the electoral cycle and the problematic component:
$\text{police}_i = \pi_0 + \pi_1\,\text{election}_i + \nu_i$
Stage 2: Use the predicted value of $\text{police}_i$ from the first-stage regression to estimate its effect on $\text{crime}_i$:
$\text{crime}_i = \beta_0 + \beta_1\,\widehat{\text{police}}_i + \varepsilon_i$
Finding: an increased police force reduces violent crime (but has little effect on property crime).

IV: number of instruments
There must be at least as many instruments as endogenous regressors.
Let
k = number of endogenous regressors
m = number of instruments
The regression coefficients are
• exactly identified if m = k (OK)
• overidentified if m > k (OK)
• underidentified if m < k (not OK)

IV: testing instrument relevance
How do we know if our instruments are valid?
Recall our first condition for a valid instrument:
1. Relevance: $\mathrm{corr}(Z_i, X_i) \neq 0$
Stock and Watson’s rule of thumb: the first-stage F-statistic testing the hypothesis that the coefficients on the instruments are jointly zero should be at least 10 (for a single endogenous regressor).
A small F-statistic means the instruments are “weak” (they explain little of the variation in X); the 2SLS estimator is then biased toward the OLS estimate, and its usual standard errors and confidence intervals are unreliable.
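A sketch of the first-stage F-statistic for the simplest case: one instrument, no control variables, and the homoskedasticity-only formula. The function name `first_stage_f` and the simulated "strong" and "weak" first stages are assumptions made for illustration.

```python
import random

def first_stage_f(z, x):
    """Homoskedasticity-only F-statistic for H0: the coefficient on z is zero
    in the first-stage regression of x on a constant and a single instrument z."""
    n = len(x)
    q = 1  # number of instruments being tested
    mz, mx = sum(z) / n, sum(x) / n
    pi1 = (sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
           / sum((zi - mz) ** 2 for zi in z))
    pi0 = mx - pi1 * mz
    ssr_unrestricted = sum((xi - pi0 - pi1 * zi) ** 2 for zi, xi in zip(z, x))
    ssr_restricted = sum((xi - mx) ** 2 for xi in x)  # constant-only model
    return ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / (n - q - 1))

random.seed(2)   # fixed seed so the run is reproducible
n = 500
z = [random.gauss(0, 1) for _ in range(n)]
v = [random.gauss(0, 1) for _ in range(n)]
x_strong = [1.00 * zi + vi for zi, vi in zip(z, v)]  # z explains much of x
x_weak = [0.02 * zi + vi for zi, vi in zip(z, v)]    # z explains almost nothing

f_strong = first_stage_f(z, x_strong)
f_weak = first_stage_f(z, x_weak)

for label, f in (("strong", f_strong), ("weak", f_weak)):
    verdict = "passes" if f >= 10 else "fails"
    print(f"{label} first stage: F = {f:.1f} ({verdict} the F >= 10 rule of thumb)")
```

With several instruments or control variables the same comparison of restricted and unrestricted sums of squared residuals applies, with q set to the number of instruments.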

IV: testing instrument exogeneity
Recall our second condition for a valid instrument:
2. Exogeneity: $\mathrm{corr}(Z_i, e_i) = 0$
If you have the same number of instruments and endogenous regressors, it is impossible to test for instrument exogeneity.
But if you have more instruments than endogenous regressors:
Overidentifying restrictions test: regress the residuals from the 2SLS regression on the instruments (and any exogenous control variables) and test whether the coefficients on the instruments are all zero.
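The overidentifying restrictions test described above can be sketched as follows, assuming one endogenous regressor (k = 1), two instruments (m = 2), no control variables, and homoskedastic errors. The statistic computed is J = m × F, which is compared with a chi-squared distribution with m − k degrees of freedom. The helpers `ols`, `predict`, `j_statistic`, and `simulate` are hypothetical names, and the data are simulated.

```python
import random

def ols(columns, y):
    """OLS coefficients [intercept, b1, ...] for y on a constant plus columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[p] * r[q] for r in rows) for q in range(k)]
         + [sum(r[p] * yi for r, yi in zip(rows, y))] for p in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [v - factor * w for v, w in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

def predict(coefs, columns):
    """Fitted values from coefficients returned by ols()."""
    return [coefs[0] + sum(b * col[i] for b, col in zip(coefs[1:], columns))
            for i in range(len(columns[0]))]

def j_statistic(y, x, instruments):
    """J = m * F from regressing the 2SLS residuals on the instruments."""
    n, m = len(y), len(instruments)
    x_hat = predict(ols(instruments, x), instruments)       # first stage
    b0, b1 = ols([x_hat], y)                                # second stage
    resid = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]     # uses actual x
    fit = predict(ols(instruments, resid), instruments)
    ssr_u = sum((r - f) ** 2 for r, f in zip(resid, fit))
    r_bar = sum(resid) / n
    ssr_r = sum((r - r_bar) ** 2 for r in resid)
    f_stat = ((ssr_r - ssr_u) / m) / (ssr_u / (n - m - 1))
    return m * f_stat

def simulate(direct_effect_of_z2, seed, n=2000):
    """z2 is a valid instrument only if its direct effect on y is zero."""
    rng = random.Random(seed)
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    x = [a + b + c for a, b, c in zip(z1, z2, u)]
    y = [1.0 + 2.0 * xi + ui + rng.gauss(0, 1) + direct_effect_of_z2 * z2i
         for xi, ui, z2i in zip(x, u, z2)]
    return y, x, [z1, z2]

CHI2_DF1_CRIT_5PCT = 3.84  # 5% critical value, chi-squared with m - k = 1 df

j_valid = j_statistic(*simulate(0.0, seed=3))
j_invalid = j_statistic(*simulate(1.0, seed=3))
print(f"J with both instruments exogenous: {j_valid:.2f}")
print(f"J when z2 affects y directly:      {j_invalid:.2f}")
print(f"5% critical value:                 {CHI2_DF1_CRIT_5PCT}")
```

Note that the residuals are computed with the actual X, not the first-stage fitted values. A large J rejects the hypothesis that all instruments are exogenous, but it does not say which instrument is at fault.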

IV: drawbacks of this method
• It can be difficult to find an instrument that is both relevant (not weak) and exogenous
• Assessment of instrument exogeneity can be highly subjective when the coefficients are exactly identified
• IV can be difficult to explain to those who are unfamiliar with it

Closing Comments about Instrumental Variables Studies
[Figure: diagram with nodes X1, X2, Y1, Y2]
• In general, a lagged value of the endogenous regressor is not a good instrument
• Traditional structural equation models use lagged values of X and Y as instruments to break the simultaneity between the current values of X and Y
  • These models impose the awfully strong assumption that lagged values of X and Y only affect the outcomes through current values

Closing Comments about Instrumental Variables Studies
• Good IV models are generally interesting in their own right, and should not be treated as “tack on” analyses
• Practice varies widely across disciplines
  • Some researchers write papers about their discovery and application of a “clever” IV for some problem
  • Other researchers “tack on” IV models at the end of their analysis, often poorly, as a way to convince readers that their results are robust

Rules for Good Practice with Instrumental Variables Models
• IV models can be very informative, but it’s your job to convince your audience
• Show the first-stage model diagnostics
  • Even the most clever IV might not be sufficiently strongly related to X to be a useful source of identification
• Report test(s) of overidentifying restrictions
  • An invalid IV is often worse than no IV at all
• Report the LS endogeneity (Durbin-Wu-Hausman, DWH) test
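One common way to carry out the DWH endogeneity test is its regression-based form: add the first-stage residuals to the outcome regression and test whether their coefficient is zero. The sketch below assumes one endogenous regressor, one instrument, no controls, and homoskedastic errors. The helper names (`ols`, `ssr`, `dwh_f`) and the simulated data are illustrative assumptions.

```python
import random

def ols(columns, y):
    """OLS coefficients [intercept, b1, ...] for y on a constant plus columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[p] * r[q] for r in rows) for q in range(k)]
         + [sum(r[p] * yi for r, yi in zip(rows, y))] for p in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [v - factor * w for v, w in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

def ssr(columns, y):
    """Sum of squared residuals from regressing y on a constant plus columns."""
    b = ols(columns, y)
    return sum((yi - b[0] - sum(bj * col[i] for bj, col in zip(b[1:], columns))) ** 2
               for i, yi in enumerate(y))

def dwh_f(y, x, z):
    """F-statistic for H0: the first-stage residuals have a zero coefficient
    when added to the regression of y on x (that is, x is exogenous)."""
    pi0, pi1 = ols([z], x)
    v_hat = [xi - pi0 - pi1 * zi for xi, zi in zip(x, z)]  # first-stage residuals
    ssr_restricted = ssr([x], y)
    ssr_unrestricted = ssr([x, v_hat], y)
    # One restriction; the unrestricted model has three coefficients.
    return (ssr_restricted - ssr_unrestricted) / (ssr_unrestricted / (len(y) - 3))

rng = random.Random(4)   # fixed seed so the run is reproducible
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + ui for zi, ui in zip(z, u)]
noise = [rng.gauss(0, 1) for _ in range(n)]
y_endog = [1.0 + 2.0 * xi + ui + ei for xi, ui, ei in zip(x, u, noise)]  # u enters y
y_exog = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, noise)]               # u does not

f_endog = dwh_f(y_endog, x, z)
f_exog = dwh_f(y_exog, x, z)
print(f"DWH F when X is endogenous: {f_endog:.1f}")
print(f"DWH F when X is exogenous:  {f_exog:.1f}")
```

A large F suggests that OLS and IV estimates differ systematically, so treating X as exogenous is hard to defend. The test is only as credible as the instrument behind it.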

Rules for Good Practice with Instrumental Variables Models
• Most importantly, TELL A STORY about why a particular IV is a “good instrument”
• Something to consider when thinking about whether a particular IV is “good”:
  • Does the IV, for all intents and purposes, randomize the endogenous regressor?
