Non-Experimental Data: Natural Experiments and more on IV

PowerPoint Slideshow about 'Non-Experimental Data: Natural Experiments and more on IV' - mandar

Non-Experimental Data
  • Refers to all data that has not been collected as part of experiment
  • Quality of analysis depends on how well one can deal with problems of:
    • Omitted variables
    • Reverse causality
    • Measurement error
    • Selection
  • Or… how close one can get to experimental conditions
Natural/ ‘Quasi’ Experiments
  • Used to refer to situation that is not experimental but is ‘as if’ it was
  • Not a precise definition – saying your data is a ‘natural experiment’ makes it sound better
  • Refers to case where variation in X is ‘good variation’ (directly or indirectly via instrument)
  • A Famous Example: London, 1854
The Case of the Broad Street Pump
  • Regular cholera epidemics in 19th century London
  • Widely believed to be caused by ‘bad air’
  • John Snow thought ‘bad water’ was cause
  • Experimental design would be to randomly give some people good water and some bad water
  • Ethical Problems with this
Soho Outbreak August/September 1854
  • People closest to Broad Street Pump most likely to die
  • But breathe same air so does not resolve air vs. water hypothesis
  • Nearby workhouse had own well and few deaths
  • Nearby brewery had own well and no deaths (workers all drank beer)
Why is this a Natural experiment?
  • Variation in water supply ‘as if’ it had been randomly assigned – other factors (‘air’) held constant
  • Can then estimate treatment effect using difference in means
  • Or run a regression of death on water source, distance to pump, and other factors
  • Strongly suggests water the cause
  • Woman died in Hampstead, niece in Islington
What’s that got to do with it?
  • Aunt liked taste of water from Broad Street pump
  • Had it delivered every day
  • Niece had visited her
  • Investigation of well found contamination by sewer
  • This is non-experimental data but analysed in a way that makes a very powerful case – no theory either
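The difference-in-means estimate mentioned for the Snow case can be sketched as follows. The counts are invented purely for illustration and are not Snow's figures; the group names are likewise hypothetical.

```python
# Hypothetical counts for illustration only; these are NOT Snow's data.
groups = {
    "broad_street_pump": {"people": 500, "deaths": 60},
    "own_well": {"people": 500, "deaths": 5},
}

def death_rate(group):
    """Share of people in the group who died."""
    return group["deaths"] / group["people"]

# If water source is 'as if' randomly assigned, the difference in mean
# death rates estimates the treatment effect of drinking pump water.
effect = death_rate(groups["broad_street_pump"]) - death_rate(groups["own_well"])
print(f"Estimated effect on death probability: {effect:.3f}")
```

The comparison is only as credible as the claim that other factors, such as air, are the same across the two groups.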
Methods for Analysing Data from Natural Experiments
  • If data is ‘as if’ it were experimental then can use all techniques described for experimental data
    • OLS (perhaps Snow case)
    • IV to get appropriate units of measurement
  • Will say more about IV than OLS
    • IV perhaps more common
    • If OLS can be used, there is not much more to say
    • With IV there is more to say – weak instruments
Conditions for Instrument Validity
  • To be valid instrument:
    • Must be correlated with X - testable
    • Must be uncorrelated with ‘error’ – untestable – have to argue case for this assumption
  • These conditions guaranteed with instrument for experimental data
  • But more problematic for data from quasi-experiments
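A small simulation may help fix ideas about the two conditions. It is a sketch under assumed values: the coefficients, the sample size and the data-generating process are all chosen for illustration. The instrument z is built to be relevant (it shifts x) and exogenous (it is independent of the confounder u).

```python
import random

random.seed(0)
n = 20000
beta = 1.0  # assumed true causal effect of x on y

z, x, y = [], [], []
for _ in range(n):
    zi = random.gauss(0, 1)                  # instrument
    ui = random.gauss(0, 1)                  # unobserved confounder
    xi = 0.8 * zi + ui + random.gauss(0, 1)  # relevance: z shifts x
    yi = beta * xi + 2.0 * ui + random.gauss(0, 1)
    z.append(zi)
    x.append(xi)
    y.append(yi)

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)

beta_ols = cov(x, y) / cov(x, x)  # contaminated by the confounder
beta_iv = cov(z, y) / cov(z, x)   # uses only variation in x driven by z
print(f"cov(z, x) = {cov(z, x):.2f}  (relevance, testable)")
print(f"OLS = {beta_ols:.2f}, IV = {beta_iv:.2f}, true beta = {beta}")
```

Relevance can be checked in the data through cov(z, x). Exogenity holds here only because the simulation was written that way; with real data it has to be argued.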
Bombs, Bones and Breakpoints: The Geography of Economic Activity, Davis and Weinstein, AER, 2002
  • Existence of agglomerations (e.g. cities) a puzzle
  • Land and labour costs higher so why don’t firms relocate to increase profits
  • Must be some compensatory productivity effect
  • Different hypotheses about this:
    • Locational fundamentals
    • Increasing returns (Krugman) – path-dependence
Testing these Hypotheses
  • Consider a temporary shock to city population
  • Locational fundamentals theory would predict no permanent effect
  • Increasing returns would suggest permanent effect
  • Would like to do experiment of randomly assigning shocks to city size
  • This is not going to happen
The Davis-Weinstein idea
  • Use US bombing of Japanese cities in WW2
  • This is a ‘natural experiment’ not a true experiment because:
    • WW2 not caused by desire to test theories of economic geography
    • Pattern of US bombing not random
  • Sample is 303 Japanese cities, data is:
    • Population before and after bombing
    • Measures of destruction
Basic Equation
  • $\Delta s_{i,47-40}$ is the change in population of city i from just before to just after the war
  • $\Delta s_{i,60-47}$ is the change in population over the later period
  • How to test hypotheses:
    • Locational fundamentals predicts β1=-1
    • Increasing returns predicts β1=0
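The equation itself did not survive in this transcript. A form consistent with the bullets above, offered as an assumption rather than as the authors' exact specification, is:

```latex
\Delta s_{i,60-47} = \beta_0 + \beta_1 \, \Delta s_{i,47-40} + \varepsilon_i
```

Under this form, $\beta_1 = -1$ means a wartime shock to city size is fully reversed in the later period, while $\beta_1 = 0$ means later growth is unrelated to the shock, so the shock is permanent.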
The IV approach
  • $\Delta s_{i,47-40}$ might be influenced by both permanent and temporary factors
  • Only want the part that is a transitory shock caused by war damage
  • Instrument $\Delta s_{i,47-40}$ by measures of death and destruction
Why Do We Need First-Stage?
  • Establishes instrument relevance – correlation of X and Z
  • Gives an idea of how strong this correlation is – ‘weak instrument’ problem
  • In this case the reported first stage is not obviously the one implicit in the estimates that follow
    • If so, that would be bad practice
Why Are these other variables included?
  • Potential criticisms of instrument exogeneity
    • Government post-war reconstruction expenses correlated with destruction and had an effect on population growth
    • US bombing was heavier in cities of strategic importance (perhaps they had higher growth rates)
  • Inclusion of the extra variables designed to head off these criticisms
  • Assumption is that of exogeneity conditional on the inclusion of these variables
  • Conclusion favours locational fundamentals view
An additional piece of supporting evidence….
  • Always trying to build a strong evidence base – many potential ways to do this, not just estimating equations
The Problem of Weak Instruments
  • Say that instruments are ‘weak’ if correlation between X and Z low (after inclusion of other exogenous variables)
  • Rule of thumb - If F-statistic on instruments in first-stage less than 10 then may be problem (will explain this a bit later)
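The rule of thumb can be illustrated with a minimal sketch for the simplest case: one instrument and no other exogenous variables, so the first-stage F is just the squared t-statistic on the instrument. The function names, the first-stage coefficients and the sample size are assumptions made for the example.

```python
import random

def first_stage_f(x, z):
    """F-statistic on a single instrument z in an OLS regression of x on a constant and z."""
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    s_zz = sum((zi - mean_z) ** 2 for zi in z)
    s_zx = sum((zi - mean_z) * (xi - mean_x) for zi, xi in zip(z, x))
    slope = s_zx / s_zz
    # Residual sum of squares from the fitted first stage
    rss = sum((xi - mean_x - slope * (zi - mean_z)) ** 2 for zi, xi in zip(z, x))
    se_slope = (rss / (n - 2) / s_zz) ** 0.5
    t_stat = slope / se_slope
    return t_stat ** 2  # with one instrument, F equals t squared

def simulate_first_stage(pi, n=500, seed=1):
    """Simulate x = pi * z + noise and return the first-stage F (pi is assumed)."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [pi * zi + rng.gauss(0, 1) for zi in z]
    return first_stage_f(x, z)

f_weak = simulate_first_stage(pi=0.05)
f_strong = simulate_first_stage(pi=1.0)
for label, f_stat in [("weak", f_weak), ("strong", f_strong)]:
    verdict = "possible weak-instrument problem" if f_stat < 10 else "passes rule of thumb"
    print(f"{label}: first-stage F = {f_stat:.1f} ({verdict})")
```

With additional exogenous variables in the model, the relevant statistic is the F-test on the instruments alone after including those variables in the first stage, which this sketch does not cover.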
Why Do Weak Instruments Matter?
  • A whole range of problems tend to arise if instruments are weak
  • Asymptotic problems:
    • High asymptotic variance
    • Small departures from instrument exogeneity lead to big inconsistencies
  • Finite-Sample Problems:
    • Small-sample distribution may be very different from asymptotic one
      • May be large bias
      • Computed variance may be wrong
      • Distribution may be very different from normal
Asymptotic Problems I: Low Precision
  • Asymptotic variance of IV estimator is larger the weaker the instruments
  • Intuition: the variance of any estimator tends to be lower the bigger the variation in X; think of $\sigma^2 (X'X)^{-1}$
  • IV only uses variation in X that is associated with Z
  • As instruments get weaker using less and less variation in X
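For the simplest case this can be written out explicitly. The expressions below assume a single regressor, a single instrument and homoskedastic errors, and the notation ($\sigma_X^2$ for the variance of X, $\rho_{XZ}$ for the correlation between X and Z) is introduced here for illustration:

```latex
\operatorname{Avar}(\hat\beta_{OLS}) = \frac{\sigma^2}{n\,\sigma_X^2},
\qquad
\operatorname{Avar}(\hat\beta_{IV}) = \frac{\sigma^2}{n\,\sigma_X^2\,\rho_{XZ}^2}
```

Since $\rho_{XZ}^2 \le 1$, the IV variance is never smaller than the OLS one, and it grows without bound as the correlation between X and Z approaches zero. For example, $\rho_{XZ} = 0.1$ inflates the variance by a factor of 100.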
Asymptotic Problems II: Small Departures from Instrument Exogeneity Lead to Big Inconsistencies
  • Suppose the true causal model is $y = X\beta + Z\gamma + \varepsilon$, so that Z may have a direct effect on y.

  • Instrument exogeneity is γ=0.
  • Obviously want this to be zero but might hope that no big problem if ‘close to zero’ – a small deviation from exogeneity
But this will not be the case if instruments weak… consider just-identified case
  • If instruments are weak then $\Sigma_{ZX}$ is small, so $\Sigma_{ZX}^{-1}$ is large and γ is multiplied by a large number
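The step behind this can be sketched as follows, assuming the model $y = X\beta + Z\gamma + \varepsilon$ with $\operatorname{plim} Z'\varepsilon/n = 0$, and writing $\Sigma_{ZX} = \operatorname{plim} Z'X/n$ and $\Sigma_{ZZ} = \operatorname{plim} Z'Z/n$ (notation assumed here, since the slide's own equation is not preserved):

```latex
\begin{aligned}
\hat\beta_{IV} &= (Z'X)^{-1} Z'y
  = \beta + (Z'X)^{-1} Z'Z\,\gamma + (Z'X)^{-1} Z'\varepsilon \\
\operatorname{plim}\hat\beta_{IV} &= \beta + \Sigma_{ZX}^{-1}\,\Sigma_{ZZ}\,\gamma
\end{aligned}
```

The inconsistency is zero when $\gamma = 0$, but otherwise it is scaled by $\Sigma_{ZX}^{-1}$, which is large precisely when the instruments are weak.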
An Example: The Return to Education
  • Economists long-interested in whether investment in human capital a ‘good’ investment
  • Some theory shows that the coefficient on s (schooling) in a regression of log earnings on s and other variables x is a measure of the rate of return to education

  • OLS estimates around 8% - suggests very good investment
  • Might be liquidity constraints
  • Might be bias
Potential Sources of Bias
  • Most commonly mentioned is ‘ability bias’
  • Ability correlated with earnings independent of education
  • Ability correlated with education
  • If ability omitted from ‘x’ variables then usual formula for omitted variables bias suggests upward bias in OLS estimate
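The omitted-variables formula referred to above can be written out for a stripped-down case. Assume, for illustration only, that log earnings depend on schooling s and ability a with no other controls, and that a is left out of the regression; the symbols $\rho$ and $\delta$ are introduced here and are not taken from the slides:

```latex
\ln w = \rho\, s + \delta\, a + \varepsilon
\quad\Longrightarrow\quad
\operatorname{plim}\hat\rho_{OLS} = \rho + \delta\,\frac{\operatorname{Cov}(s, a)}{\operatorname{Var}(s)}
```

If ability raises earnings ($\delta > 0$) and is positively correlated with schooling ($\operatorname{Cov}(s,a) > 0$), the second term is positive, which is the upward bias in the OLS estimate.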
Potential Solution
  • Find an instrument correlated with education but uncorrelated with ‘ability’ (or other excluded variables)
  • Angrist-Krueger, “Does Compulsory School Attendance Affect Schooling and Earnings?”, QJE 1991, suggest using quarter of birth
  • Argue correlated with education because of school start age policies and school leaving laws (instrument relevance)
  • Don’t have to accept this – can test it
In this case…
  • Their instrument is binary so IV estimator can be written in Wald form
  • And this leads to following expression for potential inconsistency:
  • Note denominator is difference in schooling for those born in first- and other quarters
  • Instrument will be ‘weak’ if this difference is small
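The expressions are not preserved in this transcript. A reconstruction consistent with the bullets, assuming y is log earnings, s is schooling, Z = 1 for those born in the first quarter, and $\gamma$ is any direct effect of Z on y, would be:

```latex
\hat\beta_{Wald} = \frac{\bar y_1 - \bar y_0}{\bar s_1 - \bar s_0},
\qquad
\operatorname{plim}\hat\beta_{Wald} = \beta + \frac{\gamma}{E[s \mid Z=1] - E[s \mid Z=0]}
```

Here subscripts 1 and 0 denote sample means for those born in the first quarter and in other quarters. The second expression follows because the difference in mean earnings between the two groups is $\beta$ times the difference in mean schooling plus $\gamma$.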
Interpretation (and Potential Criticism)
  • IV estimates not much below OLS estimates (higher in one case)
  • Suggests ‘ability bias’ no big deal
  • But instrument is weak
  • Being born in 1st quarter reduces education by 0.1 years
  • Means ‘γ’ will be multiplied by 10
But why should we have γ≠0
  • Remember this would imply a direct effect of quarter of birth on earnings, not just one that works through the effect on education
  • Bound, Jaeger and Baker argued that there is evidence that quarter of birth is correlated with:
    • Mental and physical health
    • Socioeconomic status of parents
  • Unlikely that any effects are large but don’t have to be when instruments are weak
An example: UK data

Effect is small but significantly different from zero

A Back-of-the-Envelope Calculation
  • Being born in first quarter means 0.01 less likely to have a managerial/professional parent
  • Being a manager/professional raises log earnings by 0.64
  • Correlation between earnings of children and parents 0.4
  • Effect on earnings through this route 0.01*0.64*0.4=0.00256 i.e. ¼ of 1 per cent
  • Small, but the weak instrument causes the effect on the inconsistency of the IV estimate to be multiplied by 10, giving 0.0256
  • Now large relative to OLS estimate of 0.08
  • Small deviations from instrument exogeneity lead to big inconsistencies in IV estimate if instruments are weak
  • Suspect this is often of great practical importance
  • Quite common to use ‘odd’ instrument – argue that ‘no reason to believe’ it is correlated with ε but show correlation with X
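The back-of-the-envelope calculation on this slide can be reproduced in a few lines. The numbers are the ones quoted in the bullets (all taken as magnitudes); the variable names are made up for the example.

```python
# Figures quoted on the slide, as magnitudes; variable names are illustrative.
parent_prob_gap = 0.01   # first-quarter births: 0.01 less likely to have a managerial/professional parent
parent_log_gap = 0.64    # effect of being a manager/professional on log earnings
intergen_corr = 0.4      # correlation between earnings of children and parents
first_stage_gap = 0.1    # first-quarter births get 0.1 fewer years of education
ols_estimate = 0.08      # OLS return to a year of education

# Direct effect of quarter of birth on earnings through parental background
direct_effect = parent_prob_gap * parent_log_gap * intergen_corr

# The Wald estimator divides by the first-stage gap, i.e. multiplies by 10
inconsistency = direct_effect / first_stage_gap

print(f"direct effect:        {direct_effect:.5f}")
print(f"IV inconsistency:     {inconsistency:.4f}")
print(f"as a share of OLS:    {inconsistency / ols_estimate:.0%}")
```

On these figures a direct effect of about a quarter of one per cent turns into an inconsistency of roughly a third of the OLS estimate.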
Finite Sample Problems
  • This is a very complicated topic
  • Exact results for special cases, approximations for more general cases
  • Hard to say anything that is definitely true but can give useful guidance
  • Problems in 3 areas
    • Bias
    • Incorrect measurement of variance
    • Non-normal distribution
  • But really all different symptoms of same thing
Review and Reminder
  • If you ask Stata to estimate an equation by IV:
  • Coefficients are computed using the formula given
  • Standard errors computed using formula for asymptotic variance
  • T-statistics, confidence intervals and p-values computed using assumption that estimator is unbiased with variance as computed and normally distributed
  • All are asymptotic results
Difference between asymptotic and finite-sample distributions
  • Some difference between the two distributions is the usual situation
  • Only in special cases e.g. linear regression model with normally distributed errors are small-sample and asymptotic distributions the same.
  • Difference likely to be bigger
    • The smaller the sample size
    • The weaker the instruments
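A small Monte Carlo experiment is one way to see the gap between asymptotic and finite-sample behaviour. The sketch below is illustrative only: the data-generating process, the two first-stage coefficients, the sample size of 100 and the number of replications are all assumptions. It reports medians and interquartile ranges rather than means and variances, because the just-identified IV estimator can produce extreme draws when the instrument is weak.

```python
import random
import statistics

TRUE_BETA = 1.0  # assumed true effect of x on y

def iv_estimate(n, pi, rng):
    """Just-identified IV estimate, cov(z, y) / cov(z, x), from one simulated sample."""
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)                  # instrument
        ui = rng.gauss(0, 1)                  # confounder that makes x endogenous
        xi = pi * zi + ui + rng.gauss(0, 1)   # pi sets instrument strength
        yi = TRUE_BETA * xi + ui + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    s_zx = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    s_zy = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    return s_zy / s_zx

def simulate(pi, reps=500, n=100, seed=0):
    """Repeat the experiment reps times and collect the IV estimates."""
    rng = random.Random(seed)
    return [iv_estimate(n, pi, rng) for _ in range(reps)]

def iqr(values):
    """Interquartile range, a spread measure that is robust to extreme draws."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

strong = simulate(pi=1.0)
weak = simulate(pi=0.1)
for label, draws in [("strong", strong), ("weak", weak)]:
    print(f"{label}: median = {statistics.median(draws):.2f}, IQR = {iqr(draws):.2f}")
```

With the strong instrument the estimates should cluster near the true value, while with the weak one they should be far more dispersed. Raising `n` or `pi` shows how the problem shrinks with larger samples and stronger instruments.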
Rule of Thumb for Weak Instruments
  • F-test for instruments in first-stage >10
  • Stricter than conventional significance, e.g. with one instrument F=10 is equivalent to t≈3.2 (since F=t²)
Conclusions
  • Natural experiments useful source of knowledge
  • Often requires use of IV
  • Instrument exogeneity and relevance need justification
  • Weak instruments potentially serious
  • Good practice to present first-stage regression
  • Finding more robust alternative to IV an active research area