Loading in 2 Seconds...

Non-Experimental Data: Natural Experiments and more on IV

Loading in 2 Seconds...

- 88 Views
- Uploaded on

Download Presentation
## PowerPoint Slideshow about 'Non-Experimental Data: Natural Experiments and more on IV' - mandar

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

Non-Experimental Data

- Refers to all data that has not been collected as part of experiment
- Quality of analysis depends on how well one can deal with problems of:
- Omitted variables
- Reverse causality
- Measurement error
- selection
- Or… how close one can get to experimental conditions

Natural/ ‘Quasi’ Experiments

- Used to refer to situation that is not experimental but is ‘as if’ it was
- Not a precise definition – saying your data is a ‘natural experiment’ makes it sound better
- Refers to case where variation in X is ‘good variation’ (directly or indirectly via instrument)
- A Famous Example: London, 1854

The Case of the Broad Street Pump

- Regular cholera epidemics in 19th century London
- Widely believed to be caused by ‘bad air’
- John Snow thought ‘bad water’ was cause
- Experimental design would be to randomly give some people good water and some bad water
- Ethical Problems with this

Soho Outbreak August/September 1854

- People closest to Broad Street Pump most likely to die
- But breathe same air so does not resolve air vs. water hypothesis
- Nearby workhouse had own well and few deaths
- Nearby brewery had own well and no deaths (workers all drank beer)

Why is this a Natural experiment?

- Variation in water supply ‘as if’ it had been randomly assigned – other factors (‘air’) held constant
- Can then estimate treatment effect using difference in means
- Or run regression of death on water source distance to pump, other factors
- Strongly suggests water the cause
- Woman died in Hampstead, niece in Islington

What’s that got to do with it?

- Aunt liked taste of water from Broad Street pump
- Had it delivered every day
- Niece had visited her
- Investigation of well found contamination by sewer
- This is non-experimental data but analysed in a way that makes a very powerful case – no theory either

Methods for Analysing Data from Natural Experiments

- If data is ‘as if’ it were experimental then can use all techniques described for experimental data
- OLS (perhaps Snow case)
- IV to get appropriate units of measurement
- Will say more about IV than OLS
- IV perhaps more common
- If can use OLS not more to say
- With IV there is more to say – weak instruments

Conditions for Instrument Validity

- To be valid instrument:
- Must be correlated with X - testable
- Must be uncorrelated with ‘error’ – untestable – have to argue case for this assumption
- These conditions guaranteed with instrument for experimental data
- But more problematic for data from quasi-experiments

Bombs, Bones and Breakpoints:The Geography of Economic Activity Davis and Weinstein, AER, 2002

- Existence of agglomerations (e.g. cities) a puzzle
- Land and labour costs higher so why don’t firms relocate to increase profits
- Must be some compensatory productivity effect
- Different hypotheses about this:
- Locational fundamentals
- Increasing returns (Krugman) – path-dependence

Testing these Hypotheses

- Consider a temporary shock to city population
- Locational fundamentals theory would predict no permanent effect
- Increasing returns would suggest permanent effect
- Would like to do experiment of randomly assigning shocks to city size
- This is not going to happen

The Davis-Weinstein idea

- Use US bombing of Japanese cities in WW2
- This is a ‘natural experiment’ not a true experiment because:
- WW2 not caused by desire to test theories of economic geography
- Pattern of US bombing not random
- Sample is 303 Japanese cities, data is:
- Population before and after bombing
- Measures of destruction

Basic Equation

- Δsi,47-40 is change in population just before and after war
- Δsi,60-47 is change in population at later period
- How to test hypotheses:
- Locational fundamentals predicts β1=-1
- Increasing returns predicts β1=0

The IV approach

- Δsi,47-40 might be influenced by both permanent and temporary factors
- Only want part that is transitory shock caused by war damage
- Instrument Δsi,47-40 by measures of death and destruction

Why Do We Need First-Stage?

- Establishes instrument relevance – correlation of X and Z
- Gives an idea of how strong this correlation is – ‘weak instrument’ problem
- In this case reported first-stage not obviously that implicit in what follows
- That would be bad practice

Why Are these other variables included?

- Potential criticisms of instrument exogeneity
- Government post-war reconstruction expenses correlated with destruction and had an effect on population growth
- US bombing heavier of cities of strategic importance (perhaps they had higher growth rates)
- Inclusion of the extra variables designed to head off these criticisms
- Assumption is that of exogeneity conditional on the inclusion of these variables
- Conclusion favours locational fundamentals view

An additional piece of supporting evidence….

- Always trying to build a strong evidence base – many potential ways to do this, not just estimating equations

The Problem of Weak Instruments

- Say that instruments are ‘weak’ if correlation between X and Z low (after inclusion of other exogenous variables)
- Rule of thumb - If F-statistic on instruments in first-stage less than 10 then may be problem (will explain this a bit later)

Why Do Weak Instruments Matter?

- A whole range of problems tend to arise if instruments are weak
- Asymptotic problems:
- High asymptotic variance
- Small departures from instrument exogeneity lead to big inconsistencies
- Finite-Sample Problems:
- Small-sample distirbution may be very different from asymptotic one
- May be large bias
- Computed variance may be wrong
- Distribution may be very different from normal

Asymptotic Problems I:Low precision

- asymptotic variance of IV estimator is larger the weaker the instruments
- Intuition – variance in any estimator tends to be lower the bigger the variation in X – think of σ2(X’X)-1
- IV only uses variation in X that is associated with Z
- As instruments get weaker using less and less variation in X

Asymptotic Problems II:Small Departures from Instrument Exogeneity Lead to Big Inconsistencies

- Suppose true causal model is

y=Xβ+Zγ+ε

So possibly direct effect of Z on y.

- Instrument exogeneity is γ=0.
- Obviously want this to be zero but might hope that no big problem if ‘close to zero’ – a small deviation from exogeneity

But this will not be the case if instruments weak… consider just-identified case

- If instruments weak then ΣZX small so ΣZX-1 large so γ multiplied by a large number

An Example: The Return to Education

- Economists long-interested in whether investment in human capital a ‘good’ investment
- Some theory shows that coefficient on s in regression:

y=β0+β1s+β2x+ε

Is measure of rate of return to education

- OLS estimates around 8% - suggests very good investment
- Might be liquidity constraints
- Might be bias

Potential Sources of Bias

- Most commonly mentioned is ‘ability bias’
- Ability correlated with earnings independent of education
- Ability correlated with education
- If ability omitted from ‘x’ variables then usual formula for omitted variables bias suggests upward bias in OLS estimate

Potential Solution

- Find an instrument correlated with education but uncorrelated with ‘ability’ (or other excluded variables)
- Angrist-Krueger “Does Compulsory Schooling Attendance Affect Schooling and Earnings” , QJE 1991, suggest using quarter of birth
- Argue correlated with education because of school start age policies and school leaving laws (instrument relevance)
- Don’t have to accept this – can test it

A graphical version of first-stage (correlation between education and Z)

In this case…

- Their instrument is binary so IV estimator can be written in Wald form
- And this leads to following expression for potential inconsistency:

- Note denominator is difference in schooling for those born in first- and other quarters
- Instrument will be ‘weak’ if this difference is small

Interpretation (and Potential Criticism)

- IV estimates not much below OLS estimates (higher in one case)
- Suggests ‘ability bias’ no big deal
- But instrument is weak
- Being born in 1st quarter reduces education by 0.1 years
- Means ‘γ’ will be multiplied by 10

But why should we have γ≠0

- Remember this would imply a direct effect of quarter of birth on earnings, not just one that works through the effect on education
- Bound, Jaeger and Baker argued that evidence that quarter of birth correlated with:
- Mental and physical health
- Socioeconomic status of parents
- Unlikely that any effects are large but don’t have to be when instruments are weak

An example: UK data

Effect is small but significantly different from zero

A Back-of-the-Envelope Calculation

- Being born in first quarter means 0.01 less likely to have a managerial/professional parent
- Being a manager/professional raises log earnings by 0.64
- Correlation between earnings of children and parents 0.4
- Effect on earnings through this route 0.01*0.64*0.4=0.00256 i.e. ¼ of 1 per cent
- Small but weak instrument causes effect on inconsistency of IV estimate to be multiplied by 10 – 0.0256
- Now large relative to OLS estimate of 0.08

Summary

- Small deviations from instrument exogeneity lead to big inconsistencies in IV estimate if instruments are weak
- Suspect this is often of great practical importance
- Quite common to use ‘odd’ instrument – argue that ‘no reason to believe’ it is correlated with ε but show correlation with X

Finite Sample Problems

- This is a very complicated topic
- Exact results for special cases, approximations for more general cases
- Hard to say anything that is definitely true but can give useful guidance
- Problems in 3 areas
- Bias
- Incorrect measurement of variance
- Non-normal distribution
- But really all different symptoms of same thing

Review and Reminder

- If ask STATA to estimate equation by IV
- Coefficients compute using formula given
- Standard errors computed using formula for asymptotic variance
- T-statistics, confidence intervals and p-values computed using assumption that estimator is unbiased with variance as computed and normally distributed
- All are asymptotic results

Difference between asymptotic and finite-sample distributions

- This is normal case
- Only in special cases e.g. linear regression model with normally distributed errors are small-sample and asymptotic distributions the same.
- Difference likely to be bigger
- The smaller the sample size
- The weaker the instruments

Rule of Thumb for Weak Instruments

- F-test for instruments in first-stage >10
- Stricter than significant e.g. if one instrument F=10 equivalent to t=3.3

Conclusion

- Natural experiments useful source of knowledge
- Often requires use of IV
- Instrument exogeneity and relevance need justification
- Weak instruments potentially serious
- Good practice to present first-stage regression
- Finding more robust alternative to IV an active research area

Download Presentation

Connecting to Server..