Differences-in-Differences and A Brief Introduction to Panel Data


The Grand Experiment

- Water supplied to households by competing private companies
- Sometimes different companies supplied households in same street
- In south London two main companies:
- Lambeth Company (water supply from Thames Ditton, 22 miles upstream)
- Southwark and Vauxhall Company (water supply from Thames)

In 1853/54 cholera outbreak

- Death Rates per 10000 people by water company
- Lambeth 10
- Southwark and Vauxhall 150
- Might be water but perhaps other factors
- Snow compared death rates in 1849 epidemic
- Lambeth 150
- Southwark and Vauxhall 125
- In 1852 Lambeth Company had changed supply from Hungerford Bridge

This is basic idea of Differences-in-Differences

- Have already seen idea of using differences to estimate causal effects
- Treatment/control groups in experimental data
- Often would like to find ‘treatment’ and ‘control’ group who can be assumed to be similar in every way except receipt of treatment
- This may be very difficult to do

A Weaker Assumption is..

- Assume that, in absence of treatment, difference between ‘treatment’ and ‘control’ group is constant over time
- With this assumption can use observations on treatment and control group pre- and post-treatment to estimate causal effect
- Idea
- Difference pre-treatment is ‘normal’ difference
- Difference post-treatment is ‘normal’ difference + causal effect
- Difference-in-difference is causal effect

What is D-in-D estimate?

- Standard differences estimator is AB
- But ‘normal’ difference estimated as CB
- Hence D-in-D estimate is AC
- Note: assumes trends in outcome variables the same for treatment and control groups
- This is not testable
- With only two periods can get no idea of plausibility, but can with more periods

Some Notation

- Define:

$$\mu_{it} = E(y_{it})$$

where i=0 is the control group and i=1 is the treatment group,

and t=0 is the pre-period and t=1 is the post-period

- Standard ‘differences’ estimate of causal effect is estimate of:

$$\mu_{11} - \mu_{01}$$

- ‘Differences-in-Differences’ estimate of causal effect is estimate of:

$$(\mu_{11} - \mu_{01}) - (\mu_{10} - \mu_{00})$$
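As a worked illustration, the notation can be applied to the cholera death rates quoted earlier. This is only a sketch: treating Lambeth as the treatment group (i=1) and the 1849 epidemic as the pre-period is an assumption made here for illustration, and the function names are hypothetical.

```python
# Death rates per 10,000 people, taken from the figures quoted in the slides.
# Keys are (group i, period t).
# Assumption: i=1 is Lambeth (supply moved upstream in 1852), i=0 is
# Southwark and Vauxhall; t=0 is the 1849 epidemic, t=1 is 1853/54.
mu = {
    (1, 0): 150,  # Lambeth, 1849
    (0, 0): 125,  # Southwark and Vauxhall, 1849
    (1, 1): 10,   # Lambeth, 1853/54
    (0, 1): 150,  # Southwark and Vauxhall, 1853/54
}


def simple_difference(mu):
    """Post-period gap between treatment and control: mu_11 - mu_01."""
    return mu[(1, 1)] - mu[(0, 1)]


def diff_in_diff(mu):
    """Post-period gap minus pre-period ('normal') gap."""
    return (mu[(1, 1)] - mu[(0, 1)]) - (mu[(1, 0)] - mu[(0, 0)])


print(simple_difference(mu))  # -140
print(diff_in_diff(mu))       # -165
```

On these figures the simple difference is -140 per 10,000, while the D-in-D estimate is -165, because Lambeth's death rate was 25 higher than Southwark and Vauxhall's before the change of supply.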

How to estimate?

- Can write D-in-D estimate as:

$$(\mu_{11} - \mu_{10}) - (\mu_{01} - \mu_{00})$$

- This is simply the difference in the change of treatment and control groups so can estimate as:

This is simply ‘differences’ estimator applied to the difference
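The estimating equation did not survive in this transcript. A form consistent with the surrounding text would be a regression of each individual's change in outcome on a treatment dummy. The symbols used here ($X_i$ for the dummy, $\beta_0$, $\beta_1$) are assumptions, not necessarily the original notation.

```latex
\[
\Delta y_i \equiv y_{i1} - y_{i0}, \qquad
\Delta y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad
X_i = \begin{cases} 1 & \text{if } i \text{ is in the treatment group} \\
                    0 & \text{if } i \text{ is in the control group} \end{cases}
\]
```

Under this sketch the OLS estimate of $\beta_1$ is the mean change for the treatment group minus the mean change for the control group, which is the sample counterpart of the D-in-D expression written in terms of the group means.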

- To implement this need to have repeat observations on the same individuals
- May not have this – individuals observed pre- and post-treatment may be different
- What can we do in this case?

In this case can estimate….

- D-in-D estimate is estimate of β3 – why is this?
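The regression itself is missing from this transcript. Since the slide names β3 as the D-in-D coefficient, it is presumably a levels regression with a treatment dummy, a post-period dummy and their interaction. The following is a sketch under that assumption; the symbols $X_i$ and $T_t$ are hypothetical names for the two dummies.

```latex
\[
y_{it} = \beta_0 + \beta_1 X_i + \beta_2 T_t + \beta_3 X_i T_t + \varepsilon_{it}
\]
\[
\begin{aligned}
\mu_{00} &= \beta_0, &
\mu_{10} &= \beta_0 + \beta_1, \\
\mu_{01} &= \beta_0 + \beta_2, &
\mu_{11} &= \beta_0 + \beta_1 + \beta_2 + \beta_3
\end{aligned}
\]
\[
(\mu_{11} - \mu_{01}) - (\mu_{10} - \mu_{00})
  = (\beta_1 + \beta_3) - \beta_1 = \beta_3
\]
```

This suggests an answer to the question on the slide: the four coefficients fit the four group-by-period means exactly, and the interaction term picks up whatever part of the post-period gap is not already present pre-treatment.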

A Comparison of the Two Methods

- Where have repeated observations could use both methods
- Will give same parameter estimates
- But will give different standard errors
- ‘levels’ version will assume residuals are independent – unlikely to be a good assumption
- Can deal with this by:
- Clustering
- Or estimating ‘differences’ version
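The claim that the two methods give the same parameter estimates can be checked on a small example. This is a minimal sketch on made-up data; the `ols` helper is a hypothetical pure-Python solver written for this illustration, and it compares point estimates only, not standard errors.

```python
def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[a] * row[b] for row in X) for b in range(k)]
        + [sum(row[a] * yi for row, yi in zip(X, y))]
        for a in range(k)
    ]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [v - f * w for v, w in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Made-up balanced panel: (treated, y_pre, y_post) for six individuals.
panel = [
    (0, 10.0, 12.0), (0, 11.0, 12.5), (0, 9.0, 11.5),
    (1, 14.0, 19.0), (1, 15.0, 21.0), (1, 13.0, 18.5),
]

# 'Levels' version: y on constant, treated, post, treated * post.
X_lev, y_lev = [], []
for d, y0, y1 in panel:
    X_lev.append([1.0, float(d), 0.0, 0.0])
    y_lev.append(y0)
    X_lev.append([1.0, float(d), 1.0, float(d)])
    y_lev.append(y1)
beta3 = ols(X_lev, y_lev)[3]

# 'Differences' version: (y_post - y_pre) on constant and treated.
X_dif = [[1.0, float(d)] for d, _, _ in panel]
y_dif = [y1 - y0 for _, y0, y1 in panel]
beta1 = ols(X_dif, y_dif)[1]

print(round(beta3, 6), round(beta1, 6))  # 3.5 3.5
```

Both coefficients equal the mean change for the treated (5.5) minus the mean change for the controls (2.0). The levels regression uses twelve observations where the differences regression uses six, which is one way to see why the computed standard errors need not agree.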

Other Regressors

- Can put in other regressors as before
- Perhaps should think about way in which they enter the estimating equation
- E.g. if level of W affects level of y then should include ΔW in differences version

Differential Trends in Treatment and Control Groups

- Key assumption underlying validity of D-in-D estimate is that differences between treatment and control group would have remained constant in absence of treatment
- Can never test this
- With only two periods can get no idea of plausibility
- But can with more than two periods

An Example: “Vertical Relationships and Competition in Retail Gasoline Markets”, by Justine Hastings, American Economic Review, 2004

- Interested in effect of vertical integration on retail petrol prices
- Investigates take-over in CA of independent ‘Thrifty’ chain of petrol stations by ARCO (more integrated)
- Defines treatment group as petrol stations which had a ‘Thrifty’ within 1 mile
- Control group those that did not
- Lots of reasons why these groups might be different so D-in-D approach seems a good idea

This picture contains relevant information…

- Can see D-in-D estimate of +5c per gallon
- Also can see trends before and after change very similar – D-in-D assumption looks plausible

A Case which does not look so good…..Ashenfelter’s Dip

- Interested in effect of government-sponsored training (MDTA) on earnings
- Treatment group are those who received training in 1964
- Control group are random sample of population as a whole

Things to Note..

- Earnings for trainees very low in 1964 as they were in training, not working, in that year – should ignore this year
- Simple D-in-D approach would compare earnings in 1965 with 1963
- But earnings of trainees in 1963 seem to show a ‘dip’ – so D-in-D assumption probably not valid
- Probably because those who enter training are those who had a bad shock (e.g. job loss)

Differences-in-Differences: Summary

- A very useful and widespread approach
- Validity does depend on assumption that trends would have been the same in absence of treatment
- Can use other periods to see if this assumption is plausible or not
- Uses 2 observations on same individual – most rudimentary form of panel data

A Brief Introduction to Panel Data

- Panel Data has both time-series and cross-section dimension – N individuals over T periods
- Will restrict attention to balanced panels – same number of observations on each individual
- Whole books have been written about panel data, but the basics can be understood very simply and are not very different from what we have seen before
- Asymptotics typically done on large N, small T
- Use yit to denote variable for individual i at time t

The Pooled Model

- Can simply ignore panel nature of data and estimate:

$$y_{it} = \beta' x_{it} + \varepsilon_{it}$$

- This will be consistent if E(εit|xit)=0 or plim(X’ ε/N)=0
- But computed standard errors will only be consistent if errors uncorrelated across observations
- This is unlikely:
- Correlation between residuals of same individual in different time periods
- Correlation between residuals of different individuals in same time period (aggregate shocks)

A More Plausible Model

- Should recognise this as model with ‘group-level’ dummies or residuals
- Here, individual is a ‘group’
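The model itself is missing from this transcript. Judging by the θi that appears on the following slides, it is presumably the pooled model with an individual-specific term added; the exact form below is a reconstruction under that assumption.

```latex
\[
y_{it} = \beta' x_{it} + \theta_i + \varepsilon_{it}
\]
```

Here $\theta_i$ is common to all observations on individual $i$, which is what generates correlation between the residuals of the same individual in different periods.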

Three Models

- Fixed Effects Model
- Treats θi as parameter to be estimated (like β)
- Consistency does not require anything about correlation with xit
- Random Effects Model
- Treats θi as part of residual (like ε)
- Consistency does require no correlation between θi and xit
- Between-Groups Model
- Runs regression on averages for each individual

Proposition 5.2: The fixed effect estimator of β will be consistent if:

- E(εit|xit)=0
- Rank(X,D)=N+K
- Proof: Simple application of what you should know about linear regression model

Intuition

- First condition should be obvious – regressors uncorrelated with residuals
- Second condition requires regressors to be of full rank
- Main way in which this is likely to fail in fixed effects model is if some regressors vary only across individuals and not over time
- Such a variable perfectly multicollinear with individual fixed effect

Estimating the Fixed Effects Model

- Can estimate by ‘brute force’ - include separate dummy variable for every individual – but may be a lot of them
- Can also estimate in mean-deviation form:
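The mean-deviation equation is missing from this transcript. A sketch of the usual derivation, assuming the model has an individual effect $\theta_i$ that is constant over time and writing bars for averages over the $T$ periods of one individual:

```latex
\[
\begin{aligned}
y_{it} &= \beta' x_{it} + \theta_i + \varepsilon_{it} \\
\bar{y}_i &= \beta' \bar{x}_i + \theta_i + \bar{\varepsilon}_i,
  \qquad \bar{y}_i \equiv \frac{1}{T} \sum_{t=1}^{T} y_{it} \\
y_{it} - \bar{y}_i &= \beta' (x_{it} - \bar{x}_i)
  + (\varepsilon_{it} - \bar{\varepsilon}_i)
\end{aligned}
\]
```

Subtracting the second line from the first removes $\theta_i$, so $\beta$ can be estimated without estimating the individual effects.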

How does de-meaning work?

- Can do simple OLS on de-meaned variables
- Stata command looks like:

. xtreg y x, fe i(id)
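To make the de-meaning step concrete, here is a minimal sketch on made-up, noise-free data with a single regressor. It is not what `xtreg` does internally; the data, the true slope of 2 and the helper names are all assumptions chosen for illustration.

```python
# Made-up panel with no noise: y_it = 2 * x_it + theta_i.
# theta_i is deliberately correlated with x (larger x goes with smaller theta_i).
theta = {"a": 10.0, "b": 5.0, "c": 0.0}
x = {"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0], "c": [7.0, 8.0, 9.0]}
y = {i: [2.0 * v + theta[i] for v in x[i]] for i in x}


def demean(values):
    """Subtract the mean of the list from each element."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def slope(xs, ys):
    """OLS slope of ys on xs for variables that already have mean zero."""
    return sum(a * b for a, b in zip(xs, ys)) / sum(a * a for a in xs)


# Pooled OLS (with intercept): de-mean using the overall means.
all_x = [v for i in x for v in x[i]]
all_y = [v for i in y for v in y[i]]
beta_pooled = slope(demean(all_x), demean(all_y))

# Fixed effects: de-mean within each individual, then run OLS on the result.
x_within = [v for i in x for v in demean(x[i])]
y_within = [v for i in y for v in demean(y[i])]
beta_fe = slope(x_within, y_within)

print(round(beta_pooled, 6), round(beta_fe, 6))  # 0.5 2.0
```

In this example the pooled slope is 0.5 because θi is correlated with x, while the within estimator recovers the slope of 2 used to build the data.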

Problems with fixed effect estimator

- Only uses variation within individuals – sometimes called ‘within-group’ estimator
- This variation may be small part of total (so low precision) and more prone to measurement error (so more attenuation bias)
- Cannot use it to estimate effect of regressor that is constant for an individual

Random Effects Estimator

- Treats θi as part of residual (like ε)
- Consistency does require no correlation between θi and xit
- Should recognise as like model with clustered standard errors
- But random effects estimator is feasible GLS estimator
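The estimator formula is not in this transcript. As a sketch, the standard feasible GLS form is shown below, together with the error covariance matrix implied by a composite error $\theta_i + \varepsilon_{it}$. The variance symbols $\sigma_\theta^2$ and $\sigma_\varepsilon^2$, and the assumption that $\varepsilon_{it}$ is homoskedastic and serially uncorrelated, are additions made here.

```latex
\[
\hat{\beta}_{RE}
  = \left( X' \hat{\Omega}^{-1} X \right)^{-1} X' \hat{\Omega}^{-1} y,
\qquad
\Omega = I_N \otimes
  \left( \sigma_\varepsilon^2 I_T + \sigma_\theta^2 \, \iota_T \iota_T' \right)
\]
```

Here $\iota_T$ is a $T \times 1$ vector of ones, so each individual's block of $\Omega$ has $\sigma_\theta^2 + \sigma_\varepsilon^2$ on the diagonal and $\sigma_\theta^2$ off it.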

More on RE Estimator

- Will not describe how we compute Ω-hat – see Wooldridge
- Stata command:

. xtreg y x, re i(id)

Proposition 5.3: The random effects estimator of β will be consistent if:

- a. E(εit|xi1,..xit,.. xiT)=0
- b. E(θi|xi1,..xit,.. xiT)=0
- c. Rank(X’Ω⁻¹X)=K
- Proof: RE estimator is a special case of the feasible GLS estimator so conditions for consistency are the same.
- Error has two components so need a. and b.

Comments

- Assumption about exogeneity of errors is stronger than for FE model – need to assume εit uncorrelated with whole history of x – this is called strong exogeneity
- Assumption about rank condition weaker than for FE model, e.g. can estimate effect of variables that are constant for a given individual

Another reason why may prefer RE to FE model

- If exogeneity assumptions are satisfied RE estimate will be more efficient than FE estimator
- Application of general principle that imposing true restriction on data leads to efficiency gain.

Another Useful Result

- Can show that RE estimator can be thought of as an OLS regression of:
- On:
- Where:
- This is sometimes called quasi-time demeaning
- See Wooldridge (ch10, pp286-7) if want to know more
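The three expressions on this slide are missing from the transcript. The textbook form of the result, written with a weight $\lambda$ and variance symbols that are assumptions here rather than the slide's own notation, is:

```latex
\[
y_{it} - \lambda \bar{y}_i
\quad \text{on} \quad
x_{it} - \lambda \bar{x}_i,
\qquad \text{where} \qquad
\lambda = 1 - \left(
  \frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T \sigma_\theta^2}
\right)^{1/2}
\]
```

Under this form, $\lambda = 0$ (no individual effect, $\sigma_\theta^2 = 0$) gives the pooled regression, and $\lambda \to 1$ (as $T \sigma_\theta^2$ becomes large relative to $\sigma_\varepsilon^2$) approaches the fixed effects regression on fully de-meaned data.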

Between-Groups Estimator

- This takes individual means and estimates the regression of the mean of y on the mean of x for each individual by OLS
- Stata command is xtreg y x, be i(id)
- Condition for consistency the same as for RE estimator
- But BE estimator less efficient as does not exploit variation in regressors for a given individual
- And cannot estimate variables like time trends whose average values do not vary across individuals
- So why would anyone ever use it – let's think about measurement error

Measurement Error in Panel Data Models

- Assume true model is:
- Where x is one-dimensional
- Assume E(εit|xi1,..xit,.. xiT)=0 and E(θi|xi1,..xit,.. xiT)=0 so that RE and BE estimators are consistent

Measurement Error Model

- Assume:
- where uit is classical measurement error, x*i is the average value of x* for individual i, and ηit is variation around the true value, which is assumed to be iid and uncorrelated with x*i and uit.
- We know this measurement error is likely to cause attenuation bias but this will vary between FE, RE and BE estimators.
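The equations for this slide and the previous one are missing from the transcript. Going by the verbal description (a one-dimensional true regressor $x^*$, an observed regressor with classical error $u_{it}$, and variation $\eta_{it}$ around an individual average), a reconstruction would be the following; the absence of an intercept is an assumption.

```latex
\[
\begin{aligned}
y_{it} &= \beta x^*_{it} + \theta_i + \varepsilon_{it}
  && \text{(true model)} \\
x_{it} &= x^*_{it} + u_{it}
  && \text{(observed regressor)} \\
x^*_{it} &= x^*_i + \eta_{it}
  && \text{(variation around the individual average)}
\end{aligned}
\]
```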

Proposition 5.4

- For FE model we have:
- For BE model we have:
- For RE model we have:
- Where:
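The formulas in this proposition did not survive in the transcript. For the FE and BE cases the probability limits can be derived directly from the measurement error assumptions, and the sketch below evaluates them. The variance names are hypothetical, and the expressions may differ in notation from the original slide. The RE case is left out because it depends on a weight whose definition is not recoverable here.

```python
def attenuation_fe(var_eta, var_u):
    """plim(beta_hat_FE) / beta.

    De-meaning removes x*_i, so only the within variation eta (signal)
    and the measurement error u (noise) remain.
    """
    return var_eta / (var_eta + var_u)


def attenuation_be(var_xstar_i, var_eta, var_u, T):
    """plim(beta_hat_BE) / beta.

    Averaging over T periods keeps Var(x*_i) in full but divides the
    variances of eta and u by T.
    """
    signal = var_xstar_i + var_eta / T
    return signal / (signal + var_u / T)


# Illustrative values only: all variances equal to 1, T = 4 periods.
fe = attenuation_fe(var_eta=1.0, var_u=1.0)
be = attenuation_be(var_xstar_i=1.0, var_eta=1.0, var_u=1.0, T=4)
print(fe)            # 0.5
print(round(be, 4))  # 0.8333
```

A factor of 1 means no attenuation. With these illustrative numbers the FE estimate converges to half the true coefficient, while the BE estimate converges to about 83% of it.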

What should we learn from this?

- All rather complicated – don’t worry too much about details
- But intuition is simple
- Attenuation bias largest for FE estimator – Var(x*) does not appear in denominator – FE estimator does not use this variation in data

Attenuation bias larger for RE than BE estimator as T>1>κ

- The averaging in the BE estimator reduces the importance of measurement error.
- Important to note that these results are dependent on the particular assumption about the measurement error process and the nature of the variation in xit – things would be very different if measurement error for a given individual did not vary over time
- But general point is that measurement error considerations could affect choice of model to estimate with panel data

Time Effects

- Have treated time and individual dimensions asymmetrically – no good reason for this
- Errors likely to be correlated for different individuals in same time period – most common way to deal with this is to include set of time dummies:
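The equation with time dummies is missing from this transcript. A sketch of the usual two-way form, where the symbol $\delta_t$ for the period effect is an assumption:

```latex
\[
y_{it} = \beta' x_{it} + \theta_i + \delta_t + \varepsilon_{it}
\]
```

The $\delta_t$ terms absorb any shock common to all individuals in period $t$, in the same way that $\theta_i$ absorbs anything constant over time for individual $i$.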

Estimating Fixed Effects Model in Differences

- Can also get rid of fixed effect by differencing:
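The differenced equation is missing from this transcript. Assuming a model with a time-invariant individual effect $\theta_i$, subtracting the previous period's equation gives:

```latex
\[
y_{it} - y_{i,t-1}
  = \beta' (x_{it} - x_{i,t-1}) + (\varepsilon_{it} - \varepsilon_{i,t-1})
\]
```

As with de-meaning, $\theta_i$ drops out because it is the same in both periods.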

Comparison of two methods

- Estimate parameters by OLS on differenced data
- If only 2 observations then get same estimates as ‘de-meaning’ method
- But standard errors different
- Why?: assumption about autocorrelation in residuals
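The two-period equivalence of the point estimates can be checked numerically. This is a minimal sketch on made-up data with one regressor and no time dummies; the helper name is hypothetical, and it says nothing about standard errors.

```python
# Made-up two-period panel: (x_i0, y_i0, x_i1, y_i1) for four individuals.
panel = [
    (1.0, 3.0, 2.0, 6.0),
    (2.0, 1.0, 4.0, 4.5),
    (0.5, 2.0, 3.0, 8.0),
    (3.0, 7.0, 3.5, 7.5),
]


def slope_no_intercept(xs, ys):
    """OLS slope of ys on xs through the origin."""
    return sum(a * b for a, b in zip(xs, ys)) / sum(a * a for a in xs)


# Differencing: one observation per individual.
dx = [x1 - x0 for x0, _, x1, _ in panel]
dy = [y1 - y0 for _, y0, _, y1 in panel]
beta_fd = slope_no_intercept(dx, dy)

# De-meaning: two observations per individual, each minus the individual mean.
x_dm, y_dm = [], []
for x0, y0, x1, y1 in panel:
    x_bar, y_bar = (x0 + x1) / 2, (y0 + y1) / 2
    x_dm += [x0 - x_bar, x1 - x_bar]
    y_dm += [y0 - y_bar, y1 - y_bar]
beta_fe = slope_no_intercept(x_dm, y_dm)

print(abs(beta_fd - beta_fe) < 1e-12)  # True
```

With two periods each de-meaned value is plus or minus half the first difference, so the two slopes coincide algebraically. With more periods they generally differ. De-meaning is the natural choice if εit is serially uncorrelated, and differencing if εit behaves like a random walk, so that its first difference is serially uncorrelated.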

This leads to time series…

- Which is ‘better’ depends on which assumption is right – how can we decide this?
- We are not going to cover this in this course.
