Differences-in-Differences and A Brief Introduction to Panel Data

John Snow again…

The Grand Experiment

  • Water supplied to households by competing private companies

  • Sometimes different companies supplied households in same street

  • In south London two main companies:

    • Lambeth Company (water supply from Thames Ditton, 22 miles upstream)

    • Southwark and Vauxhall Company (water supply from Thames)

In 1853/54 cholera outbreak

  • Death Rates per 10000 people by water company

    • Lambeth: 10

    • Southwark and Vauxhall: 150

  • Might be water but perhaps other factors

  • Snow compared death rates in 1849 epidemic

    • Lambeth: 150

    • Southwark and Vauxhall: 125

  • In 1852 Lambeth Company had changed supply from Hungerford Bridge

What would be good estimate of effect of clean water?

This is basic idea of Differences-in-Differences

  • Have already seen idea of using differences to estimate causal effects

    • Treatment/control groups in experimental data

  • Often would like to find ‘treatment’ and ‘control’ group who can be assumed to be similar in every way except receipt of treatment

  • This may be very difficult to do

A Weaker Assumption is..

  • Assume that, in absence of treatment, difference between ‘treatment’ and ‘control’ group is constant over time

  • With this assumption can use observations on treatment and control group pre- and post-treatment to estimate causal effect

  • Idea

    • Difference pre-treatment is ‘normal’ difference

    • Difference post-treatment is ‘normal’ difference + causal effect

    • Difference-in-difference is causal effect
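The idea above can be applied to the cholera death rates quoted earlier. The following is a minimal sketch, assuming Lambeth is treated as the 'treatment' group (it moved its intake in 1852) and Southwark and Vauxhall as the 'control' group; the function names `gap` and `diff_in_diff` are illustrative choices, not part of the original slides.

```python
# Death rates per 10,000 people by water company, as quoted in the slides.
death_rates = {
    ("Lambeth", "1849"): 150,
    ("Southwark and Vauxhall", "1849"): 125,
    ("Lambeth", "1853/54"): 10,
    ("Southwark and Vauxhall", "1853/54"): 150,
}


def gap(rates, treated, control, period):
    """Treated-minus-control difference in a single period."""
    return rates[(treated, period)] - rates[(control, period)]


def diff_in_diff(rates, treated, control, pre, post):
    """Post-period gap minus the pre-period ('normal') gap."""
    return gap(rates, treated, control, post) - gap(rates, treated, control, pre)


# Assumption: Lambeth is the 'treatment' group, Southwark and Vauxhall the 'control'.
naive_estimate = gap(death_rates, "Lambeth", "Southwark and Vauxhall", "1853/54")
did_estimate = diff_in_diff(
    death_rates, "Lambeth", "Southwark and Vauxhall", pre="1849", post="1853/54"
)

print("Simple post-period difference:", naive_estimate)
print("Difference-in-differences:", did_estimate)
```

The simple difference uses only the 1853/54 figures, while the difference-in-differences also subtracts the gap that already existed in 1849, before Lambeth changed its supply.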

A Graphical Representation

What is D-in-D estimate?

  • Standard differences estimator is AB

  • But ‘normal’ difference estimated as CB

  • Hence D-in-D estimate is AC

  • Note: assumes trends in outcome variables the same for treatment and control groups

  • This is not testable

  • With two periods can get no idea of plausibility, but can with more periods

Some Notation

  • Define:

    μit = E(yit)

    Where i=0 is control group, i=1 is treatment group

    Where t=0 is pre-period, t=1 is post-period

  • Standard ‘differences’ estimate of causal effect is estimate of:

    μ11 - μ01

  • ‘Differences-in-Differences’ estimate of causal effect is estimate of:

    (μ11 - μ01) - (μ10 - μ00)

How to estimate?

  • Can write D-in-D estimate as:

    (μ11 - μ10) - (μ01 - μ00)

  • This is simply the difference in the change of treatment and control groups so can estimate as:

  • This is simply ‘differences’ estimator applied to the difference

  • To implement this need to have repeat observations on the same individuals

  • May not have this – individuals observed pre- and post-treatment may be different

  • What can we do in this case?

In this case can estimate….

  • D-in-D estimate is estimate of β3 – why is this?
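The slide's own equation did not survive in this transcript. One standard specification that fits the reference to β3 is sketched below; the dummy names X_i (1 for the treatment group) and T_t (1 for the post-period) are assumptions introduced here for illustration. Working out the four group means shows why the coefficient on the interaction is the D-in-D estimate.

```latex
y_{it} = \beta_0 + \beta_1 X_i + \beta_2 T_t + \beta_3 X_i T_t + \varepsilon_{it}

\mu_{00} = \beta_0, \qquad
\mu_{01} = \beta_0 + \beta_2, \qquad
\mu_{10} = \beta_0 + \beta_1, \qquad
\mu_{11} = \beta_0 + \beta_1 + \beta_2 + \beta_3

(\mu_{11} - \mu_{10}) - (\mu_{01} - \mu_{00})
  = (\beta_2 + \beta_3) - \beta_2
  = \beta_3
```

Because this regression only needs group and period indicators, it can be run on repeated cross-sections where different individuals are observed before and after treatment.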

A Comparison of the Two Methods

  • Where have repeated observations could use both methods

  • Will give same parameter estimates

  • But will give different standard errors

  • ‘levels’ version will assume residuals are independent – unlikely to be a good assumption

  • Can deal with this by:

    • Clustering

    • Or estimating ‘differences’ version

Other Regressors

  • Can put in other regressors as before

  • Perhaps should think about way in which they enter the estimating equation

  • E.g. if level of W affects level of y then should include ΔW in differences version

Differential Trends in Treatment and Control Groups

  • Key assumption underlying validity of D-in-D estimate is that differences between treatment and control group would have remained constant in absence of treatment

  • Can never test this

  • With only two periods can get no idea of plausibility

  • But can with more than two periods
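With more than two periods, one informal check is to see whether the treatment-control gap was roughly constant before treatment. The sketch below uses made-up group means purely for illustration (they are not taken from any study), and the variable names are hypothetical.

```python
from statistics import mean

# Hypothetical group means by period (made-up numbers, not from any study).
# Assumption: treatment starts in the final period only.
treatment_means = [10.0, 11.0, 12.0, 13.0, 19.0]
control_means = [8.0, 9.0, 10.0, 11.0, 12.0]
n_pre_periods = 4

# Treatment-minus-control gap in every period.
gaps = [t - c for t, c in zip(treatment_means, control_means)]
pre_gaps, post_gaps = gaps[:n_pre_periods], gaps[n_pre_periods:]

# A small spread in the pre-period gaps makes the constant-difference
# assumption look more plausible; it does not prove it.
pre_gap_spread = max(pre_gaps) - min(pre_gaps)

# D-in-D using the average pre-period gap as the 'normal' difference.
did_estimate = mean(post_gaps) - mean(pre_gaps)

print("Pre-period gaps:", pre_gaps)
print("Spread of pre-period gaps:", pre_gap_spread)
print("D-in-D estimate:", did_estimate)
```

A large or trending spread in the pre-period gaps would be a warning sign of the kind discussed in the examples that follow.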

An Example: “Vertical Relationships and Competition in Retail Gasoline Markets”, by Justine Hastings, American Economic Review, 2004

  • Interested in effect of vertical integration on retail petrol prices

  • Investigates take-over in CA of independent ‘Thrifty’ chain of petrol stations by ARCO (more integrated)

  • Defines treatment group as petrol stations which had a ‘Thrifty’ within 1 mile

  • Control group those that did not

  • Lots of reasons why these groups might be different so D-in-D approach seems a good idea

This picture contains relevant information…

  • Can see D-in-D estimate of +5c per gallon

  • Also can see trends before and after change very similar, so D-in-D assumption looks plausible (though it can never be tested directly)

A Case which does not look so good… Ashenfelter’s Dip

  • Interested in effect of government-sponsored training (MDTA) on earnings

  • Treatment group are those who received training in 1964

  • Control group are random sample of population as a whole

Earnings for period 1959-69

Things to Note..

  • Earnings for trainees very low in 1964 as they were in training, not working, in that year – should ignore this year

  • Simple D-in-D approach would compare earnings in 1965 with 1963

  • But earnings of trainees in 1963 seem to show a ‘dip’ – so D-in-D assumption probably not valid

  • Probably because those who enter training are those who had a bad shock (e.g. job loss)

Differences-in-Differences: Summary


  • A very useful and widespread approach

  • Validity does depend on assumption that trends would have been the same in absence of treatment

  • Can use other periods to see if this assumption is plausible or not

  • Uses 2 observations on same individual – most rudimentary form of panel data

A Brief Introduction to Panel Data

  • Panel Data has both time-series and cross-section dimension – N individuals over T periods

  • Will restrict attention to balanced panels – same number of observations on each individual

  • Whole books written about it, but basics can be understood very simply and are not very different from what we have seen before

  • Asymptotics typically done on large N, small T

  • Use yit to denote variable for individual i at time t

The Pooled Model

  • Can simply ignore panel nature of data and estimate:

    yit = β’xit + εit

  • This will be consistent if E(εit|xit)=0 or plim(X’ ε/N)=0

  • But computed standard errors will only be consistent if errors uncorrelated across observations

  • This is unlikely:

    • Correlation between residuals of same individual in different time periods

    • Correlation between residuals of different individuals in same time period (aggregate shocks)

A More Plausible Model

  • yit = β’xit + θi + εit

  • Should recognise this as model with ‘group-level’ dummies or residuals

  • Here, individual is a ‘group’

Three Models

  • Fixed Effects Model

    • Treats θi as parameter to be estimated (like β)

    • Consistency does not require anything about correlation with xit

  • Random Effects Model

    • Treats θi as part of residual (like εit)

    • Consistency does require no correlation between θi and xit

  • Between-Groups Model

    • Runs regression on averages for each individual

Proposition 5.2: The fixed effect estimator of β will be consistent if:

  • E(εit|xit)=0

  • Rank(X,D)=N+K

  • Proof: Simple application of what you should know about linear regression model



  • First condition should be obvious – regressors uncorrelated with residuals

  • Second condition requires regressors to be of full rank

  • Main way in which this is likely to fail in fixed effects model is if some regressors vary only across individuals and not over time

  • Such a variable perfectly multicollinear with individual fixed effect

Estimating the Fixed Effects Model

  • Can estimate by ‘brute force’ - include separate dummy variable for every individual – but may be a lot of them

  • Can also estimate in mean-deviation form:
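The mean-deviation form is not shown in this transcript. A sketch of the usual derivation follows, assuming the model with an individual effect θi used in these slides and writing a bar for an individual's average over the T periods (the bar notation is an assumption here). Averaging the model over time for each individual and subtracting removes θi.

```latex
y_{it} = \beta' x_{it} + \theta_i + \varepsilon_{it}

\bar{y}_i = \beta' \bar{x}_i + \theta_i + \bar{\varepsilon}_i,
\qquad \bar{y}_i = \frac{1}{T} \sum_{t=1}^{T} y_{it}

y_{it} - \bar{y}_i = \beta' (x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)
```

Any regressor that is constant over time for an individual has x_it equal to its mean, so it drops out of the last equation along with θi.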

How does de-meaning work?

  • Can do simple OLS on de-meaned variables

  • STATA command is like:

    . xtreg y x, fe i(id)
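To make the de-meaning step concrete, here is a small pure-Python sketch with one regressor. The data are constructed for illustration (y = 2x + θi with no noise, and θi rising with x), and the helper names `ols_slope` and `demean_by_group` are hypothetical. It is not a substitute for the Stata command above and computes no standard errors.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


def demean_by_group(ids, values):
    """Subtract each individual's own mean from its observations."""
    totals, counts = defaultdict(float), defaultdict(int)
    for i, v in zip(ids, values):
        totals[i] += v
        counts[i] += 1
    return [v - totals[i] / counts[i] for i, v in zip(ids, values)]


# Constructed panel: 3 individuals, 3 periods, true slope 2.
# Assumption for illustration: the individual effect is correlated with x.
ids = [1, 1, 1, 2, 2, 2, 3, 3, 3]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
theta = {1: 0.0, 2: 10.0, 3: 20.0}
y = [2.0 * xv + theta[i] for i, xv in zip(ids, x)]

pooled_estimate = ols_slope(x, y)  # ignores the individual effects
within_estimate = ols_slope(demean_by_group(ids, x), demean_by_group(ids, y))

print("Pooled OLS slope:", pooled_estimate)
print("Within (fixed effects) slope:", within_estimate)
```

In this constructed example the pooled slope is pushed away from 2 because θi is correlated with x, while the regression on de-meaned data recovers the slope used to build y.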

Problems with fixed effect estimator

  • Only uses variation within individuals – sometimes called ‘within-group’ estimator

  • This variation may be small part of total (so low precision) and more prone to measurement error (so more attenuation bias)

  • Cannot use it to estimate effect of regressor that is constant for an individual

Random Effects Estimator

  • Treats θi as part of residual (like εit)

  • Consistency does require no correlation between θi and xit

  • Should recognise as like model with clustered standard errors

  • But random effects estimator is feasible GLS estimator

More on RE Estimator

  • Will not describe how we compute Ω-hat – see Wooldridge

  • STATA command

    . xtreg y x, re i(id)

Proposition 5.3: The random effects estimator of β will be consistent if:

  • a. E(εit|xi1,..xit,.. xiT)=0

  • b. E(θi|xi1,..xit,.. xiT)=0

  • c. Rank(X’Ω⁻¹X)=K

  • Proof: RE estimator a special case of the feasible GLS estimator so conditions for consistency are the same.

  • Error has two components so need a. and b.



  • Assumption about exogeneity of errors is stronger than for FE model – need to assume εit uncorrelated with whole history of x – this is called strict exogeneity

  • Assumption about rank condition weaker than for FE model e.g. can estimate effect of variables that are constant for a given individual

Another reason why may prefer RE to FE model

  • If exogeneity assumptions are satisfied RE estimate will be more efficient than FE estimator

  • Application of general principle that imposing true restriction on data leads to efficiency gain.

Another Useful Result

  • Can show that RE estimator can be thought of as an OLS regression of:

  • On:

  • Where:

  • This is sometimes called quasi-time demeaning

  • See Wooldridge (ch10, pp286-7) if want to know more
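The expressions for this result are missing from the transcript. The textbook form of quasi-time demeaning is sketched below; the symbol λ and the variance symbols are labels assumed here and may differ from the notation on the original slide.

```latex
\text{Regress} \quad y_{it} - \lambda \bar{y}_i
\quad \text{on} \quad x_{it} - \lambda \bar{x}_i

\lambda = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T \sigma_\theta^2}}
```

With λ = 0 this is the pooled regression and with λ = 1 it is the fixed effects (fully de-meaned) regression, so the random effects estimator sits between the two. In practice λ is replaced by an estimate, which is what makes the estimator feasible GLS.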

Between-Groups Estimator

  • This takes individual means and estimates the regression by OLS:

  • Stata command is xtreg y x, be i(id)

  • Condition for consistency the same as for RE estimator

  • But BE estimator less efficient as does not exploit variation in regressors for a given individual

  • And cannot estimate variables like time trends whose average values do not vary across individuals

  • So why would anyone ever use it – let’s think about measurement error

Measurement Error in Panel Data Models

  • Assume true model is:

  • Where x is one-dimensional

  • Assume E(εit|xi1,..xit,.. xiT)=0 and E(θi|xi1,..xit,.. xiT)=0 so that RE and BE estimators are consistent

Measurement Error Model

  • Assume:

    xit = x*it + uit

    x*it = x*i + ηit

  • where uit is classical measurement error, x*i is average value of x* for individual i and ηit is variation around the true value which is assumed to be uncorrelated with x*i and uit and iid.

  • We know this measurement error is likely to cause attenuation bias but this will vary between FE, RE and BE estimators.

Proposition 5.4

  • For FE model we have:

  • For BE model we have:

  • For RE model we have:

  • Where:

What should we learn from this?

  • All rather complicated – don’t worry too much about details

  • But intuition is simple

  • Attenuation bias largest for FE estimator – Var(x*) does not appear in denominator – FE estimator does not use this variation in data

  • Attenuation bias larger for RE than BE estimator as T>1>κ

  • The averaging in the BE estimator reduces the importance of measurement error.

  • Important to note that these results are dependent on the particular assumption about the measurement error process and the nature of the variation in xit – things would be very different if measurement error for a given individual did not vary over time

  • But general point is that measurement error considerations could affect choice of model to estimate with panel data
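The ordering of attenuation bias between the FE and BE estimators can be illustrated with a small simulation. This is a sketch under assumptions chosen here, not the slides' own calculation: true slope 1, all variance components set to 1, θi independent of x, N = 2000, T = 5 and a fixed random seed. The function names are hypothetical.

```python
import random
from collections import defaultdict


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


def group_means(ids, values):
    """Mean of values for each individual id."""
    totals, counts = defaultdict(float), defaultdict(int)
    for i, v in zip(ids, values):
        totals[i] += v
        counts[i] += 1
    return {i: totals[i] / counts[i] for i in totals}


def fe_slope(ids, x, y):
    """Within estimator: OLS on individually de-meaned data."""
    x_bar, y_bar = group_means(ids, x), group_means(ids, y)
    x_dm = [v - x_bar[i] for i, v in zip(ids, x)]
    y_dm = [v - y_bar[i] for i, v in zip(ids, y)]
    return ols_slope(x_dm, y_dm)


def be_slope(ids, x, y):
    """Between estimator: OLS on individual means."""
    x_bar, y_bar = group_means(ids, x), group_means(ids, y)
    keys = sorted(x_bar)
    return ols_slope([x_bar[k] for k in keys], [y_bar[k] for k in keys])


def simulate(n_individuals=2000, n_periods=5, beta=1.0, seed=0):
    """Panel with classical measurement error in x (all variances set to 1)."""
    rng = random.Random(seed)
    ids, x_obs, y = [], [], []
    for i in range(n_individuals):
        x_star_i = rng.gauss(0, 1)  # individual-level component of true x
        theta_i = rng.gauss(0, 1)   # individual effect, independent of x
        for _ in range(n_periods):
            x_star = x_star_i + rng.gauss(0, 1)     # add eta_it
            ids.append(i)
            x_obs.append(x_star + rng.gauss(0, 1))  # add measurement error u_it
            y.append(beta * x_star + theta_i + rng.gauss(0, 1))
    return ids, x_obs, y


true_beta = 1.0
ids, x_obs, y = simulate(beta=true_beta)
fe_estimate = fe_slope(ids, x_obs, y)
be_estimate = be_slope(ids, x_obs, y)

print("True slope:", true_beta)
print("FE estimate:", fe_estimate)
print("BE estimate:", be_estimate)
```

Under these assumptions both estimates fall below the true slope, and the FE estimate is attenuated more than the BE estimate, because de-meaning discards the between-individual variation in the true regressor while averaging shrinks the measurement error.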

Time Effects

  • Have treated time and individual dimensions asymmetrically – no good reason for this

  • Errors likely to be correlated for different individuals in same time period – most common way to deal with this is to include set of time dummies:

Estimating Fixed Effects Model in Differences

  • Can also get rid of fixed effect by differencing:

Comparison of two methods

  • Estimate parameters by OLS on differenced data

  • If only 2 observations then get same estimates as ‘de-meaning’ method

  • But standard errors different

  • Why?: assumption about autocorrelation in residuals
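The claim that differencing and de-meaning give the same estimates with only two observations per individual can be checked numerically. The sketch below uses a made-up two-period panel and a single regressor; neither regression includes a constant or time dummy, which is an assumption of this illustration.

```python
def slope_through_origin(x, y):
    """OLS slope from a regression of y on x with no intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Hypothetical two-period panel: (x, y) in period 0 and period 1 per individual.
panel = {
    "a": [(1.0, 2.1), (2.0, 3.9)],
    "b": [(2.0, 9.0), (5.0, 15.2)],
    "c": [(3.0, 1.0), (3.5, 2.3)],
    "d": [(4.0, 6.0), (7.0, 11.5)],
}

dx, dy = [], []      # first differences, one per individual
x_dm, y_dm = [], []  # de-meaned observations, two per individual
for (x0, y0), (x1, y1) in panel.values():
    dx.append(x1 - x0)
    dy.append(y1 - y0)
    x_bar, y_bar = (x0 + x1) / 2, (y0 + y1) / 2
    x_dm += [x0 - x_bar, x1 - x_bar]
    y_dm += [y0 - y_bar, y1 - y_bar]

fd_estimate = slope_through_origin(dx, dy)
fe_estimate = slope_through_origin(x_dm, y_dm)

print("First-difference estimate:", fd_estimate)
print("De-meaned (FE) estimate:", fe_estimate)
```

With T = 2 each de-meaned observation is plus or minus half the first difference, so the two slopes coincide. Adding a constant to the differenced regression would correspond to adding a period dummy to the de-meaned one.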

What Are these assumptions?

  • For de-meaned model:

  • For differenced model:

  • These are not consistent:
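The formulas for this slide are missing from the transcript. A sketch of the usual argument follows, assuming the default standard errors in each case treat that model's own residuals as serially uncorrelated with constant variance σ² (the notation is assumed here).

```latex
\text{De-meaned model: } \operatorname{Cov}(\varepsilon_{it}, \varepsilon_{is}) = 0
\quad \text{for } t \neq s

\text{Differenced model: } \operatorname{Cov}(\Delta\varepsilon_{it}, \Delta\varepsilon_{is}) = 0
\quad \text{for } t \neq s

\text{If the first holds, then }
\operatorname{Cov}(\Delta\varepsilon_{it}, \Delta\varepsilon_{i,t-1})
  = \operatorname{Cov}(\varepsilon_{it} - \varepsilon_{i,t-1},\; \varepsilon_{i,t-1} - \varepsilon_{i,t-2})
  = -\sigma^2 \neq 0
```

So serially uncorrelated errors in levels imply negatively correlated errors in differences, and the two assumptions cannot both hold unless the variance is zero.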

This leads to time series…

  • Which is ‘better’ depends on which assumption is right – how can we decide this?

  • We are not going to cover this in this course.
