Difference-in-Difference Development Workshop
Typical problem in proving causal effects • Using differences to estimate causal effects in experimental data (treatment and control groups) • Ideally, ‘treatment’ and ‘control’ groups can be assumed to be similar in every way except receipt of treatment • This may be very difficult to achieve
A Weaker Assumption… • In the absence of treatment, the difference between ‘treatment’ and ‘control’ groups is constant over time • With this assumption we can use observations on the treatment and control groups pre- and post-treatment to estimate the causal effect • Idea: • Difference pre-treatment is the ‘normal’ difference • Difference post-treatment is the ‘normal’ difference + causal effect • Difference-in-difference is the causal effect
Graphically… [Figure: outcome y over time (pre- and post-treatment) for the treatment and control groups, with points A, B and C marked]
What is the D-in-D estimate? • The standard differences estimator is AB (treatment minus control, post-treatment) • But the ‘normal’ difference is estimated as CB • Hence the D-in-D estimate is AC • Note: assumes trends in the outcome variable are the same for treatment and control groups • This is not testable with only two periods • Observations in both periods (before and after) are crucial
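In terms of group means (the notation here is assumed, not from the slides: T for treatment, C for control, bars for sample means), the estimate read off the graph can be written as below. The first bracket is the post-treatment gap AB; the second is the pre-treatment gap, which under the common-trends assumption equals CB. Rearranging shows the same number is the treatment group's change minus the control group's change.

```latex
\hat{\delta}_{\text{DiD}}
  = \underbrace{\left(\bar{y}_{T,\text{post}} - \bar{y}_{C,\text{post}}\right)}_{AB}
  - \underbrace{\left(\bar{y}_{T,\text{pre}} - \bar{y}_{C,\text{pre}}\right)}_{CB\ \text{(under common trends)}}
  = \left(\bar{y}_{T,\text{post}} - \bar{y}_{T,\text{pre}}\right)
  - \left(\bar{y}_{C,\text{post}} - \bar{y}_{C,\text{pre}}\right)
```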
The Grand Experiment (Snow) • Water supplied to households by competing private companies • Sometimes different companies supplied households in the same street • In south London, two main companies: • Lambeth Company (water supply from Thames Ditton, 22 miles upstream) • Southwark and Vauxhall Company (water supply from the Thames)
In the 1853/54 cholera outbreak • Death rates per 10,000 people, by water company: • Lambeth 10 • Southwark and Vauxhall 150 • Might be the water, but perhaps other factors • Snow compared death rates in the 1849 epidemic: • Lambeth 150 • Southwark and Vauxhall 125 • In 1852 the Lambeth Company had moved its supply from Hungerford Bridge to Thames Ditton
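As a quick check of the arithmetic, here is a minimal Python sketch (the workshop itself uses Stata) that applies the D-in-D logic to Snow's death rates as quoted above, treating Lambeth (which changed its source) as the treated group and Southwark and Vauxhall as the control. The dictionary layout and the `change` helper are illustrative choices, not part of the original material.

```python
# Cholera death rates per 10,000 people, as quoted on the slide.
# Lambeth changed its water source in 1852 ("treated");
# Southwark and Vauxhall did not ("control").
rates = {
    "Lambeth": {"1849": 150, "1853/54": 10},
    "Southwark and Vauxhall": {"1849": 125, "1853/54": 150},
}


def change(company):
    """Change in death rate from the 1849 to the 1853/54 epidemic."""
    return rates[company]["1853/54"] - rates[company]["1849"]


# Simple post-treatment difference: treated minus control, 1853/54 only
post_diff = rates["Lambeth"]["1853/54"] - rates["Southwark and Vauxhall"]["1853/54"]

# Difference-in-difference: treated change minus control change
dind = change("Lambeth") - change("Southwark and Vauxhall")

print(post_diff)  # -140
print(dind)       # -165
```

The D-in-D estimate (−165 deaths per 10,000) is larger in magnitude than the simple post-treatment difference (−140) because in 1849 Lambeth's death rate was actually 25 higher than Southwark and Vauxhall's.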
Card and Krueger (1994) • Basic microeconomic theory of the firm: factor demand curves slope downwards. • Hence, if minimum wages are binding, we would expect employment to fall if the minimum wage is raised. • Natural experiment: New Jersey raised its minimum wage from $4.25 to $5.05 on 1 April 1992, while the minimum wage in neighbouring Pennsylvania remained unchanged. • Data: wages and employment in 65 fast-food restaurants in Pennsylvania and 284 in New Jersey in Feb/March 1992 (i.e. before the rise in the NJ minimum wage) and in Nov/Dec 1992 (i.e. after the rise). • Difference-in-difference design to investigate the impact of minimum wages on employment.
What data do we have? • 698 observations (349 restaurants × 2 periods) • sheet: an identifier for each restaurant (each has two observations, pre- and post-) • nj: dummy for whether a NJ restaurant • after: dummy for whether a post- observation • njafter: nj*after (the interaction) • fte: full-time equivalent employment • dfte: change in full-time equivalent employment
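To make the variable definitions concrete, the following Python sketch builds `njafter` and a per-restaurant `dfte` from a long-format panel laid out like the workshop data. The five restaurants and their fte values are hypothetical toy numbers, not the Card and Krueger sample, and storing one `dfte` value per restaurant is an assumption about how the change is defined.

```python
# Hypothetical toy panel in the same long layout as the workshop data:
# one row per restaurant ("sheet") per period. NOT the Card-Krueger data.
rows = [
    # (sheet, nj, after, fte)
    (1, 1, 0, 20.0), (1, 1, 1, 21.0),
    (2, 1, 0, 18.0), (2, 1, 1, 20.0),
    (3, 1, 0, 22.0), (3, 1, 1, 22.0),
    (4, 0, 0, 23.0), (4, 0, 1, 21.0),
    (5, 0, 0, 19.0), (5, 0, 1, 18.0),
]

# njafter is the interaction nj*after: 1 only for NJ restaurants after the rise
data = [
    {"sheet": s, "nj": nj, "after": a, "njafter": nj * a, "fte": f}
    for (s, nj, a, f) in rows
]

# Look up fte by (restaurant, period)
fte_by_period = {(r["sheet"], r["after"]): r["fte"] for r in data}

# dfte is the within-restaurant change: fte(after) - fte(before)
dfte = {
    s: fte_by_period[(s, 1)] - fte_by_period[(s, 0)]
    for s in sorted({r["sheet"] for r in data})
}

n_nj_after = sum(r["njafter"] for r in data)
print(n_nj_after)  # 3 NJ post-rise rows
print(dfte)        # {1: 1.0, 2: 2.0, 3: 0.0, 4: -2.0, 5: -1.0}
```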
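Tabulate command • Tabulate in STATA: • tabulate var (or tab var) – just a simple frequency table • tab var, g(newvar) – generates indicator (dummy) variables newvar1, newvar2, … one for each category of var • tab var, su(othervar) – summarises othervar (mean, standard deviation, frequency) within each category of var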
Let’s get our first DinD estimator • tabulate nj after, su(fte) means • Why isn’t this enough?
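The two-way table of means already contains the D-in-D point estimate. This Python sketch mimics the idea of `tabulate nj after, su(fte) means` on hypothetical toy data (not the Card and Krueger sample): it computes mean fte in each nj × after cell and then combines the four cells.

```python
from statistics import mean

# Hypothetical toy data (sheet, nj, after, fte); NOT the Card-Krueger sample.
rows = [
    (1, 1, 0, 20.0), (1, 1, 1, 21.0),
    (2, 1, 0, 18.0), (2, 1, 1, 20.0),
    (3, 1, 0, 22.0), (3, 1, 1, 22.0),
    (4, 0, 0, 23.0), (4, 0, 1, 21.0),
    (5, 0, 0, 19.0), (5, 0, 1, 18.0),
]

# Mean fte in each (nj, after) cell, like the table of means
cell = {
    (g, t): mean(f for (_, nj, a, f) in rows if nj == g and a == t)
    for g in (0, 1)
    for t in (0, 1)
}

# DinD: change for NJ minus change for PA
dind = (cell[(1, 1)] - cell[(1, 0)]) - (cell[(0, 1)] - cell[(0, 0)])

print(cell)  # {(0, 0): 21.0, (0, 1): 19.5, (1, 0): 20.0, (1, 1): 21.0}
print(dind)  # 2.5
```

The table gives a point estimate but no standard error, so on its own it says nothing about whether 2.5 could plausibly be noise.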
Going from means to statistics • reg dfte nj
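Since `dfte` already removes each restaurant's pre-period level, regressing it on the `nj` dummy (with a constant, as `reg` includes by default) returns the D-in-D estimate as the `nj` coefficient, together with a standard error. A minimal numpy sketch of that OLS calculation with classical standard errors, using hypothetical per-restaurant changes rather than the real data:

```python
import numpy as np

# Hypothetical per-restaurant changes in fte (dfte) and NJ dummy; toy numbers only.
dfte = np.array([1.0, 2.0, 0.0, -2.0, -1.0])
nj = np.array([1.0, 1.0, 1.0, 0.0, 0.0])

# Design matrix with a constant, mirroring `reg dfte nj`
X = np.column_stack([np.ones_like(nj), nj])
beta, *_ = np.linalg.lstsq(X, dfte, rcond=None)

# Classical (homoskedastic) OLS standard errors
n, k = X.shape
resid = dfte - X @ beta
s2 = resid @ resid / (n - k)
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

print(beta)  # ≈ [-1.5, 2.5]: mean PA change, then NJ-minus-PA difference (the DinD)
print(se)
```

The constant is the mean change in the control group and the `nj` coefficient is the difference in mean changes, i.e. exactly the number from the table of means.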
… and with robust standard errors • reg dfte nj • reg dfte nj, robust
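The `robust` option keeps the coefficients and changes only the variance estimate, replacing the common-variance formula with a heteroskedasticity-robust "sandwich". This numpy sketch computes both on hypothetical toy changes; the n/(n−k) rescaling is meant to mirror what Stata's `regress` does with `robust` (often labelled HC1).

```python
import numpy as np

# Hypothetical per-restaurant changes and NJ dummy; toy numbers only.
dfte = np.array([1.0, 2.0, 0.0, -2.0, -1.0])
nj = np.array([1.0, 1.0, 1.0, 0.0, 0.0])
X = np.column_stack([np.ones_like(nj), nj])

beta, *_ = np.linalg.lstsq(X, dfte, rcond=None)
resid = dfte - X @ beta
n, k = X.shape
XtX_inv = np.linalg.inv(X.T @ X)

# Classical SEs: one common error variance for all observations
se_classic = np.sqrt(np.diag(resid @ resid / (n - k) * XtX_inv))

# Robust sandwich: bread (X'X)^-1, meat = sum_i e_i^2 x_i x_i'
meat = X.T @ (X * resid[:, None] ** 2)
V_hc0 = XtX_inv @ meat @ XtX_inv

# Small-sample rescaling by n/(n-k), as with `reg ..., robust`
se_robust = np.sqrt(np.diag(V_hc0 * n / (n - k)))

print(se_classic)
print(se_robust)
```

Here the point estimate is unchanged; only the standard error on `nj` moves, because the NJ and PA changes have different residual spreads.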
An alternative specification… • reg fte nj after njafter, robust • So it’s not just ‘n’ (degrees of freedom)…
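In the levels specification fte = β0 + β1·nj + β2·after + δ·njafter + ε, the coefficient δ on `njafter` is the D-in-D estimate. This numpy sketch fits it by OLS on a hypothetical toy panel (five restaurants, two periods; not the real data) with classical standard errors, to show that the point estimate matches the difference in mean changes, (21 − 20) − (19.5 − 21) = 2.5.

```python
import numpy as np

# Hypothetical toy panel (sheet, nj, after, fte): 5 restaurants x 2 periods.
rows = np.array([
    [1, 1, 0, 20.0], [1, 1, 1, 21.0],
    [2, 1, 0, 18.0], [2, 1, 1, 20.0],
    [3, 1, 0, 22.0], [3, 1, 1, 22.0],
    [4, 0, 0, 23.0], [4, 0, 1, 21.0],
    [5, 0, 0, 19.0], [5, 0, 1, 18.0],
])
sheet, nj, after, fte = rows.T
njafter = nj * after

# Levels regression, mirroring `reg fte nj after njafter`
X = np.column_stack([np.ones_like(nj), nj, after, njafter])
beta, *_ = np.linalg.lstsq(X, fte, rcond=None)

# Classical SEs, which treat all 10 rows as independent observations
resid = fte - X @ beta
n, k = X.shape
se = np.sqrt(np.diag(resid @ resid / (n - k) * np.linalg.inv(X.T @ X)))

print(beta)  # ≈ [21.0, -1.0, -1.5, 2.5]: const, nj, after, njafter
print(se)
```

On these toy numbers, regressing the per-restaurant changes on nj gives the same 2.5 but a classical standard error of 5/6, against 2.5 here. The levels regression leaves each restaurant's permanent level in the residual and treats its two rows as independent, so the gap is not just a degrees-of-freedom adjustment.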
Alternative specifications… • reg fte nj after njafter, cl(sheet) • xtreg fte nj after njafter, fe i(sheet) • Any key differences? • Should there be any?
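One way to explore these questions is to compute both estimators by hand. The numpy sketch below, on a hypothetical toy panel, computes (a) pooled OLS with standard errors clustered by restaurant, using the small-sample factor G/(G−1)·(n−1)/(n−k) that we assume matches Stata's `regress` with clustering, and (b) the fixed-effects (within) estimator, which subtracts each restaurant's own mean. The helper `demean` is illustrative, not a library function.

```python
import numpy as np

# Hypothetical toy panel (sheet, nj, after, fte): 5 restaurants x 2 periods.
rows = np.array([
    [1, 1, 0, 20.0], [1, 1, 1, 21.0],
    [2, 1, 0, 18.0], [2, 1, 1, 20.0],
    [3, 1, 0, 22.0], [3, 1, 1, 22.0],
    [4, 0, 0, 23.0], [4, 0, 1, 21.0],
    [5, 0, 0, 19.0], [5, 0, 1, 18.0],
])
sheet, nj, after, fte = rows.T
njafter = nj * after
groups = np.unique(sheet)

# (a) Pooled OLS with SEs clustered by restaurant (cf. `reg ..., cl(sheet)`)
X = np.column_stack([np.ones_like(nj), nj, after, njafter])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ fte
resid = fte - X @ beta
n, k = X.shape
G = len(groups)
meat = np.zeros((k, k))
for g in groups:
    # Cluster score X_g' e_g, summed over both periods of restaurant g
    u = X[sheet == g].T @ resid[sheet == g]
    meat += np.outer(u, u)
qc = G / (G - 1) * (n - 1) / (n - k)  # assumed small-sample factor
se_cl = np.sqrt(np.diag(qc * (XtX_inv @ meat @ XtX_inv)))


# (b) Fixed-effects (within) estimator (cf. `xtreg ..., fe i(sheet)`)
def demean(v):
    """Subtract each restaurant's own mean (the within transformation)."""
    out = v.copy()
    for g in groups:
        out[sheet == g] -= v[sheet == g].mean()
    return out


# nj is constant within a restaurant, so demeaning wipes it out; no constant needed
Xw = np.column_stack([demean(after), demean(njafter)])
beta_fe, *_ = np.linalg.lstsq(Xw, demean(fte), rcond=None)

print(beta[3], se_cl[3])  # njafter: pooled OLS, clustered SE
print(beta_fe)            # ≈ [-1.5, 2.5]: after, njafter
```

With two periods and a balanced panel, both approaches give the same `njafter` coefficient here; what differs is how the standard error accounts for the two observations on each restaurant.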
Suppose we’d like to keep and compare results from many estimations • STATA commands for results-sets: • estimates store • outreg (works mostly with regressions) • parmest/parmby (by Roger Newson)
Summary • A very useful and widespread approach • Validity depends on the assumption that trends would have been the same in the absence of treatment • Can use other periods to see whether this assumption is plausible • Uses 2 observations on the same individual – the most rudimentary form of panel data
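Using other periods usually means a placebo check: apply the same D-in-D formula to two periods that are both before treatment, where the "effect" should be close to zero if trends were parallel. A minimal Python sketch with hypothetical group means over three periods (treatment assumed to occur between t1 and t2):

```python
def dind(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-difference of group means between two periods."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)


# Hypothetical group means; treatment happens between t1 and t2.
treat = {"t0": 18.0, "t1": 20.0, "t2": 23.0}
ctrl = {"t0": 19.0, "t1": 21.0, "t2": 21.5}

# Placebo: both periods are pre-treatment, so common trends implies about 0
placebo = dind(treat["t0"], treat["t1"], ctrl["t0"], ctrl["t1"])

# Actual estimate: last pre-treatment period versus the post-treatment period
effect = dind(treat["t1"], treat["t2"], ctrl["t1"], ctrl["t2"])

print(placebo, effect)  # 0.0 2.5
```

A placebo near zero makes the assumption more plausible but cannot prove it, since trends could still diverge after the last pre-treatment period.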