
# Estimating Causal Effects with Experimental Data




### Some Basic Terminology

• Start with an example where X is binary (though it is simple to generalize):

• X=0 is control group

• X=1 is treatment group

• Causal effect sometimes called treatment effect

• Randomization implies everyone has the same probability of treatment

• If X is allocated at random, then we know that X is independent of all pre-treatment variables in the whole wide world

• An amazing claim, but true

• Implies there cannot be a problem of omitted variables, reverse causality, etc.

• On average, the only reason for a difference between the treatment and control groups is the different receipt of treatment

Proposition 2.1: Pre-treatment characteristics must be independent of randomized treatment

• Proof: Let W denote pre-treatment characteristics; the joint distribution of X and W is f(X,W)

• Can decompose this into:

$$f(X,W)=f_{X\mid W}(X\mid W)\,f_W(W)$$

• Now random assignment means

$$f_{X\mid W}(X\mid W)=f_X(X)$$

• This implies:

$$f(X,W)=f_X(X)\,f_W(W)$$

• This implies X and W are independent
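A quick simulation makes Proposition 2.1 concrete for a binary W. Under random assignment, the joint frequency of (X=1, W=1) matches the product of the marginal frequencies. An assignment rule that looks at W breaks this. The helper name `joint_vs_product`, the 30% share with W=1, and the 0.8/0.4 treatment probabilities in the contrasting case are all arbitrary illustrative assumptions.

```python
import random

def joint_vs_product(n=200_000, seed=1, depend_on_w=False):
    """Compare P(X=1, W=1) with P(X=1) * P(W=1) in simulated data.

    W is a hypothetical binary pre-treatment characteristic (30% have W=1).
    depend_on_w=False: X is a fair coin flip that ignores W (randomization).
    depend_on_w=True: W=1 units are treated with probability 0.8 and W=0 units
    with probability 0.4 (non-random assignment, for contrast).
    """
    rng = random.Random(seed)
    w = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
    if depend_on_w:
        x = [1 if rng.random() < (0.8 if wi else 0.4) else 0 for wi in w]
    else:
        x = [1 if rng.random() < 0.5 else 0 for _ in w]
    p_x = sum(x) / n
    p_w = sum(w) / n
    p_xw = sum(xi * wi for xi, wi in zip(x, w)) / n
    return p_xw, p_x * p_w

for flag in (False, True):
    joint, product = joint_vs_product(depend_on_w=flag)
    print(f"depend_on_w={flag}: P(X=1,W=1)={joint:.4f}, P(X=1)P(W=1)={product:.4f}")
```

With a coin-flip assignment the two numbers agree up to sampling noise. In the W-dependent case the joint frequency is noticeably larger than the product.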

### Why is this useful? An Example: Racial Discrimination

• Black men earn less than white men in the US

Dependent variable: LOGWAGE

| Regressor | Coef. | Std. Err. | t |
|---|---|---|---|
| BLACK | -.1673813 | .0066708 | -25.09 |
| NO_HS | -.2138331 | .0077192 | -27.70 |
| SOMECOLL | .1104148 | .0049139 | 22.47 |
| COLLEGE | .4660205 | .0048839 | 95.42 |
| AGE | .0704488 | .0008552 | 82.38 |
| AGESQUARED | -.0007227 | .0000101 | -71.41 |
| _cons | 1.088116 | .0172715 | 63.00 |

• Could be discrimination, or other factors unobserved by the researcher but observed by the employer?

• Hard to resolve fully with non-experimental data

• Bertrand/Mullainathan “Are Emily and Greg More Employable Than Lakisha and Jamal”, American Economic Review, 2004

• Create fake CVs and send them in reply to job adverts

• Allocate names at random to CVs – some given ‘black-sounding’ names, others ‘white-sounding’

• Outcome variable is the call-back rate

• Interpretation – not a direct measure of racial discrimination, just the effect of having a ‘black-sounding’ name, which may have other connotations

• But the name is uncorrelated, by construction, with other material on the CV

• Want an estimate of: $E(y\mid X=1)-E(y\mid X=0)$

• Take mean of outcome variable in treatment group

• Take mean of outcome variable in control group

• Take difference between the two

• No problems, but this approach:

• Does not generalize to cases where X is not binary

• Does not directly compute standard errors
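The mean-difference procedure above can be sketched in a few lines. The outcome and treatment vectors are made-up toy values, not data from the Bertrand/Mullainathan study, and `difference_in_means` is a hypothetical helper name.

```python
from statistics import mean

def difference_in_means(y, x):
    """Mean outcome in the treatment group (x == 1) minus mean outcome in the control group (x == 0)."""
    treated = [yi for yi, xi in zip(y, x) if xi == 1]
    control = [yi for yi, xi in zip(y, x) if xi == 0]
    return mean(treated) - mean(control)

# Hypothetical toy data, NOT from Bertrand/Mullainathan:
# y = 1 if the CV received a call-back, x = 1 if it was randomly given a 'black-sounding' name
y = [1, 0, 0, 0, 1, 1, 0, 1, 1, 0]
x = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(difference_in_means(y, x))   # call-back rate 0.4 in treatment vs 0.6 in control
```

As the slide notes, this gives a point estimate only. No standard error comes out of it.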

• Run regression:

$$y_i=\beta_0+\beta_1X_i+\varepsilon_i$$

• Proposition 2.2: The OLS estimator of β1 is an unbiased estimator of the causal effect of X on y

• Proof: There are many ways to prove this, but the simplest is perhaps:

• Proposition 1.1 says OLS estimates the conditional expectation E(y|X)

• $E(y\mid X=0)=\beta_0$, so the OLS estimate of the intercept is a consistent estimate of $E(y\mid X=0)$

• $E(y\mid X=1)=\beta_0+\beta_1$, so the OLS estimate of β1 is a consistent estimate of $E(y\mid X=1)-E(y\mid X=0)$

• Hence can read off estimate of treatment effect from coefficient on X

• Approach easily generalizes to cases where X is not binary

• Also gives estimate of standard error
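The claim that the regression coefficient reproduces the difference in means can be checked numerically. The sketch below computes one-regressor OLS by hand and compares the coefficients with the group means. `ols_one_regressor` is a hypothetical helper, and the data are made-up toy values.

```python
def ols_one_regressor(x, y):
    """OLS of y on a constant and a single regressor x; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# Hypothetical toy data: x is a randomized binary treatment, y a binary outcome
x = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y = [1, 0, 0, 0, 1, 1, 0, 1, 1, 0]
b0, b1 = ols_one_regressor(x, y)

control_mean = sum(yi for xi, yi in zip(x, y) if xi == 0) / x.count(0)
treated_mean = sum(yi for xi, yi in zip(x, y) if xi == 1) / x.count(1)
# b0 reproduces the control-group mean; b1 reproduces treated_mean - control_mean
print(b0, control_mean, b1, treated_mean - control_mean)
```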

• Unless told otherwise, a regression package will compute standard errors assuming the errors are homoskedastic, i.e. $E(\varepsilon_i^2\mid X_i)=\sigma^2$ for all i

• Even if we are only interested in the effect of treatment on the mean, X may affect other aspects of the distribution, e.g. the variance

• This will cause heteroskedasticity

• Heteroskedasticity does not make OLS regression coefficients inconsistent, but it does make the usual OLS standard errors inconsistent

• The solution is to use robust standard errors, also called:

• Huber standard errors

• White standard errors

• Heteroskedasticity-consistent standard errors

• Statistics-course approach:

• Get the variance of the estimated mean in the treatment group and in the control group

• Sum these to give the variance of the difference in means (the two samples are independent)

• Can estimate this by using sample equivalents

• Note that this is asymptotically the same as the OLS standard error if X and ε are independent
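Written out, the statistics-course calculation and the conventional OLS formula for a binary X look as follows. Here $n_1$ and $n_0$ are the group sizes, $\bar y_1$ and $\bar y_0$ the group means, and $s_1^2$ and $s_0^2$ the within-group sample variances (notation introduced here for illustration):

$$\widehat{\mathrm{Var}}(\bar y_1-\bar y_0)=\frac{s_1^2}{n_1}+\frac{s_0^2}{n_0}$$

$$\widehat{\mathrm{Var}}_{\mathrm{OLS}}(\hat\beta_1)=\frac{s^2}{\sum_i (X_i-\bar X)^2}=s^2\left(\frac{1}{n_1}+\frac{1}{n_0}\right),\qquad s^2=\frac{(n_1-1)s_1^2+(n_0-1)s_0^2}{n_1+n_0-2}$$

The second form uses $\sum_i (X_i-\bar X)^2=n_1n_0/(n_1+n_0)$ for a 0/1 regressor. If the two groups share a common error variance, both expressions estimate the same quantity. If the variances differ, the two formulas still coincide when $n_1=n_0$, but they can diverge noticeably when the group sizes are unequal.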

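Proposition 2.3: If ε and X are independent, the OLS formula for the standard errors will be consistent even if the variance of ε differs across individuals.

• Proof: If ε and X are independent, then

$$E(\varepsilon^2XX')=E(\varepsilon^2)\,E(XX')$$

• Putting this in the expression for the asymptotic variance of the OLS estimator:

$$[E(XX')]^{-1}E(\varepsilon^2XX')[E(XX')]^{-1}=[E(XX')]^{-1}E(\varepsilon^2)$$

• A consistent estimate of the final term is the mean of the squared residuals, i.e. the usual estimate of σ²

• Have to interpret the residual variance differently: not common to all individuals, but the mean across individuals

• With one regressor, can write the robust standard error as:

$$\widehat{\mathrm{se}}(\hat\beta_1)=\sqrt{\frac{\sum_i (X_i-\bar X)^2\,\hat\varepsilon_i^2}{\left[\sum_i (X_i-\bar X)^2\right]^2}}$$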

• Simple to use in practice, e.g. in Stata:

```stata
reg y x, robust
```
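For one regressor, the robust (HC0, or White) standard error is $\sqrt{\sum_i (X_i-\bar X)^2\hat\varepsilon_i^2/[\sum_i (X_i-\bar X)^2]^2}$. The hand-rolled sketch below computes it alongside the classical formula. The data are hypothetical: the treatment group is both smaller and noisier than the control group. `ols_se` is an illustrative helper. It does not reproduce Stata's `robust` option exactly, because Stata by default also scales the robust variance by N/(N−k).

```python
import math
from statistics import pvariance

def ols_se(x, y):
    """One-regressor OLS slope with classical and HC0 (White/Huber) robust standard errors."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    xd = [xi - x_bar for xi in x]
    s_xx = sum(d * d for d in xd)
    b1 = sum(d * (yi - y_bar) for d, yi in zip(xd, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)              # usual estimate of sigma^2
    se_classical = math.sqrt(s2 / s_xx)
    se_robust = math.sqrt(sum(d * d * e * e for d, e in zip(xd, resid)) / s_xx ** 2)
    return b1, se_classical, se_robust

# Hypothetical data: a small, noisy treatment group and a larger, quiet control group
treated = [0, 10, 20]
control = [4, 5, 6, 5, 4, 6, 5]
x = [1] * len(treated) + [0] * len(control)
y = treated + control
b1, se_classical, se_robust = ols_se(x, y)

# With a binary regressor, HC0 equals the group-variance formula (variances divided by n_g)
se_groups = math.sqrt(pvariance(treated) / len(treated) + pvariance(control) / len(control))
print(b1, se_classical, se_robust, se_groups)
```

Here the classical standard error understates the uncertainty, because it pools the quiet control group's variance with the noisy treatment group's.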

• Econometrics is very easy if all data come from a randomized controlled experiment

• Just need to collect data on treatment/control and outcome variables

• Just need to compare means of outcomes of treatment and control groups

• Is data on other variables of any use at all?

• Not necessary but useful

• Can get consistent estimate of treatment effect without worrying about other variables

• Reason is that randomization ensures no problem of omitted variables bias

• But there are reasons to include other regressors:

• Improved efficiency

• Check for randomization

• Improve randomization

• Control for conditional randomization

• Heterogeneity in treatment effects

• Don’t just want consistent estimate of causal effect – also want low standard error (or high precision or efficiency).

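• Standard formula for the variance of the OLS estimate of β is $\sigma^2(X'X)^{-1}$; the standard error is the square root of the relevant diagonal element

• σ² comes from the variance of the residual in the regression: $(1-R^2)\,\mathrm{Var}(y)$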

Proposition 2.4: The asymptotic variance of $\hat\beta$ is lower when W is included

• Proof: (Will only do case where X and W are one-dimensional)

• When W is included, the variance of the estimate of the treatment effect will be the first diagonal element of:

$$\sigma^2\begin{pmatrix}X'X & X'W\\ W'X & W'W\end{pmatrix}^{-1}$$

• Now:

• Using trick from end of notes on causal effects we can write this as:

• Inverting leads to

• By randomization X and W are independent so:

• The only difference is in the error variance – this cannot be larger when W is included, because R² cannot fall (and it is strictly smaller whenever W helps predict y)
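A simulation illustrates Proposition 2.4. The data-generating process is an arbitrary assumption: a true treatment effect of 2 and a pre-treatment variable W that explains much of y. The regression including W is computed via the Frisch-Waugh-Lovell theorem: partial W out of both X and y, then regress residual on residual. `simple_ols` and `treatment_effect_se` are hypothetical helper names.

```python
import math
import random

def simple_ols(x, y):
    """OLS of y on a constant and x; returns (slope, residuals, sum of squared deviations of x)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return slope, resid, s_xx

def treatment_effect_se(x, y, w=None):
    """Coefficient on x and its classical standard error, with or without the control w.
    With w, uses Frisch-Waugh-Lovell: partial w (and the constant) out of x and y first."""
    n = len(x)
    if w is None:
        b, resid, s_xx = simple_ols(x, y)
        k = 2                                    # constant and x
    else:
        x_tilde = simple_ols(w, x)[1]            # part of x not explained by w
        y_tilde = simple_ols(w, y)[1]            # part of y not explained by w
        b, resid, s_xx = simple_ols(x_tilde, y_tilde)
        k = 3                                    # constant, x and w
    s2 = sum(e * e for e in resid) / (n - k)
    return b, math.sqrt(s2 / s_xx)

# Hypothetical data-generating process: true effect 2, W strongly predicts y
rng = random.Random(42)
n = 2000
x = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
w = [rng.gauss(0, 1) for _ in range(n)]
y = [1 + 2 * xi + 3 * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

b_short, se_short = treatment_effect_se(x, y)
b_long, se_long = treatment_effect_se(x, y, w)
print(f"without W: {b_short:.3f} (se {se_short:.3f}); with W: {b_long:.3f} (se {se_long:.3f})")
```

Both estimates are centred on the true effect, since X is randomized. The standard error shrinks substantially once W soaks up residual variance.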

• Randomization can go wrong

• Poor implementation of research design

• Bad luck

• If randomization is done well, then W should be independent of X – this is testable:

• Test for differences in W in treatment/control groups

• Probit model for X on W
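The first check above, comparing W across the treatment and control groups, can be sketched as a two-sample t-statistic. `balance_t_stat` is a hypothetical helper, and the two examples (one balanced assignment, one botched) are invented. The probit alternative is not shown.

```python
import math
from statistics import mean, variance

def balance_t_stat(w, x):
    """t-statistic for the difference in the mean of pre-treatment variable w
    between treatment (x == 1) and control (x == 0), using separate group variances."""
    w1 = [wi for wi, xi in zip(w, x) if xi == 1]
    w0 = [wi for wi, xi in zip(w, x) if xi == 0]
    diff = mean(w1) - mean(w0)
    se = math.sqrt(variance(w1) / len(w1) + variance(w0) / len(w0))
    return diff / se

# Hypothetical examples: a balanced assignment and a botched one
x = [1, 1, 1, 1, 0, 0, 0, 0]
w_balanced = [1, 2, 3, 4, 1, 2, 3, 4]    # same distribution of w in both groups
w_botched = [5, 6, 7, 8, 1, 2, 3, 4]     # treated units all have high w
print(balance_t_stat(w_balanced, x), balance_t_stat(w_botched, x))
```

A large |t| for a pre-treatment variable is a warning sign. Either the randomization was poorly implemented, or it was unlucky.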

### The Uses of Other Regressors III: Improve Randomization

• Can also use W at stage of assigning treatment

• Can guarantee that in your sample X and W are independent, instead of this being just probabilistic

• This is what Bertrand/Mullainathan do when assigning names to CVs
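One way to use W at the assignment stage is stratified randomization: randomize separately within each value of W so that every stratum has the same treated share. `stratified_assignment` is a hypothetical helper, and the "high"/"low" CV-quality strata are illustrative. This only sketches the idea and is not a description of the exact procedure in the paper. With even-sized strata, X and W are exactly independent in the sample.

```python
import random
from collections import defaultdict

def stratified_assignment(w, seed=0):
    """Randomly treat exactly half of the units within each value of the discrete
    pre-treatment variable w (rounding down in odd-sized strata).
    Returns 0/1 treatment indicators in the original order of w."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for i, wi in enumerate(w):
        strata[wi].append(i)
    x = [0] * len(w)
    for members in strata.values():
        rng.shuffle(members)
        for i in members[: len(members) // 2]:
            x[i] = 1
    return x

# Hypothetical CV-quality strata
w = ["high", "low", "high", "low", "high", "high", "low", "low"]
x = stratified_assignment(w)
print(list(zip(w, x)))
```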

### The Uses of Other Regressors IV: Adjust for Conditional Randomization

• This is the case where we must include W to get consistent estimates of treatment effects

• Conditional randomization is where the probability of treatment is different for people with different values of W, but treatment is random conditional on W

• Why have conditional randomization?

• May have no choice

• May want to do it (c.f. stratification)

• Allocation of students to classes is random within schools

• But small number of classes per school

• This leads to the following relationship between the probability of treatment and the number of kids in the school:

• X can now be correlated with W

• But, conditional on W, X is independent of other factors

• But must get the functional form of the relationship between y and W correct – or use matching procedures

• This is not the case with (unconditional) randomization – see class exercise
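A stylized example shows why W must be included under conditional randomization. The numbers are hypothetical. Stratum A (think of a small school) has a low baseline outcome and half its units treated. Stratum B has a higher baseline and only one unit in ten treated. The treatment effect is 2 everywhere. The raw difference in means confounds the treatment effect with the difference between strata, while comparing within strata and averaging recovers it. The within-stratum average is a simple matching-style estimator, shown in place of a regression with W controls; the helper names are illustrative.

```python
from collections import defaultdict
from statistics import mean

def naive_difference(y, x):
    """Raw difference in means, ignoring strata."""
    return (mean(yi for yi, xi in zip(y, x) if xi == 1)
            - mean(yi for yi, xi in zip(y, x) if xi == 0))

def within_stratum_difference(y, x, w):
    """Difference in means computed inside each stratum of w, then averaged
    with weights equal to each stratum's share of the sample.
    Assumes every stratum contains both treated and control units."""
    cells = defaultdict(lambda: ([], []))       # stratum -> (control outcomes, treated outcomes)
    for yi, xi, wi in zip(y, x, w):
        cells[wi][xi].append(yi)
    n = len(y)
    return sum((len(c) + len(t)) / n * (mean(t) - mean(c)) for c, t in cells.values())

w = ["A"] * 10 + ["B"] * 10
x = [1] * 5 + [0] * 5 + [1] * 1 + [0] * 9      # Pr(treatment) is 0.5 in A but 0.1 in B
y = [(10 if wi == "A" else 20) + 2 * xi for xi, wi in zip(x, w)]
print(naive_difference(y, x), within_stratum_difference(y, x, w))
```

The naive comparison even gets the sign wrong, because treated units are concentrated in the low-baseline stratum.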

• So far have assumed causal (treatment) effect the same for everyone

• No good reason to believe this

• Start with case of no other regressors:

$$y_i=\beta_0+\beta_{1i}X_i+\varepsilon_i$$

• Random assignment implies X independent of β1i

• Sometimes called random coefficients model

• Would like to estimate causal effect for everyone – this is not possible

• Can only hope to estimate some average

• Average treatment effect:

$$\mathrm{ATE}=E(\beta_{1i})$$

Proposition 2.5: OLS estimates the ATE

• Proof for single regressor:
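A simulation illustrates Proposition 2.5 in the single-regressor random coefficients model. The distribution of β1i (0, 1 or 5 with equal probability), the intercept of 2 and the normal noise are arbitrary assumptions. In simulated data the ATE can be computed directly as the sample mean of β1i and compared with the OLS slope.

```python
import random

def ols_slope(x, y):
    """OLS slope from a regression of y on a constant and x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))

rng = random.Random(7)
n = 100_000
# Hypothetical heterogeneous effects: beta_1i is 0, 1 or 5 with equal probability
beta1 = [rng.choice([0.0, 1.0, 5.0]) for _ in range(n)]
x = [1 if rng.random() < 0.5 else 0 for _ in range(n)]    # random assignment, independent of beta1
y = [2.0 + b * xi + rng.gauss(0, 1) for b, xi in zip(beta1, x)]

ate = sum(beta1) / n          # observable here only because the data are simulated
print(ols_slope(x, y), ate)
```

The error term in the implied constant-coefficient regression includes $(\beta_{1i}-\bar\beta_1)X_i$, so it is heteroskedastic. Robust standard errors are the natural choice here.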

• Full outcomes notation:

• Outcome if in control group:

$$y_{0i}=\gamma_0'W_i+u_{0i}$$

• Outcome if in treatment group:

$$y_{1i}=\gamma_1'W_i+u_{1i}$$

• Treatment effect is (y1i − y0i) and can be written as:

$$y_{1i}-y_{0i}=(\gamma_1-\gamma_0)'W_i+u_{1i}-u_{0i}$$

• Note the treatment effect has observable and unobservable components

• Can estimate as:

• Two separate equations

• One single equation

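• We can write:

$$y_i=(1-X_i)\,y_{0i}+X_i\,y_{1i}=y_{0i}+X_i\,(y_{1i}-y_{0i})$$

• Combining outcomes equations leads to:

$$y_i=\gamma_0'W_i+(\gamma_1-\gamma_0)'W_iX_i+u_{0i}+X_i(u_{1i}-u_{0i})$$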

• Regression includes W and interactions of W with X – these are observable part of treatment effect

• Note: error likely to be heteroskedastic
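The "two separate equations" route can be sketched with a scalar W plus a constant. Fit the outcome equation in each group, then read the observable part of the treatment effect, $(\gamma_1-\gamma_0)'W$, off the difference in coefficients. The helper names and the noise-free data are hypothetical; because there is no noise, the true coefficients are recovered exactly.

```python
def ols_one_regressor(w, y):
    """OLS of y on a constant and w; returns (intercept, slope)."""
    n = len(w)
    w_bar, y_bar = sum(w) / n, sum(y) / n
    slope = (sum((wi - w_bar) * (yi - y_bar) for wi, yi in zip(w, y))
             / sum((wi - w_bar) ** 2 for wi in w))
    return y_bar - slope * w_bar, slope

def effect_given_w(y, x, w):
    """Fit y = c + g*W separately for control (x == 0) and treatment (x == 1);
    return a function giving the estimated treatment effect at a value of W."""
    fits = {}
    for group in (0, 1):
        w_g = [wi for wi, xi in zip(w, x) if xi == group]
        y_g = [yi for yi, xi in zip(y, x) if xi == group]
        fits[group] = ols_one_regressor(w_g, y_g)
    (c0, g0), (c1, g1) = fits[0], fits[1]
    return lambda w_value: (c1 - c0) + (g1 - g0) * w_value

# Hypothetical noise-free outcomes: y0 = 1 + 0.5*W, y1 = 3 + 1.5*W, so the effect is 2 + W
w = [0, 1, 2, 3, 0, 1, 2, 3]
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [(3 + 1.5 * wi) if xi else (1 + 0.5 * wi) for wi, xi in zip(w, x)]
effect = effect_given_w(y, x, w)
print(effect(0), effect(2))
```

A single regression of y on a constant, W, X and X·W gives the same point estimates, because that fully interacted regression fits each group separately. Its conventional standard errors differ, since they impose one common error variance on both groups.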

### Bertrand/Mullainathan

• Different treatment effect for high- and low-quality CVs:

### Units of Measurement

• Causal effect measured in units of ‘experiment’ – not very helpful

• Often want to convert causal effects to more meaningful units, e.g. in Project STAR, what is the effect of reducing class size by one child?

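• A simple estimator of this would be:

$$\frac{\bar y_1-\bar y_0}{\bar S_1-\bar S_0}$$

• where S is class size, and subscripts 1 and 0 denote the treatment and control groups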

• Takes the treatment effect on outcome variable and divides by treatment effect on class size

• Not hard to compute but how to get standard error?

### IV Can Do the Job

• Can’t just run a regression of y on S – S is influenced by factors other than treatment status

• But X is:

• Correlated with S

• Uncorrelated with unobserved stuff (because of randomization)

• Hence X can be used as an instrument for S

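• IV estimator has form (just-identified case):

$$\hat\beta_{IV}=(\mathbf{X}'\mathbf{S})^{-1}\mathbf{X}'\mathbf{y}$$

where $\mathbf{X}$ holds a constant and the treatment dummy (the instruments), and $\mathbf{S}$ holds a constant and class size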

### The Wald Estimator

• This will give estimate of standard error of treatment effect

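• Where the instrument is binary and no other regressors are included, the IV estimate of the slope coefficient can be shown to be:

$$\hat\beta_{IV}=\frac{\bar y_1-\bar y_0}{\bar S_1-\bar S_0}$$

where subscripts 1 and 0 denote the treatment (X=1) and control (X=0) groups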
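The equivalence between the Wald ratio and the just-identified IV slope with a binary instrument can be checked directly. The sketch computes both the ratio of mean differences and $\mathrm{Cov}(X,y)/\mathrm{Cov}(X,S)$. The class sizes and test scores are invented for illustration, not Project STAR data, and the helper names are hypothetical.

```python
from statistics import mean

def wald_estimate(y, s, x):
    """(mean y | x=1 - mean y | x=0) / (mean s | x=1 - mean s | x=0)."""
    def diff(v):
        return (mean(vi for vi, xi in zip(v, x) if xi == 1)
                - mean(vi for vi, xi in zip(v, x) if xi == 0))
    return diff(y) / diff(s)

def iv_slope(y, s, x):
    """Just-identified IV slope of y on s with instrument x: Cov(x, y) / Cov(x, s)."""
    n = len(x)
    x_bar, y_bar, s_bar = sum(x) / n, sum(y) / n, sum(s) / n
    c_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    c_xs = sum((xi - x_bar) * (si - s_bar) for xi, si in zip(x, s))
    return c_xy / c_xs

# Hypothetical data: x = 1 if assigned to a small class, s = class size, y = test score
x = [1, 1, 1, 1, 0, 0, 0, 0]
s = [15, 16, 14, 15, 22, 23, 21, 22]
y = [60, 58, 62, 61, 55, 57, 54, 56]
print(wald_estimate(y, s, x), iv_slope(y, s, x))   # effect of one extra pupil on the score
```

Here the score gap (4.75) is divided by the class-size gap (−7), giving roughly −0.68 points per additional pupil. Running the same calculation through an IV routine also delivers a standard error.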

### Partial Compliance

• So far:

• Being in the control group implies no treatment

• Being in the treatment group implies getting treatment

• Often things are not as clean as this

• Treatment is an opportunity

• Close substitutes available to those in control group

• Implementation not perfect e.g. pushy parents

### An Example: Moving to Opportunity

• Designed to investigate the impact of living in bad neighbourhoods on outcomes

• Gave some residents of public housing projects chance to move out

• Two treatments:

• Voucher for private rental housing

• Voucher for private rental housing restricted for use in ‘good’ neighbourhoods

• No-one was forced to move, so compliance was imperfect – 60% and 40% did use the vouchers

### Some Terminology

• Z denotes whether in control or treatment group – ‘intention-to-treat’

• X denotes whether actually get treatment

• With perfect compliance:

• $\Pr(X=1\mid Z=1)=1$

• $\Pr(X=1\mid Z=0)=0$

• With imperfect compliance:

$$1\ge\Pr(X=1\mid Z=1)>\Pr(X=1\mid Z=0)\ge 0$$

with at least one of the outer inequalities strict

### What Do We Want to Estimate?

• ‘Intention-to-Treat’:

ITT=E(y|Z=1)-E(y|Z=0)

• This can be estimated in usual way

• Treatment Effect on the Treated (TOT)

### Estimating TOT

• Can’t use simple regression of y on Z

• But should recognize TOT as Wald estimator

• Can be estimated by regressing y on X using Z as an instrument

• Relationship between TOT and ITT:

$$\mathrm{TOT}=\frac{\mathrm{ITT}}{\Pr(X=1\mid Z=1)-\Pr(X=1\mid Z=0)}$$
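The ITT-to-TOT rescaling can be sketched on a small invented population with one-sided non-compliance: no one in the control group can be treated, and half of those offered treatment take it up. The numbers are illustrative, not MTO results. Would-be users are assumed to have a lower baseline outcome than non-users, and the treatment effect is 3. `itt_and_tot` is a hypothetical helper.

```python
from statistics import mean

def itt_and_tot(y, x, z):
    """ITT = E(y|Z=1) - E(y|Z=0); TOT = ITT / [Pr(X=1|Z=1) - Pr(X=1|Z=0)]."""
    def by_z(v, z_value):
        return mean(vi for vi, zi in zip(v, z) if zi == z_value)
    itt = by_z(y, 1) - by_z(y, 0)
    take_up_gap = by_z(x, 1) - by_z(x, 0)
    return itt, itt / take_up_gap

# Hypothetical data (not MTO figures): 10 offered treatment (z=1), 10 not (z=0);
# half of those offered take it up; only takers are treated, with effect 3.
z = [1] * 10 + [0] * 10
x = [1] * 5 + [0] * 5 + [0] * 10
base = [10] * 5 + [20] * 5 + [10] * 5 + [20] * 5   # would-be takers have a lower baseline
y = [b + 3 * xi for b, xi in zip(base, x)]
itt, tot = itt_and_tot(y, x, z)
print(itt, tot)
```

With 50% take-up the TOT is twice the ITT. By contrast, a naive comparison of treated with untreated units (mean 13 against roughly 16.7) gets even the sign wrong, because takers differ systematically from non-takers.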

### Most Important Results from MTO

• No effects on adult economic outcomes

• Improvements in adult mental health

• Beneficial outcomes for teenage girls

• Adverse outcomes for teenage boys

### Sample results from MTO

• TOT approximately twice the size of ITT

• Consistent with 50% use of vouchers

### IV with Heterogeneous Treatment Effects

• If treatment effect same for everyone then TOT recovers this (obvious)

• But what if treatment effect heterogeneous?

• No simple answer to this question

• Suppose model for treatment effect is:

Proposition 2.6: The IV estimate for the heterogeneous treatment case is a consistent estimate of:

$$\frac{E(\beta_{1i}\pi_i)}{E(\pi_i)}$$

where $\pi_i$ is the difference in the probability of treatment for individual i when in the treatment and control groups

### Proof

• Model for effect of intention to treat on being treated:

### Proof (continued)

• Can write ‘reduced-form’ as:

• Wald estimator then becomes:

• As:

• This is a weighted average of treatment effects, with weights proportional to πi

• The ‘weights’ will vary with the instrument – contrast with the homogeneous treatment case

• Some cases in which can interpret IV estimate as ATE

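Proposition 2.7: IV estimate is ATE if either: (a) there is no heterogeneity in treatment effects, or (b) β1i is uncorrelated with πi

• Proof:

• A. This should be obvious as, with β1i = β1 for all i:

$$\frac{E(\beta_1\pi_i)}{E(\pi_i)}=\beta_1$$

• B. Can write as:

$$\frac{E(\beta_{1i}\pi_i)}{E(\pi_i)}=E(\beta_{1i})+\frac{\mathrm{Cov}(\beta_{1i},\pi_i)}{E(\pi_i)}$$

• The second term is zero when β1i and πi are uncorrelated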
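The decomposition of the IV estimand, $E(\beta_{1i}\pi_i)/E(\pi_i)=E(\beta_{1i})+\mathrm{Cov}(\beta_{1i},\pi_i)/E(\pi_i)$, can be checked exactly on a small population using rational arithmetic. Each hypothetical person has an effect β1i and a compliance shift πi, the rise in their probability of treatment from being assigned to the treatment group. The values are invented so that people with high returns also respond most to assignment. `expect` is an illustrative helper.

```python
from fractions import Fraction as F

# Hypothetical population: (beta_1i, pi_i) for each person
people = [(F(1), F(1, 10)), (F(2), F(3, 10)), (F(4), F(9, 10)), (F(5), F(7, 10))]

def expect(values):
    """Population mean of an iterable of Fractions."""
    values = list(values)
    return sum(values) / len(values)

ate = expect(b for b, _ in people)
mean_pi = expect(p for _, p in people)
iv_estimand = expect(b * p for b, p in people) / mean_pi    # E(beta_1i * pi_i) / E(pi_i)
cov = expect(b * p for b, p in people) - ate * mean_pi       # Cov(beta_1i, pi_i)
print(iv_estimand, ate + cov / mean_pi, ate)
```

The covariance is positive here, so the IV estimand exceeds the ATE.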

### How will IV estimate differ from ATE?

• Previous formula says the gap between IV and ATE depends on the covariance of β1i and πi

• In some situations we can sign it – but not always

• Example 1: no-one gets treatment in the absence of the programme, so $\pi_i\ge 0$ for everyone

• If those who get treatment when in the treatment group are those with the highest returns, then $\mathrm{Cov}(\beta_{1i},\pi_i)>0$, so:

• IV>ATE

• Example 2: treatment is voluntary for those in the control group but compulsory for those in the treatment group

• This implies $\pi_i=1-\Pr(X_i=1\mid Z_i=0)$

• If those who get treatment in the control group are those with the highest returns, then $\mathrm{Cov}(\beta_{1i},\pi_i)<0$, so:

• IV<ATE

### Angrist/Imbens Monotonicity Assumption

• Case where IV estimate is not ATE

• Assume that everyone moved in same direction by treatment – monotonicity assumption

• Then can show that the IV estimate is the average treatment effect for those whose behaviour is changed by being in the treatment group

• They call this the Local Average Treatment Effect (LATE)
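Under monotonicity, the population can be split into always-takers, never-takers and compliers, with no one moved out of treatment by assignment. The sketch computes the population IV (Wald) estimand for a hypothetical mix of these types; shares and effects are invented. πi is 1 for compliers and 0 for everyone else, so the estimand collapses to the compliers' average effect (the LATE). Here that differs from the ATE.

```python
from fractions import Fraction as F

# Hypothetical population types: share, treatment status when assigned to
# control (x0) and to treatment (x1), and treatment effect beta.
types = [
    dict(name="always-taker",    share=F(1, 5),  x0=1, x1=1, beta=F(4)),
    dict(name="never-taker",     share=F(3, 10), x0=0, x1=0, beta=F(0)),
    dict(name="complier (low)",  share=F(1, 4),  x0=0, x1=1, beta=F(1)),
    dict(name="complier (high)", share=F(1, 4),  x0=0, x1=1, beta=F(3)),
]

def wald_estimand(types):
    """[E(y|Z=1) - E(y|Z=0)] / [E(X|Z=1) - E(X|Z=0)] in the population.
    Untreated outcomes cancel in the numerator because Z is randomly assigned,
    so only beta * (x1 - x0) contributes."""
    reduced_form = sum(t["share"] * t["beta"] * (t["x1"] - t["x0"]) for t in types)
    first_stage = sum(t["share"] * (t["x1"] - t["x0"]) for t in types)
    return reduced_form / first_stage

compliers = [t for t in types if t["x1"] > t["x0"]]
late = sum(t["share"] * t["beta"] for t in compliers) / sum(t["share"] for t in compliers)
ate = sum(t["share"] * t["beta"] for t in types)
print(wald_estimand(types), late, ate)
```

The always-takers' large effect enters the ATE but is invisible to the instrument, because their treatment status never changes with assignment.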

### Spill-overs/Externalities/General Equilibrium Effects

• Have assumed that treatment only affects the outcome for the person who receives it

• Many situations in which this is not true

• E.g. externalities, spill-overs, effects on market prices

• Example: Miguel and Kremer, “Worms: Identifying Impacts on Education and Health in the Presence of Treatment Externalities”, Econometrica 2004

### Background

• Infection from intestinal worms is rife among Kenyan schoolchildren

• Major cause of school absence

• Leads to lower human capital accumulation, lower growth?

• Investigation of the effectiveness of deworming drugs on health and education

### Existing studies

• Randomize drug treatment within schools

• But the probability of re-infection is affected by the infection rate among contacts, i.e. externalities are very likely

• This research design will not capture these effects

• To see this, consider model:

### Miguel/Kremer Methodology

• Existing methodology cannot measure the externality, only the individual effect

• Randomize treatment across schools, not individuals

• This can identify β1+β2

• Could have had design in which randomized proportion of individuals within schools getting treatment
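The contrast between the two designs can be sketched under an assumed model. Purely for illustration, suppose outcomes follow $y_{ij}=\beta_0+\beta_1X_{ij}+\beta_2\bar X_j$, where $\bar X_j$ is the share treated in school j. This functional form, the parameter values and the helper names are assumptions. With half of every school treated, the within-school comparison picks up only β1. With whole schools treated or untreated, the comparison picks up β1+β2.

```python
from statistics import mean

B0, B1, B2 = 10.0, 1.0, 3.0   # hypothetical baseline, direct effect B1 and externality B2

def outcomes(schools):
    """schools: list of schools, each a list of 0/1 pupil treatments.
    Returns (treated outcomes, control outcomes) under the assumed noise-free model
    y_ij = B0 + B1*X_ij + B2*(share treated in school j)."""
    treated, control = [], []
    for pupils in schools:
        share = sum(pupils) / len(pupils)
        for x in pupils:
            y = B0 + B1 * x + B2 * share
            (treated if x else control).append(y)
    return treated, control

def treated_minus_control(schools):
    treated, control = outcomes(schools)
    return mean(treated) - mean(control)

within_school = [[1, 1, 0, 0] for _ in range(4)]                       # half of each school treated
across_school = [[1] * 4 for _ in range(2)] + [[0] * 4 for _ in range(2)]  # whole schools treated
print(treated_minus_control(within_school), treated_minus_control(across_school))
```

Randomizing the treated proportion across schools would also vary $\bar X_j$ independently of $X_{ij}$, allowing β1 and β2 to be separated.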

### Typical Result

• Cannot separate externality from direct effect – but this is important for public policy

• They have a non-experimental approach to this, using the fact that not all kids from the same village go to the same school

• This gives variation in X

### Some examples of how they do this

• Include number of kids in local area who are in treatment schools

### Problems with Experiments

• Expense

• Ethical Issues

• Threats to Internal Validity

• Failure to follow experiment

• Experimental effects (Hawthorne effects)

• Threats to External Validity

• Non-representative programme

• Non-representative sample

• Scale effects

### Conclusions on Experiments

• Are the ‘gold standard’ of empirical research

• Are becoming more common

• Not enough of them to keep us busy

• Study of non-experimental data can deliver useful knowledge

• Some issues similar, others different