Estimating causal effects with experimental data

Some Basic Terminology

  • Start with example where X is binary (though simple to generalize):

    • X=0 is control group

    • X=1 is treatment group

  • Causal effect sometimes called treatment effect

  • Randomization implies everyone has same probability of treatment

Why is Randomization Good?

  • If X allocated at random then know that X is independent of all pre-treatment variables in whole wide world

  • an amazing claim but true.

  • Implies there cannot be a problem of omitted variables, reverse causality etc

  • On average, only reason for difference between treatment and control group is different receipt of treatment

Proposition 2.1: Pre-treatment characteristics must be independent of randomized treatment

  • Proof: Joint distribution of X and W is f(X,W)

  • Can decompose this into:

    $f(X,W) = f_{X|W}(X|W)\, f_W(W)$

  • Now random assignment means

    $f_{X|W}(X|W) = f_X(X)$

  • This implies:

    $f(X,W) = f_X(X)\, f_W(W)$

  • This implies X and W independent

Why is this useful? An Example: Racial Discrimination

  • Black men earn less than white men in US

    LOGWAGE    |     Coef.   Std. Err.        t
    -----------+-------------------------------
    BLACK      | -.1673813    .0066708   -25.09
    NO_HS      | -.2138331    .0077192   -27.70
    SOMECOLL   |  .1104148    .0049139    22.47
    COLLEGE    |  .4660205    .0048839    95.42
    AGE        |  .0704488    .0008552    82.38
    AGESQUARED | -.0007227    .0000101   -71.41
    _cons      |  1.088116    .0172715    63.00

  • Could be discrimination or other factors unobserved by the researcher but observed by the employer?

  • hard to fully resolve with non-experimental data

An Experimental Design

  • Bertrand/Mullainathan “Are Emily and Greg More Employable Than Lakisha and Jamal”, American Economic Review, 2004

  • Create fake CVs and send them in reply to job adverts

  • Allocate names at random to CVs – some given ‘black-sounding’ names, others ‘white-sounding’

  • Outcome variable is call-back rates

  • Interpretation – not direct measure of racial discrimination, just effect of having a ‘black-sounding’ name – may have other connotations.

  • But name uncorrelated by construction with other material on CV

The Treatment Effect

  • Want estimate of:

    $E(y|X=1) - E(y|X=0)$

Estimating Treatment Effects: the Statistics Course Approach

  • Take mean of outcome variable in treatment group

  • Take mean of outcome variable in control group

  • Take difference between the two

  • No problems but:

    • Does not generalize to where X is not binary

    • Does not directly compute standard errors

Estimating Treatment Effects: A Regression Approach

  • Run regression:

    $y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

  • Proposition 2.2 The OLS estimator of β1 is an unbiased estimator of the causal effect of X on y:

  • Proof: Many ways to prove this but simplest way is perhaps:

    • Proposition 1.1 says OLS estimates E(y|X)

    • E(y|X=0)= β0 so OLS estimate of intercept is consistent estimate of E(y│X=0)

    • E(y|X=1)= β0+β1 so OLS estimate of β1 is consistent estimate of E(y│X=1) - E(y│X=0)

  • Hence can read off estimate of treatment effect from coefficient on X

  • Approach easily generalizes to where X is not binary

  • Also gives estimate of standard error
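The equivalence between the two approaches can be checked numerically. The sketch below uses simulated data (the sample size, the true effect of 2.0 and all variable names are assumptions made for illustration) and only the Python standard library; it compares the difference in group means with the OLS slope from a regression of y on a binary X.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the illustration is reproducible

# Hypothetical simulated experiment: binary treatment X assigned at random,
# true treatment effect of 2.0 (an assumption of this sketch).
n = 1000
x = [random.randint(0, 1) for _ in range(n)]
y = [1.0 + 2.0 * xi + random.gauss(0, 1) for xi in x]

# Statistics-course approach: difference in group means.
treated = [yi for xi, yi in zip(x, y) if xi == 1]
control = [yi for xi, yi in zip(x, y) if xi == 0]
diff_in_means = mean(treated) - mean(control)

# Regression approach: OLS slope = Cov(x, y) / Var(x).
x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
beta1_hat = sxy / sxx
beta0_hat = y_bar - beta1_hat * x_bar

print(diff_in_means, beta1_hat)  # the two agree up to rounding error
print(beta0_hat, mean(control))  # the intercept is the control-group mean
```

With a binary regressor the agreement is an algebraic identity rather than a feature of this particular sample, so any seed gives the same conclusion.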

Computing Standard Errors

  • Unless told otherwise regression package will compute standard errors assuming errors are homoskedastic i.e. $\mathrm{Var}(\varepsilon|X) = \sigma^2$ for all X

  • Even if only interested in effect of treatment on mean, X may affect other aspects of distribution e.g. variance

  • This will cause heteroskedasticity

  • Heteroskedasticity does not make OLS regression coefficients inconsistent but does make OLS standard errors inconsistent

‘Robust’ Standard Errors

  • Also called:

    • Huber standard errors

    • White standard errors

    • Heteroskedastic-consistent standard errors

  • Statistics course approach

    • Get variance of estimate of mean of treatment and control group

    • Sum to give estimate of variance of difference in means

A Regression-Based Approach

  • Can estimate this by using sample equivalents

  • Note that this is same as OLS standard errors if X and ε are independent

Proposition 2.3: If ε and X are independent the OLS formula for the standard errors will be consistent even if the variance of ε differs across individuals.

  • Proof: If ε and X are independent

  • Putting this in expression for asymptotic variance of OLS estimator:

  • A consistent estimate of the final term is the mean of the squared residuals i.e. usual estimate of $\sigma^2$

A Regression-Based Approach

  • Have to interpret residual variance differently – not common to all individuals but the mean across individuals

  • With one regressor can write robust standard error as:

  • Simple to use in practice e.g. in STATA:

    .reg y x, robust
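For readers without Stata, the following is a minimal sketch of the same calculation in plain Python. It assumes the White (HC0) form of the robust variance for a single regressor, sum((x - x̄)² e²) / (sum((x - x̄)²))², with no small-sample correction, so the numbers can differ slightly from what a package reports. The data are simulated, with 20% treated and a larger error variance in the treatment group (both assumptions of the example).

```python
import random
from math import sqrt
from statistics import mean

random.seed(1)

# Hypothetical data: treatment changes the variance as well as the mean,
# so the errors are heteroskedastic.
n = 2000
x = [1 if random.random() < 0.2 else 0 for _ in range(n)]
y = [1.0 + 0.5 * xi + random.gauss(0, 1.0 + 2.0 * xi) for xi in x]

# OLS with one regressor and a constant.
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
beta1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
beta0 = y_bar - beta1 * x_bar
resid = [yi - beta0 - beta1 * xi for xi, yi in zip(x, y)]

# Conventional (homoskedastic) variance of the slope.
sigma2 = sum(e * e for e in resid) / n
var_ols = sigma2 / sxx

# Robust (White, HC0) variance of the slope.
var_robust = sum((xi - x_bar) ** 2 * e * e for xi, e in zip(x, resid)) / sxx ** 2

# Statistics-course approach: add up the variances of the two group means.
def var_of_mean(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values) ** 2

treated = [yi for xi, yi in zip(x, y) if xi == 1]
control = [yi for xi, yi in zip(x, y) if xi == 0]
var_diff_means = var_of_mean(treated) + var_of_mean(control)

print(sqrt(var_ols), sqrt(var_robust), sqrt(var_diff_means))
```

With a binary regressor the HC0 variance coincides with the sum of the variances of the two group means. The conventional formula understates the standard error here because the smaller group has the larger variance; with equal group sizes the two formulas would agree.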

Bertrand/Mullainathan: Basic Results

Summary So Far

  • Econometrics very easy if all data comes from randomized controlled experiment

  • Just need to collect data on treatment/control and outcome variables

  • Just need to compare means of outcomes of treatment and control groups

  • Is data on other variables of any use at all?

    • Not necessary but useful

Including Other Regressors

  • Can get consistent estimate of treatment effect without worrying about other variables

  • Reason is that randomization ensures no problem of omitted variables bias

  • But there are reasons to include other regressors:

    • Improved efficiency

    • Check for randomization

    • Improve randomization

    • Control for conditional randomization

    • Heterogeneity in treatment effects

The Uses of Other Regressors I: Improved Efficiency

  • Don’t just want consistent estimate of causal effect – also want low standard error (or high precision or efficiency).

  • Standard formula for variance of OLS estimate of β is $\sigma^2 (X'X)^{-1}$; standard errors are square roots of its diagonal elements

  • $\sigma^2$ comes from variance of residual in regression: $(1-R^2)\,\mathrm{Var}(y)$

Proposition 2.4: The asymptotic variance of $\hat{\beta}$ is lower when W is included

  • Proof: (Will only do case where X and W are one-dimensional)

  • When W is included variance of the estimate of the treatment effect will be the first diagonal element of:

Proof (continued)

  • Now:

  • Using trick from end of notes on causal effects we can write this as:

Proof (continued)

  • Inverting leads to

  • By randomization X and W are independent so:

  • The only difference is in the error variance – this must be smaller when W is included as $R^2$ rises
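The efficiency gain can be illustrated with a small Monte Carlo experiment. This is a sketch under assumed values (true treatment effect 1.0, coefficient 3.0 on a single covariate W, 500 replications of 200 observations), not a reproduction of the proof. The regression of y on X and W is computed by partialling W out of both y and X (the Frisch-Waugh result), which avoids matrix algebra.

```python
import random
from statistics import mean, pvariance

random.seed(2)

def slope(x, y):
    """OLS slope from a regression of y on x and a constant."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx

def residualize(v, w):
    """Residuals from a regression of v on w and a constant."""
    b = slope(w, v)
    a = mean(v) - b * mean(w)
    return [vi - a - b * wi for vi, wi in zip(v, w)]

without_w, with_w = [], []
for _ in range(500):
    n = 200
    w = [random.gauss(0, 1) for _ in range(n)]
    x = [random.randint(0, 1) for _ in range(n)]  # randomized, independent of w
    y = [1.0 * xi + 3.0 * wi + random.gauss(0, 1) for xi, wi in zip(x, w)]
    without_w.append(slope(x, y))  # regression of y on X only
    # Regression of y on X and W: partial W out of both X and y first.
    with_w.append(slope(residualize(x, w), residualize(y, w)))

print(mean(without_w), pvariance(without_w))
print(mean(with_w), pvariance(with_w))
```

Both estimators are centred on the true effect, but the spread across replications is much smaller when W is included, because W absorbs a large part of the residual variance.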

The Uses of Other Regressors II: Check for Randomization

  • Randomization can go wrong

    • Poor implementation of research design

    • Bad luck

  • If randomization done well then W should be independent of X – this is testable:

    • Test for differences in W in treatment/control groups

    • Probit model for X on W
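A simple version of the first test can be sketched as follows. The example is hypothetical: it compares a properly randomized assignment with a deliberately flawed one in which treatment depends on W, and uses an unequal-variance t-statistic for the difference in the mean of W between the groups.

```python
import random
from math import sqrt
from statistics import mean, variance

random.seed(3)

def balance_t_stat(x, w):
    """t-statistic for equal means of w in treatment and control groups."""
    w1 = [wi for xi, wi in zip(x, w) if xi == 1]
    w0 = [wi for xi, wi in zip(x, w) if xi == 0]
    se = sqrt(variance(w1) / len(w1) + variance(w0) / len(w0))
    return (mean(w1) - mean(w0)) / se

n = 1000
w = [random.gauss(0, 1) for _ in range(n)]

# Proper randomization: X is drawn independently of W.
x_random = [random.randint(0, 1) for _ in range(n)]
# Flawed implementation (hypothetical): units with high W get treated.
x_flawed = [1 if wi > 0 else 0 for wi in w]

t_random = balance_t_stat(x_random, w)
t_flawed = balance_t_stat(x_flawed, w)
print(t_random, t_flawed)
```

Under proper randomization the statistic should behave like a draw from a standard normal distribution, so a large absolute value signals either poor implementation or bad luck.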

The Uses of Other Regressors III: Improve Randomization

  • Can also use W at stage of assigning treatment

  • Can guarantee that in your sample X and W are independent instead of it being just probabilistic

  • This is what Bertrand/Mullainathan do when assigning names to CVs

The Uses of Other Regressors IV: Adjust for Conditional Randomization

  • This is case where must include W to get consistent estimates of treatment effects

  • Conditional randomization is where probability of treatment is different for people with different values of W, but random conditional on W

  • Why have conditional randomization?

    • May have no choice

    • May want to do it (c.f. stratification)

An Example: Project STAR

  • Allocation of students to classes is random within schools

  • But small number of classes per school

  • This leads to following relationship between probability of treatment and number of kids in school:

Controlling for Conditional Randomization

  • X can now be correlated with W

  • But, conditional on W, X independent of other factors

  • But must get functional form of relationship between y and W correct – matching procedures

  • This is not the case with (unconditional) randomization – see class exercise
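A small constructed example shows why W must be controlled for under conditional randomization. The data are hypothetical and noise-free: the outcome is y = 1·X + 2·W, so the true treatment effect is 1, and the probability of treatment is 25% when W = 0 and 75% when W = 1. Comparing means within each value of W (a simple form of matching) recovers the effect, while the raw comparison does not.

```python
from statistics import mean

# Hypothetical rows of (W, X, y) with y = 1*X + 2*W and no noise.
rows = []
for w, n_treated, n_control in [(0, 10, 30), (1, 30, 10)]:
    rows += [(w, 1, 1.0 + 2.0 * w)] * n_treated
    rows += [(w, 0, 2.0 * w)] * n_control

def diff_in_means(data):
    """Mean outcome of treated minus mean outcome of controls."""
    treated = [y for _, x, y in data if x == 1]
    control = [y for _, x, y in data if x == 0]
    return mean(treated) - mean(control)

# Raw comparison ignores that treated units mostly have W = 1.
naive = diff_in_means(rows)

# Compare within each value of W, then average with stratum-size weights.
strata = sorted({w for w, _, _ in rows})
stratified = sum(
    diff_in_means([r for r in rows if r[0] == w])
    * sum(1 for r in rows if r[0] == w)
    for w in strata
) / len(rows)

print(naive, stratified)  # 2.0 1.0
```

The raw difference of 2.0 mixes the treatment effect with the effect of W; the within-stratum comparison gives the true value of 1.0 without any assumption about the functional form of the relationship between y and W.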

Heterogeneity in Treatment Effects

  • So far have assumed causal (treatment) effect the same for everyone

  • No good reason to believe this

  • Start with case of no other regressors:

    $y_i = \beta_0 + \beta_{1i} X_i + \varepsilon_i$

  • Random assignment implies X independent of β1i

  • Sometimes called random coefficients model

What treatment effect to estimate?

  • Would like to estimate causal effect for everyone – this is not possible

  • Can only hope to estimate some average

  • Average treatment effect:

    $ATE = E(\beta_{1i})$

Proposition 2.5: OLS estimates ATE

  • Proof for single regressor:
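One way to fill in the argument is sketched below. It assumes the random coefficients model $y_i = \beta_0 + \beta_{1i} X_i + \varepsilon_i$ with binary X, and that random assignment makes X independent of both $\beta_{1i}$ and $\varepsilon_i$; the notation is an assumption where the slide's own equations are missing.

```latex
\begin{align*}
E(y_i \mid X_i = 1) &= \beta_0 + E(\beta_{1i} \mid X_i = 1) + E(\varepsilon_i \mid X_i = 1)
                     = \beta_0 + E(\beta_{1i}) + E(\varepsilon_i) \\
E(y_i \mid X_i = 0) &= \beta_0 + E(\varepsilon_i \mid X_i = 0)
                     = \beta_0 + E(\varepsilon_i) \\
E(y_i \mid X_i = 1) - E(y_i \mid X_i = 0) &= E(\beta_{1i}) = \mathrm{ATE}
\end{align*}
```

With a binary regressor the OLS slope is the difference in sample means between treatment and control groups, which converges to the last line.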

Observable Heterogeneity

  • Full outcomes notation:

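    • Outcome if in control group:

      $y_{0i} = \gamma_0' W_i + u_{0i}$

    • Outcome if in treatment group:

      $y_{1i} = \gamma_1' W_i + u_{1i}$

  • Treatment effect is $(y_{1i} - y_{0i})$ and can be written as:

    $y_{1i} - y_{0i} = (\gamma_1 - \gamma_0)' W_i + u_{1i} - u_{0i}$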

  • Note treatment effect has observable and unobservable component

  • Can estimate as:

    • Two separate equations

    • One single equation

Combining treatment and control groups into single regression

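  • We can write:

    $y_i = X_i y_{1i} + (1 - X_i) y_{0i}$

  • Combining outcomes equations leads to:

    $y_i = \gamma_0' W_i + (\gamma_1 - \gamma_0)' W_i X_i + u_{0i} + X_i (u_{1i} - u_{0i})$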

  • Regression includes W and interactions of W with X – these are observable part of treatment effect

  • Note: error likely to be heteroskedastic


  • Different treatment effect for high and low quality CVs:

Units of Measurement

  • Causal effect measured in units of ‘experiment’ – not very helpful

  • Often want to convert causal effects to more meaningful units e.g. in Project STAR what is effect of reducing class size by one child

Simple estimator of this would be:

    $\dfrac{E(y|X=1) - E(y|X=0)}{E(S|X=1) - E(S|X=0)}$

  • where S is class size

  • Takes the treatment effect on outcome variable and divides by treatment effect on class size

  • Not hard to compute but how to get standard error?

IV Can Do the Job

  • Can’t run regression of y on S – S influenced by factors other than treatment status

  • But X is:

    • Correlated with S

    • Uncorrelated with unobserved stuff (because of randomization)

  • Hence X can be used as an instrument for S

  • IV estimator has form (just-identified case):

The Wald Estimator

  • This will give estimate of standard error of treatment effect

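  • Where instrument is binary and no other regressors included the IV estimate of slope coefficient can be shown to be:

    $\hat{\beta}_{IV} = \dfrac{\bar{y}_1 - \bar{y}_0}{\bar{S}_1 - \bar{S}_0}$

    where bars with subscripts 1 and 0 denote sample means in the treatment and control groups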
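The Wald estimator, the difference in mean outcomes divided by the difference in mean class size, can be checked against the just-identified IV formula Cov(X,y)/Cov(X,S). The sketch below uses simulated data loosely styled on a class-size experiment; every number in it (class sizes, an effect of -0.5 per extra child, an unobserved factor u that affects both class size and scores) is an assumption made for illustration.

```python
import random
from statistics import mean

random.seed(4)

n = 2000
x = [random.randint(0, 1) for _ in range(n)]   # randomized treatment status
u = [random.gauss(0, 1) for _ in range(n)]     # unobserved factor
# Class size S depends on treatment status and on u.
s = [22.0 - 7.0 * xi + 2.0 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]
# Outcome y: true effect of one extra child is -0.5; u also affects y.
y = [50.0 - 0.5 * si + 5.0 * ui + random.gauss(0, 3) for si, ui in zip(s, u)]

def group_mean(values, group):
    return mean([v for v, xi in zip(values, x) if xi == group])

def cov(a, b):
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / len(a)

# Wald estimator: treatment effect on y divided by treatment effect on S.
wald = (group_mean(y, 1) - group_mean(y, 0)) / (group_mean(s, 1) - group_mean(s, 0))

# Just-identified IV with X as the instrument for S.
iv = cov(x, y) / cov(x, s)

# OLS of y on S, biased here because u affects both S and y.
ols = cov(s, y) / cov(s, s)

print(wald, iv, ols)
```

The Wald and IV numbers coincide up to rounding error, while the OLS slope of y on S is pulled away from -0.5 by the unobserved factor.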

Partial Compliance

  • So far:

    • in control group implies no treatment

    • In treatment group implies get treatment

  • Often things are not as clean as this

    • Treatment is an opportunity

    • Close substitutes available to those in control group

    • Implementation not perfect e.g. pushy parents

An Example: Moving to Opportunity

  • Designed to investigate the impact of living in bad neighbourhoods on outcomes

  • Gave some residents of public housing projects chance to move out

  • Two treatments:

    • Voucher for private rental housing

    • Voucher for private rental housing restricted for use in ‘good’ neighbourhoods

  • No-one forced to move so imperfect compliance – take-up of the vouchers in the two treatment groups was 60% and 40%

Some Terminology

  • Z denotes whether in control or treatment group – ‘intention-to-treat’

  • X denotes whether actually get treatment

  • With perfect compliance:

    • Pr(X=1│Z=1)=1

    • Pr(X=1│Z=0)=0

  • With imperfect compliance:

    • Pr(X=1│Z=1)<1 and/or Pr(X=1│Z=0)>0


What Do We Want to Estimate?

  • ‘Intention-to-Treat’:

    $ITT = E(y|Z=1) - E(y|Z=0)$


  • This can be estimated in usual way

  • Treatment Effect on Treated

Estimating TOT

  • Can’t use simple regression of y on Z

  • But should recognize TOT as Wald estimator

  • Can be estimated by regressing y on X using Z as instrument

  • Relationship between TOT and ITT:

    $TOT = \dfrac{ITT}{\Pr(X=1|Z=1) - \Pr(X=1|Z=0)}$
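The link between the two quantities can be seen in a constructed example. The data below are hypothetical: half of those offered treatment take it up, nobody in the control group is treated, and the outcome is 4 units higher for anyone actually treated. ITT is the difference in mean outcomes by assignment Z, and TOT divides it by the difference in treatment rates.

```python
from statistics import mean

# Hypothetical rows of (Z, X, y): assigned to treatment, actually treated, outcome.
data = (
    [(1, 1, 14.0)] * 50      # offered treatment and took it up
    + [(1, 0, 10.0)] * 50    # offered treatment but did not take it up
    + [(0, 0, 10.0)] * 100   # control group, nobody treated
)

def mean_by_assignment(column, z_value):
    """Mean of column 1 (X) or column 2 (y) among rows with Z == z_value."""
    return mean([row[column] for row in data if row[0] == z_value])

# Intention-to-treat effect: compare outcomes by assignment only.
itt = mean_by_assignment(2, 1) - mean_by_assignment(2, 0)

# Difference in the probability of actually being treated.
take_up = mean_by_assignment(1, 1) - mean_by_assignment(1, 0)

# Treatment effect on the treated (the Wald estimator).
tot = itt / take_up

print(itt, take_up, tot)  # 2.0 0.5 4.0
```

With 50% take-up the TOT is twice the ITT, and it recovers the effect of 4 built into the example.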

Most Important Results from MTO

  • No effects on adult economic outcomes

  • Improvements in adult mental health

  • Beneficial outcomes for teenage girls

  • Adverse outcomes for teenage boys

Sample results from MTO

  • TOT approximately twice the size of ITT

  • Consistent with 50% use of vouchers

IV with Heterogeneous Treatment Effects

  • If treatment effect same for everyone then TOT recovers this (obvious)

  • But what if treatment effect heterogeneous?

  • No simple answer to this question

  • Suppose model for treatment effect is:

Proposition 2.6: The IV estimate for the heterogeneous treatment case is a consistent estimate of:

    $\dfrac{E(\beta_{1i}\pi_i)}{E(\pi_i)}$

where $\pi_i$ is the difference in the probability of treatment for individual i when in treatment and control group


  • Model for effect of intention to treat on being treated:

Proof (continued)

  • Can write ‘reduced-form’ as:

  • Wald estimator then becomes:

  • As:

Hence Wald estimator can be thought of as an estimator of:

  • This is weighted average of treatment effects

  • ‘weights’ will vary with instrument – contrast with homogeneous treatment case, where every valid instrument estimates the same effect

  • Some cases in which can interpret IV estimate as ATE

Proposition 2.7: IV estimate is ATE if:

    a. no heterogeneity in treatment effect

    b. β1i uncorrelated with πi

  • Proof:

  • A. This should be obvious as:

  • B. Can write as:
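A sketch of the decomposition behind part B, assuming the IV estimand takes the weighted-average form $E(\beta_{1i}\pi_i)/E(\pi_i)$ with $E(\pi_i) \neq 0$ (the notation is an assumption where the slide's own equations are missing):

```latex
\begin{align*}
\frac{E(\beta_{1i}\pi_i)}{E(\pi_i)}
  &= \frac{E(\beta_{1i})\,E(\pi_i) + \operatorname{Cov}(\beta_{1i}, \pi_i)}{E(\pi_i)} \\
  &= E(\beta_{1i}) + \frac{\operatorname{Cov}(\beta_{1i}, \pi_i)}{E(\pi_i)}
\end{align*}
```

If the covariance is zero the second term vanishes and the IV estimand equals the ATE. Otherwise, when $E(\pi_i) > 0$, the sign of the covariance gives the direction of the gap between the two.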

How will IV estimate differ from ATE

  • Previous formula says depends on covariance of β1i and πi

  • In some situations can sign – but not always

  • Example 1: no-one gets treatment in the absence of the programme so

  • If those who get treatment when in the treatment group are those with the highest returns then:

  • IV>ATE

  • Example 2: treatment is voluntary for those in the control group but compulsory for those in the treatment group

  • This implies

  • If those who get treatment in control are those with highest returns then:

  • IV<ATE

Angrist/Imbens Monotonicity Assumption

  • Case where IV estimate is not ATE

  • Assume that everyone moved in same direction by treatment – monotonicity assumption

  • Then can show that IV is average of treatment effect for those whose behaviour changed by being in treatment group

  • They call this the Local Average Treatment Effect (LATE)

Spill-overs/Externalities/General Equilibrium Effects

  • Have assumed that treatment only affects outcome for the person who receives it

  • Many situations in which this is not true

  • E.g. externalities, spill-overs, effects on market prices

  • Example: Miguel and Kremer, “Worms: Identifying Impacts on Education and Health in the Presence of Treatment Externalities”, Econometrica 2004


  • Infection from intestinal worms is rife among Kenyan schoolchildren

  • Major cause of school absence

  • Leads to lower human capital accumulation, lower growth?

  • Investigation of effectiveness of anti-worming drugs on health, education

Existing studies

  • Randomize drug treatment within schools

  • But probability of re-infection affected by infection rate among contacts i.e. externalities very likely

  • This research design will not capture these effects

  • To see this, consider model:

Miguel/Kremer Methodology

  • Existing methodology cannot measure externality only individual effect

  • Randomize treatment across schools not individuals

  • This can identify $\beta_1 + \beta_2$

  • Could have had design in which randomized proportion of individuals within schools getting treatment

Typical Result

  • Cannot separate externality from direct effect – but this is important for public policy

  • Have non-experimental approach to this – using fact that not all kids from same village go to same school

  • This gives variation in X

Some examples of how they do this:

  • Include number of kids in local area who are in treatment schools

Problems with Experiments

  • Expense

  • Ethical Issues

  • Threats to Internal Validity

    • Failure to follow experiment

    • Experimental effects (Hawthorne effects)

  • Threats to External Validity

    • Non-representative programme

    • Non-representative sample

    • Scale effects

Conclusions on Experiments

  • Are ‘gold standard’ of empirical research

  • Are becoming more common

  • Not enough of them to keep us busy

  • Study of non-experimental data can deliver useful knowledge

  • Some issues similar, others different
