Estimating Causal Effects with Experimental Data

Some Basic Terminology

  • Start with example where X is binary (though simple to generalize):

    • X=0 is control group

    • X=1 is treatment group

  • Causal effect sometimes called treatment effect

  • Randomization implies everyone has same probability of treatment

Why is Randomization Good?

  • If X allocated at random then know that X is independent of all pre-treatment variables in whole wide world

  • an amazing claim but true.

  • Implies there cannot be a problem of omitted variables, reverse causality etc

  • On average, only reason for difference between treatment and control group is different receipt of treatment

Proposition 2.1: Pre-treatment characteristics must be independent of randomized treatment

  • Proof: Joint distribution of X and W is f(X,W)

  • Can decompose this into:

    f(X,W)=fX│W(X│W)fW(W)

  • Now random assignment means

    fX│W(X│W)=fX (X)

  • This implies:

    f(X,W)=fX (X)fW(W)

  • This implies X and W independent

Why is this Useful? An Example: Racial Discrimination

  • Black men earn less than white men in US

    LOGWAGE    |      Coef.   Std. Err.        t
    -----------+--------------------------------
    BLACK      |  -.1673813    .0066708   -25.09
    NO_HS      |  -.2138331    .0077192   -27.70
    SOMECOLL   |   .1104148    .0049139    22.47
    COLLEGE    |   .4660205    .0048839    95.42
    AGE        |   .0704488    .0008552    82.38
    AGESQUARED |  -.0007227    .0000101   -71.41
    _cons      |   1.088116    .0172715    63.00

  • Could be discrimination, or could be other factors unobserved by the researcher but observed by the employer

  • Hard to fully resolve with non-experimental data

An Experimental Design

  • Bertrand/Mullainathan “Are Emily and Greg More Employable Than Lakisha and Jamal”, American Economic Review, 2004

  • Create fake CVs and send them in reply to job adverts

  • Allocate names at random to CVs – some given ‘black-sounding’ names, others ‘white-sounding’

  • Outcome variable is call-back rates

  • Interpretation – not direct measure of racial discrimination, just effect of having a ‘black-sounding’ name – may have other connotations.

  • But name uncorrelated by construction with other material on CV

The Treatment Effect

  • Want estimate of:

    E(y│X=1)-E(y│X=0)

Estimating Treatment Effects: the Statistics Course Approach

  • Take mean of outcome variable in treatment group

  • Take mean of outcome variable in control group

  • Take difference between the two

  • No problems but:

    • Does not generalize to where X is not binary

    • Does not directly compute standard errors
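
The three steps above can be sketched in a few lines. This is a minimal illustration in plain Python (the slides themselves use Stata); the function name and the toy data are hypothetical, not taken from any study.

```python
from statistics import mean


def difference_in_means(y, x):
    """Mean outcome of the treatment group (x == 1) minus that of the control group (x == 0)."""
    treated = [yi for yi, xi in zip(y, x) if xi == 1]
    control = [yi for yi, xi in zip(y, x) if xi == 0]
    return mean(treated) - mean(control)


# Hypothetical toy data: x = 1 for treatment, 0 for control; y is a 0/1 outcome.
x = [1, 1, 1, 1, 0, 0, 0, 0]
y = [0, 0, 1, 0, 1, 0, 1, 0]
effect = difference_in_means(y, x)
print(effect)
```

As the slide notes, this gives a point estimate only; it says nothing yet about precision.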

Estimating Treatment Effects: A Regression Approach

  • Run regression:

    yi=β0+β1Xi+εi

  • Proposition 2.2 The OLS estimator of β1 is an unbiased estimator of the causal effect of X on y:

  • Proof: Many ways to prove this but simplest way is perhaps:

    • Proposition 1.1 says OLS estimates E(y|X)

    • E(y|X=0)= β0 so OLS estimate of intercept is consistent estimate of E(y│X=0)

    • E(y|X=1)= β0+β1 so OLS estimate of β1 is consistent estimate of E(y│X=1)-E(y│X=0)

  • Hence can read off estimate of treatment effect from coefficient on X

  • Approach easily generalizes to where X is not binary

  • Also gives estimate of standard error
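
A small sketch of the claim that the coefficient on a binary X reproduces the difference in means. It uses the textbook one-regressor OLS formulas in plain Python; the helper name `ols_simple` and the toy data are assumptions made for illustration.

```python
from statistics import mean


def ols_simple(y, x):
    """Return (intercept, slope) from OLS of y on a constant and a single regressor x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical toy data: binary treatment x and 0/1 outcome y.
x = [1, 1, 1, 1, 0, 0, 0, 0]
y = [0, 0, 1, 0, 1, 0, 1, 0]
b0, b1 = ols_simple(y, x)

# With binary x the intercept is the control-group mean and the slope is
# the treatment-group mean minus the control-group mean.
control_mean = mean([yi for yi, xi in zip(y, x) if xi == 0])
treated_mean = mean([yi for yi, xi in zip(y, x) if xi == 1])
print(b0, b1, control_mean, treated_mean - control_mean)
```

The same function works unchanged when x takes more than two values, which is the generalization the slide refers to.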

Computing Standard Errors

  • Unless told otherwise regression package will compute standard errors assuming errors are homoskedastic i.e. Var(ε│X)=σ2

  • Even if only interested in effect of treatment on mean, X may affect other aspects of distribution e.g. variance

  • This will cause heteroskedasticity

  • Heteroskedasticity does not make OLS regression coefficients inconsistent but does make OLS standard errors inconsistent

‘Robust’ Standard Errors

  • Also called:

    • Huber standard errors

    • White standard errors

    • Heteroskedasticity-consistent standard errors

  • Statistics course approach

    • Get variance of estimate of mean of treatment and control group

    • Sum to give estimate of variance of difference in means

A Regression-Based Approach

  • Can estimate this by using sample equivalents

  • Note that this is same as OLS standard errors if X and ε are independent

Proposition 2.3: If ε and X are independent, the OLS formula for the standard errors will be consistent even if the variance of ε differs across individuals.

  • Proof: If ε and X are independent

  • Putting this in expression for asymptotic variance of OLS estimator:

  • A consistent estimate of the final term is the mean of the squared residuals i.e. usual estimate of σ2

A Regression-Based Approach

  • Have to interpret residual variance differently – not common to all individuals but the mean across individuals

  • With one regressor can write robust standard error as:

  • Simple to use in practice e.g. in Stata:

    . reg y x, robust
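
For readers without Stata, the following is a rough Python sketch of a heteroskedasticity-robust standard error for the one-regressor case. It uses the basic White (HC0) form with no small-sample correction, so it may differ slightly from what a package reports when a degrees-of-freedom adjustment is applied. Function and variable names, and the data, are hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance


def ols_with_robust_se(y, x):
    """OLS of y on a constant and x: returns (slope, conventional SE, HC0 robust SE)."""
    n = len(y)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Conventional formula: one common error variance for everyone.
    sigma2 = sum(e ** 2 for e in resid) / (n - 2)
    se_conventional = sqrt(sigma2 / sxx)
    # HC0: each squared residual is weighted by its own squared deviation of x.
    meat = sum(((xi - x_bar) ** 2) * e ** 2 for xi, e in zip(x, resid))
    se_robust = sqrt(meat / sxx ** 2)
    return slope, se_conventional, se_robust


# Hypothetical data in which the treated outcomes are far more dispersed.
treated = [2, 8, 3, 9, 1, 7]
control = [4, 5, 4, 5]
x = [1] * len(treated) + [0] * len(control)
y = treated + control
slope, se_conventional, se_robust = ols_with_robust_se(y, x)

# "Statistics course" version: sum of the variances of the two group means.
se_from_groups = sqrt(pvariance(treated) / len(treated) + pvariance(control) / len(control))
print(slope, se_conventional, se_robust, se_from_groups)
```

With a binary regressor the HC0 standard error coincides with the sum-of-group-variances calculation described earlier, which the last line of the sketch checks numerically.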

Bertrand/Mullainathan: Basic Results

Summary So Far

  • Econometrics very easy if all data comes from randomized controlled experiment

  • Just need to collect data on treatment/control and outcome variables

  • Just need to compare means of outcomes of treatment and control groups

  • Is data on other variables of any use at all?

    • Not necessary but useful

Including Other Regressors

  • Can get consistent estimate of treatment effect without worrying about other variables

  • Reason is that randomization ensures no problem of omitted variables bias

  • But there are reasons to include other regressors:

    • Improved efficiency

    • Check for randomization

    • Improve randomization

    • Control for conditional randomization

    • Heterogeneity in treatment effects

The Uses of Other Regressors I: Improved Efficiency

  • Don’t just want consistent estimate of causal effect – also want low standard error (or high precision or efficiency).

  • Standard formula for variance of OLS estimate of β is σ2(X’X)-1; standard error is square root of relevant diagonal element

  • σ2 comes from variance of residual in regression – (1-R2)*Var(y)

Proposition 2.4: The asymptotic variance of βˆ is lower when W is included

  • Proof: (Will only do case where X and W are one-dimensional)

  • When W is included variance of the estimate of the treatment effect will be first diagonal element of:

Proof (continued)

  • Now:

  • Using trick from end of notes on causal effects we can write this as:

Proof (continued)

  • Inverting leads to

  • By randomization X and W are independent so:

  • The only difference is in the error variance – this must be smaller when W is included as R2 rises

The Uses of Other Regressors II: Check for Randomization

  • Randomization can go wrong

    • Poor implementation of research design

    • Bad luck

  • If randomization done well then W should be independent of X – this is testable:

    • Test for differences in W in treatment/control groups

    • Probit model for X on W
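
The first of these checks can be sketched as a two-sample comparison of W across groups. The code below is a minimal Python illustration using an unequal-variance t-statistic; the covariate values are made up, and the 1.96 cut-off is the usual large-sample 5% critical value, used here only as a rough guide for such a tiny sample.

```python
from math import sqrt
from statistics import mean, variance


def balance_t_stat(w, x):
    """Difference in mean W between treatment and control, with its t-statistic."""
    w1 = [wi for wi, xi in zip(w, x) if xi == 1]
    w0 = [wi for wi, xi in zip(w, x) if xi == 0]
    diff = mean(w1) - mean(w0)
    se = sqrt(variance(w1) / len(w1) + variance(w0) / len(w0))
    return diff, diff / se


# Hypothetical pre-treatment covariate (e.g. age) for 4 treated and 4 control units.
x = [1, 1, 1, 1, 0, 0, 0, 0]
w = [30, 40, 50, 44, 35, 45, 40, 40]
diff, t_stat = balance_t_stat(w, x)
balanced = abs(t_stat) < 1.96
print(diff, t_stat, balanced)
```

In practice one would repeat this for every available pre-treatment variable, or fit a single model of X on all of W as the second bullet suggests.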

The Uses of Other Regressors III: Improve Randomization

  • Can also use W at stage of assigning treatment

  • Can guarantee that in your sample X and W are independent instead of it being just probabilistic

  • This is what Bertrand/Mullainathan do when assigning names to CVs
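
One possible way to use W at the assignment stage is blocked (stratified) randomization: shuffle units within each value of W and treat exactly half of each block. The sketch below is an illustration under that assumption and is not claimed to be the procedure used in the paper; the function name and labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_assignment(w, seed=0):
    """Assign treatment to exactly half of the units within each stratum of w."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for i, wi in enumerate(w):
        strata[wi].append(i)
    x = [0] * len(w)
    for members in strata.values():
        rng.shuffle(members)  # random order within the stratum
        for i in members[: len(members) // 2]:
            x[i] = 1
    return x


# Hypothetical covariate: CV quality, four units in each stratum.
w = ["high", "low", "high", "low", "high", "low", "high", "low"]
x = stratified_assignment(w)
treated_high = sum(xi for xi, wi in zip(x, w) if wi == "high")
treated_low = sum(xi for xi, wi in zip(x, w) if wi == "low")
print(x, treated_high, treated_low)
```

Whatever the shuffle produces, the treated share is the same in every stratum, so X and W are balanced in the sample by construction rather than only in expectation.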

The Uses of Other Regressors IV: Adjust for Conditional Randomization

  • This is case where must include W to get consistent estimates of treatment effects

  • Conditional randomization is where probability of treatment is different for people with different values of W, but random conditional on W

  • Why have conditional randomization?

    • May have no choice

    • May want to do it (c.f. stratification)

An Example: Project STAR

  • Allocation of students to classes is random within schools

  • But small number of classes per school

  • This leads to following relationship between probability of treatment and number of kids in school:

Controlling for Conditional Randomization

  • X can now be correlated with W

  • But, conditional on W, X independent of other factors

  • But must get functional form of relationship between y and W correct – matching procedures

  • This is not the case with (unconditional) randomization – see class exercise

Heterogeneity in Treatment Effects

  • So far have assumed causal (treatment) effect the same for everyone

  • No good reason to believe this

  • Start with case of no other regressors:

    yi=β0+β1iXi+εi

  • Random assignment implies X independent of β1i

  • Sometimes called random coefficients model

What Treatment Effect to Estimate?

  • Would like to estimate causal effect for everyone – this is not possible

  • Can only hope to estimate some average

  • Average treatment effect (ATE): E(β1i)

Proposition 2.5: OLS estimates ATE

  • Proof for single regressor:

Observable Heterogeneity

  • Full outcomes notation:

    • Outcome if in control group:

      y0i=γ0’Wi+u0i

    • Outcome if in treatment group:

      y1i=γ1’Wi+u1i

  • Treatment effect is (y1i-y0i) and can be written as:

    (y1i-y0i)=(γ1-γ0)’Wi+u1i-u0i

  • Note treatment effect has observable and unobservable component

  • Can estimate as:

    • Two separate equations

    • One single equation

Combining Treatment and Control Groups into Single Regression

  • We can write:

    yi=Xiy1i+(1-Xi)y0i

  • Combining outcomes equations leads to:

    yi=γ0’Wi+Xi(γ1-γ0)’Wi+u0i+Xi(u1i-u0i)

  • Regression includes W and interactions of W with X – these are observable part of treatment effect

  • Note: error likely to be heteroskedastic

Bertrand/Mullainathan


  • Different treatment effect for high and low quality CVs:

Units of Measurement

  • Causal effect measured in units of ‘experiment’ – not very helpful

  • Often want to convert causal effects to more meaningful units e.g. in Project STAR what is effect of reducing class size by one child

Simple estimator of this would be:

    [E(y│X=1)-E(y│X=0)]/[E(S│X=1)-E(S│X=0)]

  • where S is class size and expectations are replaced by sample means

  • Takes the treatment effect on outcome variable and divides by treatment effect on class size

  • Not hard to compute but how to get standard error?

IV Can Do the Job

  • Can’t run regression of y on S – S influenced by factors other than treatment status

  • But X is:

    • Correlated with S

    • Uncorrelated with unobserved stuff (because of randomization)

  • Hence X can be used as an instrument for S

  • IV estimator has form (just-identified case):

The Wald Estimator

  • This will give estimate of standard error of treatment effect

  • Where instrument is binary and no other regressors included the IV estimate of slope coefficient can be shown to be:
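
With a binary instrument and no other regressors, the estimate is the ratio described earlier: the treatment effect on the outcome divided by the treatment effect on class size. A minimal Python sketch follows; the scores and class sizes are invented for illustration and are not Project STAR data.

```python
from statistics import mean


def wald_estimate(y, s, x):
    """Ratio of the difference in mean y to the difference in mean s across x = 1 and x = 0."""
    def group_diff(values):
        treated = [v for v, xi in zip(values, x) if xi == 1]
        control = [v for v, xi in zip(values, x) if xi == 0]
        return mean(treated) - mean(control)

    return group_diff(y) / group_diff(s)


# Hypothetical data: x = 1 if assigned to a small class, s = actual class size,
# y = test score.
x = [1, 1, 1, 0, 0, 0]
s = [15, 16, 14, 22, 23, 21]
y = [52, 50, 54, 45, 47, 46]
per_child = wald_estimate(y, s, x)  # change in score per additional child in the class
print(per_child)
```

Running the same problem through an IV routine with X as the instrument for S would give the same point estimate and, as the slide notes, a standard error as well.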

Partial Compliance

  • So far:

    • In control group implies no treatment

    • In treatment group implies get treatment

  • Often things are not as clean as this

    • Treatment is an opportunity

    • Close substitutes available to those in control group

    • Implementation not perfect e.g. pushy parents

An Example: Moving to Opportunity

  • Designed to investigate the impact of living in bad neighbourhoods on outcomes

  • Gave some residents of public housing projects chance to move out

  • Two treatments:

    • Voucher for private rental housing

    • Voucher for private rental housing restricted for use in ‘good’ neighbourhoods

  • No-one forced to move so compliance imperfect – 60% and 40% of those offered the two vouchers used them

Some Terminology

  • Z denotes whether in control or treatment group – ‘intention-to-treat’

  • X denotes whether actually get treatment

  • With perfect compliance:

    • Pr(X=1│Z=1)=1

    • Pr(X=1│Z=0)=0

  • With imperfect compliance:


What Do We Want to Estimate?

  • ‘Intention-to-Treat’:

    ITT=E(y│Z=1)-E(y│Z=0)

  • This can be estimated in usual way

  • Treatment Effect on Treated

Estimating TOT

  • Can’t use simple regression of y on Z

  • But should recognize TOT as Wald estimator

  • Can be estimated by regressing y on X using Z as instrument

  • Relationship between TOT and ITT:
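
The formula for this relationship did not survive in the transcript. The sketch below assumes the Wald form implied by the surrounding slides: TOT equals ITT divided by the difference in treatment rates between the Z=1 and Z=0 groups. The data are hypothetical, with 50% take-up in the treatment group and none in the control group.

```python
from statistics import mean


def itt_and_tot(y, x, z):
    """Return (ITT, TOT), where TOT = ITT / (difference in treatment rates by z)."""
    def arm_mean(values, arm):
        return mean([v for v, zi in zip(values, z) if zi == arm])

    itt = arm_mean(y, 1) - arm_mean(y, 0)
    take_up_gap = arm_mean(x, 1) - arm_mean(x, 0)
    return itt, itt / take_up_gap


# Hypothetical data: z = offered the treatment, x = actually took it, y = outcome.
z = [1, 1, 1, 1, 0, 0, 0, 0]
x = [1, 1, 0, 0, 0, 0, 0, 0]
y = [10, 12, 6, 8, 7, 5, 6, 6]
itt, tot = itt_and_tot(y, x, z)
print(itt, tot)
```

With half of the offered group taking up treatment, the TOT comes out at twice the ITT.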

Most Important Results from MTO

  • No effects on adult economic outcomes

  • Improvements in adult mental health

  • Beneficial outcomes for teenage girls

  • Adverse outcomes for teenage boys

Sample Results from MTO

  • TOT approximately twice the size of ITT

  • Consistent with 50% use of vouchers

IV with Heterogeneous Treatment Effects

  • If treatment effect same for everyone then TOT recovers this (obvious)

  • But what if treatment effect heterogeneous?

  • No simple answer to this question

  • Suppose model for treatment effect is:

Proposition 2.6: The IV estimate for the heterogeneous treatment case is a consistent estimate of:

    E(πiβ1i)/E(πi)

where πi is the difference in the probability of treatment for individual i when in treatment and control group



  • Model for effect of intention to treat on being treated:

Proof (continued)

  • Can write ‘reduced-form’ as:

  • Wald estimator then becomes:

  • As:

Hence Wald estimator can be thought of as estimator of:

  • This is weighted average of treatment effects

  • ‘weights’ will vary with instrument – contrast with heterogeneous treatment case

  • Some cases in which can interpret IV estimate as ATE

Proposition 2.7: IV estimate is ATE if:

    a. no heterogeneity in treatment effect

    b. β1i uncorrelated with πi

  • Proof:

  • A. This should be obvious as:

  • B. Can write as:

How Will IV Estimate Differ from ATE?

  • Previous formula says depends on covariance of β1i and πi

  • In some situations can sign – but not always

  • Example 1: no-one gets treatment in the absence of the programme so Pr(X=1│Z=0)=0

  • If those who get treatment when in the treatment group are those with the highest returns then:

  • IV>ATE

  • Example 2: treatment is voluntary for those in the control group but compulsory for those in the treatment group

  • This implies Pr(X=1│Z=1)=1

  • If those who get treatment in control are those with highest returns then:

  • IV<ATE

Angrist/Imbens Monotonicity Assumption

  • Case where IV estimate is not ATE

  • Assume that everyone moved in same direction by treatment – monotonicity assumption

  • Then can show that IV is average of treatment effect for those whose behaviour changed by being in treatment group

  • They call this the Local Average Treatment Effect (LATE)
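
A constructed example may make the LATE interpretation concrete. The sketch below builds a tiny hypothetical population of "compliers" (treated only when assigned to the treatment group) and "never-takers" (never treated), places every unit once in each arm so the arms are exactly balanced, and compares the Wald estimate with the average effect among compliers and with the ATE. The labels, numbers and the absence of other types are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical population: (type, outcome without treatment, individual treatment effect).
units = [
    ("complier", 1.0, 2.0),
    ("complier", 3.0, 6.0),
    ("never-taker", 2.0, 10.0),
    ("never-taker", 4.0, 10.0),
]

# Put each unit once in the control arm (z = 0) and once in the treatment arm (z = 1).
rows = []  # each row is (z, x, y)
for z in (0, 1):
    for kind, y0, effect in units:
        x = z if kind == "complier" else 0  # never-takers stay untreated
        rows.append((z, x, y0 + effect * x))


def arm_mean(column, arm):
    """Mean of the given column (1 = x, 2 = y) among rows with z == arm."""
    return mean([row[column] for row in rows if row[0] == arm])


wald = (arm_mean(2, 1) - arm_mean(2, 0)) / (arm_mean(1, 1) - arm_mean(1, 0))
late = mean([effect for kind, _, effect in units if kind == "complier"])
ate = mean([effect for _, _, effect in units])
print(wald, late, ate)
```

Here the Wald estimate matches the average effect among those whose treatment status is changed by assignment, and differs from the ATE because the never-takers have larger effects that are never observed.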

Spill-overs/Externalities/General Equilibrium Effects

  • Have assumed that treatment only affects outcome for person who receives it

  • Many situations in which this is not true

  • E.g. externalities, spill-overs, effects on market prices

  • Example: Miguel and Kremer, “Worms: Identifying Impacts on Education and Health in the Presence of Treatment Externalities”, Econometrica 2004



  • Infection from intestinal worms is rife among Kenyan schoolchildren

  • Major cause of school absence

  • Leads to lower human capital accumulation, lower growth?

  • Investigation of effectiveness of anti-worming drugs on health, education

Existing Studies

  • Randomize drug treatment within schools

  • But probability of re-infection affected by infection rate among contacts, i.e. externalities very likely

  • This research design will not capture these effects

  • To see this, consider model:

Miguel/Kremer Methodology

  • Existing methodology cannot measure externality only individual effect

  • Randomize treatment across schools not individuals

  • This can identify β1+β2

  • Could have had design in which randomized proportion of individuals within schools getting treatment

Typical Result

  • Cannot separate externality from direct effect – but this is important for public policy

  • Have non-experimental approach to this – using fact that not all kids from same village go to same school

  • This gives variation in X

Some examples of how they do this:

  • Include number of kids in local area who are in treatment schools

Problems with Experiments

  • Expense

  • Ethical Issues

  • Threats to Internal Validity

    • Failure to follow experiment

    • Experimental effects (Hawthorne effects)

  • Threats to External Validity

    • Non-representative programme

    • Non-representative sample

    • Scale effects

Conclusions on Experiments

  • Are ‘gold standard’ of empirical research

  • Are becoming more common

  • Not enough of them to keep us busy

  • Study of non-experimental data can deliver useful knowledge

  • Some issues similar, others different
