Estimating Causal Effects with Experimental Data

- Start with example where X is binary (though simple to generalize):
- X=0 is control group
- X=1 is treatment group

- Causal effect sometimes called treatment effect
- Randomization implies everyone has same probability of treatment

- If X allocated at random then know that X is independent of all pre-treatment variables in whole wide world
- an amazing claim but true.
- Implies there cannot be a problem of omitted variables, reverse causality etc
- On average, only reason for difference between treatment and control group is different receipt of treatment

- Proof: Joint distribution of X and W is f(X,W)
- Can decompose this into:
$f(X,W) = f_{X|W}(X \mid W)\, f_W(W)$

- Now random assignment means
$f_{X|W}(X \mid W) = f_X(X)$

- This implies:
$f(X,W) = f_X(X)\, f_W(W)$

- This implies X and W independent

- Black men earn less than white men in US
LOGWAGE    |      Coef.   Std. Err.        t
-----------+---------------------------------
BLACK      |  -.1673813   .0066708    -25.09
NO_HS      |  -.2138331   .0077192    -27.70
SOMECOLL   |   .1104148   .0049139     22.47
COLLEGE    |   .4660205   .0048839     95.42
AGE        |   .0704488   .0008552     82.38
AGESQUARED |  -.0007227   .0000101    -71.41
_cons      |   1.088116   .0172715     63.00

- Could be discrimination or other factors unobserved by the researcher but observed by the employer?
- hard to fully resolve with non-experimental data

- Bertrand/Mullainathan “Are Emily and Greg More Employable Than Lakisha and Jamal”, American Economic Review, 2004
- Create fake CVs and send them in reply to job adverts
- Allocate names at random to CVs – some given ‘black-sounding’ names, others ‘white-sounding’

- Outcome variable is call-back rates
- Interpretation – not direct measure of racial discrimination, just effect of having a ‘black-sounding’ name – may have other connotations.
- But name uncorrelated by construction with other material on CV

- Want estimate of: $E(y \mid X=1) - E(y \mid X=0)$

- Take mean of outcome variable in treatment group
- Take mean of outcome variable in control group
- Take difference between the two
- No problems but:
- Does not generalize to where X is not binary
- Does not directly compute standard errors

- Run regression:
$y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

- Proposition 2.2 The OLS estimator of $\beta_1$ is an unbiased estimator of the causal effect of X on y:
- Proof: Many ways to prove this but simplest way is perhaps:
- Proposition 1.1 says OLS estimates $E(y \mid X)$
- $E(y \mid X=0) = \beta_0$, so the OLS estimate of the intercept is a consistent estimate of $E(y \mid X=0)$
- $E(y \mid X=1) = \beta_0 + \beta_1$, so the OLS estimate of $\beta_1$ is a consistent estimate of $E(y \mid X=1) - E(y \mid X=0)$

- Hence can read off estimate of treatment effect from coefficient on X
- Approach easily generalizes to where X is not binary
- Also gives estimate of standard error
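The equivalence between the difference in means and the regression coefficient on a binary X can be checked numerically. The sketch below uses a small made-up data set (the outcome values and the function names are illustrative assumptions, not from the lecture) and shows that both routes give the same number.

```python
import numpy as np

def difference_in_means(y, x):
    """Mean outcome of the treated (x == 1) minus mean outcome of the controls (x == 0)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x)
    return y[x == 1].mean() - y[x == 0].mean()

def ols_slope(y, x):
    """Slope from the regression y = b0 + b1 * x + e, fitted by least squares."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(x, dtype=float)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

# Hypothetical outcomes for three controls and three treated units.
x = np.array([0, 0, 0, 1, 1, 1])
y = np.array([1.0, 2.0, 3.0, 2.0, 4.0, 6.0])

print(difference_in_means(y, x))
print(ols_slope(y, x))
```

The regression route is the one that extends to non-binary X and to additional regressors.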

- Unless told otherwise, a regression package will compute standard errors assuming the errors are homoskedastic, i.e. $Var(\varepsilon_i \mid X_i) = \sigma^2$
- Even if only interested in the effect of treatment on the mean, X may affect other aspects of the distribution, e.g. the variance
- This will cause heteroskedasticity
- Heteroskedasticity does not make OLS regression coefficients inconsistent but does make OLS standard errors inconsistent

- Robust standard errors are also called:
- Huber standard errors
- White standard errors
- Heteroskedasticity-consistent standard errors

- Statistics course approach
- Get variance of estimate of mean of treatment and control group
- Sum to give estimate of variance of difference in means

- Can estimate this by using sample equivalents
- Note that this is same as OLS standard errors if X and ε are independent

Proposition 2.3 If ε and X are independent, the OLS formula for the standard errors will be consistent even if the variance of ε differs across individuals.

- Proof: If ε and X are independent
- Putting this in expression for asymptotic variance of OLS estimator:
- A consistent estimate of the final term is the mean of the squared residuals i.e. usual estimate of σ2

- Have to interpret residual variance differently – not common to all individuals but the mean across individuals
- With one regressor can write robust standard error as:
- Simple to use in practice e.g. in STATA:
.reg y x, robust
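For readers without Stata, the one-regressor robust standard error can be sketched directly. The formula on the original slide is not reproduced here, so the code assumes the standard White (HC0) form with no small-sample correction; function names and data are illustrative. With a binary X this form coincides with the 'statistics course' standard error of a difference in means, which the second function computes.

```python
import numpy as np

def robust_se_slope(y, x):
    """HC0 (White) standard error of the slope in y = b0 + b1 * x + e."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    xd = x - x.mean()                        # demeaned regressor
    b1 = (xd @ y) / (xd @ xd)                # OLS slope
    b0 = y.mean() - b1 * x.mean()            # OLS intercept
    resid = y - b0 - b1 * x
    # Assumed form: sum((x - xbar)^2 * e^2) / (sum((x - xbar)^2))^2
    var_b1 = np.sum(xd**2 * resid**2) / (xd @ xd) ** 2
    return np.sqrt(var_b1)

def two_sample_se(y, x):
    """Square root of var(treated)/n1 + var(control)/n0, variances divided by n."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x)
    y1, y0 = y[x == 1], y[x == 0]
    return np.sqrt(y1.var() / len(y1) + y0.var() / len(y0))

# Hypothetical data in which the treated group has the larger variance.
x = np.array([0, 0, 0, 1, 1, 1])
y = np.array([1.0, 2.0, 3.0, 2.0, 4.0, 6.0])

print(robust_se_slope(y, x))
print(two_sample_se(y, x))
```

Regression packages often apply a small-sample scaling to the robust formula, so their reported values can differ slightly from this uncorrected version.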

- Econometrics very easy if all data comes from randomized controlled experiment
- Just need to collect data on treatment/control and outcome variables
- Just need to compare means of outcomes of treatment and control groups
- Is data on other variables of any use at all?
- Not necessary but useful

- Can get consistent estimate of treatment effect without worrying about other variables
- Reason is that randomization ensures no problem of omitted variables bias
- But there are reasons to include other regressors:
- Improved efficiency
- Check for randomization
- Improve randomization
- Control for conditional randomization
- Heterogeneity in treatment effects

- Don’t just want consistent estimate of causal effect – also want low standard error (or high precision or efficiency).
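- Standard formula for the variance of the OLS estimate of β is $\sigma^2 (X'X)^{-1}$; standard errors are the square roots of its diagonal elements
- $\sigma^2$ comes from the variance of the residual in the regression: $(1-R^2)\,Var(y)$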

- Proof: (Will only do case where X and W are one-dimensional)
- When W is included, the variance of the estimate of the treatment effect will be the first diagonal element of:

- Now:
- Using trick from end of notes on causal effects we can write this as:

- Inverting leads to
- By randomization X and W are independent so:
- The only difference is in the error variance – this must be smaller when W is included as R2 rises
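The efficiency gain can be illustrated with a small simulation. Everything in the sketch is an assumption made for illustration: the true treatment effect of 1.0, the coefficient of 2.0 on W, the sample size and the seed. It compares the conventional OLS standard error on X with and without W in the regression.

```python
import numpy as np

def ols_with_se(y, design):
    """OLS coefficients and conventional (homoskedastic) standard errors."""
    n, k = design.shape
    xtx_inv = np.linalg.inv(design.T @ design)
    coef = xtx_inv @ design.T @ y
    resid = y - design @ coef
    sigma2 = resid @ resid / (n - k)         # estimate of the error variance
    return coef, np.sqrt(sigma2 * np.diag(xtx_inv))

rng = np.random.default_rng(0)
n = 2000
x = rng.integers(0, 2, size=n).astype(float)     # randomized treatment
w = rng.normal(size=n)                            # pre-treatment covariate
y = 1.0 * x + 2.0 * w + rng.normal(size=n)        # assumed data-generating process

ones = np.ones(n)
coef_short, se_short = ols_with_se(y, np.column_stack([ones, x]))
coef_long, se_long = ols_with_se(y, np.column_stack([ones, x, w]))

print("without W:", coef_short[1], se_short[1])
print("with W:   ", coef_long[1], se_long[1])
```

Both regressions target the same treatment effect; only the residual variance, and hence the standard error, changes.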

- Randomization can go wrong
- Poor implementation of research design
- Bad luck

- If randomization done well then W should be independent of X – this is testable:
- Test for differences in W in treatment/control groups
- Probit model for X on W
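The first of these checks, a test for a difference in the mean of W between treatment and control groups, might be sketched as below. The function name and data are hypothetical, and the unequal-variance t-statistic is one reasonable choice rather than the specific test used in the lecture.

```python
import numpy as np

def balance_t_stat(w, x):
    """t-statistic for equal means of a pre-treatment variable w across x == 1 and x == 0."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x)
    w1, w0 = w[x == 1], w[x == 0]
    diff = w1.mean() - w0.mean()
    # Unequal-variance standard error, using sample variances (ddof=1).
    se = np.sqrt(w1.var(ddof=1) / len(w1) + w0.var(ddof=1) / len(w0))
    return diff / se

x = np.array([0, 0, 0, 1, 1, 1])
w_balanced = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0])    # identical in both groups
w_shifted = np.array([1.0, 2.0, 3.0, 2.0, 3.0, 4.0])     # treated group shifted up by 1

print(balance_t_stat(w_balanced, x))
print(balance_t_stat(w_shifted, x))
```

A large absolute t-statistic on any pre-treatment variable is a warning sign that the randomization went wrong or was unlucky.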

- Can also use W at stage of assigning treatment
- Can guarantee that in your sample X and W are independent instead of it being just probabilistic
- This is what Bertrand/Mullainathan do when assigning names to CVs

- This is case where must include W to get consistent estimates of treatment effects
- Conditional randomization is where probability of treatment is different for people with different values of W, but random conditional on W
- Why have conditional randomization?
- May have no choice
- May want to do it (c.f. stratification)

- Allocation of students to classes is random within schools
- But small number of classes per school
- This leads to following relationship between probability of treatment and number of kids in school:

- X can now be correlated with W
- But, conditional on W, X independent of other factors
- But must get functional form of relationship between y and W correct – matching procedures
- This is not the case with (unconditional) randomization – see class exercise
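A simulation makes the point about conditional randomization concrete. The set-up is an assumption chosen for illustration: W is binary, the probability of treatment is 0.2 when W=0 and 0.8 when W=1, and the true treatment effect is 1.0. Because W is binary here, a linear term in W is automatically the correct functional form; with a continuous W that would need more care.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
w = rng.integers(0, 2, size=n)                    # stratum indicator
p_treat = np.where(w == 1, 0.8, 0.2)              # treatment probability depends on W
x = (rng.random(n) < p_treat).astype(float)       # random assignment conditional on W
y = 1.0 * x + 2.0 * w + rng.normal(size=n)        # assumed true effect of X is 1.0

# Naive comparison of means ignores W and picks up its effect on y.
naive = y[x == 1].mean() - y[x == 0].mean()

# Regression that conditions on W.
design = np.column_stack([np.ones(n), x, w])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
controlled = coef[1]

print("naive difference in means:", naive)
print("coefficient on X controlling for W:", controlled)
```

In this design the naive comparison is biased upwards because treated units are disproportionately those with W=1.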

- So far have assumed causal (treatment) effect the same for everyone
- No good reason to believe this
- Start with case of no other regressors:
$y_i = \beta_0 + \beta_{1i} X_i + \varepsilon_i$

- Random assignment implies X independent of β1i
- Sometimes called random coefficients model

- Would like to estimate causal effect for everyone – this is not possible
- Can only hope to estimate some average
- Average treatment effect: $E(\beta_{1i})$

- Proof for single regressor:
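The steps of the proof are not preserved in these notes. A sketch, assuming the claim is that the OLS slope from the regression of y on a binary X estimates the average treatment effect $E(\beta_{1i})$, and assuming random assignment makes X independent of both $\beta_{1i}$ and $\varepsilon_i$:

```latex
\begin{align*}
E(y_i \mid X_i = 1) - E(y_i \mid X_i = 0)
  &= E(\beta_0 + \beta_{1i} + \varepsilon_i \mid X_i = 1)
     - E(\beta_0 + \varepsilon_i \mid X_i = 0) \\
  &= E(\beta_{1i} \mid X_i = 1)
     + \left[ E(\varepsilon_i \mid X_i = 1) - E(\varepsilon_i \mid X_i = 0) \right] \\
  &= E(\beta_{1i})
\end{align*}
```

The last line uses independence twice: the bracketed term is zero, and conditioning on X does not change the mean of $\beta_{1i}$.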

- Full outcomes notation:
- Outcome if in control group:
$y_{0i} = \gamma_0' W_i + u_{0i}$

- Outcome if in treatment group:
$y_{1i} = \gamma_1' W_i + u_{1i}$

- Treatment effect is $(y_{1i} - y_{0i})$ and can be written as:
$y_{1i} - y_{0i} = (\gamma_1 - \gamma_0)' W_i + u_{1i} - u_{0i}$

- Note treatment effect has observable and unobservable component
- Can estimate as:
- Two separate equations
- One single equation

- We can write:
- Combining outcomes equations leads to:
- Regression includes W and interactions of W with X – these are observable part of treatment effect
- Note: error likely to be heteroskedastic
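The two equations referred to above are missing from these notes. They can be reconstructed from the potential-outcomes definitions, assuming the observed outcome is $y_{1i}$ for the treated and $y_{0i}$ for the controls:

```latex
\begin{align*}
y_i &= X_i\, y_{1i} + (1 - X_i)\, y_{0i}
     = y_{0i} + X_i\,(y_{1i} - y_{0i}) \\
y_i &= \gamma_0' W_i + X_i\,(\gamma_1 - \gamma_0)' W_i
     + \left[ u_{0i} + X_i\,(u_{1i} - u_{0i}) \right]
\end{align*}
```

The bracketed error term equals $u_{0i}$ for controls and $u_{1i}$ for the treated, so its variance generally depends on X, which is the source of the heteroskedasticity.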

- Different treatment effect for high and low quality CVs:

- Causal effect measured in units of ‘experiment’ – not very helpful
- Often want to convert causal effects to more meaningful units e.g. in Project STAR what is effect of reducing class size by one child

- Want the ratio $\frac{E(y \mid X=1) - E(y \mid X=0)}{E(S \mid X=1) - E(S \mid X=0)}$, where S is class size
- Takes the treatment effect on outcome variable and divides by treatment effect on class size
- Not hard to compute but how to get standard error?

- Can’t run regression of y on S – S influenced by factors other than treatment status
- But X is:
- Correlated with S
- Uncorrelated with unobserved stuff (because of randomization)

- Hence X can be used as an instrument for S
- IV estimator has form (just-identified case):

- This will give estimate of standard error of treatment effect
- Where instrument is binary and no other regressors included the IV estimate of slope coefficient can be shown to be:
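With a binary instrument and no other regressors, the IV slope is the ratio of two differences in means (the Wald estimator). The sketch below checks this on made-up class-size data; the numbers, variable names and function names are illustrative assumptions, not Project STAR results.

```python
import numpy as np

def wald_estimate(y, s, x):
    """Difference in mean outcome divided by difference in mean class size, by group x."""
    dy = y[x == 1].mean() - y[x == 0].mean()
    ds = s[x == 1].mean() - s[x == 0].mean()
    return dy / ds

def iv_slope(y, s, x):
    """Just-identified IV slope: sample cov(x, y) / sample cov(x, s)."""
    return np.cov(x, y)[0, 1] / np.cov(x, s)[0, 1]

# Hypothetical experiment: x = 1 means assigned to the small-class group.
x = np.array([0, 0, 0, 1, 1, 1])
s = np.array([22.0, 24.0, 23.0, 15.0, 16.0, 14.0])    # actual class size
y = np.array([50.0, 52.0, 48.0, 55.0, 53.0, 54.0])    # test score

print(wald_estimate(y, s, x))   # effect on the score of one extra child in the class
print(iv_slope(y, s, x))
```

In this made-up example assignment raises the mean score by 4 points and lowers mean class size by 8 children, so the estimated effect of one additional child is 4 / (-8).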

- So far:
- In control group implies no treatment
- In treatment group implies get treatment

- Often things are not as clean as this
- Treatment is an opportunity
- Close substitutes available to those in control group
- Implementation not perfect e.g. pushy parents

- Designed to investigate the impact of living in bad neighbourhoods on outcomes
- Gave some residents of public housing projects chance to move out
- Two treatments:
- Voucher for private rental housing
- Voucher for private rental housing restricted for use in ‘good’ neighbourhoods

- No-one was forced to move, so compliance was imperfect – 60% and 40% of the two treatment groups did use their vouchers

- Z denotes whether in control or treatment group – ‘intention-to-treat’
- X denotes whether actually get treatment
- With perfect compliance:
- Pr(X=1│Z=1)=1
- Pr(X=1│Z=0)=0

- With imperfect compliance:
1>Pr(X=1│Z=1)>Pr(X=1│Z=0)>0

- ‘Intention-to-Treat’:
ITT=E(y|Z=1)-E(y|Z=0)

- This can be estimated in usual way
- Treatment Effect on Treated

- Can’t use simple regression of y on Z
- But should recognize TOT as Wald estimator
- Can be estimated by regressing y on X using Z as instrument
- Relationship between TOT and ITT:
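The formula itself is not preserved here. Treating TOT as the Wald estimator, as the previous bullets suggest, it is the ITT divided by the difference in take-up, Pr(X=1|Z=1) - Pr(X=1|Z=0). The sketch below computes all three quantities on a made-up sample with 50% take-up in the treatment group and none in the control group; data and function name are hypothetical.

```python
import numpy as np

def itt_and_tot(y, x, z):
    """Return (ITT, difference in take-up, TOT) for assignment z and receipt x."""
    itt = y[z == 1].mean() - y[z == 0].mean()
    take_up = x[z == 1].mean() - x[z == 0].mean()
    return itt, take_up, itt / take_up

z = np.array([0, 0, 0, 0, 1, 1, 1, 1])                # assigned to control / treatment
x = np.array([0, 0, 0, 0, 1, 1, 0, 0])                # actually received treatment
y = np.array([10.0, 12.0, 11.0, 11.0, 14.0, 16.0, 11.0, 11.0])

itt, take_up, tot = itt_and_tot(y, x, z)
print(itt, take_up, tot)
```

With half of the treatment group taking up the offer, the TOT is twice the ITT.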

- No effects on adult economic outcomes
- Improvements in adult mental health
- Beneficial outcomes for teenage girls
- Adverse outcomes for teenage boys

- TOT approximately twice the size of ITT
- Consistent with 50% use of vouchers

- If treatment effect same for everyone then TOT recovers this (obvious)
- But what if treatment effect heterogeneous?
- No simple answer to this question
- Suppose model for treatment effect is:

Proposition 2.6 The IV estimate for the heterogeneous treatment case is a consistent estimate of:
$E(\pi_i \beta_{1i}) / E(\pi_i)$
where $\pi_i$ is the difference in the probability of treatment for individual i when in the treatment and control groups

- Model for effect of intention to treat on being treated:

- Can write ‘reduced-form’ as:
- Wald estimator then becomes:
- As:

- This is weighted average of treatment effects
- ‘weights’ will vary with instrument – contrast with heterogeneous treatment case
- Some cases in which can interpret IV estimate as ATE

- Proof:
- A. This should be obvious as:
- B. Can write as:

- Previous formula says depends on covariance of β1i and πi
- In some situations can sign – but not always
- Example 1: no-one gets treatment in the absence of the programme so Pr(X=1│Z=0)=0
- If those who get treatment when in the treatment group are those with the highest returns then:
- IV>ATE

- Example 2: treatment is voluntary for those in the control group but compulsory for those in the treatment group
- This implies Pr(X=1│Z=1)=1
- If those who get treatment in control are those with highest returns then:
- IV<ATE

- Case where IV estimate is not ATE
- Assume that everyone moved in same direction by treatment – monotonicity assumption
- Then can show that IV is average of treatment effect for those whose behaviour changed by being in treatment group
- They call this the Local Average Treatment Effect (LATE)
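A constructed example shows the IV estimate picking out the effect for those whose behaviour is changed by assignment. The population below is entirely hypothetical: three 'never-takers', two 'always-takers' and five 'compliers' in each arm, with made-up treatment effects of 1, 5 and 2. Monotonicity holds because nobody does the opposite of their assignment.

```python
import numpy as np

# Hypothetical types and individual treatment effects; the same mix appears in both arms,
# which mimics randomization of Z.
kind = np.array(["never"] * 3 + ["always"] * 2 + ["complier"] * 5)
effects = np.array([1.0] * 3 + [5.0] * 2 + [2.0] * 5)

def arm(z):
    """Treatment received and observed outcome for one arm with assignment z (0 or 1)."""
    x = np.where(kind == "always", 1, np.where(kind == "complier", z, 0))
    y = 10.0 + effects * x          # baseline outcome of 10 for everyone
    return x, y

x0, y0 = arm(0)
x1, y1 = arm(1)

wald = (y1.mean() - y0.mean()) / (x1.mean() - x0.mean())   # IV estimate
ate = effects.mean()                                       # average effect, everyone
late = effects[kind == "complier"].mean()                  # average effect, compliers

print(wald, late, ate)
```

Here the IV estimate matches the complier average and not the population average, because only compliers contribute to either difference in means.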

- Have assumed that treatment only affects outcome for the person who receives it
- Many situations in which this is not true
- E.g. externalities, spill-overs, effects on market prices
- Example: Miguel and Kremer, “Worms: Identifying Impacts on Education and Health in the Presence of Treatment Externalities”, Econometrica 2004

- Infection from intestinal worms is rife among Kenyan schoolchildren
- Major cause of school absence
- Leads to lower human capital accumulation, lower growth?
- Investigation of effectiveness of anti-worming drugs on health, education

- Randomize drug treatment within schools
- But probability of re-infection affected by infection rate among contacts, i.e. externalities very likely
- This research design will not capture these effects
- To see this, consider model:

- Existing methodology cannot measure externality only individual effect
- Randomize treatment across schools not individuals
- This can identify β1 + β2
- Could have had design in which randomized proportion of individuals within schools getting treatment
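The difference between the two designs can be seen in a noise-free example. The model on the original slide is not preserved, so the sketch assumes one consistent with the bullets above: outcome = β0 + β1 × (own treatment) + β2 × (share of the school treated). The parameter values, school sizes and names are illustrative.

```python
import numpy as np

B0, B1, B2 = 1.0, 0.5, 0.3   # hypothetical intercept, direct effect, externality

def outcomes(x, school):
    """y = B0 + B1 * own treatment + B2 * share of own school treated (no noise)."""
    share = np.array([x[school == s].mean() for s in school])
    return B0 + B1 * x + B2 * share

def diff_in_means(y, x):
    return y[x == 1].mean() - y[x == 0].mean()

school = np.repeat(np.arange(4), 10)                  # 4 schools, 10 pupils each

# Design 1: half of the pupils in every school are treated.
x_within = np.tile(np.repeat([0.0, 1.0], 5), 4)
# Design 2: whole schools are treated (schools 0 and 1) or not (schools 2 and 3).
x_across = (school < 2).astype(float)

print(diff_in_means(outcomes(x_within, school), x_within))   # recovers B1 only
print(diff_in_means(outcomes(x_across, school), x_across))   # recovers B1 + B2
```

Under within-school randomization every pupil faces the same treated share, so the externality is absorbed into the intercept and drops out of the comparison.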

- Cannot separate externality from direct effect – but this is important for public policy
- Have non-experimental approach to this – using fact that not all kids from same village go to same school
- This gives variation in X

- Include number of kids in local area who are in treatment schools

- Expense
- Ethical Issues
- Threats to Internal Validity
- Failure to follow experiment
- Experimental effects (Hawthorne effects)

- Threats to External Validity
- Non-representative programme
- Non-representative sample
- Scale effects

- Experiments are the ‘gold standard’ of empirical research
- They are becoming more common
- Not enough of them to keep us busy
- Study of non-experimental data can deliver useful knowledge
- Some issues similar, others different