
Review Session 10 Endogeneity

Catalina Martinez, catalina.martinez@graduateinstitute.ch. Office hours: Tuesdays 6-8pm, Rigot 27. Economics and Development, MDev 2012-2013. THE GRADUATE INSTITUTE | GENEVA.


Presentation Transcript


  1. Review Session 10: Endogeneity Catalina Martinez catalina.martinez@graduateinstitute.ch Office hours: Tuesdays 6-8pm Rigot 27 Economics and Development MDev 2012-2013 THE GRADUATE INSTITUTE | GENEVA

  2. Logistics • Today we will focus on endogeneity. • It is an important concept that might also be useful for your thesis. • It might take some time to absorb, so it is better to cover it today than next week. • If you have questions, come and ask me or send me an email. • This week I also have time on Wednesday; just send me an email and we can agree on a time to meet. • Regarding the empirical papers, just focus on the main messages of each reading (in the lecture notes); we will summarize them in the next review session.

  3. Logistics • Quiz • I did not send it to you beforehand: it is on endogeneity, and I think it is important that you work on this concept now. • The main results of the empirical papers are easier to understand anyway. • Next week we will also review all the topics to prepare for the exam. • Please note that health will be covered in the exam.

  4. Today’s RS • Empirics and econometrics • Endogeneity • Instrumental Variables • Randomization • Quiz

  5. Empirics and econometrics

  6. Causality • Empirical papers are interested in quantifying the causal effect that a variable (x) can have on another variable (y) • x is the explanatory variable • y is the dependent variable that we are interested in explaining • In development economics we often want to identify which policies work and which do not (impact evaluation): we want to be able to precisely calculate the effect that a treatment (x) has on an outcome variable of interest (y)

  7. Causality • It is easy to see when two variables (x and y) are correlated. • But identifying causality is very difficult. • Does x cause y? • Is it the other way around? • Is the effect driven by something else that is affecting x and also y and that we cannot disentangle somehow?

  8. Causality y = a + bx + e • We want to identify the effect of x on y • If x is exogenous the impact of x on y is captured by b, which can be estimated using OLS.

  9. Causality y = a + bx + e • But what if something that we do not observe (e) affects both x and y? • By definition, e is everything that we do not observe that has an impact on y. • The problem arises when it also has an impact on x. This would make x endogenous. • If x is endogenous, the OLS estimate of b does not capture the impact of x on y. • b will be biased (systematically over- or underestimated) because, since we do not observe e, we cannot isolate the effect of x alone on y.
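
The point of this slide can be illustrated with a small simulation. The sketch below is not from the lecture: the sample size, the true coefficient (2.0), the 0.8 loading of e on x, and the helper name `ols_slope` are all assumptions chosen for illustration. It compares the OLS slope when x is unrelated to e with the slope when e also feeds into x.

```python
import random

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x: cov(x, y) / var(x)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var

rng = random.Random(0)   # fixed seed so the run is reproducible
n = 20000                # hypothetical sample size
true_b = 2.0             # hypothetical true effect of x on y

# Case 1: x is exogenous (drawn independently of e).
x_exo = [rng.gauss(0, 1) for _ in range(n)]
e_exo = [rng.gauss(0, 1) for _ in range(n)]
y_exo = [1.0 + true_b * x + e for x, e in zip(x_exo, e_exo)]

# Case 2: x is endogenous (the unobserved e also affects x).
e_endo = [rng.gauss(0, 1) for _ in range(n)]
x_endo = [rng.gauss(0, 1) + 0.8 * e for e in e_endo]
y_endo = [1.0 + true_b * x + e for x, e in zip(x_endo, e_endo)]

b_exo = ols_slope(x_exo, y_exo)
b_endo = ols_slope(x_endo, y_endo)

print("true b:", true_b)
print("OLS estimate, exogenous x: ", round(b_exo, 3))
print("OLS estimate, endogenous x:", round(b_endo, 3))
```

Under these assumptions the first estimate should land close to the true value, while the second should sit noticeably above it, because part of the effect of e on y is attributed to x.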

  10. Causality y = a + bx + e • What if we have reasons to believe that not only does x have an impact on y, but y also has an impact on x? • Then it would be difficult to disentangle ("identify" is the technical term) the effect of x on y, and b will be biased.

  11. Why is this important? • Economists frequently study markets that are in equilibrium. • Multiple factors are being jointly determined. • For example, consider supply and demand curves. • It is the economist's version of the “chicken or egg” dilemma! • Development tries to identify what works and what does not work. • Policy implications should be derived only from causal relations. • Correlations are not enough!

  12. Examples • Sachs and Warner (1997) • Human capital has a positive impact on growth • But also countries with high income levels tend to have higher human capital • How can we identify the impact of human capital? • The estimate of S&W can be BIASED because they do not take into account the potential endogeneity of human capital • It may be overestimated or underestimated

  13. Examples • Acemoglu, Johnson and Robinson (2001) • Countries with better institutions tend to grow faster • But also countries with higher income levels have better institutions • How can we identify the impact that institutions have on growth? • AJR propose an instrumental variable (IV) approach to solve this problem. • This is THE classic IV paper. • Today we will see how IV works.

  14. OLS and causality Suppose we have a regression equation y = a + b1x1 + e • We want to quantify the causal effect of x1 on y • OLS would work if x1 is exogenous. • OLS would be biased (under- or overestimated) if x1 is endogenous.

  15. Details: Endogeneity • Suppose we have a regression equation y = a + b1x1 + e • The variable x1 is endogenous if it is correlated with the error term (e). • This implies that x1 has an impact on y, but that we cannot calculate it correctly. • There is something else in the relationship between x1 and y that we cannot observe (the error e)

  16. Details: Bias This makes our estimate of b1 biased • If the correlation between x1 and e is positive, b1 will be overestimated • Example: in the S&W case, if we assume that human capital has a positive correlation with something in the error term, then we would be overestimating the effect of human capital on growth.

  17. Details: Bias This makes our estimate of b1 biased • If the correlation between x1 and e is negative, b1 will be underestimated • Example: if we want to estimate the impact of a development program on income, and we assume that the people that receive the treatment tend to be the poorest, then there is a negative correlation with the error term, and the effect of the program will be underestimated.
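
The direction of the bias described on the two "Details: Bias" slides follows from a standard large-sample result for the simple regression y = a + b1x1 + e. The expression below is a sketch in LaTeX (assuming the `amsmath` package); the hat and plim notation is not used in the slides and is introduced here only for illustration.

```latex
\[
\operatorname{plim}\, \hat{b}_1
  \;=\; b_1 + \frac{\operatorname{Cov}(x_1, e)}{\operatorname{Var}(x_1)}
\]
```

Since the variance of x1 is positive, the sign of the bias is the sign of the covariance between x1 and e: a positive correlation pushes the estimate above b1, a negative one pushes it below, and with zero correlation (exogenous x1) the extra term vanishes.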

  18. Sources of Endogeneity 1. Omitted variables • If the true model underlying the data is y = a + b1x1 + b2x2 + n • but you estimate the model y = a + b1x1 + e • then variable x1 will be endogenous if it is correlated with x2. • Why? Because the omitted variable ends up in the error term: e = b2x2 + n.
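
A minimal simulation of the omitted-variable case, under assumed values that are not in the slides (b1 = 1, b2 = 2, and x2 = 0.5·x1 + noise). The "short" regression leaves x2 out. The "long" coefficient on x1 is recovered here with the Frisch-Waugh-Lovell trick, that is, regressing y on the part of x1 that x2 does not explain, so that the example needs no matrix library. The helper names are hypothetical.

```python
import random

def mean(v):
    return sum(v) / len(v)

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with a constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var

def residuals(y, x):
    """Residuals from regressing y on x (with a constant)."""
    b = ols_slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - a - b * xi for xi, yi in zip(x, y)]

rng = random.Random(1)
n = 20000
b1, b2 = 1.0, 2.0   # hypothetical true coefficients

x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * v + rng.gauss(0, 1) for v in x1]   # x2 is correlated with x1
y = [0.5 + b1 * u + b2 * v + rng.gauss(0, 1) for u, v in zip(x1, x2)]

# Short regression: x2 is omitted, so it ends up in the error term.
b_short = ols_slope(x1, y)

# Long regression coefficient on x1, via Frisch-Waugh-Lovell:
# regress y on the part of x1 that is not explained by x2.
x1_resid = residuals(x1, x2)
b_long = ols_slope(x1_resid, y)

print("true b1:", b1)
print("short regression (x2 omitted):", round(b_short, 3))
print("long regression (x2 included):", round(b_long, 3))
```

With these assumed values the short regression should drift towards b1 + b2 × 0.5 = 2, while controlling for x2 should bring the estimate back near 1.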

  19. Sources of Endogeneity 2. Measurement error • Suppose the true model underlying the data is y = a + b1x1 + e • but you only observe x1* = x1 + j, where j is the measurement error, and estimate the model y = a + b1x1* + u. • Substituting x1 = x1* - j gives u = e - b1j, so the error term of the estimated model contains j. • x1* will be endogenous because j is part of both x1* and u. • Example: Suppose that x1 measures hospital size (no. of beds), and that the measurement error is greater for larger hospitals. Then as x1 grows, so does j. Thus the error term is correlated with the regressor, causing endogeneity.
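
The sketch below simulates the simplest textbook version of this problem, which is an assumption and not the hospital example from the slide: the measurement error j is pure noise, independent of the true x1 and of e, with the same variance as x1. Even in this mild case the regression on the mismeasured variable x1* is biased towards zero (attenuation). All numbers and names are hypothetical.

```python
import random

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with a constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var

rng = random.Random(4)
n = 20000
b1 = 2.0   # hypothetical true coefficient

x1 = [rng.gauss(0, 1) for _ in range(n)]             # true regressor
j = [rng.gauss(0, 1) for _ in range(n)]              # measurement error (assumed pure noise)
x1_star = [xi + ji for xi, ji in zip(x1, j)]         # what we actually observe
y = [1.0 + b1 * xi + rng.gauss(0, 1) for xi in x1]   # y depends on the TRUE x1

b_true_x = ols_slope(x1, y)        # infeasible: uses the true x1
b_noisy_x = ols_slope(x1_star, y)  # feasible: uses the mismeasured x1*

print("true b1:", b1)
print("regression on true x1:      ", round(b_true_x, 3))
print("regression on mismeasured x1*:", round(b_noisy_x, 3))
```

With equal variances for x1 and j, the estimate on x1* should come out at roughly half the true coefficient.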

  20. Sources of Endogeneity 3. Simultaneity or double causality • Suppose that two variables (y1 and y2) are codetermined, with each affecting the other. y1 = a1 + b1x1 + g1y2 + e1 y2 = a2 + b2x1 + g2y1 + e2 • With some algebra you can rewrite these two equations as a single equation with an endogenous regressor.
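
The "some algebra" can be sketched as follows. To keep the two equations apart, the derivation gives each its own symbols (a1, b1, g1, e1 for the first equation and a2, b2, g2, e2 for the second); this labelling is an assumption made for the illustration, as is the condition g1g2 ≠ 1. The block assumes the `amsmath` package.

```latex
\begin{align*}
y_1 &= a_1 + b_1 x_1 + g_1 y_2 + e_1 \\
y_2 &= a_2 + b_2 x_1 + g_2 y_1 + e_2 \\
\intertext{Substituting the first equation into the second and solving
for $y_2$ (assuming $g_1 g_2 \neq 1$):}
y_2 &= \frac{(a_2 + g_2 a_1) + (b_2 + g_2 b_1)\, x_1 + g_2 e_1 + e_2}
            {1 - g_1 g_2}
\end{align*}
```

The last line shows that y2 depends on e1 whenever g2 is not zero. So in the first equation the regressor y2 is correlated with its own error term e1, which is exactly the definition of an endogenous regressor.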

  21. Solutions: instrumental variable (IV) y = a + bx + e • We want to find a variable (z) that has an impact on x but DOES NOT HAVE ANY DIRECT IMPACT ON y. • This variable needs to be exogenous (it must not have any relation with e). • This variable (z) would be our instrument for x.

  22. IV: first stage x = νz + γ • If we can estimate the impact of this exogenous variable on x, we can estimate the exogenous part of x. • This would be x̂ (the fitted value of x from the first stage).

  23. IV: second stage y = a + bx̂ + e • We can now estimate the original regression with that exogenous part of x (x̂). • The effect of x on y is better approximated by b estimated in this way than by simple OLS.
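
The two stages can be sketched by hand on simulated data. Everything in the example is an assumption made for illustration: the true effect (2.0), the strength of the instrument, the way e enters x, and the helper names. The instrument z is constructed to affect x and to be independent of e, which is what the two IV conditions require.

```python
import random

def mean(v):
    return sum(v) / len(v)

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with a constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var

rng = random.Random(2)
n = 20000
true_b = 2.0   # hypothetical true effect of x on y

z = [rng.gauss(0, 1) for _ in range(n)]   # instrument: independent of e
e = [rng.gauss(0, 1) for _ in range(n)]   # unobserved error
# x depends on the instrument AND on e, so x is endogenous.
x = [zi + 0.8 * ei + rng.gauss(0, 1) for zi, ei in zip(z, e)]
y = [1.0 + true_b * xi + ei for xi, ei in zip(x, e)]

# Naive OLS of y on x (biased because x is correlated with e).
b_ols = ols_slope(x, y)

# First stage: regress x on z and keep the fitted values x_hat.
nu = ols_slope(z, x)
const = mean(x) - nu * mean(z)
x_hat = [const + nu * zi for zi in z]

# Second stage: regress y on x_hat, the exogenous part of x.
b_iv = ols_slope(x_hat, y)

print("true b:", true_b)
print("OLS estimate:", round(b_ols, 3))
print("IV (two-stage) estimate:", round(b_iv, 3))
```

One caveat: running the second stage by hand like this gives the right point estimate, but the standard errors from that second regression are not valid, so in applied work a dedicated IV routine is normally used.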

  24. Examples • AJR (2001) • Initial settler mortality affects developing countries’ institutions today • So we have a z (IV) that has an impact on x • But it does not have a direct impact on growth today (exclusion restriction). • So we have a z that does not have a direct impact on y and is therefore not related with e

  25. Examples • Edward Miguel (2004): • Wants to estimate the impact of growth on conflict. • Growth is endogenous because conflict also has an impact on it (simultaneity/double causality). • Uses rainfall as an instrument for growth • The idea is that rainfall influences growth • But has no direct impact on conflict (only the impact that it has through growth)

  26. Solutions: Randomization • Addressing endogeneity is very important in impact evaluation. • We want to see if a treatment variable x has a causal effect on an outcome variable y • If we randomize x, it is easier to identify its impact on y. • Example: • The impact of cash transfers on children’s education • The problem is that cash transfers tend to go to children in poor families, who tend to have lower education. • If we make the intervention random, we make sure that this correlation is as low as possible. • And it is easier to identify its impact.
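
The cash-transfer example can be mimicked with a toy simulation. All numbers are invented for illustration: a true treatment effect of 1.0 on an education score, an unobserved family-income variable that raises education, and two assignment rules (transfers targeted at poorer families versus a coin flip). The estimator is a plain difference in means between treated and untreated children.

```python
import random

def mean(v):
    return sum(v) / len(v)

def diff_in_means(treat, outcome):
    """Average outcome of the treated minus average outcome of the untreated."""
    treated = [o for t, o in zip(treat, outcome) if t == 1]
    control = [o for t, o in zip(treat, outcome) if t == 0]
    return mean(treated) - mean(control)

rng = random.Random(3)
n = 20000
true_effect = 1.0   # hypothetical effect of the transfer on education

income = [rng.gauss(0, 1) for _ in range(n)]   # unobserved by the researcher
noise = [rng.gauss(0, 1) for _ in range(n)]

def education(treat):
    """Education depends on the treatment, on unobserved income, and on noise."""
    return [5.0 + true_effect * t + 2.0 * inc + u
            for t, inc, u in zip(treat, income, noise)]

# Targeted programme: only families with below-average income get the transfer.
targeted = [1 if inc < 0 else 0 for inc in income]
# Randomized programme: a coin flip decides who gets the transfer.
randomized = [rng.randint(0, 1) for _ in range(n)]

diff_targeted = diff_in_means(targeted, education(targeted))
diff_random = diff_in_means(randomized, education(randomized))

print("true effect:", true_effect)
print("targeted assignment, difference in means:  ", round(diff_targeted, 3))
print("randomized assignment, difference in means:", round(diff_random, 3))
```

Under targeting, the treated group is poorer to begin with, so the naive comparison understates the effect and can even turn negative. Under random assignment the two groups have similar income on average, so the comparison should land near the true effect.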

  27. Quiz

  28. Question 1 • When is a variable said to be endogenous? • In a regression approach, we want to identify the effect of an explanatory variable (x) on the dependent variable (y). • An explanatory variable (x) is endogenous if it is correlated with the error term (ε), i.e. with something that affects the variable that we want to explain (y) but that we cannot observe or cannot disentangle.

  29. Question 2 • What can be the causes of endogeneity? • Omitted variables, measurement error, simultaneity/double causality. • In all these cases there is something in the error term (ε) that is correlated with the explanatory variable (x), therefore making this variable endogenous.

  30. Question 3 • Why is endogeneity important? • Because the usual techniques that we use (OLS) no longer work. • The estimation of the impact of the explanatory variable (x) on the dependent variable (y) is biased (inaccurate) • It can be over- or underestimated • If we want to find what works and what does not work in development, we need to take this into account (for impact evaluation) • This may have important policy implications (we may think that some interventions work, but actually the drivers of the effect are unknown, so increasing the intervention does not guarantee a positive effect)

  31. Question 4 • What is an instrumental variable? How is it used? • It is a variable (z) that has an impact on the endogenous explanatory variable (x) but has neither a direct impact on the variable that we want to explain (y) nor a correlation with the error term (ε) (it is exogenous) • It is used to estimate the exogenous part of the potentially endogenous variable (x). • Once this exogenous part is estimated, the effect of this explanatory variable (x) can be better approximated.

  32. Question 5 • Why is randomization useful? • When we want to identify the impact of a treatment (x) on an outcome variable (y), randomization is useful to isolate the impact of the treatment (x) from any other variables that we cannot control for (ε). • By making the intervention exogenous, we make sure that the outcome variable (y) that we want to affect with the treatment (x) does not have any impact on the likelihood of receiving the treatment (x). • We also make sure that unobservables in the error term (ε) do not have a relation with the treatment variable (x). • Indeed, since the treatment (x) is randomly assigned, it should not have any relation with any of these variables.
