PPS231S.01 Law, Economics, and Organization

Spring 2012. I.2. Empirical Methods

Background Ideas and Methods
In the last section of the syllabus, we discussed markets and the allocation of resources. In this section, we will discuss some of the methods used to test hypotheses in law and economics, and in applied microeconomics in general. (I believe these methods are not exclusive to applied microeconomics: there is currently a methodological convergence in the social sciences.)

Empirics: Linear Regression and Causality
Economics is a social science. As such, it progresses by applying the scientific method. So what do the data have to say about our theories? Before we can answer this question (which will occupy us all semester), we need to know what we can and cannot learn from the data. This module will thus be epistemological in nature.

Banerjee (2007) begins with the 2005 earthquake in Pakistan, which injured or killed around 50,000 people. Everyone wanted to help: NGOs, the UN, college students, governments, and others. Money flowed in from everywhere. But would the aid reach the right people? How could anyone be sure that people in remote and almost inaccessible places received their fair share?

So a group of US-based economists decided to set up a website and call center to improve coordination. Banerjee writes: “The reaction, when it was not actually hostile, tended to be derisive: ‘Are you mad? You want us to (…) [fill] forms when people are dying?’” For Banerjee, this captured one of the core problems with aid: institutional laziness. “Aid thinking is lazy thinking.”

Banerjee has another example: a 2002 World Bank sourcebook titled Empowerment and Poverty Reduction. It was supposedly a catalog of the most effective poverty-reduction strategies based on “best practices,” whatever that means. But nowhere did the book explain how we know that the things it listed (e.g., cell phones, microfinance, land titling) worked at all.

“Development goes through a lot of fads. We need evidence on what works.” (Michael Kremer) Examples of fads outside of development: smaller class sizes cause students to perform better; carbohydrates are what make you fat; fewer guns mean fewer homicides; capital punishment lowers the murder rate.

Figuring out what works is not easy. Suppose you wish to look at how an individual outcome y (e.g., welfare) changes depending on whether individuals are subjected to a policy D, controlling for certain factors x (e.g., age, gender, education, community). The equation of interest is then

(1) y = α + βx + γD + ε,

where γ is the parameter of interest and, under the conditions discussed below, represents the average treatment effect of policy D. The error term ε represents our ignorance: everything that affects y but is not captured by x or D.
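To make equation 1 concrete, here is a minimal sketch in Python (standard library only). It simulates data in which D is assigned independently of ε and then estimates α, β, and γ by ordinary least squares. The parameter values, the sample size, and the helper `ols` are illustrative assumptions. They are not taken from any study.

```python
import random

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [arc - f * acc for arc, acc in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

random.seed(1)
ALPHA, BETA, GAMMA = 1.0, 0.5, 2.0  # hypothetical "true" parameters
n = 5000
x = [random.gauss(0, 1) for _ in range(n)]                  # a control, e.g., standardized age
D = [1 if random.random() < 0.5 else 0 for _ in range(n)]   # policy, independent of everything else
y = [ALPHA + BETA * xi + GAMMA * di + random.gauss(0, 1) for xi, di in zip(x, D)]

alpha_hat, beta_hat, gamma_hat = ols([[1.0, xi, di] for xi, di in zip(x, D)], y)
print(f"gamma_hat = {gamma_hat:.3f} (true value {GAMMA})")
```

Because D here is independent of ε by construction, the estimate of γ lands close to its true value. The remainder of the section is about what goes wrong when that independence fails.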

Let’s keep equation 1 in mind. In fact, let’s make it even simpler by writing it as

(1’) y = α + γD + ε,

where y is agricultural productivity, or rice yield (kilograms per hectare), and D is cultivated area (hectares). If both y and D are expressed in natural logarithms, γ is an elasticity: it tells us the percentage change in productivity for a 1 percent increase in cultivated area. In developing countries, this relationship is generally negative (smaller farms tend to have higher yields). Here is a picture from Barrett et al. (2010).

[Figure: agricultural productivity versus cultivated area, from Barrett et al. (2010)]
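The sketch below shows how a regression can recover such an elasticity. It assumes that the "right mathematical transformation" is taking natural logs of both yield and area (a log-log specification). The elasticity of −0.3, the range of plot sizes, and the noise level are all hypothetical. They are not Barrett et al.'s estimates.

```python
import math
import random

def slope(xs, ys):
    """Simple-regression slope: Cov(x, y) / Var(x)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

random.seed(2)
GAMMA = -0.3  # hypothetical elasticity of yield with respect to cultivated area
area = [random.uniform(0.2, 5.0) for _ in range(3000)]   # hectares
log_yield = [8.0 + GAMMA * math.log(d) + random.gauss(0, 0.3) for d in area]

gamma_hat = slope([math.log(d) for d in area], log_yield)
# Interpretation: a plot 1% larger has yield roughly gamma_hat percent different
pct_change = (1.01 ** gamma_hat - 1) * 100
print(f"elasticity = {gamma_hat:.3f}; 1% more area -> {pct_change:.2f}% change in yield")
```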

But is the relationship between D and y actually causal? A necessary condition for γ to be causal is that D be uncorrelated with ε. There are three sources of correlation between D and ε: (i) reverse causality or simultaneity: y affects D, or the two are jointly determined; (ii) omitted variables (or unobserved heterogeneity): x fails to include important confounding factors, which thus end up in ε; and (iii) measurement error: a variable on the right-hand side of equation 1 is measured with error.
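The omitted-variables case can be sketched with a short simulation, assuming a hypothetical unobserved confounder ("ability") that raises both D and y. Because ability is left out of the regression, it ends up in ε. The OLS slope then equals the true γ plus the bias term Cov(D, ε)/Var(D), and this identity holds exactly in the sample. All numbers are made up for illustration.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

random.seed(3)
GAMMA = 1.0                                           # hypothetical true causal effect of D
n = 10000
ability = [random.gauss(0, 1) for _ in range(n)]      # unobserved confounder
D = [a + random.gauss(0, 1) for a in ability]         # D depends on ability
eps = [2 * a + random.gauss(0, 1) for a in ability]   # ability is absorbed into the error term
y = [GAMMA * d + e for d, e in zip(D, eps)]

gamma_ols = cov(D, y) / cov(D, D)
bias = cov(D, eps) / cov(D, D)   # omitted-variable bias
print(f"OLS estimate {gamma_ols:.3f} = true {GAMMA} + bias {bias:.3f}")
```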

Correlation between D and ε is the bane of the social sciences. The problem is generally known as endogeneity. Before we discuss potential solutions, let’s consider a few examples of identification problems, i.e., scenarios where it is difficult or impossible to ascribe causality: Volvos and California stops; the health effects of orange juice; wages and education.

At this point in the history of the social sciences, the gold standard is the randomized controlled trial (RCT), which many economists now advocate (the approach was popularized by Esther Duflo). In the notation of equation 1, one draws a random sample whose individual units are then randomly assigned to either a control group (D = 0) or a treatment group (D = 1). Mean outcomes μ_y are then compared between the groups.

Moreover, with an RCT, the regression in equation 1 allows estimating the average treatment effect (ATE), which is equal to

(2) E(y | D = 1) − E(y | D = 0),

or, in English, the difference in mean outcomes between the treatment and control groups. The treatment-control terminology comes from the medical literature.
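Here is a minimal simulation of an RCT, assuming hypothetical potential outcomes with individual treatment effects that vary around a mean of 3. Random assignment makes the difference in means a good estimate of the ATE. The OLS slope on the treatment dummy is numerically identical to that difference in means, which is why the regression in equation 1 delivers the ATE.

```python
import random

random.seed(4)
n = 4000
y0 = [random.gauss(10, 2) for _ in range(n)]       # outcome without treatment
effect = [random.gauss(3, 1) for _ in range(n)]    # hypothetical heterogeneous effects
y1 = [a + b for a, b in zip(y0, effect)]           # outcome with treatment
true_ate = sum(effect) / n

D = [1 if random.random() < 0.5 else 0 for _ in range(n)]   # random assignment
y = [b if d else a for a, b, d in zip(y0, y1, D)]            # only one outcome is observed per unit

treated = [yi for yi, d in zip(y, D) if d == 1]
control = [yi for yi, d in zip(y, D) if d == 0]
ate_hat = sum(treated) / len(treated) - sum(control) / len(control)

# OLS slope of y on the treatment dummy (with an intercept)
mD, my = sum(D) / n, sum(y) / n
ols_gamma = sum((d - mD) * (yi - my) for d, yi in zip(D, y)) / sum((d - mD) ** 2 for d in D)
print(f"difference in means {ate_hat:.3f}, OLS {ols_gamma:.3f}, true ATE {true_ate:.3f}")
```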

While RCTs represent the gold standard, they are not without problems: they can be very costly to implement; not everything can be randomized (e.g., prices); it can be difficult to enforce compliance with either treatment or control (which leads to estimating local average treatment effects, or LATEs); and external validity is not guaranteed: “Something that works in Minneapolis may fail in New York City.”

Still, when given the choice between running an RCT and using observational data (e.g., survey data), most researchers would choose an RCT. When World Bank researchers asked whether respondents agreed with the statement “Experiments have no special ability to produce more credible knowledge than other methods,” here is what a sample of doctoral students and assistant professors said.

[Figure: responses of doctoral students and assistant professors to the World Bank survey statement]

Short of having experimental data, how can we deal with the fact that the world is messy? How can we deal with the endogeneity problem that plagues most of social science? We will briefly discuss five common methods, but there are others, which are less common in development economics.

The first method is imperfect and involves the use of panel data, i.e., observing each individual unit several times. For example, we can observe individuals two or more times within a longitudinal study, i.e., a study that follows individuals over time and surveys them more than once.

Panel data allow us to remove everything that (i) is individual-specific and (ii) does not vary over time. As such, they can eliminate a great deal of the correlation between D and ε, but they are not a cure-all: unobserved factors that vary over time remain in ε. Example: cross-country growth regressions, in which several countries are observed over time. We use country fixed effects to control for some unobserved heterogeneity (e.g., perhaps culture; national boundaries; climate).
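The within (fixed-effects) transformation can be sketched as follows, using a hypothetical panel in which each unit has a time-invariant unobserved effect c that is correlated with D. Pooled OLS is biased because c sits in ε. Subtracting each unit's own time average removes c entirely, so the demeaned regression recovers γ. The panel dimensions and parameter values are made up.

```python
import random

random.seed(5)
N, T, GAMMA = 1000, 5, 1.0   # hypothetical panel dimensions and true effect
ids, D, y = [], [], []
for i in range(N):
    c = random.gauss(0, 1)               # time-invariant unobserved effect (e.g., "culture")
    for t in range(T):
        d = c + random.gauss(0, 1)       # D is correlated with c
        ids.append(i)
        D.append(d)
        y.append(GAMMA * d + c + random.gauss(0, 1))

def slope(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)

def demean_by_unit(v):
    """Within transformation: subtract each unit's own time average."""
    totals, counts = {}, {}
    for i, vi in zip(ids, v):
        totals[i] = totals.get(i, 0.0) + vi
        counts[i] = counts.get(i, 0) + 1
    return [vi - totals[i] / counts[i] for i, vi in zip(ids, v)]

pooled = slope(D, y)   # ignores c, so it is biased upward
fixed_effects = slope(demean_by_unit(D), demean_by_unit(y))
print(f"pooled OLS {pooled:.3f}, fixed effects {fixed_effects:.3f}, true {GAMMA}")
```

Note that the demeaning would not help if the confounder changed over time within a unit. That is the sense in which panel data are not a cure-all.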

The second method can perfectly solve the identification problem, provided certain strict conditions hold. It is called instrumental variables (IV) estimation. It requires a new variable Z that (i) is correlated with D but (ii) is uncorrelated with ε.

Using such an IV, one can “exogenize” D (i.e., get rid of its endogeneity) so as to identify the causal impact of D on y. As such, IV estimation is a very powerful method, but one needs an IV that is truly exogenous. Example: Angrist (1990) wanted to know the impact of Vietnam-era military service on earnings and used each respondent’s draft lottery number as an instrument.
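With a single instrument and no controls, the IV estimator reduces to Cov(Z, y)/Cov(Z, D). The sketch below simulates a case where a hypothetical confounder makes OLS biased, while Z (labelled a "lottery draw" purely for intuition) satisfies both conditions (i) and (ii) by construction. Everything here is simulated. It is not a reconstruction of Angrist's data.

```python
import random

def cov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

random.seed(6)
n, GAMMA = 20000, 1.0   # hypothetical sample size and true effect
Z = [random.gauss(0, 1) for _ in range(n)]         # instrument, e.g., a lottery draw
ability = [random.gauss(0, 1) for _ in range(n)]   # unobserved confounder
D = [z + a + random.gauss(0, 1) for z, a in zip(Z, ability)]   # (i) Z is correlated with D
eps = [2 * a + random.gauss(0, 1) for a in ability]            # (ii) Z does not enter eps
y = [GAMMA * d + e for d, e in zip(D, eps)]

gamma_ols = cov(D, y) / cov(D, D)   # biased: D is correlated with eps
gamma_iv = cov(Z, y) / cov(Z, D)    # just-identified IV estimator
print(f"OLS {gamma_ols:.3f}, IV {gamma_iv:.3f}, true {GAMMA}")
```

If Z were allowed to enter `eps` directly, the IV estimate would be biased as well. This is the sense in which the instrument must be "truly exogenous."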

For those of you who are interested, Sovey and Green (AJPS, 2011) discuss the use of IVs in political science. Further, as a result of the “credibility revolution” in economics, what was acceptable as an IV 10 years ago is more often than not unacceptable today (Angrist and Pischke, 2010). Example: using parents’ education as an IV for an individual’s education in a wage regression. This IV fails if parents’ education also affects wages through other channels (e.g., family connections), because it is then correlated with ε.

Another method is the use of field experiments. In this case, an experiment is run alongside survey data collection in order to elicit some otherwise unobservable parameter, such as bargaining power, discount factor, entrepreneurial ability, risk preferences, technical ability, time preferences, or trust.

So, in equation 1, suppose we are interested in whether risk aversion (D) affects how likely people are to adopt a new cultivation technology (y). It is possible to develop a field experiment (a game played by survey respondents for real money) that measures their risk preferences.

In this case, the impact of D on y can be estimated accurately, and one can choose to include control variables x so as to increase the precision of that estimate. Because the source of variation used in the experiment is exogenous, D is also exogenous, and the endogeneity problem is solved for all intents and purposes. Example: risk preferences and the adoption of Bt cotton in China (Liu, forthcoming).
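Why would adding controls x increase precision when D is already exogenous? Controls soak up variation in y that would otherwise sit in ε, which shrinks the standard error of γ. The sketch below shows this with a hypothetical continuous "adoption index" as y, an experimentally measured risk-aversion score as D, and one control x that is independent of D. It computes the coefficient with and without x via the Frisch-Waugh-Lovell theorem (residualize y and D on the control, then regress residual on residual). All values are illustrative assumptions.

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def residualize(v, w):
    """Residuals from regressing v on a constant and w."""
    mv, mw = mean(v), mean(w)
    b = sum((wi - mw) * (vi - mv) for vi, wi in zip(v, w)) / sum((wi - mw) ** 2 for wi in w)
    return [vi - mv - b * (wi - mw) for vi, wi in zip(v, w)]

def coef_and_se(ry, rd, n_params):
    """Slope and conventional standard error from residualized y and D."""
    g = sum(a * b for a, b in zip(rd, ry)) / sum(a * a for a in rd)
    ssr = sum((b - g * a) ** 2 for a, b in zip(rd, ry))
    s2 = ssr / (len(ry) - n_params)
    return g, math.sqrt(s2 / sum(a * a for a in rd))

random.seed(7)
n, GAMMA = 2000, -0.5   # hypothetical: risk aversion lowers adoption
D = [random.gauss(0, 1) for _ in range(n)]   # risk aversion elicited by the game
x = [random.gauss(0, 1) for _ in range(n)]   # a control, e.g., farm size
y = [GAMMA * d + 2 * xi + random.gauss(0, 1) for d, xi in zip(D, x)]

y_bar, d_bar = mean(y), mean(D)
g0, se0 = coef_and_se([v - y_bar for v in y], [d - d_bar for d in D], n_params=2)  # no controls
g1, se1 = coef_and_se(residualize(y, x), residualize(D, x), n_params=3)           # controlling for x
print(f"without x: {g0:.3f} (se {se0:.3f}); with x: {g1:.3f} (se {se1:.3f})")
```

Both estimates are centered on the true γ because D is exogenous. Including x only narrows the standard error.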

Another method relies on a natural experiment, which can be an accident of history (e.g., the 2004 tsunami in Indonesia).

This too can be used to solve specific endogeneity problems. Once again, the difficulty lies in finding true natural experiments, i.e., accidents of history that are truly exogenous to the outcome we wish to study. (See Natural Experiments of History, by Jared Diamond and Jim Robinson.)

Lastly, we can use a regression discontinuity (RD) design. This usually exploits some exogenously given threshold, together with the assumption that units just above and just below the threshold are otherwise identical, so that any difference between them is due to what the threshold dictates. Example: Maimonides’ Rule in Israel, under which class size cannot exceed 40. A grade of 80 students is thus taught in two classes of 40, whereas a grade of 81 students is split into three classes of 27. We can then test the impact of class size on performance.
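The discontinuity can be made explicit with a small function that gives the average class size implied by a 40-pupil cap. It assumes that a grade is split into the fewest equal-sized classes that respect the cap. Real schools need not split exactly evenly, so this is a stylized version of the rule.

```python
CAP = 40  # maximum class size under Maimonides' Rule

def predicted_class_size(enrollment: int) -> float:
    """Average class size when a grade is split into the fewest classes of at most CAP pupils."""
    n_classes = (enrollment - 1) // CAP + 1
    return enrollment / n_classes

for e in (40, 41, 80, 81, 120, 121):
    print(e, predicted_class_size(e))
```

Predicted class size drops sharply each time enrollment passes a multiple of 40. Comparing grades just below and just above those points is what the RD design exploits.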

Another example of an RD design involves admission to an elite school based on a standardized test. Some students score just above the admission threshold, others just below. By comparing students close to the threshold, we can plausibly assess the impact of attending an elite school. However, RD designs only identify local effects, i.e., effects for units near the threshold.
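One common way to estimate the jump at the threshold is a local linear regression. It fits a separate line on each side of the cutoff within a narrow window and takes the difference between the two fitted values at the cutoff. The sketch below assumes hypothetical test scores on a 0-100 scale, a cutoff of 60, a true effect of 5, and a window of 10 points. None of these numbers come from a real admissions study.

```python
import random

def fit_line(xs, ys):
    """Intercept and slope of a simple OLS regression of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((a - mx) * (c - my) for a, c in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    return my - b * mx, b

random.seed(8)
CUTOFF, JUMP, BANDWIDTH = 60.0, 5.0, 10.0   # hypothetical cutoff, true effect, window
scores = [random.uniform(0, 100) for _ in range(20000)]
outcome = [10 + 0.1 * s + (JUMP if s >= CUTOFF else 0.0) + random.gauss(0, 2) for s in scores]

# Keep only students near the cutoff; center the test score at the cutoff
left = [(s - CUTOFF, o) for s, o in zip(scores, outcome) if CUTOFF - BANDWIDTH <= s < CUTOFF]
right = [(s - CUTOFF, o) for s, o in zip(scores, outcome) if CUTOFF <= s <= CUTOFF + BANDWIDTH]

a_left, _ = fit_line(*zip(*left))
a_right, _ = fit_line(*zip(*right))
rd_estimate = a_right - a_left   # jump between the two fitted lines at the cutoff
print(f"estimated effect at the cutoff: {rd_estimate:.2f} (true {JUMP})")
```

The estimate applies to students near the cutoff only, which is the "local" limitation noted above.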

If there is one thing that I would like you to take away from this seminar, it is to have a healthy skepticism with respect to any social-scientific finding. In other words, I want you to make sure that causal effects are properly identified and that you are not getting fooled by mere correlation. When they are not, claims should be qualified with terminology such as “the data suggest that …”

Even if you don’t care about eggheaded academic debates, you should still care about the proper identification of causal effects. Indeed, knowing (i) whether policies actually work and (ii) to what extent they work allows us to minimize costs when we intervene. Example: deworming and a conditional cash transfer each cause a student to attend school for one more year, but that extra year costs $3.25 per student with deworming and $6,000 per student with a conditional cash transfer.
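The cost figures on this slide translate directly into what a fixed budget can buy. The sketch below uses the slide's two numbers and a hypothetical budget of $1,000,000.

```python
# Cost of one additional year of school attendance, using the figures on this slide
COST_PER_YEAR = {"deworming": 3.25, "conditional cash transfer": 6000.0}

budget = 1_000_000  # hypothetical budget in dollars
for program, cost in COST_PER_YEAR.items():
    print(f"{program}: {budget / cost:,.0f} additional school-years")

ratio = COST_PER_YEAR["conditional cash transfer"] / COST_PER_YEAR["deworming"]
print(f"the transfer costs about {ratio:,.0f} times as much per additional year")
```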
