
Experimental Methods LectureS 1- 8


Presentation Transcript


  1. Experimental Methods, Lectures 1-8 Eric Bettinger, Stanford University

  2. Where do we see randomized experiments in public policy? • Oportunidades/Progresa in Mexico • Reducing Crime • Juvenile Delinquency • Early Childhood Development • Head Start • Education (general IES focus) • Vouchers • Career Themes • Class Size • Electricity Pricing • Automated Medical Response • Housing Assistance • Housing Vouchers • Job Training • Unemployment Insurance • Welfare to Work • Health Services • College Financial Aid • College Work • Mental Health Treatments

  3. How Have Randomized Experiments Affected Public Policy? • Class Size and Tennessee STAR • California Class Size • Preschool and Perry Preschool • Head Start • Reading Curricula and Success for All • Conditional Cash Transfers and Progresa • Bolsa Escola and others • Educational Vouchers

  4. Experiments as the “Gold Standard” • Wide recognition as the “Gold Standard” • World Bank • US Government and “No Child Left Behind” legislation • Why the “Gold Standard”? • Identifying Causal Impacts • Eliminating Selection Bias • Simplicity • Much Easier to Explain • No Need for Complex Statistical Modeling

  5. Are there limitations? • YES! • Some potential limitations: • General equilibrium • Interpretation • Mechanism • Fragility of design • More… • We will return to this later in the course.

  6. Key goals this week: • Understand causal modeling • Understand the relationship of randomization to causal modeling • Gain the statistical tools to analyze randomized experiments • Become aware of underlying assumptions, strengths, and weaknesses of experimental approaches • Become acquainted with key players in the development and implementation of randomized experiments • Gain the statistical tools to design randomized experiments • Understand other key issues in the design and implementation of randomized experiments.

  7. What is causality? • There are many potential definitions … • Morgan and Winship: C causes E if 1) Both C and E occur; and 2) If C did not occur and everything else was equal, E would not have occurred. • Fisher: Outcomes differ by treatment • Neyman: Average outcomes of treatments A and B differ • Does the treatment have to be manipulable? • Can there be effects of being female? • How do we think about mechanisms and counterfactuals? • Can differences in outcomes come from different dosages? • What is the counterfactual in the Neyman and Fisher definitions?

  8. Defining Counterfactuals • Assume that each individual in the population has a potential outcome for each potential causal state. • In the two-state model, we can define Yi = Y1i if exposed to treatment, and Yi = Y0i if not exposed to treatment. • We could generalize this to Y2i, Y3i, . . . , Yki for k different treatments. • These are potential outcomes, and we assume they exist. • Each individual has their own outcome – heterogeneous outcomes. • We never observe more than one outcome for each person.
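
A minimal simulation sketch of this two-state setup (not from the slides; all numbers and variable names are hypothetical) makes the observation rule concrete: every unit carries both Y1i and Y0i, but the data reveal only the one corresponding to its treatment status.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Every unit has BOTH potential outcomes, even though we never see both.
y0 = rng.normal(50, 10, n)        # Y0i: outcome if not exposed to treatment
y1 = y0 + 5                       # Y1i: outcome if exposed (a constant effect of 5, for readability)

d = np.array([1, 0, 1, 1, 0])     # treatment exposure Di
y_obs = np.where(d == 1, y1, y0)  # observation rule: Yi = Y1i if Di = 1, else Y0i

for i in range(n):
    counterfactual = y0[i] if d[i] == 1 else y1[i]
    print(f"unit {i}: D={d[i]}, observed Y={y_obs[i]:.1f}, "
          f"unobserved counterfactual={counterfactual:.1f}")
```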

  9. Counterfactual? • KEY POINT: We have to make assumptions about whether the average observed for some aggregated group is an accurate representation of the counterfactual. In essence, the strength of an identification strategy is our ability to accurately measure the counterfactual outcome.

  10. Rubin (1986): “What ‘Ifs’ Have Causal Answers?” • Not all questions have causal answers. • Would I be here in Russia if I had studied art history instead of economics? • Would I be here in Russia if I were a woman rather than a man? • SUTVA is the key assumption for determining which questions have causal answers. • SUTVA = “stable unit treatment value assumption”

  11. Baseline framework for understanding SUTVA • What are the ingredients of an experiment? • N units indexed by i=1,…,N • T treatments indexed by t=1,…,T • Y is the outcome, indexed by t and i. • Two Key Conditions: • SUTVA says that Y will be the same for any i no matter how t is assigned. • SUTVA also says that Y will be the same no matter what treatments the other units receive.

  12. Condition 1. Same Y no matter how t is assigned. • Consider the following statement: If John Doe had been born a female, his life would have been different. • How do you make John Doe a female? • Hypothetical Y-to-X chromosome treatment at conception • Massive doses of hormones in utero • At-birth sex change • Does the form of the change make a difference?

  13. Consider an educational example • Are there treatments where different versions of t matter? • Consider class size. • If we assign a small class to a student (treatment t), is there more than one version of t in existence? Are all small classes equal? • No – teachers differ, peers differ, and so on. • What about educational vouchers? • Vouchers are “coupons” which allow students to attend the school of their choice. • Are outcomes the same no matter the voucher? • We have to assume that all versions of t are the same under SUTVA.

  14. Consider the Tennessee class size experiment (Ding and Lehrer 2011) • Under SUTVA, should the class-size effect vary with the proportion of the school involved in the experiment?

  15. Consider Tennessee class size experiment • The problem is that it does. Why?

  16. Condition 2. Yti cannot depend on whether unit i′ received t0 or t1 • Consider the example above. • Are there treatments that are diluted as more people get them?

  17. Condition 2. Outcomes do not depend on others’ assignments. • What other examples are there? • General equilibrium effects. • Do outcomes change for the control group? • What is the effect of taking this course in Russia versus at Stanford? Is the treatment the same? • Key point: your treatment assignment cannot affect my outcome.

  18. Why do we care about SUTVA? • We can identify which questions can really be answered. A causal question is answerable when SUTVA is satisfied. • SUTVA can help us sort through possible mechanisms. • It makes the interpretation clearer.

  19. Consider religious schools • Would we expect the religious school effect to change if more students attended religious schools? • SUTVA does not hold if the effectiveness of religious schools depends on the number/composition of students. • The distribution of students changes. • What then are the implications of the religious school literature for voucher debates?

  20. What should we think about SUTVA? • It is a useful framework for isolating the precise question for which we can give a causal answer. • It likely holds for small-scale interventions. • In large-scale expansions, we could have general equilibrium effects.

  21. Simple Example • Recall that experiments include treatments, units of observation, and outcomes. • Consider the following scenario: If females at firm f had been male, their starting salaries would have averaged 20% higher. • How do we divide this into treatments, units, and outcomes? • It is likely not possible. • A useful motto: There is no causation without manipulation.

  22. The Perry Preschool Experiment • Motivation for large investments in preschool: hundreds of billions of dollars in investments. • Outline of the experiment: • 1962. • 123 black preschoolers (58 in treatment) • Low income and low IQ scores (70-85, at least 1 SD below the mean) • Randomly assigned at ages 3-4 to a high-quality preschool program or no program. • Data collected annually from ages 3 through 11 and at ages 14, 15, 19, 27, and 40.

  23. Overview of Perry Results Source: Schweinhart (2007)

  24. More on Perry. Source: Schweinhart (2007)

  25. More Perry Results Source: Schweinhart (2007)

  26. More from Perry on Crime Source: Schweinhart (2007)

  27. Which Causal Questions Does Perry Help Resolve? • Is attending preschool better than not attending for low-income, low-ability, African-American students in 1962? • Is attending preschool better than not attending for low-income, low-ability students? • Is attending preschool better than not attending for low-income students? • Is attending preschool better than not attending for all students? • SUTVA may not be fully satisfied but it helps us identify which questions our studies may resolve.

  28. Is Perry valid today? Source: Schweinhart (2007)

  29. Let’s formulate causality mathematically • Yi = Y1i if Di=1; Yi = Y0i if Di=0 • Equivalently, Yi = Y0i + (Y1i – Y0i)Di • The most common approach is to compare students receiving a “treatment” to those not receiving it: • E[Yi | Di=1] – E[Yi | Di=0] = (E[Y1i | Di=1] – E[Y0i | Di=1]) + (E[Y0i | Di=1] – E[Y0i | Di=0]) • 1st term is the average effect on the treated • 2nd term is selection bias
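
This decomposition can be verified numerically. The sketch below (hypothetical numbers, not from the lecture) simulates selection into treatment based on Y0 and checks that the naive difference in means equals the effect on the treated plus the selection-bias term.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

y0 = rng.normal(50, 10, n)            # Y0i
y1 = y0 + rng.normal(5, 2, n)         # Y1i (heterogeneous treatment effects)

# Selection: units with higher Y0 are more likely to take the treatment.
d = (y0 + rng.normal(0, 10, n) > 55).astype(int)
y = np.where(d == 1, y1, y0)          # observed outcome

naive    = y[d == 1].mean() - y[d == 0].mean()        # E[Y|D=1] - E[Y|D=0]
att      = (y1[d == 1] - y0[d == 1]).mean()           # E[Y1 - Y0 | D=1]: effect on the treated
sel_bias = y0[d == 1].mean() - y0[d == 0].mean()      # E[Y0|D=1] - E[Y0|D=0]: selection bias

print(f"naive difference:      {naive:.2f}")
print(f"ATT + selection bias:  {att + sel_bias:.2f}")  # identical by construction
```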

  30. Thinking through the expectation E[Yi | Di=1] – E[Yi | Di=0] = E[Y1i | Di=1] – E[Y0i | Di=1] + ( E[Y0i | Di=1] – E[Y0i | Di=0]) • Example 1. Treatment = Religious Private Schooling • 1st term is average effect on treated • Average private school outcome of individuals in religious private schooling minus the average public school outcome for individuals attending private school • The latter term is the unobserved counterfactual. We have to find a way to estimate it. • 2nd term is selection bias • The difference between the public school outcome that private school attendees would have had and the public school outcome for public school attendees. • Informs us as to how different private school attendees are from public school attendees.

  31. Thinking through the expectation E[Yi | Di=1] – E[Yi | Di=0] = E[Y1i | Di=1] – E[Y0i | Di=1] + ( E[Y0i | Di=1] – E[Y0i | Di=0]) • Example 2. Treatment = Attending Preschool • 1st term is average effect on treated • Average preschool outcome of individuals attending preschool minus the average outcome that those students would have had if they had not gone to preschool • 2nd term is selection bias • The difference between the outcome that preschool attendees would have had without preschool and the outcome of students not attending preschool.

  32. Use of the formulation • It helps us figure out what we are estimating. • Later we will augment this model with the “probability of compliance.” • It identifies the key means we need to estimate to gain causal estimates. • It helps us analyze our approach regardless of our methodology

  33. What happens if we have randomization? • E[Y0i | Di=0] = E[Y0i | Di=1], and likewise the treatment effect Y1i – Y0i is independent of Di. • Selection bias is gone. E[Yi | Di=1] – E[Yi | Di=0] = E[Y1i | Di=1] – E[Y0i | Di=1] + ( E[Y0i | Di=1] – E[Y0i | Di=0]) = E[Y1i | Di=1] – E[Y0i | Di=1] + 0 = E[Y1i – Y0i | Di=1] = E[Y1i – Y0i] • The simple difference in averages reveals the treatment effect.
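
Rerunning the earlier simulation with randomly assigned treatment (same hypothetical setup as the previous sketch) shows the selection-bias term collapsing to roughly zero, so the simple difference in means recovers the average treatment effect of about 5.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

y0 = rng.normal(50, 10, n)
y1 = y0 + rng.normal(5, 2, n)

d = rng.integers(0, 2, n)                 # RANDOM assignment: D independent of (Y0, Y1)
y = np.where(d == 1, y1, y0)

diff_in_means  = y[d == 1].mean() - y[d == 0].mean()
selection_bias = y0[d == 1].mean() - y0[d == 0].mean()  # observable only because we simulated Y0

print(f"difference in means: {diff_in_means:.2f}")   # close to the true average effect of 5
print(f"selection-bias term: {selection_bias:.2f}")  # close to 0 under randomization
```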

  34. Hypothetical Example • A recent study found in the overall population that students who attended schools of type X had higher test scores than other students. • What do we expect the selection bias to look like? This was a school requiring motivated parents. • If we randomized in the whole population, what direction should we expect the treatment effect to go? E[Yi | Di=1] – E[Yi | Di=0] = E[Y1i | Di=1] – E[Y0i | Di=1] + ( E[Y0i | Di=1] – E[Y0i | Di=0])

  35. Abdulkadiroglu et al (2011) • Examines charter schools in Boston • Charter schools are public schools which operate more like private schools. • Obama has pushed for more charter schools. • It is an important research question to know whether they work. • Charter schools are often oversubscribed. • When oversubscribed, they use lotteries to determine who gets in. • Other charters are not oversubscribed. • If a school is oversubscribed, what would you expect? • It probably does a pretty good job.

  36. Abdulkadiroglu et al (2011)

  37. Abdulkadiroglu et al (2011)

  38. Synthesizing • The authors found a selection of schools of type X which ran lotteries to determine who entered the schools. In these schools, the researchers exploited the randomization. The difference between winners and losers was even higher than the previous comparison. • Explain? • Basically, if the observed effect is greater than the treatment effect, there are two possibilities: • Selection bias is negative at schools without wait lists [how likely is that?] • The treatment effect is much lower; the stronger the selection effects in these other schools, the lower the treatment effect.

  39. Regression Formulation • Suppose we want to use regression analysis; how do we estimate treatment effects? • Yi = a + b*Treatmenti + ei • E[Yi | Di=1] = a + b + E(e|Di=1) • E[Yi | Di=0] = a + E(e|Di=0) • E[Yi | Di=1] – E[Yi | Di=0] = b + E(e|Di=1) – E(e|Di=0) • The last term, E(e|Di=1) – E(e|Di=0), is the selection bias. • It is nonzero exactly when e and D are correlated.

  40. Regression Formulation • Yi = a + b*Treatmenti + ei • Consider the OLS estimator of b. We will call it bhat. • E[bhat] = b + E[(T'T)^(-1)T'e] • Selection bias would suggest that e and T are correlated. Think of it as an omitted variable problem. • If T is randomly assigned, then E[T'e] = 0. No omitted variable can be correlated with T.
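
A short sketch of this regression logic (illustrative only; the function and variable names are my own), using statsmodels: when assignment depends on the error term the OLS coefficient picks up the selection bias, and when assignment is random it centers on the true b.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 100_000
b_true = 5.0

y0 = rng.normal(50, 10, n)      # baseline outcome (the "error" around the intercept)
y1 = y0 + b_true                # constant treatment effect b

def ols_slope(d):
    """OLS of the observed outcome on a constant and the treatment dummy."""
    y = np.where(d == 1, y1, y0)
    return sm.OLS(y, sm.add_constant(d)).fit().params[1]

d_selected = (y0 + rng.normal(0, 10, n) > 55).astype(int)  # treatment correlated with the error
d_random   = rng.integers(0, 2, n)                         # randomly assigned treatment

print(f"OLS with selection:     {ols_slope(d_selected):.2f}")  # biased upward
print(f"OLS with randomization: {ols_slope(d_random):.2f}")    # close to b_true = 5
```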

  41. Multivariate Regression • What is the consequence of including more X in a regression where there is randomization? • Generally, the standard errors are lower. • X is correlated with Y and, once controlled for, reduces the residual variance of Y. • The estimated treatment effect should be unbiased if there is no selection bias. • We will return to this.
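
The variance-reduction point can be seen directly in a small simulation (a sketch under hypothetical parameters, again using statsmodels): adding a predictive pre-treatment covariate leaves the estimated treatment effect essentially unchanged but shrinks its standard error.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 5_000

x = rng.normal(0, 1, n)                        # pre-treatment covariate (e.g., a baseline score)
d = rng.integers(0, 2, n)                      # randomized treatment
y = 50 + 5 * d + 8 * x + rng.normal(0, 5, n)   # X explains much of the variance in Y

short = sm.OLS(y, sm.add_constant(d)).fit()                        # Y on D only
long  = sm.OLS(y, sm.add_constant(np.column_stack([d, x]))).fit()  # Y on D and X

print(f"without X: b = {short.params[1]:.2f}, se = {short.bse[1]:.3f}")
print(f"with X:    b = {long.params[1]:.2f}, se = {long.bse[1]:.3f}")  # similar b, smaller se
```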

  42. Recap up to now. • Experiments are our key interest. • We need units, treatments, and outcomes. • Causal questions need treatments that can be manipulated, whether by the researcher or by “nature.” • SUTVA is essential for identifying which of the questions we ask are causal. • Condition 1. Treatment is constant. • Condition 2. Your treatment does not affect mine. • Typical comparisons combine treatment effects and selection bias. • Randomization removes selection bias, but there are other ways to get rid of it.

  43. Experiments vs. Observational Studies • Cox and Reid: The word experiment is used in a quite precise sense to mean an investigation where the system under study is under the control of the investigator. This means that the individuals or material investigated, the nature of the treatments or manipulations under study, and the measurement procedures used are all selected, in their important features at least, by the investigator. By contrast, in an observational study some of these features, and in particular the allocation of individuals to treatment groups, is outside the investigator’s control.

  44. Definition of Experiment • Notice that Cox and Reid never mentioned “randomization” in their definition. • Are there experiments which are not randomized? • Not all studies lend themselves to randomization: • Effect of parental death on child outcomes • Effect of divorce on children • Effect of no schooling • We will distinguish between “field experiments” and “natural experiments.” • Natural experiments are settings where randomization occurs “by nature”: variation between treatment and control takes place for arbitrary and seemingly random reasons. • Next week we will focus on natural experiments. • In field experiments, the researcher “controls” the randomization.

  45. Choosing the treatment • In designing randomized experiments, we start with the treatment. • Two schools of thought: • Start with theory, policy, or conceptual frameworks. • Identify opportunities for randomization and then identify questions which might be of interest to academics or policymakers. • Angrist and Kremer exemplify these divergent approaches. • The difference between program evaluation and research. • The importance of partners. • Oftentimes the relationship is more important than the question. • The Duflo article discusses partners before the basics of randomization.

  46. The importance of partners • Who are our partners? • Governments (Job Training, Negative Income Tax, Progresa, generating new pilots) • NGOs (can focus on smaller populations than governments) • Private Groups • Partners have priorities and existing beliefs. • These create limitations and opportunities in our design. • Partners have resources. • Few of us have the money or time to implement new treatments or to gather data on outcomes. • Partners are key to this. • Partners can help us find populations to study.

  47. Some examples to work through • Hypothesis: students’ writing and self-confidence are linked. Self-affirming writing experiences reinforce student confidence and subsequently student outcomes. • What’s the ideal experiment? • Is it plausible? • Hypothesis: Remediation is actually counterproductive in college. • Experiment? • Plausibility? • Hypothesis: Deworming improves health and educational outcomes. • Experiment? • Plausibility? • Hypothesis: Paying students for test scores improves academic performance. • Experiment? • Plausibility? • Hypothesis: Positive (negative) verbal reinforcement improves (destroys) academic progress. • Experiment? • Plausibility?

  48. Worms paper results • Administer deworming medication to treat students for intestinal worms. • Worms are parasites found in dirty water; they cause health problems, and school attendance declines. • Randomized which students received the treatment within a community. • Randomized which communities received treatments. • Key results? • No difference within communities. • Communities that received the treatment were better off.

  49. Unit of randomization • Once we have the treatment, we can determine the optimal units. • Why do we care about “units”? • Statistical power. Typically the unit of analysis is the same as the unit of randomization. • Statistical modeling. • Contamination. • The causal question often changes with the units. • “External Validity.” Can we generalize to other populations? • Mode of delivery. • Political considerations.
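
When treatment is assigned at a coarser level than the outcome is measured (e.g., whole schools rather than students), both the power calculation and the standard errors should respect the unit of randomization. A minimal sketch of school-level assignment with cluster-robust standard errors (hypothetical numbers, using statsmodels):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n_schools, students_per_school = 40, 50

school_id = np.repeat(np.arange(n_schools), students_per_school)
school_effect = rng.normal(0, 3, n_schools)[school_id]   # outcomes correlated within schools

d_school = rng.integers(0, 2, n_schools)                 # randomize treatment AT THE SCHOOL LEVEL
d = d_school[school_id]                                  # every student shares the school's assignment

y = 50 + 2 * d + school_effect + rng.normal(0, 5, len(d))

model = sm.OLS(y, sm.add_constant(d))
naive     = model.fit()                                                    # ignores clustering
clustered = model.fit(cov_type="cluster", cov_kwds={"groups": school_id})  # clusters on the unit of randomization

print(f"naive se:     {naive.bse[1]:.3f}")       # understates uncertainty
print(f"clustered se: {clustered.bse[1]:.3f}")   # larger, reflecting school-level assignment
```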

  50. Consider some simple cases • Which level (individuals, classes, schools) would be best to test the following treatments? • Adoption of a new curriculum. • Creating a conditional cash transfer program based on schooling. • Preschool subsidies. • Giving incentives to teachers. • Giving incentives to students studying in an online course.
