The Application of Quasi-Experimental Methods in Education & Institutional Research: Conceptual Issues Stephen L. DesJardins Professor Center for the Study of Higher and Postsecondary Education School of Education and Professor, Gerald R. Ford School of Public Policy University of Michigan AIRUM Conference Morning Workshop Session November 6, 2013
Organization of Workshop • Morning: go over a lot of the framing & conceptualization of non-experimental methods • This is a necessary but not sufficient condition for conducting this type of research • Afternoon: examine some Stata applications to estimate models discussed in morning • These applications will be introductory but we can discuss where to find other resources • Provide references to readings & sources of code to enhance post workshop learning
Objectives • Discuss the need to improve the rigor of education research • Understanding conceptual issues is necessary to successfully apply methods • Describe the logic & conceptual basis of causal inference • Review designs/methods available to help make inferences about effects of education programs, policies, & practices
Importance of Rigor in Research • Systematically improving education policies, programs, practices requires understanding of “what works” • Goal: Make causal statements • Without doing so “it is difficult to accumulate a knowledge base that has value for practice or future study” (Schneider, 2007, p. 2). • However, education research has lacked rigor & relevance
Why the Lack of Rigor? • Often lack of clarity about the designs & methods optimal for making causal claims • Many education researchers were not educated in the application of these methods • Some feel there is no time to learn new methods; others may feel new methods are too complicated to learn • Hard to create & sustain norms & common discourse about what constitutes rigor
Policy Changes Driving Push Toward Rigor • NCLB Act (2001): Included definition of “scientifically-based” research & set aside funds for studies consistent with definition • Education Sciences Reform Act (2002) replaced OERI with IES • Funding from IES, NSF, & other federal agencies tied to rigorous designs/methods • NRC & AERA reports focused on improving the quality of education research
Cause and Effect • In RCTs (randomized controlled trials) question is: What is effect of a specific program or intervention? • Summer Bridge program (intervention) may cause an effect (improved college readiness) • Cause & effect definition by Locke • Shadish, Cook, & Campbell (2002): Rarely know all the causes of effects or how they relate to one another • Need for controls
Cause and Effect (cont’d) • Holland (1986) notes that a true cause cannot be determined unequivocally; we seek the probability that an effect will occur. • Allows opportunity to est. why some effects occur in some situations but not in others • Example: Completing higher levels of math courses in HS may improve chances of finishing college more for some students than for others • Here we are measuring likelihood that cause led to the effect; not “true” cause/effect
Determining Causation • RCTs are the “gold standard” for determining causal effects • Pros: Reduce bias & spurious findings of causality, thereby improving knowledge of what works • Cons: Ethics, external validity, cost, possible errors that are also inherent in observational studies (measurement problems; “spillover” effects, attrition) • Possibilities: Oversubscribed programs (Living Learning Communities, UROP…)
Example • May be interested in whether highest math course finished in HS is Algebra II vs. Algebra I & effect on college completion • Goal: Make conclusion (causal inference) about treatment effect (relative to no/different treatment) on outcome of interest (college completion) • Estimated effect is how much higher, on average, the prob(finishing college) is for “treated” students (those w/Algebra II) than the “untreated” (Algebra I only) group
The Logic of Causal Inference • Must distinguish between inference model specifying cause/effect relation & statistical methods determining strength of relation • The inference model specifies the parameters we want to estimate or test • The statistical technique describes the mathematical procedure(s) used to test hypotheses about whether a treatment produces some effect
A Common Causal Scenario • [Diagram: Observed or Unobserved Confounding Variable(s); Cause (e.g., Treatment); Effect (e.g., Educational Outcome)]
The Counterfactual Framework • Owing to Rubin (1974, 1977, 1978, 1980) • What would have happened if individual exposed to a treatment was not exposed or exposed to a different treatment? • Causal effect: Difference between outcome under treatment & outcome if individual exposed to the control condition (no treatment or other treatment) • Formally: $d_i = Y_{it} - Y_{ic}$
The Fundamental Problem… • …of causal inference is that if we observe $Y_{it}$ we cannot also observe $Y_{ic}$ • Holland (1986) ID’d two solutions to this problem: One scientific, one statistical • Scientific: Expose i to treatment 1, measure Y; expose i to treatment 2, measure Y. Difference in outcomes is causal effect • Assumptions: Temporal stability (response constancy) & causal transience (effect of 1st treatment does not affect i’s response to 2nd treatment)
Fundamental Problem (cont’d) • Second scientific way: Assume all units are identical, thus, doesn’t matter which unit receives the treatment (unit homogeneity) • Give treatment to unit 1 & use unit 2 as control, then compare difference in Y. • These assumptions are rarely plausible when studying individuals (even when we have twins, as in the MN Twin Family Study; this is not a study of the baseball team members’ families!!).
The Statistical Solution • Rather than focusing on units (i), estimate the average causal effect for a population of units (i’s). Formally: $d = E(Y_t - Y_c)$ • where the expectations of $Y_t$ & $Y_c$ are avg. outcomes for i’s in treatment & control groups • Assume: i’s differ only in terms of treatment group assignment, not on characteristics or prior experiences that could affect Y
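The statistical solution can be illustrated with a small simulation. This is a hypothetical sketch, not part of the workshop materials: the outcome scale, the mean effect of 5, and all variable names are assumptions chosen for illustration. Because the data are simulated, both potential outcomes are visible for every unit, which is never true of real data. Under random assignment, the simple difference in group means should land close to the true average effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Potential outcomes (both visible only because this is a simulation).
y_c = rng.normal(50.0, 10.0, n)   # outcome if unit i is in the control condition
d_i = rng.normal(5.0, 2.0, n)     # unit-level causal effects (hypothetical values)
y_t = y_c + d_i                   # outcome if unit i is treated

# Random assignment: each unit has a 50% chance of treatment.
t = rng.integers(0, 2, n)

# The fundamental problem: only one potential outcome is observed per unit.
y_obs = np.where(t == 1, y_t, y_c)

true_ate = d_i.mean()
est_ate = y_obs[t == 1].mean() - y_obs[t == 0].mean()
print(f"true average effect: {true_ate:.2f}")
print(f"difference in means: {est_ate:.2f}")
```

The unit-level effects `d_i` are never recoverable from `y_obs` alone; only their average is, and only because assignment is independent of the potential outcomes.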
Example • If we study the effects of Algebra II on access to college, maybe it’s only the motivated students who select into treatment • If we could randomly assign students to Algebra II or not in HS then we could examine causal impact of course on Y. • Why? Because group assignment would, on average, be independent of any measured or unmeasured pretreatment characteristics.
Problems with Idealized Statistical Solution • Random assignment not always possible, so independence bet. pretreat. characteristics & treatment group assignment often violated • Statistical models are often used to adjust for confounding variables (student, classroom, school characteristics that predict treatment assignment & outcomes) when outcomes for treatment/control groups are compared
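One way to see why the violation matters is a standard textbook decomposition of the naive comparison of observed group means. It is written here in the potential-outcomes notation of the earlier slides; the indicator $T=1$ for treated and $T=0$ for control units is a convention added for this illustration.

```latex
\begin{aligned}
\underbrace{E(Y_t \mid T=1) - E(Y_c \mid T=0)}_{\text{observed difference in means}}
&= \underbrace{E(Y_t - Y_c \mid T=1)}_{\text{average effect on the treated}}
 + \underbrace{E(Y_c \mid T=1) - E(Y_c \mid T=0)}_{\text{selection bias}}
\end{aligned}
```

Under random assignment the selection-bias term is zero in expectation, because the treated and control groups would have had the same average outcome without treatment. With self-selection it is generally nonzero, and covariate adjustment removes it only to the extent that the measured covariates capture the differences between groups.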
Criteria for Making Causal Statements • Causal relativity: Effect of a cause must be assessed relative to the effect of another cause • Causal manipulation: Units must be potentially exposable to both the treatment & control conditions. • Temporal ordering: Exposure to cause must occur at specific time or within specific time period before effect • Elimination of alternative explanations
Issues in Employing RCTs • May be differences in treated/controls when randomization employed: Small samples • Employ regression methods to control for diffs • Cross-study comparisons & replication useful • Avg effect in population may not be of most interest: Heterogeneous treatment effects • Test for sub-group differences of treatment • Mechanism for assignment to treatment may not be independent of responses • Merit-based programs & responses
Issues in Employing RCTs (cont’d) • Responses of treated should not be affected by treatment of others (“spillover” effects) • New retention program initiated; control group responds by being demoralized (motivated), leading to bias upward (downward) of the treatment effects. • Treatment non-compliance & attrition • Random assignment of vouchers in K-12; students leave programs • ITT analysis; remove non-compliers; focus on “true compliers”
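The three ways of handling non-compliance listed above can be compared in a small simulation. This is a hypothetical sketch: the 60% take-up rate, the effect of 2.0, and the assumption that compliers have better baseline outcomes are all invented for illustration. It contrasts the ITT estimate, a comparison that drops non-compliers, and the ITT rescaled by take-up, which is the usual estimate of the effect for compliers when non-compliance is one-sided.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
effect = 2.0                       # hypothetical effect of actually receiving treatment

z = rng.integers(0, 2, n)          # random assignment (e.g., a voucher offer)
complier = rng.random(n) < 0.6     # would take up the treatment if offered
d = z * complier                   # treatment received (one-sided non-compliance)

# Assumption: compliers have better baseline outcomes than non-compliers.
y = 1.0 + 1.0 * complier + effect * d + rng.normal(0.0, 1.0, n)

# Intention-to-treat: compare by assignment, ignoring compliance.
itt = y[z == 1].mean() - y[z == 0].mean()

# Dropping non-compliers from the treatment arm breaks the randomization.
per_protocol = y[(z == 1) & (d == 1)].mean() - y[z == 0].mean()

# Effect for compliers: ITT scaled by the difference in take-up rates.
take_up = d[z == 1].mean() - d[z == 0].mean()
cace = itt / take_up

print(f"ITT: {itt:.2f}  per-protocol: {per_protocol:.2f}  complier effect: {cace:.2f}")
```

In this setup the ITT understates the effect of receiving treatment because it averages over non-compliers, while the per-protocol comparison overstates it because the retained compliers differ from the full control group at baseline.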
Quasi/Non-Experimental Designs • Compared to RCTs, no randomization • Many quasi-experimental designs • Many are variations of pretest-posttest structure without randomization • Apply when non-experimental (“observational”) data used, which is often case in ed. research • Pros: When properly done may be more generalizable than RCTs • Cons: Internal validity • Did the “treatment” really produce the effect?
Determining “Causation” with Obs. Data • Often difficult because of non-random assignment to “treatment” • Example: Students often self-select into courses, interventions, programs…; may result in biased estimates when “naïve” methods employed to estimate effects of treatment • Goal: Mimic desirable properties of RCTs • Solution? Employ designs/methods that account for non-random assignment; will demonstrate some of them today
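The self-selection problem can be made concrete with a simulation based on the Algebra II example. This is a hypothetical sketch: the true effect of 0.10, the role of "motivation", and the continuous outcome index are assumptions made for illustration, not estimates from any dataset. The same naive comparison of group means is applied twice, once when motivated students select into the course and once when a coin flip assigns it.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
true_effect = 0.10                       # hypothetical effect of Algebra II

motivation = rng.normal(0.0, 1.0, n)     # unobserved by the researcher


def outcome(took_alg2):
    """Outcome index driven by treatment, motivation, and noise."""
    noise = rng.normal(0.0, 1.0, n)
    return 0.30 + true_effect * took_alg2 + 0.20 * motivation + noise


# Scenario 1: self-selection; more motivated students are more likely to enroll.
t_self = (motivation + rng.normal(0.0, 1.0, n) > 0).astype(float)
y_self = outcome(t_self)
naive_self = y_self[t_self == 1].mean() - y_self[t_self == 0].mean()

# Scenario 2: a coin flip assigns the course, so motivation is balanced.
t_rand = rng.integers(0, 2, n).astype(float)
y_rand = outcome(t_rand)
naive_rand = y_rand[t_rand == 1].mean() - y_rand[t_rand == 0].mean()

print(f"true effect:            {true_effect:.2f}")
print(f"naive, self-selected:   {naive_self:.2f}")
print(f"naive, random assigned: {naive_rand:.2f}")
```

With self-selection the naive difference mixes the course effect with the motivation gap between groups; with random assignment the same calculation recovers something close to the true effect.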
Counterfactuals • Establishing what the “counterfactual” is and finding a way to create a legitimate control group can be difficult • The best counterfactual is one’s self! • Adam & Grace time machine example • Often why you see repeated measures designs • Twins study in MN • Idea when using observational data: Find a group that looks like the treated on as many dimensions as you can
The “Naïve” Statistical Approach • $Y = a + B_1 X + B_2 T + e$ (1) • where Y is outcome of interest; X is set of controls; T is treatment “dummy”; a & B are parameters to be estimated, with $B_2$ being parameter estimate of interest; & e is error term accounting for unmeasured or unobservable factors affecting Y. • Problem: If T & e are correlated, then estimate of $B_2$ will be biased • (1) is known as “outcome,” “structural” equation or sometimes “stage 2”
A Structural Solution • $T = a + B_1 X + B_2 Z + u$ (2) • If assignment to T is determined non-randomly (e.g., self-selection) then the assignment to T needs to be explicitly modeled by use of this “selection” equation (often denoted “stage 1”). • Heckman (1976, 1979) developed this two-stage approach to dealing with selection bias; he was awarded the 2000 Nobel Prize in Economics for this line of work • Use info from (2) in place of T in (1)
Selection Adjustment Methods • Fixed effects (FE) models, instrumental variables (IV), propensity score matching (PSM), & regression discontinuity (RD) designs all have been used to approximate randomized controlled experiment results • All are regression-based methods • Each has strengths/weaknesses & their applicability often depends on knowledge of the data-generating process (DGP) & richness of data available
Fixed Effects Regression • Adjust for fixed, unobserved characteristics associated with selection into treatment • Example: Effect of mother’s employment status on college completion. • Her personality (unobservable) may be related to likelihood of being employed & finishing college. More “nurturing” moms may stay at home and improve outcomes of children. • Assume personality unlikely to change over time, then it’s a “fixed” characteristic.
FE Regression (cont’d) • Mom’s employment (unemployment) can be considered treatment (control); her personality is unobserved characteristic that may be related to selection into employment & educational outcomes of her kids • Excluding this source of variation from model may bias est. of effect of mom’s employment on completion • Mom’s employment may appear to have more neg. effect on child outcomes than it really does
FE Regression (cont’d) • Could compare employment status on outcomes for siblings if mom works during the infancy of one child but stays home during infancy of another • Kids have same mother so effect of personality is fixed, but unmeasured. Thus, comparing differences in GPAs of siblings will provide unbiased estimate of effects of maternal employment on grades in college. • See http://www.princeton.edu/~jcurrie/publications/When_do_we_know.pdf.
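The sibling comparison can be sketched as a within-family (first-difference) estimator. This is a hypothetical simulation: the true effect of -0.10 GPA points, the logistic link between the mother's unobserved trait and employment, and all variable names are assumptions for illustration. A mother's fixed trait raises her children's GPAs and lowers her chance of working, so the pooled comparison makes employment look more harmful than it is; differencing siblings removes the trait.

```python
import numpy as np

rng = np.random.default_rng(1)
n_fam = 50_000
true_effect = -0.10                      # hypothetical effect of maternal employment on GPA

# Mother's fixed, unobserved trait (e.g., "nurturing" personality).
u = rng.normal(0.0, 1.0, n_fam)

# Employment during each of two children's infancy: less likely when u is high.
p_work = 1.0 / (1.0 + np.exp(u))
work = (rng.random((n_fam, 2)) < p_work[:, None]).astype(float)

# Each child's college GPA depends on employment, the shared trait, and noise.
gpa = 3.0 + true_effect * work + 0.30 * u[:, None] + rng.normal(0.0, 0.5, (n_fam, 2))

# Naive pooled comparison across all children: confounded by u.
naive = gpa[work == 1].mean() - gpa[work == 0].mean()

# Within-family estimator: difference the siblings so that u cancels.
d_work = work[:, 1] - work[:, 0]
d_gpa = gpa[:, 1] - gpa[:, 0]
fe = (d_work @ d_gpa) / (d_work @ d_work)

print(f"true: {true_effect:.2f}  naive: {naive:.2f}  sibling FE: {fe:.2f}")
```

Only families in which employment differs between siblings contribute to the FE estimate, and the approach relies on the trait being constant across the two infancies.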
Instrumental Variable (IV) Regression • Used by economists to estimate S & D curves; counteract bias from measurement error; adjust for selection bias issues • Est. effect of Algebra II on college completion: Observed relationship between grad. college & HS course is likely to be biased because it reflects omitted factors related to both variables (e.g., ability) • But what best controls for ability? Do we have it in data?
IV Example • Two HS students, Grace & Adam: Grace took Algebra II, Adam took Algebra I. • Grace straight As, went to HS counselor who rec. Alg II; Adam chose Alg I on his own. • Likely have different abilities & motivation. • Failure to account for these diffs results in making comparisons between groups that are not comparable; may bias results • Alt: Factors driving course choice may influence subsequent educational outcomes
What to Do? Employ IV Method • Regression-based approach • Uses a variable (“instrument”) to minimize bias due to endogeneity by identifying a source of exogenous variation that will help to determine impact of a treatment (e.g., AlgII) on outcome (e.g., completion). • Instruments: Local unemployment rate & state labor laws for minors (age in 10th grade).
Conditions for Effective IV • $Y = a + B_1 X + B_2 T + e$ (1) • $T = a + L_1 X + L_2 Z + u$ (2) • Exogeneity condition: The instrument (Z) must be correlated with Y only through T & be uncorrelated with the error term (e), which contains the omitted variables. • Or, the only way the instrument affects the outcome is (conditionally) through the treatment • Relevance condition: The instrument must be correlated with the treatment (T; e.g., AlgII).
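Equations (1) and (2) can be estimated by hand to show what the two conditions buy. This is a hypothetical simulation, not the study's model: the coefficient values, the omitted "ability" variable, and the continuous outcome are assumptions for illustration. The instrument `z` satisfies both conditions by construction, since it shifts treatment and is independent of ability.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for y regressed on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


rng = np.random.default_rng(2)
n = 200_000
b2_true = 0.30                          # hypothetical treatment effect

ability = rng.normal(size=n)            # omitted variable, part of e in (1)
x = rng.normal(size=n)                  # observed control
z = rng.normal(size=n)                  # instrument: relevant and exogenous by construction

# Treatment dummy depends on the control, the instrument, and ability.
t = (0.5 * x + 0.8 * z + 0.7 * ability + rng.normal(size=n) > 0).astype(float)
y = 1.0 + 0.4 * x + b2_true * t + 0.9 * ability + rng.normal(size=n)

ones = np.ones(n)

# Naive estimate of (1): T is correlated with e through ability.
b_naive = ols(np.column_stack([ones, x, t]), y)[2]

# Stage 1: linear projection of T on the controls and the instrument, as in (2).
stage1 = np.column_stack([ones, x, z])
t_hat = stage1 @ ols(stage1, t)

# Stage 2: replace T in (1) with its stage-1 fitted values.
b_2sls = ols(np.column_stack([ones, x, t_hat]), y)[2]

print(f"true: {b2_true:.2f}  naive OLS: {b_naive:.2f}  2SLS: {b_2sls:.2f}")
```

Running the second stage by hand gives the right point estimate but incorrect standard errors, which is one reason packaged IV routines or a bootstrap are normally used for inference.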
How to Estimate the Model? • This system of equations can be estimated in two stages using TSLS or a “control function” (CF) approach or simultaneously using LIML or GMM (or some others). • CF: more flexibility in each stage than “canned” commands; se’s wrong so we bootstrap • LIML: robust to weak IV & less biased estimates than some other approaches • GMM: use when independence assumption violated (clustering is present)
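The control function idea and the bootstrap for its standard errors can be sketched as follows. This is a hypothetical illustration with a linear model in both stages and simulated data; it is not the specification used in the study. In the all-linear case the CF point estimate coincides with TSLS, which makes it easy to check; the flexibility mentioned above comes from swapping in nonlinear models at either stage.

```python
import numpy as np


def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]


def cf_estimate(y, t, x, z):
    """Control function: add the stage-1 residual as a regressor in stage 2."""
    ones = np.ones(len(y))
    stage1 = np.column_stack([ones, x, z])
    u_hat = t - stage1 @ ols(stage1, t)          # stage-1 residual
    return ols(np.column_stack([ones, x, t, u_hat]), y)[2]


def tsls_estimate(y, t, x, z):
    """Two-stage least squares: replace T with its stage-1 fitted values."""
    ones = np.ones(len(y))
    stage1 = np.column_stack([ones, x, z])
    t_hat = stage1 @ ols(stage1, t)
    return ols(np.column_stack([ones, x, t_hat]), y)[2]


rng = np.random.default_rng(4)
n = 5_000
ability = rng.normal(size=n)                     # omitted variable
x = rng.normal(size=n)
z = rng.normal(size=n)
t = (0.5 * x + 0.8 * z + 0.7 * ability + rng.normal(size=n) > 0).astype(float)
y = 1.0 + 0.4 * x + 0.3 * t + 0.9 * ability + rng.normal(size=n)

b_cf = cf_estimate(y, t, x, z)
b_tsls = tsls_estimate(y, t, x, z)

# Bootstrap: resample units and redo BOTH stages each time, so the
# uncertainty from estimating stage 1 is carried into the standard error.
n_boot = 200
boot = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, n, n)
    boot[b] = cf_estimate(y[idx], t[idx], x[idx], z[idx])
se_boot = boot.std(ddof=1)

print(f"CF: {b_cf:.3f}  TSLS: {b_tsls:.3f}  bootstrap SE: {se_boot:.3f}")
```

With clustered data, such as students within districts, the resampling would need to draw whole clusters rather than individual rows; that step is omitted here.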
IV Modeling of Algebra II Effects • Research questions: Does completing Algebra II (or higher) in HS influence the prep. for, access to, & completion of college? • We then began to think about labor market outcomes & so began to explore not only the college readiness part but also the career readiness piece. • Had state SUR DB and national datasets to employ to study these questions
Framing the Research • Human capital: Application of microeconomic concepts to the study of choices/behavior of agents • Establishes conceptual relationship among schooling, individual productivity, & labor-market returns (Becker, 1965, 1993; Cohn & Geske, 1990; Mincer, 1958; Schultz, 1961) • Completing advanced coursework in HS may directly improve labor-market productivity. • May also indirectly affect productivity by increasing chances of being admitted to college/earning a degree (Rose & Betts, 2001).
Framing the Research (cont’d) • Signaling as complement to HC theory • Completing advanced HS coursework may not directly impact productivity/returns. • Completion of courses just “signals” to admissions officers &/or employers that person is capable & likely to succeed/be productive (Spence, 1973, 2002). • Signals valuable IF indicator of quality & quality costly to determine (Donath, 2007). • More able acquire education at lower cost, so use these signals more than counterparts
Research to Date • Large body of research, esp. in K-12 • Effect of HS math courses on: HS achievement, college entrance exams, HS graduation (Attewell & Domina, 2008; Rock & Pollack, 1995); • College attendance, GPA, & persistence (Adelman, 2006; Altonji, 1995; St. John & Chung, 2006); • Degree attainment (Rose & Betts, 2001) & labor market outcomes (Goodman, 2008; Levine & Zimmerman, 1995; Rose & Betts, 2004) • But most did not account for non-random assignment into HS courses
Data • Florida Department of Education: • Six cohorts (7th – 12th graders) spanning 1995-96 thru 2005-06 academic years • Very detailed SUR info such as student demographics, HS & college performance, experiences, degree attainment, & labor market outcomes. • NELS & ELS: Ditto above, but nationally representative & can compare trends over earlier (NELS) & recent (ELS) periods
Model Specification • College attendance: no college, 2-yr, 4-yr • Degree attainment: no degree, AA, or BA • Variable of interest: completed Algebra II OR HIGHER in HS or not • Covariates: demographics (gender, race, language, free-reduced lunch); ability (SAT scores, HS GPA, AP/IB crs.); college GPA & financial aid (Degree attainment model); cohort FEs & clustering by district
Model Specification (cont’d) • Multinomial regressions: “naïve model” • HS course choices are included as regressor. Likely endogenously-related to post HS ed. outcomes. If so, biased results. • Correct for endogeneity: IV model • Use unemployment rate in county of residence when in 9th grade (10th in NELS/ELS studies). Also use age IV. • Describe CF approach used.
Empirical Model Specification • Assumption 1: HS students allocate time between school, work, leisure (period 1) & between work/leisure after HS (period 2). • Assumption 2: Students who allocate more time to schooling while in HS take more difficult courses. • Time allocation in HS determined by preferences (inc. discount rate) & exogenous factors like local labor market conditions. Latter affect prep (directly) & attend./completion (indirectly). • e.g., in weak (strong) local labor market students may allocate less (more) time to work & more (less) time to study by increasing (decreasing) the quantity or difficulty of the courses that they take in HS.
Instrument Validity • The “two assumption” approach to the assessment of IV validity has long been the standard • Any IV must be strongly related to selection (Alg II take-up) but not correlated with any outcomes • But Angrist, Imbens, and Rubin (1996) suggest a 5 assumption approach to assess IV validity, which we use. • Stable unit treatment value assumption (SUTVA): A student’s Algebra II completion does not influence other students’ college outcomes (e.g., no spillover effects). • Random assignment: dist. of IV across individuals is comparable to what it would be under random assignment. Students have equal prob. of having any level of the IV. • Unlikely to move to get lower unemployment
Instrument Validity (cont’d) • Exclusion restriction: IV affects DV (enroll/grad) only through relationship with endogenous regressor (Alg II) • Over-ID test; controls for UI rate in 12th grade; & relative stability of UI rates over obs. period as evidence • Non-zero avg. causal effect of IV on treatment: Is IV strongly related to endogenous regressor? • Numerous tests indicate no “weak IV” problem in FL data/more complex in NELS/ELS data.
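One common diagnostic for the "strongly related" question is the first-stage F statistic for the excluded instrument. This is a hypothetical sketch on simulated data; it is not necessarily one of the tests the study used, and the coefficient values are invented. A widely cited rule of thumb treats a first-stage F below about 10 as a warning sign of a weak instrument.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares from regressing y on the columns of X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return resid @ resid


def first_stage_f(t, x, z):
    """F statistic for excluding the single instrument z from the first stage."""
    n = len(t)
    ones = np.ones(n)
    restricted = np.column_stack([ones, x])        # first stage without the instrument
    unrestricted = np.column_stack([ones, x, z])   # first stage with the instrument
    rss_r, rss_u = rss(restricted, t), rss(unrestricted, t)
    q = 1                                          # number of excluded instruments
    k = unrestricted.shape[1]                      # parameters in the unrestricted model
    return ((rss_r - rss_u) / q) / (rss_u / (n - k))


rng = np.random.default_rng(5)
n = 5_000
x = rng.normal(size=n)
z = rng.normal(size=n)
v = rng.normal(size=n)

# Two hypothetical treatments: one strongly and one barely moved by the instrument.
t_strong = (0.5 * x + 0.8 * z + v > 0).astype(float)
t_weak = (0.5 * x + 0.02 * z + v > 0).astype(float)

f_strong = first_stage_f(t_strong, x, z)
f_weak = first_stage_f(t_weak, x, z)
print(f"first-stage F, strong instrument: {f_strong:.1f}")
print(f"first-stage F, weak instrument:   {f_weak:.1f}")
```

This version assumes independent, homoskedastic errors; with clustering by district a cluster-robust version of the statistic would be the more appropriate choice.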
Instrument Validity (cont’d) • Monotonicity: IV has a unidirectional effect on the endogenous regressor. That is, increases in unemployment rate never result in decreases in math course completion • There may be students who choose not to take higher level math courses (e.g., Alg II) when unemployment rates are high because they feel the need to spend more time searching for work (so called “defiers”). • Data analysis suggests this set of students is likely small fraction of sample. Presence of such “defiers” simply places an upward bound on our estimate of the treatment effect (for an explanation see Angrist & Pischke, 2009).
Attendance Margin Results • Naïve results: pr(2-yr attendance) is 2% lower if Alg II completed, whereas IV results indicate Alg II increases chance of attending 2-yr by 28%. • Pr(4-yr college attendance) similar for naïve/IV models (+20.6% and +20.3%) but naïve results are statistically significant & IV results are not • Naïve model badly underestimates beneficial effect of Alg II on pr(not going to college): -18% (naïve) vs. -48% (IV) • For similar students, completing Alg II dramatically increases college attendance in general and is related to type of PSE attended
Effects on Graduation • Naïve: Alg II increases pr(AA) & pr(bach) degree attainment by 2 & 6 percent, respectively. However, for students who take Alg II & then attend college, the pr(not receiving PSE degree) is lowered by about 8 percent. • IV results tell different story: Positive relations between Alg II & 2- and 4-year degree attainment found in naïve model not significant when student self-selection controlled. • In addition, there is no statistically significant effect on pr(not receiving PSE degree) when using IV methods. • Naïve results indicate beneficial effects of completing Alg II on degree attainment, but these effects are not apparent when we correct for self-selection.
Discussion • In FL data, naïve models often under-estimate effect of Alg II (attendance margin), but over-estimate effect on the degree margin. Similar story in NELS, but not ELS. • Correcting for endogeneity is essential in identifying effects of advanced math curriculum in HS. • IV approach seems to produce results that are LESS BIASED than naïve OLS approach • Will need to wait until next wave of ELS to examine some outcomes we studied using NELS that were not available yet in the ELS data. • Has there been a structural change in effect of Alg II over time? (e.g., due to policy changes)
Discussion (cont’d) • Evidence that HS course completion does not affect college & career outcomes in the same way. • Some evidence that Alg II may equip students with the skills required to excel later in college careers. • Seems safest to view this pattern as evidence of differences in Alg II effects between NELS & ELS, rather than proof that the benefits of advanced mathematics course-taking grow larger as time passes. • Evidence that initial wages are less for those with Alg II or greater, but that they experience more rapid growth in earnings. More work needs to be done on career margins.