IES Workshop on Evaluating State and District Level Interventions. Mark W. Lipsey Director, Center for Evaluation Research and Methodology Vanderbilt University David Holdzkom Assistant Superintendent for Evaluation and Research Wake County Public School System, North Carolina.
IES Workshop on Evaluating State and District Level Interventions Mark W. Lipsey Director, Center for Evaluation Research and Methodology Vanderbilt University David Holdzkom Assistant Superintendent for Evaluation and Research Wake County Public School System, North Carolina April 24, 2008 Washington, DC
Purpose To help schools, districts, and states design and implement rigorous evaluations of the effects of promising practices, programs, and policies on educational outcomes.
Why encourage locally initiated impact evaluation? • Many interventions are not effective; users and interested others need to know. • The interventions most relevant to improving outcomes are those that schools and districts believe are promising and feasible. • IES has funding to support research initiated by schools, districts, and states.
What kinds of interventions might be evaluated? • Practices, e.g., one-on-one tutoring, educational software, acceleration of high ability students, cooperative learning. • Programs, e.g., Reading Recovery, Ladders to Literacy, Cognitive Tutor algebra, Saxon Math, Caring School Community (character education). • Policies, e.g., reduced class size, pre-K, alternative high schools, all year calendar.
Key Issues in Designing Impact Evaluations for Education Interventions
Logic model: 1. Specifying the problem the intervention addresses Nature of the need: • What and for whom (e.g., kindergarten students who aren’t ready for school). • Why (e.g., poor pre-literacy skills, inappropriate school behavior). • Rationale/evidence supporting the intervention target (e.g., at entry K students need to be ready to learn or they will begin to fall behind; research shows school readiness can be enhanced for at-risk 4 year olds).
Logic model: 2. Specifying the planned intervention What the intervention does that addresses the need: • Content: What the students should know or be able to do; why this meets the need. • Pedagogy: Instructional techniques and methods to be used; why appropriate. • Delivery system: How the intervention will arrange to deliver the instruction. • The key factors or core ingredients most essential and distinctive to the intervention.
Logic model: 3. Specifying the theory of change [Diagram: 4 year old at-risk children (target population) → Pre-K with literacy curriculum (intervention) → positive attitudes to school, improved pre-literacy skills, learning appropriate school behavior → increased school readiness → greater learning gains in K, with outcomes running from proximal to distal.]
Mapping variables onto the intervention theory: Sample characteristics [Diagram: the theory of change, from 4 year old at-risk children through Pre-K with literacy curriculum to the proximal and distal outcomes.] • Sample descriptors: basic demographics; diagnostic, need/eligibility identification; baseline performance. • Potential moderators: setting, context; personal and family characteristics; prior experience.
Mapping variables onto the intervention theory: Intervention characteristics [Diagram: the theory of change, from 4 year old at-risk children through Pre-K with literacy curriculum to the proximal and distal outcomes.] • Independent variable: T vs. C comparison conditions. • Generic fidelity: T and C exposure to the generic aspects of the intervention (type, amount, quality). • Specific fidelity: T and C (?) exposure to distinctive aspects of the intervention (type, amount, quality). • Potential moderators: characteristics of personnel; intervention setting and context, e.g., class size.
Mapping variables onto the intervention theory: Intervention outcomes [Diagram: the theory of change, from 4 year old at-risk children exposed to the intervention through to the proximal and distal outcomes.] • Focal dependent variables: pretests (pre-intervention); posttests (at end of intervention); follow-ups (lagged after end of intervention). • Other dependent variables: side effects (possible unplanned positive or negative outcomes); mediators (DVs on causal pathways from intervention to other DVs).
Research questions: Relationships of (possible) interest • Intervention effects: Causal relationship between intervention and outcomes. • Duration of effects post-intervention. • Moderator relationships: Differential intervention effects for different subgroups. • Mediator relationships: Stepwise causal relationship with effects on a proximal outcome causing effects on a distal outcome.
Research designs to discuss • Two strong ones 1. Randomized experiment 2. Regression-discontinuity • Two weak ones 3. Nonrandomized comparison groups with statistical controls 4. Comparative interrupted time series
1. Randomized experiment [Diagram: a research sample of students, teachers, classrooms, schools, etc. is blocked on the pretest (high, medium-high, medium-low, low) and then randomly assigned to conditions. One group receives the experimental intervention and the other does not; the intervention effect is the difference between the two groups' posttest outcomes.]
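Pretest blocking followed by random assignment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `blocked_random_assignment`, the block size of 4, the student IDs, and the scores are all hypothetical and do not come from the workshop materials.

```python
import random

def blocked_random_assignment(pretest_scores, block_size=4, seed=0):
    """Assign units to 'T' or 'C' at random within pretest blocks.

    pretest_scores: dict mapping unit id -> pretest score (hypothetical data).
    Returns a dict mapping unit id -> 'T' or 'C'.
    """
    rng = random.Random(seed)
    # Rank units from lowest to highest pretest score.
    ranked = sorted(pretest_scores, key=pretest_scores.get)
    assignment = {}
    for start in range(0, len(ranked), block_size):
        block = ranked[start:start + block_size]
        # Half of each block (rounded down) goes to T, the rest to C.
        n_t = len(block) // 2
        labels = ["T"] * n_t + ["C"] * (len(block) - n_t)
        rng.shuffle(labels)  # the random step: which units in the block get T
        for unit, label in zip(block, labels):
            assignment[unit] = label
    return assignment

# Hypothetical pretest scores for eight students.
scores = {f"student_{i}": s
          for i, s in enumerate([12, 35, 48, 51, 60, 64, 77, 90])}
assignment = blocked_random_assignment(scores, block_size=4, seed=42)
for unit in sorted(scores, key=scores.get):
    print(unit, scores[unit], assignment[unit])
```

Because assignment is randomized separately within each block, the T and C groups are balanced on the pretest by construction rather than only on average.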
Circumstances conducive to randomized experiments • More demand than supply for the program: allocate the scarce resource by lottery. • New program that can be phased in: the wait-list control group has a delayed start. • Pull-out or add-on program for selected students: randomly select from among those eligible. • Volunteers willing to opt in for a chance to receive the program.
Example: Junior high algebra curriculum • In 2000-01 the Moore Oklahoma Independent School District conducted a study of the effectiveness of the Cognitive Tutor Algebra I program on students in their junior high school system. • Students in 5 junior high schools were randomly assigned to either the Cognitive Tutor Algebra I course or the ‘practice as usual’ algebra courses. Cognitive Tutor teachers received the curriculum materials and 4 days of training. • Outcome measures included the ETS Algebra I end-of-course exam, course grades, and a survey of student attitudes towards mathematics.
Example: Alternative high school for students at risk of dropping out • Horizon High School in Las Vegas identified 9th and 10th grade students behind grade level and at risk of dropping out. • A random sample of these students was assigned to attend an alternative high school that featured a focus on cooperative learning, small group instruction, and support services. • Outcomes were compared for the alternative and regular high schools on dropout rates, self-esteem, drug use, and arrest rates.
Example: Remedial reading programs for elementary students • The Allegheny Intermediate Unit (AIU), which serves 42 suburban school districts in Allegheny County, Pennsylvania, randomly assigned 50 schools to one of four commercially available remedial reading interventions. • Within each school struggling readers in grades 3 and 5 were identified and randomly assigned to instruction as usual or the remedial reading program designated for that school. • In each program, 3 students met with a trained teacher one hour/day for 20 weeks. • Measures of reading skill were administered at the beginning and end of the school year for program and control students.
2. Regression-discontinuity (aka the cutting-point design) • When well-executed, its ability to provide an unbiased estimate of the intervention effect is strong, comparable to a randomized experiment. • It is adaptable to many circumstances where it may be difficult to apply a randomized design.
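Consider first a posttest-on-pretest regression for a randomized experiment with no effect [Figure: posttest (Y) plotted against pretest (S); the T and C groups share the same mean S and the same mean Y. The slide pairs the plot with the corresponding regression equation, coding T as 1 = treatment, 0 = control.]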
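A common way to write a posttest-on-pretest regression of this kind is shown below. It is a sketch that assumes a linear relationship between pretest and posttest; the coefficient symbols and the error term are notation introduced here, not taken from the slide.

```latex
Y_i = \beta_0 + \beta_1 S_i + \beta_2 T_i + e_i, \qquad T_i \in \{0, 1\}
```

Under this form, "no effect" corresponds to \beta_2 = 0, and an intervention effect of size Δ corresponds to \beta_2 = \Delta.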
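Pretest-posttest randomized experiment, now with an intervention effect [Figure: posttest (Y) against pretest (S); T and C share the same mean S, but their mean Y values differ by Δ.]
Consider now the same regression with no effect but with a cutting point applied [Figure: posttest (Y) against selection variable (S); cases on one side of the cutting point form the C group and cases on the other side form the T group, so the C and T mean Y values differ even though there is no effect.]
Regression discontinuity scatterplot (null case) [Figure: scatterplot of posttest (Y) against selection variable (S), with T and C cases on opposite sides of the cutting point.]
Now add an intervention effect [Figure: posttest (Y) against selection variable (S); the T and C regression lines are offset by Δ at the cutting point.]
Regression discontinuity scatterplot with effect [Figure: scatterplot of posttest (Y) against selection variable (S), with T and C cases on opposite sides of the cutting point.]
The effect estimated by R-D is the same as that from the randomized experiment [Figure: posttest (Y) against selection variable (S); Δ is the gap between the T and C regression lines at the cutting point.]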
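The estimate pictured in these slides can be illustrated with a small simulation. The sketch below makes several assumptions that are not in the workshop materials: a linear relationship, a cutting point of 50 on a 0 to 100 score, treatment for those scoring below the cut, a built-in effect of 5 points, and a hand-rolled `ols` helper so that the example needs only the standard library.

```python
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

rng = random.Random(1)
CUT = 50.0          # assumed cutting point on the selection variable
TRUE_EFFECT = 5.0   # assumed effect built into the simulated data

rows, posttest = [], []
for _ in range(1000):
    s = rng.uniform(0, 100)        # selection score S
    t = 1.0 if s < CUT else 0.0    # strict assignment: below the cut gets T
    y = 20 + 0.6 * s + TRUE_EFFECT * t + rng.gauss(0, 2)
    rows.append([1.0, s - CUT, t])  # intercept, S centered at the cut, T dummy
    posttest.append(y)

b0, b1, b2 = ols(rows, posttest)
print(f"slope on S: {b1:.3f}")
print(f"estimated effect at the cutting point: {b2:.2f}")
```

Centering S at the cutting point makes the coefficient on T the estimated discontinuity at the cut. If the true relationship is curved or the slopes differ on either side of the cut, this linear model would be misspecified and extra terms would be needed.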
The selection variable for R-D • A continuous quantitative variable measured on every candidate for assignment to T or C who will participate in the study. • Assignment to T or C strictly on the basis of the score obtained and the predetermined cutting point. • Does not have to correlate highly with the outcome variable. • Can be tailored to represent an appropriate basis for the assignment decision in the setting.
Special issues with the R-D design • Correctly fitting the functional form, which may not be linear: curvilinear functions; interaction with the cutting point. • Statistical power: requires about 3 times the sample size of a comparable randomized experiment; covariates correlated with the outcome but not the selection variable are helpful.
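The "about 3 times" figure can be motivated by a variance-inflation argument. The sketch below assumes a linear model with only the selection score S and the treatment indicator T as predictors, a normally distributed S, and a cutting point at its mean. The symbols \rho_{TS} (the correlation between T and S), n_{RD}, and n_{RE} are notation introduced here.

```latex
\frac{n_{RD}}{n_{RE}} \approx \frac{1}{1 - \rho_{TS}^{2}}, \qquad
\rho_{TS}^{2} = \frac{2}{\pi} \approx 0.637
\;\Longrightarrow\;
\frac{n_{RD}}{n_{RE}} \approx \frac{1}{1 - 0.637} \approx 2.75
```

In a randomized experiment T is uncorrelated with S on average, so the factor is about 1. Other score distributions, or cutting points away from the mean, change the factor.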
Circumstances conducive to the regression-discontinuity design • The situation involves a selection from some larger group of who will, or should, receive the intervention and who will not. • The basis for selection is or can be made explicit and systematic enough to be captured in a quantitative rating or ranking. • The allocation of the intervention can be made strictly on the basis of the selection score and cutting point in a high proportion of cases. Exceptions can be identified in advance and exempted from the scheme.
Example: Effects of universal pre-k in Tulsa, Oklahoma • Eligibility for pre-k determined strictly on the basis of age, with the cutoff set by birthday. • Overall sample of 1,567 children just beginning pre-k plus 1,461 children just beginning kindergarten who had been in pre-k the previous year. • Woodcock-Johnson (WJ) Letter-Word, Spelling, and Applied Problems tests as outcome variables.
Entry into pre-k selected by birthday [Figure: WJ test score plotted against age, with a September 1 birthday cutoff. C: born after September 1, no pre-K yet, tested at the beginning of the pre-K year. T: born before September 1, completed pre-K, tested at the beginning of K. The gap between the groups at the cutoff is marked "?".]
Samples and testing • First cohort: pre-k in Year 1, kindergarten in Year 2. • Second cohort: pre-k in Year 2. • WJ tests administered to both cohorts.
3. Nonrandomized comparison groups with statistical controls • Statistical controls: Analysis of covariance and multiple regression • Matching on the control variables • Propensity scores derived from the control variables.
Nonequivalent comparison analog to the randomized experiment [Diagram: a population of students, teachers, classrooms, schools, etc. is split through some nonrandom, more-or-less natural process into a group that receives the experimental intervention and a group that does not; the difference between their outcomes is taken as the intervention effect (??).]
Issues for obtaining good intervention effect estimates from nonrandomized comparison groups • The fundamental problem: selection bias. • Knowing and measuring the variables necessary and sufficient to statistically control for the selection bias, i.e., characteristics related to the outcome on which the groups differ. • Using an analysis model that properly adjusts for the selection bias, given appropriate control variables.
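Nonequivalent comparison groups: Pretest/covariate and posttest means [Figure: posttest (Y) against pretest/covariate(s) (X); the T and C groups differ in their pretest/covariate means as well as in their posttest means.]
Nonequivalent comparison groups: Covariate-adjusted treatment effect estimate [Figure: posttest (Y) against pretest/covariate(s) (X); Δ is the difference between the T and C groups after adjusting for the pretest/covariate.]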
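The difference between the raw posttest gap and the covariate-adjusted estimate can be shown with a small worked example. This is a sketch of a single-covariate analysis of covariance using the pooled within-group slope. The function name and the data are hypothetical, and the data are noise-free, built as posttest = 10 + 2 × pretest + 3 × T.

```python
def mean(values):
    return sum(values) / len(values)

def ancova_adjusted_effect(x_t, y_t, x_c, y_c):
    """Return (raw posttest difference, covariate-adjusted difference).

    The adjustment uses the pooled within-group regression slope of
    posttest (y) on pretest/covariate (x), as in a one-covariate ANCOVA.
    """
    sxy = sxx = 0.0
    for xs, ys in ((x_t, y_t), (x_c, y_c)):
        mx, my = mean(xs), mean(ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    raw_diff = mean(y_t) - mean(y_c)   # difference in posttest means
    pre_diff = mean(x_t) - mean(x_c)   # difference in pretest means
    return raw_diff, raw_diff - slope * pre_diff

# Hypothetical data: the T group starts out 2 points higher on the pretest.
x_t, y_t = [4, 5, 6, 7], [21, 23, 25, 27]
x_c, y_c = [2, 3, 4, 5], [14, 16, 18, 20]

raw, adjusted = ancova_adjusted_effect(x_t, y_t, x_c, y_c)
print(raw, adjusted)  # prints: 7.0 3.0
```

Here the raw gap of 7 points overstates the built-in effect of 3 because the T group started ahead. The adjustment recovers 3 only because the one covariate in the model fully accounts for the pre-existing difference, which cannot be assumed in real nonrandomized data.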
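Covariate-adjusted treatment effect estimate with a relevant covariate left out [Figure: posttest (Y) against pretest/covariate(s) (X); Δ is the T vs. C difference estimated when a relevant covariate is omitted from the adjustment.]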
Using control variables via matching • Groupwise matching: select control comparison to be groupwise similar to intervention group, e.g., schools with similar demographics, geography, etc. Generally a good idea. • Individual matching: select individuals from the potential control pool that match intervention individuals on one or more observed characteristics. May not be a good idea.
Potential problems with individual level matching • Basic problem with nonequivalent designs: need to match on all relevant variables to obtain a good estimate of the intervention effect. • Matching on too few variables may omit some that are important to control. • Matching on too many variables restricts the sample to the cases that can be matched, which may be overly narrow. • If matches must be selected disproportionately from one tail of the treatment distribution and the other tail of the control distribution, the result may contain a regression to the mean artifact.
Regression to the mean: Matching on the pretest [Figure: T and C pretest distributions, with the area where matches can be found marked.]
Propensity scores as control variables • The propensity score is the probability of being in the intervention group instead of the comparison group. • It is estimated (“predicted”) from data on the characteristics of the individuals already in each group, typically using logistic regression. • It thus combines all the control variables into a single variable optimized to differentiate the intervention sample from the control sample.
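A propensity score of the kind described above can be sketched with a one-covariate logistic regression. The example is illustrative only: it uses a single hypothetical covariate (a standardized pretest) where a real analysis would combine many control variables, and it fits the model with a simple hand-written gradient ascent (`fit_logistic`, a hypothetical helper) rather than a statistics package.

```python
import math

def fit_logistic(xs, ts, lr=0.1, steps=5000):
    """Fit P(T=1 | x) = 1 / (1 + exp(-(a + b*x))) by gradient ascent
    on the log-likelihood. Returns the intercept a and slope b."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, t in zip(xs, ts):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += t - p
            grad_b += (t - p) * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

# Hypothetical standardized pretest scores and group membership (1 = T, 0 = C).
pretest = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
in_treatment = [0, 0, 0, 1, 0, 1, 0, 1, 1]

a, b = fit_logistic(pretest, in_treatment)
# Propensity score: estimated probability of being in the intervention group.
propensity = [1.0 / (1.0 + math.exp(-(a + b * x))) for x in pretest]

for x, t, p in zip(pretest, in_treatment, propensity):
    print(f"pretest={x:+.1f}  T={t}  propensity={p:.2f}")
```

The resulting scores could then be used to form matched strata or entered as a covariate in the outcome analysis.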
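One option: Use the propensity score to create matched groups [Figure: treatment group and control group cases matched within propensity score quintiles.]
Another option: Use the propensity score as a covariate in ANCOVA or MR [Figure: posttest (Y) against propensity score (P); Δ is the difference between the T and C groups after adjusting for the propensity score.]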
Circumstances appropriate for the nonequivalent comparison design • A stronger design is truly not feasible. • A sample of relatively comparable units not receiving the intervention is available. • A full account can be given of the differences between the groups potentially related to the outcomes of interest. • Data on those differences can be obtained and used for statistical control.
Example: Effects of a professional development program for teachers • In the Montgomery County Public Schools, MD, some 3rd grade teachers had received the Studying Skillful Teaching training, some had not. • The reading and math achievement test scores for students of teachers with and without training were compared. • Analysis of covariance was used to test for differences in student outcomes with a propensity score control variable and covariates representing teacher credentials, student pretest, reduced/free lunch status, ethnicity, and special ed or ELL service.