
Challenges arising from the analysis of randomized trials in education


Presentation Transcript


  1. Challenges arising from the analysis of randomized trials in education Professor Steve Higgins s.e.higgins@durham.ac.uk School of Education Durham University Randomised Controlled Trials in the Social Sciences 7 - 9 September 2016, University of York

  2. Overview • The work of the Education Endowment Foundation • The EEF Archive Analysis project (with Adetayo Kasim and ZhiMin Xiao) • Trial findings and the Toolkit • Some implications for research, policy and practice

  3. The work of the Education Endowment Foundation • Established in 2011 with £125M endowment (aim £200M total) • Independent grant-making charity dedicated to breaking the link between family income and educational achievement • Identify, test and scale successful approaches and interventions • Meta-analytic evidence database to identify promise (from the ‘Teaching and Learning Toolkit’) – “What Works Centre for improving education outcomes for school-aged children” • Projects commissioned from a range of organisations • Independently evaluated by a team from a ‘panel’ of 25 • Results fed back into Toolkit (with other similar project findings and new meta-analyses)

  4. Projects and campaigns • Projects • Pilot • Efficacy • Effectiveness • Campaigns • Making Best Use of Teaching Assistants (£5M) • North East Literacy Campaign (with Northern Rock Foundation - £10M)

  5. EEF since 2011 • £75.4m funding awarded to date • 64% of school leaders say they have used the Toolkit • 7,500 schools participating in projects • 750,000 pupils currently involved in EEF projects • 60 independent evaluation reports published • 106 RCTs • 127 projects funded to date

  6. The EEF Archive Analysis project • Independent evaluators submit project data to FFT (given appropriate permissions) • Data re-matched with National Pupil Database (NPD) data and updated as new data becomes available • Released to archive team at Durham twice a year • Aims to • undertake methodological exploration of trials data • provide comparative analyses of impact across projects • analyse long term/ follow-up impact as NPD data becomes available • develop an R package for educational trials analysis

  7. Principles and caveats • Principles • Exploratory analysis to inform EEF’s evaluation strategy • Help to explain variation in trial impact • Independent evaluator’s peer-reviewed published estimate is always the official EEF finding for impact on attainment • Caveats • Can’t always match N (4/32 projects greater than 10% difference) • NPD data removed and re-matched • Some raw scores transformed • Gain score distribution different from post-test

  8. R package - eefAnalytics • To support analysis of randomised trials in education (individual, cluster and multi-site) • difference-in-means • ordinary least squares • multi-level models (frequentist & Bayesian) • Permutation p-values, bootstrapped confidence intervals • Complier Average Causal Effect (CACE) • Cumulative quantile analysis • eefAnalytics: in development – available on CRAN in November – available to try out – contact us!
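
The slides do not give the package's function names, so the following is a minimal sketch of the model families listed above, fitted with base R and lme4 rather than with eefAnalytics itself; the toy data and the column names (posttest, pretest, treat, school) are illustrative assumptions.

```r
## Sketch only -- not the eefAnalytics API. Toy clustered-trial data with
## assumed column names: posttest, pretest, treat, school.
library(lme4)

set.seed(1)
dat <- data.frame(
  school  = factor(rep(1:20, each = 30)),
  treat   = rep(rbinom(20, 1, 0.5), each = 30),
  pretest = rnorm(600)
)
dat$posttest <- 0.2 * dat$treat + 0.5 * dat$pretest +
  rnorm(20, sd = 0.3)[as.integer(dat$school)] + rnorm(600)

## 1. difference in means
dim_est <- with(dat, mean(posttest[treat == 1]) - mean(posttest[treat == 0]))

## 2. ordinary least squares with pre-test as covariate
ols_fit <- lm(posttest ~ treat + pretest, data = dat)

## 3. multilevel model (random intercept for school), frequentist
mlm_fit <- lmer(posttest ~ treat + pretest + (1 | school), data = dat)

## permutation p-value for the OLS treatment effect, re-randomising
## treatment at the school level to respect the cluster design
perm_stat    <- function(d) coef(lm(posttest ~ treat + pretest, data = d))["treat"]
obs          <- perm_stat(dat)
school_treat <- tapply(dat$treat, dat$school, unique)   # one value per school
perms <- replicate(999, {
  d <- dat
  d$treat <- sample(school_treat)[as.integer(d$school)]
  perm_stat(d)
})
p_perm <- mean(abs(perms) >= abs(obs))

## bootstrapped 95% CI for the OLS estimate (simple case resampling shown
## for brevity; a cluster bootstrap would resample whole schools)
boots   <- replicate(999, perm_stat(dat[sample(nrow(dat), replace = TRUE), ]))
ci_boot <- quantile(boots, c(0.025, 0.975))
```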

  9. Initial findings • 17 EEF projects • Four analytic models • Findings • Results converge in larger, unproblematic trials • Point estimates and estimates of precision vary when trials are problematic (e.g. testing issues, randomisation), when the design and analysis are not matched, when different covariates are added, and when outcomes are different or transformed (e.g. z-scores) • Results tend to diverge if ICC ≥ 0.2 and there are few clusters/schools • MLM with total variance is the ‘most conservative’ model (wider CIs) • Bayesian estimates identical but more precise (narrower CIs)
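
For the ICC threshold and the ‘total variance’ standardisation mentioned above, here is a sketch using assumed definitions (not necessarily the archive team's exact ones), continuing from the toy data `dat` and the fit `mlm_fit` in the previous sketch.

```r
## Assumed definitions, continuing from `dat` and `mlm_fit` above.
vc <- as.data.frame(VarCorr(mlm_fit))        # variance components
var_school <- vc$vcov[vc$grp == "school"]    # between-school variance
var_resid  <- vc$vcov[vc$grp == "Residual"]  # within-school (residual) variance

icc <- var_school / (var_school + var_resid) # divergence more likely when >= 0.2

## effect size standardised on the total (between + within) variance --
## the 'most conservative' choice, giving the widest confidence interval
beta_treat <- fixef(mlm_fit)["treat"]
se_treat   <- summary(mlm_fit)$coefficients["treat", "Std. Error"]
sd_total   <- sqrt(var_school + var_resid)

es_total <- beta_treat / sd_total
ci_total <- (beta_treat + c(-1.96, 1.96) * se_treat) / sd_total
```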

  10. Convergence [figure: pre-test imbalance]

  11. Clustering [figure: point estimates similar, CIs vary]

  12. Divergence [figure] • Pre-test imbalance • Post-test ANCOVA models aim to correct for this • The evaluator's gain-score estimate is higher
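
One way to see why gain-score and post-test ANCOVA estimates can diverge under pre-test imbalance (Lord's paradox, revisited in the post/gain paper mentioned on slide 14) is a toy simulation; all numbers below are invented for illustration.

```r
## Toy illustration of Lord's paradox: gain-score vs post-test ANCOVA
## estimates diverge when groups differ at pre-test. Numbers are invented.
set.seed(2)
n        <- 400
treat    <- rbinom(n, 1, 0.5)
pretest  <- rnorm(n, mean = 0.3 * treat)           # pre-test imbalance
posttest <- 0.6 * pretest + 0.2 * treat + rnorm(n)

gain_est   <- coef(lm(I(posttest - pretest) ~ treat))["treat"]  # gain-score model
ancova_est <- coef(lm(posttest ~ treat + pretest))["treat"]     # post-test ANCOVA

c(gain = unname(gain_est), ancova = unname(ancova_est))  # the two estimates differ
```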

  13. Analytic heterogeneity [figure] • Stem-and-leaf plot of differences between evaluator and archive MLM (total variance) models • 32 projects, 64 outcomes • Majority of differences 0.05 or less

  14. Archive Analysis development • R package release - November 2016 • Post/gain paper (revisiting Lord’s paradox) • Local Influence Index (a binomial index for evaluating intervention benefit) • CACE (to help interpret ITT analysis) • Follow-up data in NPD • Reflections • With an ITT approach, sample sizes predicated on the minimum necessary to detect a probably overestimated effect size, and MLM total variance models, are we setting ourselves up for disappointment?
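
As a reminder of what CACE adds to an ITT analysis, here is a sketch of the standard Bloom/instrumental-variable estimator under one-sided non-compliance; the variable names and the 70% take-up rate are illustrative assumptions, not figures from an EEF trial.

```r
## Sketch of a Complier Average Causal Effect (CACE) alongside ITT:
## Bloom's IV estimator, assuming one-sided non-compliance. Toy data.
set.seed(3)
n        <- 600
assigned <- rbinom(n, 1, 0.5)                             # randomised offer
complied <- ifelse(assigned == 1, rbinom(n, 1, 0.7), 0)   # ~70% take-up, none in control
outcome  <- 0.3 * complied + rnorm(n)

itt        <- mean(outcome[assigned == 1]) - mean(outcome[assigned == 0])
compliance <- mean(complied[assigned == 1]) - mean(complied[assigned == 0])
cace       <- itt / compliance    # ITT rescaled by the compliance rate

## equivalently via two-stage least squares:
## library(AER); coef(ivreg(outcome ~ complied | assigned))["complied"]
```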

  15. Sutton Trust/EEF Teaching & Learning Toolkit • Best ‘buys’ on average from research • Key messages for Pupil Premium spending in schools • Currently used by over 60% of school leaders http://educationendowmentfoundation.org.uk/toolkit

  16. Toolkit overview and aims • Cost effectiveness estimates of a range of educational approaches • Based on average effects from meta-analyses and cost estimates of additional outlay to put in place • Evidence robustness estimates as ‘padlocks’ • To inform professional decision-making about school spending • To create a framework for evidence-use • To provide a structure to improve evidence utility

  17. Aggregating inferences

  18. Inferences from findings across meta-analyses • Requires the assumption that variation, bias and inaccuracy are randomly distributed across the included studies • Probably unwarranted, but the best we’ve got • A starting point to improve precision and predictability • Better than saying nothing?
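
To make the aggregation step concrete, here is a generic random-effects (DerSimonian–Laird) pooling of several meta-analytic effect sizes; the numbers are invented and this is not the Toolkit's published methodology.

```r
## Generic random-effects (DerSimonian-Laird) pooling across meta-analyses.
## d and se are invented strand-level effect sizes and standard errors.
d  <- c(0.40, 0.25, 0.10, 0.55, 0.30)
se <- c(0.08, 0.10, 0.12, 0.09, 0.11)

w_fixed <- 1 / se^2
q       <- sum(w_fixed * (d - weighted.mean(d, w_fixed))^2)   # Cochran's Q
tau2    <- max(0, (q - (length(d) - 1)) /
                  (sum(w_fixed) - sum(w_fixed^2) / sum(w_fixed)))

w_re    <- 1 / (se^2 + tau2)            # random-effects weights
d_pool  <- sum(w_re * d) / sum(w_re)    # pooled average effect
se_pool <- sqrt(1 / sum(w_re))
ci_pool <- d_pool + c(-1.96, 1.96) * se_pool
```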

  19. Toolkit as a predictor?

  20. So, “what works” or “what’s worked”? • Internal validity is necessary for external validity – did it actually work there? • Causal ‘black box’ makes replicability challenging • Defining ‘approaches’ or ‘interventions’ – the unit of description • Problematic ‘populations’ – what inference for whom? • Importance of knowing what hasn’t worked (on average) • In education a null result (i.e. the estimate may not differ from 0) means the intervention was only as good as the counterfactual • Mean or range – “on average” or better estimates of probability? • Generalisability or predictability? • Small-scale research findings may optimise rather than typify ‘in the wild’ application or scale-up

  21. A distributional view • Distribution of effects in Kluger & DeNisi (1996), from Dylan Wiliam (https://twitter.com/dylanwiliam/status/608610040086409216) • Visualised distribution of Toolkit effects [figures]

  22. Current Toolkit developments • Formalising methodology (translating/simplifying existing models) • Cochrane/ Campbell/ EPPI • PRISMA for reviews • CONSORT for trials • GRADE Guidelines for evidence • New comparable and updatable meta-analyses for each strand • Identifying factors affecting current effect size estimates • Design (sample size, randomisation, clustering) • Measurement issues (outcome complexity, outcome alignment) • Intervention (duration, intensity) • International partnerships • Australia – Australian version of Toolkit, 3 RCTs commissioned • Chile – under development

  23. Implications for research, policy and practice • Research • Further discussion about optimal analysis approaches • Statistical Analysis Plans • May explain some of the heterogeneity in meta-analyses (limits to precision) • Importance of replication and meta-analysis • More methodological exploration! • Policy and Practice • Need to communicate uncertainty estimates carefully as often dependent on analytic approach • “What Works?” or “What’s Worked”? • Communicate range as well as mean?

  24. References • Higgins, S. (2016) Meta-synthesis and comparative meta-analysis of education research findings: some risks and benefits. Review of Education 4.1: 31–53. http://dx.doi.org/10.1002/rev3.3067 • Higgins, S. & Katsipataki, M. (2016) Communicating comparative findings from meta-analysis in educational research: some examples and suggestions. International Journal of Research & Method in Education 39.3: 237–254. http://dx.doi.org/10.1080/1743727X.2016.1166486 • Kasim, A., Xiao, Z. & Higgins, S. (in preparation) eefAnalytics: A Package for Trial Data Analysis. The R Journal • Xiao, Z., Kasim, A. & Higgins, S.E. (2016) Same Difference? Understanding Variation in the Estimation of Effect Sizes from Educational Trials. International Journal of Educational Research 77: 1–14. http://dx.doi.org/10.1016/j.ijer.2016.02.001

  25. View of Durham City from the train station
