
The EPPI-Centre is part of the Social Science Research Unit at the Institute of Education, University of London






Presentation Transcript


  1. Does it work or not and should we be doing it here? What conclusions can we make from systematic review evidence and how should we make them? Mark Newman Randomised Controlled Trials in the Social Sciences: Methods and Synthesis 1 October 2008 University of York

  2. Evidence for Policy and Practice Information and Co-ordinating Centre (EPPI-Centre) • Conducting reviews since 1993, in health promotion, education, social care, crime, transport, and work and pensions • Support and tools for review groups: education (25 groups, 70+ reviews), criminology, employment, speech and language, social care • EPPI-Reviewer software • Methodological work, e.g. the Methods for Research Synthesis node of the ESRC National Centre for Research Methods • Formal links with the Cochrane and Campbell Collaborations • On-line libraries of research evidence • Short courses and a Masters course in evidence for public policy and practice

  3. What do decision makers want from effectiveness reviews? • Decision makers want to know 'does it work?' and 'do the results mean that we should do it or not?' • In order to produce guidance, regulation, and recommendations • To make investment decisions, set priorities, and 'do something'

  4. Systematic reviews: potential and difficulties • It is argued that high-quality systematic reviews can provide a 'better', i.e. more valid and more rigorous, answer to questions of effectiveness. HOWEVER: • In a social policy context, defining 'it' can be difficult • Primary research evaluations (and thus reviews) compare the effect of 'it' to something else in a particular context • Effectiveness reviews contain a variable number of studies, of varying quality, from different contexts • The 'result' = summary estimate(s) of effect size, with contextual information

  5. From effect size to 'did it work?' and 'should we do it?' HOW?

  6. Existing guidance • None that I could find in the Campbell Collaboration guidance • The Cochrane Handbook discusses at length the issues in drawing implications and conclusions, and what should be considered, but quotes Friedman: • 'A leap of faith is always required when applying any study findings to the population at large or to a specific person. In making that jump, one must always strike a balance between making justifiable broad generalizations and being too conservative in one's conclusions.' (Friedman 1985) • Review authors should not assume that the circumstances in which findings might be applied are similar to their own • It specifically does not say how much evidence, of what quality, might be enough to say 'what works' or 'is effective'

  7. Campbell review conclusions • Goerlich, Ritter, Turner, Smedslund, Lum: insufficient good-quality evidence (RCTs) to make policy recommendations • Nye: 'uncovered compelling support for the use of a parent involvement programme as a viable supplementary intervention to improve children's academic performance in school' (based on a positive pooled effect size?) • Wilson: the overall effect size of 0.26 indicates that selected and indicated social information processing programmes are effective for reducing aggressive and disruptive behaviour

  8. Does how we get our conclusions matter? The Multidimensional Treatment Foster Care (MTFC) example • Does MTFC reduce re-offending? • Aos et al: included MTFC as part of a cost-effectiveness review of interventions for offenders • Macdonald & Turner: Campbell review of MTFC • Newman et al: included MTFC as part of a review of interventions for juvenile offenders • The high-quality (RCT) evidence base is all from Oregon MTFC: multiple publications, but 2 RCTs (1 male sample, 1 female sample), with about 76 subjects in the intervention in total, that measure offending outcomes

  9. 'Does MTFC work / should we do it?' conclusions in the reviews • Aos: MTFC included in the 'what works' category; could be implemented in Washington State with x$ cost benefits • MacDonald & Turner (based on 5 studies and a range of outcomes): 'the evidence suggests that MTFC is a promising social intervention' (but only 2 studies had an offending outcome, so does this still apply?) • Newman et al: MTFC in the category 'consistent evidence of reducing re-offending'; effect size applied to UK offending rates • Newman et al is the only review where an explicit framework was used

  10. Possible policy interpretations • Aos et al: MTFC works and will save money if we do it here • Macdonald & Turner: MTFC may be worth trying • Newman et al: MTFC works, reducing the re-offending rate by about 2% if the results apply to the UK

  11. Systems for making conclusions within social science discipline/ policy areas • Criminal Justice • Maryland Scientific Methods Scale (MSMS) • Adapted MSMS • Education • WWC / BEE • Health • GRADE

  12. MSMS Scale (UK Home Office)
Study quality levels:
1. Single group post-test only / correlation
2. Single group pre-post test
3. Non-equivalent control group, pre-post test
4. Non-equivalent control group, pre-post test, with post hoc controls
5. RCT
Interpretation framework:
• What works / does not work: at least two level 3-5 evaluations with statistically significant effects + a preponderance of other evidence positive
• What is promising: at least one level 3-5 evaluation with statistically significant effects + a preponderance of other evidence positive
• What is unknown

  13. Systematic approaches: MSMS Scale (a vote counting scale?)
Study quality levels:
1. Single group post-test only / correlation
2. Single group pre-post test
3. Non-equivalent control group, pre-post test
4. Non-equivalent control group, pre-post test, with post hoc controls
5. RCT
Interpretation framework:
• What works / does not work: at least two level 3-5 evaluations with statistically significant effects + a preponderance of other evidence positive
• What is promising: at least one level 3-5 evaluation with statistically significant effects + a preponderance of other evidence positive
• What is unknown

  14. Example application of the MSMS scale: teen courts • 1 RCT (SMS 5); the others are single group pre-post tests (SMS 3)

  15. MSMS scale interpretation: teen courts • 1 RCT (SMS 5); the others are single group pre-post tests (SMS 3) • SMS 'what works': at least two level 3-5 evaluations with statistically significant effects + a preponderance of other evidence positive

  16. MSMS-based conclusion: 'teen courts' work • 1 RCT (SMS 5); the others are single group pre-post tests (SMS 3) • SMS 'what works': at least two level 3-5 evaluations with statistically significant effects + a preponderance of other evidence positive • SMS conclusion: 'teen courts' work

  17. BUT I² = 83% • 1 RCT (SMS 5); the others are single group pre-post tests (SMS 3) • Other systematic reviews say there is no evidence to support teen courts
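The I² figure quoted on this slide can be reproduced from study-level data. A minimal sketch of how I² is derived from a fixed-effect meta-analysis, using made-up illustrative effect sizes and standard errors rather than the actual teen-court studies (the function name is my own):

```python
def i_squared(effects, std_errs):
    """Inverse-variance pooled estimate, Cochran's Q, and the I^2 statistic."""
    weights = [1 / se**2 for se in std_errs]          # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled)**2 for w, e in zip(weights, effects))
    df = len(effects) - 1                             # degrees of freedom
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i2

# Four hypothetical studies whose results point in different directions:
pooled, q, i2 = i_squared([0.10, 0.45, -0.05, 0.60], [0.08, 0.10, 0.09, 0.12])
```

An I² above roughly 75% is conventionally read as high heterogeneity, which is the slide's point: a single pooled 'what works' verdict is hard to defend when the studies disagree this much.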

  18. Adapted MSMS (by us)
• Consistent evidence of reducing re-offending (1+ studies): scores 4-5 on the SMS scale AND medium or high on the Weight of Evidence (WoE), AND the effect size (single or pooled summary) shows a positive effect [favouring the intervention] whose lower 95% confidence interval does not cross the 'line of no effect'
• Potential effects (positive or negative), limited evidence (1+ studies, not a multi-centre RCT): scores 4-5 on the SMS scale and medium or high on the WoE, AND, if there is more than one study, the direction of effect is inconsistent AND/OR the effect size(s) (pooled summary and/or individual) do not exclude 'no difference'
• Insufficient evidence: no studies that are level 4/5 on the SMS scale with medium or high WoE
• Notes: if there is only one study it should be a multi-centre randomised controlled experiment; sufficient homogeneity is required
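The adapted MSMS rule above is explicit enough to be written down as a decision procedure. A sketch under my own naming conventions (the field names, function name, and the `multicentre` flag are illustrative, not part of the published framework):

```python
def adapted_msms_category(studies):
    """Classify an evidence base per the adapted MSMS rule on this slide.
    Each study is a dict with 'sms' (1-5), 'woe' ('low'/'medium'/'high'),
    'effect' (positive favours the intervention), 'ci_lower' (lower bound
    of the 95% CI), and 'multicentre' (bool, for the single-study note)."""
    eligible = [s for s in studies
                if s['sms'] >= 4 and s['woe'] in ('medium', 'high')]
    if not eligible:
        return 'Insufficient evidence'
    consistent = all(s['effect'] > 0 for s in eligible)
    excludes_no_effect = all(s['ci_lower'] > 0 for s in eligible)
    # Single-study note: one study only counts if it is a multi-centre RCT.
    single_ok = len(eligible) > 1 or eligible[0].get('multicentre', False)
    if consistent and excludes_no_effect and single_ok:
        return 'Consistent evidence of reducing re-offending'
    return 'Potential effects, limited evidence'
```

Making the rule executable like this exposes exactly which judgments (WoE grading, homogeneity) remain outside the formula and still need a reviewer.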

  19. What Works Clearinghouse: extent of evidence categorization
• Studies meet WWC criteria
• The extent of evidence is moderate/large if: the domain includes more than one study; AND the domain includes more than one school; AND the domain findings are based on a total sample size of at least 350 students OR, assuming 25 students in a class, a total of at least 14 classrooms across studies
• The extent of evidence is small if: the domain includes only one study; OR the domain includes only one school; OR the domain findings are based on a total sample size of less than 350 students AND, assuming 25 students in a class, a total of less than 14 classrooms across studies
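Unlike the MSMS interpretation framework, the WWC categorization above is a purely mechanical threshold test. A sketch of the rule as stated on the slide (function and parameter names are mine, not WWC tooling):

```python
def wwc_extent_of_evidence(n_studies, n_schools, n_students, n_classrooms):
    """'moderate/large' requires more than one study AND more than one
    school AND (>= 350 students OR >= 14 classrooms, at an assumed 25
    students per class); anything else is 'small'."""
    if (n_studies > 1 and n_schools > 1
            and (n_students >= 350 or n_classrooms >= 14)):
        return 'moderate/large'
    return 'small'
```

Note that the rule says nothing about effect size or direction: it grades only the quantity of evidence, which is why the talk pairs it with separate effectiveness judgments.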

  20. Best Evidence Encyclopaedia
• Strong evidence of effectiveness: at least two studies, one of which is a large randomized or randomized quasi-experimental study, or multiple smaller studies, with a median effect size of at least +0.20. A large study is defined as one in which at least ten classes or schools, or 250 students, were assigned to treatments; smaller studies are counted as equivalent to a large study if their collective sample sizes are at least 250 students
• Moderate evidence of effectiveness: at least two qualifying studies, or multiple smaller studies with a collective sample size of 500 students, with a median effect size of at least +0.20
• Limited evidence of effectiveness: at least one qualifying study of any design with an effect size of at least +0.10
• Insufficient evidence of effectiveness: one or more qualifying studies of any design with a median effect size of less than +0.10
• No qualifying studies

  21. GRADE system (1) • GRADE is about making 'guidelines', not reporting findings from systematic reviews • Determinants of the strength of a recommendation (8x2 matrix): balance between desirable and undesirable effects; quality of evidence (4 levels); values and preferences; costs (resource allocation)

  22. Systematic approach: GRADE system (2), what a recommendation means
• The implications of a strong recommendation: for patients, most people in your situation would want the recommended course of action and only a small proportion would not; request discussion if the intervention is not offered. For clinicians, most patients should receive the recommended course of action. For policy makers, the recommendation can be adopted as a policy in most situations
• The implications of a weak recommendation: for patients, most people in your situation would want the recommended course of action, but many would not. For clinicians, you should recognise that different choices will be appropriate for different patients and that you must help each patient to arrive at a management decision consistent with her or his values and preferences. For policy makers, policy making will require substantial debate and involvement of many stakeholders

  23. Still not a consistent answer to 'how many?' • How many studies, of what quality, are needed to make a 'strong recommendation' or to claim 'strong evidence of effect'? • 'One'? 'Two'? 'Many'? • How many participants/sites: 350? 500? • Should we do more research on systematic reviews to find an empirical answer?

  24. Asking too much of systematic reviews? • 'A leap of faith is always required when applying any study findings to the population at large or to a specific person. In making that jump, one must always strike a balance between making justifiable broad generalizations and being too conservative in one's conclusions.' (Friedman 1985)

  25. Weiss – evaluation as ‘enlightenment’ • What we have come to realize is that policy does not take shape at a single time and place with a limited cast of characters. In democracies, many people have a hand in defining the issues, identifying the perspective from which they should be addressed, proffering potential policy solutions and pressing for particular policy responses. . . . It turns out that what evaluation can do, and sometimes really does do, is to contribute to what I have called ‘enlightenment’. Enlightenment is the percolation of new information, ideas and perspectives into the arenas in which decisions are made. (Weiss, 1999: 471)

  26. [Diagram] An evidence-based decision draws together: evidence from research, professional experience, user preferences, and available resources

  27. The role of the 'expert' systematic reviewer • 'Experts' are not experts at policy, but expert at 'discovering and making known the facts on which policy depends' • 'The essential need in other words is the improvement of the methods and conditions of debate, discussion and persuasion – that is the problem of the public. This improvement … depends essentially upon freeing and perfecting the process of inquiry and dissemination of their conclusions' Dewey J (1927: 204)

  28. So what to do? • Commissioners: be careful what you ask for, you might get it; the 'warrant' may, however, not be clear; recognize the distinction between producing a systematic review and producing 'guidelines' • Reviewers: be clear, transparent and systematic about the process used to make conclusions; use language and labels carefully; make 'recommendations' with 'experts', 'policymakers', 'practitioners' and 'users' as a distinct part of the review process

  29. References
• Dewey J (1927) The Public and its Problems. New York: H. Holt (reprinted Swallow Press / Ohio University Press)
• Guyatt G, Oxman A, Gunn E, Kunz R, Falck-Ytter Y, Alonso-Coello P, Schunemann H (2008) GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 336: 924-926
• Guyatt G, Oxman A, Gunn E, Kunz R, Falck-Ytter Y, Alonso-Coello P, Schunemann H (2008) Going from evidence to recommendations. BMJ 336: 1049-1051
• Newman M, Vigurs C, Perry AE, Hallam G, Schertler EPV, Johnson M, Wall R (2008, forthcoming) A Systematic Review of Selected Interventions to Reduce Juvenile Re-offending. London: Ministry of Justice
• Aos S, Miller M, Drake E (2006) Evidence-Based Public Policy Options to Reduce Future Prison Construction, Criminal Justice Costs, and Crime Rates. Olympia: Washington State Institute for Public Policy
• Farrington D, Gottfredson D, Sherman L, Welsh B (2002) The Maryland Scientific Methods Scale. In: Farrington D, MacKenzie D, Sherman L, Welsh B (eds) Evidence-Based Crime Prevention. London: Routledge, pp 13-21
• What Works Clearinghouse extent of evidence classification. http://ies.ed.gov/ncee/wwc/pdf/extent_evidence.pdf
• Slavin RE, Lake C, Groff C (2008) Effective Programs in Middle and High School Mathematics: A Best-Evidence Synthesis. http://www.bestevidence.org/
• Weiss CH (1999) The interface between evaluation and public policy. Evaluation 5(4): 468-486
