Presentation Transcript


  1. How Researchers Generate and Interpret Evidence. Jeffrey A. Butts, John Jay College of Criminal Justice, City University of New York. August 7, 2012.

  2. THERE ARE MANY TYPES OF EVIDENCE [Table: stages of program development, the question to be asked at each stage, and the corresponding evaluation function; the focus of evidence-based practices and policy is highlighted.] Source: Rossi, P., M. Lipsey and H. Freeman (2004). Evaluation: A Systematic Approach (7th Edition), p. 40. Sage Publications; adapted from Pancer & Westhues (1989).

  3. THE ESSENTIAL QUESTION IN ALL EVALUATION RESEARCH IS… COMPARED TO WHAT?

  4. CLASSIC EXPERIMENTAL, RANDOM-ASSIGNMENT DESIGN [Diagram: client referrals enter the study; eligibility for randomization is determined by the evaluators or using guidelines from the evaluators; a random assignment process divides cases into a treatment group, which begins services, and a control group, which receives no services or different services; data collection occurs for both groups at time points 1 through 6; outcomes are then compared to analyze differences, effect sizes, etc.] Issues: Eligibility? How to randomize? No-services group? Equivalent data collection?
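
The sketch below (not part of the original slides) illustrates the random-assignment logic in the diagram above: hypothetical referrals are shuffled into treatment and control groups and a simple standardized mean difference is computed. All identifiers, group sizes, and outcome values are invented for illustration.

```python
import random
import statistics

def randomize(referrals):
    """Randomly assign eligible referrals to a treatment and a control group."""
    cases = list(referrals)      # copy so the caller's list is untouched
    random.shuffle(cases)
    half = len(cases) // 2
    return cases[:half], cases[half:]

# Hypothetical referrals identified only by case id.
referral_ids = list(range(100))
treatment, control = randomize(referral_ids)

# Invented follow-up outcome scores; treatment cases get a slightly higher mean
# purely so the example has something to detect.
treated = set(treatment)
outcome = {i: random.gauss(0.5 if i in treated else 0.0, 1.0) for i in referral_ids}

t_scores = [outcome[i] for i in treatment]
c_scores = [outcome[i] for i in control]

# Standardized mean difference, using the SD of all scores as a rough effect size.
sd_all = statistics.pstdev(t_scores + c_scores)
diff = statistics.mean(t_scores) - statistics.mean(c_scores)
print(f"Mean difference: {diff:.2f}")
print(f"Approximate effect size: {diff / sd_all:.2f}")
```

Because assignment is random, any systematic difference in outcomes can, in expectation, be attributed to the services rather than to pre-existing differences between the groups.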

  5. QUASI-EXPERIMENTAL DESIGN – MATCHED COMPARISON GROUPS [Diagram: client referrals form the treatment group; a matching process draws a comparison group from a pool of potential comparison cases, matched according to sex, race, age, prior services, scope of problems, etc.; data collection occurs for both groups at time points 1 through 6; outcomes are then compared to analyze differences, effect sizes, etc.] Issues: Comparison cases? Control services to comparison cases? Matched on what? Equivalent data collection?
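
As a companion to the diagram above, here is a minimal sketch (not from the slides) of exact matching on a few of the covariates the slide lists; the records, ids, and covariate values are all hypothetical.

```python
# Hypothetical records: each case is a dict of matching covariates.
treatment_group = [
    {"id": 1, "sex": "F", "age_band": "14-15", "prior_services": True},
    {"id": 2, "sex": "M", "age_band": "16-17", "prior_services": False},
]
comparison_pool = [
    {"id": 101, "sex": "F", "age_band": "14-15", "prior_services": True},
    {"id": 102, "sex": "M", "age_band": "16-17", "prior_services": True},
    {"id": 103, "sex": "M", "age_band": "16-17", "prior_services": False},
]

MATCH_KEYS = ("sex", "age_band", "prior_services")

def find_match(case, pool, used):
    """Return the first unused pool case with identical covariates, or None."""
    for candidate in pool:
        if candidate["id"] not in used and all(candidate[k] == case[k] for k in MATCH_KEYS):
            return candidate
    return None

used_ids = set()
matched_pairs = []
for case in treatment_group:
    match = find_match(case, comparison_pool, used_ids)
    if match:
        used_ids.add(match["id"])
        matched_pairs.append((case["id"], match["id"]))

print(matched_pairs)   # [(1, 101), (2, 103)]
```

Real matched-comparison designs typically match on many more characteristics and must still worry about unmeasured differences between the groups, which is why the slide flags "Matched on what?" as an issue.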

  6. SOME NON-TRADITIONAL DESIGNS CAN BE PERSUASIVE

  7. QUASI-EXPERIMENTAL DESIGN – STAGGERED START [Diagram: client referrals are divided into Groups 1, 2, and 3; each group receives the intervention (X) at a different data collection point, with data collected at points 1 through 6, so each group's outcomes can be compared before and after its own start and against groups that have not yet started.]
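
A small illustrative sketch (not from the slides) of the staggered-start layout: an invented schedule recording which groups have begun the intervention at each data collection point, so that groups not yet started serve as comparisons for groups already receiving services.

```python
# Invented schedule: the data collection point at which each group starts the intervention.
start_point = {"Group 1": 2, "Group 2": 3, "Group 3": 4}

for t in range(1, 7):                       # data collection points 1 through 6
    active = [g for g, start in start_point.items() if t >= start]
    waiting = [g for g in start_point if g not in active]
    print(f"Point {t}: receiving intervention {active}; not yet started {waiting}")
```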

  8. MANY FACTORS ARE INVOLVED IN CHOOSING THE BEST DESIGN • Each design has to be adapted to the unique context of its setting • Experimental designs are always preferred, but rarely feasible • The critical task of evaluation design is to choose the most rigorous yet realistic design possible • Key stakeholders should be involved in early deliberations over evaluation design, both to solicit their views and to gain their support for the eventual design • Design criticisms should be anticipated and dealt with early

  9. TWO TYPES OF THREATS TO VALIDITY External: Something about the way the study is conducted makes it impossible to generalize the findings beyond this particular study. Can findings of effectiveness be transferred to other settings and other circumstances? Internal: The study failed to establish credible evidence that the intervention (e.g., services, policy change) affected the outcomes in a demonstrable and causal way. Can we really say that A caused B? From "Quasi-Experimental Evaluation." Evaluation and Data Development, Strategic Policy, Human Resources Development Canada. January 1998, page 5 (SP-AH053E-01-98, see www.hrsdc.gc.ca).

  10. THREATS TO INTERNAL VALIDITY (the intervention made a real difference in this study) Threats generated by evaluators Testing: Effects of taking a pretest on subsequent post-tests. People might do better on the second test simply because they have already taken it. Also, taking a pretest may sensitize participants to a program. Participants may perform better simply because they know they are being tested (the "Hawthorne effect"). Instrumentation: Changes in the observers, scorers, or the measuring instrument used from one time to the next. From "Quasi-Experimental Evaluation." Evaluation and Data Development, Strategic Policy, Human Resources Development Canada. January 1998, page 5 (SP-AH053E-01-98, see www.hrsdc.gc.ca).

  11. THREATS TO INTERNAL VALIDITY (the intervention made a real difference in this study) Changes in the environment or in participants History: Changes in the environment that occur at the same time as the program and alter the behavior of participants (e.g., a recession might make a good program look bad). Maturation: Changes within individuals participating in the program resulting from natural biological or psychological development. From "Quasi-Experimental Evaluation." Evaluation and Data Development, Strategic Policy, Human Resources Development Canada. January 1998, page 5 (SP-AH053E-01-98, see www.hrsdc.gc.ca).

  12. THREATS TO INTERNAL VALIDITY (the intervention made a real difference in this study) Participants not representative of population Selection: Assignment to participant or non-participant groups yields groups with different characteristics. Pre-program differences may be confused with program effects. Attrition: Participants drop out of the program. Drop-outs may be different from those who stay. Statistical Regression: The tendency for those scoring extremely high or low on a selection measure to be less extreme on the next test. For example, if only those who scored worst on a reading test are included in the literacy program, they are likely to do better on the next test regardless of the program, simply because the odds of doing as poorly again are low. From "Quasi-Experimental Evaluation." Evaluation and Data Development, Strategic Policy, Human Resources Development Canada. January 1998, page 5 (SP-AH053E-01-98, see www.hrsdc.gc.ca).
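
To make the statistical regression threat concrete, the following simulation (not from the slides; every number is invented) selects the worst scorers on a noisy pretest and shows that their scores rise on a second test even though nothing was done between the two tests.

```python
import random

random.seed(0)

# Each person has a stable "true" ability; each test adds independent noise.
true_ability = [random.gauss(100, 10) for _ in range(10_000)]
test1 = [a + random.gauss(0, 10) for a in true_ability]
test2 = [a + random.gauss(0, 10) for a in true_ability]   # no program in between

# Select the worst 10% on test 1, as a literacy program might.
cutoff = sorted(test1)[len(test1) // 10]
selected = [i for i, score in enumerate(test1) if score <= cutoff]

mean1 = sum(test1[i] for i in selected) / len(selected)
mean2 = sum(test2[i] for i in selected) / len(selected)
print(f"Selected group, test 1 mean: {mean1:.1f}")
print(f"Selected group, test 2 mean: {mean2:.1f}  (higher, with no intervention at all)")
```

The apparent improvement is entirely regression to the mean; an evaluation that selects participants on extreme scores must account for it before crediting the program.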

  13. INTERPRETING EFFECTS • Two important concepts: • Statistical Significance — How confident can we be that differences in outcome are really there and not just due to dumb luck? • Effect Size — How meaningful are the differences in outcome? Differences can be statistically significant, but trivial in terms of their application and benefit in the real world.
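
A rough sketch of both concepts using invented recidivism figures: a two-proportion z-test for statistical significance, and the raw difference in rates plus Cohen's h for effect size. The counts and group sizes below are assumptions made up for the example.

```python
import math

# Invented results: recidivism counts out of n cases in each group.
treat_recid, treat_n = 80, 400     # 20.0% recidivism in the treatment group
ctrl_recid, ctrl_n = 110, 400      # 27.5% recidivism in the control group

p1, p2 = treat_recid / treat_n, ctrl_recid / ctrl_n
pooled = (treat_recid + ctrl_recid) / (treat_n + ctrl_n)
se = math.sqrt(pooled * (1 - pooled) * (1 / treat_n + 1 / ctrl_n))
z = (p1 - p2) / se

# Two-sided p-value from the normal distribution: is the difference likely real?
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Effect size: absolute difference in recidivism, and Cohen's h for proportions.
diff = p1 - p2
cohens_h = 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

print(f"Recidivism: treatment {p1:.1%}, control {p2:.1%} (difference {diff:+.1%})")
print(f"z = {z:.2f}, p = {p_value:.3f}, Cohen's h = {cohens_h:.2f}")
```

With several hundred cases per group, even a few percentage points of difference can be statistically significant; whether that difference is large enough to matter in practice is the separate effect-size question the slide raises.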

  14. - 20% -10% 0% 10% 20% Percent Change in Recidivism

  15. - 20% -10% 0% 10% 20% Percent Change in Recidivism

  16. - 20% -10% 0% 10% 20% Percent Change in Recidivism

  17. MUCH OF OUR REASONING COMES FROM KNOWLEDGE OF DISTRIBUTIONS

  18. ANOTHER WAY TO THINK ABOUT IT… • Evaluators must assess not only outcomes, but whether changes in outcomes are attributable to the program or policy • Outcome level is the status of an outcome at some point in time (e.g., the amount of smoking among teenagers) • Outcome change is the difference between outcome levels at different points in time or between groups • Program effect is the portion of a change in outcome that can be attributed uniquely to a program as opposed to the influence of other factors Rossi, P., M. Lipsey and H. Freeman (2004). Evaluation: A Systematic Approach (7th Edition), p. 208. Sage Publications.
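
A tiny worked example of the distinction, with invented numbers: suppose teen smoking is 20 percent at baseline and 14 percent at follow-up in the program community, while a similar comparison community drops by 2 percentage points over the same period due to broader trends.

```python
# Invented figures for illustration only.
baseline_level = 0.20     # outcome level before the program
followup_level = 0.14     # outcome level after the program
comparison_change = -0.02 # change observed where there was no program (secular trend)

outcome_change = followup_level - baseline_level       # -0.06: a six-point drop
program_effect = outcome_change - comparison_change    # -0.04 attributable to the program

print(f"Outcome change: {outcome_change:+.0%}, program effect: {program_effect:+.0%}")
```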

  19. CONTACT INFORMATION Jeffrey A. Butts, Ph.D. Director, Research & Evaluation Center, John Jay College of Criminal Justice, City University of New York. http://about.me/jbutts jbutts@jjay.cuny.edu
