
What To Do About the Multiple Comparisons Problem? Peter Z. Schochet

Presentation Transcript


  1. What To Do About the Multiple Comparisons Problem? Peter Z. Schochet, February 2008

  2. Overview of Presentation
     • Background
     • Suggested testing guidelines

  3. Background

  4. Overview of the Problem
     • Multiple hypothesis tests are often conducted in impact studies
       • Outcomes
       • Subgroups
       • Treatment groups
     • Standard testing methods could yield:
       • Spurious significant impacts
       • Incorrect policy conclusions

  5. Assume a Classical Hypothesis Testing Framework
     • True impacts are fixed for the study population
     • Test H_0j: Impact_j = 0
     • Reject H_0j if the p-value of the t-test ≤ .05
     • Chance of finding a spurious impact is 5 percent for each test alone

  6. But Suppose No True Impacts and the Tests Are Considered Together
     Number of Tests (a)   Probability ≥ 1 t-test Is Statistically Significant
      1                    .05
      5                    .23
     10                    .40
     20                    .64
     50                    .92
     (a) Assumes independent tests
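
The figures on this slide follow from the independence assumption in its footnote: if every null hypothesis is true and each test has a 5 percent false-positive rate, the chance that at least one of n tests is significant is 1 - (1 - .05)^n. A minimal sketch of that arithmetic (the function name is illustrative, not from the presentation):

```python
def prob_any_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """P(at least one of n independent tests rejects) when every null is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


for n in (1, 5, 10, 20, 50):
    # Prints the same values as the slide's table, rounded to two decimals.
    print(f"{n:>3} tests: {prob_any_false_positive(n):.2f}")
```

With correlated tests the probability would generally be lower than these values, so the table is best read as the independent-tests case only.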

  7. Impact Findings Can Be Misrepresented
     • Publishing bias
     • A focus on “stars”

  8. Adjustment Procedures Lower α Levels for Individual Tests
     • Control the “combined” error rate
     • Many available methods:
       • Bonferroni: Compare p-values to (.05 / # of tests)
       • Fisher’s LSD, Holm (1979), Šidák (1967), Scheffé (1959), Hochberg (1988), Rom (1990), Tukey (1953)
       • Resampling methods (Westfall and Young 1993)
       • Benjamini-Hochberg (1995)
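
To make three of the listed procedures concrete, here is a minimal standard-library sketch of Bonferroni, Holm, and Benjamini-Hochberg decisions. The function names and the example p-values are hypothetical and are not taken from the presentation. Note that Bonferroni and Holm control the family-wise error rate, whereas Benjamini-Hochberg controls the false discovery rate.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0 for p-values at or below alpha / (number of tests)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def holm(pvals, alpha=0.05):
    """Holm step-down: compare sorted p-values to alpha/m, alpha/(m-1), ..."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection
    return reject


def benjamini_hochberg(pvals, q=0.05):
    """BH step-up: reject the k smallest p-values, k = max rank with p <= q*rank/m."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            cutoff = rank
    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject


# Hypothetical p-values for five outcomes.
pvals = [0.001, 0.011, 0.020, 0.030, 0.200]
print("Bonferroni:", bonferroni(pvals))
print("Holm:      ", holm(pvals))
print("BH:        ", benjamini_hochberg(pvals))
```

On these illustrative inputs Bonferroni rejects one hypothesis, Holm two, and Benjamini-Hochberg four, which shows how the choice of procedure trades protection against spurious findings for power.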

  9. These Methods Reduce Statistical Power: The Chances of Finding Real Effects
     Simulated Statistical Power (a)
     Number of Tests   Unadjusted   Bonferroni
      5                .80          .59
     10                .80          .50
     20                .80          .41
     50                .80          .31
     (a) Assumes 1,000 treatments and 1,000 controls, 20 percent of all null hypotheses are true, and independent tests
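
The loss of power can be approximated analytically. The sketch below is not the simulation behind the slide; it assumes a two-sided z-test, takes the standardized effect implied by 80 percent unadjusted power at the .05 level, and ignores the negligible chance of rejecting in the wrong tail. Under those assumptions the results land close to the Bonferroni column above.

```python
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def bonferroni_power(n_tests, unadjusted_power=0.80, alpha=0.05):
    """Approximate power of one test after a Bonferroni correction.

    Assumption: normal approximation to a two-sided test, upper tail only.
    """
    # Standardized effect that gives `unadjusted_power` at the unadjusted level.
    delta = _Z.inv_cdf(1 - alpha / 2) + _Z.inv_cdf(unadjusted_power)
    # Critical value once alpha is divided by the number of tests.
    critical = _Z.inv_cdf(1 - alpha / (2 * n_tests))
    return _Z.cdf(delta - critical)


for n in (5, 10, 20, 50):
    print(f"{n:>3} tests: approximate power {bonferroni_power(n):.2f}")
```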

  10. Big Debate on Whether To Use Adjustment Procedures
      • What is the proper balance between Type I and Type II errors?

  11. To Adjust or Not To Adjust?

  12. February, July, December 2007 Advisory Panel Meetings Held at IES
      Participants: Steve Bell, Abt; Howard Bloom, MDRC; John Burghardt, MPR; Mark Dynarski, MPR; Andrew Gelman, Columbia; David Judkins, Westat; Jeff Kling, Brookings; David Myers, AIR; Larry Orr, Abt; Peter Schochet, MPR
      Chairs: Phoebe Cottingham, IES; Rob Hollister, Swarthmore; Rebecca Maynard, U. of PA

  13. Basic Principles for a Testing Strategy

  14. The Multiplicity Problem Should Not Be Ignored
      • Erroneous conclusions can result otherwise
      • But need a strategy that balances Type I and II errors

  15. Limiting the Number of Outcomes and Subgroups Can Help
      • But not always possible or desirable
      • Need flexible strategy for confirmatory and exploratory analyses

  16. Problem Should Be Addressed by First Structuring the Data
      • Structure will depend on the research questions
      • Adjustments should not be conducted blindly across all contrasts

  17. Suggested Testing Guidelines

  18. The Plan Must Be Specified Up Front
      • Rigor requires that the strategy be documented prior to data analysis

  19. Delineate Separate Outcome Domains
      • Based on a conceptual framework that relates the intervention to the outcomes
      • Represent key clusters of constructs
      • Domain “items” are likely to measure the same underlying trait
        • Test scores
        • Teacher practices
        • School attendance

  20. Testing Strategy: Both Confirmatory and Exploratory Components
      • Confirmatory component
        • Addresses central study hypotheses
        • Must adjust for multiple comparisons
        • Must be specified in advance
      • Exploratory component
        • Identify impacts or relationships for future study
        • Findings should be regarded as preliminary

  21. Confirmatory Analysis Has Two Potential Parts
      • Domain-specific analysis
      • Between-domain analysis

  22. Domain-Specific Analysis

  23. Test Impacts for Outcomes as a Group
      • Create a composite domain outcome
        • Weighted average of standardized outcomes
          • Simple average
          • Index
          • Latent factor
      • Conduct a t-test on the composite
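
One way to picture the composite approach is the simple-average case. The sketch below rests on several assumptions that the slide does not specify: outcomes are standardized with the control group's mean and standard deviation, the composite is an unweighted average of the standardized outcomes, the test statistic is Welch's t, and the p-value uses a large-sample normal approximation. The data and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def composite(treat, control):
    """Average each person's outcomes after standardizing by control mean/SD.

    `treat` and `control` are lists of rows, one row of outcome values per person.
    """
    n_outcomes = len(control[0])
    mus = [mean(row[k] for row in control) for k in range(n_outcomes)]
    sds = [stdev(row[k] for row in control) for k in range(n_outcomes)]

    def score(row):
        return mean((row[k] - mus[k]) / sds[k] for k in range(n_outcomes))

    return [score(r) for r in treat], [score(r) for r in control]


def welch_t(x, y):
    """Two-sample t-statistic that allows unequal variances."""
    se = sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    return (mean(x) - mean(y)) / se


def approx_two_sided_p(t):
    """Normal approximation to the p-value; reasonable only for large samples."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Hypothetical data: two outcomes in one domain, four people per group.
control = [[50, 3.0], [55, 3.5], [60, 4.0], [65, 4.5]]
treat = [[58, 3.6], [63, 4.2], [68, 4.4], [71, 5.0]]

t_comp, c_comp = composite(treat, control)
t_stat = welch_t(t_comp, c_comp)
print("t-statistic on composite:", t_stat)
print("approximate p-value:", approx_two_sided_p(t_stat))
```

Because only one test is run per domain, no multiplicity correction is needed within the domain; with samples this small an exact t distribution would be preferable to the normal approximation used here.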

  24. What About Tests for Individual Domain Outcomes?
      • If impact on composite is significant
        • Test impacts for individual domain outcomes without multiplicity corrections
        • Use only for interpretation
      • If impact on composite is not significant
        • Further tests are not warranted
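
The rule on this slide works as a gate: individual outcomes are examined only after the composite passes. A minimal sketch, with a hypothetical function name and hypothetical p-values, using the p ≤ .05 rejection rule from earlier in the deck:

```python
def domain_findings(composite_p, item_pvals, alpha=0.05):
    """Apply the composite-first gate to one outcome domain.

    Item-level results are returned only when the composite is significant,
    and they are unadjusted and meant for interpretation only.
    """
    if composite_p > alpha:
        # Composite not significant: no further tests are warranted.
        return {"composite_significant": False, "items_for_interpretation": {}}
    flagged = {name: p <= alpha for name, p in item_pvals.items()}
    return {"composite_significant": True, "items_for_interpretation": flagged}


# Hypothetical item-level p-values within a test-score domain.
items = {"reading": 0.02, "math": 0.30}
print(domain_findings(0.01, items))
print(domain_findings(0.20, items))
```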

  25. Between-Domain Analysis

  26. Applicable If Studies Require Summative Evidence of Impacts
      • Constructing “unified” composites may not make sense
        • Domains measure different latent traits
      • Test domain composites individually using adjustment procedures

  27. Testing Strategy Will Depend on the Research Questions
      • Are impacts significant in all domains?
        • No adjustments are needed
      • Are impacts significant in any domain?
        • Adjustments are needed
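
The two research questions lead to different decision rules. Requiring significance in every domain can produce a false positive only if all tests reject at once, so each test can keep the unadjusted level; asking whether any domain is significant gives each test its own chance to err, so the level must be tightened. The sketch below uses Bonferroni for the "any" case purely as an illustration, since the slide does not name a specific procedure; the function names and p-values are hypothetical.

```python
def significant_in_all_domains(pvals, alpha=0.05):
    """'All domains' question: every composite must pass the unadjusted level."""
    return all(p <= alpha for p in pvals)


def significant_in_any_domain(pvals, alpha=0.05):
    """'Any domain' question: smallest p-value must pass a Bonferroni-adjusted level."""
    return min(pvals) <= alpha / len(pvals)


# Hypothetical p-values for three domain composites.
print(significant_in_all_domains([0.03, 0.04, 0.20]))   # False: one domain fails
print(significant_in_any_domain([0.03, 0.04, 0.20]))    # False: 0.03 > 0.05 / 3
print(significant_in_any_domain([0.01, 0.04, 0.20]))    # True: 0.01 <= 0.05 / 3
```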

  28. Other Situations That Require Multiplicity Adjustments
      • Designs with multiple treatment groups
        • Apply Tukey-Kramer, Dunnett, or resampling methods to domain composites
      • Subgroup analyses that are part of the confirmatory analysis
        • Conduct F-tests for differences across subgroup impacts

  29. Statistical Power
      • Studies must be designed to have sufficient statistical power for all confirmatory analyses
        • Includes subgroup analyses

  30. Reporting Must Link to the Study Protocols
      • Qualify confirmatory and exploratory analysis findings in reports
      • No one way to present adjusted and unadjusted p-values
      • Confidence intervals may be helpful
      • Emphasize confirmatory analysis results in the executive summary

  31. Testing Approach Summary
      • Pre-specify plan in the study protocols
      • Structure the data
        • Delineate outcome domains
      • Confirmatory analysis
        • Within and between domains
      • Exploratory analysis
      • Qualify findings appropriately
