
Interactions: Types, Tests and Dangers

Interactions: Types, Tests and Dangers. By Amy Wagaman. Motivation. When trying to find the “right” treatment for a patient, researchers want to know if “treatment effects are homogeneous over various subsets of patients defined by prognostic factors.” (Gail and Simon 1985: 361).



  1. Interactions: Types, Tests and Dangers By Amy Wagaman

  2. Motivation • When trying to find the “right” treatment for a patient, researchers want to know if “treatment effects are homogeneous over various subsets of patients defined by prognostic factors.” (Gail and Simon 1985: 361). • So, the logical thing to do is to investigate potential interactions.

  3. Types of Interactions • Qualitative Interaction: the direction of true treatment differences varies among subsets of patients – also called crossover interaction • Quantitative Interaction: variation in the magnitude but NOT direction of treatment effects among patient subgroups – also called a non-crossover interaction
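
The two definitions above can be sketched as a small function. This is only an illustration: the function name and the tolerance are hypothetical, and it applies to true subgroup effects (the deltas), not to noisy estimates, which need a formal test.

```python
def classify_interaction(effects, tol=1e-12):
    """Classify a list of TRUE subgroup treatment effects.

    'qualitative'  : the direction of the effect differs across subgroups
    'quantitative' : same direction everywhere, but the magnitude varies
    'none'         : the effect is identical in every subgroup
    """
    has_benefit = any(d > tol for d in effects)
    has_harm = any(d < -tol for d in effects)
    if has_benefit and has_harm:
        return "qualitative"
    if max(effects) - min(effects) > tol:
        return "quantitative"
    return "none"


# Two subgroups, say men and women (illustrative numbers only).
print(classify_interaction([0.4, -0.2]))  # effect reverses direction
print(classify_interaction([0.4, 0.1]))   # same direction, different size
print(classify_interaction([0.3, 0.3]))   # homogeneous effect
```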

  4. Illustration of Interactions The deltas are true treatment effects/efficacy by subgroup. This example is for 2 subgroups, say men and women. The yellow regions are regions of qualitative interaction.

  5. Why Qualitative Interactions? • A qualitative interaction indicates that a treatment is harmful for one subgroup but beneficial for another. This is very useful information when deciding which treatment to assign to a particular patient. • The difficulty lies in identifying qualitative interactions.

  6. Continued • Qualitative interactions are less likely to exist than quantitative interactions. • Qualitative interactions observed in one trial are seldom reproduced in similar trials. • “We regard observed qualitative interactions with skepticism for they are often shown to be spurious when the same comparison is made in similar trials.” (Yusuf 1991: 94)

  7. Why not Quantitative Interactions? • If a treatment is effective (significant positive treatment effect) for all subgroups, but some benefit perhaps more than others, a clinician will still prescribe that treatment for everyone. • Thus, it is argued that little attention needs to be paid to this type of interaction.

  8. Continued • “Quantitative interactions are to be expected, but may not be important clinically.” (Gail and Simon 1985: 362). • “I am almost certain a priori that a quantitative interaction will exist between a treatment and any categorization of patients which subdivides them into groups with materially different survival expectancy.” (Peto 1995: 1043).

  9. Qual. Versus Quan. • “In summary, quantitative interactions are a priori very plausible, but qualitative interactions are not and, when the overall treatment effects are not overwhelming, trials can be expected to generate a number of apparent qualitative interactions even if no interactions at all exist.” (Peto 1995: 1043).

  10. A Testing Hurdle • “All standard statistical tests for interaction are tests for quantitative interaction and significant results in them do not constitute any kind of evidence for the existence of qualitative interactions, unless in addition there were strong prior scientific reasons for anticipating qualitative interactions.” (Peto 1995: 1043)

  11. A Test for Qual. Interactions • Gail and Simon (1985) developed a likelihood ratio test (LRT) for qualitative interactions. • This procedure is often used as the test for qualitative interactions. • However, it rests on several assumptions: • The subsets/subgroups should be disjoint • The subgroups must be specified in advance • “Unless such a prespecification is made, it is unlikely that sufficient numbers of patients will be available in all subsets for a meaningful assessment of interactions.” (Gail and Simon 1985: 366)
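
A minimal sketch of the test, using the form in which the Gail-Simon statistic is usually stated. It assumes independent, approximately normal effect estimates D_i with known standard errors s_i from disjoint, prespecified subgroups. The function names are hypothetical, and the chi-square tail is computed from closed forms so that only the standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-y) * sum_{j<m} y^j / j!
        m = df // 2
        return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(m))
    # Odd df = 2m + 1: erfc(sqrt(y)) + exp(-y) * sum_{j<m} y^(j+1/2) / Gamma(j+3/2)
    m = (df - 1) // 2
    tail = sum(y ** (j + 0.5) / math.gamma(j + 1.5) for j in range(m))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


def gail_simon(effects, std_errors):
    """Return (Q, p_value) for the test of qualitative interaction.

    Q is the smaller of the summed squared z-scores of the positive
    estimates and of the negative estimates; large Q means both
    directions are well supported by the data.
    """
    z2 = [(d / s) ** 2 for d, s in zip(effects, std_errors)]
    q_pos = sum(v for v, d in zip(z2, effects) if d > 0)
    q_neg = sum(v for v, d in zip(z2, effects) if d < 0)
    q = min(q_pos, q_neg)
    if q == 0:
        # All estimates point the same way: no evidence of crossover.
        return 0.0, 1.0
    k = len(effects)
    # Null tail probability: binomial(k - 1, 1/2) mixture of chi-square tails.
    p = sum(math.comb(k - 1, h) * 0.5 ** (k - 1) * chi2_sf(q, h)
            for h in range(1, k))
    return q, p


# Illustrative numbers only: two subgroups with opposite estimated effects.
q, p = gail_simon([1.645, -1.645], [1.0, 1.0])
print(f"Q = {q:.3f}, p = {p:.3f}")
```

With two subgroups the mixture reduces to half of a one-degree-of-freedom chi-square tail, so the test is significant at the 5% level only when both subgroup z-scores exceed about 1.645 in opposite directions.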

  12. Issues of Statistical Power • Based on work by Cohen, the estimated N required for 80% power to detect a medium-sized interaction, under optimal study conditions, is 128. For a small-sized interaction, the required sample size is 780. • A review was done to examine 55 studies that tested for interactions. • Only 18 of the 55 studies had samples large enough for 80% power to detect a medium-sized interaction, and only 3 of the 55 had samples large enough to detect a small-sized one. (Moyer 2001)
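
A rough check on the order of magnitude of these sample sizes. This sketch is not Cohen's exact calculation: it assumes a single-degree-of-freedom interaction, Cohen's conventional effect sizes f = 0.25 (medium) and f = 0.10 (small), a two-sided 5% test, and a normal approximation. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def approx_total_n(effect_f, power=0.80, alpha=0.05):
    """Approximate total N for a 1-df interaction test.

    Uses the normal approximation N = ((z_{1-alpha/2} + z_{power}) / f)^2,
    where f is Cohen's effect size (an assumption, not taken from the slides).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / effect_f) ** 2)


for label, f in (("medium", 0.25), ("small", 0.10)):
    print(label, approx_total_n(f))
```

The approximation gives totals close to the 128 and 780 quoted above; the small differences are expected because the quoted figures come from Cohen's tables rather than from a normal approximation.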

  13. Another Statistical Issue • It so happens (see later slides) that people often perform MANY tests for interaction in any given study. • This helps fuel the suspicion that in many cases researchers are finding spurious interactions: they are capitalizing on Type I error. • Extreme example: If you ran 567 tests for interaction, you would almost CERTAINLY find at least one significant interaction by chance alone.
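
The arithmetic behind this concern, as a short sketch. It assumes every test is run at the 5% level, that no interaction truly exists, and that the tests are independent (a simplification, since tests within one study are usually correlated). The test counts are the ones reported in the review discussed on the later slides; the function names are hypothetical.

```python
def familywise_error(n_tests, alpha=0.05):
    """P(at least one false positive) across n independent null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of significant results when every null is true."""
    return n_tests * alpha


for n in (1, 7, 16, 61, 567):
    print(f"{n:4d} tests: "
          f"P(>=1 false positive) = {familywise_error(n):.4f}, "
          f"expected false positives = {expected_false_positives(n):.2f}")
```

Under these assumptions, a study running 567 tests would expect roughly 28 spurious "significant" interactions.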

  14. Problems with Subgroups • Very few studies are hypothesis-driven with prespecified subgroups where a potential interaction would make sense. • If you use the data to “help” identify subgroups across which to look for an interaction, you’re getting into somewhat “fishy” territory. Why wouldn’t you expect an interaction across such subgroups?

  15. Subgroup Definitions • Proper subgroup: “a group of patients characterized by a common set of ‘baseline’ parameters” • Improper subgroup: “a group of patients characterized by a variable measured after randomization and potentially affected by treatment” (Yusuf 1991: 93)

  16. Another Subgroup Problem • It can be VERY misleading to look for interactions among improper subgroups. • This is because a treatment effect may have contributed to assignment to a subgroup.

  17. Misinterpretations • Two types of misinterpretations • Misinterpretation of significant interactions • Misinterpretation of non-significant but surprising interactions

  18. Example of Abuse of Test • In an analysis of data from the Beta-Blocker Heart Attack Trial, researchers tested for an interaction (using the Gail-Simon test) between “dominant” and “divergent” centers. There were 31 centers (21 dominant, 10 divergent). • Dominant means mortality rate higher for placebo. • Divergent means mortality rate higher for the treatment – propranolol. • Note that the subgroups were chosen using a study outcome. (Horwitz 1996)

  19. Ensuing Discussion • Senn and Harrell point out the “error” in the Horwitz paper. • “The ‘significant’ result … says absolutely nothing about the trial in question and everything about the practice of defining groups on the basis of extreme values after the results are in.” (Senn and Harrell 1997: 749) • Picking subsets based on observed differences in event rates is a serious violation of statistical assumptions.
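
The point made by Senn and Harrell can be illustrated with a small simulation. This is a sketch with made-up numbers, not a reanalysis of the Beta-Blocker Heart Attack Trial: it assumes 31 centers that all share the same true beneficial effect, unit-variance center estimates, and the two-subgroup form of the Gail-Simon test. All names and parameter values are hypothetical.

```python
import math
import random


def qual_interaction_p(z_a, z_b):
    """Two-subgroup Gail-Simon p-value from the subgroup z-scores."""
    if z_a * z_b >= 0:
        return 1.0  # same direction: no evidence of crossover
    q = min(z_a ** 2, z_b ** 2)
    return 0.5 * math.erfc(math.sqrt(q / 2.0))  # 0.5 * P(chi2_1 > q)


def pooled_z(effects):
    """z-score of the mean of unit-variance center estimates."""
    return sum(effects) / math.sqrt(len(effects))


def simulate(n_sims=2000, n_centers=31, true_effect=0.5, alpha=0.05, seed=1):
    """Rejection rates for outcome-defined vs. prespecified center groups."""
    rng = random.Random(seed)
    post_hoc = prespecified = 0
    for _ in range(n_sims):
        # Every center has the SAME true effect; only noise differs.
        d = [rng.gauss(true_effect, 1.0) for _ in range(n_centers)]
        # Post hoc split: group the centers by the sign of their own result.
        dominant = [x for x in d if x > 0]
        divergent = [x for x in d if x <= 0]
        if dominant and divergent:
            p = qual_interaction_p(pooled_z(dominant), pooled_z(divergent))
            post_hoc += p < alpha
        # Prespecified split: first 21 centers vs. last 10, fixed in advance.
        p = qual_interaction_p(pooled_z(d[:21]), pooled_z(d[21:]))
        prespecified += p < alpha
    return post_hoc / n_sims, prespecified / n_sims


post_hoc_rate, prespecified_rate = simulate()
print(f"outcome-defined groups: {post_hoc_rate:.3f}")
print(f"prespecified groups:    {prespecified_rate:.3f}")
```

Even though no interaction exists in the simulated data, the outcome-defined grouping should reject far more often than the prespecified one, because the groups were built from the very noise the test is then asked to evaluate.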

  20. How Widespread is the Problem? • A review was done to examine 55 studies that tested for interactions. • 30 of those 55 studies found at least one significant interaction. • The mean number of tests performed by those 30 studies was 61 tests (median 16, range 3-567). • The mean number performed by the other 25 studies was 21 tests (median 7, range 1-186).

  21. Widespread Continued • Only TWO out of those 55 studies met the following criteria: • Hypothesis-driven • Sufficient statistical power to detect medium-size interactions • Random assignment of patients to treatments • Conducted 10 or fewer tests for interactions

  22. Term Clarification • The term “risk index” in this context is a misnomer. • Risk indices are used to predict outcomes or to identify susceptible groups for whom new treatments are needed. • My work involves deciding between treatments for patients, not predicting an outcome; in that sense, I am looking for a “tailoring variable”. It’s discrimination versus prediction. Per Danny’s email and discussion with Susan

  23. Implications for Tailoring Variables • Looking for tailoring variables involves looking for subgroups of patients with similar characteristics such that the direction of treatment effect differs across the subgroups, so that you would want to assign one treatment to one group and another to a different group. You could also add a timing issue, i.e. when to switch. • Problem: This is essentially looking for qualitative interactions among unprespecified subgroups.

  24. Consider Quant. Interactions? Assume the top line is a very intensive and costly treatment, while the middle is a less-intensive/cheaper one, with the bottom being a control group. The y-axis is treatment effect, and the x-axis is some baseline variable. Based on a talk with Danny

  25. Bibliography Gail, M. and R. Simon. Testing for Qualitative Interactions between Treatment Effects and Patient Subsets. Biometrics. Vol. 41 No. 2 (June 1985): 361-372.  Green, Sylvan B. Design of Randomized Trials. Epidemiologic Reviews. Vol. 24 No. 1 (2002): 4-11.  Horwitz, et al. Can Treatment that is Helpful on Average be Harmful to Some Patients?… Journal of Clinical Epidemiology. Vol. 49 No. 4 (1996): 395-400.  Moyer, et al. Can Methodological Features Account for Patient-Treatment Matching Findings in the Alcohol Field? Journal of Studies on Alcohol. Vol. 62 No. 1 (Jan. 2001): 62-82.  Peto, R. Clinical Trials. In Treatment of Cancer. Editors: Price, Sikora, and Halnan. Chapman and Hall Medical: New York. (1995): 1039-1043.  Senn, Stephen and Frank Harrell. On Wisdom after the Event. Journal of Clinical Epidemiology. Vol. 50 No. 7 (1997): 749-751.  Vach, et al. Neural Networks and Logistic Regression: Part II. Computational Statistics and Data Analysis. Vol. 21 (1996): 683-701.  Yusuf, et al. Analysis and Interpretation of Treatment Effects in Subgroups of Patients in Randomized Clinical Trials. Journal of the American Medical Association. Vol. 266 No. 1 (1991): 93-98.
