Selecting Evidence for Comparative Effectiveness Reviews: When to use Observational Studies

Dan Jonas, MD, MPH; Meera Viswanathan, PhD; Karen Crotty, PhD, MPH
RTI-UNC Evidence-based Practice Center

Sources
• AHRQ Methods Guide, Chapters 4 and 8, http://www.effectivehealthcare.ahrq.gov/repFiles/2007_10DraftMethodsGuide.pdf
• Norris et al. Observational Studies in Systematic Reviews of Comparative Effectiveness. Draft manuscript.
• Chou R, Aronson N, Atkins D, et al. Assessing harms when comparing medical interventions: AHRQ and the Effective Health Care Program. J Clin Epidemiol. 2008 Sep 25.

Overview
• Why should reviewers consider including observational studies (OS) in comparative effectiveness reviews (CERs)?
• When should OS be included in CERs?
• How do considerations for including OS differ when the goal is assessing benefits versus assessing harms?

Current Perspective
• CERs should consider including observational studies; this should be the default strategy
• Reviewers should explicitly state the rationale for including or excluding OS

Comparative Effectiveness Reviews (CERs)
• Systematic reviews that compare the relative benefits and harms of a range of available treatments or interventions for a given condition

CER Process Overview

Hierarchy of Evidence
[Figure not reproduced; labels: "Lowest risk of bias", "Applicability?"]

Danger of Over-reliance on RCTs
• RCTs may be unnecessary, inappropriate, inadequate, or impractical
• They may be too short in duration
• They may report intermediate outcomes rather than the main health outcomes of interest
• They are often not available for vulnerable populations
• They generally report efficacy rather than effectiveness
• AHRQ Evidence-based Practice Centers include a wide variety of study designs (not only RCTs)

Observational Studies (OS)
• Definition: studies in which the investigators did not assign the exposure or intervention
  • i.e., non-experimental studies
• Nonrandomized controlled clinical trials are quasi-experimental studies, not OS
• We present considerations for including OS to assess benefits and to assess harms separately

OS to Assess Benefits
• Trial evidence is often insufficient to answer all Key Questions (KQs) in CERs (think PICOTS: populations, interventions, comparators, outcomes, timing, settings)
  • Populations: trials may not be available for subpopulations and vulnerable populations
  • Interventions: high-risk interventions may not be assignable at random
  • Outcomes: trials may report intermediate outcomes rather than the main health outcomes of interest
  • Timing: trials may be too short in duration
  • Settings: trials may not represent typical practice

Group Exercise
• What should reviewers consider when deciding whether to include observational studies in CERs?

OS to Assess Benefits
• Reviewers should consider 2 questions:
  1. Are there gaps in trial evidence for the review questions under consideration?
  2. Will observational studies provide valid and useful information to address key questions?

Are there gaps in trial evidence? Will OS provide valid and useful information?
• Start with the systematic review question (including PICOTS); always consider controlled trials
• Are there gaps in trial evidence?
  • No: confine the review to controlled trials
  • Yes: consider OS
• Will OS provide valid and useful information?
  • Refocus the review question on the gaps
  • Assess whether OS address the review question
  • Assess the suitability of OS: natural history of the disease or exposure; potential biases
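The decision flow above can be sketched as a small function. This is a minimal illustration, assuming the reviewer's judgments have already been recorded as yes/no answers; `GapAssessment`, `decide_evidence_base`, and their fields are hypothetical names, not part of any AHRQ tool, and the actual decisions rest on reviewer judgment rather than code.

```python
from dataclasses import dataclass, field

# PICOTS elements: populations, interventions, comparators, outcomes, timing, settings
PICOTS = ("population", "intervention", "comparator", "outcomes", "timing", "setting")

@dataclass
class GapAssessment:
    # Hypothetical record of reviewer judgments (not an AHRQ data structure)
    trial_gaps: set[str] = field(default_factory=set)  # PICOTS elements trials cover poorly
    os_address_gaps: bool = False           # do available OS match the refocused question?
    natural_history_suitable: bool = False  # e.g., steady decline, not spontaneous recovery
    biases_manageable: bool = False         # selection, performance, detection, attrition

def decide_evidence_base(a: GapAssessment) -> str:
    unknown = a.trial_gaps - set(PICOTS)
    if unknown:
        raise ValueError(f"not PICOTS elements: {sorted(unknown)}")
    # No gaps: confine the review to controlled trials
    if not a.trial_gaps:
        return "Confine review to controlled trials"
    # Gaps exist: refocus on them, then judge whether OS give valid, useful information
    focus = ", ".join(sorted(a.trial_gaps))
    if a.os_address_gaps and a.natural_history_suitable and a.biases_manageable:
        return f"Include OS for gaps in: {focus}"
    # Assumed outcome: the flowchart does not say what happens when OS are unsuitable
    return f"Report gaps ({focus}); OS unlikely to provide valid, useful information"
```

For example, a population gap with registries that match the refocused question, a suitable natural history, and manageable biases would return `"Include OS for gaps in: population"`. Recording the judgments explicitly also makes it easier to state the rationale for including or excluding OS.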

Group Exercise: Include OS?
• A CER of PCI (percutaneous coronary intervention) vs. CABG (coronary artery bypass grafting) for coronary disease identified 23 RCTs. Members of the Technical Expert Panel (TEP) raised concerns that the trials enrolled patients with a relatively narrow spectrum of disease compared with patients undergoing the procedures in current practice
• A review of antioxidant supplementation to prevent heart disease found numerous large clinical trials, including the Heart Protection Study, which enrolled over 20,000 elevated-risk subjects. No beneficial effects were seen on cardiovascular (CV) outcomes, including mortality, and findings were consistent across trials with varying populations, sizes, etc.

Group Exercise: Include OS?
• CER of PCI vs. CABG: need to look for OS
  • OS from 10 large cardiovascular registries were identified
  • These confirmed that patients receiving the procedures in the community had wider variation in disease
  • For patients similar to those enrolled in trials, mortality results in the registries were similar to those in the trials (no difference between interventions)
  • Relative benefits of the procedures varied markedly with extent of disease, suggesting caution in extending trial conclusions to patients with more or less extensive disease than those in trial populations
• Review of antioxidant supplementation to prevent heart disease: trial data are sufficient

Gaps in Trial Evidence: PICOTS
• Trial data may be insufficient for a number of reasons related to PICOTS:
  • Populations included (certain groups missing)
  • Interventions included
  • Outcomes reported (only intermediate outcomes)
  • Duration
  • All trials may be efficacy studies

Are Trial Data Sufficient? PICOTS and Beyond
• Risk of bias (internal validity)
  • Degree to which the findings may be attributed to factors other than the intervention under review
• Consistency
  • Extent to which effect size and direction vary within and across studies
  • Inconsistency may be due to heterogeneity across PICOTS
• Directness
  • Degree to which the trial data encompass outcomes that are important to users of the CER (patients, clinicians, or policymakers)
  • Health outcomes are generally most important

Are Trial Data Sufficient? PICOTS and Beyond
• Precision
  • Includes sample size, number of studies, and heterogeneity within or across studies
• Reporting bias
  • Extent to which trial authors appear to have reported all outcomes examined
• Applicability
  • Extent to which the trial data are likely to apply to the populations, interventions, and settings of interest to the user
  • The review questions should reflect the PICOTS characteristics of interest

When to Identify Gaps in Trial Evidence
• Gaps in the trial evidence available to answer review questions can be identified at several points in the review:
  • When first scoping the review
  • In consultation with the Technical Expert Panel
  • During initial review of titles and abstracts
  • After detailed review of trial data


Gaps in Trial Evidence
• Operationally, reviewers may either search broadly from the start to identify both OS and trials, or search sequentially, looking for OS only after reviewing the trials in detail to identify gaps in the evidence

2. Will observational studies provide valid and useful information to address key questions?
Reviewers should:
• Refocus the study question on gaps in trial evidence
  • Specify the PICOTS characteristics for the gaps in trial evidence
• Assess whether available OS may address the review questions (are they applicable to the PICOTS?)
• Assess the suitability of OS to answer the review questions

Valid and Useful Information
• Assess the suitability of OS to answer the review questions
  • Once gaps in the trial literature have been identified and OS that could potentially fill those gaps have been found
• Consider the clinical context and natural history of the condition under study
• Assess how potential biases may influence the results of OS

Clinical context
• Fluctuating or intermittent conditions are more difficult to assess with OS
  • Especially if there is no comparison group
• OS may be more useful for conditions with steady progression or decline

Group Exercise
• Here are two very different conditions:
  • Acute low back pain
  • Amyotrophic lateral sclerosis (ALS)
• How might the differences between these conditions affect whether OS would provide useful information?

Group Exercise
• The main consideration here is the natural history of the condition under study
• People with acute low back pain often recover spontaneously
  • A cohort study of treatments for acute low back pain can’t establish, with any degree of certainty, whether the treatments affected patient outcomes
• ALS has a course of steady decline
  • An uncontrolled cohort study of treatments for ALS may well be able to demonstrate meaningful effects
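A toy simulation can make the natural-history argument concrete. It assumes a hypothetical function score (higher is better) and invented values for spontaneous change and treatment effect; none are estimates from any study. Without a comparison group, the treatment effect can only be inferred by subtracting an assumed natural-history change, so the inference is only as good as that assumption.

```python
import random
from statistics import mean

def observed_mean_change(n, natural_change, treatment_effect, noise_sd=1.0, seed=1):
    """Mean change in a hypothetical score in an uncontrolled, treated cohort:
    natural history + true treatment effect + random noise.
    The reviewer sees only this combined number."""
    rng = random.Random(seed)
    return mean(natural_change + treatment_effect + rng.gauss(0, noise_sd)
                for _ in range(n))

def implied_effect_range(observed, natural_low, natural_high):
    """Treatment effects consistent with the observed change, given a plausible
    range for the natural-history change (no comparison group available)."""
    return observed - natural_high, observed - natural_low

# Acute low back pain (illustrative values): large spontaneous improvement that
# varies widely; the treatment truly does nothing.
lbp = observed_mean_change(n=500, natural_change=4.0, treatment_effect=0.0)
lbp_range = implied_effect_range(lbp, natural_low=2.0, natural_high=6.0)

# ALS (illustrative values): steady, predictable decline; treatment slows it by 1.
als = observed_mean_change(n=500, natural_change=-4.0, treatment_effect=1.0)
als_range = implied_effect_range(als, natural_low=-4.2, natural_high=-3.8)

print(f"Low back pain: effect between {lbp_range[0]:.1f} and {lbp_range[1]:.1f}")
print(f"ALS: effect between {als_range[0]:.1f} and {als_range[1]:.1f}")
```

The low back pain interval runs from roughly -2 to +2 and straddles zero, so the cohort cannot distinguish an effective treatment from spontaneous recovery. The ALS interval is narrow (roughly 0.8 to 1.2) and excludes zero, because the expected decline is well characterized.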

Potential biases
• Selection bias (including confounding by indication)
• Performance bias
• Detection bias
• Attrition bias

Group Exercise
• Suppose you’re conducting a CER of medications for rheumatoid arthritis (RA)
• You find several retrospective analyses of administrative databases comparing outcomes of RA patients taking etanercept vs. methotrexate
• Suppose that etanercept is restricted in many of the health systems to patients with more severe RA who have failed other treatments
• Should you include these OS?
• What considerations will influence your decision?

Group Exercise
• Confounding by indication
  • A type of selection bias
  • Occurs when different diagnoses, severity of illness, or comorbid conditions are important reasons for physicians to assign different treatments
  • A common problem in pharmacoepidemiology studies comparing the beneficial effects of interventions
• Reviewers generally would not include these studies because of their high risk of bias (poor internal validity), unless the studies had a good way to adjust for disease severity
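The etanercept example can be illustrated with hypothetical counts (invented for this sketch, not taken from any study) in which poor outcomes depend only on disease severity, yet etanercept is given mostly to patients with severe RA. The crude comparison makes etanercept look harmful. Stratifying by severity, here with the Mantel-Haenszel pooled risk ratio as one standard adjustment method, removes the distortion, but only because severity is measured; administrative databases often lack such measures.

```python
from typing import NamedTuple

class Stratum(NamedTuple):
    """Counts of poor outcomes within one severity stratum (hypothetical)."""
    exp_events: int    # poor outcomes among etanercept users
    exp_total: int     # etanercept users
    unexp_events: int  # poor outcomes among methotrexate users
    unexp_total: int   # methotrexate users

def crude_risk_ratio(strata):
    """Risk ratio ignoring severity (all strata collapsed together)."""
    a = sum(s.exp_events for s in strata)
    n1 = sum(s.exp_total for s in strata)
    c = sum(s.unexp_events for s in strata)
    n0 = sum(s.unexp_total for s in strata)
    return (a / n1) / (c / n0)

def mantel_haenszel_risk_ratio(strata):
    """Severity-adjusted risk ratio pooled across strata (Mantel-Haenszel)."""
    num = sum(s.exp_events * s.unexp_total / (s.exp_total + s.unexp_total) for s in strata)
    den = sum(s.unexp_events * s.exp_total / (s.exp_total + s.unexp_total) for s in strata)
    return num / den

# Within each stratum the risk is identical for both drugs; only the mix differs.
mild = Stratum(exp_events=10, exp_total=100, unexp_events=90, unexp_total=900)     # 10% vs 10%
severe = Stratum(exp_events=450, exp_total=900, unexp_events=50, unexp_total=100)  # 50% vs 50%
strata = [mild, severe]

print(f"Crude RR: {crude_risk_ratio(strata):.2f}")                        # 3.29 (46% vs 14%)
print(f"Severity-adjusted RR: {mantel_haenszel_risk_ratio(strata):.2f}")  # 1.00
```

Real studies may adjust with regression or propensity scores instead; whichever method is used, the adjustment can remove confounding by indication only to the extent that severity is measured accurately.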

Harms
• Assessing harms can be difficult
• Trials often focus on benefits, with little effort to balance the assessment of benefits and harms
• OS are almost always necessary to assess harms adequately
• There is a tradeoff: reviewing all possible harms data increases comprehensiveness but decreases quality (increases risk of bias)

Trials to Assess Harms
• Randomized controlled trials (RCTs) are the gold standard for evaluating efficacy
• But relying solely on RCTs to evaluate harms in CERs is problematic:
  • Most lack prespecified hypotheses for harms because they are designed to evaluate benefits
  • Assessment of harms is often a secondary consideration
  • The quality and quantity of reporting of harms are frequently inadequate
  • Few have sufficient sample sizes or duration to adequately assess uncommon or long-term harms

Trials to Assess Harms
• Most RCTs are “efficacy” trials
  • They assess benefits and harms in ideal, homogeneous populations and settings
  • Patients who are more susceptible to harms are often under-represented
• Few RCTs directly compare alternative treatment strategies
• Publication bias and selective outcome reporting bias
• RCTs may not be available

Trials to Assess Harms
• Nevertheless, head-to-head RCTs provide the most direct evidence on comparative harms
• In addition, placebo-controlled RCTs can provide important information
• In general, CERs should routinely include both head-to-head and placebo-controlled trials for assessment of harms
• Instead of reviewing placebo-controlled RCTs individually, CERs may incorporate the findings of well-conducted systematic reviews, provided those reviews evaluated the specific harms of interest

Unpublished Supplemental Trials Data
• Consider including results of completed or terminated unpublished RCTs and unpublished results from published trials
  • Sources include the FDA website, http://www.ClinicalTrials.gov, etc.
  • Reviewers must consider whether they can fully assess the risk of bias of these data
• When a substantial number of published trials fail to report an important adverse event (AE), CER authors should report this gap in the evidence and consider efforts to obtain unpublished data

OS to Assess Harms
• OS are almost always necessary to assess harms adequately
  • The exception is when RCTs provide sufficient data to reliably estimate harms
• OS may provide the best or only data for assessing harms in minority or vulnerable populations who are under-represented in trials
• The types of OS included in a CER will vary; some types may be included, while others may be rendered irrelevant by the availability of data from stronger study designs

Hypothesis Testing vs. Hypothesis Generating
• An important consideration in determining which OS to include
• Case reports are hypothesis generating
• Cohort and case-control studies are well suited for testing hypotheses about whether one intervention is associated with a greater risk for an adverse event than another, and for quantifying that risk*
*Chou et al., J Clin Epidemiol 2008

Hierarchy of Evidence
[Figure not reproduced; labels: "Lowest risk of bias", "Hypothesis testing", "Applicability?", "Hypothesis generating"]

OS to Assess Harms
• Cohort and case-control studies
  • CERs should routinely search for and include these, except when RCT data are sufficient and valid
• OS based on patient registries
• OS based on analyses of large databases
• Case reports and post-marketing surveillance
  • New medications
• Other OS

OS to Assess Harms
• Criteria to select OS for inclusion
  • There are often many more OS than trials; evaluating a large number of OS can be impractical when conducting a CER
  • Several criteria are commonly used in CERs to screen OS for inclusion (empirical data supporting them are lacking):
    • Minimum duration of follow-up
    • Minimum sample size
    • Defined threshold for risk of bias
    • Study design (cohort and case-control)
    • Specific population of interest
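The screening criteria above can be applied as an explicit filter that also records why each OS was excluded, which supports stating the rationale for inclusion decisions. This is a sketch under assumptions: `ObservationalStudy`, `screen`, the field names, and the thresholds in the example are all hypothetical (the slide notes that empirical data for choosing thresholds are lacking), and only the first criterion each study fails is recorded.

```python
from dataclasses import dataclass

@dataclass
class ObservationalStudy:
    # Hypothetical screening fields; names are illustrative only
    study_id: str
    design: str              # e.g., "cohort", "case-control", "case series"
    n: int                   # sample size
    followup_months: float
    risk_of_bias: str        # "low", "moderate", or "high"
    population: str

BIAS_RANK = {"low": 0, "moderate": 1, "high": 2}

def screen(studies, *, min_followup_months, min_n, max_risk_of_bias,
           designs=("cohort", "case-control"), population=None):
    """Return (included, excluded); excluded maps study_id to the first failed criterion."""
    included, excluded = [], {}
    for s in studies:
        if s.followup_months < min_followup_months:
            excluded[s.study_id] = "follow-up too short"
        elif s.n < min_n:
            excluded[s.study_id] = "sample too small"
        elif BIAS_RANK[s.risk_of_bias] > BIAS_RANK[max_risk_of_bias]:
            excluded[s.study_id] = "risk of bias above threshold"
        elif s.design not in designs:
            excluded[s.study_id] = "design not eligible"
        elif population is not None and s.population != population:
            excluded[s.study_id] = "population not of interest"
        else:
            included.append(s)
    return included, excluded

# Hypothetical candidate studies and thresholds
studies = [
    ObservationalStudy("A", "cohort", 5000, 24, "moderate", "adults"),
    ObservationalStudy("B", "case-control", 800, 12, "high", "adults"),
    ObservationalStudy("C", "case series", 40, 6, "moderate", "adults"),
    ObservationalStudy("D", "cohort", 2000, 3, "low", "adults"),
]
included, excluded = screen(studies, min_followup_months=6, min_n=100,
                            max_risk_of_bias="moderate")
print([s.study_id for s in included])  # ['A']
print(excluded)
```

Keeping the exclusion reason per study makes the screening log reportable, which matters because the thresholds themselves are judgment calls.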

Key Take-home Points
• Trial evidence is often insufficient to answer all Key Questions in CERs
• CERs should consider including OS (default strategy)
• Reviewers should explicitly state the rationale for including or excluding OS
• For OS to assess benefits, reviewers should consider 2 questions:
  1. Are there gaps in trial evidence for the review questions under consideration?
  2. Will observational studies provide valid and useful information to address key questions?
• For harms, reviewers should routinely search for and include cohort and case-control studies
