
Systematic Review Module 11: Grading Strength of Evidence

Kathleen N. Lohr, PhD, Distinguished Fellow, RTI International

Presentation Transcript


  1. Systematic Review Module 11: Grading Strength of Evidence • Kathleen N. Lohr, PhD, Distinguished Fellow, RTI International

  2. Learning Objectives • What does grading strength of evidence (SOE) mean? • Why is grading SOE important? • How does grading SOE differ from rating quality of articles? • What are the primary and additional domains for grading SOE? • What variables, outcomes, and comparisons do you grade? • How are SOE domains scored? • How do you arrive at an overall SOE grade? • How do you present SOE scores?

  3. Grading SOE: Guidance • Distinct from rating the quality of articles/studies • Main focus: comparative effectiveness reviews (CERs) • Content here pertains only to interventional studies, not to screening or diagnostic tests • Generally applicable to all Evidence-based Practice Center (EPC) systematic reviews

  4. Aims for Creating Guidance on Grading SOE • Facilitate use of the reports by diverse decisionmakers and stakeholders • Give decisionmakers a more comprehensive evaluation of the evidence than before • Explain the methods to non-EPC readers • Provide an authoritative citation for EPCs to use in reviews • Foster transparency and documentation • Especially important in the American Recovery and Reinvestment Act (ARRA) era

  5. Three Steps to Grading SOE • Score four “required” domains: risk of bias, consistency, directness, and precision • Consider, and possibly score, four “additional” domains: dose-response association, plausible confounders, strength of association, and publication bias • Combine scores from the required domains into a single SOE score, taking scores on the additional domains into account as needed

  6. Four Required Domains: Risk of Bias • Concerns both study design and study conduct for individual studies, rated by the usual methods • Assesses the aggregate quality of studies within each major study design and integrates those assessments into an overall risk-of-bias score • Scores: high, medium, or low • High risk of bias lowers the SOE grade • Low risk of bias raises the SOE grade

  7. Four Required Domains: Consistency (I) • Degree of similarity in the effect sizes of different studies within an evidence base • Consistent: same direction of effect (same side of “no effect”) and a narrow range of effect sizes • Inconsistent: nonoverlapping confidence intervals, significant unexplained clinical or statistical heterogeneity, etc.

  8. Four Required Domains: Consistency (II) • Scores (levels): consistent (i.e., no inconsistency), inconsistent, or unknown/not applicable (e.g., a single study, whose consistency cannot be assessed) • Meta-analysis: use appropriate statistical tests of heterogeneity
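
The consistency criteria above (same direction of effect, narrow range, no unexplained statistical heterogeneity) can be checked in a meta-analysis with a sketch like the following. It computes Cochran's Q and I² from inverse-variance weights; these are common heterogeneity statistics but are not named on the slides. The `StudyEstimate` type, the log-scale convention, and the I² cutoff of 0.5 are illustrative assumptions, and the final score remains a reviewer judgment.

```python
import math
from dataclasses import dataclass


@dataclass
class StudyEstimate:
    log_effect: float  # e.g., log risk ratio; 0.0 means "no effect" (assumed scale)
    se: float          # standard error of log_effect


def same_direction(studies):
    """True if all nonzero point estimates fall on the same side of 'no effect'."""
    signs = {math.copysign(1.0, s.log_effect) for s in studies if s.log_effect != 0.0}
    return len(signs) <= 1


def cochran_q_and_i2(studies):
    """Cochran's Q and I^2 using inverse-variance (fixed-effect) weights."""
    weights = [1.0 / s.se ** 2 for s in studies]
    pooled = sum(w * s.log_effect for w, s in zip(weights, studies)) / sum(weights)
    q = sum(w * (s.log_effect - pooled) ** 2 for w, s in zip(weights, studies))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


def consistency_level(studies, i2_threshold=0.5):
    """Hypothetical rule: consistent = same direction AND I^2 below an assumed cutoff."""
    if len(studies) < 2:
        return "unknown/not applicable"  # a single study cannot be assessed
    _, i2 = cochran_q_and_i2(studies)
    if same_direction(studies) and i2 < i2_threshold:
        return "consistent"
    return "inconsistent"


print(consistency_level([StudyEstimate(0.5, 0.1), StudyEstimate(-0.4, 0.1)]))
```

Here the two studies point in opposite directions with I² near 1, so the sketch returns "inconsistent".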

  9. Four Required Domains: Directness (I) • Whether evidence reflects a single, direct link between the interventions of interest and the ultimate health outcome under consideration or relies on multiple links • Using analytic frameworks is important • If multiple links are involved, SOE can be only as strong as the weakest link

  10. Four Required Domains: Directness (II) • Scores: • Direct: based on health outcomes • Indirect: relies on surrogate/proxy outcomes (implying that more than one body of evidence is needed)
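
Because SOE for a chain of evidence can be only as strong as its weakest link, an ordered grade type makes that cap easy to compute. This is a minimal sketch: the `SOE` enum and the link labels are hypothetical, and the result is an upper bound that reviewers may still grade down.

```python
from enum import IntEnum


class SOE(IntEnum):
    """Ordered SOE grades, so that min() picks the weakest."""
    INSUFFICIENT = 0
    LOW = 1
    MODERATE = 2
    HIGH = 3


def chain_strength_cap(link_grades):
    """Upper bound on SOE for a chain of evidence: the grade of its weakest link.

    link_grades maps each link in the analytic framework (hypothetical labels)
    to the SOE grade reviewers assigned to the evidence for that link.
    """
    if not link_grades:
        return SOE.INSUFFICIENT  # no evidence for any link
    return min(link_grades.values())


links = {
    "intervention -> surrogate outcome": SOE.HIGH,
    "surrogate outcome -> health outcome": SOE.LOW,
}
print(chain_strength_cap(links).name)  # LOW
```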

  11. Four Required Domains: Directness in Comparisons • Direct (e.g., A vs. B, A vs. C, and B vs. C): head-to-head studies in the evidence base; generally assumes use of health outcomes, not surrogate/proxy outcomes; better SOE • Indirect (e.g., A vs. B and B vs. C, but not A vs. C): no head-to-head studies that cover all interventions or outcomes of interest; a problematic situation; SOE not as strong as with direct evidence
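
One common way to obtain an A vs. C estimate when only A vs. B and B vs. C studies exist is a Bucher-style adjusted indirect comparison, sketched below. The slides do not prescribe this method. It assumes the two sets of trials are similar enough for effects to chain, and that estimates are on an additive scale such as the log odds ratio. The widened standard error illustrates why indirect comparisons usually support weaker SOE than head-to-head evidence.

```python
import math


def indirect_comparison(d_ab, se_ab, d_bc, se_bc, z=1.96):
    """Adjusted indirect estimate of A vs. C through the common comparator B.

    d_ab: effect of A vs. B on an additive scale (e.g., log odds ratio)
    d_bc: effect of B vs. C on the same scale
    Returns the A vs. C estimate, its standard error, and an approximate 95% CI.
    """
    d_ac = d_ab + d_bc                          # effects add along A -> B -> C
    se_ac = math.sqrt(se_ab ** 2 + se_bc ** 2)  # variances of independent estimates add
    return d_ac, se_ac, (d_ac - z * se_ac, d_ac + z * se_ac)


d_ac, se_ac, ci = indirect_comparison(math.log(0.8), 0.10, math.log(0.9), 0.10)
print(f"OR A vs. C = {math.exp(d_ac):.2f}, SE(log OR) = {se_ac:.3f}")
# OR A vs. C = 0.72, SE(log OR) = 0.141
```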

  12. Four Required Domains: Precision (I) • Degree of certainty surrounding the estimate of effect for a specific outcome • A complicated concept • Key question: what can decisionmakers conclude about whether one treatment is, clinically speaking, inferior, equivalent (neither inferior nor superior), or superior to another? • Encompasses statistical significance

  13. Four Required Domains: Precision (II) • Scores: assigned separately for each important outcome, based on its “summary estimate” • Precise: estimate allows a clinically useful conclusion • Imprecise: confidence interval so wide that it could include clinically distinct (even conflicting) conclusions
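
The precise/imprecise distinction can be sketched by asking whether the whole confidence interval falls in a single clinical region. In this hypothetical rule the regions are set by an assumed minimal clinically important difference (`mcid`); a CI that straddles two regions supports clinically distinct conclusions and is scored imprecise. Real reviews may set thresholds differently for each outcome.

```python
def classify_precision(lower, upper, mcid):
    """Sketch: is a CI precise enough to support one clinical conclusion?

    lower, upper: CI bounds for the difference (treatment 1 minus treatment 2);
                  larger values favor treatment 1 (assumed convention)
    mcid: assumed minimal clinically important difference, > 0
    """
    if lower > upper or mcid <= 0:
        raise ValueError("need lower <= upper and mcid > 0")

    def region(x):
        if x <= -mcid:
            return "inferior"
        if x >= mcid:
            return "superior"
        return "equivalent"  # neither inferior nor superior

    lo_region, hi_region = region(lower), region(upper)
    if lo_region == hi_region:
        return "precise", lo_region
    return "imprecise", f"spans {lo_region} to {hi_region}"


print(classify_precision(-3.0, 4.0, 2.0))  # ('imprecise', 'spans inferior to superior')
```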

  14. Additional Domains: General • Four domains: dose-response association, plausible confounders, strength of association, and publication bias • Domains are “discretionary”: use them when they are applicable and helpful in reaching conclusions about overall SOE grades

  15. Additional Domains: Dose-response Association (I) • Pattern of a larger effect with greater exposure (dose, duration, adherence), either across or within studies • Rate when studies report levels of exposure

  16. Additional Domains: Dose-response Association (II) • Scores: • Present: dose-response pattern observed • Not present: no dose-response pattern observed • NA: not applicable or not tested

  17. Additional Domains: Plausible Confounding (I) • In an observational study, plausible confounding factors sometimes work in the direction opposite to that of the observed effect • Had such “effect-weakening” confounders not been present, the observed effect would have been even larger • In such a case, an EPC may want to upgrade the level of evidence • So, consider whether plausible confounding exists that would decrease the observed effect

  18. Additional Domains: Plausible Confounding (II) • Scores: • Present: confounding factors that would decrease the observed effect may be present • Absent: confounding factors that would decrease the observed effect are not likely to be present

  19. Additional Domains: Strength of Association (I) • Magnitude of effect: likelihood that the observed effect is large enough that it cannot have occurred solely as a result of bias from potential confounding factors • Consider when effect size is particularly large

  20. Additional Domains: Strength of Association (II) • Scores: • Strong: large effect size that is unlikely to have occurred in the absence of a true effect of the intervention • Weak: effect size small enough that it could have occurred solely as a result of bias from confounding factors
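
One way to quantify whether an association is too large to be explained by confounding alone is the E-value of VanderWeele and Ding (2017). It postdates these slides and is not part of the AHRQ guidance. For a risk ratio RR of at least 1 it equals RR + sqrt(RR × (RR - 1)), the minimum strength of association (on the risk-ratio scale) that an unmeasured confounder would need with both treatment and outcome to fully explain away the observed effect. The sketch below computes it for a point estimate; what counts as "strong" remains a reviewer judgment.

```python
import math


def e_value(rr):
    """E-value for a risk-ratio point estimate (VanderWeele & Ding, 2017)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr  # protective effects: use the reciprocal
    return rr + math.sqrt(rr * (rr - 1))


print(round(e_value(2.0), 2))  # 3.41
```

A larger E-value means only a stronger unmeasured confounder could account for the whole association.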

  21. Additional Domains: Publication Bias (I) • Studies may have been published selectively (e.g., only a small proportion of relevant trials [or other studies] has been published) • If so, estimated effects of an intervention based on published studies may not reflect the true effect, and publication bias may undermine the overall robustness of a body of evidence

  22. Additional Domains: Publication Bias (II) • Scores: need not be formally computed, but publication bias can influence ratings of required domains • Take possible publication bias into account when rating consistency and when calculating a summary confidence interval for an effect • Comment on publication bias when circumstances suggest that relevant empirical findings, particularly negative or no-difference findings, have not been published or are not otherwise available

  23. Applicability (I) • Evaluate external validity • Judge applicability with respect to the different decisionmakers and user groups for whom the review is intended • Take into account how well the evidence maps to a variety of contexts, specifically patient populations, diseases or conditions, interventions, comparators, outcomes, and settings

  24. Applicability (II) • Make judgments about applicability explicit and separate from assessments of other domains • Make clear when any statements about evidence are based on applicability rather than on other aspects of the evidence

  25. Procedures for Assessing Domains • Use two or more reviewers with the appropriate clinical and methodological expertise • Assess each required domain (and each additional domain, as relevant) separately for each major outcome, including benefits and harms • Resolve differences by consensus or by mediation by an additional expert; consensus scores appear in tables • Record and save each reviewer’s individual judgments about domains as background documentation

  26. Strength of Evidence Grades (I) • A global assessment that takes the required domains directly into account and, as needed, incorporates judgments about the additional domains as well

  27. Strength of Evidence Grades (II) • For each comparison of interest, rate SOE for each major benefit (e.g., positive impact on health outcomes such as physical function or quality of life, or effects on laboratory measures or other surrogate variables) and each major harm (ranging from rare, serious, or life-threatening adverse events to common but bothersome effects) • For both benefits and harms, focus on the outcomes most relevant to patients, clinicians, and policymakers

  28. Strength of Evidence Grades and Definitions • High: High confidence that the evidence reflects the true effect. Further research is very unlikely to change our confidence in the estimate of effect. • Moderate: Moderate confidence that the evidence reflects the true effect. Further research may change our confidence in the estimate of effect and may change the estimate. • Low: Low confidence that the evidence reflects the true effect. Further research is likely to change our confidence in the estimate of effect and is likely to change the estimate. • Insufficient: Evidence either is unavailable or does not permit a conclusion.

  29. Strength of Evidence Grades: Additional Points • Using high, moderate, or low SOE: implies that a body of evidence actually exists; is intended to convey how confident reviewers are about decisions that might be made based on evidence graded one way or another; and requires using only one designation, not a range (e.g., not “low to moderate”) • Using insufficient: applies when reviewers truly cannot draw conclusions about an outcome, comparison, or other question; arises when no evidence is available at all, or when evidence is too feeble or insubstantial to permit drawing conclusions (e.g., opposing results from studies with similar risk-of-bias ratings; wide and overlapping confidence intervals)

  30. Scoring and Reporting: General Guidance • May use different approaches to incorporate multiple domains into an overall strength-of-evidence grade: the GRADE algorithm itself, an EPC’s own weighting system, or some qualitative approach • Use at least two reviewers • Assess the resulting inter-rater reliability for each domain score; keep records
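
For the inter-rater reliability check, Cohen's kappa is one common statistic for two reviewers' categorical domain scores. The slides do not specify a statistic, so treat this as an illustrative sketch. It compares observed agreement with the agreement expected by chance from each reviewer's marginal frequencies.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two reviewers' categorical scores on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty, equal-length lists of scores")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: products of the reviewers' marginal counts, summed over categories.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1.0:
        return float("nan")  # both reviewers used one identical category: kappa undefined
    return (observed - expected) / (1 - expected)


# Hypothetical consistency scores from two reviewers across four outcomes.
reviewer_1 = ["consistent", "consistent", "inconsistent", "inconsistent"]
reviewer_2 = ["consistent", "inconsistent", "inconsistent", "inconsistent"]
print(cohens_kappa(reviewer_1, reviewer_2))  # 0.5
```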

  31. Guiding Principles: Risk of Bias (I) • Risk of bias (given the design and conduct of available studies) is the essential component in determining an SOE grade • First, consider which study design is most appropriate to reduce bias for each question • Next, consider the risk of bias in the available studies

  32. Guiding Principles: Risk of Bias (II): Example • Drug comparisons: RCTs, with either placebo or active comparators as appropriate, are the appropriate design • Evidence from well-done RCTs will have less risk of bias than evidence from observational studies • So reviewers may start RCT evidence with a rating of low risk of bias and change the assessment of this domain if the RCTs have important flaws • Observational data, by contrast, may generally start with a rating of high risk of bias, but the assessment can change depending on how well the studies were conducted
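
The starting-point logic in this example can be expressed as a small function. The one-step adjustments and design labels are hypothetical: the slides say only that the starting rating may change with important flaws or with how well studies were conducted, and the size of any change remains a reviewer judgment recorded with the review.

```python
LEVELS = ("low", "medium", "high")  # risk-of-bias ratings, best to worst
STARTING_INDEX = {"rct": 0, "observational": 2}  # hypothetical design labels


def risk_of_bias(design, adjustment=0):
    """Starting risk-of-bias rating by design, shifted by a reviewer adjustment.

    adjustment (hypothetical): +1 per step toward higher risk (e.g., an important
    RCT flaw), -1 per step toward lower risk (e.g., a well-conducted
    observational study). The result is capped at both ends of LEVELS.
    """
    if design not in STARTING_INDEX:
        raise ValueError(f"unknown design: {design!r}")
    index = STARTING_INDEX[design] + adjustment
    return LEVELS[min(max(index, 0), len(LEVELS) - 1)]


print(risk_of_bias("rct"), risk_of_bias("rct", +1), risk_of_bias("observational", -1))
# low medium medium
```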

  33. Further Guiding Principles for Scoring • Be explicit about whether the evidence grade will be determined by a point system for combining ratings of the domains or by a qualitative consideration of the domains • Carefully document procedures • Keep records of procedures and results for each review so that they may contribute to overall EPC expertise and to the science of grading evidence
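
As an illustration of what an explicit point system might look like, the sketch below starts at high, subtracts levels for concerns in the required domains, and adds or subtracts levels for the additional domains. Every penalty and bonus here is a hypothetical assumption loosely modeled on GRADE-style up- and downgrading; it is not the EPC or GRADE rule set. A real review would document its own weights or use a qualitative approach instead. Returning one grade also respects the rule that SOE takes a single designation, not a range.

```python
from dataclasses import dataclass
from enum import IntEnum


class SOE(IntEnum):
    INSUFFICIENT = 0
    LOW = 1
    MODERATE = 2
    HIGH = 3


# Hypothetical penalties (in grade levels) for each required-domain score.
PENALTY = {
    "risk_of_bias": {"low": 0, "medium": 1, "high": 2},
    "consistency": {"consistent": 0, "unknown": 0, "inconsistent": 1},
    "directness": {"direct": 0, "indirect": 1},
    "precision": {"precise": 0, "imprecise": 1},
}


@dataclass
class DomainScores:
    risk_of_bias: str
    consistency: str
    directness: str
    precision: str
    dose_response: bool = False               # dose-response pattern present
    plausible_confounding: bool = False       # effect-weakening confounders present
    strong_association: bool = False
    publication_bias_suspected: bool = False


def grade(scores: DomainScores) -> SOE:
    """Combine domain scores into one SOE grade with a hypothetical point system."""
    points = int(SOE.HIGH)
    for domain, table in PENALTY.items():
        points -= table[getattr(scores, domain)]  # KeyError on an unknown score
    # Each additional domain that strengthens the finding adds one level (assumed).
    points += sum([scores.dose_response, scores.plausible_confounding,
                   scores.strong_association])
    points -= int(scores.publication_bias_suspected)
    # Clamp to the four grades; in this sketch, falling to 0 or below means insufficient.
    return SOE(min(max(points, int(SOE.INSUFFICIENT)), int(SOE.HIGH)))


print(grade(DomainScores("medium", "consistent", "direct", "imprecise")).name)  # LOW
```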

  34. Further Guiding Principles for Reporting (I) • Explain the rationale for the approach and which domains were important in upgrading or downgrading strength of evidence • Explain judgments about the degree to which any additional domains altered the overall strength-of-evidence grade • Provide enough detail within the report to ensure that users can grasp the methods

  35. Further Guiding Principles for Reporting (II) • Use the terms high, moderate, low, or insufficient, not Roman numerals or other symbols • Use or adapt the illustrative tabular approach to reporting; for an example, see the next slide, the chapter on the AHRQ Effective Health Care (EHC) website, or the article (e-print as of late 2009)

  36. Grading SOE: Table for Presentation of Results • Table 4. Treatment 1 vs. Treatment 2: Numbers of Studies and Subjects, Strength of Evidence Domains, Magnitude of Effect, and Strength of Evidence for Key Outcomes
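
The slide's table itself is not reproduced in this transcript. As a minimal sketch of the tabular approach, the function below renders one row per key outcome. Its columns are taken from the table caption (numbers of studies and subjects, the four required domains, magnitude of effect, and SOE); the exact column labels and the placeholder row are hypothetical and contain no real data.

```python
COLUMNS = (
    "Outcome", "Studies (subjects)", "Risk of bias", "Consistency",
    "Directness", "Precision", "Magnitude of effect", "SOE",
)


def render_soe_table(rows):
    """Render one row per key outcome as a fixed-width, pipe-separated text table."""
    for row in rows:
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} cells, got {len(row)}")
    table = [COLUMNS, *rows]
    widths = [max(len(str(r[i])) for r in table) for i in range(len(COLUMNS))]

    def fmt(row):
        return " | ".join(str(cell).ljust(w) for cell, w in zip(row, widths))

    lines = [fmt(COLUMNS), "-+-".join("-" * w for w in widths)]
    lines += [fmt(row) for row in rows]
    return "\n".join(lines)


# Placeholder row only (hypothetical); no real study data.
example = [("Outcome X", "k (N)", "Medium", "Consistent", "Direct",
            "Imprecise", "<summary estimate, 95% CI>", "Low")]
print(render_soe_table(example))
```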

  37. Sources of Information • Cross-EPC authors: Douglas K. Owens, Kathleen N. Lohr, David Atkins, Jonathan R. Treadwell, James T. Reston, Eric B. Bass, Stephanie Chang, Mark Helfand • Article: Grading the strength of a body of evidence when comparing medical interventions: Agency for Healthcare Research and Quality and the Effective Health Care Program. J Clin Epidemiol. 2009 Jul 10 [Epub ahead of print]. • Chapter on AHRQ website: Grading the strength of a body of evidence when comparing medical interventions. http://effectivehealthcare.ahrq.gov/repFiles/2009_0805_grading.pdf

  38. Summary: Grading Strength of Evidence • Is a critical last step in analysis and presentation • Is done after rating quality of articles and by at least two independent reviewers • Helps users of systematic reviews understand the body of evidence and how much confidence they can have in making decisions based on that evidence • Uses scores on four primary (mandatory) domains and four additional (discretionary) domains • Focuses on major outcomes and comparisons • Is denoted in terms of high, moderate, or low strength or insufficient evidence • Presents SOE grades in tabular form
