Grading the Strength of a Body of Evidence on Diagnostic Tests

Prepared for: The Agency for Healthcare Research and Quality (AHRQ)
Training Modules for Medical Test Reviews Methods Guide
www.ahrq.gov

Overview of a Medical Test Review
• Research Sources
• Prepare Topic
  • Develop the Topic and Structure the Review
  • Choose the Important Outcomes
• Search for and Select Studies for Inclusion
  • Search for Studies
• Extract Data From Studies
• Analyze and Synthesize Studies
  • Assess Risk of Bias as a Domain of Quality
  • Assess Applicability
  • Grade the Body of Evidence
  • Meta-analysis of Test Performance Evidence With a “Gold Standard” or Meta-analysis of Test Performance Evidence With an Imperfect Reference Standard
  • Decision Modeling
• Report Medical Test Review
Singh S, Chang SM, Matchar DB, et al. Grading a body of evidence on diagnostic tests. In: Methods guide for medical test reviews. Available at www.effectivehealthcare.ahrq.gov/medtestsguide.cfm.

Learning Objectives
• Explain the value of grading the bodies of evidence that influence decisionmaking.
• Describe the required domains involved in assessing strength of evidence.
• Understand the challenges specific to grading the strength of a body of evidence on medical test performance.

Overview
• Grading the strength of evidence
• Challenges specific to diagnostic tests
• Principles of grading the strength of a body of evidence
• Illustrative examples

Grading the Strength of Evidence (1 of 4)
• “Grading” refers to assessment of the strength of the body of evidence supporting a given statement or conclusion.
• It does not refer to the quality of an individual study.
• Grading lets decisionmakers know whether they are working with bodies of evidence that are of high, moderate, or low strength.
• It makes their recommendations about interventions/tests more defensible.
• This module focuses on grading diagnostic tests (used in diagnostic and management strategies), not prognostic tests (used to predict responsiveness to treatment).

Grading the Strength of Evidence (2 of 4)
• There are two approaches to grading strength of evidence:
  • GRADE (Grading of Recommendations Assessment, Development, and Evaluation)
  • EPC (Evidence-based Practice Center), which is derived from GRADE
• This module discusses the EPC approach while pointing out the minor differences between it and GRADE.
• It is important to differentiate between assessing the quality of an individual study and the strength of an entire body of evidence.
• “Strength” assessment includes the quality of individual studies plus other domains.
• The EPC approach uses the term “strength of evidence” rather than the term “quality of evidence” used in the GRADE approach, to avoid conflating these ideas.
Agency for Healthcare Research and Quality. Methods Guide for Effectiveness and Comparative Effectiveness Reviews. Available at www.effectivehealthcare.ahrq.gov/methodsguide.cfm.

Grading the Strength of Evidence (3 of 4)
• Four principal domains are used to assess strength of evidence:
  • Risk of bias
  • Consistency
  • Directness*
  • Precision
• Additional domains to be considered as needed:
  • Publication bias
  • Dose-response association
  • Existence of plausible unmeasured confounders
  • Strength of association (i.e., magnitude of effect)
* The GRADE approach considers “applicability” to be part of directness; the EPC approach considers it a component separate from strength of evidence.

Grading the Strength of Evidence (4 of 4)
• In the Evidence-based Practice Center approach, strength of evidence is graded for each outcome and comparison in the Key Questions.
• At least one Key Question usually relates to test performance (e.g., sensitivity and specificity).
• Other Key Question topics that can also be evaluated using the methods outlined here:
  • Analytic validity
  • Clinical validity
  • Clinical utility
Agency for Healthcare Research and Quality. Methods Guide for Effectiveness and Comparative Effectiveness Reviews. Available at www.effectivehealthcare.ahrq.gov/methodsguide.cfm.
Samson D, Schoelles KM. Developing the topic and structuring systematic reviews of medical tests. In: Methods guide for medical test reviews. Available at www.effectivehealthcare.ahrq.gov/medtestsguide.cfm.

Challenges Specific to Diagnostic Tests (1 of 2)
• Challenge 1: Tools for assessing the quality of evidence for an intervention are not easily applied to evidence for a diagnostic test.
• Evidence directly relates to the overarching Key Question for interventions; there is an indirect relationship for diagnostic tests.
• Establishment of logical chains with analytic frameworks is needed, as is assessment of strength of evidence for each “link.”
• The strength of evidence of the overarching question of whether a test improves clinical outcomes depends on:
  • Total body of evidence
  • Body of evidence for weakest link in chain
• Often there is no direct linkage between diagnostic accuracy and clinical outcomes.
• Example: When tests are used as triage, accuracy of risk classification is more important than accuracy of diagnosis.

Challenges Specific to Diagnostic Tests (2 of 2)
• Challenge 2: Applying the strength-of-evidence domains to studies of diagnostic tests is difficult for reviewers.
• Example: It is difficult to judge the precision of a particular confidence interval in estimates of test performance.
• This difficulty is due to the logarithmic nature of diagnostic performance measurements such as:
  • Sensitivity
  • Specificity
  • Likelihood ratios
  • Diagnostic odds ratios
• Relatively wide confidence intervals (suggesting imprecision) may not translate into clinically meaningful impacts.

Challenge 2 Example: Impact of the Precision of Sensitivity on Negative Predictive Value
• This example shows wide confidence intervals that did not result in a clinically meaningful impact on the clinical outcome.
• A 10-point reduction in the sensitivity of various biopsy techniques (from 98% to 88%) changes the estimated probability of having cancer after a negative test by less than 5 percent.
Bruening W, Schoelles K, Treadwell J, et al. Comparative Effectiveness of Core-Needle and Open Surgical Biopsy for the Diagnosis of Breast Lesions. Available at www.effectivehealthcare.ahrq.gov/medtestsguide.cfm.
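The arithmetic behind this example can be sketched with Bayes' rule. The pretest probability (30%) and specificity (95%) used below are illustrative assumptions, not values taken from the cited review, and the function name is hypothetical.

```python
def prob_disease_after_negative(sensitivity, specificity, prevalence):
    """Probability of disease given a negative test (1 - NPV), via Bayes' rule."""
    false_negatives = (1.0 - sensitivity) * prevalence
    true_negatives = specificity * (1.0 - prevalence)
    return false_negatives / (false_negatives + true_negatives)


# Assumed inputs for illustration only (not from the source review).
PREVALENCE = 0.30
SPECIFICITY = 0.95

p_high = prob_disease_after_negative(0.98, SPECIFICITY, PREVALENCE)
p_low = prob_disease_after_negative(0.88, SPECIFICITY, PREVALENCE)

print(f"Sensitivity 98%: P(cancer | negative test) = {p_high:.3f}")
print(f"Sensitivity 88%: P(cancer | negative test) = {p_low:.3f}")
print(f"Absolute change: {p_low - p_high:.3f}")
```

Under these assumed inputs the post-test probability moves by less than 5 percentage points, which is the kind of check a reviewer might run before downgrading for imprecision. Different pretest probabilities would give different results.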

Principles of Grading the Strength of a Body of Evidence
• Methods for grading intervention studies can be adapted for diagnostic tests with clinical outcomes.
• Consider carefully what test characteristic measures are the most appropriate intermediate outcomes for assessing the impact of a test on clinical outcomes and their precision in the clinical context represented by the Key Question.
• The principal domains of the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach can be adapted to assess a body of evidence on diagnostic test accuracy.
• Additional GRADE domains can be adapted to assess a body of evidence on diagnostic test accuracy.
• Multiple domains should be incorporated into an overall assessment in a transparent way.

Principle 1: Methods for Grading Intervention Studies Can Be Adapted for Diagnostic Tests With Clinical Outcomes (1 of 2)
• A body of evidence evaluating diagnostic test outcomes (e.g., diagnostic thinking, therapeutic choice, clinical outcomes) can be assessed in a way similar to that used for therapeutic interventions.
• Grading issues for studies measuring these diagnostic test outcomes are more straightforward than for accuracy outcomes.
• Trial evidence can directly assess the effect of tests on clinical outcomes (rarely done).
• In this case, the application of grading criteria should not differ significantly from the methods used for intervention evidence.

Principle 1: Methods for Grading Intervention Studies Can Be Adapted for Diagnostic Tests With Clinical Outcomes (2 of 2)
• When direct evidence linking the test to the outcome of interest is not available:
  • For intervention studies, intermediate outcomes (e.g., accuracy outcomes) are “indirect” evidence and reduce the strength of the grade.
  • Linking accuracy outcomes (e.g., false/true-positive outcomes) to clinical outcomes partly depends on factors such as benefits and harms of treatment.
  • No consensus exists on the approach to use with only indirect evidence.
  • Accuracy outcomes may be enough to determine if there is a benefit on clinical outcomes.
  • Otherwise, discuss the benefits of including this evidence and the specific methods to be used.
Samson D, Schoelles KM. Developing the topic and structuring systematic reviews of medical tests. In: Methods guide for medical test reviews. Available at www.effectivehealthcare.ahrq.gov/medtestsguide.cfm.

Principle 2: Consider Appropriate Test Characteristic Measures and Their Precision (1 of 4)
• Consider how surrogates (e.g., accuracy outcomes) will lead to changes in clinical outcomes.
• Doing so is consistent with both the EPC and GRADE principles of emphasizing outcomes important to patients.
• Using an analytic framework and decision models helps clarify the linkage between accuracy and clinical outcomes.
• True-positive/negative and false-positive/negative outcomes are easily translated to other accuracy outcomes (e.g., sensitivity, specificity, positive/negative predictive value).
• Choose accuracy outcomes that:
  • Are most directly related to clinical outcomes
  • Have the necessary precision levels
Samson D, Schoelles KM. Developing the topic and structuring systematic reviews of medical tests. In: Methods guide for medical test reviews. Available at www.effectivehealthcare.ahrq.gov/medtestsguide.cfm.
Trikalinos TA, Kulasingam S, Lawrence WF. Deciding whether to complement a systematic review of medical tests with decision modeling. In: Methods guide for medical test reviews. Available at www.effectivehealthcare.ahrq.gov/medtestsguide.cfm.
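The translation from true/false-positive and true/false-negative counts to other accuracy outcomes can be sketched as follows. The class name and the counts are hypothetical and serve only to illustrate the standard definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Index test results cross-classified against the reference standard."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


# Hypothetical study counts, for illustration only.
table = TwoByTwo(tp=90, fp=30, fn=10, tn=170)
print(f"sensitivity={table.sensitivity:.2f} specificity={table.specificity:.2f}")
print(f"PPV={table.ppv:.2f} NPV={table.npv:.2f}")
```

Predictive values computed directly from a study's counts reflect that study's disease prevalence, so they may not carry over to a population with a different prevalence.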

Principle 2: Consider Appropriate Test Characteristic Measures and Their Precision (2 of 4)
• Occasionally, it is more important to rule out a disease with severe consequences than to diagnose what disease the patient has.
• Triage tests with high sensitivity and high negative predictive value are useful.
• Multiple tests are often used to ensure accuracy when high risks exist with treatment of a disease.
• Tests used alone need both high sensitivity and specificity (or high positive and high negative predictive value).
• If no test meets these criteria, the interest may be in benefits and harms of adding on a test.
• The accuracy outcome of interest then is high specificity or high positive predictive value.

Principle 2: Consider Appropriate Test Characteristic Measures and Their Precision (3 of 4)
• More invasive tests have greater harms, with further harms resulting from misdiagnosis.
• False-positive and false-negative measurements for a test become important.
• The degree of harms depends on:
  • False-negative results
    • Severity of disease (for missed diagnosis)
    • Risks of testing (if test is invasive and has harms itself)
  • False-positive results
    • Invasiveness of further testing/treatment
    • Cognitive/emotional effects of inaccurate disease labeling

Principle 2: Consider Appropriate Test Characteristic Measures and Their Precision (4 of 4)
• A simplified example of precision in context: A test with outcomes data resulting from true-positive, true-negative, false-positive, and false-negative test results
• Using a decision model, find the threshold of test sensitivity and specificity combinations where testing versus not testing is a tossup.
• Net benefits are equivalent to net harms at the threshold.
• There is enough precision if the confidence intervals for sensitivity and specificity from the data in the body of evidence are completely on one side of the threshold.
• This example is oversimplified for many situations.
• Tests are usually part of an algorithm.
• The accuracy outcome of interest depends on how the test is used in an algorithm and its mechanism of action on outcomes.
Pauker SG, Kassirer JP. N Engl J Med 1980 May 15;302(20):1109-17. PMID: 7366635.
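The "completely on one side of the threshold" check can be sketched for a single accuracy measure as below. This is a simplification under stated assumptions: the threshold value (0.80) is hypothetical and would in practice come from a decision model, the real threshold is a combination of sensitivity and specificity rather than one number, and the Wilson score interval is just one common choice of confidence interval.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1.0 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


def entirely_on_one_side(interval, threshold):
    """True when the whole interval lies above or below the threshold."""
    lower, upper = interval
    return lower > threshold or upper < threshold


THRESHOLD = 0.80  # hypothetical decision threshold for sensitivity

large_body = wilson_interval(180, 200)  # 90% sensitivity, 200 diseased patients
small_body = wilson_interval(18, 20)    # 90% sensitivity, 20 diseased patients

print(large_body, entirely_on_one_side(large_body, THRESHOLD))
print(small_body, entirely_on_one_side(small_body, THRESHOLD))
```

Both hypothetical bodies of evidence have the same point estimate, but only the larger one yields an interval that stays clear of the assumed threshold.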

Principle 3: The Principal Domains of GRADE Can Be Adapted To Assess a Body of Evidence on Diagnostic Test Accuracy (1 of 5)
• The risk of bias, consistency, directness, and precision domains of GRADE can be adapted to assess a body of evidence on diagnostic test performance.
• Risk of bias
  • How study type/design/conduct may have contributed to systematic bias must be considered.
  • Different technical methods used in diagnostic (particularly laboratory) tests can lead to heterogeneous results. For example:
    • Using different antibodies in immunoassays
    • Using standards with different values and units
Samson D, Schoelles KM. Developing the topic and structuring systematic reviews of medical tests. In: Methods guide for medical test reviews. Available at www.effectivehealthcare.ahrq.gov/medtestsguide.cfm.

Principle 3: The Principal Domains of GRADE Can Be Adapted To Assess a Body of Evidence on Diagnostic Test Accuracy (2 of 5)
• Consistency is homogeneity in the direction and magnitude of results across studies.
• It can be applied to diagnostic test performance studies (the method of measurement may differ). For example:
  • Intervention studies visually assess consistency with forest plots.
  • Diagnostic test performance reviews use a summary receiver operating characteristic (ROC) curve.
• ROC curves show sensitivity and specificity results from studies.
• The consistency of diagnostic accuracy can be shown by bubble plots of true-positive versus false-positive rates in ROC space.
• As in intervention studies, strength of evidence is reduced by unexplained heterogeneity (i.e., heterogeneity not explained by different study designs, methodological quality, diversity in subject characteristics, or study context).
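As a rough sketch of what goes into such a plot, each study contributes one point in ROC space: its false-positive rate (1 minus specificity) on the horizontal axis and its true-positive rate (sensitivity) on the vertical axis. The study names and counts below are hypothetical, and the printed ranges are only a descriptive summary of spread, not a formal test of heterogeneity.

```python
# Hypothetical per-study counts: (tp, fn, fp, tn)
studies = {
    "Study A": (45, 5, 10, 140),
    "Study B": (80, 20, 30, 170),
    "Study C": (27, 3, 4, 96),
}


def roc_point(tp, fn, fp, tn):
    """Return (false-positive rate, true-positive rate) for one study."""
    return fp / (fp + tn), tp / (tp + fn)


points = {name: roc_point(*counts) for name, counts in studies.items()}

for name, (fpr, tpr) in points.items():
    n = sum(studies[name])  # in a bubble plot, bubble size often reflects study size
    print(f"{name}: FPR={fpr:.3f} TPR={tpr:.3f} n={n}")

tprs = [tpr for _, tpr in points.values()]
fprs = [fpr for fpr, _ in points.values()]
print(f"TPR range: {min(tprs):.2f} to {max(tprs):.2f}")
print(f"FPR range: {min(fprs):.2f} to {max(fprs):.2f}")
```

Points that cluster tightly suggest consistent accuracy across studies; widely scattered points prompt a search for explanations such as different thresholds, designs, or populations.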

Principle 3: The Principal Domains of GRADE Can Be Adapted To Assess a Body of Evidence on Diagnostic Test Accuracy (3 of 5)
• Evidence is direct when it reflects a single, direct link between the medical tests and the ultimate health outcome under consideration.
• If no studies link the test to clinical outcome, then diagnostic accuracy outcomes are considered indirect.
• If grading evidence of intermediate outcomes like diagnostic accuracy, then do not “downgrade” for being indirect.
• Directness does apply to how the use of the test compares to others. For example:
  • Comparison of d-dimer test and venous ultrasound as two opposing methods for diagnosis of venous thromboembolism
  • Comparison of interest may actually be use of d-dimer test as triage for venous ultrasound vs. ultrasound alone
Agency for Healthcare Research and Quality. Methods Guide for Effectiveness and Comparative Effectiveness Reviews. Available at www.effectivehealthcare.ahrq.gov/methodsguide.cfm.
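To see why the triage comparison differs from a head-to-head comparison, the accuracy of a serial strategy (second test only after a positive triage test, disease called only if both are positive) can be sketched as below. The formulas assume the two tests err independently given true disease status, which may not hold in practice, and all accuracy and prevalence values are hypothetical rather than taken from any d-dimer or ultrasound study.

```python
def serial_strategy(se_triage, sp_triage, se_confirm, sp_confirm):
    """Sensitivity and specificity of 'positive only if both tests are positive'.

    Assumes conditional independence of the two tests given disease status.
    """
    sensitivity = se_triage * se_confirm
    specificity = sp_triage + (1.0 - sp_triage) * sp_confirm
    return sensitivity, specificity


def fraction_referred(se_triage, sp_triage, prevalence):
    """Share of all patients who test positive on triage and go on to the second test."""
    return prevalence * se_triage + (1.0 - prevalence) * (1.0 - sp_triage)


# Hypothetical values for illustration only.
se, sp = serial_strategy(se_triage=0.95, sp_triage=0.50,
                         se_confirm=0.95, sp_confirm=0.95)
referred = fraction_referred(se_triage=0.95, sp_triage=0.50, prevalence=0.20)

print(f"Serial strategy: sensitivity={se:.4f} specificity={sp:.4f}")
print(f"Fraction needing the second test: {referred:.2f}")
```

Under these assumptions the serial strategy gives up some sensitivity relative to the second test alone, gains specificity, and spares a share of patients the second test. That trade-off, not the accuracy of either test in isolation, is the comparison of interest.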

Principle 3: The Principal Domains of GRADE Can Be Adapted To Assess a Body of Evidence on Diagnostic Test Accuracy (4 of 5)
• Schemas such as GRADE rate directness based on:
  • Whether the test evaluated is used in practice
  • Whether test accuracy is calculated in a population or for a use that is different from those evaluated in the review (applicability)
• The Evidence-based Practice Center (EPC) approach does not include these factors when assessing directness for strength of evidence (SOE).
• Reports prepared by EPC reviewers are intended for use by a broad spectrum of stakeholders.
• Describing applicability of evidence separately from SOE allows a wide variety of decisionmakers to consider how the evidence relates to their particular test and population.
Hartman KE, Matchar DB, Chang SM. Assessing applicability of medical test studies in systematic reviews. In: Methods guide for medical test reviews. Available at www.effectivehealthcare.ahrq.gov/medtestsguide.cfm.

Principle 3: The Principal Domains of GRADE Can Be Adapted To Assess a Body of Evidence on Diagnostic Test Accuracy (5 of 5)
• Precision is affected by the width of the confidence intervals for diagnostic accuracy measurements.
• It is integrally related to sample size.
• It is important to consider how clinically meaningful outcomes may be impacted by imprecision.
• This may involve calculation of post-test probabilities over a range of values for sensitivity and specificity (see Slide 11).
• A more formal analysis with a decision model could be beneficial.
• The strength-of-evidence grade for precision should not be downgraded if:
  • The impact of imprecision on outcomes is negligible.
  • There is enough precision that decisions based on the test are unaffected.
Owens DK, Lohr KN, Atkins D, et al. J Clin Epidemiol 2010 May;63(5):513-23. PMID: 19595577.
Trikalinos TA, Kulasingam S, Lawrence WF. Deciding whether to complement a systematic review of medical tests with decision modeling. In: Methods guide for medical test reviews. Available at www.effectivehealthcare.ahrq.gov/medtestsguide.cfm.

Principle 4: Additional GRADE Domains Can Be Adapted To Assess a Body of Evidence on Diagnostic Test Accuracy
• Additional domains can be adapted
  • Publication bias
    • Use when negative or no-difference findings appear to be unpublished/unavailable
  • Dose-response association
    • For tests with continuous outcomes/multiple cutoffs (e.g., serum prostate-specific antigen), may support underlying mechanism of detection
  • Existence of plausible unmeasured confounders
    • May be relevant to testing strategies that predict outcomes
    • Example: a study biased to find low diagnostic accuracy actually finds very high diagnostic accuracy
  • Strength of association (i.e., magnitude of effect)
    • Use when comparing accuracy of two different tests where one is more accurate

GRADE Domains and Descriptions (1 of 2)
Adapted from Agency for Healthcare Research and Quality. Methods Guide for Effectiveness and Comparative Effectiveness Reviews. Available at www.effectivehealthcare.ahrq.gov/methodsguide.cfm.

GRADE Domains and Descriptions (2 of 2)
Adapted from Agency for Healthcare Research and Quality. Methods Guide for Effectiveness and Comparative Effectiveness Reviews. Available at www.effectivehealthcare.ahrq.gov/methodsguide.cfm.

Additional GRADE Domains and Descriptions (1 of 2)
Adapted from Agency for Healthcare Research and Quality. Methods Guide for Effectiveness and Comparative Effectiveness Reviews. Available at www.effectivehealthcare.ahrq.gov/methodsguide.cfm.

Additional GRADE Domains and Descriptions (2 of 2)
Adapted from Agency for Healthcare Research and Quality. Methods Guide for Effectiveness and Comparative Effectiveness Reviews. Available at www.effectivehealthcare.ahrq.gov/methodsguide.cfm.

Principle 5: Multiple Domains Should Be Incorporated Into an Overall Assessment in a Transparent Way (1 of 2)
• Strength-of-evidence grades reflect a global assessment of principal and associated domains.
• This assessment should generate an overall summary grade (high, moderate, low, or insufficient evidence).
• Focus on providing an overall grade for the relevant Key Question link in the analytical chain or in outcomes relevant to patients/decisionmakers.
• These outcomes should be identified beforehand.
• Consider how to incorporate multiple domains into the overall assessment.

Principle 5: Multiple Domains Should Be Incorporated Into an Overall Assessment in a Transparent Way (2 of 2)
• There is no empirical evidence for a difference between assigning grades based on qualitative versus quantitative approaches.
• GRADE approach
  • The ordinal rankings are high → moderate → low → very low.
  • These grades/overall rankings were developed using the eight GRADE domains.
• EPC approach
  • The approach allows greater flexibility in grading.
• Whichever approach is used, the rationale for weighting the domains in assigning summary grades should be described.

Approach to Evidence Grading
Adapted from Agency for Healthcare Research and Quality. Methods Guide for Effectiveness and Comparative Effectiveness Reviews. Available at www.effectivehealthcare.ahrq.gov/methodsguide.cfm.

Evidence Grading Examples: Considering Accuracy Outcomes (1 of 6)
• When grading a body of evidence on diagnostic test accuracy, carefully consider which accuracy outcomes are linked to clinical outcomes.
• The number of false-positive results is important when the diagnostic test is followed by an invasive procedure.
• Positive predictive value, negative predictive value, and likelihood ratios are also important when the test is part of a management strategy.

Evidence Grading Examples: Considering Accuracy Outcomes (2 of 6)
Identifying Norovirus Infection in a Health Care Setting
* Data taken from MacCannell T, Umscheid CA, Agarwal RK, et al. Infect Control Hosp Epidemiol 2011 Oct;32(10):939-69. PMID: 21931246.
† These outcomes were considered the most critical by the guideline developers. Data taken from Turcios RM, Widdowson MA, Sulka AC, et al. Clin Infect Dis 2006 Apr 1;42(7):964-9. PMID: 16511760.
‡ These modifiers can impact the GRADE rating by 1 or 2 points. Data taken from Turcios et al. (2006).
§ If applicable, consider the additional domains of strength of association, dose-response, and impact of plausible confounders.
CON = consistency; DIAG = diagnostic test; DIR = directness; GRADE = Grading of Recommendations Assessment, Development, and Evaluation; NPV = negative predictive value; PPV = positive predictive value; PRE = precision; PUB = publication bias; RB = risk of bias

Evidence Grading Examples: Considering Accuracy Outcomes (3 of 6)
• This example concerns accuracy unrelated to outcomes when the test is part of a treatment versus watchful-waiting algorithm.
• A review was conducted on the use of noninvasive imaging in addition to a standard workup after detection of a breast lesion.
• Studies were intended to triage women into three groups: “biopsy,” “watchful waiting,” or “return to normal screening intervals.”
• Although the accuracy of the exact diagnosis was low, clinical management was the same.
• This finding holds true if the post-test probability does not cross the decision management threshold.
• It shows the need to evaluate the surrogate measure in the context of the clinical outcome.
Bruening W, Uhl S, Fontanarosa J, et al. Comparative Effectiveness Review No. 47. Available at www.ncbi.nlm.nih.gov/books/NBK84530/pdf/TOC.pdf.

Evidence Grading Examples: Considering Accuracy Outcomes (4 of 6)
• Two reviewers should independently score the major outcomes and comparisons within each Key Question.
• Each reviewer should assess each of the following domains to grade each outcome:
  • Directness
  • Precision
  • Consistency
  • Risk of bias
  • Publication bias (as needed)
  • Dose-response association (as needed)
  • Strength of association (as needed)
  • Impact of unmeasured confounders (as needed)
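One way to keep independent dual scoring transparent is to record each reviewer's domain ratings in a simple structure and list the domains where they differ, so those can be taken to discussion or adjudication. The sketch below is only an illustration: the function name, rating labels, and example ratings are hypothetical and are not prescribed by the Methods Guide.

```python
# The four domains every reviewer is expected to score for each outcome.
REQUIRED_DOMAINS = ("risk of bias", "consistency", "directness", "precision")


def find_disagreements(reviewer_a, reviewer_b):
    """Return {domain: (rating_a, rating_b)} for domains rated differently.

    Raises ValueError if either reviewer skipped a required domain.
    """
    for domain in REQUIRED_DOMAINS:
        if domain not in reviewer_a or domain not in reviewer_b:
            raise ValueError(f"missing required domain: {domain}")
    all_domains = sorted(set(reviewer_a) | set(reviewer_b))
    return {
        d: (reviewer_a.get(d), reviewer_b.get(d))
        for d in all_domains
        if reviewer_a.get(d) != reviewer_b.get(d)
    }


# Hypothetical ratings for one outcome (e.g., false-negative results).
reviewer_a = {
    "risk of bias": "medium",
    "consistency": "consistent",
    "directness": "indirect",
    "precision": "precise",
}
reviewer_b = {
    "risk of bias": "medium",
    "consistency": "inconsistent",
    "directness": "indirect",
    "precision": "precise",
    "publication bias": "suspected",  # optional domain scored by one reviewer only
}

to_resolve = find_disagreements(reviewer_a, reviewer_b)
for domain, (a, b) in to_resolve.items():
    print(f"{domain}: reviewer A = {a}, reviewer B = {b}")
```

A record like this also documents which optional domains each reviewer considered, which supports reporting how the consensus grade was reached.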

Evidence Grading Examples: Considering Accuracy Outcomes (5 of 6)
• The outcomes reviewers choose as critical affect how the domains of directness, precision, and consistency are graded.
• In a comparison of multislice coronary computed tomography scans versus coronary angiography:
  • True-positive, true-negative, and false-positive results were considered direct evidence.
    • There was little uncertainty about the clinical implications of these results.
  • False-negative results were considered indirect evidence.
    • There was uncertainty about the clinical implications of these results.
• The overall results were a “low” strength-of-evidence grade for false-negative results and a “moderate” strength-of-evidence grade for other accuracy outcomes.
Schunemann HJ, Oxman AD, Brozek J, et al. BMJ 2008 May 17;336(7653):1106-10. PMID: 18483053.

Evidence Grading Examples: Considering Accuracy Outcomes (6 of 6)
• It is reasonable to consider either:
  • A more flexible qualitative approach to grading
  • A standard ordinal approach (ranging from high to insufficient strength of evidence)
• Resolve differences in domain assessments and outcome grades.
• Describe how the consensus score was reached (e.g., discussion, third-party adjudication).
• Arrive at a single summary grade through transparent and systematic methods.
• Consider the impact of various accuracy outcomes on the overall strength-of-evidence grade.
• Identify which of these outcomes are “key.”

Summary
• The greatest challenge is assessing multiple links in a chain of evidence connecting test performance to changes in clinical outcomes.
• This module focused on diagnostic test performance, a crucial link in the chain.
• No one system for grading the strength of evidence for diagnostic tests has been shown to be superior to any other; many are still early in development.
• Reviews are made more useful to decisionmakers by:
  • Applying a consistent and transparent system of grading using the discussed domains
  • Giving an explicit rationale for the choice of grades based on these domains

Key Messages (1 of 3)
• The GRADE (Grading of Recommendations Assessment, Development, and Evaluation) system can be used for diagnostic tests.
• Clinical outcomes to consider (if available):
  • Effectiveness
  • Harm
• When using intermediate accuracy outcomes, an analytic framework is necessary to:
  • Describe how the test is related to clinical outcomes
  • Delineate the individual questions that can be answered

Key Messages (2 of 3)
• Reviewers can use these domains to grade the strength of evidence of a diagnostic test:
  • Risk of bias
  • Directness
  • Consistency
  • Precision
  • Publication bias
  • Dose-response association
  • Strength of association
  • Plausible unmeasured confounders

Key Messages (3 of 3)
• Consider the test’s clinical context when:
  • Selecting accuracy outcomes (i.e., sensitivity, specificity, positive predictive value, negative predictive value, true-positive/negative outcomes, and false-positive/negative outcomes)
  • Determining the needed precision levels of these outcomes
• Reviewers should consider explaining:
  • The rationale for a particular summary grade
  • Domains that were weighted the most in assigning the grade

Practice Question 1 (1 of 2)
• A summary receiver operating characteristic (ROC) curve can be used to assess consistency of diagnostic accuracy among studies.
  • True
  • False

Practice Question 1 (2 of 2)
Explanation for Question 1: The statement is true. The ROC curve can display the sensitivity and specificity results from various studies. This offers a helpful visual to assess consistency or homogeneity in the direction and magnitude of results across different studies.

Practice Question 2 (1 of 2)
• When a diagnostic test is used to triage patients for other, more invasive tests, there is a direct link between the accuracy of the diagnostic test and clinical outcomes.
  • True
  • False

Practice Question 2 (2 of 2)
Explanation for Question 2: The statement is false. Diagnostic tests are often used in triage or as part of a decision tree. In that case, the accuracy of risk classification is more important than the accuracy of diagnosis. The triage test is not linked directly to the clinical outcome.

Practice Question 3 (1 of 2)
• If the question asks about the effect of testing on clinical outcomes, then a body of evidence that measures sensitivity and specificity as the outcomes should be considered indirect.
  • True
  • False

Practice Question 3 (2 of 2)
Explanation for Question 3: The statement is true. Sensitivity and specificity are considered intermediate outcomes, or accuracy outcomes, that are used to assess diagnostic tests. They are considered indirect evidence of clinical outcomes.

Practice Question 4 (1 of 2)
• When feasible, it is preferable to assign grades using a quantitative approach rather than a qualitative approach.
  • True
  • False

Practice Question 4 (2 of 2)
Explanation for Question 4: The statement is false. There is no empirical evidence of a difference between quantitative and qualitative approaches to assigning grades.

Authors
• This presentation was prepared by Brooke Heidenfelder, Rachael Posey, Lorraine Sease, Remy Coeytaux, Gillian Sanders, and Alex Vaz, members of the Duke University Evidence-based Practice Center.
• The module is based on Singh S, Chang SM, Matchar DB, et al. Grading a body of evidence on diagnostic tests. In: Chang SM and Matchar DB, eds. Methods guide for medical test reviews. Rockville, MD: Agency for Healthcare Research and Quality; June 2012. p. 7.1-15. AHRQ Publication No. 12-EHC017. Available at www.effectivehealthcare.ahrq.gov/medtestsguide.cfm.
