GRADE approach to diagnostic tests and strategies

Holger Schünemann, MD, PhD
Professor and Chair, Dept. of Clinical Epidemiology & Biostatistics
Professor of Medicine
Michael Gent Chair in Healthcare Research
McMaster University, Hamilton, Canada
GRADE approach to diagnostic tests and strategies
November 09, 2011

Disclosure • Co-chair GRADE Working Group • Work with various guideline groups using GRADE • American College of Physicians (ACP) Clinical Practice Guidelines Committee • American College of Chest Physicians (ACCP) • World Health Organization: Advisory Committee for Health Research, Expert Advisory Panel on Clinical Practice Guidelines and Clinical Research Methods and Ethics & Chair of various guideline panels; funding for guideline development • No direct/personal for profit payments

Agenda
09.00 - 10.00 Welcome and introduction: Formulating a strategy for developing clinical-diagnostic guidelines: GRADE, T. Trenti (Modena), S. Pregno (Modena)
10.00 - 11.00 Large group session and discussion, H. Schünemann (McMaster University, Ontario, Canada): The GRADE approach: Introduction
11.00 - 11.15 Break
11.15 - 12.00 Large group session and discussion: The GRADE approach to diagnostic tests and strategies
12.00 - 12.30 Small group session: Asking a question, specifying outcomes
12.30 - 13.30 Lunch
13.30 - 15.15 Small group session: Grading quality of evidence and making recommendations: review of SoF tables
15.15 - 15.30 Break
15.30 - 16.00 Large group session: Discussion, questions, feedback
16.00 - 17.30 Round table discussion: What should we do in our own setting (departments, health authorities, wide areas (Aree vaste), national and international collaborations) to translate appropriateness into clinical practice? Moderators: A. Liberati (Modena), M. Plebani (Padova). Participants: G. Longo (Modena), G. Pinelli (Modena), H. Schünemann (Canada), Massimo Gion (Venezia), L. Casolari (Modena)
Conclusion: G. Caroli (Modena)

The Department of Clinical Epidemiology & Biostatistics
History • 1967: Founded by David Sackett • 6 chairs since • Instrumental in specialty of Clinical Epidemiology, INCLEN • Birthplace of “Evidence-Based Medicine”
People • 45 full time and joint faculty • ~ 120 associate & part time faculty; 19 emeritus • ~ 180 staff • ~ 200 PhD and Master students (3 programs)
Faculty of Health Sciences amongst top 50 in the world

The GRADE approach: Introduction • Large group session and Discussion: The GRADE approach to diagnostic tests and strategies • Quality criteria • Outcome selection • Recommendations • Directness

Today’s presentations The GRADE approach: an introduction The GRADE approach to diagnostic tests and strategies: QoE Evidence to action = recommendations

GRADE process overview (flowchart)
Systematic review
• Formulate question (P I C O)
• Select outcomes and rate importance, e.g.: Outcome 1 critical, Outcome 2 critical, Outcome 3 important, Outcome 4 not important
• Outcomes across studies: create evidence profile with GRADEpro; summary of findings & estimate of effect for each outcome
• Rate quality of evidence for each outcome: High ⊕⊕⊕⊕, Moderate ⊕⊕⊕O, Low ⊕⊕OO, Very low ⊕OOO
• Grade down: risk of bias, inconsistency, indirectness, imprecision, publication bias
• Grade up: large effect, dose response, confounders
• Grade overall quality of evidence across outcomes based on lowest quality of critical outcomes
Guideline development (panel; input?)
• Grade recommendations • For or against (direction) • Strong or conditional/weak (strength)
• By considering balance of: • Quality of evidence • Balance benefits/harms • Values and preferences
• Revise if necessary by considering: • Resource use (cost)
• Formulate recommendations:
• “We recommend using…” | “Clinicians should…”
• “We suggest using…” | “Clinicians might…”
• “We suggest not using…” | “Clinicians … not…”
• “We recommend not using…” | “Clinicians should not…”
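The rule in the flowchart that overall quality across outcomes follows the lowest quality among the critical outcomes can be sketched in a few lines. This is a minimal illustration, not GRADEpro's implementation; the function name `overall_quality`, the dictionary layout, and the example outcomes are assumptions made for this sketch.

```python
# Quality levels ordered from lowest to highest (hypothetical encoding).
LEVELS = ["very low", "low", "moderate", "high"]


def overall_quality(outcomes):
    """Return the lowest quality rating among outcomes rated 'critical'."""
    critical = [o for o in outcomes if o["importance"] == "critical"]
    if not critical:
        raise ValueError("at least one critical outcome is required")
    # min() with LEVELS.index as key picks the lowest-ranked level.
    return min((o["quality"] for o in critical), key=LEVELS.index)


# Illustrative outcomes mirroring the four rows in the flowchart.
example_outcomes = [
    {"name": "Outcome 1", "importance": "critical", "quality": "high"},
    {"name": "Outcome 2", "importance": "critical", "quality": "moderate"},
    {"name": "Outcome 3", "importance": "important", "quality": "low"},
    {"name": "Outcome 4", "importance": "not important", "quality": "very low"},
]

overall = overall_quality(example_outcomes)
print(overall)
```

Only the two critical outcomes count here, so the lower ratings of the important and not-important outcomes do not pull the overall rating down.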

BMJ, 17 May 2008, Volume 336, pages 1106-1110

GRADE for diagnosis • Shares the fundamental logic of assessment for treatment • However, assessments present unique challenges • Examples and solutions for how to deal with challenges • Focus on importance to patients and consumers!

Testing makes a variety of contributions to patient care • Clinicians use tests that are usually referred to as “diagnostic” • signs and symptoms, imaging, biochemistry, pathology, and psychological • Some tests naturally report positive and negative results (pregnancy) • Other tests report their results in categories (e.g. imaging) • Today we assume a diagnostic approach that ultimately categorizes test results as positive or negative

Huntington’s chorea

Scott Redford • 18 years • 33 years • © John D Badger

sensitivity = 98.8%, specificity = 100%, pre-test probability in children: 50%
               disease present   disease absent
DNA test +           494                 0
DNA test –             6               500
Total                500               500
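The accuracy figures on this slide follow directly from the four cell counts of the 2x2 table. The sketch below recomputes them; the function name `accuracy_from_2x2` and the returned keys are illustrative choices, and the predictive values are an addition that applies only at the 50% pre-test probability shown on the slide.

```python
def accuracy_from_2x2(tp, fp, fn, tn):
    """Compute basic accuracy measures from the cells of a 2x2 table."""
    return {
        # Proportion of diseased people with a positive test.
        "sensitivity": tp / (tp + fn),
        # Proportion of non-diseased people with a negative test.
        "specificity": tn / (tn + fp),
        # Proportion of positive tests that are true positives.
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
        # Proportion of negative tests that are true negatives.
        "npv": tn / (tn + fn) if (tn + fn) else float("nan"),
        # Pre-test probability implied by the table margins.
        "prevalence": (tp + fn) / (tp + fp + fn + tn),
    }


# Cell counts taken from the DNA test table on the slide.
result = accuracy_from_2x2(tp=494, fp=0, fn=6, tn=500)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these counts the sensitivity is 494/500 = 98.8% and the specificity is 500/500 = 100%, matching the slide.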

longer life • fewer symptoms • fewer complications • better quality of life • sensitivity • specificity

no prevention no treatment

Would you recommend genetic testing for children?

Types of questions Background Questions Definition: What is NT-pro-BNP? Mechanism: How does the ECLIA work? Foreground Questions What is the test accuracy? Benefit > harm: In patients suspected of CHF, does use of NT-pro-BNP compared with not using the test improve survival, …?

[Figure: possible roles of a new test relative to the current test: triage, replacement, add-on. P. Bossuyt et al. BMJ 2006]

Purpose of a test • Triage • to minimize use of an invasive or expensive test • Replacement • to replace a test that is, e.g., harmful or costly • Add-on • to improve diagnosis beyond what is already done Bossuyt et al. BMJ 2006

Test accuracy is a surrogate for patient important outcomes • When clinicians think about diagnostic tests, they focus on their accuracy • Underlying assumption: obtaining a better idea of whether a target condition is present or absent will result in superior patient management and improved outcome.

Study designs for diagnosis • If a test fails to improve important outcomes: no reason to use it, whatever its accuracy • Best way to assess diagnostic strategy: randomized controlled trial in which investigators focus on patient-important outcomes

Study designs I Focus on: mortality, morbidity, symptoms, and quality of life GRADE approach for treatment or intervention

Study designs II

Accuracy (sensitivity & specificity) and patient-important consequences • TP (treated…) • TN (reassured…) • FP (needlessly treated…) • FN (not treated...) • Inconclusive results • Complications with tests • Cost (resource use)
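To link accuracy to the consequences listed on this slide, sensitivity and specificity can be turned into expected numbers of TP, FN, TN and FP in a group of tested patients. The sketch below assumes a group of 1000 people and reuses the figures from the DNA test example earlier in the talk; the function name `consequences_per_1000` and the rounding to whole patients are choices made for illustration.

```python
def consequences_per_1000(sensitivity, specificity, prevalence, n=1000):
    """Expected TP, FN, TN and FP counts in n tested people."""
    diseased = n * prevalence
    healthy = n - diseased
    tp = sensitivity * diseased      # correctly identified, can be treated
    fn = diseased - tp               # missed, not treated
    tn = specificity * healthy       # correctly reassured
    fp = healthy - tn                # needlessly labelled or treated
    # Round to whole patients for presentation.
    return {"TP": round(tp), "FN": round(fn), "TN": round(tn), "FP": round(fp)}


# Sensitivity 98.8%, specificity 100%, pre-test probability 50%.
counts = consequences_per_1000(sensitivity=0.988, specificity=1.0, prevalence=0.5)
print(counts)
```

Each count can then be paired with what happens to those patients (treated, reassured, needlessly treated, not treated) when judging whether the test improves patient-important outcomes.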

longer life • fewer symptoms • fewer complications • better quality of life • surrogate: sensitivity, specificity

Today’s presentations The GRADE approach: an introduction The GRADE approach to diagnostic tests and strategies: QoE Evidence to action = recommendations

GRADE: recommendation – quality of evidence
Clear separation:
1) Recommendation: 2 grades: weak/conditional/optional or strong (for or against an intervention) • Balance of benefits and downsides, values and preferences, resource use and quality of evidence
2) 4 categories of quality of evidence: ⊕⊕⊕⊕ (High), ⊕⊕⊕O (Moderate), ⊕⊕OO (Low), ⊕OOO (Very low) • methodological quality of evidence • likelihood of bias • by outcome and across outcomes
*www.GradeWorking-Group.org

GRADE Quality of Evidence In the context of making recommendations: • The quality of evidence reflects the extent of our confidence that the estimates of an effect are adequate to support a particular decision or recommendation.

Likelihood of and confidence in an outcome

Confidence in evidence
• A / High: We are very confident that the true effect lies close to that of the effect estimate.
• B / Moderate: We are moderately confident in the effect estimate: the true effect is likely to be close to the obtained estimate, but there is a possibility that it is substantially different.
• C / Low: Our confidence in the effect estimate is low: the true effect may be substantially different from the observed estimate of the effect.
• D / Very low: We have very low confidence in the effect estimate: the true effect is likely to be substantially different from the observed estimate of effect.

Definition of grades of evidence for research
• A / High: Further research is very unlikely to change confidence in the observed estimate of effect.
• B / Moderate: Further research can have an important impact on confidence in the estimate of effect and may change the estimate.
• C / Low: Further research is very likely to have an important impact on confidence in the estimate of the observed effect and is likely to change the estimate.
• D / Very low: Further research will have an important impact on the confidence in the estimate of effect and is extremely likely to change the observed estimate. Any estimate of effect is very uncertain.

Determinants of quality for diagnostic questions • RCTs and direct observational studies:  • 5 factors that can lower quality • limitations in detailed design and execution (risk of bias criteria) • Inconsistency (or heterogeneity) • Indirectness (PICO and applicability) • Imprecision (number of events and confidence intervals) • Publication bias • Theoretically 3 factors can increase quality • large magnitude of effect • all plausible residual confounding and bias may be working to reduce the demonstrated effect or increase the effect if no effect was observed • dose-response gradient
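The way these factors move a rating can be sketched as simple bookkeeping: begin at a starting level, move down for each factor judged serious or very serious, move up for each upgrading factor, and stay within the four categories. This is an illustrative sketch and not an official GRADE algorithm; the function name, the factor keys, the convention of 1 step for "serious" and 2 for "very serious", and passing the starting level in as an argument are all assumptions of this example. Real ratings rest on panel judgment rather than arithmetic.

```python
LEVELS = ["very low", "low", "moderate", "high"]

# Factor names taken from the slide; the key spellings are hypothetical.
DOWNGRADE_FACTORS = {
    "risk_of_bias", "inconsistency", "indirectness",
    "imprecision", "publication_bias",
}
UPGRADE_FACTORS = {"large_effect", "residual_confounding", "dose_response"}


def rate_quality(start, downgrades=None, upgrades=None):
    """Apply downgrades and upgrades (in levels) to a starting rating."""
    score = LEVELS.index(start)
    for factor, steps in (downgrades or {}).items():
        if factor not in DOWNGRADE_FACTORS:
            raise ValueError(f"unknown downgrade factor: {factor}")
        if steps not in (0, 1, 2):  # 1 = serious, 2 = very serious (assumed)
            raise ValueError("downgrade steps must be 0, 1 or 2")
        score -= steps
    for factor, steps in (upgrades or {}).items():
        if factor not in UPGRADE_FACTORS:
            raise ValueError(f"unknown upgrade factor: {factor}")
        if steps not in (0, 1, 2):
            raise ValueError("upgrade steps must be 0, 1 or 2")
        score += steps
    # Keep the result inside the four GRADE categories.
    return LEVELS[max(0, min(score, len(LEVELS) - 1))]


rating = rate_quality("high", downgrades={"risk_of_bias": 1, "indirectness": 1})
print(rating)
```

In this example, serious risk of bias and serious indirectness each cost one level, so evidence that starts at high ends at low.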

Factors that decrease the quality of evidence (and how they differ from treatment approach) • risk of bias • publication bias • inconsistency • imprecision • indirectness • TP FP FN TN

1. Risk of bias or study limitations • Study design • Different quality criteria for accuracy studies • Valid accuracy studies: • Diagnostic uncertainty • Consecutive patients • Evaluators should be blinded • QUADAS-1

Risk of bias/study limitations

QUADAS-2

Risk of bias: Example of a risk of bias assessment using QUADAS-1 criteria (modified from Steingart KR, et al. Commercial serological tests for the diagnosis of active pulmonary and extrapulmonary tuberculosis: an updated systematic review and meta-analysis. PLoS Med 2011;8(8):e1001062.)

Would you downgrade for risk of bias? • No, there are no serious limitations • Yes, there are serious limitations • Yes, there are very serious limitations

Risk of bias (example)

2. Publication bias • A high risk of publication bias (e.g. evidence only from small studies showing excellent properties, or asymmetry in a funnel plot) can lower the quality of evidence. • Various methods to evaluate: none perfect, but clearly a problem • Example of a judgment: “It is prudent to assume some degree of publication bias, as studies showing poor performance of serological tests may have been less likely to be published, especially because several studies were industry supported.”

Publication bias (example)

3. Inconsistency • Similar quality criteria & judgments but: other measures • I²? • P-value for test of heterogeneity • Overlap in CI • Difference in estimates
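For readers unfamiliar with the I² statistic listed above, the sketch below computes it from Cochran's Q using inverse-variance weights: I² = max(0, (Q − df)/Q) × 100, with df equal to the number of studies minus one. The study values are hypothetical, the function name is illustrative, and the scale of the estimates (for example logit sensitivity) is left to the caller. The question mark on the slide is a reminder that the usefulness of I² for accuracy data is open to debate.

```python
def i_squared(estimates, std_errors):
    """I-squared (in percent) from study estimates and their standard errors."""
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two studies with matching inputs")
    # Inverse-variance weights.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Two hypothetical studies with clearly different estimates.
heterogeneity = i_squared([0.70, 0.90], [0.05, 0.05])
print(f"I-squared: {heterogeneity:.1f}%")
```

With these two hypothetical studies Q is about 8 on 1 degree of freedom, giving an I² of about 87.5%, while identical estimates give 0%.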

Unexplained heterogeneity? • PICO • Threshold levels (I)
