Diagnostic Testing

Ethan Cowan, MD, MS
Department of Emergency Medicine, Jacobi Medical Center
Department of Epidemiology and Population Health, Albert Einstein College of Medicine

The Provider Dilemma • A 26-year-old pregnant female presents after twisting her ankle. She has no abdominal or urinary complaints. The nurse sends a UA and Uricult dipslide before you see the patient. What should you do with the results of these tests?

The Provider Dilemma • Should a provider give antibiotics if either one or both of these tests come back positive?

Why Order a Diagnostic Test? • When the diagnosis is uncertain • Incorrect diagnosis leads to clinically significant morbidity or mortality • Diagnostic test result changes management • Test is cost effective

Clinician Thought Process • Clinician derives patient prior prob. of disease: • H & P • Literature • Experience • “Index of Suspicion” • 0% - 100% • “Low, Med., High”

Threshold Approach to Diagnostic Testing • P < P(-): Dx testing and therapy not indicated • P(-) < P < P(+): Dx testing needed prior to therapy • P > P(+): only intervention needed [Figure: probability of disease axis from 0% to 100%, with the testing zone between the thresholds P(-) and P(+)] Pauker and Kassirer, 1980; Gallagher, 1998

Threshold Approach to Diagnostic Testing • Width of testing zone depends on: • Test properties • Risk of excess morbidity/mortality attributable to the test • Risk/benefit ratio of available therapies for the Dx [Figure: probability of disease axis from 0% to 100%, with the testing zone between the thresholds P(-) and P(+)] Pauker and Kassirer, 1980; Gallagher, 1998
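
The three-zone rule above can be sketched in a few lines of Python. This is only an illustration: the function name, the returned labels, and the example threshold values are assumptions, and a probability that lands exactly on a threshold is placed in the testing zone here because the slide does not say how to treat that case.

```python
def threshold_decision(p, p_neg, p_pos):
    """Classify a pretest probability p against the thresholds P(-) and P(+).

    p_neg is P(-), the lower edge of the testing zone.
    p_pos is P(+), the upper edge of the testing zone.
    All three values are probabilities between 0 and 1.
    """
    if not (0.0 <= p_neg <= p_pos <= 1.0):
        raise ValueError("thresholds must satisfy 0 <= P(-) <= P(+) <= 1")
    if not (0.0 <= p <= 1.0):
        raise ValueError("p must be between 0 and 1")
    if p < p_neg:
        return "no testing or therapy"
    if p > p_pos:
        return "treat without testing"
    # Assumption: values equal to a threshold fall in the testing zone.
    return "test before treating"


# Hypothetical thresholds, chosen only for illustration.
for prior in (0.02, 0.30, 0.85):
    print(prior, threshold_decision(prior, p_neg=0.05, p_pos=0.60))
```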

Test Characteristics • Reliability: inter-observer, intra-observer, correlation, Bland & Altman plot, simple agreement, kappa statistics • Validity: sensitivity, specificity, NPV, PPV, ROC curves

Reliability • The extent to which results obtained with a test are reproducible.

Reliability [Figure: two panels labeled "Not Reliable" and "Reliable"]

Intra-rater reliability • Extent to which a measure produces the same result at different times for the same subjects

Inter-rater reliability • Extent to which a measure produces the same result on each subject regardless of who makes the observation

Correlation (r) • For continuous data • r = 1 perfect • r = 0 none [Figure: scatter plot of O1 against O2 with the line of equality O1 = O2] Bland & Altman, 1986

Correlation (r) • Measures strength of the relation, not agreement • Problem: even near-perfect correlation may conceal significant differences between observations [Figure: scatter plot of O1 against O2 with r = 0.8, shown against the line of equality O1 = O2] Bland & Altman, 1986

Bland & Altman Plot • For continuous data • Plot of the differences between observations (O1 – O2) against their means ([O1 + O2] / 2) • Data that are evenly distributed around 0 and fall within 2 SDs exhibit good agreement [Figure: Bland & Altman plot; y-axis O1 – O2 with ticks at 10, 0, -10; x-axis [O1 + O2] / 2] Bland & Altman, 1986
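
The contrast between correlation and agreement can be made concrete with a small Python sketch. The paired readings below are invented for illustration, the function names are assumptions, and the limits are taken as the mean difference plus or minus 2 SDs of the differences, following the slide's "within 2 SDs" wording.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def bland_altman(o1, o2):
    """Return the quantities plotted on a Bland & Altman plot."""
    diffs = [a - b for a, b in zip(o1, o2)]        # y-axis: O1 - O2
    means = [(a + b) / 2 for a, b in zip(o1, o2)]  # x-axis: [O1 + O2] / 2
    bias = mean(diffs)                             # mean difference
    sd = stdev(diffs)                              # sample SD of the differences
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "lower": bias - 2 * sd,
        "upper": bias + 2 * sd,
    }


# Hypothetical paired observations: observer 2 reads about 10 units higher.
o1 = [100, 110, 120, 130, 140]
o2 = [108, 121, 129, 142, 150]

result = bland_altman(o1, o2)
print("r =", round(pearson_r(o1, o2), 3))
print("bias =", result["bias"])
print("limits =", round(result["lower"], 2), round(result["upper"], 2))
```

In this made-up data set r is above 0.99, yet every difference lies well below 0, which is the pattern the correlation slide warns about.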

Simple Agreement • Extent to which two or more raters agree on the classifications of all subjects • % of concordance in the 2 x 2 table: (a + d) / N • Not ideal: subjects may fall on the diagonal by chance

                  Rater 2
                  -        +        total
Rater 1   -       a        b        a + b
          +       c        d        c + d
      total       a + c    b + d    N

Kappa • The proportion of the best possible improvement in agreement beyond chance obtained by the observers • $K = (p_a - p_0) / (1 - p_0)$ • $p_a = (a + d) / N$ (prop. of subjects along the main diagonal) • $p_0 = [(a + b)(a + c) + (c + d)(b + d)] / N^2$ (expected prop.) • a, b, c, d and N are the cells and total of the 2 x 2 table of Rater 1 against Rater 2
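
As a quick check on the formulas above, the following Python sketch computes simple agreement and kappa from the four cells of the rater table. The function names and the example counts are assumptions made for illustration.

```python
def simple_agreement(a, b, c, d):
    """Proportion of subjects on the main diagonal: (a + d) / N."""
    return (a + d) / (a + b + c + d)


def kappa(a, b, c, d):
    """K = (p_a - p_0) / (1 - p_0) for a 2 x 2 rater table."""
    n = a + b + c + d
    p_a = (a + d) / n                                       # observed agreement
    p_0 = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2  # expected by chance
    if p_0 == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_a - p_0) / (1 - p_0)


# Hypothetical counts: 100 subjects classified by two raters.
a, b, c, d = 40, 10, 5, 45
print("simple agreement =", simple_agreement(a, b, c, d))
print("kappa =", round(kappa(a, b, c, d), 2))
```

With these invented counts, 85% raw agreement corresponds to a kappa of about 0.7 once the 50% agreement expected by chance is removed.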

Interpreting Kappa Values

K = 1              Perfect
K > 0.80           Excellent
0.60 < K < 0.80    Good
0.40 < K < 0.60    Fair
0 < K < 0.40       Poor
K = 0              Chance ($p_a = p_0$)
K < 0              Less than chance

Weighted Kappa • Used for more than 2 observers or categories • Perfect agreement on the main diagonal weighted more than partial agreement off of it.

                  Rater 2
                  1      2      ...    C      total
Rater 1   1       n11    n12    ...    n1C    n1.
          2       n21    n22    ...    n2C    n2.
          ...     ...    ...    ...    ...    ...
          C       nC1    nC2    ...    nCC    nC.
      total       n.1    n.2    ...    n.C    N
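
The slide does not give a weighting scheme, so the sketch below assumes the common linear disagreement weights |i - j| / (C - 1) and two raters using C ordered categories. The function name and the example tables are likewise assumptions.

```python
def weighted_kappa(table):
    """Linearly weighted kappa for a C x C table of two raters.

    table[i][j] is the number of subjects placed in category i by one
    rater and category j by the other. Disagreement weights are assumed
    linear: 0 on the main diagonal, rising to 1 in the far corners.
    """
    c = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(c)) for j in range(c)]

    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(c):
        for j in range(c):
            w = abs(i - j) / (c - 1)
            observed += w * table[i][j] / n
            expected += w * row_tot[i] * col_tot[j] / n ** 2
    return 1 - observed / expected


# Hypothetical 3-category table (e.g. low / medium / high).
ratings = [
    [20, 5, 0],
    [4, 30, 6],
    [1, 4, 30],
]
print("weighted kappa =", round(weighted_kappa(ratings), 3))
```

For a 2 x 2 table these weights reduce to ordinary kappa, which gives a simple sanity check on the implementation.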

Validity • The degree to which a test correctly diagnoses people as having or not having a condition • Internal Validity • External Validity

Validity [Figure: two panels labeled "Valid, not reliable" and "Reliable and Valid"]

Internal Validity • Performance Characteristics • Sensitivity • Specificity • NPV • PPV • ROC Curves

2 x 2 Table

                      Disease Status
                      cases    noncases    total
Test Result   +       TP       FP          positives
              -       FN       TN          negatives
          total       cases    noncases    N

TP = True Positives, FP = False Positives, FN = False Negatives, TN = True Negatives

Gold Standard • Definitive test used to identify cases • Example: traditional agar culture • The dipstick and dipslide are measured against the gold standard

Sensitivity (SN) • Probability of correctly identifying a true case • SN = TP / (TP + FN) = TP / cases • High SN: a negative test result rules out the Dx (SnNout) Sackett & Straus, 1998

Specificity (SP) • Probability of correctly identifying a true noncase • SP = TN / (TN + FP) = TN / noncases • High SP: a positive test result rules in the Dx (SpPin) Sackett & Straus, 1998

Problems with Sensitivity and Specificity • Remain constant over patient populations • But, SN and SP convey how likely a test result is positive or negative given the patient does or does not have disease • Paradoxical inversion of clinical logic • Prior knowledge of disease status obviates need of the diagnostic test Gallagher, 1998

Positive Predictive Value (PPV) • Probability that a subject labeled (+) is a true case • PPV = TP / (TP + FP) = TP / total positives • High SP corresponds to very high PPV (SpPin) Sackett & Straus, 1998

Negative Predictive Value (NPV) • Probability that a subject labeled (-) is a true noncase • NPV = TN / (TN + FN) = TN / total negatives • High SN corresponds to very high NPV (SnNout) Sackett & Straus, 1998
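
The four performance characteristics defined above all come from the same 2 x 2 table, which a short Python sketch makes explicit. The function name and the example counts are hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute SN, SP, PPV and NPV from the cells of a 2 x 2 table."""
    return {
        "SN": tp / (tp + fn),   # TP / cases
        "SP": tn / (tn + fp),   # TN / noncases
        "PPV": tp / (tp + fp),  # TP / total positives
        "NPV": tn / (tn + fn),  # TN / total negatives
    }


# Hypothetical counts: 100 cases and 100 noncases.
metrics = diagnostic_metrics(tp=90, fp=20, fn=10, tn=80)
for name, value in metrics.items():
    print(name, round(value, 3))
```

SN and SP are computed down the columns of the table (by disease status), while PPV and NPV are computed across the rows (by test result).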

Predictive Value Problems • Vulnerable to disease prevalence (P) shifts • Do not remain constant over patient populations • As P increases, PPV increases and NPV decreases • As P decreases, PPV decreases and NPV increases Gallagher, 1998

Flipping a Coin to Dx AMI for People with Chest Pain • ED AMI prevalence 6% • SN = 3 / 6 = 50% • SP = 47 / 94 = 50% • PPV = 3 / 50 = 6% • NPV = 47 / 50 = 94% Worster, 2002

Flipping a Coin to Dx AMI for People with Chest Pain • CCU AMI prevalence 90% • SN = 45 / 90 = 50% • SP = 5 / 10 = 50% • PPV = 45 / 50 = 90% • NPV = 5 / 50 = 10% Worster, 2002
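
The two coin-flip slides can be reproduced from SN, SP and prevalence alone using Bayes' rule. The sketch below is one way to write that down; the function name is an assumption, and the prevalence values are the ED and CCU figures from the slides.

```python
def predictive_values(sn, sp, prevalence):
    """Return (PPV, NPV) for a test with given SN and SP at a prevalence."""
    p = prevalence
    true_pos = sn * p               # cases that test positive
    false_pos = (1 - sp) * (1 - p)  # noncases that test positive
    true_neg = sp * (1 - p)         # noncases that test negative
    false_neg = (1 - sn) * p        # cases that test negative
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# A coin flip has SN = SP = 50% in any population.
for setting, prev in (("ED", 0.06), ("CCU", 0.90)):
    ppv, npv = predictive_values(sn=0.5, sp=0.5, prevalence=prev)
    print(setting, "PPV =", round(ppv, 2), "NPV =", round(npv, 2))
```

The test itself never changes, yet the NPV looks excellent in the ED and the PPV looks excellent in the CCU, purely because of prevalence.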

Receiver Operating Characteristic (ROC) Curve • Allows consideration of test performance across a range of threshold values • Well suited for continuous-variable Dx tests [Figure: ROC plot of sensitivity (TPR) against 1 - specificity (FPR), both axes running from 0.0 to 1.0]

Receiver Operating Characteristic (ROC) Curve • Avoids the “single cutoff trap” [Figure: labels "Sepsis", "Effect" and "No Effect"; axis "WBC Count"] Gallagher, 1998

Area Under the Curve (θ) • Measure of test accuracy • θ 0.5 – 0.7: no to low discriminatory power • θ 0.7 – 0.9: moderate discriminatory power • θ > 0.9: high discriminatory power [Figure: ROC plot of sensitivity (TPR) against 1 - specificity (FPR), both axes running from 0.0 to 1.0] Grzybowski, 1997
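
To illustrate how an ROC curve and its area are built from a continuous test, the sketch below sweeps every observed value as a cutoff and integrates with the trapezoid rule. The WBC values are invented, the function names are assumptions, and a result at or above the cutoff is assumed to count as a positive test.

```python
def roc_points(case_values, noncase_values):
    """Return (FPR, TPR) points, one per candidate cutoff, from (0, 0) to (1, 1)."""
    cutoffs = sorted(set(case_values) | set(noncase_values), reverse=True)
    points = [(0.0, 0.0)]  # cutoff above every value: nobody tests positive
    for cutoff in cutoffs:
        tpr = sum(v >= cutoff for v in case_values) / len(case_values)
        fpr = sum(v >= cutoff for v in noncase_values) / len(noncase_values)
        points.append((fpr, tpr))
    return points


def area_under_curve(points):
    """Trapezoid-rule area under a list of (FPR, TPR) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical WBC counts (thousands) in patients with and without disease.
cases = [14, 16, 12, 18]
noncases = [8, 10, 12, 9]

curve = roc_points(cases, noncases)
theta = area_under_curve(curve)
print("ROC points:", curve)
print("AUC =", theta)
```

Each point on the curve is the SN and 1 - SP pair for one cutoff, so the curve summarizes the whole family of 2 x 2 tables rather than a single one.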

Problem with ROC curves • Same problems as SN and SP “Reverse Logic” • Mainly used to describe Dx test performance

Appendicitis Example • Study design: • Prospective cohort • Gold standard: • Pathology report from appendectomy or CT finding (negatives) • Diagnostic test: • Total WBC [Figure: flowchart with nodes "Physical Exam", "OR", "CT Scan", "No Appy" and "Appy" joined by + and - branches] Cardall, 2004

Appendicitis Example • SN 76% (65%-84%) • SP 52% (45%-60%) • PPV 42% (35%-51%) • NPV 82% (74%-89%) Cardall, 2004

Appendicitis Example • Patient WBC: • 13,000 • Management: • Get CT with PO & IV contrast [Figure: flowchart with nodes "Physical Exam", "OR", "CT Scan", "No Appy" and "Appy" joined by + and - branches] Cardall, 2004

Abdominal CT

Follow-Up • CT result: acute appendicitis • Patient taken to OR for appendectomy

But, was WBC necessary? Answer given in talk on Likelihood Ratios
