Multiple Tests, Multivariable Decision Rules, and Studies of Diagnostic Test Accuracy

Chapter 8 – Multiple Tests and Multivariable Decision Rules
Chapter 5 – Studies of Diagnostic Test Accuracy
Michael A. Kohn, MD, MPP
10/19/2006

Outline of Topics
• Combining results of multiple tests: importance of test non-independence
• Recursive Partitioning
• Logistic Regression
• Published “rules” for combining test results: importance of validation separate from derivation
• Biases in studies of diagnostic test accuracy
  Overfitting bias
  Incorporation bias
  Referral bias
  Double gold standard bias
  Spectrum bias

Warning: Different Example
Example of combining two tests in this talk: Prenatal sonographic Nuchal Translucency (NT) and Nasal Bone Exam (NBE) as dichotomous tests for Trisomy 21*
Example of combining two tests in book**: Premature birth (GA < 36 weeks) and low birth weight (BW < 2500 grams) as dichotomous tests for neonatal morbidity
*Cicero, S., G. Rembouskos, et al. (2004). "Likelihood ratio for trisomy 21 in fetuses with absent nasal bone at the 11-14-week scan." Ultrasound Obstet Gynecol 23(3): 218-23.
**Soon to be replaced

If NT ≥ 3.5 mm: Positive for Trisomy 21*
*What’s wrong with this definition?

In general, don’t make multi-level tests like NT into dichotomous tests by choosing a fixed cutoff • I did it here to make the discussion of multiple tests easier • I arbitrarily chose to call ≥ 3.5 mm positive

One Dichotomous Test

Nuchal Translucency    Trisomy 21 D+    D-      LR
≥ 3.5 mm               212              478     7.0
< 3.5 mm               121              4745    0.4
Total                  333              5223

LR = 7.0: do you see that this is (212/333)/(478/5223)?
Review of Chapter 3: What are the sensitivity, specificity, PPV, and NPV of this test? (Be careful.)

Nuchal Translucency
• Sensitivity = 212/333 = 64%
• Specificity = 4745/5223 = 91%
• Prevalence = 333/(333 + 5223) = 6% (Study population: pregnant women about to undergo CVS, so high prevalence of Trisomy 21)
• PPV = 212/(212 + 478) = 31%
• NPV = 4745/(121 + 4745) = 97.5%*
* Not that great; prior to test P(D-) = 94%
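The arithmetic on this slide can be checked with a short sketch. The function name `two_by_two` and its argument names are illustrative choices, not from the talk; the counts are the NT counts from the table.

```python
def two_by_two(tp, fp, fn, tn):
    """Summary measures for a dichotomous test from its 2x2 counts."""
    d_pos = tp + fn          # all diseased (D+)
    d_neg = fp + tn          # all non-diseased (D-)
    sens = tp / d_pos
    spec = tn / d_neg
    return {
        "sensitivity": sens,
        "specificity": spec,
        "prevalence": d_pos / (d_pos + d_neg),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sens / (1 - spec),        # P(T+|D+) / P(T+|D-)
        "lr_neg": (1 - sens) / spec,        # P(T-|D+) / P(T-|D-)
    }

# NT >= 3.5 mm as the positive result, counts from the table
nt = two_by_two(tp=212, fp=478, fn=121, tn=4745)
for name, value in nt.items():
    print(f"{name}: {value:.3f}")
```

PPV and NPV computed this way apply only at the study's 6% prevalence, which is why the slide says to be careful.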

Clinical Scenario – One Test
Pre-Test Probability of Down’s = 6%
NT Positive
Pre-test prob: 0.06
Pre-test odds: 0.06/0.94 = 0.064
LR(+) = 7.0
Post-Test Odds = Pre-Test Odds x LR(+) = 0.064 x 7.0 = 0.44
Post-Test prob = 0.44/(0.44 + 1) = 0.31
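The same probability-to-odds-and-back steps, as a minimal sketch. The helper names are illustrative, not from the talk.

```python
def prob_to_odds(p):
    return p / (1 - p)

def odds_to_prob(odds):
    return odds / (1 + odds)

def post_test_prob(pre_test_prob, lr):
    """Pre-test probability -> odds, multiply by the LR, back to probability."""
    return odds_to_prob(prob_to_odds(pre_test_prob) * lr)

pre_odds = prob_to_odds(0.06)          # about 0.064
post_odds = pre_odds * 7.0             # about 0.44 (slide rounds to two places)
p_after_nt = post_test_prob(0.06, 7.0)
print(f"pre-test odds {pre_odds:.3f}, post-test odds {post_odds:.2f}, "
      f"post-test prob {p_after_nt:.2f}")
```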

Clinical Scenario – One Test
Using Probabilities: Pre-Test Probability of Tri21 = 6%; NT Positive; Post-Test Probability of Tri21 = 31%
Using Odds: Pre-Test Odds of Tri21 = 0.064; NT Positive (LR = 7.0); Post-Test Odds of Tri21 = 0.44

Clinical Scenario – One Test
Pre-Test Probability of Tri21 = 6%
NT Positive
[Figure: log-odds scale, with Log(Odds) from -2 to 1, Odds from 1:100 to 10:1, and Prob from 0.01 to 0.91. The NT + arrow (LR = 7.0) moves from Odds = 0.064 (Prob = 0.06) to Odds = 0.44 (Prob = 0.31).]

Nasal Bone Seen NBE Negative for Trisomy 21 Nasal Bone Absent NBE Positive for Trisomy 21

Second Dichotomous Test

Nasal Bone    Tri21+    Tri21-    LR
Absent        229       129       27.8
Present       104       5094      0.32
Total         333       5223

LR = 27.8: do you see that this is (229/333)/(129/5223)?

Clinical Scenario – Two Tests
Using Probabilities
Pre-Test Probability of Trisomy 21 = 6%
NT Positive for Trisomy 21 (≥ 3.5 mm)
Post-NT Probability of Trisomy 21 = 31%
NBE Positive for Trisomy 21 (no bone seen)
Post-NBE Probability of Trisomy 21 = ?

Clinical Scenario – Two Tests
Using Odds
Pre-Test Odds of Tri21 = 0.064
NT Positive (LR = 7.0)
Post-Test Odds of Tri21 = 0.44
NBE Positive (LR = 27.8?)
Post-Test Odds of Tri21 = 0.44 x 27.8? = 12.4? (P = 12.4/(1 + 12.4) = 92.5%?)

Clinical Scenario – Two Tests
Pre-Test Probability of Trisomy 21 = 6%
NT ≥ 3.5 mm AND Nasal Bone Absent
[Figure: log-odds scale, with Log(Odds) from -2 to 1, Odds from 1:100 to 10:1, and Prob from 0.01 to 0.91. The NT + arrow (LR = 6.96) and the NBE + arrow (LR = 27.8) are laid end to end ("Can we do this?"), moving from Odds = 0.064 (Prob = 0.06) through Odds = 0.44 (Prob = 0.31) to Odds = 12.4 (Prob = 0.925).]

Question Can we use the post-test odds after a positive Nuchal Translucency as the pre-test odds for the positive Nasal Bone Examination? i.e., can we combine the positive results by multiplying their LRs? LR(NT+, NBE +) = LR(NT +) x LR(NBE +) ? = 7.0 x 27.8 ? = 194 ?

Answer = No Not 194

Non-Independence Absence of the nasal bone does not tell you as much if you already know that the nuchal translucency is ≥ 3.5 mm.

Clinical Scenario Using Odds
Pre-Test Odds of Tri21 = 0.064
NT+/NBE+ (LR = 68.8)
Post-Test Odds = 0.064 x 68.8 = 4.40 (P = 4.40/(1 + 4.40) = 81%, not 92.5%)
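A small sketch comparing the two calculations: multiplying the individual LRs (which wrongly assumes independence) versus using the LR measured for the combination NT+/NBE+. Function and variable names are illustrative.

```python
def post_test_prob(pre_test_prob, lr):
    odds = pre_test_prob / (1 - pre_test_prob) * lr
    return odds / (1 + odds)

PRE_TEST_PROB = 0.06
LR_NT_POS = 7.0
LR_NBE_POS = 27.8
LR_BOTH_POS = 68.8   # measured for the combination NT+/NBE+

naive = post_test_prob(PRE_TEST_PROB, LR_NT_POS * LR_NBE_POS)  # assumes independence
actual = post_test_prob(PRE_TEST_PROB, LR_BOTH_POS)

print(f"multiplying LRs: {naive:.3f}")
print(f"combined LR:     {actual:.3f}")
```

Multiplying the LRs overstates the post-test probability (about 92.5% instead of about 81%).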

Non-Independence
[Figure: log-odds scale, with Log(Odds) from -2 to 1, Odds from 1:100 to 10:1, and Prob from 0.01 to 0.91. If the tests were independent, the NT + and NBE + arrows would add end to end. Since the tests are dependent, the combined NT +/NBE + arrow is shorter and ends at Prob = 0.81.]

Non-Independence of NT and NBE Apparently, even in chromosomally normal fetuses, enlarged NT and absence of the nasal bone are associated. A false positive on the NT makes a false positive on the NBE more likely. Of normal (D-) fetuses with NT < 3.5 mm only 2.0% had nasal bone absent. Of normal (D-) fetuses with NT ≥ 3.5 mm, 7.5% had nasal bone absent. Some (but not all) of this may have to do with ethnicity. In this London study, chromosomally normal fetuses of “Afro-Caribbean” ethnicity had both larger NTs and more frequent absence of the nasal bone. In Trisomy 21 (D+) fetuses, normal NT was associated with the presence of the nasal bone, so a false negative on the NT was associated with a false negative on the NBE.
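One way to quantify the non-independence, as a sketch: by the definition of the likelihood ratio, the LR of a positive NBE among fetuses already known to be NT positive equals LR(NT+, NBE+) / LR(NT+). The variable names are illustrative; the inputs are the rounded LRs from the slides, so the result is approximate.

```python
LR_NT_POS = 7.0
LR_NBE_POS = 27.8       # unconditional LR of an absent nasal bone
LR_BOTH_POS = 68.8      # LR of NT+ and NBE+ together

# LR of NBE+ conditional on NT+ (an identity, not an independence assumption)
lr_nbe_given_nt_pos = LR_BOTH_POS / LR_NT_POS

print(f"LR(NBE+) overall:       {LR_NBE_POS:.1f}")
print(f"LR(NBE+) given NT+:     {lr_nbe_given_nt_pos:.1f}")
```

The conditional LR is a little under 10, far below the unconditional 27.8: an absent nasal bone is less informative once the NT is known to be ≥ 3.5 mm.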

Non-Independence Instead of looking for the nasal bone, what if the second test were just a repeat measurement of the nuchal translucency? A second positive NT would do little to increase your certainty of Trisomy 21. If it was false positive the first time around, it is likely to be false positive the second time.

Reasons for Non-Independence Tests measure the same aspect of disease. Consider exercise ECG (EECG) and radionuclide scan as tests for coronary artery disease (CAD) with the gold standard being anatomic narrowing of the arteries on angiogram. Both EECG and nuclide scan measure functional narrowing. In a patient without anatomic narrowing (a D- patient), coronary artery spasm could cause false positives on both tests.

Reasons for Non-Independence Spectrum of disease severity. In the EECG/nuclide scan example, CAD is defined as ≥70% stenosis on angiogram. A D+ patient with 71% stenosis is much more likely to have a false negative on both the EECG and the nuclide scan than a D+ patient with 99% stenosis.

Reasons for Non-Independence Spectrum of non-disease severity. In this example, CAD is defined as ≥70% stenosis on angiogram. A D- patient with 69% stenosis is much more likely to have a false positive on both the EECG and the nuclide scan than a D- patient with 33% stenosis.

Counterexamples: Possibly Independent Tests For Venous Thromboembolism: • CT Angiogram of Lungs and Doppler Ultrasound of Leg Veins • Alveolar Dead Space and D-Dimer • MRA of Lungs and MRV of leg veins

Unless tests are independent, we can’t combine results by multiplying LRs

Ways to Combine Multiple Tests On a group of patients (derivation set), perform the multiple tests and determine true disease status (apply the gold standard) • Measure LR for each possible combination of results • Recursive Partitioning • Logistic Regression

Determine LR for Each Result Combination *Assumes pre-test prob = 6%

Determine LR for Each Result Combination 2 dichotomous tests: 4 combinations 3 dichotomous tests: 8 combinations 4 dichotomous tests: 16 combinations Etc. 2 3-level tests: 9 combinations 3 3-level tests: 27 combinations Etc.
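The counts on this slide are the product of the number of levels of each test. A minimal sketch (the function names are illustrative) that counts and enumerates the combinations:

```python
from itertools import product
from math import prod

def count_combinations(levels_per_test):
    """Number of result combinations = product of the levels of each test."""
    return prod(levels_per_test)

def enumerate_combinations(levels_per_test):
    """Every combination, with each test's result coded 0..levels-1."""
    return list(product(*(range(n) for n in levels_per_test)))

for levels in ([2, 2], [2, 2, 2], [2, 2, 2, 2], [3, 3], [3, 3, 3]):
    print(levels, "->", count_combinations(levels), "combinations")

print(enumerate_combinations([2, 2]))
```

Each combination needs its own LR estimated from the derivation set, so the number of patients per cell shrinks quickly as tests are added.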

Determine LR for Each Result Combination How do you handle continuous tests? Not practical for most groups of tests.

Recursive Partitioning: Measure NT First

Recursive Partitioning: Examine Nasal Bone First

Recursive Partitioning: Examine Nasal Bone First, CVS if P(Trisomy 21) > 5%
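A sketch of the first split only, using the nasal bone counts from the second test table and the 5% threshold. The tree itself is not reproduced in this text, so any further split (for example on NT among fetuses with the nasal bone present) would need cross-classified counts that are not given here. Names are illustrative.

```python
CVS_THRESHOLD = 0.05  # CVS if P(Trisomy 21) > 5%

# (D+, D-) counts for each nasal bone result, from the second test table
NBE_COUNTS = {"absent": (229, 129), "present": (104, 5094)}

def probability_of_disease(d_pos, d_neg):
    """Proportion with Trisomy 21 within one branch of the tree."""
    return d_pos / (d_pos + d_neg)

def first_split(counts, threshold):
    """For each branch: (probability of disease, is it above the threshold?)."""
    result = {}
    for branch, (d_pos, d_neg) in counts.items():
        p = probability_of_disease(d_pos, d_neg)
        result[branch] = (p, p > threshold)
    return result

split = first_split(NBE_COUNTS, CVS_THRESHOLD)
for branch, (p, above) in split.items():
    print(f"nasal bone {branch}: P(Tri21) = {p:.3f}, above 5% threshold: {above}")
```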

Recursive Partitioning
• Same as Classification and Regression Trees (CART)
• Don’t have to work out probabilities (or LRs) for all possible combinations of tests, because of “tree pruning”

Tree Pruning: Goldman Rule* 8 “Tests” for Acute MI in ER Chest Pain Patient : • ST Elevation on ECG; • CP < 48 hours; • ST-T changes on ECG; • Hx of MI; • Radiation of Pain to Neck/LUE; • Longest pain > 1 hour; • Age > 40 years; • CP not reproduced by palpation. *Goldman L, Cook EF, Brand DA, et al. A computer protocol to predict myocardial infarction in emergency department patients with chest pain. N Engl J Med. 1988;318(13):797-803.

8 tests → 2^8 = 256 Combinations

Recursive Partitioning
• Does not deal well with continuous test results*
*when there is a monotonic relationship between the test result and the probability of disease

Logistic Regression
$\ln(\mathrm{Odds}(D+)) = a + b_{NT}\,NT + b_{NBE}\,NBE + b_{interact}\,(NT)(NBE)$
“+” = 1
“-” = 0
More on this later in ATCR!
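A minimal sketch of how such a model turns test results into a probability. The coefficient values below are hypothetical, chosen only to illustrate the arithmetic; they are not fitted to the NT/NBE data and are not from the talk.

```python
import math

def log_odds(a, b_nt, b_nbe, b_interact, nt, nbe):
    """ln(Odds(D+)) with nt and nbe coded 1 for positive, 0 for negative."""
    return a + b_nt * nt + b_nbe * nbe + b_interact * nt * nbe

def probability(a, b_nt, b_nbe, b_interact, nt, nbe):
    """Convert the log odds to a probability (inverse logit)."""
    return 1 / (1 + math.exp(-log_odds(a, b_nt, b_nbe, b_interact, nt, nbe)))

# HYPOTHETICAL coefficients, for illustration only
coef = dict(a=-4.0, b_nt=1.5, b_nbe=3.0, b_interact=-1.0)

for nt in (0, 1):
    for nbe in (0, 1):
        p = probability(nt=nt, nbe=nbe, **coef)
        print(f"NT={nt} NBE={nbe}: P(D+) = {p:.3f}")
```

With a negative interaction coefficient, the log odds when both tests are positive is less than a + b_NT + b_NBE, which is how the model can represent two positive results that add less information together than they would if independent.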

Logistic Regression Approach to the “R/O ACI patient” *Selker HP, Griffith JL, D'Agostino RB. A tool for judging coronary care unit admission appropriateness, valid for both real-time and retrospective use. A time-insensitive predictive instrument (TIPI) for acute cardiac ischemia: a multicenter study. Med Care. Jul 1991;29(7):610-627. For corrected coefficients, see http://medg.lcs.mit.edu/cardiac/cpain.htm

Clinical Scenario* 71 y/o man with 2.5 hours of CP, substernal, non-radiating, described as “bloating.” Cannot say if same as prior MI or worse than prior angina. Hx of CAD, s/p CABG 10 yrs prior, stenting 3 years and 1 year ago. DM on Avandia. ECG: RBBB, Qs inferiorly. No ischemic ST-T changes. *Real patient seen by MAK 1 am 10/12/04

What Happened to Pre-test Probability? Typically clinical decision rules report probabilities rather than likelihood ratios for combinations of results. Can “back out” LRs if we know prevalence, p[D+], in the study dataset. With logistic regression models, this “backing out” is known as a “prevalence offset.” (See Chapter 8A.)
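A sketch of "backing out" an LR from a reported probability: divide the post-test odds by the odds corresponding to the prevalence in the study dataset. The function names are illustrative. The example uses the NT numbers from earlier in the talk (post-test probability 31%, prevalence 6%), so the result is approximate because both inputs are rounded.

```python
def prob_to_odds(p):
    return p / (1 - p)

def back_out_lr(reported_prob, study_prevalence):
    """LR implied by a reported post-test probability at the study prevalence."""
    return prob_to_odds(reported_prob) / prob_to_odds(study_prevalence)

def apply_lr(pre_test_prob, lr):
    """Re-apply the LR at a different pre-test probability."""
    odds = prob_to_odds(pre_test_prob) * lr
    return odds / (1 + odds)

lr = back_out_lr(reported_prob=0.31, study_prevalence=0.06)
print(f"implied LR: {lr:.1f}")

# The same result in a population with a 1% pre-test probability
print(f"post-test prob at 1% pre-test: {apply_lr(0.01, lr):.3f}")
```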

Optimal Cutoff for a Single Continuous Test Depends on • Pre-test Probability of Disease • ROC Curve (Likelihood Ratios) • Relative Misclassification Costs Cannot choose an optimal cutoff with just the ROC curve.
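A sketch of why the ROC curve alone is not enough, assuming the usual expected-cost framing: call a result positive when the post-test odds exceed the ratio of the cost of a false positive to the cost of a false negative. Under that assumption, the LR at the optimal cutoff is (cost ratio) / (pre-test odds). The function name and the cost values are hypothetical.

```python
def threshold_lr(pre_test_prob, cost_fp, cost_fn):
    """LR a result must exceed to be called positive, under expected-cost reasoning.

    Assumes: positive if post-test odds > cost_fp / cost_fn.
    """
    pre_test_odds = pre_test_prob / (1 - pre_test_prob)
    return (cost_fp / cost_fn) / pre_test_odds

# HYPOTHETICAL costs: a false negative is 10 times as bad as a false positive
for p in (0.06, 0.25, 0.50):
    print(f"pre-test prob {p:.2f}: call positive if LR > "
          f"{threshold_lr(p, cost_fp=1, cost_fn=10):.2f}")
```

The same ROC curve gives a different optimal cutoff at each pre-test probability and each cost ratio.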

Optimal Cutoff Line for Two Continuous Tests

Choosing Which Tests to Include in the Decision Rule Have focused on how to combine results of two or more tests, not on which of several tests to include in a decision rule. Options include: • Recursive partitioning • Automated stepwise logistic regression* Choice of variables in derivation data set requires confirmation in a separate validation data set.
