
Studies of Diagnostic Tests

Thomas B. Newman, MD, MPH. October 15, 2009.


Presentation Transcript


  1. Studies of Diagnostic Tests Thomas B. Newman, MD, MPH October 15, 2009

  2. Reminders/Announcements • Door must be closed • Write down answers to problems in the book and check your answers! • Final exam to be passed out 12/3, reviewed 12/10 • Send questions!

  3. Overview • Common biases of studies of diagnostic test accuracy • Prevalence, spectrum and nonindependence • Meta-analysis of diagnostic tests • Checklist & systematic approach • Examples: • Physical examination for presentation • Pain with percussion, hopping or cough for appendicitis • Pertussis • Predicting hyperbilirubinemia

  4. Bias #1 Example • Study of BNP to diagnose congestive heart failure (CHF, Chapter 4, Problem 3)

  5. Bias #1 Example • Gold standard: determination of CHF by two cardiologists blinded to BNP • Chest x-rays found to be highly predictive of CHF • Is there a problem with assessing accuracy of chest x-rays to diagnose CHF in this study? *Maisel AS, Krishnaswamy P, Nowak RM, McCord J, Hollander JE, Duc P, et al. Rapid measurement of B-type natriuretic peptide in the emergency diagnosis of heart failure. N Engl J Med 2002;347(3):161-7.

  6. Bias #1: Incorporation bias • Cardiologists not blinded to Chest X-ray • Probably used (incorporated) it to make final diagnosis • Incorporation bias for assessment of Chest X-ray (not BNP) • Biases both sensitivity and specificity upward

  7. Bias #2 Example: • Visual assessment of jaundice in newborns • Study patients who are getting a bilirubin measurement • Ask clinicians to estimate extent of jaundice at time of blood draw

  8. Visual Assessment of jaundice*: Results • Sensitivity of jaundice below the nipple line for TSB ≥ 12 mg/dL = 97% • Specificity = 19% • What is the problem? • Editor’s Note: “The take-home message for me is that no jaundice below the nipple line equals no bilirubin test, unless there’s some other indication.” --Catherine D. DeAngelis, MD *Moyer et al., APAM 2000; 154:391

  9. Bias #2: Verification bias • Inclusion criterion for study: gold standard test was done • in this case, blood test for bilirubin • Subjects with positive index tests are more likely to get the gold standard and to be included in the study • clinicians don’t order blood test for bilirubin if the jaundice is minimal • How does this affect sensitivity and specificity?

  10. Bias #2: Verification Bias* Sensitivity, a/(a+c), is biased ___. Specificity, d/(b+d), is biased ___. *AKA Work-up, Referral Bias, or Ascertainment Bias
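
A minimal Python sketch of the mechanism on the verification bias slides. All counts and verification fractions are hypothetical, chosen only for illustration (they are not from Moyer et al.), and the function names are invented here. It compares sensitivity and specificity in a full population with the values seen when test-positive subjects are verified far more often than test-negative subjects.

```python
def sens_spec(a, b, c, d):
    """Return (sensitivity, specificity) for a 2x2 table.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    return a / (a + c), d / (b + d)


def verified_only(a, b, c, d, p_pos, p_neg):
    """Keep only the subjects who received the gold standard.

    p_pos and p_neg are hypothetical fractions of index-test-positive
    and index-test-negative subjects who were verified.
    """
    return a * p_pos, b * p_pos, c * p_neg, d * p_neg


# Hypothetical full population in which everyone is verified.
full = (80, 100, 20, 300)
true_sens, true_spec = sens_spec(*full)

# Hypothetical: 90% of test-positives but only 20% of test-negatives verified.
obs_sens, obs_spec = sens_spec(*verified_only(*full, p_pos=0.9, p_neg=0.2))

print(f"all subjects:  sensitivity {true_sens:.2f}, specificity {true_spec:.2f}")
print(f"verified only: sensitivity {obs_sens:.2f}, specificity {obs_spec:.2f}")
```

In this sketch the verified-only sensitivity comes out higher and the specificity lower than in the full population, the same direction as the 97% / 19% pattern in the jaundice example.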

  11. Bias #3 • Example: PIOPED study of accuracy of V/Q scan to diagnose pulmonary embolus* • Study Population: All patients presenting to the ED who received a V/Q scan • Test: V/Q Scan • Disease: Pulmonary embolism (PE) • Gold Standards: • 1. Pulmonary arteriogram (PA-gram) if done (more likely with more abnormal V/Q scan) • 2. Clinical follow-up in other patients (more likely with normal V/Q scan) *PIOPED. JAMA 1990;263(20):2753-9.

  12. Double Gold Standard Bias • Two different “gold standards” • One gold standard (e.g., surgery, invasive test) is more likely to be applied in patients with positive index test, • Other gold standard (e.g., clinical follow-up) is more likely to be applied in patients with a negative index test. • There are some patients in whom the tests do not give the same answer • spontaneously resolving disease • newly occurring disease

  13. Double Gold Standard Bias: effect of spontaneously resolving cases • Sensitivity, a/(a+c), biased __ • Specificity, d/(b+d), biased __ • Double gold standard compared with follow-up for all • Double gold standard compared with PA-gram for all

  14. Double Gold Standard Bias: effect of newly occurring cases • Sensitivity, a/(a+c), biased __ • Specificity, d/(b+d), biased __ • Double gold standard compared with follow-up for all • Double gold standard compared with PA-gram for all

  15. Double Gold Standard Bias: Ultrasound diagnosis of intussusception

  16. What if 10% of the 86 U/S- followed subjects actually had intussusceptions that resolved spontaneously?
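
A rough Python sketch of the "what if" on this slide. Only the 86 followed U/S-negative subjects and the 10% figure come from the slide; the 2x2 counts are hypothetical placeholders, because the study table did not survive in this transcript. The idea: cases that resolve during follow-up were counted as true negatives, but an immediate gold standard applied to everyone would have counted them as false negatives.

```python
followed_us_negative = 86                      # from the slide
resolved = round(0.10 * followed_us_negative)  # 8.6, rounded to 9 subjects

# Hypothetical reported 2x2 counts (NOT the study's actual table).
tp, fp, fn, tn = 40, 5, 2, 100

sens_reported = tp / (tp + fn)
spec_reported = tn / (fp + tn)

# Reclassify the spontaneously resolved cases: true negative -> false negative.
fn_adjusted = fn + resolved
tn_adjusted = tn - resolved

sens_adjusted = tp / (tp + fn_adjusted)
spec_adjusted = tn_adjusted / (fp + tn_adjusted)

print(f"reported: sensitivity {sens_reported:.2f}, specificity {spec_reported:.2f}")
print(f"adjusted: sensitivity {sens_adjusted:.2f}, specificity {spec_adjusted:.2f}")
```

With these placeholder counts, sensitivity falls noticeably after reclassification while specificity barely moves, which suggests the double gold standard had been flattering the sensitivity estimate.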

  17. Spectrum of Disease, Nondisease and Test Results • Disease is often easier to diagnose if severe • “Nondisease” is easier to diagnose if patient is well than if the patient has other diseases • Test results will be more reproducible if ambiguous results excluded

  18. Spectrum Bias • Sensitivity depends on the spectrum of disease in the population being tested. • Specificity depends on the spectrum of non-disease in the population being tested. • Example: Absence of Nasal Bone (on 13-week ultrasound) as a Test for Chromosomal Abnormality

  19. Spectrum Bias Example: Absence of Nasal Bone as a Test for Chromosomal Abnormality* • Sensitivity = 229/333 = 69% • BUT the D+ group only included fetuses with Trisomy 21 *Cicero et al., Ultrasound Obstet Gynecol 2004;23: 218-23

  20. Spectrum Bias: Absence of Nasal Bone as a Test for Chromosomal Abnormality • D+ group excluded 295 fetuses with other chromosomal abnormalities (esp. Trisomy 18) • Among these fetuses, sensitivity 32% (not 69%) • What decision is this test supposed to help with? • If it is whether to test chromosomes using chorionic villus sampling or amniocentesis, these 295 fetuses should be included!

  21. Spectrum Bias: Absence of Nasal Bone as a Test for Chromosomal Abnormality, effect of including other trisomies in D+ group • Sensitivity = 324/628 = 52%, NOT the 69% obtained when the D+ group only included fetuses with Trisomy 21
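
The arithmetic behind the three sensitivities on the nasal bone slides can be checked in a few lines of Python. The counts are the ones quoted on the slides; the variable names are just labels chosen here.

```python
# Counts quoted on the slides (nasal bone absent / total in group).
t21_absent, t21_total = 229, 333   # Trisomy 21 only
all_absent, all_total = 324, 628   # all chromosomal abnormalities

# The fetuses with other chromosomal abnormalities are the difference.
other_absent = all_absent - t21_absent   # 95
other_total = all_total - t21_total      # 295, matching the slide

sens_t21 = t21_absent / t21_total
sens_other = other_absent / other_total
sens_all = all_absent / all_total

print(f"Trisomy 21 only:     {sens_t21:.0%}")
print(f"Other abnormalities: {sens_other:.0%}")
print(f"All abnormalities:   {sens_all:.0%}")
```

The pooled 52% is a weighted average of 69% and 32%, so the reported sensitivity depends directly on which fetuses are allowed into the D+ group.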

  22. Quiz: What if we considered the nasal bone absence as a test for Trisomy 21? • Then instead of excluding subjects with other chromosomal abnormalities or including them as D+, we should count them as D-. Compared with excluding them, • What would happen to sensitivity? • What would happen to specificity?
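
One possible way to work through the quiz numerically, as a sketch. The Trisomy 21 and other-abnormality counts are derived from the slides (229/333 and 324/628); the counts for chromosomally normal fetuses are hypothetical, since the transcript does not give them.

```python
# Derived from the slides.
t21_absent, t21_total = 229, 333
other_absent, other_total = 324 - 229, 628 - 333   # 95 of 295

# Hypothetical chromosomally normal fetuses (NOT from the slides).
normal_absent, normal_total = 125, 5000

# Sensitivity for Trisomy 21 uses only the Trisomy 21 fetuses,
# so it is the same whether the others are excluded or counted as D-.
sensitivity = t21_absent / t21_total

# Specificity when fetuses with other abnormalities are excluded.
spec_excluding = (normal_total - normal_absent) / normal_total

# Specificity when they are counted as D-: those with an absent
# nasal bone become false positives.
d_neg_total = normal_total + other_total
d_neg_absent = normal_absent + other_absent
spec_counting = (d_neg_total - d_neg_absent) / d_neg_total

print(f"sensitivity (either way):      {sensitivity:.3f}")
print(f"specificity, others excluded:  {spec_excluding:.3f}")
print(f"specificity, others as D-:     {spec_counting:.3f}")
```

Under these assumptions sensitivity is unchanged and specificity drops, because the added D- fetuses have an absent nasal bone much more often than the hypothetical normal group.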

  23. Prevalence, spectrum and nonindependence • Prevalence (prior probability) of disease may be related to disease severity • One mechanism is different spectra of disease or nondisease • Another is that whatever is causing the high prior probability is related to the same aspect of the disease as the test

  24. Prevalence, spectrum and nonindependence • Examples • Iron deficiency • Diseases identified by screening • Urinalysis as a test for UTI in women with more and fewer symptoms (high and low prior probability)

  25. Overfitting

  26. Meta-analyses of Diagnostic Tests • Systematic and reproducible approach to finding studies • Summary of results of each study • Investigation into heterogeneity • Summary estimate of results, if appropriate • Unlike other meta-analyses (risk factors, treatments), results aren’t summarized with a single number (e.g., RR), but with two related numbers (sensitivity and specificity) • These can be plotted on an ROC plane
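
As a sketch of what "plotted on an ROC plane" means: each study contributes one point, with the false-positive rate (1 - specificity) on the x-axis and sensitivity on the y-axis. The study names and 2x2 counts below are hypothetical, and `roc_point` is a helper invented for this illustration.

```python
def roc_point(tp, fp, fn, tn):
    """Return (1 - specificity, sensitivity) for one study's 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (fp + tn)
    return 1 - specificity, sensitivity


# Hypothetical studies: (TP, FP, FN, TN).
studies = {
    "Study A": (45, 10, 5, 90),
    "Study B": (30, 20, 10, 80),
    "Study C": (60, 5, 20, 95),
}

points = {name: roc_point(*cells) for name, cells in studies.items()}

for name, (fpr, sens) in points.items():
    print(f"{name}: 1 - specificity = {fpr:.2f}, sensitivity = {sens:.2f}")
```

Points scattered widely across the plane are one visual hint of heterogeneity between studies, which is part of why a single summary number is often not appropriate.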

  27. MRI for the diagnosis of MS Whiting et al. BMJ 2006;332:875-84

  28. Studies of Diagnostic Test Accuracy: Checklist • Was there an independent, blind comparison with a reference (“gold”) standard of diagnosis? • Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)? • Was the reference standard applied regardless of the diagnostic test result? • Was the test (or cluster of tests) validated in a second, independent group of patients? From Sackett et al., Evidence-based Medicine, 2nd ed. (NY: Churchill Livingstone), 2000. p 68

  29. Systematic Approach • Authors and funding source • Research question • Study design • Study subjects • Predictor variable • Outcome variable • Results & Analysis • Conclusions

  30. A clinical decision rule to identify children at low risk for appendicitis (Problem 5.6) • Study design: prospective cohort study • Subjects • Of 4140 patients 3-18 years presenting to Boston Children’s Hospital ED with CC abdominal pain • 767 (19%) received surgical consultation for possible appendicitis • 113 Excluded (Chronic diseases, recent imaging) • 53 missed • 601 included in the study (425 in derivation set) Kharbanda et al. Pediatrics 2005; 116(3): 709-16

  31. A clinical decision rule to identify children at low risk for appendicitis • Predictor variable • Standardized assessment by PEM attending • Focus on “Pain with percussion, hopping or cough” (complete data in N=381) • Outcome variable: • Pathologic diagnosis of appendicitis for those who received surgery (37%) • Follow-up telephone call to family or pediatrician 2-4 weeks after the ED visit for those who did not receive surgery (63%) Kharbanda et al. Pediatrics 2005; 116(3): 709-16

  32. A clinical decision rule to identify children at low risk for appendicitis • Results: Pain with percussion, hopping or cough • 78% sensitivity seems low to me. Is it valid for me in deciding whom to image? Kharbanda et al. Pediatrics 2005; 116(3): 709-16

  33. Checklist • Was there an independent, blind comparison with a reference (“gold”) standard of diagnosis? • Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)? • Was the reference standard applied regardless of the diagnostic test result? • Was the test (or cluster of tests) validated in a second, independent group of patients? From Sackett et al., Evidence-based Medicine, 2nd ed. (NY: Churchill Livingstone), 2000. p 68

  34. Systematic approach • Study design: prospective cohort study • Subjects • Of 4140 patients 3-18 years presenting to Boston Children’s Hospital ED with CC abdominal pain • 767 (19%) received surgical consultation for possible appendicitis Kharbanda et al. Pediatrics 2005; 116(3): 709-16

  35. A clinical decision rule to identify children at low risk for appendicitis • Predictor variable • “Pain with percussion, hopping or cough” (complete data in N=381) • Outcome variable: • Pathologic diagnosis of appendicitis for those who received surgery (37%) • Follow-up telephone call to family or pediatrician 2-4 weeks after the ED visit for those who did not receive surgery (63%) Kharbanda et al. Pediatrics 2005; 116(3): 709-16

  36. Issues • Sample representative? • Verification bias? • Double-gold standard bias? • Spectrum bias

  37. For children presenting with abdominal pain to SFGH 6-M • Sensitivity probably valid (not falsely low) • But whether all of them tried to hop is not clear • Specificity probably low • PPV is high • NPV is low • Does not address surgical consultation decision
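
One way to read the PPV and NPV bullets is through prevalence: for a fixed sensitivity and specificity, predictive values shift with how common the disease is in the group being tested. The sketch below assumes this reading. The 78% sensitivity is from the results slide; the specificity and both prevalence values are hypothetical, and `predictive_values` is a helper written for this illustration.

```python
def predictive_values(sens, spec, prev):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence."""
    tp = sens * prev
    fn = (1 - sens) * prev
    fp = (1 - spec) * (1 - prev)
    tn = spec * (1 - prev)
    return tp / (tp + fp), tn / (tn + fn)


sens = 0.78   # from the results slide
spec = 0.80   # hypothetical

# Hypothetical prevalences: a referred, higher-risk group versus
# all children presenting with abdominal pain.
ppv_high, npv_high = predictive_values(sens, spec, prev=0.35)
ppv_low, npv_low = predictive_values(sens, spec, prev=0.05)

print(f"prevalence 35%: PPV {ppv_high:.2f}, NPV {npv_high:.2f}")
print(f"prevalence  5%: PPV {ppv_low:.2f}, NPV {npv_low:.2f}")
```

In this sketch the higher-prevalence group shows the higher PPV and the lower NPV, so predictive values from a referred population would not carry over unchanged to a less selected one.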

  38. Does this coughing patient have pertussis? • RQ (for us): what are LR for coughing fits, whoop, and post-tussive vomiting in adults with persistent cough? • Design (for one study we reviewed*): Prospective cross-sectional study • Subjects: 217 adults ≥18 years with cough 7-21 days, no fever or other clear cause for cough enrolled by 80 French GPs. • In a subsample from 58 GPs, of 710 who met inclusion criteria only 99 (14%) enrolled *Gilberg S et al. J Inf Dis 2002;186:415-8
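
For reference, a short sketch of how a likelihood ratio (LR) is computed and applied. The sensitivity, specificity, and pretest probability below are hypothetical and are not results from Gilberg et al.; the function names are chosen here for illustration.

```python
def likelihood_ratios(sens, spec):
    """Return (LR+, LR-) for a dichotomous finding."""
    return sens / (1 - spec), (1 - sens) / spec


def post_test_probability(pretest, lr):
    """Convert pretest probability to post-test probability via odds."""
    pretest_odds = pretest / (1 - pretest)
    post_odds = pretest_odds * lr
    return post_odds / (1 + post_odds)


# Hypothetical accuracy of a symptom such as post-tussive vomiting.
sens, spec = 0.50, 0.80
lr_pos, lr_neg = likelihood_ratios(sens, spec)

pretest = 0.30   # hypothetical pretest probability of pertussis
post_if_present = post_test_probability(pretest, lr_pos)
post_if_absent = post_test_probability(pretest, lr_neg)

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
print(f"post-test probability if present: {post_if_present:.2f}")
print(f"post-test probability if absent:  {post_if_absent:.2f}")
```

An LR near 1 barely changes the probability of disease, which is why the research question asks for LRs rather than sensitivity or specificity alone.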

  39. Pertussis diagnosis • Predictor variables: “GPs interviewed patients using a standardized questionnaire.” • Outcome variable: Evidence of pertussis based on • Culture (N=1) • PCR (N=36) • Or ≥ 2-fold change in anti-pertussis toxin IgG (N=40) • Total N = 70/217 with evidence of pertussis *Gilberg S et al. J Inf Dis 2002;186:415-8

  40. Results • 89% in both groups met CDC criteria for pertussis

  41. Issues • Verification (selection) bias: only 14% of eligible subjects included • Questionable gold standard (internally inconsistent) • Nice illustration of difficulty doing a systematic review!

  42. Questions?
