A Skeptic’s Guide to Reading the Medical Literature Joe Lex, MD, FACEP, MAAEM Clinical Professor of Emergency Medicine Temple University School of Medicine Philadelphia, PA USA Joe@JoeLex.net @JoeLex5
Disclaimer No commercial interests to declare
Peer Review = Gold Standard • Slow • Expensive • Ineffective • “Somewhat of a lottery” • Prone to bias and abuse • Hopeless at spotting error and fraud PEER REVIEW
Peer Review = Gold Standard “…peer review is impossible to define in operational terms…Peer review is thus like poetry, love, or justice.” PEER REVIEW Smith R. J R Soc Med. 2006 Apr;99(4):178-82.
Peer Review = Gold Standard “When I was editor of the BMJ I was challenged by two of the cleverest researchers in Britain to publish an issue of the journal comprised only of papers that had failed peer review and see if anybody noticed. I wrote back ‘How do you know I haven’t already done it?’” PEER REVIEW Richard Smith – Editor, BMJ
Peer Review = Gold Standard Two reviewers, same paper Reviewer A: I found this paper an extremely muddled paper with a large number of deficits Reviewer B: It is written in a clear style and would be understood by any reader PEER REVIEW Richard Smith – Editor, BMJ
Peer Review = Gold Standard • Benefits: ??? • Editors are convinced that it is invaluable • “If (peer review) was a drug it would never get onto the market.” PEER REVIEW Drummond Rennie, Deputy Editor of JAMA
Positive-Outcome Bias • Dummy article, 210 reviewers • Same except for conclusion • Half got “positive conclusion” • Half got “no difference” • Intentional errors included • Methods section same PEER REVIEW Emerson GB et al. Arch Int Med 2010, 170: 1934
Positive-Outcome Bias • “Compromise(s) integrity of the literature” • Inflates treatment effects when included in meta-analysis PEER REVIEW Emerson GB et al. Arch Int Med 2010, 170: 1934
Peer Review Misses Things • Fake manuscript • Contained 10 major and 13 minor errors • Sent to 262 reviewers • 203 returned PEER REVIEW Baxt WB et al. Ann Emerg Med 1998:32, 310
Peer Review Misses Things • 15 ‘accept’ • 17% major, 12% minor • 117 ‘reject’ • 39% major, 21% minor • 67 ‘revise’ • 30% major, 22% minor PEER REVIEW Baxt WB et al. Ann Emerg Med 1998:32, 310
Peer Review Misses Things • 68% did not recognize that conclusion was not supported by data • Failed to identify 68% of major errors PEER REVIEW Baxt WB et al. Ann Emerg Med 1998:32, 310
Cochrane Analysis “We could not identify any methodologically convincing studies assessing the core effects of peer review” “Little empirical evidence to support use of editorial peer review to ensure quality.” PEER REVIEW Jefferson T, et al. Cochrane Database
Watch out for… …strong associations …independent predictors …citation bias …amplification …invention …composite endpoints …workup bias …spectrum bias …referral bias …surrogate markers …publication bias …absolute vs relative risk
Strong Association • There is a “strong association” between hanging out in bars and developing lung cancer • People who hang out in bars tend to smoke and drink a lot • Does hanging out in bars cause lung cancer? ASSOCIATION
Strong Association • There is a “strong association” between the rooster crowing at 06:15 and the sun rising at 06:20 • Does the rooster’s crow cause the sun to rise? ASSOCIATION
Strong Association No correlation with… …sensitivity …specificity …accuracy …usefulness ASSOCIATION
Independent Predictor • If patients are matched in all other parameters, the group with abnormal test results will have more outcome events than the group with normal test results PREDICTOR
Independent Predictor • In the ICU, three-organ failure at three days carries a 97% mortality rate • So who don’t I have to treat, since it’s so futile? PREDICTOR
Predictors & Associations Does not mean: “Your test is negative, you can go home now.” “Your test is positive, you need to stay and have more studies.” • No value in making diagnostic or therapeutic decisions
Negative Predictive Value • Probability of no disease among patients with a negative test NPV = TN / (TN + FN) • Denominator is number of patients who test negative • So if disease prevalence low… N P V
Negative Predictive Value • NPV does not tell you test value • If disease prevalence 5%, coin flip has NPV = 95% • If disease prevalence 85%, coin flip has PPV = 85% N P V
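The coin-flip figures can be checked directly from the NPV definition. The sketch below assumes that a "coin flip" means a test with 50% sensitivity and 50% specificity; the function name `predictive_values` is made up for illustration.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) for a test applied at the given disease prevalence."""
    tp = sensitivity * prevalence              # true positives (as a fraction of all patients)
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)                       # NPV = TN / (TN + FN)
    return ppv, npv


# Assumption: a coin flip is a "test" with sensitivity = specificity = 0.5.
for prevalence in (0.05, 0.85):
    ppv, npv = predictive_values(prevalence, 0.5, 0.5)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.0%}, NPV = {npv:.0%}")
```

For a test this uninformative, PPV equals the prevalence and NPV equals one minus the prevalence, so a high NPV in a low-prevalence population says nothing about the test itself.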
Citation Bias • Citation bias: not citing papers that refute or weaken belief • Amplification: expand belief system by citing papers that present no data addressing it • Invention: hypothesis converted into fact through citation alone CITATION
Redundant Publication • Ondansetron in post-op vomiting • 84 trials with 11,980 patients • On closer inspection: 70 trials with 8645 patients • 17% duplicate publications • 28% inflation of patients 2 DIP Tramer MR et al. BMJ 1997:315;635-640.
Redundant Publication • Four identical trials published with different authors • Duplicated publications most likely to have positive results • NNT in nonduplicated trials: 9.5 • NNT in duplicated trials: 3.9 2 DIP Tramer MR et al. BMJ 1997:315;635-640.
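The 17% and 28% figures on the ondansetron slide follow from the trial and patient counts given there. This is a small arithmetic sketch; the variable names are illustrative, and it assumes both percentages are taken relative to the totals as published (duplicates included).

```python
# Counts from the ondansetron slide.
reported_trials, reported_patients = 84, 11_980   # as they appear in the literature
unique_trials, unique_patients = 70, 8_645        # after removing duplicate reports

# Share of published trial reports that were duplicates.
duplicate_trial_share = (reported_trials - unique_trials) / reported_trials

# Share of published patient data that was counted more than once.
duplicated_patient_share = (reported_patients - unique_patients) / reported_patients

print(f"duplicate trial reports: {duplicate_trial_share:.0%}")
print(f"duplicated patient data: {duplicated_patient_share:.0%}")
```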
Multiple Hypotheses • 10,674,945 Ontario residents aged between 18 and 100 years • Derivation cohort 5,337,472 • Validation cohort 5,337,473 • 895 diagnoses for which patients had emergent / urgent hospitalization MULTIPLES
Born Under a Bad Sign • Goal: identify two diagnoses for each astrological sign for which the probability of hospitalization was statistically significantly greater compared to residents born under the remaining 11 signs MULTIPLES
Born Under a Bad Sign • Found 72 such associations • Number of diagnoses ranged from 2 (Scorpio) to 10 (Taurus), with a mean of 6 diagnoses for each astrological sign • The P-values for 72 significant associations: 0.0003 to 0.0488 MULTIPLES
Born Under a Bad Sign • Validation cohort: residents born under sign of Leo had significantly higher probability of hospitalization due to GI hemorrhage compared to other residents of Ontario • Relative risk = 1.15 (p = 0.0483) MULTIPLES
Born Under a Bad Sign • Validation cohort: residents born under sign of Sagittarius had significantly higher probability of hospitalization for fractures of humerus compared to residents born under the remaining 11 astrological signs • Relative risk = 1.38 (p = 0.0125) MULTIPLES
Do the Math!! • If an ineffective therapy is tested for 20 indications, by chance it is likely to demonstrate a significant effect for at least one of these MULTIPLES Ioannidis JP, et al. BMJ. 2010 Sep 13;341:c4875.
Do the Math!! • If 10 outcomes are tested for each indication, a statistically significant effect will be observed for at least one outcome for nearly half of the indications MULTIPLES Ioannidis JP, et al. BMJ. 2010 Sep 13;341:c4875.
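The arithmetic behind these two slides can be sketched in a few lines. The sketch assumes a truly ineffective therapy, independent tests, and a significance threshold of 0.05; none of these assumptions is stated on the slides, and the function name is illustrative.

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n independent tests of a true null
    hypothesis comes out 'statistically significant' at level alpha."""
    return 1 - (1 - alpha) ** n_tests


for n in (1, 10, 20):
    p = prob_at_least_one_false_positive(n)
    print(f"{n:>2} tests: {p:.0%} chance of at least one spurious 'significant' result")
```

Under these assumptions, 20 independent tests give roughly a 64% chance of at least one spurious finding, and 10 outcomes give roughly 40%. Correlated outcomes would change the exact figures.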
Do the Math!! • In 91 trials discontinued early, the true effect was, on average, only 70% of that suggested by trials • It was <50% of perceived effect when trials were discontinued after occurrence of <200 events MULTIPLES Ioannidis JP, et al. BMJ. 2010 Sep 13;341:c4875.
Workup Bias • In routine care the diagnostic workup of a patient is by definition determined by previous test results • Positive screening test is referred to receive verification of diagnosis by “gold standard” WORKUP
Workup Bias • Reference test is always interpreted with knowledge of preceding test information • Example: treadmill stress test • Family practitioner’s office: sensitivity 40%, specificity 85% • Cardiologist’s office: sensitivity 70%, specificity 70% WORKUP
Spectrum Bias • Diagnostic test has different sensitivities or specificities in patients with different clinical manifestations of the disease for which the test is intended SPECTRUM
Spectrum Bias • Patient obviously has disease: test is positive • Patient obviously does not have disease: test is negative • “I don’t know if the patient has the disease” – test in between • Example: BNP SPECTRUM
Spectrum Bias • If pretest probability <5%, you’re more accurate than BNP • If pretest probability >95%, you’re more accurate than BNP • The “In Between” Group: sensitivity 88%, specificity 55% B N P
Spectrum Bias If the serum BNP is… …>500 pg/mL rule in CHF …<100 pg/mL rule out CHF if low pretest probability …100 – 500 pg/mL: not sensitive, not specific, and not very helpful B N P
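One way to see why BNP adds little at the extremes of pretest probability is to run the sensitivity and specificity quoted for the "in between" group through Bayes' theorem. This is a sketch only: applying those two figures across every pretest probability is an assumption made for illustration, and the function name is hypothetical.

```python
def post_test_probability(pretest, sensitivity, specificity, positive=True):
    """Bayes' theorem: probability of disease after a positive or negative result."""
    if positive:
        p_result_if_diseased = sensitivity
        p_result_if_healthy = 1 - specificity
    else:
        p_result_if_diseased = 1 - sensitivity
        p_result_if_healthy = specificity
    diseased = p_result_if_diseased * pretest
    healthy = p_result_if_healthy * (1 - pretest)
    return diseased / (diseased + healthy)


SENS, SPEC = 0.88, 0.55  # figures quoted on the slide for the "in between" group

for pretest in (0.05, 0.50, 0.95):
    pos = post_test_probability(pretest, SENS, SPEC, positive=True)
    neg = post_test_probability(pretest, SENS, SPEC, positive=False)
    print(f"pretest {pretest:.0%}: positive -> {pos:.0%}, negative -> {neg:.0%}")
```

With these numbers, a positive result at 5% pretest probability only raises the probability of disease to about 9%, and a negative result at 95% pretest probability only lowers it to about 81%. In both cases the clinical impression still dominates.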
Referral Bias • Sanford Guide says: “80% of cat bites get infected” • Dog bite: self-referral because of wound size, rabies worry, etc. • Cat bite: “It’s infected” • Headache in oncologist’s office • Step on a nail REFERRAL
Other Biases • Length bias: indolent disease is more likely to be detected in a screening program than aggressive disease, leading to apparent improved outcome BIASES
Other Biases • Lead-time bias: survival of a screened population is measured from the date of screening, whereas survival of an unscreened population is measured from detection of symptomatic disease BIASES
Absolute vs relative • Can statins prevent VTE? • JUPITER trial says “yes!!” • “Patients taking rosuvastatin (Crestor) 20 mg/day have 43% lower risk of DVT or PE” ARR vs RRR
Absolute vs relative • These patients relatively healthy • ABSOLUTE risk drops from about 0.7% to 0.4% • Treat 333 patients for one year to prevent one DVT or PE ARR vs RRR
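The relationship between the headline 43% and the 333 patients can be worked through from the two absolute risks on the slide. This is a minimal sketch using the slide's rounded figures (about 0.7% and 0.4%); the function name `risk_summary` is illustrative.

```python
def risk_summary(control_risk, treated_risk):
    """Return (absolute risk reduction, relative risk reduction, number needed to treat)."""
    arr = control_risk - treated_risk   # absolute risk reduction
    rrr = arr / control_risk            # relative risk reduction
    nnt = 1 / arr                       # number needed to treat
    return arr, rrr, nnt


# Rounded absolute risks from the slide: about 0.7% without and 0.4% with the statin.
arr, rrr, nnt = risk_summary(0.007, 0.004)
print(f"ARR = {arr:.1%}, RRR = {rrr:.0%}, NNT = {nnt:.0f}")
```

The same data yield a 43% relative reduction and a 0.3 percentage point absolute reduction. Only the second figure tells you how many patients must be treated to prevent one event.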
Surrogate Endpoints • Some surrogate end points may be part of disease or risk factor • Sometimes surrogate endpoint confused with disease • Example: sitagliptin enhances incretins and enhances physiological glucose control SURROGATE