Screening and Prognostic Tests. Thomas B. Newman, MD, MPH. October 20, 2005.
Overview • Questions from last time; administrative stuff • Screening tests • Introduction • Biases in observational studies • Biases in randomized trials • Conclusion – ecologic view • Prognostic tests • Differences from diagnostic tests and risk factors • Quantifying prediction: calibration and discrimination • Value of information • Common problems
TN Biases • “When your only tool is a hammer, you tend to see every problem as a nail.” • Biggest gains in longevity have been PUBLIC HEALTH interventions, not interventions aimed at individuals • Biggest threats are still public health threats • Interventions aimed at individuals are overemphasized
Cultural characteristics "We live in a wasteful, technology driven, individualistic and death-denying culture." --George Annas, New Engl J Med, 1995
What is screening? • Common definition: testing to detect asymptomatic disease • Better definition*: application of a test to detect a potential disease or condition in people with no known signs or symptoms of that disease or condition. • Disease vs condition • Asymptomatic vs no known signs or symptoms *Common screening tests. David M. Eddy, editor. Philadelphia, PA: American College of Physicians, 1991
Types of screening • Unrecognized symptomatic disease screening: what IS making the person sick. • Disease screening: what WILL make the person sick. • Risk factor screening: what MIGHT make the person sick.
Examples and overlap • Continuum related to both certainty and timing of symptoms • May vary with age • Unrecognized symptomatic disease: vision and hearing problems in young children; iron deficiency anemia, depression • Presymptomatic disease: neonatal hypothyroidism, syphilis, HIV • Risk factor: hypercholesterolemia, hypertension • Somewhere between: prostate cancer, breast carcinoma in situ, more severe hypertension
Disease vs. Risk factor screening [Table not recovered] *May be a political as well as a scientific decision
Possible harms from screening • To all tested • To those with negative results • To those with positive results • To those not tested • See course book
Forces behind excessive screening -1 • Companies selling machines to do the test • Companies selling the test itself • Companies selling products to treat the condition • Clinicians who treat the condition • Politicians who are (or want to appear) sympathetic
Forces behind excessive screening -2 • Disease research and advocacy groups • Academics who study the condition • Clinicians doing or interpreting the test • Managed care organizations • The public
E-mail excerpt -1 > PLEASE, PLEASE, PLEASE TELL ALL YOUR FEMALE FRIENDS AND RELATIVES TO INSIST ON A CA-125 BLOOD TEST EVERY YEAR AS PART OF THEIR ANNUAL PHYSICAL EXAMS. Be forewarned that their doctors might try to talk them out of it, saying, "IT ISN'T NECESSARY." > > …Insist on the CA-125 BLOOD TEST; DO NOT take "NO" for an answer!
Biases in Observational Studies of Screening Tests • Volunteer bias • Lead time bias • Length time bias • Stage migration bias • Pseudodisease
Volunteer Bias • People who volunteer for studies differ from those who do not • Examples • HIP Mammography study: women who volunteered for mammography had lower heart disease death rates • Coronary drug project: Men who took their medicine had about half the mortality of men who didn't, whether they were on drug or placebo
Lead time bias Source: EDITORIAL: Finding and Redefining Disease. Effective Clinical Practice, March/April 1999. Available at: ACP Online, http://www.acponline.org/journals/ecp/marapr99/primer.htm (accessed 8/30/02)
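A minimal numeric sketch of lead time bias, using made-up ages that are not from the lecture: two hypothetical patients with identical disease die at the same age, but the screened one is diagnosed earlier, so survival measured from diagnosis looks longer even though death is not postponed.

```python
# Hypothetical ages in years; none of these numbers come from the lecture.
age_at_death = 70.0            # identical with or without screening
age_dx_by_symptoms = 67.0      # diagnosis when symptoms appear
age_dx_by_screening = 63.0     # earlier diagnosis by screening

survival_unscreened = age_at_death - age_dx_by_symptoms
survival_screened = age_at_death - age_dx_by_screening
lead_time = age_dx_by_symptoms - age_dx_by_screening

print(f"Survival after diagnosis, unscreened: {survival_unscreened:.0f} y")
print(f"Survival after diagnosis, screened:   {survival_screened:.0f} y")
print(f"Apparent gain (equals lead time):     {lead_time:.0f} y")

# "5-year survival" flips from False to True although death is unchanged.
five_year_unscreened = survival_unscreened >= 5
five_year_screened = survival_screened >= 5
```

The whole apparent survival gain equals the lead time, which is why survival from diagnosis is a poor measure of screening benefit in observational comparisons.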
Length Bias (Different natural history bias) • Screening picks up prevalent disease • Prevalence = incidence x duration • Slowly growing tumors have greater duration in presymptomatic phase, therefore greater prevalence • Therefore, cases picked up by screening will be disproportionately those that are slow growing
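The relation prevalence = incidence x duration can be worked through with a small sketch. The two tumor types, their equal incidence, and their presymptomatic durations are hypothetical values chosen only to illustrate the argument.

```python
# Hypothetical inputs: equal incidence, different presymptomatic durations.
incidence = {"slow": 10.0, "fast": 10.0}   # new cases per 100,000 per year
duration = {"slow": 4.0, "fast": 1.0}      # years detectable before symptoms

# Steady-state prevalence of detectable presymptomatic disease.
prevalence = {k: incidence[k] * duration[k] for k in incidence}

share_of_incident = incidence["slow"] / sum(incidence.values())
share_of_screen_detected = prevalence["slow"] / sum(prevalence.values())

print(f"Slow-growing share of all new cases:       {share_of_incident:.0%}")
print(f"Slow-growing share of screen-detectable:   {share_of_screen_detected:.0%}")
```

With these assumed numbers, slow-growing tumors are half of all new cases but four fifths of the cases available for a one-time screen to find.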
Length bias Source: EDITORIAL: Finding and Redefining Disease. Effective Clinical Practice, March/April 1999. Available at: ACP Online, http://www.acponline.org/journals/ecp/marapr99/primer.htm
Length Bias Slower growing tumor with better prognosis ? Early detection Higher cure rate
Stage migration bias [Figure: staging with old tests vs. new tests]
Stage migration bias • Also called the "Will Rogers Phenomenon" • "When the Okies left Oklahoma and moved to California, they raised the average intelligence level in both states." -- Will Rogers • Documented with colon cancer at Yale • Other examples abound – the more you look for disease, the higher the prevalence and the better the prognosis • More generally, be careful with stratified analyses Best reference on this topic: Black WC and Welch HG. Advances in diagnostic imaging and overestimation of disease prevalence and the benefits of therapy. NEJM 1993;328:1237-43.
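A small sketch of the Will Rogers phenomenon, with invented per-patient survival probabilities: a more sensitive test moves the worst "early" patient into the "late" stage, and the average prognosis of both stages improves although no individual patient's prognosis changes.

```python
from statistics import mean

# Hypothetical 5-year survival probabilities (%) for individual patients.
early_old = [90, 85, 80, 50]   # last patient has spread missed by old tests
late_old = [30, 20, 10]

# A more sensitive new test moves that one patient from "early" to "late".
early_new = [90, 85, 80]
late_new = [50, 30, 20, 10]

print(f"Early stage: {mean(early_old):.1f}% -> {mean(early_new):.1f}%")
print(f"Late stage:  {mean(late_old):.1f}% -> {mean(late_new):.1f}%")
print(f"Overall:     {mean(early_old + late_old):.1f}% -> "
      f"{mean(early_new + late_new):.1f}%")
```

Stage-specific survival rises in both strata while overall survival stays the same, which is why stage-stratified comparisons across eras or tests need care.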
A more general example of Stage Migration Bias • VLBW (< 1500 g), LBW (1500-2499 g) and NBW (>= 2500 g) fetuses exposed to Factor X all have decreased mortality compared with those not exposed • Is Factor X good? • Maybe not! Factor X could be cigarette smoking! • Smoking moves babies to lower birthweight strata • Compared with other causes of LBW (e.g., prematurity), smoking is not as bad
Pseudodisease • A condition that looks just like the disease, but never would have bothered the patient • In an individual treated patient it is impossible to distinguish pseudodisease from successfully treated asymptomatic disease • Existence of pseudodisease can only be detected in groups of treated patients • Treating pseudodisease can only cause harm because (by definition) it is unnecessary
Example: Mayo Lung Project (MLP) • RCT of lung cancer screening • Enrollment 1971-76 • 9,211 male smokers • Two study arms • Intervention arm: chest x-ray and sputum cytology every 4 months for 6 years (75% compliance) • Usual care (control) arm: at trial entry only, a recommendation to receive same tests annually
MLP Extended Follow-up Results* • Intervention group: more cancers diagnosed at early, resectable stage • Better survival of those with lung cancer *Marcus et al., JNCI 2000;92:1308-16
MLP Extended Follow-up Results* • Slight increase in lung-cancer mortality (P=0.09 by 1996) *Marcus et al., JNCI 2000;92:1308-16
What happened? • Lead-time bias? • Length bias? • Volunteer bias? • Overdiagnosis (pseudodisease) Black, WC. Overdiagnosis: An unrecognized cause of confusion and harm in cancer screening. JNCI 2000;92:1280-1
NHLBI National Lung Screening Trial • 46,000 participants randomized in 2 years • Equal randomization • Three annual screens • Spiral CT versus chest x-ray!
Each year, 182,000 women are diagnosed with breast cancer and 43,300 die. One woman in eight either has or will develop breast cancer in her lifetime... If detected early, the five-year survival rate exceeds 95%. Mammograms are among the best early detection methods, yet 13 million women in the U.S. are 40 years old or older and have never had a mammogram. 39,800 Clicks per mammogram (Sept, ’04)
RCTs of screening tests, Example: Mammography • New York Times: Expert Panel Cites Doubts On Mammogram's Worth • Washington Post: Mammography Review Shatters the Status Quo; Doubts About Its Value Alarm Many
Is screening for breast cancer with mammography justifiable?* • Meta-analysis of randomized trials • Methodologic issues raised • Quality of randomization • Post-randomization exclusions • Choice of outcome variable: Breast cancer mortality vs. total mortality *Gotzsche P, Olsen O. Lancet 2000;355:1293
Poor Quality Randomization. Example: Edinburgh trial • Randomization by practice (N=87?), not by woman • 7 practices changed allocation status • Highest SES • 26% of women in control group • 53% of women in screening group • 26% reduction in cardiovascular mortality in mammography group
Example 2: Biased post-randomization exclusion for previous breast cancer • New York Trial • N=853 in screened group • N=336 in control group • Breast cancer mortality difference at 18 years: 44 deaths • Edinburgh trial • N=338 in screened group • N=177 in control group
Explanation for differences in NY Trial* • In screened group, women with previous breast cancer excluded at entry • In control group, women with previous breast cancer excluded only if they developed breast cancer • Thus, women with previous breast cancer who did NOT develop breast cancer were included in the denominator of the control group but not the mammography group • Therefore, bias against mammography * Fletcher SW, Gilmore JG. Mammography screening for breast cancer. NEJM 2003;348:1672-80. (Appendix 2)
Problems with breast cancer mortality as an endpoint • Assignment of cause of death is subjective • Unblinded in NY, Two-county trials • Treatment may have effects on other causes of death
Meta-analysis of radiotherapy for early breast cancer* • Meta-analysis of 40 RCTs • Central review of individual-level data; N = 20,000 • Breast cancer mortality reduced (20-yr ARR 4.8%; P = .0001) • Mortality from other causes increased (20-yr ARR -4.3%; P = 0.003) *Early Breast Cancer Trialists Collaborative Group. Lancet 2000;355:1757
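As a back-of-envelope sketch (not a calculation taken from the paper), the two absolute risk differences on this slide can be set side by side. Treating them as additive assumes both are measured on the same denominator and that the two cause-of-death categories do not overlap; the variable names are illustrative, and the published all-cause result should be taken from the paper itself.

```python
# 20-year absolute risk reductions quoted on the slide, as proportions.
arr_breast_cancer_death = 0.048    # benefit
arr_other_cause_death = -0.043     # negative ARR means harm

# Approximate net effect on total mortality, ASSUMING the two are additive.
net_arr = arr_breast_cancer_death + arr_other_cause_death

# Number needed to treat = 1 / ARR (number needed to harm for a negative ARR).
nnt_breast_cancer = 1 / arr_breast_cancer_death
nnh_other_cause = 1 / abs(arr_other_cause_death)
nnt_net = 1 / net_arr

print(f"Net ARR for total mortality:  {net_arr:.1%}")
print(f"NNT (breast cancer death):    {nnt_breast_cancer:.0f}")
print(f"NNH (other-cause death):      {nnh_other_cause:.0f}")
print(f"NNT (net, total mortality):   {nnt_net:.0f}")
```

Under these assumptions most of the apparent benefit on breast cancer mortality is offset by deaths from other causes, which is the argument for also looking at total mortality.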
13-year total mortality, > 50 y.o. Breast cancer deaths, 7 yr
NCI Position* • “NCI recommends mammography for women starting in their 40s” -- Dr. Peter Greenwald, NCI director of cancer prevention • "Everyone agrees that mammography detects breast cancer when it's smaller, when it's earlier. There's no debate about that," Greenwald added. "And everybody agrees mammography detects more cancers. • "The debate is whether that has an impact on mortality later on. It is the only real method that we have, other than clinical exam, that's useful as screening for early detection in healthy women." *Washington Post, January 24, 2002
TN Conclusions on Screening • Screening decisions are heavily influenced by politics, economics, emotion and wishful thinking • Most screening occurs without informed consent • High quality RCTs are needed • Low power to discern effect on total mortality • Big debate about efficacy. But even if proponents are right, much screening is not cost-effective and its disadvantages are consistently downplayed
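The point about low power for total mortality can be sketched with a standard normal-approximation sample size formula for comparing two proportions. All risks and the assumed 20% relative reduction below are hypothetical and not taken from any trial; `n_per_arm` is an illustrative helper, not a function from the lecture.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(p_control, p_screened, alpha=0.05, power=0.80):
    """Approximate sample size per arm for comparing two proportions
    (normal approximation, two-sided alpha, equal allocation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_screened * (1 - p_screened)
    return ceil((z_alpha + z_beta) ** 2 * variance
                / (p_control - p_screened) ** 2)

# Hypothetical risks over follow-up (assumptions for illustration only).
bc_death_control = 0.005     # death from breast cancer
other_death = 0.095          # death from all other causes, unaffected
relative_reduction = 0.20    # assumed effect on breast cancer death only

bc_death_screened = bc_death_control * (1 - relative_reduction)
n_bc = n_per_arm(bc_death_control, bc_death_screened)
n_total = n_per_arm(bc_death_control + other_death,
                    bc_death_screened + other_death)

print(f"Per arm, breast cancer mortality endpoint: {n_bc:,}")
print(f"Per arm, total mortality endpoint:         {n_total:,}")
```

Because the same absolute difference is diluted among many more deaths from other causes, the total mortality endpoint needs a far larger trial under these assumptions.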
Cost per QALY • Mammography, age 40-50: $105,000* • Mammography, age 50-69: $21,400* • Smoking cessation counseling: $2000** • HIV prevention in Africa: $1-20*** *Salzman P et al. Ann Int Med 1997;127:955-65 (Based on optimistic assumptions about mammography.) **Cromwell J et al. JAMA 1997;278:1759-66 ***Marseille E et al. Lancet 2002; 359: 1851-56
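To make the comparison concrete, the costs per QALY on the slide can be turned into QALYs purchased for a fixed budget. The $1,000,000 budget is a hypothetical figure chosen only for illustration, and the HIV prevention entry uses the upper end of the quoted $1-20 range.

```python
# Cost per QALY as listed on the slide (US dollars).
cost_per_qaly = {
    "Mammography, age 40-50": 105_000,
    "Mammography, age 50-69": 21_400,
    "Smoking cessation counseling": 2_000,
    "HIV prevention in Africa (upper end of $1-20)": 20,
}
budget = 1_000_000  # hypothetical budget, for illustration only

qalys = {name: budget / cost for name, cost in cost_per_qaly.items()}
for name, q in qalys.items():
    print(f"{name}: {q:,.1f} QALYs per ${budget:,}")
```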
Return to George Annas* • Need to begin to think differently about health. Two dysfunctional metaphors: • Military metaphor – battle disease, no cost too high for victory, no room for uncertainty • Market metaphor -- medicine as a business; health care as a product; success measured economically *Annas G. Reframing the debate on health care reform by replacing our metaphors. NEJM 1995;332:744-7
Ecology metaphor • Sustainability • Limited resources • Interconnectedness • More critical of technology • Move away from domination, buying, selling, exploiting • Focus on the big picture • Populations rather than individuals • Causes rather than symptoms
Assessment of Prognostic Tests • Difference from diagnostic tests and risk factors • Quantifying accuracy • Value of prognostic information • Common problems
Potential confusion: “cross-sectional” means 2 things • Cross-sectional sampling means sampling does not depend on either the predictor variable or the outcome variable (e.g., as opposed to case-control sampling) • Cross-sectional time dimension means that predictor and outcome are measured at the same time; the opposite of longitudinal
Difference from Diagnostic Tests • Longitudinal rather than cross-sectional time dimension • Incidence rather than prevalence • Sensitivity, specificity, prior probability confusing • Time to an event may be important • Harder to quantify accuracy in individuals • Exceptions: short time course, continuous outcomes
Difference from Risk Factors • Causality not important • Absolute risk very important • Sampling scheme makes a much bigger difference because absolute risks are less generalizable than relative risks • Can be informative even if no bad outcomes!
Quantifying Prediction 1: Calibration • How accurate are the predicted probabilities? • Assemble a group • Compare actual and predicted probabilities • Calibration is important for decision making and giving information to patients • Like absolute risk in this way – less generalizable
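One common way to compare actual and predicted probabilities is to group subjects by predicted risk and tabulate both within each group. The sketch below assumes equal-width risk bins and uses invented data; `calibration_table` is a hypothetical helper, not a method described in the lecture.

```python
from collections import defaultdict

def calibration_table(predicted, outcomes, n_bins=5):
    """Group subjects by predicted risk and return, per group,
    (count, mean predicted probability, observed event proportion)."""
    bins = defaultdict(list)
    for p, y in zip(predicted, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)   # equal-width bins on [0, 1]
        bins[idx].append((p, y))
    rows = []
    for idx in sorted(bins):
        ps, ys = zip(*bins[idx])
        rows.append((len(ps), sum(ps) / len(ps), sum(ys) / len(ys)))
    return rows

# Hypothetical data: predicted risks and whether the event occurred (1/0).
predicted = [0.05, 0.10, 0.15, 0.30, 0.35, 0.70, 0.75, 0.90, 0.95, 0.85]
outcomes = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]

rows = calibration_table(predicted, outcomes)
for n, mean_pred, observed in rows:
    print(f"n={n}  predicted={mean_pred:.2f}  observed={observed:.2f}")
```

A well-calibrated test shows observed proportions close to the mean predicted probability in every group; with real data each group needs enough subjects for the observed proportion to be stable.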