Managing Statistical Significance in Cancer Research

Statistomics and Cancer Graham Byrnes Biostatistics Group

It’s not all about p-values (or is it…) • Suppose you have a PSA test • If your level is 1 ng/ml, 50% of healthy men have more • 2.5 ng/ml, 18% • 4 ng/ml, 6% • 10 ng/ml, 1.7%

P-values • Those are the p-values against the hypothesis that you are healthy • How small a p-value would convince you to publish (or have a biopsy)? • A p-value alone is not informative about your risk of having prostate cancer (PrCa): you also need information about prevalence

But • Similar for research: if there are very few things to find, almost everything published will be a false positive • The traditional 5% threshold slows the flood. • It does NOT imply that only 5% of published results are false
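The point about prevalence can be made concrete with a small calculation. The sketch below is illustrative only: the function name, the assumed power of 80% and the prior fractions are hypothetical values chosen for the example, not figures from the talk. It computes the expected share of "significant" results that are false positives, given the fraction of tested hypotheses that are really true.

```python
def share_of_false_positives(prior_true, alpha=0.05, power=0.8):
    """Expected fraction of significant results that are false positives.

    prior_true: fraction of tested hypotheses that are really true (assumed)
    alpha:      per-test significance threshold
    power:      probability of detecting a true effect (assumed)
    """
    true_positives = prior_true * power
    false_positives = (1.0 - prior_true) * alpha
    return false_positives / (true_positives + false_positives)


# The rarer true effects are, the larger the false share at the same 5% threshold.
for prior in (0.5, 0.1, 0.01, 0.001):
    share = share_of_false_positives(prior)
    print(f"prior fraction true = {prior}: false share of positives = {share:.1%}")
```

The same arithmetic applies to the PSA example: with "prior_true" read as disease prevalence and "power" as test sensitivity, the result is one minus the positive predictive value.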

Multiple Comparison • Omics technologies present us with several hundred thousand experiments at once. • If we set the threshold at 5% for each, we will get 5,000 "positives" per 100,000 tests even if there is nothing to find. • So we need to be more stringent: Bonferroni or Benjamini-Hochberg FDR
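A minimal sketch of the two corrections named on this slide, in plain Python. The function names and the list of p-values are made up for illustration. Bonferroni divides the threshold by the number of tests, while the Benjamini-Hochberg step-up procedure rejects the k smallest p-values, where k is the largest rank with p(k) <= k*q/m.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected by the BH step-up procedure at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            largest_k = rank  # keep the largest rank that passes
    return sorted(order[:largest_k])


# With nothing to find, about alpha * n_tests results pass an uncorrected threshold.
n_tests = 100_000
expected_false_positives = 0.05 * n_tests
print(expected_false_positives, bonferroni_threshold(0.05, n_tests))

# Hypothetical p-values, for illustration only.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
bonferroni_hits = [i for i, pv in enumerate(p) if pv <= bonferroni_threshold(0.05, len(p))]
print(bonferroni_hits, benjamini_hochberg(p))
```

On these made-up values Bonferroni keeps only the smallest p-value, while Benjamini-Hochberg keeps the two smallest.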

What about power? • Imagine a biomarker predicting cancer • Relative risk of cancer between 1st and 5th quintiles: 2.0 • This equates to a per-SD OR of 1.35 • What if we hoped to detect this among a number of candidate molecules using 200 cases and 800 controls?

Power estimates (T = number of candidates, threshold p < 0.05/T) • T = 10^1, p < 5×10^-3: 95% • T = 10^2, p < 5×10^-4: 83% • T = 10^3, p < 5×10^-5: 64% • T = 10^4, p < 5×10^-6: 44% • T = 10^5, p < 5×10^-7: 27%
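The way power falls as the number of candidates grows can be sketched with a rough normal approximation. This is not the calculation behind the figures on the slide, whose method is not stated, so the numbers it prints will differ somewhat. The function name and the standard-error formula (a standardized marker compared between cases and controls) are assumptions made for this example.

```python
from math import log, sqrt
from statistics import NormalDist


def approx_power(or_per_sd, n_cases, n_controls, alpha):
    """Approximate two-sided power to detect a per-SD odds ratio.

    Assumes a standardized continuous marker and a small effect, so the
    standard error of the log-OR is roughly sqrt(1/n_cases + 1/n_controls).
    """
    std_normal = NormalDist()
    se = sqrt(1.0 / n_cases + 1.0 / n_controls)
    z_effect = log(or_per_sd) / se
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return std_normal.cdf(z_effect - z_crit) + std_normal.cdf(-z_effect - z_crit)


# Bonferroni threshold 0.05 / T for T = 10, 100, ..., 100000 candidates.
powers = []
for exponent in range(1, 6):
    n_candidates = 10 ** exponent
    alpha = 0.05 / n_candidates
    power = approx_power(1.35, 200, 800, alpha)
    powers.append(power)
    print(f"T = {n_candidates}: threshold = {alpha:.0e}, approx power = {power:.0%}")
```

The qualitative pattern is the same as on the slide: each tenfold increase in the number of candidates costs a substantial share of the power.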

Effect size • For comparison, CRP gives OR = 1.3 for 1st vs 5th quintile • About 1.1 per population SD • Power to test it alone: 24% • To pick it out of 100 candidates: 1.3%

Does FDR save us? • Same threshold as Bonferroni if there is only 1 to find • For 50% power to find CRP among 1000 candidates, we would need to raise the per-test threshold to 0.20 • FDR = 99.93% • Expect to find 200 "positives", almost certainly NOT including CRP

What can we do? Hope to find something with a really huge effect OR Be clever!

Big effects • If there really are biomarkers able to act as useful screening tools, they must have big effects • They will be findable • Further work will be needed to establish specificity, but association will be obvious

How to be clever? Need to reduce the number of hypotheses • Use prior knowledge • Use associations with known environmental risk factors • Cluster related biomarkers and test for association with the cluster rather than the individual biomarkers
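One way the clustering idea could look in practice is sketched below. The biomarker names, values and cluster assignments are entirely hypothetical, and averaging standardized values is just one simple choice of cluster score, not necessarily the approach the speaker has in mind. The point is that testing one score per cluster shrinks the number of hypotheses, and therefore relaxes the corrected threshold.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw values to z-scores (mean 0, SD 1)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def cluster_scores(biomarkers, clusters):
    """Per-subject cluster score: mean of the standardized member biomarkers.

    biomarkers: dict of biomarker name -> list of per-subject values
    clusters:   dict of cluster name -> list of member biomarker names
    """
    z = {name: standardize(vals) for name, vals in biomarkers.items()}
    scores = {}
    for cluster, members in clusters.items():
        columns = [z[m] for m in members]
        scores[cluster] = [mean(subject) for subject in zip(*columns)]
    return scores


# Hypothetical data: 4 biomarkers measured on 4 subjects, grouped into 2 clusters.
biomarkers = {
    "m1": [1.0, 2.0, 3.0, 4.0],
    "m2": [1.1, 1.9, 3.2, 3.8],
    "m3": [10.0, 7.0, 8.0, 5.0],
    "m4": [9.5, 7.5, 8.5, 4.5],
}
clusters = {"cluster_a": ["m1", "m2"], "cluster_b": ["m3", "m4"]}

scores = cluster_scores(biomarkers, clusters)
print("threshold per biomarker:", 0.05 / len(biomarkers))
print("threshold per cluster:  ", 0.05 / len(clusters))
```

Each cluster score would then be tested for association with cancer in place of its individual members.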

Clustering etc • One thing we will have: lots of controls • Discovery of biomarkers of exposure does not require cases • This discovery process has no impact on false associations with cancer • The cohort setting is crucial, to avoid reverse causality

Thank you!
