Learn about the pitfalls of relying solely on p-values in cancer research, and strategies to improve power and reduce false positives. Explore the importance of effect size, multiple comparison adjustments, and innovative methods for biomarker discovery. Remember, being thorough and strategic in hypothesis testing can lead to more meaningful results.
Statistomics and Cancer Graham Byrnes Biostatistics Group
It’s not all about p-values (or is it…) • Suppose you have a PSA test • If your result is 1 ng/ml, 50% of healthy men have a higher value • 2.5 ng/ml: 18% • 4 ng/ml: 6% • 10 ng/ml: 1.7%
P-values • Those are the p-values against the hypothesis that you are healthy • How small a p-value would convince you to publish (or have a biopsy)? • On their own they are not informative about your risk of having prostate cancer (PrCa): that also requires information about prevalence
But • The same holds for research: if there are very few things to find, almost everything published will be a false positive • The traditional 5% threshold slows the flood • It does NOT imply that only 5% of published results are false
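The point about prevalence can be made concrete with a small calculation. This is a minimal sketch, not taken from the slides: the function name, the parameter names and the illustrative power of 80% are assumptions. It computes the expected share of "significant" results that are false positives, given the fraction of tested hypotheses that are really true.

```python
def false_positive_share(prior_true, alpha, power):
    """Expected share of 'significant' results that are false positives.

    prior_true: fraction of tested hypotheses that are really true (assumed)
    alpha:      per-test significance threshold
    power:      probability of detecting a true effect (assumed)
    """
    true_positives = prior_true * power
    false_positives = (1.0 - prior_true) * alpha
    return false_positives / (true_positives + false_positives)


# The rarer the true effects, the larger the false-positive share,
# even though the 5% threshold never changes.
for prior in (0.5, 0.1, 0.01, 0.001):
    share = false_positive_share(prior, alpha=0.05, power=0.8)
    print(f"true hypotheses: {prior:>6.1%}  false-positive share: {share:.1%}")
```

With half of all hypotheses true, the share stays small; with one in a thousand true, nearly every "finding" is false.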
Multiple Comparison • Omics technologies present us with hundreds of thousands of experiments at once • If we set the threshold at 5% for each, we will get about 5,000 "positives" per 100,000 tests even if there is nothing to find • So we need to be more stringent: Bonferroni or Benjamini-Hochberg FDR
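The two corrections named above can be sketched in a few lines. This is an illustrative implementation under the usual textbook definitions, not code from the talk; the function names and the example p-values are made up for the demonstration.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of 'positives' when every null hypothesis is true."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected by the BH step-up procedure at FDR q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Find the largest rank k with p_(k) <= k * q / m.
        if p_values[idx] <= rank * q / m:
            largest_rank = rank
    # Reject every hypothesis ranked at or below that k.
    return sorted(order[:largest_rank])


print(expected_false_positives(100_000))   # about 5,000 under the global null
print(bonferroni_threshold(100_000))       # per-test threshold of 5e-07

# Hypothetical p-values, for illustration only.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(example, q=0.05))
```

Note that the smallest p-value faces the same threshold, q/m, under both procedures; Benjamini-Hochberg only becomes more lenient once several hypotheses clear their rank-specific thresholds.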
What about power? • Imagine a biomarker predicting cancer • Ratio of cancer risk between the 1st and 5th quintiles: 2.0 • This equates to a per-SD odds ratio (OR) of 1.35 • What if we hoped to detect this among a number of candidate molecules using 200 cases and 800 controls?
Power estimates • T = 10¹, p < 5×10⁻³: 95% • T = 10², p < 5×10⁻⁴: 83% • T = 10³, p < 5×10⁻⁵: 64% • T = 10⁴, p < 5×10⁻⁶: 44% • T = 10⁵, p < 5×10⁻⁷: 27%
Effect size • For comparison, CRP gives OR = 1.3 for 1st vs 5th quintile • About 1.1 per population SD • Power to test it alone: 24% • Power to pick it out of 100 candidates: 1.3%
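How power collapses as the threshold tightens can be explored with a rough normal approximation. This sketch is not the author's calculation and will not reproduce the slide figures exactly. It assumes a standardised biomarker whose mean differs between cases and controls by about log(OR per SD), tested two-sided at a Bonferroni threshold of 0.05 divided by the number of candidates. The function name and that modelling choice are assumptions.

```python
from math import log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def approx_power(or_per_sd, n_cases, n_controls, alpha):
    """Approximate two-sided power of a case-control comparison.

    Assumption: the biomarker is standardised and the case-control mean
    difference, in SD units, is roughly log(OR per SD).
    """
    shift = log(or_per_sd)
    std_error = sqrt(1.0 / n_cases + 1.0 / n_controls)
    noncentrality = shift / std_error
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return (_STD_NORMAL.cdf(noncentrality - z_crit)
            + _STD_NORMAL.cdf(-noncentrality - z_crit))


# Weak marker (OR about 1.1 per SD), 200 cases and 800 controls,
# Bonferroni threshold for a growing number of candidates.
for n_candidates in (1, 10, 100, 1000):
    alpha = 0.05 / n_candidates
    power = approx_power(1.1, 200, 800, alpha)
    print(f"candidates: {n_candidates:>5}  alpha: {alpha:.0e}  power: {power:.3f}")
```

The qualitative message matches the slides: a weak marker that is hard to detect on its own becomes practically undetectable once the threshold is corrected for many candidates.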
Does FDR save us? • With only 1 true signal to find, the threshold is effectively the same as Bonferroni • For 50% power to find CRP among 1000 candidates, we would need to raise the per-test threshold to 0.20 • FDR = 99.93% • Expect to find 200 "positives", almost certainly NOT including CRP
What can we do? Hope to find something with a really huge effect OR Be clever!
Big effects • If there really are biomarkers able to act as useful screening tools, they must have big effects • They will be findable • Further work will be needed to establish specificity, but the association will be obvious
How to be clever? Need to reduce the number of hypotheses • Use prior knowledge • Use associations with known environmental risk factors • Cluster related biomarkers and test for association with the cluster rather than the individual biomarkers
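The last idea, testing a cluster rather than each biomarker, can be illustrated as follows. The slides do not say how clusters are formed or scored, so this is only one possible sketch: it assumes the cluster membership is already known, summarises each subject by the mean of the z-scored markers in the cluster, and compares cases with controls using a large-sample z-test. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev, variance


def cluster_score(rows, cluster_cols):
    """Mean z-score across the markers in one cluster, per subject.

    rows:         list of subjects, each a list of marker values
    cluster_cols: column indices belonging to the cluster (assumed known)
    """
    scores = [0.0] * len(rows)
    for col in cluster_cols:
        values = [row[col] for row in rows]
        centre, spread = mean(values), stdev(values)
        for i, value in enumerate(values):
            scores[i] += (value - centre) / spread / len(cluster_cols)
    return scores


def two_sample_p(cases, controls):
    """Two-sided p-value from a large-sample z-test on the difference in means."""
    std_error = sqrt(variance(cases) / len(cases)
                     + variance(controls) / len(controls))
    z = (mean(cases) - mean(controls)) / std_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Toy data: three subjects, two markers that move together.
toy = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
print(cluster_score(toy, [0, 1]))
```

One test per cluster replaces one test per marker, so the multiple-comparison penalty is paid on the number of clusters rather than the number of molecules.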
Clustering etc • One thing we will have: lots of controls • Discovery of biomarkers of exposure does not require cases • This discovery process has no impact on false associations with cancer • The cohort setting is crucial, to avoid reverse causality