
Reproducibility and Ranks of True Positives in Large Scale Genomics Experiments

Russ Wolfinger¹, Dmitri Zaykin², Lev Zhivotovsky³, Wendy Czika¹, Susan Shao¹

¹SAS Institute, Inc.; ²National Institute of Environmental Health Sciences; ³Vavilov Institute of General Genetics

MCP Vienna

July 11, 2007

Criticism of Statistical Methods in Genomics
  • Two labs run the same microarray experiment, and the resulting lists of significant genes barely overlap.
  • Significant SNPs from a genetic study are not validated in subsequent follow-up studies.

Conclusions drawn by the scientific community:

Statistical results are not reproducible.

Genomics technology is not reliable.

“P vs FC” Controversy
  • Occurred recently within the FDA-driven Microarray Quality Control Consortium (MAQC)
  • Biologists, chemists, and regulators are concerned about the lack of reproducibility of significant gene lists, and have observed that lists based on fold change (FC) are more consistent than those based on p-values (P)
  • Statisticians usually seek an optimal tradeoff between specificity (control of Type 1 error) and sensitivity (power, i.e., control of Type 2 error), often portrayed in a receiver operating characteristic (ROC) plot
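In a simulation, where the truly changed genes are known, both axes of an ROC plot can be computed for any candidate gene list. The Python sketch below is illustrative only; the function name `sens_spec` and the representation of genes as plain IDs are assumptions, not part of the original study.

```python
def sens_spec(selected, truly_changed, all_genes):
    """Return (sensitivity, specificity) of a selected gene list.

    Sensitivity = fraction of truly changed genes that were selected (power).
    Specificity = fraction of unchanged genes that were NOT selected
                  (1 - per-gene false positive rate).
    """
    selected = set(selected)
    positives = set(truly_changed)
    negatives = set(all_genes) - positives
    true_pos = len(selected & positives)
    false_pos = len(selected & negatives)
    return true_pos / len(positives), 1 - false_pos / len(negatives)


# Toy usage: 10 genes, genes 0-2 truly changed, the list selects genes 0, 1 and 5
sens, spec = sens_spec([0, 1, 5], [0, 1, 2], range(10))
print(sens, spec)  # sensitivity 2/3, specificity 6/7
```

Sweeping the list size down a ranking and plotting sensitivity against 1 − specificity traces the ROC curve.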
Outline
  • Reproducibility versus specificity and sensitivity
  • Rank distribution of a single true positive
  • P-value combination methods for multiple true positives

All results are based on simulation.

Questions
  • Should statisticians concern themselves with reproducibility, the hallmark of science? YES!
  • How should reproducibility be defined?
  • How does it relate to specificity and sensitivity?
  • Is it possible to dialectically reconcile conflicting perspectives, or at least provide an explanatory (and hence mollifying) framework?
Simulation Study 1: Based on MAQC Phase 1 Experiment
  • Initially designed and implemented by Wendell Jones, Expression Analysis Inc.
  • Two treatment groups, n=5 in each
  • 15,000 genes, of which 1,000 are truly changed, with varying degrees of differential expression that mimic real data
  • Coefficient of variation (CV) on the original data scale set to 2%, 10%, 30%, or 100%
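A design like this can be sketched in pure Python. The data model below is hypothetical (the study's own generator was designed to mimic real data and is not reproduced): baseline means, the fold-change distribution and the lognormal noise are all assumptions. For a lognormal variable the CV on the original scale is sqrt(exp(σ²) − 1), so the log-scale SD is set to σ = sqrt(ln(1 + CV²)).

```python
import math
import random
import statistics


def simulate_experiment(n_genes=15000, n_changed=1000, n_per_group=5,
                        cv=0.30, truth_seed=0, noise_seed=None):
    """Simulate one two-group experiment; return a list of (log2 FC, t) per gene.

    HYPOTHETICAL data model, not the generator used in the study:
    - baseline gene means are log-uniform between 2**6 and 2**14;
    - the first n_changed genes have a true log2 fold change of random sign
      and magnitude uniform on [0.5, 2];
    - observations are lognormal with coefficient of variation `cv` on the
      original scale, i.e. log-scale SD sigma = sqrt(ln(1 + cv**2)).
    truth_seed fixes the gene means and true fold changes; noise_seed fixes
    the measurement noise, so two calls sharing truth_seed act as two labs.
    """
    truth = random.Random(truth_seed)
    noise = random.Random(noise_seed)
    sigma = math.sqrt(math.log(1 + cv ** 2))

    def draw_log2(mean):
        # lognormal draw whose original-scale mean is `mean`, returned on log2 scale
        return math.log2(noise.lognormvariate(math.log(mean) - sigma ** 2 / 2, sigma))

    results = []
    for g in range(n_genes):
        base = 2 ** truth.uniform(6, 14)
        true_lfc = truth.choice([-1, 1]) * truth.uniform(0.5, 2) if g < n_changed else 0.0
        a = [draw_log2(base) for _ in range(n_per_group)]
        b = [draw_log2(base * 2 ** true_lfc) for _ in range(n_per_group)]
        lfc = statistics.fmean(b) - statistics.fmean(a)
        # pooled two-sample t statistic on the log2 scale, df = 2 * n_per_group - 2
        pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2
        results.append((lfc, lfc / math.sqrt(pooled_var * 2 / n_per_group)))
    return results
```

Calling it twice with the same `truth_seed` but different `noise_seed` values mimics two labs running the same experiment on the same underlying biology.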
Simulation Study 1 (continued)
  • For the sake of simplicity, we focus only on gene-selection rules based on fold change (FC, i.e., effect size) or simple t-test p-values
  • Gene lists can be constructed in many other ways, e.g., from shrunken t-statistics
  • Use the proportion of overlapping genes (POG), based on a simple Venn diagram, as the measure of reproducibility
  • Compute POG on simulated pairs of gene lists; list sizes range from 10 to 15,000
  • Require the direction of FC to match for a gene to count as overlapping
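POG as described can be sketched in Python, assuming each list is stored as (gene ID, sign of FC) pairs and both lists have the same length, so that POG is the number of shared genes with matching direction divided by the list size. The helper names `pog` and `top_k` are hypothetical. Ranking by |t| stands in for ranking by p-value, which is valid only when all tests share the same degrees of freedom.

```python
def pog(list_a, list_b):
    """Proportion of overlapping genes between two equal-size gene lists.

    Each list holds (gene_id, fc_sign) pairs with fc_sign in {+1, -1}; a gene
    counts as overlapping only if it is in both lists with the same FC direction.
    """
    if len(list_a) != len(list_b):
        raise ValueError("gene lists must have the same size")
    direction_a = dict(list_a)
    overlap = sum(1 for gene, sign in list_b if direction_a.get(gene) == sign)
    return overlap / len(list_a)


def top_k(stats, k, key):
    """Return the k top-ranked genes as (gene_id, fc_sign) pairs.

    stats maps gene_id -> (log_fc, t); `key` scores a (log_fc, t) pair and
    higher scores rank first, e.g. abs(log_fc) for FC ranking or abs(t)
    for p-value ranking (equal df for every gene).
    """
    ranked = sorted(stats, key=lambda g: key(stats[g]), reverse=True)
    return [(g, 1 if stats[g][0] > 0 else -1) for g in ranked[:k]]
```

Applying `top_k` to two replicate experiments at each list size k and passing the results to `pog` gives one point on a POG-versus-list-size curve.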
Simulated POG vs. Gene List Size

[Figure panels: FC Ranking | P-Value Ranking]

Three Dimensions, CV = 2%

[Figure panels: FC Ranking | P-Value Ranking]

Discussion 1
  • Reproducibility is not monotonically related to specificity and sensitivity.
  • There appear to be tradeoffs in all three dimensions: specificity, sensitivity, and reproducibility.
  • The weight attached to each dimension depends on the objectives of the study.
  • Simple rules based on both FC and P-value cutoffs appear viable as a starting compromise.
  • We challenge you to …

Enter the Third Dimension

Specificity – Sensitivity – Reproducibility
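A selection rule that uses both an FC cutoff and a P-value cutoff, as suggested in Discussion 1, is a two-condition filter. A minimal sketch; the function name and the default cutoffs (|log2 FC| ≥ 1, p ≤ 0.01) are arbitrary placeholders, not values recommended by the study.

```python
def select_genes(stats, fc_cutoff=1.0, p_cutoff=0.01):
    """Select genes passing BOTH a fold-change and a p-value cutoff.

    stats maps gene_id -> (log2_fc, p_value). Genes with |log2 FC| >= fc_cutoff
    and p <= p_cutoff are kept; on a volcano plot (x = log2 FC, y = -log10 p)
    this is the union of the upper-left and upper-right corner regions.
    """
    return sorted(g for g, (lfc, p) in stats.items()
                  if abs(lfc) >= fc_cutoff and p <= p_cutoff)
```

Tightening `fc_cutoff` moves the list toward FC-style reproducibility, while tightening `p_cutoff` moves it toward p-value-style specificity.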

Volcano Plots Help Visualize Ranking Rules

“Dormant” Volcano from Two-Sample T-Test (df=4) on 10,000 Genes
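A volcano plot places each test at (effect size, −log10 p). The sketch below generates such points for the null-only setting of this caption, assuming three N(0, 1) observations per group (a pooled two-sample t-test with 4 df). The 4-df p-value uses the closed-form CDF of Student's t, so only the standard library is needed; the function names are hypothetical.

```python
import math
import random
import statistics


def t4_two_sided_p(t):
    """Two-sided p-value for Student's t with 4 df.

    Uses F(t) = 1/2 + (3/8) * x * (1 - x**2 / 12), x = t / sqrt(1 + t**2 / 4),
    evaluated at |t|.
    """
    x = abs(t) / math.sqrt(1 + t * t / 4)
    cdf = 0.5 + 0.375 * x * (1 - x * x / 12)
    return 2 * (1 - cdf)


def dormant_volcano(n_genes=10000, seed=None):
    """Volcano coordinates (mean difference, -log10 p) for null genes only.

    Each gene has two groups of 3 observations from N(0, 1), so the pooled
    two-sample t-test has 3 + 3 - 2 = 4 df.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n_genes):
        a = [rng.gauss(0, 1) for _ in range(3)]
        b = [rng.gauss(0, 1) for _ in range(3)]
        diff = statistics.fmean(b) - statistics.fmean(a)
        pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2
        t = diff / math.sqrt(pooled_var * 2 / 3)
        p = max(t4_two_sided_p(t), 1e-300)  # guard against round-off to exactly 0
        points.append((diff, -math.log10(p)))
    return points
```

With only 4 df, a small variance estimate can give a small p-value to a gene whose mean difference is negligible, which a scatter plot of these points makes visible.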

Outline
  • Reproducibility versus specificity and sensitivity
  • Rank distribution of a single true positive
  • P-value combination methods for multiple true positives

All results are based on simulation.

Simulation Study 2A: Number of Best T-Test Results Required to Cover a Single True Positive
  • Compare different ranking rules based on P, FC, or a functional combination of the two
  • Two treatment groups, n=100 in each
  • 38,500 t-tests (4 df), only 1 of which is truly changed
  • Power for the single true positive set to 80%, 90%, 95%, or 99% at α = 5%, or to 80% at the Šidák-adjusted level α* (80-Šidák)

Simulation Study 2A Results

Number of best t-test (df=4) results, out of 38,500, required to cover a single true positive with 95% probability

p: p-value; d: effect size; α*: Šidák-adjusted level, $\alpha^* = 1 - (1 - 0.05)^{1/38500}$
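The Šidák level α* in this footnote is the per-test level at which m independent tests have family-wise error exactly 5%. A short sketch (the function name is hypothetical); `log1p`/`expm1` avoid the loss of precision that the direct formula suffers when α* is tiny.

```python
import math


def sidak_alpha(family_alpha, n_tests):
    """Per-test level alpha* = 1 - (1 - family_alpha) ** (1 / n_tests).

    With n_tests independent tests each at level alpha*, the probability of at
    least one false positive is exactly family_alpha.
    """
    return -math.expm1(math.log1p(-family_alpha) / n_tests)


print(sidak_alpha(0.05, 38500))    # approx. 1.33e-06
print(sidak_alpha(0.05, 200000))   # approx. 2.56e-07
```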

Simulation Study 2B: Number of Best Chi-Square Test Results Required to Cover a Single True Positive
  • Again compare different ranking rules based on p-value, effect size, or a functional combination
  • Two binomial proportions, n=500 in each group
  • 200,000 chi-square 1-df tests, only 1 of which is a true association
  • Allele frequency for true negatives drawn uniformly from [0.05, 0.95]
  • Allele frequency for the true positive in the control group set to 0.1 or 0.5; case-group frequency set higher to achieve power of 80%, 90%, 95%, or 99% at α = 5%, or 80% at the Šidák-adjusted level α* (80-Šidák)
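Each test in this study compares an allele frequency between two groups. The sketch below shows the 1-df Pearson chi-square test for two proportions and one true-negative draw. It assumes that n = 500 counts the alleles sampled per group (the slide does not say whether n refers to individuals or alleles) and uses the difference in allele frequency as the effect size d, which is also an assumption; the function names are hypothetical.

```python
import math
import random


def chisq_2x2(x1, n1, x2, n2):
    """Pearson chi-square test (1 df, no continuity correction) for two proportions.

    x1 of n1 sampled alleles in cases and x2 of n2 in controls carry the allele.
    Returns (statistic, p_value, d), where d is the difference in allele frequency.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    stat = (p1 - p2) ** 2 / (pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    # a 1-df chi-square is a squared standard normal: P(X > s) = erfc(sqrt(s / 2))
    return stat, math.erfc(math.sqrt(stat / 2)), p1 - p2


def simulate_null_test(n=500, rng=None):
    """One true negative: both groups share a frequency drawn from U[0.05, 0.95]."""
    rng = rng or random.Random()
    freq = rng.uniform(0.05, 0.95)
    x1 = sum(rng.random() < freq for _ in range(n))
    x2 = sum(rng.random() < freq for _ in range(n))
    return chisq_2x2(x1, n, x2, n)
```

Because null frequencies vary across tests, the same allele-frequency difference d yields different p-values at different baseline frequencies, which is one reason ranking by d and ranking by p can disagree.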

Simulation Study 2B Results

Number of best chi-square (1 df) test results, out of 200,000, required to cover a single true positive with 95% probability (TP case frequency 0.1)

p: p-value; d: effect size; α*: Šidák-adjusted level, $\alpha^* = 1 - (1 - 0.05)^{1/200000}$


Simulation Study 2B Results

Number of best chi-square (1 df) test results, out of 200,000, required to cover a single true positive with 95% probability (TP case frequency 0.5)

p: p-value; d: effect size; α*: Šidák-adjusted level, $\alpha^* = 1 - (1 - 0.05)^{1/200000}$

Discussion 2
  • Incorporating effect size into ranking rules can improve ranking performance, particularly when the variance of true positives is larger than that of true negatives
  • Possibly an empirical Bayes effect
Outline
  • Reproducibility versus specificity and sensitivity
  • Rank distribution of a single true positive
  • P-value combination methods for multiple true positives

All results are based on simulation.

Simulation Study 3: Compare Power of P-Value Combination Methods with Multiple True Positives
  • 5,000 Chi-Square (1 df) tests
  • Number of true associations ranges from 10 to 200 with various powers
  • Compare Šidák, Simes, Fisher combination, and three more recent methods:
    • Gamma Method (GM)
    • Truncated Product Method (TPM)
    • Rank Truncated Product (RTP)
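The three classical global tests named above can be computed directly from L independent p-values. A standard-library sketch with hypothetical function names; each returns a p-value for the global null hypothesis that none of the tests is a true association. For Fisher's method the chi-square tail with an even number of degrees of freedom has a closed form, evaluated here in log space so that L in the thousands does not overflow.

```python
import math


def sidak_global(pvals):
    """Šidák global p-value from the smallest p: 1 - (1 - p_min) ** L."""
    return 1 - (1 - min(pvals)) ** len(pvals)


def simes_global(pvals):
    """Simes global p-value: min over i of L * p_(i) / i, with p sorted ascending."""
    ps = sorted(pvals)
    return min(1.0, min(len(ps) * p / i for i, p in enumerate(ps, start=1)))


def fisher_global(pvals):
    """Fisher's combination: S = -sum(log p); 2S is chi-square with 2L df under H0.

    Closed-form tail for even df: P = exp(-S) * sum_{k<L} S**k / k!.
    """
    s = -sum(math.log(p) for p in pvals)
    if s == 0.0:
        return 1.0  # every p-value equals 1
    logs = [k * math.log(s) - math.lgamma(k + 1) for k in range(len(pvals))]
    top = max(logs)
    log_tail = -s + top + math.log(sum(math.exp(v - top) for v in logs))
    return min(1.0, math.exp(log_tail))
```

Šidák uses only the smallest p-value, whereas Fisher pools evidence from all of them, which is why the two behave so differently when many weak true associations are present.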
Gamma Method (GM)
  • Generalization of Fisher's and Stouffer's methods
  • Sum the inverse-gamma-CDF transforms of $1 - p_i$
  • Tuned via a soft truncation threshold; accommodates heterogeneity of effects
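A sketch of the Gamma Method global test, assuming SciPy is available. Each $p_i$ is mapped to the $(1 - p_i)$ quantile of a Gamma(shape, 1) distribution; under the global null the sum of L such values is Gamma(L·shape, 1), which gives the combined p-value. With shape = 1 the quantile is −log p_i, so the method reduces exactly to Fisher's; as the shape grows the gamma quantiles become approximately normal and the method approaches an unweighted Stouffer test. How the slide's soft truncation threshold is translated into a shape value is not reproduced here, so `shape` is taken as a direct input; the function name is hypothetical.

```python
from scipy import stats


def gamma_method(pvals, shape=1.0):
    """Gamma Method global p-value (sketch; requires SciPy).

    Sums Gamma(shape, 1) quantiles of 1 - p_i and compares the sum with its
    null distribution, Gamma(L * shape, 1), for L independent p-values.
    """
    g = float(stats.gamma.isf(pvals, shape).sum())  # isf(p) = ppf(1 - p)
    return float(stats.gamma.sf(g, len(pvals) * shape))
```

Smaller shape values put more weight on the very smallest p-values, and larger values spread weight more evenly, which is how the method can be tuned to the expected heterogeneity of effects.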
Truncated Product Method (TPM)
  • Combine only the subset of p-values below a fixed threshold
  • Assess significance by evaluating the distribution of the product via Monte Carlo simulation of uniform p-values
  • Upon rejecting the null, one can claim that true positives are in the subset
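A Monte Carlo sketch of the TPM as described: multiply the p-values at or below a threshold τ and compare the result with the same statistic computed on uniform null p-values. Independence of the tests, the default τ = 0.05 and the replicate count are assumptions, and `log_truncated_product` and `tpm_pvalue` are hypothetical names. Working with log-products avoids underflow when many small p-values are multiplied.

```python
import math
import random


def log_truncated_product(pvals, tau):
    """Log of the product of p-values <= tau (0, i.e. product 1, if none qualify)."""
    return sum(math.log(p) for p in pvals if p <= tau)


def tpm_pvalue(pvals, tau=0.05, reps=10000, seed=0):
    """Monte Carlo p-value for the Truncated Product Method.

    Null p-values are simulated as independent Uniform(0, 1]; the p-value is the
    share of null replicates whose truncated product is at most the observed one,
    with the usual +1 correction so it is never exactly 0.
    """
    rng = random.Random(seed)
    observed = log_truncated_product(pvals, tau)
    n = len(pvals)
    hits = sum(
        log_truncated_product([1.0 - rng.random() for _ in range(n)], tau) <= observed
        for _ in range(reps)
    )
    return (hits + 1) / (reps + 1)
```

The tests whose p-values fall at or below τ form the subset referred to above.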
Rank Truncated Product (RTP)
  • Combine the K smallest p-values
  • Assess significance by evaluating the distribution of the product via Monte Carlo simulation
  • K = 1 is equivalent to Šidák; K = total number of tests is equivalent to Fisher
  • On rejecting the null, one cannot claim that true positives are in the subset
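Like the TPM, the RTP can be assessed by Monte Carlo; the statistic uses a fixed count K of smallest p-values instead of a fixed threshold. A sketch under the same assumptions (independent tests, uniform null p-values, an arbitrary replicate count), with hypothetical function names. With K = 1 the statistic is the minimum p-value, whose null tail probability is the Šidák value, and with K equal to the number of tests it is the full product used by Fisher's method.

```python
import heapq
import math
import random


def log_rtp(pvals, k):
    """Log of the product of the k smallest p-values."""
    return sum(math.log(p) for p in heapq.nsmallest(k, pvals))


def rtp_pvalue(pvals, k, reps=10000, seed=0):
    """Monte Carlo p-value for the Rank Truncated Product of the k smallest p-values.

    Null p-values are simulated as independent Uniform(0, 1]; the +1 correction
    keeps the Monte Carlo p-value away from exactly 0.
    """
    rng = random.Random(seed)
    observed = log_rtp(pvals, k)
    n = len(pvals)
    hits = sum(
        log_rtp([1.0 - rng.random() for _ in range(n)], k) <= observed
        for _ in range(reps)
    )
    return (hits + 1) / (reps + 1)
```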
Simulation Study 3 Results

Power of different p-value combination methods from 5,000 chi-square (1 df) tests
Discussion 3
  • The Gamma Method is competitive as a global test.
  • The Truncated Product Method enables more specific inference.