
Significance Testing of High-Throughput Data



  1. Significance Testing of High-Throughput Data CSHL Data Analysis 2012 Mark Reimers

  2. Goals of Testing High-Throughput Data • To identify those genes most likely changed • To prioritize candidates for focused follow-up studies • To characterize functional changes reflected in changes in gene regulation • In practice we don’t need exact p-values… …but we do need critical thinking!

  3. Outline • Family wide error rates • False discovery rates • Benjamini-Hochberg • Storey positive FDR • Correlated errors • Permutations and empirical p-values • Empirical Bayes approaches • Power to detect differences

  4. Characterizing False Positives • Family-Wide Error Rate (FWE) or ‘corrected p-values’ • probability of at least one false positive arising from the selection procedure • Strong control of FWE: • Bound on FWE independent of number changed • False Discovery Rate: • Proportion of false positives arising from selection procedure • This is unknown; we can only estimate this!

  5. Catalog of Type I Error Rates • Notation: m tests; V = number of false positives; R = number of rejections • Per-family Error Rate PFER = E(V) • Per-comparison Error Rate PCER = E(V)/m • Family-wise Error Rate FWER = P(V ≥ 1) • False Discovery Rate i) FDR = E(Q), where Q = V/R if R > 0; Q = 0 if R = 0 (Benjamini-Hochberg) ii) pFDR = E( V/R | R > 0 ) (Storey)

  6. Simple Multiple Testing Example • Suppose 10,000 genes on a chip • Suppose no genes really changed • all samples drawn from the same population • Each test statistic has a 5% chance of exceeding the .05 p-value threshold – a Type I error • So the test statistics of about 500 genes should exceed the .05 threshold ‘by chance’
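A minimal simulation makes the arithmetic concrete (a sketch assuming NumPy and SciPy are available; the gene count and group sizes are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_genes, n_per_group = 10_000, 5

# Both groups drawn from the SAME population: every null hypothesis is true.
group1 = rng.normal(size=(n_genes, n_per_group))
group2 = rng.normal(size=(n_genes, n_per_group))

# One two-sample t-test per gene.
t, p = stats.ttest_ind(group1, group2, axis=1)

print((p < 0.05).sum())  # roughly 500 genes 'significant' by chance alone
```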

  7. Part I of Demo

  8. What is the Distribution of Null P-Values? • The ‘p-value’ is the probability, if there is no real difference, of getting a test statistic at least as extreme as the one observed • If one Null test has a p-value of 0.3, then 30% of all Null tests should have more extreme test stats, hence smaller p-values • Therefore 30% of Null p-values are under 0.3 – Null p-values are uniformly distributed
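In symbols: for a continuous test statistic T with null CDF F, the value F(T) is itself Uniform(0,1) under H0, which gives the uniformity of null p-values in one line:

```latex
p = 1 - F(T)
\quad\Longrightarrow\quad
P(p \le t \mid H_0) \;=\; P\bigl(F(T) \ge 1 - t\bigr) \;=\; t,
\qquad 0 \le t \le 1 .
```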

  9. Distributions of p-values • [Figure: p-value histograms for real microarray data and for random data, with the expected histogram height under the Null marked.]

  10. Distribution of Numbers of p-values • Each bin of width w contains a random number of Null p-values • Each p-value has probability w of lying in the bin, so the expected number is Nw • The count is Binomial(N, w), which for small w is approximately Poisson • SD ≈ √mean = √(Nw)
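A quick check of the mean and SD of bin counts (a sketch assuming NumPy; 1,000 replicate histograms of N = 10,000 null p-values, one bin of width w = 0.05):

```python
import numpy as np

rng = np.random.default_rng(1)
N, w, reps = 10_000, 0.05, 1_000

# Count how many of N uniform p-values land in one bin of width w, many times over.
counts = (rng.uniform(size=(reps, N)) < w).sum(axis=1)

print(counts.mean())  # ~ N*w = 500
print(counts.std())   # ~ sqrt(N*w) ~ 22 (Poisson approximation to Binomial(N, w))
```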

  11. When Might it not be Uniform? • When the actual distribution of the test statistic departs from the reference distribution • Outliers in the data may give rise to more extreme statistics • more small p-values • Approximate tests are often conservative • p-values are larger than the true occurrence probability • distribution shifted right

  12. General Issues for Multiple Comparisons • FWER vs FDR • Are you willing to tolerate some false positives? • FDR: control E(FDR) or bound P(FDR > q)? • The actual (random) FDR has a long-tailed distribution • But E(FDR) methods are simpler and cleaner • Correlations • Many procedures surprise you when tests are correlated • Always check the assumptions of the procedure! • Models for the Null distribution: a matter of art • Strong vs weak control • Will the procedure work for any combination of true and false null hypotheses?

  13. FWER - Setting a Higher Threshold • Suppose we want to test N independent genes at overall level a • At what level a* should each gene be tested? • Want to ensure P( any false positive ) < a • i.e. 1 – a = P( all true negatives ) = P( a single null accepted )^N = ( 1 – a* )^N • Solve: a* = 1 – (1 – a)^(1/N)

  14. Expectation Argument • P( any false positive ) ≤ E( # false positives ) = N × P( a given test is a false positive ) = N a* • So we set a* = a / N • NB: no assumptions about the joint distribution of the tests

  15. ‘Corrected’ p-Values for FWE • Sidak (exact correction for independent tests) • p_i* = 1 – (1 – p_i)^N if all p_i are independent • Expanding, p_i* ≈ 1 – (1 – N p_i + …) = N p_i, which gives Bonferroni • Bonferroni correction • p_i* = N p_i if N p_i < 1, otherwise 1 • justified by the expectation argument • still valid if genes are co-regulated (correlated), but conservative • Both are too conservative for array use!
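Both corrections are one-liners; a sketch assuming NumPy (the function names are mine):

```python
import numpy as np

def sidak(p, n_tests):
    """Sidak correction: exact FWER control for independent tests."""
    return 1.0 - (1.0 - np.asarray(p, dtype=float)) ** n_tests

def bonferroni(p, n_tests):
    """Bonferroni correction: valid under any dependence structure."""
    return np.minimum(n_tests * np.asarray(p, dtype=float), 1.0)

# With N = 10,000 tests, only very small raw p-values survive at level 0.05:
print(sidak([1e-6, 1e-4], 10_000))       # ~ [0.00995, 0.632]
print(bonferroni([1e-6, 1e-4], 10_000))  # [0.01, 1.0]
```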

  16. Traditional Multiple Comparisons Methods • Key idea: sequential testing • Order p-values: p(1), p(2), … • If p(1) significant then test p(2) , etc … • Mostly improvements on this simple idea • Complicated proofs

  17. Holm’s FWER Procedure • Order p-values: p(1), …, p(N) • If p(1) < a/N, reject H(1) , then… • If p(2) < a/(N–1), reject H(2) , then… • Let k be the largest index such that p(n) < a/(N–n+1) for all n ≤ k • Reject H(1) … H(k) • Then P( at least one false positive ) < a • The proof doesn’t depend on the joint distribution of the tests
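A sketch of the step-down rejection rule in NumPy (the function name is mine):

```python
import numpy as np

def holm_reject(p, alpha=0.05):
    """Boolean mask of hypotheses rejected by Holm's step-down procedure."""
    p = np.asarray(p, dtype=float)
    N = p.size
    order = np.argsort(p)                      # smallest p-value first
    thresholds = alpha / (N - np.arange(N))    # a/N, a/(N-1), ..., a/1
    passed = p[order] < thresholds
    k = N if passed.all() else np.argmin(passed)  # stop at the first failure
    reject = np.zeros(N, dtype=bool)
    reject[order[:k]] = True                   # everything before the failure is rejected
    return reject
```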

  18. Hochberg’s FWER Procedure • Find the largest k such that p(k) < a / (N – k + 1) • Then select genes (1) to (k) • More powerful than Holm’s procedure • But … requires assumptions: independence or ‘positive dependence’ • When there is one type I error, there could be many false positives

  19. Holm & Hochberg Adjusted P • Order p-valuespr1 , pr2, …, prM • Holm (1979)step-down adjusted p-values p(j)* = maxk = 1 to j {min ((M-k+1)p(k), 1)} Adjust out-of-order p-values in relation to those lower (‘step-down’) • Hochberg (1988) step-up adjusted p-values p(j)* = mink = j to M {min ((M-k+1)p(k), 1) } Adjust out-of-order p-values in relation to those higher (‘step-up’)

  20. Demo Part 2

  21. False Discovery Rates

  22. False Discovery Rate • In genomic problems a few false positives are often acceptable • Want to trade off power vs. false positives • Could control: • the expected number of false positives • the expected proportion of false positives • what to do with E(V/R) when R = 0? • the actual proportion of false positives

  23. Truth vs. Decision

                    Accept H0    Reject H0    Total
      H0 true           U            V          m0
      H0 false          T            S          m1
      Total           m – R          R           m

  24. Estimating False Discovery Rate • If all Nulls were true we would expect 20 p-values below the chosen threshold; we observe 75 • So an estimated 20 / 75 ≈ 27% of the selected genes are false positives

  25. Demo Part III

  26. Benjamini-Hochberg Procedure • Can’t know what the FDR is for a particular sample • B-H give a procedure controlling the average FDR • Order the p-values: p(1), p(2), …, p(N) • Find the largest k such that p(k) ≤ k a / N • Then select genes (1) to (k) – see the sketch below • NB: an acceptable FDR may be much larger than an acceptable p-value (e.g. 0.10) • NB: the theorem guarantees the FDR for the procedure: set the target first, then the procedure picks the threshold and the genes • Most people apply it adaptively – fiddle with the level until they get a gene list they like • The B-H theorem does not validate this use
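A sketch of the selection rule (the function name is mine; assumes NumPy):

```python
import numpy as np

def bh_select(p, q=0.10):
    """Benjamini-Hochberg: reject the k smallest p-values, where k is the
    largest rank with p(k) <= k*q/N. Controls E(FDR) <= q for independent tests."""
    p = np.asarray(p, dtype=float)
    N = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, N + 1) / N
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(N, dtype=bool)
    reject[order[:k]] = True
    return reject
```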

  27. Benjamini-Hochberg Example • FDR target a = 0.1; N = 1,000; the threshold for rank k is k·a/N = k × 1e-4

      k    p(k)      k·a/N     p(k) ≤ k·a/N?
      1    2e-4      1e-4      F
      2    2.4e-4    2e-4      F
      3    2.5e-4    3e-4      T
      4    3.2e-4    4e-4      T
      5    6e-4      5e-4      F

  • The largest k with the condition true is k = 4, so genes (1) to (4) are selected – even though p(1) and p(2) miss their own thresholds

  28. Argument for B-H Method • If there are no true changes (all H0’s hold), every rejection is false, so Q = 1 exactly when some p(k) < k a/N – the condition of Simes’ lemma – which happens with probability a; otherwise Q = 0 • Hence E(Q) ≤ a in this case • If all genes truly changed (no H0 holds), Q = 0 < a • Build the argument by induction from both ends, starting from N = 2

  29. Simes’ Lemma • Order the p-values from N independent tests on random (Null) data: p(1), p(2), …, p(N) • Pick a target threshold a • Then P( p(1) ≤ a/N or p(2) ≤ 2a/N or … or p(N) ≤ a ) = a • Check for N = 2: P = P( min(p1,p2) ≤ a/2 ) + P( min(p1,p2) > a/2 and max(p1,p2) ≤ a ) = ( a/2 + a/2 – a²/4 ) + a²/4 = a • [Figure: the two regions in the (p1, p2) unit square.]
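The lemma is easy to verify by Monte Carlo for independent uniform p-values (a sketch assuming NumPy; N and the replicate count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N, alpha, reps = 50, 0.05, 100_000

# reps draws of N independent null p-values, sorted within each draw.
p = np.sort(rng.uniform(size=(reps, N)), axis=1)
thresholds = alpha * np.arange(1, N + 1) / N   # k*alpha/N for k = 1..N

hits = (p <= thresholds).any(axis=1)           # the Simes event
print(hits.mean())                             # ~ 0.05 = alpha, as the lemma states
```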

  30. Simes’ Test for Some Non-Nulls • Pick a target threshold a • Order the p-values: p(1), p(2), …, p(N) • If for any k, p(k) < k a/N, reject the complete Null • The test is valid against the complete Null hypothesis if the tests are independent or ‘positively dependent’ • Doesn’t give strong control (i.e. when some alternatives are true) • Somewhat non-conservative if tests are negatively correlated

  31. Practical Issues • The actual proportion of false positives varies from data set to data set • The mean FDR could be low, yet the FDR in your particular data set could be high

  32. Distributions of numbers of p-values below threshold • [Figure: histograms of the number of p-values below the threshold over 10,000 random drawings of 10,000 genes; left: uncorrelated tests, right: highly correlated tests.]

  33. Controlling the Number of FP’s in One Study • The B-H procedure only guarantees the long-run average of E(V/R | R>0) P(R>0) • it can be quite badly wrong in individual studies • Korn’s method gives a confidence bound on the FDR for individual studies • it also addresses the issue of correlations • Builds on the Westfall-Young approach to control the tail probability of the proportion of false positives (TPPFP)

  34. Korn’s Procedure • To guarantee, with specified confidence, no more than k false positives • Construct the null distribution by permutation, as in Westfall-Young • Order the p-values: p(1), …, p(M) • Reject H(1), …, H(k) outright • Compare each subsequent p-value to the full null distribution; continue until one H is not rejected • N.B. this gives strong control

  35. Issues with Korn’s Procedure • Valid if you select k first and then follow the procedure through – not if you try a number of different k and pick the one that gives the most genes, as people actually proceed • Only an approximate FDR • Computationally intensive • Available in BRB-ArrayTools

  36. Storey’s pFDR • Storey argues that E(Q | R > 0) is what most people think FDR means • Sometimes quite different from the B-H FDR • Especially if the number of rejected nulls must be quite small to get an acceptable FDR • E.g. if P(R > 0) = 1/2, then pFDR = 2 × FDR

  37. A Bayesian Interpretation • Suppose nature generates true nulls with probability p0 and true alternatives with probability p1 • Then define the FDR as the probability that a rejected test is a false positive • pFDR = P( H0 true | test statistic in the rejection region ) • Issue: we don’t know p0 • Storey suggests estimating p0 by examining the right end of the p-value distribution • [Figure: p-value histogram with the expected density if all H0 were true overlaid on the observed density in the right half.]

  38. Storey’s Procedure • Estimate the proportion of true Nulls: p0 ≈ 2 (# p > ½) / M • Try several p-value thresholds p1 • ‘fishing’ is OK with this procedure (unlike B-H) • The probability that a true-Null p-value lands in the rejection region is p1 • Form the ‘naïve’ ratio: p0 p1 M / (# p < p1) – see the sketch below • ‘Adjust’ for small numbers • Bootstrap the ratio to obtain a confidence interval for the pFDR
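A naive version of the recipe on this slide (a sketch assuming NumPy; it skips the small-number adjustment, the tuning of the estimation cutoff, and the bootstrap step of Storey's full method):

```python
import numpy as np

def storey_fdr(p, threshold):
    """Naive Storey-style pFDR estimate for the genes with p <= threshold."""
    p = np.asarray(p, dtype=float)
    M = p.size
    pi0 = min(1.0, 2.0 * (p > 0.5).mean())   # proportion of true Nulls, from the right half
    n_selected = max((p <= threshold).sum(), 1)
    return pi0 * threshold * M / n_selected  # expected false positives / number selected
```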

  39. Q-Values • The p-value is the minimum test level at which a gene gets selected (declared ‘significant’) • The q-value is the minimum FDR at which a gene is included in the selected set • In Storey’s procedure this is a Bayesian posterior probability • The term is commonly applied to the B-H procedure as well

  40. Confidence in pFDR • Storey estimates confidence intervals for his procedure by bootstrapping the p-values • This is relatively easy to do • However, the correct bootstrap procedure is to resample the samples and then recompute the p- and q-values • If the tests are moderately correlated, as is often the case, the confidence intervals obtained this way are very different from those obtained by resampling p-values

  41. Correlated Tests

  42. Correlated Tests and FWER • Typically tests are correlated • Extreme case: all tests highly correlated • one test is a proxy for all • ‘corrected’ p-values should equal ‘uncorrected’ ones • Intermediate case: some correlation • the probability of obtaining a p-value by chance usually lies between the Sidak and uncorrected values

  43. Symptoms of Correlated Tests • [Figure: p-value histograms.]

  44. Distributions of numbers of p-values below threshold • [Figure (repeated from slide 32): histograms of the number of p-values below the threshold over 10,000 random drawings of 10,000 genes; left: uncorrelated tests, right: highly correlated tests.]

  45. Permutation Tests • We don’t know the true distribution of gene expression measures within groups • We simulate the distribution under the Null by pooling the two groups and randomly selecting two pseudo-groups of the same sizes as the groups we are testing • Need at least ~5 samples in each group to do this! (with 5 per group there are only C(10,5) = 252 distinct relabelings)

  46. How To Do Permutation Tests • Suppose samples 1, 2, …, 10 are in group 1 and samples 11 – 20 are in group 2 • Permute 1, 2, …, 20: say 13, 4, 7, 20, 9, 11, 17, 3, 8, 19, 2, 5, 16, 14, 6, 18, 12, 15, 10, 1 • Construct mean differences (or t-scores) for each gene, using the first 10 and last 10 of the permuted order as pseudo-groups • Repeat many times to obtain the Null distribution of random mean differences (or t-scores) • This will approach a z- or t-distribution if the original distribution is roughly Normal (has no outliers)
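A sketch of the whole recipe in NumPy (the function name is mine; 100 genes and 10 samples per group are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def permutation_null(data, n1, n_perm=10_000):
    """Null distribution of mean differences obtained by permuting group labels.
    `data`: genes x samples matrix; the first n1 columns form group 1."""
    n = data.shape[1]
    null = np.empty((n_perm, data.shape[0]))
    for i in range(n_perm):
        idx = rng.permutation(n)                  # one random relabeling
        g1, g2 = data[:, idx[:n1]], data[:, idx[n1:]]
        null[i] = g1.mean(axis=1) - g2.mean(axis=1)
    return null

# Example: 100 genes, 10 samples per group, no real differences.
data = rng.normal(size=(100, 20))
observed = data[:, :10].mean(axis=1) - data[:, 10:].mean(axis=1)
null = permutation_null(data, n1=10)

# Two-sided empirical p-values (often (count+1)/(n_perm+1) to avoid exact zeros).
emp_p = (np.abs(null) >= np.abs(observed)).mean(axis=0)
```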

  47. Critiques of Permutations • The variances of permuted values for really separate groups are inflated • Permuted t-scores for many genes may then be lower than scores from random samples all drawn from the same population • Therefore somewhat too-conservative p-values for some genes

  48. Multivariate Permutation Tests • Want a null distribution with the same correlation structure as the given data but no real differences between groups • Permute group labels among samples • redo the tests with the pseudo-groups • repeat many times (e.g. 10,000)

  49. Westfall-Young Approach • Analogous to Holm’s procedure, except that at each stage the smallest remaining p-value is compared to the distribution of the smallest p-value from an empirical null for the hypotheses still being tested • Asks: how often is the smallest p-value less than a given threshold when the tests are correlated to the same extent and all Nulls are true? • Construct permuted samples n = 1, …, N • Determine p-values pj[n] for each gene j in each permuted sample n

  50. Westfall-Young Approach – 2 • Construct permuted samples n = 1, …, N and p-values pj[n] as before • To correct the i-th smallest p-value, drop the hypotheses already rejected (at a smaller level) and compare to the minimum p-value over the remaining hypotheses • Enforce monotonicity: the i-th adjusted p-value cannot be smaller than any earlier one • (a sketch follows)
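A sketch of this step-down minP adjustment (assumes NumPy; `p_null` holds per-gene p-values from B permuted samples; not optimized for speed):

```python
import numpy as np

def westfall_young_minp(p_obs, p_null):
    """Step-down minP adjusted p-values (Westfall-Young style).
    p_obs: (M,) observed p-values; p_null: (B, M) p-values from B permutations."""
    M = p_obs.size
    order = np.argsort(p_obs)              # most significant hypothesis first
    adj = np.empty(M)
    active = np.ones(M, dtype=bool)        # hypotheses not yet stepped past
    for j in order:
        # Compare to the permutation minimum over the remaining hypotheses only.
        min_null = p_null[:, active].min(axis=1)
        adj[j] = (min_null <= p_obs[j]).mean()
        active[j] = False                  # drop this hypothesis for later steps
    # Enforce monotonicity: adjusted p-values cannot decrease down the list.
    adj[order] = np.maximum.accumulate(adj[order])
    return adj
```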
