Strong Control of the Familywise Type I Error Rate in DNA Microarray Analysis Using Exact Step-Down Permutation Tests

Peter H. Westfall, Texas Tech University

**Microarrays**

• Glass (1 cm²)
• ~6,500 genes

Different cDNA sequence

**Example**

Group 1: Acute Myeloid Leukemia (AML), $n_1 = 11$
Group 2: Acute Lymphoblastic Leukemia (ALL), $n_2 = 27$

Data:

| OBS | TYPE | G1 | G2 | G3 | … | G7000 |
|-----|------|----|----|----|---|-------|
| 1 | AML | (gene expression levels) | | | | |
| 2 | AML | | | | | |
| … | … | | | | | |
| 11 | AML | | | | | |
| 12 | ALL | | | | | |
| … | … | | | | | |
| 38 | ALL | | | | | |

**Testing for 7000 Gene Expression Levels**

Goal: Test $H_{0i}: F_{\mathrm{ALL},i} = F_{\mathrm{AML},i}$ for $i = 1, \dots, 7000$. Here, "F" denotes the cdf.

Many choices for test statistics.

Multiplicity problem: If tests are done at $\alpha = .05$ and there are 6600 equivalent genes, then about $.05 \times 6600 = 330$ of them are expected to be declared "non-equivalent."

**Closed Testing to Control False Discoveries**

Let $S = \{1, 2, \dots, 7000\}$ (gene labels). Let $K = \{i_1, \dots, i_k\} \subseteq S$ denote a particular subset.

The Closed Testing Procedure (CTP):

1. Test $H_{0K}: F_{\mathrm{ALL},K} = F_{\mathrm{AML},K}$ for each $K \subseteq S$, using a valid $\alpha$-level test for each.
2. Reject $H_{0i}: F_{\mathrm{ALL},i} = F_{\mathrm{AML},i}$ if $H_{0K}$ is rejected for all $K \supseteq \{i\}$.

**Theorem: CTP Strongly Controls FWE**

Proof: Suppose $H_{0j_1}, \dots, H_{0j_m}$ are all true (which ones is unknown to you). You can reject at least one of them only when you reject the intersection $H_{0j_1} \cap \dots \cap H_{0j_m}$. Thus,

FWE = $P(\text{reject at least one of } H_{0j_1}, \dots, H_{0j_m} \mid H_{0j_1}, \dots, H_{0j_m} \text{ all are true})$
$\le P(\text{reject } H_{0j_1} \cap \dots \cap H_{0j_m} \mid H_{0j_1}, \dots, H_{0j_m} \text{ all are true}) \le \alpha$.

**Exact Tests for Composite Hypotheses $H_{0K}$**

Use the permutation distribution of $\min_{i \in K} p_i$, where $p_i = 2P(T_{38-2} > |t_i|)$ and $t_i$ is the two-sample Student t statistic for gene $i$.

p-value = proportion of the $38!/(27!\,11!)$ permutations for which $\min_{i \in K} P_i^* \le \min_{i \in K} p_i$.

Note: Exact despite "massively singular" covariance matrix!

**A Slight Problem...**

There are $2^{7000} - 1$ subsets $K$ to be tested.

This might take a while...

**A Fantastic Simplification**

You need only test 7000 of the $2^{7000} - 1$ subsets!

Why? Because $P(\min_{i \in K} P_i^* \le c) \le P(\min_{i \in K'} P_i^* \le c)$ when $K \subset K'$.
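The inequality and the reduction to 7000 tests can be spelled out in a few lines. This is a sketch of the reasoning, not part of the original slides; the ordered labels $(1), \dots, (7000)$ and the nested sets $K_j$ are notation introduced here for illustration, and both probabilities are taken under the same permutation distribution.

$$
\begin{aligned}
K \subset K' &\;\Longrightarrow\; \min_{i \in K'} P_i^* \le \min_{i \in K} P_i^* \\
&\;\Longrightarrow\; \Big\{\min_{i \in K} P_i^* \le c\Big\} \subseteq \Big\{\min_{i \in K'} P_i^* \le c\Big\} \\
&\;\Longrightarrow\; P\Big(\min_{i \in K} P_i^* \le c\Big) \le P\Big(\min_{i \in K'} P_i^* \le c\Big).
\end{aligned}
$$

Now order the observed p-values as $p_{(1)} \le \dots \le p_{(7000)}$ and let $K_j = \{(j), (j+1), \dots, (7000)\}$. Any subset $K$ whose smallest p-value is $p_{(j)}$ satisfies $K \subseteq K_j$ and has the same observed minimum, so taking $c = p_{(j)}$ shows that its permutation p-value is no larger than that of $K_j$. Rejecting $H_{0K_j}$ therefore rejects every such $H_{0K}$, and only the 7000 nested sets $K_1 \supset K_2 \supset \dots \supset K_{7000}$ need to be tested.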
Significance for most lower order subsets is determined by significance of higher order subsets.

**Illustration with Four Genes**

| Hypothesis | min p | Permutation p-value |
|------------|-------|---------------------|
| H{1234} | .0121 | p{1234} = .0379 |
| H{134} | .0121 | p{134} < .0379 |
| H{234} | .0142 | p{234} = .0351 |
| H{123} | .0121 | p{123} < .0379 |
| H{124} | .0121 | p{124} < .0379 |
| H{12} | .0121 | p{12} < .0379 |
| H{13} | .0121 | p{13} < .0379 |
| H{34} | .0191 | p{34} = .0355 |
| H{14} | .0121 | p{14} < .0379 |
| H{23} | .0142 | p{23} < .0351 |
| H{24} | .0142 | p{24} < .0351 |
| H4 | p4 = .0191 | p{4} < .0355 |
| H1 | p1 = .0121 | p{1} < .0379 |
| H2 | p2 = .0142 | p{2} < .0351 |
| H3 | p3 = .1986 | p{3} = .1991 |

(Start at bottom.)

**MULTTEST PROCEDURE**

• Tests only the needed subsets (7000, not $2^{7000} - 1$).
• Samples from the permutation distribution.
• Only one sample is needed, not 7000 distinct samples: the joint distribution of minP is identical under $H_K$ and $H_S$. (Called the "subset pivotality" condition by Westfall and Young, 1993.)

**PROC MULTTEST code**

    proc multtest noprint out=adjp holm hoc stepperm n=200000;
      class type;                      /* AML or ALL */
      test mean (gene1-gene7123);
      contrast 'AML vs ALL' -1 1;
    run;

    /* keep only genes with small raw p-values and list them */
    proc sort data=adjp(where=(raw_p le .0005));
      by raw_p;
    proc print;
      var _var_ raw_p stppermp;
    run;

**PROC MULTTEST Output**

(50 minutes for 200,000 samples)

**Imbalance Issues**

• Use of Student t statistics does result in an exact, closed multiple testing procedure, but ...
• There is imbalance: less power for gene types that are highly kurtotic than for normally distributed types.
• Solutions:
  • Use exact unadjusted p-values
    • Already available for binary data
    • Computational difficulties otherwise
  • Rank-transform the data prior to analysis

**Rank Transform for Better Balance**

    /* replace expression levels by their ranks */
    proc rank;
      var gene1-gene7123;
    run;

    proc multtest noprint out=adjp holm hoc stepperm n=200000;
      class type;                      /* AML or ALL */
      test mean (gene1-gene7123);
      contrast 'AML vs ALL' -1 1;
    run;

    proc sort data=adjp(where=(raw_p le .0005));
      by raw_p;
    proc print;
      var _var_ raw_p stppermp;
    run;

**Comparing ALL and AML for Gene 6128**

[Figure: GENE6128 expression level (vertical axis, 0 to 2000) by TYPE (ALL, AML).]

**Is Better Balance Good?**

• Maybe not: imbalance induces a more powerful multiple testing procedure
• Bonferroni multiplier implicitly reduced through imbalance
• Serendipity!

**Summary**

• The Westfall-Young method is an exact, closed testing method, despite large p, small n
• Detected genes are "honestly significant"
• Robust (nonparametric)
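The step-down permutation idea from the MULTTEST slides (test only the nested subsets, and reuse one sample of permutations for all of them) can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the PROC MULTTEST implementation: the function names and the toy data are made up for the example, permutations are sampled at random rather than enumerated, and it works with max |t| instead of min p, which is equivalent here because every gene's p-value is the same decreasing function of |t|.

```python
import math
import random


def t_stat(x, y):
    """Pooled two-sample Student t statistic (assumes nonzero pooled variance)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ssx = sum((v - mx) ** 2 for v in x)
    ssy = sum((v - my) ** 2 for v in y)
    pooled = (ssx + ssy) / (nx + ny - 2)
    return (mx - my) / math.sqrt(pooled * (1 / nx + 1 / ny))


def abs_t_all_genes(data, labels):
    """|t| for every gene (column of data), given a 0/1 group label per row."""
    out = []
    for g in range(len(data[0])):
        x = [row[g] for row, lab in zip(data, labels) if lab == 0]
        y = [row[g] for row, lab in zip(data, labels) if lab == 1]
        out.append(abs(t_stat(x, y)))
    return out


def stepdown_maxt(data, labels, n_perm=1000, seed=0):
    """Step-down adjusted p-values from one shared sample of label permutations."""
    rng = random.Random(seed)
    obs = abs_t_all_genes(data, labels)
    m = len(obs)
    # Genes ordered from most to least significant (largest |t| first).
    order = sorted(range(m), key=lambda g: -obs[g])
    counts = [0] * m
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)
        t_star = abs_t_all_genes(data, perm)
        # Successive maxima over the nested subsets, starting with the
        # least significant gene and adding one gene at a time.
        running = 0.0
        for j in range(m - 1, -1, -1):
            running = max(running, t_star[order[j]])
            if running >= obs[order[j]]:
                counts[j] += 1
    adj = [c / n_perm for c in counts]
    # Closed testing: a gene is rejected only if all larger subsets were too.
    for j in range(1, m):
        adj[j] = max(adj[j], adj[j - 1])
    result = [0.0] * m
    for j, g in enumerate(order):
        result[g] = adj[j]
    return result


# Toy data invented for illustration: 10 samples, 3 genes; only gene 0 differs.
labels = [0] * 5 + [1] * 5
data = [
    [10.1, 5.0, 2.0], [9.8, 5.5, 2.9], [10.3, 4.8, 2.4], [9.9, 5.2, 2.2],
    [10.0, 5.1, 2.7], [1.0, 5.3, 2.5], [1.2, 4.9, 2.1], [0.9, 5.0, 2.8],
    [1.1, 5.4, 2.3], [0.8, 5.1, 2.6],
]
adj_p = stepdown_maxt(data, labels, n_perm=1000, seed=1)
print(adj_p)
```

Because the inner loop accumulates the running maximum from the least significant gene upward, each sampled permutation contributes to all nested subsets at once, which is the "only one sample is needed" point on the slide. With a finite number of sampled permutations an adjusted p-value can come out as exactly 0; it should be read as "smaller than about 1/n_perm".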