210 likes | 324 Views
Metagene Projection. There are a lot of array data available Species, platform, labeling method, researcher and other issues make using these data difficult. Metagene Projection claims to “reduce noise while still capturing the invariant biological features of the data.”
E N D
Metagene Projection • There are a lot of array data available • Species, platform, labeling method, researcher and other issues make using these data difficult. • Metagene Projection claims to “reduce noise while still capturing the invariant biological features of the data.” • This should “enable cross-platform and cross-species analysis, improve clustering and class prediction and provide a computational means to detect and remove sample contamination.”
NMF (Brunet et al., 2004 PNAS 101:4164) – Nonnegative Matrix Factorization W=genes X small # metagenes H= small # metagenes X samples M and T n (genes) x N (samples) KEY point: n (genes) identifiers in M and T must match Unknown: Can M and T be totally different types of data? Moore-Penrose generalized pseudoinverse
Model – 30 samples, 3 metagenes Test – 38 samples, 3 metagenes
After Metagene Projection Before Metagene Projection
Before Metagene Projection : Rank normalized and including only the top 500 markers of each class. – Underperforms metagene projection
KO of the same gene impacts different cell lines in similar way. • Both mouse stem cell lines, one on Exon array, one on 430_2
For 3’ UTR – max average per gene selected • For Exon – max probe count per transcript cluster id selected • gene symbol <–> gene symbol join • All 17354 genes used
Metagene Projection Preprocessing 2 required inputs for the genepattern metagene projection module are model and test preprocessing parameter files. gct.file="Arv.gct" cls.file="Arv.cls" column.subset="ALL" column.sel.type="samples" thres=3 ceil=14 fold=1 delta=1.5 norm=6 NO FILTER at this value 4525 pass
Model input, preprocessing and refinement H matrix from NMF
Projected data – possibly better separation move 1 het to improve clades