
Metagene Projection

Metagene Projection. A lot of array data are available, but species, platform, labeling method, researcher and other issues make using these data difficult. Metagene Projection claims to “reduce noise while still capturing the invariant biological features of the data.”


Presentation Transcript


  1. Metagene Projection
     • There are a lot of array data available.
     • Species, platform, labeling method, researcher and other issues make using these data difficult.
     • Metagene Projection claims to “reduce noise while still capturing the invariant biological features of the data.”
     • This should “enable cross-platform and cross-species analysis, improve clustering and class prediction and provide a computational means to detect and remove sample contamination.”

  2. NMF (Brunet et al., 2004, PNAS 101:4164): Nonnegative Matrix Factorization
     • W = genes × small # of metagenes
     • H = small # of metagenes × samples
     • M (model) and T (test): n (genes) × N (samples)
     • KEY point: the n (genes) identifiers in M and T must match
     • Unknown: can M and T be totally different types of data?
     • Moore-Penrose generalized pseudoinverse
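
The factorization and projection named on this slide can be sketched as follows. This is a minimal illustration on random toy data, not the GenePattern module's code: the update rule (Lee-Seung multiplicative updates for squared error), the function names `nmf` and `project`, and the toy matrices are all assumptions made for the example. The sample counts (30 model, 38 test) and the 3 metagenes follow the next slide.

```python
import numpy as np

def nmf(M, k, n_iter=500, seed=0, eps=1e-9):
    """Approximate a nonnegative matrix M (genes x samples) as W @ H.

    W is genes x k metagenes, H is k metagenes x samples.
    Assumption: Lee-Seung multiplicative updates, chosen for brevity.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = M.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        H *= (W.T @ M) / (W.T @ W @ H + eps)
        W *= (M @ H.T) / (W @ H @ H.T + eps)
    return W, H

def project(W, T):
    """Project T (same gene rows as the model) into metagene space
    using the Moore-Penrose pseudoinverse of W."""
    return np.linalg.pinv(W) @ T

# Hypothetical toy data: 200 genes sharing 3 underlying patterns.
rng = np.random.default_rng(1)
n_genes, k = 200, 3
patterns = rng.random((n_genes, k))
M = patterns @ rng.random((k, 30))   # model: 30 samples
T = patterns @ rng.random((k, 38))   # test: 38 samples

W, H_model = nmf(M, k)
H_test = project(W, T)
print(H_model.shape, H_test.shape)
```

The rows of M and T must refer to the same genes in the same order, which is the "KEY point" on the slide. Unlike H from NMF, the projected matrix is not guaranteed to be nonnegative, because the pseudoinverse of W can contain negative entries.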

  3. • Model: 30 samples, 3 metagenes
     • Test: 38 samples, 3 metagenes

  4. Figure panels: After Metagene Projection; Before Metagene Projection

  5. Before Metagene Projection: rank normalized and including only the top 500 markers of each class. This underperforms metagene projection.

  6. • KO of the same gene impacts different cell lines in a similar way.
     • Both are mouse stem cell lines, one on an Exon array, one on 430_2.

  7. • For 3’ UTR: max average per gene selected
     • For Exon: max probe count per transcript cluster id selected
     • gene symbol <-> gene symbol join
     • All 17354 genes used
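
The selection and join described on this slide might look like the sketch below. The row layout, the `collapse` helper, the `n_probes` field and the toy values are hypothetical; the slide only states the selection criteria (highest average for 3’ UTR, highest probe count for Exon) and that the two platforms were joined on gene symbol.

```python
from statistics import mean

def collapse(rows, score):
    """Keep one row per gene symbol: the row with the highest score(row)."""
    best = {}
    for row in rows:
        current = best.get(row["symbol"])
        if current is None or score(row) > score(current):
            best[row["symbol"]] = row
    return best

# Hypothetical toy rows; real input would come from the array annotation.
utr_rows = [
    {"symbol": "Actb", "values": [10.0, 11.0]},
    {"symbol": "Actb", "values": [12.0, 13.0]},
    {"symbol": "Gapdh", "values": [9.0, 9.5]},
]
exon_rows = [
    {"symbol": "Actb", "values": [11.5], "n_probes": 20},
    {"symbol": "Actb", "values": [8.0], "n_probes": 4},
    {"symbol": "Nanog", "values": [7.0], "n_probes": 12},
]

# 3' UTR: pick the probe set with the highest average signal per gene.
utr = collapse(utr_rows, score=lambda r: mean(r["values"]))
# Exon: pick the transcript cluster with the most probes per gene.
exon = collapse(exon_rows, score=lambda r: r["n_probes"])

# Inner join on gene symbol: only genes present on both platforms survive.
shared = sorted(utr.keys() & exon.keys())
joined = {g: utr[g]["values"] + exon[g]["values"] for g in shared}
print(joined)  # {'Actb': [12.0, 13.0, 11.5]}
```

An inner join is assumed here because the NMF slide requires matching gene identifiers in the model and test matrices.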

  8. Expressed Clustering (10989 genes)

  9. RankNorm = 15 × (Rank − 1) / (#genes − 1)
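
The rank normalization formula on this slide maps the lowest-ranked gene in a sample to 0 and the highest to 15. A minimal sketch for one sample follows; the function name `rank_norm` and the tie handling (ties broken by position) are assumptions, since the slide does not say how ties were ranked.

```python
def rank_norm(values, scale=15.0):
    """Rank-normalize one sample: scale * (rank - 1) / (n_genes - 1).

    Rank 1 is the smallest value. Requires at least two genes.
    Assumption: ties are broken by position in the input.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = scale * (rank - 1) / (n - 1)
    return out

print(rank_norm([5.2, 9.1, 7.4, 3.0]))  # [5.0, 15.0, 10.0, 0.0]
```

Applying this per sample puts arrays from different platforms on the same 0 to 15 scale regardless of their original intensity ranges.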

  10. Expressed and rank normalized clustering

  11. Metagene Projection Preprocessing
      The 2 required inputs for the GenePattern Metagene Projection module are the model and test preprocessing parameter files.
      gct.file="Arv.gct"
      cls.file="Arv.cls"
      column.subset="ALL"
      column.sel.type="samples"
      thres=3
      ceil=14
      fold=1
      delta=1.5
      norm=6
      NO FILTER at this value; 4525 pass
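
As a rough illustration of what threshold, ceiling, fold and delta parameters conventionally do in expression preprocessing, the sketch below clips each value to [thres, ceil] and then keeps a gene only if its max/min ratio is at least fold and its max − min difference is at least delta. These semantics are an assumption, not taken from the slides or confirmed against the module, and the `preprocess` function and toy genes are hypothetical.

```python
def preprocess(rows, thres=3.0, ceil=14.0, fold=1.0, delta=1.5):
    """Clip values to [thres, ceil], then apply a variation filter.

    Assumed semantics: keep a gene when max/min >= fold and
    max - min >= delta across samples (after clipping).
    """
    kept = {}
    for gene, values in rows.items():
        clipped = [min(max(v, thres), ceil) for v in values]
        lo, hi = min(clipped), max(clipped)
        if hi / lo >= fold and hi - lo >= delta:
            kept[gene] = clipped
    return kept

# Hypothetical toy genes on a log-like scale.
rows = {
    "geneA": [2.0, 6.0, 15.0],   # clipped to [3.0, 6.0, 14.0], range 11.0
    "geneB": [7.0, 7.5, 8.0],    # range 1.0, below delta
}
print(preprocess(rows))  # {'geneA': [3.0, 6.0, 14.0]}
```

Under these assumed semantics, fold=1 can never reject a gene, since max/min is always at least 1 for positive values, so only delta would do any filtering.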

  12. Model input, preprocessing and refinement H matrix from NMF

  13. Projected model dataset

  14. Model combined with test

  15. Original data – Platforms separated

  16. Projected data: possibly better separation (move 1 het to improve clades)
