
Final Project Week 3 - 5/7/09: GSEA and Cluster Computing in Protein Research. Leon Kay, Yan Tran, Chris Thomas



Presentation Transcript


  1. Final Project Week 3 - 5/7/09: GSEA and Cluster Computing in Protein Research. Leon Kay, Yan Tran, Chris Thomas. Leon. Chris. Yan. Gary.

  2. Gene Set Enrichment Analysis • GSEA is a computational method that determines whether a defined set of genes shows statistically significant differences between two phenotypes • 3 Key Steps • Calculation of the Enrichment Score (ES) • Estimation of the significance level of the ES • Adjustment for multiple hypothesis testing
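The three steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Broad tool's implementation: genes are ranked by a plain difference of class means (the Broad tool defaults to a signal-to-noise metric), the p-value is a two-sided permutation estimate, and Benjamini-Hochberg stands in for GSEA's own FDR procedure. All function names are hypothetical.

```python
import random

def rank_genes(expr, labels):
    """expr: gene -> list of values per sample; labels: 0/1 per sample.
    Assumption: rank by difference of class means, highest first."""
    metric = {}
    for gene, vals in expr.items():
        a = [v for v, l in zip(vals, labels) if l == 0]
        b = [v for v, l in zip(vals, labels) if l == 1]
        metric[gene] = sum(a) / len(a) - sum(b) / len(b)
    genes = sorted(metric, key=metric.get, reverse=True)
    return genes, [metric[g] for g in genes]

def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Step 1: walk down the ranked list, stepping up on genes in the
    set (weighted by |score|^p) and down on the rest; the ES is the
    largest deviation of the running sum from zero."""
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(hits) - sum(hits)
    norm = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if n_miss == 0 or norm == 0:
        return 0.0
    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        running += abs(s) ** p / norm if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

def permutation_p_value(expr, labels, gene_set, n_perm=1000, seed=0):
    """Step 2: shuffle the phenotype labels to build a null distribution
    of ES values, then count how often the null is at least as extreme."""
    rng = random.Random(seed)
    observed = enrichment_score(*rank_genes(expr, labels), gene_set)
    shuffled = list(labels)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_es = enrichment_score(*rank_genes(expr, shuffled), gene_set)
        if abs(null_es) >= abs(observed):
            extreme += 1
    return (extreme + 1) / (n_perm + 1)

def benjamini_hochberg(p_values):
    """Step 3 (swapped-in technique): Benjamini-Hochberg adjustment of
    the p-values obtained for many gene sets."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    prev = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        prev = min(prev, p_values[i] * m / rank)
        adjusted[i] = prev
    return adjusted
```

In use, one would call `permutation_p_value` once per gene set and pass the resulting list to `benjamini_hochberg`.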

  3. Broad Institute GSEA Tool • We tried using the GSEA tool from the Broad Institute, where most of the original work for GSEA was done - http://www.broad.mit.edu/gsea/ • Java web-start app that launches quickly and easily, lots of online documentation and tutorials. • Unfortunately, we ran into some major issues getting our data to work with it.

  4. Input to the GSEA Tool

  5. Input to the GSEA Tool – Parameters • Expression dataset – the expression data; in our case, sub-data extracted from clusters using T-MeV • Gene sets database – databases of gene sets created by Broad and others, downloadable through the tool from Broad’s website • Phenotype labels – a separate file of label data plus header information, in a GSEA-specific format, created from the original data • Chip platform – chip annotation file matching the microarray platform on which the data set was recorded
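As one illustration of the expression dataset input, the sketch below writes data in the tab-delimited GCT layout that the Broad tool accepts. The slides do not say which expression format the team used, so GCT is an assumption here, and the function name and the "na" description placeholder are hypothetical.

```python
def to_gct(expr, sample_names):
    """Build GCT v1.2 text from gene -> list of per-sample values.

    Layout: a version line, a "rows<TAB>columns" line, a header row,
    then one row per gene (name, description, values).
    """
    lines = [
        "#1.2",
        f"{len(expr)}\t{len(sample_names)}",
        "\t".join(["NAME", "Description"] + list(sample_names)),
    ]
    for gene, values in expr.items():
        if len(values) != len(sample_names):
            raise ValueError(f"{gene}: expected {len(sample_names)} values")
        # "na" is a placeholder description (an assumption of this sketch).
        lines.append("\t".join([gene, "na"] + [str(v) for v in values]))
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file with a `.gct` extension gives something the tool's data loader should recognize, provided the gene identifiers match the chosen chip platform or gene sets.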

  6. What is a Phenotype? • Simply put, an observable characteristic of an organism that results from its gene expression plus possible environmental factors. • In our data, the breast cancer classifications can be considered phenotypes. • So the phenotype file is created from the breast cancer data, using the class labels as phenotypes.
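A phenotype file of this kind can be produced with a short script. The sketch below emits the three-line categorical CLS layout used by GSEA (sample and class counts, class names, then one label per sample). The function name and the example class labels are hypothetical; the slides do not list the actual breast cancer classes.

```python
def to_cls(sample_labels):
    """Build categorical CLS text from one class label per sample.

    Line 1: <number of samples> <number of classes> 1
    Line 2: # followed by class names in order of first appearance
    Line 3: the label of each sample, in sample order
    """
    # CLS fields are space-delimited, so labels must not contain spaces.
    labels = [label.replace(" ", "_") for label in sample_labels]
    classes = list(dict.fromkeys(labels))  # unique, first-appearance order
    if len(classes) < 2:
        raise ValueError("need at least two phenotype classes")
    header = f"{len(labels)} {len(classes)} 1"
    return "\n".join([header, "# " + " ".join(classes), " ".join(labels)]) + "\n"
```

For example, `to_cls(["basal", "luminal", "basal"])` yields a file body whose label order must match the sample column order of the expression dataset.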

  7. Folding@Home • The most powerful computing cluster in the world • One of the largest computing clusters as well • Launched in 2000; managed by the Pande Group within Stanford's Chemistry Department • Goal is “to understand protein folding, misfolding and related diseases” • As of May 2009, 63 papers had been published using Folding@Home

  8. Folding@Home: Model • Does not rely on a “supercomputer” for data processing • Small client application installed on volunteers' machines • Leverages otherwise unused computing power on those machines • As of April '09, a peak speed of 4.5 native PFLOPS from an estimated 400,000 machines • Modern CPUs are now multi-core, so the Pande Group has explored symmetric multiprocessing (SMP) to leverage the unused cores
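A back-of-the-envelope calculation shows why this model works without a supercomputer: dividing the quoted peak speed by the estimated machine count gives a modest average contribution per machine. The function name is hypothetical, and the result is only an average over the slide's estimates; actual clients vary widely in speed.

```python
def average_gflops_per_machine(peak_pflops, machines):
    """Average contribution per machine, in GFLOPS.

    1 PFLOPS = 1,000,000 GFLOPS, so convert and divide by machine count.
    """
    return peak_pflops * 1e6 / machines

# Figures from the slide: 4.5 native PFLOPS, an estimated 400,000 machines.
print(average_gflops_per_machine(4.5, 400_000))  # prints 11.25
```

So the aggregate petaflop-scale speed comes from roughly 11 GFLOPS per machine on average.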

  9. Folding@Home At a Glance

  10. References • “Folding at Home”, http://folding.stanford.edu/ • Spanish Inquisition image, http://roflrazzi.com/upcoming/?pid=12265 • Subramanian, Aravind; “Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles”; http://mootha.med.harvard.edu/PubPDFs/Subramanian2005.pdf • GSEA, http://www.broad.mit.edu/gsea/
