
Extraction of functional information from large-scale gene expression data






  1. Extraction of functional information from large-scale gene expression data Bioinformatics 91.580 2003 Spring Jianping Zhou

  2. Contents • The prominence feature of cell cycle-regulated genes: they show more remarkable and active functions than other genes • SP (shortest-path) analysis to extract functional information: an alternative to clustering analysis

  3. Prominence Feature: Assumption • Because of their governing role in the cell cycle, the cell cycle-regulated genes are assumed to be more active and remarkable than the other genes in the genome of the yeast Saccharomyces cerevisiae. • When the original dataset is filtered with significance thresholds, if the cell cycle-regulated genes show a higher survival ratio than the other genes, we may conclude that they are more active and remarkable.
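The assumption above compares two survival ratios. As a minimal sketch (hypothetical helpers, not taken from the original slides), assuming every one of the 800 regulated genes is present in the original dataset and using P0, P5 and Hit as defined on slide 9, the comparison can be written in Common Lisp as:

```lisp
;;; Hypothetical helpers for the survival-ratio comparison.
;;; Assumption: all N-REGULATED regulated genes appear among the P0
;;; profiles of the original dataset, and HITS of them survive into P5.

(defun survival-ratios (p0 p5 hits &key (n-regulated 800))
  "Return two values: the survival ratio of the regulated genes,
HITS / N-REGULATED, and that of all other genes,
(P5 - HITS) / (P0 - N-REGULATED)."
  (values (/ hits n-regulated)
          (/ (- p5 hits) (- p0 n-regulated))))

(defun regulated-more-prominent-p (p0 p5 hits &key (n-regulated 800))
  "True when the regulated genes survive filtering at a higher rate."
  (multiple-value-bind (regulated others)
      (survival-ratios p0 p5 hits :n-regulated n-regulated)
    (> regulated others)))
```

Because dividing integers yields exact rationals in Common Lisp, the comparison itself is exact; wrap a value in `float` only when a decimal is needed for display.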

  4. Prominence Feature: Methods • The preprocessing utility of the GEPAS package can be used to prepare the dataset for comparison. • Microarray gene expression data are the ideal data source. • The 800 cell cycle-regulated genes identified by Spellman et al. [1] for the yeast Saccharomyces cerevisiae are the most complete set available at this point.

  5. Prominence Feature: Methods (cont) • A single Common Lisp expression counts the hit genes: (length (intersection *regu* (remove-duplicates '(plain text file content)))) • *regu*: the predefined CL list holding the names of the 800 cell cycle-regulated genes. It is defined in CL as: (defparameter *regu* '(plain text of the 800 cell cycle-regulated genes)) • The plain text of the 800 cell cycle-regulated genes can be obtained by copying and pasting the ORF column of CellCycle98.xls. • plain text file content: the copied and pasted contents of a preprocessing or clustering output file that contains the ORFs of the selected genes. • The reader interns each pasted ORF name as a symbol, so the default EQL test of intersection matches identical names; remove-duplicates keeps repeated ORFs in the output file from inflating the count.
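The copy-and-paste step above can also be scripted so that the ORF lists are read from files. This is a sketch under stated assumptions, not the author's code: it assumes each list has been saved as a plain text file with one ORF name per line, and the helpers `read-orfs` and `count-hits` are hypothetical. It compares strings through an `equal` hash table, which counts each selected ORF at most once and avoids the typical quadratic cost of `intersection` on long lists.

```lisp
;;; Hedged sketch, not the author's code. Assumption: each input file is
;;; plain text with one ORF name per line (hypothetical layout).

(defun read-orfs (path)
  "Return the non-empty lines of PATH, trimmed and upcased, as strings."
  (with-open-file (in path :direction :input)
    (remove ""
            (loop for line = (read-line in nil nil)
                  while line
                  collect (string-upcase
                           (string-trim '(#\Space #\Tab #\Return) line)))
            :test #'string=)))

(defun count-hits (regulated selected)
  "Count the distinct ORFs in SELECTED that also occur in REGULATED.
Both arguments are lists of strings compared with EQUAL (case-sensitive)."
  (let ((regulated-set (make-hash-table :test #'equal))
        (seen (make-hash-table :test #'equal))
        (hits 0))
    (dolist (orf regulated)
      (setf (gethash orf regulated-set) t))
    (dolist (orf selected hits)
      (when (and (gethash orf regulated-set)
                 (not (gethash orf seen)))
        (setf (gethash orf seen) t)
        (incf hits)))))
```

A call would look like `(count-hits (read-orfs "regulated.txt") (read-orfs "filtered.txt"))`, where both file names are placeholders.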

  6. Prominence Feature: Steps

  7. Prominence Feature: Steps (cont)

  8. Prominence Feature: Steps (cont)

  9. Prominence Feature: Steps (cont) • Parameters: Pe, Pk, Sd • Pe: minimum percentage of existing values; patterns with a lower percentage of existing (non-missing) values are removed. • Pk: minimum number of peaks; patterns with fewer peaks than this value are removed. • Sd: threshold for standard deviation; patterns with a standard deviation below the threshold are removed. • P0: total profiles in the original file • P1: profiles removed for missing values, determined by the minimum percentage of existing values (Pe) • P2: profiles mended by imputing missing values, determined by the minimum number of peaks • P3: profiles removed as flat by the number of peaks (Pk) • P4: profiles removed as flat by standard deviation (Sd) • P5: profiles remaining in the result dataset • Hit: number of genes present in both the result dataset and Spellman's set of 800 cell cycle-regulated genes • Hit rate: Hit / P5
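The three filters and the P0 to P5 bookkeeping above can be approximated as follows. This is a hedged sketch, not GEPAS's implementation: Pe is given as a fraction (0.95 for 95%), the standard deviation is the sample standard deviation of the existing values, the imputation step counted in P2 is not modeled, and a "peak" is assumed to be any value whose absolute value reaches a hypothetical `peak-level` cutoff. All function names are hypothetical.

```lisp
;;; Hedged sketch of the Pe / Pk / Sd filters; NOT the GEPAS implementation.
;;; A profile is a list of numbers in which NIL marks a missing value.
;;; Assumptions: PE is a fraction (0.95 = 95%); a "peak" is a value whose
;;; absolute value is at least PEAK-LEVEL (hypothetical cutoff); imputation
;;; (P2) is not modeled, so later tests use only the existing values.

(defun standard-deviation (xs)
  "Sample standard deviation of XS; 0 when fewer than two values."
  (let ((n (length xs)))
    (if (< n 2)
        0
        (let ((mean (/ (reduce #'+ xs) n)))
          (sqrt (/ (reduce #'+ (mapcar (lambda (x) (expt (- x mean) 2)) xs))
                   (1- n)))))))

(defun classify-profile (profile pe pk sd &key (peak-level 1.0))
  "Return :MISSING (P1), :FEW-PEAKS (P3), :FLAT (P4) or :KEPT (P5)."
  (let* ((present (remove nil profile))
         (existing (/ (length present) (length profile)))
         (peaks (count-if (lambda (x) (>= (abs x) peak-level)) present)))
    (cond ((< existing pe) :missing)
          ((< peaks pk) :few-peaks)
          ((< (standard-deviation present) sd) :flat)
          (t :kept))))

(defun tally-profiles (profiles regulated pe pk sd &key (peak-level 1.0))
  "PROFILES is a list of (ORF . VALUES); REGULATED is a list of ORF strings.
Return a plist with P0, P1, P3, P4, P5, HIT and HIT-RATE (= HIT / P5)."
  (let ((counts (list :p0 0 :p1 0 :p3 0 :p4 0 :p5 0 :hit 0)))
    (dolist (entry profiles)
      (incf (getf counts :p0))
      (let ((category (classify-profile (cdr entry) pe pk sd
                                        :peak-level peak-level)))
        (incf (getf counts (ecase category
                             (:missing :p1)
                             (:few-peaks :p3)
                             (:flat :p4)
                             (:kept :p5))))
        (when (and (eq category :kept)
                   (member (car entry) regulated :test #'string=))
          (incf (getf counts :hit)))))
    (let ((p5 (getf counts :p5)))
      (append counts
              (list :hit-rate (if (zerop p5) 0 (/ (getf counts :hit) p5)))))))
```

The filters run in the order listed on the slide, so each removed profile is counted in exactly one of P1, P3 or P4.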

  10. Prominence Feature: Result

  11. Prominence Feature: Result (cont), Pe = 95%

  12. SP (shortest-path) analysis: Introduction • SP analysis is used to identify transitive genes that lie between two given genes from the same biological process. • Transitive expression similarity among genes can serve as an important attribute for linking genes of the same biological pathway. • Recent advances in computational and experimental technologies have opened up real opportunities for annotating gene functions not only at the phenomenological level but also at the mechanistic level.
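The idea of linking two genes through intermediate, transitively similar genes can be illustrated with a generic shortest-path search. The sketch below is not the procedure of Zhou et al. [5]: it assumes that an edge joins two genes whose Pearson correlation r is at least a hypothetical cutoff `min-r`, weights each edge by 1 - r, and runs Dijkstra's algorithm; the intermediate nodes on the resulting path play the role of transitive genes. All function names are hypothetical.

```lisp
;;; Hedged sketch of shortest-path linking of genes; NOT the exact method of
;;; Zhou et al. [5]. Assumptions (hypothetical): genes are joined by an edge
;;; when their Pearson correlation r is at least MIN-R, the edge weight is
;;; 1 - r, and profiles are complete and non-constant.

(defun pearson (xs ys)
  "Pearson correlation of two equal-length lists of numbers."
  (let* ((n (length xs))
         (mx (/ (reduce #'+ xs) n))
         (my (/ (reduce #'+ ys) n))
         (sxy 0) (sxx 0) (syy 0))
    (loop for x in xs
          for y in ys
          do (incf sxy (* (- x mx) (- y my)))
             (incf sxx (expt (- x mx) 2))
             (incf syy (expt (- y my) 2)))
    (/ sxy (sqrt (* sxx syy)))))

(defun correlation-graph (profiles min-r)
  "N x N array of edge weights 1 - r; NIL marks a missing edge."
  (let* ((n (length profiles))
         (vec (coerce profiles 'vector))
         (weights (make-array (list n n) :initial-element nil)))
    (dotimes (i n weights)
      (dotimes (j n)
        (unless (= i j)
          (let ((r (pearson (aref vec i) (aref vec j))))
            (when (>= r min-r)
              ;; MAX guards against tiny negative weights from rounding.
              (setf (aref weights i j) (max 0 (- 1 r))))))))))

(defun shortest-path (weights source target)
  "Dijkstra's algorithm over WEIGHTS; return node indices from SOURCE to
TARGET, or NIL when TARGET cannot be reached."
  (let* ((n (array-dimension weights 0))
         (dist (make-array n :initial-element nil)) ; NIL = not reached yet
         (prev (make-array n :initial-element nil))
         (done (make-array n :initial-element nil)))
    (setf (aref dist source) 0)
    (loop
      (let ((u nil))
        ;; Select the closest node whose distance is not yet final.
        (dotimes (v n)
          (when (and (not (aref done v))
                     (aref dist v)
                     (or (null u) (< (aref dist v) (aref dist u))))
            (setf u v)))
        (when (or (null u) (= u target))
          (return))
        (setf (aref done u) t)
        ;; Relax every edge leaving U.
        (dotimes (v n)
          (let ((w (aref weights u v)))
            (when (and w (not (aref done v)))
              (let ((alt (+ (aref dist u) w)))
                (when (or (null (aref dist v)) (< alt (aref dist v)))
                  (setf (aref dist v) alt
                        (aref prev v) u))))))))
    (when (aref dist target)
      (let ((path (list target)))
        (loop for v = (aref prev (first path))
              while v
              do (push v path))
        path))))

(defun transitive-genes (names weights source target)
  "Names of the intermediate genes on the shortest SOURCE-TARGET path."
  (mapcar (lambda (i) (elt names i))
          (butlast (rest (shortest-path weights source target)))))
```

The O(n^2) node selection keeps the sketch short; for graphs with thousands of genes a priority queue would be the usual choice.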

  13. SP (shortest-path) analysis: Discovery • For the genome of the yeast Saccharomyces cerevisiae, X. Zhou et al. [5] constructed a cytoplasm graph containing 398 genes (the two other graphs cover the mitochondria and the nucleus). All of these genes are involved in the same biological pathway. • Matching the cytoplasm graph against Spellman's CellCycle98.xls identifies six genes: • YPR045C, YPL221W (BOP1), YIL056W, YHR029C, YDR130C, YBR053C

  14. SP (shortest-path) analysis: Discovery (cont) • According to CellCycle98.xls, all six genes have an unknown biological process, and their cluster order numbers are far apart from each other. • In the SOM clustering output for the normalized file, which has 561 hits among the 800 Spellman genes, three of these genes appear: YPR045C in cluster (2, 4), YPL221W in cluster (1, 1), and YBR053C in cluster (2, 7). The other three are not found. • None of these genes is found in any of my other clustering outputs. • None of the databases linked from FatiGO returns results for these five genes or ORFs. • There is no evidence that these six genes fall into the same cluster.
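The co-clustering check on this slide amounts to a lookup. In this sketch the cluster assignments are entered by hand from the SOM results quoted above (only the three ORFs that were found); the alist format and the function name `co-clustering` are hypothetical, and a real SOM output file would need its own parser.

```lisp
;;; SOM cluster coordinates quoted on the slide (normalized-file output).
;;; The alist layout (ORF . (ROW COLUMN)) is a hypothetical choice.
(defparameter *som-clusters*
  '(("YPR045C" . (2 4))
    ("YPL221W" . (1 1))
    ("YBR053C" . (2 7))))

(defun co-clustering (orfs assignments)
  "Return three values: ORFS found in ASSIGNMENTS, ORFS not found,
and T if every found ORF shares a single cluster (NIL otherwise)."
  (let ((found '()) (missing '()))
    (dolist (orf orfs)
      (if (assoc orf assignments :test #'string=)
          (push orf found)
          (push orf missing)))
    (setf found (nreverse found)
          missing (nreverse missing))
    (let ((clusters (remove-duplicates
                     (mapcar (lambda (orf)
                               (cdr (assoc orf assignments :test #'string=)))
                             found)
                     :test #'equal)))
      (values found missing (and found (= (length clusters) 1))))))
```

For the six genes of the previous slide, `(co-clustering '("YPR045C" "YPL221W" "YIL056W" "YHR029C" "YDR130C" "YBR053C") *som-clusters*)` returns the three found ORFs, the three missing ones, and NIL, because (2, 4), (1, 1) and (2, 7) are distinct clusters.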

  15. References
      [1] P. T. Spellman, G. Sherlock, M. Q. Zhang, V. R. Iyer, K. Anders, M. B. Eisen, P. O. Brown, D. Botstein, and B. Futcher. Comprehensive Identification of Cell Cycle-regulated Genes of the Yeast Saccharomyces cerevisiae by Microarray Hybridization. Mol. Biol. Cell (MBC), 9(12): 3273-3297, December 1998.
      [2] J. C. Oliveros, C. Blaschke, J. Herrero, J. Dopazo, and A. Valencia. Expression profiles and biological function. Genome Informatics Workshop 2000, 11: 106-117, 2000.
      [3] M. Q. Zhang. Extracting functional information from microarrays: A challenge for functional genomics. PNAS, 99(20): 12509-12511, October 1, 2002.
      [4] M. Q. Zhang. Large-Scale Gene Expression Data Analysis: A New Challenge to Computational Biologists. Genome Res., 9(8): 681-688, August 1, 1999.
      [5] X. Zhou, M.-C. J. Kao, and W. H. Wong. Transitive functional annotation by shortest-path analysis of gene expression data. PNAS, 99(20): 12783-12788, October 1, 2002.
      [6] www.biostat.harvard.edu/complab/SP/
