
A graph-theoretic modeling on GO space for biological interpretation of gene clusters

Bioinformatics Unit, ISTECH Inc.; Cancer Metastasis Research Center, Yonsei University College of Medicine. Sung Geun Lee, Jung Uk Hur and Yang Seok Kim. Presenter: 張家榮.


Presentation Transcript


  1. A graph-theoretic modeling on GO space for biological interpretation of gene clusters • Bioinformatics Unit, ISTECH Inc.; Cancer Metastasis Research Center, Yonsei University College of Medicine • Sung Geun Lee, Jung Uk Hur and Yang Seok Kim • Presenter: 張家榮

  2. Introduction • Gene Ontology (GO) • A controlled vocabulary shared by various genomic databases covering diverse species • Clusters of microarray data • Each cluster contains some genes • Extract GO terms for a gene cluster • Each gene has several corresponding GO terms • Purpose • To discover the biological meaning of each cluster

  3. Metric structure of GO tree

  4. Lowest common ancestor • Given a non-empty subset U of nodes, v is a common ancestor of U if every node in U lies in the subtree rooted at v. A common ancestor v0 is a lowest common ancestor (LCA) of U if the level of v0 is greater than or equal to the level of w for every common ancestor w of U.
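
The definition above can be illustrated with a minimal sketch. It assumes the hierarchy is stored as a child-to-parent dictionary, which is an illustrative choice and not something the slides specify; the names `PARENT`, `path_from_root` and `lowest_common_ancestor` are hypothetical.

```python
# Toy tree stored as child -> parent (hypothetical data, not real GO terms).
PARENT = {
    "root": None,
    "a": "root",
    "b": "root",
    "a1": "a",
    "a2": "a",
    "a1x": "a1",
}


def path_from_root(parent, v):
    """Return the list of nodes from the root down to v (inclusive)."""
    path = [v]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def lowest_common_ancestor(parent, nodes):
    """Return the deepest node whose subtree contains every node in `nodes`."""
    nodes = list(nodes)
    if not nodes:
        raise ValueError("U must be non-empty")
    paths = [path_from_root(parent, v) for v in nodes]
    lca = None
    # Walk all root-to-node paths in lockstep; the last level at which
    # they still agree is the lowest common ancestor.
    for column in zip(*paths):
        if all(x == column[0] for x in column):
            lca = column[0]
        else:
            break
    return lca
```

Because a node counts as the root of its own subtree, the LCA of a single node is that node itself, and the LCA of a node and one of its descendants is the upper node.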

  5. Lowest common ancestor

  6. Principal distance • Each level has its own weight • W: I_H -> R+ • W(i) > W(i+1) • For example: W(k) = 150 - 10(k - 1) • Distance between v1 and v2 (formula not preserved), where w0 is the LCA of v1 and v2
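
As a quick check of the example weight function only (the distance formula itself is not reproduced here), the following sketch evaluates W(k) = 150 - 10(k - 1) and confirms that it is strictly decreasing. The function name `level_weight` is hypothetical.

```python
def level_weight(k):
    """Example weight from the slide: W(k) = 150 - 10(k - 1), for level k >= 1."""
    if k < 1:
        raise ValueError("levels are numbered from 1")
    return 150 - 10 * (k - 1)


# Weights for levels 1..15: 150, 140, ..., 10.
weights = [level_weight(k) for k in range(1, 16)]

# W(i) > W(i+1) holds for every consecutive pair of levels.
strictly_decreasing = all(a > b for a, b in zip(weights, weights[1:]))
```

Since W must map into R+, this particular example only works for trees with at most 15 levels: W(16) is already 0.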

  7. Principal distance (figure)

  8. Multiset • In set notation, {1}, {1, 1} and {1, 1, 1} all denote the same set. • Yet we want to take the number of occurrences of each element into account. • Such a collection is called a multiset.
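
The distinction can be seen directly in Python, where `set` discards repeats and `collections.Counter` keeps a count per element. Using `Counter` as the multiset type is an illustrative choice, not something the slides prescribe.

```python
from collections import Counter

# As plain sets, all three collections collapse to {1}.
as_sets = [set([1]), set([1, 1]), set([1, 1, 1])]

# As multisets, the number of occurrences is kept, so they differ.
as_multisets = [Counter([1]), Counter([1, 1]), Counter([1, 1, 1])]

all_sets_equal = as_sets[0] == as_sets[1] == as_sets[2]
multisets_differ = as_multisets[0] != as_multisets[1] != as_multisets[2]
```

For a gene cluster this matters because the same GO code can be attached to several genes, and that frequency is part of the signal.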

  9. MaxPd and AverPd • Given a multiset G = {v1, v2, ..., vn}

  10. MaxPd and AverPd • MaxPd • Gives the comprehensive biological meaning of a gene cluster • Not flexible, but informs us of the existence of functional outliers • AverPd • Signifies the most frequent GO codes • May be more than one

  11. Algorithmic approach (1) • c[i,j] is the j-th GO code of the i-th gene • We consider ordered GO codes g[m], where 1 ≤ m ≤ α • α is a constant related to the input data • α ≤ Ω, where Ω is the total number of GO codes

  12. Algorithmic approach (2) • MaxPd is used to find LCA of C • Complexity : 3αn min
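
One way to picture "finding the LCA of C" is sketched below. It assumes that a GO code is a sequence of numbers spelling out the path from the root to a term, so that the LCA of a collection of codes is their longest common prefix. That encoding is an assumption made for illustration, and `common_prefix` is a hypothetical helper, not the authors' routine.

```python
def common_prefix(codes):
    """Return the longest prefix shared by every code in `codes`.

    Each code is a tuple of integers read as a root-to-term path
    (an assumed encoding). The empty tuple stands for the root.
    """
    codes = list(codes)
    if not codes:
        raise ValueError("the cluster must be non-empty")
    prefix = []
    # Compare the codes position by position until they disagree.
    for column in zip(*codes):
        if len(set(column)) == 1:
            prefix.append(column[0])
        else:
            break
    return tuple(prefix)
```

A single outlier code is enough to pull the result up towards the root, which matches the earlier remark that MaxPd is not flexible but reveals functional outliers.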

  13. MaxPd (figure)

  14. Algorithmic approach (3) • AverPd is used to find an optimal GO code g[m0] such that the average distance between g[m0] and each gene in C is no larger than that for any other g[m] • Complexity : 3αn
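
The selection step described above can be sketched as a plain argmin over candidate codes. The distance is passed in as a parameter because the principal distance formula is not reproduced here; `toy_distance` is a stand-in (edge count in a prefix tree), not the authors' metric, and all names are hypothetical.

```python
def toy_distance(a, b):
    """Stand-in distance: edges between two codes in a prefix tree.

    NOT the principal distance; used only to make the sketch runnable.
    """
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return (len(a) - shared) + (len(b) - shared)


def best_average_codes(candidates, cluster, dist):
    """Return (codes, average) for the candidates with the smallest
    average distance to the members of `cluster` (a multiset as a list)."""
    cluster = list(cluster)
    if not cluster:
        raise ValueError("the cluster must be non-empty")
    best_avg, best = None, []
    for g in candidates:
        avg = sum(dist(g, v) for v in cluster) / len(cluster)
        if best_avg is None or avg < best_avg:
            best_avg, best = avg, [g]
        elif avg == best_avg:
            best.append(g)  # ties: more than one optimal code
    return best, best_avg


cluster = [(1, 2), (1, 2), (1, 3)]      # (1, 2) occurs twice
candidates = sorted(set(cluster))
winners, average = best_average_codes(candidates, cluster, toy_distance)
```

With this toy data the repeated code `(1, 2)` wins, and a cluster with equal counts of two codes would return both, which is how more than one optimal code can arise.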

  15. AverPd

  16. Discussion • Other algorithms consider GO term frequencies or compare specific GO term-related gene groups • In our modeling, the topological property of GO hierarchy is used.

  17. Utility • Biological assessment of the clustering results of DNA microarray data • Can be coupled with any clustering technique to predict the functional category of unknown genes • Applies not only to DNA microarray data, but also to any kind of group analysis with any ontology that has the same structure as GO

  18. Another approach • The length of a GO code is about log α • Take one number of the GO code at a time • Pseudo-code:
       for 1 ≤ k ≤ log α
         cluster C by the k-th number
         break if no cluster is above n
         delete clusters under n
       end
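
The pseudo-code above might look as follows in Python. This is a sketch under several assumptions: GO codes are tuples of integers, the threshold called n on the slide is passed as `min_size`, and "above n" is read as "at least `min_size` members". The function name `prefix_clusters` is hypothetical.

```python
from collections import Counter


def prefix_clusters(codes, min_size):
    """Refine clusters one code position at a time.

    Returns a dict mapping the surviving prefixes at the deepest level
    reached to the number of codes sharing them. If no level passes
    the threshold, the root prefix () with all codes is returned.
    """
    survivors = list(codes)
    clusters = {(): len(survivors)}
    max_len = max((len(c) for c in survivors), default=0)
    for k in range(1, max_len + 1):
        # Cluster the remaining codes by their first k numbers.
        counts = Counter(c[:k] for c in survivors if len(c) >= k)
        kept = {p: n for p, n in counts.items() if n >= min_size}
        if not kept:
            break  # no cluster reaches the threshold: stop refining
        # Delete clusters under the threshold and go one level deeper.
        clusters = kept
        survivors = [c for c in survivors if c[:k] in kept]
    return clusters


codes = [(1, 2, 3), (1, 2, 4), (1, 2, 3), (1, 5, 1), (2, 1, 1)]
deep = prefix_clusters(codes, min_size=2)
shallow = prefix_clusters(codes, min_size=3)
```

A lower threshold lets the loop descend further and report a more specific prefix, while a higher one stops earlier at a broader term.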

  19. MaxPd

  20. Complexity • Worst case: O(n log α) • Best case: O(n) • Alternative • Change the break condition to cluster in more detail

  21. Disadvantages • Cannot obtain information not contained in GO, such as disease-related genes • GO terms on the same level can carry different amounts of information • GO hierarchy is dynamic and flexible
