Zhaoming Yin School of CSE, Georgia Tech

Analysis of a Real-World NP-Complete Graph Problem: A DCJ Median Algorithm to Find the Ancestor of Three Genomes

Presentation Transcript


  1. Analysis of a Real-World NP-Complete Graph Problem: A DCJ Median Algorithm to Find the Ancestor of Three Genomes. Zhaoming Yin, School of CSE, Georgia Tech

  2. Fundamentals: Sequence alignment is a way of arranging sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. ...

  3. Fundamentals: In the 1980s, Jeffrey Palmer studied the evolution of plant organelles by comparing the mitochondrial genomes of cabbage and turnip. Their genes were 99% similar, yet these surprisingly similar gene sequences differed in gene order. This study helped pave the way for analyzing genome rearrangements in molecular evolution. Starting from the gene order 1 2 3 4 5 6 7 8 9 10:
      Inversion: 1 2 -6 -5 -4 -3 7 8 9 10
      Transposition: 1 2 7 8 3 4 5 6 9 10
      Inverted transposition: 1 2 7 8 -6 -5 -4 -3 9 10
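The three rearrangements above can be replayed on a signed permutation. This is a minimal Python sketch; the function names and the 0-based, half-open index convention are illustrative assumptions, not taken from the slides.

```python
def inversion(perm, i, j):
    """Reverse the block perm[i:j] and flip the sign of every gene in it."""
    return perm[:i] + [-g for g in reversed(perm[i:j])] + perm[j:]


def transposition(perm, i, j, k):
    """Swap the adjacent blocks perm[i:j] and perm[j:k]."""
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


def inverted_transposition(perm, i, j, k):
    """Swap perm[i:j] and perm[j:k], inverting the block that moves (perm[i:j])."""
    return perm[:i] + perm[j:k] + [-g for g in reversed(perm[i:j])] + perm[k:]


genome = list(range(1, 11))
print(inversion(genome, 2, 6))                   # [1, 2, -6, -5, -4, -3, 7, 8, 9, 10]
print(transposition(genome, 2, 6, 8))            # [1, 2, 7, 8, 3, 4, 5, 6, 9, 10]
print(inverted_transposition(genome, 2, 6, 8))   # [1, 2, 7, 8, -6, -5, -4, -3, 9, 10]
```

A negative sign marks a gene read on the opposite strand, which is why every gene in an inverted block changes sign.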

  4. Fundamentals: Maximum parsimony phylogeny optimizes each ancestral node of an unrooted phylogeny with respect to its three or more immediate neighbours, modern or ancestral, and iterates across the tree until the objective function converges (to a local optimum) at all nodes.
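The iterate-until-convergence scheme can be illustrated on a toy problem. The sketch below swaps in Hamming distance on equal-length strings for the genome distance, because the exact Hamming median of three strings is simply the per-position majority. The four-leaf tree, the strings, and all function names are hypothetical; like the scheme on the slide, the loop only guarantees a local optimum in general.

```python
from collections import Counter


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def median_of_three(a, b, c):
    """Exact Hamming median of three equal-length strings: per-position majority."""
    return "".join(Counter(chars).most_common(1)[0][0] for chars in zip(a, b, c))


def iterate_medians(leaves, internal, neighbours):
    """Re-optimise each ancestral node against its three neighbours until nothing changes."""
    labels = {**leaves, **internal}
    changed = True
    while changed:
        changed = False
        for node in internal:
            a, b, c = (labels[n] for n in neighbours[node])
            new = median_of_three(a, b, c)
            if new != labels[node]:
                labels[node] = new
                changed = True
    return labels


def tree_score(labels, edges):
    """Total distance summed over the tree edges (the parsimony objective)."""
    return sum(hamming(labels[u], labels[v]) for u, v in edges)


leaves = {"A": "0000", "B": "0011", "C": "1111", "D": "1100"}
internal = {"X": "0000", "Y": "0000"}  # arbitrary starting ancestors
neighbours = {"X": ("A", "B", "Y"), "Y": ("C", "D", "X")}
edges = [("A", "X"), ("B", "X"), ("X", "Y"), ("Y", "C"), ("Y", "D")]

labels = iterate_medians(leaves, internal, neighbours)
print(labels["X"], labels["Y"], tree_score(labels, edges))  # 0000 1100 6
```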

  5. Breakpoint Graph [Figure: breakpoint graphs for the toy genomes 1 2 vs. -1 2 and for 1 2 3 4 5 6 vs. 1 -5 -2 3 -6 -4. Each gene g is drawn as two nodes, 2g-2 (labelled +g) and 2g-1 (labelled -g), so the six-gene example uses nodes 0 to 11.]
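The node numbering in the figure (gene g owns nodes 2g-2 and 2g-1) is enough to compute a DCJ distance. The Python sketch below assumes circular, single-chromosome genomes, for which the DCJ distance is N minus the number of cycles in the breakpoint graph (Yancopoulos et al. [2]). Treating 2g-2 as the tail and 2g-1 as the head is an assumption, and the function names are hypothetical.

```python
def adjacency_matching(genome):
    """Perfect matching on gene extremities for a circular, single-chromosome genome.

    Gene g owns nodes 2g-2 (assumed tail) and 2g-1 (assumed head).
    """
    def ends(x):
        tail, head = 2 * (abs(x) - 1), 2 * (abs(x) - 1) + 1
        return (tail, head) if x > 0 else (head, tail)  # (left end, right end)

    match = {}
    for x, y in zip(genome, genome[1:] + genome[:1]):  # circular: last joins first
        right, left = ends(x)[1], ends(y)[0]
        match[right], match[left] = left, right
    return match


def count_cycles(m1, m2):
    """Number of alternating cycles in the union of two perfect matchings."""
    seen, cycles = set(), 0
    for start in m1:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:  # walk m1 then m2 until we are back at start
            seen.add(v)
            w = m1[v]
            seen.add(w)
            v = m2[w]
    return cycles


def dcj_distance(g1, g2):
    n = len(g1)
    return n - count_cycles(adjacency_matching(g1), adjacency_matching(g2))


print(dcj_distance([1, 2, 3, 4, 5, 6], [1, -5, -2, 3, -6, -4]))  # 4
```

Identical genomes give one 2-cycle per adjacency, hence N cycles and distance 0.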

  6. MBG/0-Matching [Figure: multiple breakpoint graph (MBG) with a 0-matching for six-gene genomes; the gene orders shown include 1 -5 -2 3 -6 -4, 1 3 5 -4 6 -2 and 1 -5 -3 2 -4 6.]
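In an MBG each genome contributes one perfect matching (one color) on the 2N gene extremities, and a median can be read off as a 0-matching that forms as many cycles as possible with the three colors together. For tiny inputs this objective can be checked exhaustively. The Python sketch below assumes circular genomes and the node numbering of slide 5; it enumerates all (2N-1)!! perfect matchings, so it is only a reference for very small N, and every name in it is hypothetical.

```python
def adjacency_matching(genome):
    """Circular genome -> perfect matching; gene g owns nodes 2g-2 and 2g-1."""
    def ends(x):
        tail, head = 2 * (abs(x) - 1), 2 * (abs(x) - 1) + 1
        return (tail, head) if x > 0 else (head, tail)

    match = {}
    for x, y in zip(genome, genome[1:] + genome[:1]):
        right, left = ends(x)[1], ends(y)[0]
        match[right], match[left] = left, right
    return match


def count_cycles(m1, m2):
    seen, cycles = set(), 0
    for start in m1:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            w = m1[v]
            seen.add(w)
            v = m2[w]
    return cycles


def perfect_matchings(nodes):
    """Yield every perfect matching of an even-length node list as a dict."""
    if not nodes:
        yield {}
        return
    first, rest = nodes[0], nodes[1:]
    for i, partner in enumerate(rest):
        for sub in perfect_matchings(rest[:i] + rest[i + 1:]):
            yield {**sub, first: partner, partner: first}


def brute_force_median(genomes):
    """Best 0-matching: maximise the summed cycle count against every genome colour."""
    colours = [adjacency_matching(g) for g in genomes]
    best_score, best_zero = -1, None
    for zero in perfect_matchings(sorted(colours[0])):
        score = sum(count_cycles(zero, m) for m in colours)
        if score > best_score:
            best_score, best_zero = score, zero
    return best_score, best_zero


score, zero = brute_force_median([[1, 2, 3], [1, 2, -3], [1, -2, 3]])
print(score)  # 7
```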

  7. Subgraph/Decomposer [Figure: the MBG of slide 6 with a subgraph H highlighted; labels: "Subgraph", "H-crossing".]

  8. Adequate Subgraph. Definition: In an MBG for a set of genomes G, a connected subgraph H of size m is an adequate subgraph if $c_{\max}(H) \ge \frac{1}{2} m N_G$; it is strongly adequate if $c_{\max}(H) > \frac{1}{2} m N_G$. Here m is the size of H (half the number of its vertices), $N_G$ is the number of genomes (3 for the median-of-three problem), and $c_{\max}(H)$ is the maximum number of cycles that a 0-matching on H can form inside H. Property: an adequate subgraph is simple if it does not contain another adequate subgraph. Lemma: an adequate subgraph is a decomposer.
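The adequacy test can be checked by brute force on small subgraphs. This Python sketch assumes that the size m of H is half its vertex count and computes $c_{\max}(H)$ by trying every 0-matching on H's vertices; each genome color is passed as a neighbour dictionary already restricted to edges with both ends in H. The helper names are hypothetical and the enumeration is exponential, so it only suits tiny subgraphs.

```python
from fractions import Fraction


def perfect_matchings(nodes):
    """Yield every perfect matching of an even-length vertex list as a dict."""
    if not nodes:
        yield {}
        return
    first, rest = nodes[0], nodes[1:]
    for i, partner in enumerate(rest):
        for sub in perfect_matchings(rest[:i] + rest[i + 1:]):
            yield {**sub, first: partner, partner: first}


def cycles_inside(zero, colour):
    """Count cycles formed by a 0-matching on H and one colour's edges inside H."""
    seen, cycles = set(), 0
    for start in zero:
        if start in seen:
            continue
        stack, closed = [start], True
        seen.add(start)
        while stack:
            v = stack.pop()
            closed = closed and v in colour  # colour edge leaves H: component is a path
            for w in (zero[v], colour.get(v)):
                if w is not None and w not in seen:
                    seen.add(w)
                    stack.append(w)
        if closed:
            cycles += 1
    return cycles


def c_max(vertices, colours):
    """Brute-force c_max(H) over every 0-matching on H (colours restricted to H)."""
    return max(sum(cycles_inside(zero, c) for c in colours)
               for zero in perfect_matchings(list(vertices)))


def adequacy(vertices, colours, n_genomes=3):
    m = Fraction(len(vertices), 2)  # assumed: size of H = half its vertex count
    threshold = Fraction(1, 2) * m * n_genomes
    best = c_max(vertices, colours)
    if best > threshold:
        return "strongly adequate"
    if best == threshold:
        return "adequate"
    return "not adequate"


# Two vertices joined by edges of colours 0 and 1; colour 2's edges leave H.
print(adequacy([0, 1], [{0: 1, 1: 0}, {0: 1, 1: 0}, {}]))  # strongly adequate
```

With m = 1 the threshold is 1.5, so two colors sharing one edge already give a strongly adequate pair, which matches the case AS1() looks for.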

  9. Adequate Subgraph

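  10. Algorithm: AS1() (finds adequate subgraphs of size 1 and their major sets; v[k] denotes the neighbour of v along the edge of color k)
      for each vertex v do
          if v[0] = v[1] or v[0] = v[2] or v[1] = v[2] then
              v and the shared neighbour form an AS;
              the edge connecting them is the major set;
          endif
      endfor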
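AS1() can be made runnable by storing each genome as a dictionary that maps a vertex to its neighbour in that color, so `colours[k][v]` plays the role of v[k]. This Python sketch assumes circular genomes numbered as on slide 5; `as1` and `adjacency_matching` are hypothetical names. Each returned pair is both the adequate subgraph and, as the slide states, the single 0-edge of its major set.

```python
from itertools import combinations


def adjacency_matching(genome):
    """Neighbour map of a circular genome; gene g owns nodes 2g-2 (tail) and 2g-1 (head)."""
    def ends(x):
        tail, head = 2 * (abs(x) - 1), 2 * (abs(x) - 1) + 1
        return (tail, head) if x > 0 else (head, tail)

    match = {}
    for x, y in zip(genome, genome[1:] + genome[:1]):
        right, left = ends(x)[1], ends(y)[0]
        match[right], match[left] = left, right
    return match


def as1(colours):
    """Size-1 adequate subgraphs: vertex pairs joined by edges of at least two colours."""
    found = set()
    for v in colours[0]:
        for a, b in combinations(range(len(colours)), 2):
            if colours[a][v] == colours[b][v]:  # v[a] = v[b] on the slide
                found.add(frozenset((v, colours[a][v])))
    return found


colours = [adjacency_matching(g) for g in ([1, 2, 3], [1, 2, -3], [1, -2, 3])]
print(sorted(sorted(pair) for pair in as1(colours)))  # [[0, 5], [1, 2]]
```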

  11. Adequate Subgraph [Figure: overview of adequate subgraphs, 2 items checked (√).]

  12. Algorithm: AS2() [Figure: configurations (1) and (2), with edges labelled by colors c, c1, c2.]
      for each color c do   (c1 and c2 denote the other two colors)
          for each v do
              if v[c1][c] = v[c][c2] (1)
                 or v[c2][c] = v[c][c1] (2)
                 or v[c2][c1] = v[c][c2] (3)
                 or v[c1][c2] = v[c][c1] (4) then
                  v, v[c], v[c1], v[c2] are an AS;
                  in case (1), the major set is (v, v[c1]) and (v[c], v[c2]);
                  in case (2), the major set is (v, v[c2]) and (v[c], v[c1]);
                  in case (3), the major set is (v, v[c]) and (v[c1], v[c][c2]);
                  in case (4), the major set is (v, v[c]) and (v[c2], v[c][c1]);
              endif
          endfor
      endfor

  13. Algorithm: AS2() [Figure: configurations (1) and (2), with edges labelled by colors c, c1, c2.]
      for each color c do
          for each v do
              if v[c1][c] = v[c][c1] and (v[c1] = v[c][c2] || v[c1] = v[c][c2]) (1)
                 or v[c1][c2] = v[c][c1] and (v[c1] = v[c][c2] || v[c1] = v[c][c2]) (2) then
                  v, v[c], v[c1], v[c2] are an AS;
                  in case (1), the major set is (v, v[c1]) and (v[c], v[c2]);
                  in case (2), the major set is (v, v[c2]) and (v[c], v[c1]);
              endif
          endfor
      endfor

  14. Algorithm: AS2() [Figure: configuration with edges labelled by colors c, c1, c2.]
      for each color c do
          for each v do
              if v[c1][c] = v[c][c1] and v[c2][c] = v[c][c2] and v[c1] != v[c][c2] and v[c2] != v[c][c1] then
                  v, v[c], v[c1], v[c2] are an AS;
                  the major set is (v, v[c1]) and (v[c], v[c2]);
              endif
          endfor
      endfor

  15. Algorithm: AS2() [Figure: configuration with edges labelled by colors c, c1.]
      for each color c do
          for each v do
              if v[c1][c] = v[c][c1] and type three was not found then
                  v, v[c], v[c1], v[c2] are an AS;
                  the major sets are {(v, v[c1]), (v[c], v[c2])} and {(v, v[c]), (v[c1], v[c][c1])};
              endif
          endfor
      endfor
      In this case, there are two major sets.
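A case-based routine like AS2() is easy to get subtly wrong, so an exhaustive reference can help test it. The Python sketch below is not the slides' case analysis: it swaps in brute force and lists every 4-vertex (size m = 2) subgraph that is connected, simple (no pair inside is joined by two colors, i.e. contains no size-1 AS), and adequate, with $c_{\max}$ computed over all 0-matchings. It assumes m is half the vertex count and three genomes; all names are hypothetical, and the cost is exponential.

```python
from itertools import combinations


def perfect_matchings(nodes):
    if not nodes:
        yield {}
        return
    first, rest = nodes[0], nodes[1:]
    for i, partner in enumerate(rest):
        for sub in perfect_matchings(rest[:i] + rest[i + 1:]):
            yield {**sub, first: partner, partner: first}


def cycles_inside(zero, colour):
    """Cycles formed by a 0-matching on H and one colour's edges inside H."""
    seen, cycles = set(), 0
    for start in zero:
        if start in seen:
            continue
        stack, closed = [start], True
        seen.add(start)
        while stack:
            v = stack.pop()
            closed = closed and v in colour  # an edge leaving H makes this a path
            for w in (zero[v], colour.get(v)):
                if w is not None and w not in seen:
                    seen.add(w)
                    stack.append(w)
        if closed:
            cycles += 1
    return cycles


def size2_adequate(colours, n_genomes=3):
    """Exhaustively list 4-vertex simple adequate subgraphs (size m = 2)."""
    threshold = 0.5 * 2 * n_genomes
    found = []
    for quad in combinations(sorted(colours[0]), 4):
        qs = set(quad)
        inner = [{v: w for v, w in c.items() if v in qs and w in qs} for c in colours]
        # Connected: flood fill over the genome edges inside the quad.
        reach, stack = {quad[0]}, [quad[0]]
        while stack:
            v = stack.pop()
            for c in inner:
                w = c.get(v)
                if w is not None and w not in reach:
                    reach.add(w)
                    stack.append(w)
        if len(reach) < 4:
            continue
        # Simple: no pair inside is joined by two colours (that would be a size-1 AS).
        if any(sum(c.get(a) == b for c in inner) >= 2 for a, b in combinations(quad, 2)):
            continue
        best = max(sum(cycles_inside(z, c) for c in inner)
                   for z in perfect_matchings(list(quad)))
        if best >= threshold:
            found.append(quad)
    return found


colours = [{0: 1, 1: 0, 2: 3, 3: 2}, {0: 2, 2: 0, 1: 3, 3: 1}, {0: 3, 3: 0, 1: 2, 2: 1}]
print(size2_adequate(colours))  # [(0, 1, 2, 3)]
```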

  16. Adequate Subgraph [Figure: overview of adequate subgraphs, 6 items checked (√).]

  17. Algorithm: AS4(), type 5-3-5 [Figure: configuration with a core and vertices p1, p2, po1, po2, po11, po22, po0; color labels c0, c1, c2.]

  18. Adequate Subgraph [Figure: overview of adequate subgraphs, 10 items checked (√).]

  19. Algorithm: AS4()

  20. Adequate Subgraph [Figure: overview of adequate subgraphs, 13 items checked (√).]

  21. Algorithm: AS4()

  22. Algorithm: AS4()

  23. Adequate Subgraph [Figure: overview of adequate subgraphs, 18 items checked (√).]

  24. Algorithm: Shrink() [Figure: example graph with numbered vertices.]

  25. Algorithm: Shrink() [Figure: example graph with numbered vertices.]

  26. Algorithm: Shrink() [Figure: example graph with numbered vertices.]
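The slides do not spell out Shrink() in text, so the following is only a sketch of the contraction step commonly used in breakpoint-graph median solvers, assumed rather than transcribed: fixing a 0-edge (u, w) deletes both vertices; in each color, if u and w were already joined a cycle is completed, otherwise their two neighbours are spliced together. A relabelling pass then renumbers the survivors from 0; whether this matches the exact relabelling in the figures is not known. Names and the example graph are hypothetical.

```python
def shrink(colours, u, w):
    """Fix the 0-edge (u, w): delete u and w and splice each colour's edges.

    colours[k][v] is v's neighbour in genome k. Returns the reduced matchings
    and how many cycles the fixed 0-edge closes.
    """
    reduced, closed = [], 0
    for colour in colours:
        c = dict(colour)
        a, b = c.pop(u), c.pop(w)
        if a == w:
            closed += 1  # u-w is also an edge of this colour: a 2-cycle is finished
        else:
            c[a], c[b] = b, a  # path a-u-w-b collapses into a single edge a-b
        reduced.append(c)
    return reduced, closed


def relabel(colours):
    """Renumber the surviving vertices 0, 1, 2, ... in their original order."""
    order = {v: i for i, v in enumerate(sorted(colours[0]))}
    return [{order[v]: order[x] for v, x in sorted(c.items())} for c in colours]


colours = [
    {0: 1, 1: 0, 2: 3, 3: 2},
    {0: 2, 2: 0, 1: 3, 3: 1},
    {0: 3, 3: 0, 1: 2, 2: 1},
]
reduced, closed = shrink(colours, 0, 1)
print(closed, relabel(reduced))  # 1 [{0: 1, 1: 0}, {0: 1, 1: 0}, {0: 1, 1: 0}]
```

Because every cycle through the fixed 0-edge survives as a cycle through the spliced edge, the cycles counted after shrinking plus `closed` equal the cycles of the original graph.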

  27. Branch and Bound Algorithm

  28. Branch and Bound Algorithm (1): If no branch can still reach the current upper bound, decrease the bound. If no elements remain in memory, load more from disk.

  29. Branch and Bound Algorithm (2): Take an intermediate subgraph and check whether it can be pruned or whether it is the final solution. If too many elements are held in memory, store them on disk.

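  30. Upper Bound and Lower Bound: Upper Bound
      The DCJ distance between genomes obeys the triangle inequality, $d(G_i,G_j) \le d(G_i,M) + d(M,G_j)$. So, given three genomes G1, G2, G3 and their median genome M, summing this inequality over the three pairs gives
      $$d(M,G_1)+d(M,G_2)+d(M,G_3) \ge \frac{d(G_1,G_2)+d(G_1,G_3)+d(G_2,G_3)}{2}.$$
      Because the distance is defined by $d(G_i,G_j) = N - c(G_i,G_j)$, where $N$ is the number of genes and $c(G_i,G_j)$ is the number of cycles in the breakpoint graph of $G_i$ and $G_j$, the upper bound for the cycle number is
      $$c(M,G_1)+c(M,G_2)+c(M,G_3) \le 3N - \frac{d(G_1,G_2)+d(G_1,G_3)+d(G_2,G_3)}{2}.$$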
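The bound Σ c(M, Gi) ≤ 3N - (d12 + d13 + d23)/2 only needs the three pairwise distances. This Python sketch assumes circular genomes, with d = N - c and the node numbering of slide 5; because cycle counts are integers, it rounds the bound down. Function names are hypothetical. For the toy input below, choosing the first genome as M gives 3 + 2 + 2 = 7 cycles, so the bound is tight there.

```python
from itertools import combinations


def adjacency_matching(genome):
    """Circular genome -> perfect matching; gene g owns nodes 2g-2 and 2g-1."""
    def ends(x):
        tail, head = 2 * (abs(x) - 1), 2 * (abs(x) - 1) + 1
        return (tail, head) if x > 0 else (head, tail)

    match = {}
    for x, y in zip(genome, genome[1:] + genome[:1]):
        right, left = ends(x)[1], ends(y)[0]
        match[right], match[left] = left, right
    return match


def count_cycles(m1, m2):
    seen, cycles = set(), 0
    for start in m1:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            w = m1[v]
            seen.add(w)
            v = m2[w]
    return cycles


def cycle_upper_bound(genomes):
    """Upper bound on sum_i c(M, G_i) for a median M of three circular genomes."""
    n = len(genomes[0])
    colours = [adjacency_matching(g) for g in genomes]
    pairwise = sum(n - count_cycles(a, b) for a, b in combinations(colours, 2))
    return (6 * n - pairwise) // 2  # floor of 3N - pairwise / 2


print(cycle_upper_bound([[1, 2, 3], [1, 2, -3], [1, -2, 3]]))  # 7
```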

  32. Best-First Search: Best-first search is used because it keeps the explored search space minimal: with a valid upper bound, it never expands a node whose bound is below the optimum. However, it needs a lot of memory to store its footprint (the pending search nodes), which makes the branch-and-bound algorithm I/O bound. [Figure: example search tree with numbered nodes.]
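The branch-and-bound loop can be sketched in a few dozen lines by combining a shrink step with the cycle bound 3N - (d12 + d13 + d23)/2, evaluated on each reduced graph. This Python sketch is not the author's implementation: it keeps the whole frontier in memory (the disk spilling of slides 28 and 29 is omitted), it skips the adequate-subgraph decomposition, it branches on the lowest-numbered remaining vertex (an arbitrary choice), and it assumes circular genomes given directly as neighbour dictionaries with d = N - c. All names are hypothetical.

```python
import heapq
from itertools import combinations, count


def count_cycles(m1, m2):
    seen, cycles = set(), 0
    for start in m1:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            w = m1[v]
            seen.add(w)
            v = m2[w]
    return cycles


def upper_bound(colours):
    """3N - (d12 + d13 + d23) / 2 on the remaining graph (three genomes, d = N - c)."""
    n = len(colours[0]) // 2
    pairwise = sum(n - count_cycles(a, b) for a, b in combinations(colours, 2))
    return (6 * n - pairwise) // 2


def shrink(colours, u, w):
    """Fix 0-edge (u, w); return reduced matchings and the cycles it closes."""
    reduced, closed = [], 0
    for colour in colours:
        c = dict(colour)
        a, b = c.pop(u), c.pop(w)
        if a == w:
            closed += 1
        else:
            c[a], c[b] = b, a
        reduced.append(c)
    return reduced, closed


def best_first_median(colours):
    """Maximum total cycle count of a 0-matching against three genome matchings."""
    tie = count()  # tie-breaker so heapq never compares the dicts
    heap = [(-upper_bound(colours), next(tie), 0, colours)]
    while heap:
        _, _, gained, cols = heapq.heappop(heap)
        if not cols[0]:
            return gained  # nothing left to match: this bound is exact, hence optimal
        u = min(cols[0])
        for w in cols[0]:
            if w != u:
                reduced, closed = shrink(cols, u, w)
                g = gained + closed
                heapq.heappush(heap, (-(g + upper_bound(reduced)), next(tie), g, reduced))
    return None


genomes_as_matchings = [
    {0: 5, 5: 0, 1: 2, 2: 1, 3: 4, 4: 3},  # circular 1 2 3
    {0: 4, 4: 0, 1: 2, 2: 1, 3: 5, 5: 3},  # circular 1 2 -3
    {0: 5, 5: 0, 1: 3, 3: 1, 2: 4, 4: 2},  # circular 1 -2 3
]
print(best_first_median(genomes_as_matchings))  # 7
```

Because the popped node always has the largest remaining bound, the first fully matched node to leave the heap is optimal; the price is that the heap can grow very large, which is the memory pressure the slide describes.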

  33. References
      [1] Andrew Wei Xu and David Sankoff. Decompositions of multiple breakpoint graphs and rapid exact solutions to the median problem. In K. A. Crandall and J. Lagergren (Eds.), Proceedings of the Workshop on Algorithms in Bioinformatics (WABI 2008), Lecture Notes in Bioinformatics 5251, Springer (2008).
      [2] S. Yancopoulos, O. Attie, and R. Friedberg. Efficient sorting of genomic permutations by translocation, inversion and block interchange. Bioinformatics 21, 3340–3346 (2005).
      [3] Andrew Wei Xu. A fast and exact algorithm for the median of three problem: a graph decomposition approach. Journal of Computational Biology 16(10), 1-13 (2009).