
Optimized ASMedian Algorithm to Find DCJ Median Genome of Three

Optimized ASMedian Algorithm to Find DCJ Median Genome of Three. Zhaoming Yin School of CSE, Georgia Tech. Sequence Alignment.


Presentation Transcript


  1. Optimized ASMedian Algorithm to Find DCJ Median Genome of Three Zhaoming Yin School of CSE, Georgia Tech

  2. Sequence Alignment Sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. ...

  3. Genome Rearrangement In the 1980s, Jeffrey Palmer studied the evolution of plant organelles by comparing the mitochondrial genomes of cabbage and turnip. The genes were 99% similar, yet these nearly identical gene sequences differed in gene order. This study helped pave the way to analyzing genome rearrangements in molecular evolution. Starting from the gene order 1 2 3 4 5 6 7 8 9 10: Inversion: 1 2 -6 -5 -4 -3 7 8 9 10. Transposition: 1 2 7 8 3 4 5 6 9 10. Inverted transposition: 1 2 7 8 -6 -5 -4 -3 9 10.
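The three operations listed on this slide can be reproduced on a signed gene order with a few lines of Python. This is only an illustrative sketch: the function names and the index convention (half-open slices, `dest` given as an index into the original order) are assumptions made here, not part of the presentation.

```python
from typing import List

Genome = List[int]  # signed gene order, e.g. [1, 2, -6, -5]


def inversion(genome: Genome, start: int, end: int) -> Genome:
    """Reverse genome[start:end] and flip the sign of every gene in it."""
    segment = [-g for g in reversed(genome[start:end])]
    return genome[:start] + segment + genome[end:]


def transposition(genome: Genome, start: int, end: int, dest: int) -> Genome:
    """Cut genome[start:end] and reinsert it before original index dest (dest >= end)."""
    segment = genome[start:end]
    return genome[:start] + genome[end:dest] + segment + genome[dest:]


def inverted_transposition(genome: Genome, start: int, end: int, dest: int) -> Genome:
    """Like transposition, but the moved segment is also inverted."""
    segment = [-g for g in reversed(genome[start:end])]
    return genome[:start] + genome[end:dest] + segment + genome[dest:]


identity = list(range(1, 11))
print(inversion(identity, 2, 6))               # [1, 2, -6, -5, -4, -3, 7, 8, 9, 10]
print(transposition(identity, 2, 6, 8))        # [1, 2, 7, 8, 3, 4, 5, 6, 9, 10]
print(inverted_transposition(identity, 2, 6, 8))  # [1, 2, 7, 8, -6, -5, -4, -3, 9, 10]
```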

  4. Genome Rearrangement http://ai.stanford.edu/~serafim/CS374_2006/presentations/lecture17.ppt

  5. Fundamentals The maximum parsimony phylogeny approach is to optimize each ancestral node of an unrooted phylogeny in terms of its three or more immediate neighbours, modern or ancestral, and to iterate across the tree until the objective function converges (to a local optimum) at all nodes. Andrew Wei Xu and David Sankoff, Decompositions of multiple breakpoint graphs and rapid exact solutions to the median problem, in K.A. Crandall and J. Lagergren (Eds.): Proceedings of the Workshop on Algorithms in Bioinformatics, WABI 2008, Lecture Notes in Bioinformatics 5251, Springer.

  6. Breakpoint Graph and DCJ Distance [Figure: breakpoint graphs for the genome pairs (1 2) vs. (-1 2) and (1 2 3 4 5 6) vs. (1 -5 -2 3 -6 -4), with gene extremities numbered as vertices (0/+1, 1/-1, 2/+2, 3/-2, ...). The DCJ distance is expressed in terms of # genes and # cycles.]
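As a rough illustration of how cycles in a breakpoint graph yield a distance, the sketch below counts alternating cycles between two genomes and applies the standard formula for circular unichromosomal genomes, d = # genes − # cycles. Treating the genomes as circular is an assumption made here (the slide's vertex numbering may correspond to a capped linear genome), and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

Extremity = Tuple[int, str]  # (gene, "t") for the tail, (gene, "h") for the head


def adjacencies(genome: List[int]) -> Dict[Extremity, Extremity]:
    """Return the adjacency matching of a circular signed genome (assumed circular)."""
    def ends(gene: int) -> Tuple[Extremity, Extremity]:
        # A forward gene reads tail -> head; a reversed gene reads head -> tail.
        g = abs(gene)
        return ((g, "t"), (g, "h")) if gene > 0 else ((g, "h"), (g, "t"))

    match: Dict[Extremity, Extremity] = {}
    for i, gene in enumerate(genome):
        nxt = genome[(i + 1) % len(genome)]
        right, left = ends(gene)[1], ends(nxt)[0]
        match[right], match[left] = left, right
    return match


def count_cycles(m1: Dict[Extremity, Extremity], m2: Dict[Extremity, Extremity]) -> int:
    """Count the alternating cycles formed by two perfect matchings."""
    seen, cycles = set(), 0
    for start in m1:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            u = m1[v]      # follow an edge of the first genome
            seen.add(u)
            v = m2[u]      # then an edge of the second genome
    return cycles


def dcj_distance(a: List[int], b: List[int]) -> int:
    """DCJ distance for circular genomes on the same gene set: genes minus cycles."""
    return len(a) - count_cycles(adjacencies(a), adjacencies(b))


print(dcj_distance([1, 2], [-1, 2]))  # 1: a single inversion of gene 1
print(dcj_distance([1, 2, 3, 4, 5, 6], [1, -5, -2, 3, -6, -4]))
```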

  7. Multiple Breakpoint Graph (MBG)/DCJ Median [Figure: MBG on the vertices +1, -1, ..., +6, -6, built from the genomes (1 2 3 4 5 6), (1 -5 -2 3 -6 -4) and (1 3 5 -4 6 -2), together with the gene order (1 -5 -3 2 -4 6).]

  8. Subgraph/Decomposer [Figure: the same MBG with a subgraph highlighted and an H-crossing edge marked.]

  9. Adequate Subgraph Definition: In an MBG for a set of genomes G, a connected subgraph H of size m is an adequate subgraph if c_max(H) ≥ (1/2)·m·N_G; it is strongly adequate if c_max(H) > (1/2)·m·N_G. (m is the node size of the subgraph, and N_G is the number of genomes, which is 3 for the median-of-three problem.) Property: An adequate subgraph is simple if it does not contain another adequate subgraph. Lemma: An adequate subgraph is a decomposer.
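The threshold in this definition is easy to check once c_max(H) is known. The sketch below only encodes the inequality; it assumes c_max(H) has already been computed elsewhere, and the function name is hypothetical.

```python
from fractions import Fraction


def classify_subgraph(c_max: int, m: int, n_genomes: int = 3) -> str:
    """Compare c_max(H) with the threshold (1/2) * m * N_G from the definition."""
    threshold = Fraction(m * n_genomes, 2)  # exact, avoids float rounding
    if c_max > threshold:
        return "strongly adequate"
    if c_max == threshold:
        return "adequate"
    return "not adequate"


# For three genomes and m = 2 the threshold is 3.
print(classify_subgraph(c_max=4, m=2))  # strongly adequate
print(classify_subgraph(c_max=3, m=2))  # adequate
print(classify_subgraph(c_max=2, m=2))  # not adequate
```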

  10. Adequate Subgraph

  11. AS Detection and MBG Decomposition [Figure: AS detection example on a vertex-labelled graph, step 1.]

  12. AS Detection and MBG Decomposition [Figure: AS detection example on a vertex-labelled graph, step 2.]

  13. AS Detection and MBG Decomposition [Figure: AS detection example on a vertex-labelled graph, step 3.]

  14. AS Detection and MBG Decomposition [Figure: decomposition of a graph on the vertices 1 to 8 into smaller subgraphs.]

  15. Branch and Bound Algorithm [Figure: search tree in which every node carries an upper bound and a lower bound; the search continues down the tree.]

  16. Branch and Bound Algorithm (I/O) If no node has the current upper bound, decrease the bound. If no elements remain in memory, load more from disk.

  17. Branch and Bound Algorithm (I/O) Get an intermediate subgraph and check whether it can be trimmed or whether it is the final solution. If too many elements are in memory, store them on disk.
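A best-first branch and bound loop of this kind can be sketched on a toy problem. Everything below is an illustrative assumption rather than the presenter's code: the toy objective (pick one option per stage, never the same index twice in a row, maximize the sum) stands in for cycle counting, and a heap stands in for the memory/disk queue. Popping the node with the largest upper bound plays the role of "decrease the upper bound when no node attains it".

```python
import heapq
from typing import List, Tuple


def best_first_max(options: List[List[int]]) -> Tuple[float, List[int]]:
    """Best-first branch and bound for a toy staged maximization problem."""
    # Optimistic bound: best value of every remaining stage, ignoring the constraint.
    suffix_best = [0] * (len(options) + 1)
    for stage in range(len(options) - 1, -1, -1):
        suffix_best[stage] = suffix_best[stage + 1] + max(options[stage])

    best_score, best_choice = float("-inf"), []
    # heapq is a min-heap, so upper bounds are stored negated.
    frontier = [(-suffix_best[0], 0, [])]
    while frontier:
        neg_upper, score, choice = heapq.heappop(frontier)
        if -neg_upper <= best_score:
            break  # no open node can beat the incumbent solution
        stage = len(choice)
        if stage == len(options):
            best_score, best_choice = score, choice  # complete solution
            continue
        for idx, value in enumerate(options[stage]):
            if choice and choice[-1] == idx:
                continue  # toy constraint: no index twice in a row
            new_score = score + value
            upper = new_score + suffix_best[stage + 1]
            if upper > best_score:  # trim nodes that cannot improve
                heapq.heappush(frontier, (-upper, new_score, choice + [idx]))
    return best_score, best_choice


opts = [[5, 1, 3], [4, 9, 2], [7, 8, 8], [1, 6, 6]]
print(best_first_max(opts))
```

Because the node with the highest upper bound is always expanded first, the first complete solution taken from the queue is optimal for this toy problem.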

  18. Branch and Bound Algorithm: Complexity Each search node goes through three steps. Step 1, Examine: O(v). Step 2, Expand: O(v) or O(v^2). Step 3, Evaluate: O(v) or O(v^2). Which cost applies depends on whether an AS is found, giving O((1 − P)v + Pv^2) overall. [Figure: search trees annotated with upper and lower bounds for the "AS found" and "AS not found" cases.]

  19. What's wrong with expand 1) The overhead of storing the footprint of the best-first search makes the branch and bound algorithm I/O bound. 2) After each edge-shrinking step, the renaming process is redundant. [Figure: original vertices 1 2 3 4 5 6 7 8; after shrinkage 1 3 4 5 7 8; after renaming 1 2 3 4 5 6.] 3) The many malloc calls make it hard to achieve good parallel efficiency (only about a 3x speedup using 8 threads). [Figure: parallel threads each calling malloc against shared memory.]

  20. Reduce expand's time/space complexity from O(v^2) to O(v) 1) Do not rename vertices after shrinkage. 2) Keep the footprint in the search node instead of the iMBG. (For example, if the root graph has 500 nodes and its child node is obtained by shrinking vertices 1, 2, 3, 4, keep just 1, 2, 3, 4 instead of the 500 − 4 remaining vertices.) 3) When no AS is found, there are v − 1 child nodes, but they share the same parent node (same footprint), so each child node stores a two-vertex footprint plus a pointer to the parent footprint (2v storage in total). [Figure: original vertices 1 2 3 4 5 6 7 8; after shrinkage 1 3 4 5 7 8.]
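The footprint idea can be pictured as a linked chain of small records: each search node stores only the vertices shrunk at its own step and a pointer to its parent, and the full footprint is recovered by walking the chain. The class and function names below are hypothetical; this is a minimal sketch of the storage scheme, not the presenter's data structure.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SearchNode:
    shrunk: Tuple[int, ...]                 # vertices shrunk at this step only
    parent: Optional["SearchNode"] = None   # shared by all sibling nodes

    def footprint(self) -> List[int]:
        """All vertices removed on the path from the root, oldest first."""
        chain = []
        node: Optional[SearchNode] = self
        while node is not None:
            chain.append(node.shrunk)
            node = node.parent
        return [v for step in reversed(chain) for v in step]


def surviving_vertices(n_vertices: int, node: SearchNode) -> List[int]:
    """Vertices still present, keeping their original names (no renaming)."""
    removed = set(node.footprint())
    return [v for v in range(1, n_vertices + 1) if v not in removed]


root = SearchNode(())
child = SearchNode((2, 6), root)
print(surviving_vertices(8, child))  # [1, 3, 4, 5, 7, 8]

# Siblings created when no AS is found each add two vertices and share one parent.
after_as = SearchNode((1, 2, 3, 4), root)
siblings = [SearchNode((5, w), after_as) for w in (6, 7, 8)]
print(siblings[1].footprint())  # [1, 2, 3, 4, 5, 7]
```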

  21. Upper Bound and Lower Bound DCJ distances between genomes obey the triangle inequality. So, given three genomes G1, G2, G3, the distances between the median genome and them satisfy: [equation]. Because the distance is defined by [equation], Evaluate is in fact all about cycle counting! Therefore, the upper bound for the cycle number is [equation], and similarly the lower bound is [equation].
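The slide's own formulas are not reproduced here, but the standard bounds that follow from the triangle inequality can be sketched. The code assumes circular genomes with d = n − c per pair (n genes, c cycles), so the total cycle count between a median and the three genomes is 3n minus the median score. The function name and return convention are hypothetical.

```python
import math
from typing import Tuple


def median_cycle_bounds(n: int, d12: int, d13: int, d23: int) -> Tuple[int, int]:
    """Return (lower, upper) bounds on the total number of cycles between
    a median genome and G1, G2, G3, assuming d = n - c for each pair."""
    # Summing the triangle inequality over the three pairs gives
    # 2 * score >= d12 + d13 + d23, where score = d(M,G1) + d(M,G2) + d(M,G3).
    score_lower = math.ceil((d12 + d13 + d23) / 2)
    # Using one of the input genomes as the median is always feasible.
    score_upper = min(d12 + d13, d12 + d23, d13 + d23)
    # Total cycles = 3n - score, so the bounds swap roles.
    return 3 * n - score_upper, 3 * n - score_lower


print(median_cycle_bounds(6, 4, 3, 5))  # (11, 12)
```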

  22. Reduce Examine from O(v^2) to O(v) [Figure: MBG on the vertices +1, -1, ..., +6, -6.]

  23. Reduce Examine from O(v^2) to O(v) [Figure: the same MBG.] This means that when no AS is found, the cycle number can only change by +1, -1 or 0.

  24. Reduce Examine from O(v^2) to O(v) Definition 1: An i-j path is a path in the graph whose edges alternate between color i and color j. Definition 2: An i-j cycle is a cycle in the graph whose edges alternate between color i and color j. If two vertices a and b are in the same i-j path, they are in the same i-j cycle.

  25. Reduce Examine from O(v^2) to O(v) Lemma 1: If we shrink one edge connecting two vertices a and b, and a and b are in different i-j cycles, then after shrinkage the number of i-j cycles in the new iMBG decreases by 1. [Figure: vertices a and b before and after shrinkage.]

  26. Reduce Examine from O(v^2) to O(v) Definition 3: Suppose x, y, w and z are in the same cycle. Starting from some vertex v in the cycle, traverse the cycle and record for each vertex u its number of hops from v; this number is called the rank of u, written rank(u). Lemma 2: If we shrink one edge connecting two vertices a and b, and a and b are in the same i-j cycle, then after shrinkage: Case 1: if rank(x) = rank(z) or rank(y) = rank(z), the cycle number stays the same. Case 2: if x = b or y = b or w = a or z = a, the cycle number stays the same. Case 3: if rank(w) < rank(x) and !(rank(z) < rank(x) and rank(z) > rank(w)), the cycle number stays the same. Case 4: if rank(w) > rank(x) and rank(z) > rank(x) and rank(z) < rank(w), the cycle number stays the same. Case 5: in all other cases the cycle number increases.
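The effect described by Lemma 1 and Lemma 2 can be observed directly by recounting cycles before and after a shrink. The sketch below does not implement the rank-based case analysis; it recounts from scratch, which costs O(v) per candidate edge. It assumes that shrinking (a, b) removes both vertices and, within each color, joins their former neighbours, and all names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Matching = Dict[int, int]  # symmetric: m[u] == v implies m[v] == u


def matching(pairs: Iterable[Tuple[int, int]]) -> Matching:
    """Build a symmetric matching from a list of edges."""
    m: Matching = {}
    for u, v in pairs:
        m[u], m[v] = v, u
    return m


def count_cycles(mi: Matching, mj: Matching) -> int:
    """Count cycles that alternate between color i and color j."""
    seen, cycles = set(), 0
    for start in mi:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            u = mi[v]
            seen.add(u)
            v = mj[u]
    return cycles


def shrink_color(m: Matching, a: int, b: int) -> Matching:
    """Remove a and b from one color class and join their former neighbours."""
    out = {u: v for u, v in m.items() if u not in (a, b) and v not in (a, b)}
    if m[a] != b:  # if a and b were matched to each other, nothing to join
        x, w = m[a], m[b]
        out[x], out[w] = w, x
    return out


def cycle_delta(mi: Matching, mj: Matching, a: int, b: int) -> int:
    """Change in the number of i-j cycles caused by shrinking (a, b)."""
    before = count_cycles(mi, mj)
    after = count_cycles(shrink_color(mi, a, b), shrink_color(mj, a, b))
    return after - before


mi = matching([(1, 2), (3, 4), (5, 6), (7, 8)])
two_cycles = matching([(2, 3), (4, 1), (6, 7), (8, 5)])  # cycles {1..4} and {5..8}
one_cycle = matching([(2, 3), (4, 5), (6, 7), (8, 1)])   # single cycle 1..8

print(cycle_delta(mi, two_cycles, 1, 5))  # -1: a and b in different cycles
print(cycle_delta(mi, one_cycle, 1, 5))   # 0: same cycle, count unchanged
print(cycle_delta(mi, one_cycle, 1, 4))   # 1: same cycle, count increases
```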

  27. Reduce Examine from O(v^2) to O(v) Case 1: if rank(x) = rank(z) or rank(y) = rank(z), the cycle number stays the same. [Figure: vertices a and b before and after shrinkage.]

  28. Reduce Examine from O(v^2) to O(v) Case 2: if x = b or y = b or w = a or z = a, the cycle number stays the same. [Figure: vertices a and b before and after shrinkage.]

  29. Reduce Examine from O(v^2) to O(v) Case 3: if rank(w) < rank(x) and !(rank(z) < rank(x) and rank(z) > rank(w)), the cycle number stays the same. Case 4: if rank(w) > rank(x) and rank(z) > rank(x) and rank(z) < rank(w), the cycle number stays the same. [Figure: vertices a and b before and after shrinkage.]

  30. Reduce Examine from O(v^2) to O(v) Case 5: in all other cases the cycle number increases. [Figure: vertices a and b before and after shrinkage.]

  31. Reduce Examine from O(v^2) to O(v) We need only two pieces of information to update the cycle number when no AS is detected: 1) which cycle each vertex is in; 2) the rank of each vertex in its cycle. This information can be collected by visiting the MBG just once!
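A single traversal that records both pieces of information might look like the sketch below. It assumes the two color classes are given as symmetric matchings (dictionaries) on the same vertex set; the function name and the choice of start vertex for each cycle (first unvisited vertex in dictionary order) are assumptions made for illustration.

```python
from typing import Dict, Tuple

Matching = Dict[int, int]  # symmetric: m[u] == v implies m[v] == u


def label_cycles(mi: Matching, mj: Matching) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Visit every vertex once, recording its i-j cycle id and its rank
    (number of hops from the vertex where the traversal of that cycle began)."""
    cycle_of: Dict[int, int] = {}
    rank: Dict[int, int] = {}
    next_id = 0
    for start in mi:
        if start in cycle_of:
            continue
        v, hops = start, 0
        while v not in cycle_of:
            cycle_of[v], rank[v] = next_id, hops
            u = mi[v]                      # one hop along a color-i edge
            cycle_of[u], rank[u] = next_id, hops + 1
            v, hops = mj[u], hops + 2      # one hop along a color-j edge
        next_id += 1
    return cycle_of, rank


mi = {1: 2, 2: 1, 3: 4, 4: 3, 5: 6, 6: 5, 7: 8, 8: 7}
mj = {2: 3, 3: 2, 4: 1, 1: 4, 6: 7, 7: 6, 8: 5, 5: 8}
cycle_of, rank = label_cycles(mi, mj)
print(cycle_of)  # vertices 1-4 in cycle 0, vertices 5-8 in cycle 1
print(rank)      # {1: 0, 2: 1, 3: 2, 4: 3, 5: 0, 6: 1, 7: 2, 8: 3}
```

With these two tables, whether two vertices share a cycle and how their ranks compare can each be answered in constant time.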

  32. Algorithm Summary Get the search node by expanding and shrinking the edges represented by the footprints.

  33. Algorithm Summary Handle search nodes where ASs are detected; this is no different from Wei Xu's algorithm.

  34. Algorithm Summary Handle search nodes where no ASs are detected, using the algorithm with O(v) time/space complexity.

  35. Result

  36. Future work 1) Parallelization of our code. 2) Incorporation of our algorithm into GRAPPA/COGNAC. 3) Extension of our algorithm to the multiple-chromosome DCJ median problem.

  37. Reference [1] Andrew Wei Xu and David Sankoff, Decompositions of multiple breakpoint graphs and rapid exact solutions to the median problem, in K.A. Crandall and J. Lagergren (Eds.): Proceedings of the Workshop on Algorithms in Bioinformatics, WABI 2008, Lecture Notes in Bioinformatics 5251, Springer. [2] Yancopoulos, S., Attie, O., Friedberg, R.: Efficient sorting of genomic permutations by translocation, inversion and block interchange. Bioinform. 21, 3340-3346 (2005). [3] Andrew Wei Xu, A Fast and Exact Algorithm for the Median of Three Problem: a Graph Decomposition Approach, Journal of Computational Biology, 2009, 16(10), 1-13.
