
Zhaoming Yin Bader-Polo Joint Group Meeting, Nov 11, 2013



Presentation Transcript


  1. DCJUC: A Maximum Parsimony Simulator for Constructing Phylogenetic Tree of Genomes with Unequal Contents. Zhaoming Yin, Bader-Polo Joint Group Meeting, Nov 11, 2013

  2. Contribution
     • Research Aspect
       - A framework for finding the maximum parsimony tree when the input genomes have unequal gene contents.
       - Proved that adequate subgraph theory is applicable to unequal-content data, which reduces the search space.
       - Provides a benchmark for the HPC community.
     • Engineering Aspect
       - Implemented software with many state-of-the-art features, such as a supertree method, the GAS initialization method, and spectral partitioning.
       - The software produces a tree with not only its topology but also the type and number of the different evolutionary events (visualization!).

  3. Why Is the Phylogenetic Tree Problem Hard?
     • For N genomes, there are (2N-5)!! possible unrooted binary tree topologies.
     • For each topology, at least one new median must be computed, and there are (g-2)!! possible median orders, where g is the number of genes.
     • Evaluating each candidate median is NP-hard when the gene content contains duplications.
     • So computing the MP tree for genomes with unequal contents is NP-hard over NP-hard over NP-hard!
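To make the first bullet concrete: the number of unrooted binary topologies on N labeled leaves is the double factorial (2N-5)!!. A small helper (function names are illustrative):

```python
def double_factorial(k: int) -> int:
    """k!! = k * (k-2) * ... down to 2 or 1; by convention (-1)!! = 0!! = 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def num_unrooted_topologies(n: int) -> int:
    """Number of unrooted binary tree topologies on n >= 3 labeled leaves: (2n-5)!!."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    return double_factorial(2 * n - 5)


print(num_unrooted_topologies(8))  # 10395
```

For the 8-species data set on slide 21 this already gives 10395 topologies, before any median is computed.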

  4. Phylogenetic Tree
     [Figure: the phylogeny of the "12 Drosophila" species. Source: http://insects.eugenes.org/species]

  5. Maximum Parsimony Concept
     [Figure: candidate tree topologies on genomes 1 to 6]
     Of all possible topologies, the most parsimonious tree is the one with the minimum total tree length.
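One way to state this criterion formally, assuming d is a pairwise rearrangement distance (for example DCJ) and g_u is the genome at node u (given at the leaves, inferred at the internal nodes); the notation is ours:

```latex
\mathrm{TL}(T) = \sum_{(u,v) \in E(T)} d(g_u, g_v),
\qquad
T^{*} = \arg\min_{T} \; \min_{\{ g_u \,:\, u \text{ internal} \}} \mathrm{TL}(T)
```

The inner minimum is what makes each topology expensive: it requires choosing internal genomes, which is where the median computations on the following slides come in.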

  6. Genome Rearrangement
     [Figure; source: http://ai.stanford.edu/~serafim/CS374_2006/presentations/lecture17.ppt]

  7. Genome Rearrangement
     In the 1980s, Jeffrey Palmer studied the evolution of plant organelles by comparing the mitochondrial genomes of cabbage and turnip. Their genes were 99% similar, yet these surprisingly identical gene sequences differed in gene order. This study helped pave the way for analyzing genome rearrangements in molecular evolution.
     Original order:          1 2 3 4 5 6 7 8 9 10
     Inversion:               1 2 -6 -5 -4 -3 7 8 9 10
     Transposition:           1 2 7 8 3 4 5 6 9 10
     Inverted Transposition:  1 2 7 8 -6 -5 -4 -3 9 10
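The three operations above can be written as list manipulations on a signed gene order, where a negative sign marks reversed orientation. A sketch with illustrative function names and 0-based, end-exclusive segment indices:

```python
from typing import List

Genome = List[int]  # signed gene order; a negative sign means reversed orientation


def inversion(g: Genome, i: int, j: int) -> Genome:
    """Reverse the segment g[i:j] and flip the sign of every gene in it."""
    return g[:i] + [-x for x in reversed(g[i:j])] + g[j:]


def transposition(g: Genome, i: int, j: int, k: int) -> Genome:
    """Move the block g[i:j] so that it follows g[j:k] (requires i < j < k)."""
    return g[:i] + g[j:k] + g[i:j] + g[k:]


def inverted_transposition(g: Genome, i: int, j: int, k: int) -> Genome:
    """Transposition in which the moved block is also inverted."""
    return g[:i] + g[j:k] + [-x for x in reversed(g[i:j])] + g[k:]


original = list(range(1, 11))
print(inversion(original, 2, 6))                  # [1, 2, -6, -5, -4, -3, 7, 8, 9, 10]
print(transposition(original, 2, 6, 8))          # [1, 2, 7, 8, 3, 4, 5, 6, 9, 10]
print(inverted_transposition(original, 2, 6, 8)) # [1, 2, 7, 8, -6, -5, -4, -3, 9, 10]
```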

  8. Genome Median Computation
     [Figure: median computation example on genomes over genes 1 to 6]
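For three genomes G_1, G_2, G_3 and a rearrangement distance d, the median problem asks for a genome m that minimizes the total distance, the same quantity Dis(m,1)+Dis(m,2)+Dis(m,3) scored on slide 14 (notation here is ours):

```latex
m^{*} = \arg\min_{m} \; \bigl[\, d(m, G_1) + d(m, G_2) + d(m, G_3) \,\bigr]
```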

  9. Genome Median Computation
     [Figure: a median node with leaf genomes (1,2,3), (1,-3,-2), and (-2,-1,3)]
     Candidate median 1,2,3 = 2 moves
     Candidate median 2,-1,3 = 5 moves
     …..
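Reading the figure's leaf genomes as (1,2,3), (1,-3,-2), and (-2,-1,3), and assuming each genome is a single linear chromosome and one "move" is one DCJ operation, the slide's totals can be reproduced with the standard DCJ formula d = N - (C + I/2), where N is the number of genes, C the number of cycles, and I the number of odd paths in the adjacency graph. A sketch for equal gene content without duplicates; all names are ours:

```python
from typing import Dict, List, Tuple

Extremity = Tuple[int, str]  # (gene, "t") is the tail, (gene, "h") the head


def adjacency_map(genome: List[int]) -> Dict[Extremity, Extremity]:
    """Pair up adjacent extremities of one linear chromosome; telomeres get no partner."""
    def ends(g: int) -> Tuple[Extremity, Extremity]:
        # (left, right) extremity of a signed gene read left to right
        return ((g, "t"), (g, "h")) if g > 0 else ((-g, "h"), (-g, "t"))
    partner: Dict[Extremity, Extremity] = {}
    for left, right in zip(genome, genome[1:]):
        x, y = ends(left)[1], ends(right)[0]
        partner[x], partner[y] = y, x
    return partner


def dcj_distance(a: List[int], b: List[int]) -> int:
    """DCJ distance N - (C + I/2) for equal gene content with no duplicates."""
    genes = {abs(g) for g in a}
    if genes != {abs(g) for g in b} or len(genes) != len(a) or len(a) != len(b):
        raise ValueError("genomes must have equal content with no duplicates")
    pa, pb = adjacency_map(a), adjacency_map(b)
    seen = set()
    cycles = odd_paths = 0
    for start in [(g, e) for g in genes for e in ("t", "h")]:
        if start in seen:
            continue
        # Collect the connected component of the adjacency graph containing start.
        component, stack = [], [start]
        seen.add(start)
        while stack:
            x = stack.pop()
            component.append(x)
            for p in (pa, pb):
                y = p.get(x)
                if y is not None and y not in seen:
                    seen.add(y)
                    stack.append(y)
        if all(x in pa and x in pb for x in component):
            cycles += 1      # every extremity is adjacent in both genomes
        elif len(component) % 2 == 1:
            odd_paths += 1   # path with an odd number of edges (extremities)
    # The number of odd paths is always even, so integer division is exact.
    return len(genes) - (cycles + odd_paths // 2)


leaves = [[1, 2, 3], [1, -3, -2], [-2, -1, 3]]
for median in ([1, 2, 3], [2, -1, 3]):
    print(median, sum(dcj_distance(median, leaf) for leaf in leaves))
# [1, 2, 3] 2
# [2, -1, 3] 5
```

Under these assumptions the candidate (1,2,3) is one DCJ move from each of the other two leaves (total 2), while (2,-1,3) is 2 + 1 + 2 = 5 moves away, matching the slide.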

  10. Step 1: Spectral Partition
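The slides do not spell out how the partition is computed. As a generic illustration of spectral partitioning (not necessarily DCJUC's exact procedure), the sketch below splits genomes into two groups by the sign of the Fiedler vector of a graph Laplacian built from pairwise distances; the exponential similarity kernel and the `scale` parameter are assumptions:

```python
import numpy as np


def spectral_bisection(dist: np.ndarray, scale: float = 1.0):
    """Split items into two groups using the sign of the Fiedler vector.

    dist  -- symmetric matrix of pairwise genome distances (assumed input)
    scale -- hypothetical kernel width turning distances into similarities
    """
    w = np.exp(-dist / scale)            # exponential similarity kernel
    np.fill_diagonal(w, 0.0)             # no self-loops
    laplacian = np.diag(w.sum(axis=1)) - w
    _, vecs = np.linalg.eigh(laplacian)  # eigenvalues come back in ascending order
    fiedler = vecs[:, 1]                 # eigenvector of the 2nd-smallest eigenvalue
    left = [i for i, v in enumerate(fiedler) if v >= 0]
    right = [i for i, v in enumerate(fiedler) if v < 0]
    return left, right
```

Applying the bisection recursively to each group would yield progressively smaller sub-disks; where to stop recursing is a separate design choice.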

  11. Step 2: Compute MP Tree for Each Sub-Disk

  12. Step 2-1: How to Compute Median (BnB)
     [Figure: branch-and-bound search over candidate median gene orders on genes 1 to 8]
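A minimal branch-and-bound median sketch, under two stated assumptions: breakpoint distance is swapped in for DCJ, and the lower bound is simply the cost of the adjacencies already fixed in a partial gene order (each fixed adjacency contributes one breakpoint per leaf that lacks it, and extending the order never removes those). This is not DCJUC's own bound; names are illustrative:

```python
from typing import List, Optional, Set, Tuple


def framed_adjacencies(genome: List[int]) -> Set[Tuple[int, int]]:
    """Adjacencies of a linear signed genome framed by 0 and n+1, in both reading directions."""
    n = len(genome)
    framed = [0] + list(genome) + [n + 1]
    adj: Set[Tuple[int, int]] = set()
    for x, y in zip(framed, framed[1:]):
        adj.add((x, y))
        adj.add((-y, -x))  # the same adjacency read on the opposite strand
    return adj


def bnb_median(leaves: List[List[int]]) -> Tuple[Optional[List[int]], float]:
    """Exact breakpoint median of signed permutations by depth-first branch and bound."""
    n = len(leaves[0])
    leaf_adj = [framed_adjacencies(leaf) for leaf in leaves]

    def step_cost(x: int, y: int) -> int:
        # one breakpoint for every leaf that does not contain adjacency (x, y)
        return sum((x, y) not in adj for adj in leaf_adj)

    best_order: Optional[List[int]] = None
    best_cost = float("inf")

    def extend(prefix: List[int], used: Set[int], cost: int) -> None:
        nonlocal best_order, best_cost
        if cost >= best_cost:
            return  # prune: the fixed adjacencies already cost too much
        last = prefix[-1] if prefix else 0
        if len(prefix) == n:
            total = cost + step_cost(last, n + 1)  # close with the right frame
            if total < best_cost:
                best_order, best_cost = list(prefix), total
            return
        for g in range(1, n + 1):
            if g in used:
                continue
            used.add(g)
            for signed in (g, -g):  # branch on gene identity and orientation
                prefix.append(signed)
                extend(prefix, used, cost + step_cost(last, signed))
                prefix.pop()
            used.discard(g)

    extend([], set(), 0)
    return best_order, best_cost


print(bnb_median([[1, 2, 3], [1, -3, -2], [-2, -1, 3]]))
```

The search space has 2^n * n! signed orders, so the quality of the bound decides how much of it is actually visited.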

  13. Step 2-2: How to Compute Median (LK)
     [Figure: successive improvement steps, repeated until the search stops]
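LK here presumably refers to a Lin-Kernighan-style local search; the slide shows only the iteration until it stops. A much simpler stand-in, plain steepest descent over single inversions scored with breakpoint distance (both assumptions, not the LK move set), illustrates the same improve-until-stuck loop:

```python
from typing import List, Tuple


def breakpoint_cost(m: List[int], leaves: List[List[int]]) -> int:
    """Sum of breakpoint distances from candidate m to every leaf (linear, framed by 0 and n+1)."""
    n = len(m)
    framed_m = [0] + list(m) + [n + 1]
    total = 0
    for leaf in leaves:
        framed = [0] + list(leaf) + [n + 1]
        adj = set()
        for x, y in zip(framed, framed[1:]):
            adj.update({(x, y), (-y, -x)})  # adjacency in both reading directions
        total += sum((x, y) not in adj for x, y in zip(framed_m, framed_m[1:]))
    return total


def local_search_median(start: List[int], leaves: List[List[int]]) -> Tuple[List[int], int]:
    """Steepest descent over the inversion neighborhood; stops at a local optimum."""
    current = list(start)
    cost = breakpoint_cost(current, leaves)
    while True:
        best_move, best_cost = None, cost
        for i in range(len(current)):
            for j in range(i + 1, len(current) + 1):
                # neighbor: invert the segment current[i:j]
                cand = current[:i] + [-x for x in reversed(current[i:j])] + current[j:]
                c = breakpoint_cost(cand, leaves)
                if c < best_cost:
                    best_move, best_cost = cand, c
        if best_move is None:  # no improving neighbor: stop
            return current, cost
        current, cost = best_move, best_cost
```

Because the cost strictly decreases and cannot go below zero, the loop always terminates, but only at a local optimum; escaping such optima is what richer move sets like LK aim for.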

  14. Step 2-2: How to Evaluate Median
     Genome 1:     1, 2, 3, 4, 3, 6, 5
     Median (med): 1, 2, 3, 3, 4, 6, 5
     Genome 2:     1, 2, 3, 4, 6, 3, 5
     Genome 3:     1, 2, 5, 4, 6, 3, 3
     Score = Dis(m,1) + Dis(m,2) + Dis(m,3)

  15. Step 2-2: How to Evaluate Median
     1, 2, 3, 3, 4, 6, 5
     1, 2, 3, 4, 3, 5
     Find a mapping first (NP-hard): dis = 1
     1, 2, 3, 3, 4, 6, 5
     -2, -1, 3, 3, 4, 5
     Complete the loss (polynomial): dis = 2
     1, 2, 3, 4, 6, 5
     -2, -1, 3, 4, 6, 5
     Compute DCJ (polynomial): dis = 3
     1, 2, 3, 4, 6, 5
     1, 2, 3, 4, 6, 5
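The three-step evaluation can be sketched by brute force, which also shows why the mapping step is the expensive one: every way of matching duplicate copies is enumerated. Assumptions, all ours: each unmatched copy costs one loss, breakpoint distance stands in for DCJ in the last step, and the function names are hypothetical:

```python
from collections import defaultdict
from itertools import permutations, product
from typing import Dict, List, Tuple


def breakpoints(a: List[int], b: List[int]) -> int:
    """Breakpoint distance between two signed orders of genes 1..n, framed by 0 and n+1."""
    n = len(a)
    fa, fb = [0] + a + [n + 1], [0] + b + [n + 1]
    adj_b = {(x, y) for x, y in zip(fb, fb[1:])} | {(-y, -x) for x, y in zip(fb, fb[1:])}
    return sum((x, y) not in adj_b for x, y in zip(fa, fa[1:]))


def positions(genome: List[int]) -> Dict[int, List[int]]:
    """Positions of every copy of each gene family."""
    pos: Dict[int, List[int]] = defaultdict(list)
    for i, g in enumerate(genome):
        pos[abs(g)].append(i)
    return pos


def distance_with_duplicates(a: List[int], b: List[int]) -> int:
    pa, pb = positions(a), positions(b)
    # Step 1 (mapping): per family, every way to match copies in a with copies in b.
    options: List[List[List[Tuple[int, int]]]] = []
    for fam in sorted(set(pa) | set(pb)):
        ia, ib = pa.get(fam, []), pb.get(fam, [])
        if len(ia) <= len(ib):
            options.append([list(zip(ia, pick)) for pick in permutations(ib, len(ia))])
        else:
            options.append([list(zip(pick, ib)) for pick in permutations(ia, len(ib))])
    best = None
    for choice in product(*options):  # exponential in the number of duplicates
        pairs = [p for fam_pairs in choice for p in fam_pairs]
        # Step 2 (complete the loss): each unmatched copy counts as one loss and is dropped.
        losses = (len(a) - len(pairs)) + (len(b) - len(pairs))
        # Relabel matched copies 1..m so both reduced genomes are signed permutations.
        label_a = {i: k + 1 for k, (i, _) in enumerate(pairs)}
        label_b = {j: k + 1 for k, (_, j) in enumerate(pairs)}
        ra = [label_a[i] if a[i] > 0 else -label_a[i] for i in sorted(label_a)]
        rb = [label_b[j] if b[j] > 0 else -label_b[j] for j in sorted(label_b)]
        # Step 3: rearrangement distance on the reduced genomes (breakpoints, not DCJ).
        cost = losses + breakpoints(ra, rb)
        if best is None or cost < best:
            best = cost
    return best


print(distance_with_duplicates([1, 2, 3, 3, 4, 6, 5], [1, 2, 3, 4, 3, 5]))
```

Steps 2 and 3 are polynomial for a fixed mapping, as on the slide; only the enumeration in step 1 blows up with the number of duplicated genes.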

  16. Step 3: Merge Disks
     • Decomposition of the disks
     • Construct a tree for each disk
     • Merge the trees using a specific consensus method: strict, majority, etc.
     • Disambiguation
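The two consensus rules named on the slide can be sketched on trees encoded as sets of splits (bipartitions of the taxa). This assumes all input trees share the same leaf set; merging disks whose taxa only partially overlap, and the disambiguation step, need more machinery than shown. Names are illustrative:

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Set

Split = FrozenSet[str]


def normalize(side: Iterable[str], taxa: FrozenSet[str]) -> Split:
    """Encode a bipartition by the side that excludes a fixed reference taxon."""
    s = frozenset(side)
    return taxa - s if min(taxa) in s else s


def split_counts(trees: List[Set[Split]]) -> Counter:
    return Counter(s for tree in trees for s in tree)


def strict_consensus(trees: List[Set[Split]]) -> Set[Split]:
    """Keep only splits present in every input tree."""
    return {s for s, c in split_counts(trees).items() if c == len(trees)}


def majority_rule(trees: List[Set[Split]]) -> Set[Split]:
    """Keep splits present in more than half of the input trees."""
    return {s for s, c in split_counts(trees).items() if 2 * c > len(trees)}


taxa = frozenset("ABCDE")
t1 = {normalize("AB", taxa), normalize("ABC", taxa)}
t2 = {normalize("AB", taxa), normalize("ABD", taxa)}
t3 = {normalize("AB", taxa), normalize("DE", taxa)}  # DE|ABC is the same split as ABC|DE
print(sorted(map(sorted, strict_consensus([t1, t2, t3]))))  # [['C', 'D', 'E']]
print(sorted(map(sorted, majority_rule([t1, t2, t3]))))     # [['C', 'D', 'E'], ['D', 'E']]
```

Splits kept by the majority rule are always pairwise compatible, so the result is again a (possibly unresolved) tree; the strict rule keeps fewer splits and yields a less resolved tree.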

  17. Step 4: Initialization
     • Init by insertion, which is local.
     • Init by prospection, which is global.
     [Figure: example tree with numbered leaves and internal nodes labeled b, c, d, e, X]

  18. Step 5: Iterative Refinement
     [Figure: a tree fragment with internal nodes a, b and leaves 1 to 4]

  19. Review
     • Step 1: Spectral partition
     • Step 2: Subtree construction
     • Step 3: Supertree merge
     • Step 4: Initialization of the complete tree using the General Adequate Subgraph (GAS) method
     • Step 5: Iterative refinement until the complete tree converges

  20. Result – Simulated Data
     Seed, plus #Theta + #gamma + #phi operations
     We grow our own tree, so we know the total number of evolutionary events in the model tree.
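Growing a model tree amounts to evolving a seed genome along each edge while recording every event. A sketch of one edge, with hypothetical parameter names: the slides say Theta is the percentage of inversions, but how #gamma and #phi map to duplication and loss is not stated, and the event models (random-segment inversion, tandem duplication of a random segment, single-gene loss) are our assumptions:

```python
import random
from collections import Counter
from typing import List, Tuple


def evolve(genome: List[int], n_events: int, p_inv: float, p_dup: float,
           p_loss: float, rng: random.Random) -> Tuple[List[int], Counter]:
    """Apply n_events random events to a signed gene order and count them by type.

    The event type is drawn with probabilities p_inv, p_dup, p_loss (must sum to 1).
    """
    if abs(p_inv + p_dup + p_loss - 1.0) > 1e-9:
        raise ValueError("event probabilities must sum to 1")
    g = list(genome)
    counts: Counter = Counter()
    for _ in range(n_events):
        r = rng.random()
        i = rng.randrange(len(g))
        j = rng.randrange(i + 1, len(g) + 1)  # non-empty segment g[i:j]
        if r < p_inv:
            g[i:j] = [-x for x in reversed(g[i:j])]
            counts["inversion"] += 1
        elif r < p_inv + p_dup:
            g[j:j] = g[i:j]  # insert a copy right after the segment
            counts["duplication"] += 1
        elif len(g) > 1:
            del g[i]  # never delete the last remaining gene
            counts["loss"] += 1
    return g, counts


genome, events = evolve(list(range(1, 21)), 30, 0.8, 0.1, 0.1, random.Random(42))
print(genome, events)
```

Running this on every edge of a randomly grown tree, and summing the counters, gives the known ground-truth event totals that the inferred tree is compared against.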

  21. Result – Accuracy
     Settings: % of duplication = 0.1, % of loss = 0.1; Theta is the % of inversion.
     There are 8 species, so the tree has 2*8-3 = 13 edges. The average accuracy is ~90%.

  22. Result – Real Data
     SCRaMbLE Matrix
     • We can represent a SCRaMbLEd strain by its vector.
     • The sign gives the orientation.
     • The color encodes the position in the synthetic chromosome.

  23. Result – Real Data
     #inversion : #insertion/deletion : #duplication

  24. Parallel Method [Bader 05]
     • Load balancing
     • Parallel search

  25. Experimental Results (Parallel)

  26. Why Many-core BnB?
     • There are many distributed-memory MIP BnB frameworks (PICO, PEBBL, ALPS, COIN-OR).
     • Load balance in distributed BnB relies heavily on ramp-up; run-time load balancing is not efficient.
     • But today's petaflops machines are mostly hybrid systems (distributed memory + many-core processors or accelerators).

  27. Experimental Results (Intel Phi knapsack)
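The slide reports many-core BnB results on the knapsack problem without showing the solver. For reference, a sequential depth-first BnB for 0/1 knapsack with the standard fractional (LP relaxation) upper bound is sketched below; it is not the Intel Phi implementation, assumes positive weights, and the names are ours:

```python
from typing import List


def knapsack_bnb(values: List[float], weights: List[float], capacity: float) -> float:
    """0/1 knapsack by depth-first branch and bound with the fractional upper bound."""
    # Sort by value density so the greedy fractional fill is the LP relaxation optimum.
    items = sorted(zip(values, weights), key=lambda vw: vw[0] / vw[1], reverse=True)
    n = len(items)
    best = 0

    def bound(k: int, value: float, room: float) -> float:
        # Greedily fill the remaining items k.., taking a fraction of the first misfit.
        for v, w in items[k:]:
            if w <= room:
                value += v
                room -= w
            else:
                return value + v * room / w
        return value

    def search(k: int, value: float, room: float) -> None:
        nonlocal best
        best = max(best, value)
        if k == n or bound(k, value, room) <= best:
            return  # leaf reached, or this subtree cannot beat the incumbent
        v, w = items[k]
        if w <= room:
            search(k + 1, value + v, room - w)  # branch: take item k
        search(k + 1, value, room)              # branch: skip item k

    search(0, 0, capacity)
    return best


print(knapsack_bnb([60, 100, 120], [10, 20, 30], 50))  # 220
```

Each call to `search` is an independent subproblem once the incumbent `best` is shared, which is the unit of work a many-core version would typically hand out to threads.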
