Fast Computation of the Exact Hybridization Number of Two Phylogenetic Trees

Yufeng Wu and Jiayin Wang, Department of Computer Science and Engineering, University of Connecticut. ISBRA 2010.


Presentation Transcript


  1. Fast Computation of the Exact Hybridization Number of Two Phylogenetic Trees. Yufeng Wu and Jiayin Wang, Department of Computer Science and Engineering, University of Connecticut. ISBRA 2010.

  2. Phylogenetic Tree and Hybridization Network
     • Reticulate evolution: the tree model is no longer sufficient, e.g. hybrid speciation, horizontal gene transfer, recombination.
     • Phylogenetic tree: a rooted binary tree.
     • Hybridization network: a directed acyclic graph that displays two phylogenetic trees in a compact way.
     • Hybridization event: a node with in-degree two or more.
     • Hybridization Number Problem: compute the minimum number of hybridization events needed to construct a hybridization network displaying two trees.
     [Figure: input phylogenies T and T’ rooted at ρ, with leaf orders 1 2 3 4 and 1 3 2 4, and a hybridization network on leaves 1 2 3 4; labels: "delete two yellow edges", "delete two red edges".]

  3. A Related Problem: rSPR Distance Problem
     • rSPR distance problem: find the minimum number of rooted Subtree Prune and Regraft (rSPR) operations needed to transform T into T’.
     • rSPR distance of two phylogenies = the number of subtrees in a Maximum Agreement Forest (MAF) - 1 (Hein, et al and Bordewich, et al).
     [Figure: input phylogenies T and T’ rooted at ρ, with leaf orders 1 2 3 4 and 1 3 2 4; labels: "Prune 2", "Prune 3", "Regraft 3", "Regraft 2", "One rSPR operation", "Two rSPR operations".]

  4. Maximum Agreement Forest (MAF)
     • Agreement Forest (AF) of T and T’: a set of subtrees such that
       • each subtree in the AF has the same topology in T and T’
       • the subtrees partition the given taxa
       • any two subtrees are vertex-disjoint
     • A Maximum Agreement Forest is an agreement forest of the two trees in which the number of subtrees is minimized.
     [Figure: input phylogenies T and T’ rooted at ρ on leaves 1 to 6, shown with an agreement forest and a maximum agreement forest; label: "Number of subtrees is 3".]
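The three agreement-forest conditions can be checked directly on small inputs. The sketch below is illustrative only and is not the authors' implementation: it assumes trees are written as nested Python tuples with integer leaves, ignores the root label ρ, and identifies each tree node by the set of leaves below it. The trees `T1` and `T2` and all function names are hypothetical.

```python
from itertools import combinations

def leaves(t):
    """Leaf set below node t (a leaf is any non-tuple label)."""
    if not isinstance(t, tuple):
        return frozenset([t])
    return frozenset().union(*(leaves(c) for c in t))

def nodes(t):
    """Yield every node of the tree, internal nodes and leaves."""
    yield t
    if isinstance(t, tuple):
        for c in t:
            yield from nodes(c)

def restrict(t, S):
    """Topology induced on leaf set S, with unary nodes suppressed."""
    if not isinstance(t, tuple):
        return t if t in S else None
    kids = [r for r in (restrict(c, S) for c in t) if r is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else frozenset(kids)

def spanned_nodes(t, S):
    """Clusters of the nodes in the minimal subtree of t connecting S."""
    hit = [leaves(v) for v in nodes(t) if leaves(v) & S]
    top = min((c for c in hit if S <= c), key=len)  # cluster of MRCA(S)
    return {c for c in hit if c <= top}

def is_agreement_forest(forest, t1, t2):
    forest = [frozenset(c) for c in forest]
    taxa = sorted(leaves(t1))
    # Condition: the components partition the taxa.
    if taxa != sorted(leaves(t2)):
        return False
    if sorted(x for c in forest for x in c) != taxa:
        return False
    # Condition: each component has the same topology in both trees.
    if any(restrict(t1, c) != restrict(t2, c) for c in forest):
        return False
    # Condition: components are pairwise vertex-disjoint in each tree.
    for t in (t1, t2):
        spans = [spanned_nodes(t, c) for c in forest]
        if any(a & b for a, b in combinations(spans, 2)):
            return False
    return True

# Hypothetical example trees on leaves 1..5 (not taken from the slides).
T1 = ((1, ((3, 4), 2)), 5)
T2 = ((3, ((1, 2), 4)), 5)
```

With these example trees, `[{1, 2}, {3, 4}, {5}]` passes all three checks, while `[{1, 3}, {2, 4}, {5}]` fails the vertex-disjointness check in `T1`.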

  5. Maximum Acyclic Agreement Forest (MAAF)
     • Maximum Acyclic Agreement Forest: an acyclic agreement forest with the minimum number of subtrees.
     • Ti in the AF is ancestral to Tj if the root of Ti is ancestral to the root of Tj in either T or T’.
     • Graph of Agreement Forest, GF(T,T’):
       • nodes of the graph correspond to trees in the AF
       • there is an edge from Ti to Tj if Ti is ancestral to Tj
     • When the graph of the AF is acyclic, the AF is said to be acyclic.
     [Figure: input phylogenies T and T’ on leaves 1 to 5, a MAF, a MAAF, and the cyclic graph of an AF with nodes T12 and T34.]
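The acyclicity test can be sketched as follows: build the graph of the forest from the "ancestral in either tree" relation, then look for a topological order. This is a minimal illustration under assumptions, not the authors' code: trees are nested tuples with integer leaves, the root label ρ is ignored, the input is assumed to already be a valid agreement forest, and `T1`, `T2`, and the function names are hypothetical.

```python
from itertools import permutations

def leaves(t):
    """Leaf set below node t (a leaf is any non-tuple label)."""
    if not isinstance(t, tuple):
        return frozenset([t])
    return frozenset().union(*(leaves(c) for c in t))

def nodes(t):
    """Yield every node of the tree, internal nodes and leaves."""
    yield t
    if isinstance(t, tuple):
        for c in t:
            yield from nodes(c)

def mrca(t, S):
    """Cluster (leaf set) of the most recent common ancestor of S in t."""
    return min((leaves(v) for v in nodes(t) if S <= leaves(v)), key=len)

def forest_graph(forest, t1, t2):
    """Edges (i, j): component i is ancestral to component j in t1 or t2."""
    forest = [frozenset(c) for c in forest]
    roots = [[mrca(t, c) for c in forest] for t in (t1, t2)]
    # A node is a proper ancestor of another iff its cluster strictly contains it.
    return {(i, j) for i, j in permutations(range(len(forest)), 2)
            if any(r[j] < r[i] for r in roots)}

def is_acyclic(n, edges):
    """Kahn's algorithm: True iff the directed graph on 0..n-1 has no cycle."""
    indeg = [sum((i, j) in edges for i in range(n)) for j in range(n)]
    ready = [j for j in range(n) if indeg[j] == 0]
    seen = 0
    while ready:
        i = ready.pop()
        seen += 1
        for j in range(n):
            if (i, j) in edges:
                indeg[j] -= 1
                if indeg[j] == 0:
                    ready.append(j)
    return seen == n

# Hypothetical example trees on leaves 1..5 (not taken from the slides).
T1 = ((1, ((3, 4), 2)), 5)
T2 = ((3, ((1, 2), 4)), 5)
```

For these trees the forest `[{1, 2}, {3, 4}, {5}]` is cyclic: `{1, 2}` is ancestral to `{3, 4}` in `T1`, and `{3, 4}` is ancestral to `{1, 2}` in `T2`. Splitting `{1, 2}` into singletons removes the cycle.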

  6. Hybridization Number and Size of MAAF
     • Hybridization number of the two original trees = the number of subtrees in a MAAF - 1 (Baroni, et al, 2005).
     • For example, the size of the Maximum Acyclic Agreement Forest is 3, so the hybridization number is 3 - 1 = 2.
     • Nodes 3 and 4 are hybridization events.
     [Figure: input phylogenies T and T’ with leaf orders 1 2 3 4 5 and 3 4 1 2 5, the Maximum Acyclic Agreement Forest, and the hybridization network; labels: "Keep two red edges", "Keep two yellow edges".]

  7. Computation of the Exact Hybridization Number
     • Our idea: find a minimum collection of edge cuts that breaks the tree down into a MAAF.
     • Previous work: Bordewich, Semple, et al. (2007), HybridNumber.
     • Our approach: use Integer Linear Programming (ILP) to minimize the number of subtrees.
     • Objective: defined over binary variables Ci, where Ci = 1 if edge ei is cut.
     • Subject to three groups of constraints that ensure the resulting AF is a MAAF: triple constraints, pathway constraints, and cyclic constraints.
     • Triple constraint example: the triple 1,2,3 is incompatible, so at least one of the edges involved must be cut: C1 + C2 + C3 + C4 + C5 ≥ 1.
     • More details on the triple and pathway constraints are in Wu (2009).
     [Figure: input phylogenies rooted at ρ with leaf orders 1 2 3 4 and 1 3 2 4; edges labelled e1 to e5.]
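Triple constraints start from the list of incompatible triples, that is, three-leaf sets whose rooted topology differs between the two trees; such a triple cannot stay together in one subtree of the forest. The sketch below only enumerates those triples and does not build the ILP. It assumes rooted binary trees written as nested tuples; the tree shapes `T1` and `T2` are an assumed reading of the four-leaf example, and the function names are hypothetical.

```python
from itertools import combinations

def leaves(t):
    """Leaf set below node t (a leaf is any non-tuple label)."""
    if not isinstance(t, tuple):
        return frozenset([t])
    return frozenset().union(*(leaves(c) for c in t))

def nodes(t):
    """Yield every node of the tree, internal nodes and leaves."""
    yield t
    if isinstance(t, tuple):
        for c in t:
            yield from nodes(c)

def mrca(t, S):
    """Cluster (leaf set) of the most recent common ancestor of S in t."""
    return min((leaves(v) for v in nodes(t) if S <= leaves(v)), key=len)

def cherry(t, a, b, c):
    """The pair of {a, b, c} that rooted binary tree t groups together.

    In a rooted binary tree exactly one of the three pairs has a strictly
    lower (smaller-cluster) MRCA than the other two.
    """
    pairs = [(a, b), (a, c), (b, c)]
    return min(pairs, key=lambda p: len(mrca(t, frozenset(p))))

def incompatible_triples(t1, t2):
    """All leaf triples whose rooted topology differs between t1 and t2."""
    taxa = sorted(leaves(t1))
    return [tr for tr in combinations(taxa, 3)
            if cherry(t1, *tr) != cherry(t2, *tr)]

# Assumed shapes for a four-leaf example (hypothetical).
T1 = ((1, 2), (3, 4))
T2 = ((1, 3), (2, 4))
```

For these two shapes every one of the four triples is incompatible; for instance `T1` groups 1 with 2 in the triple (1, 2, 3), while `T2` groups 1 with 3.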

  8. Graph of AF and Leaf Pair (LP) Graph
     • Difficulty: the graph of an AF depends on the AF itself.
     • Leaf Pair (LP) graph: each node corresponds to a pair of two distinct leaves.
     • Create an edge from lp(i,j) to lp(p,q) if:
       • the path between i and j is disjoint from that between p and q in both T and T’; and
       • lp(i,j) is ancestral to lp(p,q) in either T or T’
     • Leaf pair lp(i,j) is ancestral to lp(p,q) if the Most Recent Common Ancestor (MRCA) of (i,j) is ancestral to the MRCA of (p,q).
     [Figure: input phylogenies T and T’ with leaf orders 1 2 3 4 5 and 3 4 1 2 5, with MRCA(1,2) and MRCA(3,4) marked; part of the LP graph with nodes 1,2 and 3,4.]
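The edge rule for the LP graph translates almost literally into code. The following sketch is a brute-force illustration for small trees, assuming nested-tuple trees with integer leaves and identifying each node by its leaf set; `T1`, `T2`, and the function names are hypothetical, and no attempt is made at efficiency.

```python
from itertools import combinations, permutations

def leaves(t):
    """Leaf set below node t (a leaf is any non-tuple label)."""
    if not isinstance(t, tuple):
        return frozenset([t])
    return frozenset().union(*(leaves(c) for c in t))

def nodes(t):
    """Yield every node of the tree, internal nodes and leaves."""
    yield t
    if isinstance(t, tuple):
        for c in t:
            yield from nodes(c)

def mrca(t, S):
    """Cluster (leaf set) of the most recent common ancestor of S in t."""
    return min((leaves(v) for v in nodes(t) if S <= leaves(v)), key=len)

def path(t, i, j):
    """Clusters of all nodes on the path between leaves i and j in t."""
    top = mrca(t, frozenset((i, j)))
    return {leaves(v) for v in nodes(t)
            if leaves(v) <= top and (i in leaves(v) or j in leaves(v))}

def lp_graph(t1, t2):
    """Edges lp(i,j) -> lp(p,q) following the two conditions on the slide."""
    pairs = list(combinations(sorted(leaves(t1)), 2))
    edges = set()
    for a, b in permutations(pairs, 2):
        # Paths must be disjoint in both trees.
        disjoint = all(not (path(t, *a) & path(t, *b)) for t in (t1, t2))
        # MRCA of a must be a proper ancestor of MRCA of b in either tree.
        ancestral = any(mrca(t, frozenset(b)) < mrca(t, frozenset(a))
                        for t in (t1, t2))
        if disjoint and ancestral:
            edges.add((a, b))
    return edges

# Hypothetical example trees on leaves 1..5 (not taken from the slides).
T1 = ((1, ((3, 4), 2)), 5)
T2 = ((3, ((1, 2), 4)), 5)
```

In this example `lp(1,2)` and `lp(3,4)` point at each other, because each pair's MRCA is ancestral to the other's in one of the two trees. Pairs that share a leaf never get an edge, since their paths intersect at that leaf.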

  9. Acyclicity of Leaf Pair Graph
     • Realized leaf pair: a leaf pair whose two leaves are in the same subtree.
     • Reduced LP graph: the LP graph for a particular AF.
     • Lemma: for an AF F, GF(T,T’) is acyclic iff LP Graph(F) is acyclic.
     • Adding constraints naively means enumerating all cycles, which is impractical in most cases.
     [Figure: input phylogenies T and T’ with leaf orders 1 2 3 4 5 and 3 4 1 2 5, LP graph nodes 1,2 and 3,4, the Maximum Acyclic Agreement Forest, and the Maximum Agreement Forest.]

  10. An Easy Way for Acyclic Constraints
     • Deal with infeasible twin pairs first: Mi,j + Mp,q ≤ 1, where Mi,j = 1 if the path between i and j is not cut.
     • Enumerate all possible elementary cycles after reducing the infeasible twin pairs; on biological data this seems to give a great reduction.
     • Example ILP constraint: M1,3 + M4,5 ≤ 1.
     [Figure: input phylogenies T and T’ with leaf orders 1 2 3 4 5 6 7 and 4 5 6 1 2 3 7; LP graph nodes 3,7; 1,3; 4,5; 1,2; 4,6.]
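The slide does not define "infeasible twin pair" precisely. The sketch below assumes it means two leaf pairs joined by LP-graph edges in both directions, that is, a cycle of length two, and emits one constraint of the form Mi,j + Mp,q ≤ 1 for each. Both that reading and the example edge set are assumptions made for illustration.

```python
def twin_pair_constraints(edges):
    """One 'M + M <= 1' constraint per pair of mutually linked leaf pairs.

    edges: set of (u, v) with u and v leaf pairs such as (1, 3).
    Assumption: a twin pair is a 2-cycle u -> v -> u in the LP graph.
    """
    out = []
    for u, v in sorted(edges):
        # Report each mutual pair once, in sorted order.
        if u < v and (v, u) in edges:
            out.append(f"M{u[0]},{u[1]} + M{v[0]},{v[1]} <= 1")
    return out

# Hypothetical LP-graph edges: (1,3) and (4,5) point at each other.
example_edges = {((1, 3), (4, 5)), ((4, 5), (1, 3)), ((1, 2), (4, 6))}
constraints = twin_pair_constraints(example_edges)
```

Handling these length-two cycles with a single pairwise constraint each leaves only the longer elementary cycles to enumerate.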

  11. Speed up by Divide and Conquer Approach
     • Subtree reduction: replace a pendant subtree that occurs identically in T and T’ with a new label.
     • Subtree reduction preserves the hybridization number.
     • Cluster reduction: replace a cluster common to T and T’, say T1 and T’1, with a new label; the remaining parts of the two trees are T2 and T’2.
     • h(T,T’) = h(T1,T’1) + h(T2,T’2). See Bordewich, et al (2007) for details.
     [Figure: input phylogenies T and T’ on leaves 1 to 9, with the common cluster T1, T’1 and the remaining trees T2, T’2 marked.]
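Both reductions begin by finding what the two trees share. The sketch below lists the pendant subtrees that occur identically in both trees (candidates for subtree reduction) and the clusters common to both trees (candidates for cluster reduction). It only finds candidates and does not perform the relabelling; trees are assumed to be nested tuples with integer leaves, and `T1`, `T2`, and the function names are hypothetical.

```python
def leaves(t):
    """Leaf set below node t (a leaf is any non-tuple label)."""
    if not isinstance(t, tuple):
        return frozenset([t])
    return frozenset().union(*(leaves(c) for c in t))

def nodes(t):
    """Yield every node of the tree, internal nodes and leaves."""
    yield t
    if isinstance(t, tuple):
        for c in t:
            yield from nodes(c)

def canon(t):
    """Order-independent form of a subtree, so (1, 2) equals (2, 1)."""
    if not isinstance(t, tuple):
        return t
    return frozenset(canon(c) for c in t)

def common_pendant_subtrees(t1, t2):
    """Canonical forms of internal subtrees occurring identically in both."""
    a = {canon(v) for v in nodes(t1) if isinstance(v, tuple)}
    b = {canon(v) for v in nodes(t2) if isinstance(v, tuple)}
    return a & b

def common_clusters(t1, t2):
    """Leaf sets that are clusters in both trees, excluding trivial ones."""
    n = len(leaves(t1))
    a = {leaves(v) for v in nodes(t1)}
    b = {leaves(v) for v in nodes(t2)}
    return {c for c in a & b if 1 < len(c) < n}

# Hypothetical example trees on leaves 1..7 (not taken from the slides).
T1 = (((1, 2), 3), ((4, 5), (6, 7)))
T2 = (((1, 2), 3), ((4, 6), (5, 7)))
```

Here the subtree on leaves 1, 2, 3 is identical in both trees, so it is a subtree-reduction candidate. The cluster {4, 5, 6, 7} is common to both trees but resolved differently, so it is a cluster-reduction candidate whose hybridization number would be computed separately.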

  12. Results on Simulation Datasets
     • Simulation datasets are from Beiko and Hamilton (2006).
     • Each pair of phylogenies has 100 leaves and is generated by applying 10 rSPR operations to one tree.
     • HybridNumber is another software tool that computes the exact hybridization number.
     • The version of HybridNumber used here was downloaded in Oct. 2009.
     • A later version of HybridNumber appears faster, but is still very slow on the EEEP data.
     [Table: running time (s); values not preserved in this transcript.]

  13. Results on Biological Datasets
     • Tree pairs are for a grass (Poaceae) dataset from the Grass Phylogeny Working Group (2001).
     • The results were obtained using CPLEX.
     • The later version of HybridNumber gives roughly the same running time as ours, but is still less scalable.

  14. Acknowledgment. Research is supported by the National Science Foundation [IIS-0803440] and the Research Foundation of the University of Connecticut.
