
On the Hardness of Inferring Phylogenies from Triplet-Dissimilarities


Presentation Transcript


  1. On the Hardness of Inferring Phylogenies from Triplet-Dissimilarities. Ilan Gronau and Shlomo Moran, Technion – Israel Institute of Technology, Haifa, Israel.

  2. Pairwise-Distance Based Reconstruction. [Figure: a dissimilarity matrix D over the taxa B, E, G, H, L, M is used to reconstruct an edge-weighted tree T, from which the tree metric DT is calculated.]

  3. Optimization Criteria. We wish the tree metric DT to approximate simultaneously the pairwise distances in D: D should be “close” to DT. [Figure: the matrices D and DT over the taxa B, E, G, H, L, M.] Two “closeness” measures are studied here: • Maximal Difference (l∞) • Maximal Distortion

  4. Maximal Difference (l∞) vs. Maximal Distortion. [Figure: the matrices D and DT over the taxa B, E, G, H, L, M.] Goal: find an optimal T, which minimizes the maximal difference/distortion between D and DT.
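
A minimal Python sketch of the two measures, for symmetric matrices given as nested lists. The transcript does not define maximal distortion; the sketch assumes the common definition (largest expansion ratio times largest contraction ratio), and the function names are illustrative only.

```python
from itertools import combinations


def max_difference(D, DT):
    """l-infinity distance: the largest absolute difference over all taxon pairs."""
    n = len(D)
    return max(abs(D[i][j] - DT[i][j]) for i, j in combinations(range(n), 2))


def max_distortion(D, DT):
    """Assumed definition: (max DT/D) * (max D/DT) over all taxon pairs.

    Requires strictly positive off-diagonal entries in both matrices.
    """
    pairs = list(combinations(range(len(D)), 2))
    expansion = max(DT[i][j] / D[i][j] for i, j in pairs)
    contraction = max(D[i][j] / DT[i][j] for i, j in pairs)
    return expansion * contraction


# Made-up 3-taxon example: only one entry differs (4 vs. 5).
D = [[0, 4, 6], [4, 0, 8], [6, 8, 0]]
DT = [[0, 5, 6], [5, 0, 8], [6, 8, 0]]
print(max_difference(D, DT), max_distortion(D, DT))
```

Note that the difference is additive while the distortion is multiplicative, so a tree that is optimal for one measure need not be optimal for the other.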

  5. Previous Works on Approximating Dissimilarities by Tree Distances
     • Negative results (NP-hardness):
       • Closest tree metric (even ultrametric) to a dissimilarity matrix under l1, l2 [Day ‘87]
       • Closest tree metric to a dissimilarity matrix under l∞ [ABFPT99]
         • Hard to approximate better than 1.125
         • Implicit: hard to approximate the closest MaxDist tree within any constant factor
     • Positive results:
       • Closest ultrametric to a dissimilarity matrix under l∞ [Krivanek ‘88]
       • 3-approximation of the closest additive metric to a given metric [ABFPT99]
         (implicit 6-approximation for general dissimilarity matrices)

  6. This Work: Triplet-Distances – Distances to Triplets' Midpoints. [Figure: taxa i, j, k joined at the midpoint C(i,j,k); τT(i ; jk) is the distance from i to C(i,j,k).] • τT(i ; jk) = τT(i ; kj) • τT(i ; ij) = 0 • τT(i ; jj) = DT(i, j)

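  7. Triplet-Distances Defined by 2-Distances • Each distance matrix D defines 3-trees: τ(i ; jk) = ½[D(i,j) + D(i,k) − D(j,k)]. • Any metric on 3 taxa is realizable by a 3-tree. [Figure: a metric on the taxa i, j, k with pairwise distances 8, 9, 7 is realized by a 3-tree with edge weights 5, 3, 4 meeting at C(i,j,k).]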
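
The formula can be checked on the numbers in the figure. The short Python sketch below assumes the assignment D(i,j) = 8, D(i,k) = 9, D(j,k) = 7 (the transcript does not say which distance belongs to which pair); the function name is illustrative.

```python
def triplet_distance(D, i, j, k):
    """tau(i ; jk) = 1/2 [D(i,j) + D(i,k) - D(j,k)]."""
    return 0.5 * (D[i][j] + D[i][k] - D[j][k])


# Assumed labelling of the figure's distances: D(i,j)=8, D(i,k)=9, D(j,k)=7.
i, j, k = 0, 1, 2
D = [[0, 8, 9],
     [8, 0, 7],
     [9, 7, 0]]

edges = [triplet_distance(D, i, j, k),   # edge from i to the midpoint
         triplet_distance(D, j, i, k),   # edge from j to the midpoint
         triplet_distance(D, k, i, j)]   # edge from k to the midpoint
print(edges)
```

Under this labelling the three edge weights come out as 5, 3 and 4, matching the 3-tree in the figure. The same function also reproduces the identities τ(i ; ij) = 0 and τ(i ; jj) = D(i,j).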

  8. Triplet-Distance Based Reconstruction. τ(i ; jk) = ½[D(i,j) + D(i,k) − D(j,k)]. [Figure: a triplet-distance table τ, indexed by the taxa B, E, G, H, L, M and the taxon pairs BB, BE, BG, …, LL, LM, MM, is used to reconstruct an edge-weighted tree T.]

  9. Why Use Triplet-Distances? 1. They enable more accurate estimations of 2-distances. 2. They are used (de facto) by known reconstruction algorithms.

  10. Improved Estimations of Pairwise Distances: “Information Loss”. Calculate D(H,E) (Maximum Likelihood): in calculating D(H,E), all other taxa are ignored. [Figure: the matrix D over the taxa B, E, G, H, L, M, with H and E joined by an edge of length 13.]

  11. Improved Estimations (cont.): • Estimate D(H,E) by calculating all the 3-trees on {H, E, X : X ≠ H, E} • (Or: calculate just one 3-tree, for a “trusted” 3rd taxon X: V. Ranwez, O. Gascuel, Improvement of distance-based phylogenetic methods by a local maximum likelihood approach using triplets, Mol. Biol. Evol. 19(11), pp. 1952–1963 (2002)). [Figure: 3-trees on H = (..AACG..), E = (..CAGA..) and a third taxon X among B = (..AAGT..), L = (..AATA..), G = (..CCGT..), M = (..CGCG..).]
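
In a 3-tree on {H, E, X}, the H–E distance is the sum of the two edges from H and from E to the midpoint, i.e. τ(H ; EX) + τ(E ; HX). The Python sketch below combines the per-X estimates by a plain average; the averaging rule, the dictionary layout and all numbers are assumptions made for illustration, not something stated on the slide.

```python
from statistics import mean


def estimate_pair_distance(tau, h, e, third_taxa):
    """Average, over third taxa x, of the H-E path length in the 3-tree on {h, e, x}.

    tau maps (i, j, k) to the estimated triplet distance tau(i ; jk).
    """
    return mean(tau[(h, e, x)] + tau[(e, h, x)] for x in third_taxa)


# Made-up 3-tree edge lengths for two third taxa, B and L.
tau = {("H", "E", "B"): 2.0, ("E", "H", "B"): 3.0,
       ("H", "E", "L"): 2.5, ("E", "H", "L"): 3.5}

estimate_all = estimate_pair_distance(tau, "H", "E", ["B", "L"])
estimate_trusted = estimate_pair_distance(tau, "H", "E", ["B"])  # one "trusted" X
print(estimate_all, estimate_trusted)
```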

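  12. (Implicit) Use of Triplet-Distances in 2-Distance Reconstruction Algorithms. τ(i ; jk) = ½[D(i,j) + D(i,k) − D(j,k)]. [Figure: the matrix D over the taxa B, E, G, H, L, M defines a triplet-distance table indexed by the taxon pairs BB, BE, BG, …, LL, LM, MM, from which an edge-weighted tree T is reconstructed.]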

  13. 1st use: “Triplet Distances from a Single Source”: • Fix a taxon r, and construct a tree T which minimizes: [formula not preserved in the transcript] • An optimal solution is computable in O(n²) time, and is used e.g. in: • (FKW95): Optimal approximation of distances by ultrametric trees. • (ABFPT99): The best known approximation of distances by general trees. • (BB99): Fast construction of Buneman trees. [Figure: taxa i and j with the source taxon r.]

  14. 2nd use: Saitou & Nei Neighbour Joining. The neighbors-selection criterion of NJ selects a taxon pair i, j which maximizes the sum: [formula not preserved in the transcript]. [Figure: the pair i, j with the remaining taxa r.]

  15. Previous Works on Triplet-Dissimilarities/Distances
      • I. Gronau, S. Moran, Neighbor Joining Algorithms for Inferring Phylogenies via LCA-Distances, Journal of Computational Biology 14(1), pp. 1-15 (2007).
      • Works which use the total weights of 3-trees:
        • S. Joly, G. Le Calvé, Three-Way Distances, Journal of Classification 12, pp. 191-205 (1995).
        • L. Pachter, D. Speyer, Reconstructing Trees from Subtree Weights, Applied Mathematics Letters 17, pp. 615-621 (2004).
        • D. Levy, R. Yoshida, L. Pachter, Beyond pairwise distances: Neighbor-joining with phylogenetic diversity estimates, Mol. Biol. Evol. 23(3), pp. 491–498 (2006).

  16. Summary of Results
      • Results for Maximal Difference (l∞):
        • The decision problem is NP-hard: is there a tree T s.t. ||τ, τT||∞ ≤ Δ?
        • Hardness of approximation of the optimization problem: finding a tree T s.t. ||τ, τT||∞ ≤ 1.4·||τ, τOPT||∞
        • A 15-approximation algorithm, using the 6-approximation algorithm for 2-dissimilarities from [ABFPT99]
      • Result for Maximal Distortion:
        • Hardness of approximation within any constant factor

  17. NP-Hardness of the Decision Problem. We use a reduction from 3SAT (the problem of determining whether a 3CNF formula is satisfiable). [Figure: a 3CNF formula with its literals and a clause marked, together with a satisfying assignment.] We show: if one can determine for (τ, Δ) whether there exists a tree T s.t. ||τ, τT||∞ ≤ Δ, then one can determine for every 3CNF formula φ whether it is satisfiable.
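
For concreteness, the quantity ||τ, τT||∞ in the decision problem can be evaluated for a given candidate tree as in the Python sketch below. The representation (taxa as node indices, the tree as a weighted edge list, τ as a dictionary keyed by (i, j, k)) is an assumption made for the example; τT is obtained from the tree's path lengths via the ½[DT(i,j) + DT(i,k) − DT(j,k)] formula.

```python
from itertools import product


def tree_distances(n_nodes, edges):
    """All-pairs path lengths in an edge-weighted tree (Floyd-Warshall, for brevity)."""
    inf = float("inf")
    dist = [[0 if a == b else inf for b in range(n_nodes)] for a in range(n_nodes)]
    for u, v, w in edges:
        dist[u][v] = dist[v][u] = w
    # product varies its last index fastest, so m is the outermost loop.
    for m, a, b in product(range(n_nodes), repeat=3):
        if dist[a][m] + dist[m][b] < dist[a][b]:
            dist[a][b] = dist[a][m] + dist[m][b]
    return dist


def max_triplet_difference(tau, dist):
    """Largest |tau(i;jk) - tau_T(i;jk)| over the entries of the input table tau."""
    def tau_T(i, j, k):
        return 0.5 * (dist[i][j] + dist[i][k] - dist[j][k])
    return max(abs(value - tau_T(i, j, k)) for (i, j, k), value in tau.items())


# Candidate tree: a star with centre node 3 and leaves 0, 1, 2 (made-up weights).
dist = tree_distances(4, [(0, 3, 5), (1, 3, 3), (2, 3, 4)])
tau = {(0, 1, 2): 5.0, (1, 0, 2): 3.5, (2, 0, 1): 4.0}  # made-up input table
difference = max_triplet_difference(tau, dist)
print(difference, difference <= 1.0)  # second value: is the tree within delta = 1?
```

Checking a proposed tree is therefore easy; the hardness lies in deciding whether any tree meets the bound.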

  18. The Reduction. Given a 3CNF formula φ we define triplet distances τ and an error bound Δ which force the output tree to imply a satisfying assignment to φ. • The set of taxa: • Taxa T, F. • A taxon for every literal (xi, ¬xi). • 3 taxa for every clause Cj (yj1, yj2, yj3).
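
The taxon set of the reduction can be listed mechanically. The Python sketch below is only an illustration: the string names ("x1", "~x1", "y1_1", …) are a hypothetical naming scheme, not notation from the slides.

```python
def reduction_taxa(num_vars, num_clauses):
    """Taxa for a 3CNF formula with the given numbers of variables and clauses."""
    taxa = ["T", "F"]                                   # the two truth-value taxa
    for i in range(1, num_vars + 1):
        taxa += [f"x{i}", f"~x{i}"]                     # one taxon per literal
    for j in range(1, num_clauses + 1):
        taxa += [f"y{j}_{k}" for k in (1, 2, 3)]        # three taxa per clause
    return taxa


taxa = reduction_taxa(num_vars=3, num_clauses=2)
print(len(taxa), taxa)
```

For n variables and m clauses this gives 2 + 2n + 3m taxa, so the instance size is linear in the size of φ.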

  19. Properties Enforced by the Input (τ, Δ) • One of the following can be enforced on each taxa triplet (u, v, w): • taxon u is close to Path(v, w), or • taxon u is far from Path(v, w). [Figure: taxon u and the path between v and w.]

  20. Enforcing Truth Assignment • A truth assignment to φ is implied by the following: • T is far from F • For each i, xi is far from ¬xi, and both xi and ¬xi are close to Path(T, F). Thus we set xi = T iff xi is close to T. [Figure: the path between T and F.]

  21. Enforcing Clauses-Satisfaction. A clause C = (l1 ∨ l2 ∨ l3) is satisfied iff at least one literal li is true, i.e. is close to T. (l1 ∨ l2 ∨ l3) is satisfied iff it is not like this: [Figure: l1, l2, l3 all located next to F.] We need to guarantee that all clauses avoid the above by the close/far relations.

  22. Clauses-Satisfaction (cont.). (l1 ∨ l2 ∨ l3) is satisfied iff, out of the three paths Path(l1, l2), Path(l1, l3), Path(l2, l3), at least two paths are close to T. But we don’t know which two paths. [Figure: T, F and the literal taxa l1, l2, l3.]

  23. Clauses-Satisfaction (cont.). We attach a taxon to each such path: y1 is close to Path(l2, l3); y2 is close to Path(l1, l3); y3 is close to Path(l1, l2). (l1 ∨ l2 ∨ l3) is satisfied iff at least two yi’s can be located close to T … [Figure: T, F, the literal taxa l1, l2, l3 and the clause taxa y1, y2, y3.]

  24. Clauses-Satisfaction (end). … and at least two of the yi’s can be located close to T iff Path(y2, y3), Path(y1, y3), Path(y1, y2) are all close to T. So (l1 ∨ l2 ∨ l3) is satisfied iff all the above paths are close to T. [Figure: T, F, the literal taxa l1, l2, l3 and the clause taxa y1, y2, y3.]
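
The chain of equivalences on the last three slides can be sanity-checked at the purely Boolean level. The Python sketch below abstracts away all distances and assumes that a path is "close to T" exactly when at least one of its endpoints can be, which is a simplification of the geometric argument rather than the construction itself.

```python
from itertools import combinations, product


def all_y_paths_close_to_T(l1, l2, l3):
    """Boolean abstraction of the clause gadget; True means 'close to T'."""
    literals = [l1, l2, l3]
    # y_a is attached to the path between the two literals other than l_a,
    # so it can be placed close to T iff one of those two literals is true.
    y = [any(literals[b] for b in range(3) if b != a) for a in range(3)]
    # Every path Path(y_a, y_b) must be close to T.
    return all(y[a] or y[b] for a, b in combinations(range(3), 2))


# The gadget condition agrees with clause satisfaction on all 8 assignments.
results = {assignment: all_y_paths_close_to_T(*assignment)
           for assignment in product([False, True], repeat=3)}
print(all(ok == any(assignment) for assignment, ok in results.items()))
```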

  25. Construction Example. [Figure: example tree containing T, F, the internal vertices vT and vF and the clause taxa y11, y12, y13, y21, y22, y23, with edge weights α and 2β.] φ is satisfiable ⇔ there is a tree T which satisfies all bounds:
      A1: τT(T, F) ≥ 2α+2β
      A2: ∀i=1..n: τT(T ; xi ¬xi) ≤ α ; τT(F ; xi ¬xi) ≤ α
      B1: ∀j=1..m: τT(yj1 ; lj2 lj3) ≤ α ; τT(yj2 ; lj1 lj3) ≤ α ; τT(yj3 ; lj1 lj2) ≤ α
      B2: ∀j=1..m: τT(yj1 ; T F) ≥ α ; τT(yj2 ; T F) ≥ α ; τT(yj3 ; T F) ≥ α
      B3: ∀j=1..m: τT(T ; yj2 yj3) ≤ α ; τT(T ; yj1 yj3) ≤ α ; τT(T ; yj1 yj2) ≤ α

  26. Hardness of Approximation Results. By “stretching” the close/far restrictions, the following problems are also shown NP-hard: • Approximating Maximal Difference: finding a tree T s.t. ||τ, τT||∞ ≤ 1.4·||τ, τOPT||∞ • Approximating Maximal Distortion: finding a tree T s.t. MaxDist(τ, τT) ≤ C·MaxDist(τ, τOPT) for any constant C. Details in: I. Gronau and S. Moran, On the Hardness of Inferring Phylogenies from Triplet-Dissimilarities, Theoretical Computer Science 389(1-2), December 2007, pp. 44-55.

  27. Open Problems/Further Research
      • Extending the hardness results to 3-diss tables induced by 2-diss matrices (τ(i ; jk) = ½[D(i,j) + D(i,k) − D(j,k)])
      • Extending the hardness results to “naturally looking” trees (binary trees with constant-bounded edge weights)
      • Checking the performance of NJ when the neighbor selection formula is computed from “real” 3-distances
      • Devising algorithms which use 3-distances as input
      • Does optimization of 3-diss lead to good topological accuracy (under accepted models of sequence evolution)? (It is known that optimization of 2-diss doesn’t lead to good topological accuracy.)

  28. Thank You

  29. Distance-Based Phylogenetic Reconstruction • Compute distances between all taxon pairs • Find an edge-weighted tree that best describes the distances. [Figure: an edge-weighted tree.]

  30. The Reduction – τ(φ). [Figure: example tree containing T, F, the internal vertices vT and vF and the clause taxa y11, y12, y13, y21, y22, y23, with edge weights α and 2β.]
      Bounds on the output tree:
      A1: τT(T, F) ≥ 2α+2β
      A2: ∀i=1..n: τT(T ; xi ¬xi) ≤ α ; τT(F ; xi ¬xi) ≤ α
      B1: ∀j=1..m: τT(yj1 ; lj2 lj3) ≤ α ; τT(yj2 ; lj1 lj3) ≤ α ; τT(yj3 ; lj1 lj2) ≤ α
      B2: ∀j=1..m: τT(yj1 ; T F) ≥ α ; τT(yj2 ; T F) ≥ α ; τT(yj3 ; T F) ≥ α
      B3: ∀j=1..m: τT(T ; yj2 yj3) ≤ α ; τT(T ; yj1 yj3) ≤ α ; τT(T ; yj1 yj2) ≤ α
      • In our constructed tree:
        • All 2-distances are in [2α, 2α+2β].
        • All 3-distances are in [α, α+2β].
        ⇒ Δ = β.
      Input values:
      A1: τ(T, F) = 2α+3β
      A2: ∀i=1..n: τ(T ; xi ¬xi) = α−β ; τ(F ; xi ¬xi) = α−β
      B1: ∀j=1..m: τ(yj1 ; lj2 lj3) = α−β ; τ(yj2 ; lj1 lj3) = α−β ; τ(yj3 ; lj1 lj2) = α−β
      B2: ∀j=1..m: τ(yj1 ; T F) = α+β ; τ(yj2 ; T F) = α+β ; τ(yj3 ; T F) = α+β
      B3: ∀j=1..m: τ(T ; yj2 yj3) = α−β ; τ(T ; yj1 yj3) = α−β ; τ(T ; yj1 yj2) = α−β
      Other 2-distances: τ(s, t) = 2α+2β
      Other 3-distances: τ(s ; t u) = α+2β
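
The link between the input values and the bounds is the error tolerance: a tree within Δ = β of an input value v must have its corresponding distance in [v − β, v + β]. The Python sketch below checks this arithmetic for arbitrary illustrative values α = 10 and β = 1 (these numbers are assumptions; any positive values behave the same way).

```python
def allowed_interval(value, delta):
    """Range of tree distances within additive error delta of an input value."""
    return (value - delta, value + delta)


alpha, beta = 10.0, 1.0   # arbitrary illustrative values
delta = beta              # error bound of the reduction

close_low, close_high = allowed_interval(alpha - beta, delta)      # A2, B1, B3
far_low, far_high = allowed_interval(alpha + beta, delta)          # B2
tf_low, tf_high = allowed_interval(2 * alpha + 3 * beta, delta)    # A1

print(close_high == alpha)               # "close": tree distance at most alpha
print(far_low == alpha)                  # "far": tree distance at least alpha
print(tf_low == 2 * alpha + 2 * beta)    # T far from F: at least 2*alpha + 2*beta
```

So an input value of α−β encodes a "close" (≤ α) requirement, α+β encodes a "far" (≥ α) requirement, and 2α+3β encodes the A1 bound.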
