On the Hardness of Inferring Phylogenies from Triplet-Dissimilarities

Ilan Gronau Shlomo Moran

Technion – Israel Institute of Technology

Haifa, Israel

Pairwise-Distance Based Reconstruction

[Figure: a dissimilarity matrix D over the taxa B, E, G, H, L, M is used to reconstruct an edge-weighted tree T; the tree metric DT is then calculated from T.]

Optimization Criteria

We wish the tree metric DT to approximate simultaneously all the pairwise distances in D.

[Figure: the matrices D and DT over the taxa B, E, G, H, L, M; DT should be “close” to D.]

Two “closeness” measures are studied here:

  • Maximal Difference (l∞)
  • Maximal Distortion

Maximal Difference (l∞) vs. Maximal Distortion

[Figure: the matrices D and DT over the taxa B, E, G, H, L, M, compared entry by entry.]

Goal: find an optimal tree T, one which minimizes the maximal difference/distortion between D and DT.
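
The two closeness measures can be made concrete with a small sketch. The maximal difference is the l∞ gap between D and DT. The slides do not spell out the distortion formula, so the version below assumes a common definition (largest expansion ratio times largest contraction ratio); the matrices and function names are illustrative only.

```python
from itertools import combinations


def max_difference(D, DT):
    """l-infinity distance: largest absolute gap over all taxon pairs."""
    n = len(D)
    return max(abs(D[i][j] - DT[i][j]) for i, j in combinations(range(n), 2))


def max_distortion(D, DT):
    """Assumed definition: largest expansion times largest contraction."""
    pairs = list(combinations(range(len(D)), 2))
    expansion = max(DT[i][j] / D[i][j] for i, j in pairs)
    contraction = max(D[i][j] / DT[i][j] for i, j in pairs)
    return expansion * contraction


# Hypothetical 3-taxon example; any metric on 3 taxa is a tree metric.
D = [[0, 4, 6], [4, 0, 8], [6, 8, 0]]
DT = [[0, 5, 6], [5, 0, 8], [6, 8, 0]]

print(max_difference(D, DT))
print(max_distortion(D, DT))
```

Under this assumed definition the distortion is invariant to rescaling DT, whereas the maximal difference is not.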

Previous Works on Approximating Dissimilarities by Tree Distances
  • Negative results (NP-hardness):
    • Closest tree metric (even ultrametric) to a dissimilarity matrix under l1 and l2 [Day ‘87]
    • Closest tree metric to a dissimilarity matrix under l∞ [ABFPT99]
      • Hard to approximate better than 1.125
      • Implicit: hard to approximate the closest MaxDist tree within any constant factor
  • Positive results:
    • Closest ultrametric to a dissimilarity matrix under l∞ [Krivanek ‘88]
    • 3-approximation of the closest additive metric to a given metric [ABFPT99]
      • (implicit 6-approximation for general dissimilarity matrices)

This Work: Triplet-Distances – Distances to Triplet Midpoints

τT(i ; jk) is the distance in T from taxon i to C(i,j,k), the midpoint of the triplet {i, j, k}.

[Figure: a tree with taxa i, j, k and the midpoint C(i,j,k) at which their connecting paths meet.]

  • τT(i ; jk) = τT(i ; kj)
  • τT(i ; ij) = 0
  • τT(i ; jj) = DT(i, j)
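
As a sanity check on the definition, here is a minimal sketch that computes τT on a small edge-weighted tree. The tree, its node names and its weights are hypothetical. The sketch relies on the identity τT(i ; jk) = ½[DT(i,j) + DT(i,k) - DT(j,k)], which holds in any tree metric, and compares the result with the direct distance from i to the midpoint.

```python
def tree_distances(adj, source):
    """Path lengths from `source` to every node of an edge-weighted tree."""
    dist = {source: 0.0}
    stack = [source]
    while stack:
        node = stack.pop()
        for nbr, weight in adj[node].items():
            if nbr not in dist:
                dist[nbr] = dist[node] + weight
                stack.append(nbr)
    return dist


def tau_T(adj, i, j, k):
    """Distance from i to the midpoint C(i,j,k), via pairwise tree distances."""
    di = tree_distances(adj, i)
    dj = tree_distances(adj, j)
    return 0.5 * (di[j] + di[k] - dj[k])


# Hypothetical tree: leaves a, b hang off u; leaves c, d hang off v; edge u-v.
adj = {
    "a": {"u": 1}, "b": {"u": 2}, "c": {"v": 3}, "d": {"v": 4},
    "u": {"a": 1, "b": 2, "v": 5},
    "v": {"c": 3, "d": 4, "u": 5},
}

# C(a,b,c) is u and C(a,c,d) is v, so these match the distances from a to u and v.
print(tau_T(adj, "a", "b", "c"), tree_distances(adj, "a")["u"])
print(tau_T(adj, "a", "c", "d"), tree_distances(adj, "a")["v"])
```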

Triplet-Distances Defined by 2-Distances
  • Each distance matrix D defines 3-trees:
  • τ(i ; jk) = ½[D(i,j) + D(i,k) - D(j,k)]

[Figure: any metric on 3 taxa i, j, k (pairwise distances 8, 9, 7 in the example) is realizable by a 3-tree with center C(i,j,k) (edge weights 5, 3, 4 in the example).]
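
The formula can be checked on the numbers from the 3-taxon figure. The sketch below assumes that the distances 8, 9, 7 are D(i,j), D(i,k), D(j,k) respectively; the transcript does not say which number belongs to which pair, so that assignment is a guess. Under it, the formula gives the edge weights 5, 3, 4 of the 3-tree.

```python
def triplet_distance(D, i, j, k):
    """tau(i ; jk) = 1/2 [D(i,j) + D(i,k) - D(j,k)]."""
    return 0.5 * (D[i][j] + D[i][k] - D[j][k])


# Assumed assignment of the figure's distances to taxon pairs.
D = {
    "i": {"i": 0, "j": 8, "k": 9},
    "j": {"i": 8, "j": 0, "k": 7},
    "k": {"i": 9, "j": 7, "k": 0},
}

edge_i = triplet_distance(D, "i", "j", "k")  # weight of the edge from i to the center
edge_j = triplet_distance(D, "j", "i", "k")
edge_k = triplet_distance(D, "k", "i", "j")
print(edge_i, edge_j, edge_k)
```

The pairwise distances are recovered as sums of two edge weights, e.g. D(i,j) = edge_i + edge_j, which is what "realizable by a 3-tree" means here.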

Triplet-Distance Based Reconstruction

τ(i ; jk) = ½[D(i,j) + D(i,k) - D(j,k)]

[Figure: a table of triplet distances, indexed by the taxa B, E, G, H, L, M and the taxon pairs BB, BE, BG, …, LL, LM, MM, is used to reconstruct an edge-weighted tree T.]

Why Use Triplet-Distances?

1. They enable more accurate estimations of 2-distances.

2. They are used (de facto) by known reconstruction algorithms.

Improved Estimations of Pairwise Distances: “Information Loss”

[Figure: the entry D(H,E) of the matrix D over the taxa B, E, G, H, L, M is calculated by maximum likelihood from H and E alone (value 13 in the example).]

In calculating D(H,E), all other taxa are ignored.

Improved Estimations (cont.):

[Figure: four 3-trees on H = (..AACG..), E = (..CAGA..) and a third taxon, one each for B = (..AAGT..), L = (..AATA..), G = (..CCGT..) and M = (..CGCG..); the unknown internal sequence is shown as (..****..).]

  • Estimate D(H,E) by calculating all the 3-trees on {H, E, X : X ≠ H, E}.
  • Or calculate just one 3-tree, for a “trusted” third taxon X:
    • V. Ranwez, O. Gascuel, Improvement of distance-based phylogenetic methods by a local maximum likelihood approach using triplets, Mol. Biol. Evol. 19(11), pp. 1952–1963 (2002).
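
One way to read the first bullet: each 3-tree on {H, E, X} yields its own estimate of D(H,E), namely τ(H ; EX) + τ(E ; HX), the sum of the two edges leading from the center to H and to E. The sketch below combines these per-triplet estimates by a plain average. The averaging rule, the table layout and all the numbers are assumptions made for illustration; the slides do not specify how the 3-trees are combined.

```python
def estimate_pair(tau, h, e, taxa):
    """Average, over third taxa x, of the H-E path length in the 3-tree on {h, e, x}.

    `tau[(i, j, k)]` holds tau(i ; jk), the distance from i to the center.
    """
    estimates = [
        tau[(h, e, x)] + tau[(e, h, x)]
        for x in taxa
        if x not in (h, e)
    ]
    return sum(estimates) / len(estimates)


# Hypothetical triplet distances for the third taxa B and G.
tau = {
    ("H", "E", "B"): 5.0, ("E", "H", "B"): 7.0,
    ("H", "E", "G"): 6.0, ("E", "H", "G"): 8.0,
}

print(estimate_pair(tau, "H", "E", ["B", "E", "G", "H"]))
```

Restricting `taxa` to a single trusted third taxon gives the variant in the second bullet.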

(Implicit) Use of Triplet-Distances in 2-Distance Reconstruction Algorithms

τ(i ; jk) = ½[D(i,j) + D(i,k) - D(j,k)]

[Figure: the matrix D over the taxa B, E, G, H, L, M is converted by the formula above into a table of triplet distances (BB, BE, BG, …, LL, LM, MM), from which the tree T is reconstructed.]

1st use: “Triplet Distances from a Single Source”

[Figure: a source taxon r and two taxa i, j.]

  • Fix a taxon r, and construct a tree T which minimizes:
    [Formula not preserved in the transcript.]
  • An optimal solution is computable in O(n²) time, and is used e.g. in:
    • (FKW95): Optimal approximation of distances by ultrametric trees.
    • (ABFPT99): The best known approximation of distances by general trees.
    • (BB99): Fast construction of Buneman trees.

2nd use: Saitou & Nei Neighbour Joining

The neighbor-selection criterion of NJ selects a taxon pair i, j which maximizes the sum:

[Formula not preserved in the transcript. Figure: the pair i, j surrounded by the remaining taxa r.]

Previous Works on Triplet-Dissimilarities/Distances
  • I. Gronau, S. Moran, Neighbor Joining Algorithms for Inferring Phylogenies via LCA-Distances, Journal of Computational Biology 14(1), pp. 1–15 (2007).
  • Works which use the total weights of 3-trees:
    • S. Joly, G. Le Calvé, Three-Way Distances, Journal of Classification 12, pp. 191–205 (1995).
    • L. Pachter, D. Speyer, Reconstructing Trees from Subtree Weights, Applied Mathematics Letters 17, pp. 615–621 (2004).
    • D. Levy, R. Yoshida, L. Pachter, Beyond pairwise distances: Neighbor-joining with phylogenetic diversity estimates, Mol. Biol. Evol. 23(3), pp. 491–498 (2006).

Summary of Results
  • Results for Maximal Difference (l∞):
    • The decision problem is NP-hard:
      • Is there a tree T s.t. ||τ,τT||∞ ≤ Δ?
    • Hardness of approximation of the optimization problem:
      • Finding a tree T s.t. ||τ,τT||∞ ≤ 1.4||τ,τOPT||∞
    • A 15-approximation algorithm
      • Using the 6-approximation algorithm for 2-dissimilarities from [ABFPT99]
  • Result for Maximal Distortion:
    • Hardness of approximation within any constant factor

NP-Hardness of the Decision Problem

We use a reduction from 3SAT (the problem of determining whether a 3CNF formula is satisfiable).

[Figure: an example 3CNF formula with its literals and a clause marked, together with a satisfying assignment.]

We show: if one can determine for (τ, Δ) whether there exists a tree T s.t. ||τ,τT||∞ ≤ Δ, then one can determine for every 3CNF formula φ whether it is satisfiable.
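
The decision problem asks whether some tree fits; checking one given tree against (τ, Δ) is the easy direction. A minimal sketch of that check follows, assuming the candidate tree is given through its pairwise tree metric DT and the input τ as a dictionary keyed by (i, j, k) for τ(i ; jk). Both representations and all the numbers are hypothetical.

```python
def tau_from_tree_metric(DT, i, j, k):
    """tau_T(i ; jk) derived from the pairwise tree distances."""
    return 0.5 * (DT[i][j] + DT[i][k] - DT[j][k])


def within_bound(tau, DT, delta):
    """True iff the l-infinity gap between tau and tau_T is at most delta."""
    worst = max(
        abs(value - tau_from_tree_metric(DT, i, j, k))
        for (i, j, k), value in tau.items()
    )
    return worst <= delta


# Hypothetical candidate tree: a 3-tree with edge weights 5, 3, 4.
DT = {
    "i": {"i": 0, "j": 8, "k": 9},
    "j": {"i": 8, "j": 0, "k": 7},
    "k": {"i": 9, "j": 7, "k": 0},
}
# Hypothetical input dissimilarities, slightly off from the tree's values.
tau = {("i", "j", "k"): 5.5, ("j", "i", "k"): 3.0, ("k", "i", "j"): 3.5}

print(within_bound(tau, DT, 0.5))
print(within_bound(tau, DT, 0.4))
```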

The Reduction

Given a 3CNF formula φ, we define triplet distances τ and an error bound Δ which force the output tree to imply a satisfying assignment to φ.

  • The set of taxa:
    • Taxa T, F.
    • A taxon for every literal (xi and ¬xi).
    • 3 taxa for every clause Cj (yj1, yj2, yj3).

Properties Enforced by the Input (τ, Δ)
  • One of the following can be enforced on each taxon triplet (u, v, w):
    • taxon u is close to Path(v,w), or
    • taxon u is far from Path(v,w)

[Figure: a taxon u and the path between the taxa v and w.]

Enforcing Truth Assignment
  • A truth assignment to φ is implied by the following:
    • T is far from F
    • For each i, xi is far from ¬xi, and both xi and ¬xi are close to Path(T, F)

Thus we set xi = T iff xi is close to T.

Enforcing Clauses-Satisfaction

A clause C = (l1 ∨ l2 ∨ l3) is satisfied iff at least one literal li is true, i.e. is close to T.

(l1 ∨ l2 ∨ l3) is satisfied iff the tree does not look like this:

[Figure: l1, l2 and l3 all located near F.]

We need to guarantee, by the close/far relations, that all clauses avoid the above configuration.

Clauses-Satisfaction (cont.)

(l1 ∨ l2 ∨ l3) is satisfied iff, out of the three paths Path(l1, l2), Path(l1, l3), Path(l2, l3), at least two paths are close to T.

But we don’t know which two paths.

[Figure: the taxa T and F with the literals l1, l2, l3.]
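
The "at least two paths" claim can be checked exhaustively in a simplified model. The sketch assumes that every literal sits either near T (true) or near F (false), and that Path(la, lb) is close to T exactly when at least one of its two endpoints is near T. This is an abstraction of the geometry made for illustration, not the construction itself.

```python
from itertools import combinations, product


def clause_satisfied(literals):
    """A clause holds iff at least one of its three literals is true."""
    return any(literals)


def paths_close_to_T(literals):
    """Number of pair-paths with at least one endpoint near T (assumed model)."""
    return sum(
        1 for a, b in combinations(range(3), 2)
        if literals[a] or literals[b]
    )


# Enumerate all 8 truth assignments to (l1, l2, l3).
results = {
    literals: (clause_satisfied(literals), paths_close_to_T(literals))
    for literals in product([False, True], repeat=3)
}
for literals, (satisfied, close_paths) in results.items():
    print(literals, satisfied, close_paths)
```

In this model a single true literal already brings two of the three paths close to T, and no true literal brings none, which is why the threshold is two.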

Clauses-Satisfaction (cont.)

We attach a taxon to each such path:

y1 is close to Path(l2, l3)

y2 is close to Path(l1, l3)

y3 is close to Path(l1, l2)

(l1 ∨ l2 ∨ l3) is satisfied iff at least two yi’s can be located close to T …

[Figure: the taxa T and F with the literals l1, l2, l3 and the attached taxa y1, y2, y3.]

Clauses-Satisfaction (end)

… and at least two of the yi’s can be located close to T ⇔ Path(y2, y3), Path(y1, y3), Path(y1, y2) are all close to T.

So (l1 ∨ l2 ∨ l3) is satisfied iff all the above paths are close to T.

[Figure: the taxa T and F with the literals l1, l2, l3 and the attached taxa y1, y2, y3.]

Construction Example

[Figure: the constructed tree, with the taxa T and F attached at the vertices vT and vF, the clause taxa y11, y12, y13, y21, y22, y23, and edges of weight α.]

φ is satisfiable ⇔ there is a tree T which satisfies all the bounds:

A1: τT(T, F) ≥ 2α+2β

A2: i=1..n: τT(T ; …) ≤ α ; τT(F ; …) ≤ α

B1: j=1..m: τT(yj1 ; lj2 lj3) ≤ α ; τT(yj2 ; lj1 lj3) ≤ α ; τT(yj3 ; lj1 lj2) ≤ α

B2: j=1..m: τT(yj1 ; T F) ≥ α ; τT(yj2 ; T F) ≥ α ; τT(yj3 ; T F) ≥ α

B3: j=1..m: τT(T ; yj2 yj3) ≤ α ; τT(T ; yj1 yj3) ≤ α ; τT(T ; yj1 yj2) ≤ α

Hardness of Approximation Results

By “stretching” the close/far restrictions, the following problems are also shown to be NP-hard:

  • Approximating Maximal Difference:
    • Finding a tree T s.t. ||τ,τT||∞ ≤ 1.4||τ,τOPT||∞
  • Approximating Maximal Distortion:
    • Finding a tree T s.t. MaxDist(τ,τT) ≤ C·MaxDist(τ,τOPT), for any constant C

Details in:

I. Gronau and S. Moran, On the Hardness of Inferring Phylogenies from Triplet-Dissimilarities, Theoretical Computer Science 389(1-2), December 2007, pp. 44-55.

Open Problems/Further Research
  • Extending the hardness results to 3-diss tables induced by 2-diss matrices
    • (τ(i ; jk) = ½[D(i,j) + D(i,k) - D(j,k)])
  • Extending the hardness results to “naturally looking” trees
    • (binary trees with constant-bounded edge weights)
  • Checking the performance of NJ when the neighbor-selection formula is computed from “real” 3-distances.
  • Devising algorithms which use 3-distances as input.
  • Does optimization of 3-diss lead to good topological accuracy (under accepted models of sequence evolution)?
    • (It is known that optimization of 2-diss doesn’t lead to good topological accuracy.)

Distance-Based Phylogenetic Reconstruction

  • Compute distances between all taxon pairs
  • Find an edge-weighted tree which best describes the distances

[Figure: an example edge-weighted tree.]

The Reduction – τ(φ)

[Figure: the constructed tree, with the taxa T and F attached at the vertices vT and vF, the clause taxa y11, y12, y13, y21, y22, y23, and edges of weight α.]

Bounds on the output tree T:

A1: τT(T, F) ≥ 2α+2β

A2: i=1..n: τT(T ; …) ≤ α ; τT(F ; …) ≤ α

B1: j=1..m: τT(yj1 ; lj2 lj3) ≤ α ; τT(yj2 ; lj1 lj3) ≤ α ; τT(yj3 ; lj1 lj2) ≤ α

B2: j=1..m: τT(yj1 ; T F) ≥ α ; τT(yj2 ; T F) ≥ α ; τT(yj3 ; T F) ≥ α

B3: j=1..m: τT(T ; yj2 yj3) ≤ α ; τT(T ; yj1 yj3) ≤ α ; τT(T ; yj1 yj2) ≤ α

  • In our constructed tree:
    • All 2-distances are in [2α, 2α+2β].
    • All 3-distances are in [α, α+2β].
    • Δ = β.

Input dissimilarities τ(φ):

A1: τ(T, F) = 2α+3β

A2: i=1..n: τ(T ; …) = α-β ; τ(F ; …) = α-β

B1: j=1..m: τ(yj1 ; lj2 lj3) = α-β ; τ(yj2 ; lj1 lj3) = α-β ; τ(yj3 ; lj1 lj2) = α-β

B2: j=1..m: τ(yj1 ; T F) = α+β ; τ(yj2 ; T F) = α+β ; τ(yj3 ; T F) = α+β

B3: j=1..m: τ(T ; yj2 yj3) = α-β ; τ(T ; yj1 yj3) = α-β ; τ(T ; yj1 yj2) = α-β

Other 2-distances: τ(s, t) = 2α+2β

Other 3-distances: τ(s ; t u) = α+2β
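
The link between the input values and the bounds is simple interval arithmetic: with ||τ,τT||∞ ≤ Δ, each tree value must lie within Δ of the corresponding input value. A minimal sketch with Δ = β follows; the numeric values of α and β are hypothetical and chosen only to make the intervals visible.

```python
def allowed_interval(tau_value, delta):
    """Values of tau_T compatible with an input value under error bound delta."""
    return (tau_value - delta, tau_value + delta)


alpha, beta = 10, 1   # hypothetical values
delta = beta          # the error bound used by the reduction

close = allowed_interval(alpha - beta, delta)          # "close" entries (A2, B1, B3)
far = allowed_interval(alpha + beta, delta)            # "far" entries (B2)
t_f = allowed_interval(2 * alpha + 3 * beta, delta)    # the pair T, F (A1)

print(close, far, t_f)
```

The upper end of `close` is α, and the lower ends of `far` and `t_f` are α and 2α+2β, matching the bounds A1, B1, B2 and B3 listed for the output tree.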
