On the Hardness of Inferring Phylogenies from Triplet-Dissimilarities

Ilan Gronau Shlomo Moran

Technion – Israel Institute of Technology

Haifa, Israel

Pairwise-Distance Based Reconstruction

[Figure: an edge-weighted tree T over the taxa B, E, G, H, L, M. The tree metric DT is calculated from T, and T is reconstructed from the dissimilarity matrix D.]

Optimization Criteria

We want the tree metric DT to approximate all the pairwise distances in D simultaneously: DT should be “close” to D.

Two “closeness” measures are studied here:

  • Maximal Difference (l∞)
  • Maximal Distortion
Maximal Difference (l∞) vs. Maximal Distortion

[Figure: the input matrix D and the tree metric DT, both over the taxa B, E, G, H, L, M.]

Goal: find an optimal tree T, one which minimizes the maximal difference (or the maximal distortion) between D and DT.
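The two closeness measures can be illustrated with a minimal Python sketch. The maximal difference is the largest entry-wise gap between D and DT. The slides do not define maximal distortion, so the definition used here (largest expansion ratio times largest contraction ratio) is an assumption, and the function names and the 3-taxon matrices are hypothetical.

```python
from itertools import combinations


def max_difference(D, DT):
    """l-infinity difference: the largest |D(i,j) - DT(i,j)| over all pairs."""
    n = len(D)
    return max(abs(D[i][j] - DT[i][j]) for i, j in combinations(range(n), 2))


def max_distortion(D, DT):
    """ASSUMED definition: (max DT/D) * (max D/DT) over all pairs.

    Requires strictly positive off-diagonal entries in both matrices.
    """
    n = len(D)
    pairs = list(combinations(range(n), 2))
    expansion = max(DT[i][j] / D[i][j] for i, j in pairs)
    contraction = max(D[i][j] / DT[i][j] for i, j in pairs)
    return expansion * contraction


# Hypothetical input dissimilarities and tree metric on three taxa.
D = [[0, 4, 6], [4, 0, 8], [6, 8, 0]]
DT = [[0, 5, 6], [5, 0, 8], [6, 8, 0]]

print(max_difference(D, DT))
print(max_distortion(D, DT))
```

Under the assumed definition, distortion is unchanged when DT is rescaled by a constant, whereas the maximal difference is not.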

Previous Works on Approximating Dissimilarities by Tree Distances
  • Negative results (NP-hardness):
    • Closest tree metric (even ultrametric) to a dissimilarity matrix under l1 and l2 [Day ‘87]
    • Closest tree metric to a dissimilarity matrix under l∞ [ABFPT99]
      • Hard to approximate better than 1.125
      • Implicit: hard to approximate the closest MaxDist tree within any constant factor
  • Positive results:
    • Closest ultrametric to a dissimilarity matrix under l∞ [Krivanek ‘88]
    • 3-approximation of the closest additive metric to a given metric [ABFPT99]
      • (implicit 6-approximation for general dissimilarity matrices)
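This Work: Triplet-Distances – Distances to Triplets Midpoints

τT (i ; jk) is the distance in T from taxon i to C(i,j,k), the midpoint of the triplet {i, j, k}.

  • τT (i ; jk) = τT (i ; kj)
  • τT (i ; ij) = 0
  • τT (i ; jj) = DT (i, j)

[Figure: a tree with the taxa i, j, k, whose connecting paths meet at C(i,j,k); τT (i ; jk) labels the path from i to C(i,j,k).]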

Triplet-Distances Defined by 2-Distances
  • Each distance matrix D defines 3-trees:
  • τ(i ; jk) = ½[D(i,j) + D(i,k) − D(j,k)].

Any metric on 3 taxa is realizable by a 3-tree.

[Figure: a metric on the taxa i, j, k with pairwise distances 8, 9, 7 is realized by a 3-tree with center C(i,j,k) and edge weights 5, 3, 4.]
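The formula τ(i ; jk) = ½[D(i,j) + D(i,k) − D(j,k)] can be checked on the numbers from the figure with a short Python sketch. The function names are hypothetical, and the assignment of the distances 8, 9, 7 to specific pairs is an assumption, since the transcript does not preserve which pair carries which value.

```python
def triplet_distance(D, i, j, k):
    """tau(i ; jk) = 1/2 [D(i,j) + D(i,k) - D(j,k)]."""
    return 0.5 * (D[i][j] + D[i][k] - D[j][k])


def three_tree(D, i, j, k):
    """Edge weights of the 3-tree on {i, j, k}: distances from each taxon to the center."""
    return (
        triplet_distance(D, i, j, k),
        triplet_distance(D, j, i, k),
        triplet_distance(D, k, i, j),
    )


# Assumed assignment: D(i,j) = 8, D(i,k) = 9, D(j,k) = 7, with i, j, k = 0, 1, 2.
D = [[0, 8, 9], [8, 0, 7], [9, 7, 0]]
edges = three_tree(D, 0, 1, 2)
print(edges)  # (5.0, 3.0, 4.0)
```

The same function reproduces the boundary cases of triplet distances: τ(i ; ij) evaluates to 0 and τ(i ; jj) evaluates to D(i,j).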

Triplet-Distance Based Reconstruction

τ(i ; jk) = ½[D(i,j) + D(i,k) − D(j,k)].

[Figure: a table of triplet distances, with one row per taxon (B, E, G, H, L, M) and one column per taxon pair (BB, BE, BG, …, LL, LM, MM), from which the edge-weighted tree T is reconstructed.]

Why use Triplet-Distances?

1. They enable more accurate estimations of 2-distances.

2. They are used (de facto) by known reconstruction algorithms

Improved Estimations of Pairwise Distances: “Information Loss”

[Figure: the matrix D over the taxa B, E, G, H, L, M. The entry D(H,E) is calculated (by Maximum Likelihood) from H and E alone; the pair is shown with the distance 13.]

In calculating D(H,E), all other taxa are ignored.

Improved Estimations (cont):

[Figure: 3-trees on H, E and a third taxon, built from the sequences B=(..AAGT..), L=(..AATA..), G=(..CCGT..), M=(..CGCG..), H=(..AACG..), E=(..CAGA..); the center of each 3-tree is an unknown sequence (..****..).]

  • Estimate D(H,E) by calculating all the 3-trees on {H, E, X : X ≠ H, E}.
  • Or: calculate just one 3-tree, for a “trusted” 3rd taxon X:
  • V. Ranwez, O. Gascuel, Improvement of distance-based phylogenetic methods by a local maximum likelihood approach using triplets, Mol. Biol. Evol. 19(11) 1952–1963 (2002).
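One way to read the first bullet is sketched below in Python. In a 3-tree on {H, E, X}, the H-to-E distance is the sum of the two edges from H and from E to the center, so each third taxon X yields one estimate of D(H,E). Combining the estimates by a plain average is an assumption made for illustration, not the method of the cited paper; the function name and all triplet values are hypothetical.

```python
from statistics import mean


def estimate_pair_distance(tau, a, b, taxa):
    """Average, over every third taxon x, of tau(a ; b x) + tau(b ; a x).

    tau maps (taxon, frozenset of the other two taxa) to a triplet distance.
    Averaging is an ASSUMED way of combining the per-triplet estimates.
    """
    estimates = [
        tau[(a, frozenset((b, x)))] + tau[(b, frozenset((a, x)))]
        for x in taxa
        if x not in (a, b)
    ]
    return mean(estimates)


# Hypothetical triplet distances for the 3-trees on {H, E, B} and {H, E, L}.
tau = {
    ("H", frozenset(("E", "B"))): 5.0,
    ("E", frozenset(("H", "B"))): 7.0,
    ("H", frozenset(("E", "L"))): 6.0,
    ("E", frozenset(("H", "L"))): 8.0,
}
estimate = estimate_pair_distance(tau, "H", "E", ["H", "E", "B", "L"])
print(estimate)  # 13.0
```

Using a single “trusted” third taxon, as in the second bullet, corresponds to passing a taxa list that contains only H, E and that taxon.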
(Implicit) use of Triplet-Distances in 2-Distance Reconstruction Algorithms

τ(i ; jk) = ½[D(i,j) + D(i,k) − D(j,k)].

[Figure: the matrix D over the taxa B, E, G, H, L, M is converted into the table of triplet distances (BB, BE, BG, …, LL, LM, MM), from which the edge-weighted tree T is reconstructed.]

1st use: “Triplet Distances from a Single Source”
  • Fix a taxon r, and construct a tree T which minimizes:
    [Formula shown as an image on the slide; the figure shows the source r and two taxa i, j.]
  • An optimal solution can be computed in O(n²) time, and is used, e.g., in:
    • (FKW95): Optimal approximation of distances by ultrametric trees.
    • (ABFPT99): The best known approximation of distances by general trees.
    • (BB99): Fast construction of Buneman trees.
2nd use: Saitou & Nei Neighbour Joining

The neighbor-selection criterion of NJ selects a taxon pair i, j which maximizes the sum:

[Formula shown as an image on the slide; the figure shows the pair i, j together with all the other taxa r.]

Previous Works on Triplet-Dissimilarities/Distances
  • I. Gronau, S. Moran, Neighbor Joining Algorithms for Inferring Phylogenies via LCA-Distances, Journal of Computational Biology 14(1) pp. 1-15 (2007).
  • Works which use the total weights of 3-trees:
    • S. Joly, G. Le Calvé, Three-Way Distances, Journal of Classification 12 pp. 191-205 (1995).
    • L. Pachter, D. Speyer, Reconstructing Trees from Subtree Weights, Applied Mathematics Letters 17 pp. 615-621 (2004).
    • D. Levy, R. Yoshida, L. Pachter, Beyond pairwise distances: Neighbor-joining with phylogenetic diversity estimates, Mol. Biol. Evol. 23(3) 491–498 (2006).
Summary of Results
  • Results for Maximal Difference (l∞):
    • The decision problem is NP-hard:
      • Is there a tree T s.t. ||τ,τT ||∞ ≤ Δ ?
    • Hardness of approximation of the optimization problem:
      • Finding a tree T s.t. ||τ,τT ||∞ ≤ 1.4||τ,τOPT||∞ is NP-hard
    • A 15-approximation algorithm:
      • Using the 6-approximation algorithm for 2-dissimilarities from [ABFPT99]
  • Result for Maximal Distortion:
    • Hardness of approximation within any constant factor
NP Hardness of the Decision Problem

We use a reduction from 3SAT (the problem of determining whether a 3CNF formula is satisfiable).

[Figure: an example 3CNF formula with its literals and clauses marked, together with a satisfying assignment.]

We show: if one can determine for (τ,Δ) whether there exists a tree T s.t. ||τ,τT ||∞ ≤ Δ, then one can determine for every 3CNF formula φ whether it is satisfiable.

The Reduction

Given a 3CNF formula φ, we define triplet distances τ and an error bound Δ which force the output tree to imply a satisfying assignment to φ.

  • The set of taxa:
    • Taxa T, F.
    • A taxon for every literal (xi, ¬xi).
    • 3 taxa for every clause Cj (y j1, y j2, y j3).
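The taxon set of the reduction can be enumerated with a short Python sketch. The clause encoding (signed integers, as in DIMACS-style inputs) and the taxon names such as `~x1` and `y1_1` are hypothetical conventions chosen for illustration.

```python
def build_taxa(num_vars, clauses):
    """List the taxa of the reduction for a 3CNF formula.

    clauses: list of 3-tuples of non-zero ints; v stands for x_v, -v for its negation.
    """
    taxa = ["T", "F"]
    for i in range(1, num_vars + 1):
        # One taxon per literal: x_i and its negation.
        taxa += [f"x{i}", f"~x{i}"]
    for j in range(1, len(clauses) + 1):
        # Three taxa per clause C_j: y_j1, y_j2, y_j3.
        taxa += [f"y{j}_{p}" for p in (1, 2, 3)]
    return taxa


# Hypothetical formula: (x1 or not x2 or x3) and (not x1 or x2 or x3).
clauses = [(1, -2, 3), (-1, 2, 3)]
taxa = build_taxa(3, clauses)
print(len(taxa), taxa)
```

For n variables and m clauses this gives 2 + 2n + 3m taxa, so the reduction is polynomial in the size of φ.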
Properties Enforced by the Input (τ,Δ)
  • One of the following can be enforced on each taxa triplet (u,v,w):
    • taxon u is close to Path(v,w), or
    • taxon u is far from Path(v,w)

[Figure: taxon u and the path between the taxa v and w.]

Enforcing Truth Assignment
  • A truth assignment to φ is implied by the following:
    • T is far from F
    • For each i, xi is far from ¬xi, and both xi and ¬xi are close to Path(T ,F)

Thus we set xi = T iff xi is close to T.

Enforcing Clauses-Satisfaction

A clause C = (l1 ∨ l2 ∨ l3) is satisfied iff at least one literal li is true, i.e. is close to T.

[Figure: the literals l1, l2, l3 all located close to F.]

(l1 ∨ l2 ∨ l3) is satisfied iff it is not like this.

We need to guarantee, by the close/far relations, that all clauses avoid the above configuration.

Clauses-Satisfaction (cont)

(l1 ∨ l2 ∨ l3) is satisfied iff, out of the three paths Path(l1, l2), Path(l1, l3), Path(l2, l3), at least two paths are close to T.

But we don’t know which two paths.

[Figure: T and F with the literals l1, l2, l3.]

Clauses-Satisfaction (cont)

We attach a taxon to each such path:

y1 is close to Path(l2, l3)

y2 is close to Path(l1, l3)

y3 is close to Path(l1, l2)

(l1 ∨ l2 ∨ l3) is satisfied iff at least two yi’s can be located close to T …

[Figure: T and F with the literals l1, l2, l3 and the attached taxa y1, y2, y3.]

Clauses-Satisfaction (end)

… and at least two of the yi’s can be located close to T iff Path(y2, y3), Path(y1, y3), Path(y1, y2) are all close to T.

So, (l1 ∨ l2 ∨ l3) is satisfied iff all the above paths are close to T.
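The chain of equivalences on the clause slides can be checked by brute force over all eight truth assignments. The Python sketch below uses a deliberately simplified model, which is an assumption and not the tree argument itself: each literal sits either near T (true) or near F (false), a path is close to T iff at least one of its endpoints is near T, and each yi is placed near T whenever its path allows it.

```python
from itertools import combinations, product


def path_close_to_T(a_near_T, b_near_T):
    """Simplified model: a path is close to T iff one of its endpoints is near T."""
    return a_near_T or b_near_T


def clause_checks(l1, l2, l3):
    """Return the three conditions that the slides claim are equivalent."""
    satisfied = l1 or l2 or l3
    # y1 hangs off Path(l2,l3), y2 off Path(l1,l3), y3 off Path(l1,l2);
    # y_i can be located near T iff its path is close to T.
    y = [
        path_close_to_T(l2, l3),
        path_close_to_T(l1, l3),
        path_close_to_T(l1, l2),
    ]
    at_least_two_y_near_T = sum(y) >= 2
    all_y_paths_close = all(path_close_to_T(a, b) for a, b in combinations(y, 2))
    return satisfied, at_least_two_y_near_T, all_y_paths_close


results = [clause_checks(*bits) for bits in product([False, True], repeat=3)]
equivalent = all(s == t == p for s, t, p in results)
print(equivalent)  # True
```

The only assignment that fails all three conditions is the one with every literal near F, which matches the forbidden configuration described on the slides.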

Construction Example

[Figure: an example tree with the taxa T and F, the internal vertices vT and vF, the clause taxa y11, y12, y13, y21, y22, y23, and edges of weight α.]

φ is satisfiable ⟺ there is a tree T which satisfies all the bounds:

A1: τT (T , F ) ≥ 2α+2β

A2 (i=1..n): τT (T ; xi ¬xi ) ≤ α ; τT (F ; xi ¬xi ) ≤ α

B1 (j=1..m): τT (y j1; l j2 l j3 ) ≤ α ; τT (y j2; l j1 l j3 ) ≤ α ; τT (y j3; l j1 l j2 ) ≤ α

B2 (j=1..m): τT (y j1; T F ) ≥ α ; τT (y j2; T F ) ≥ α ; τT (y j3; T F ) ≥ α

B3 (j=1..m): τT (T ; y j2 y j3 ) ≤ α ; τT (T ; y j1 y j3 ) ≤ α ; τT (T ; y j1 y j2 ) ≤ α

Hardness of Approximation Results

By “stretching” the close/far restrictions, the following problems are also shown to be NP-hard:

  • Approximating Maximal Difference:
    • Finding a tree T s.t. ||τ,τT ||∞ ≤ 1.4||τ,τOPT||∞
  • Approximating Maximal Distortion:
    • Finding a tree T s.t. MaxDist(τ,τT ) ≤ C·MaxDist(τ,τOPT) for any constant C

Details in:

I. Gronau and S. Moran, On the Hardness of Inferring Phylogenies from Triplet-Dissimilarities, Theoretical Computer Science 389(1-2), December 2007, pp. 44-55.

Open Problems/Further Research
  • Extending the hardness results to 3-diss tables induced by 2-diss matrices
    • (τ(i ; jk) = ½[D(i,j) + D(i,k) − D(j,k)])
  • Extending the hardness results to “naturally looking” trees
    • (binary trees with constant-bounded edge weights)
  • Checking the performance of NJ when the neighbor selection formula is computed from “real” 3-distances.
  • Devising algorithms which use 3-distances as input.
  • Does optimization of 3-diss lead to good topological accuracy (under accepted models of sequence evolution)?
    • (it is known that optimization of 2-diss doesn’t lead to good topological accuracy)
Distance-Based Phylogenetic Reconstruction
  • Compute distances between all taxon pairs
  • Find an edge-weighted tree that best describes the distances

The Reduction – τ(φ)

[Figure: the constructed tree, with the taxa T and F, the internal vertices vT and vF, the clause taxa y11, y12, y13, y21, y22, y23, and edges of weight α.]

Bounds on the output tree T:

A1: τT (T , F ) ≥ 2α+2β

A2 (i=1..n): τT (T ; xi ¬xi ) ≤ α ; τT (F ; xi ¬xi ) ≤ α

B1 (j=1..m): τT (y j1; l j2 l j3 ) ≤ α ; τT (y j2; l j1 l j3 ) ≤ α ; τT (y j3; l j1 l j2 ) ≤ α

B2 (j=1..m): τT (y j1; T F ) ≥ α ; τT (y j2; T F ) ≥ α ; τT (y j3; T F ) ≥ α

B3 (j=1..m): τT (T ; y j2 y j3 ) ≤ α ; τT (T ; y j1 y j3 ) ≤ α ; τT (T ; y j1 y j2 ) ≤ α

  • In our constructed tree:
    • All 2-distances are in [2α , 2α+2β].
    • All 3-distances are in [α , α+2β].
    • Δ = β.

The input τ(φ):

A1: τ(T , F ) = 2α+3β

A2 (i=1..n): τ(T ; xi ¬xi ) = α−β ; τ(F ; xi ¬xi ) = α−β

B1 (j=1..m): τ(y j1; l j2 l j3 ) = α−β ; τ(y j2; l j1 l j3 ) = α−β ; τ(y j3; l j1 l j2 ) = α−β

B2 (j=1..m): τ(y j1; T F ) = α+β ; τ(y j2; T F ) = α+β ; τ(y j3; T F ) = α+β

B3 (j=1..m): τ(T ; y j2 y j3 ) = α−β ; τ(T ; y j1 y j3 ) = α−β ; τ(T ; y j1 y j2 ) = α−β

Other 2-distances: τ(s , t) = 2α+2β

Other 3-distances: τ(s ; t u) = α+2β
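The input table τ(φ) listed above can be assembled mechanically from a formula, as in the following Python sketch. Several points are assumptions: the pair in rule A2 is taken to be (xi, ¬xi), which the transcript does not preserve; clauses are encoded as signed integers over three distinct variables; and the function names, taxon names, and the values α = 10, β = 1 are hypothetical.

```python
def tau_key(i, j, k):
    """Key for tau(i ; jk); symmetric in j and k. A 2-distance D(s,t) is tau(s ; tt)."""
    return (i, frozenset((j, k)))


def build_tau(num_vars, clauses, alpha, beta):
    """Explicitly constrained entries of tau(phi), following rules A1, A2, B1-B3."""
    def lit(v):
        return f"x{v}" if v > 0 else f"~x{-v}"

    tau = {}
    # A1: T is far from F (stored in both directions).
    tau[tau_key("T", "F", "F")] = tau[tau_key("F", "T", "T")] = 2 * alpha + 3 * beta
    # A2 (ASSUMED pair): T and F are each close to Path(x_i, ~x_i).
    for i in range(1, num_vars + 1):
        for s in ("T", "F"):
            tau[tau_key(s, f"x{i}", f"~x{i}")] = alpha - beta
    for j, clause in enumerate(clauses, start=1):
        l = [lit(v) for v in clause]
        y = [f"y{j}_{p}" for p in (1, 2, 3)]
        for p in range(3):
            q, r = [t for t in range(3) if t != p]
            tau[tau_key(y[p], l[q], l[r])] = alpha - beta  # B1: y_p close to its path
            tau[tau_key(y[p], "T", "F")] = alpha + beta    # B2: y_p far from Path(T,F)
            tau[tau_key("T", y[q], y[r])] = alpha - beta   # B3: T close to Path(y_q,y_r)
    return tau


def lookup(tau, i, j, k, alpha, beta):
    """tau(i ; jk) for distinct taxa i != j, i != k, with the 'other' default values."""
    key = tau_key(i, j, k)
    if key in tau:
        return tau[key]
    return 2 * alpha + 2 * beta if j == k else alpha + 2 * beta


# Hypothetical single-clause formula (x1 or not x2 or x3), with alpha = 10, beta = 1.
tau = build_tau(3, [(1, -2, 3)], alpha=10, beta=1)
print(len(tau), lookup(tau, "y1_1", "x3", "~x2", 10, 1))
```

Keying each entry by a frozenset makes the symmetry τ(i ; jk) = τ(i ; kj) hold by construction, so each constraint is stored once.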