fast algorithms for minimum evolution n.
Download
Skip this Video
Download Presentation
Fast Algorithms for Minimum Evolution

Loading in 2 Seconds...

play fullscreen
1 / 39

Fast Algorithms for Minimum Evolution - PowerPoint PPT Presentation


  • 166 Views
  • Uploaded on

Fast Algorithms for Minimum Evolution. Richard Desper, NCBI Olivier Gascuel, LIRMM. Overview. Statement of phylogeny reconstruction problem and various approaches to solving it. Tree length formula as a function of average distances. Greedy algorithms for tree building and tree swapping.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Fast Algorithms for Minimum Evolution' - maribeth


Download Now An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
fast algorithms for minimum evolution

Fast Algorithms for Minimum Evolution

Richard Desper, NCBI

Olivier Gascuel, LIRMM

overview
Overview
  • Statement of phylogeny reconstruction problem and various approaches to solving it.
  • Tree length formula as a function of average distances.
  • Greedy algorithms for tree building and tree swapping.
  • Simulation results.
  • A few extras regarding consistency and branch lengths.
phylogeny reconstruction
Phylogeny Reconstruction
  • General problem: reconstruct the evolutionary history for a set L of extant species.
  • Input: multiple sequence alignment for L or matrix of estimates of pairwise evolutionary distances.
  • Output: weighted phylogeny representing history of L and common ancestors.
methods
Methods
  • Likelihood methods: model-based likelihood maximization.
  • Parsimony methods: minimize total number of mutations in tree.
  • Distance methods: fit tree structure to inferred evolutionary distances. Leading methods include Felsenstein-Fitch-Margoliash weighted least-squares and Neighbor-Joining and its variants.
felsenstein fitch margoliash least squares method
Felsenstein-Fitch-Margoliash Least-squares Method
  • FITCH searches the space of topologies by iteratively adding leaves and by tree swapping.
  • Edge weights and topology are chosen to minimize the sum of squares (D is the input metric, DT is the induced tree metric):

If sij= 1 for alliandj, this is called the ordinary least-squaresmethod.

minimum evolution
Minimum Evolution
  • Developed by Rzhetsky and Nei (1992) as a modification of the OLS method
  • For each topology T,
    • Define function lassigning OLS lengths to edges of T
    • Define size of tree
    • Choose Tminimizing l(T )
recursive definition of d a b

For

Recursive Definition of DA|B

If A = {a}, B = {b}, DA|B = Dab,

All average distances for all pairs of non-intersecting subtrees of a

given topology can be calculated inO(n2)time.

external ols edge length function

A

e

i

B

External OLS Edge Length Function

If e is the edge connecting the leaf i to the

subtrees A and B,

internal ols edge length function

A

C

e

D

B

Internal OLS Edge Length Function

The length of the edge eis (Vach, 1988)

where

tree length formula

B

A

e

D

C

Tree length formula

Lemma: with T as to the right,

let denote the root of subtree X,

and the edge to X for

Then,

tree length formula1
With T as in prior slide, Tree Length Formula

Using lemma and branch length formula for l(e),

general approach
General approach
  • To search the space of topologies, we’ll keep in memory two data structures:
    • Sizes of each subtree of given topology
    • Matrix of average distances DX|Y for X,Y disjoint subtrees in given topology
  • As we move from one topology to another, we’ll update the matrix, but only as much as needed, in an efficient manner.
tree swapping by nni

B

A

C

A

e

e

D

B

D

C

Tree Swapping by NNI

NNI swapping is a basic step in topology building and searching

tree length formula2
With T as in prior slide, Tree Length Formula

Using lemma and branch length formula for l(e),

tree length after nni
Tree Length after NNI

Given TgT’ the tree swap in prior slide, lthe edge length function:

(1)

wherel and l’ are constants depending on

the topologies.

ols fastnni
OLS: FASTNNI
  • Pre-compute average distances between non-intersecting sub-trees. (O(n2)computations)
  • Loop over all internal edges, select the best swap using Equation (1). (O(n))
  • If no swap improves length of the tree, stop and return the tree, else perform the best swap and update the matrix of average distances and repeat Step 2. (O(n)per swap; there is only one new split.)

Thus, if we require p swaps, the total complexity of

FASTNNI is O(n2 + pn).

balanced minimum evolution
Balanced Minimum Evolution
  • Gascuel (2000) observed that the OLS/ME method was weaker than NJ in approximating the correct topology.
  • Pauplin (2000) to simplify tree length computation proposed to use a “balanced” version of Minimum Evolution, weighting each sub-tree equally when calculating averages: if A and B are sub-trees of T, with
slide18
BNNI
  • Calculate balanced averages of all pairs of sub-trees. (O(n2))
  • Calculate improvement for each swap using

(2)

  • If no tree swap improves length of the tree, stop and return tree, else update matrix of average distances and repeat Step 2. (O(ndiam(T)) per swap)

The average complexity, when performing p swaps, is

O(n2 + pn diam(T)).

updating subtree averages

y

If we perform the B-C tree swap, then we must recalculate

Typical values for diam(T):

Yule-Harding distribution:

Uniform distribution:

Updating Subtree Averages

T

x

X

A

C

e

Y

B

D

Q: How many recalculations?

(Hint: you can count (x,y) pairs).

A: O(n diam(T))

building trees from scratch

Building trees from scratch

We have NNI algorithms for OLS and balanced branch lengths. But what if we have no initial topology for NNIs?

ols greedy minimum evolution
OLS: Greedy Minimum Evolution
  • Start with three-taxon tree T3
  • For k=4 to n,
      • Calculate Dk|A for each subtree A in Tk-1
      • Express cost of inserting k along edgeeas f(e).

(Use Equation (3) on the next slide.)

      • Choose e minimizing f. Insert k along eto form Tk.
      • Update matrix of average distances between every pair of 2-distant subtrees.

GME runs in O(n2)running time

greedy minimum evolution

T’

C

C

T

k

k

A

A

B

B

Greedy Minimum Evolution

We use a variant of Equation (1), where D = {k}.Let L = l(T).

Then

balanced minimum evolution1
Balanced Minimum Evolution

Same as GME,except:

  • (modifications)
    • Calculate balanced average distances instead of ordinary average distances
    • Use l = ½ to find weights for insertion points
    • Must keep average distances for all pairs of sub-trees.

BME runs in O(n2 diam(T)) running time.

simulations
Simulations
  • Created 24- and 96-taxon trees, 2000 per each size, Yule-Harding process (g molecular clock).
  • Edge lengths multiplied by (1.0 + mX), where X is exponentially distributed.
  • Generated trees with three rates of evolution
  • SeqGen used to generate sequences for each tree and rate (12,000 in all)
  • DNADIST used to calculate distance matrices
results topological distances
Results: topological distances

BNNI improved

all input trees

results topological distances1
Results: topological distances

This improvement

is large with fast rates

and high numbers of taxa

results topological distances2
Results: topological distances

NNI trees are close to the

best possible for BME

results topological distances3
Results: topological distances

The quality of the

NNI tree is (mostly)

independent

of starting point

computational times
Computational Times

in (MM:SS)

Computations done on Sun Enterprise E4500/E5500 running Solaris 8

on 10 400-Mhz processors with 7 Gb memory.

average number of nnis
Average number of NNIs

We see that the average number of NNIs is

considerably lower than the number of taxa.

bme wls
BME = WLS

Why does the balanced approach work so well?

  • Pauplin’s formula for the length of a tree is
  • BME is a weighted least squares approach with

Where pT(i,j) is the length of the (i,j) path in T.

Distantly related taxa see their importance

decrease exponentially.

bonus features
Bonus features
  • BME is a consistent method. As observed distances converge to true distances, the true topology becomes the minimum evolution tree.
  • The BNNI tree has no negative branch lengths. A negative value to the branch length function implies a NNI leading to a smaller tree.
consistency of balanced me
Consistency of Balanced ME
  • Theorem: SupposeS is a weighted tree, andTis a treetopologyincompatible withS. Let T be the tree of topology T with weights determined by the balanced scheme. Then

l(T) > l(S).

  • Lemma: it suffices to prove the case when S is a split metric.
balanced me consistency
Balanced ME consistency
  • Basic idea: let l be the tree length function on the space of topologies. We find a sequence of topologies, T=T0, T1, ... Tk=S such that
    • Each Ti+1 can be reached from Ti via one of two simple topological transformations
    • l(Ti) > l(Ti+1) for all i.

Proof structure modeled after OLS/ME proof (Rzhetsky and Nei, 1993).

type i transformation

D

D

A

A

C

B

B

C

Type I transformation

Color the leaves black or white according to the split metric S. A

Type I transformation uses a NNI to form a larger monochromatic

cluster

This transformation reduces the size of the tree under l

type ii transformation

A1

A1

B1

A2

C

C

A2

B1

B2

B2

Type II transformation

A Type II transformation uses two NNIs to form two monochromatic

subtrees

This transformation also reduces the value of the size of the tree

under l

positive branch lengths after bnni
Positive Branch Lengths after BNNI

Recall that the length of an edge is described by

B

D

A

e

C

We do not perform the switch because

i.e.

Thus

Similarly,

conclusions
Conclusions
  • BME + BNNI runs in O((n2 + pn)diam(T)), outputs trees comparable to (better than) FITCH, Weighbor, BioNJ, or NJ.
  • FastME is faster than NJ or its variants.
  • BNNI consistently improved output trees in all settings, even when WLS/Fitch trees were input.
  • BNNI outputs tree without negative branch lengths.
  • FASTME software available at http://www.ncbi.nlm.nih.gov/CBBResearch/Desper/FastME.html or http://www.lirmm.fr/~w3ifa/MAAS/.
ad