Algorithmic research in phylogeny reconstruction

Tandy Warnow

The University of Texas at Austin


Phylogeny

[Figure: tree relating Orangutan, Human, Gorilla, and Chimpanzee. From the Tree of Life website, University of Arizona.]


Reconstructing the “Tree” of Life

Handling large datasets: millions of species

NSF funds many projects towards this goal, under the Assembling the Tree of Life (ATOL) program


Current projects

  • Heuristics for NP-hard optimization problems for phylogeny reconstruction

  • “Phylogenetic” multiple sequence alignment

  • Detecting and reconstructing horizontal gene transfer and hybridization

  • Constructing phylogenies on languages

    Graph theory, combinatorial optimization, and probabilistic analysis are fundamental to algorithm development in this area. All methods are also tested extensively in simulation and on real data. Collaborations with biologists or linguists are essential.


DNA Sequence Evolution

[Figure: a sequence evolving down a tree over time. -3 mil yrs: AAGACTT. -2 mil yrs: AAGGCCT, TGGACTT. -1 mil yrs: AGGGCAT, TAGCCCT, AGCACTT. Today: AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, AGCGCTT.]


Phylogeny Problem

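[Figure: the input is a set of sequences, U = AGGGCAT, V = TAGCCCA, W = TAGACTT, X = TGCACAA, Y = TGCGCTT; the output is a tree whose leaves are U, V, W, X, and Y.]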


Solving NP-hard problems exactly is … unlikely

  • Number of (unrooted) binary trees on n leaves is (2n-5)!!

  • If each tree on 1000 taxa could be analyzed in 0.001 seconds, we would find the best tree in 2890 millennia
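
To get a feel for how fast (2n-5)!! grows, the following minimal sketch counts the unrooted binary trees on n labeled leaves. The function names are illustrative and do not come from the presentation.

```python
import math


def num_unrooted_binary_trees(n: int) -> int:
    """Exact number of unrooted binary trees on n labeled leaves, (2n-5)!!."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    # (2n-5)!! is the product of the odd numbers 1 * 3 * 5 * ... * (2n-5).
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


def log10_num_trees(n: int) -> float:
    """Base-10 logarithm of (2n-5)!!, computed without building the huge integer."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    return sum(math.log10(k) for k in range(3, 2 * n - 4, 2))


for n in (4, 5, 10, 1000):
    print(n, "leaves: about 10 **", round(log10_num_trees(n), 1), "trees")
```

Ten leaves already give 2,027,025 trees, and for 1000 leaves the count has well over 2,800 decimal digits, so enumerating every tree is out of the question.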


Approaches for “solving” hard optimization problems (like maximum parsimony)

[Figure: cost plotted over the space of phylogenetic trees, with a local optimum and the global optimum marked.]

  • Hill-climbing heuristics (which can get stuck in local optima)

  • Randomized algorithms for getting out of local optima

  • Approximation algorithms (give bounds on what is possible)
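
The first two bullets can be illustrated on a toy problem. In this minimal sketch a one-dimensional list of costs stands in for tree space; `LANDSCAPE`, `cost`, and `neighbors` are hypothetical names chosen for the example. A real maximum parsimony search would use tree rearrangements as the neighborhood and a parsimony score as the cost.

```python
import random

# Toy cost landscape: index = candidate solution, value = its cost.
# Index 1 is a local optimum (cost 3); index 7 is the global optimum (cost 0).
LANDSCAPE = [5, 3, 4, 6, 7, 2, 1, 0, 3, 8]


def cost(i):
    return LANDSCAPE[i]


def neighbors(i):
    """Adjacent positions that lie inside the landscape."""
    return [j for j in (i - 1, i + 1) if 0 <= j < len(LANDSCAPE)]


def hill_climb(start, cost, neighbors):
    """Greedy descent: move to the best neighbor until no neighbor improves."""
    current = start
    while True:
        best = min(neighbors(current), key=cost)
        if cost(best) >= cost(current):
            return current  # stuck: a local (possibly global) optimum
        current = best


def random_restarts(cost, neighbors, random_start, tries, rng):
    """Run hill climbing from several random starts and keep the best result."""
    best = None
    for _ in range(tries):
        candidate = hill_climb(random_start(rng), cost, neighbors)
        if best is None or cost(candidate) < cost(best):
            best = candidate
    return best


stuck = hill_climb(0, cost, neighbors)  # ends in the local optimum
escaped = random_restarts(
    cost, neighbors, lambda rng: rng.randrange(len(LANDSCAPE)), 50, random.Random(0)
)
print(stuck, cost(stuck), escaped, cost(escaped))
```

Random restarts are only one simple way to randomize the search; they are used here because they are easy to state, not because the presentation names them.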


Problems with current techniques for MP

Shown here is the performance of a heuristic maximum parsimony analysis on a real dataset of almost 14,000 sequences. (“Optimal” here means best score to date, using any method for any amount of time.) Acceptable error is below 0.01%.

Performance of TNT with time
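
For reference, the score that a maximum parsimony heuristic tries to minimize can be computed for any single fixed tree in polynomial time with Fitch's algorithm; the hard part is the search over trees. The sketch below assumes rooted binary trees written as nested tuples of leaf names, which is an illustrative encoding and not how TNT represents trees. The sequences are the ones from the Phylogeny Problem slide, and the example topology is hypothetical.

```python
def fitch_site(tree, states):
    """Fitch's algorithm for one alignment column.

    Returns (candidate state set at this node, number of changes below it).
    """
    if isinstance(tree, str):  # a leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Disjoint child sets force one extra change on this part of the tree.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, seqs):
    """Sum of per-column Fitch costs for aligned, equal-length sequences."""
    length = len(next(iter(seqs.values())))
    if any(len(s) != length for s in seqs.values()):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        fitch_site(tree, {name: s[i] for name, s in seqs.items()})[1]
        for i in range(length)
    )


seqs = {
    "U": "AGGGCAT",
    "V": "TAGCCCA",
    "W": "TAGACTT",
    "X": "TGCACAA",
    "Y": "TGCGCTT",
}
example_tree = ((("U", "V"), "W"), ("X", "Y"))  # hypothetical topology
print(parsimony_score(example_tree, seqs))
```

The position of the root does not change the parsimony score, so an unrooted tree can be rooted on any edge before scoring.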


Performance of NJ, a popular polynomial time method [Nakhleh et al. ISMB 2001]

Simulation study based upon fixed edge lengths, K2P model of evolution, sequence lengths fixed to 1000 nucleotides.

Error rates reflect proportion of incorrect edges in inferred trees.
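
One common way to make "proportion of incorrect edges" concrete is to compare the leaf bipartitions induced by the internal edges of the two trees. The sketch below assumes trees written as nested tuples of leaf names and takes the error rate to be the fraction of internal edges of the inferred tree whose bipartition is absent from the true tree. Whether the cited study used exactly this definition is an assumption, and the function names are illustrative.

```python
def leaves(tree):
    """Set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree):
    """Non-trivial leaf bipartitions, each stored as the side without the anchor leaf."""
    all_leaves = leaves(tree)
    anchor = min(all_leaves)  # fixed reference leaf so each split has one canonical form
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = below if anchor not in below else all_leaves - below
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return below

    visit(tree)
    return splits


def edge_error_rate(true_tree, inferred_tree):
    """Fraction of the inferred tree's internal edges not found in the true tree."""
    if leaves(true_tree) != leaves(inferred_tree):
        raise ValueError("trees must have the same leaf set")
    inferred = bipartitions(inferred_tree)
    if not inferred:
        return 0.0
    return len(inferred - bipartitions(true_tree)) / len(inferred)


true_tree = ((("A", "B"), "C"), ("D", "E"))
inferred_tree = ((("A", "C"), "B"), ("D", "E"))
print(edge_error_rate(true_tree, inferred_tree))
```

For two fully resolved trees on the same taxa, counting wrong edges in the inferred tree or missing edges of the true tree gives the same rate, because both trees have the same number of internal edges.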

[Figure: error rate of NJ (y-axis, 0 to 0.8) against number of taxa (x-axis, 0 to 1600).]
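
NJ works from a matrix of pairwise distances between sequences. Under the K2P model, a standard choice is the Kimura two-parameter corrected distance, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of sites that differ by a transition and by a transversion. The sketch below computes it for two aligned sequences; whether the cited study used exactly this estimator is an assumption, and the function name is illustrative.

```python
import math

PURINES = {"A", "G"}  # the remaining bases, C and T, are pyrimidines


def k2p_distance(s, t):
    """Kimura two-parameter distance between two aligned DNA sequences (uppercase ACGT)."""
    if len(s) != len(t) or not s:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = transversions = 0
    for a, b in zip(s, t):
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1  # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            transversions += 1  # purine <-> pyrimidine
    p = transitions / len(s)
    q = transversions / len(s)
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent: K2P distance is undefined")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


print(k2p_distance("AGGGCAT", "AGGGCAT"))
print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

The correction matters because the raw fraction of differing sites underestimates the amount of change once multiple substitutions hit the same site.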


DCMs (Disk-Covering Methods)

  • DCMs for polynomial time methods improve topological accuracy (empirical observation), and have provable theoretical guarantees under Markov models of evolution

  • DCMs for hard optimization problems reduce the running time needed to achieve good levels of accuracy (empirical observation)



“Boosting” phylogeny reconstruction methods

  • DCMs “boost” the performance of phylogeny reconstruction methods.

[Figure: a DCM combined with a base method M yields the boosted method DCM-M.]


Iterative-DCM3

[Figure: a tree T is passed through DCM3 and the base method to produce a new tree T’.]


Rec-I-DCM3 significantly improves performance

[Figure: comparison of TNT to Rec-I-DCM3(TNT) on one large dataset. The curves are labeled “Current best techniques” and “DCM boosted version of best techniques”.]


DCM1-boosting distance-based methods [Nakhleh et al. ISMB 2001]

  • DCM1-boosting makes distance-based methods more accurate

  • Theoretical guarantees that DCM1-NJ converges to the true tree from polynomial length sequences

[Figure: error rate of NJ and DCM1-NJ (y-axis, 0 to 0.8) against number of taxa (x-axis, 0 to 1600).]


General comments

  • Everything in phylogeny (just about) is NP-hard

  • Graph theory, probability, and optimization are the basic tools for algorithmic advances

  • Algorithms are tested on both real and simulated data.

  • Collaborations with domain experts (biologists or linguists) are essential to success. (At UT, we have wonderful biologists to work with, and all my students collaborate with them.)


For more information

  • Send me email to make an appointment

  • Check my webpage for tutorials on the subject

    See http://www.phylo.org and http://www.cs.utexas.edu/~tandy for more info

