Algorithmic research in phylogeny reconstruction

Tandy Warnow

The University of Texas at Austin

Phylogeny

Figure: phylogeny of Orangutan, Human, Gorilla, and Chimpanzee (from the Tree of Life Website, University of Arizona)

Reconstructing the “Tree” of Life

Handling large datasets: millions of species

NSF funds many projects towards this goal under the Assembling the Tree of Life (ATOL) program.

Current projects
  • Heuristics for NP-hard optimization problems for phylogeny reconstruction
  • “Phylogenetic” multiple sequence alignment
  • Detecting and reconstructing horizontal gene transfer and hybridization
  • Constructing phylogenies on languages

Graph theory, combinatorial optimization, and probabilistic analysis are fundamental to algorithm development in this area, but all methods are also extensively tested in simulation and on real data. Collaborations with biologists or linguists are essential.

DNA Sequence Evolution

Figure: a DNA sequence evolving over time.
  • 3 million years ago: AAGACTT
  • 2 million years ago: AAGGCCT, TGGACTT
  • 1 million years ago: AGGGCAT, TAGCCCT, AGCACTT
  • Today: AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, AGCGCTT
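Data like the sequences in this figure can be generated by pushing a root sequence down a tree and applying random substitutions along each branch. The following is a minimal sketch of that process. It assumes the Jukes-Cantor (JC69) substitution model, which the slide does not specify. The tree, the branch lengths, and the names `evolve` and `simulate` are all hypothetical.

```python
import math
import random

BASES = "ACGT"


def evolve(seq: str, t: float, rng: random.Random) -> str:
    """Evolve seq along one branch of length t (expected substitutions per site).

    Assumed model: Jukes-Cantor. A site ends up different with probability
    3/4 * (1 - exp(-4t/3)), and the new base is uniform over the other three.
    """
    p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate(tree, node, seq, rng, leaves=None):
    """Push seq from `node` down a rooted tree {node: [(child, branch_length), ...]}
    and collect the sequences that reach the leaves."""
    if leaves is None:
        leaves = {}
    children = tree.get(node, [])
    if not children:
        leaves[node] = seq
    for child, t in children:
        simulate(tree, child, evolve(seq, t, rng), rng, leaves)
    return leaves


if __name__ == "__main__":
    # Hypothetical five-leaf tree with made-up branch lengths.
    tree = {
        "root": [("p", 0.2), ("q", 0.2)],
        "p": [("U", 0.1), ("V", 0.1)],
        "q": [("W", 0.1), ("r", 0.05)],
        "r": [("X", 0.1), ("Y", 0.1)],
    }
    for name, s in sorted(simulate(tree, "root", "AAGACTT", random.Random(1)).items()):
        print(name, s)
```

The random generator is passed in explicitly, so a fixed seed makes a simulated dataset reproducible. That property is useful when methods are compared on the same simulated data.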
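Phylogeny Problem

Figure: given the sequences of taxa U (AGGGCAT), V (TAGCCCA), W (TAGACTT), X (TGCACAA), and Y (TGCGCTT), infer the tree relating them; the output tree is drawn with leaves X, U, Y, V, W.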
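Distance-based methods begin from pairwise dissimilarities between the leaf sequences. The following is a minimal sketch that computes Hamming distances and p-distances (the fraction of differing sites) for the five sequences in this figure. The pairing of labels to sequences is read from the slide order and is an assumption. The function names are illustrative.

```python
from itertools import combinations

# Leaf sequences from the slide; the label-to-sequence pairing is assumed from slide order.
SEQS = {"U": "AGGGCAT", "V": "TAGCCCA", "W": "TAGACTT", "X": "TGCACAA", "Y": "TGCGCTT"}


def hamming(a: str, b: str) -> int:
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def p_distances(seqs: dict) -> dict:
    """p-distance (proportion of differing sites) for every unordered pair of taxa."""
    return {
        (i, j): hamming(seqs[i], seqs[j]) / len(seqs[i])
        for i, j in combinations(sorted(seqs), 2)
    }


if __name__ == "__main__":
    for (i, j), p in p_distances(SEQS).items():
        print(f"{i}-{j}: {p:.3f}")
```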

Solving NP-hard problems exactly is … unlikely
  • Number of (unrooted) binary trees on n leaves is (2n-5)!! (for n ≥ 3)
  • If each tree on 1000 taxa could be analyzed in 0.001 seconds, exhaustive search would take roughly 10^2847 millennia to find the best tree
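Both the count and the time estimate can be checked directly. The following is a minimal sketch that assumes only the (2n-5)!! formula above and a rate of 0.001 seconds per tree. It works with logarithms via `math.lgamma` because the exact count for 1000 taxa has thousands of digits. The function names and the 365.25-day year are illustrative choices.

```python
import math

SECONDS_PER_MILLENNIUM = 1000 * 365.25 * 24 * 3600


def num_unrooted_trees(n: int) -> int:
    """Exact (2n-5)!! for n >= 3: the number of unrooted binary trees on n labeled leaves."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # odd factors 3, 5, ..., 2n-5
        count *= k
    return count


def log10_num_trees(n: int) -> float:
    """log10 of (2n-5)!!, using (2m-1)!! = (2m)! / (2^m m!) with m = n - 2."""
    m = n - 2
    return (math.lgamma(2 * m + 1) - m * math.log(2) - math.lgamma(m + 1)) / math.log(10)


def log10_search_millennia(n: int, seconds_per_tree: float = 0.001) -> float:
    """log10 of the time, in millennia, needed to score every tree at the given rate."""
    return log10_num_trees(n) + math.log10(seconds_per_tree / SECONDS_PER_MILLENNIUM)


if __name__ == "__main__":
    print(num_unrooted_trees(10))  # 2027025
    print(f"trees on 1000 leaves: about 10^{log10_num_trees(1000):.1f}")
    print(f"exhaustive search: about 10^{log10_search_millennia(1000):.1f} millennia")
```

Even at 10 taxa there are already more than two million trees. The count grows faster than exponentially, which is why exhaustive search is hopeless at the scale of large datasets.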

Approaches for “solving” hard optimization problems (like maximum parsimony)

Figure: cost plotted over the space of phylogenetic trees, marking a local optimum and the global optimum.
  • Hill-climbing heuristics (which can get stuck in local optima)
  • Randomized algorithms for getting out of local optima
  • Approximation algorithms (give bounds on what is possible)
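The first two approaches can be made concrete with a minimal sketch of maximum parsimony search:
  • Fitch's algorithm scores a tree.
  • Nearest-neighbor interchange (NNI) defines the hill-climbing moves.
  • Random restarts are one simple way to escape local optima.

The tree representation (an adjacency dict with taxon names as leaves and integers as internal nodes) and every function name are hypothetical. Real MP software such as TNT uses richer moves and search strategies.

```python
import random


def fitch(adj, seqs, node, parent, site):
    """Fitch's small-parsimony pass for one site on the subtree below `node`
    (pointing away from `parent`); returns (state set, number of changes)."""
    if node in seqs:  # leaves are taxon names; internal nodes are ints
        return {seqs[node][site]}, 0
    left, right = (c for c in adj[node] if c != parent)
    s1, c1 = fitch(adj, seqs, left, node, site)
    s2, c2 = fitch(adj, seqs, right, node, site)
    common = s1 & s2
    return (common, c1 + c2) if common else (s1 | s2, c1 + c2 + 1)


def parsimony_score(adj, seqs):
    """Parsimony score of an unrooted binary tree, evaluated by hanging it from a leaf."""
    root = next(iter(seqs))
    (nbr,) = adj[root]
    score = 0
    for site in range(len(seqs[root])):
        states, changes = fitch(adj, seqs, nbr, root, site)
        score += changes + (seqs[root][site] not in states)
    return score


def random_tree(taxa, rng):
    """Random unrooted binary tree built by inserting taxa on randomly chosen edges."""
    a, b, c, *rest = taxa
    adj = {0: {a, b, c}, a: {0}, b: {0}, c: {0}}
    for new_id, taxon in enumerate(rest, start=1):
        x = rng.choice(sorted(adj, key=str))
        y = rng.choice(sorted(adj[x], key=str))
        adj[x].remove(y)
        adj[y].remove(x)
        adj[new_id] = {x, y, taxon}  # subdivide edge x-y and hang the taxon there
        adj[x].add(new_id)
        adj[y].add(new_id)
        adj[taxon] = {new_id}
    return adj


def nni_neighbors(adj):
    """Yield every tree one nearest-neighbor interchange away from `adj`."""
    for u in adj:
        for v in adj[u]:
            if not (isinstance(u, int) and isinstance(v, int) and u < v):
                continue  # only internal edges, each visited once
            b = min(adj[u] - {v}, key=str)  # fix one subtree on u's side
            for c in sorted(adj[v] - {u}, key=str):  # swap it with each subtree on v's side
                new = {n: set(nbrs) for n, nbrs in adj.items()}
                new[u].remove(b)
                new[b].remove(u)
                new[v].remove(c)
                new[c].remove(v)
                new[u].add(c)
                new[c].add(u)
                new[v].add(b)
                new[b].add(v)
                yield new


def hill_climb(seqs, rng):
    """Greedy NNI descent from a random tree; stops at a local optimum."""
    tree = random_tree(sorted(seqs), rng)
    score = parsimony_score(tree, seqs)
    improved = True
    while improved:
        improved = False
        for candidate in nni_neighbors(tree):
            cand_score = parsimony_score(candidate, seqs)
            if cand_score < score:
                tree, score, improved = candidate, cand_score, True
                break  # take the first improving move, then rescan
    return tree, score


def mp_search(seqs, restarts=20, seed=0):
    """Random-restart hill-climbing: keep the best local optimum found."""
    rng = random.Random(seed)
    best_tree, best_score = None, float("inf")
    for _ in range(restarts):
        tree, score = hill_climb(seqs, rng)
        if score < best_score:
            best_tree, best_score = tree, score
    return best_tree, best_score


if __name__ == "__main__":
    seqs = {"U": "AGGGCAT", "V": "TAGCCCA", "W": "TAGACTT", "X": "TGCACAA", "Y": "TGCGCTT"}
    _, score = mp_search(seqs)
    print("best parsimony score found:", score)
```

A single `hill_climb` call can stop at a local optimum. The restarts in `mp_search` trade extra running time for a better chance of finding the global optimum, but they give no guarantee of finding it.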
Problems with current techniques for MP

Shown here is the performance of a heuristic maximum parsimony analysis on a real dataset of almost 14,000 sequences. (“Optimal” here means best score to date, using any method for any amount of time.) Acceptable error is below 0.01%.

Figure: performance of TNT over time

Performance of NJ, a popular polynomial time method [Nakhleh et al. ISMB 2001]

Simulation study based upon fixed edge lengths, K2P model of evolution, sequence lengths fixed to 1000 nucleotides.

Error rates reflect proportion of incorrect edges in inferred trees.
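An error rate of this kind can be computed as follows. Represent each tree by its nontrivial bipartitions (one per internal edge), then count the inferred edges whose bipartition does not appear in the true tree. The following is a minimal sketch that assumes both trees are given as nested tuples on the same leaf set. The function names are illustrative. When both trees are binary on the same leaves, each has n-3 internal edges, so this rate equals the missing-edge rate.

```python
def splits(tree):
    """Nontrivial bipartitions of an unrooted tree given as nested tuples of leaf names.

    Each split is stored as the side that excludes a fixed reference leaf, so the
    result does not depend on where the nested-tuple representation is rooted.
    """
    leaves = set()
    clusters = []

    def walk(node):
        if isinstance(node, str):
            leaves.add(node)
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        clusters.append(below)
        return below

    walk(tree)
    ref = min(leaves)
    result = set()
    for c in clusters:
        side = c if ref not in c else frozenset(leaves) - c
        if 1 < len(side) < len(leaves) - 1:  # skip trivial (leaf) and empty splits
            result.add(side)
    return result


def incorrect_edge_rate(true_tree, inferred_tree):
    """Fraction of the inferred tree's internal edges that are absent from the true tree."""
    inferred = splits(inferred_tree)
    if not inferred:
        return 0.0
    return len(inferred - splits(true_tree)) / len(inferred)


if __name__ == "__main__":
    true_tree = (("A", "B"), "C", ("D", "E"))
    inferred = (("A", "C"), "B", ("D", "E"))
    print(incorrect_edge_rate(true_tree, inferred))  # 0.5
```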

Figure: error rate of NJ (y-axis, 0 to 0.8) versus number of taxa (x-axis, 0 to 1600)
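The following is a compact sketch of the neighbor-joining topology computation. At each step it joins the pair that minimizes the Q-criterion, Q(i, j) = (r - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k), where r is the number of nodes remaining. It then replaces that pair with a new node whose distance to each other node k is (d(i, k) + d(j, k) - d(i, j)) / 2. Branch lengths are omitted, ties go to the first pair found, and the function name is illustrative.

```python
def neighbor_joining(names, d):
    """Return the neighbor-joining topology as a Newick string (no branch lengths).

    names: taxon labels; d: symmetric distance matrix as a list of lists.
    """
    nodes = list(names)
    d = [list(row) for row in d]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # row sums: total distance from each node
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]  # NJ Q-criterion
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        keep = [k for k in range(n) if k not in (i, j)]
        # distance from the new internal node to each remaining node k
        to_new = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [to_new[idx]] for idx, a in enumerate(keep)]
        d.append(to_new + [0.0])
        nodes = [nodes[k] for k in keep] + [f"({nodes[i]},{nodes[j]})"]
    # three nodes remain: connect them through one final internal node
    return "(" + ",".join(nodes) + ");"


if __name__ == "__main__":
    # Additive distances from a hypothetical tree with split {A,B} | {C,D}.
    names = ["A", "B", "C", "D"]
    d = [[0, 3, 3, 5],
         [3, 0, 4, 6],
         [3, 4, 0, 4],
         [5, 6, 4, 0]]
    print(neighbor_joining(names, d))  # (C,D,(A,B));
```

Each join scans all remaining pairs, so this straightforward version runs in O(n^3) time overall. That cost is what makes NJ a polynomial-time method.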

DCMs (Disk-Covering Methods)
  • DCMs for polynomial time methods improve topological accuracy (empirical observation), and have provable theoretical guarantees under Markov models of evolution
  • DCMs for hard optimization problems reduce the running time needed to achieve good levels of accuracy (empirical observation)
“Boosting” phylogeny reconstruction methods
  • DCMs “boost” the performance of phylogeny reconstruction methods.

Figure: applying a DCM to a base method M produces the boosted method DCM-M.

Iterative-DCM3

Figure: the current tree T is passed through DCM3 together with the base method, producing a new tree T’.

Rec-I-DCM3 significantly improves performance

Figure: comparison of TNT (current best techniques) with Rec-I-DCM3(TNT) (DCM-boosted version of the best techniques) on one large dataset.

DCM1-boosting distance-based methods [Nakhleh et al. ISMB 2001]
  • DCM1-boosting makes distance-based methods more accurate
  • Theoretical guarantees that DCM1-NJ converges to the true tree from polynomial-length sequences

Figure: error rates of NJ and DCM1-NJ (y-axis, 0 to 0.8) versus number of taxa (x-axis, 0 to 1600)
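The decomposition step of DCM1 can be sketched as follows, assuming a distance matrix and a single threshold q. First build the threshold graph, with an edge between two taxa whose distance is at most q. Then take its maximal cliques as the overlapping subsets on which the base method (e.g., NJ) would be run. This sketch is simplified. The full method also triangulates the threshold graph when needed, considers a range of thresholds, and merges the resulting subtrees into one tree. Those steps are omitted, and all names here are hypothetical.

```python
from itertools import combinations


def threshold_graph(taxa, dist, q):
    """Join two taxa by an edge when their distance is at most the threshold q."""
    adj = {t: set() for t in taxa}
    for a, b in combinations(taxa, 2):
        if dist[frozenset((a, b))] <= q:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Bron-Kerbosch without pivoting: list every maximal clique exactly once."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(frozenset(), frozenset(adj), frozenset())
    return cliques


def dcm1_subproblems(taxa, dist, q):
    """Overlapping taxon subsets (maximal cliques of the threshold graph), sorted for display."""
    return sorted(sorted(c) for c in maximal_cliques(threshold_graph(taxa, dist, q)))


if __name__ == "__main__":
    # Hypothetical distances on four taxa.
    pairs = [(("A", "B"), 3), (("A", "C"), 3), (("A", "D"), 5),
             (("B", "C"), 4), (("B", "D"), 6), (("C", "D"), 4)]
    dist = {frozenset(p): v for p, v in pairs}
    print(dcm1_subproblems(["A", "B", "C", "D"], dist, 4))  # [['A', 'B', 'C'], ['C', 'D']]
```

Small thresholds produce small subsets made of close taxa. Working on such subsets is the intuition behind the accuracy gains shown for DCM1-NJ.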

General comments
  • Just about everything in phylogeny is NP-hard
  • Graph-theory, probability, and optimization are the basic tools for algorithmic advances
  • Algorithms are tested on both real and simulated data.
  • Collaborations with domain experts (biologists or linguists) are essential to success. (At UT, we have wonderful biologists to work with, and all my students collaborate with them.)
For more information
  • Send me email to make an appointment
  • Check my webpage for tutorials on the subject

See http://www.phylo.org and http://www.cs.utexas.edu/~tandy for more info
