
An Introduction to Phylogenetic Methods

An Introduction to Phylogenetic Methods. Part one. Dr Laura Emery, Laura.Emery@ebi.ac.uk, www.ebi.ac.uk/training. Objectives: after this tutorial you should be able to discuss a range of methods for phylogenetic inference, their advantages, assumptions and limitations.


Presentation Transcript


  1. An Introduction to Phylogenetic Methods Part one Dr Laura Emery Laura.Emery@ebi.ac.uk www.ebi.ac.uk/training

  2. Objectives • After this tutorial you should be able to… • Discuss a range of methods for phylogenetic inference, their advantages, assumptions and limitations • Implement some phylogenetic methods using publicly available software • Appreciate some approaches for assessing branch support and selecting an appropriate substitution model

  3. Outline • Alignment for phylogenetics • Phylogenetics: The general approach • Phylogenetic Methods (1 – simple methods) • Assessing Branch Support BREAK • Substitution Models • Phylogenetic Methods (2 - statistical inference) • Deciding which model to use (hypothesis testing) • Software

  4. Alignment for phylogenetics • Phylogenetic analyses are typically applied to alignments of sequence data • Occasionally other data such as morphological traits are used (e.g. when no sequence data is available) • Alignments must contain homologous sequences • We assume that sites in the same column in an alignment are homologous

  5. Alignment for phylogenetics Benjamin Redelings

  6. Columns in alignments should be homologous Benjamin Redelings

  7. Phylogenetics: The general approach • We want to find the tree that best explains our aligned sequences • We need to be able to define “best explains” • we need a model of sequence evolution • we need a criterion (or set of criteria) to use to choose between alternative trees • then evaluate all possible trees (NB: if N = 20, there are about 2 × 10^20 possible unrooted trees!) • or take a short cut Paul Sharp
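To see why evaluating every tree quickly becomes impractical, the count quoted above can be reproduced with a short sketch. It assumes fully bifurcating trees with labelled tips and uses the standard double-factorial formula (2N - 5)!!; the function name is illustrative and not part of the tutorial.

```python
def count_unrooted_trees(n_taxa):
    """Number of distinct unrooted, fully bifurcating trees for n_taxa labelled tips.

    Uses (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5), valid for n_taxa >= 3.
    """
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for odd in range(3, 2 * n_taxa - 4, 2):  # odd factors 3, 5, ..., 2n - 5
        count *= odd
    return count


for n in (4, 5, 10, 20):
    print(n, count_unrooted_trees(n))
```

For N = 20 the result is roughly 2.2 × 10^20, which is why heuristic short cuts are used in practice.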

  8. There is only one true tree • The true tree refers to what actually happened in the evolutionary past • All methods attempt to reconstruct the true phylogeny • Even the best method may not give you the true tree

  9. Methodological approaches • Distance matrix methods (pre-computed distances) • UPGMA assumes perfect molecular clock Sokal & Michener (1958) • Minimum evolution (e.g. Neighbor-joining, NJ) Saitou & Nei (1987) • Maximum parsimony Fitch (1971) • Minimises number of mutational steps • Maximum likelihood, ML • Evaluates statistical likelihood of alternative trees, based on an explicit model of substitution • Bayesian methods • Like ML but can incorporate prior knowledge

  10. What is a distance matrix? A table that indicates the number of substitutions between pairs of sequences
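A minimal sketch of how such a table could be built is given below. It counts observed differences between each pair of aligned sequences; the toy alignment is hypothetical, and skipping columns that contain a gap is an assumption rather than something the slides specify. Observed differences underestimate the true number of substitutions when a site has changed more than once.

```python
from itertools import combinations


def pairwise_differences(alignment):
    """Count differing columns for every pair of aligned sequences.

    alignment maps sequence name -> aligned sequence (all the same length).
    Columns where either sequence has a gap ('-') are skipped (an assumption).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    distances = {}
    for (name_a, seq_a), (name_b, seq_b) in combinations(alignment.items(), 2):
        distances[(name_a, name_b)] = sum(
            1 for x, y in zip(seq_a, seq_b) if x != y and x != "-" and y != "-"
        )
    return distances


# Hypothetical toy alignment, not taken from the tutorial data
toy = {"seq1": "ACGTACGT", "seq2": "ACGTACGA", "seq3": "ACCTACAA"}
for pair, diff in pairwise_differences(toy).items():
    print(pair, diff)
```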

  11. Distance Matrix Methods Andrew Rambaut

  12. UPGMA Method • Identify the pair of most closely related taxa according to the pairwise-genetic distance matrix • Cluster these together Figures Andrew Rambaut

  13. UPGMA Method • Recalculate distance matrix (calculate the distances from the new cluster to every other sequence) • Take the average of both distances • E.g. distance[spinach, monkey/human] = (distance[spinach, human] + distance[spinach, monkey]) / 2 = (86.3 + 90.8) / 2 = 88.55 Figures Andrew Rambaut

  14. UPGMA Method • Repeat the procedure until the tree is finished distance between (spi,ric) and mos(mon,hum) is 108.7 Andrew Rambaut
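The cluster, recalculate, repeat loop can be sketched as follows. This is an illustrative implementation, not the tutorial's software: it weights each average by cluster size (the usual UPGMA definition, which equals the simple average on the slide when both clusters are single taxa). The spinach distances come from the slide; the human to monkey distance of 10.0 is a made-up placeholder.

```python
def upgma(dist):
    """Cluster taxa by UPGMA.

    dist maps frozenset({a, b}) -> distance for every pair of taxon names.
    Returns (tree_label, merges); merges lists (cluster_a, cluster_b, distance)
    in the order the clusters were joined.
    """
    d = dict(dist)
    sizes = {name: 1 for pair in d for name in pair}  # leaves per cluster
    merges = []
    while len(sizes) > 1:
        pair = min(d, key=d.get)          # closest pair of clusters
        a, b = sorted(pair)
        merged = f"({a},{b})"
        merges.append((a, b, d[pair]))
        for other in sizes:
            if other in pair:
                continue
            # average over all leaves: weight old distances by cluster size
            d[frozenset((merged, other))] = (
                sizes[a] * d[frozenset((a, other))]
                + sizes[b] * d[frozenset((b, other))]
            ) / (sizes[a] + sizes[b])
        # drop every distance that still refers to the two old clusters
        d = {p: v for p, v in d.items() if a not in p and b not in p}
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes)), merges


toy_dist = {
    frozenset(("human", "monkey")): 10.0,    # hypothetical value
    frozenset(("spinach", "human")): 86.3,   # from the slide
    frozenset(("spinach", "monkey")): 90.8,  # from the slide
}
tree, merges = upgma(toy_dist)
print(tree)
print(merges)
```

The second merge joins spinach to the human/monkey cluster at the averaged distance worked out on the slide.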

  15. UPGMA Method • Assumptions: • Strict molecular clock • Ultrametric distance data • Advantages: • Fast and simple • Disadvantages: • Data are almost never ultrametric • Usage: Almost never used

  16. Neighbour Joining Method • An improvement over the UPGMA: does not require data to be ultrametric • Identifies the topology that gives the least total branch length at each step Figures Olivier Gascuel
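One step of neighbour joining can be sketched as below, assuming the widely used Q criterion, Q(i, j) = (n - 2) d(i, j) - Σ d(i, k) - Σ d(j, k), to choose which pair to join. The slides do not give the formula, so treat this as an illustration; the five-taxon matrix is a toy example, not tutorial data. A full implementation would replace the joined pair with a new node, update the distances, and repeat.

```python
def nj_pick_pair(names, d):
    """Pick the pair of taxa to join in one neighbour-joining step.

    d is a symmetric list-of-lists distance matrix with zeros on the diagonal.
    Returns (name_i, name_j, q_value, branch_len_i, branch_len_j).
    """
    n = len(names)
    row_sums = [sum(row) for row in d]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    q, i, j = best
    # branch lengths from each joined taxon to the new internal node
    len_i = d[i][j] / 2 + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    len_j = d[i][j] - len_i
    return names[i], names[j], q, len_i, len_j


# Toy distance matrix for illustration only
toy_names = ["a", "b", "c", "d", "e"]
toy_matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_pick_pair(toy_names, toy_matrix))
```

Note that the pair joined here (a, b) is not the pair with the smallest raw distance (d, e), which is one way neighbour joining differs from UPGMA.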

  17. Neighbour Joining Method • Advantages: • allows the use of an explicit model of evolution • fast and simple • able to deal with thousands of taxa • Disadvantages: • only produces one tree • reduces all sequence information into a single distance value • dependent on the evolutionary model used • Usage: commonly used because it is available in many software packages

  18. Methodological approaches • Distance matrix methods (pre-computed distances) • UPGMA assumes perfect molecular clock Sokal & Michener (1958) • Minimum evolution (e.g. Neighbor-joining, NJ) Saitou & Nei (1987) • Maximum parsimony Fitch (1971) • Minimises number of mutational steps • Maximum likelihood, ML • Evaluates statistical likelihood of alternative trees, based on an explicit model of substitution • Bayesian methods • Like ML but can incorporate prior knowledge

  19. Maximum Parsimony • The most parsimonious tree is the tree requiring the smallest number of substitutions to explain the sequences [Figure: alternative trees for the same site pattern, with substitutions marked by asterisks; the MP (unrooted) tree has length = 2, the alternatives have length = 3]
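Counting the substitutions a given tree requires can be sketched with Fitch's algorithm (the Fitch 1971 reference cited in this deck). The code below is a minimal illustration under stated assumptions: trees are written as nested 2-tuples of leaf names, all substitutions cost the same, and the toy alignment is hypothetical. It scores a tree that is supplied to it; it does not search for the best one.

```python
def fitch_site(tree, states):
    """Fitch pass for one site on a rooted binary tree.

    tree is a leaf name (str) or a 2-tuple of subtrees; states maps leaf -> base.
    Returns (possible_states_at_node, substitutions_in_subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    shared = left_set & right_set
    if shared:  # children can agree: no extra substitution needed
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_length(tree, alignment):
    """Sum the minimum number of substitutions over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[site] for name, seq in alignment.items()})[1]
        for site in range(n_sites)
    )


# Hypothetical toy alignment, not taken from the slides
toy = {"t1": "AAG", "t2": "AAG", "t3": "CAT", "t4": "CTT"}
tree_1 = (("t1", "t2"), ("t3", "t4"))
tree_2 = (("t1", "t3"), ("t2", "t4"))
print(parsimony_length(tree_1, toy), parsimony_length(tree_2, toy))
```

Under the parsimony criterion the tree with the smaller length would be preferred.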

  20. Maximum Parsimony • Assumptions: • Multiple substitutions rare • Advantages: • fast • Disadvantages • not consistent with most models of evolution • can result in multiple optimal trees • Usage: still used with morphological data Figures Andrew Rambaut

  21. The problem of multiple substitutions • More likely to have occurred between distantly related species • We need an explicit model of evolution to account for these hidden mutations (to be covered in part two) [Figure: repeated substitutions at a single site, marked by asterisks]

  22. Methodological approaches • Distance matrix methods (pre-computed distances) • UPGMA assumes perfect molecular clock Sokal & Michener (1958) • Minimum evolution (e.g. Neighbor-joining, NJ) Saitou & Nei (1987) • Maximum parsimony Fitch (1971) • Minimises number of mutational steps • Maximum likelihood, ML • Evaluates statistical likelihood of alternative trees, based on an explicit model of substitution • Bayesian methods • Like ML but can incorporate prior knowledge How well supported are my branches?

  23. How well supported are my branches? • A tree is a collection of hypotheses, so we assess our confidence in each of its parts or branches independently • There are three main approaches: • Bootstraps • Bayesian methods • Approximate likelihood ratio test (aLRT) methods [Figure: branches annotated with example support values (100, 63, 85) and probabilistic values (0.99, 0.81, 0.93)]

  24. Bootstrapping • 1. Take your alignment, and consider each column separately • 2. Resample columns with replacement to create many dummy alignments (repeat many times) • 3. Use these to draw many trees and count up the occurrences of each branch among these trees • Felsenstein, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39: 783-791. Figures Andrew Rambaut
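The resampling in step 2 can be sketched as follows. The function name, the toy alignment, and the choice of 100 replicates are all illustrative assumptions. Step 3 would then require building a tree from each replicate with whichever method is being assessed, which is not shown here.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length.

    alignment maps sequence name -> aligned sequence; rng is a random.Random.
    The same column indices are applied to every sequence so that columns
    stay intact.
    """
    n_sites = len(next(iter(alignment.values())))
    columns = rng.choices(range(n_sites), k=n_sites)
    return {
        name: "".join(seq[c] for c in columns) for name, seq in alignment.items()
    }


# Hypothetical toy alignment, not the tutorial's 5SrRNA data
toy = {"seq1": "ACGTACGT", "seq2": "ACGTACGA", "seq3": "ACCTACAA"}
rng = random.Random(42)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]
print(replicates[0])
```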

  25. Issues with bootstrapping • Sites may not evolve independently • P values are biased (too conservative) • Calculating bootstraps for many branches results in multiple testing • Bootstrapping does not correct biases in phylogeny methods • Nevertheless they perform surprisingly well

  26. Outline • Alignment for phylogenetics • Phylogenetics: The general approach • Phylogenetic Methods (1 – simple methods) • Assessing Branch Support BREAK • Substitution Models • Phylogenetic Methods (2 - statistical inference) • Deciding which model to use (hypothesis testing) • Software

  27. Now it's your turn… • Open your course manuals and begin Tutorial 1 • Also available to download from: http://www.ebi.ac.uk/training/course/scuola-di-bioinformatica-2013 • You will require the alignment file 5SrRNA.txt • There are answers available online but it is much better to ask for help!

  28. Thank you! www.ebi.ac.uk Twitter: @emblebi Facebook: EMBLEBI
