1 / 15

Phylogenetics

Phylogenetics. Motivation Concepts Algorithms. Motivation. Life arose just once - “Thou art my brethren but a few sequences removed”

barney
Download Presentation

Phylogenetics

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Phylogenetics • Motivation • Concepts • Algorithms Lecture 13 CS566

  2. Motivation • Life arose just once - “Thou art my brethren but a few sequences removed” • Phylogenetic trees = Topologies of evolutionary relationships between sequences, and, possibly, species - “Is it man or cow that is the true heir of the fabulous treasures of the woolly mammoth dynasty?” • Phylogenetic tree as guide tree for multiple sequence alignment (déjà vu) Lecture 13 CS566

  3. Concepts • Mutation and Evolution • Mutations that persist over generations = Evolution • Tree, not a lattice • Each species arose just once • Speciesphylogeny (often) != Sequence phylogeny • Sequences evolve at different rates • Within a single species • Between different species • Within a single sequence • Especially in bacteria, horizontal transfer (“Napster’s been around for ages”) quite common Lecture 13 CS566

  4. Concepts • Molecular clock assumption • Sequences drift apart at a constant rate • Aka edge length proportional to time • Aka satisfaction of ultrametricity • For any 3 sequences (all pair-wise distances are equal) xor (2 distances are equal, and the third one smaller) • If true, then • All path lengths from root to leaf nodes are equal • Additivity • Distance metric chosen is • True distance (fulfils triangular inequality) • Such that cumulative sum of edge lengths along path between 2 sequences equals the distance between 2 sequences Lecture 13 CS566

  5. Concepts • Heuristic forays into intractable space • Start with pairwise “distances” • Path length = Distance (~Evolutionary time) • Work from leaves to node to generate tree • (opposite of binary tree generation) • “Its easier to be rootless than to be rooted” • Binary tree approximation of higher order trees • Edges do not imply direct links (Missing links/incomplete data), only a representation of sequence evolution Lecture 13 CS566

  6. Algorithms • Parsimony (Character-based) • Distance based methods • Neighbor joining • UPGMA • Maximum Likelihood IIncreasing Sequence Similarity Lecture 13 CS566

  7. Algorithms • UPGMA (Unweighted pair group method with arithmetic averages) • Caveat if molecular clock not applicable: “If my cousin looks more like me than my brother, he must be my lost brother, and perhaps my brother my cousin?” • Neighbor joining • “Give me additive distances, and I shall give thee a tree, even if some sequences morph faster than others” • Parsimony • “Its just a bruise, not Kaposi’s sarcoma!” • Maximum Likelihood • “Given the facts, Watson, the answer is elementary!” Lecture 13 CS566

  8. UPGMA • Easiest to use if molecular clock and additivity are valid • No. of clusters = no. of sequences = no. of leaf nodes • Inter cluster distance = Average pairwise distance {While (no. of clusters > 1) • Connect pair of closest clusters (at distance d) with intermediate node at distance d/2 from each of them} • Caveat: Satisfies minimal distance requirement, but may result in spurious topologies – because of constant rate evolution assumption Lecture 13 CS566

  9. Parsimony • Parsimony (“Miserliness in model space”): Pick the simplest explanation that fits the facts - “If I hear a blood-curdling scream, it’s just one of my sons trying to kill the other – not an invasion by aliens!” • Every possible tree evaluated in terms of total number of steps needed to convert each sequence to another • Practical for only a few sequences • High percentage of similarity a prerequisite • Neither identical or ‘completely different’ sequence positions useful • Each difference should represent a single step (WYSIWYG) and not a ‘full circle’ or ‘non-shortest route’ Lecture 13 CS566

  10. Parsimony 123456789………… • ACCEFAHIKLKNPR • ACCEFGHILLLNPR • ACDEFGHIKLINPK • AADEFGHILLNNPK * * * 1 C 3 D D C 2 C 4 D Candidate tree for position 3 Lecture 13 CS566

  11. Parsimony • 3 sets of 3 trees each compared • The one with lowest total number of substitutions selected • Refinements: • Branch and bound: • Abandon a tree if subtree has a higher score than current minimal score tree • Heuristic branch-pattern representatives • Non-boolean costs: Tranversion > transition OR use of amino-acid substitution matrices Lecture 13 CS566

  12. Neighbor Joining • Generates unrooted tree, allowing for unequal branches • Given: Distance matrix for sequences • Steps: Repeat 1-3 till all branches generated • Take closest sequences i, j • Find branch lengths between i and j by treating remaining sequences as composite (c) • Calculate average i-C and j-C distances • Calculate branch lengths i and j • Treat ij as composite sequence now and generate new distance table. • Generate multiple trees by starting with different pairs • Compare resulting trees in terms of best fit to original distance matrix Lecture 13 CS566

  13. Rooting trees • Based on a “proxy ancestor” • Include a distant relative (“outgroup”) as the proxy ancestor • Add the outgroup as the last node • Point of attachment of outgroup represents root • Diameter center • Place root at center of longest path through tree Lecture 13 CS566

  14. Summary • Parsimony and ML based approaches computationally intensive – scalability poor • Neighbor joining adequate if additivity assumption is valid • UPGMA adequate if both molecular clock and additivity assumptions are valid for given set of sequences Lecture 13 CS566

  15. Summary • Phylogenetics useful to understand sequence evolution • Phylogenetics makes sense for • sequences with a high percentage of sequence identity • sequences not subject to ‘selection’ • Sequence tree not the same as species tree Lecture 13 CS566

More Related