
Class 9: Phylogenetic Trees


davin

Presentation Transcript


  1. Class 9: Phylogenetic Trees

  2. The Tree of Life (after Ernst Haeckel, 1891)

  3. Evolution • Many theories of evolution • Basic idea: • Speciation events lead to the creation of different species • Speciation is caused by physical separation into groups in which different genetic variants become dominant • Any two species share a (possibly distant) common ancestor

  4. Phylogenies • A phylogeny is a tree that describes the sequence of speciation events that led to the formation of a set of current-day species • Leaves - current-day species • Nodes - hypothetical most recent common ancestors • Edge lengths - “time” from one speciation to the next [Figure: example tree with leaves Aardvark, Bison, Chimp, Dog, Elephant]

  5. Primate evolution

  6. Until the mid-1950s, phylogenies were constructed by experts based on their opinion (subjective criteria) • The Linnaean classification scheme implicitly assumes a tree structure • Since then, the focus has been on objective criteria for constructing phylogenetic trees • Thousands of articles in the last decades • Important for many aspects of biology • Classification (systematics) • Understanding biological mechanisms

  7. Morphological vs. Molecular • Classical phylogenetic analysis: morphological features • Number of legs, lengths of legs, etc. • Modern biological methods allow the use of molecular features • Gene sequences • Protein sequences • Analysis is based on homologous sequences (e.g., globins) in different species

  8. Dangers in Molecular Phylogenies • We have to remember that gene/protein sequences can be homologous for different reasons: • Orthologs - sequences diverged after a speciation event • Paralogs - sequences diverged after a duplication event • Xenologs - sequences diverged after a horizontal transfer (e.g., by a virus)

  9. Dangers of Paralogs [Figure: gene tree showing a gene duplication and speciation events, with leaves 1A, 2A, 3A, 1B, 2B, 3B]

  10. Dangers of Paralogs • If we only consider 1A, 2B, and 3A... [Figure: the same gene tree (gene duplication and speciation events; leaves 1A, 2A, 3A, 1B, 2B, 3B)]

  11. Types of Trees • A natural model to consider is that of rooted trees [Figure: rooted tree with the root labeled “Common Ancestor”]

  12. Types of Trees • Depending on the model, data from current-day species does not distinguish between different placements of the root [Figure: two alternative root placements, shown side by side]

  13. Types of Trees • An unrooted tree represents the same phylogeny without the root node

  14. Positioning Roots in Unrooted Trees • We can estimate the position of the root by introducing an outgroup: • A set of species that are definitely distant from all the species of interest [Figure: unrooted tree with leaves Aardvark, Bison, Chimp, Dog, Elephant and outgroup Falcon; the proposed root is marked]

  15. Types of Data • Distance-based • Input is a matrix of distances between species • A distance can be the fraction of residues on which two sequences disagree, or an alignment score between them, or … • Character-based • Examine each character (e.g., residue) separately
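As an illustration of the first distance option (the fraction of residues on which two sequences disagree), here is a minimal Python sketch. The function names `p_distance` and `distance_matrix` are hypothetical, the sequences are the six-letter examples that appear later in these slides, and the code assumes the sequences are already aligned to equal length.

```python
from itertools import combinations

def p_distance(s, t):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s, t)) / len(s)

def distance_matrix(seqs):
    """Symmetric dict-of-dicts distance matrix for a {name: sequence} mapping."""
    d = {name: {name: 0.0} for name in seqs}
    for x, y in combinations(seqs, 2):
        d[x][y] = d[y][x] = p_distance(seqs[x], seqs[y])
    return d

# Example sequences taken from the parsimony example later in the slides.
seqs = {"A": "CAGGTA", "B": "CAGACA", "C": "CGGGTA",
        "D": "TGCACT", "E": "TGCGTA"}
dist = distance_matrix(seqs)
```

A dict-of-dicts keeps the sketch dependency-free; a real analysis would more likely use a corrected distance (for multiple substitutions) rather than the raw mismatch fraction.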

  16. Simple Distance-Based Method Input: distance matrix between species Outline: • Cluster species together • Initially clusters are singletons • At each iteration combine two “closest” clusters to get a new one

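  17. UPGMA Clustering • Let $C_i$ and $C_j$ be clusters, define the distance between them to be the average pairwise distance $d(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i} \sum_{q \in C_j} d(p, q)$ • When we combine two clusters, $C_i$ and $C_j$, to form a new cluster $C_k$, then for every other cluster $C_l$: $d(C_k, C_l) = \frac{|C_i|\, d(C_i, C_l) + |C_j|\, d(C_j, C_l)}{|C_i| + |C_j|}$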
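The clustering loop can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a reference implementation: the function name `upgma`, the dict-of-dicts input, and the nested-tuple tree representation are all choices made for illustration. It uses the standard UPGMA conventions that merged-cluster distances are size-weighted averages and that a new node is placed at height half the distance between the clusters it joins.

```python
from itertools import combinations

def upgma(dist):
    """UPGMA sketch. dist is a symmetric dict-of-dicts of leaf distances.

    Returns (tree, height): tree is a nested tuple of leaf names and
    height maps every node (leaf or tuple) to its height above the leaves.
    """
    d = {x: dict(row) for x, row in dist.items()}  # working copy
    size = {x: 1 for x in d}
    height = {x: 0.0 for x in d}
    while len(d) > 1:
        # Pick the two closest clusters.
        i, j = min(combinations(list(d), 2), key=lambda p: d[p[0]][p[1]])
        k = (i, j)
        size[k] = size[i] + size[j]
        height[k] = d[i][j] / 2.0
        others = [m for m in d if m not in (i, j)]
        # Size-weighted average distance from the merged cluster to the rest.
        new_row = {m: (size[i] * d[i][m] + size[j] * d[j][m]) / size[k]
                   for m in others}
        del d[i]
        del d[j]
        for m in others:
            del d[m][i]
            del d[m][j]
            d[m][k] = new_row[m]
        d[k] = new_row
    (root,) = d
    return root, height

# Hypothetical ultrametric example: A and B are close, C is distant from both.
example = {"A": {"B": 2, "C": 6}, "B": {"A": 2, "C": 6}, "C": {"A": 6, "B": 6}}
tree, height = upgma(example)
```

On this example A and B are merged first, and C joins them at the root.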

  18. Molecular Clock • UPGMA implicitly assumes that all distances measure time in the same way [Figure: two trees over leaves 1, 2, 3, 4]

  19. Additivity • A weaker requirement is additivity • In a “real” tree, distances between species are the sum of distances between intermediate nodes [Figure: tree with leaves i, j, k and edge lengths a, b, c]

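  20. Consequences of Additivity • Suppose input distances are additive • For any three leaves $i$, $j$, $k$ whose paths meet at an internal node $m$, with edge lengths $a = d(i,m)$, $b = d(j,m)$, $c = d(k,m)$: $d(i,j) = a + b$, $d(i,k) = a + c$, $d(j,k) = b + c$ • Thus $d(k,m) = \frac{1}{2}\left(d(i,k) + d(j,k) - d(i,j)\right)$ [Figure: leaves i, j, k joined at internal node m by edges of lengths a, b, c]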
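Under additivity, the three pairwise distances among leaves i, j, k determine the lengths of the three branches that meet at their shared internal node m. A small Python sketch of that calculation follows; the function name is hypothetical, and the assignment of a, b, c to the branches of i, j, k respectively is an assumption about the figure's labeling.

```python
def star_branch_lengths(d_ij, d_ik, d_jk):
    """Branch lengths from leaves i, j, k to their shared internal node m.

    Assumes the distances are additive, so that
    d_ij = a + b, d_ik = a + c, d_jk = b + c.
    """
    a = 0.5 * (d_ij + d_ik - d_jk)  # d(i, m)
    b = 0.5 * (d_ij + d_jk - d_ik)  # d(j, m)
    c = 0.5 * (d_ik + d_jk - d_ij)  # d(k, m)
    return a, b, c

# Branches of length 1, 2, 3 give pairwise distances 3, 4, 5.
a, b, c = star_branch_lengths(3, 4, 5)
```

Solving the three equations this way is what lets a distance method replace two leaves by their parent node while keeping distances to all other leaves consistent.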

  21. Neighbor Joining • Can we use this fact to construct trees? • Let $D(i,j) = d(i,j) - (r_i + r_j)$, where $r_i = \frac{1}{|L| - 2} \sum_{k \in L} d(i,k)$ Theorem: if $D(i,j)$ is minimal (among all pairs of leaves), then $i$ and $j$ are neighbors in the tree

  22. Neighbor Joining • Set L to contain all leaves Iteration: • Choose i,j such that D(i,j) is minimal • Create new node k, and set $d(k,m) = \frac{1}{2}\left(d(i,m) + d(j,m) - d(i,j)\right)$ for every other node $m \in L$ • Remove i,j from L, and add k Terminate: when |L| = 2, connect the two remaining nodes
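The iteration can be sketched in Python as follows. This is a minimal sketch rather than a definitive implementation: the function name, the dict-of-dicts input, and the edge-list output are illustrative choices, and the criterion and branch-length formulas are the standard textbook ones (the pair minimizing d(i,j) - r_i - r_j is joined, where r_i is the sum of distances from i divided by |L| - 2), which the slides may state in a different but equivalent form.

```python
from itertools import combinations

def neighbor_joining(dist):
    """Neighbor-joining sketch. dist is a symmetric dict-of-dicts.

    Returns a list of edges (node, node, length); internal nodes are
    named by the tuple of the two nodes they join.
    """
    d = {x: dict(row) for x, row in dist.items()}  # working copy
    edges = []
    while len(d) > 2:
        n = len(d)
        # r[x]: summed distance from x to all other nodes, scaled by 1/(n-2).
        r = {x: sum(d[x][y] for y in d if y != x) / (n - 2) for x in d}
        i, j = min(combinations(list(d), 2),
                   key=lambda p: d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        k = (i, j)
        d_ik = 0.5 * (d[i][j] + r[i] - r[j])
        edges.append((i, k, d_ik))
        edges.append((j, k, d[i][j] - d_ik))
        others = [m for m in d if m not in (i, j)]
        new_row = {m: 0.5 * (d[i][m] + d[j][m] - d[i][j]) for m in others}
        del d[i]
        del d[j]
        for m in others:
            del d[m][i]
            del d[m][j]
            d[m][k] = new_row[m]
        d[k] = new_row
    x, y = d  # the two remaining nodes
    edges.append((x, y, d[x][y]))
    return edges

# Hypothetical additive example: A and B hang off one internal node
# (branches 1 and 2), C and D off another (branches 3 and 4), and the
# two internal nodes are joined by an edge of length 1.
example = {
    "A": {"B": 3, "C": 5, "D": 6},
    "B": {"A": 3, "C": 6, "D": 7},
    "C": {"A": 5, "B": 6, "D": 7},
    "D": {"A": 6, "B": 7, "C": 7},
}
edges = neighbor_joining(example)
```

Because the example distances are additive, the recovered branch lengths match the tree used to generate them.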

  23. Distance-Based Methods • If we make strong assumptions about the distances, we can reconstruct trees • In real life, distances are not additive • Sometimes they are close to additive

  24. Parsimony • Character-based method Assumptions: • Independence of characters (no interactions) • Best tree is one where minimal changes take place

  25. Simple Example • Suppose we have five species, such that three have ‘C’ and two ‘T’ at a specified position • Minimal tree has one evolutionary change [Figure: tree over the five species (leaves C, C, C, T, T) with a single T → C change marked on one edge]

  26. Another Example • What is the parsimony score of [Figure: tree with leaves Aardvark, Bison, Chimp, Dog, Elephant] A: CAGGTA B: CAGACA C: CGGGTA D: TGCACT E: TGCGTA

  27. Evaluating Parsimony Scores • How do we compute the Parsimony score for a given tree? • Weighted Parsimony • Each change is weighted by the score c(a,b)

  28. Evaluating Parsimony Scores Dynamic programming on the tree Initialization: • For each leaf $i$ set $S(i,a) = 0$ if $i$ is labeled by $a$, otherwise $S(i,a) = \infty$ Iteration: • If $k$ is a node with children $i$ and $j$, then $S(k,a) = \min_b \left(S(i,b) + c(a,b)\right) + \min_b \left(S(j,b) + c(a,b)\right)$ Termination: • Cost of the tree is $\min_a S(r,a)$, where $r$ is the root
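The recursion above translates directly into a short Python sketch. The names `sankoff` and `parsimony_score`, the nested-tuple tree encoding, and the default unit cost are assumptions made for illustration; the tree topologies in the example are hypothetical, since the trees on the slides are given only as figures.

```python
import math

def sankoff(tree, leaf_state, alphabet, cost):
    """Minimum weighted-parsimony cost of one character on a rooted binary tree.

    tree: a leaf name (str) or a pair (left, right) of subtrees.
    leaf_state: maps each leaf name to its observed character.
    """
    def S(node):
        if isinstance(node, str):  # initialization at the leaves
            return {a: 0.0 if leaf_state[node] == a else math.inf
                    for a in alphabet}
        left, right = (S(child) for child in node)
        return {a: min(left[b] + cost(a, b) for b in alphabet)
                   + min(right[b] + cost(a, b) for b in alphabet)
                for a in alphabet}
    return min(S(tree).values())  # termination at the root

def unit_cost(a, b):
    """Unweighted parsimony: every change costs 1."""
    return 0 if a == b else 1

def parsimony_score(tree, seqs, alphabet="ACGT", cost=unit_cost):
    """Sum the per-position costs, treating characters as independent."""
    length = len(next(iter(seqs.values())))
    return sum(sankoff(tree, {n: s[p] for n, s in seqs.items()}, alphabet, cost)
               for p in range(length))

# Five species, three with 'C' and two with 'T' at one position.
states = {"s1": "C", "s2": "C", "s3": "C", "s4": "T", "s5": "T"}
grouped = ((("s1", "s2"), "s3"), ("s4", "s5"))    # hypothetical topology
scattered = ((("s1", "s4"), "s2"), ("s3", "s5"))  # hypothetical topology
one_change = sankoff(grouped, states, "ACGT", unit_cost)
two_changes = sankoff(scattered, states, "ACGT", unit_cost)
```

Grouping the two ‘T’ species together needs a single change, while scattering them needs two; `parsimony_score` applies the same computation to every position of a set of aligned sequences.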

  29. Example [Figure: tree with leaves Aardvark, Bison, Chimp, Dog, Elephant] A: CAGGTA B: CAGACA C: CGGGTA D: TGCACT E: TGCGTA

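  30. Cost of Evaluating Parsimony • If there are n nodes, m characters, and k possible values for each character, then the complexity is O(nmk) with unit costs; with general weights c(a,b), each entry S(·,a) takes a minimum over k values of b, giving O(nmk²) • Using this procedure, we can reconstruct the most parsimonious values at each ancestor node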
