
Phylogenies and the Tree of Life


yelena

Presentation Transcript


  1. Phylogenies and the Tree of Life
     Basic Principles of Phylogenetics: Parsimony - Distance - Likelihood; Topologies - Super Trees - Testing; Networks; Challenges
     Empirical Investigations: Molecular Clock; Biochemical rates; Selection Strength; Tree shapes; Branching Patterns; Rootings
     Open Questions

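  2. Central Principles of Phylogeny Reconstruction
     [Figure: one four-taxon tree (s1, s2, s3, s4) scored under three criteria. Parsimony: branches labelled with 0, 1 and 2 changes, total weight 3. Distance: branch lengths fitted to pairwise distances. Likelihood: L = 3.1 × 10^-7, with parameter estimates. Leaf sequences: TTCAGT, TCCAGT, GCCAAT, GCCAAT.]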

  3. From Distance to Phylogenies
     What is the relationship of a, b, c, d & e?
     Pairwise distances (upper triangle: molecular clock; lower triangle: no molecular clock):
          a   b   c   d   e
     a    -  22  10  22  22
     b    7   -  22  16  14
     c    7   8   -  22  22
     d   12  13   9   -  16
     e   13  14  10  13   -
     [Figure: the clock tree and the non-clock tree reconstructed from these distances, with branch lengths.]
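As one way to go from the clock distances on this slide to a tree, here is a minimal sketch of UPGMA clustering. UPGMA is an assumption on our part (the slide does not name the method), and the function name `upgma` and the dictionary layout are illustrative only.

```python
from itertools import combinations

def upgma(labels, dist):
    """Cluster leaves with UPGMA; return a list of (members, node_height).

    `dist` maps frozenset({x, y}) to the distance between leaves x and y.
    """
    clusters = [(x,) for x in labels]
    merges = []

    def avg(c1, c2):
        # Average distance over all leaf pairs between the two clusters.
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        height = avg(c1, c2) / 2  # under a clock, the node sits halfway
        clusters.remove(c1)
        clusters.remove(c2)
        merged = tuple(sorted(c1 + c2))
        clusters.append(merged)
        merges.append((merged, height))
    return merges

# Upper triangle of the slide's matrix (the clock-like distances).
CLOCK = {
    ("a", "b"): 22, ("a", "c"): 10, ("a", "d"): 22, ("a", "e"): 22,
    ("b", "c"): 22, ("b", "d"): 16, ("b", "e"): 14,
    ("c", "d"): 22, ("c", "e"): 22, ("d", "e"): 16,
}
clock_dist = {frozenset(pair): value for pair, value in CLOCK.items()}
merges = upgma("abcde", clock_dist)
for members, height in merges:
    print(members, height)
```

On these distances the sketch joins a with c first, then b with e, then d with (b, e), and finally the two remaining groups.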

  4. Enumerating Trees: Unrooted & valency 3
     Recursion: T_n = (2n-5) T_{n-1}
     Initialisation: T_1 = T_2 = T_3 = 1
     [Figure: the unrooted trees on up to 5 labelled leaves.]
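The recursion on this slide can be evaluated directly. The sketch below is only an illustration; the function name `num_unrooted_trees` is ours.

```python
def num_unrooted_trees(n):
    """Number of unrooted, valency-3 trees on n labelled leaves.

    Uses T_n = (2n - 5) * T_{n-1} with T_1 = T_2 = T_3 = 1.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    count = 1
    for k in range(4, n + 1):
        count *= 2 * k - 5
    return count

for n in (3, 4, 5, 10):
    print(n, num_unrooted_trees(n))
```

The count grows so quickly that exhaustive enumeration is only feasible for small n, which motivates heuristic searches in tree space.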

  5. Heuristic Searches in Tree Space
     Nearest Neighbour Interchange
     Subtree regrafting
     Subtree rerooting and regrafting
     [Figure: each move illustrated on subtrees T1-T4 and on leaves s1-s6.]
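A minimal sketch of Nearest Neighbour Interchange around a single internal edge. The representation is an assumption made for illustration: an edge is written as a pair of pairs, ((T1, T2), (T3, T4)), meaning subtrees T1 and T2 hang on one side and T3 and T4 on the other.

```python
def nni_neighbours(edge):
    """Return the two topologies reachable by one NNI across `edge`.

    `edge` is ((a, b), (c, d)): subtrees a, b on one side, c, d on the other.
    Swapping b with c, or b with d, gives the two alternative groupings.
    """
    (a, b), (c, d) = edge
    return [((a, c), (b, d)), ((a, d), (b, c))]

start = (("T1", "T2"), ("T3", "T4"))
neighbours = nni_neighbours(start)
for topology in neighbours:
    print(topology)
```

Each internal edge yields two neighbours, so a tree with n leaves has 2(n-3) NNI neighbours in total.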

  6. Assignment to internal nodes: The simple way.
     [Figure: a tree with leaves labelled by nucleotides and internal nodes marked "?".]
     What is the cheapest assignment of nucleotides to internal nodes, given some (symmetric) distance function d(N1,N2)?
     If there are k leaves, there are k-2 internal nodes and 4^(k-2) possible assignments of nucleotides. For k=22, this is more than 10^12.
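A quick check of the count quoted on this slide, assuming four nucleotides and k-2 internal nodes. The function name is illustrative.

```python
def num_assignments(k, alphabet_size=4):
    """Number of ways to label the k - 2 internal nodes of a k-leaf tree."""
    return alphabet_size ** (k - 2)

count_22 = num_assignments(22)
print(count_22)            # 4**20
print(count_22 > 10**12)
```

Trying every assignment is therefore impractical, which is why the dynamic programming on the following slides works node by node instead.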

  7. 5S RNA Alignment & Phylogeny (Hein, 1990)
     Transitions 2, transversions 5. Total weight 843.
     [Figure: phylogeny with leaves labelled 3, 5, 4, 6, 13, 11, 9, 7, 15, 17, 14, 10, 12, 16, 8, 2, 1.]
     Alignment (sequence number, then aligned sequence):
     10 tatt-ctggtgtcccaggcgtagaggaaccacaccgatccatctcgaacttggtggtgaaactctgccgcggt--aaccaatact-cg-gg-gggggccct-gcggaaaaatagctcgatgccagga--ta
     17 t--t-ctggtgtcccaggcgtagaggaaccacaccaatccatcccgaacttggtggtgaaactctgctgcggt--ga-cgatact-tg-gg-gggagcccg-atggaaaaatagctcgatgccagga--t-
      9 t--t-ctggtgtctcaggcgtggaggaaccacaccaatccatcccgaacttggtggtgaaactctattgcggt--ga-cgatactgta-gg-ggaagcccg-atggaaaaatagctcgacgccagga--t-
     14 t----ctggtggccatggcgtagaggaaacaccccatcccataccgaactcggcagttaagctctgctgcgcc--ga-tggtact-tg-gg-gggagcccg-ctgggaaaataggacgctgccag-a--t-
      3 t----ctggtgatgatggcggaggggacacacccgttcccataccgaacacggccgttaagccctccagcgcc--aa-tggtact-tgctc-cgcagggag-ccgggagagtaggacgtcgccag-g--c-
     11 t----ctggtggcgatggcgaagaggacacacccgttcccataccgaacacggcagttaagctctccagcgcc--ga-tggtact-tg-gg-ggcagtccg-ctgggagagtaggacgctgccag-g--c-
      4 t----ctggtggcgatagcgagaaggtcacacccgttcccataccgaacacggaagttaagcttctcagcgcc--ga-tggtagt-ta-gg-ggctgtccc-ctgtgagagtaggacgctgccag-g--c-
     15 g----cctgcggccatagcaccgtgaaagcaccccatcccat-ccgaactcggcagttaagcacggttgcgcccaga-tagtact-tg-ggtgggagaccgcctgggaaacctggatgctgcaag-c--t-
      8 g----cctacggccatcccaccctggtaacgcccgatctcgt-ctgatctcggaagctaagcagggtcgggcctggt-tagtact-tg-gatgggagacctcctgggaataccgggtgctgtagg-ct-t-
     12 g----cctacggccataccaccctgaaagcaccccatcccgt-ccgatctgggaagttaagcagggttgagcccagt-tagtact-tg-gatgggagaccgcctgggaatcctgggtgctgtagg-c--t-
      7 g----cttacgaccatatcacgttgaatgcacgccatcccgt-ccgatctggcaagttaagcaacgttgagtccagt-tagtact-tg-gatcggagacggcctgggaatcctggatgttgtaag-c--t-
     16 g----cctacggccatagcaccctgaaagcaccccatcccgt-ccgatctgggaagttaagcagggttgcgcccagt-tagtact-tg-ggtgggagaccgcctgggaatcctgggtgctgtagg-c--t-
      1 a----tccacggccataggactctgaaagcactgcatcccgt-ccgatctgcaaagttaaccagagtaccgcccagt-tagtacc-ac-ggtgggggaccacgcgggaatcctgggtgctgt-gg-t--t-
     18 a----tccacggccataggactctgaaagcaccgcatcccgt-ccgatctgcgaagttaaacagagtaccgcccagt-tagtacc-ac-ggtgggggaccacatgggaatcctgggtgctgt-gg-t--t-
      2 a----tccacggccataggactgtgaaagcaccgcatcccgt-ctgatctgcgcagttaaacacagtgccgcctagt-tagtacc-at-ggtgggggaccacatgggaatcctgggtgctgt-gg-t--t-
      5 g---tggtgcggtcataccagcgctaatgcaccggatcccat-cagaactccgcagttaagcgcgcttgggccagaa-cagtact-gg-gatgggtgacctcccgggaagtcctggtgccgcacc-c--c-
     13 g----ggtgcggtcataccagcgttaatgcaccggatcccat-cagaactccgcagttaagcgcgcttgggccagcc-tagtact-ag-gatgggtgacctcctgggaagtcctgatgctgcacc-c--t-
      6 g----ggtgcgatcataccagcgttaatgcaccggatcccat-cagaactccgcagttaagcgcgcttgggttggag-tagtact-ag-gatgggtgacctcctgggaagtcctaatattgcacc-c-tt-

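  8. Cost of a history: minimizing over internal states
     [Figure: a node and its children, each carrying one cost per nucleotide A, C, G, T.]
     Example term for one pair of states: d(C,G) + w_C(left subtree), where w_C is the cheapest cost of the left subtree with C at its root. The minimum is taken over the four states.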

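  9. Cost of a history: leaves (initialisation).
     Initialisation at the leaves: Cost(N) = 0 if N is the nucleotide observed at the leaf, otherwise infinity.
     [Figure: leaves G and A with cost vectors over A, C, G, T; the empty subtrees below each leaf have cost 0.]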

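  10. Fitch-Hartigan-Sankoff Algorithm
      Each node stores a cost vector over (A,C,G,T): the cost of the cheapest tree hanging from this node given there is, for example, a "C" at this node.
      Example with substitution costs 2 and 5 (* = infinity):
      Leaf C: (A,C,G,T) = (*,0,*,*)
      Leaf T: (A,C,G,T) = (*,*,*,0)
      Leaf G: (A,C,G,T) = (*,*,0,*)
      Node above C and T: (A,C,G,T) = (10,2,10,2)
      Root: (A,C,G,T) = (9,7,7,7)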
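The cost vectors on this slide can be reproduced with a short sketch of the Sankoff recursion. Two things are assumed here: the cost function charges 2 for a transition (A<->G, C<->T) and 5 for a transversion, as on the 5S RNA slide, and the tree is ((C, T), G). Names such as `sankoff` and `subst_cost` are illustrative.

```python
INF = float("inf")
NUCS = "ACGT"
PURINES = {"A", "G"}

def subst_cost(x, y):
    """Assumed cost function: 0 if equal, 2 for transitions, 5 for transversions."""
    if x == y:
        return 0
    same_class = (x in PURINES) == (y in PURINES)
    return 2 if same_class else 5

def sankoff(tree):
    """Return the cost vector over (A, C, G, T) for the root of `tree`.

    A tree is either a leaf nucleotide (str) or a tuple of subtrees.
    """
    if isinstance(tree, str):
        # Leaf initialisation: 0 for the observed nucleotide, infinity otherwise.
        return [0 if n == tree else INF for n in NUCS]
    child_vectors = [sankoff(child) for child in tree]
    return [
        sum(
            min(subst_cost(n, m) + vec[j] for j, m in enumerate(NUCS))
            for vec in child_vectors
        )
        for n in NUCS
    ]

inner = sankoff(("C", "T"))
root = sankoff((("C", "T"), "G"))
print(inner)
print(root)
print(min(root))  # cost of the cheapest history
```

The work per node is a constant (16 state pairs per child), so the whole tree is handled in time linear in the number of nodes rather than exponential in it.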

  11. The Felsenstein Zone
      Felsenstein-Cavender (1979)
      [Figure: the true tree and the reconstructed tree on s1, s2, s3, s4.]
      Patterns: 16 binary site patterns over the four leaves, only 8 shown.

  12. Bootstrapping
      Felsenstein (1985)
      [Figure: the columns of the original alignment (four sequences of 11 sites, each drawn as ATCTGTAGTCT) are resampled with replacement to give pseudo-alignments 1, 2, ..., 500, and a tree on taxa 1-4 is built from each. Example of how often each column is drawn in one replicate: 1 0 2 3 0 1 0 1 2 0 1.]
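A minimal sketch of the column resampling step of the bootstrap. The tree-building step applied to each replicate is left out, the toy alignment simply repeats the row drawn on the slide, and the helper name `bootstrap_columns` and the seed are our own choices.

```python
import random
from collections import Counter

def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement.

    Returns the pseudo-alignment and, for each original column,
    the number of times it was drawn.
    """
    n_cols = len(alignment[0])
    drawn = [rng.randrange(n_cols) for _ in range(n_cols)]
    replicate = ["".join(row[c] for c in drawn) for row in alignment]
    counts = Counter(drawn)
    multiplicities = [counts[c] for c in range(n_cols)]
    return replicate, multiplicities

alignment = ["ATCTGTAGTCT"] * 4   # placeholder rows, as drawn on the slide
rng = random.Random(0)            # fixed seed so the run is repeatable
replicates = [bootstrap_columns(alignment, rng) for _ in range(500)]

first_alignment, first_counts = replicates[0]
print(first_alignment)
print(first_counts)
```

Support for a branch would then be the fraction of the 500 replicate trees that contain it.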

  13. Assignment to internal nodes: The simple way.
      [Figure: a tree with leaves labelled by nucleotides and internal nodes marked "?".]
      If branch lengths and the evolutionary process are known, what is the probability of the nucleotides at the leaves?
      Example alignment with one column set apart:
      Cctacggccatacca a ccctgaaagcaccccatcccgt
      Cttacgaccatatca c cgttgaatgcacgccatcccgt
      Cctacggccatagca c ccctgaaagcaccccatcccgt
      Cccacggccatagga c ctctgaaagcactgcatcccgt
      Tccacggccatagga a ctctgaaagcaccgcatcccgt
      Ttccacggccatagg c actgtgaaagcaccgcatcccg
      Tggtgcggtcatacc g agcgctaatgcaccggatccca
      Ggtgcggtcatacca t gcgttaatgcaccggatcccat

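  14. Probability of leaf observations: summing over internal states
      [Figure: a node and its children, each carrying one probability per nucleotide A, C, G, T.]
      Example term for one pair of states: P(C→G) * P_C(left subtree). The sum is taken over the four states.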
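The same recursion as for parsimony, with sums and products in place of minima and sums, gives the probability of the leaf observations. The sketch below assumes the Jukes-Cantor substitution model with uniform base frequencies; the slide does not specify a model, and the branch lengths are made up for illustration.

```python
import math
from itertools import product

NUCS = "ACGT"

def jc_prob(x, y, t):
    """Jukes-Cantor probability of x -> y along a branch of length t."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay

def partial(tree):
    """Map each nucleotide n to P(leaves below | this node is n).

    A tree is a leaf nucleotide (str) or a tuple of (subtree, branch_length).
    """
    if isinstance(tree, str):
        return {n: 1.0 if n == tree else 0.0 for n in NUCS}
    result = {n: 1.0 for n in NUCS}
    for child, t in tree:
        below = partial(child)
        for n in NUCS:
            # Sum over the child's state m: P(n -> m) * P_m(child subtree).
            result[n] *= sum(jc_prob(n, m, t) * below[m] for m in NUCS)
    return result

def likelihood(tree):
    """Probability of the leaf pattern, with a uniform root distribution."""
    at_root = partial(tree)
    return sum(0.25 * at_root[n] for n in NUCS)

def example_tree(x, y, z):
    """Hypothetical 3-leaf tree ((x:0.1, y:0.2):0.05, z:0.3)."""
    return ((((x, 0.1), (y, 0.2)), 0.05), (z, 0.3))

print(likelihood(example_tree("C", "T", "G")))
total = sum(likelihood(example_tree(x, y, z)) for x, y, z in product(NUCS, repeat=3))
print(total)  # probabilities over all leaf patterns should add up to 1
```

For an alignment, the per-column likelihoods are multiplied together (or their logarithms added), assuming sites evolve independently.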

  15. Output from Likelihood Method.
      Molecular Clock (n-1 heights estimated; duplication times before now): 23 ± 5.2, 10.9 ± 2.1, 11.6 ± 2.1, 3.9 ± 0.8. Likelihood: 7.9 × 10^-14. Parameter estimates: 0.31, 0.18.
      No Molecular Clock (2n-3 lengths estimated; amount of evolution): 9.9 ± 1.2, 12 ± 2.2, 4.1 ± 0.7, 11.1 ± 1.8, 11.4 ± 1.9, 6.9 ± 1.3, 5.9 ± 1.2. Likelihood: 6.2 × 10^-12. Parameter estimates: 0.34, 0.16.
      Test: 2[ln(6.2 × 10^-12) - ln(7.9 × 10^-14)] is χ²-distributed with (2n-3)-(n-1) = n-2 degrees of freedom.
      [Figure: both trees on s1-s5.]
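The clock test on this slide can be worked through numerically. The sketch assumes the standard likelihood ratio statistic, 2(ln L_noclock - ln L_clock), and n = 5 sequences (s1 to s5); 7.815 is the tabulated 95% point of the χ² distribution with 3 degrees of freedom.

```python
import math

def lrt_statistic(lik_null, lik_alt):
    """Likelihood ratio statistic 2 * (ln L_alt - ln L_null)."""
    return 2.0 * (math.log(lik_alt) - math.log(lik_null))

n = 5                                  # sequences s1..s5 on the slide
lik_clock = 7.9e-14                    # null model: molecular clock
lik_no_clock = 6.2e-12                 # alternative: free branch lengths

stat = lrt_statistic(lik_clock, lik_no_clock)
df = (2 * n - 3) - (n - 1)             # 7 lengths minus 4 heights
CHI2_95_DF3 = 7.815                    # tabulated critical value, 3 d.f.
reject_clock = stat > CHI2_95_DF3

print(round(stat, 3), df, reject_clock)
```

With these numbers the statistic is a little above the 5% critical value, so under the stated assumptions the clock would be rejected at that level, though not by a wide margin.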

  16. The Molecular Clock
      First noted by Zuckerkandl & Pauling (1964) as an empirical fact. How can one detect it?
      [Figure: two cases. Known ancestor a at time t, with descendants s1 and s2. Unknown ancestors, with s1, s2 and s3.]

  17. Rootings
      Purpose:
      1) To give time direction in the phylogeny & most ancient point.
      2) To be able to define concepts such as monophyletic group.
      Techniques:
      1) Outgroup: Enhance the data set with a sequence from a species definitely distant to all of them. It will be joined at the root of the original data.
      2) Midpoint: Find the midpoint of the longest path in the tree.
      3) Assume Molecular Clock.
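Midpoint rooting can be sketched in a few lines. The unrooted tree is stored as an adjacency dictionary, and the example tree, its branch lengths and the function names are all hypothetical.

```python
def farthest(adj, start):
    """Return (distance, path) from `start` to the node farthest from it."""
    best = (0.0, [start])
    stack = [(start, None, 0.0, [start])]
    while stack:
        node, parent, dist, path = stack.pop()
        if dist > best[0]:
            best = (dist, path)
        for nxt, length in adj[node].items():
            if nxt != parent:
                stack.append((nxt, node, dist + length, path + [nxt]))
    return best

def midpoint(adj):
    """Return (u, v, offset): the root lies on edge u-v, `offset` away from u."""
    # Two sweeps find the longest path in a tree with positive lengths.
    _, path = farthest(adj, next(iter(adj)))
    total, path = farthest(adj, path[-1])
    walked = 0.0
    for u, v in zip(path, path[1:]):
        if walked + adj[u][v] >= total / 2:
            return u, v, total / 2 - walked
        walked += adj[u][v]

# Hypothetical unrooted tree: leaves a, b, c, d; internal nodes x, y.
edges = [("a", "x", 1), ("b", "x", 3), ("x", "y", 2), ("y", "c", 1), ("y", "d", 4)]
adj = {}
for u, v, w in edges:
    adj.setdefault(u, {})[v] = w
    adj.setdefault(v, {})[u] = w

root_edge = midpoint(adj)
print(root_edge)
```

Here the longest path runs between b and d (length 9), so the root is placed 4.5 from either end, which falls on the internal edge between x and y.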

  18. Rooting the 3 kingdoms
      3 billion years ago: no reliable clock, no outgroup.
      Given 2 sets of homologous proteins, i.e. MDH & LDH, can the archaea, prokarya and eukarya be rooted?
      [Figure: trees over A, P and E for LDH, for MDH, and for the combined LDH/MDH data, with the root position marked "Root??".]

  19. The generation/year-time clock (Langley-Fitch, 1973)
      Absolute Time Clock: {l1 = l2 < l3} on the tree of s1, s2, s3, rooted by some rooting technique.
      Generation Time Clock: [Figure: Elephant and Mouse over 100 Myr, with the labels "constant" and "variable" for the absolute time clock and the generation time.]

  20. The generation/year-time clock (Langley-Fitch, 1973)
      Assume a data set: 3 species, 2 sequences each.
      [Figure: trees on s1, s2, s3 under "Generation Time Clock" and under "Any Tree".]
      Can the generation time clock be tested?

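  21. The generation/year-time clock (Langley-Fitch, 1973)
      Degrees of freedom (dg) for k species and t genes:
      Clock tree (l1 = l2, l3): k=3: dg = 2; in general dg = k-1.
      Unconstrained tree (l1, l2, l3): k=3: dg = 3; in general dg = 2k-3.
      Generation time clock (a second gene has lengths c*l1, c*l2, c*l3): k=3, t=2: dg = 4; in general dg = (2k-3)+(t-1).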
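The parameter counts on this slide can be written as small functions. One assumption is flagged: the generation time count adds one scale factor c per extra gene to the 2k-3 branch lengths, which is the reading that reproduces the slide's own example (dg = 4 for k = 3, t = 2). The function names are illustrative.

```python
def dof_clock(k):
    """Free parameters of a clock tree on k species: k - 1 node heights."""
    return k - 1

def dof_unconstrained(k):
    """Free parameters of an unrooted tree on k species: 2k - 3 branch lengths."""
    return 2 * k - 3

def dof_generation_time(k, t):
    """Assumed count: 2k - 3 shared lengths plus one scale factor per extra gene."""
    return (2 * k - 3) + (t - 1)

print(dof_clock(3), dof_unconstrained(3), dof_generation_time(3, 2))
```

Differences between these counts give the degrees of freedom for comparing the nested models with a likelihood ratio test.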

  22. α- & β-globin, cytochrome c, fibrinopeptide A & generation time clock (Langley-Fitch, 1973)
      Fibrinopeptide A phylogeny: Rat, Pig, Dog, Cow, Goat, Human, Horse, Rabbit, Llama, Sheep, Gibbon, Monkey, Donkey, Gorilla
      Relative rates:
      α-globin          0.342
      β-globin          0.452
      cytochrome c      0.069
      fibrinopeptide A  0.137

  23. Almost Clocks
      References: M.J. Sanderson (1997) "A Nonparametric Approach to Estimating Divergence Times in the Absence of Rate Constancy" Mol.Biol.Evol. 14(12).1218-31; J.L. Thorne et al. (1998) "Estimating the Rate of Evolution of the Rate of Evolution" Mol.Biol.Evol. 15(12).1647-57; J.P. Huelsenbeck et al. (2000) "A Compound Poisson Process for Relaxing the Molecular Clock" Genetics 154.1879-92.
      I Smoothing a non-clock tree onto a clock tree (Sanderson).
      II Rate of Evolution of the rate of Evolution (Thorne et al.). The rate of evolution can change at each bifurcation.
      III Relaxed Molecular Clock (Huelsenbeck et al.). At random points in time, the rate changes by multiplying with a random variable (gamma distributed).
      Comment: Makes perfect sense. Testing no clock versus perfect clock is choosing between two unrealistic extremes.

  24. Spannoids
      [Figure: four ways to connect the same labelled points: spanning tree, Steiner tree, 1-spannoid, 2-spannoid.]
      Advantage: Decomposes large trees into small trees.
      Questions: How to find the optimal spannoid? How well do they approximate?

  25. Profiloids and Staroids
      [Figure: an ideal large phylogeny on s1, s2, ..., sk, approximated by profile HMMs (HMM1, HMM2, HMM3). A phylogeny of profiles is a staroid.]
      Questions: Parameter changes on edges relating HMMs. Choosing optimal staroids.
