
Phylogeny


jana


  1. Phylogeny

  2. Reconstructing a phylogeny
     • The phylogenetic tree (phylogeny) describes the evolutionary relationships among the studied data
     • The data must consist of homologous types
     • In molecular evolution, the studied data are homologous DNA or amino acid (AA) sequences
     • Phylogeny reconstruction explicitly assumes that the sequences are aligned: INPUT = MSA (multiple sequence alignment)

  3. Reminder: MSA and phylogeny are dependent
     [Diagram labels: Unaligned sequences; Sequence alignment; MSA; Phylogeny reconstruction; Inaccurate guide tree]

  4. Phylogeny representation
     Textual representation (Newick format): ((A,C),(B,D));
     Visual representation: [Figure: tree with leaves A, C, B, D]
     • Each pair of parentheses () encloses a clade in the tree
     • A comma “,” separates the members of the corresponding clade
     • A semicolon “;” is always the last character
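The three Newick rules above can be illustrated with a small parser. This is a minimal sketch, assuming a tree without branch lengths or internal node labels; the function name `parse_newick` and the nested-tuple output are choices made here for illustration, not part of the slides.

```python
def parse_newick(text):
    """Parse a Newick string (no branch lengths) into nested tuples of leaf names."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # "(" opens a clade
            children = [parse_node()]
            while text[pos] == ",":  # "," separates the members of the clade
                pos += 1
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # ")" closes the clade
            return tuple(children)
        # Otherwise read a leaf name up to the next structural character.
        start = pos
        while text[pos] not in ",();":
            pos += 1
        return text[start:pos].strip()

    tree = parse_node()
    if text[pos] != ";":
        raise ValueError("unexpected text after the tree")
    return tree


tree = parse_newick("((A,C),(B,D));")
print(tree)  # (('A', 'C'), ('B', 'D'))
```

Each nested tuple corresponds to one pair of parentheses, i.e. to one clade.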

  5. Some terminology
     [Figure labels: root; monophyletic group (clade); internal nodes; external nodes (leaves); internal branches (splits); external branches; neighbors]

  6. Swapping neighbors is meaningless
     (Gorilla,(Human,Chimp)) = (Gorilla,(Chimp,Human)) = ((Human,Chimp),Gorilla) = ((Chimp,Human),Gorilla)
     [Figure: the same tree of Gorilla, Human and Chimp drawn four times with neighbors swapped]
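One way to check that swapping neighbors does not change a tree is to compare trees in an order-independent form. The sketch below assumes rooted trees written as nested tuples of leaf names (a representation chosen here for illustration); `canonical` is a hypothetical helper name.

```python
def canonical(tree):
    """Return an order-independent form of a rooted tree given as nested tuples."""
    if isinstance(tree, str):
        return tree  # a leaf is just its name
    # A frozenset forgets the left/right order of the children.
    return frozenset(canonical(child) for child in tree)


t1 = ("Gorilla", ("Human", "Chimp"))
t2 = ("Gorilla", ("Chimp", "Human"))
t3 = (("Human", "Chimp"), "Gorilla")
t4 = (("Chimp", "Human"), "Gorilla")
other = ("Human", ("Gorilla", "Chimp"))  # a genuinely different topology

print(canonical(t1) == canonical(t2) == canonical(t3) == canonical(t4))
print(canonical(t1) == canonical(other))
```

The first comparison holds for all four spellings from the slide, while the last tree groups Gorilla with Chimp and therefore compares as different.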

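  7. Rooted vs. unrooted
     [Figure: an unrooted tree of A, B, C with three possible root positions (1, 2, 3), and the three resulting rooted trees, which differ from one another (≠)]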

  8. In Newick format
     Unrooted tree: (A,B,C)
     Rooted trees: ((A,B),C) ≠ ((C,B),A) ≠ ((A,C),B)

  9. How can we root a tree?

  10. Rooting the tree based on a priori knowledge: using an outgroup
      [Figure: tree of Human, Chimp and Gorilla (INGROUP) rooted with Chicken (OUTGROUP)]
      The outgroup should be close enough for detecting sequence homology, but far enough to be a clear outgroup
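Outgroup rooting can be sketched as placing the root on the branch that leads to the outgroup leaf. The example below is a minimal sketch, assuming the unrooted tree is stored as an adjacency dictionary; the internal node names `u` and `v` and the function name `root_with_outgroup` are hypothetical.

```python
def root_with_outgroup(adjacency, outgroup):
    """Root an unrooted tree on the branch leading to the outgroup leaf.

    adjacency maps every node to the set of its neighbors.
    Returns the rooted tree as nested tuples: (ingroup subtree, outgroup).
    """
    def subtree(node, parent):
        # Everything reachable from node without going back through parent.
        children = [n for n in sorted(adjacency[node]) if n != parent]
        if not children:
            return node  # a leaf
        return tuple(subtree(child, node) for child in children)

    (attachment,) = adjacency[outgroup]  # a leaf has exactly one neighbor
    return (subtree(attachment, outgroup), outgroup)


# Unrooted tree: Human and Chimp attach to u, Gorilla and Chicken attach to v.
unrooted = {
    "Human": {"u"}, "Chimp": {"u"}, "Gorilla": {"v"}, "Chicken": {"v"},
    "u": {"Human", "Chimp", "v"},
    "v": {"Gorilla", "Chicken", "u"},
}
rooted = root_with_outgroup(unrooted, "Chicken")
print(rooted)  # (('Gorilla', ('Chimp', 'Human')), 'Chicken')
```

Choosing a different outgroup leaf on the same unrooted tree would give a different rooted tree, which is why the choice relies on a priori knowledge.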

  11. The gene tree is not always identical to the species tree
      [Figure: a gene tree and a species tree over Human, Chimp, Gorilla and Chicken with different topologies (gene tree ≠ species tree)]

  12. Phylogeny reconstruction approaches
      Distance-based methods: Neighbor Joining
      [Figure: a star tree of A, B, C, D, E from which A and B are separated into a joint node A,B]
      The Minimum Evolution (ME) criterion: in each iteration we separate the two sequences that result in the minimal sum of branch lengths
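The pair-selection step of one Neighbor Joining iteration can be sketched as follows. The slide does not spell out a formula; this sketch assumes the standard NJ criterion Q(i,j) = (n-2)·d(i,j) - Σ_k d(i,k) - Σ_k d(j,k), whose minimum identifies the pair to separate. The distance matrix is a made-up illustrative example and `nj_pick_pair` is a hypothetical name.

```python
from itertools import combinations


def nj_pick_pair(names, dist):
    """Return the pair of taxa minimizing the Neighbor Joining Q criterion."""
    n = len(names)
    row_sum = {a: sum(dist[a][b] for b in names) for a in names}

    def q(pair):
        a, b = pair
        return (n - 2) * dist[a][b] - row_sum[a] - row_sum[b]

    return min(combinations(names, 2), key=q)


# Illustrative pairwise distances (assumed values, not taken from the slides).
names = ["A", "B", "C", "D", "E"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
dist = {a: dict(zip(names, row)) for a, row in zip(names, matrix)}

print(nj_pick_pair(names, dist))  # ('A', 'B')
```

Note that D and E have the smallest raw distance (3), yet A and B are chosen: the criterion corrects each distance by how far the two taxa are from everything else.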

  13. Phylogeny reconstruction approaches
      Topology search methods: MP, ML
      [Figure: four sequences (Seq 1 to Seq 4) and the three possible topologies, pairing 1,4 with 2,3; 1,2 with 3,4; and 1,3 with 2,4]
      Maximum Parsimony: finds the most parsimonious topology
      Maximum Likelihood: finds the most likely topology, i.e. the topology T that maximizes P(Data|T)
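For four sequences, the topology search can be done exhaustively. The sketch below scores the three topologies by maximum parsimony using Fitch's algorithm, which is one standard way to count the minimum number of changes; the slides do not name a specific algorithm. The alignment is made up for illustration, and trees are written as nested tuples.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one column."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: at least one extra change is needed at this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


# Toy alignment (assumed data) and the three possible topologies.
alignment = {"1": "AAGT", "2": "AAGC", "3": "CGGT", "4": "CGTC"}
topologies = [
    (("1", "2"), ("3", "4")),
    (("1", "3"), ("2", "4")),
    (("1", "4"), ("2", "3")),
]
scores = {tree: parsimony_score(tree, alignment) for tree in topologies}
best = min(scores, key=scores.get)
print(scores[best], best)  # 5 (('1', '2'), ('3', '4'))
```

Maximum likelihood follows the same search pattern but replaces the change count with P(Data|T) under a substitution model, which is considerably more expensive to compute.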

  14. Phylogeny reconstruction approaches: summary
      • Distance-based methods
        • Neighbor Joining (e.g., using ClustalX): fast, inaccurate
      • Topology search methods
        • Maximum parsimony (e.g., using MEGA): crude, questionable statistical basis
        • Maximum likelihood (e.g., using RAxML, PhyML): accurate, slow
      • Bayesian methods
        • Markov chain Monte Carlo (MCMC) (e.g., using MrBayes): most accurate, very slow

  15. How robust is our tree?
      [Figure: tree of Chimp, Human and Gorilla]

  16. Bootstrap for estimating robustness
      • We need some statistical way to estimate the confidence in the tree topology
      • But we don’t know anything about the distribution of tree topologies
      • The only data source we have is our data (MSA)
      • So, we must rely on our own resources: “pull up by your own bootstraps”

  17. Bootstrap
      1. Create n (100-1000) new MSAs (pseudo-MSAs) by randomly sampling K positions from our original MSA with replacement
      [Figure: original tree of Sp1, Sp2, Sp3, Sp4]
      Original MSA (positions 12345…K):
        1 : ATCTG…A
        2 : ATCTG…C
        3 : ACTTA…C
        4 : ACCTA…T
      Pseudo-MSA (sampled positions 11244…3):
        1 : AATTT…C
        2 : AATTT…C
        3 : AACTT…T
        4 : AACTT…C
      Pseudo-MSA (sampled positions 97478…10):
        1 : TTTTA…T
        2 : CATAC…A
        3 : CATAC…T
        4 : AGTGG…A
      Pseudo-MSA (sampled positions 51578…12):
        1 : GAGTA…T
        2 : GAGAC…G
        3 : AAAAC…A
        4 : AAAGG…C
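The resampling step can be sketched in a few lines. This is a minimal sketch, assuming the MSA is held as a dictionary of equal-length strings; the short toy alignment, the seed and the function name `pseudo_msa` are assumptions made for illustration.

```python
import random


def pseudo_msa(msa, rng):
    """Sample K columns with replacement from an MSA with K columns."""
    length = len(next(iter(msa.values())))
    # The same sampled column indices are applied to every sequence,
    # so whole alignment columns are resampled, not single characters.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in msa.items()}


# Toy alignment (assumed data); real MSAs are much longer.
msa = {"1": "ATCTGA", "2": "ATCTGC", "3": "ACTTAC", "4": "ACCTAT"}

rng = random.Random(0)  # fixed seed only to make the sketch reproducible
replicates = [pseudo_msa(msa, rng) for _ in range(100)]
print(len(replicates), len(replicates[0]["1"]))  # 100 6
```

Because sampling is with replacement, some original columns appear several times in a pseudo-MSA and others not at all, while the alignment length stays K.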

  18. Bootstrap
      2. Reconstruct a pseudo-tree from each pseudo-MSA with the same method used for reconstructing the original tree
      [Figure: the pseudo-MSAs (sampled positions 11244…3, 97478…10, 51578…12), each with a pseudo-tree over Sp1, Sp2, Sp3, Sp4, next to the original tree]

  19. Bootstrap
      3. For each split in our original tree, we count the number of times it appeared in the pseudo-trees
      [Figure: the original tree of Sp1, Sp2, Sp3, Sp4 annotated with support values of 67% and 100%, next to the pseudo-trees]
      In 67% of the pseudo-trees, the split between Sp1+Sp2 and the rest of the tree was found
      In general, bootstrap support < 80% is considered low
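Counting split support can be sketched as follows. This is a minimal sketch, assuming trees are written as nested tuples of leaf names and that a split is identified by the set of leaves on the side that does not contain an arbitrary reference leaf; the three pseudo-trees are made up so that the support comes out at 2 of 3 (about 67%).

```python
def leaves(tree):
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Return the non-trivial splits (internal branches) of a tree."""
    all_leaves = leaves(tree)
    reference = min(all_leaves)  # fixes which side of a split we record
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaves(node)
        side = all_leaves - clade if reference in clade else clade
        if 1 < len(side) < len(all_leaves) - 1:  # skip trivial splits
            found.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return found


def bootstrap_support(original, pseudo_trees):
    """Fraction of pseudo-trees containing each split of the original tree."""
    return {
        split: sum(split in splits(t) for t in pseudo_trees) / len(pseudo_trees)
        for split in splits(original)
    }


original = (("Sp1", "Sp2"), ("Sp3", "Sp4"))
pseudo_trees = [
    (("Sp1", "Sp2"), ("Sp3", "Sp4")),
    (("Sp2", "Sp1"), ("Sp4", "Sp3")),
    (("Sp1", "Sp3"), ("Sp2", "Sp4")),
]
support = bootstrap_support(original, pseudo_trees)
for split, value in support.items():
    print(sorted(split), f"{value:.0%}")
```

Here the split separating Sp1+Sp2 from Sp3+Sp4 is recorded as the set {Sp3, Sp4}; it appears in two of the three pseudo-trees.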

  20. ClustalX: NJ phylogeny reconstruction

  21. ClustalX: NJ phylogeny reconstruction

  22. http://phylobench.vital-it.ch/raxml-bb/

  23. Viewing the tree with njPlot

  24. Note: unrooted tree

  25. Defining an outgroup

  26. Swapping nodes

  27. Bootstrap support

  28. FigTree: tree visualization and figure creation http://tree.bio.ed.ac.uk/software/figtree/

  29. Reconstructing the tree of life

  30. Darwin’s vision of the tree of life from the Origin of Species

  31. The three-domain tree of life based on SSU rRNA MSA

  32. But the branching of several kingdoms remains in dispute

  33. Lateral Gene Transfer (LGT) challenges the conceptual basis of phylogenetic classification

  34. Methodology
      • Started with 36 genes universally present in 191 species (spanning all 3 domains of life), for which orthologs could be unambiguously identified
      • Eliminated 5 genes that are LGT suspects (mostly tRNA synthetases)
      • Constructed an MSA for each of the 31 orthogroups
      • Concatenated all 31 MSAs into a super-MSA of 8090 columns
      • Reconstructed the phylogeny from the super-MSA using the maximum likelihood approach
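The concatenation step can be sketched as joining, for every species, its aligned sequences from all per-gene MSAs. This is a minimal sketch, assuming each MSA is a dictionary from species name to aligned sequence; the species names, the tiny alignments and the function name `concatenate_msas` are hypothetical.

```python
def concatenate_msas(msas):
    """Join per-gene MSAs column-wise into one super-MSA."""
    species = list(msas[0])
    for msa in msas:
        if set(msa) != set(species):
            raise ValueError("every gene MSA must contain the same species")
    # For each species, append its aligned sequence from every gene in order.
    return {sp: "".join(msa[sp] for msa in msas) for sp in species}


# Toy per-gene alignments (assumed data): two genes, three species.
gene1 = {"sp_a": "MK-L", "sp_b": "MKAL", "sp_c": "MRAL"}
gene2 = {"sp_a": "GGT", "sp_b": "GAT", "sp_c": "G-T"}

super_msa = concatenate_msas([gene1, gene2])
print(super_msa["sp_a"], len(super_msa["sp_a"]))  # MK-LGGT 7
```

The super-MSA has as many columns as all gene MSAs together, which is how 31 alignments add up to the 8090 columns mentioned above.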

  35. [Figure: the reconstructed tree with the three domains Archaea, Eukaryota and Bacteria] http://itol.embl.de

  36. Tree support
      • 81.7% of the splits show bootstrap support of over 80%
      • 65% of the splits show bootstrap support of 100%
      • However, several deep splits show low support

  37. Still, the debate goes on

  38. “Tree of one percent of life”
      • On the one hand, Ciccarelli et al. favor the claim that bacteria adhere to a bifurcating tree of life, provided that the small number of LGT genes is filtered out
      • On the other hand, their filtering process left only 31 proteins, which represent ~1% of an average prokaryotic proteome and ~0.1% of a large eukaryotic proteome
      • “If throwing out all non-universally distributed genes and all LGT suspects leaves a 1% tree, then we should probably abandon the tree as a working hypothesis”
