
Phylogenetics



Presentation Transcript


  1. Phylogenetics

  2. Phylogenetic Trees [Figure: tree drawn along a time axis]

  3. [Figure: anatomy of a phylogenetic tree, with labels ROOT, NODE, BRANCH, Hypothetical Taxonomic Unit and Operational Taxonomic Unit (OTU); axis: time]

  4. Information • Branching order (topology) • Relative closeness of different taxa • Branch length • Amount of divergence

  5. Rooted and unrooted trees [Figure: an UNROOTED and a ROOTED tree of taxa A, B, C, D, E]

  6. Rooted and unrooted trees [Figure: an UNROOTED and a ROOTED tree of taxa A, B, C, D, E]

  7. Rooted and unrooted trees [Figure: an UNROOTED and a ROOTED tree of taxa A, B, C, D, E]

  8. Rooted and unrooted trees for 3 and 4 OTUs [Figure: the ROOTED and UNROOTED topologies for 3 OTUs (A, B, C) and for 4 OTUs (A, B, C, D) … 15 rooted trees of 4 OTUs]
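
The count of 15 rooted trees for 4 OTUs follows from the standard double-factorial formulas for strictly bifurcating trees: (2n - 5)!! unrooted and (2n - 3)!! rooted topologies for n OTUs. The sketch below is only an illustration of those formulas; the function names are made up here and are not part of the slides.

```python
def double_factorial(k: int) -> int:
    """Return k * (k - 2) * (k - 4) * ... down to 1 (1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(n_otus: int) -> int:
    """Number of bifurcating unrooted topologies: (2n - 5)!!, n >= 3."""
    if n_otus < 3:
        raise ValueError("need at least 3 OTUs for an unrooted tree")
    return double_factorial(2 * n_otus - 5)


def count_rooted_trees(n_otus: int) -> int:
    """Number of bifurcating rooted topologies: (2n - 3)!!, n >= 2."""
    if n_otus < 2:
        raise ValueError("need at least 2 OTUs for a rooted tree")
    return double_factorial(2 * n_otus - 3)


for n in (3, 4, 10):
    print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

The rapid growth of these numbers is why exhaustive comparison of all trees becomes impossible for a large number of taxa.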

  9. Monophyletic & Paraphyletic [Figure: tree of Birds, Crocodiles, Snakes and lizards, Turtles and tortoises, and Mammals, with the REPTILES grouping marked]

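  10. Monophyletic & Paraphyletic • Monophyletic • Natural clade; a common ancestor together with all of the taxa derived from it • Paraphyletic • Taxonomic group that leaves out some descendants of its most recent common ancestor, so that ancestor is shared with a taxon outside the group (e.g. reptiles when birds are excluded)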

  11. Reconstruct phylogeny from molecular data [Figure: several copies of the sequence ACTGTTACCGA, with the tree connecting them marked "?"]

  12. Types of phylogenetic analysis methods • Phenetic (distance methods): trees are constructed based on observed characteristics, not on evolutionary history • Cladistic (parsimony and maximum likelihood methods): trees are constructed based on fitting observed characteristics to some model of evolutionary history

  13. Methods of tree reconstruction • Distance • Maximum Parsimony • Maximum Likelihood • Bayesian • Reference: Phylogeny Estimation: Traditional and Bayesian Approaches, Nature Reviews Genetics (2003) 4:275

  14. Genetic distance • Distance from one sequence to another • Hamming Distance • Count number of differences • Multiple hits – number of events is greater than number of differences • Estimate number of events • Infer tree from genetic distance using Neighbour-joining (NJ) method
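
As a rough sketch of the two ideas on this slide: counting differences gives the Hamming distance, and a correction for multiple hits turns the observed proportion of differences into an estimated number of events per site. The slide does not name a correction; the Jukes-Cantor formula is used below purely as an assumed, common choice, and the function names are illustrative.

```python
import math


def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Count the positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)


def p_distance(seq_a: str, seq_b: str) -> float:
    """Observed proportion of differing sites."""
    return hamming_distance(seq_a, seq_b) / len(seq_a)


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Estimated substitutions per site under the Jukes-Cantor model
    (an assumption made for this sketch, not taken from the slides)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        raise ValueError("sequences too divergent for this correction")
    return -0.75 * math.log(1 - 4 * p / 3)


a = "AAGAGTTCA"
b = "AGCCGTTCT"
print(hamming_distance(a, b), p_distance(a, b), jukes_cantor_distance(a, b))
```

The corrected distance is always at least as large as the observed proportion, which reflects the point that the number of events is greater than the number of visible differences.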

  15. UPGMA shown for illustrative purposes. Neighbour-joining is preferred method.

  16. The algorithm in the text means: find the closest distance between two sequences, cluster those; then find the next closest distance, cluster those; as sequences are added to existing clusters find the average distance between existing clusters • Work through the notation! • UPGMA assumes a molecular clock mechanism of evolution
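
The clustering procedure described above can be sketched as follows. This is a minimal illustration that returns only the topology as nested tuples and ignores branch lengths; the data layout (a dictionary keyed by `frozenset` pairs) and the example distances are assumptions made for the sketch.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster by repeatedly joining the closest pair of clusters.

    names: list of sequence labels.
    dist:  dict mapping frozenset({a, b}) -> distance between a and b.
    Returns the tree topology as nested tuples.
    """
    clusters = {name: 1 for name in names}  # cluster label -> member count
    d = dict(dist)
    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        new = (a, b)
        # Distance from the new cluster to each remaining cluster is the
        # size-weighted mean, i.e. the average over all member pairs.
        for c in clusters:
            d[frozenset((new, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[new] = size_a + size_b
    return next(iter(clusters))


# Hypothetical distances, chosen only for illustration.
names = ["A", "B", "C", "D"]
dist = {frozenset(p): 8.0 for p in combinations(names, 2)}
dist[frozenset(("A", "B"))] = 2.0
dist[frozenset(("C", "D"))] = 4.0
tree = upgma(names, dist)
print(tree)
```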

  17. Neighbor-joining: corrects for UPGMA’s assumption of the same rate of evolution for each branch by modifying the distance matrix to reflect different rates of change. • The net difference between sequence i and all other sequences is • $r_i = \sum_k d_{ik}$

  18. The rate-corrected distance matrix is then • $M_{ij} = d_{ij} - (r_i + r_j)/(n - 2)$ • Join the two sequences i and j whose $M_{ij}$ is minimal; then calculate the distance from this new node k to every other sequence m using • $d_{km} = (d_{im} + d_{jm} - d_{ij})/2$ • Again correct for rates and join nodes.
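
One round of the procedure above might look like the following sketch, which computes the net differences r, the rate-corrected matrix M, picks the minimal pair and returns the distances from the new node to the remaining sequences. The dictionary-of-dictionaries layout, the helper `symmetric` and the example distances are assumptions for illustration only.

```python
def neighbor_joining_step(names, d):
    """Perform one neighbor-joining step.

    names: list of sequence labels.
    d:     symmetric dict of dicts, d[i][j] = distance between i and j.
    Returns (i, j, new_distances), where new_distances[m] is the distance
    from the node joining i and j to every other sequence m.
    """
    n = len(names)
    if n < 3:
        raise ValueError("need at least three sequences")
    # r_i: net difference between sequence i and all other sequences.
    r = {i: sum(d[i][k] for k in names if k != i) for i in names}
    best = None
    for idx, i in enumerate(names):
        for j in names[idx + 1:]:
            m_ij = d[i][j] - (r[i] + r[j]) / (n - 2)
            if best is None or m_ij < best[0]:
                best = (m_ij, i, j)
    _, i, j = best
    new_distances = {
        m: (d[i][m] + d[j][m] - d[i][j]) / 2 for m in names if m not in (i, j)
    }
    return i, j, new_distances


def symmetric(pairs):
    """Build a symmetric dict of dicts from {(a, b): distance}."""
    d = {}
    for (a, b), value in pairs.items():
        d.setdefault(a, {})[b] = value
        d.setdefault(b, {})[a] = value
    return d


# Hypothetical distances in which B sits on a long (fast-evolving) branch.
d = symmetric({("A", "B"): 5, ("A", "C"): 4, ("A", "D"): 5,
               ("B", "C"): 7, ("B", "D"): 8, ("C", "D"): 5})
i, j, new_distances = neighbor_joining_step(["A", "B", "C", "D"], d)
print(i, j, new_distances)
```

With these made-up numbers the smallest raw distance is between A and C, so a method that clusters on raw distance would pair them first, whereas the rate-corrected matrix pairs A with B.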

  19. Maximum Parsimony (MP) • Find topology requiring smallest number of evolutionary changes • Consider each position (site) in the sequence alignment independently • Not all sites are informative • Informative • Favours one topology over others

  20. Informative sites
      a. A A G A G T T C A
      b. A G C C G T T C T
      c. A G A T A T C C A
      d. A G A G A T C C T
      [Figure: the alternative unrooted trees for sequences a, b, c, d]
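
A site is usually called parsimony-informative when at least two different nucleotides each occur in at least two sequences. The sketch below applies that standard definition to the four sequences on this slide; the function name is made up for the example.

```python
from collections import Counter


def informative_sites(alignment):
    """Return 1-based positions of parsimony-informative columns.

    A column qualifies when at least two distinct states each appear
    in at least two sequences.
    """
    sites = []
    for pos in range(len(alignment[0])):
        counts = Counter(seq[pos] for seq in alignment)
        shared_states = [state for state, c in counts.items() if c >= 2]
        if len(shared_states) >= 2:
            sites.append(pos + 1)
    return sites


alignment = [
    "AAGAGTTCA",  # a
    "AGCCGTTCT",  # b
    "AGATATCCA",  # c
    "AGAGATCCT",  # d
]
sites = informative_sites(alignment)
print(sites)
```

For this alignment the informative positions are 5, 7 and 9: sites 5 and 7 group a with b (and c with d), while site 9 groups a with c (and b with d).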

  21. Maximum Likelihood (ML) • Likelihood L of a tree is the probability of observing the data given the tree: L = P(data|tree) • Find the tree with the highest L value • Results depend on model of nucleotide substitution • Computationally time-consuming

  22. Actually, all the other methods discussed implicitly use a simple model of evolution similar to the typical model made explicit in maximum likelihood: • All sites selectively neutral • All mutate independently, forward and reverse rates equal, given by m

  23. Also assume discrete generations and sites change independently • Given this model, can calculate probability that a site with initial nucleotide i will change to nucleotide j within time t: • $P_{ij}(t) = \delta_{ij} e^{-mt} + (1 - e^{-mt}) g_j$, where $\delta_{ij} = 1$ if i = j and $\delta_{ij} = 0$ otherwise, and where $g_j$ is the equilibrium frequency of nucleotide j
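
The transition probability above can be evaluated directly. The sketch below is an illustration only: the rate m and the equal equilibrium frequencies g are hypothetical values, and the function name is made up here.

```python
import math


def transition_probability(i: str, j: str, t: float, m: float, g: dict) -> float:
    """Probability that a site in state i is in state j after time t.

    m: substitution rate; g: equilibrium frequency of each nucleotide.
    """
    same = 1.0 if i == j else 0.0        # equals 1 if i = j, 0 otherwise
    decay = math.exp(-m * t)             # probability that no event occurred
    return same * decay + (1.0 - decay) * g[j]


# Hypothetical parameters, chosen for illustration.
g = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
row = {j: transition_probability("A", j, t=10.0, m=0.05, g=g) for j in g}
print(row, sum(row.values()))
```

At t = 0 the site keeps its initial state with probability 1, and for very large t the probabilities approach the equilibrium frequencies regardless of the starting nucleotide.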

  24. The likelihood that some site is in state i at the kth node of a tree is $L_i(k)$ • The likelihoods for all states for each site for each node are calculated separately; the product of the likelihoods for each site gives the overall likelihood for the observed data • Different tree topologies are searched to find the highest overall likelihood

  25. Maximum likelihood is arguably the “gold standard” for phylogenetic analysis, but because it is so computationally intensive it can only be used for selected data sets, and only after careful fine-tuning of the sequence alignment parameters • Often used to distinguish between several already generated trees

  26. Bayesian (B) Phylogeny Estimation • Searches for best trees consistent with both model and data • Incorporates prior knowledge (prior probability) • B maximises probability of tree given data and model • Searches for best set of trees

  27. Comparison of methods How much information are they using? • MP, ML, B use actual DNA whereas NJ summarises information into distance matrix • BUT, not all sites are used by MP (“informative” sites only) How can the nature of the data affect the methods? • NJ better for recent divergences • MP works well for a high number of informative sites

  28. Comparison of methods How do they cope with lots of sequences? • MP requires comparison of all possible trees • Not possible for large number of taxa • ML is computationally intensive and very slow for large number of taxa • NJ efficient for large number of taxa Anything else? • ML requires explicit assumptions about rate and pattern of substitution (model) • ML may perform poorly if model is incorrect • ML or B may get stuck on local maxima

  29. Outgroup rooting of unrooted trees • Outgroup – related sequence that definitely diverged earlier (paleontological evidence) [Figure: unrooted tree of human, mouse and rat, rooted using chicken as the outgroup]

  30. Rate (r) of evolution • K = number of substitutions per site • T = time since divergence • r = K/(2T) • Rate is expressed as substitutions per site per year [Figure: Species A and Species B diverging from a common ancestor at time T]

  31. Estimating species divergence times • Fossil evidence shows that T1 = 310 mya • What is T2? • Only need to have sequences and information on one divergence time [Figure: tree of Rat (A), Human (B) and Chicken (C), with divergence times T1 and T2 marked]
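
Taken together with r = K/(2T), one calibrated divergence time is enough to date another split, provided the rate is assumed constant across lineages. The sketch below assumes T1 is the chicken divergence and T2 the rat-human divergence; the K values are invented for illustration and are not data from the slides.

```python
def substitution_rate(k: float, t: float) -> float:
    """r = K / (2T): substitutions per site per year along one lineage."""
    return k / (2 * t)


def divergence_time(k: float, r: float) -> float:
    """Rearranged form T = K / (2r)."""
    return k / (2 * r)


T1 = 310e6             # years, from the fossil calibration on the slide
K_chicken_rat = 0.62   # hypothetical substitutions per site
K_human_rat = 0.16     # hypothetical substitutions per site

r = substitution_rate(K_chicken_rat, T1)
T2 = divergence_time(K_human_rat, r)
print(r, T2)
```

With these made-up distances the rate comes out at about 1e-9 substitutions per site per year and T2 at about 80 million years; real estimates depend entirely on the measured K values.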

  32. True tree and inferred tree • There is only one true tree of species relationships • Inferred tree may not be correct • Some genes may not be representative • Tree inference method may have produced an incorrect tree • e.g. parsimony method: may get several equally parsimonious results

  33. How credible is the tree? • The tree is a hypothesis of the true relationship • Need some measure of the support for that hypothesis • Note: Bayesian methods simultaneously estimate tree and measures of uncertainty for each branch

  34. Standard Error of branches [Figure: tree of Human, Chimp, Gorilla, Orangutan]

  35. The bootstrap: randomly sample all positions (columns in an alignment) with replacement -- meaning some columns can be repeated -- but conserving the number of positions; build a large dataset of these randomized samples

  36. Bootstrap

  37. Then use your method (distance, parsimony, likelihood) to generate another tree • Do this a thousand or so times • Note that if the assumptions the method is based on hold, you should always get the same tree from the bootstrapped alignments as you did originally • The frequency of some feature of your phylogeny in the bootstrapped set gives some measure of the confidence you can have for this feature
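
The resampling and counting steps can be sketched as below. The tree-building method is left as a parameter: `build_tree` and `has_feature` are hypothetical callables standing in for whichever method (distance, parsimony, likelihood) and whichever feature of the phylogeny are being tested.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Sample alignment columns with replacement, keeping the same length."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[c] for c in picks) for seq in alignment]


def bootstrap_support(alignment, build_tree, has_feature,
                      replicates=1000, seed=0):
    """Fraction of bootstrap replicates whose tree shows a given feature.

    build_tree:  hypothetical callable, alignment -> tree.
    has_feature: hypothetical callable, tree -> bool.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        replicate = bootstrap_alignment(alignment, rng)
        if has_feature(build_tree(replicate)):
            hits += 1
    return hits / replicates


alignment = ["AAGAGTTCA", "AGCCGTTCT", "AGATATCCA", "AGAGATCCT"]
replicate = bootstrap_alignment(alignment, random.Random(1))
print(replicate)
```

Each replicate keeps every column intact across all sequences, so only the choice and multiplicity of columns changes from one pseudo-alignment to the next.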

  38. Applications of phylogenetics • Detection of orthology and paralogy • Estimation of divergence times • Reconstruction of ancient proteins • Identifying residues important to selection • Detecting recombination points • Identifying mutations likely to be associated with disease • Determining the identity of new pathogens

  39. The time will come, I believe, though I shall not live to see it, when we shall have fairly true genealogical trees of each great kingdom of Nature. Charles Darwin

  40. The Tree of Life • Traditional classification of life into five kingdoms • Bacteria (inc. cyanobacteria) • Protista (inc. ciliates, flagellates, amoebae) • Fungi • Plantae • Animalia

  41. Archaebacteria • Carl Woese and colleagues • Study relationships by comparing rRNAs • Methanogens were expected to group with other bacteria • BUT, found to be equally distant from bacteria and eukaryotes • Made new taxon - Archaebacteria • Includes many extremophiles • thermophiles • hyperthermophiles • halophiles (salt dependent)

  42. The Tree of Life

  43. Where is the root of the Tree of Life? • No possible outgroup (by definition) • Iwabe et al. (1989) • Examined phylogenetic tree of pairs of genes that exist in all organisms • derived from gene duplication that predates lineage divergences [Figure: Gene A duplicated into Gene A1 and Gene A2, each present in lineage 1, lineage 2 and lineage 3]

  44. Homologous elongation factor genes EF-Tu and EF-G present in all prokaryotes and eukaryotes • Both genes show the same topology [Figure: EF-Tu and EF-G subtrees, each containing Archaea, Eucarya and Bacteria]

  45. Changing view of the Tree of Life (Gaucher et al., 2010) • based on morphological characteristics (Chatton, 1925) • based on DNA sequence analysis (Woese & Fox, 1977) • based on phylogenies of hundreds of genes • based on membrane architecture & gene indels • based on ancient gene duplication • Most modern view …

  46. Phylogeny of humans and apes • Darwin – Gorilla and Chimpanzee our closest relatives and human evolutionary origins in Africa • Many people preferred anthropocentric idea that humans were special [Figure: traditional view of the tree of Human, Chimp, Gorilla, Orangutan, Gibbon]

  47. So what is the evidence? • Serological precipitation (Goodman 1962) – H, G, C constitute a natural clade, orangutans & gibbons earlier diverging • However, the relationships among H, G and C remained unclear • Most DNA sequence data support ((H,C),G) • Some genes show different relationship [Figure: tree of Human, Chimp, Gorilla, Orangutan, Gibbon]

  48. Conservation biology – the dusky seaside sparrow • Last one died June 1987 (DisneyWorld) • Discovered 1872 • Ammodramus maritimus nigrescens • Geographically confined to small salt marsh in Florida • 2000 individuals in 1900 • 6 individuals (all male) in 1980 • Conservation program • artificial breeding

  49. Conservation genetics • Mating of remaining males with females from closest subspecies available • Female hybrids of first generation then “back-crossed” to original males • Continue as long as original males live • Which subspecies should the females be taken from?

  50. 8 other A. maritimus subspecies • Geographically dispersed along coast • Artificial breeding with Scott’s seaside sparrow (A. m. peninsulae) • Chosen based on morphological and behavioural similarities • Was this the best choice?
