
Molecular Phylogeny


jcervantes

Presentation Transcript


  1. Molecular Phylogeny

  2. Phylogeny is the inference of evolutionary relationships. Traditionally, phylogeny relied on comparing morphological features between organisms; today, molecular sequence data are mainly used for phylogenetic analyses. One tree of life: a sketch Darwin made soon after returning from his voyage on HMS Beagle (1831–36) showed his thinking about the diversification of species from a single stock (see Figure, overleaf). This branching, extended by the concept of common descent,

  3. Haeckel (1879) Pace (2001)

  4. Molecular phylogeny uses trees to depict evolutionary relationships among organisms. These trees are based on DNA and protein sequence data. (Figure: two trees of human, chimpanzee, gorilla and orangutan.) Pre-molecular analysis: the great apes (chimpanzee, gorilla and orangutan) form a group separate from humans. Molecular analysis: the chimpanzee is more closely related to humans than the gorilla is.

  5. What can we learn from a phylogenetic tree?

  6. 1. Determine the closest relatives of an organism of interest. • Was the extinct quagga more like a zebra or a horse?

  7. Which species is closest to humans? (Figure: trees of human, chimpanzee, gorilla and orangutan.)

  8. 2. Find the relationships between species and identify new species. Example: metagenomics, a new field in genomics that studies genomes recovered directly from environmental samples. It is a powerful tool for accessing the rich biodiversity of native environmental samples.

  9. Incredible microbial diversity in a drop of seawater: • 10⁶ cells/ml seawater • 10⁷ virus particles/ml seawater • >99% uncultivated microbes

  10. Metagenomics: community sample → community DNA (extraction bias) → shear → 3–4 kb shotgun library (cloning bias) → paired-end sequencing (F/R) → composite contig assembly. (Figure: overlapping reads …ACGGCTGCGTTACATCGATCATTTACGA and ACATCGATCATTTACGATACCATTG… assembled into a contig.)
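The assembly step can be illustrated with the two overlapping reads on this slide. The following Python sketch is a toy greedy merge that assumes error-free reads. It joins two reads on the longest suffix of the first read that equals a prefix of the second. The function name `merge_reads` and the `min_overlap` threshold are hypothetical. Real metagenome assemblers use overlap or de Bruijn graphs and tolerate sequencing errors.

```python
def merge_reads(left, right, min_overlap=10):
    """Merge two reads on the longest suffix of `left` that equals a prefix of `right`.

    Toy sketch: assumes error-free reads on the same strand and returns None
    when the best exact overlap is shorter than `min_overlap`.
    """
    # Try the longest possible overlap first, then shorter ones (never k = 0).
    for k in range(min(len(left), len(right)), max(min_overlap, 1) - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None


# The two reads from the slide (the "…" truncation marks are dropped).
read1 = "ACGGCTGCGTTACATCGATCATTTACGA"
read2 = "ACATCGATCATTTACGATACCATTG"
print(merge_reads(read1, read2))  # ACGGCTGCGTTACATCGATCATTTACGATACCATTG
```

The two reads share the 17-base stretch ACATCGATCATTTACGA, so the merged contig is 28 + 25 − 17 = 36 bases long.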

  11. From: “The Sorcerer II Global Ocean Sampling Expedition: Metagenomic Characterization of Viruses within Aquatic Microbial Samples”, Williamson et al., PLoS ONE 2008

  12. 3. Discover the function of an unknown gene or protein. (Figure: tree of RBP1_HS, RBP2_pig, RBP_RAT, ALP_HS, ALPEC_BV, ALPA1_RAT, ECBLC and several hypothetical proteins, including hypothetical protein X.)

  13. Relationships can be represented by a phylogenetic tree or dendrogram. (Figure: tree with leaves A–F.)

  14. Phylogenetic Tree Terminology • A tree is a graph composed of nodes and branches. • Each branch connects two adjacent nodes. (Figure: tree with node R and leaves A–F.)

  15. Phylogenetic Tree Terminology • Rooted tree: the root is placed using a priori knowledge. • Unrooted tree. (Figure: a rooted and an unrooted tree of human, chimp, gorilla and chicken.)

  16. Rooted vs. unrooted trees (Figure: three taxa, 1, 2 and 3, drawn as a rooted and as an unrooted tree.)
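The difference between rooted and unrooted trees can be quantified. The Python sketch below uses the standard counting formulas for fully resolved (binary) trees, which the slide itself does not state. For n taxa there are (2n − 5)!! unrooted trees and (2n − 3)!! rooted trees. For the three taxa in the figure, this gives one unrooted tree and three rooted ones, one for each branch on which the root can be placed.

```python
def double_factorial(k):
    """k!! = k * (k - 2) * (k - 4) * ..., defined as 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def unrooted_trees(n):
    """Number of distinct unrooted binary trees with n >= 3 labelled leaves."""
    return double_factorial(2 * n - 5)


def rooted_trees(n):
    """Number of distinct rooted binary trees with n >= 2 labelled leaves."""
    return double_factorial(2 * n - 3)


for n in (3, 4, 10):
    print(n, unrooted_trees(n), rooted_trees(n))
# 3 1 3
# 4 3 15
# 10 2027025 34459425
```

The numbers grow very quickly. This is why methods such as neighbor joining construct a single tree directly instead of examining every possible tree.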

  17. How can we build a tree with molecular data? • Trees based on DNA sequences (rRNA), e.g. atcgatcgtgatcgatcgtagcatcgatgcatcgtacg • Trees based on protein sequences, e.g. MWRCPYCGKRQWCMWG

  18. Questions: • Can DNA and proteins from the same gene produce different trees? • Can different genes have different evolutionary histories? • Can different regions of the same gene produce different trees?

  19. Methods

  20. Approach 1: distance methods • Two steps: • Compute the distance between every pair of sequences in the multiple sequence alignment (MSA). • Find the tree that best agrees with the distance table. • Algorithm: neighbor joining. Approach 2: character-state methods • Algorithms: • Maximum parsimony (MP) • Maximum likelihood (ML)
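Step 1 of the distance approach (pairwise distances from the MSA) can be sketched in Python. The sketch assumes aligned DNA sequences stored in a dict. It uses two common distance measures that the slides do not specify. The first is the p-distance: the fraction of differing sites, skipping gap columns. The second is the Jukes–Cantor correction d = −¾ ln(1 − 4p/3), which accounts for multiple substitutions at the same site. The function names and the example alignment are hypothetical.

```python
import math


def p_distance(s1, s2):
    """Fraction of aligned columns (gap columns skipped) at which two sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(s1, s2) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for DNA; undefined for p >= 0.75."""
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_table(msa):
    """Corrected distance for every unordered pair of sequences in an alignment dict."""
    names = list(msa)
    return {
        (a, b): jukes_cantor(p_distance(msa[a], msa[b]))
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical three-sequence DNA alignment.
msa = {"seq1": "ACGTACGTAC", "seq2": "ACGTACGTTC", "seq3": "ACGAACCTTC"}
for pair, d in distance_table(msa).items():
    print(pair, round(d, 4))
```

The resulting table is the input to step 2, where a method such as neighbor joining finds a tree that fits it.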

  21. Neighbor Joining (NJ) • Reconstructs an unrooted tree • Calculates branch lengths • Based on pairwise distances • At each stage, the two nearest nodes of the tree are chosen and defined as neighbors in the tree. This is repeated until all of the nodes have been joined.

  22. Star Structure (Figure: star diagram with leaves a, b, c, d.) Assumption: sequences diverge at a constant rate ⇒ distance to root equals …

  23. Basic Algorithm (Figures: distance matrix; initial star diagram with leaves a, b, c, d.)

  24. Selection step: choose the two nodes with the shortest distance and fuse them.

  25. Next Step: recalculate the distances from the remaining sequences (a and d) to the new node e, which replaces the fused nodes c and b, and remove c and b from the table: D(EA) = (D(AC) + D(AB) - D(CB)) / 2 and D(ED) = (D(DC) + D(DB) - D(CB)) / 2.
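The fusion step can be written as a small Python function. The sketch stores the distance table as a dict keyed by unordered node pairs, which is an assumption of the sketch rather than part of the slides. It applies the update formula from this slide to every remaining node, so D(EA) on the slide corresponds to the entry for {e, a} here. The function name `fuse` and the distances are hypothetical; the values were chosen to fit a tree exactly.

```python
def fuse(dist, x, y, new):
    """Replace nodes x and y of a distance table by the node `new`.

    `dist` maps frozenset({p, q}) -> distance. For every remaining node k,
    D(new, k) = (D(k, x) + D(k, y) - D(x, y)) / 2; entries involving x or y
    are removed from the returned table.
    """
    def D(p, q):
        return dist[frozenset((p, q))]

    others = set().union(*dist) - {x, y}  # nodes that are not being fused
    table = {pair: d for pair, d in dist.items() if not pair & {x, y}}
    for k in others:
        table[frozenset((new, k))] = (D(k, x) + D(k, y) - D(x, y)) / 2
    return table


# Hypothetical distances between sequences a, b, c and d (single-letter names).
dist = {
    frozenset("ab"): 8, frozenset("ac"): 7, frozenset("ad"): 5,
    frozenset("bc"): 3, frozenset("bd"): 9, frozenset("cd"): 8,
}
table = fuse(dist, "c", "b", "e")  # fuse c and b into e
print(table[frozenset("ae")], table[frozenset("de")])  # 6.0 7.0
```

The new table contains only a, d and e, matching "remove the fused nodes from the table".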

  26. Next Step: to obtain a tree, un-fuse c and b by calculating their distances to the new node e (e.g., Dce). (Figure: partial tree showing branch lengths Dce and Dde.)
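The slides do not give a formula for these branch lengths. One standard choice, the neighbor-joining branch-length formula, splits D(c,b) between c and b according to how far each node is, in total, from all other nodes: Dce = D(c,b)/2 + (r_c − r_b) / (2(n − 2)). Here r_i is the sum of distances from node i to the other nodes, and n is the number of nodes in the table before fusing. The Python sketch below applies this formula to hypothetical distances. The function name is illustrative.

```python
def branch_lengths(dist, nodes, i, j):
    """Neighbor-joining branch lengths from i and from j to the node joining them.

    `dist` maps frozenset({p, q}) -> distance; `nodes` lists every node in the
    current table (at least three, including i and j).
    """
    def D(p, q):
        return 0.0 if p == q else dist[frozenset((p, q))]

    n = len(nodes)
    r_i = sum(D(i, k) for k in nodes)  # total distance from i to all other nodes
    r_j = sum(D(j, k) for k in nodes)
    l_i = D(i, j) / 2 + (r_i - r_j) / (2 * (n - 2))
    return l_i, D(i, j) - l_i  # the two lengths add up to D(i, j)


# Hypothetical distances between sequences a, b, c and d.
dist = {
    frozenset("ab"): 8, frozenset("ac"): 7, frozenset("ad"): 5,
    frozenset("bc"): 3, frozenset("bd"): 9, frozenset("cd"): 8,
}
print(branch_lengths(dist, "abcd", "c", "b"))  # (1.0, 2.0)
```

Node c is closer overall to the other sequences (r_c = 18, r_b = 20), so it receives the shorter branch.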

  27. Next… (Figure: a and d are fused into a new node f; branch lengths Dce and Dde are shown.)

  28. Final: D(EF) = (D(EA) + D(ED) - D(AD)) / 2 (Figure: final tree with internal nodes e and f and branch lengths Daf, Dbf, Dce and Dde.)
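With only three nodes left (a, d and e), the final join has a unique solution. Each node's branch to the centre node f is half of (the sum of its two distances minus the distance between the other two nodes). For node e, this is exactly the formula for D(EF) on the slide. A short Python sketch with hypothetical distances (the function name is illustrative):

```python
def join_last_three(d_ae, d_ad, d_de):
    """Branch lengths from a, d and e to the centre node f that joins them."""
    return {
        "a": (d_ae + d_ad - d_de) / 2,
        "d": (d_ad + d_de - d_ae) / 2,
        "e": (d_ae + d_de - d_ad) / 2,  # D(EF) = (D(EA) + D(ED) - D(AD)) / 2
    }


lengths = join_last_three(d_ae=6, d_ad=5, d_de=7)
print(lengths)  # {'a': 2.0, 'd': 3.0, 'e': 4.0}
```

Each input distance is recovered as the sum of two branches; for example, the path a–f–e gives 2 + 4 = 6.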

  29. (Figure: summary of steps 1–3, from fusing c and b to the final tree.)

  30. IMPORTANT! • In practice we do not start from a star diagram. To choose the nodes to fuse, we first have to calculate the relative distance matrix (Mij), which expresses the distance between each pair of nodes relative to their distances to all other nodes.

  31. EXAMPLE (Figures: original distance matrix; relative distance matrix Mij.) The Mij table is used only to choose the closest pair, not to calculate the distances.
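The full procedure can be sketched in Python. The slides do not give the Mij formula, so this sketch assumes the common textbook form M_ij = D_ij − (r_i + r_j)/(N − 2). Here r_i is the sum of distances from node i to all other nodes, and N is the number of nodes currently in the table. As the slide says, Mij is used only to pick the pair to join. The new distances use D(u, k) = (D(i, k) + D(j, k) − D(i, j))/2, and the branch lengths use the neighbor-joining formula. Node names such as `node1` and the example matrix are hypothetical.

```python
from itertools import combinations


def neighbor_joining(names, matrix):
    """Build an unrooted tree by neighbor joining.

    names  -- taxon labels (at least three)
    matrix -- symmetric list of lists of pairwise distances, in `names` order
    Returns a list of (internal_node, child, branch_length) edges.
    """
    d = {a: {b: matrix[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    count = 0
    while len(nodes) > 3:
        n = len(nodes)
        r = {i: sum(d[i][k] for k in nodes) for i in nodes}  # r_i
        # Choose the pair with the smallest relative distance M_ij (ties: first pair).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: d[p[0]][p[1]] - (r[p[0]] + r[p[1]]) / (n - 2))
        count += 1
        u = f"node{count}"
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))  # branch from i to u
        edges += [(u, i, li), (u, j, d[i][j] - li)]
        # Distances to the new node u come from the raw table, not from M_ij.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Three nodes remain: join them at one centre node.
    x, y, z = nodes
    count += 1
    f = f"node{count}"
    edges += [
        (f, x, (d[x][y] + d[x][z] - d[y][z]) / 2),
        (f, y, (d[x][y] + d[y][z] - d[x][z]) / 2),
        (f, z, (d[x][z] + d[y][z] - d[x][y]) / 2),
    ]
    return edges


# Hypothetical distances that fit a five-taxon tree exactly.
names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
for edge in neighbor_joining(names, matrix):
    print(edge)
```

On this matrix, a and b are joined first. At the next step, two pairs tie on Mij, and the sketch simply takes the first one. The recovered leaf branches are a = 2, b = 3, c = 4, d = 2 and e = 1.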

  32. Advantages and disadvantages of the neighbor-joining method • Advantages: • It is fast and thus suited to large datasets. • It permits lineages with very different branch lengths. • Disadvantages: • Sequence information is reduced (only pairwise distances are used). • It gives only one possible tree.

  33. More problems with phylogenetic trees • It is wrong to assume that branch length is proportional to speciation time (molecular clock). • It is wrong to produce a tree based on distance values of the whole alignment.

  34. Problems with phylogenetic trees

  35. Problems with phylogenetic trees (Figure: trees of Bacillus, Burkholderia, Aeromonas, Pseudomonas, Lechevalieria, E. coli and Salmonella whose branching orders differ.)

  36. Problems with phylogenetic trees • It is wrong to assume that branch length is proportional to speciation time (molecular clock). • It is wrong to produce a tree based on distance values of the whole alignment: using different regions of the same alignment may produce different trees. • What to do? Use bootstrapping.

  37. Bootstrapped tree (Figure: nodes labelled from less reliable to highly reliable.) • Bootstrapping is a resampling-based method for estimating the reliability of a result. • For phylogenetic trees, it consists of randomly resampling positions (columns) of an alignment with replacement and constructing a tree from each resampled alignment. • The result is the percentage of trees in which a given node is formed.
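Below is a minimal Python sketch of the bootstrap procedure, restricted to four taxa so that no full tree-building program is needed. For four taxa, neighbor joining picks the pairing ab|cd with the smallest D(a,b) + D(c,d), so that rule stands in for tree reconstruction here. Each replicate resamples alignment columns with replacement, and the result is the percentage of replicates that support each grouping. The function names, the uncorrected p-distance, the gap-free alignment and the fixed random seed are all assumptions of the sketch.

```python
import random


def p_distance(s1, s2):
    """Fraction of aligned columns at which two gap-free sequences differ."""
    return sum(x != y for x, y in zip(s1, s2)) / len(s1)


def best_split(msa):
    """Four-taxon tree choice: the pairing with the smallest D(w,x) + D(y,z).

    For four taxa this is the split neighbor joining selects.
    Ties go to the first split listed.
    """
    a, b, c, d = sorted(msa)
    splits = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

    def score(split):
        (w, x), (y, z) = split
        return p_distance(msa[w], msa[x]) + p_distance(msa[y], msa[z])

    return min(splits, key=score)


def bootstrap_support(msa, replicates=100, seed=0):
    """Percentage of bootstrap replicates in which each split is recovered."""
    rng = random.Random(seed)
    length = len(next(iter(msa.values())))
    counts = {}
    for _ in range(replicates):
        # Resample alignment columns with replacement, keeping the original length.
        cols = [rng.randrange(length) for _ in range(length)]
        sample = {name: "".join(seq[i] for i in cols) for name, seq in msa.items()}
        split = best_split(sample)
        counts[split] = counts.get(split, 0) + 1
    return {split: 100 * n / replicates for split, n in counts.items()}


# Hypothetical four-sequence alignment.
msa = {
    "A": "ACGTACGTACGTACGTACGT",
    "B": "ACGTACGAACGTACGTACGA",
    "C": "ACCTAGGTACGAACCTACGT",
    "D": "ACCTAGGTTCGAACCTACGA",
}
for split, pct in sorted(bootstrap_support(msa).items(), key=lambda kv: -kv[1]):
    print(split, f"{pct:.0f}%")
```

A grouping found in nearly all replicates corresponds to a "highly reliable" node. A grouping found in only some replicates is less reliable.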

  38. Tools for tree reconstruction • CLUSTALX (NJ method) • PHYLIP (PHYLogeny Inference Package): includes parsimony, distance-matrix and likelihood methods, as well as bootstrapping • PhyML (maximum likelihood method) • More phylogeny programs

  39. 362

  40. http://www.phylogeny.fr
