
Phylogenetic Tree Construction

Dong Xu, Computer Science Department, 271C Life Sciences Center, 1201 East Rollins Road, University of Missouri-Columbia, Columbia, MO 65211-2060. E-mail: xudong@missouri.edu. Phone: 573-882-7064 (O). http://digbio.missouri.edu


Presentation Transcript


  1. Phylogenetic Tree Construction

  2. Outline • Evolution theory • Concept of phylogeny • Molecular clock • Types of trees • UPGMA • Parsimony • Maximum likelihood • An example for bird flu

  3. Evolution • Many theories of evolution • Basic idea: • speciation events lead to the creation of different species • Any two species share a (possibly distant) common ancestor

  4. Evolutionary Events • Extinction: A new node u is created at the end of a lineage, and no new lineage is started from u • Speciation: A new node u is created at the end of a lineage, and two new lineages are started from u • Hybridization: A new node u is created • when two lineages combine (diploid or polyploid) • when one lineage creates u and the new lineage from u has double the number of homologs (autopolyploid)

  5. Tree of Life http://tolweb.org/

  6. Taxonomy • Glycine max • Taxonomy ID: 3847 • GenBank common name: soybean • Rank: species • Genetic code: Translation table 1 (Standard) • Mitochondrial genetic code: Translation table 1 (Standard) • Other names: common name: soybeans • Lineage (full): cellular organisms; Eukaryota; Viridiplantae; Streptophyta; Streptophytina; Embryophyta; Tracheophyta; Euphyllophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons; rosids; eurosids I; Fabales; Fabaceae; Papilionoideae; Phaseoleae; Glycine

  7. Kingdom Plantae • Evolutionary tree of plants, from primitive to more advanced traits • [Figure: tree rooted at a green alga ancestor, with branches labeled non-vascular, vascular, gymnosperms, flowers, monocot, and dicot]

  8. Monocot vs. dicot plants (1)

  9. Monocot vs. dicot plants (2) • Number of cotyledons: one vs. two

  10. Monocot vs. dicot plants (3) • Leaf venation pattern: • Monocots: parallel veins • Dicots: net-like (reticulate) veins

  11. Monocot vs. dicot plants (4) • Flower parts: • Monocot: in groups of three • Dicot: in groups of four or five

  12. Outline • Evolution theory • Concept of phylogeny • Molecular clock • Types of trees • UPGMA • Parsimony • Maximum likelihood • An example for bird flu

  13. Phylogenies (1) • A phylogeny is a tree that describes the sequence of speciation events that led to the formation of a set of present-day species • [Figure: example phylogeny with leaves Aardvark, Bison, Chimp, Dog, Elephant]

  14. Phylogenies (2) • Leaves: present-day species • Internal nodes: hypothetical most recent common ancestors • Edge lengths: "time" from one speciation event to the next

  15. Primate Evolution

  16. Tree Terminology • [Figure: rooted tree on leaves a, b, c, d, labeling a leaf, an edge, an internal node, the clusters {a,b} and {a,b,c}, and the root with cluster {a,b,c,d}]
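
The cluster of a node is the set of leaves below it. Below is a minimal Python sketch for computing clusters. It uses two assumptions of our own:

- A rooted tree is written as nested tuples. This is a hypothetical representation, not one used in the slides.
- The figure's tree is (((a,b),c),d). This matches the clusters {a,b}, {a,b,c} and {a,b,c,d} named on the slide.

```python
from typing import FrozenSet, List, Tuple, Union

# Hypothetical representation: a leaf is a string, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def clusters(tree: Tree) -> List[FrozenSet[str]]:
    """Return the cluster (set of descendant leaves) of every internal node, bottom-up."""
    result: List[FrozenSet[str]] = []

    def leaves_below(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):  # a leaf contributes only itself
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        result.append(below)  # record the cluster of this internal node
        return below

    leaves_below(tree)
    return result


# Assumed tree for the figure: a and b join first, then c, then d at the root.
example = ((("a", "b"), "c"), "d")
print([sorted(c) for c in clusters(example)])
# [['a', 'b'], ['a', 'b', 'c'], ['a', 'b', 'c', 'd']]
```

The root's cluster always contains every leaf, which is why the slide labels the root with {a,b,c,d}.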

  17. Rooted/Unrooted Tree • Rooted trees • Single common ancestor • Requires more information • Unrooted trees • Objects are leaves • Internal nodes are common ancestors of some of the objects • Insufficient information to tell whether or not a given internal node is the common ancestor of any two leaves

  18. Motivation • Understand the lineage of different species • Organizing principle to sort species into a taxonomy • Understand how various functions evolved • Understand forces and constraints on evolution • Perform multiple sequence alignment • Predict gene function (phylogenetic footprint)

  19. Tree Basis • Phylogenies are reconstructed based on comparisons between present-day objects • Two main aspects • Topology • How its interior nodes connect to one another and to the leaves • Distance • An estimate of the evolutionary distance between the nodes

  20. Assumptions • homology reflects common ancestry • single common ancestor • treelike relationship exists • positional homology • independent processes • no reversals or convergence • molecular clock

  21. Outline • Evolution theory • Concept of phylogeny • Molecular clock • Types of trees • UPGMA • Parsimony • Maximum likelihood • An example for bird flu

  22. Molecular Clock Theory (1) • For any given protein, accepted mutations in its amino acid sequence occur at a constant rate • Accepted = mutations that still allow the protein to function, so that the organism survives • Implication: the number of accepted mutations is proportional to the length of the time interval, i.e., the rate of accepted mutations within a protein is relatively constant

  23. Molecular Clock Theory (2) • The rate of accepted mutations may be different for different proteins (depending on their tolerance for mutations) • Different parts of a protein may evolve at different rates • Thus, if A and B differ by k accepted mutations, then roughly k/2 mutations have occurred along each lineage since they diverged
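
A sketch of the clock arithmetic, assuming a hypothetical constant per-lineage rate r of accepted mutations per unit time. Each lineage contributes about k/2 of the differences, so the time since divergence is roughly t = (k/2)/r. The function name and example numbers are illustrative only. The estimate also ignores repeated mutations at the same site.

```python
def divergence_time(k: int, rate_per_lineage: float) -> float:
    """Rough time since sequences A and B diverged, under a strict molecular clock.

    k: number of accepted mutations by which A and B differ.
    rate_per_lineage: hypothetical constant rate of accepted mutations per unit
        time along ONE lineage.
    About k/2 of the differences arose on each lineage, so t = (k / 2) / rate.
    """
    if rate_per_lineage <= 0:
        raise ValueError("rate_per_lineage must be positive")
    return (k / 2) / rate_per_lineage


# Hypothetical numbers: 20 differences, 0.5 accepted mutations per million years per lineage.
print(divergence_time(20, 0.5))  # 20.0 (million years)
```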

  24. Molecular Clock (Science, vol. 289)

  25. Outline • Evolution theory • Concept of phylogeny • Molecular clock • Types of trees • UPGMA • Parsimony • Maximum likelihood • An example for bird flu

  26. Species/Gene Trees (1) • Species tree (how are my species related?) • contains only one representative from each species • when did speciation take place? • all nodes indicate speciation events • Gene tree (how are my genes related?) • normally contains a number of genes from a single species • nodes relate either to speciation or gene duplication events

  27. Species/Gene Trees (2) • Your sequence data may not have the same phylogenetic history as the species from which they were isolated • Different genes evolve at different speeds, and there is always the possibility of horizontal gene transfer (hybridization, vector-mediated DNA movement, or direct uptake of DNA)

  28. Morphological vs. Molecular • Classical phylogenetic analysis: morphological features • number of legs, lengths of legs, etc. • Modern biological methods allow the use of molecular features • Gene sequences • Protein sequences

  29. Dangers in Molecular Phylogenies • Gene/protein sequences can be homologous for different reasons: • Orthologs: sequences diverged after a speciation event • Paralogs: sequences diverged after a duplication event • Xenologs: sequences diverged after a horizontal transfer (e.g., by a virus)

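  30. Ultrametric trees (1) • A metric on a set of objects O is given by the assignment of a real number d(x,y) to every pair x,y in O such that, for all x, y, z in O: • $d(x,y) > 0$ if $x \neq y$, and $d(x,x) = 0$ • $d(x,y) = d(y,x)$ • $d(x,y) \le d(x,z) + d(z,y)$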

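  31. Ultrametric trees (2) • An ultrametric has to fulfill the additional requirement $d(x,y) \le \max\{d(x,z),\, d(y,z)\}$ for all x, y, z in O • An ultrametric tree is characterized by the three point condition: for any three leaves, the two largest of the three pairwise distances are equal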
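
The three point condition can be checked directly: for every triple of objects, the two largest pairwise distances must be equal. This is a minimal sketch. It assumes the distances are stored in a dictionary keyed by unordered pairs and that the table is complete. The helper names `make_distances` and `is_ultrametric` are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Tuple

Distances = Dict[FrozenSet[str], float]


def make_distances(pairs: Iterable[Tuple[str, str, float]]) -> Distances:
    """Build a symmetric distance lookup keyed by unordered pairs of taxa."""
    return {frozenset((x, y)): dxy for x, y, dxy in pairs}


def is_ultrametric(d: Distances, tol: float = 1e-9) -> bool:
    """Check the three point condition on every triple of taxa (complete table assumed)."""
    taxa = sorted({x for pair in d for x in pair})
    for x, y, z in combinations(taxa, 3):
        _, middle, largest = sorted(
            (d[frozenset((x, y))], d[frozenset((x, z))], d[frozenset((y, z))])
        )
        # The two largest distances of the triple must coincide.
        if largest - middle > tol:
            return False
    return True


# Hypothetical ultrametric: a and b are closest, c is equally far from both.
print(is_ultrametric(make_distances([("a", "b", 2), ("a", "c", 4), ("b", "c", 4)])))  # True
# Human/chimp/gorilla distances from the primate slides: 1.45, 1.51, 1.57.
print(is_ultrametric(make_distances([("H", "C", 1.45), ("H", "G", 1.51), ("C", "G", 1.57)])))  # False
```

The human, chimp, and gorilla distances quoted later in the deck fail the check, because 1.51 and 1.57 differ. Taken alone, those three distances are therefore not exactly ultrametric.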

  32. Additive Trees • Generalization of ultrametric trees • In ultrametric trees, the number of mutations was assumed to be proportional to the temporal distance from a node to its ancestor • It was also assumed that mutations took place at the same rate in all branches • Additive trees model different rates of mutation along different branches

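  33. Additivity • In a "real" tree, the distance between two species is the sum of the distances between the intermediate nodes on the path between them • [Figure: leaves i, j, k joined at internal node m by edges of lengths a, b, c] • e.g., $d(i,j) = a + b$, $d(i,k) = a + c$, $d(j,k) = b + c$, so $c = \frac{d(i,k) + d(j,k) - d(i,j)}{2}$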
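
Under additivity, the three edge lengths of a three-leaf tree can be solved from its three pairwise distances. This is a sketch under an assumption about the figure: leaves i, j, k are joined at a single internal node m, with edge a leading to i, b to j, and c to k. The labeling and the function name are ours.

```python
from typing import Tuple


def star_branch_lengths(d_ij: float, d_ik: float, d_jk: float) -> Tuple[float, float, float]:
    """Edge lengths (a, b, c) from the internal node m to leaves i, j, k.

    Assumes additivity: d_ij = a + b, d_ik = a + c, d_jk = b + c.
    Adding two equations and subtracting the third isolates each edge.
    """
    a = (d_ij + d_ik - d_jk) / 2
    b = (d_ij + d_jk - d_ik) / 2
    c = (d_ik + d_jk - d_ij) / 2
    return a, b, c


# Hypothetical distances: d(i,j) = 5, d(i,k) = 9, d(j,k) = 10.
print(star_branch_lengths(5, 9, 10))  # (2.0, 3.0, 7.0)
```

A negative length would mean the distances violate the triangle inequality. In that case, no three-leaf tree with non-negative edges fits them.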

  34. Phylogeny Construction • parsimony methods: fewest changes • likelihood methods: maximize the probability of the observed data under a model of evolution • distance methods: based on pairwise evolutionary distances (sequence similarity, nucleotide composition, etc.)

  35. Outline • Evolution theory • Concept of phylogeny • Molecular clock • Types of trees • UPGMA • Parsimony • Maximum likelihood • An example for bird flu

  36. UPGMA • UPGMA is the unweighted pair group method with arithmetic mean • The distance matrix can come from, e.g., DNA-DNA hybridization, or be constructed from sequence data • Iteratively group the most closely related groups; the distance between two groups is the average distance between their elements

  37. UPGMA Procedure • Find the closest pair of units (species, to start with) • Connect this pair, defining an evolutionary unit (branch); each branch length is the pair's distance/2 • Compute distances from the ancestor of this unit to all other ungrouped units • Go back to the first step and repeat until only one unit remains

  38. Evolutionary distances among primates (1) • [Table: pairwise distances between primates, in nucleotide substitutions per 100 sites] • Humans (H) and chimps (C) are closest: lump them and recompute distances

  39. Evolutionary distances among primates (2) • e.g., the (H-C) to gorilla (G) distance: • $d((H,C), G) = \frac{d(H,G) + d(C,G)}{2}$ • $= \frac{1.51 + 1.57}{2} = 1.54$ • Gorilla is closest to the H-C clade • (((H, C), 1.45), G, 1.54)

  40. Evolutionary distances among primates (3) • [Figure: tree over R, O, G, H, C] • The human-chimp-gorilla clade is closer to orangutan (O) than to rhesus (R)

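  41. UPGMA Clustering • Let $C_i$ and $C_j$ be clusters; define the distance between them to be $d_{ij} = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i} \sum_{q \in C_j} d_{pq}$ • When we combine two clusters $C_i$ and $C_j$ to form a new cluster $C_k$, then for every other cluster $C_l$: $d_{kl} = \frac{|C_i|\, d_{il} + |C_j|\, d_{jl}}{|C_i| + |C_j|}$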
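
The UPGMA procedure and its size-weighted update rule can be sketched in Python. This is a minimal, unoptimized sketch with the following assumptions:

- The input is a complete distance table keyed by unordered pairs.
- Names such as `upgma` are hypothetical.
- Node heights are distance/2, as on the procedure slide.

The example reuses the human (H), chimp (C), and gorilla (G) distances quoted on the primate slides.

```python
from typing import Dict, FrozenSet, List, Tuple

Pair = FrozenSet[str]


def upgma(dist: Dict[Pair, float]) -> Tuple[str, List[Tuple[str, float]]]:
    """Cluster taxa with UPGMA from a complete pairwise distance table.

    dist maps frozenset({x, y}) to the distance between taxa x and y.
    Returns the final cluster as a nested label such as "((C,H),G)" and the
    merges in order as (new cluster label, node height), height = distance / 2.
    """
    d = dict(dist)
    size = {x: 1 for pair in d for x in pair}  # number of leaves in each current cluster
    merges: List[Tuple[str, float]] = []
    while len(size) > 1:
        # Find the closest pair of clusters and join them under a new node.
        closest = min(d, key=lambda p: d[p])
        ci, cj = sorted(closest)
        ck = f"({ci},{cj})"
        merges.append((ck, d[closest] / 2))
        # Distance from the new cluster to every other cluster: the
        # size-weighted average of the two old distances.
        for cl in size:
            if cl not in (ci, cj):
                d[frozenset((ck, cl))] = (
                    size[ci] * d[frozenset((ci, cl))] + size[cj] * d[frozenset((cj, cl))]
                ) / (size[ci] + size[cj])
        # Replace ci and cj by ck everywhere.
        size[ck] = size.pop(ci) + size.pop(cj)
        d = {p: v for p, v in d.items() if ci not in p and cj not in p}
    (root,) = size
    return root, merges


primates = {
    frozenset(("H", "C")): 1.45,
    frozenset(("H", "G")): 1.51,
    frozenset(("C", "G")): 1.57,
}
root, merges = upgma(primates)
print(root)  # ((C,H),G)
for label, height in merges:
    print(label, round(height, 3))
```

The first merge joins H and C at height 0.725 (= 1.45/2). The second joins gorilla at 0.77 (= 1.54/2), matching the 1.54 computed on the primate slide. In this sketch, ties between equally close pairs are broken arbitrarily.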

  42. UPGMA: conclusions • UPGMA gives branch lengths (evolutionary distances) as well as branching order • If (a big if) mutations occur at a constant rate, we can estimate dates of divergence from sequence differences

  43. Outline • Evolution theory • Concept of phylogeny • Molecular clock • Types of trees • UPGMA • Parsimony • Maximum likelihood • An example for bird flu

  44. Possible Evolutionary Tree (1) • [Figure: the unrooted tree on taxa t1, t2, t3, and the unrooted trees obtained by adding t4 to each of its edges] • 1 three-taxa tree • $1 \times (2 \cdot 3 - 3) = 3$ four-taxa trees

  45. Possible Evolutionary Tree (2)

  46. Possible Evolutionary Tree (3) • Number of possible trees for n taxa (unrooted/rooted): • n = 2: 1/1 • n = 3: 1/3 • n = 4: 3/15
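
The counts follow from the stepwise argument behind "1*(2*3-3) = 3". An unrooted binary tree with k leaves has 2k - 3 edges, and the next taxon can be attached to any of them. A rooted tree on n taxa corresponds to an unrooted tree on n + 1 taxa, where the extra leaf marks the root. A sketch of both counts (function names are hypothetical):

```python
def num_unrooted_trees(n: int) -> int:
    """Distinct unrooted binary trees on n labeled taxa: (2n - 5)!!, and 1 for n = 2."""
    if n < 2:
        raise ValueError("need at least two taxa")
    count = 1
    for k in range(3, n):  # a tree with k leaves has 2k - 3 edges to attach taxon k + 1 to
        count *= 2 * k - 3
    return count


def num_rooted_trees(n: int) -> int:
    """Rooted trees on n taxa = unrooted trees on n + 1 taxa (the extra leaf marks the root)."""
    return num_unrooted_trees(n + 1)


for n in (2, 3, 4):
    print(n, f"{num_unrooted_trees(n)}/{num_rooted_trees(n)}")  # 1/1, 1/3, 3/15
```

For 10 taxa, the sketch gives 2,027,025 unrooted and 34,459,425 rooted trees. Counts like these make exhaustive search over trees impractical even for modest numbers of taxa.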

  47. Maximum parsimony (1) • Minimizes the number of steps required to generate the observed variation in the sequences • Guaranteed to find the "best" (most parsimonious) tree, with a danger of over-fitting the data • Columns representing greater variation dominate • Works best for small, highly conserved sequences

  48. Maximum parsimony (2) • Begin with a multiple sequence alignment • Identify informative sites within the sequences • For an informative site, find the smallest number of changes each candidate tree requires • Repeat over all informative sites • Length = sum of the number of steps over all branches • Choose the tree with the smallest length
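
The slides do not say how the number of changes per site is counted. One standard choice is Fitch's algorithm for rooted binary trees, and the sketch below uses it; this is our choice, not necessarily the method behind the slides. It also uses the usual definition of an informative site: at least two character states, each present in at least two sequences. The alignment, candidate trees, and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple, Union

# A rooted binary tree: a leaf name, or a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def is_informative(column: List[str]) -> bool:
    """At least two states must each occur in at least two sequences."""
    counts = Counter(column)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def fitch_steps(tree: Tree, states: Dict[str, str]) -> int:
    """Minimum number of changes needed for one site (Fitch's algorithm)."""
    steps = 0

    def visit(node: Tree) -> Set[str]:
        nonlocal steps
        if isinstance(node, str):
            return {states[node]}
        left, right = visit(node[0]), visit(node[1])
        if left & right:
            return left & right  # children can agree: no change needed here
        steps += 1  # children disagree: one change on a branch below this node
        return left | right

    visit(tree)
    return steps


def tree_length(tree: Tree, alignment: Dict[str, str]) -> int:
    """Sum of Fitch steps over the informative sites of an alignment."""
    names = list(alignment)
    total = 0
    for i in range(len(alignment[names[0]])):
        column = {name: alignment[name][i] for name in names}
        if is_informative(list(column.values())):
            total += fitch_steps(tree, column)
    return total


# Hypothetical alignment; the third site (A, A, A, G) is not informative.
alignment = {"s1": "AAAC", "s2": "AGAC", "s3": "GAAT", "s4": "GGGT"}
candidates = [
    (("s1", "s2"), ("s3", "s4")),
    (("s1", "s3"), ("s2", "s4")),
    (("s1", "s4"), ("s2", "s3")),
]
for tree in candidates:
    print(tree, tree_length(tree, alignment))  # lengths 4, 5, 6 in this order
best = min(candidates, key=lambda t: tree_length(t, alignment))
```

Rooting does not change the Fitch count. Each rooted tree here therefore stands for its unrooted topology, and ((s1,s2),(s3,s4)) is the most parsimonious of the three.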

  49. Maximum parsimony (3)
