
Phylogenetics Part II

Doug Raiford, Lesson 9


Presentation Transcript


  1. Doug Raiford Lesson 9 Phylogenetics Part II

  2. Review • Three major categories (3 approaches): distance, parsimony, maximum likelihood • Have already seen a distance method: UPGMA

  3. Why do we need any other method? • What’s wrong with UPGMA? • Let’s revisit the example • Can this be? Doesn’t the derived tree imply that B is equidistant from C and D? [Figure: derived tree with leaves A, B, C, D]

  4. Molecular clock hypothesis • UPGMA averaged the two and put them both (the branches for C and D) at 1.5 • What if we don’t have equal rates of evolution after a divergence? [Figure: tree with leaves A, B, C, D and branch lengths .5, .5, 4, 2.5, 1, 2]

  5. Very similar taxa • Differing rates of evolution can sometimes cause problems with UPGMA • Especially if the taxa are very similar (small distances) [Figure: “This tree” yields “this matrix”, which yields “this tree”; taxa A, B, C with branch lengths 1, 1, 2, 1]
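
A small numeric sketch may make the problem concrete. The branch lengths below are hypothetical (they are not the values from the slide's figure): A and B are true siblings, but B has evolved four times faster than A. UPGMA begins by joining the closest pair, and under these assumed lengths that pair is not the sibling pair.

```python
from itertools import combinations

# Hypothetical true tree ((A, B), C): A and B are siblings, but B evolves faster.
# Each value is the taxon's path length to the node where A and B diverge.
path_to_center = {"A": 1.0, "B": 4.0, "C": 1.0}

def distance_matrix(lengths):
    """Pairwise distance = sum of the two path lengths through the shared node."""
    return {
        (x, y): lengths[x] + lengths[y]
        for x, y in combinations(sorted(lengths), 2)
    }

def first_upgma_join(matrix):
    """UPGMA starts by clustering the pair with the smallest distance."""
    return min(matrix, key=matrix.get)

matrix = distance_matrix(path_to_center)
for pair, d in matrix.items():
    print(pair, d)
print("UPGMA joins first:", first_upgma_join(matrix))
```

Here the A-C distance (2) is smaller than the A-B distance (5), so the first join groups A with C even though the assumed true tree groups A with B.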

  6. Next: maximum parsimony • Sometimes also called the minimum evolution method • Definition of parsimony: 1 a: the quality of being careful with money or resources: thrift; b: the quality or state of being stingy; 2: economy in the use of means to an end; especially: economy of explanation in conformity with Occam's razor • Occam's (Ockham's) razor: the simplest explanation is usually the best

  7. Approach • Looks at each column of an MSA (multiple sequence alignment) and attempts to find a tree that describes it • Builds a consensus tree • Example MSA:
     atgccgca-actgccgcaggagatcaggactttcatgaatatcatcatgcgtggga-ttcag
     acctccatacgtgccccaggagatctggactttcacc---tggatcatgcgaccgtacctac
     t-atgg-t-cgtgccgcaggagatcaggactttca-gt--g-aatcatctgg-cgc--c-aa
     t--tcgt-ac-tgccccaggagatctggactttcaaa---ca-atcatgcgcc-g-tc-tat
     aattccgtacgtgccgcaggagatcaggactttcag-t--a-tatcatctgtc-ggc--tag

  8. Which tree? • What do we mean when we say “attempts to find a tree that describes”? • Attempts to fit all possible trees to each column and chooses the best • How do we determine all possible trees? • How do we determine which one has the best fit? • Assume that the majority nucleotide represents the ancestor [Figure: one possible tree for the sequences AGCT, AACT, AACT, AACT, looking at the second column (leaves G, A, A, A); internal nodes are labeled “A or a G” and branches carry costs such as “0 if A” and “1 if A”. Total mutations that explain this tree = 1. Pretty darn good.]
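
Counting the mutations that one tree needs for one column can be sketched as follows. This sketch uses Fitch's set-intersection rule, a standard technique swapped in here for the slide's "majority nucleotide" shortcut; the tree shape and the leaf names `s1` to `s4` are assumptions made for illustration.

```python
def fitch(tree, column):
    """Return (candidate ancestral states, mutation count) for one column.

    tree   -- a leaf name (str) or a pair (left_subtree, right_subtree)
    column -- dict mapping leaf name to the character in this column
    """
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_states, left_cost = fitch(tree[0], column)
    right_states, right_cost = fitch(tree[1], column)
    shared = left_states & right_states
    if shared:
        # Children agree on at least one state: no extra mutation needed here.
        return shared, left_cost + right_cost
    # Children disagree: keep both options and pay for one mutation.
    return left_states | right_states, left_cost + right_cost + 1

seqs = {"s1": "AGCT", "s2": "AACT", "s3": "AACT", "s4": "AACT"}
tree = (("s1", "s2"), ("s3", "s4"))  # one possible tree (hypothetical shape)
column = {name: s[1] for name, s in seqs.items()}  # second column: G, A, A, A

states, mutations = fitch(tree, column)
print(states, mutations)
```

For this column the node above `s1` and `s2` could be "A or a G", the root settles on A, and the tree needs a single mutation, matching the total on the slide.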

  9. How to determine all possible trees • When there are two organisms there is only one possible tree [Figure: tree with leaves A and B]

  10. How to determine all possible trees • What about when there are three? • The third could go… [Figure: tree with leaves A and B]

  11. What about 4? • For each of the previous 3 trees, could add the 4th to any of its branches (or could form a new root) • Each of the possible trees had 4 branches, so could add to one of 4 locations (or splice in at the top) • So total number of trees with 4 leaves: 3*5=15 [Figure: “If this were the tree”, with leaves A and B]

  12. Number of trees • $N_i$ is the number of trees given $i$ taxa • $B_i$ is the number of branches in a tree given $i$ taxa • $B_i = B_{i-1} + 2$, also $B_i = 2i - 2$ • $N_i = N_{i-1} \cdot (B_{i-1} + 1)$ • The plus 1 is due to a possible new root • $N_2 = 1$ • $B_2 = 2$ • Defined by a recurrence relation so … That’s right, as usual, exponential (in fact, faster than exponential) • What does this growth rate look like?
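
The recurrence for rooted trees can be evaluated directly. This is a minimal sketch; the function name `rooted_tree_counts` is made up for illustration.

```python
def rooted_tree_counts(max_taxa):
    """Return {i: N_i} for rooted trees, following the recurrence on the slide."""
    counts = {2: 1}   # N_2 = 1
    branches = 2      # B_2 = 2
    for i in range(3, max_taxa + 1):
        counts[i] = counts[i - 1] * (branches + 1)  # N_i = N_{i-1} * (B_{i-1} + 1)
        branches += 2                               # B_i = B_{i-1} + 2
    return counts

counts = rooted_tree_counts(10)
for i, n in counts.items():
    print(i, n)
```

With ten taxa the count is already 34,459,425 rooted trees, which gives a feel for the growth rate.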

  13. Can save some by going unrooted • Rooted vs. un-rooted • Wherever the root is, un-kink it

  14. Rules for un-rooted trees • Always bifurcated • Can never have 3 branches “from” a single node • What are the odds? [Figure: un-rooted tree with leaves A, D, B, C]

  15. With four nodes (taxa) • Three possible trees • Are there any other combinations? [Figure: the three un-rooted trees on leaves A, B, C, D]
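
One way to check that there are no other combinations: an un-rooted tree on four taxa has a single internal branch, so a topology is fixed by which taxon sits on the same side of that branch as the first taxon. The sketch below enumerates those choices; the function name `quartet_topologies` is made up for illustration.

```python
def quartet_topologies(taxa):
    """List the ways to pair up four taxa across the single internal branch."""
    first, *rest = taxa
    topologies = []
    for partner in rest:
        # The two taxa not paired with `first` form the other side.
        others = tuple(t for t in rest if t != partner)
        topologies.append(((first, partner), others))
    return topologies

trees = quartet_topologies(["A", "B", "C", "D"])
for left, right in trees:
    print(f"{left[0]}{left[1]} | {right[0]}{right[1]}")
```

The first taxon has only three possible partners, so three trees is the complete list.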

  16. With 5 nodes (taxa) • For each of the three trees (having 4 taxa) could add a branch to any of the 5 branches • 3*5=15 trees [Figure: un-rooted tree with leaves A, D, B, C]

  17. Rooting a tree • Outgroup • Include an organism that is known to be further away from all taxa than they are from each other [Figure: un-rooted tree with leaves A, D, B, C, labeled “If outgroup goes here…”, and the resulting rooted tree with leaves A, B, C, D and the outgroup]

  18. Number of un-rooted trees • $N_i$ is the number of trees given $i$ taxa • $B_i$ is the number of branches in a tree given $i$ taxa • $B_i = B_{i-1} + 2$, also $B_i = 2i - 3$ • $N_i = N_{i-1} \cdot B_{i-1}$ • No need for a “plus 1” for a possible new root because there are no roots • $N_2 = 1$ • $B_2 = 1$

  19. Very bright mathematicians • Noticed that for un-rooted trees: $B_i = 2i - 3$ (for $i \ge 2$) • Also noticed $N_i = N_{i-1} \cdot B_{i-1}$ • And reduced to $(2n-5)(2n-7)(2n-9)\cdots(3)(1)$, where $n$ is the number of taxa • Shorthand: $(2n-5)!!$ • For rooted: $N_i = N_{i-1} \cdot (B_{i-1} + 1)$ • Reduced to $(2n-3)!!$ • Derivation (un-rooted): $N_i = B_{i-1} N_{i-1} = (2(i-1)-3) N_{i-1} = (2i-5) N_{i-1} = (2i-5)(2i-7) N_{i-2}$, and so on until the $N$ term gets to $N_3$ (which is 1) • Double factorial: each successive number reduced by two
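
The closed forms are easy to tabulate. This is a minimal sketch; the helper names `double_factorial`, `unrooted_count` and `rooted_count` are made up for illustration.

```python
def double_factorial(k):
    """k!! = k * (k-2) * (k-4) * ... down to 1 or 2; taken as 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def unrooted_count(n):
    """Number of un-rooted trees for n taxa: (2n-5)!!"""
    return double_factorial(2 * n - 5)

def rooted_count(n):
    """Number of rooted trees for n taxa: (2n-3)!!"""
    return double_factorial(2 * n - 3)

for n in range(3, 11):
    print(n, unrooted_count(n), rooted_count(n))
```

Each un-rooted count equals the rooted count for one fewer taxon, because $(2n-5)!! = (2(n-1)-3)!!$.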

  20. Compare • Radical reduction in the number • Still, only bought one additional taxon (the un-rooted count for n taxa equals the rooted count for n-1 taxa)

  21. Rooted and unrooted • Even brighter mathematicians • Can you see why?

  22. How can we reduce the complexity? • Not really a candidate for dynamic programming • Don’t repeat a bunch of sub-problems over and over • Each sub-problem is a tree, and they are all unique • Still exponential

  23. Branch and bound (pruning) • Discard large subsets of possible solutions • Use heuristics or predictions [Figure label: “Don’t bother”]

  24. Upper bound on tree length • Calculate a reasonable upper bound using a fast algorithm like UPGMA (hierarchical clustering) • Incrementally grow potential trees • Stop investigating any branch of the search once its partial tree goes over the threshold [Figure: un-rooted tree with leaves A, D, B, C; pruned paths are marked X and labeled “Don’t bother, over threshold”]
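
The pruning idea can be sketched in a few dozen lines. Several things here are assumptions rather than the lecture's method: the sequences are hypothetical, tree length is scored with Fitch's algorithm, trees are grown as rooted nested tuples (so each un-rooted shape is visited more than once), and the upper bound comes from one arbitrary starting tree instead of a UPGMA tree. The pruning is safe because adding a taxon can never shorten a tree.

```python
def fitch(tree, column):
    """Return (candidate states, mutation count) for one column."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_states, left_cost = fitch(tree[0], column)
    right_states, right_cost = fitch(tree[1], column)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1

def tree_length(tree, seqs):
    """Total mutations over all columns, using only the taxa present in tree."""
    n_cols = len(next(iter(seqs.values())))
    return sum(
        fitch(tree, {name: s[c] for name, s in seqs.items()})[1]
        for c in range(n_cols)
    )

def insertions(tree, leaf):
    """Yield every tree made by attaching leaf to one branch (or above the root)."""
    yield (tree, leaf)
    if not isinstance(tree, str):
        left, right = tree
        for grown in insertions(left, leaf):
            yield (grown, right)
        for grown in insertions(right, leaf):
            yield (left, grown)

def branch_and_bound(seqs, upper_bound):
    """Return (best tree, its length, number of partial trees examined)."""
    names = sorted(seqs)
    best = {"length": upper_bound, "tree": None}
    visited = 0

    def grow(tree, next_index):
        nonlocal visited
        visited += 1
        length = tree_length(tree, seqs)
        if length > best["length"]:
            return  # don't bother: already over the threshold
        if next_index == len(names):
            best["length"], best["tree"] = length, tree
            return
        for bigger in insertions(tree, names[next_index]):
            grow(bigger, next_index + 1)

    grow((names[0], names[1]), 2)
    return best["tree"], best["length"], visited

# Hypothetical aligned sequences, for illustration only.
seqs = {"A": "AAGGCT", "B": "AAGGCA", "C": "CCTTCA", "D": "CCTTGA", "E": "CCTAGT"}

# Upper bound: length of an arbitrary tree that adds taxa in name order.
names = sorted(seqs)
start = names[0]
for name in names[1:]:
    start = (start, name)
bound = tree_length(start, seqs)

tree, length, visited = branch_and_bound(seqs, bound)
_, unbounded_length, unbounded_visited = branch_and_bound(seqs, float("inf"))
print(tree, length, visited, unbounded_visited)
```

Both runs return the same minimum length; the run that starts with a bound never examines more partial trees than the run that starts without one.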

  25. Back to max parsimony algorithm • Some columns are all the same • Add no meaning • All trees are minimum • Columns that are all different • Also add no meaning • Must have a minimum of 2 nt’s (or aa’s) that are the same • Useful in one respect • If all the same, can infer the makeup of the ancestor [Figure: sequences AGCT, AACT, AACT, ACCT; for an all-A column every node of the tree is A and every branch costs 0]
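
The column filter described on this slide can be sketched as below. It applies the slide's rule as stated (at least two characters the same, but not all); the stricter textbook criterion for a parsimony-informative site (at least two states, each appearing in at least two sequences) is deliberately not applied. The function name `classify_column` is made up for illustration.

```python
def classify_column(chars):
    """Classify one alignment column using the rule stated on the slide."""
    distinct = set(chars)
    if len(distinct) == 1:
        return "all same"        # no meaning for choosing a tree; ancestor can be inferred
    if len(distinct) == len(chars):
        return "all different"   # also adds no meaning
    return "informative"         # at least two characters agree, but not all

seqs = ["AGCT", "AACT", "AACT", "ACCT"]
for index, column in enumerate(zip(*seqs)):
    print(index, "".join(column), classify_column(column))
```

For the slide's four sequences only the second column (G, A, A, C) passes the filter.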

  26. Now consensus tree • Each column yields a tree • If all agree, done • If some differ, use majority rule • If the sample is too small, perform bootstrapping • Randomly draw columns from the MSA (with replacement) • Generate more trees • Label branches with the percentage of bootstrap trees in which they appear • Used as a measure of support (repeatability)
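
The bootstrap and the support percentages can be sketched as follows, assuming the standard site-resampling bootstrap, in which alignment columns are drawn with replacement. The tree-building step is left out: the three trees below are hypothetical stand-ins for trees built from replicates, and the function names are made up for illustration.

```python
import random
from collections import Counter

def bootstrap_alignment(seqs, rng):
    """Resample alignment columns with replacement, keeping the rows aligned."""
    n_cols = len(next(iter(seqs.values())))
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(s[i] for i in picks) for name, s in seqs.items()}

def clades(tree):
    """Return (leaf set, set of clades) for a nested-tuple rooted tree."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    left_leaves, left_clades = clades(tree[0])
    right_leaves, right_clades = clades(tree[1])
    leaves = left_leaves | right_leaves
    return leaves, left_clades | right_clades | {leaves}

def clade_support(trees):
    """Percentage of trees in which each clade (group of leaves) appears."""
    counts = Counter(c for t in trees for c in clades(t)[1])
    return {c: 100.0 * n / len(trees) for c, n in counts.items()}

seqs = {"A": "AGCT", "B": "AACT", "C": "AACT", "D": "ACCT"}
replicate = bootstrap_alignment(seqs, random.Random(0))
print(replicate)

# Hypothetical trees, standing in for trees built from bootstrap replicates.
trees = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
]
support = clade_support(trees)
print(support[frozenset({"A", "B"})])
```

A grouping that appears in two of the three trees gets about 67% support; under majority rule, groupings above 50% would be kept in the consensus tree.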

  27. What’s left? • Still have maximum likelihood • Also, some inferential stuff, but that’s all in the next lecture • Stay tuned for part III

  28. Phylogenetics Part III
