Distances

toby
Presentation Transcript


  1. Distances Correction for multiple changes

  2. Distance Calculations • A 5-page excerpt from the book “Molecular Systematics” is on the course web page, as a PDF file • DNA distances are more easily analyzed • Only a 4-letter alphabet • More directly affected by mutation

  3. Underlying mechanisms • Jukes-Cantor model • Assumes all substitutions are equally probable • Uncorrected distance D = 1 - (proportion of sites unchanged) • Corrected distance = $-\frac{3}{4}\ln\left(1 - \frac{4}{3}D\right)$ • Felsenstein,
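
A minimal sketch of the two distances on this slide, assuming two aligned DNA sequences of equal length with no gaps or ambiguity codes. The correction used is the standard Jukes-Cantor formula, d = -(3/4) ln(1 - (4/3) D); the function names and the example sequences are illustrative only.

```python
import math

def p_distance(seq_a, seq_b):
    """Uncorrected distance D: the proportion of aligned sites that differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length, and non-empty")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)

def jukes_cantor(D):
    """Jukes-Cantor corrected distance: d = -(3/4) ln(1 - (4/3) D)."""
    if D >= 0.75:
        # The logarithm's argument is zero or negative at this point.
        raise ValueError("correction is undefined for D >= 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * D)

# Made-up sequences that differ at 2 of 10 sites.
D = p_distance("ACGTACGTAC", "ACGTTCGTAA")
d = jukes_cantor(D)
print(f"uncorrected D = {D:.3f}, corrected d = {d:.3f}")
```

The corrected value is always at least as large as D, because it adds back an estimate of the multiple changes at the same site that the raw count cannot see.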

  4. Tree Searching Using Optimality Criteria Maximum Parsimony and Maximum Likelihood Methods

  5. Searching for the Best Tree • A score for each tree can be evaluated using an objective function that uses the multiple alignment as a fixed parameter and varies the tree topology and branch lengths • Goal is to find the tree with the optimum score, which is defined as the “best” tree

  6. Computational Problem • For a small number of taxa, the score can be evaluated for every tree and the best-scoring tree picked • As the number of taxa increases, full evaluation rapidly becomes impossible: both the number of trees and the cost of the calculation for each tree increase, so heuristics must be used
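
To give a feel for how fast the number of trees grows, here is a short sketch that counts unrooted bifurcating trees with the standard formula (2n-5)!! = 3 × 5 × ... × (2n-5) for n ≥ 3 taxa. The function name is illustrative, and the sketch counts topologies only; it says nothing about the per-tree cost of scoring.

```python
def num_unrooted_trees(n_taxa):
    """Count unrooted bifurcating trees: (2n-5)!! = 3 * 5 * ... * (2n-5)."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (empty product for n = 3).
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count

for n in (4, 5, 10, 20):
    print(n, num_unrooted_trees(n))
```

Four taxa give 3 trees and ten taxa already give 2,027,025, which is why exhaustive evaluation is limited to small data sets.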

  7. Distance Based • Can try to minimize the total tree length (Minimum Evolution = ME) by varying the internal branch lengths • This calculation has to be performed for each tree topology; it is not an algorithm for constructing the tree

  8. Maximum Parsimony • Character based, not distance based • On any given tree, the states of each homologous character can be reconstructed with some minimum number of changes • Summing the number of changes over all characters gives the tree length

  9. You want to find the tree with the lowest score. This is called a Maximum Parsimony tree because it is based on the idea that the explanation that requires the fewest changes is the best • There is no analytical approach for this process, so you need an algorithm that will a) evaluate tree length as fast as possible and b) search the tree space with a high likelihood of evaluating the shortest tree

  10. Three Options • Exhaustive - simply evaluates the length of every tree, and is therefore guaranteed to find the shortest tree(s) • Branch and Bound - searches tree space, but stops constructing a family of trees once its length exceeds the shortest length found so far; still guaranteed to find the shortest tree

  11. Heuristic - constructs an approximately shortest tree, then does a series of rearrangements, evaluating length in each case, selecting the shortest tree from among the rearrangements, and iterating until a shorter tree is not found • Usually works well, but on certain data sets it will give an incorrect answer

  12. Informative Sites • Sites at which at least two different character states each appear in at least two taxa • Reason: a character state that appears only once is most parsimoniously explained as a single change on the terminal branch leading to that taxon, which costs the same on every tree
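
The definition above can be sketched as a column test. This assumes an alignment given as a list of equal-length strings, one per taxon, and treats every symbol as an ordinary character state (gaps and ambiguity codes are not handled specially). The function names and the toy alignment are illustrative.

```python
from collections import Counter

def is_informative(column):
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(column)
    return sum(1 for c in counts.values() if c >= 2) >= 2

def informative_sites(alignment):
    """Return the 0-based indices of the informative columns."""
    return [i for i, col in enumerate(zip(*alignment)) if is_informative(col)]

# Toy alignment, one row per taxon.
alignment = [
    "GAAT",
    "AAGT",
    "AACT",
    "GATC",
]
print(informative_sites(alignment))
```

Only the first column (G, A, A, G) qualifies here: the second is constant, the third has four singleton states, and the fourth has one state that appears only once.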

  13. Example • consider a four taxon set of data, three possible trees, one character • Taxon 1 = G • Taxon 2 = A • Taxon 3 = A • Taxon 4 = G

  14. Work Through on Board

  15. Homoplasy • When you are considering more than one character, they may not all be consistent with the same tree • The principle of maximum parsimony says that you pick the tree with the lowest number of homoplasies (multiple independent origins of a character state)

  16. Add to Worked Example

  17. More Complex Trees • There is an algorithm for finding the lowest score attributable to any distribution of character states on any bifurcating tree • Trace back from the terminal taxa to each node; define the state set at a node as the intersection of the sets of its two descendants, unless the intersection is empty, in which case define it as their union

  18. Each time a union is required, that adds one to the score, because one descendant of the union must have changed • Repeat the process going from scored nodes to unscored nodes • For each tree, perform the same analysis for all characters and sum the scores; that number is the tree score • The tree with the lowest score is most parsimonious
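
The intersection/union rule on slides 17 and 18 (commonly known as Fitch's algorithm) can be sketched as follows. The sketch assumes unordered, equally weighted character states and represents each tree as nested 2-tuples rooted at an arbitrary point; that representation and the names `fitch_score` and `tree_length` are choices made for this illustration. The data are the four-taxon example from slide 13.

```python
def fitch_score(tree, states):
    """Return (state_set, changes) for one character on a binary tree.

    tree   : a taxon name (leaf) or a 2-tuple of subtrees
    states : dict mapping taxon name -> observed character state
    """
    if not isinstance(tree, tuple):
        return {states[tree]}, 0          # a leaf: its observed state, no changes
    left_set, left_changes = fitch_score(tree[0], states)
    right_set, right_changes = fitch_score(tree[1], states)
    common = left_set & right_set
    if common:                            # non-empty intersection: no extra change
        return common, left_changes + right_changes
    # Empty intersection: take the union and count one change.
    return left_set | right_set, left_changes + right_changes + 1

def tree_length(tree, alignment):
    """Sum the per-character scores over all sites of an alignment dict."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        states = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_score(tree, states)[1]
    return total

# Slide 13: one character, four taxa, three possible trees.
character = {"T1": "G", "T2": "A", "T3": "A", "T4": "G"}
trees = {
    "(T1,T2),(T3,T4)": (("T1", "T2"), ("T3", "T4")),
    "(T1,T3),(T2,T4)": (("T1", "T3"), ("T2", "T4")),
    "(T1,T4),(T2,T3)": (("T1", "T4"), ("T2", "T3")),
}
scores = {name: fitch_score(t, character)[1] for name, t in trees.items()}
best = min(scores, key=scores.get)
print(scores, "most parsimonious:", best)
```

Scoring every tree and taking the minimum, as the last lines do, is the exhaustive option; it is only practical here because four taxa have just three trees. The tree that groups the two G taxa together needs one change, and the other two need two each.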

  19. Heuristic Search • For exhaustive search or branch and bound, the search algorithm accounts for all possible trees • A heuristic search needs a non-exhaustive search algorithm to be defined • The most commonly used is tree bisection-reconnection (TBR)

  20. Can bisect any tree at any of the branches, creating two sub-trees, then reconnect them by joining any branch of one sub-tree to any branch of the other • If none of the trees generated by a cycle of TBR is shorter than the parent tree, then the parent tree is accepted as the shortest tree • If one of the TBR-generated trees is shorter, then it is taken as the next candidate shortest tree, and is in turn subjected to a round of TBR analysis
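
The accept-or-iterate rule on this slide can be sketched as a generic hill-climbing loop. This is not a TBR implementation: `neighbors` is a hypothetical placeholder for whatever generates the rearranged trees, and `score` for the tree-length calculation. The sketch also assumes that the shortest neighbor is taken at each round, which is one possible way to pick among shorter trees. A toy one-dimensional "landscape" of scores stands in for tree space so the loop can be run.

```python
def hill_climb(start, neighbors, score):
    """Move to a shorter neighbor until no neighbor is shorter."""
    current = start
    current_score = score(current)
    while True:
        best_neighbor = min(neighbors(current), key=score, default=None)
        if best_neighbor is None or score(best_neighbor) >= current_score:
            # No rearrangement is shorter: accept the current candidate.
            return current, current_score
        current, current_score = best_neighbor, score(best_neighbor)

# Toy stand-in for tree space: positions in a list of scores (lower is better).
landscape = [5, 3, 4, 1, 0]

def toy_score(i):
    return landscape[i]

def toy_neighbors(i):
    # "Rearrangements" of position i are the adjacent positions.
    return [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]

print(hill_climb(0, toy_neighbors, toy_score))  # stops at a local optimum
print(hill_climb(3, toy_neighbors, toy_score))  # reaches the global optimum
```

Starting from position 0 the search stops at score 3 even though a score of 0 exists elsewhere, because every neighbor of that stopping point is worse. This is the sense in which a heuristic search can return a tree that is not the shortest one.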
