1 / 39

Tree searching

Tree searching. Kai Müller. Tree searching: exhaustive search. branch addition algorithm. Branch and bound. L min =L (random tree) „search tree“ as in branch addition at each level, if L < L min  go back one level to try another path

aquila
Download Presentation

Tree searching

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Tree searching Kai Müller

  2. Tree searching: exhaustive search • branch addition algorithm

  3. Branch and bound • Lmin=L(random tree) • „search tree“ as in branch addition • at each level, if L < Lmin go back one level to try another path • if at last level, Lmin=L and go back to first level unless all paths have been tried already

  4. Heuristicsearches best • stepwise addition • as branch addition, but on each level only the path that follows the shortest tree at this level is searched

  5. Star decomposition

  6. Branchswapping NNI: nearest neighbour interchanges SPR: subtree pruning and regrafting TBR: tree bisection and reconnection

  7. Tree inference with many terminals • general problem of getting trapped in local optima • searches under parsimony: parsimony ratchet • searches under likelihood: estimation of • substitution model parameters • branch lengths • topology

  8. Parsimonyratchet • generate start tree • TBR on this and the original matrix • perturbe characters by randomly upweighting 5-25%. TBR on best tree found under 2). Go to 2) [200+ times] • once more TBR on current best tree & original matrix • get best trees from those collected in steps 2) and 4)

  9. Bootstrapping • estimates properties of an estimator (such as its variance) by constructing a number of resamples of the observed dataset (and of equal size to the observed dataset), each of which is obtained by random sampling with replacement from the original dataset

  10. Bootstrapping • variants • FWR (Frequencies within replicates) • SC (strict consensus)

  11. Bootstrapping

  12. Bremer support / decay • Bremer support (decay analysis) is the number of extra steps needed to "collapse" a branch. • searches under reverse constraints: keep trees only that do NOT contain a given node • Takes longer than bootstrapping: parsimony ratchet beneficial (~20 iterations)

  13. Homoplasie-Indices • Consistency Index CI = m/s. • m = die kleinste theoretisch mögliche Schrittzahl die das Merkmal auf einem Baum zeigen könnte • s = Anzahl an tatsächlichen Schritten, die ein Merkmal auf einem gegebenen Baum zeigt • Merkmale ohne Homoplasie haben also einen CI von 1. • Sobald „überschüssige“ Schritte nötig werden, also z.B. s = 3, steigt der Homoplasiegehalt und erniedrigt sich der CI, etwa auf 1/3 = 0.33.

  14. Homoplasie-Indices (2) • Ensemble Consistency Index • Der Ensemble Consistency Index ist dann 1, wenn alle Merkmale nicht homoplastisch sind, also alle perfekt auf den Baum passen. • Nachteile des CI • Parsimonie-uninformative Merkmale tragen immer einen CI von 1 bei und erhöhen so den summarischen CI künstlich. • Andererseits kann der CI nie 0 werden. Gerade das wäre aber eine wünschenswerte Eigenschaft für eine Skala aller denkbaren Homoplasiegrade, die idealerweise von 0 bis 1 reichen sollte. • Drittens wird der CI bei erhöhter Taxonanzahl kleiner, auch wenn sich nichts Wesentliches an dem Informationsgehalt im Datensatz ändert

  15. Homoplasie-Indices (3) • Retention Index (RI) • Wenn g die größtmögliche Schrittzahl eines Merkmals auf jedem denkbaren Baum ist (die auf einem völlig unaufgelösten „Besen“), dann ist RI = (g-s)/(g-m)

  16. Homoplasie-Indices (4)

  17. Overview: treebuilding methods

  18. Data types: discretecharacters vs. distances

  19. Distance methods • observed number vs. actual number of substitutions

  20. Distance methods • observed number vs. actual number of substitutions

  21. Types of substitutions • transitions/transversions • synonymous/non-synonymous

  22. Distance correction correction

  23. Substitution models • p-distance:uncorrected • substitutionmodels • characterizedbysubstitutionprobabilitymatrices:

  24. Substitution models • Jukes-Cantor • oldest (1969), simplest • nucleotide frequencies all identical • nucleotide substitutions all equally likely

  25. P(t) • JC69: • probability of a substitution after time t if mean instant. subst. rate = 10^-8 per site per year

  26. Distances • simple considerations & rearrangements of Pij(t) show that the JC-corrected distance when observing a fraction P of differing nucleotides is

  27. K2P • Kimura 2-parameter model • 2 different nucleotide substitution types • transitions • transversions • nucleotide frequencies all identical

  28. More models • Felsenstein (1981), F81: • 1 nucleotide substitution type, 4 base frequencies • HKY85 • 2 different nucleotide substitution types, 4 base frequencies • GTR • 6 different nucleotide substitution types, 4 base frequencies

  29. Heterogeneityamongsites

  30. Amongsite rate variationmodelled via gammadistribution

  31. Hierarchicalrelationshipsamongcommonmodels

  32. Amino acid models

  33. Codon models • GY94, MG94 • 61 x 61 matrix (stop codons ignored) = frequency of codon j = transition/transversion ratio = ratio nonsynonymous/synonymous

  34. Models getting more "realistic" • example: covarion models • DNA sites change between „on“ and „off“ states: changes allowed vs. forbidden. • transition rates s01s10, kappa= proportion of „on“:

  35. Additivityofdistances

  36. Additivityofdistances • condition: triangle-inequality • four-point-condition

  37. Correcteddistancesarerarelytree additive! • two approaches try to find the tree that minimizes the error e when fitting the distances on it: • both are tree search-, 2-step methods • least-squares-fit criterion: general: goodness of fit methods • minimum evolution • length L of sum of all branches

  38. Clusteringmethods • 1-step, algorithmic methods • UPGMA • condition of an ultrametrictree

  39. Clustering methods • neighbor joining • star decomposition d(pair members  new) node: d(other taxa new node):

More Related