Statistical Evaluation in a Parsimony Framework

nolen


  1. Statistical Evaluation in a Parsimony Framework

  2. The logic of looking at tree length • The more consistency there is among characters, the more we tend to believe our results • If all the characters carry the same signal, the optimal tree is shorter • We need to compare the observed tree length with what we would expect for random (non-phylogenetic) data

  3. How can we remove phylogenetic signal? • Permute the states of each character among taxa so that agreement among characters arises only by chance (Archie, 1989; Faith 1991) • This is the permutation tail probability (PTP) test

  4. Permuting data removes phylogenetic signal

     Original data set
     Taxon 1   ACATTTA
     Taxon 2   ACGATTA
     Taxon 3   AGGATAG
     Taxon 4   GAAAAC?
     Taxon 5   GATA?CG

     Permuted data set
     Taxon 1   GAAA?AA
     Taxon 2   ACAATC?
     Taxon 3   GAGTATG
     Taxon 4   AGTATCG
     Taxon 5   ACGATTA
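The permutation step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any particular phylogenetics package: the function name `permute_characters` and the dictionary layout are assumptions made here. Each character (column) is shuffled among taxa independently, so every column keeps its state frequencies but loses any agreement with the other columns beyond what arises by chance.

```python
import random

def permute_characters(matrix, rng=None):
    """Shuffle each character (column) among taxa independently.

    `matrix` maps taxon name -> string of character states (hypothetical layout).
    """
    if rng is None:
        rng = random.Random()
    taxa = list(matrix)
    n_chars = len(matrix[taxa[0]])
    columns = []
    for j in range(n_chars):
        column = [matrix[t][j] for t in taxa]
        rng.shuffle(column)  # states stay in the column, taxa are scrambled
        columns.append(column)
    return {
        t: "".join(columns[j][i] for j in range(n_chars))
        for i, t in enumerate(taxa)
    }

# The five-taxon matrix from the slide.
original = {
    "Taxon 1": "ACATTTA",
    "Taxon 2": "ACGATTA",
    "Taxon 3": "AGGATAG",
    "Taxon 4": "GAAAAC?",
    "Taxon 5": "GATA?CG",
}
permuted = permute_characters(original, random.Random(1))
for taxon, states in permuted.items():
    print(taxon, states)
```

Missing data ("?") is shuffled like any other state in this sketch, which matches the permuted matrix shown on the slide.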

  5. Example with signal

     Tree length   Number of replicates
     ----------------------------------
     1222*         1
     1669          1
     1671          1
     1672          1
     1673          1
     1674          1
     1675          2
     1676          2
     1678          1
     1679          2
     1680          4
     1681          5
     1682          8
     1683          4
     1684          4
     1685          2
     1686          8
     1687          7
     1688          6
     1689          8
     1690          6
     1691          3
     1692          2
     1693          3
     1694          3
     1695          3
     1696          3
     1697          2
     1699          2
     1702          1
     1704          2
     1705          1

  6. Example without signal

     Tree length   Number of replicates
     ----------------------------------
     1924          3
     1926          1
     1927          4
     1928          1
     1929          2
     1930          8
     1931          6
     1932          5
     1933          4
     1934          4
     1935          5
     1936          1
     1937          8
     1938*         11
     1939          7
     1940          6
     1941          7
     1942          4
     1943          2
     1944          1
     1945          1
     1946          1
     1947          1
     1950          3
     1952          1
     1953          1
     1955          1
     1958          1
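One way to turn the two tables above into a PTP value is sketched below. Two things are assumed here rather than stated on the slides: that the starred row is the tree length of the original (unpermuted) data, and that the replicate counts include the original data set. Under those assumptions the PTP is the fraction of data sets whose optimal tree is as short as or shorter than the observed one. The function name `ptp` is hypothetical.

```python
def ptp(length_counts, observed_length):
    """Fraction of data sets with tree length <= the observed length.

    `length_counts` maps tree length -> number of replicates, and is assumed
    to include the original data set itself.
    """
    total = sum(length_counts.values())
    as_short = sum(n for length, n in length_counts.items()
                   if length <= observed_length)
    return as_short / total

with_signal = {
    1222: 1, 1669: 1, 1671: 1, 1672: 1, 1673: 1, 1674: 1, 1675: 2, 1676: 2,
    1678: 1, 1679: 2, 1680: 4, 1681: 5, 1682: 8, 1683: 4, 1684: 4, 1685: 2,
    1686: 8, 1687: 7, 1688: 6, 1689: 8, 1690: 6, 1691: 3, 1692: 2, 1693: 3,
    1694: 3, 1695: 3, 1696: 3, 1697: 2, 1699: 2, 1702: 1, 1704: 2, 1705: 1,
}
without_signal = {
    1924: 3, 1926: 1, 1927: 4, 1928: 1, 1929: 2, 1930: 8, 1931: 6, 1932: 5,
    1933: 4, 1934: 4, 1935: 5, 1936: 1, 1937: 8, 1938: 11, 1939: 7, 1940: 6,
    1941: 7, 1942: 4, 1943: 2, 1944: 1, 1945: 1, 1946: 1, 1947: 1, 1950: 3,
    1952: 1, 1953: 1, 1955: 1, 1958: 1,
}

print(ptp(with_signal, 1222))     # 0.01
print(ptp(without_signal, 1938))  # 0.63
```

With 100 data sets, 0.01 is the smallest value the test can return, so the first example rejects the null hypothesis of no signal as strongly as this number of replicates allows; the second does not reject it.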

  7. The PTP test is slow • Hillis and Huelsenbeck (1991) observed a difference between the shape of the tree length distribution as a function of phylogenetic signal

  8. A data set without signal

     mean=599.182107 sd=4.944738 g1=-0.150922

     582.00000 /------------------------------------------------------------------------
     583.80000 | (5)
     585.60000 |# (25)
     587.40000 |### (71)
     589.20000 |######### (209)
     591.00000 |####### (161)
     592.80000 |####################### (521)
     594.60000 |####################################### (883)
     596.40000 |################################################## (1132)
     598.20000 |################################################################# (1469)
     600.00000 |################################### (788)
     601.80000 |######################################################################## (1631)
     603.60000 |################################################################## (1486)
     605.40000 |############################################## (1047)
     607.20000 |######################### (567)
     609.00000 |####### (157)
     610.80000 |######## (171)
     612.60000 |### (57)
     614.40000 | (11)
     616.20000 | (3)
     618.00000 | (1)
               \------------------------------------------------------------------------

  9. A data set with signal

     mean=611.572872 sd=31.049455 g1=-0.942643

     501.00000 /------------------------------------------------------------------------
     508.65000 |# (15)
     516.30000 |## (60)
     523.95000 |### (84)
     531.60000 |##### (135)
     539.25000 |# (21)
     546.90000 |# (26)
     554.55000 |### (96)
     562.20000 |###### (166)
     569.85000 |########## (290)
     577.50000 |########################## (737)
     585.15000 |######################################## (1118)
     592.80000 |######################## (665)
     600.45000 |#### (120)
     608.10000 |########## (268)
     615.75000 |################## (497)
     623.40000 |############################ (796)
     631.05000 |############################################### (1337)
     638.70000 |######################################################################## (2031)
     646.35000 |######################################################### (1610)
     654.00000 |########### (323)
               \------------------------------------------------------------------------

  10. Skewness test • Hillis and Huelsenbeck (1991) generated random data for different numbers of taxa/characters to find the null distribution of g1 scores • One can compare observed g1 statistics with this null distribution
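The g1 statistic reported with the histograms is the standard sample skewness: the third central moment divided by the second central moment raised to the power 3/2. A minimal sketch follows, assuming the tree lengths of all (or a large random sample of) trees have already been collected in a list; the function name `g1` and the toy lengths are illustrative, not taken from the slides.

```python
def g1(lengths):
    """Sample skewness g1 = m3 / m2**1.5 of a list of tree lengths."""
    n = len(lengths)
    mean = sum(lengths) / n
    m2 = sum((x - mean) ** 2 for x in lengths) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in lengths) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical toy distributions of tree lengths.
symmetric = [598, 599, 600, 601, 602]
left_skewed = [10, 18, 19, 20, 20, 21]  # a few trees much shorter than the rest

print(g1(symmetric))    # 0.0
print(g1(left_skewed))  # negative: long left tail
```

A strongly negative g1 means a small number of trees are much shorter than the bulk, which is the pattern the slides associate with phylogenetic signal. Whether an observed value is significant still has to be judged against the null distribution for the same number of taxa and characters.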

  11. Tests for phylogenetic signal (g1 and PTP) • Are sensitive to any signal in the data • For example • g1 of permuted data = -0.04 (ns) • Duplicate one taxon and g1 = -1.56** • Useful for identifying truly useless data (very rare)

  12. What aspects of a tree should I believe? • Clade support measures: bootstrap/decay • Statistical tests of alternative hypotheses

  13. Bootstrap: create “pseudoreplicate” data sets

      [Figure: individual alignment columns are copied from the original data set below into an empty pseudoreplicate matrix for taxa One to Eight.]

      One    TACATAAACAAGCCTAAAATGCGACACTACGTTCACTGTTACGCTCTCCACTGCCTAGACGAAGAAGCTTCA
      Two    TACATAAACAAGCCCAAAATGCGACACTACGTCCACTGTTATGCTCTCCACTGCCTAGACGAAGACGCTTCA
      Three  TACATAAACAAGCCCAAAATGCGACACTACGTCCACTGTTACGCTCTTCACTGCCTAGACGAGGATGCCTCG
      Four   TACATAAATAAGCCAAAAATGCGACACTACGTTCATTGTTACGCACTCCATTGCCTCGACGAAGAAGCTTCA
      Five   TACATAAACAAACCCAAAATGCGACACTACGTCCACTGTTATGCTCTCCACTGTCTAGACGAAGACGCTTCG
      Six    TACATAAACAAGCCCAAGATGCGTCACTACGTCCACTGCTACGCCCTCCACTGTCTCGACGAGGAGGCCTCG
      Seven  TACATAAACAAACCAAAAATGCGACACTACGTCCATTGTTACGCCCTACACTGCCTAGACGAAGACGCTTCA
      Eight  TACATAAACAAACCAAAAATGCGACACTACGTCCATTGTTACGCCCTACACTGCCTAGACGAAGACGCTTCA

  14. Bootstrapping - what actually happens

      One    TACATAAACAAGCCTAAAATGCGACACTACGTTCACTGTTACGCTCTCCACTGCCTAGACGAAGAAGCTTCA
      Two    TACATAAACAAGCCCAAAATGCGACACTACGTCCACTGTTATGCTCTCCACTGCCTAGACGAAGACGCTTCA
      Three  TACATAAACAAGCCCAAAATGCGACACTACGTCCACTGTTACGCTCTTCACTGCCTAGACGAGGATGCCTCG
      Four   TACATAAATAAGCCAAAAATGCGACACTACGTTCATTGTTACGCACTCCATTGCCTCGACGAAGAAGCTTCA
      Five   TACATAAACAAACCCAAAATGCGACACTACGTCCACTGTTATGCTCTCCACTGTCTAGACGAAGACGCTTCG
      Six    TACATAAACAAGCCCAAGATGCGTCACTACGTCCACTGCTACGCCCTCCACTGTCTCGACGAGGAGGCCTCG
      Seven  TACATAAACAAACCAAAAATGCGACACTACGTCCATTGTTACGCCCTACACTGCCTAGACGAAGACGCTTCA
      Eight  TACATAAACAAACCAAAAATGCGACACTACGTCCATTGTTACGCCCTACACTGCCTAGACGAAGACGCTTCA
      Weight 011203010200101112122010030011200201011100010111040002201000100020011001

      Sum of weights = the original number of characters
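The weighting idea on this slide can be sketched as follows. Instead of physically building a new matrix, one draws as many column indices as there are characters, with replacement, and records how often each column was drawn. The names `bootstrap_weights` and `pseudoreplicate` are hypothetical helpers for illustration, and the random weights produced here will differ from the ones printed on the slide.

```python
import random
from collections import Counter

def bootstrap_weights(n_chars, rng):
    """Draw n_chars column indices with replacement; return one count per column."""
    draws = Counter(rng.randrange(n_chars) for _ in range(n_chars))
    return [draws[j] for j in range(n_chars)]  # undrawn columns get weight 0

def pseudoreplicate(seq, weights):
    """Expand a sequence by repeating each character `weight` times."""
    return "".join(ch * w for ch, w in zip(seq, weights))

rng = random.Random(0)
seq_one = "TACATAAACAAGCCTAAAATGCGACACTACGTTCACTGTTACGCTCTCCACTGCCTAGACGAAGAAGCTTCA"
weights = bootstrap_weights(len(seq_one), rng)
resampled = pseudoreplicate(seq_one, weights)

print("".join(str(w) for w in weights))
print(sum(weights), len(seq_one), len(resampled))
```

Because exactly as many draws are made as there are characters, the weights always sum to the original number of characters, and the expanded pseudoreplicate has the same length as the original sequence. The same weight vector must be applied to every taxon so that columns stay aligned.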

  15. Bootstrap procedure • Find the optimal trees for many “pseudoreplicates” • Bootstrap percentage = percentage of pseudoreplicates in which a clade is found • Clades with high bootstrap percentages are better supported by the data

  16. Bootstrap percentage • $X_i$ = proportion of the optimal trees for pseudoreplicate $i$ that contain the clade • Bootstrap support (BS) for the clade after $N$ pseudoreplicates: $BS = \frac{1}{N} \sum_{i=1}^{N} X_i$
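The averaging on this slide can be sketched directly. The representation is an assumption made for illustration: each tree is a set of clades, each clade is a `frozenset` of taxon names, and each pseudoreplicate contributes the list of its equally optimal trees. The function name `bootstrap_support` and the toy trees are hypothetical.

```python
def bootstrap_support(replicates, clade):
    """Mean, over pseudoreplicates, of the fraction of optimal trees with `clade`."""
    clade = frozenset(clade)
    fractions = [
        sum(clade in tree for tree in trees) / len(trees)  # X_i for replicate i
        for trees in replicates
    ]
    return sum(fractions) / len(fractions)

# Hypothetical four-taxon trees, each written as its set of non-trivial clades.
ab_cd = {frozenset("AB"), frozenset("CD")}
ac_bd = {frozenset("AC"), frozenset("BD")}

replicates = [
    [ab_cd],         # one optimal tree, has clade AB  -> X = 1
    [ab_cd, ac_bd],  # two tied optimal trees, one has AB -> X = 0.5
    [ac_bd],         # one optimal tree, lacks AB      -> X = 0
    [ab_cd],         # X = 1
]

print(100 * bootstrap_support(replicates, "AB"))  # 62.5
```

Using the fraction of tied trees, rather than counting a replicate as a hit whenever any optimal tree contains the clade, keeps replicates with many equally short trees from inflating the support value.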

  17. How should bootstrap numbers be interpreted? • Controversial (more later when we discuss Bayesian phylogenetics) • For now: • High (especially over 90%) indicates strong support for a clade • Low (especially below 70%) indicates weak support

  18. Bootstrap Polytomy [Figure: two trees for taxa A to I, labelled “What we get” and “What we believe”; bootstrap values shown on the nodes are 50, 90, 55, 99, 95, 95 and 45.]

  19. Another commonly used method: the decay index • How many steps longer is the shortest tree that lacks the clade? • Find the shortest tree and record its length (= Lopt) • Find the shortest tree under the constraint that it lacks the clade and record its length (= Lcon) • The decay index is Lcon - Lopt
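The decay index counts how many steps longer the best clade-lacking tree is than the best tree overall, so it is computed as Lcon minus Lopt and is never negative. The sketch below assumes the candidate trees and their lengths are already available as `(length, clades)` pairs; in practice the two lengths come from an unconstrained and a constrained tree search. The function name `decay_index` and the tree lengths are hypothetical.

```python
def decay_index(scored_trees, clade):
    """Lcon - Lopt for `clade`, given (length, set_of_clades) pairs."""
    clade = frozenset(clade)
    l_opt = min(length for length, _ in scored_trees)
    # Shortest tree that does NOT contain the clade (the constrained search).
    # Raises ValueError if every candidate tree contains the clade.
    l_con = min(length for length, clades in scored_trees
                if clade not in clades)
    return l_con - l_opt

# Hypothetical four-taxon example: the three resolved trees and their lengths.
scored_trees = [
    (100, {frozenset("AB"), frozenset("CD")}),
    (103, {frozenset("AC"), frozenset("BD")}),
    (105, {frozenset("AD"), frozenset("BC")}),
]

print(decay_index(scored_trees, "AB"))  # 3
print(decay_index(scored_trees, "AC"))  # 0: the optimal tree already lacks AC
```

A clade that is absent from the optimal tree gets a decay index of 0 in this sketch, because the unconstrained optimum already satisfies the constraint.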

  20. Evaluating the statistical significance of a decay index • Topological PTP test (T-PTP) • What is the decay index of the clade in permuted data sets? (assumes that the clade was hypothesized a priori) • Templeton test (Wilcoxon signed-rank test) • Is the best constrained tree significantly longer than the optimal tree? • Parametric bootstrap (SOWH test) • Repeatedly simulate evolution up the best constrained tree. How often do we get a decay index as large as or larger than the observed one?
