
A Study on Measuring Distance between Two Trees

A Study on Measuring Distance between Two Trees. Advisor: Prof. 阮夙姿. Presenter: 林陳輝. Outline: Introduction; Problem definition; Related work; The metric and algorithms; Mixture distance; Basic algorithm; The modified algorithm; Mixture-matching distance.

Presentation Transcript


  1. A Study on Measuring Distance between Two Trees • Advisor: Prof. 阮夙姿 • Presenter: 林陳輝

  2. Outline • Introduction • Problem definition • Related work • The metric and algorithms • Mixture distance • Basic algorithm • The modified algorithm • Mixture-matching distance • Conclusions and future work CSIE, National Chi Nan University

  3. Introduction • Evolutionary trees • Comparing trees • Comparing trees is not easy • [Figure: a phylogenetic tree; source: "Phylogenetic tree," Wikipedia]

  4. Mixture tree • [Figure: a mixture tree, with time on the vertical axis and taxa at the leaves] • S.-C. Chen and B. G. Lindsay, “Building Mixture Trees from Binary Sequence Data,” Biometrika, 2006.

  5. Problem definition • The leaves are associated with taxa. • Every internal node carries a time parameter. • [Figure: example tree with leaves A to H and internal nodes v1 to v7, with time parameters 11, 9, 8, 7, 5, 3, and 1]

  7. Related work • Path difference metric: $d_p(T_1, T_2) = \lVert d(T_1) - d(T_2) \rVert_2$, where $d(T_i)$ is the vector of the distances between all pairs of leaves of $T_i$. • M. A. Steel and D. Penny, “Distributions of Tree Comparison Metrics – Some New Results,” Syst. Biol. 42(2):126-141, 1993.
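The path difference metric above can be illustrated with a minimal sketch. It assumes that the distance between two leaves is the number of edges on the path between them in the rooted tree as drawn, and that a tree is written as nested tuples of leaf names; the names `path_lengths`, `path_difference`, `T_A`, and `T_B` are hypothetical and not from the slides.

```python
from math import sqrt


def path_lengths(tree):
    """Map each unordered leaf pair to the number of edges between the two leaves.

    Assumption: a tree is a leaf name (str) or a tuple of subtrees.
    """
    lengths = {}

    def visit(node):
        # Returns {leaf: number of edges from `node` down to that leaf}.
        if isinstance(node, str):
            return {node: 0}
        below = {}
        for child in node:
            sub = {leaf: d + 1 for leaf, d in visit(child).items()}
            # Leaves in different child subtrees meet at `node`.
            for x, dx in below.items():
                for y, dy in sub.items():
                    lengths[frozenset((x, y))] = dx + dy
            below.update(sub)
        return below

    visit(tree)
    return lengths


def path_difference(t1, t2):
    """Euclidean norm of the difference of the two pairwise-distance vectors."""
    d1, d2 = path_lengths(t1), path_lengths(t2)
    return sqrt(sum((d1[pair] - d2[pair]) ** 2 for pair in d1))


# Two illustrative four-leaf trees (not taken from the slides).
T_A = (("A", "B"), ("C", "D"))
T_B = ("A", ("B", ("C", "D")))
print(path_difference(T_A, T_B))
```

In this example only the pairs AB, BC, and BD differ, each by one edge, so the result is the square root of 3.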

  8. Related work • Nodal metric • For full binary trees, the time complexity is $O(n^3)$. • For complete binary trees, the time complexity is $O(n^2 \log n)$. • John Bluis and Dong-Guk Shin, “Nodal Distance Algorithm: Calculating a Phylogenetic Tree Comparison Metric,” Proc. of the 3rd IEEE Symposium on BioInformatics and BioEngineering, pp. 87-94, 2003.

  9. Related work • Matching distance • P. W. Diaconis and S. P. Holmes, “Matchings and Phylogenetic Trees,” Proc. Natl. Acad. Sci. U.S.A., Vol. 95, No. 25, pp. 14600-14602, 1998. • The algorithm for matching distance • G. Valiente, “A Fast Algorithmic Technique for Comparing Large Phylogenetic Trees,” SPIRE, pp. 370-375, 2005.

  10. Matching representation • [Figure: a tree with six leaves labeled 1 to 6 and internal nodes labeled 7 to 11] • Matching: {1,2} {5,6} {3,7} {4,8} {9,10}

  11. Matching distance • T1: {1,2} {5,6} {3,7} {4,8} {9,10} • T2: {1,3} {4,6} {2,7} {5,8} {9,10} • [Figure: trees T1 and T2 with nodes labeled 1 to 11] • The distance is 2.
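The distance of 2 on slide 11 can be checked with a short sketch. It relies on an assumption that is not stated on the slides: the union of two perfect matchings splits into alternating cycles, and the number of switches needed equals the number of pairs minus the number of cycles. The function name `matching_distance` is hypothetical, and this is not Valiente's algorithm.

```python
def matching_distance(m1, m2):
    """Count the switches needed to turn perfect matching m1 into m2.

    Assumption: both matchings cover the same labels, and the answer is
    (number of pairs) - (number of cycles in the union of m1 and m2).
    """
    partner1, partner2 = {}, {}
    for a, b in m1:
        partner1[a], partner1[b] = b, a
    for a, b in m2:
        partner2[a], partner2[b] = b, a

    seen = set()
    cycles = 0
    for start in partner1:
        if start in seen:
            continue
        cycles += 1
        node = start
        # Walk the cycle, alternating between an m1 pair and an m2 pair.
        while node not in seen:
            seen.add(node)
            mate = partner1[node]
            seen.add(mate)
            node = partner2[mate]
    return len(m1) - cycles


# Matchings of T1 and T2 from slide 11.
M1 = [(1, 2), (5, 6), (3, 7), (4, 8), (9, 10)]
M2 = [(1, 3), (4, 6), (2, 7), (5, 8), (9, 10)]
print(matching_distance(M1, M2))  # 2
```

Here the union has three cycles (1-2-7-3, 5-6-4-8, and 9-10) over five pairs, which gives 5 - 3 = 2.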

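  13. Mixture distance and algorithms • Definition: $\mathrm{Distance}(T_1, T_2) = \sum_{\{x, y\}} \lvert p_{T_1}(x, y) - p_{T_2}(x, y) \rvert$, summed over all pairs of leaves x, y. • $p_{T_i}(x, y)$ is the time parameter of the LCA of leaves x and y in $T_i$. • [Figure: two example trees on leaves A, B, C, D]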

  14. Distance conditions • The distance from an object to itself is zero. • The distance from A to B equals the distance from B to A. • The triangle inequality holds. • J. Felsenstein, Inferring Phylogenies. Sunderland, MA: Sinauer Associates, 2004.

  15. Algorithm • There are C(n, 2) pairs of leaves. • Algorithmic idea: grouping • Full binary tree • Example: AB: |8 − 1| = 7; AC: |8 − 9| = 1; AD: |8 − 9| = 1; BC: |4 − 9| = 5; BD: |4 − 9| = 5; CD: |1 − 3| = 2; Distance = 21 • [Figure: two trees on leaves A, B, C, D; the first has time parameters 8 (v1), 4 (v2), 1 (v3), the second 9 (v1), 1 (v2), 3 (v3)]
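The pairwise computation on slide 15 can be reproduced with a brute-force sketch over all C(n, 2) leaf pairs. The tree shapes below are an assumption reconstructed from the six pair values on the slide, the `(time, left, right)` tuple encoding is chosen here for illustration, and the names `lca_times` and `mixture_distance` are hypothetical.

```python
def lca_times(tree):
    """Map each unordered leaf pair to the time parameter of its LCA.

    Assumption: a tree is a leaf name (str) or a tuple (time, left, right).
    """
    times = {}

    def visit(node):
        # Returns the list of leaves below `node`.
        if isinstance(node, str):
            return [node]
        time, left, right = node
        left_leaves, right_leaves = visit(left), visit(right)
        # A leaf on the left and a leaf on the right have `node` as their LCA.
        for x in left_leaves:
            for y in right_leaves:
                times[frozenset((x, y))] = time
        return left_leaves + right_leaves

    visit(tree)
    return times


def mixture_distance(t1, t2):
    """Sum of |p_T1(x, y) - p_T2(x, y)| over all leaf pairs."""
    p1, p2 = lca_times(t1), lca_times(t2)
    return sum(abs(p1[pair] - p2[pair]) for pair in p1)


# Shapes inferred from the slide's pair values (an assumption).
T1 = (8, "A", (4, "B", (1, "C", "D")))
T2 = (9, (1, "A", "B"), (3, "C", "D"))
print(mixture_distance(T1, T2))  # 21
```

This direct approach visits every leaf pair, which is the quadratic cost that the grouping idea on the slide is meant to organize.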

  16. Algorithm • [Figure: trees T1 and T2 on leaves A to H with internal nodes v1 to v7; the time parameters of T1 are 9, 7, 8, 5, 3, 2, 4 and those of T2 are 9, 6, 8, 5, 3, 1, 4]

  17. [Figure: T1 and T2 with the leaves colored red and green for node v1 of T1; every node of T2 is annotated with the numbers of red and green leaves below it] • $|p_{T_1}(v_1) - p_{T_2}(v_7)| \times (0 \times 0 + 1 \times 1) = |9 - 5| \times (0 \times 0 + 1 \times 1) = 4$ • $|p_{T_1}(v_1) - p_{T_2}(v_6)| \times (1 \times 1 + 0 \times 0) = |9 - 4| \times (1 \times 1 + 0 \times 0) = 5$ • $|p_{T_1}(v_1) - p_{T_2}(v_3)| \times (1 \times 1 + 1 \times 1) = |9 - 8| \times (1 \times 1 + 1 \times 1) = 2$

  18. [Figure: T1 and T2 with the leaves colored red and green for node v2 of T1; every node of T2 is annotated with the numbers of red and green leaves below it] • $|p_{T_1}(v_2) - p_{T_2}(v_2)| \times (2 \times 0 + 0 \times 0) = |7 - 6| \times (2 \times 0 + 0 \times 0) = 0$ • $|p_{T_1}(v_2) - p_{T_2}(v_3)| \times (0 \times 1 + 0 \times 1) = |7 - 8| \times (0 \times 1 + 0 \times 1) = 0$ • $|p_{T_1}(v_2) - p_{T_2}(v_1)| \times (2 \times 2 + 0 \times 0) = |7 - 9| \times (2 \times 2 + 0 \times 0) = 8$

  19. Complexity analysis • For every internal node of T1, coloring all leaves takes O(n) time. • Counting the distance in T2 takes O(n) time. • Over all O(n) internal nodes of T1, the time complexity is $O(n^2)$.
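The coloring procedure analyzed above can be sketched as follows. This is one reading of slides 17 to 19 rather than the presenter's code: for each internal node of T1, the leaves of its left subtree are colored red and those of its right subtree green, and every node w of T2 then contributes |time difference| × (red_left × green_right + green_left × red_right). The tuple encoding and all function names are assumptions.

```python
def leaves(node):
    """List the leaves below `node`; a tree is a str or (time, left, right)."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def internal_nodes(node):
    """Yield every internal node (time, left, right) of the tree."""
    if isinstance(node, str):
        return
    yield node
    _, left, right = node
    yield from internal_nodes(left)
    yield from internal_nodes(right)


def colored_contribution(time1, color, t2):
    """One O(n) pass over T2 for a single colored node of T1."""
    total = 0

    def visit(node):
        nonlocal total
        # Returns (red, green) leaf counts below `node`.
        if isinstance(node, str):
            c = color.get(node)
            return int(c == "red"), int(c == "green")
        time2, left, right = node
        red_l, green_l = visit(left)
        red_r, green_r = visit(right)
        # Red-green pairs split between the two children have LCA `node`.
        total += abs(time1 - time2) * (red_l * green_r + green_l * red_r)
        return red_l + red_r, green_l + green_r

    visit(t2)
    return total


def mixture_distance_basic(t1, t2):
    """Sum the contributions of all internal nodes of T1: O(n^2) overall."""
    total = 0
    for time1, left, right in internal_nodes(t1):
        color = {leaf: "red" for leaf in leaves(left)}
        color.update({leaf: "green" for leaf in leaves(right)})
        total += colored_contribution(time1, color, t2)
    return total


# Four-leaf example with shapes inferred from slide 15 (an assumption).
T1 = (8, "A", (4, "B", (1, "C", "D")))
T2 = (9, (1, "A", "B"), (3, "C", "D"))
print(mixture_distance_basic(T1, T2))  # 21
```

The three internal nodes of `T1` contribute 9, 10, and 2 respectively, matching the pairwise total of 21 on slide 15.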

  20. The modified algorithm • Speeds up the basic algorithm • The basic algorithm carries too much empty color information

  21. Same example and equations as slide 18 • [Figure: the nodes of T2 annotated Red:0 Green:0 are marked as empty color information]

  22. [Figure: T2 on leaves A to H (top) and the subtree of T2 reconstructed on leaves A, B, C, D (bottom), which keeps only v1 (9), v3 (8), and v4 (1)]

  23. The modified algorithm • Finding the LCA in constant time with O(n) preprocessing • M. A. Bender and M. Farach-Colton, “The LCA Problem Revisited,” Proc. LATIN, 2000. • 2-way merge problem • R. C. T. Lee, S. S. Tseng, R. C. Chang, and Y. T. Tsai, Introduction to the Design and Analysis of Algorithms. McGraw-Hill Education, 2005.
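As an illustration of constant-time LCA queries, here is a minimal sketch based on an Euler tour and a sparse table. Note the swap: this simpler variant needs O(n log n) preprocessing, not the O(n) preprocessing of the cited paper. The `(time, left, right)` encoding and the names `build_lca` and `lca_time` are assumptions made for this example.

```python
def build_lca(tree):
    """Preprocess a tree so that lca_time(x, y) answers in O(1).

    Assumption: a tree is a leaf name (str) or a tuple (time, left, right).
    Returns a function giving the time parameter of the LCA of two
    distinct leaves.
    """
    euler = []  # (depth, time) per visit; time is None for a leaf
    first = {}  # leaf name -> index of its visit in `euler`

    def tour(node, depth):
        if isinstance(node, str):
            first[node] = len(euler)
            euler.append((depth, None))
            return
        time, left, right = node
        euler.append((depth, time))
        tour(left, depth + 1)
        euler.append((depth, time))
        tour(right, depth + 1)
        euler.append((depth, time))

    tour(tree, 0)

    def shallower(a, b):
        return a if a[0] <= b[0] else b

    # table[j][i] is the shallowest entry in euler[i : i + 2**j].
    table = [euler]
    j = 1
    while (1 << j) <= len(euler):
        prev, half = table[-1], 1 << (j - 1)
        table.append([shallower(prev[i], prev[i + half])
                      for i in range(len(euler) - (1 << j) + 1)])
        j += 1

    def lca_time(x, y):
        lo, hi = sorted((first[x], first[y]))
        j = (hi - lo + 1).bit_length() - 1
        # Two overlapping power-of-two windows cover euler[lo : hi + 1].
        return shallower(table[j][lo], table[j][hi - (1 << j) + 1])[1]

    return lca_time


# Four-leaf tree with shape inferred from slide 15 (an assumption).
T2 = (9, (1, "A", "B"), (3, "C", "D"))
query = build_lca(T2)
print(query("A", "B"), query("A", "C"), query("C", "D"))  # 1 9 3
```

The shallowest Euler-tour entry between the first visits of two leaves always belongs to their LCA, which is why a range-minimum query over depths suffices.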

  24. [Figure: T1 and T2 on leaves A to H; the nodes of T1 are numbered 1 to 15, with its leaves numbered 1, 2, 4, 5, 8, 9, 11, 12, and each leaf of T2 carries the number of the same leaf in T1]

  25. [Figure: step for node v4 of T1 (node number 3); the lowest internal nodes of T2 are annotated with the leaf numbers below them: {1, 2}, {4, 9}, {5, 8}, {11, 12}] • $|1 - 2| \times (1 \times 1 + 0 \times 0) = 1$

  26. [Figure: step for node v1 of T1; the subtree on leaves H, A, B, G (numbers 1, 2, 11, 12) is reconstructed, and the nodes of T2 are annotated with leaf-number lists: {1, 2, 4, 5, 8, 9, 11, 12} at v1, {1, 2, 11, 12} and {4, 5, 8, 9} at its children, and {1, 2}, {4, 9}, {5, 8}, {11, 12} below them] • $|9 - 7| \times (2 \times 2 + 0 \times 0) = 8$

  27. Complexity analysis • Reconstructing the subtree of T1 takes linear time. • Counting the distance in a reconstructed subtree of size m takes O(m) time. • The height of a complete binary tree is $O(\log n)$. • The total time complexity is $O(n \log n)$ for complete binary trees.
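The step from these bullets to the $O(n \log n)$ bound can be spelled out as a short derivation. It assumes a complete binary tree with n leaves, so that depth k holds $2^k$ internal nodes, each with $m = n / 2^k$ leaves below it, and that handling one node costs time linear in m; the constant c is introduced here only for illustration.

```latex
\begin{aligned}
\text{cost at depth } k &= 2^k \cdot c \cdot \frac{n}{2^k} = c\,n, \\
\text{total cost} &= \sum_{k=0}^{\log_2 n - 1} c\,n = c\,n \log_2 n = O(n \log n).
\end{aligned}
```

Every level costs the same linear amount, and there are $\log_2 n$ levels, which is where the logarithmic factor comes from.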

  29. Mixture-matching distance • Distance = [formula not preserved in the transcript; see the worked example on slide 30] • i is the matching distance between T1 and T2. • $P_{T_m}$ denotes the product of all time parameters in $T_m$.

  30. [Figure: trees T1 and T2 on leaves A to H, with leaves labeled 1 to 8 and internal nodes labeled 9 to 15] • T1: {1, 2} {3, 4} {5, 6} {7, 8} {9, 10} {11, 12} {13, 14} • T2: {1, 2} {3, 6} {4, 5} {7, 8} {9, 12} {10, 11} {13, 14} • Distance = 1 − (25920 / 60480) + 2 ≈ 2.571
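The arithmetic in this example can be reproduced with a small sketch. Because the formula itself is missing from the transcript, the form used here, 1 − min(P)/max(P) + i, is an assumption inferred from the worked numbers. The time parameters are read from the figure residue, and the function name `mixture_matching_distance` is hypothetical.

```python
from math import prod


def mixture_matching_distance(times1, times2, matching_dist):
    """Combine the time-parameter products with the matching distance i.

    Assumed form (inferred from the slide's example, not stated there):
    1 - min(P1, P2) / max(P1, P2) + i.
    """
    p1, p2 = prod(times1), prod(times2)
    return 1 - min(p1, p2) / max(p1, p2) + matching_dist


# Time parameters as read from the slide's figure (an assumption).
T1_TIMES = [9, 7, 8, 5, 3, 2, 4]  # product 60480
T2_TIMES = [9, 6, 8, 5, 3, 1, 4]  # product 25920
print(round(mixture_matching_distance(T1_TIMES, T2_TIMES, 2), 3))  # 2.571
```

Under this reading the fractional part stays in the interval [0, 1), so the integer part of the result recovers the matching distance i.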

  31. Distance = [formula not preserved in the transcript] • Example: Distance = 1 − (25920 / 60480) + 2 ≈ 2.571 • [Figure: number line of distance values marked 0, 1, i, and ∞, with the labels “The same,” “No different leaves,” and “i transposition”] • The time complexity is O(n).

  33. Conclusions

  34. Future work • Improve the time complexity • Extend to k-ary trees • Add mutation points

  35. Thank you for listening.
