Analysis of Tree Edit Distance Algorithms Serge Dulucq and Hélène



  1. Analysis of Tree Edit Distance Algorithms • Serge Dulucq and Hélène • B89902009 黃鼎翔 • B89902011 田知本 • B89902045 巨彥霖

  2. Outline • Introduction • Edit Distance for Trees and Forests • Cover Strategies

  3. Introduction • Edit Distance for Trees and Forests • Cover Strategies

  4. Motivation • One way of comparing two ordered trees is by measuring their edit distance • Application areas • Comparison of hierarchically structured data • Alignment of RNA secondary structures in computational biology • Two algorithms using dynamic programming • Zhang-Shasha • Klein

  5. Purpose • A general analysis of dynamic programming for tree edit distance algorithms • Study the complexity of the underlying decompositions by counting the exact number of distinct recursive calls • Define a new edit distance algorithm for trees that improves on the original algorithms with respect to the number of recursive calls

  6. Introduction • Edit Distance for Trees and Forests • Cover Strategies

  7. Trees and forests • A tree is a node (called the root) connected to an ordered sequence of disjoint trees • Such a sequence is called a forest • We write l(A1◦…◦An) for the tree composed of the node l connected to the sequence of trees A1, …, An • [Figure: two ordered trees that differ only in the order of their children (marked ≠), and a schematic of l(A1◦…◦An)]

  8. Notation for a forest F • |F| denotes the number of nodes of the forest F • SF(F) is the set of all subforests of F • F(i), where i is a node of F, denotes the subtree of F rooted at i • deg(i) is the degree of i, that is, the number of children of i • [Figure: example forest F with nodes 1 to 10, for which |F| = 10 and deg(4) = 2]
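One possible way to represent this notation in code, as a minimal sketch: a tree is a `(label, children)` tuple and a forest is a tuple of trees. The helper names `leaf`, `size`, `subtree`, and `deg`, as well as the example forest, are illustrative assumptions and do not come from the slides.

```python
# Hypothetical representation: tree = (label, children), forest = tuple of trees.
# Labels are assumed to be unique so that a label identifies a node.

def leaf(label):
    return (label, ())

def size(forest):
    """|F|: number of nodes of the forest."""
    return sum(1 + size(children) for _, children in forest)

def subtree(forest, i):
    """F(i): the subtree rooted at node i, or None if i is not in F."""
    for tree in forest:
        label, children = tree
        if label == i:
            return tree
        found = subtree(children, i)
        if found is not None:
            return found
    return None

def deg(forest, i):
    """deg(i): number of children of node i (i must be a node of F)."""
    return len(subtree(forest, i)[1])

# Illustrative forest (not the one drawn on the slide): 1(2, 3) ◦ 4(5, 6(7))
F = ((1, (leaf(2), leaf(3))), (4, (leaf(5), (6, (leaf(7),)))))
print(size(F), deg(F, 4), subtree(F, 6))
```

Tuples are used instead of lists so that forests are hashable and can later serve as dictionary keys or set members.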

  9. Edit distance • Let F and G be two forests. The edit distance between F and G, denoted d(F, G), is the minimal cost of the edit operations needed to transform F into G • Operations • Substitution • Insertion • Deletion • Let Cs, Ci, Cd denote the costs of substitution, insertion, and deletion

  10. Recursive relationship (1/3) • Strings • u, v are strings; x, y are alphabet symbols • d(xu, yv) = min{ Cd(x) + d(u, yv), Ci(y) + d(xu, v), Cs(x, y) + d(u, v) } • d(ux, vy) = min{ Cd(x) + d(u, vy), Ci(y) + d(ux, v), Cs(x, y) + d(u, v) }

  11. Recursive relationship (2/3) • Trees • l, l’ are roots; F, F’ are forests • d(l(F), l’(F’)) = min{ Cd(l) + d(F, l’(F’)), Ci(l’) + d(l(F), F’), Cs(l, l’) + d(F, F’) }

  12. Recursive relationship (3/3) • Forests • T, T’ are forests • Left decomposition: d(l(F)◦T, l’(F’)◦T’) = min{ Cd(l) + d(F◦T, l’(F’)◦T’), Ci(l’) + d(l(F)◦T, F’◦T’), d(l(F), l’(F’)) + d(T, T’) } • Right decomposition: d(T◦l(F), T’◦l’(F’)) = min{ Cd(l) + d(T◦F, T’◦l’(F’)), Ci(l’) + d(T◦l(F), T’◦F’), d(l(F), l’(F’)) + d(T, T’) } • We use the term direction to indicate whether a decomposition is left or right
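The three recurrences above can be sketched as a memoized recursion that always uses the left decomposition. This is an illustration only, under stated assumptions: unit insertion and deletion costs, a substitution cost of 0 for equal labels and 1 otherwise, and trees encoded as `(label, children)` tuples. The names `dist` and `sub_cost` are hypothetical.

```python
from functools import lru_cache

def sub_cost(a, b):
    # Assumed substitution cost: free when the labels match.
    return 0 if a == b else 1

@lru_cache(maxsize=None)
def dist(f, g):
    """Edit distance between forests f and g (tuples of (label, children) trees)."""
    if not f and not g:
        return 0
    if not g:                       # only deletions remain
        (_, kids), rest = f[0], f[1:]
        return 1 + dist(kids + rest, g)
    if not f:                       # only insertions remain
        (_, kids), rest = g[0], g[1:]
        return 1 + dist(f, kids + rest)
    (l, F), T = f[0], f[1:]         # f = l(F) ◦ T
    (l2, F2), T2 = g[0], g[1:]      # g = l'(F') ◦ T'
    delete = 1 + dist(F + T, g)
    insert = 1 + dist(f, F2 + T2)
    if not T and not T2:
        # Both sides are single trees: tree recurrence (slide 11).
        match = sub_cost(l, l2) + dist(F, F2)
    else:
        # Forest recurrence (slide 12): match the leftmost trees, then the rest.
        match = dist((f[0],), (g[0],)) + dist(T, T2)
    return min(delete, insert, match)

A = ("a", (("b", ()), ("c", ())))
B = ("a", (("c", ()),))
print(dist((A,), (B,)))
```

The cache keys are exactly the pairs of subforests met during the recursion, which is what the following slides count.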

  13. Example • [Figure: left decomposition and right decomposition of an example forest with nodes 1 to 5]

  14. Strategy & Relevant forests • Let F and G be two forests. A strategy is a mapping from SF(F)×SF(G) to {left, right} • Let (F, F’) be a pair of forests provided with a strategy φ. The set RFφ(F, F’) of relevant forests is defined as the least subset of SF(F)×SF(F’) such that if the decomposition of (F, F’) meets the pair (G, G’), then (G, G’) belongs to RFφ(F, F’) • RFφ(F) and RFφ(F’) denote the projections of RFφ(F, F’) on SF(F) and SF(F’) • #relevant denotes the number of relevant forests

  15. Proposition (1/2) • F=F’=Ø → RFφ(F, F’)=Ø • φ(F, F’)=left, F=l(G)◦T, F’=Ø → RFφ(F, F’) = {(F, F’)}∪RFφ(G◦T, F’) • φ(F, F’)=right, F=T◦l(G), F’=Ø → RFφ(F, F’) = {(F, F’)}∪RFφ(T◦G, F’) • φ(F, F’)=left, F=Ø, F’=l’(G’)◦T’ → RFφ(F, F’) = {(F, F’)}∪RFφ(F, G’◦T’) • Recall the left decomposition: d(l(G)◦T, l’(G’)◦T’) = min{ Cd(l) + d(G◦T, l’(G’)◦T’), Ci(l’) + d(l(G)◦T, G’◦T’), d(l(G), l’(G’)) + d(T, T’) } • Recall the right decomposition: d(T◦l(G), T’◦l’(G’)) = min{ Cd(l) + d(T◦G, T’◦l’(G’)), Ci(l’) + d(T◦l(G), T’◦G’), d(l(G), l’(G’)) + d(T, T’) }

  16. Proposition (2/2) • φ(F, F’)=right, F=Ø, F’=T’◦l’(G’) → RFφ(F, F’) = {(F, F’)}∪RFφ(F, T’◦G’) • φ(F, F’)=left, F=l(G)◦T, F’=l’(G’)◦T’ → RFφ(F, F’) = {(F, F’)}∪RFφ(G◦T, F’)∪RFφ(F, G’◦T’)∪RFφ(l(G), l’(G’))∪RFφ(T, T’) • φ(F, F’)=right, F=T◦l(G), F’=T’◦l’(G’) → RFφ(F, F’) = {(F, F’)}∪RFφ(T◦G, F’)∪RFφ(F, T’◦G’)∪RFφ(l(G), l’(G’))∪RFφ(T, T’) • Recall the left decomposition: d(l(G)◦T, l’(G’)◦T’) = min{ Cd(l) + d(G◦T, l’(G’)◦T’), Ci(l’) + d(l(G)◦T, G’◦T’), d(l(G), l’(G’)) + d(T, T’) } • Recall the right decomposition: d(T◦l(G), T’◦l’(G’)) = min{ Cd(l) + d(T◦G, T’◦l’(G’)), Ci(l’) + d(T◦l(G), T’◦G’), d(l(G), l’(G’)) + d(T, T’) }
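The rules of the proposition translate almost directly into a worklist that collects RFφ(F, F’) for a given strategy. The sketch below assumes trees encoded as `(label, children)` tuples with unique labels and a strategy passed as a plain function returning "left" or "right"; the names `split`, `relevant_pairs`, and `count_relevant_first` are hypothetical.

```python
def split(forest, direction):
    """Return (forest with the chosen root deleted, chosen tree as a forest, other trees)."""
    if direction == "left":
        (_, kids), rest = forest[0], forest[1:]
        return kids + rest, (forest[0],), rest
    (_, kids), rest = forest[-1], forest[:-1]
    return rest + kids, (forest[-1],), rest

def relevant_pairs(f, g, strategy):
    """Collect the pairs of RF_phi(f, g) by following the rules of the proposition."""
    seen, stack = set(), [(f, g)]
    while stack:
        f, g = stack.pop()
        if (not f and not g) or (f, g) in seen:
            continue
        seen.add((f, g))
        d = strategy(f, g)
        if not g:
            stack.append((split(f, d)[0], g))
        elif not f:
            stack.append((f, split(g, d)[0]))
        else:
            f_del, f_tree, f_rest = split(f, d)
            g_del, g_tree, g_rest = split(g, d)
            stack += [(f_del, g), (f, g_del), (f_tree, g_tree), (f_rest, g_rest)]
    return seen

def count_relevant_first(f, g, strategy):
    """Size of the projection on the first component, ignoring the empty forest."""
    return len({a for a, _ in relevant_pairs(f, g, strategy) if a})

def leaf(x):
    return (x, ())

A = ("r", (leaf("a"), leaf("b")))
X = leaf("x")
print(count_relevant_first((A,), (X,), lambda f, g: "left"))
```

Swapping the lambda for another function makes it possible to compare strategies on small inputs by counting the distinct pairs they generate.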

  17. Lemma 1 • Given a tree A = l(A1◦…◦An), for any strategy we have #relevant(A) ≥ |A| - |Ai| + #relevant(A1) +…+ #relevant(An), where i∈[1…n] is such that the size of Ai is maximal

  18. Proof (1/2) • Let F = A1◦…◦An ⇒ RF(A) = {A}∪RF(F) ⇒ #relevant(A) = 1 + #relevant(F) • When n = 1: F = A1, A = l(A1) ⇒ #relevant(A) = 1 + #relevant(A1) ≥ |A| - |A1| + #relevant(A1) • When n > 1: suppose the direction is left, and let A1 = l1(F1), T = A2◦…◦An • RF(F) = {F}∪RF(A1)∪RF(T)∪RF(F1◦T) • | RF(F1◦T) – (RF(F1)∪RF(T)) | ≥ min{|F1|, |T|} ⇒ #relevant(F) ≥ 1 + #relevant(A1) + #relevant(T) + min{|F1|, |T|} • Let j∈[2…n] be such that |Aj| is maximal among |A2|, …, |An| ⇒ #relevant(F) ≥ 1 + #relevant(A1) +…+ #relevant(An) + |T| - |Aj| + min{|F1|, |T|}

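  19. Take a look • Goal: #relevant(A) ≥ |A| - |Ai| + #relevant(A1) +…+ #relevant(An) • Since #relevant(A) = 1 + #relevant(F) and |A| = 1 + |F|, this is equivalent to #relevant(F) ≥ |F| - |Ai| + #relevant(A1) +…+ #relevant(An) • What we have so far: #relevant(F) ≥ 1 + |T| - |Aj| + min{|F1|, |T|} + #relevant(A1) +…+ #relevant(An)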

  20. Proof (2/2) • It remains to show that 1 + |T| - |Aj| + min{|F1|, |T|} ≥ |F| - |Ai| • Case 1: if |F1| ≤ |T|, then 1 + |T| + min{|F1|, |T|} = |F|. Since |Aj| ≤ |Ai|, we get 1 + |T| - |Aj| + min{|F1|, |T|} = |F| - |Aj| ≥ |F| - |Ai| • Case 2: if |F1| > |T|, then i = 1 and |F| - |Ai| = |T|, so 1 + |T| - |Aj| + min{|F1|, |T|} = 1 + |T| + |T| - |Aj| ≥ 1 + |T| > |F| - |Ai| • Hence #relevant(F) ≥ |F| - |Ai| + #relevant(A1) +…+ #relevant(An) ⇒ #relevant(A) ≥ |A| - |Ai| + #relevant(A1) +…+ #relevant(An)

  21. Lemma 2 • For every natural number n, there exists a tree A of size n such that, for any strategy, #relevant(A) is in Ω(n log n) • For the complete balanced binary tree Tn of size n, prove by induction on n that #relevant(Tn) ≥ (n+1)log2(n+1)/2
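A sketch of the induction step, assuming n = 2m + 1 so that Tn consists of a root with two copies of Tm as subtrees. Lemma 1 is applied with |Ai| = m, and the induction hypothesis is applied to both subtrees:

```latex
\begin{align*}
\#\mathrm{relevant}(T_n)
  &\ge n - m + 2\,\#\mathrm{relevant}(T_m) \\
  &\ge (m+1) + (m+1)\log_2(m+1) \\
  &= (m+1)\log_2(2m+2) \\
  &= \frac{(n+1)\log_2(n+1)}{2}.
\end{align*}
```

The base case n = 1 holds with equality, since a single node has exactly one relevant forest and (1+1)log2(1+1)/2 = 1.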

  22. Introduction • Edit Distance for Trees and Forests • Cover Strategies

  23. Idea • Suppose the direction is left: RF(l(F)◦T) = {l(F)◦T}∪RF(l(F))∪RF(F◦T)∪RF(T) • Since T⊆F◦T, we want to eliminate the nodes of F in F◦T first, so that RF(F◦T) and RF(T) share as many relevant forests as possible

  24. Cover • Let F be a forest. A cover r of F is a mapping from F to F∪{left, right} satisfying, for each node i in F: • if deg(i) = 0 or 1, then r(i)∈{left, right} • if deg(i) > 1, then r(i) is a child of i • [Figure: an example tree with nodes 1 to 4 annotated with a cover]

  25. Cover strategy • Given a pair of trees (A, B) and a cover r for A, we associate a unique strategy φ as follows • If deg(i) = 0 or 1, then φ(A(i), G) = r(i), for each forest G in B • If A(i) is of the form l(A1◦…◦An) with n > 1, then let p∈{1, …, n} be such that the favorite child r(i) is the root of Ap. For each forest G of B, we define • φ(A(i), G) = right whenever p = 1, left otherwise • φ(T◦Ap◦…◦An, G) = left, for each forest T of A1◦…◦Ap-1 • φ(Ap◦T, G) = right, for each forest T of Ap+1◦…◦An • The tree A is called the cover tree. A strategy is a cover strategy if there exists a cover tree associated with it

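  26. Cover strategy (illustration) • φ(A(i), G) = right whenever p = 1, left otherwise • φ(T◦Ap◦…◦An, G) = left, for each forest T of A1◦…◦Ap-1 • φ(Ap◦T, G) = right, for each forest T of Ap+1◦…◦An • [Figure: a node i whose subtree A(i) has children A1, A2, A3, A4, compared with a forest G]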

  27. Some Tasks • The order of our tasks • Study tree A … • Study tree B … • Combine the results for tree A and tree B • Obtain the number of distinct pairs (recursively)

  28. Studying Tree A …

  29. Tree A • Focus on #relevant(A) (in detail) • Cover strategies in A • A drives the decomposition of B

  30. Lemma 3 • (F(i), G(j))∈RF(F, G) for every node i of F and every node j of G • This is trivial • [Figure: a node i in F and a node j in G]

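  31. Lemma 4 • RF(l(F)◦T) = {l(F)◦T, F1◦T, …, Fk◦T}∪RF(l(F))∪RF(T) • What is this for? • Terms: k = |F|, the number of nodes of F. Each forest in the sequence F1, F2, …, Fk is obtained from the previous one by a left decomposition, so F1, …, Fk are the forests produced by a series of left decompositions • Goal: with a cover strategy such that φ(l(F)◦T) = left, see whether the number of recursive calls can be reduced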

  32. RF(l(F)◦T) • Because we follow the cover strategy, the direction is left • RF(l(F)◦T) = {l(F)◦T}∪RF(l(F))∪RF(T)∪RF(F◦T) • [Figure: l(F)◦T split into the parts covered by RF(l(F)), RF(T), and RF(F◦T)]

  33. RF(F◦T), continued • Because we follow the cover strategy, the direction is still left • [Figure: the left decomposition of F◦T yields F1◦T; the other parts belong to RF(l(F)) and RF(T)]

  34. So … • The only new forests contributed by RF(F◦T) are {F1◦T, …, Fk◦T} • [Figure: successive left decompositions of F◦T]

  35. Conclusion • RF(l(F)◦T) = {l(F)◦T, F1◦T, …, Fk◦T}∪RF(l(F))∪RF(T)

  36. Lemma 5 • #relevant(A) = |A| - |Aj| + #relevant(A1) + #relevant(A2) +…+ #relevant(An) • Terms: A = l(A1◦A2◦…◦An), and Aj is the favorite child of A • Goal: compute the number of relevant forests of a cover tree

  37. [Figure: the tree A = l(A1◦…◦An) with the subtree Aj highlighted] • Aj is the favorite child of A, j∈[1…n]

  38. Part 1: |A| - |Aj| • Note: φ(A(i), G) = right whenever p = 1, left otherwise • φ(T◦Ap◦…◦An, G) = left, for each forest T of A1◦…◦Ap-1 • φ(Ap◦T, G) = right, for each forest T of Ap+1◦…◦An • Explanation: since Aj is the favorite child of A, |A| - |Aj| is the number of elements of {A}∪{all forests that contain Aj}

  39. Part 2: #relevant(A1) + #relevant(A2) +…+ #relevant(An) • Note: RF(A1◦A2◦A3◦A4◦…◦An) = {A1◦A2◦A3◦A4◦…◦An}∪RF(F1◦A2◦A3◦A4◦…◦An)∪RF(A1)∪RF(A2◦A3◦A4◦…◦An), where F1 is the forest below the root of A1 • [Figure: the sequence of subtrees A1, A2, A3, A4, …, An]

  40. Conclusion • #relevant(A) = |A| - |Aj| + #relevant(A1) + #relevant(A2) +…+ #relevant(An)
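The recurrence of Lemma 5 is easy to evaluate directly. The sketch below assumes `(label, children)` tuples, counts a leaf as 1 relevant forest (the leaf itself), and by default picks the largest child as the favorite child; that default is an illustrative choice motivated by Lemma 1, and the names `cover_relevant` and `complete_binary` are hypothetical.

```python
import math

def size(tree):
    return 1 + sum(size(child) for child in tree[1])

def cover_relevant(tree, favorite=None):
    """#relevant(A) = |A| - |Aj| + sum of #relevant(Ak), with Aj the favorite child."""
    _, children = tree
    if not children:
        return 1  # assumption: a leaf contributes only itself
    pick = favorite or (lambda kids: max(kids, key=size))
    fav = pick(children)
    return size(tree) - size(fav) + sum(cover_relevant(c, favorite) for c in children)

def complete_binary(height):
    """Complete binary tree with 2**(height+1) - 1 nodes (labels are irrelevant here)."""
    if height == 0:
        return ("x", ())
    child = complete_binary(height - 1)
    return ("x", (child, child))

for h in range(4):
    t = complete_binary(h)
    n = size(t)
    # Compare with the lower bound of Lemma 2: (n+1) * log2(n+1) / 2.
    print(n, cover_relevant(t), (n + 1) * math.log2(n + 1) / 2)
```

Passing `favorite=lambda kids: kids[-1]` selects the rightmost child instead, which corresponds to using left decompositions only.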

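  41. Free node • What is a free node? • It is not an only child • It is not its parent's favorite child • Definition: a node is free if it is • the root of A, or • a node whose parent has degree greater than 1 and which is not the favorite child • [Figure: a tree with a favorite child and a free node marked]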

  42. Studying Tree B …

  43. Tree B • B is driven by A • So B has no cover strategy of its own • Focus on the following three things: • Rightmost forests • Leftmost forests • Special forests

  44. Three Things (1) • Rightmost ∪ leftmost = special? NO! • Definitions • Rightmost forests: all subforests produced by starting from B and applying left decompositions until nothing is left • Leftmost forests: all subforests produced by starting from B and applying right decompositions until nothing is left • Special forests: all subforests produced by starting from B and applying left or right decompositions until nothing is left

  45. Example • [Figure: successive left decompositions of a tree B with nodes 1 to 7, producing all rightmost forests of B]

  46. Three Things (2) • Three categories • The relevant forests of A fall into three categories: • (α) those that are compared with all rightmost forests of B • (β) those that are compared with all leftmost forests of B • (γ) those that are compared with all special forests of B • Why?

  47. Three Things (3) • The number of rightmost, leftmost, and special forests (#right, #left, #special) • #right(B) = ∑(|B(i)|, i∈B) - ∑(|B(i)|, i is a rightmost child) • #left(B) = ∑(|B(i)|, i∈B) - ∑(|B(i)|, i is a leftmost child) • #special(B) = |B|(|B|+3)/2 - ∑(|B(i)|, i∈B)

  48. Explanation of #right(B) and #left(B) • Rightmost forests: this corresponds to the cover strategy in which every favorite child is the rightmost child, because all decompositions are left decompositions • Leftmost forests: this corresponds to the cover strategy in which every favorite child is the leftmost child, because all decompositions are right decompositions • #right(B) = ∑(|B(i)|, i∈B) - ∑(|B(i)|, i is a rightmost child), or recursively #right(B) = |B| - |B_right| + #right(B1) +…+ #right(Bn), where B_right is the subtree rooted at the rightmost child of the root • #left(B) = ∑(|B(i)|, i∈B) - ∑(|B(i)|, i is a leftmost child), or recursively #left(B) = |B| - |B_left| + #left(B1) +…+ #left(Bn), where B_left is the subtree rooted at the leftmost child of the root • Review: #relevant(B) = |B| - |Bj| + #relevant(B1) +…+ #relevant(Bn)
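The three closed formulas can be checked against a brute-force enumeration on a small tree. The sketch below assumes `(label, children)` tuples with unique labels and uses a 7-node tree chosen for illustration; it is not necessarily the tree drawn on slide 45. The names `count_right`, `count_left`, `count_special`, and `enumerate_forests` are hypothetical.

```python
def size(tree):
    return 1 + sum(size(c) for c in tree[1])

def subtrees(tree):
    yield tree
    for child in tree[1]:
        yield from subtrees(child)

def count_right(tree):
    total = sum(size(t) for t in subtrees(tree))
    return total - sum(size(t[1][-1]) for t in subtrees(tree) if t[1])

def count_left(tree):
    total = sum(size(t) for t in subtrees(tree))
    return total - sum(size(t[1][0]) for t in subtrees(tree) if t[1])

def count_special(tree):
    n = size(tree)
    return n * (n + 3) // 2 - sum(size(t) for t in subtrees(tree))

def enumerate_forests(tree, directions):
    """Non-empty forests reached from the tree using only the given directions."""
    seen, stack = set(), [(tree,)]
    while stack:
        f = stack.pop()
        if not f or f in seen:
            continue
        seen.add(f)
        if "left" in directions:    # f = l(F) ◦ T  gives  F ◦ T, l(F), T
            stack += [f[0][1] + f[1:], (f[0],), f[1:]]
        if "right" in directions:   # f = T ◦ l(F)  gives  T ◦ F, l(F), T
            stack += [f[:-1] + f[-1][1], (f[-1],), f[:-1]]
    return seen

def leaf(x):
    return (x, ())

B = (1, ((2, (leaf(3), leaf(4))), (5, (leaf(6), leaf(7)))))
rightmost = enumerate_forests(B, ("left",))
leftmost = enumerate_forests(B, ("right",))
special = enumerate_forests(B, ("left", "right"))
print(count_right(B), len(rightmost))
print(count_left(B), len(leftmost))
print(count_special(B), len(special), len(rightmost | leftmost))
```

On this tree the union of the rightmost and leftmost forests is strictly smaller than the set of special forests, which illustrates the "NO!" of slide 44.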

  49. Comparison • Two types (with respect to A) • Comparison of trees • free node • favorite child • Comparison of forests
