A New Top-down Algorithm for Tree Inclusion

A New Top-down Algorithmfor Tree Inclusion Dr. Yangjun Chen Dept. Applied Computer Science, University of Winnipeg 515 Portage Ave. Winnipeg, Manitoba, Canada R3B 2E9

Outline • Motivation • Basic algorithm for tree inclusion problem - Definition - Algorithm description • Improvements • Summary

T: T: a d d a b c b e f e f Motivation Given two ordered labeled trees P and T, called the pattern and the target, respectively. An interesting problem is: Can we obtain pattern P by deleting some nodes from target T? That is, is there a sequence v1 , ..., vk of nodes such that for T0 = T and Ti+1 = delete(Ti, vi +1) for i = 0, ..., k - 1, we have Tk = P. If this is the case, we say, P is included in T, T contains P, or say, T covers P. delete(T, c)

s s np vp vp det n v np adv v n adv “The” “student” “reads” det adj n “again and again” “reads” “book” “the” “interesting” “book” Motivation • Linguistic analysis

a b a b b d e b b Tree inclusion algorithm • Definition Definition 1 Let F and G be labeled ordered forests. We define an ordered embedding (, G, F) as an injective function  : V(G) V(F) such that for all nodes v, u V(G), i) label(v) = label((v)); (label preservation condition) ii) v is an ancestor of u iff (v) is an ancestor of (u);(ancestor condition) iii) v is to the left of u iff (v) is to the left of (u); (Sibling condition) F: G:

Tree inclusion algorithm • Algorithm • Let T = <t; T1, ..., Tk> (k  1) be a tree and G = <P1 , ..., Pl> • (l  1) be a forest. We handle G as a tree P = <pv; P1, ..., Pl>, • where pv represents a virtual node, matching any node in T. • Consider a node in P with children v1, ..., vj. We use a pair <i, v> • (i j) to represent an ordered forest containing the first i subtrees • of v: <P[v1], ..., P[vi]>. Then, <j, pv> represents the first j trees • in G. P: v vi vk v1 … … <i, v>

Tree inclusion algorithm • Algorithm • In addition, h(v) represents the height of v in a tree; and (v) • represents a link from v in P to the leaf node on the left-most • path in P[v]. P: Let v’ be a leaf node in P. We denote by -1(v’) a set of nodes x such that for each v x (v) = v’. -1(v3) = {v1, v2, v3} (v1) v1 (v2) v2 v5 v4 v3

Tree inclusion algorithm • Algorithm The tree inclusion checking is done by calling two functions recursively: top-down(T, G), bottom-up(T’, G), where T is a tree, and T’ and G are two forests. T = <t; T1, ..., Tk> T’ = <T1’, ..., Tk’> G = <P1, ..., PL> Each of the two functions returns a pair <i, v> with v being pv or a node on the left-most path in P1.

T: t … … P2 Pl Tree inclusion algorithm • Function: top-down(T, G) In top-down(T, G), two cases will be handled. Case 1: G = <P1>; or G = <P1, ..., Pl> (l >1), but |T |  |P1| + |P2|. In this case, we try to find a pair <i, v> such that T contains the first i subtrees of v, where v = pv , or v -1(v’) and v’ is the leaf node on the left-most path in P1. pv T: G: t p1 P1 G: pv p1 |T|  |P1| + |P2|. P1

… … Pl Tree inclusion algorithm • Function: top-down(T, G) case 1: i) If t is a leaf node, we will check whether label(t) = label((p1)), where p1 is the root of P1. If it is the case, return <1, parent of (p1)>. Otherwise, return <0, parent of (p1)>. T = <t; T1, ..., Tk>: pv G: t P1 G: pv T = <t; T1, ..., Tk>: t |T |  |P1| + |P2|. P1 P2

p1 P11 P1i P1j … … … … Pl Tree inclusion algorithm • Function: top-down(T, G) case 1: • If |T| < |P1| or height(t) < height(p1), we will make a recursive call • top-down(T , <P11, ..., P1j>), where <P11, ..., P1j> be a forest of • the subtrees of p1. The return value of top-down(T , <P11, ..., P1j>) • is used as the return value of top-down(T, G) pv G: T: t |T | < |P1|

t p1 label(t) = label(p1) T1 P11 Ti Tk P1i P1j … … … … Tree inclusion algorithm • Function: top-down(T, G) case 1: • If |T|  |P1| (but |T |  |P1| + |P2|) and height(t)  height(p1), two cases • need to be considered: • label(t) = label(p1). Call bottom-up(<T1, ..., Tk>, <P11, ..., P1j>). • label(t)  label(p1). Call bottom-up(<T1, ..., Tk>, <P1>). t p1 label(t)  label(p1) T1 P11 Ti Tk P1i P1j … … … …

P1: p1 v Tree inclusion algorithm • Function: top-down(T, G) case 1: In both sub-cases, assume that the return value is <i, v>. A further checking needs to be conducted: • If label(t) = label(v) and i = the outdegree of v, the return value should • be <1, v’s parent>. • Otherwise, the return value is the same as <i, v>. label(t) =label(v) T: t or label(t)  label(v)

… … … … Tk Pl Tree inclusion algorithm • Function: top-down(T, G) Case 1: G = <P1>; or G = <P1, ..., Pl> (l >1), but |T |  |P1| + |P2|. Case 2: G = <P1, ..., Pl> (l >1), and |T| > |P1| + |P2|. In this case, we will call bottom-up(<T1, ..., Tk>, G). Assume that the return value is <i, v>. The following checkings will be continually conducted. T: G: t pv |T | > |P1| + |P2| T1 T2 P1 P2

… … Pl Pl Tree inclusion algorithm • Function: top-down(T, G) Case 2: G = <P1, ..., Pl> (l >1), and |T | > |P1| + |P2|. In this case, we will call bottom-up(<T1, ..., Tk>, G). Assume that the return value is <i, v>. The following checkings will be continually conducted. iv) If v = p1’s parent, the return value is the same as <i, v>. v) If v p1’s parent, check whether label(t) = label(v)) and i = the outdegree of v. If so, the return value will be changed to <1, v’s parent>. Otherwise, the return value remains <i, v>. v p1’s parent v = p1’s parent = pv G: pv pv … … v P1 P2 P1 P2 Pi

P1 T2 Ti Tk Pi Pq … … Tree inclusion algorithm • Function: bottom-up(T’, G) bottom-up(T’, G)is designed to handle the case that both T’ and G are forests. Let T’ = <T1, ..., Tk> and G = <P1, ..., Pq>. In bottom-up(T’, G), we will make a series of calls top-down(Tl, <Pjl, ..., Pq>), where l = 1, ..., k, j1 = 0, and j1j2 ... jhq (for some h k), controlled as follows. G: T’: T1 … … … top-down(Tl, <Pjl, ..., Pq>)

Tree inclusion algorithm • Function: bottom-up(T’, G) bottom-up(T’, G) is designed to handle the case that both T’ and G are forests. Let T’ = <T1, ..., Tk> and G = <P1, ..., Pq>. In bottom-up(T’, G), we will make a series of calls top-down(Tl, <Pjl, ..., Pq>), where l = 1, ..., k, j1 = 0, and j1j2 ... jhq (for some h k), controlled as follows. • Two index variables l, j are used to scan T1, ..., Tk and P1, ..., Pq, • respectively. • Let <il, vl>be the return value of top-down(Tl, <Pj, ..., Pq>). If vl = pj’s • parent, set j to be j + il - 1. Otherwise, j is not changed. Set l to be l + 1. • Go to (2). • 3. The loop terminates when all Tl’s or all Pj’s are examined.

P1 Ti T2 Tk Pi Pj Pq … … Tree inclusion algorithm • Function: bottom-up(T’, G) • If j > 0 when the loop terminates, bottom-up(T’, G) returns • <j, p1’s parent>. T1 … … …

Tree inclusion algorithm • Function: bottom-up(T’, G) • If j > 0 when the loop terminates, bottom-up(T’, G) returns • <j, p1’s parent>. • Otherwise, j = 0. In this case, we will continue to searching for a pair • <i, v> such that T’ contains the first i subtrees of v, where v -1(v’) and • v’ is the leaf node on the left-most path in P1, as described below. • Let <i1, v1>, <i2, v2>, ..., <ik, vk> be the respective return values of • top-down(T1, <P1, ..., Pq>), • top-down(T2, <P1, ..., Pq>), • ... ... • top-down(Tk, <P1, ..., Pq>). • Since j = 0, each vl-1(v’) (l = 1, ..., k). P1 v2 v1 … vk

T2 Tree inclusion algorithm • Function: bottom-up(T’, G) i) Let <i1, v1>, ..., <ik, vk> be the return values of top-down(T1, <P1, ..., Pq>), ..., top-down(Tk, <P1, ..., Pq>), respectively. Since j = 0, each vl-1(v’) (l = 1, ..., k). ii) If each il = 0, return <0, ,>, where  is considered to be a descendant of any node in G. Otherwise, find the first vg with children w1, ..., wh such that vg is not a descendant of any other vj, and ig > 0. Call bottom-up(<Tg+1, ..., Tk>, <P[wig+1], ..., P[wh]>). • Let <x, y> be its return value. Ify = vg,then thereturn value of • bottom-up(T’, G) is set to be <ig + x, vg>. • Otherwise, the return value is <ig, vg>. P1 T1 Tg Tg+1 Tk vg … … … … v1 vk ig

T2 Tree inclusion algorithm • Further improvements In the case j = 0: Let <i1, v1>, ..., <ik, vk> be the return values of top-down(T1, <P1, ..., Pq>), ..., top-down(Tk, <P1, ..., Pq>). We will find the first vg such that it is not a descendant of any other vj and ig > 0. Then, bottom-up(<Tg+1, ..., Tk>, <P[wig+1], ..., P[wh]>). is invoked. This shows that all the return values except <ig, vg> are not used in the subsequent computation. Thus, the work for looking for such values should be avoided. P1 T1 Tg Tg+1 Tk vg … … … … v1 vk

Tree inclusion algorithm • Further improvements Let <ij, vj> be the return value of top-down(Tj, <P1, ..., Pq>) such that ij > 0 and vj is p1 or a descendant of p1. Then, during the execution of top-down(Tj+1, <P1, ..., Pq>), once we have detected that it can only produce a return value <ij+1, vj+1> with vj+1 being a descendant of vj, we should stop the corresponding computation immediately since this return value will not be used in the subsequent searching. For this purpose, we rearrange top-down(Tj+1, <P1, ..., Pq>) to top-down(Tj+1, <P1, ..., Pq>, vj) with vj being used to transfer information, called a controlling-node. Assume that in the execution of top-down(Tj+1, <P1, ..., Pq>, vj), we have the following function calls: top-down(Tj+1,1, <P1, ..., Pq>, u1) returns <a1, u1>, top-down(Tj+1,2, <P1, ..., Pq>, u2) returns <a1, u2>, … … With all uj’s being a proper descendant of vj. Then the bottom-up function call with some ui as a controlling node should not be conducted. bottom-up(<Tj+1,i, ... >, <… …>, ui).

Summary • An efficient method for tree inclusion problem • - O|T|min{DP, |leaves(P)|})time and • - O(|T| + |P|) space • where DP – the height of P, and • Future work • - adapt the algorithm to a data stream environment • - adapt the algorithm to an indexing environment leaves(P) - set of the leaf nodes of P.

Thank you.

A New Top-down Algorithm for Tree Inclusion

A New Top-down Algorithm for Tree Inclusion

Presentation Transcript

Decision Tree Algorithm

Junction tree Algorithm

KLT, a new algorithm for SETI

A New Algorithm for 3D Isovist

Top Down Proteomics: A Datasource for PRO

TOP-DOWN !

Cut down a tree plant 2!

Linkage Tree Genetic Algorithm

Top 5 Tips for Inclusion

A new algorithm for bidirectional deconvolution

Prims Algorithm for finding a minimum spanning tree

SlimSS -tree: A New Tree Combined SS -tree With Slim-down Algorithm

Spanning Tree Algorithm

Junction Tree Algorithm

Chopping Down a Tree

Top-Down Clustering Method Based On TV-Tree

The Algorithm for Constructing Phylogenetic Tree

A New Algorithm for Evaluating Ordered Tree Pattern Queries

A New Top-down Algorithm for Tree Inclusion

A Linear-Space Top-down Algorithm for Tree Inclusion Problem

A New Algorithm for Evaluating Ordered Tree Pattern Queries

Decision Tree Algorithm