Tree Inclusion, Signatures, and Evaluation of Path-Oriented Queries
Dr. Yangjun Chen
Dept. Applied Computer Science, University of Winnipeg, Canada
Motivation
Path-Oriented Queries and Tree Inclusion Problem
Evaluation of Path-Oriented Queries
- Top-down Algorithm for Tree Inclusion
Path-Oriented Queries and Tree Inclusion Problem
[Figure: a document tree T and a query tree P, both rooted at Hotel-room-reservation.
T carries the labels name, location, reservation, type, address, City-or-district, price,
from, to, rooms, country, state, Post-code, number and street, with the values Travel-lodge,
one-bed-room, April 20, 2005, April 28, 2005, Winnipeg, $119.00, Manitoba, Canada, R3B 2E9,
515 and Portage Ave. P repeats a subset of these labels (name, location, address,
City-or-district, number, street) together with the variable ?x and the values Winnipeg,
515 and Portage Ave.]
Evaluation of Path-Oriented Queries
- Top-down Algorithm
Case 1: root(G) ≠ pv (i.e., G = <P> is a tree and root(G) = p), and
label(p) ≠ label(t). If G is embedded in T, then there must exist a subtree Ti of
t that contains the whole of G. The algorithm should return 1 if an
embedding can be found and 0 if it cannot.
[Figure: label(root(T)) ≠ label(root(G)); tree G is included in a subtree Ti of T.]
Case 2: root(G) ≠ pv (i.e., G = <P> and root(G) = p), and label(p) = label(t).
Let <P1, ..., Pl> (l ≥ 0) be the forest of subtrees of p and <T1, ..., Tk> the forest
of subtrees of t. If G is embedded in T, there must exist two sequences of
integers: k1, ..., kg and l1, ..., lg (g ≤ l) such that T_{k_i} includes
<P_{l_{i-1}+1}, ..., P_{l_i}> (i = 1, ..., g, l0 = 0, lg = l), where
<P_{l_{i-1}+1}, ..., P_{l_i}> represents a forest containing the subtrees
P_{l_{i-1}+1}, ..., P_{l_i}. Thus, if lg = l, the algorithm should return 1 since we
have a root-preserving inclusion of G in T. Otherwise, it should return 0.
[Figure: label(root(T)) = label(root(G)); the subtrees P1, ..., Pl of p are included,
in order, in the subtrees T1, ..., Tk of t.]
Case 3: root(G) = pv and there exists an integer j (0 ≤ j ≤ q) such that
<P1, ..., Pj> is included in T. If j = q, then the whole G is embedded in T.
There are two possibilities to be considered when looking for j. The first
possibility is similar to Case 2, where there are two sequences of integers
k1, ..., kg and l1, ..., lg (g ≤ q) that represent the order in which the subtrees
of root(G) are embedded in the subtrees of root(T). In this case, j = lg.
If j = 0, we check the second possibility to see whether there exists a
root-preserving inclusion of P1 in T, i.e., label(p1) = label(t) and the subtrees
of p1 are included in the subtrees of t. In this case, j = 1.
[Figure, possibility 1: the root of G is a virtual node; the subtrees of G are
included, in order, in the subtrees T1, ..., Tk of t.]
[Figure, possibility 2: label(root(T)) = label(root(P1)); the root of G is a virtual
node and P1 is included in T with its root mapped to t.]
function top-down-process(T, G)
input: T = <t; T1, ..., Tk>, G = <p; P1, ..., Pq>
       (*p may or may not be a virtual node.*)
output: if root(G) is virtual, returns j ≥ 0;
        else returns 1 if T includes G; otherwise returns 0.
begin
if root(G) is virtual then
   if (|T| < |P1| + |P2| or p has only one child)
   then G := P1;
   else {j := bottom-up-process(T, G);
         if (j = 0 and label(t) = label(P1’s root))
            (*second possibility in Case 3*)
         then {change P1’s root to a virtual node;
               x := bottom-up-process(T, P1);
               if (x = the number of the children of P1’s root)
               then j := 1 else j := 0;}
         return j;}}
if |T| < |G| return 0;
else {if (label(t) = label(p)) (*handling Case 2*)
      then {p := virtual node;

function bottom-up-process(T, G)
input: T = <t; T1, ..., Tk>, G = <p; P1, ..., Pq>
output: j - an integer
begin
j := 0; i := 1;
while (j < q and i ≤ k) do
{ x := top-down-process(Ti, G);
  j := j + x; G := <p; Pj+1, ..., Pq>; i := i + 1; }
end
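As a cross-check for the three cases, ordered tree inclusion can also be decided by a plain exhaustive recursion over forests. The sketch below is not the top-down/bottom-up procedure of the slides; it is a simple reference checker, written under the assumption that a tree is a (label, children) tuple and that inclusion means "obtainable by deleting nodes while keeping labels, ancestorship and left-to-right order". The names tree, forest_included and includes are introduced here for illustration only.

```python
from functools import lru_cache

def tree(label, *children):
    """Build an ordered tree as a hashable (label, children) tuple."""
    return (label, tuple(children))

@lru_cache(maxsize=None)
def forest_included(pattern, target):
    """True if the ordered forest `pattern` can be obtained from `target` by deleting nodes."""
    if not pattern:
        return True            # nothing left to embed
    if not target:
        return False           # pattern nodes remain but no target nodes do
    (p_label, p_children), p_rest = pattern[0], pattern[1:]
    (t_label, t_children), t_rest = target[0], target[1:]
    # Option 1: map the first pattern root onto the first target root.
    # Its subtrees must then fit below that target root, and the remaining
    # pattern trees must fit into the remaining target trees.
    if (p_label == t_label
            and forest_included(p_children, t_children)
            and forest_included(p_rest, t_rest)):
        return True
    # Option 2: delete the first target root; its children take its place.
    return forest_included(pattern, t_children + t_rest)

def includes(T, G):
    """1 if tree T includes tree G, else 0 (same convention as the slides)."""
    return 1 if forest_included((G,), (T,)) else 0

# Small hypothetical example: T = a(b(c, d), e).
T = tree("a", tree("b", tree("c"), tree("d")), tree("e"))
```

With this T, includes(T, tree("a", tree("c"), tree("e"))) holds because b and d can be deleted, while swapping the two children of the pattern breaks the left-to-right order and the check fails. The recursion is exponential in the worst case without the cache and is only meant for validating small inputs.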
Integration of Signatures into Top-down Inclusion
Definition: A signature for a key word or an attribute value is a
hash-coded bit string.
- Important parameters:
m: number of 1s in the bit string
F: length of the bit string
D: size of a block (or average number of key words of an element)
- Optimal choice of the parameters:
F × ln 2 = m × D    (1)
S. Christodoulakis and C. Faloutsos, “Design considerations for a message
file server,” IEEE Trans. Software Engineering, 10(2) (1984) 201-210.
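To make the parameters concrete, the following sketch solves equation (1) for m and builds an F-bit signature with exactly m bits set. The slides do not say which hash function is used, so the use of SHA-256 and the helper names optimal_m and signature are assumptions made for illustration.

```python
import hashlib
import math

def optimal_m(F, D):
    """Solve F * ln 2 = m * D for m, rounded to the nearest integer (at least 1)."""
    return max(1, round(F * math.log(2) / D))

def signature(word, F, m):
    """Return an F-bit integer with exactly m bits set, derived from hashes of `word`."""
    if not 0 < m <= F:
        raise ValueError("m must satisfy 0 < m <= F")
    bits = 0
    counter = 0
    # Keep hashing word:counter until m distinct bit positions have been chosen.
    while bin(bits).count("1") < m:
        digest = hashlib.sha256(f"{word}:{counter}".encode()).digest()
        position = int.from_bytes(digest[:4], "big") % F
        bits |= 1 << position
        counter += 1
    return bits

# Example: 8-bit signatures for elements with about 2 key words each.
m = optimal_m(8, 2)            # 8 * ln 2 / 2 is about 2.77, so m = 3
sig = signature("Winnipeg", 8, m)
```

Rounding m to an integer is a design choice of this sketch; equation (1) itself only fixes the ratio between F, m and D.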
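[Figure: a table assigning an 8-bit signature to each of the labels a, b, c, d, e and f,
and a tree T with root t0 (label a), children t1 and t2, and grandchildren t11, t12, t21
and t22; the remaining nodes carry the labels e, b, e, c, d and f.]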
- Assigning signatures to tree nodes
Let v be a node in a tree T. If v is a leaf node, its signature s_v is equal
to the signature assigned to its label. Otherwise, s_v = s ∨ s_1 ∨ ... ∨ s_n, where
s represents the signature for the label associated with v, ∨ is the bitwise OR,
and s_1, ..., s_n are the signatures of v’s children v1, ..., vn, respectively.
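The rule can be sketched as a short bottom-up recursion. The tree encoding as (label, children) tuples, the function name node_signature and the concrete 8-bit label signatures are assumptions; the label values were chosen so that the resulting node signatures agree with values that appear in the accompanying figure (1111 1101 and 1111 1000).

```python
# Hypothetical 8-bit label signatures (assumed for illustration).
LABEL_SIG = {
    "a": 0b00110000,
    "b": 0b00011000,
    "c": 0b00101000,
    "d": 0b00010101,
    "e": 0b11000000,
    "f": 0b10101000,
}

def node_signature(node, label_sig):
    """Signature of a node: its label signature OR-ed with its children's signatures."""
    label, children = node
    sig = label_sig[label]
    for child in children:
        sig |= node_signature(child, label_sig)
    return sig

# Assumed shape of the example tree: a(e(c, d), b(f, e)).
T = ("a", (
    ("e", (("c", ()), ("d", ()))),
    ("b", (("f", ()), ("e", ()))),
))

root_sig = node_signature(T, LABEL_SIG)
print(format(root_sig, "08b"))   # prints 11111101
```

A leaf simply keeps its label signature, since the loop over its (empty) child list does nothing.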
[Figure: tree T (nodes t0, t1, t2, t11, t12, t21, t22 with labels a, e, b, f, e, c, d)
and pattern P (nodes p0, p1, p2 with labels e, c, d), each node annotated with its 8-bit
signature, e.g. 1111 1101 at the root a of T and 1111 1000 at b.]
[Figure: tree T and pattern P (whose root is marked as a virtual node), with the node
signatures 1111 1101, 1111 1000 and 0011 1101 and the leaf signatures 0010 1000,
1100 0000, 1010 1000 and 0001 0101. One subtree of T is marked "This subtree will
not be explored."]
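Node signatures can serve as a cheap filter during the top-down search: if some bit set in the signature of P is not set in the signature of a subtree of T, that subtree cannot include P and need not be explored. A matching signature is only a necessary condition, so the inclusion test still has to run on the surviving subtrees. The sketch below illustrates the filter alone; the label signatures, tree shapes and function names are assumptions for illustration and are not taken from the slides.

```python
# Hypothetical 8-bit label signatures (assumed for illustration).
LABEL_SIG = {
    "a": 0b00110000, "b": 0b00011000, "c": 0b00101000,
    "d": 0b00010101, "e": 0b11000000, "f": 0b10101000,
}

def node_signature(node, label_sig):
    """Label signature OR-ed with the signatures of all children."""
    label, children = node
    sig = label_sig[label]
    for child in children:
        sig |= node_signature(child, label_sig)
    return sig

def may_include(target_sig, pattern_sig):
    """Necessary (not sufficient) test: every 1-bit of the pattern is set in the target."""
    return (target_sig & pattern_sig) == pattern_sig

def explored_children(T, P, label_sig):
    """Labels of the children of T's root whose subtrees pass the signature filter for P."""
    p_sig = node_signature(P, label_sig)
    return [child[0] for child in T[1]
            if may_include(node_signature(child, label_sig), p_sig)]

# Assumed example: T = a(e(c, d), b(f, e)), P = a(c, d).
T = ("a", (("e", (("c", ()), ("d", ()))), ("b", (("f", ()), ("e", ())))))
P = ("a", (("c", ()), ("d", ())))

print(explored_children(T, P, LABEL_SIG))   # prints ['e']
```

Here the subtree rooted at b has signature 1111 1000, which lacks bits of P's signature 0011 1101, so it is skipped. The subtree rooted at e passes the filter even though it contains no node labelled a, which shows why a passing signature must still be confirmed by the actual inclusion check.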
Evaluation of Path-Oriented Queries
In the figure, F stands for the initial length of the signatures and m for
the initial number of bits set to 1.
- Tested queries
Group I: for testing path length impact
Group III: for testing impact of matching at higher level
T-index:
hotel-room-reservation    (1, <1, 45>, 0) ...
name                      (1, <2, 4>, 1) ...
location                  (1, <5, 28>, 2) ...
...                       ... ...
Travel-lodge              (1, 3, 2) ...
Winnipeg                  (1, 7, 3) ...
Manitoba                  (1, 10, 3) ...
...                       ... ...
Experiment Results
- Tested results
[Figures: execution time (sec.) per query for the methods IPW, TIS and TIA, one plot
per query group: Results of Group I, Results of Group II, Results of Group III (this plot
also carries the label IEW), Results of Group IV and Results of Group V. The x-axes list
the queries Q1 to Q5 (Q1 to Q10 across the Group I and Group II plots).]