A new algorithm for evaluating ordered tree pattern queries
This presentation is the property of its rightful owner.
Sponsored Links
1 / 31

A New Algorithm for Evaluating Ordered Tree Pattern Queries PowerPoint PPT Presentation


  • 96 Views
  • Uploaded on
  • Presentation posted in: General

A New Algorithm for Evaluating Ordered Tree Pattern Queries. Yangjun Chen Dept. Applied Computer Science, University of Winnipeg 515 Portage Ave. Winnipeg, Manitoba, Canada R3B 2E9. Outline. Motivation Algorithm for tree pattern query evaluation based on ordered tree matching

Download Presentation

A New Algorithm for Evaluating Ordered Tree Pattern Queries

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


A new algorithm for evaluating ordered tree pattern queries

A New Algorithm for Evaluating Ordered Tree Pattern Queries

Yangjun Chen

Dept. Applied Computer Science,

University of Winnipeg

515 Portage Ave.

Winnipeg, Manitoba, Canada R3B 2E9


Outline

Outline

  • Motivation

  • Algorithm for tree pattern query evaluation based on ordered tree matching

    -Tree encoding

    -Algorithm description

  • Experiment results

  • Summary


A new algorithm for evaluating ordered tree pattern queries

a tree pattern query

XML documents

Motivation

  • Efficient method to evaluate XPath expression queries – XML query processing


A new algorithm for evaluating ordered tree pattern queries

<Purchase>

<Seller>

<Name>dell</Name>

<Item>

<Manufacturer>IBM</Manufacturer>

<Name>part#1</Name>

<Item>

<Manufacturer>Intel</Manufacturer>

</Item>

</Item>

<Item>

<Name>Part#2</Name>

</Item>

<Location>Houston</Location>

</Seller>

<Buyer>

<Location>Winnipeg</Location>

<Name>Y-Chen</Name>

</Buyer>

</Purchase>

P

S

B

N

I

I

L

L

N

Houston

Winnipeg

Y-Chen

Dell

M

N

I

N

IBM

Part#1

Part#2

M

Intel

Motivation

Document:


A new algorithm for evaluating ordered tree pattern queries

P

S

B

N

I

I

L

L

N

Houston

Winnipeg

Y-Chen

Dell

M

N

I

N

IBM

Part#1

Part#2

M

Intel

Motivation

Document:

Query – XPath expressions:

Q1: /Purchase[Seller[Loc=‘Boston’]]/

Buyer[Loc = ‘New York’

Purchase

Buyer

Seller

Location

Location

‘Winnipeg’

‘Houston’

Q2: /Purchase//Item[Manufacturer = ‘Intel’]

Buyer

d-edge: ancestor-

descendant relationship

Item

Manufacturer

c-edge: parent-child

relationship

‘Intel’


A new algorithm for evaluating ordered tree pattern queries

a

b

b

c

d

c

e

d

book

title

author

Art of Programming

fn

ln

Knuth

Donald

Motivation

  • XPath evaluation against XML documents

-XPath expression

a[b[c and .//d]]/b[c and e//d]

book[title = ‘Art of Programming’]//author[fn = ‘Donald’ and

ln = ‘Knuth’]

<document>

<book>

<title>

Art of Programming

</title>

<author>

<fn>Donald Knuth</fn>

… …


A new algorithm for evaluating ordered tree pattern queries

a

c

c

a

b

b

c

b

b

Motivation

  • XPath evaluation against XML documents

-Evaluation based on unordered tree matching:

Definition An embedding of a tree pattern Q into an XML document T is a mapping f: Q  T, from the nodes of Q to the nodes of T, which satisfies the following conditions:

(i)Preserve node label: For each u  Q, label(u) matches label(f(u)).

(ii)Preserve parent-child/ancestor-descendant relationships: If uv in Q, then f(v) is a child of f(u) in T; if uv in Q, then f(v) is a descendant of f(u) in T.

T:

Q:

q3

v6

q1

q2

v4

v5

v1

v3

v2


A new algorithm for evaluating ordered tree pattern queries

s

s

np

vp

vp

det

n

v

np

adv

v

n

adv

“The”

“student”

“reads”

det

adj

n

“carefully”

“reads”

“book”

“the”

“interesting”

“book”

Motivation

  • XPath evaluation against XML documents

-Evaluation based on ordered tree matching

XPath expression:

s/vp[v = “reads” and /following-sibling::n = “book” and /following

sibling::adv]


A new algorithm for evaluating ordered tree pattern queries

a

c

c

a

b

b

c

b

b

Motivation

  • XPath evaluation against XML documents

-Evaluation based on ordered tree matching:

Definition An embedding of a tree pattern Q into an XML document T is a mapping f: Q  T, from the nodes of Q to the nodes of T, which satisfies the following conditions:

(i)Preserve node label: For each u  Q, label(u) matches label(f(u)).

(ii)Preserve parent-child/ancestor-descendant relationships: If uv in Q, then f(v) is a child of f(u) in T; if uv in Q, then f(v) is a descendant of f(u) in T.

(iii)Preserve sibling order: For any two nodes v1 Q and v2 Q, if v1is to the left of v2, then f(v1)is to the left of f(v2) in T.

T:

Q:

q3

v6

q1

q2

v4

v5

v1

v3

v2


A new algorithm for evaluating ordered tree pattern queries

T:

(1, 1, 11, 1)

A v1

B v8

(1, 2, 9, 2)

B v2

(1, 10, 10, 2)

v3 C

(1, 3, 3, 3)

B v4

(1, 4, 8, 3)

v5 C

v6 C

D v7

(1, 7, 7, 4)

(1, 5, 5, 4)

(1, 6, 6, 4)

Algorithm for query evaluation

  • Tree encoding

Let T be a document tree. We associate each node v in T with a quadruple (DocId, LeftPos, RightPos, LevelNum), where DocId is the document identifier; LeftPos and RightPos are generated by counting word numbers from the beginning of the document until the start and end of the element, respectively; and LevelNum is the nesting depth of the element in the document.

1

<A>

<B>

<C>

</C>

<B>

<C>

</C>

<C>

</C>

<D>

</D>

</B>

</B>

<B>

</B>

</A>

2

3

3

4

5

5

6

6

7

7

8

9

10

10

11


A new algorithm for evaluating ordered tree pattern queries

  • Tree encoding

Let T be a document tree. We associate each node v in T with a quadruple (DocId, LeftPos, RightPos, LevelNum), denoted as a(v), where DocId is the document identifier; LeftPos and RightPos are generated by counting word numbers from the beginning of the document until the start and end of the element, respectively; and LevelNum is the nesting depth of the element in the document.

(i)ancestor-descendant: a node v1 associated with (d1, l1, r1, ln1) is an ancestor of another node v2 with (d2, l2, r2, ln2) iff d1= d2, l1< l2, and r1> r2.

(ii)parent-child: a node v1 associated with (d1, l1, r1, ln1) is the parent of another node v2 with (d2, l2, r2, ln2) iff d1= d2, l1< l2, r1> r2, and ln2 = ln1 + 1.

(iii)from left to right: a node v1 associated with (d1, l1, r1, ln1) is to the left of another node v2 with (d2, l2, r2, ln2) iff d1= d2, r1< l2.


A new algorithm for evaluating ordered tree pattern queries

T:

(1, 1, 11, 1)

A v1

B v8

(1, 2, 9, 2)

B v2

(1, 10, 10, 2)

v3 C

(1, 3, 3, 3)

B v4

(1, 4, 8, 3)

v5 C

v6 C

D v7

(1, 7, 7, 4)

(1, 5, 5, 4)

(1, 6, 6, 4)

Algorithm for query evaluation

  • Tree encoding

1

<A>

<B>

<C>

</C>

<B>

<C>

</C>

<C>

</C>

<D>

</D>

</B>

</B>

<B>

</B>

</A>

2

3

3

4

5

5

6

6

7

7

8

9

10

Data streams:

10

sorted by LeftPos values

11

A:(1, 1, 11, 1)

B:(1, 2, 9, 2)(1, 4, 8, 3), (1, 10, 10, 2)

C:(1, 3, 3, 3)(1, 5, 5, 4), (1, 6, 6, 4)

D:(1, 7, 7, 4)


A new algorithm for evaluating ordered tree pattern queries

A q1

L(q1)

Q:

B q5

q2 B

L(q2)

L(q5)

q3 C

C q4

L(q4)

L(q3)

Algorithm for query evaluation

  • Algorithm description

  • Our algorithm works bottom-up. Therefore, we need to sort XML

  • streams by (DocID, RightPos) values.

  • Each time a query Q is submitted to the system, we will associate

  • each query node q with a data stream L(q) such that for

  • each vL(q) label(v) = label(q), in which each query node is

  • attached with a list of matching nodes of the document tree.

T:

Q:

sorted by RightPos values

{v1}

L(q1 ) =(1, 1, 11, 1) -

{v4, v2, v8}

L(q2 ) = L(q5)=(1, 4, 8, 3),(1, 2, 9, 2)(1, 10, 10, 2) -

{v3, v5, v6}

L(q3) = L(q4) = (1, 3, 3, 3)(1, 5, 5, 4), (1, 6, 6, 4) -


A new algorithm for evaluating ordered tree pattern queries

Algorithm for query evaluation

  • Algorithm description

1.First, we will number the nodes of Q in postorder. So the nodes in

Q will be referenced by their postorder numbers. Additionally,

we set a virtual node for Q, numbered 0, which is considered to be

to the left of any node in Q.

  • For each node q of Q, a link from it to the left-most leaf node in

  • Q[q], denoted by (q), is established. For a leaf node q’, (q’) = q’.

Q :

(q1)

A q1

A q1

5

3

(q2)

4

0

q2 B

B q5

q2 B

B q5

virtual node

2

1

q3 C

C q4

C q4

q3 C


A new algorithm for evaluating ordered tree pattern queries

Algorithm for query evaluation

  • Algorithm description

  • Let q’ be a leaf node in Q. We denote by -1(q’) a set of nodes x

  • such that for each q x (q) = q’.

(q1)

-1(q3) = {q1, q2 , q3}

A q1

-1(q4) = {q4}

(q2)

q2 B

B q5

-1(q5) = {q5}

C q4

q3 C


A new algorithm for evaluating ordered tree pattern queries

Algorithm for query evaluation

  • Algorithm description

  • Each node v in T is associate it with an array Av of length |Q|,

  • indexed from 0 to |Q| - 1. In Av, each entry is a query node or ,

  • defined below:

If there is a least leaf q’

larger than q such that

-1(q’) contains at least

one node x with Q[x]

being embedded in T[v];

Max{x | x -1(q’)  T[v] embeds Q[x]},

Av[q] =

Otherwise.

Av:

Q:

x1

Av[q]

q

xj

q’


A new algorithm for evaluating ordered tree pattern queries

Algorithm for query evaluation

  • Algorithm description

Av:

Q:

x1

Av[q]

q

xj

q’

  • X1 is the largest ancestor of q’ such that T[v] contains Q[X1].

  • q’ is the closest leaf node to the right of q.

In this way, both the subtree embedding and the left-to-right

ordering can be recorded.


A new algorithm for evaluating ordered tree pattern queries

01234

01234

1

Av3:

Av3:

Algorithm for query evaluation

  • Setting values in Av

  • If we find Q[x] can be embedded in T[v], we will set Av[q1], ...,

  • Av[qk] to x, where each ql (1 l k) is a query node to the left of

  • x, to record the fact that x is the closest node to the right of ql

  • such that T[v] embeds Q[x].

T:

A v1

Q :

A q1

5

B v8

B v2

0

3

4

q2 B

B q5

v3 C

B v4

2

1

q3 C

C q4

v5 C

v6 C

D v7


A new algorithm for evaluating ordered tree pattern queries

01234

01234

1

1

2

Av3:

Av3:

Algorithm for query evaluation

  • Setting values in Av

  • If some time later we find another node x’, which is to the right

  • of x,such that Q[x’] can be embedded in T[v], we will set

  • Av[p1], ..., Av[ps] to x’, where each pl (1l s) is to the left of

  • x’ but to the right of qk.

T:

A v1

Q :

A q1

5

0

B v8

B v2

3

4

q2 B

B q5

v3 C

B v4

2

1

q3 C

C q4

v5 C

v6 C

D v7


A new algorithm for evaluating ordered tree pattern queries

01234

01234

3

1

Av4:

Av4:

Algorithm for query evaluation

  • Setting values in Av

  • If x’ is an ancestor of x, we will find all those entries pointing to a

  • descendant of x’ on the left-most path in Q[x’]. Replace these

  • entries with x’.

  • For all the other nodes v’ such that T[v’] embeds Q[x], we will set

  • values for the entries in Av’ in the same way as (i), (ii), and (iii).

T:

A v1

Q :

A q1

5

B v8

B v2

4

q2 B

B q5

0

3

v3 C

B v4

2

1

q3 B

C q4

v5 B

v6 C

D v7


A new algorithm for evaluating ordered tree pattern queries

Algorithm for query evaluation

  • Using Av to check tree embedding

  • Let q in Q and v in T be the nodes encountered. Let v1, ..., vk be

  • the child nodes of v. Let q1, ..., ql be the child nodes of q. We first

  • check Av1starting from Av1[h], where h = (q) - 1. We begin the

  • search from (q) - 1 because it is the closest node to the left

  • of the first child of q. Let Av1[h] = q’. If q’ is not q1, noran ancestor

  • of q1, we will check Av2[h] in a next step. This process continues

  • until one of the following conditions is satisfied:

  • (i)All Avj’s have been checked, or

  • (ii)There exists vj such that Avj[h] is q1 or an ancestor of q1.

?

label(v) = label(q)

q

v

q

Av1:

q1

q1

ql

v1

vk

h = (q) - 1


A new algorithm for evaluating ordered tree pattern queries

Algorithm for query evaluation

  • Using Av to check tree embedding

  • If all Avj’s are checked (case (i)), it shows that Q[q1] cannot be

  • embedded in any subtree rooted at a child node of v. So T[v]

  • cannot embed Q[v]. If it is case (ii), we know that T[vj] embeds

  • Q[q1]. If q1 is a //-child, or both q1 and vj are /-children, we will

  • continue to check Av(j+1)[g] against q2, where g = Avj[h].

  • (Otherwise, we will continue to check Av(j+1)[h] against q1.)

q

Av1:

q1

q2

h = (q) - 1

Av2:

Av2:[q1]


A new algorithm for evaluating ordered tree pattern queries

[2, 2, , , ]

[1, , , , ]

[2, 2, , , ]

A

A

A

A

A

A

:

:

:

:

:

:

[1, , , , ]

v2

v8

v3

v5

v6

v4

[1, , , , ]

[1, 4, 4, 4, ]

[5, , , , ]

Algorithm for query evaluation

  • Algorithm description

[1, 4, 4, 4, ]

[3, , , , ]

[3, 4, 4, 4, ]

[3, 2, 4, 4, ]

[3, 2, , , ]

[3, 2, 4, 4, ]

A

[5, 2, 4, 4, ]

:

v1


A new algorithm for evaluating ordered tree pattern queries

Algorithm for query evaluation

  • Experiments

  • We conducted our experiments on a DELL desktop PC

  • equipped with Pentium(R) 4 CPU 2.80GHz, 0.99GB RAM

  • and 20GB hard disk. The code was compiled using

  • Microsoft Visual C++ compiler version 6.0, running

  • standalone.

  • Tested methods

  • In the experiments, we have tested four methods:

  • -TwigStack (TS for short) [3],

  • -Twig2Stack (T2S for short) [10],

  • -PRIX [30],

  • -tree-embedding (discussed in this paper, TE for short).


A new algorithm for evaluating ordered tree pattern queries

Algorithm for query evaluation

  • Experiments

  • Tested methods

  • In the experiments, we have tested four methods:

  • -TwigStack (TS for short) [1],

  • -Twig2Stack (T2S for short) [2],

  • -PRIX [3],

  • -tree-embedding (discussed in this paper, TE for short).

[1]N. Bruno, N. Koudas, and D. Srivastava, Holistic Twig Joins: Optimal XML

Pattern Matching, in Proc. SIGMOD Int. Conf. on Management of Data,

Madison, Wisconsin, June 2002, pp. 310-321.

[2]S. Chen, H-G. Li, J. Tatemura, W-P. Hsiung, D. Agrawa, and K.S. Canda,

Twig2Stack: Bottom-up Processing of General­ized-Tree-Pattern Queries

over XML Documents, in Proc. VLDB, Seoul, Korea, Sept. 2006,

pp. 283-294.

[3]P. Rao and B. Moon, Sequencing XML Data and Query Twigs for Fast

Pattern Matching, ACM Transaction on Data­base Systems, Vol. 31,

No. 1, March 2006, pp. 299-345.


A new algorithm for evaluating ordered tree pattern queries

Algorithm for query evaluation

  • Experiments

  • Theoretical computational complexities

  • Indexes

XB-trees used for TwigStack, Twig2Stack, TE.

Trie structure used for PRIX


A new algorithm for evaluating ordered tree pattern queries

Algorithm for query evaluation

  • Experiments

  • Data sets

The TreeBank dataset is a real data set with narrow and deeply recursive

structure that includes multiple recursive elements.

(U. of Washington, The Tukwila System, available from

http://data.cs.washington.edu/integration/tukwila/.)

Data size: 82 MB

Num. of nodes: 2.43 million

Max/average tree depth: 36/7.9

  • Queries

Q1: //VP[DT]//PRP_DOLLAR

Q2: //S/VP/PP[IN]/NP

Q3: //S/VP//PP[NP/VB]/IN

Q4: //VP[.//PP/IN]//NP/*//JJ

Q5: //S[CC][.//PP]//NP[VBZ][IN]//JJ


A new algorithm for evaluating ordered tree pattern queries

Algorithm for query evaluation

  • Experiments

  • Test results

For all the experiments, the buffer pool size was fixed at 2000 pages. The

page size of 8KB was used. For each data set, all the tag names are stored

in a single list and then each tag name is represented by its order number

in that list during the evaluation of queries. In our implementation, each

DocId occupies 4 bytes while a number in a Prüfer sequence, a LeftPos or

a RightPos occupies 2 bytes. A levelNum value takes only 1 byte. I

Page numbers


A new algorithm for evaluating ordered tree pattern queries

Algorithm for query evaluation

  • Experiments

execution time (sec.)

I/O time (sec.)

24

16

+

+

+

PRIX

TE

TS

T2S

PRIX

TE

TS

T2S

18

12

12

+

8

+

+

6

4

+

+

+

+

1

1

Q1

Q2

Q3

Q4

Q5

Q1

Q2

Q3

Q4

Q5


Summary

Summary

  • An efficient method for evaluating ordered tree pattern queries in XML document databases

  • -parent/child and ancestor/descendant relations

  • -from-left-to-right relations

  • Computational complexity

  • -O(|T|leafQ) time

  • -O(leafTleafQ) space

  • Experiments

  • -TreeBank database

  • -I/O time and CPU processing time


A new algorithm for evaluating ordered tree pattern queries

Thank you.


  • Login