Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries
This presentation is the property of its rightful owner.
Sponsored Links
1 / 36

Unordered Tree Matching and Strict Unordered Tree Matching: the Evaluation of Tree Pattern Queries PowerPoint PPT Presentation


  • 94 Views
  • Uploaded on
  • Presentation posted in: General

Unordered Tree Matching and Strict Unordered Tree Matching: the Evaluation of Tree Pattern Queries. Dr. Yangjun Chen Dept. Applied Computer Science, University of Winnipeg 515 Portage Ave. Winnipeg, Manitoba, Canada R3B 2E9. Outline. Motivation

Download Presentation

Unordered Tree Matching and Strict Unordered Tree Matching: the Evaluation of Tree Pattern Queries

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

Unordered Tree Matching and Strict Unordered Tree Matching: the Evaluation of Tree Pattern Queries

Dr. Yangjun Chen

Dept. Applied Computer Science,

University of Winnipeg

515 Portage Ave.

Winnipeg, Manitoba, Canada R3B 2E9


Outline

Outline

  • Motivation

  • Algorithm for unordered tree pattern query evaluation

    -Tree encoding

    -Algorithm description

  • On the strict unordered tree matching

  • Experiment results

  • Summary


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

a tree pattern query

XML documents

Motivation

  • Efficient method to evaluate XPath expression queries – XML query processing


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

<Purchase>

<Seller>

<Name>dell</Name>

<Item>

<Manufacturer>IBM</Manufacturer>

<Name>part#1</Name>

<Item>

<Manufacturer>Intel</Manufacturer>

</Item>

</Item>

<Item>

<Name>Part#2</Name>

</Item>

<Location>Houston</Location>

</Seller>

<Buyer>

<Location>Winnipeg</Location>

<Name>Y-Chen</Name>

</Buyer>

</Purchase>

P

S

B

N

I

I

L

L

N

Houston

Winnipeg

Y-Chen

Dell

M

N

I

N

IBM

Part#1

Part#2

M

Intel

Motivation

Document:


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

P

S

B

N

I

I

L

L

N

Houston

Winnipeg

Y-Chen

Dell

M

N

I

N

IBM

Part#1

Part#2

M

Intel

Motivation

Document:

Query – XPath expressions:

Q1: /Purchase[Seller[Loc=‘Boston’]]/

Buyer[Loc = ‘New York’

Purchase

Buyer

Seller

Location

Location

‘Winnipeg’

‘Houston’

Q2: /Purchase//Item[Manufacturer = ‘Intel’]

Buyer

d-edge: ancestor-

descendant relationship

Item

Manufacturer

c-edge: parent-child

relationship

‘Intel’


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

a

b

b

c

d

c

e

d

book

title

author

Art of Programming

fn

ln

Knuth

Donald

Motivation

  • XPath evaluation against XML documents

-XPath expression

a[b[c and .//d]]/b[c and e//d]

book[title = ‘Art of Programming’]//author[fn = ‘Donald’ and

ln = ‘Knuth’]

<document>

<book>

<title>

Art of Programming

</title>

<author>

<fn>Donald Knuth</fn>

… …


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

a

c

c

a

b

b

c

b

b

Motivation

  • XPath evaluation against XML documents

-Evaluation based on unordered tree matching:

Definition An embedding of a tree pattern Q into an XML document T is a mapping f: Q  T, from the nodes of Q to the nodes of T, which satisfies the following conditions:

(i)Preserve node label: For each u  Q, label(u) matches label(f(u)).

(ii)Preserve parent-child/ancestor-descendant relationships: If uv in Q, then f(v) is a child of f(u) in T; if uv in Q, then f(v) is a descendant of f(u) in T.

T:

Q:

q3

v6

q1

q2

v4

v5

v1

v3

v2


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

a

c

c

a

b

b

c

b

b

Motivation

  • XPath evaluation against XML documents

-Evaluation based on unordered tree matching:

Definition 1 An embedding of a tree pattern Q into an XML document T is a mapping f: Q  T, from the nodes of Q to the nodes of T, which satisfies the following conditions:

(i)Preserve node label: For each u  Q, label(u) matches label(f(u)).

(ii)Preserve parent-child/ancestor-descendant relationships: If uv in Q, then f(v) is a child of f(u) in T; if uv in Q, then f(v) is a descendant of f(u) in T.

T:

Q:

q3

v6

q1

q2

v4

v5

v1

v3

v2


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

T:

(1, 1, 11, 1)

A v1

B v8

(1, 2, 9, 2)

B v2

(1, 10, 10, 2)

v3 C

(1, 3, 3, 3)

B v4

(1, 4, 8, 3)

v5 C

v6 C

D v7

(1, 7, 7, 4)

(1, 5, 5, 4)

(1, 6, 6, 4)

Algorithm for query evaluation

  • Tree encoding

Let T be a document tree. We associate each node v in T with a quadruple (DocId, LeftPos, RightPos, LevelNum), where DocId is the document identifier; LeftPos and RightPos are generated by counting word numbers from the beginning of the document until the start and end of the element, respectively; and LevelNum is the nesting depth of the element in the document.

1

<A>

<B>

<C>

</C>

<B>

<C>

</C>

<C>

</C>

<D>

</D>

</B>

</B>

<B>

</B>

</A>

2

3

3

4

5

5

6

6

7

7

8

9

10

10

11


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

  • Tree encoding

Let T be a document tree. We associate each node v in T with a quadruple (DocId, LeftPos, RightPos, LevelNum), denoted as a(v), where DocId is the document identifier; LeftPos and RightPos are generated by counting word numbers from the beginning of the document until the start and end of the element, respectively; and LevelNum is the nesting depth of the element in the document.

(i)ancestor-descendant: a node v1 associated with (d1, l1, r1, ln1) is an ancestor of another node v2 with (d2, l2, r2, ln2) iff d1= d2, l1< l2, and r1> r2.

(ii)parent-child: a node v1 associated with (d1, l1, r1, ln1) is the parent of another node v2 with (d2, l2, r2, ln2) iff d1= d2, l1< l2, r1> r2, and ln2 = ln1 + 1.

(iii)from left to right: a node v1 associated with (d1, l1, r1, ln1) is to the left of another node v2 with (d2, l2, r2, ln2) iff d1= d2, r1< l2.


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

T:

(1, 1, 11, 1)

A v1

B v8

(1, 2, 9, 2)

B v2

(1, 10, 10, 2)

v3 C

(1, 3, 3, 3)

B v4

(1, 4, 8, 3)

v5 C

v6 C

D v7

(1, 7, 7, 4)

(1, 5, 5, 4)

(1, 6, 6, 4)

Algorithm for query evaluation

  • Tree encoding

1

<A>

<B>

<C>

</C>

<B>

<C>

</C>

<C>

</C>

<D>

</D>

</B>

</B>

<B>

</B>

</A>

2

3

3

4

5

5

6

6

7

7

8

9

10

Data streams:

10

sorted by LeftPos values

11

A:(1, 1, 11, 1)

B:(1, 2, 9, 2)(1, 4, 8, 3), (1, 10, 10, 2)

C:(1, 3, 3, 3)(1, 5, 5, 4), (1, 6, 6, 4)

D:(1, 7, 7, 4)


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

A q1

L(q1)

Q:

B q5

q2 B

L(q2)

L(q5)

q3 C

C q4

L(q4)

L(q3)

Algorithm for query evaluation

  • Algorithm description

  • Our algorithm works bottom-up. Therefore, we need to sort XML

  • streams by (DocID, RightPos) values.

  • Each time a query Q is submitted to the system, we will associate

  • each query node q with a data stream L(q) such that for

  • each vL(q) label(v) = label(q), in which each query node is

  • attached with a list of matching nodes of the document tree.

T:

Q:

sorted by RightPos values

{v1}

L(q1 ) =(1, 1, 11, 1) -

{v4, v2, v8}

L(q2 ) = L(q5)=(1, 4, 8, 3),(1, 2, 9, 2)(1, 10, 10, 2) -

{v3, v5, v6}

L(q3) = L(q4) = (1, 3, 3, 3)(1, 5, 5, 4), (1, 6, 6, 4) -


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

Algorithm for query evaluation

  • Algorithm description

1.S(v) – a set of query nodes q associated with document tree node

v such that Q[q] can be embedded in T[v].

  • (q) – a variable associated with query node q.

    • During the process, (q) will be dynamically assigned a series

    • of values a0, a1, ..., am for some m in sequence, where a0 = 

    • and ai’s (i = 1, ..., m) are different nodes of T.

    • Initially, (q) is set to a0 = . (q) will be changed from ai-1to

    • ai = v (i = 1, ..., m) when the following conditions are satisfied.

i)v is the node currently encountered.

ii)q appears in S(u) for some child node u of v.

iii)q is a d-child, or

q is a c-child, and u is a c-child with label(u) = label(q).

T:

Q:

S(v)

(q)

v

q


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

q’

v

u

q

S(u) = {…, q, …}

Algorithm for query evaluation

  • Algorithm description

i)v is the node currently encountered.

ii)q appears in S(u) for some child node u of v.

iii)q is a d-child, or

q is a c-child, and u is a c-child with label(u) = label(q).

d-child

(q) is changed

from ai-1 to ai = v

q’

v

u

c-child

q

label(u) = label(q).

S(u) = {…, q, …}

u must be a direct child of v.


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

Algorithm for query evaluation

  • Algorithm description

3.Subtree embedding by using(q)

label(v) = label(q’}

v

q’

ql

q1

(q1)= (q2) = … = (ql) =v

T[v] embedsQ[q’].


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

Q :

A q1

5

3

4

q2 B

B q5

2

1

q3 C

C q4

Algorithm for query evaluation

  • Algorithm description

4.Construction of S(v)

  • The nodes of Q are numberedin postorder. So the nodes in Q will be referenced by their postorder numbers.

  • For a leaf node v in T, any encountered leaf node q in Q will be inserted into S(v) if label(u) = label(q). (Since Q is explored bottom-up, the query nodes in S(v)must be increasingly sorted.)

  • For a non-leaf node v with children v1, …, vk, S(v)  S(v1) …  S(vk)  {all nodes q with T[v] embedding Q[q]}.

Since S(v) is sorted, ‘union’ can be

implemented using the ‘merge’ operation.


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

a

a

c

c

c

b

b

b

b

Strict unordered tree matching

  • Definition

Definition 2 A strict embedding of a tree pattern Q into an XML

document T is a mapping f: Q  T, from the nodes of Q to the nodes

of T, which satisfies the following conditions:

  • same as (i) in Definition 1.

  • same as (ii) in Definition 1.

  • For any two nodes v1 Q and v2 Q, if v1and v2 are not

  • related by an ancestor-descendant relationship, then f(v1)and

  • f(v2) in T are not related by an ancestor-descendant relationship.

T:

Q:

q3

v6

q1

q2

v4

v5

v1

v3

v2


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

a

c

c

a

b

c

b

b

b

Strict unordered tree matching

  • Definition

Definition 2 A strict embedding of a tree pattern Q into an XML

document T is a mapping f: Q  T, from the nodes of Q to the nodes

of T, which satisfies the following conditions:

  • same as (i) in Definition 1.

  • same as (ii) in Definition 1.

  • For any two nodes v1 Q and v2 Q, if v1and v2 are not

  • related by an ancestor-descendant relationship, then f(v1)and

  • f(v2) in T are not related by an ancestor-descendant relationship.

T:

Q:

q3

v6

q1

q2

v4

v5

v1

v3

By Definition 2, this mapping

is not allowed.

v2


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

Q:

T:

c

b

y

e

d

c

d

a

a

e

b

x

Strict unordered tree matching

Strict unordered tree matching needs the exponential time.

O(|T||Q|dk)

d – the largest out-degree of the nodes in T.

k – the largest out-degree of the nodes in Q.

Using the concept of hypergraphs, the time complexity can be slightly improved

to O(|T||Q|2k).


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

Algorithm for query evaluation

  • Experiments

  • We conducted our experiments on a DELL desktop PC

  • equipped with Pentium(R) 4 CPU 2.80GHz, 0.99GB RAM

  • and 20GB hard disk. The code was compiled using

  • Microsoft Visual C++ compiler version 6.0, running

  • standalone.

  • Tested methods

  • In the experiments, we have tested four methods:

  • -TwigStack (TS for short),

  • -Twig2Stack (T2S for short),

  • -Twig-List (TL for short),

  • -One-Phase Holistic (OPH for short),

  • -tree-embedding (discussed in this paper, TE for short).


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

Algorithm for query evaluation

  • Experiments

  • Tested methods

  • In the experiments, we have tested five methods:

  • -TwigStack (TS for short) [1],

  • -Twig2Stack (T2S for short) [2],

  • -Twig-List (TL for short) [3],

  • -One-Phase Holistic (OPH for short) [4],

  • -tree-embedding (discussed in this paper, TE for short).

[1]N. Bruno, N. Koudas, and D. Srivastava, Holistic Twig Joins: Optimal XML Pattern Matching, in Proc. SIGMOD Int. Conf. on Management of Data, Madison, Wisconsin, June 2002, pp. 310-321.

[2]S. Chen, H-G. Li, J. Tatemura, W-P. Hsiung, D. Agrawa, and K.S. Canda, Twig2Stack: Bottom-up Processing of General­ized-Tree-Pattern Queries over XML Documents, in Proc. VLDB, Seoul, Korea, Sept. 2006, pp. 283-294.

[3]Qin, L., Yu, J. X., and Ding, B., “TwigList: Make Twig Pattern Matching fast,” In Proc. 12th Int’l Conf. on Database Systems for Advanced Applications (DASFAA),

pp. 850-862, Apr. 2007.

[4]Jiang, Z., Luo, C., Hou, W.-C., Zhu, Q., and Che, D., “Efficient Processing of XML Twig Pattern: A Novel One-Phase Holistic Solution,” In Proc. The 18th Int’l Conf. on Database and Expert Systems Applications (DEXA), pp. 87-97, Sept. 2007.


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

Algorithm for query evaluation

  • Experiments

  • Theoretical computational complexities

D- a largest data stream associated with a node q of Q such that for each v

in the data stream we have label(v) = label(q).


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

Algorithm for query evaluation

  • Experiments

  • Indexes

All the tested methods use XB-trees as their index

structure.


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

Algorithm for query evaluation

  • Experiments

  • Data sets

The data sets used for the tests are DBLP data set [30] and a synthetic

XMARK data set [35]. The DBLP data set is another real data set with

high similarity in structure. It is in fact a wide and shallow document.


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

Algorithm for query evaluation

  • Experiments

  • Queries – altogether 20 queries, divided into 4 groups

Group I:

Q1://inproceedings [author]//year [text() = ‘2004’]

Q2://inproceedings [author and title]//year [text() = ‘2004’]

Q3://inproceedings [author and title and .//pages]//year[text() = ‘2004’]

Q4://inproceedings [author and title and .//pages and .//url]

//year [text() = 2004’]

Q5://articles [author and title and .//volume and .//pages and //url]//year [text() = ‘2004’]


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

Algorithm for query evaluation

  • Experiments

  • Test results

For all the experiments, the buffer pool size was fixed at 2000 pages. The

page size of 8KB was used. For each data set, all the tag names are stored

in a single list and then each tag name is represented by its order number

in that list during the evaluation of queries. In our implementation, each

DocId occupies 4 bytes while a number in a Prüfer sequence, a LeftPos or

a RightPos occupies 2 bytes. A levelNum value takes only 1 byte.


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

*

Algorithm for query evaluation

  • Experiments

execution time (sec.)

5

+

TS

T2S

OPH

TL

TM

4

+

*

3

+

*

+

2

1

Q1

Q2

Q3

Q4

Q5


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

Algorithm for query evaluation

  • Experiments

  • Queries – altogether 20 queries, divided into 4 groups

Group II:

Q6://inproceedings[author/* and ./*]/year

Q7://inproceedings[author/* and title and ./*]/year

Q8://inproceedings[author/* and title and .//pages and ./*]/year

Q9://inproceeding[author/* and title and .//pages and .//url and ./*]/year

Q10://articles[author/* and title and .//volume and .//pages and .//pages and .//url and ./*]/year


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

+

TS

T2S

OPH

TL

TM

*

Algorithm for query evaluation

  • Experiments

execution time (sec.)

60

+

50

40

*

+

*

*

30

*

20

Q6

Q7

Q8

Q9

Q10


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

Algorithm for query evaluation

  • Experiments

  • Queries – altogether 20 queries, divided into 4 groups

Group III:

Q11:/site//open_auction[.//seller/person]//date[text() = ‘10/23/2006’]

Q12:/site//open_auction[.//seller/person and .//bidder]//date[text() =

‘10/23/2006’]

Q13:/site//open_auction[.//seller/person and /./bidder/increase]//date

[text() = ‘10/23/2006’]

Q14:/site//open_auction[.//seller/person and .//bidder/increase and .//initial]//date [text() = ‘10/23/ 2006’]

Q15:/site//open_auction[.//seller/person and .//bidder/increase and //initial and .//description]//date [text() = ‘10/23/2006’]


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

+

TS

T2S

OPH

TL

TM

*

Algorithm for query evaluation

  • Experiments

execution time (sec.)

5

+

4

+

3

+

*

2

*

*

*

1

Q11

Q12

Q13

Q14

Q15


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

Algorithm for query evaluation

  • Experiments

  • Queries – altogether 20 queries, divided into 4 groups

Group IV:

Q16:/site//open_auction[.//seller/person/* and ./*]/date

Q17:/site//open_auction[.//seller/person/* and .//bidder and ./*]/date

Q18:/site//open_auction[.//seller/person/* and .//bidder/increase]/date

Q19:/site//open_auction[.//seller/person/* and .//bidder/increase and .//initial and ./*]/date

Q20:/site//open_auction[.//seller/person/* and .//bidder/increase and .//initial and .//description and ./*]/ date


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

+

TS

T2S

OPH

TL

TM

*

Algorithm for query evaluation

  • Experiments

execution time (sec.)

5

+

4

*

3

*

*

2

*

1

Q16

Q17

Q18

Q19

Q20


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

Run time space usage

Algorithm for query evaluation

  • Experiments

  • Test results

In the following figure, we compare the runtime memory usage of all the

five tested approaches for the second group of queries. By the memory

usage, we mean the intermediate data structures, not including data

stream (concretely, path stacks for TwigStack; hierarchical stacks for

Twig2Stack, TL and OPH; and QSs for ours.)


Summary

Summary

  • An efficient method for evaluating unordered tree pattern queries in XML document databases

  • -parent/child and ancestor/descendant relation

  • -O(|D||Q|) time and O(|D||Q|) space

  • Strict unordered tree matching

  • -exponential time

  • Experiments

  • -TreeBank database, DBLP and XMark documents

  • -I/O time and CPU processing time


Unordered tree matching and strict unordered tree matching the evaluation of tree pattern queries

Thank you.


  • Login