Indexing and Querying XML Data for Regular Path Expressions

Quanzhong Li and Bongki Moon

Dept. of Computer Science

University of Arizona

VLDB 2001.


Querying XML

  • XML has a tree-structured data model.

  • Queries navigate the data using regular path expressions (e.g., XPath).

    e.g., /chapter/_*/figure[@caption="Tree Frogs"]

    • Accessing all elements with the same name string.

    • Ancestor-descendant relationship between elements.


Contribution

  • A new system for indexing XML data.

  • Querying XML data based on a numbering scheme for elements.

  • Join algorithms for processing complex regular path expressions.


Outline

  • Numbering scheme

  • Index structure

  • Join algorithms

  • Experimental results


Path expression evaluation

  • Previous approaches

    • Conventional tree traversals

      • Disadvantage: traversal overhead when paths are long or of unknown length.

  • New approach

    • Indexing for efficient element access.

    • Numbering scheme for ancestor-descendant relationship.


Dietz’s Numbering Scheme

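  • For two given nodes x and y of a tree T, x is an ancestor of y if and only if

    • x occurs before y in the preorder traversal of T, and

    • x occurs after y in the postorder traversal of T.

[Figure: example tree with nodes labeled (preorder, postorder): (1,7), (2,4), (3,1), (4,2), (5,3), (6,6), (7,5).]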
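The preorder/postorder test can be tried out with a short Python sketch. The node names and the tree shape below are assumptions, chosen only because they reproduce the (preorder, postorder) labels listed on the slide; `number_tree` and `is_ancestor` are hypothetical helper names.

```python
def number_tree(children, root):
    """Return {node: (preorder, postorder)} for a tree given as a child-list dict."""
    pre, post = {}, {}
    counters = {"pre": 0, "post": 0}

    def visit(node):
        counters["pre"] += 1
        pre[node] = counters["pre"]
        for child in children.get(node, []):
            visit(child)
        counters["post"] += 1
        post[node] = counters["post"]

    visit(root)
    return {node: (pre[node], post[node]) for node in pre}


def is_ancestor(labels, x, y):
    """Dietz's test: x comes before y in preorder and after y in postorder."""
    return labels[x][0] < labels[y][0] and labels[x][1] > labels[y][1]


# Assumed tree shape (not stated on the slide): a has children b and f,
# b has children c, d, e, and f has child g.
children = {"a": ["b", "f"], "b": ["c", "d", "e"], "f": ["g"]}
labels = number_tree(children, "a")

print(labels["a"], labels["b"], labels["g"])  # (1, 7) (2, 4) (7, 5)
print(is_ancestor(labels, "b", "e"))          # True
print(is_ancestor(labels, "b", "g"))          # False
```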


Proposed numbering scheme

This associates with each node a pair of numbers <order, size> as follows:

  • For a tree node y and its parent x,

    • order(x) < order(y)

    • order(y) + size(y) ≤ order(x) + size(x)

  • For two sibling nodes x and y, if x is the predecessor of y in preorder traversal, then

    • order(x) + size(x) < order(y)

[Figure: example tree numbered with (order, size) pairs: (1,100), (10,30), (11,5), (17,5), (25,5), (41,10), (45,5).]
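As a rough check of the three conditions, the sketch below validates a numbered tree in Python. The `Node` class and `satisfies_scheme` are hypothetical names, and the parent-child arrangement of the slide's (order, size) pairs is an assumption that happens to satisfy the conditions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    order: int
    size: int
    children: list = field(default_factory=list)  # children in document order


def satisfies_scheme(node):
    """Check the parent/child and sibling conditions for the whole subtree."""
    prev_end = None  # order + size of the preceding sibling
    for child in node.children:
        if not node.order < child.order:
            return False
        if not child.order + child.size <= node.order + node.size:
            return False
        if prev_end is not None and not prev_end < child.order:
            return False
        prev_end = child.order + child.size
        if not satisfies_scheme(child):
            return False
    return True


# Assumed arrangement of the pairs shown on the slide.
tree = Node(1, 100, [
    Node(10, 30, [Node(11, 5), Node(17, 5), Node(25, 5)]),
    Node(41, 10, [Node(45, 5)]),
])
print(satisfies_scheme(tree))                          # True
print(satisfies_scheme(Node(1, 10, [Node(5, 10)])))    # False: child overruns parent
```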


Advantages

  • Efficient Updates

    • Extra space can be reserved to accommodate future insertions.
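One way to picture the reserved space: if a parent's range extends past its last child, a new child can be numbered inside the leftover range without renumbering any existing node. The placement policy below (put the new node immediately after the last child) is an assumption for illustration, not something the slides specify, and `place_after_last_child` is a hypothetical name.

```python
def place_after_last_child(parent, last_child, size):
    """Return an (order, size) pair for a new rightmost child, or None if no room.

    parent and last_child are (order, size) pairs; last_child may be None
    when the parent has no children yet.
    """
    p_order, p_size = parent
    if last_child is None:
        start = p_order + 1                         # keeps order(parent) < order(child)
    else:
        start = last_child[0] + last_child[1] + 1   # keeps sibling ranges disjoint
    if start + size <= p_order + p_size:            # new range must stay inside the parent
        return (start, size)
    return None


# Parent (10,30) covers orders up to 40; its last child (25,5) ends at 30.
print(place_after_last_child((10, 30), (25, 5), 5))   # (31, 5)
print(place_after_last_child((10, 30), (25, 5), 20))  # None: not enough reserved space
```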


Ancestor–descendant relationship

  • For two given nodes x and y of a tree T, x is an ancestor of y if and only if

    • order(x) < order(y) ≤ order(x) + size(x).
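The condition translates directly into a constant-time comparison. The sketch below assumes nodes are represented as plain (order, size) tuples and reuses values from the numbering example; `is_ancestor` is a hypothetical name.

```python
def is_ancestor(x, y):
    """True if x is an ancestor of y, where x and y are (order, size) pairs."""
    x_order, x_size = x
    y_order, _ = y
    return x_order < y_order <= x_order + x_size


print(is_ancestor((10, 30), (25, 5)))  # True
print(is_ancestor((10, 30), (45, 5)))  # False
print(is_ancestor((1, 100), (45, 5)))  # True
```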


Outline

  • Numbering scheme

  • Index structure

  • Join algorithms

  • Experimental results


Index and Data Organization

[Figure: XISS architecture. Labels: Query Processor (Query in, Result out); XISS components Element Index, Attribute Index, Structure Index, Name Index, and Value Table; Document Loader reading XML Raw Data; Paged File.]


Element Index

[Figure: element index. Labels: Element nid (B+-tree); Document ID list (B+-tree); Element Record holding <Order, Size>, Depth, and Parent ID; element list with the same name in the same document.]
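To make the lookup path concrete, here is a minimal in-memory sketch of an element index keyed first by name identifier (nid) and then by document identifier (did). Dictionaries and sorted lists stand in for the B+-trees, which is a simplification; `ElementIndex`, `ElementRecord`, and the sample values are hypothetical.

```python
from bisect import insort
from collections import defaultdict
from typing import NamedTuple


class ElementRecord(NamedTuple):
    order: int
    size: int
    depth: int
    parent_id: int


class ElementIndex:
    """nid -> did -> element records of that name in that document, sorted by order."""

    def __init__(self):
        self._by_name = defaultdict(lambda: defaultdict(list))

    def add(self, nid, did, record):
        insort(self._by_name[nid][did], record)  # tuples compare on order first

    def lookup(self, nid):
        """Return {did: [records]} for every document containing this name."""
        docs = self._by_name.get(nid, {})
        return {did: list(records) for did, records in sorted(docs.items())}


index = ElementIndex()
index.add(nid=7, did=1, record=ElementRecord(17, 5, 2, 10))
index.add(nid=7, did=1, record=ElementRecord(11, 5, 2, 10))
index.add(nid=7, did=2, record=ElementRecord(3, 0, 1, 1))
print(index.lookup(7)[1][0].order)  # 11
```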


Structure Index

[Figure: structure index. Labels: B+-tree on Document ID (did); array of all elements and attributes in the same document; each record holds nid, <order, size>, parent order, child order, sibling order, and attribute order.]


Outline

  • Numbering scheme

  • Index structure

  • Join algorithms

  • Experimental results


Regular Path Expression

  • Complex regular path expressions

    • e.g., /chapter/_*/figure[@caption="Tree Frogs"]


Regular Expression Decomposition

  • A regular path expression can be decomposed into a combination of the following basic subexpressions:

    • A subexpression with a single element or a single attribute,

    • A subexpression with an element and an attribute (e.g., figure[@caption="Tree Frogs"]),

    • A subexpression with two elements (e.g., chapter/figure or chapter/_*/figure),

    • A subexpression that is a Kleene closure (+, *) of another subexpression, and

    • A subexpression that is a union of two other subexpressions.


Example

  • (E1/E2)*/E3/((E4[@A=v]) | (E5/_*/E6))

[Figure: decomposition tree for the expression above. Leaves: E1, E2, E3, E4, @A=v, E5, E6. Operator nodes: EE-Join for each / and for /_*/, EA-Join for [ ], KC-Join for *, and Union.]
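The decomposition can be pictured as an operator tree whose internal nodes each map to one join. The Python sketch below encodes the example expression as nested tuples and lists the joins bottom-up; the tuple encoding, the left-to-right grouping of the / operators, and the names `JOIN_FOR` and `plan` are assumptions made for illustration.

```python
# Which join handles which operator, following the labels on the slide.
JOIN_FOR = {
    "/": "EE-Join",
    "/_*/": "EE-Join",
    "[]": "EA-Join",
    "*": "KC-Join",
    "|": "Union",
}


def plan(expr):
    """Return the joins needed for expr in bottom-up (post-order) sequence."""
    if isinstance(expr, str):   # a leaf: single element or attribute test
        return []
    op, *args = expr
    steps = [step for arg in args for step in plan(arg)]
    steps.append(JOIN_FOR[op])
    return steps


# (E1/E2)*/E3/((E4[@A=v]) | (E5/_*/E6))
expr = ("/",
        ("/", ("*", ("/", "E1", "E2")), "E3"),
        ("|", ("[]", "E4", "@A=v"), ("/_*/", "E5", "E6")))

print(plan(expr))
# ['EE-Join', 'KC-Join', 'EE-Join', 'EA-Join', 'EE-Join', 'Union', 'EE-Join']
```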


Join algorithms

  • Element – Attribute join

  • Element – Element join

  • Kleene – Closure join


EA-Join Algorithm

  • Input:

    • {E1..Em}: Ei is a set of elements having a common document identifier;

    • {A1..An}: Aj is a set of attributes having a common document identifier;

  • Output:

    • A set of (e,a) pairs such that the element e is the parent of the attribute a.

  • Algorithm:

    // Sort-merge {Ei} and {Aj} by document identifier.
    For each Ei and Aj with the same did do
      // Sort-merge Ei and Aj by PARENT-CHILD relationship.
      For each e in Ei and a in Aj do
        If (e is a parent of a) then output (e, a);
      End
    End
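A minimal Python sketch of the EA-Join follows. It assumes each attribute record carries the order number of its parent element (the `parent_order` field is a hypothetical stand-in for however the parent is actually identified), and it groups by document with dictionaries rather than merging two did-sorted streams.

```python
from collections import defaultdict
from typing import NamedTuple


class Element(NamedTuple):
    did: int
    order: int
    size: int


class Attribute(NamedTuple):
    did: int
    order: int
    parent_order: int  # assumed field: order number of the owning element


def ea_join(elements, attributes):
    """Return (e, a) pairs where element e is the parent of attribute a."""
    e_by_doc, a_by_doc = defaultdict(list), defaultdict(list)
    for e in elements:
        e_by_doc[e.did].append(e)
    for a in attributes:
        a_by_doc[a.did].append(a)

    out = []
    for did in sorted(e_by_doc.keys() & a_by_doc.keys()):
        es = sorted(e_by_doc[did], key=lambda e: e.order)
        ats = sorted(a_by_doc[did], key=lambda a: a.parent_order)
        i = j = 0
        while i < len(es) and j < len(ats):  # merge on order == parent_order
            if es[i].order < ats[j].parent_order:
                i += 1
            elif es[i].order > ats[j].parent_order:
                j += 1
            else:
                out.append((es[i], ats[j]))
                j += 1  # the same element may own further attributes
    return out


elements = [Element(1, 1, 3), Element(1, 5, 2), Element(2, 4, 1)]
attributes = [Attribute(1, 2, 1), Attribute(1, 9, 8), Attribute(2, 2, 1)]
print(ea_join(elements, attributes))
```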


Example

[Figure: example document tree with nodes book, chapter (×3), appendix, and Figure (×3).]


Attribute-element position

[Figure: element and attribute positions. Labels: chapter <1,3>, chapter <2,1>, chapter <3,1>, name <2,0>, name <3,0>, name <4,0>; the labels chapter <1,3> and name <4,0> each appear twice.]


EE-Join Algorithm

  • Input:

    • {E1..Em} and {F1..Fn}: each Ei and each Fj is a set of elements having a common document identifier.

  • Output:

    • A set of (e,f) pairs such that the element e is an ancestor of the element f.

  • Algorithm:

    // Sort-merge {Ei} and {Fj} by document identifier.
    For each Ei and Fj with the same did do
      // Sort-merge Ei and Fj by ANCESTOR-DESCENDANT relationship.
      For each e in Ei and f in Fj do
        If (e is an ancestor of f) then output (e, f);
      End
    End
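The EE-Join can be sketched in Python as below. This is a simplification rather than the merge as presented: after sorting both lists by order, a binary search finds where each ancestor's range starts, and the descendants are then read off as a contiguous run. The `Element` record and `ee_join` name are hypothetical, and the sample data reuses the numbers from the extreme-case slide.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import NamedTuple


class Element(NamedTuple):
    did: int
    order: int
    size: int


def ee_join(ancestors, descendants):
    """Return (e, f) pairs where e is an ancestor of f in the same document."""
    e_by_doc, f_by_doc = defaultdict(list), defaultdict(list)
    for e in ancestors:
        e_by_doc[e.did].append(e)
    for f in descendants:
        f_by_doc[f.did].append(f)

    out = []
    for did in sorted(e_by_doc.keys() & f_by_doc.keys()):
        es = sorted(e_by_doc[did], key=lambda e: e.order)
        fs = sorted(f_by_doc[did], key=lambda f: f.order)
        f_orders = [f.order for f in fs]
        for e in es:
            j = bisect_right(f_orders, e.order)  # first f with order > order(e)
            while j < len(fs) and fs[j].order <= e.order + e.size:
                out.append((e, fs[j]))
                j += 1
    return out


chapters = [Element(1, 1, 90), Element(1, 2, 80), Element(1, 8, 20), Element(1, 9, 10)]
figures = [Element(1, 19, 0), Element(1, 10, 0), Element(1, 11, 0)]
print(len(ee_join(chapters, figures)))  # 12: every chapter contains every figure
```

With nested ancestors like these, every descendant is rescanned once per enclosing ancestor, which is why the output (and the work) grows with the product of the two list sizes.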


Extreme case of EE-Join

[Figure: nested chapter elements chapter <1,90>, chapter <2,80>, chapter <8,20>, chapter <9,10>, with figure elements figure <10,0>, figure <11,0>, figure <19,0>.]


KC-Join Algorithm

  • Input:

    • {E1..Em}: where Ei is a group of elements from an XML document.

  • Output:

    • A Kleene Closure of {E1..Em}

  • Algorithm:

    // Apply the EE-Join algorithm repeatedly.
    Set i = 1;
    Set K1 = {E1..Em};
    Repeat
      Set i = i + 1;
      Set Ki = EE-Join(Ki-1, K1);
    Until (Ki is empty);
    Output union of K1, K2, ..., Ki-1.
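The repeated-join idea can be sketched in Python as follows. Representing each round's result as chains (tuples) of elements is an assumption about what the closure contains, and a nested loop stands in for the EE-Join; elements are (order, size) pairs from a single document, and `kc_join` is a hypothetical name.

```python
def is_ancestor(x, y):
    """Ancestor test on (order, size) pairs."""
    return x[0] < y[0] <= x[0] + x[1]


def kc_join(elements):
    """Return all ancestor-descendant chains over elements, shortest first."""
    levels = [[(e,) for e in elements]]       # K1: chains of length 1
    while True:
        # K_i: extend each chain in K_(i-1) by an element of K1 below its last member.
        next_level = [chain + (e,)
                      for chain in levels[-1]
                      for e in elements
                      if is_ancestor(chain[-1], e)]
        if not next_level:                    # stop once K_i is empty
            break
        levels.append(next_level)
    return [chain for level in levels for chain in level]


chapters = [(1, 90), (2, 80), (8, 20), (9, 10)]
closure = kc_join(chapters)
print(len(closure))               # 15
print(max(closure, key=len))      # ((1, 90), (2, 80), (8, 20), (9, 10))
```

The loop terminates because every extension moves to a strictly larger order number, so chains cannot grow past the number of input elements.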


Outline

  • Numbering scheme

  • Index structure

  • Join algorithms

  • Experimental results


Experiment Results

  • Comparison with top-down and bottom-up evaluation methods.

  • Comparison for

    • EE-Join ( E1 /_*/ E2 )

    • EA-Join ( E[@A] )

  • Scalability test




Results

  • The EE-Join algorithm outperformed the bottom-up method.

  • The EA-Join algorithm was comparable to the top-down method and outperformed the bottom-up method.

  • Both algorithms scaled linearly.

