
Indexing and Querying XML Data for Regular Path Expressions

Indexing and Querying XML Data for Regular Path Expressions. Quanzhong Li and Bongki Moon, Dept. of Computer Science, University of Arizona. VLDB 2001. Querying XML: XML has a tree-structured data model. Queries involve navigating the data using regular path expressions (e.g., XPath).


Presentation Transcript


  1. Indexing and Querying XML Data for Regular Path Expressions. Quanzhong Li and Bongki Moon, Dept. of Computer Science, University of Arizona. VLDB 2001.

  2. Querying XML • XML has a tree-structured data model. • Queries involve navigating the data using regular path expressions (e.g., XPath), for example /chapter/_*/figure[@caption="Tree Frogs"]. • Accessing all elements with the same name string. • Ancestor-descendant relationship between elements.

  3. Contribution • New system for indexing XML data. • Querying XML data based on a numbering scheme for elements. • Join algorithms for processing complex regular path expressions.

  4. Outline • Numbering scheme • Index structure • Join algorithms • Experimental results

  5. Path expression evaluation • Previous approaches: conventional tree traversals. Disadvantage: traversal overhead for long or unknown path lengths. • New approach: indexing for efficient element access, and a numbering scheme for the ancestor-descendant relationship.

  6. Dietz's Numbering Scheme • For two given nodes x and y of a tree T, x is an ancestor of y if and only if x occurs before y in the preorder traversal of T and after y in the postorder traversal. [Figure: example tree whose nodes are labeled with (preorder, postorder) pairs: (1,7), (2,4), (3,1), (4,2), (5,3), (6,6), (7,5).]
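
The ancestor test on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not code from the paper: the node names, the `TREE` dictionary, and both function names are assumptions. The tree shape was chosen so that the computed labels agree with the (preorder, postorder) pairs listed on the slide.

```python
from typing import Dict, List, Tuple

# Hypothetical example tree as child lists; node names are illustrative only.
TREE: Dict[str, List[str]] = {
    "a": ["b", "f"],
    "b": ["c", "d", "e"],
    "c": [], "d": [], "e": [],
    "f": ["g"],
    "g": [],
}


def number_tree(tree: Dict[str, List[str]], root: str) -> Dict[str, Tuple[int, int]]:
    """Return {node: (preorder rank, postorder rank)}, both starting at 1."""
    pre: Dict[str, int] = {}
    post: Dict[str, int] = {}

    def visit(node: str) -> None:
        pre[node] = len(pre) + 1      # rank assigned on the way down
        for child in tree[node]:
            visit(child)
        post[node] = len(post) + 1    # rank assigned on the way back up

    visit(root)
    return {node: (pre[node], post[node]) for node in tree}


def is_ancestor(labels: Dict[str, Tuple[int, int]], x: str, y: str) -> bool:
    """Dietz's test: x precedes y in preorder and follows y in postorder."""
    (pre_x, post_x), (pre_y, post_y) = labels[x], labels[y]
    return pre_x < pre_y and post_x > post_y


labels = number_tree(TREE, "a")
```

The test needs only the two ranks of each node, so no tree traversal is required at query time.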

  7. Proposed numbering scheme • Each node is associated with a pair of numbers <order, size> as follows: • For a tree node y and its parent x, order(x) < order(y) and order(y) + size(y) ≤ order(x) + size(x). • For two sibling nodes x and y, if x is the predecessor of y in preorder traversal, then order(x) + size(x) < order(y). [Figure: example tree with nodes labeled (order, size): (1,100), (10,30), (11,5), (17,5), (25,5), (41,10), (45,5).]
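
One way to produce labels that satisfy these conditions is a single preorder pass that leaves a fixed amount of slack around every node. The sketch below is an assumption-laden illustration rather than the paper's procedure: the `gap` parameter, the function names, and the example tree are all hypothetical. The tree shape for `SLIDE_LABELS` is inferred from interval containment of the pairs shown on the slide.

```python
from typing import Dict, List, Tuple

Tree = Dict[str, List[str]]
Labels = Dict[str, Tuple[int, int]]  # node -> (order, size)

# Hypothetical tree; node names are illustrative only.
TREE: Tree = {
    "a": ["b", "f"],
    "b": ["c", "d", "e"],
    "c": [], "d": [], "e": [],
    "f": ["g"],
    "g": [],
}

# The (order, size) pairs from the slide, mapped onto the assumed tree shape.
SLIDE_LABELS: Labels = {
    "a": (1, 100), "b": (10, 30), "c": (11, 5), "d": (17, 5),
    "e": (25, 5), "f": (41, 10), "g": (45, 5),
}


def assign_labels(tree: Tree, root: str, gap: int = 2) -> Labels:
    """Assign (order, size) in preorder, reserving `gap` unused slots
    before the first child and after every child (an assumed policy)."""
    labels: Labels = {}

    def visit(node: str, order: int) -> int:
        next_free = order + 1 + gap
        for child in tree[node]:
            child_size = visit(child, next_free)
            next_free += child_size + 1 + gap
        size = next_free - order - 1
        labels[node] = (order, size)
        return size

    visit(root, 1)
    return labels


def satisfies_scheme(tree: Tree, labels: Labels) -> bool:
    """Check the parent/child and sibling conditions from the slide."""
    for parent, children in tree.items():
        p_order, p_size = labels[parent]
        for child in children:
            c_order, c_size = labels[child]
            if not (p_order < c_order and c_order + c_size <= p_order + p_size):
                return False
        for left, right in zip(children, children[1:]):
            l_order, l_size = labels[left]
            if not l_order + l_size < labels[right][0]:
                return False
    return True
```

Any labeling that passes `satisfies_scheme` is acceptable; the conditions constrain only the relative positions of the intervals, so how much slack to reserve is a free design choice.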

  8. Advantages • Efficient Updates • Extra space can be reserved to accommodate future insertions.
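
To illustrate how reserved space can absorb an insertion, the sketch below looks for an unused range under a parent where a new child fits without renumbering any existing node. It is a minimal sketch under assumptions: the function name `find_free_order` and the first-fit policy are hypothetical and not taken from the paper.

```python
from typing import List, Optional, Tuple

Label = Tuple[int, int]  # (order, size)


def find_free_order(parent: Label, children: List[Label], size: int) -> Optional[int]:
    """Return an order value for a new child of `parent` that needs `size`
    reserved slots, or None if no gap is large enough (renumbering needed).
    First-fit policy; an assumption made for this sketch."""
    p_order, p_size = parent
    prev_end = p_order  # a child's order must exceed the parent's order
    for c_order, c_size in sorted(children):
        candidate = prev_end + 1
        # The new node must end strictly before its next sibling starts.
        if candidate + size < c_order:
            return candidate
        prev_end = c_order + c_size
    candidate = prev_end + 1
    # After the last child, the new node must stay inside the parent's range.
    if candidate + size <= p_order + p_size:
        return candidate
    return None


# Values in the style of the (order, size) example: a parent with three children.
parent = (10, 30)
children = [(11, 5), (17, 5), (25, 5)]
slot_for_leaf = find_free_order(parent, children, 0)
slot_for_subtree = find_free_order(parent, children, 2)
```

When the function returns `None`, the reserved space under that parent is exhausted and some renumbering would be unavoidable.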

  9. Ancestor-descendant relationship • For two given nodes x and y of a tree T, x is an ancestor of y if and only if order(x) < order(y) ≤ order(x) + size(x).
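
As a small illustration, the condition translates directly into a constant-time predicate on two labels. The function name and the tuple representation are assumptions for this sketch; the sample values follow the (order, size) pairs used in the numbering-scheme example.

```python
from typing import Tuple

Label = Tuple[int, int]  # (order, size)


def is_ancestor(x: Label, y: Label) -> bool:
    """True if x is an ancestor of y: order(x) < order(y) <= order(x) + size(x)."""
    x_order, x_size = x
    y_order, _ = y
    return x_order < y_order <= x_order + x_size


root, left, right = (1, 100), (10, 30), (41, 10)
leaf_under_left, leaf_under_right = (25, 5), (45, 5)
```

Only the descendant's order is compared; its size plays no role in the test.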

  10. Outline • Numbering scheme • Index structure • Join algorithms • Experimental results

  11. Index and Data Organization [Figure: XISS architecture diagram. Labeled components: Query Processor, Query, Result, Element Index, Attribute Index, Structure Index, Name Index, Value Table, Document Loader, XML Raw Data, Paged File.]

  12. Element Index [Figure: element index layout. Labels: Element nid; B+-tree; Document ID list; B+-tree; Element Record with <Order, Size>, Depth, Parent ID; element list with the same name in the same document.]

  13. Structure Index [Figure: structure index layout. Labels: B+-tree; Document ID (did); record fields nid, <order, size>, Parent order, Child order, Sibling order, Attribute order; array of all elements and attributes in the same document.]

  14. Outline • Numbering scheme • Index structure • Join algorithms • Experimental results

  15. Regular Path Expression • Complex regular path expressions, e.g., /chapter/_*/figure[@caption="Tree Frogs"]

  16. Regular expression Decomposition • A regular path expression can be decomposed into a combination of the following basic subexpressions: • A subexpression with a single element or a single attribute, • A subexpression with an element and an attribute (e.g., figure[@caption = "Tree Frogs"]), • A subexpression with two elements (e.g., chapter/figure or chapter/_*/figure), • A subexpression that is a Kleene closure (+, *) of another subexpression, and • A subexpression that is a union of two other subexpressions.

  17. Example • ( E1 / E2 ) * / E3 / ( ( E4 [ @A = v ] ) | ( E5 / _* / E6 ) ) [Figure: decomposition tree over the operands E1, E2, E3, E4, @A=v, E5, E6, with internal nodes labeled EE-Join, EA-Join, KC-Join, and Union.]

  18. Join algorithms • Element – Attribute join • Element – Element join • Kleene – Closure join

  19. EA-Join Algorithm • Input: {E1..Em}, where each Ei is a set of elements having a common document identifier; {A1..An}, where each Aj is a set of attributes having a common document identifier. • Output: a set of (e,a) pairs such that the element e is the parent of the attribute a.
      // Sort-merge {Ei} and {Aj} by document identifier.
      For each Ei and Aj with the same did do
          // Sort-merge Ei and Aj by PARENT-CHILD relationship.
          For each e in Ei and a in Aj do
              If (e is a parent of a) then output (e,a);
          End
      End
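
The EA-Join outline can be sketched as a two-pointer merge. This is a simplified illustration under stated assumptions, not the paper's implementation: the record types `Element` and `Attribute` are hypothetical, each attribute is assumed to store its parent's order, and documents are grouped with a dictionary rather than by merging sorted document lists.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Element(NamedTuple):
    did: int    # document identifier
    order: int
    size: int


class Attribute(NamedTuple):
    did: int
    order: int
    parent_order: int  # assumption: the record stores the parent's order


def ea_join(elements: List[Element],
            attributes: List[Attribute]) -> List[Tuple[Element, Attribute]]:
    """Return (e, a) pairs where element e is the parent of attribute a."""
    elems_by_doc: Dict[int, List[Element]] = defaultdict(list)
    attrs_by_doc: Dict[int, List[Attribute]] = defaultdict(list)
    for e in elements:
        elems_by_doc[e.did].append(e)
    for a in attributes:
        attrs_by_doc[a.did].append(a)

    result: List[Tuple[Element, Attribute]] = []
    for did in sorted(elems_by_doc.keys() & attrs_by_doc.keys()):
        es = sorted(elems_by_doc[did], key=lambda e: e.order)
        ats = sorted(attrs_by_doc[did], key=lambda a: a.parent_order)
        i = j = 0
        while i < len(es) and j < len(ats):
            if es[i].order < ats[j].parent_order:
                i += 1          # element has no (more) matching attributes
            elif es[i].order > ats[j].parent_order:
                j += 1          # attribute's parent is not in the element set
            else:
                result.append((es[i], ats[j]))
                j += 1          # keep i: one element may own several attributes
    return result


pairs = ea_join(
    [Element(1, 1, 3), Element(1, 5, 2), Element(2, 1, 3)],
    [Attribute(1, 2, 1), Attribute(1, 6, 5), Attribute(2, 9, 8), Attribute(3, 2, 1)],
)
```

Within one document, each list is scanned once, because an attribute has exactly one parent.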

  20. Example [Figure: example document tree with a book node, three chapter nodes, an appendix node, and three Figure nodes.]

  21. Attribute-element position [Figure: node labels as shown on the slide: chapter <1,3>, chapter <1,3>, chapter <2,1>, chapter <3,1>, name <4,0>, name <2,0>, name <4,0>, name <3,0>.]

  22. EE-Join Algorithm • Input: {E1..Em} and {F1..Fn}, where each Ei and each Fj is a set of elements having a common document identifier. • Output: a set of (e,f) pairs such that the element e is an ancestor of the element f.
      // Sort-merge {Ei} and {Fj} by document identifier.
      For each Ei and Fj with the same did do
          // Sort-merge Ei and Fj by ANCESTOR-DESCENDANT relationship.
          For each e in Ei and f in Fj do
              If (e is an ancestor of f) then output (e,f);
          End
      End
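
A possible rendering of EE-Join in Python is shown below. It is a sketch under assumptions rather than the authors' code: the `Node` record is hypothetical, documents are grouped with a dictionary instead of a merge over sorted document lists, and the ancestor test is order(e) < order(f) ≤ order(e) + size(e).

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Node(NamedTuple):
    did: int    # document identifier
    order: int
    size: int


def ee_join(ancestors: List[Node], descendants: List[Node]) -> List[Tuple[Node, Node]]:
    """Return (e, f) pairs where e is an ancestor of f in the same document."""
    e_by_doc: Dict[int, List[Node]] = defaultdict(list)
    f_by_doc: Dict[int, List[Node]] = defaultdict(list)
    for e in ancestors:
        e_by_doc[e.did].append(e)
    for f in descendants:
        f_by_doc[f.did].append(f)

    result: List[Tuple[Node, Node]] = []
    for did in sorted(e_by_doc.keys() & f_by_doc.keys()):
        es = sorted(e_by_doc[did], key=lambda n: n.order)
        fs = sorted(f_by_doc[did], key=lambda n: n.order)
        start = 0
        for e in es:
            # Skip descendants that start at or before e; since es is sorted
            # by order, they cannot match any later ancestor candidate either.
            while start < len(fs) and fs[start].order <= e.order:
                start += 1
            k = start
            # Every f with order in (e.order, e.order + e.size] lies under e.
            while k < len(fs) and fs[k].order <= e.order + e.size:
                result.append((e, fs[k]))
                k += 1
    return result


# Values in the style of the "extreme case" slide: four nested chapters.
chapters = [Node(1, 1, 90), Node(1, 2, 80), Node(1, 8, 20), Node(1, 9, 10)]
figures = [Node(1, 19, 0), Node(1, 10, 0), Node(1, 11, 0)]
nested_pairs = ee_join(chapters, figures)
```

With deeply nested ancestors, the inner scan revisits the same descendants once per ancestor, so the output (and the work) grows with the product of the two list sizes.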

  23. Extreme case of EE-Join [Figure: node labels as shown on the slide: chapter <1,90>, chapter <2,80>, chapter <8,20>, chapter <9,10>, figure <19,0>, figure <10,0>, figure <11,0>.]

  24. KC-Join Algorithm • Input: {E1..Em}, where each Ei is a group of elements from an XML document. • Output: a Kleene closure of {E1..Em}.
      // Apply the EE-Join algorithm repeatedly.
      Set i = 1;
      Set K1 = {E1..Em};
      Repeat
          Set i = i + 1;
          Set Ki = EE-Join(Ki-1, K1);
      Until (Ki is empty);
      Output the union of K1, K2, .., Ki-1.
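
The repeated-join idea can be illustrated with the sketch below. Several choices here are assumptions made for illustration and are not confirmed by the slides: each level Ki is represented as a list of chains (tuples of nodes), a chain is extended under the ancestor-descendant test, a single document is assumed, and a nested loop stands in for the sort-merge EE-Join.

```python
from typing import List, Tuple

Label = Tuple[int, int]       # (order, size)
Chain = Tuple[Label, ...]     # hypothetical representation of one closure result


def is_ancestor(x: Label, y: Label) -> bool:
    """order(x) < order(y) <= order(x) + size(x)."""
    return x[0] < y[0] <= x[0] + x[1]


def extend_chains(chains: List[Chain], elements: List[Label]) -> List[Chain]:
    """Stand-in for EE-Join(K(i-1), K1): append every element that lies
    under the last node of a chain (nested loop, for brevity)."""
    return [chain + (e,) for chain in chains for e in elements
            if is_ancestor(chain[-1], e)]


def kc_join(elements: List[Label]) -> List[Chain]:
    """Union of K1, K2, ... until a level comes back empty."""
    levels: List[List[Chain]] = [[(e,) for e in elements]]  # K1
    while True:
        next_level = extend_chains(levels[-1], elements)
        if not next_level:
            break
        levels.append(next_level)
    return [chain for level in levels for chain in level]


# Four nested chapter elements, in the style of the EE-Join extreme case.
chapters = [(1, 90), (2, 80), (8, 20), (9, 10)]
closure = kc_join(chapters)
```

The loop always terminates, because order strictly increases along a chain and the element set is finite.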

  25. Outline • Numbering scheme • Index structure • Join algorithms • Experimental results

  26. Experiment Results • Comparison with top-down and bottom-up evaluation methods. • Comparison for • EE-Join ( E1 /_*/ E2 ) • EA-Join ( E[@A] ) • Scalability test

  27. EE-Join performance

  28. EA-Join performance

  29. Results • EE-Join algorithm outperformed bottom-up. • EA-Join algorithm is comparable with top-down but outperformed bottom-up. • Both are linearly scalable.
