
Structural Joins: A Primitive for Efficient XML Query Pattern Matching

This research paper explores the use of structural joins for efficient XML query pattern matching. It presents two algorithms for processing ancestor-descendant and parent-child relationships, with and without stacks. The paper also discusses the complexity and worst-case examples of the algorithms.


Presentation Transcript


  1. Structural Joins: A Primitive for Efficient XML Query Pattern Matching • Al-Khalifa et al., ICDE 2002

  2. Element Numbering • (documentId, startpos:endpos, level)
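
The numbering scheme can be sketched as follows. This is a simplified illustration, not the paper's exact scheme: it assumes a single counter that advances once when an element opens and once when it closes, and the function name `number_elements` is hypothetical.

```python
import xml.etree.ElementTree as ET


def number_elements(xml_text, doc_id=1):
    """Return [(tag, (doc_id, start, end, level)), ...] in document order."""
    root = ET.fromstring(xml_text)
    counter = 0
    out = []

    def visit(node, level):
        nonlocal counter
        counter += 1                 # position of the opening tag
        start = counter
        slot = len(out)
        out.append(None)             # reserve a slot to keep document order
        for child in node:
            visit(child, level + 1)
        counter += 1                 # position of the closing tag
        out[slot] = (node.tag, (doc_id, start, counter, level))

    visit(root, 1)
    return out


numbered = number_elements(
    "<book><title/><chapter><section/><section/></chapter></book>"
)
for tag, label in numbered:
    print(tag, label)
```

With this counter, an element's interval strictly contains the intervals of everything nested inside it, which is the property the join conditions rely on.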

  3. Join Conditions Using Numbering • For two elements (D1, S1:E1, L1) and (D2, S2:E2, L2) • Ancestor-Descendant • D1 = D2, S1 < S2 < E2 < E1 • Parent-Child • D1 = D2, S1 < S2 < E2 < E1, L1 + 1 = L2
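
The two conditions translate directly into predicates. A minimal sketch: the `Element` type and the function names are illustrative, and the level values in the sample elements are assumed from the nesting of the intervals.

```python
from typing import NamedTuple


class Element(NamedTuple):
    doc: int     # documentId
    start: int   # startpos
    end: int     # endpos
    level: int   # depth in the tree


def is_ancestor(a: Element, d: Element) -> bool:
    # D1 = D2 and S1 < S2 < E2 < E1 (S2 < E2 holds for any valid element)
    return a.doc == d.doc and a.start < d.start and d.end < a.end


def is_parent(a: Element, d: Element) -> bool:
    # Ancestor condition plus L1 + 1 = L2
    return is_ancestor(a, d) and a.level + 1 == d.level


outer = Element(1, 4, 13, 2)
inner = Element(1, 5, 12, 3)
leaf = Element(1, 6, 7, 4)

print(is_ancestor(outer, leaf))  # True: 4 < 6 and 7 < 13
print(is_parent(outer, leaf))    # False: levels differ by 2
print(is_parent(inner, leaf))    # True
```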

  4. Tree pattern → Structural relationships

  5. Structural Join • Input • 2 element lists • Ancestor and descendant; parent and child • Sorted by start position • Output • Pairs of ancestor/descendant or parent/child • Sorted by first or second element • 2 algorithms presented • With and without stacks • Both with ordering by ancestor and by descendant

  6. Example of results • [Figure: tree with nodes 1,20; 2,11 and 12,19; 3,10 and 13,18; 4,5, 6,7, 8,9, 14,15 and 16,17, shown next to its interval representation, with ancestor, descendant and parent/child relationships marked]

  7. Tree-Merge Join, ordered by ancestor

  8. [Figure: tree with root 1,26 and children 2,3, 4,13, 14,15, 16,23, 24,25; 5,12 lies under 4,13 and contains 6,7, 8,9, 10,11; 17,22 lies under 16,23 and contains 18,19, 20,21] • A-list: 4,13 5,12 14,15 16,23 17,22 • D-list: 2,3 6,7 8,9 10,11 18,19 20,21 24,25 • FOR each ancestor • Skip descendants with START < ancestor.start • Check/output descendants until START > ancestor.end • Results: [4,13+6,7] [4,13+8,9] [4,13+10,11] [5,12+6,7] [5,12+8,9] [5,12+10,11] …
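
The loop on this slide can be sketched as below. This is an illustrative reading of the slide rather than the paper's code: elements are plain (start, end) tuples from a single document, only the ancestor-descendant case is handled, and the name `tree_merge_anc` is chosen here for convenience.

```python
def tree_merge_anc(alist, dlist):
    """Ancestor-descendant join; both inputs sorted by start position.

    Output pairs (ancestor, descendant) come out ordered by ancestor.
    """
    out = []
    mark = 0  # first descendant that can still match the current ancestor
    for a in alist:
        # Skip descendants that start before this ancestor: they cannot
        # match it or any later ancestor.
        while mark < len(dlist) and dlist[mark][0] < a[0]:
            mark += 1
        # Scan forward from the mark until a descendant starts past a.end.
        i = mark
        while i < len(dlist) and dlist[i][0] < a[1]:
            if dlist[i][1] < a[1]:
                out.append((a, dlist[i]))
            i += 1
    return out


alist = [(4, 13), (5, 12), (14, 15), (16, 23), (17, 22)]
dlist = [(2, 3), (6, 7), (8, 9), (10, 11), (18, 19), (20, 21), (24, 25)]
anc_pairs = tree_merge_anc(alist, dlist)
print(anc_pairs[:3])
# [((4, 13), (6, 7)), ((4, 13), (8, 9)), ((4, 13), (10, 11))]
```

Note that the scan restarts from `mark` for every ancestor, so nested ancestors rescan the same descendants. That rescan is what the complexity slides discuss.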

  9. Tree-Merge Join, ordered by descendant

  10. [Figure: tree with root 1,26 and children 2,3, 4,13, 14,15, 16,23, 24,25; 5,12 lies under 4,13 and contains 6,7, 8,9, 10,11; 17,22 lies under 16,23 and contains 18,19, 20,21] • A-list: 4,13 5,12 14,15 16,23 17,22 • D-list: 2,3 6,7 8,9 10,11 18,19 20,21 24,25 • FOR each descendant • Skip ancestors with END < descendant.start • Check/output ancestors until START > descendant.end • Results: [6,7+4,13] [6,7+5,12] [8,9+4,13] [8,9+5,12] …
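
A sketch of the descendant-ordered variant, following the stopping rule as written on this slide. Elements are assumed to be (start, end) tuples from one document, and the function name `tree_merge_desc` is illustrative.

```python
def tree_merge_desc(alist, dlist):
    """Ancestor-descendant join; both inputs sorted by start position.

    Output pairs (descendant, ancestor) come out ordered by descendant.
    """
    out = []
    mark = 0  # first ancestor that has not yet ended before the current d
    for d in dlist:
        # Skip leading ancestors that end before this descendant starts.
        while mark < len(alist) and alist[mark][1] < d[0]:
            mark += 1
        # Check ancestors until one starts after the descendant ends.
        i = mark
        while i < len(alist) and alist[i][0] <= d[1]:
            a = alist[i]
            if a[0] < d[0] and d[1] < a[1]:
                out.append((d, a))
            i += 1
    return out


alist = [(4, 13), (5, 12), (14, 15), (16, 23), (17, 22)]
dlist = [(2, 3), (6, 7), (8, 9), (10, 11), (18, 19), (20, 21), (24, 25)]
desc_pairs = tree_merge_desc(alist, dlist)
print(desc_pairs[:4])
# [((6, 7), (4, 13)), ((6, 7), (5, 12)), ((8, 9), (4, 13)), ((8, 9), (5, 12))]
```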

  11. Complexity • For ancestor-descendant relationships: • Tree-Merge-Anc time complexity optimal • May be quadratic, but proportional to output size • But can have poor IO performance • For parent-child relationships • Tree merge cost may still be quadratic, but output size can only be linear • Tree-Merge-Desc can be quadratic in output size

  12. Worst-Case Examples • a1 has the whole D-list as descendants • a2 has d2 through d(2n-1) as descendants, and so on • Consequence: quadratic running time, since each ancestor has to check nearly the whole descendant list
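
One way to see the quadratic behaviour is to build such an input and count comparisons. The construction below is an assumption about what the slide depicts: n nested ancestors, where a(i) directly contains the leaves d(i) and d(2n-i+1), joined under the parent-child condition. The helper names are hypothetical, and elements are (start, end, level) tuples.

```python
def worst_case(n):
    """Nested a1..an; a_i has children d_i, d_(2n-i+1) and a_(i+1)."""
    pos = 0
    a_start, head, tail, closed = [], [], [], []
    for i in range(1, n + 1):
        pos += 1
        a_start.append(pos)                  # a_i opens
        pos += 2
        head.append((pos - 1, pos, i + 1))   # leaf d_i
    for i in range(n, 0, -1):
        pos += 2
        tail.append((pos - 1, pos, i + 1))   # leaf d_(2n-i+1)
        pos += 1
        closed.append((a_start[i - 1], pos, i))  # a_i closes
    return closed[::-1], head + tail


def tree_merge_anc_parent_child(alist, dlist):
    """Tree-Merge-Anc with the level test; also counts comparisons."""
    out, comparisons, mark = [], 0, 0
    for a in alist:
        while mark < len(dlist) and dlist[mark][0] < a[0]:
            mark += 1
        i = mark
        while i < len(dlist) and dlist[i][0] < a[1]:
            comparisons += 1
            d = dlist[i]
            if d[1] < a[1] and d[2] == a[2] + 1:
                out.append((a, d))
            i += 1
    return out, comparisons


for n in (10, 100):
    alist, dlist = worst_case(n)
    pairs, comparisons = tree_merge_anc_parent_child(alist, dlist)
    print(n, len(pairs), comparisons)
```

Under this construction the output has 2n pairs, while the number of comparisons is n(n+1), because a(i) scans d(i) through d(2n-i+1) and keeps only two of them.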

  13. Worst-Case Examples • An equivalent situation arises for Tree-Merge-Desc

  14. Stack-Tree Algorithm • Basic idea: depth first traversal of XML tree • Linear time with stack size = depth of tree • All ancestor-descendant relationships appear on stack during traversal • Traverse the lists only once • Main problem: do not want to traverse the whole database, just nodes in A-list/D-list

  15. Stack-Tree-Desc

  16. [Figure: Stack-Tree-Desc walk-through on the tree with root 1,26 and children 2,3, 4,13, 14,15, 16,23, 24,25; 5,12 lies under 4,13 and contains 6,7, 8,9, 10,11; 17,22 lies under 16,23 and contains 18,19, 20,21] • A-list: 4,13 5,12 14,15 16,23 17,22 • D-list: 2,3 6,7 8,9 10,11 18,19 20,21 24,25 • Print in order of descendants • Keep ancestors on the same path in a stack • When a descendant arrives, it is a descendant of every node on the stack, so print it with each of them • Pop from the stack when a different path is processed • e.g. when 14,15 arrives, both previous ancestors are popped • Walk-through: 2,3 is skipped (the stack is empty); 4,13 and 5,12 are pushed; 6,7 gives [4,13+6,7] [5,12+6,7]; 8,9 is printed with the whole stack: [4,13+8,9] [5,12+8,9]; when 14,15 arrives, pop and keep going
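
The walk-through above can be sketched as a single merge pass over both lists. This is a simplified illustration under stated assumptions: (start, end) tuples from one document, the ancestor-descendant case only, and a function name (`stack_tree_desc`) chosen here.

```python
def stack_tree_desc(alist, dlist):
    """Single-pass ancestor-descendant join, output ordered by descendant."""
    out, stack = [], []
    ai = di = 0
    while di < len(dlist):  # once descendants run out, nothing more can match
        a = alist[ai] if ai < len(alist) else None
        d = dlist[di]
        next_start = d[0] if a is None else min(a[0], d[0])
        # Pop ancestors that end before the next node begins: the traversal
        # has moved to a different path of the tree.
        while stack and stack[-1][1] < next_start:
            stack.pop()
        if a is not None and a[0] < d[0]:
            stack.append(a)          # nested inside the current stack top
            ai += 1
        else:
            for s in stack:          # d is a descendant of the whole stack
                out.append((s, d))
            di += 1
    return out


alist = [(4, 13), (5, 12), (14, 15), (16, 23), (17, 22)]
dlist = [(2, 3), (6, 7), (8, 9), (10, 11), (18, 19), (20, 21), (24, 25)]
stack_pairs = stack_tree_desc(alist, dlist)
print(stack_pairs[:4])
# [((4, 13), (6, 7)), ((5, 12), (6, 7)), ((4, 13), (8, 9)), ((5, 12), (8, 9))]
```

Each input element is pushed or consumed once, and each output pair costs constant work, so no descendant is rescanned the way it is in the tree-merge loops.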

  17. Example of Stack-Tree-Desc Execution

  18. Stack-Tree-Anc • Basic problem: results from a particular descendant cannot be output immediately • Later descendants may match earlier ancestor • Solution: keep lists of matching descendant nodes with each stack node • Self-list • Descendants that match this node • Add descendant node to self-lists of all matching ancestor nodes • Inherit list • Inherited from descendants already popped from stack, to be output after self-list matches are output
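
The self-list and inherit-list bookkeeping can be sketched as follows. This is an illustration under assumptions, not the paper's implementation: elements are (start, end) tuples from one document, the name `stack_tree_anc` is chosen here, and Python lists are copied when a node is popped, whereas constant-time list splicing would be needed to keep the pass linear.

```python
def stack_tree_anc(alist, dlist):
    """Single-pass ancestor-descendant join, output ordered by ancestor."""
    out, stack = [], []   # stack entries: (ancestor, self_list, inherit_list)
    ai = di = 0

    def pop():
        _, self_list, inherit_list = stack.pop()
        pairs = self_list + inherit_list   # own matches first, then inherited
        if stack:
            stack[-1][2].extend(pairs)     # hand over to the enclosing ancestor
        else:
            out.extend(pairs)              # bottom of stack: safe to output

    while di < len(dlist):
        a = alist[ai] if ai < len(alist) else None
        d = dlist[di]
        next_start = d[0] if a is None else min(a[0], d[0])
        while stack and stack[-1][0][1] < next_start:
            pop()
        if a is not None and a[0] < d[0]:
            stack.append((a, [], []))
            ai += 1
        else:
            for anc, self_list, _ in stack:
                self_list.append((anc, d))  # add d to every self-list
            di += 1
    while stack:                            # flush what is still on the stack
        pop()
    return out


alist = [(4, 13), (5, 12), (14, 15), (16, 23), (17, 22)]
dlist = [(2, 3), (6, 7), (8, 9), (10, 11), (18, 19), (20, 21), (24, 25)]
anc_ordered = stack_tree_anc(alist, dlist)
print(anc_ordered[:3])
# [((4, 13), (6, 7)), ((4, 13), (8, 9)), ((4, 13), (10, 11))]
```

Output is delayed until the bottom stack entry is popped, because only then is it certain that no earlier ancestor still has pending matches.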

  19. Stack-Tree Analysis • Stack-Tree-Desc • Time complexity (for anc-desc and par-child) • O(|Alist| + |Dlist| + |OutputList|) • I/O complexity (for anc-desc and par-child) • O(|Alist|/B + |Dlist|/B + |OutputList|/B) • where B is the blocking factor • Stack-Tree-Anc • Requires careful handling of lists • Complexity is the same as for the Desc case

  20. Performance Study
