
Structural Join Algorithms – Examples

Structural Join Algorithms – Examples. Key property: x is an ancestor (resp., parent) of y iff x.docId = y.docId & x.StartPos < y.StartPos <= y.EndPos < x.EndPos (and y.Level = x.Level+1). A node n for us is (D, S:E, L). Call this the node id for convenience. What is a structural join?

maya

Presentation Transcript


  1. Structural Join Algorithms – Examples • Key property: x is an ancestor (resp., parent) of y iff x.docId = y.docId & x.StartPos < y.StartPos <= y.EndPos < x.EndPos (and y.Level = x.Level+1). • A node n for us is (D, S:E, L), i.e., (docId, StartPos:EndPos, Level). Call this the node id for convenience. • What is a structural join? • Given lists Alist and Dlist of nodes, • output pairs (x,y) of nodes [x in Alist, y in Dlist], s.t. x is a <relative> of y. • Frequently, assume the input lists are ordered by node id. • Might want to order the output by the first operand(‘s node id) or the second. (What difference does it make?) • TPQ (tree pattern query) = compute several SJs and stitch ‘em together.
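The key property translates directly into a containment test on node ids. The following is a minimal sketch, assuming a node id is stored as a tuple (doc, start, end, level); the names `Node`, `is_ancestor`, `is_parent`, and `naive_structural_join` are illustrative and do not come from the slides. The nested-loop join is only a baseline that makes the definition concrete; it ignores the ordering of the input lists.

```python
from typing import NamedTuple


class Node(NamedTuple):
    """Node id (D, S:E, L): document id, start/end position, level."""
    doc: int
    start: int
    end: int
    level: int


def is_ancestor(x: Node, y: Node) -> bool:
    # x is an ancestor of y iff x's interval strictly contains y's interval.
    return x.doc == y.doc and x.start < y.start and y.end < x.end


def is_parent(x: Node, y: Node) -> bool:
    # Parent-child additionally requires y to sit exactly one level below x.
    return is_ancestor(x, y) and y.level == x.level + 1


def naive_structural_join(alist, dlist, rel=is_ancestor):
    # Baseline: test every (x, y) pair; quadratic in the input size.
    return [(x, y) for x in alist for y in dlist if rel(x, y)]
```

Passing `rel=is_parent` gives the parent-child join over the same lists.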

  2. SJ variants • There is also a so-called holistic join algorithm (Bruno, Koudas, and Srivastava, SIGMOD 2002). • It extends binary join ideas to finding matches for paths/twigs. • In XML query processing, we also need the following variants of SJ: • Given Alist and Dlist, whenever x in Alist has a relative y in Dlist, output (x,y); else just output x. (structural outerjoin). • Given …, output x in Alist whenever there exists y in Dlist such that y is a relative of x. (structural semijoin.) • Given …, output x in Alist whenever it has no relative y in Dlist. (structural semi-antijoin.)
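The three variants differ only in what is emitted per Alist node. Here is a naive sketch for the ancestor-descendant relationship, assuming nodes are (doc, start, end) tuples; the function names are illustrative, and representing "just output x" as the pair `(x, None)` is an assumption made here, not something the slides specify.

```python
from typing import NamedTuple


class Node(NamedTuple):
    doc: int
    start: int
    end: int


def is_ancestor(x: Node, y: Node) -> bool:
    return x.doc == y.doc and x.start < y.start and y.end < x.end


def structural_outerjoin(alist, dlist):
    # Emit every matching pair; an unmatched x is emitted as (x, None).
    out = []
    for x in alist:
        matches = [(x, y) for y in dlist if is_ancestor(x, y)]
        out.extend(matches or [(x, None)])
    return out


def structural_semijoin(alist, dlist):
    # Emit x once if it has at least one relative in dlist.
    return [x for x in alist if any(is_ancestor(x, y) for y in dlist)]


def structural_anti_semijoin(alist, dlist):
    # Emit x only if it has no relative in dlist.
    return [x for x in alist if not any(is_ancestor(x, y) for y in dlist)]
```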

  3. Tree-Merge Join (ordered by ancestor) • [Figure: example tree with ancestor nodes a1, …, a4 and descendant nodes d1, …, d6, alongside Alist and Dlist.] • Output so far: (a1,d1), …, (a1,d6)

  4. Tree-Merge Join (ordered by ancestor) • [Figure: same example tree, Alist and Dlist.] • Output so far: (a1,d1), …, (a1,d6), (a2,d1)

  5. Tree-Merge Join (ordered by ancestor) • [Figure: same example tree, Alist and Dlist.] • Output so far: (a1,d1), …, (a1,d6), (a2,d1), (a3,d2), (a3,d3)

  6. Tree-Merge Join (ordered by ancestor) • [Figure: same example tree, Alist and Dlist.] • Output: (a1,d1), …, (a1,d6), (a2,d1), (a3,d2), (a3,d3), (a4,d5), (a4,d6).
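The walkthrough above can be sketched as a merge in which the outer loop runs over Alist and the inner scan runs over the part of Dlist that can still match. This is a minimal sketch in the spirit of Tree-Merge-anc, not the slides' own pseudocode. The start/end positions in `ALIST` and `DLIST` are assumed values, chosen only so that the containment relationships agree with the output listed on the slides; the original figure's numbering is not recoverable.

```python
from typing import NamedTuple


class Node(NamedTuple):
    name: str   # label for readability only (illustrative)
    doc: int
    start: int
    end: int


def is_ancestor(x: Node, y: Node) -> bool:
    return x.doc == y.doc and x.start < y.start and y.end < x.end


def tree_merge_anc(alist, dlist):
    """Ancestor-descendant join; both lists sorted by (doc, start)."""
    out = []
    begin = 0  # first descendant that can still match the current ancestor
    for a in alist:
        # Descendants starting before a cannot match a or any later ancestor.
        while begin < len(dlist) and (dlist[begin].doc, dlist[begin].start) < (a.doc, a.start):
            begin += 1
        # Scan the descendants that start inside a's interval.
        i = begin
        while i < len(dlist) and (dlist[i].doc, dlist[i].start) < (a.doc, a.end):
            if is_ancestor(a, dlist[i]):
                out.append((a.name, dlist[i].name))
            i += 1
    return out


# Assumed encoding of the example tree (positions are hypothetical).
ALIST = [Node("a1", 1, 1, 20), Node("a2", 1, 2, 5),
         Node("a3", 1, 6, 11), Node("a4", 1, 14, 19)]
DLIST = [Node("d1", 1, 3, 4), Node("d2", 1, 7, 8), Node("d3", 1, 9, 10),
         Node("d4", 1, 12, 13), Node("d5", 1, 15, 16), Node("d6", 1, 17, 18)]

print(tree_merge_anc(ALIST, DLIST))
```

With this encoding the pairs come out grouped by ancestor, in the same order as the final slide of the walkthrough.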

  7. Tree-Merge Join (ordered by descendant) • [Figure: example tree with ancestor nodes a1, …, a4 and descendant nodes d1, …, d6, alongside Alist and Dlist.] • Output so far: (a1,d1), (a2,d1)

  8. Tree-Merge Join (ordered by descendant) • [Figure: same example tree, Alist and Dlist.] • Output so far: (a1,d1), (a2,d1), (a1,d2), (a3,d2)

  9. Tree-Merge Join (ordered by descendant) • [Figure: same example tree, Alist and Dlist.] • Output so far: (a1,d1), (a2,d1), (a1,d2), (a3,d2), (a1,d3), (a3,d3)

  10. Tree-Merge Join (ordered by descendant) • [Figure: same example tree, Alist and Dlist.] • Output so far: (a1,d1), (a2,d1), (a1,d2), (a3,d2), (a1,d3), (a3,d3), (a1,d4)

  11. Tree-Merge Join (ordered by descendant) • [Figure: same example tree, Alist and Dlist.] • Output so far: (a1,d1), (a2,d1), (a1,d2), (a3,d2), (a1,d3), (a3,d3), (a1,d4), (a1,d5), (a4,d5)

  12. Tree-Merge Join (ordered by descendant) • [Figure: same example tree, Alist and Dlist.] • Output: (a1,d1), (a2,d1), (a1,d2), (a3,d2), (a1,d3), (a3,d3), (a1,d4), (a1,d5), (a4,d5), (a1,d6), (a4,d6).
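Swapping the roles of the two lists gives the descendant-ordered variant: the outer loop runs over Dlist and the inner scan runs over the ancestors that start before the current descendant. Again this is a minimal sketch in the spirit of Tree-Merge-desc, and the positions in `ALIST` and `DLIST` are assumed values chosen to agree with the output on the slides.

```python
from typing import NamedTuple


class Node(NamedTuple):
    name: str   # label for readability only (illustrative)
    doc: int
    start: int
    end: int


def is_ancestor(x: Node, y: Node) -> bool:
    return x.doc == y.doc and x.start < y.start and y.end < x.end


def tree_merge_desc(alist, dlist):
    """Ancestor-descendant join, output ordered by descendant."""
    out = []
    begin = 0  # first ancestor that can still match the current descendant
    for d in dlist:
        # A leading ancestor that ended before d starts can never match again.
        while begin < len(alist) and (alist[begin].doc, alist[begin].end) < (d.doc, d.start):
            begin += 1
        # Scan every ancestor that starts before d; some of them may already
        # have ended, which is where the wasted work comes from.
        i = begin
        while i < len(alist) and (alist[i].doc, alist[i].start) < (d.doc, d.start):
            if is_ancestor(alist[i], d):
                out.append((alist[i].name, d.name))
            i += 1
    return out


# Assumed encoding of the example tree (positions are hypothetical).
ALIST = [Node("a1", 1, 1, 20), Node("a2", 1, 2, 5),
         Node("a3", 1, 6, 11), Node("a4", 1, 14, 19)]
DLIST = [Node("d1", 1, 3, 4), Node("d2", 1, 7, 8), Node("d3", 1, 9, 10),
         Node("d4", 1, 12, 13), Node("d5", 1, 15, 16), Node("d6", 1, 17, 18)]

print(tree_merge_desc(ALIST, DLIST))
```

Because a1 never ends before any descendant starts, `begin` stays at a1 throughout, so a2 and a3 are re-examined for later descendants even though they can no longer match.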

  13. Which is more efficient? • Tree-Merge-anc: time and space complexity is O(|Alist| + |Dlist| + |OutputList|). • Note: it is not quadratic in the input size. • However, Tree-Merge-desc has quadratic worst-case time complexity. • Saw some evidence in the previous example. • Here is another “bad” input: [Figure: ancestor nodes a0, a1, a2, …, an and descendant nodes d1, d2, …, dn.] • What is the amount of work done by Tree-Merge-desc on this input?
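One way to explore the question is to count how many ancestor nodes the descendant-ordered merge examines. The sketch below assumes a particular shape for the "bad" input, since the figure itself is not recoverable: a0 encloses everything, and a1, …, an are siblings under a0, each with a single child d_i. Both `bad_input` and `tree_merge_desc_work` are hypothetical helpers for a single document, with nodes reduced to (start, end) pairs.

```python
def tree_merge_desc_work(alist, dlist):
    """Return (output pairs, ancestor nodes examined) for one document.

    Nodes are (start, end) pairs; both lists are sorted by start.
    """
    pairs = examined = 0
    begin = 0
    for d_start, d_end in dlist:
        # Skip leading ancestors that ended before this descendant starts.
        while begin < len(alist) and alist[begin][1] < d_start:
            begin += 1
        i = begin
        while i < len(alist) and alist[i][0] < d_start:
            examined += 1
            if d_end < alist[i][1]:
                pairs += 1
            i += 1
    return pairs, examined


def bad_input(n):
    # Assumed shape: a0 spans the whole document; a_i = [4i-2, 4i+1]
    # contains d_i = [4i-1, 4i], for i = 1..n.
    alist = [(1, 4 * n + 2)] + [(4 * i - 2, 4 * i + 1) for i in range(1, n + 1)]
    dlist = [(4 * i - 1, 4 * i) for i in range(1, n + 1)]
    return alist, dlist


pairs, examined = tree_merge_desc_work(*bad_input(100))
print(pairs, examined)
```

Under this assumed shape the output has 2n pairs, but d_i causes a0, a1, …, a_i to be examined, so the number of examined nodes grows as n(n+1)/2 + n, which is quadratic in the input size.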

  14. More analysis • What about finding (par,child) pairs? • Does the same upper bound apply for T-M-par? • Consider the input below. [Figure: ancestor nodes a1, a2, …, an and descendant nodes d1, d2, …, dn, dn+1, …, d2n-1, d2n.] • The size of the o/p list is O(|Alist| + |Dlist|). • What’s the amount of work done by T-M-par on this input? • A breed of stack-tree SJ algorithms has been developed to overcome the deficiencies of T-M algorithms.

  15. Stack-Tree Join (ordered by descendant) • [Figure: example tree with ancestor nodes a1, …, a4 and descendant nodes d1, …, d6, alongside Alist, Dlist, and the stack.] • Stack (bottom to top): a1 • Output so far: (none)

  16. Stack-Tree Join (ordered by descendant) • [Figure: same example tree, Alist, Dlist, and stack.] • Stack (bottom to top): a1, a2 • Output so far: (none)

  17. Stack-Tree Join (ordered by descendant) • [Figure: same example tree, Alist, Dlist, and stack.] • Stack (bottom to top): a1, a2 • Output so far: (a1,d1), (a2,d1)

  18. Stack-Tree Join (ordered by descendant) • [Figure: same example tree, Alist, Dlist, and stack.] • Stack (bottom to top): a1 • Output so far: (a1,d1), (a2,d1)

  19. Stack-Tree Join (ordered by descendant) • [Figure: same example tree, Alist, Dlist, and stack.] • Stack (bottom to top): a1, a3 • Output so far: (a1,d1), (a2,d1)

  20. Stack-Tree Join (ordered by descendant) • [Figure: same example tree, Alist, Dlist, and stack.] • Stack (bottom to top): a1, a3 • Output so far: (a1,d1), (a2,d1), (a1,d2), (a3,d2)

  21. Stack-Tree Join (ordered by descendant) • [Figure: same example tree, Alist, Dlist, and stack.] • Stack (bottom to top): a1, a3 • Output so far: (a1,d1), (a2,d1), (a1,d2), (a3,d2), (a1,d3), (a3,d3)

  22. Stack-Tree Join (ordered by descendant) • [Figure: same example tree, Alist, Dlist, and stack.] • Stack (bottom to top): a1 • Output so far: (a1,d1), (a2,d1), (a1,d2), (a3,d2), (a1,d3), (a3,d3), (a1,d4)

  23. Stack-Tree Join (ordered by descendant) • [Figure: same example tree, Alist, Dlist, and stack.] • Stack (bottom to top): a1, a4 • Output so far: (a1,d1), (a2,d1), (a1,d2), (a3,d2), (a1,d3), (a3,d3), (a1,d4), (a1,d5), (a4,d5)

  24. Stack-Tree Join (ordered by descendant) • [Figure: same example tree, Alist, Dlist, and stack.] • Stack (bottom to top): a1, a4 • Output: (a1,d1), (a2,d1), (a1,d2), (a3,d2), (a1,d3), (a3,d3), (a1,d4), (a1,d5), (a4,d5), (a1,d6), (a4,d6). • Time & space complexity: O(|Alist| + |Dlist| + |Outputlist|). • (for both ad and pc relationships!) • Unlike T-M-anc, I/O complexity is similarly bounded (modulo blocking factor). • Can handle streaming i/p lists: non-blocking algorithm. • Stack-Tree-anc is similar with similar bounds.
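The stack walkthrough can be sketched as a single merge pass over both lists in start-position order, where the stack always holds a chain of nested ancestors that are still open. This is a minimal sketch in the spirit of Stack-Tree-desc for the ancestor-descendant relationship, not the slides' own pseudocode; the positions in `ALIST` and `DLIST` are assumed values chosen to agree with the output on the slides.

```python
from typing import NamedTuple


class Node(NamedTuple):
    name: str   # label for readability only (illustrative)
    doc: int
    start: int
    end: int


def stack_tree_desc(alist, dlist):
    """Ancestor-descendant join, output ordered by descendant."""
    out, stack = [], []
    i = j = 0
    while j < len(dlist):
        d = dlist[j]
        # Process whichever node starts first in document order.
        take_a = i < len(alist) and (alist[i].doc, alist[i].start) < (d.doc, d.start)
        nxt = alist[i] if take_a else d
        # Pop ancestors that ended before the next node starts.
        while stack and (stack[-1].doc, stack[-1].end) < (nxt.doc, nxt.start):
            stack.pop()
        if take_a:
            stack.append(nxt)   # nested inside everything left on the stack
            i += 1
        else:
            # Every node left on the stack encloses d.
            out.extend((s.name, d.name) for s in stack)
            j += 1
    return out


# Assumed encoding of the example tree (positions are hypothetical).
ALIST = [Node("a1", 1, 1, 20), Node("a2", 1, 2, 5),
         Node("a3", 1, 6, 11), Node("a4", 1, 14, 19)]
DLIST = [Node("d1", 1, 3, 4), Node("d2", 1, 7, 8), Node("d3", 1, 9, 10),
         Node("d4", 1, 12, 13), Node("d5", 1, 15, 16), Node("d6", 1, 17, 18)]

print(stack_tree_desc(ALIST, DLIST))
```

Each input node is pushed and popped at most once, and each stack visit during output produces a pair, which is consistent with the linear bound stated above. For the parent-child case, one option is to emit a pair only when the stack entry sits exactly one level above d, which would require adding a level field to the node.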

  25. Extensions • Can you adapt the SJ algorithms to handle the SJ variants mentioned before? • Can you make the Tree-Merge algorithms more efficient, e.g., by bookkeeping? • We have seen that a TPQ = a sequence of joins on the results of SJs; what’s the best way to order these joins? Can we reuse join order optimization from relational DB optimization? (What’s the right cost model?) • What if (universal) quantifiers are present? • How can we handle aggregation?
