
On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques

On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques. Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen Ting, Ling Tok Wang. Outline. Background Define our problem: XML twig pattern matching


Presentation Transcript


  1. On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen Ting, Ling Tok Wang

  2. Outline • Background • Define our problem: XML twig pattern matching • Previous two algorithms: TwigStack and TwigStackList • Our holistic Twig Pattern Matching algorithms • Two Refined Indexing Schemes: Tag+Level and PPS • A generalized holistic matching algorithm: iTwigJoin • Experiments • Conclusion On Boosting Holism in XML Twig Pattern Matching using Structural Indexing

  3. XML Twig Pattern Matching • An XML document is commonly modeled as a rooted, ordered and tagged tree. • [Figure: example tree rooted at book, with preface and chapter children; the chapters contain section, title, paragraph and figure elements and the text values “Intro”, “Data” and “XML”]

  4. Regional Coding • Node Label¹: (startPos:endPos, LevelNum) • E.g. book (0:32, 1), preface (1:3, 2), “Intro” (2:2, 3), chapter (4:29, 2), section (5:28, 3), title (6:8, 4), “Data” (7:7, 3), section (9:17, 4), title (10:12, 5), “XML” (11:11, 3), paragraph (13:16, 5), figure (14:15, 6), section (18:23, 4), paragraph (19:22, 5), figure (20:21, 6), paragraph (24:27, 4), figure (25:26, 5), chapter (30:31, 2) 1. M.P. Consens and T. Milo. Optimizing queries on files. In Proceedings of ACM SIGMOD, 1994.
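As a rough illustration of how region labels are used, the sketch below decides ancestor-descendant and parent-child relationships from the (startPos:endPos, LevelNum) labels alone. The `Label` type and the function names are illustrative assumptions, not part of the presentation; the sample labels are copied from the slide.

```python
from typing import NamedTuple


class Label(NamedTuple):
    """Region label (startPos:endPos, LevelNum); the type name is hypothetical."""
    start: int
    end: int
    level: int


def is_ancestor(a: Label, d: Label) -> bool:
    # a is a proper ancestor of d when a's interval strictly contains d's.
    return a.start < d.start and d.end < a.end


def is_parent(a: Label, d: Label) -> bool:
    # A parent is an ancestor exactly one level above its child.
    return is_ancestor(a, d) and d.level == a.level + 1


# Labels taken from the slide.
book = Label(0, 32, 1)
preface = Label(1, 3, 2)
chapter = Label(4, 29, 2)
section = Label(5, 28, 3)

print(is_ancestor(book, section))    # book contains section
print(is_parent(book, section))      # but is two levels above it
print(is_parent(chapter, section))   # chapter is the direct parent
```

Both tests need only two integer comparisons (plus a level check), which is why the labels can stand in for the document tree during matching.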

  5. What is a Twig Pattern? • A twig pattern is a small tree whose nodes are tags, attributes or text values and whose edges are either Parent-Child (P-C) edges or Ancestor-Descendant (A-D) edges. • E.g. select Figure elements that are descendants of Paragraph elements, which in turn are children of Section elements having a child element Title • XPath: Section[Title]/Paragraph//Figure • Twig pattern: Section with children Title and Paragraph (P-C edges); Paragraph with descendant Figure (A-D edge)
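One possible in-memory form of a twig pattern is a small tree whose nodes carry a tag and the type of the edge from their parent. The sketch below is an assumption for illustration (the `QueryNode` class, the "PC"/"AD" edge codes and `to_xpath` are hypothetical names); it builds the example pattern from the slide and prints it back as an XPath expression, treating the last child as the main path and the other children as predicates.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueryNode:
    tag: str
    edge: str = "root"  # edge from the parent: "PC" (parent-child) or "AD" (ancestor-descendant)
    children: List["QueryNode"] = field(default_factory=list)


def to_xpath(node: QueryNode) -> str:
    """Serialize a twig: the last child continues the path, earlier children become predicates."""
    if not node.children:
        return node.tag
    *branches, main = node.children
    preds = "".join(
        "[" + ("" if b.edge == "PC" else ".//") + to_xpath(b) + "]" for b in branches
    )
    sep = "/" if main.edge == "PC" else "//"
    return node.tag + preds + sep + to_xpath(main)


# The example twig from the slide.
twig = QueryNode("Section", children=[
    QueryNode("Title", "PC"),
    QueryNode("Paragraph", "PC", [QueryNode("Figure", "AD")]),
])

print(to_xpath(twig))
```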

  6. XML Twig Pattern Matching • Problem Statement • Given a query twig pattern Q and an XML database D, we need to compute ALL the answers to Q in D. • E.g. consider the query and document: • Query: Section with branches title and figure • Document elements: s1, s2, t1, t2, p1, f1 • Query solutions: (s1, t1, f1), (s2, t2, f1), (s1, t2, f1)

  9. Outline • Background • Define our problem: XML twig pattern matching • Previous two algorithms: TwigStack and TwigStackList • Our holistic Twig Pattern Matching algorithms • Two Refined Indexing Schemes: Tag+Level and PPS • A generalized holistic matching algorithm: iTwigJoin • Experiments • Conclusion

  10. Previous work: TwigStack • TwigStack²: a holistic approach • Each element in the document is labeled with the region encoding scheme. • The input is the labels of all elements whose tags occur in the query twig. The output is the matching solutions as n-tuples, where n is the number of nodes in the query. • For each node in the query there is a corresponding input stream. • Each label in a stream is scanned only once; that is, the cursor of a stream never moves backward. 2. N. Bruno, D. Srivastava, and N. Koudas. Holistic twig joins: optimal XML pattern matching. In Proceedings of ACM SIGMOD, 2002.

  11. Previous work: TwigStack • TwigStack²: a holistic approach • Two-phase algorithm: • Phase 1 (TwigJoin): intermediate root-to-leaf path solutions are output • Phase 2 (Merge): the intermediate paths are merged to produce the final results 2. N. Bruno, D. Srivastava, and N. Koudas. Holistic twig joins: optimal XML pattern matching. In Proceedings of ACM SIGMOD, 2002.
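The merge phase can be pictured as a join of the path solutions on their shared root element. The following is a minimal sketch under that assumption, for a two-branch query only; `merge_paths` is a hypothetical helper, and the sample path solutions are the ones listed in the running example later in this transcript.

```python
from collections import defaultdict


def merge_paths(title_paths, figure_paths):
    """Join <section, title> and <section, figure> paths on the shared section element."""
    figures_by_section = defaultdict(list)
    for section, figure in figure_paths:
        figures_by_section[section].append(figure)
    return [
        (section, title, figure)
        for section, title in title_paths
        for figure in figures_by_section.get(section, [])
    ]


# Path solutions produced by phase 1 in the running example.
title_paths = [("s1", "t1"), ("s1", "t2"), ("s2", "t2")]
figure_paths = [("s1", "f1"), ("s2", "f1"), ("s1", "f2")]

for solution in merge_paths(title_paths, figure_paths):
    print(solution)
```

A path whose root has no partner in the other branch simply drops out here, which is exactly the wasted work that the sub-optimality discussion is concerned with.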

  12. Previous work: TwigStack • A node q in a twig pattern Q is associated with a stack Sq • Insertion and deletion in a stack Sq • Insertion: an element eq from stream Tq is pushed onto its stack Sq if and only if • eq has a descendant eqi in each Tqi, where qi is a child of q • each such eqi recursively satisfies the first property • Deletion: an element eq is popped from its stack once all matches involving it have been output.
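The insertion condition can be read as a recursive check. The sketch below is a simplification under stated assumptions: it scans whole streams instead of advancing cursors, it ignores the stacks, and the names `has_extension`, `children` and `streams` are hypothetical. Labels are (start, end, level) tuples taken from the running example in this transcript.

```python
def has_extension(q, elem, children, streams):
    """True if elem has, for every child query node of q, a descendant in that
    child's stream which itself satisfies the same condition."""
    start, end, _level = elem
    return all(
        any(
            start < d[0] and d[1] < end and has_extension(c, d, children, streams)
            for d in streams[c]
        )
        for c in children.get(q, [])
    )


# Query: Section with branches title and figure (treated as A-D edges here).
children = {"Section": ["title", "figure"]}
streams = {
    "Section": [(1, 12, 1), (4, 9, 2)],
    "title": [(2, 3, 2), (5, 6, 3)],
    "figure": [(7, 8, 3), (10, 11, 2)],
}

for s in streams["Section"]:
    print(s, has_extension("Section", s, children, streams))
```

Leaf query nodes have no children, so the check is trivially true for them and the recursion terminates.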

  13. XML Twig Pattern Matching • Document elements: s1, t1, s2, f2, t2, f1 • Query: Section with branches title and figure

  14. XML Twig Pattern Matching • Document (region labels): s1 (1:12,1), t1 (2:3,2), s2 (4:9,2), t2 (5:6,3), f1 (7:8,3), f2 (10:11,2) • Query: Section with branches title and figure

  15. XML Twig Pattern Matching • Input streams: Section: (1:12,1), (4:9,2); title: (2:3,2), (5:6,3); figure: (7:8,3), (10:11,2)

  16. XML Twig Pattern Matching • Document, query and input streams as on slide 15

  17. XML Twig Pattern Matching • Stack contents: Section: (1:12,1)

  18. XML Twig Pattern Matching • Stack contents: Section: (1:12,1); title: (2:3,2)

  19. XML Twig Pattern Matching • Stack contents: Section: (1:12,1) • Output path solutions: <s1,t1>

  20. XML Twig Pattern Matching • Stack contents: Section: (1:12,1), (4:9,2) • Output path solutions: <s1,t1>

  21. XML Twig Pattern Matching • Stack contents: Section: (1:12,1), (4:9,2); title: (5:6,3) • Output path solutions: <s1,t1>

  22. XML Twig Pattern Matching • Stack contents: Section: (1:12,1), (4:9,2) • Output path solutions: <s1,t1>, <s1,t2>, <s2,t2>

  23. XML Twig Pattern Matching • Stack contents: Section: (1:12,1), (4:9,2); figure: (7:8,3) • Output path solutions: <s1,t1>, <s1,t2>, <s2,t2>

  24. XML Twig Pattern Matching • Stack contents: Section: (1:12,1), (4:9,2) • Output path solutions: <s1,t1>, <s1,t2>, <s2,t2>, <s1,f1>, <s2,f1>

  25. XML Twig Pattern Matching • Stack contents: Section: (1:12,1); figure: (10:11,2) • Output path solutions: <s1,t1>, <s1,t2>, <s2,t2>, <s1,f1>, <s2,f1>

  26. XML Twig Pattern Matching • Stack contents: Section: (1:12,1) • Output path solutions: <s1,t1>, <s1,t2>, <s2,t2>, <s1,f1>, <s2,f1>, <s1,f2>

  27. XML Twig Pattern Matching • Output path solutions: <s1,t1>, <s1,t2>, <s2,t2>, <s1,f1>, <s2,f1>, <s1,f2> • Merge: <s1,t1,f1>, <s1,t1,f2>, <s1,t2,f1>, <s1,t2,f2>, <s2,t2,f1>

  28. Sub-optimality of TwigStack • If the query contains any parent-child relationship, TwigStack may output intermediate path solutions that cannot contribute to any final result. • We therefore say that TwigStack is sub-optimal for queries with parent-child relationships.
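To make the gap concrete, the sketch below compares a descendant-only check (what the first phase effectively uses) with a strict parent-child check. It is an illustration, not TwigStack itself; `contains` and `matches` are hypothetical helpers, and the (start, end, level) labels are those of the sub-optimality example that follows in this transcript.

```python
def contains(a, d):
    # Region containment: a is a proper ancestor of d.
    return a[0] < d[0] and d[1] < a[1]


def matches(elem, child_streams, parent_child):
    """Does elem have a descendant in every child stream
    (restricted to direct children when parent_child is True)?"""
    def ok(d):
        return contains(elem, d) and (not parent_child or d[2] == elem[2] + 1)
    return all(any(ok(d) for d in stream) for stream in child_streams)


s1, s2 = (1, 12, 1), (4, 9, 2)
title = [(2, 3, 2), (5, 6, 3)]
figure = [(7, 8, 3)]

print(matches(s1, [title, figure], parent_child=False))  # descendant-only check
print(matches(s1, [title, figure], parent_child=True))   # strict parent-child check
print(matches(s2, [title, figure], parent_child=True))
```

Under the descendant-only check s1 qualifies and the path <s1,t1> is emitted, although s1 has no figure child and so cannot appear in any final answer to a query with parent-child edges.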

  29. Example: sub-optimality of TwigStack • Document (region labels): s1 (1:12,1), t1 (2:3,2), s2 (4:9,2), t2 (5:6,3), f1 (7:8,3) • Query: Section with branches title and figure • Input streams: Section: (1:12,1), (4:9,2); title: (2:3,2), (5:6,3); figure: (7:8,3)

  30. Example: sub-optimality of TwigStack • Stack contents: Section: (1:12,1) • Because f1 and t1 are descendants of s1, s1 is pushed onto the stack. Note that f1 is not a child of s1.

  31. Example: sub-optimality of TwigStack • Stack contents: Section: (1:12,1); title: (2:3,2)

  32. Example: sub-optimality of TwigStack • Stack contents: Section: (1:12,1) • Output solution: <s1,t1>. But it is a useless intermediate solution that does not contribute to any final solution.

  33. TwigStackList • The main problem of TwigStack is that its first phase treats all edges as ancestor-descendant relationships, so it is inefficient for queries with parent-child relationships. • Alternative: TwigStackList³ [CIKM 2004] • TwigStackList³ improves on TwigStack by considering parent-child relationships in the first phase, and it is optimal for a larger class of queries than TwigStack. 3. J. Lu, T. Chen, and T. W. Ling. Efficient processing of XML twig patterns with parent child edges: a look-ahead approach. In CIKM, pages 533-542, 2004.

  34. Optimal class of TwigStack and TwigStackList • Legend: O: optimal, S: sub-optimal

  35. Challenges (1) • Although TwigStackList enlarges the optimal query class of TwigStack, it is still sub-optimal for a large class of twig queries. • For example, two twig queries for which TwigStackList is sub-optimal: [Figure: two twig queries, each with root Section and branches title and figure]

  36. Challenges (2) • To answer a twig query, TwigStack and TwigStackList need to read the labels of all elements whose tags occur in the query. • Can we accelerate query processing by reading only part of them? • Query: Section with branches title and figure • Document: Level 1: s1; Level 2: t1; Level 3: f1, f2, …, fn • There is no answer in the document, since there are no figure elements in level 2. But previous algorithms still need to read all figure elements in level 3.

  37. Outline • Background • Define our problem: XML twig pattern matching • Previous two algorithms: TwigStack and TwigStackList • Our holistic Twig Pattern Matching algorithms • Two Refined Indexing Schemes: Tag+Level and PPS • A generalized holistic matching algorithm: iTwigJoin • Experiments • Conclusion

  38. Our solution • We propose two data streaming schemes: tag+level and prefix path streaming. • Basic idea: separate elements with the same tag name into different streams • Tag+level: elements with the same tag and level are grouped together • Prefix path: elements with the same root-to-node path are grouped together

  39. Two Refined Streaming Schemes (1) • Tag + Level: elements with the same tag and level are grouped together. • Document: Level 1: a1; Level 2: a2, a3, b2; Level 3: b1, d1, d2, d3; Level 4: c1, c2 • Streams: a (Level 1): a1; a (Level 2): a2, a3; b (Level 2): b2; b (Level 3): b1; c (Level 4): c1, c2; d (Level 3): d1, d2, d3

  40. Two Refined Streaming Schemes (2) • Prefix Path Streaming (PPS): elements with the same root-to-node path are grouped together. • Document: Level 1: a1; Level 2: a2, a3, b2; Level 3: b1, d1, d2, d3; Level 4: c1, c2 • Streams: a: a1; a/a: a2, a3; a/b: b2; a/a/b: b1; a/a/b/c: c1; a/b/d/c: c2; a/a/d: d1, d2; a/b/d: d3
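Both schemes amount to choosing a different grouping key during one document traversal. The sketch below builds the two sets of streams for a small tree; `build_streams` is a hypothetical helper, and the exact parent of b1, d1 and d2 (a2 versus a3) is an assumption, since the transcript only preserves their prefix paths.

```python
from collections import defaultdict

# (name, tag, children); which of a2/a3 holds b1, d1 and d2 is assumed.
doc = ("a1", "a", [
    ("a2", "a", [("d1", "d", []), ("b1", "b", [("c1", "c", [])])]),
    ("a3", "a", [("d2", "d", [])]),
    ("b2", "b", [("d3", "d", [("c2", "c", [])])]),
])


def build_streams(root):
    """Return (tag+level streams, prefix-path streams), each in document order."""
    tag_level = defaultdict(list)
    prefix_path = defaultdict(list)

    def visit(node, level, parent_path):
        name, tag, children = node
        path = tag if not parent_path else parent_path + "/" + tag
        tag_level[(tag, level)].append(name)   # key: tag and depth
        prefix_path[path].append(name)         # key: full root-to-node tag path
        for child in children:
            visit(child, level + 1, path)

    visit(root, 1, "")
    return dict(tag_level), dict(prefix_path)


tag_level, prefix_path = build_streams(doc)
print(tag_level[("d", 3)])
print(prefix_path["a/a/d"], prefix_path["a/b/d"])
```

Prefix path streaming is the finer partition: the single tag+level stream for d at level 3 splits into separate a/a/d and a/b/d streams.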

  41. Two benefits of refined streaming schemes (1) • (1) Enlarge the optimal query classes • For example, for the document and query below, the previous algorithms TwigStack and TwigStackList output one useless solution <s1,t1>. • With tag+level, <s1,t1> is not output, since we know there are no figure elements in level 2. • Document: Level 1: s1; Level 2: s2, t1; Level 3: t2, f1 • Query: Section with branches title and figure • Streams: Section (Level 1): s1; Section (Level 2): s2; title (Level 2): t1; title (Level 3): t2; figure (Level 3): f1

  42. Two benefits of refined streaming schemes (2) • (2) Skip irrelevant elements • For the document and query below, since there are no title elements in level 3, we may skip reading all figure elements in level 3. • Document: Level 1: s1; Level 2: t1; Level 3: f1, f2, …, fn • Query: Section with branches title and figure
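One way to picture the skipping is as a level-compatibility test between streams. The sketch below assumes a parent-child edge from Section to figure, so a figure stream at level l is only worth reading if a Section stream exists at level l - 1; the helper name `readable_child_levels` is hypothetical, and the levels are those of the example document.

```python
def readable_child_levels(parent_levels, child_levels):
    """Levels of child-tag streams that can join a parent-tag stream over a P-C edge."""
    return sorted(l for l in child_levels if l - 1 in parent_levels)


section_levels = {1}   # s1
figure_levels = {3}    # f1, f2, ..., fn

# No figure stream sits directly below a Section stream, so none needs to be read.
print(readable_child_levels(section_levels, figure_levels))
```

With tag-only streams this test is not available, because all figure elements share one stream regardless of level.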

  43. Outline • Background • Define our problem: XML twig pattern matching • Previous two algorithms: TwigStack and TwigStackList • Our holistic Twig Pattern Matching algorithms • Two Refined Indexing Schemes: Tag+Level and PPS • A generalized holistic matching algorithm: iTwigJoin • Experiments • Conclusion

  44. A general algorithm: iTwigJoin • We propose a general algorithm, called iTwigJoin, which can be used with various data streaming schemes. • Our key idea is to classify all current head elements into three classes: • Subtree-matching • Useless • Blocked

  45. Classifying Head Elements • Subtree-Matching Element • Element e of tag E is called a subtree-matching element for query Q if • e is in a match to QE (QE is the sub-tree of Q rooted at E); and • e is NOT in any future match to QP, where P is the parent of E in Q • Useless Element • Element e is called a useless element if e is not in any future match to QE. • Blocked Element • An element that is neither subtree-matching nor useless

  46. Example 1: Classifying Head Elements (Tag+Level) • Document D: Level 1: a1; Level 2: a2, a3, b2; Level 3: b1, d1, d2, d3; Level 4: c1, c2 • Query Q1: twig over nodes A, B, C and D • Streams: a (Level 1): a1; a (Level 2): a2, a3; b (Level 2): b2; b (Level 3): b1; c (Level 4): c1, c2; d (Level 3): d1, d2, d3 • [Figure: head elements are marked in each stream]

  49. Example 2: Classifying Head Elements (Tag+Level) • Document D: Level 1: a1; Level 2: a2, a3, b2; Level 3: b1, d1, d2, d3; Level 4: c1, c2 • Query Q1: twig over nodes A, B, C and D • Streams: a (Level 1): a1; a (Level 2): a2, a3; b (Level 2): b2; b (Level 3): b1; c (Level 4): c1, c2; d (Level 3): d1, d2, d3 • [Figure: head elements are marked in each stream]
