Efficient Processing of Ordered XML Twig Pattern

Efficient Processing of Ordered XML Twig Pattern, by Jiaheng Lu, Tok Wang Ling, Tian Yu, Changqing Li, Wei Ni. Presented by Tian Yu, 23 Aug 2005.


Presentation Transcript


  1. Efficient Processing of Ordered XML Twig Pattern by Jiaheng Lu, Tok Wang Ling, Tian Yu, Changqing Li, Wei Ni Presented by: Tian Yu 23, Aug 2005

  2. Outline • Introduction and motivation • Background • XML tree and twig pattern matching • Previous two algorithms: TwigStack and TwigStackList • Our Ordered Twig Algorithms • Ordered Children Extension (for short OCE) • A generalized holistic matching algorithm: OrderedTJ • Experiments • Conclusion

  4. Introduction • XML is rapidly gaining popularity as a data representation format. • XML documents are modeled as ordered trees. • XML queries specify patterns of selection predicates on multiple elements that have some structural relationships (parent-child, ancestor-descendant).

  5. What is a Twig Pattern? • A twig pattern is a small tree whose nodes are tags, attributes or text values and whose edges are either Parent-Child (P-C) edges or Ancestor-Descendant (A-D) edges. • E.g. query description: select Figure elements that are descendants of Paragraph elements, which in turn are children of Section elements that have a child element Title. • Twig pattern: Section has P-C children Title and Paragraph; Paragraph has an A-D descendant Figure.

  6. Motivation • Since XML documents are modeled as ordered trees, it is natural to have ordered queries. • Four ordered axes: following-sibling, preceding-sibling, following, preceding. • Example: ordered query: //book/title/following-sibling::chapter; unordered query: //book/title/chapter

  7. Order axis • Four axes: following-sibling, preceding-sibling, following, and preceding. • In the sample XML document (tree figure over nodes a to j), set the context node to be f: following of f: i and j; preceding of f: b, c and e; following-sibling of f: i; preceding-sibling of f: e. • Following-sibling of f = the following nodes of f that share the same parent with f. • Preceding-sibling of f = the preceding nodes of f that share the same parent with f.
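
The four axes can be sketched in a few lines of Python. The tree below is hypothetical: the slide's figure did not survive the transcript, so `TREE` is merely one shape that is consistent with the axis results listed for context node f. The helper names (`preorder`, `axes`) are illustrative as well.

```python
# Hypothetical tree, chosen only so that the results match those listed on the slide.
TREE = {
    "a": ["b", "d"],
    "b": ["c"],
    "d": ["e", "f", "i"],
    "f": ["g", "h"],
    "i": ["j"],
}


def preorder(tree, root):
    """Return the nodes of the subtree rooted at `root` in document order."""
    order = [root]
    for child in tree.get(root, []):
        order.extend(preorder(tree, child))
    return order


def axes(tree, root, ctx):
    """Compute the four ordered axes of context node `ctx`."""
    order = preorder(tree, root)
    parents = {c: p for p, children in tree.items() for c in children}

    # Collect the ancestors of ctx by walking up the parent links.
    ancestors, node = set(), ctx
    while node in parents:
        node = parents[node]
        ancestors.add(node)

    descendants = set(preorder(tree, ctx)) - {ctx}
    pos = order.index(ctx)

    # following: after ctx in document order, excluding descendants.
    following = [n for n in order[pos + 1:] if n not in descendants]
    # preceding: before ctx in document order, excluding ancestors.
    preceding = [n for n in order[:pos] if n not in ancestors]

    def same_parent(n):
        return parents.get(n) == parents.get(ctx)

    return {
        "following": following,
        "preceding": preceding,
        "following-sibling": [n for n in following if same_parent(n)],
        "preceding-sibling": [n for n in preceding if same_parent(n)],
    }


result = axes(TREE, "a", "f")
```

The sibling axes are derived exactly as the slide states them: filter the following (or preceding) nodes down to those sharing the context node's parent.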

  8. Ordered Twig Pattern • //chapter[title=“related work”]/following::section • Intuitive meaning: search for all the sections that appear after (but are not descendants of) chapter elements with the title “related work” in the XML document. • The query node Book is ordered.

  10. Ordered Twig Pattern • //chapter[title=“related work”]/following::section • If the twig pattern is unordered: section1, section2, and section3 are all matching elements.

  11. Ordered Twig Pattern • //chapter[title=“related work”]/following::section • But for the ordered query, section1 and section2 are not in the solution. How does our method detect that?

  12. Motivation • Naïve method: use an existing algorithm to output the intermediate path solutions for each individual root-leaf query path, then merge the path solutions so that the final solutions are guaranteed to satisfy the order predicates of the query. • Disadvantage of the naïve method: many intermediate results may not contribute to final answers. • Our solution: efficient processing of ordered XML twig patterns.

  13. Outline • Introduction and motivation • Background • XML tree and twig pattern matching • Previous two algorithms: TwigStack and TwigStackList • Our Ordered Twig Algorithms • Ordered Children Extension (for short OCE) • A generalized holistic matching algorithm: OrderedTJ • Experiments • Conclusion

  14. XML Twig Pattern Matching • An XML document is commonly modeled as a rooted, ordered and tagged tree. • [Figure: example tree rooted at book, with preface, chapter, section, title and paragraph elements and text values such as “Intro”, “Data” and “XML”]

  15. Region Coding • Node label [1]: (startPos, endPos, LevelNum) • E.g. book (1,21,1); preface (2,4,2) with text “Intro” (3,3,3); chapter (5,12,2) with title (6,8,3) and section (9,11,3); chapter (13,20,2) with title (14,16,3) and section (17,19,3); text values at level 4: (7,7,4), (10,10,4), (15,15,4), (18,18,4). [1] M. P. Consens and T. Milo. Optimizing queries on files. In Proceedings of ACM SIGMOD, 1994.
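
A minimal sketch of how such labels could be assigned, assuming a single counter that is incremented when an element opens and again when it closes, and a single position for each text value. This numbering scheme is inferred from the example labels, not stated on the slide. The tuple-based tree encoding and the name `region_labels` are illustrative, and the text values under the title and section elements are placeholders.

```python
from itertools import count


def region_labels(node, level=1, counter=None, out=None):
    """Return (name, (start, end, level)) pairs in document order.

    An element is a (tag, children) tuple; a text value is a plain string.
    """
    counter = counter if counter is not None else count(1)
    out = out if out is not None else []

    if isinstance(node, str):
        # A text value occupies a single position: start == end.
        pos = next(counter)
        out.append((node, (pos, pos, level)))
        return out

    tag, children = node
    start = next(counter)
    slot = len(out)
    out.append(None)  # reserve the element's place to keep document order
    for child in children:
        region_labels(child, level + 1, counter, out)
    end = next(counter)
    out[slot] = (tag, (start, end, level))
    return out


# Tree shaped like the slide's example; leaf text values are placeholders.
DOC = ("book", [
    ("preface", ["Intro"]),
    ("chapter", [("title", ["Data"]), ("section", ["..."])]),
    ("chapter", [("title", ["Data"]), ("section", ["..."])]),
])

labels = region_labels(DOC)
```

Under these assumptions the sketch reproduces the labels of the example, e.g. book (1,21,1) and the second section (17,19,3).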

  16. Region Coding • Given e1, e2: e1 is an ancestor of e2 iff e1.start < e2.start and e1.end > e2.end. • [Figure: the labeled tree of slide 15, with e1 marked at book (1,21,1) and e2 marked at one of its descendants]

  17. Region Coding • Given e1, e2: e1 is the parent of e2 iff e1.start < e2.start and e1.end > e2.end, and e1.level + 1 = e2.level. • [Figure: the labeled tree of slide 15, with e1 marked at book (1,21,1) and e2 marked at one of its children]
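
The two tests translate directly into code. The sketch below uses the labels from the region coding example; the `Region` type and function names are illustrative. The `is_following` helper is not on these slides: it is added here as an assumption, using the `end < start` comparison that appears later in condition 3 of the OCE definition.

```python
from typing import NamedTuple


class Region(NamedTuple):
    start: int
    end: int
    level: int


def is_ancestor(e1: Region, e2: Region) -> bool:
    """e1 is an ancestor of e2 iff e1's interval strictly contains e2's."""
    return e1.start < e2.start and e1.end > e2.end


def is_parent(e1: Region, e2: Region) -> bool:
    """e1 is the parent of e2 iff it is an ancestor exactly one level up."""
    return is_ancestor(e1, e2) and e1.level + 1 == e2.level


def is_following(e1: Region, e2: Region) -> bool:
    """Assumed helper: e2 starts after e1 has ended (e2 follows e1)."""
    return e1.end < e2.start


# Labels from the region coding example.
book = Region(1, 21, 1)
preface = Region(2, 4, 2)
chapter1 = Region(5, 12, 2)
section1 = Region(9, 11, 3)
section2 = Region(17, 19, 3)
```

Each test is a constant number of integer comparisons, which is what lets the join algorithms decide structural relationships without traversing the tree.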

  18. Outline • Introduction and motivation • Background • XML tree and twig pattern matching • Previous two algorithms: TwigStack and TwigStackList • Our Ordered Twig Algorithms • Ordered Children Extension (for short OCE) • A generalized holistic matching algorithm: OrderedTJ • Experiments • Conclusion

  19. Previous work: TwigStack • TwigStack [2]: a holistic approach • Two-phase algorithm: • Phase 1 TwigJoin: some of the intermediate root-leaf paths are output • Phase 2 Merge: merge the intermediate paths to get the final results. [2] N. Bruno, D. Srivastava, and N. Koudas. Holistic twig joins: optimal XML pattern matching. In Proceedings of ACM SIGMOD, 2002.

  20. Sub-optimality of TwigStack • TwigStack is optimal when the query contains only ancestor-descendant relationships. • If the query contains any parent-child relationship, TwigStack may output some intermediate path solutions that cannot contribute to final results. • We say that TwigStack is sub-optimal for queries with parent-child relationships.

  21. TwigStackList • The main problem of TwigStack is that it treats all edges as ancestor-descendant relationships in the first phase, so it is not efficient for queries with parent-child relationships. • Improved method: TwigStackList [3] (CIKM 2004) • There is an additional list structure for each query node to cache elements that are likely to participate in final solutions. • TwigStackList improves on TwigStack because it considers parent-child relationships in the first phase. • TwigStackList is optimal when there is no P-C edge below a branching node (a branching node is a node with more than one descendant or child). [3] J. Lu, T. Chen, and T. W. Ling. Efficient processing of XML twig patterns with parent child edges: a look-ahead approach. In CIKM, pages 533-542, 2004.

  22. TwigStackList vs. TwigStack • TwigStack outputs the “useless” path solution <s1,t1>, since it doesn’t check for the parent-child relationship. • TwigStackList has no useless output: <s1,t1> is not in the output. • [Figure: a twig pattern over section, title, paragraph and figure with no parent-child relationship at the branching node, and an XML tree with elements s1, s2, t1, t2, t3, p1, p2, p3, f1, f2]

  23. Outline • Introduction and motivation • Background • XML tree and twig pattern matching • Previous two algorithms: TwigStack and TwigStackList • Our Ordered Twig Algorithms • Ordered Children Extension (for short OCE) • A generalized holistic matching algorithm: OrderedTJ • Experiments • Conclusion

  24. Ordered Children Extension (OCE) • Definition: an element en (of type n) has an OCE if: 1) in the query Q, for every A-D child n’ of n (if any), there is an element en’ (with tag n’) that is a descendant of en, and en’ also has an OCE; and 2) in the query Q, for every P-C child n’ of n (if any), there is an element e’ (with tag n) in the path from en to en’ such that e’ is the parent of en’, and en’ also has an OCE; and 3) for each child (or descendant) n’ of n, if there is a node m that is the immediate right sibling of n’, there are elements en’ and em such that en’ is a child (or descendant) of element en, en’.end < em.start, and both en’ and em have an OCE. • The first two conditions are guaranteed by TwigStackList. • Our main focus is on the third condition.

  25. Ordered Children Extension (OCE) • Definition, condition 3): for each child (or descendant) n’ of n, if there is a node m that is the immediate right sibling of n’, there are elements en’ and em such that en’ is a child (or descendant) of element en, en’.end < em.start, and both en’ and em have an OCE. • [Figure: ordered XML query with ordered node n and children n’, m; XML document with element en and elements en’, em below it]

  26. Ordered Children Extension (OCE) In an ordered XML query: • If node n is an ordered node: to find its OCE, all three of the previous conditions must be checked. • If node n is an unordered node: to find its OCE, only the first two conditions need to be checked. The last condition does not apply.
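
The order test in condition 3 can be sketched as follows. This is a simplified illustration, not the OrderedTJ procedure: it assumes that, for one candidate element of an ordered query node, the matching elements for each query child have already been collected (and already satisfy conditions 1 and 2). It also reads condition 3 as requiring one left-to-right chain across all children. The function name and the (start, end) regions in the examples are hypothetical.

```python
def has_ordered_chain(candidates):
    """Check the order condition for one ordered query node.

    `candidates` lists, in left-to-right query order, the (start, end)
    regions matching each query child. Returns True if one region can be
    picked per child so that each picked region ends before the next starts.
    """
    prev_end = float("-inf")
    for regions in candidates:
        # Only regions that start after the previous pick has ended qualify.
        ends = [end for start, end in regions if start > prev_end]
        if not ends:
            return False
        # Greedy choice: the earliest-ending region leaves the most room
        # for the siblings to its right.
        prev_end = min(ends)
    return True


# Hypothetical regions: b, c and d matches appear one after another.
ordered_ok = [[(3, 4)], [(6, 7)], [(9, 10)]]

# Hypothetical regions: the only d match (7, 8) lies inside the c match
# (6, 11), so no d element follows c.
ordered_bad = [[(3, 4)], [(6, 11)], [(7, 8)]]
```

For an unordered query node this check is simply skipped, since only the first two conditions apply.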

  29. Ordered Children Extension: Example 1 • [Figure: document tree over elements a1, e1, c1, e2, b1, d1; ordered query twig with root a and nodes b, c, d] • a1 has an OCE: 1) a1 has descendants b1 and d1, and child c1 (fulfills conditions 1 and 2 of the OCE definition); 2) b1 has a right sibling element c1, and c1 has a right sibling element d1 (fulfills condition 3 of the OCE definition).

  32. Ordered Children Extension: Example 2 • [Figure: document tree over elements a1, e1, c1, b1, d1; ordered query twig with root a and nodes b, c, d] • a1 doesn’t have any OCE: 1) a1 has descendants b1 and d1, and child c1 (fulfills conditions 1 and 2 of the OCE definition); 2) b1 has a right sibling node c1 (fulfills condition 3 of the OCE definition); 3) however, c1 only has d1 as a descendant: there is no element with the label d that is a right sibling of element c1 (condition 3 of the OCE definition is not satisfied).

  33. Outline • Introduction and motivation • Background • XML tree and twig pattern matching • Previous two algorithms: TwigStack and TwigStackList • Our Ordered Twig Algorithms • Ordered Children Extension (for short OCE) • A generalized holistic matching algorithm: OrderedTJ • Experiments • Conclusion

  34. Data structure Each node n in the twig query has: Stream, List, and Stack • Data stream: Tn • We partition an XML document into streams. • All elements in a stream have the same tag and are ordered by their start position. • The elements in each stream are read only once, from head to tail. • E.g. Ta: a1, a2, a3; Tb: b1, b2; Tc: c1, c2; Td: d1, d2, d3 [Figure: document tree with four levels and ordered query twig with nodes a, b, c, d]
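
A minimal sketch of the stream partitioning, assuming elements arrive as (tag, (start, end, level)) pairs. The names `build_streams` and `StreamCursor` are illustrative, and the sample elements reuse the labels from the region coding example rather than the document in this slide's figure.

```python
from collections import defaultdict


def build_streams(elements):
    """Group (tag, region) pairs into one stream per tag, sorted by start."""
    streams = defaultdict(list)
    for tag, region in elements:
        streams[tag].append(region)
    # Tuples compare by their first field, so sorting orders by start position.
    return {tag: sorted(regions) for tag, regions in streams.items()}


class StreamCursor:
    """Forward-only cursor: each element is read once, from head to tail."""

    def __init__(self, regions):
        self._regions = regions
        self._pos = 0

    def eof(self):
        return self._pos >= len(self._regions)

    def head(self):
        return self._regions[self._pos]

    def advance(self):
        self._pos += 1


# Element labels from the region coding example, in arbitrary input order.
ELEMENTS = [
    ("section", (17, 19, 3)), ("chapter", (13, 20, 2)), ("book", (1, 21, 1)),
    ("title", (6, 8, 3)), ("chapter", (5, 12, 2)), ("section", (9, 11, 3)),
    ("title", (14, 16, 3)), ("preface", (2, 4, 2)),
]

streams = build_streams(ELEMENTS)
cursor = StreamCursor(streams["chapter"])
seen = []
while not cursor.eof():
    seen.append(cursor.head())
    cursor.advance()
```

The cursor deliberately has no way to move backwards, mirroring the read-once property stated on the slide.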

  35. Data structure Each node n in the twig query has: Stream, List, and Stack • List: Ln • The elements in the lists help to check for P-C relationships. • Elements in each list Ln are strictly nested from the first to the last, i.e. in the XML document, each element is an ancestor or parent of the element that follows it. • E.g. for the ordered query with nodes a, b, c, d: La: a1, a2, …; Lb: b1, …; Lc: c1; Ld: d1, d3

  36. Data structure Each node n in the twig query has: Stream, List, and Stack • Stack: Sn • Stacks are used to store elements that have at least one OCE. • Elements in the stack are potential solutions of the XML query. • When we insert a new element into a stack, the top element of the stack is popped if it doesn’t have an A-D relationship with the new element. • [Figure: stacks Sa, Sb, Sc, Sd for the ordered query with nodes a, b, c, d]
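
The push rule can be sketched as below, with elements given as (start, end) regions. Repeating the pop until an ancestor of the new element is on top (rather than popping once) is an assumption of this sketch; the slide only describes popping the top element. The function name and the sample regions, borrowed from the region coding example, are illustrative.

```python
def push(stack, elem):
    """Push `elem`, first popping tops that are not ancestors of it."""
    start, end = elem
    # The top is an ancestor iff its interval strictly contains the new one.
    while stack and not (stack[-1][0] < start and stack[-1][1] > end):
        stack.pop()
    stack.append(elem)


stack = []
push(stack, (1, 21))    # book
push(stack, (5, 12))    # first chapter, nested in book
push(stack, (9, 11))    # section, nested in the first chapter
nested = list(stack)

push(stack, (13, 20))   # second chapter: the section and first chapter are popped
after_second_chapter = list(stack)
```

With this rule the stack always holds a chain of nested elements, from the outermost at the bottom to the innermost on top.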

  37. A holistic matching algorithm: OrderedTJ • We propose a general algorithm, OrderedTJ, that computes answers to an ordered query twig. • Our key focus is to check the ordered nodes in the query and find elements that have at least one OCE.

  39. Main function • The OrderedTJ main function operates in two phases. • Phase 1: parts of the query root-leaf paths are output, and the ordering requirements of the ordered query are checked. • Phase 2: these path solutions are merge-joined to compute the answers to the whole query. • [Figure: pseudocode of the main function, with callouts marking Phase 1, Phase 2 and an important function]

  40. getNext(n) • It gets the next stream to be processed and advanced. • [Figure: pseudocode of getNext, with callouts marking the order check and the P-C check]

  41. An example of OrderedTJ algorithm • Document: book b1 with chapters c1, c2, c3, titles t1, t2, t3 (text values “Introduction”, “Related work”, “Algorithm”) and sections s1, s2, s3 • Query: ordered node Book with Chapter and Section below it; Title with value “Related work” below Chapter • Streams: Book: b1; Chapter: c1, c2, c3; Section: s1, s2, s3; Title: t1, t2, t3 • Next action: partition the XML document into streams

  42. An example of OrderedTJ algorithm • Same document, query and streams • Next action: show lists for nodes with a P-C child

  43. An example of OrderedTJ algorithm • Same document, query and streams • Next action: show stacks of every node in the query

  44. An example of OrderedTJ algorithm • Observation: t1 has no descendant “related work” • Next action: advance(Title)

  45. An example of OrderedTJ algorithm • Observation: t2 has descendant “related work” • Next action: insert t2 into the list of Title

  46. An example of OrderedTJ algorithm • List of Title: t2 • Observation: c1 has no descendant title that has child “related work” • Next action: advance(Chapter)

  47. An example of OrderedTJ algorithm • List of Title: t2 • Observation: c2 has a descendant t2 that has child “related work” • Next action: insert c2 into the list of Chapter

  48. An example of OrderedTJ algorithm • Lists: Title: t2; Chapter: c2 • Observation: s1 is not a following element of c2 • Next action: advance(Section)

  49. An example of OrderedTJ algorithm • Lists: Title: t2; Chapter: c2 • Observation: s2 is not a following element of c2 • Next action: advance(Section)

  50. An example of OrderedTJ algorithm • Lists: Title: t2; Chapter: c2 • Observation: b1 has an OCE • Next action: push b1 into the stack of Book
