
Pattern tree algebras: sets or sequences?

Stelios Paparizos, H. V. Jagadish. University of Michigan, Ann Arbor, MI, USA.



  1. Pattern tree algebras: sets or sequences? • Stelios Paparizos, H. V. Jagadish • University of Michigan, Ann Arbor, MI, USA

  2. Outline • XML and XQuery Order and Duplicates • Document Order • OrderBy Clause • Binding Order • Duplicates and XQuery • Hybrid Collections • Correct Output Order • Thinking Efficiently • Experimental Evaluation • Final Words

  3. Document Order Usage • Provides the capability to re-establish the original document information • Example: Return authors of book with title = “Grilling…” • Query: FOR $b IN document(t)//book WHERE $b/title = “Grilling for amateurs” RETURN $b/author • Result: <author> Mario </author> <author> Stelios </author> <author> Alton </author>

  4. Document Order • Implicit, derived from XML data model • The order in which data is represented in a document is important information • Requires original XML order representation within a single document • Requires an order amongst documents during a single execution of a query • Enforced on every XPath expression and every sequence operation e.g. Union
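
The document-order requirement on sequence operations such as Union can be illustrated with a small sketch. It is not taken from the slides: it assumes an in-memory tree parsed with Python's standard `xml.etree.ElementTree`, and the helper names `document_order` and `union_in_document_order` are hypothetical.

```python
import xml.etree.ElementTree as ET

def document_order(root):
    """Map each element (by identity) to its position in document order."""
    # Element.iter() visits the element and its descendants depth-first,
    # which is the order in which they appear in the document.
    return {id(node): pos for pos, node in enumerate(root.iter())}

def union_in_document_order(root, *node_lists):
    """Union several node lists, keep each node once, sort by document order."""
    pos = document_order(root)
    unique = {id(n): n for nodes in node_lists for n in nodes}
    return sorted(unique.values(), key=lambda n: pos[id(n)])

doc = ET.fromstring(
    "<lib>"
    "<book><title>A</title></book>"
    "<article><title>B</title></article>"
    "<book><title>C</title></book>"
    "</lib>"
)

# The inputs are given articles-first, but the union comes back in the
# order in which the nodes appear in the document.
merged = union_in_document_order(doc, doc.findall("article"), doc.findall("book"))
titles = [n.find("title").text for n in merged]
print(titles)
```

A real engine would more likely carry order in node identifiers than recompute positions, as slide 19 suggests; the dictionary here only stands in for that idea.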

  5. ORDER BY Clause Order • Explicit specification with ORDER BY clause • Results sorted using item’s value Example: Return all books sorted by year of publication XQuery: FOR $b IN document(t)//book ORDER BY $b/year RETURN $b SQL: SELECT book FROM t ORDER BY year

  6. Binding Order Usage • Provides a mechanism to produce results in multiple document orders • Example: Return books and articles with the same author, ordering the results by document order of (book, article) or of (article, book) • Query for (book, article) order: FOR $b IN document(t)//book FOR $a IN document(t)//article WHERE $b/author = $a/author RETURN ($b, $a) • Results: book1 – article1, book1 – article2, book2 – article1, book2 – article2, book2 – article3 • Query for (article, book) order: FOR $a IN document(t)//article FOR $b IN document(t)//book WHERE $b/author = $a/author RETURN ($b, $a) • Results: book1 – article1, book2 – article1, book1 – article2, book2 – article2, book2 – article3

  7. Binding Order • Implicit, derived from the way the query is typed by the user • Results are sorted based on the order variables are bound • Uses multiple document orders
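
Binding order can be mimicked with nested loops, where the outer loop plays the role of the first FOR clause. The sketch below is an illustration only: the author sets are made-up data chosen so that the join produces the same five pairs as the example on the previous slide, and the function names are hypothetical.

```python
# Books and articles listed in document order; the author sets are invented.
books = [("book1", {"X"}), ("book2", {"X", "Y"})]
articles = [("article1", {"X"}), ("article2", {"X"}), ("article3", {"Y"})]

def book_then_article():
    """FOR $b ... FOR $a ...: the book is the major sort key."""
    return [(b, a)
            for b, b_authors in books
            for a, a_authors in articles
            if b_authors & a_authors]

def article_then_book():
    """FOR $a ... FOR $b ...: the article is the major sort key."""
    return [(b, a)
            for a, a_authors in articles
            for b, b_authors in books
            if b_authors & a_authors]

print(book_then_article())
print(article_then_book())
```

Both functions return the same set of (book, article) pairs; only the order differs, which is exactly the information a set-based algebra would lose.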

  8. XQuery and Duplicates • XQuery operates on duplicate-free sequences • LET clause creates binding to sequence of matching elements • FOR clause creates binding to each element of sequence of matching elements • Hence, XQuery requires all duplicates to be removed at variable binding

  9. Outline • XML and XQuery Order and Duplicates • Hybrid Collections • Correct Output Order • Thinking Efficiently • Experimental Evaluation • Final Words

  10. Dilemma: Use Sequences or Sets (or Bags or …) • Sets lose all ordering information • Order can be important in intermediate steps • Sequences are expensive to manipulate • Optimization possibilities can be restricted • Both sets and sequences are duplicate-free • Duplicate elimination can be a costly procedure that should be avoided when possible

  11. Solution: Use Hybrid Collections • A Hybrid Collection can have duplicate semantics that varies between a bag and a set and order semantics that varies between a set and a sequence • Duplicate Specification • Ordering Specification

  12. Duplicate Specification (D-Spec) • Given a collection of trees CT, D-Spec describes how duplicates were removed from the collection • Possible Parameter Values: • “empty”: Duplicates can be present • “tree”: Duplicates were removed using deep-tree comparison amongst trees in CT • List of Nodes u: Duplicates were removed using a comparison of the nodes referred to by “u” in each tree in CT
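
The three D-Spec values can be sketched as three ways of building a comparison key. This is a simplified model, not the Timber implementation: each tree is assumed to be a flat dictionary from a node reference to that node's value, the node list is assumed to compare node values, and `eliminate_duplicates` is a hypothetical name.

```python
def eliminate_duplicates(trees, d_spec):
    """Return trees with duplicates removed as described by d_spec.

    d_spec is "empty" (keep everything), "tree" (compare whole trees),
    or a list of node references (compare only those nodes).
    """
    if d_spec == "empty":
        return list(trees)
    if d_spec == "tree":
        def key(t):
            return tuple(sorted(t.items()))
    else:
        def key(t):
            return tuple(t.get(u) for u in d_spec)
    seen, result = set(), []
    for t in trees:
        k = key(t)
        if k not in seen:  # keep the first tree seen for each key
            seen.add(k)
            result.append(t)
    return result

trees = [
    {"B": "b1", "E": "e1"},
    {"B": "b1", "E": "e2"},
    {"B": "b1", "E": "e1"},
]
print(len(eliminate_duplicates(trees, "empty")))
print(len(eliminate_duplicates(trees, "tree")))
print(len(eliminate_duplicates(trees, ["B"])))
```

The shorter the node list, the more trees compare as equal, so a node-list D-Spec is the strongest of the three and "empty" the weakest.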

  13. Duplicate Specification Example

  14. Ordering Item (O-Item) • Minimum unit used when sorting a collection CT • Parameters: • Reference to the node to sort by • Ascending (‘asc’) or descending (‘desc’) • Empty greater (‘g’) or empty least (‘l’) for trees without a matching node • Example: O-Item (B, asc, l)

  15. Ordering Specification (O-Spec) • Given a collection CT, O-Spec describes how the trees are sorted in the collection • It accepts as parameter an ordered list of Ordering-Items • Sorting took place in the order in which the O-Items are specified
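
An O-Spec can be read as a multi-key sort in which the first O-Item is the major key. The following sketch rests on assumptions not stated in the slides: trees are flat dictionaries from node reference to a comparable value, a tree "without a matching node" is one that lacks the key, and `sort_by_o_spec` is a hypothetical helper.

```python
def sort_by_o_spec(trees, o_spec):
    """Sort trees by a list of O-Items (node, 'asc'|'desc', 'g'|'l')."""
    result = list(trees)
    # Python's sort is stable, so sorting by the least significant O-Item
    # first and the most significant last yields the combined ordering.
    for node, direction, empty in reversed(o_spec):
        def key(t, node=node, empty=empty):
            if node in t:
                return (0, t[node])
            # Trees with no matching node rank above ('g') or below ('l')
            # every tree that has one.
            return (1,) if empty == "g" else (-1,)
        result.sort(key=key, reverse=(direction == "desc"))
    return result

trees = [{"A": 2, "B": 1}, {"A": 1}, {"A": 1, "B": 3}, {"A": 2, "B": 0}]

empty_least = sort_by_o_spec(trees, [("A", "asc", "l"), ("B", "asc", "l")])
empty_greater = sort_by_o_spec(trees, [("A", "asc", "l"), ("B", "asc", "g")])
print(empty_least)
print(empty_greater)
```

An empty O-Spec leaves the input order untouched in this sketch, which corresponds to the "any order" case, and a shorter list of O-Items corresponds to a partially ordered collection.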

  16. Ordering Specification Example • Cases shown: “Fully-ordered”, “Partially-ordered”, “any order”

  17. Outline • XML and XQuery Order and Duplicates • Hybrid Collections • Correct Output Order • Thinking Efficiently • Experiments • Final Words

  18. TLC-C Correct Output Algorithm

  19. TLC-C Basic Principles • Duplicate behavior is correct with sets • Document order is modeled by our node identifiers • Pattern tree matches return information in document order • ORDER BY clause is mapped to a list of ordering items and a sort operation • Binding order is determined during parsing by tracking how the query was typed • A sort operation is used at the end of each single block FLWOR statement to capture the binding order

  20. Binding Order Example • Query: FOR $b IN document(“lib.xml”)//book FOR $a IN $b/author FOR $e IN $b/editor FOR $h IN $e/hobby FOR $i IN $a/interest RETURN $b • Orderlist: 2, 3, 5, 6, 4 • Figure: Algebraic plan (TLC)

  21. Binding Order Example • Query: FOR $b IN document(“lib.xml”)//book FOR $a IN $b/author FOR $e IN $b/editor FOR $h IN $e/hobby FOR $i IN $a/interest RETURN $b • Orderlist: 2, 3, 5, 6, 4 • Figure: Algebraic plan with correct output order (TLC-C)

  22. Outline • XML and XQuery Order and Duplicates • Hybrid Collections • Correct Output Order • Thinking Efficiently • Enhancing an algebra with Hybrid Collections • Minimizing Duplicate Elimination procedures • Selections and Ordering • Nested Queries and Ordering • Experimental Evaluation • Final Words

  23. Operators with Ordering (example) • Select S[apt, ord](CT): produces the matches of the annotated pattern tree (apt) on the input collection CT • New parameter ord is used for ordering • ‘empty’, unspecified order • ‘maintain’, preserve order of input CT • ‘list-resort u’, destroy order of CT and resort using input list of node references u • ‘list-add u’, preserve order of input CT and sort ties using input list of node references u

  24. Algebraic Identities (example) • Select S and Sort O can be merged • O[ol](S[any, any](…)) ↔ S[any, ol](…) • Select S and Sort O can be swapped • O[ol](S[any, maintain](…)) ↔ S[any, maintain](O[ol](…))
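
The swap identity can be checked on a toy case. The sketch below replaces pattern-tree matching with a plain order-preserving filter and uses Python's stable `sorted` for the Sort operator, so it only illustrates why the two plans agree; `select_maintain`, `sort_by`, and the sample data are hypothetical.

```python
def select_maintain(trees, predicate):
    """Stand-in for S[apt, maintain]: keep matches in input order."""
    return [t for t in trees if predicate(t)]

def sort_by(trees, order_list):
    """Stand-in for O[ol]: stable sort on the listed node references."""
    return sorted(trees, key=lambda t: tuple(t[u] for u in order_list))

books = [
    {"id": 1, "year": 2003, "price": 40},
    {"id": 2, "year": 1999, "price": 15},
    {"id": 3, "year": 1999, "price": 55},
    {"id": 4, "year": 2003, "price": 25},
    {"id": 5, "year": 1999, "price": 30},
]

def is_expensive(t):
    return t["price"] > 20

# O[ol](S[any, maintain](...)) versus S[any, maintain](O[ol](...))
sort_after_select = sort_by(select_maintain(books, is_expensive), ["year"])
select_after_sort = select_maintain(sort_by(books, ["year"]), is_expensive)
print([t["id"] for t in sort_after_select])
print([t["id"] for t in select_after_sort])
```

Because both plans return the same sequence, an optimizer is free to choose whichever placement of the sort is cheaper, for example sorting after the selection has reduced the input.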

  25. Minimize Duplicate Eliminations Step 1: Remove redundant duplicate elimination procedures Step 2: Explore partial duplicate specifications to further minimize duplicate elimination procedures
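
Step 1 can be pictured as a single pass over a plan that drops every duplicate elimination (DE) whose input is already known to be duplicate-free. The sketch below is a loose illustration under strong assumptions: the plan is a linear list read bottom-up, each operator carries a made-up flag saying whether it may introduce duplicates, and all DEs share the same D-Spec. None of the operator names or flags come from the TLC algebra.

```python
def remove_redundant_de(plan):
    """Drop DE steps whose input is already duplicate-free.

    plan is a bottom-up list of (operator_name, may_create_duplicates).
    """
    rewritten = []
    duplicate_free = False  # nothing is known about the base input
    for name, may_create_duplicates in plan:
        if name == "DE":
            if duplicate_free:
                continue  # redundant: skip this duplicate elimination
            duplicate_free = True
        elif may_create_duplicates:
            duplicate_free = False
        rewritten.append((name, may_create_duplicates))
    return rewritten

plan = [
    ("Select", True),
    ("DE", False),
    ("Filter", False),
    ("DE", False),
    ("Project", True),
    ("DE", False),
]
optimized = remove_redundant_de(plan)
print([name for name, _ in optimized])
```

In this toy plan the DE after the filter disappears because a filter cannot create duplicates, while the DE after the projection stays.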

  26. Minimize DEs Step 1 Example FOR $o IN document(“auction.xml”)//open_auction WHERE count($o/bidder) > 5 RETURN <result> {$o/quantity} {$o/type} </result> From 6 DE procedures to 1

  27. Minimize DEs Step 2 Example FOR $o IN document(“auction.xml”)//open_auction WHERE count($o/bidder) > 5 RETURN <result> {$o/quantity} {$o/type} </result> The DE procedure is modified to DE: ID(2). Then, using algebraic rewrites, it is eliminated completely.

  28. Selections and Ordering For “selection” type queries, use algebraic rewrites and push the sort down to the select operator.

  29. Selections and Ordering Example FOR $b IN document(“lib.xml”)//book FOR $a IN $b/author FOR $e IN $b/editor FOR $h IN $e/hobby FOR $i IN $a/interest RETURN $b Push Sort into Select using algebraic identities. Optimizer can plan Select operator without having the forced blocking sort at the end.

  30. Joins and Ordering Example FOR $a IN document(t)//article FOR $b IN document(t)//book WHERE $b/author = $a/author RETURN ($b, $a) Algebraic plan with correct output order (TLC-C)

  31. Joins and Ordering Example Push Sort into Join using algebraic identities.

  32. Joins and Ordering Example Push Sort further down into Selects using algebraic identities.

  33. Nested Queries and Ordering FOR $b IN document(“lib.xml”)/book LET $k := FOR $a IN document(“lib.xml”)/article WHERE $b/author = $a/author AND $a/conf = “VLDB” RETURN $a WHERE $b/year = 1999 RETURN <result> {$b} {$k} </result> Algebraic plan with correct output order (TLC-C)

  34. Nested Queries and Reorder Rewrite Sort and blocking Join to Reorder operation.

  35. Outline • XML and XQuery Order and Duplicates • Hybrid Collections • Correct Output Order • Thinking Efficiently • Experimental Evaluation • Final Words

  36. Experimental Setup • Timber System • 128MB buffer pool • Value index when necessary (not for all queries) • Intel Pentium III-M 866 MHz • Windows 2000 Professional • IDE Hard Drive • 512MB RAM • XMark dataset factor 1 • 707MB total space (472MB data + 241MB index)

  37. Minimizing Duplicate Eliminations • Queries shown: x17 (more selective), x19 (less selective), q2 (value join)

  38. Selections and Ordering • Queries shown: x13 (simple output), x17 (more selective), x19 (less selective)

  39. Join and Ordering • Queries shown: q1 (less selective), q2 (more selective), x3 (less selective)

  40. Nested Queries and Ordering

  41. Ordering and Duplicate Optimizations • Queries shown: x19 (selection), q2 (value join), x8 (nested query)

  42. Outline • XML and XQuery Order and Duplicates • Hybrid Collections • Correct Output Order • Thinking Efficiently • Experimental Evaluation • Final Words

  43. Related Work • Relational systems recognize smart sort placement as a problem • D. Simmen, E. Shekita, and T. Malkemus. Fundamental techniques for order optimization. In Proc. SIGMOD Conf., 1996. • For the XML navigational-based approach, ordering requirements are studied in: • J. Hidders and P. Michiels. Avoiding unnecessary ordering operations in XPath. In Proc. DBPL Conf., 2003. • XML algebraic-based approaches use sets or sequences. Aside from the performance limitations, it is unknown whether they fully address the XQuery binding order to produce correct results.

  44. Final Words • Ordering in XQuery is a complex procedure with significant performance ramifications • Introduced Hybrid Collections with Ordering Specification as a means to a correct and flexible solution • Similar path for Duplicates • Showed algebraic optimizations that take advantage of the provided flexibility • Demonstrated experimentally the performance increase
