1 / 37

Honey, I Shrunk the XQuery! —— An XML Algebra Optimization Approach

Honey, I Shrunk the XQuery! —— An XML Algebra Optimization Approach. Xin Zhang, Bradford Pielech and Elke A. Rundensteiner. XML. Relational Database. Flexible and powerful way to: Represent data on the web Exchange data between applications. 1) Widely used to store business data

manton
Download Presentation

Honey, I Shrunk the XQuery! —— An XML Algebra Optimization Approach

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Honey, I Shrunk the XQuery! —— An XML Algebra Optimization Approach Xin Zhang, Bradford Pielech and Elke A. Rundensteiner DSRG, Worcester Polytechnic Institute

  2. XML Relational Database • Flexible and powerful way to: • Represent data on the web • Exchange data between applications 1) Widely used to store business data 2) Efficient, reliable, secure 3) Provides standard querying (SQL) XML and Relational The look and feel of an XML query system combined with the maturity and technology support of RDB Rainbow + DSRG, Worcester Polytechnic Institute

  3. XML Document Virtual XML Document Virtual XML Document Virtual XML Document Virtual XML Document Virtual XML Document Architecture Rainbow XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User Query Results in XML User XQuery XAT Executor XAT Optimizer SQL Generator XAT XAT XAT Tuples XAT View XAT View XAT View XQuery SQL RDBMS XAT: XML Algebra Tree DSRG, Worcester Polytechnic Institute

  4. GOAL: XQuery level optimization DSRG, Worcester Polytechnic Institute

  5. <results> <title>TCP/IP Illustrated</title> <title>Data on the Web</title> </results> <dxv> <book> <row> <bid>001</bid> <title>TCP/IP Illustrated</title> </row> <row> <bid>002</bid> <title>Data on the Web</title> </row> </book> <prices> <row> <bid>001</bid> <price>65.95</price> </row> <row> <bid>002</bid> <price>34.95</price> </row> </prices> </dxv> <prices> FOR $book IN document(“dxv.xml”)/book/row $prices IN document(“dxv.xml”)/prices/row WHERE $book/bid = $prices/bid RETURN <book> $book/title, $prices/price </book> </prices> Bid Title Bid Price 001 TCP/IP Illustrated 001 65.95 002 Data on the Web 002 34.95 Running Example <result> FOR $t IN document(“prices.xml”)/book/title RETURN $t </result> <prices> <book> <title>TCP/IP Illustrated</title> <price>65.95</price> </book> <book> <title>Data on the Web</title> <price>34.95</price> </book> </prices> DSRG, Worcester Polytechnic Institute

  6. XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User XQuery XAT Executor XAT Optimizer XAT XAT XAT SQL Generator XAT View XAT View XAT View XQuery User XML Algebra Tree (XAT) User Query col3 1: <result> FOR $t IN document(“prices.xml”)/book/title RETURN $t </result> 2: T<results>$t</result>col3 Agg 3: R0, book/title$t 6: S”prices.xml”R0 7: DSRG, Worcester Polytechnic Institute

  7. XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User XQuery XAT Executor XAT Optimizer XAT XAT XAT SQL Generator XAT View XAT View XAT View XQuery View XML Algebra Tree (XAT) View Query <prices> FOR $book IN document(“dxv.xml”)/book/row $prices IN document(“dxv.xml”)/prices/row WHERE $book/bid = $prices/bid RETURN <book> $book/title, $prices/price </book> </prices> col6=col7 26: $book, bidcol6 27: $prices, bidcol7 28:  31: T<book> [col10][col12] </book>col5 22: 23: T<prices>col5</prices>col4 11: $book, titlecol10 R3, /prices/row$prices R1, /book/row$book 20: 14: Agg 12: $prices, pricecol12 25: 15: S“dxv.xml” R3 S“dxv.xml” R1 21: DSRG, Worcester Polytechnic Institute

  8. XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User XQuery XAT Executor XAT Optimizer XAT XAT XAT SQL Generator XAT View XAT View XAT View XQuery Merged XML Algebra Tree (XAT) User Query View Query col6=col7 col3 26: 1: 2: T<prices>col5</prices>col4 $book, bidcol6 T<results>$t</result>col3 11: 27: Agg 3: Agg $prices, bidcol7 12: 28: R0, book/title$t T<book> [col10][col12] </book>col5 6:  31: 22: col4 R0 23: 7: $book, titlecol10 R3, /prices/row$prices R1, /book/row$book 20: 14: $prices, pricecol12 25: 15: S“dxv.xml” R3 S“dxv.xml” R1 21: DSRG, Worcester Polytechnic Institute

  9. Outline • XAT Optimization: • XAT Rewrite • XAT Cleanup • Preliminary Evaluation • Related Work • Summary DSRG, Worcester Polytechnic Institute

  10. XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User XQuery XAT Executor XAT Optimizer XAT XAT XAT SQL Generator XAT View XAT View XAT View XQuery XAT Rewrite • Query Optimization at Logic Level. • Goal: • Redundancy Elimination. • Computation Pushdown. • Technique: • Equivalence Rewrite Rules. • Heuristics: • Pushdown Navigates • Remove Construction of Intermediate Result • Combine Multiple Operators. DSRG, Worcester Polytechnic Institute

  11. Before Navigation Pushdown User Query View Query col6=col7 col3 26: 1: $book, bidcol6 2: T<results>$t</result>col3 27: Agg 3: $prices, bidcol7 28: T<book> [col10][col12] </book>col5  R0, book/title$t 6: 31: 22: col4 R0 23: 7: $book, titlecol10 R3, /prices/row$prices R1, /book/row$book 20: T<prices>col5</prices>col4 14: 11: $prices, pricecol12 25: 15: Agg S“dxv.xml” R3 S“dxv.xml” R1 12: 21: DSRG, Worcester Polytechnic Institute

  12. R3, /prices/row$prices 20: 14: R1, /book/row$book S“dxv.xml” R1 S“dxv.xml” R3 15: 21: After Navigation Pushdown User Query View Query T<book> [col10][col12] </book>col5 22: col3 1: col6=col7 26: 2: T<results>$t</result>col3  31: Agg 3: $prices, pricecol12 $book, titlecol10 23: 25: R0, book/title$t 6: $prices, bidcol7 $book, bidcol6 28: 27: T<prices>col5</prices>R0 11: Agg 12: DSRG, Worcester Polytechnic Institute

  13. $book, bidcol6 col3 1: 27: T<results>$t</result>col3 2: $book, title$t 23: Agg 3: 14: R1, /book/row$book S“dxv.xml” R1 15: After Tagger Cancel Out User Query View Query JOINcol6=col7 31: $prices, pricecol12 25: $prices, bidcol7 28: R3, /prices/row$prices 20: S“dxv.xml” R3 21: DSRG, Worcester Polytechnic Institute

  14. Outline • XAT Optimization • XAT Rewrite • XAT Cleanup • Preliminary Evaluation • Related Work • Summary DSRG, Worcester Polytechnic Institute

  15. XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User XQuery XAT Executor XAT Optimizer XAT XAT XAT SQL Generator XAT View XAT View XAT View XQuery XAT Cleanup • Why: • SQL engine cannot reduce redundancy in XQuery. • How: • Data Redundancy by Schema Cleanup • Each operator produced, consumed and modified some columns. • Minimum schema is then computed. • Tree Redundancy by Unused Operator Cutting • Cutting matrix generation. • Required columns analysis. • Operator cutting. DSRG, Worcester Polytechnic Institute

  16. XAT Operator Properties • Produced • Desc: New column generated by operator. • Example: , S, T • Consumed • Desc: Columns required by operator. • Example: ,  • Modified • Desc: Columns modified by operator. • Example: , ,  DSRG, Worcester Polytechnic Institute

  17. Schema Computation Node Parent Produced Consumed Old Schema 1 {} {col3} {col3} 1: col3 2 1 {col3} {$t} {col3, R1, $book, col6, $t, R3, $prices, col7, col12} T<results>$t</result>col3 2: 3 2 {} {} {R1, $book, col6, $t, R3, $prices, col7, col12} Agg 3: 31 3 {} {col6, col7} {R1, $book, col6, $t, R3, $prices, col7, col12} col6=col7 31: 23 31 {$t} {$book} {R1, $book, col6, $t} 25: $prices, pricecol12 $book, title$t 23: 27 23 {col6} {$book} {R1, $book, col6} 14 27 {$book} {R1} {R1, $book} 28: $prices, bidcol7 $book, bidcol6 27: 15 14 {R1} {} {R1} 20: 25 31 {col12} {$prices} {R3, $prices, col7, col12} R3, /prices/row$prices R1, /book/row$book 14: 28 25 {col7} {$prices} {R3, $prices, col7} S“dxv.xml” R3 21: 20 28 {$prices} {R3} {R3, $prices} S“dxv.xml” R1 15: 21 20 {R3} {} {R3} DSRG, Worcester Polytechnic Institute

  18. Intuition: Don’t keep anything that’s not used later. Schema Computation # Parent() col3 $t col6 col7 $book R1 col12 $prices R3 New Schema 1: 1 C {col3} col3 2 1 P C {col3} T<results>$t</result>col3 2: 3 2 {$t} 31* 3 C C {$t} Agg 3: 23 31 P C {col6, $t} 27 23 P C {$book, col6} col6=col7 31: 14 27 P C {$book} 25: $prices, pricecol12 $book, title$t 23: 15 14 P {R1} 25 31 P C {col7, col12} 28: $prices, bidcol7 $book, bidcol6 27: 28 25 P C {$prices, col7} 20: R3, /prices/row$prices R1, /book/row$book 20 28 P C {$prices} 14: 21 20 P {R3} S“dxv.xml” R3 21: S“dxv.xml” R1 *We assume Join didn’t modify $t. Otherwise, only node 25 will be deleted. 15: DSRG, Worcester Polytechnic Institute

  19. Schema Cleanup Result DSRG, Worcester Polytechnic Institute

  20. XAT Cleanup • Schema Cleanup • Each operator produced, consumed and modified some columns. • Minimum schema is then computed. • Unused Operator Cutting • Cutting matrix generation. • Required columns analysis. • Operator cutting. DSRG, Worcester Polytechnic Institute

  21. Cutting Matrix • Purpose: • Get rid of the unused operators. • Equations: • Propagation of modified • Propagation of required • Identify cuttable node. DSRG, Worcester Polytechnic Institute

  22. Matrix Computation 1: col3 T<results>$t</result>col3 2: Agg 3: JOINcol6=col7 31: 25: $prices, pricecol12 $book, title$t 23: 28: $prices, bidcol7 $book, bidcol6 27: 20: R3, /prices/row$prices R1, /book/row$book *We assume Join didn’t modify $t. Otherwise, only node 25 will be deleted. 14: S“dxv.xml” R3 21: S“dxv.xml” R1 15: DSRG, Worcester Polytechnic Institute

  23. Intuition: Give me only the required columns in order to get the final result. Matrix Computation (Cont.1) 1: col3 # Parent() col3 $t col6 col7 $book R1 col12 $prices R3 Cut? T<results>$t</result>col3 2: 1 R R R R 2 1 P C 3 2 - M - - - - - - - Agg 3: 31* 3 C C 23 31 P C JOINcol6=col7 31: 27 23 P C 14 27 P C 25: $prices, pricecol12 $book, title$t 23: 15 14 P 25 31 P C 28: $prices, bidcol7 28 25 P C $book, bidcol6 27: 20 28 P C 20: 21 20 P R3, /prices/row$prices R1, /book/row$book *We assume Join didn’t modify $t. Otherwise, only node 25 will be deleted. 14: S“dxv.xml” R3 21: S“dxv.xml” R1 15: DSRG, Worcester Polytechnic Institute

  24. Matrix Computation (Cont. 2) 1: col3 T<results>$t</result>col3 2: Agg 3: JOINcol6=col7 31: 25: $prices, pricecol12 $book, title$t 23: 28: $prices, bidcol7 $book, bidcol6 27: 20: R3, /prices/row$prices R1, /book/row$book *We assume Join didn’t modify $t. Otherwise, only node 25 will be deleted. 14: S“dxv.xml” R3 21: S“dxv.xml” R1 15: DSRG, Worcester Polytechnic Institute

  25. XAT after Cutting 1: col3 col3 1: T<results>$t</result>col3 2: Agg T<results>$t</result>col3 3: 2: Agg JOINcol6=col7 31: 3: Reduced To 25: $prices, pricecol12 $book, title$t $book, title$t 23: 23: 28: R1, /book/row$book $prices, bidcol7 14: $book, bidcol6 27: 20: S“dxv.xml” R1 R3, /prices/row$prices 15: R1, /book/row$book 14: S“dxv.xml” R3 21: S“dxv.xml” R1 15: DSRG, Worcester Polytechnic Institute

  26. XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User XQuery XAT Executor XAT Optimizer XAT XAT XAT SQL Generator XAT View XAT View XAT View XQuery SQL Generated SELECT “$book”.title as “$t”, “$book”.bid as “col6”, “$prices”.price as “col12”, “$prices”.bid as “col7” FROM book “$book”, prices “$prices” WHERE “col6”=“col7” 1: col3 col3 1: 2: T<results>$t</result>col3 T<results>$t</result>col3 2: Agg 3: Agg 3: 31: JOINcol6=col7 $book, title$t 23: 25: $prices, pricecol12 23: $book, title$t R1, /book/row$book 14: 28: $prices, bidcol7 27: $book, bidcol6 S“dxv.xml” R1 15: 20: R3, /prices/row$prices 14: R1, /book/row$book SELECT “$book”.title as “$t”, FROM book “$book”, S“dxv.xml” R3 21: 15: S“dxv.xml” R1 DSRG, Worcester Polytechnic Institute

  27. Outline • XAT Optimization • XAT Rewrite • XAT Cleanup • Preliminary Evaluation • Related Work • Summary DSRG, Worcester Polytechnic Institute

  28. Preliminary Evaluation • Experiment Setup • XQuery over Kweelt Parser • PIII800 256 MB, Win 2k Pro. • Data Setup • Synthetic Data • Synthetic Queries • Query Execution • Native XML Engine. DSRG, Worcester Polytechnic Institute

  29. Performance Gain in Execution DSRG, Worcester Polytechnic Institute

  30. Query Engine Overhead XAT Optimizer XAT Generator XAT Decorrelator XAT Merger User XAT User XAT XAT Rewrite User XQuery XAT XAT XAT XAT Executor SQL Generator XAT View XAT View XAT XAT Cleanup View XQuery Total: 32,522 ms DSRG, Worcester Polytechnic Institute

  31. Outline • XAT Optimization • XAT Rewrite • XAT Cleanup • Preliminary Evaluation • Related Work • Summary DSRG, Worcester Polytechnic Institute

  32. Related Work • Rainbow: • Optimize on XAT. (static analysis) • Algebra level rewriting. • SQL Optimization • Algebra based optimization. • Static analysis. • XQuery by Views: Optimize in SQL. • XPERANTO[VLDBJ2000]: XQGM vs. XAT • Extension by UDFs for XML features. • SilkRoute[IEEE2001(24:2)]: • Generate SQL Efficiently. • AGORA[VLDB2000]: • Syntax level rewriting. DSRG, Worcester Polytechnic Institute

  33. Summary • Efficient XQuery Processing • XML Algebra Tree (XAT) • XAT Optimization: • Rewrite by using equivalent rules • Cleanup • Schema cleanup • Operator cutting • Prototype system implementation. DSRG, Worcester Polytechnic Institute

  34. Questions?(Futures!) http://davis.wpi.edu/dsrg/rainbow https://sourceforge.net/projects/rainbow-engine/ Special Thanks: Brian Murphy, Luping Ding, DSRG group. DSRG, Worcester Polytechnic Institute

  35. XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User XQuery XAT Executor XAT Optimizer SQL Generator XAT XAT XAT XAT View XAT View XAT View XQuery DSRG, Worcester Polytechnic Institute

  36. Schema Computation col3 1: T<results>$t</result>col3 2: Agg 3: col6=col7 31: $prices, pricecol12 25: $book, title$t 23: $prices, bidcol7 28: $book, bidcol6 27: R3, /prices/row$prices 20: R1, /book/row$book 14: S“dxv.xml” R3 21: S“dxv.xml” R1 15: DSRG, Worcester Polytechnic Institute

  37. $book, bidcol6 col3 1: 27: $prices, pricecol12 25: T<results>$t</result>col3 2: $prices, bidcol7 28: Agg 3: R3, /prices/row$prices 20: 14: R1, /book/row$book S“dxv.xml” R1 S“dxv.xml” R3 15: 21: After Tagger Cancel Out User Query View Query col6=col7 26:  31: $book, title$t 23: DSRG, Worcester Polytechnic Institute

More Related