310 likes | 337 Views
Explore the XQuery implementation in a relational database system, focusing on SQL Server 2005's XML support, operators, indexing, and compilation processes for optimal query performance.
E N D
XQuery Implementation in a Relational Database System Shankar Pal Istvan Cseri, Oliver Seeliger, Michael Rys,Gideon Schaller, Wei Yu, Dragan Tomic, Adrian Baras, Brandon Berg, Denis Churin, Eugene Kogan SQL Server Microsoft Corp
Overview • Background • XML Support in SQL Server 2005 • OrdPath labeling of XML nodes • XML indexes – PATH, VALUE, PROPERTY • Main topic – XQuery compilation • Architecture • XML operators • Mapping XML operators to relational+ ops • Conclusions S. Pal et al.
BackgroundXML Support in SQL Server 2005 Create table DOCS ( ID int primary key, XDOC xml) • XML stored in an internal, binary form (‘blob’) • Optionally typed by a collection of XML schemas • Used for storage and query optimizations • 3 of 5 methods on XML data type: • query(): returns XML type • value(): returns scalar value • exist(): checks conditions on XML nodes • XML indexing • More information at http://msdn.microsoft.com/xml S. Pal et al.
BackgroundXQuery embedded in SQL • Retrieve section titles from <book> wrapped in new <topic> elements: SELECT ID, XDOC .query(' for $s in /BOOK/SECTION return <topic> {data($s/TITLE)} </topic> ') FROM DOCS S. Pal et al.
BackgroundXQuery – supported features • XQuery clauses “for”, “where”, “return” and “order by” • XPath axes – child, descendant, parent, attribute, self and descendant-or-self • Functions – numeric, string, Boolean, nodes, context, sequences, aggregate, constructor, data accessor • SQL Server extension functions to access SQL variable and column data within XQuery • Numeric operators (+, -, *, div, mod) • Value comparison operators (eq, ne, lt, gt, le, ge) • General comparison operators (=, !=, <, >, <=, >=) S. Pal et al.
Background [SIGMOD04] ORDPATH Label of Nodes • node1 is ancestor of node2 • ORDPATH (node1) is prefix • of ORDPATH (node2) BOOK 1 @ISBN 1.1 Section 1.3 Section 1.5 Title 1.3.1 Figure 1.3.3 Title 1.5.1 Figure 1.5.3 • node1 precedes node2 in document order • ORDPATH (node1) < ORDPATH (node2) ORDPATH(1.3) ≤ id < Descendant_Limit (1.3) = 1.4 S. Pal et al.
Background [VLDB 2004]Indexing XML column • Primary XML index on an XML column • Creates B+tree tree on data model content of the XML nodes • Adds column Path_ID for the reversed, encoded path from each XML node to root of XML tree • OrdPath labeling schema is used for XML nodes • Relative order of nodes • Document hierarchy S. Pal et al.
Background XML example INSERT INTO myTable VALUES (7, ‘<Book xmlns="myns" ISBN = "1-55860-3612"> <Section> <Title>Bad Bugs</Title> </Section> <Section> <Title> Tree frogs </Title> <Figure>…</Figure> </Section> </Book>’) S. Pal et al.
Background Primary XML Index Entries • Encoding of tags & types stored in system meta-data - Additional details not shown Clustering key S. Pal et al.
Background Secondary XML indexes • To speed up different classes of commonly occurring queries • Statistics created on key columns of the primary and secondary XML indexes • Used for cost-based selection of secondary XML indexes S. Pal et al.
Background Handling Types • If XML column is typed • Values are stored in XML blob and XML indexes with appropriate typing • Untyped XML • Values are stored as strings • Convert to appropriate types for operations • SQL typed values stored in primary XML index • Most SQL types are compatible with XQuery types (integer) • Value comparisons on XML index columns suffice • Some types (e.g. xs:datetime) are stored in internal format and processed specially S. Pal et al.
XQuery Processing Architecture • XQuery Compiler: • Parses XQuery expr • Checks static type correctness • Type annotations • Applies static optimiztns • Path collapsing • Rewrites using XML schemas • XML Operator Mapper • Recursively traverses XML algebra tree • Converts each XmlOp to reln+ operator sub-tree • Mapping depends upon existence of primary XML index XQuery expression XQuery Compiler XML algebra tree (XmlOp ops) XMLOperatorMapper Relational Operator Tree (relational+ operators) Reln Query Processor S. Pal et al.
Examples of XML Operators S. Pal et al.
XMLOperatorMapping – Overview PATH Index XQUERY PK OrdPath PK XML Primary XML Index VALUE Index PROPERTY Index Special handling for SELECT * | XDOC REL+ tree S. Pal et al.
New operators • Some produce N rows from M (≠ N) rows • XML_Reader – streaming, pull-model XML parser • XML_Serializer – to serialize query result as XML • Some are for efficiency • Contains – to evaluate XQuery contains() • TextAdd – to evaluate the XQuery function string() • Data – to evaluate XQuery data() function • Some are for specific needs • Check – validate XML during insertion or modification S. Pal et al.
XMLOperatorMapping • Following categories: • Mapping of XPath expressions • Mapping of XQuery expressions • Mapping of XQuery built-in functions S. Pal et al.
Non-indexed XML, Full Path XML operator tree: XmlOp_Path PATH = “/BOOK/SECTION” Rel+ operator tree: XML_Serialize XML_Reader (XDOC, “/BOOK/SECTION”) • XML_Reader produces subtrees of <SECTION> • Node table rows • Contains OrdPath • No PK or PATH_ID • XML_Serialize reassembles those row into XML data type • To output result S. Pal et al.
Sample query execution using Primary XML Index • /Book/Section #3#1 (by XML Op Mapper) Clustering key S. Pal et al.
Indexed XML, Full Path XML_Serialize • XmlOp_Path mapped to SELECT • GET(PXI) – rows from primary XML index • Match PATH_ID • Not shown: • JOIN with base table on PK Assemble Subtree Apply Select ($b) Select GET (PXI) $b.OrdP ≤ OrdP< DL($b) GET (PXI) Path_ID=#SECTION#BOOK S. Pal et al.
XML index – PATH • Speeds up path evaluations • Example – /Book/Section #3#1 S. Pal et al.
Indexed XML, Imprecise Paths /BOOK/SECTION// TITLE • Matched using LIKE operator on Path_ID XML_Serialize Apply Assemble subtree of <TITLE> Select ($s) GET (PXI) Path_ID LIKE #TITLE%#SECTION#BOOK S. Pal et al.
Predicate Evaluation XML_Serialize /BOOK[@ISBN = “12”] • Search value compared with VALUE column in PXI • Collapsed path /BOOK/@ISBN • Induce index seeks • Reduce intermediate result size • Parent check – Par($b) • Using OrdPath • Value conversion might be needed Apply Apply Assemble subtree of <BOOK> Select ($b) Select Path_ID=#@ISBN#BOOK & VALUE=“12” & Par($b) GET (PXI) GET (PXI) Path_ID=#BOOK S. Pal et al.
Ordinal Predicate • /BOOK[n] • Adds ranking column to the rows for <BOOK> elements • Retrieves the nth <BOOK> node • Special optimizations • [1] TOP 1 ascending • [last()] TOP 1 descending • Avoids sorting when input is sorted • Example – in XML_Serializer S. Pal et al.
Error handling • Static type errors at compilation time • Raises static type errors if an expression could fail at runtime due to type safety violation • Addition of string to integer • Querying non-existent node name in typed XML • Non-singleton in “eq” • Some can be fixed using explicit cast or ordinal specification • Dynamic error converted to empty sequence • Yields correct result in predicates without negations S. Pal et al.
“for” Iterator XML_Serialize for $s in /BOOK//SECTION where $s/@num >= 3 return $s/TITLE • XML op for “for” is XmlOp_Apply • Maps to APPLY • Binds $s and iterates over <SECTION> • Determines its <TITLE> children • Nested “for” and “for” with multiple bindings turn into nested APPLY • Each APPLY binds to a different variable Apply Assemble <SECTION> Apply ($s) Path_ID LIKE #TITLE#SECTION%#BOOK & Par($s) Select Exists Select ($s) Select GET(PXI) GET (PXI) Path_ID LIKE #@num#SEC%#BK & VALUE >= 3 & Par($s) Path_ID LIKE #SECTION%#BOOK S. Pal et al.
XQuery “order by” and “where” • Order by: • Sorts rows based on order-by expression • Adds a ranking column to these rows • Ranking column converted into OrdPath values • Yield the new order of the rows • Fits rest of query processing framework • Where • Becomes SELECT on input sequence • Filters rows satisfying specified condition S. Pal et al.
XQuery “return” • Return nodes sequence in document order • Use OrdPath values and • XML_Serialize operator • New element and sequence constructions • Merge constructed and existing nodes into a single sequence (SWITCH_UNION) S. Pal et al.
XQuery Functions & Operators • Built-in fn and op are mapped to relational fn and op if possible • fn:count() count() • Additional support for XQuery types, functions and operators that cannot be mapped directly • Intrinsics S. Pal et al.
Optimizations • Exploiting Ordered Sets • Sorting information (OrdPath) made available to further relational operators • XML_Serialize is an example • Using static type information • Eliminates CONVERT() in operations • Allows range scan on VALUE index S. Pal et al.
Conclusions • Built-up infrastructure for query processing framework • Other XQuery features (such as “let” and typeswitch) can be implemented • Data modification language • Fits into relational query processing framework • XQuery features can be implemented using rel++ operators • Optimizations pose the biggest challenges • More cost-based optimizations can be done • Enhanced costing model (e.g. choice of PXI) • Matching materialized views S. Pal et al.
Thank you! S. Pal et al.