Storing and Querying Ordered XML Using Relational Database System

Presentation Transcript


  1. Storing and Querying Ordered XML Using Relational Database System Swapna Dhayagude

  2. Agenda • Ordered XML Data Model • Order Encoding Methods • Shredding Ordered XML into Relations • Translating XML queries to SQL • Performance Evaluation

  3. Ordered XML Data Model • XML document as a tree structure - Relation as the ‘root’ - Nodes represent elements - Leaf nodes hold data values • Document Type Definition (DTD) - schema information about the XML document • Order - a salient feature of an XML document

  4. Significance of order in XML • Order is important for reconstructing XML documents • Order is needed to ensure a lossless mapping from XML to RDB • Performance issues - The choice of order encoding dramatically affects performance - Order enables efficient translation of XML queries into SQL • Order-based functionality of XPath and XQuery - XPath: a simple ‘path-based’ query language - XQuery: a more complex query language built on XPath

  5. Three dimensions of XML order • Evaluation of order-based axes: XPath expressions that require document order, e.g. the preceding and following axes • Inter-element order: the result set must present its elements in document order • Intra-element order: the order of subelements within an element is needed for reconstruction

  6. Agenda • Ordered XML Data Model • Order Encoding Methods • Shredding Ordered XML into Relations • Translating XML queries to SQL • Performance Evaluation

  7. How is order encoded? • Order is preserved using a simple numbering scheme • Each node is identified by a node_id • The node_id is stored as a data value within the relation • The numbering scheme captures enough information to reconstruct the XML document

  8. Order-Based Functionality of XPath • XPath evaluates a path step by step, sequentially • Each step is applied to a single node (the context node) • The result of each step is a set of nodes {node_1, node_2, …, node_n} • XPath syntax: Path ::= /Step1/Step2/…/StepN, where each step is defined as Step ::= Axis::NodeTest Predicate* • The axis selects a direction of navigation, e.g. child::title selects all children of the context node that are title elements
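The step-by-step evaluation described above can be sketched in Python with the standard `xml.etree.ElementTree` module. This is a minimal illustration, not an XPath engine: only the `child` and `descendant` axes are modelled, predicates are omitted, and the document, the `#document` wrapper node and the helper names `eval_step`/`eval_path` are hypothetical.

```python
import xml.etree.ElementTree as ET

# A tiny made-up document in the spirit of the Shakespeare plays dataset.
PLAY = ET.fromstring(
    "<play><title>Hamlet</title>"
    "<act><title>Act I</title><scene/></act>"
    "<act><title>Act II</title></act></play>"
)
# Hypothetical virtual document node, so the first step (child::play) has a context node.
DOCUMENT = ET.Element("#document")
DOCUMENT.append(PLAY)

# Only two axes are sketched; both return nodes in document order.
AXES = {
    "child": lambda node: list(node),
    "descendant": lambda node: list(node.iter())[1:],  # iter() yields the node itself first
}

def eval_step(context_nodes, axis, node_test):
    """Apply one Axis::NodeTest step to each context node and union the results."""
    result = []
    for ctx in context_nodes:
        for node in AXES[axis](ctx):
            if node_test in ("*", node.tag) and not any(node is r for r in result):
                result.append(node)
    return result

def eval_path(steps, root=DOCUMENT):
    """Evaluate /Step1/.../StepN left to right; each step's output feeds the next step."""
    nodes = [root]
    for axis, node_test in steps:
        nodes = eval_step(nodes, axis, node_test)
    return nodes
```

For example, `/play/act/title` becomes `eval_path([("child", "play"), ("child", "act"), ("child", "title")])`, which returns the two act titles.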

  9. Order-Based Functionality of XPath: Axes specify the direction of navigation in an XML document • Up: parent, ancestor • Down: child, descendant • Left: preceding, preceding-sibling • Right: following, following-sibling

  10. Order-Based Functionality of XQuery • BEFORE operator - returns nodes from the first sequence that are before some node in the second sequence • AFTER operator - returns nodes from the first sequence that are after some node in the second sequence • XQuery supports range predicates, which select a range of elements from a sequence, e.g. /play/act[2 TO 4] returns act #2, act #3 and act #4 in document order
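When nodes carry global positions, BEFORE, AFTER and range predicates reduce to integer comparisons and a sort. The sketch below assumes each node is represented simply by its global position; the function names and the sample positions are hypothetical.

```python
# Sketch: each node is identified by its global document position, so the
# order-based XQuery operators reduce to integer comparisons.

def before(seq1, seq2):
    """Nodes of seq1 that precede at least one node of seq2, in document order."""
    return sorted(n for n in seq1 if any(n < m for m in seq2))

def after(seq1, seq2):
    """Nodes of seq1 that follow at least one node of seq2, in document order."""
    return sorted(n for n in seq1 if any(n > m for m in seq2))

def range_predicate(seq, first, last):
    """Positional range [first TO last] (1-based, inclusive) over seq in document order."""
    return sorted(seq)[first - 1:last]

# Hypothetical global positions of five <act> elements, listed out of order.
acts = [40, 10, 50, 20, 30]
print(range_predicate(acts, 2, 4))  # [20, 30, 40]
```

Sorting by position is what puts the range into document order, which is why a global numbering makes these operators cheap.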

  11. Global Order Encoding Methods • Global Order Encoding • Absolute positioning of nodes • Best performance on queries: query evaluation requires only a simple comparison between node positions • Worst performance on updates, especially inserts • Example tree: play(1) has children title(2), act(4) and act(8); title(2) has child text#(3); act(4) has children title(5) and scene(7); title(5) has child text#(6)
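A minimal sketch of global order encoding, assuming dense preorder numbering plus an `end_desc` value (the id of a node's last descendant, as in the Global Edge table later in the talk). The `Node` class and helper names are hypothetical; the tree is the 8-node example on this slide.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)
    id: int = 0        # global (preorder) position
    end_desc: int = 0  # id of the node's last descendant; equals id for a leaf

def number_global(root):
    """Assign dense preorder ids and end_desc values to every node."""
    counter = 0
    def visit(node):
        nonlocal counter
        counter += 1
        node.id = counter
        for child in node.children:
            visit(child)
        node.end_desc = counter  # last id handed out inside this subtree
    visit(root)
    return root

def preorder(node):
    yield node
    for child in node.children:
        yield from preorder(child)

def following(nodes, ctx):
    # Everything numbered after ctx's subtree ends.
    return [n.id for n in nodes if n.id > ctx.end_desc]

def preceding(nodes, ctx):
    # Subtrees that end before ctx starts; this excludes ctx's ancestors.
    return [n.id for n in nodes if n.end_desc < ctx.id]

# The example tree, numbered 1..8 in preorder.
act1 = Node("act", [Node("title", [Node("#text")]), Node("scene")])
play = number_global(Node("play", [Node("title", [Node("#text")]), act1, Node("act")]))
nodes = list(preorder(play))
```

Both axes are single comparisons against the context node, which is the query advantage the slide describes; the cost is paid on inserts, when ids have to shift.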

  12. Global Order Encoding (contd) • Sparse numbering (leaving gaps between consecutive positions) is used for Global Order Encoding • Sparse numbering reduces the cost of renumbering on inserts and updates, and therefore improves update performance • Global order makes intra-element and inter-element ordering easy, since global document order is directly available • Drawback: it still performs poorly on inserts once the gaps are used up (Local Order offers better performance for inserts/updates)
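Sparse numbering can be sketched over a flat list of positions. This sketch makes two assumptions: a gap of 100 (`GAP`, hypothetical) and a simple policy of renumbering every node once no free slot remains. Real systems may renumber more selectively.

```python
GAP = 100  # hypothetical spacing between initial positions

def sparse_number(count):
    """Initial sparse numbering: 100, 200, 300, ..."""
    return [GAP * (i + 1) for i in range(count)]

def insert_at(positions, index):
    """Give a new node a position between positions[index - 1] and positions[index].

    Returns (new_positions, renumbered), where renumbered counts existing nodes
    whose position had to change.
    """
    low = positions[index - 1] if index > 0 else 0
    high = positions[index] if index < len(positions) else low + 2 * GAP
    if high - low > 1:
        # A free slot exists: take the midpoint; no other node moves.
        return positions[:index] + [(low + high) // 2] + positions[index:], 0
    # The gap is exhausted: renumber all nodes with fresh gaps (simplest policy).
    fresh = sparse_number(len(positions) + 1)
    kept = fresh[:index] + fresh[index + 1:]
    return fresh, sum(old != new for old, new in zip(positions, kept))
```

Repeatedly inserting between 100 and 200 yields 150, 125, 112, 106, 103 and 101 at no renumbering cost; the seventh insert finds no free slot and triggers a full renumbering.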

  13. Global Order Renumbering Scenario • Inserting a new element into an existing document causes many nodes to be renumbered • In the figure, the highlighted nodes must be renumbered; global order requires the most renumbering of the three schemes [Figure: example tree (play, title, act, text#, scene) with a new element inserted]

  14. Local Order Encoding Methods • Local Order Encoding • Relative positioning of nodes • Best performance on updates • Worst performance on queries • Example tree: play(1) has children title(1), act(2) and act(3); title(1) has child text(1); act(2) has children title(1) and scene(2); that title(1) has child text(1)

  15. Local Order Encoding (continued) How does local order encoding recover a node's absolute position? • The relative position of a node is combined with the relative positions of its ancestors • The result is a vector that uniquely identifies the node's absolute position within the document: (relative positions of the ancestors, root first) followed by (relative position of the node) = (absolute position of the node in the document)
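The vector construction can be sketched as follows, assuming Local Order rows of the form `id -> (parent_id, sIndex)`. The row names are hypothetical and the tree is the one from the previous slide. Python tuples compare lexicographically and a prefix sorts first, so sorting by the vector yields document order.

```python
# Hypothetical Local Order rows: id -> (parent_id, sIndex). The ids are arbitrary
# names; only sIndex (position among siblings) carries order information.
LOCAL = {
    "play": (None, 1),
    "title": ("play", 1),
    "title_text": ("title", 1),
    "act1": ("play", 2),
    "act1_title": ("act1", 1),
    "act1_title_text": ("act1_title", 1),
    "act1_scene": ("act1", 2),
    "act2": ("play", 3),
}

def order_vector(node, rows=LOCAL):
    """Walk up to the root collecting sIndex values, then reverse (root first)."""
    vector = []
    while node is not None:
        parent, s_index = rows[node]
        vector.append(s_index)
        node = parent
    return tuple(reversed(vector))

def document_order(rows=LOCAL):
    # Lexicographic tuple order (prefix first) coincides with document order.
    return sorted(rows, key=lambda n: order_vector(n, rows))
```

Each comparison requires a walk to the root, which corresponds to one join per level in SQL. This is why queries are slow under local order. The vector is exactly what Dewey order stores explicitly.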

  16. Local Order Renumbering Scenario • Unlike Global Order Encoding, Local Order requires only a minimal number of nodes to be renumbered • This is a major advantage, since it dramatically reduces the cost of inserts [Figure: the same tree with a new element inserted, numbered in local order]

  17. Local Order Encoding (continued) • Incurs low overhead on updates • Only the following siblings of a new node may require renumbering • Drawback: the lack of global order information makes evaluating the following and preceding axes complex

  18. Dewey Order Encoding Methods • Dewey Order Encoding • Strikes a balance between Global and Local • Reasonable performance on both updates and queries • Example tree: play(1) has children title(1.1), act(1.2) and act(1.3); title(1.1) has child text(1.1.1); act(1.2) has children title(1.2.1) and scene(1.2.2); title(1.2.1) has child text(1.2.1.1)

  19. Dewey Order Encoding • Each Dewey path uniquely identifies the absolute position of a node in a document • Query processing is similar to that of Global order • Only the following siblings of a new node, and their descendants, may require renumbering • Drawback: extra space is required to store the path from the root to each node
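A minimal sketch of Dewey-label reasoning, assuming labels are held as Python tuples (a stand-in for the stored byte strings). Ancestry is a prefix test, and tuple comparison gives document order. The label set and helper names are hypothetical.

```python
# Dewey labels of a small example tree, as tuples.
DEWEY = {
    (1,): "play", (1, 1): "title", (1, 1, 1): "#text",
    (1, 2): "act", (1, 2, 1): "title", (1, 2, 1, 1): "#text", (1, 2, 2): "scene",
    (1, 3): "act",
}

def is_ancestor(a, d):
    """a is a proper ancestor of d iff a is a proper prefix of d."""
    return len(a) < len(d) and d[:len(a)] == a

def parent(label):
    return label[:-1]

def following(ctx, labels=DEWEY):
    """After ctx in document order (tuple comparison) and not inside ctx's subtree."""
    return sorted(n for n in labels if n > ctx and not is_ancestor(ctx, n))

def following_sibling(ctx, labels=DEWEY):
    return sorted(n for n in labels
                  if len(n) == len(ctx) and parent(n) == parent(ctx) and n > ctx)
```

As with global order, no recursion is needed: every axis test looks only at the two labels involved.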

  20. Dewey Order Renumbering Scenario • Dewey order requires more renumbering than Local Encoding, but much less than Global Encoding [Figure: the same tree (play, title, act, text#, scene) with a new element inserted, numbered in Dewey order]
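The three renumbering scenarios can be compared with a small experiment. The sketch inserts one node into an example tree and counts how many existing nodes get a different label under each scheme. It assumes dense global numbering with no sparse gaps, and the tree and helper names are hypothetical.

```python
SCHEMES = ("global", "local", "dewey")

def make_tree():
    # [tag, children]; tags are unique so labels can be matched before/after.
    return ["play", [
        ["title", [["title/#text", []]]],
        ["act1", [["act1/title", [["act1/title/#text", []]]], ["act1/scene", []]]],
        ["act2", []],
    ]]

def labels(tree):
    """Map tag -> (dense global preorder id, local sIndex, Dewey label)."""
    out, counter = {}, 0
    def visit(node, s_index, dewey):
        nonlocal counter
        counter += 1
        tag, children = node
        out[tag] = (counter, s_index, dewey)
        for i, child in enumerate(children, start=1):
            visit(child, i, dewey + (i,))
    visit(tree, 1, (1,))
    return out

def find(node, tag):
    if node[0] == tag:
        return node
    for child in node[1]:
        hit = find(child, tag)
        if hit is not None:
            return hit
    return None

def renumbering_cost(parent_tag, index):
    """Insert a new node as child number index (0-based) of parent_tag and count,
    per scheme, how many existing nodes receive a different label."""
    before = labels(make_tree())
    tree = make_tree()
    find(tree, parent_tag)[1].insert(index, ["new", []])
    after = labels(tree)
    cost = dict.fromkeys(SCHEMES, 0)
    for tag, old in before.items():
        for scheme, old_label, new_label in zip(SCHEMES, old, after[tag]):
            cost[scheme] += old_label != new_label
    return cost
```

Inserting a new first child of `act1` renumbers 4 nodes under global order (everything after it), 3 under Dewey (the following siblings and their descendants) and 2 under local order (the following siblings only). This reproduces the ranking on slides 13, 16 and 20.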

  21. Agenda • Ordered XML Data Model • Order Encoding Methods • Shredding Ordered XML into Relations • Translating XML queries to SQL • Performance Evaluation

  22. Shredding XML into Relations • Schema-less case: the schema of the input XML documents is unknown - Edge approach: each document is stored in a single table • Schema-aware case: the schema of the input XML documents is available - Inlining: a child that occurs only once is stored within its parent's relation; a child that can occur multiple times is stored in a new relation

  23. Inlining • Inlining is an effective way of storing and querying XML when the document schema is available • Inlining adapts to Global, Local and Dewey order • Every relation requires an additional column to encode document order • Storing order information for ‘inlined’ elements is unnecessary: an inlined element's position is determined by the position of its parent and by the document schema
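The sketch below shows inlining using Python's built-in `sqlite3`. The mapping is hypothetical: a play has one title, which is inlined into `play`, and many acts, which get their own relation with an order column `pos`. Each act's single title is inlined and needs no order column. Here `pos` holds local positions among sibling acts; a global or Dewey column would play the same role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema-aware mapping for plays and acts.
    CREATE TABLE play (id INTEGER PRIMARY KEY, title TEXT);   -- title inlined
    CREATE TABLE act (
        id      INTEGER PRIMARY KEY,
        play_id INTEGER REFERENCES play(id),
        pos     INTEGER,   -- order column: position among the play's acts
        title   TEXT       -- inlined child: no order column needed
    );
""")
conn.execute("INSERT INTO play VALUES (1, 'Hamlet')")
conn.executemany(
    "INSERT INTO act (play_id, pos, title) VALUES (1, ?, ?)",
    [(i, f"Act {numeral}") for i, numeral in enumerate(["I", "II", "III", "IV", "V"], start=1)],
)

# /play/act[2 TO 4]/title becomes a range predicate on the order column.
rows = conn.execute(
    "SELECT a.title FROM play p JOIN act a ON a.play_id = p.id "
    "WHERE a.pos BETWEEN 2 AND 4 ORDER BY a.pos"
).fetchall()
print(rows)  # [('Act II',), ('Act III',), ('Act IV',)]
```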

  24. Storing Order Information: Schema-less Case, the Edge Approach • The document is stored as a single table • Each tuple in the table represents a node • Edge(id, parent_id, name, value) - id: the primary key - parent_id: a foreign key linking the node to its parent - name: the element's tag name - value: the element's text value
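The Edge approach can be sketched with `xml.etree.ElementTree` and `sqlite3`. This is a minimal illustration with two simplifications. Ids are assigned in preorder. An element's text is stored in its `value` column rather than as a separate text node. The helper `shred_edge` and the sample document are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def shred_edge(xml_text):
    """Store every element as one tuple of Edge(id, parent_id, name, value)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Edge (id INTEGER PRIMARY KEY, "
        "parent_id INTEGER REFERENCES Edge(id), name TEXT, value TEXT)"
    )
    next_id = 0

    def visit(elem, parent_id):
        nonlocal next_id
        next_id += 1
        my_id = next_id
        conn.execute("INSERT INTO Edge VALUES (?, ?, ?, ?)",
                     (my_id, parent_id, elem.tag, elem.text))
        for child in elem:
            visit(child, my_id)

    visit(ET.fromstring(xml_text), None)
    return conn

conn = shred_edge("<play><title>Hamlet</title>"
                  "<act><title>Act I</title><scene/></act><act/></play>")

# /play/act/title translated as one self-join per step.
rows = conn.execute("""
    SELECT t.value
    FROM Edge p
    JOIN Edge a ON a.parent_id = p.id AND a.name = 'act'
    JOIN Edge t ON t.parent_id = a.id AND t.name = 'title'
    WHERE p.parent_id IS NULL AND p.name = 'play'
    ORDER BY t.id
""").fetchall()
```

Each path step adds one self-join on `parent_id`, which is how child steps translate to SQL over a single Edge table.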

  25. Storing Order Information: Schema-less Case • The Edge approach adapts differently to Global, Local and Dewey order • Global Order: Edge(id, parent_id, end_desc_id, path_id, value), where end_desc_id is the id of the node's last descendant • Local Order: Edge(id, parent_id, sIndex, path_id, value), where sIndex is the node's sibling index • Dewey Order: Edge(dewey, path_id, value), where dewey represents both order and ancestor information

  26. Agenda • Ordered XML Data Model • Order Encoding Methods • Shredding Ordered XML into Relations • Translating XML queries to SQL • Performance Evaluation

  27. Query Translation for Global Order: Edge(id, parent_id, end_desc_id, path_id, value) • following: select nodes whose id > end_desc_id of the context node • preceding: select nodes whose end_desc_id < id of the context node (this excludes the context node's ancestors) • following-sibling: select nodes with id > id of the context node AND parent_id = parent_id of the context node • preceding-sibling: select nodes with id < id of the context node AND parent_id = parent_id of the context node Note: the expressions above are NOT actual SQL statements
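The translation rules above can be written as actual SQL against an SQLite Global Edge table. This is a sketch under stated assumptions. A plain `name` column stands in for the slide's `path_id`. The ids are dense preorder numbers. The helpers `shred_global` and `axis` are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def shred_global(xml_text):
    """Edge(id, parent_id, end_desc_id, name, value) with preorder ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Edge (id INTEGER PRIMARY KEY, parent_id INTEGER, "
                 "end_desc_id INTEGER, name TEXT, value TEXT)")
    rows, next_id = [], 0
    def visit(elem, parent_id):
        nonlocal next_id
        next_id += 1
        my_id = next_id
        for child in elem:
            visit(child, my_id)
        # After the children are numbered, next_id is the id of the last descendant.
        rows.append((my_id, parent_id, next_id, elem.tag, elem.text))
    visit(ET.fromstring(xml_text), None)
    conn.executemany("INSERT INTO Edge VALUES (?, ?, ?, ?, ?)", rows)
    return conn

# One WHERE condition per axis; c is the context node, n a candidate result node.
AXIS_SQL = {
    "following":         "n.id > c.end_desc_id",
    "preceding":         "n.end_desc_id < c.id",
    "following-sibling": "n.parent_id = c.parent_id AND n.id > c.id",
    "preceding-sibling": "n.parent_id = c.parent_id AND n.id < c.id",
}

def axis(conn, name, ctx_id):
    sql = ("SELECT n.id FROM Edge n JOIN Edge c ON c.id = ? "
           f"WHERE {AXIS_SQL[name]} ORDER BY n.id")  # ORDER BY id = document order
    return [row[0] for row in conn.execute(sql, (ctx_id,))]

conn = shred_global("<play><title>Hamlet</title>"
                    "<act><title>Act I</title><scene/></act><act/></play>")
```

Here the first `act` has id 3, so `axis(conn, "following", 3)` returns `[6]`, the second act. Because ids are global, `ORDER BY n.id` also returns results in document order at no extra cost.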

  28. Query Translation for Local Order: Edge(id, parent_id, sIndex, path_id, value) • following-sibling/preceding-sibling: translation is similar to Global and Dewey order (compare sIndex among nodes with the same parent_id) • following/preceding: a complex task - Compute the context node and all its ancestors, {anc} - Compute the following siblings of the nodes in {anc}, {anc_sib} - Compute {anc_sib} together with all their descendants • Challenges: without knowledge of the XML schema, retrieving ancestors/descendants is a complex task that involves recursion
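The three steps for the following axis map naturally onto a recursive common table expression, sketched here in SQLite. The rows are hypothetical: ids are deliberately arbitrary, and only `sIndex` encodes order. The query returns the following nodes as a set; putting them into document order would still require sorting by sIndex vectors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Edge (id INTEGER PRIMARY KEY, parent_id INTEGER, "
             "sIndex INTEGER, name TEXT)")
conn.executemany("INSERT INTO Edge VALUES (?, ?, ?, ?)", [
    (10, None, 1, "play"),
    (20, 10, 1, "title"),
    (30, 10, 2, "act"), (40, 30, 1, "title"), (50, 30, 2, "scene"),
    (60, 10, 3, "act"), (70, 60, 1, "scene"),
])

FOLLOWING_SQL = """
WITH RECURSIVE
  anc(id, parent_id, sIndex) AS (        -- the context node and all its ancestors
    SELECT id, parent_id, sIndex FROM Edge WHERE id = :ctx
    UNION ALL
    SELECT e.id, e.parent_id, e.sIndex FROM Edge e JOIN anc a ON e.id = a.parent_id
  ),
  anc_sib(id) AS (                       -- their following siblings
    SELECT e.id FROM Edge e JOIN anc a
      ON e.parent_id = a.parent_id AND e.sIndex > a.sIndex
  ),
  dsc(id) AS (                           -- those siblings plus all their descendants
    SELECT id FROM anc_sib
    UNION ALL
    SELECT e.id FROM Edge e JOIN dsc d ON e.parent_id = d.id
  )
SELECT id FROM dsc
"""

def following(ctx_id):
    # Returned as a set: document order would need a further sort on sIndex vectors.
    return {row[0] for row in conn.execute(FOLLOWING_SQL, {"ctx": ctx_id})}
```

For the title of the first act (id 40), this returns the scene 50, the second act 60 and its scene 70. Two recursions are needed, one upward and one downward, which is the complexity the slide warns about.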

  29. Query Translation for Dewey Order: Edge(dewey, path_id, value) • The dewey column is stored as a variable-length byte string • It replaces parent_id and end_desc_id of the Global Edge table, since parent and descendant information is encoded in the Dewey path itself • Drawback: storage overhead due to the large number of bytes allocated to each component
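Storing Dewey paths as byte strings lets SQL answer axis queries with plain byte comparisons. The sketch simplifies the slide's variable-length encoding to a fixed 2-byte big-endian encoding per component (an assumption; components must be below 0xFFFF). SQLite compares BLOBs with `memcmp` and sorts a prefix first, so byte order equals document order. Descendants then form a contiguous key range.

```python
import sqlite3
import struct

def encode(label):
    """Hypothetical fixed-width encoding: each component as 2 big-endian bytes."""
    return b"".join(struct.pack(">H", c) for c in label)

def decode(blob):
    return struct.unpack(f">{len(blob) // 2}H", blob)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Edge (dewey BLOB PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO Edge VALUES (?, ?)", [
    (encode(label), name) for label, name in [
        ((1,), "play"), ((1, 1), "title"), ((1, 2), "act"),
        ((1, 2, 1), "title"), ((1, 2, 2), "scene"), ((1, 3), "act"),
    ]
])

def descendants(label):
    # Descendants lie strictly between the label and label + 0xFFFF: a range scan.
    low = encode(label)
    return [decode(r[0]) for r in conn.execute(
        "SELECT dewey FROM Edge WHERE dewey > ? AND dewey < ? ORDER BY dewey",
        (low, low + b"\xff\xff"))]

def following(label):
    # Following nodes sort after the whole subtree of label.
    return [decode(r[0]) for r in conn.execute(
        "SELECT dewey FROM Edge WHERE dewey > ? ORDER BY dewey",
        (encode(label) + b"\xff\xff",))]
```

Both queries are range predicates on the primary key, so an index on `dewey` can serve them directly. This is the sense in which Dewey query processing resembles global order.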

  30. Query Translation in Inlining • Essentially uses the same algorithm as the Edge approach, with two differences • XML data can be spread across several tables, so evaluating an axis may require accessing multiple tables instead of a single Edge table • The translation algorithm does not use recursion, since the schema contains sufficient information about the depth and position of nodes • Drawback: data is partitioned across many tables, which can become too many to handle

  31. Agenda • Ordered XML Data Model • Order Encoding Methods • Shredding Ordered XML into Relations • Translating XML queries to SQL • Performance Evaluation

  32. Storage Requirements Table 1: Storage requirements of the Global, Local and Dewey encoding methods

  33. Performance All experiments are based on the Shakespeare plays dataset. Table 2: Test Queries

  34. Select and Reconstruct Modes XPath queries are run in two different modes • Select mode: the result set contains only the IDs of the nodes satisfying the XPath expression • Reconstruct mode: entire XML fragments are extracted from the database in document order
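Reconstruct mode can be sketched for a global-order Edge table. Sorting the tuples by global id, the Python equivalent of `ORDER BY id`, visits nodes in document order. Each parent therefore exists before its children, and children are appended in their original order. The rows and the helper `reconstruct` are hypothetical; text is assumed to sit in the `value` column.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows of a global-order Edge table: (id, parent_id, name, value).
ROWS = [
    (3, 1, "act", None),
    (1, None, "play", None),
    (5, 3, "scene", None),
    (2, 1, "title", "Hamlet"),
    (4, 3, "title", "Act I"),
]

def reconstruct(rows):
    """Rebuild an XML fragment from Edge tuples processed in document order."""
    elements, root = {}, None
    for node_id, parent_id, name, value in sorted(rows):  # sorted by global id
        elem = ET.Element(name)
        elem.text = value
        if parent_id is None:
            root = elem
        else:
            elements[parent_id].append(elem)
        elements[node_id] = elem
    return ET.tostring(root, encoding="unicode")

print(reconstruct(ROWS))
# <play><title>Hamlet</title><act><title>Act I</title><scene /></act></play>
```

Select mode would stop at the list of ids. Reconstruct mode additionally needs the full sorted scan, which is where the optimizer's plan choice matters.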

  35. Ordered Selection: Edge Results [Figure: execution time in seconds (y-axis) for each test query (x-axis)]

  36. Inlining Results

  37. Reconstruction • In reconstruct mode, XML documents must be extracted from the database in document order • The optimizer's inability to pick the best plan led to poor results • Hand-‘tuned’ SQL queries, on the other hand, yielded better results Note: queries Q3, Q4, Q5 and Q9 performed so poorly that their times fall far outside the plotted scale

  38. Performance Results Based on Experiments • Global order is the most efficient order encoding method • Dewey order follows, with the second-best performance • Local order frequently requires sorting, which degrades overall performance • Inlining typically performs better than Edge • In general, the overhead of parsing XML documents exceeded that of XPath processing

  39. Performance Conclusions Based on Results • An RDBMS can efficiently support ordered XML • Global order is best for query workloads • Dewey order is slightly less efficient than Global order for queries, but best for a mix of queries and updates • Schema information makes Local order a viable alternative • Relational optimizers do not understand the hierarchical structure of XML

  40. Acknowledgements… Prof. Elke Rundensteiner Thank You …
