1 / 79

MonetDB/XQuery: Using a Relational DBMS for XML

Peter Boncz CWI The Netherlands. MonetDB/XQuery: Using a Relational DBMS for XML. Peter Boncz. Pathfinder - MonetDB/XQuery. TU Delft 10-5-2005. Outline. Basic XML / XQuery Introduction of Pathfinder and MonetDB projects Relational XQuery XPath steps in the pre/post plane

Download Presentation

MonetDB/XQuery: Using a Relational DBMS for XML

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Peter Boncz CWI The Netherlands MonetDB/XQuery: Using a Relational DBMS for XML

  2. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 Outline • Basic XML / XQuery • Introduction of Pathfinder and MonetDB projects • Relational XQuery • XPath steps in the pre/post plane • Translating for-loops, and beyond • Optimizations • Order prevention • Loop-Lifted Staircase join • Join recognition • Outlook • Conclusions

  3. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 Outline • Basic XML / XQuery • Introduction of Pathfinder and MonetDB projects • Relational XQuery • XPath steps in the pre/post plane • Translating for-loops, and beyond • Optimizations • Order prevention • Loop-Lifted Staircase join • Join recognition • Outlook • Conclusions

  4. XML • Standard, flexible syntax for data exchange • Regular, structured data Database content of all kinds: Inventory, billing, orders, … “Small” typed values • Irregular, unstructured text Documents of all kinds: Transcripts, books, legal briefs, … “Large” untyped values • Lingua franca of B2B Applications… • Increase access to products & services • Integrate disparate data sources • Automate business processes • … and numerous other application domains • Bio-informatics, library science, …

  5. XML : A First Look • XML document describing catalog of books <?xml version="1.0" encoding="ISO-8859-1" ?> <catalog> <book isbn="ISBN 1565114302"> <title>No Such Thing as a Bad Day</title> <author>Hamilton Jordan</author> <publisher>Longstreet Press, Inc.</publisher> <price currency="USD">17.60</price> <review> <reviewer>Publisher</reviewer>: This book is the moving account of one man's successful battles against three cancers ...<title>No Such Thing as a Bad Day</title> is warmly recommended. </review> </book> <!-- more books and specifications --> </catalog>

  6. XQuery 1.0 • Functional, strongly-typed query language • XQuery 1.0 = XPath 2.0 for navigation, selection, extraction + A few more expressions For-Let-Where-Order By-Return (FLWOR) XML construction Operators on types + User-defined functions & modules + Strong typing

  7. XSLT vs. XQuery • XSLT 1.0: XML  XML, HTML, Text • Loosely-typed scripting language • Format XML in HTML for display in browser • Must be highly tolerant of variability/errors in data • XQuery 1.0: XML  XML • Strongly-typed query language • Large-scale database access • Must guarantee safety/correctness of operations on data • Over time, XSLT & XQuery may both serve needs of many application domains • XQuery will become a hidden, commodity language

  8. Navigation, Selection, Extraction • Titles of all books published by Longstreet Press $cat/catalog/book[publisher=“Longstreet Press”]/title <title>No Such Thing As A Bad Day</title> • Publications with Jerome Simeon as author or editor • $cat//*[(author|editor) = “Jerome Simeon”] <book><title>XQuery from the Experts</title>…</book> <spec><title>XQuery Formal Semantics</title>…</spec>

  9. Transformation & Construction • First author & title of books published by A/W for $b in $cat//book[publisher = “Addison Wesley”] return <awbook> { $b/author[1], $b/title } </awbook> <awbook> <author>Don Chamberlin</author> <title>XQuery from the Experts</title> </awbook>

  10. Sequences & Iteration • Sequence constructor Return all books followed by all W3C specifications ($cat/catalog/book, $cat/catalog/W3Cspec) • XPath Expression Return all books & W3C specifications in doc order $cat/catalog/(book|W3Cspec) • For Expression • Similar to map : apply function to each item in sequence Return number of authors in each book for $b in $cat/catalog/book return fn:count($b/authors) => (3,1,2,…)

  11. Conditional & Quantified • Conditional if //show[year >= 2000] then “A-OK!” else “Error!” • Existential quantification • Implicit meaning of predicate expressions //show[year >= 2000] • Explicit expression: //show[some $y in ./year satisfies $y >= 2000] • Universal quantification //show[every $y in year satisfies $y >= 2000]

  12. Putting It Together • For each author, return number of books and receipts books published in past 2 years, ordered by name let $cat := fn:doc(“www.bn.com/catalog.xml“), Join $sales := fn:doc(“www.publishersweekly.com/sales.xml“) for $author in distinct-values($cat//author) Grouping let $books := $cat//book[@year >= 2000 and author = $a], S.J. $receipts := $sales/book[@isbn = $books/@isbn]/receipts order by $author Ordering return <sales> XML Construction { $author } <count> { fn:count($books) } </count> Aggregation <total> { fn:sum($receipts) } </total> </sales>

  13. Recursive Processing • Recursive functions support recursive data <part id=“001”> <partCt count=“2” id=“001”> <part id=“002”> <partCt count=“1” id=“002”/> <part id=“003”/> => <partCt count=“0” id=“003”/> </part> </partCt> <part id=“004”/> <partCt count=“0” id=“004”/> </part> </partCt> declare function partCount($p as element(part)) as element(partCt) { <partCt count=“{ count($p/part) }”> { $p1/@id, for $p2 in $p/part return partCount($p2) } </partCt> }

  14. XML Schema Languages • Many variants… • DTDs, XML Schema, RELAX-N/G, XDuce • … with similar goals to define • Types of literal (terminal) data • Names of elements & attribute • XQuery designed to support (all of) XML Schema • Structural & name constraints over types • Regular tree expressions over elements, attributes, atomic types

  15. TeXQuery : Full-text extensions • Text search & querying of structured content • Limited support in XQuery 1.0 • String operators with collation sequences $cat//book[contains(review/text(), “two thumbs up”)] • Stop words, proximity searching, ranking Ex: “Tony Blair” within two words of “George Bush” • Phrases that span tags and annotations Ex: Match “Mr. English sponsored the bill” in <sponsor> Mr. English </sponsor> <footnote> for himself and <co-sponsor> Mr.Coyne </co-sponsor> </footnote> sponsored the bill in the <committee-name> Committee for Financial Services </committee-name>

  16. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 Outline • Basic XML / XQuery • Introduction of Pathfinder and MonetDB projects • Relational XQuery • XPath steps in the pre/post plane • Translating for-loops, and beyond • Optimizations • Order prevention • Loop-Lifted Staircase join • Join recognition • Outlook • Conclusions

  17. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 Outline • Basic XML / XQuery • Introduction of Pathfinder and MonetDB projects • Relational XQuery • XPath steps in the pre/post plane • Translating for-loops, and beyond • Optimizations • Order prevention • Loop-Lifted Staircase join • Join recognition • Outlook • Conclusions

  18. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 XQuery Systems: 2 Approaches • Tree-based • Tree is basic data structure • Also on disk (if an XQuery DBMS) • Navigational Approach • Galax [Simeon..], Flux [Koch..], X-Hive • Tree Algebra Approach • TIMBER [Jagadish..] • Relational • Data shredded in relational tables • XQuery translated into database query (e.g. SQL)

  19. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 The Pathfinder Project • Challenge / Goal: • Turn RDBMSs into efficient XQuery engines • People: • Maurice van Keulen • University of Twente • Torsten Grust, Jens Teubner • University of Konstanz • Jan Rittinger • University of Konstanz & CWI

  20. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 The Pathfinder Project • Challenge / Goal: • Turn RDBMSs into efficient XQuery engines • People: • Maurice van Keulen • University of Twente • Torsten Grust, Jens Teubner • University of Konstanz • Jan Rittinger • University of Konstanz & CWI • Task: generate code for MonetDB

  21. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 MonetDB: Applied CS Research at CWI • a decade of “query-intensive” application experience • image retrieval: Peter Bosch  ImageSpotter • audio/video retrieval: Alex van Ballegooij RAM • XML text retrieval: de Vries / Hiemstra TIJAH • biological sequences: Arno Siebes  BRICKS • XML databases: Albrecht Schmidt  XMark • Grust / vKeulen  Pathfinder • GIS: Wilco Quak  MAGNUM • data warehousing / OLAP / data mining • SPSS  DataDistilleries • Univ. Massachussetts  PROXIMITY • CWI research group successfully spun off DataDistilleries (now SPSS)

  22. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 Pathfinder —MonetDB Pathfinder Parser Parser Sem. Analysis Sem. Analysis SQL Core Translation Core Translation Core to MILTranslation MIL (Query Algebra) Typechecking Typechecking Database Database Relational Algebra MonetDB

  23. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 Open Source • MonetDB + Pathfinder on Sourceforge • Mozilla License • Project Homepage • http://monetdb.cwi.nl • Developers website: • http://sf.net/projects/monetdb RoadMap • 14-apr-04: initial Beta release MonetDB/SQL • 30-sep-04: first official release MonetDB/SQL • 30-may-05: beta release of MonetDB/XQuery (i.e. Pathfinder)

  24. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 MonetDB

  25. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 MonetDB Particulars • Column wise fragmentation • BAT: Binary Association Tables [oid,X] • Don’t touch what you don’t need

  26. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 Binary Association Tables (BATs)

  27. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 BAT storage as thin arrays

  28. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 MonetDB Particulars • Column wise fragmentation • BAT: Binary Association Tables [oid,X] • Don’t touch what you don’t need • Void (virtual-oid) columns • Contain dense sequence 0,1,2,3,4,… • Require no space • Positional access (nice for XPath skipping) • pre = void

  29. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 DBMS Architecture

  30. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 Monet: DBMS Microkernel

  31. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 MonetDB: extensible architecture • Front-end/back-end: • support multiple data models • support multiple end-user languages • support diverse application domains

  32. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 MonetDB: extensible architecture Pathfinder XQuery Frontend • Front-end/back-end: • support multiple data models • support multiple end-user languages • support diverse application domains

  33. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 Architecture

  34. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 Outline • Basic XML / XQuery • Introduction of Pathfinder and MonetDB projects • Relational XQuery • XPath steps in the pre/post plane • Translating for-loops, and beyond • Optimizations • Order prevention • Loop-Lifted Staircase join • Join recognition • Outlook • Conclusions

  35. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 Outline • Basic XML / XQuery • Introduction of Pathfinder and MonetDB projects • Relational XQuery • XPath steps in the pre/post plane • Translating for-loops, and beyond • MonetDB Implementation • Data structures • Optimizations • Order prevention • Loop-Lifted Staircase join • Join recognition • Outlook • Conclusions

  36. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 XPath on and RDBMS Node-based relational encoding of XQuery's data model

  37. Tree Knowledge 1: pruning

  38. Tree Knowledge 2: Partitioning

  39. Staircase Join Algorithm

  40. Tree Knowledge 3: Skipping

  41. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 Pre/Post  Pre/Level/Size done for better skipping and updates

  42. Updates • Dense pre-numbers are nice for XPath • Positional skipping in Staircase join! • But how to handle updates?

  43. Updates • Dense pre-numbers are nice for XPath • Positional skipping in Staircase join! • But how to handle updates? Dense Not Dense

  44. Planned Update Solution

  45. Planned Update Solution

  46. Planned Update Solution

  47. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 XPath  XQuery

  48. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 Sequence Representation • sequence = table of items • add pos column for maintaining order • ignore polymorphism for the moment Pos Item 1 10 (10, “x”, <a/>, 10) → 2 “X” 3 pre(a) 4 10

  49. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 For-loops: the iter column

  50. Peter Boncz Pathfinder - MonetDB/XQuery TU Delft 10-5-2005 For-loops: the iter column

More Related