
Introduction to XML and XQuery

Introduction to XML and XQuery, by Guangjun (Kevin) Xie. Road map: XML data model; XML data vs relational data; XPath 2.0; XQuery; processing XQuery. The XML Information Set (Infoset) is an abstract data set containing all the information in an XML document.


Presentation Transcript


  1. Introduction to XML and XQuery
     Guangjun (Kevin) Xie

  2. Road Map
     • XML data model
     • XML data vs relational data
     • XPath 2.0
     • XQuery
     • Processing XQuery
     York University

  3. XML Data Model: XML Information Set (Infoset)
     • An infoset is an abstract data set containing all the information in an XML document
     • It provides a consistent set of definitions for referring to the information in a well-formed XML document
     • Infosets usually result from parsing XML documents, but they can also be synthetic, created
       • through an API such as DOM
       • by transforming an existing infoset
     • An infoset consists of a number of information items

  4. XML Data Model: XML Infoset
     • "Information set" and "information item" are similar in meaning to the generic terms "tree" and "node"
     • An information item is an abstract description of some part of an XML document
     • Each information item has a set of associated named properties, written as [property name]

  5. XML Data Model: Information Items
     • 11 types of information items
       • Document Information Item
       • Element Information Items
       • Attribute Information Items
       • Character Information Items
       • Processing Instruction Information Items
       • Unexpanded Entity Reference Information Items
       • Comment Information Items
       • The Document Type Declaration Information Item
       • Unparsed Entity Information Items
       • Notation Information Items
       • Namespace Information Items
     • We will discuss the first 3 today

  6. XML Data Model: Document Information Item
     • Exactly one document information item in an infoset
     • All other information is accessible through its properties:
       • [children] – the document element item plus any top-level PIs, comments, etc.
       • [document element] – the element item corresponding to the document element
       • [version] – the XML version of the document
       • …

  7. XML Data Model: Element Information Items
     • One element item for each element in the XML document
     • The "root" element item is the value of the [document element] property of the document information item
     • Properties:
       • [namespace name] – the namespace part of the element name
       • [local name] – the local part of the element name
       • [children] – the information items contained directly in this element
       • [attributes] – the attribute items of this element
       • [parent] – the information item containing this item
       • …

  8. XML Data Model: Attribute Information Items
     • One attribute item for each attribute of an XML element
     • Properties:
       • [namespace name] – the namespace part of the attribute name
       • [local name] – the local part of the attribute name
       • [attribute type] – the declared type of this attribute
       • [owner element] – the element information item containing this attribute
       • …

  9. XML Data Model: Infoset Example

     <?xml version="1.0"?>
     <msg:message doc:date="19990421"
                  xmlns:doc="http://doc.example.org/namespaces/doc"
                  xmlns:msg="http://message.example.org/">Phone home!</msg:message>

     The information set contains:
     • A document information item
     • An element information item with namespace name "http://message.example.org/", local part "message", and prefix "msg"
     • An attribute information item with namespace name "http://doc.example.org/namespaces/doc", local part "date", prefix "doc", and normalized value "19990421"
     • Three namespace information items for the http://www.w3.org/XML/1998/namespace, http://doc.example.org/namespaces/doc, and http://message.example.org/ namespaces
     • Two attribute information items for the namespace attributes
     • Eleven character information items for the character data
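     As a rough illustration of the items listed above, the following Python sketch parses the same message with the standard-library `xml.etree.ElementTree` module. ElementTree is not an Infoset API (it discards prefixes and does not expose namespace declarations as attributes), so this only approximates the infoset; the variable names are chosen for illustration.

```python
import xml.etree.ElementTree as ET

MESSAGE = (
    '<?xml version="1.0"?>'
    '<msg:message doc:date="19990421" '
    'xmlns:doc="http://doc.example.org/namespaces/doc" '
    'xmlns:msg="http://message.example.org/">Phone home!</msg:message>'
)

root = ET.fromstring(MESSAGE)

# ElementTree stores a qualified name as "{namespace name}local name".
namespace_name, local_name = root.tag[1:].split("}")

# The doc:date attribute is keyed the same way; the two xmlns:* declarations
# are consumed by the parser and do not appear in root.attrib.
date_value = root.get("{http://doc.example.org/namespaces/doc}date")

# One character information item per character of element content.
character_count = len(root.text)

print(namespace_name, local_name, date_value, character_count)
```

     The count of 11 matches the eleven character information items for "Phone home!".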

  10. XML Data Model: Infoset Example
      [Figure: tree diagram of the infoset for the message example. Legend: document info item, element info item, attribute info item, character info item. Nodes shown: version=1.0, msg:message, doc:date, xmlns:doc, xmlns:msg, and the characters of "Phone home!"]

  11. Road Map
      • XML data model
      • XML data vs relational data
      • XPath 2.0
      • XQuery
      • Processing XQuery

  12. XML Data vs Relational Data
      • Relational databases stem from commercial data processing
        • Information usually has a regular structure
      • XML has its roots in text document processing
        • Documents often have an irregular structure
      • Both are general models capable of representing all forms of information
      • Their different heritages cause them to be optimized for different types of applications

  13. XML Data vs Relational Data: Nesting
      • XML model
        • Deeply nested structure
        • Flexible (not predefined)
        • Queries over nested data are handled easily by the descendant axis in XPath 2.0
      • Relational model
        • Flat table structure
        • Primary/foreign keys represent nesting relationships
        • Complex and flexible nesting may result in awkward queries
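      The contrast can be sketched in Python, assuming a small hypothetical document and a hypothetical flat `rows` table (neither comes from the slides). The ElementTree path `.//title` stands in for the XPath descendant axis, while the relational-style encoding needs an explicit recursive walk over parent keys.

```python
import xml.etree.ElementTree as ET

BOOK = """
<book>
  <title>Data on the Web</title>
  <section>
    <title>Introduction</title>
    <section><title>Audience</title></section>
  </section>
</book>
"""

root = ET.fromstring(BOOK)

# XML side: one path expression reaches title elements at any depth.
xml_titles = [t.text for t in root.findall(".//title")]

# Relational side (hypothetical table): one flat row per section,
# with nesting expressed through a parent key.
rows = [
    (1, None, "Introduction"),  # (id, parent_id, title)
    (2, 1, "Audience"),
]


def descendants(parent_id):
    """Collect section titles below parent_id by following parent keys."""
    found = []
    for row_id, parent, title in rows:
        if parent == parent_id:
            found.append(title)
            found.extend(descendants(row_id))
    return found


print(xml_titles)
print(descendants(None))
```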

  14. XML Data vs Relational Data: Metadata
      • XML model
        • Metadata is mixed with ordinary data
        • High ratio of metadata to ordinary data
      • Relational model
        • Metadata is easily factored out
        • Queries that involve metadata are difficult
        • Ex: find the names of columns containing the value "red"

  15. XML Data vs Relational Data: Ordering
      • XML model
        • Intrinsic ordering that cannot be derived from values
        • Ex: the order of the sentences in a book is essential
        • Poses a challenge for the query language
      • Relational model
        • Ordering depends on values
        • Rows are not considered to have an ordering

  16. XML Data vs Relational Data: Null Values
      • XML model
        • A missing value is represented by the absence of an element
        • Retrieving a missing value results in an empty list
        • Rules are needed for how to handle an empty list
      • Relational model
        • A "null" value represents a missing value
        • Rules for operators in the presence of null
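      A minimal Python sketch of the two conventions, assuming a hypothetical two-book document and a hypothetical `row` record: the XML lookup for a missing element yields an empty list, while the relational-style record carries an explicit null (`None` here).

```python
import xml.etree.ElementTree as ET

BIB = """
<bib>
  <book><title>TCP/IP Illustrated</title><author>Stevens</author></book>
  <book><title>The Economics of Technology and Content for Digital TV</title></book>
</bib>
"""

books = ET.fromstring(BIB).findall("book")

# XML side: the second book has no author element, so the lookup returns [].
authors_per_book = [[a.text for a in book.findall("author")] for book in books]

# Relational side (hypothetical row): the column exists but holds a null.
row = {"title": "The Economics of Technology and Content for Digital TV",
       "author": None}

print(authors_per_book)
print(row["author"] is None)
```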

  17. XML Data vs Relational Data: Structural Transformations
      • XML model
        • Queries operate on XML documents and generate new XML documents
        • XPath 2.0 – navigating inside a document
        • XQuery – joining elements, constructing new elements and structures
      • Relational model
        • Queries operate on tables and generate new tables

  18. XML Data vs Relational Data: Data Definition
      • XML model
        • Mixture of primitive data and nested elements
        • Elements may be optional
        • Constraints on cardinality and order
        • Poses challenges for type inference
        • Ex: how to prove that the output satisfies a given schema?
      • Relational model
        • Specifies the properties of columns
        • All rows have the same columns
        • Relatively simple

  19. Road Map
      • XML data model
      • XML data vs relational data
      • XPath 2.0
      • XQuery
      • Processing XQuery

  20. XPath 2.0: What’s XPath?
      • XPath is a language for addressing parts of an XML document
      • XPath 2.0 provides a way to locate an individual node or a set of nodes in the XML data model
      • XPath 2.0 is closely related to XQuery
        • Both use the same data model, based on the XML Infoset
        • XQuery uses XPath to refer to information in the data model
      • XPath 2.0 uses path expressions to navigate XML documents and select nodes
        • A path expression evaluates to a sequence of nodes
        • Path expressions look much like the paths used in a traditional computer file system
      • XPath 2.0 is a W3C Recommendation

  21. XPath 2.0: Data Model
      • Represents the various values of a query, including
        • the input and the output of the query
        • all values of expressions used during intermediate calculations
      • Based on the XML Infoset data model
        • Shared with XQuery
        • Models XML data as trees
      • Sequence-based data model
        • Sequences represent sets of trees or tree fragments
        • Everything is a sequence
        • Sequences never contain other sequences

  22. XPath 2.0: Data Model
      • A tree whose root node is a Document Node is referred to as a document
      • A tree whose root node is not a Document Node is referred to as a fragment

  23. XPath 2.0: Data Model
      • Every instance of the data model is a sequence
      • A sequence is an ordered collection of zero or more items
      • An item is either a node or an atomic value
      • A sequence may contain nodes, atomic values, or any mixture of the two
      • A single item appearing on its own is modeled as a sequence containing one item
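      The flattening behaviour can be mimicked in Python. This is only a sketch under the assumption that Python lists stand in for sequences; the helper name `sequence` is hypothetical and not part of any XPath library.

```python
def sequence(*items):
    """Build a flat list, mimicking how data-model sequences never nest."""
    flat = []
    for item in items:
        if isinstance(item, list):
            # A nested sequence is spliced in rather than stored as one item.
            flat.extend(sequence(*item))
        else:
            flat.append(item)
    return flat


# Nesting disappears and empty sequences contribute nothing.
print(sequence(1, [2, 3], [], [[4]]))

# A single item and a one-item sequence are indistinguishable.
print(sequence("a") == sequence(["a"]))
```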

  24. XPath 2.0: Data Model
      • There are seven kinds of nodes in the data model:
        • Document node
        • Element node
        • Attribute node
        • Text node
        • Namespace node
        • Processing instruction node
        • Comment node

  25. XPath 2.0: Sample XML Document (Books.xml)

      <?xml version="1.0" encoding="ISO-8859-1"?>
      <bookstore>
        <book category="COOKING">
          <title lang="en">Everyday Italian</title>
          <author>Giada De Laurentiis</author>
          <year>2005</year>
          <price>30.00</price>
        </book>
        <book category="CHILDREN">
          <title lang="en">Harry Potter</title>
          <author>J K. Rowling</author>
          <year>2005</year>
          <price>29.99</price>
        </book>
        <book category="WEB">
          <title lang="en">XQuery Kick Start</title>
          <author>James McGovern</author>
          <author>Per Bothner</author>
          <author>Kurt Cagle</author>
          <author>James Linn</author>
          <author>Vaidyanathan Nagarajan</author>
          <year>2003</year>
          <price>49.99</price>
        </book>
        <book category="WEB">
          <title lang="en">Learning XML</title>
          <author>Erik T. Ray</author>
          <year>2003</year>
          <price>39.95</price>
        </book>
      </bookstore>

  26. XPath 2.0: Example
      /bookstore/book evaluates to a sequence of nodes, one for each book element:

      <book category="COOKING">
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>30.00</price>
      </book>
      <book category="CHILDREN">
        <title lang="en">Harry Potter</title>
        <author>J K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
      </book>
      <book category="WEB">
        <title lang="en">XQuery Kick Start</title>
        <author>James McGovern</author>
        <author>Per Bothner</author>
        <author>Kurt Cagle</author>
        <author>James Linn</author>
        <author>Vaidyanathan Nagarajan</author>
        <year>2003</year>
        <price>49.99</price>
      </book>
      <book category="WEB">
        <title lang="en">Learning XML</title>
        <author>Erik T. Ray</author>
        <year>2003</year>
        <price>39.95</price>
      </book>

      //book evaluates to the same result.

  27. XPath 2.0: Example
      //book[@category="WEB"] evaluates to a sequence containing 2 book element nodes:

      <book category="WEB">
        <title lang="en">XQuery Kick Start</title>
        <author>James McGovern</author>
        <author>Per Bothner</author>
        <author>Kurt Cagle</author>
        <author>James Linn</author>
        <author>Vaidyanathan Nagarajan</author>
        <year>2003</year>
        <price>49.99</price>
      </book>
      <book category="WEB">
        <title lang="en">Learning XML</title>
        <author>Erik T. Ray</author>
        <year>2003</year>
        <price>39.95</price>
      </book>
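      The same selection can be tried with Python's standard-library `xml.etree.ElementTree`, which supports only a limited XPath subset rather than XPath 2.0. The sketch below assumes an abbreviated copy of Books.xml that keeps just the category and title of each book.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of Books.xml: only category and title are kept.
BOOKS = """
<bookstore>
  <book category="COOKING"><title>Everyday Italian</title></book>
  <book category="CHILDREN"><title>Harry Potter</title></book>
  <book category="WEB"><title>XQuery Kick Start</title></book>
  <book category="WEB"><title>Learning XML</title></book>
</bookstore>
"""

root = ET.fromstring(BOOKS)

# ElementTree paths are relative to the element they are called on,
# so ".//book" plays the role of "//book" here.
web_books = root.findall(".//book[@category='WEB']")
web_titles = [book.findtext("title") for book in web_books]

print(web_titles)
```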

  28. XPath 2.0: Example
      • some $x in //book satisfies $x/price > 49
        evaluates to a sequence containing the single atomic value true
      • every $x in //book satisfies $x/price > 49
        evaluates to a sequence containing the single atomic value false
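      The quantified expressions behave like Python's `any` and `all`. The sketch below assumes an abbreviated copy of Books.xml containing only the four prices, and converts each price to a float before comparing.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of Books.xml: only the prices are kept.
BOOKS = """
<bookstore>
  <book><price>30.00</price></book>
  <book><price>29.99</price></book>
  <book><price>49.99</price></book>
  <book><price>39.95</price></book>
</bookstore>
"""

root = ET.fromstring(BOOKS)
prices = [float(book.findtext("price")) for book in root.iter("book")]

# some $x in //book satisfies $x/price > 49
some_over_49 = any(price > 49 for price in prices)

# every $x in //book satisfies $x/price > 49
every_over_49 = all(price > 49 for price in prices)

print(some_over_49, every_over_49)
```

      Only the 49.99 book exceeds 49, so the existential test succeeds and the universal test fails.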

  29. XPath 2.0: Example
      /bookstore/book[position()=1] evaluates to a sequence containing one element node:

      <book category="COOKING">
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>30.00</price>
      </book>
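      A small Python sketch of the positional predicate, assuming an abbreviated copy of Books.xml. ElementTree does not implement the `position()` function, so the equivalent shorthand `book[1]` from its XPath subset is used instead; note that XPath positions start at 1 while Python indexes start at 0.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of Books.xml: only category and title are kept.
BOOKS = """
<bookstore>
  <book category="COOKING"><title>Everyday Italian</title></book>
  <book category="CHILDREN"><title>Harry Potter</title></book>
  <book category="WEB"><title>XQuery Kick Start</title></book>
</bookstore>
"""

root = ET.fromstring(BOOKS)

# "book[1]" selects the first book child, like book[position()=1].
first_books = root.findall("book[1]")
first_title = first_books[0].findtext("title")

# The same element is at index 0 of the plain Python list of children.
same_element = first_books[0] is root.findall("book")[0]

print(len(first_books), first_title, same_element)
```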

  30. Road Map
      • XML data model
      • XML data vs relational data
      • XPath 2.0
      • XQuery
      • Processing XQuery

  31. XQuery: What’s XQuery?
      • The language for querying XML data
      • XQuery is a language for finding and extracting elements and attributes from XML documents
      • XQuery is to XML what SQL is to relational databases
      • Many of the concepts and techniques used in SQL processing and optimization can be applied to XQuery processing and optimization

  32. XQuery: What’s XQuery?
      • XQuery is built on XPath 2.0 expressions
        • XQuery 1.0 and XPath 2.0 share the same data model
        • They support the same functions and operators
        • Understanding XPath 2.0 is essential to understanding XQuery
      • Supported by all the major database vendors
        • IBM
        • Oracle
        • Microsoft
        • etc.

  33. XQuery: What’s XQuery?
      • Closed with respect to its data model
        • The value of every expression in the language is guaranteed to be in the data model
        • XPath 2.0 is also closed
      • Designed to be a functional language
        • No side effects
        • Processes and produces sequences
      • XQuery is a W3C standard
        • The version described here is XQuery 1.0
        • XQuery 1.0 advanced from Working Draft to W3C Recommendation in January 2007

  34. XQuery: FLWOR Expression
      • The for clause binds a variable to each item of a sequence in turn
      • The let clause binds a variable to a whole sequence
      • The where clause filters the bindings produced by the for and let clauses
      • The order by clause sorts the bindings that remain
      • The return clause is evaluated once per binding, and the results are combined into one sequence
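      The five clauses map loosely onto ordinary Python control flow. The sketch below is an analogy only, assuming a hypothetical in-memory list of book records and a hypothetical function name `titles_before`; it is not how an XQuery engine evaluates a FLWOR expression.

```python
books = [
    {"title": "TCP/IP Illustrated", "year": 1994},
    {"title": "Advanced Programming in the Unix environment", "year": 1992},
    {"title": "Data on the Web", "year": 2000},
]


def titles_before(records, cutoff_year):
    """Mimic for / let / where / order by / return over a list of dicts."""
    bindings = []
    for b in records:                       # for $b in ...
        t = b["title"]                      # let $t := $b/title
        if b["year"] < cutoff_year:         # where $b/@year < cutoff
            bindings.append((t, b["year"]))
    bindings.sort(key=lambda pair: pair[0])  # order by $t
    # return: one result per surviving binding, combined into one list
    return [f"{t} ({year})" for t, year in bindings]


result = titles_before(books, 2000)
print(result)
```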

  35. XQuery: Sample XML Document – bib.xml

      <bib>
        <book year="1994">
          <title>TCP/IP Illustrated</title>
          <author><last>Stevens</last><first>W.</first></author>
          <publisher>Addison-Wesley</publisher>
          <price>65.95</price>
        </book>
        <book year="1992">
          <title>Advanced Programming in the Unix environment</title>
          <author><last>Stevens</last><first>W.</first></author>
          <publisher>Addison-Wesley</publisher>
          <price>65.95</price>
        </book>
        <book year="2000">
          <title>Data on the Web</title>
          <author><last>Abiteboul</last><first>Serge</first></author>
          <author><last>Buneman</last><first>Peter</first></author>
          <author><last>Suciu</last><first>Dan</first></author>
          <publisher>Morgan Kaufmann Publishers</publisher>
          <price>39.95</price>
        </book>
        <book year="1999">
          <title>The Economics of Technology and Content for Digital TV</title>
          <editor>
            <last>Gerbarg</last><first>Darcy</first>
            <affiliation>CITI</affiliation>
          </editor>
          <publisher>Kluwer Academic Publishers</publisher>
          <price>129.95</price>
        </book>
      </bib>

  36. XQuery: Sample XML Document – reviews.xml

      <reviews>
        <entry>
          <title>Data on the Web</title>
          <price>34.95</price>
          <review>A very good discussion of semi-structured database systems and XML.</review>
        </entry>
        <entry>
          <title>Advanced Programming in the Unix environment</title>
          <price>65.95</price>
          <review>A clear and detailed discussion of UNIX programming.</review>
        </entry>
        <entry>
          <title>TCP/IP Illustrated</title>
          <price>65.95</price>
          <review>One of the best books on TCP/IP.</review>
        </entry>
      </reviews>

  37. XQuery: Sample XML Document – prices.xml

      <prices>
        <book>
          <title>Advanced Programming in the Unix environment</title>
          <source>bstore2.example.com</source>
          <price>65.95</price>
        </book>
        <book>
          <title>Advanced Programming in the Unix environment</title>
          <source>bstore1.example.com</source>
          <price>65.95</price>
        </book>
        <book>
          <title>TCP/IP Illustrated</title>
          <source>bstore2.example.com</source>
          <price>65.95</price>
        </book>
        <book>
          <title>TCP/IP Illustrated</title>
          <source>bstore1.example.com</source>
          <price>65.95</price>
        </book>
        <book>
          <title>Data on the Web</title>
          <source>bstore2.example.com</source>
          <price>34.95</price>
        </book>
        <book>
          <title>Data on the Web</title>
          <source>bstore1.example.com</source>
          <price>39.95</price>
        </book>
      </prices>

  38. XQuery: Example 1
      • List books published by Addison-Wesley after 1991, including their year and title

      Solution in XQuery:

      <bib>
       {
        for $b in doc("bib.xml")/bib/book
        where $b/publisher = "Addison-Wesley" and $b/@year > 1991
        return
          <book year="{ $b/@year }">
           { $b/title }
          </book>
       }
      </bib>

      Result:

      <bib>
        <book year="1994">
          <title>TCP/IP Illustrated</title>
        </book>
        <book year="1992">
          <title>Advanced Programming in the Unix environment</title>
        </book>
      </bib>

  39. XQuery: Example 2
      • Create a flat list of all the title-author pairs

      Solution in XQuery:

      for $b in doc("bib.xml")/bib/book,
          $t in $b/title,
          $a in $b/author
      return
        <result>
          { $t }
          { $a }
        </result>

      Result:

      <result>
        <title>TCP/IP Illustrated</title>
        <author><last>Stevens</last><first>W.</first></author>
      </result>
      <result>
        <title>Advanced Programming in the Unix environment</title>
        <author><last>Stevens</last><first>W.</first></author>
      </result>
      <result>
        <title>Data on the Web</title>
        <author><last>Abiteboul</last><first>Serge</first></author>
      </result>
      <result>
        <title>Data on the Web</title>
        <author><last>Buneman</last><first>Peter</first></author>
      </result>
      <result>
        <title>Data on the Web</title>
        <author><last>Suciu</last><first>Dan</first></author>
      </result>

  40. XQuery: Example 3
      • For each book in the bibliography, list the title and authors

      Solution in XQuery:

      for $b in doc("bib.xml")/bib/book
      return
        <result>
          { $b/title }
          { $b/author }
        </result>

      Result:

      <result>
        <title>TCP/IP Illustrated</title>
        <author><last>Stevens</last><first>W.</first></author>
      </result>
      <result>
        <title>Advanced Programming in the Unix environment</title>
        <author><last>Stevens</last><first>W.</first></author>
      </result>
      <result>
        <title>Data on the Web</title>
        <author><last>Abiteboul</last><first>Serge</first></author>
        <author><last>Buneman</last><first>Peter</first></author>
        <author><last>Suciu</last><first>Dan</first></author>
      </result>
      <result>
        <title>The Economics of Technology and Content for Digital TV</title>
      </result>

  41. XQuery: Example 4
      • For each book found in both bib.xml and reviews.xml, list the title of the book and its price from each source

      Solution in XQuery:

      <books-with-prices>
       {
        for $b in doc("bib.xml")//book,
            $a in doc("reviews.xml")//entry
        where $b/title = $a/title
        return
          <book-with-prices>
            { $b/title }
            <bib-price>{ $b/price/text() }</bib-price>
            <review-price>{ $a/price/text() }</review-price>
          </book-with-prices>
       }
      </books-with-prices>

      Result:

      <books-with-prices>
        <book-with-prices>
          <title>TCP/IP Illustrated</title>
          <bib-price>65.95</bib-price>
          <review-price>65.95</review-price>
        </book-with-prices>
        <book-with-prices>
          <title>Advanced Programming in the Unix environment</title>
          <bib-price>65.95</bib-price>
          <review-price>65.95</review-price>
        </book-with-prices>
        <book-with-prices>
          <title>Data on the Web</title>
          <bib-price>39.95</bib-price>
          <review-price>34.95</review-price>
        </book-with-prices>
      </books-with-prices>
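      The join on title can be reproduced as a nested loop in Python. This is a sketch, assuming abbreviated copies of bib.xml and reviews.xml that keep only titles and prices; the function name `join_prices` is hypothetical. Each result tuple holds the title, the price from bib.xml, and the price from reviews.xml.

```python
import xml.etree.ElementTree as ET

# Abbreviated copies of the sample documents: titles and prices only.
BIB = """
<bib>
  <book><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book><title>Advanced Programming in the Unix environment</title><price>65.95</price></book>
  <book><title>Data on the Web</title><price>39.95</price></book>
  <book><title>The Economics of Technology and Content for Digital TV</title><price>129.95</price></book>
</bib>
"""

REVIEWS = """
<reviews>
  <entry><title>Data on the Web</title><price>34.95</price></entry>
  <entry><title>Advanced Programming in the Unix environment</title><price>65.95</price></entry>
  <entry><title>TCP/IP Illustrated</title><price>65.95</price></entry>
</reviews>
"""


def join_prices(bib_xml, reviews_xml):
    """Pair each bib.xml book with the reviews.xml entry of the same title."""
    bib = ET.fromstring(bib_xml)
    reviews = ET.fromstring(reviews_xml)
    joined = []
    for b in bib.iter("book"):                # for $b in doc("bib.xml")//book
        for a in reviews.iter("entry"):       # $a in doc("reviews.xml")//entry
            if b.findtext("title") == a.findtext("title"):   # where
                joined.append(
                    (b.findtext("title"), b.findtext("price"), a.findtext("price"))
                )
    return joined


pairs = join_prices(BIB, REVIEWS)
for pair in pairs:
    print(pair)
```

      The fourth book has no matching review entry, so it drops out of the result, as in an inner join.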

  42. XQuery: Example 5
      • List the titles and years of all books published by Addison-Wesley after 1991, in alphabetical order

      Solution in XQuery:

      <bib>
       {
        for $b in doc("bib.xml")//book
        where $b/publisher = "Addison-Wesley" and $b/@year > 1991
        order by $b/title
        return
          <book>
            { $b/@year }
            { $b/title }
          </book>
       }
      </bib>

      Result:

      <bib>
        <book year="1992">
          <title>Advanced Programming in the Unix environment</title>
        </book>
        <book year="1994">
          <title>TCP/IP Illustrated</title>
        </book>
      </bib>

  43. XQuery: Example 6
      • In the document "prices.xml", find the minimum price for each book, in the form of a "minprice" element with the book title as its title attribute

      Solution in XQuery:

      <results>
       {
        let $doc := doc("prices.xml")
        for $t in distinct-values($doc//book/title)
        let $p := $doc//book[title = $t]/price
        return
          <minprice title="{ $t }">
            <price>{ min($p) }</price>
          </minprice>
       }
      </results>

      Result:

      <results>
        <minprice title="Advanced Programming in the Unix environment">
          <price>65.95</price>
        </minprice>
        <minprice title="TCP/IP Illustrated">
          <price>65.95</price>
        </minprice>
        <minprice title="Data on the Web">
          <price>34.95</price>
        </minprice>
      </results>
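      The grouping can also be sketched in Python, assuming an abbreviated copy of prices.xml without the source elements. Unlike the XQuery solution, which collects the distinct titles first and then looks up the prices for each, this hypothetical `min_prices` function keeps a running minimum per title in a single pass.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of prices.xml: the source elements are omitted.
PRICES = """
<prices>
  <book><title>Advanced Programming in the Unix environment</title><price>65.95</price></book>
  <book><title>Advanced Programming in the Unix environment</title><price>65.95</price></book>
  <book><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book><title>Data on the Web</title><price>34.95</price></book>
  <book><title>Data on the Web</title><price>39.95</price></book>
</prices>
"""


def min_prices(prices_xml):
    """Return {title: lowest price}, with titles in first-seen order."""
    lowest = {}
    for book in ET.fromstring(prices_xml).iter("book"):
        title = book.findtext("title")
        price = float(book.findtext("price"))
        # Keep the smaller of the new price and the one seen so far.
        lowest[title] = min(price, lowest.get(title, price))
    return lowest


lowest_by_title = min_prices(PRICES)
print(lowest_by_title)
```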

  44. XQuery: Sample XML Document – book.xml

      <?xml version="1.0"?>
      <book>
        <title>Data on the Web</title>
        <author>Serge Abiteboul</author>
        <author>Peter Buneman</author>
        <author>Dan Suciu</author>
        <section id="intro" difficulty="easy">
          <title>Introduction</title>
          <p>Text ... </p>
          <section>
            <title>Audience</title>
            <p>Text ... </p>
          </section>
          <section>
            <title>Web Data and the Two Cultures</title>
            <p>Text ... </p>
            <figure height="400" width="400">
              <title>Traditional client/server architecture</title>
              <image source="csarch.gif"/>
            </figure>
            <p>Text ... </p>
          </section>
        </section>
        <section id="syntax" difficulty="medium">
          <title>A Syntax For Data</title>
          <p>Text ... </p>
          <figure height="200" width="500">
            <title>Graph representations of structures</title>
            <image source="graphs.gif"/>
          </figure>
          <p>Text ... </p>
          <section>
            <title>Base Types</title>
            <p>Text ... </p>
          </section>
          <section>
            <title>Representing Relational Databases</title>
            <p>Text ... </p>
            <figure height="250" width="400">
              <title>Examples of Relations</title>
              <image source="relations.gif"/>
            </figure>
          </section>
          <section>
            <title>Representing Object Databases</title>
            <p>Text ... </p>
          </section>
        </section>
      </book>

  45. XQuery: Example 7
      • Prepare a (nested) table of contents, listing all sections and their titles. Preserve the original attributes of each <section> element, if any

      Solution in XQuery:

      declare function local:toc($book-or-section as element()) as element()*
      {
        for $section in $book-or-section/section
        return
          <section>
            { $section/@*, $section/title, local:toc($section) }
          </section>
      };

      <toc>
       {
        for $s in doc("book.xml")/book return local:toc($s)
       }
      </toc>

      Result:

      <toc>
        <section id="intro" difficulty="easy">
          <title>Introduction</title>
          <section>
            <title>Audience</title>
          </section>
          <section>
            <title>Web Data and the Two Cultures</title>
          </section>
        </section>
        <section id="syntax" difficulty="medium">
          <title>A Syntax For Data</title>
          <section>
            <title>Base Types</title>
          </section>
          <section>
            <title>Representing Relational Databases</title>
          </section>
          <section>
            <title>Representing Object Databases</title>
          </section>
        </section>
      </toc>
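      The recursion in local:toc carries over directly to Python. The sketch below assumes a much shortened copy of book.xml and uses the standard-library `xml.etree.ElementTree`; the function name `toc` mirrors the XQuery function but is otherwise an illustrative choice.

```python
import xml.etree.ElementTree as ET

# Shortened copy of book.xml: one top-level section with one subsection.
BOOK = """
<book>
  <title>Data on the Web</title>
  <section id="intro" difficulty="easy">
    <title>Introduction</title>
    <p>Text ... </p>
    <section>
      <title>Audience</title>
      <p>Text ... </p>
    </section>
  </section>
</book>
"""


def toc(book_or_section):
    """Return new section elements holding attributes, title, and nested toc."""
    entries = []
    for section in book_or_section.findall("section"):
        # Copy the attributes ($section/@*) onto a fresh element.
        entry = ET.Element("section", section.attrib)
        title = section.find("title")
        if title is not None:
            ET.SubElement(entry, "title").text = title.text
        # Recurse into subsections, as local:toc($section) does.
        entry.extend(toc(section))
        entries.append(entry)
    return entries


toc_root = ET.Element("toc")
toc_root.extend(toc(ET.fromstring(BOOK)))
print(ET.tostring(toc_root, encoding="unicode"))
```

      Paragraphs and the book's own title are left out because the function copies only attributes, section titles, and nested sections.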

  46. Road Map
      • XML data model
      • XML data vs relational data
      • XPath 2.0
      • XQuery
      • Processing XQuery

  47. Processing XQuery: Approaches for Querying XML Data
      • Mapping XML data into relational data
        • Query with SQL
        • May produce too many relations
        • Information may be lost
          • Ex: ordering, explicit hierarchical relationships between elements
      • Using XML-specific query languages
        • Usually integrated with SQL and relational data management
        • SQL/XML or XQuery
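      One possible mapping into relational data is a generic "edge table" with one row per element. The Python sketch below is an assumption for illustration (the slides do not name a specific mapping, and `shred` is a hypothetical helper); it shows why ordering and hierarchy have to be stored in explicit columns, here `position` and `parent_id`, if they are not to be lost.

```python
import xml.etree.ElementTree as ET

BOOK = (
    "<book>"
    "<title>Data on the Web</title>"
    "<author>Abiteboul</author>"
    "<author>Buneman</author>"
    "</book>"
)


def shred(root):
    """Flatten an element tree into (id, parent_id, position, tag, text) rows."""
    rows = []

    def visit(node, parent_id, position):
        node_id = len(rows) + 1  # ids are assigned in document order
        rows.append((node_id, parent_id, position, node.tag, (node.text or "").strip()))
        for index, child in enumerate(node, start=1):
            visit(child, node_id, index)

    visit(root, None, 1)
    return rows


edge_rows = shred(ET.fromstring(BOOK))
for row in edge_rows:
    print(row)
```

      Dropping the `position` column would make the two author rows indistinguishable in order, which is the kind of information loss the slide refers to.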

  48. Processing XQuery: IBM System RX SQL/XQuery Compiler
      • A new XQuery parser is added to the existing relational query processor
      • All components are extended to process XQuery

  49. Processing XQuery: Oracle XQuery Compilation Engine
      • The parser converts XQuery into XQueryX
        • XQueryX is an XML representation of XQuery (another W3C specification)
      • An XML parser constructs a DOM tree from the XQueryX
        • Later stages work on the DOM tree
      • The corresponding components are extended for XQuery too

  50. Processing XQuery: Microsoft XQuery Compilation
      • XQuery is compiled into an XML algebra tree, which is an internal representation
      • The algebra tree can be optimized and executed by the relational query processor
      • Optimizations are rule-based
      • A mapper traverses the algebra tree, converting each XML operator into a relational operator sub-tree
