XML, XML Schema, XPath and XQuery

Slides collated from various sources, many from Dan Suciu at Univ. of Washington.

Presentation Transcript


  1. XML, XML Schema, XPath and XQuery. Slides collated from various sources, many from Dan Suciu at Univ. of Washington

  2. XML
     W3C standard to complement HTML
     • origins: structured text, SGML
     • motivation:
       • HTML describes presentation
       • XML describes content
     • http://www.w3.org/TR/2000/REC-xml-20001006 (XML 1.0, Second Edition, 10/2000)
     CS561 - Spring 2004.

  3. From HTML to XML
     HTML describes the presentation

  4. HTML
     <h1> Bibliography </h1>
     <p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu
         <br> Addison Wesley, 1995
     <p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu
         <br> Morgan Kaufmann, 1999

  5. XML
     <bibliography>
       <book>
         <title> Foundations… </title>
         <author> Abiteboul </author>
         <author> Hull </author>
         <author> Vianu </author>
         <publisher> Addison Wesley </publisher>
         <year> 1995 </year>
       </book>
       …
     </bibliography>
     XML describes the content

  6. XML Terminology
     • tags: book, title, author, …
     • start tag: <book>, end tag: </book>
     • elements: <book>…</book>, <author>…</author>
     • elements are nested
     • empty element: <red></red>, abbreviated <red/>
     • an XML document: single root element
     • well-formed XML document: one whose tags all match
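The well-formedness rule above (a single root element and matching tags) can be checked mechanically. The following is a minimal Python sketch using the standard library parser; the function name `is_well_formed` and the sample strings are illustrative assumptions, not part of the slides.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML with one root and matching tags."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical sample documents.
good = "<book><title>Foundations</title><red/></book>"
bad_mismatch = "<book><title>Foundations</book></title>"  # tags cross
bad_two_roots = "<book/><book/>"                          # two root elements
```

Here `is_well_formed(good)` succeeds, while the two `bad_*` strings are rejected by the parser.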

  7. More XML: Attributes
     <book price="55" currency="USD">
       <title> Foundations of Databases </title>
       <author> Abiteboul </author>
       …
       <year> 1995 </year>
     </book>
     attributes are alternative ways to represent data

  8. More XML: Oids and References
     <person id="o555"> <name> Jane </name> </person>
     <person id="o456"> <name> Mary </name>
       <children idref="o123 o555"/>
     </person>
     <person id="o123" mother="o456"> <name> John </name> </person>
     oids and references in XML are just syntax
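Because oids and references are just syntax, an application has to resolve them itself. A minimal Python sketch of that resolution follows; the wrapping `<people>` root and the helper `children_of` are assumptions added so the fragment parses as one document.

```python
import xml.etree.ElementTree as ET

# The three person elements from the slide, wrapped in a hypothetical root.
doc = ET.fromstring("""
<people>
  <person id="o555"><name>Jane</name></person>
  <person id="o456"><name>Mary</name><children idref="o123 o555"/></person>
  <person id="o123" mother="o456"><name>John</name></person>
</people>
""")

# Index every person by its id attribute (the "oid").
by_id = {p.get("id"): p for p in doc.iter("person")}

def children_of(person_id: str) -> list[str]:
    """Follow the idref list on <children> and return the referenced names."""
    refs = by_id[person_id].find("children")
    if refs is None:
        return []
    return [by_id[r].findtext("name") for r in refs.get("idref").split()]

mary_children = children_of("o456")
john_mother = by_id[by_id["o123"].get("mother")].findtext("name")
```

Nothing in the parser treats `id`, `idref`, or `mother` specially; the dictionary lookup is what gives them meaning.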

  9. XML Namespaces
     • http://www.w3.org/TR/REC-xml-names (1/99)
     • name ::= [prefix:]localpart
     <book xmlns:isbn="www.isbn-org.org/def">
       <title> … </title>
       <number> 15 </number>
       <isbn:number> …. </isbn:number>
     </book>

  10. XML Namespaces
      • syntactic: <number>, <isbn:number>
      • semantic: provide URL for schema
      <tag xmlns:mystyle="http://…">      (defined here)
        …
        <mystyle:title> … </mystyle:title>
        <mystyle:number> …
      </tag>
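The syntactic side of namespaces (`<number>` versus `<isbn:number>`) can be illustrated with Python's standard parser, which expands each prefixed name to `{uri}localpart`. This is a sketch only; the title and ISBN values are placeholder assumptions, since the slide elides them.

```python
import xml.etree.ElementTree as ET

ISBN_NS = "www.isbn-org.org/def"  # namespace URI taken from the slide

book = ET.fromstring(
    '<book xmlns:isbn="www.isbn-org.org/def">'
    "<title>Example</title>"                      # placeholder value
    "<number>15</number>"
    "<isbn:number>0-000-00000-0</isbn:number>"    # placeholder value
    "</book>"
)

plain = book.find("number")                              # unqualified name
qualified = book.find("isbn:number", {"isbn": ISBN_NS})  # prefix resolved via mapping
same = book.find(f"{{{ISBN_NS}}}number")                 # expanded {uri}local form
```

The two `number` elements never collide, because the prefixed one is really named by the pair (namespace URI, local part).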

  11. XML Schemas
      • http://www.w3.org/TR/xmlschema-1/ (10/2000)
      • generalizes DTDs
      • uses XML syntax
      • two documents: structure and datatypes
        • http://www.w3.org/TR/xmlschema-1
        • http://www.w3.org/TR/xmlschema-2
      • XML-Schema is complex

  12. XML Schemas
      <xsd:element name="paper" type="papertype"/>
      <xsd:complexType name="papertype">
        <xsd:sequence>
          <xsd:element name="title" type="xsd:string"/>
          <xsd:element name="author" minOccurs="0" maxOccurs="unbounded"/>
          <xsd:element name="year"/>
          <xsd:choice>
            <xsd:element name="journal"/>
            <xsd:element name="conference"/>
          </xsd:choice>
        </xsd:sequence>
      </xsd:complexType>
      DTD: <!ELEMENT paper (title,author*,year, (journal|conference))>

  13. Elements vs. Types in XML Schema
      Type defined inside the element:
        <xsd:element name="person">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="name" type="xsd:string"/>
              <xsd:element name="address" type="xsd:string"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      Element referring to a named type:
        <xsd:element name="person" type="ttt"/>
        <xsd:complexType name="ttt">
          <xsd:sequence>
            <xsd:element name="name" type="xsd:string"/>
            <xsd:element name="address" type="xsd:string"/>
          </xsd:sequence>
        </xsd:complexType>
      DTD: <!ELEMENT person (name,address)>

  14. Elements vs. Types in XML Schema
      • Types:
        • Simple types (integers, strings, ...)
        • Complex types (regular expressions, like in DTDs)
      • Element-type-element alternation:
        • Root element has a complex type
        • That type is a regular expression of elements
        • Those elements have their complex types...
        • ...
        • On the leaves we have simple types

  15. Local and Global Types in XML Schema
      • Local type:
          <xsd:element name="person">
            [define locally the person's type]
          </xsd:element>
      • Global type:
          <xsd:element name="person" type="ttt"/>
          <xsd:complexType name="ttt">
            [define here the type ttt]
          </xsd:complexType>
      Global types: can be reused in other elements

  16. Local vs. Global Elements in XML Schema
      • Local element:
          <xsd:complexType name="ttt">
            <xsd:sequence>
              <xsd:element name="address" type="..."/> ...
            </xsd:sequence>
          </xsd:complexType>
      • Global element:
          <xsd:element name="address" type="..."/>
          <xsd:complexType name="ttt">
            <xsd:sequence>
              <xsd:element ref="address"/> ...
            </xsd:sequence>
          </xsd:complexType>
      Global elements: like in DTDs

  17. Regular Expressions in XML Schema
      Recall the element-type-element alternation:
        <xsd:complexType name="....">
          [regular expression on elements]
        </xsd:complexType>
      Regular expressions:
      • <xsd:sequence> A B C </...>   =  A B C
      • <xsd:choice> A B C </...>     =  A | B | C
      • <xsd:group> A B C </...>      =  (A B C)
      • <xsd:... minOccurs="0" maxOccurs="unbounded"> .. </...>  =  (...)*
      • <xsd:... minOccurs="0" maxOccurs="1"> .. </...>          =  (...)?
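The correspondence between content models and regular expressions can be made concrete. The Python sketch below encodes the DTD content model for `paper` from the earlier XML Schemas slide, `(title, author*, year, (journal|conference))`, as an ordinary regex over child tag names. The encoding (each tag followed by a comma) and the helper name `matches_paper_model` are assumptions made for illustration; this is not how a schema validator is actually implemented.

```python
import re
import xml.etree.ElementTree as ET

# DTD content model: (title, author*, year, (journal|conference))
# Each child tag is encoded as "tag," so the model becomes an ordinary regex.
PAPER_MODEL = re.compile(r"title,(author,)*year,(journal,|conference,)")

def matches_paper_model(xml_text: str) -> bool:
    """Check the sequence of child element names against the content model."""
    paper = ET.fromstring(xml_text)
    child_names = "".join(child.tag + "," for child in paper)
    return PAPER_MODEL.fullmatch(child_names) is not None

# Hypothetical instances.
ok = "<paper><title/><author/><author/><year/><journal/></paper>"
no_authors = "<paper><title/><year/><conference/></paper>"
both_venues = "<paper><title/><year/><journal/><conference/></paper>"
wrong_order = "<paper><year/><title/><journal/></paper>"
```

The first two instances satisfy the model; the last two violate the choice and the sequence, respectively.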

  18. Attributes in XML Schema
      <xsd:element name="paper" type="papertype"/>
      <xsd:complexType name="papertype">
        <xsd:sequence>
          <xsd:element name="title" type="xsd:string"/>
          . . . . . .
        </xsd:sequence>
        <xsd:attribute name="language" type="xsd:NMTOKEN" fixed="English"/>
      </xsd:complexType>
      Attributes are associated with the type, not with the element.
      Only with complex types; more trouble if we want to add attributes to simple types.

  19. "Mixed" Content, "Any" Type
      <xsd:complexType mixed="true"> . . . .
      • Better than in DTDs: can still enforce the type, but now may have text between any elements
      <xsd:element name="anything" type="xsd:anyType"/> . . . .
      • Means anything is permitted there

  20. Derived Types by Extensions
      <complexType name="Address">
        <sequence>
          <element name="street" type="string"/>
          <element name="city" type="string"/>
        </sequence>
      </complexType>
      <complexType name="USAddress">
        <complexContent>
          <extension base="ipo:Address">
            <sequence>
              <element name="state" type="ipo:USState"/>
              <element name="zip" type="positiveInteger"/>
            </sequence>
          </extension>
        </complexContent>
      </complexType>
      Corresponds to inheritance

  21. Derived Types by Restrictions
      <complexContent>
        <restriction base="ipo:Items">
          … [rewrite the entire content, with restrictions (*)] ...
        </restriction>
      </complexContent>
      • (*): may restrict cardinalities, e.g. (0,infty) to (1,1); may restrict choices; other restrictions…
      Corresponds to set inclusion

  22. Keys in XML Schema
      XML:
        <purchaseReport>
          <regions>
            <zip code="95819">
              <part number="872-AA" quantity="1"/>
              <part number="926-AA" quantity="1"/>
              <part number="833-AA" quantity="1"/>
              <part number="455-BX" quantity="1"/>
            </zip>
            <zip code="63143">
              <part number="455-BX" quantity="4"/>
            </zip>
          </regions>
          <parts>
            <part number="872-AA">Lawnmower</part>
            <part number="926-AA">Baby Monitor</part>
            <part number="833-AA">Lapis Necklace</part>
            <part number="455-BX">Sturdy Shelves</part>
          </parts>
        </purchaseReport>
      XML Schema:
        <key name="NumKey">
          <selector xpath="parts/part"/>
          <field xpath="@number"/>
        </key>

  23. Keys in XML Schema
      • In general, two flavors:
        <key name="someDummyNameHere">
          <selector xpath="p"/>
          <field xpath="p1"/>
          <field xpath="p2"/>
          . . .
          <field xpath="pk"/>
        </key>
        <unique name="someDummyNameHere">
          <selector xpath="p"/>
          <field xpath="p1"/>
          <field xpath="p2"/>
          . . .
          <field xpath="pk"/>
        </unique>
      Note: all XPath expressions "start" at the element currently being defined.
      Each field must identify a single node.

  24. Keys in XML Schema
      • Unique = guarantees uniqueness
      • Key = guarantees uniqueness and existence
      • All XPath expressions are "restricted":
        • /a/b | /a/c  OK for selector
        • //a/b/*/c  OK for field
      • Note: better than DTD's ID mechanism
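The difference between the two flavors (unique checks uniqueness only, key also requires the field to exist) can be sketched in Python. The function `check`, its parameters, and the unnumbered `<part>` are hypothetical; only single attribute fields of the form `@name` are handled, and real schema validators do considerably more.

```python
import xml.etree.ElementTree as ET

def check(root, selector, field, require_present):
    """Return True if the field values of the selected nodes satisfy the constraint.

    require_present=True models <key>; False models <unique>.
    Only attribute fields of the form "@name" are handled in this sketch.
    """
    seen = set()
    for node in root.findall(selector):
        value = node.get(field.lstrip("@"))
        if value is None:
            if require_present:
                return False      # key: the field must exist
            continue              # unique: missing values are ignored
        if value in seen:
            return False          # both flavors reject duplicates
        seen.add(value)
    return True

# Part list based on the purchase report slide, plus one hypothetical
# part without a number attribute.
report = ET.fromstring("""
<purchaseReport>
  <parts>
    <part number="872-AA">Lawnmower</part>
    <part number="926-AA">Baby Monitor</part>
    <part>Unnumbered item</part>
  </parts>
</purchaseReport>
""")
unique_ok = check(report, "parts/part", "@number", require_present=False)
key_ok = check(report, "parts/part", "@number", require_present=True)
```

With this data the unique constraint holds, while the key constraint fails because one part has no `number`.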

  25. Keys in XML Schema
      • Examples
        <key name="fullName">
          <selector xpath=".//person"/>
          <field xpath="forename"/>
          <field xpath="surname"/>
        </key>
        <unique name="nearlyID">
          <selector xpath=".//*"/>
          <field xpath="@id"/>
        </unique>
      Recall: each selected person must have a single forename and a single surname.

  26. Foreign Keys in XML Schema
      • Example
        <keyref name="personRef" refer="fullName">
          <selector xpath=".//personPointer"/>
          <field xpath="@first"/>
          <field xpath="@last"/>
        </keyref>
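A keyref says that every (`@first`, `@last`) pair on a `personPointer` must match some (`forename`, `surname`) pair of the `fullName` key. A minimal Python sketch of that check follows; the sample document, including the dangling pointer, is a hypothetical illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document.
doc = ET.fromstring("""
<root>
  <person><forename>Jane</forename><surname>Doe</surname></person>
  <person><forename>John</forename><surname>Roe</surname></person>
  <personPointer first="Jane" last="Doe"/>
  <personPointer first="Ann" last="Nobody"/>
</root>
""")

# Key "fullName": (forename, surname) of every person.
full_names = {(p.findtext("forename"), p.findtext("surname"))
              for p in doc.findall(".//person")}

# Keyref "personRef": every (@first, @last) pair must appear in the key.
dangling = [(ptr.get("first"), ptr.get("last"))
            for ptr in doc.findall(".//personPointer")
            if (ptr.get("first"), ptr.get("last")) not in full_names]
```

A non-empty `dangling` list corresponds to a keyref violation.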

  27. XPath

  28. XPath
      • Goal: access selected nodes of a document
      • XPath main construct: axis navigation
      • An XPath path consists of one or more navigation steps, separated by /
      • Navigation step: axis + node-test + predicates
      • Examples
        • /descendant::node()/child::author
        • /descendant::node()/child::author[parent/attribute::booktitle = "XML"][2]
      • XPath also offers shortcuts
        • no axis means child
        • // ≡ /descendant-or-self::node()/

  29. XPath - Child axis navigation
      [Figure: tree rooted at the context node, with nodes labeled aaa, ccc and bbb, numbered 1 to 7]
      • author is shorthand for child::author. Examples:
        • aaa -- all the child nodes labeled aaa (1,3)
        • aaa/bbb -- all the bbb grandchildren of aaa children (4)
        • */bbb -- all the bbb grandchildren of any child (4,6)
        • . -- the context node
        • / -- the root node

  30. XPath - Child axis navigation
      • /doc -- all the doc children of the root
      • ./aaa -- all the aaa children of the context node (equivalent to aaa)
      • text() -- all the text children of the context node
      • node() -- all the children of the context node (includes text nodes; attribute nodes are not children and are reached through the attribute axis)
      • .. -- parent of the context node
      • .// -- the context node and all its descendants
      • // -- the root node and all its descendants
      • //text() -- all the text nodes in the document
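The child-axis examples can be tried with Python's `xml.etree.ElementTree`, which supports a subset of XPath. The tree below is a hypothetical reconstruction chosen to agree with the node numbers quoted in the examples (1,3), (4) and (4,6); it is not necessarily the tree from the original figure, and the `id` attributes and helper `ids` are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree; the id attributes stand in for the node numbers.
ctx = ET.fromstring(
    "<ctx>"
    "<aaa id='1'><bbb id='4'/><ccc id='5'/></aaa>"
    "<ccc id='2'><bbb id='6'/></ccc>"
    "<aaa id='3'><ccc id='7'/></aaa>"
    "</ctx>"
)

def ids(path):
    """Evaluate path relative to the context node and return the id attributes."""
    return [n.get("id") for n in ctx.findall(path)]

aaa_children = ids("aaa")          # child::aaa
aaa_bbb = ids("aaa/bbb")           # bbb grandchildren under aaa children
any_bbb = ids("*/bbb")             # bbb grandchildren under any child
all_ccc = ids(".//ccc")            # ccc descendants of the context node
parents_of_bbb = ids(".//bbb/..")  # parent step
```

ElementTree does not implement `text()` or `node()` tests or absolute paths on an element, so those examples need a full XPath engine.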

  31. Predicates
      • [2] -- the second child node of the context node
      • chapter[5] -- the fifth chapter child of the context node
      • [last()] -- the last child node of the context node
      • chapter[title="introduction"] -- the chapter children of the context node that have one or more title children whose string-value is "introduction" (the string-value is the concatenation of all the text on descendant text nodes)
      • person[.//firstname = "joe"] -- the person children of the context node that have among their descendants a firstname element with string-value "joe"
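Positional and value predicates can be sketched with the XPath subset in Python's `xml.etree.ElementTree`. The sample `book` document is a hypothetical illustration; descendant paths inside a predicate, such as `person[.//firstname = "joe"]`, are outside that subset and are not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with three chapter children.
book = ET.fromstring(
    "<book>"
    "<chapter><title>introduction</title></chapter>"
    "<chapter><title>data model</title></chapter>"
    "<chapter><title>queries</title></chapter>"
    "</book>"
)

second = book.find("chapter[2]")                       # positional predicate (1-based)
last = book.find("chapter[last()]")                    # last chapter child
intro = book.findall("chapter[title='introduction']")  # child-value predicate
```

Positions count from 1, so `chapter[2]` is the chapter titled "data model".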

  32. Axis navigation
      • So far, nearly all our expressions have moved us down by moving to child nodes. Exceptions were
        • . -- stay where you are
        • / -- go to the root
        • // -- all descendants of the root
        • .// -- all descendants of the context node
      • XPath has several axes: ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self
      • Some of these (self, parent) describe single nodes, others describe sequences of nodes.

  33. XPath Navigation Axes
      [Figure: diagram of the axes relative to the self node: ancestor, preceding-sibling, following-sibling, child, attribute, preceding, following, namespace, descendant]

  34. XPath abbreviated syntax
      Abbreviation   Expansion
      (nothing)      child::
      @              attribute::
      //             /descendant-or-self::node()/
      .              self::node()
      .//            self::node()/descendant-or-self::node()/
      ..             parent::node()
      /              (document root)

  35. Query Languages - XQuery

  36. Summary of XQuery
      • FLWR expressions
      • FOR and LET expressions
      • Collections and sorting
      Resources:
      • XQuery: A Query Language for XML. Chamberlin, Florescu, et al.
      • W3C recommendation: www.w3.org/TR/xquery/

  37. XQuery
      • Based on Quilt (which is based on XML-QL)
      • http://www.w3.org/TR/xquery/ (2/2001)
      • XML Query data model (ordered)

  38. FLWR ("Flower") Expressions
      FOR ... LET ... FOR ... LET ...
      WHERE ...
      RETURN ...

  39. XQuery
      Find all book titles published after 1995:
        FOR $x IN document("bib.xml")/bib/book
        WHERE $x/year > 1995
        RETURN $x/title
      Result:
        <title> abc </title>
        <title> def </title>
        <title> ghi </title>
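For readers more used to general-purpose languages, the FOR/WHERE/RETURN query maps closely onto a list comprehension. The Python sketch below is an analogy, not an XQuery implementation; the inline `bib` data stands in for `bib.xml` and is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml.
bib = ET.fromstring("""
<bib>
  <book><title>abc</title><year>1999</year></book>
  <book><title>old</title><year>1990</year></book>
  <book><title>def</title><year>2001</year></book>
</bib>
""")

titles = [
    book.findtext("title")                 # RETURN $x/title
    for book in bib.findall("book")        # FOR $x IN .../bib/book
    if int(book.findtext("year")) > 1995   # WHERE $x/year > 1995
]
```

Each clause of the query appears as one part of the comprehension, in the order iteration, filter, result.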

  40. XQuery
      For each author of a book by Morgan Kaufmann, list all books she published:
        FOR $a IN distinct(document("bib.xml")/bib/book[publisher="Morgan Kaufmann"]/author)
        RETURN <result>
                 $a,
                 FOR $t IN /bib/book[author=$a]/title
                 RETURN $t
               </result>
      distinct = a function that eliminates duplicates

  41. XQuery
      Result:
        <result>
          <author> Jones </author>
          <title> abc </title>
          <title> def </title>
        </result>
        <result>
          <author> Smith </author>
          <title> ghi </title>
        </result>
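The nested query (distinct authors in the outer FOR, titles per author in the inner FOR) can be mirrored in Python. This is an analogy under assumed data: the inline `bib` document is hypothetical and was chosen so that the grouping matches the Jones/Smith result shown above.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml.
bib = ET.fromstring("""
<bib>
  <book><title>abc</title><author>Jones</author><publisher>Morgan Kaufmann</publisher></book>
  <book><title>def</title><author>Jones</author><publisher>Addison Wesley</publisher></book>
  <book><title>ghi</title><author>Smith</author><publisher>Morgan Kaufmann</publisher></book>
</bib>
""")
books = bib.findall("book")

# distinct(...): authors of Morgan Kaufmann books, duplicates removed, order kept.
authors = list(dict.fromkeys(
    a.text for b in books if b.findtext("publisher") == "Morgan Kaufmann"
    for a in b.findall("author")
))

# Inner FOR: every title by that author, from any publisher.
results = {
    a: [b.findtext("title") for b in books
        if a in [x.text for x in b.findall("author")]]
    for a in authors
}
```

Note that the inner loop ranges over all books, so Jones's Addison Wesley title is included, just as `/bib/book[author=$a]/title` has no publisher condition.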

  42. XQuery
      • FOR $x IN expr -- binds $x to each element in the list expr
      • LET $x := expr -- binds $x to the entire list expr
        • Useful for common subexpressions and for aggregations

  43. XQuery
      <big_publishers>
        FOR $p IN distinct(document("bib.xml")//publisher)
        LET $b := document("bib.xml")/book[publisher = $p]
        WHERE count($b) > 100
        RETURN $p
      </big_publishers>
      count = an (aggregate) function that returns the number of elements
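The LET-plus-count pattern is a group-and-filter. A rough Python analogy follows; the inline data is hypothetical, and the threshold is lowered from the slide's 100 to 1 so that the toy data produces a result.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stand-in for bib.xml.
bib = ET.fromstring("""
<bib>
  <book><publisher>Morgan Kaufmann</publisher></book>
  <book><publisher>Morgan Kaufmann</publisher></book>
  <book><publisher>Addison Wesley</publisher></book>
</bib>
""")

THRESHOLD = 1  # the slide uses 100; lowered here for the toy data

# LET $b := books of publisher $p; count($b) is the size of that group.
counts = Counter(b.findtext("publisher") for b in bib.findall("book"))
big_publishers = [p for p, n in counts.items() if n > THRESHOLD]
```

`Counter` plays the role of binding `$b` to the whole list of books per publisher and then applying `count`.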

  44. XQuery
      Find books whose price is larger than average:
        LET $a := avg(document("bib.xml")/bib/book/@price)
        FOR $b IN document("bib.xml")/bib/book
        WHERE $b/@price > $a
        RETURN $b
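The average-price query shows why LET is useful for aggregation: the average is computed once over the whole collection and then used inside the iteration. A Python analogy with hypothetical data:

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical stand-in for bib.xml.
bib = ET.fromstring("""
<bib>
  <book price="30"><title>abc</title></book>
  <book price="50"><title>def</title></book>
  <book price="100"><title>ghi</title></book>
</bib>
""")
books = bib.findall("book")

avg_price = mean(float(b.get("price")) for b in books)   # LET $a := avg(...)
expensive = [b.findtext("title") for b in books           # FOR $b ... RETURN $b
             if float(b.get("price")) > avg_price]        # WHERE $b/@price > $a
```

The sketch returns titles rather than whole book elements only to keep the result easy to read.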

  45. XQuery
      Summary:
      • FOR-LET-WHERE-RETURN = FLWR
      • Processing pipeline: FOR/LET clauses → list of tuples → WHERE clause → list of tuples → RETURN clause → instance of XQuery data model

  46. FOR vs. LET
      FOR
      • Binds node variables → iteration
      LET
      • Binds collection variables → one value

  47. FOR vs. LET
      FOR $x IN document("bib.xml")/bib/book
      RETURN <result> $x </result>
      Returns:
        <result> <book>...</book> </result>
        <result> <book>...</book> </result>
        <result> <book>...</book> </result>
        ...
      LET $x := document("bib.xml")/bib/book
      RETURN <result> $x </result>
      Returns:
        <result>
          <book>...</book>
          <book>...</book>
          <book>...</book>
          ...
        </result>
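The contrast between the two bindings can be sketched in Python with plain strings standing in for book elements (the three placeholder books are hypothetical): FOR yields one result per item, LET yields a single result containing the whole list.

```python
# Placeholder book elements.
books = ["<book>1</book>", "<book>2</book>", "<book>3</book>"]

# FOR $x IN books RETURN <result> $x </result>: one result per binding.
for_results = [f"<result>{x}</result>" for x in books]

# LET $x := books RETURN <result> $x </result>: one result holding the whole list.
x = books
let_result = f"<result>{''.join(x)}</result>"
```

`for_results` has three entries, while `let_result` is a single string wrapping all three books.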

  48. Collections in XQuery
      • Ordered and unordered collections
        • /bib/book/author = an ordered collection
        • distinct(/bib/book/author) = an unordered collection
      • LET $a := /bib/book → $a is a collection
      • $b/author → a collection (several authors...)
      RETURN <result> $b/author </result>
      Returns:
        <result>
          <author>...</author>
          <author>...</author>
          <author>...</author>
          ...
        </result>

  49. Sorting in XQuery
      <publisher_list>
        FOR $p IN distinct(document("bib.xml")//publisher)
        RETURN <publisher>
                 <name> $p/text() </name> ,
                 FOR $b IN document("bib.xml")//book[publisher = $p]
                 RETURN <book> $b/title , $b/@price </book>
                 SORTBY (price DESCENDING)
               </publisher>
        SORTBY (name)
      </publisher_list>

  50. Sorting in XQuery
      • Sorting arguments: refer to name space of RETURN clause, not FOR clause
      • To sort on an element you don't want to display, first return it, then remove it with an additional query.
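The two-level sort in the publisher list query (publishers by name, each publisher's books by descending price) can be sketched in Python. The inline data and publisher names are hypothetical, and the result is built as tuples rather than XML to keep the sketch short.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml.
bib = ET.fromstring("""
<bib>
  <book price="30"><title>abc</title><publisher>B Press</publisher></book>
  <book price="50"><title>def</title><publisher>A Press</publisher></book>
  <book price="100"><title>ghi</title><publisher>B Press</publisher></book>
</bib>
""")
books = bib.findall("book")

publisher_list = []
for name in sorted({b.findtext("publisher") for b in books}):      # SORTBY (name)
    own = [b for b in books if b.findtext("publisher") == name]
    own.sort(key=lambda b: float(b.get("price")), reverse=True)    # SORTBY (price DESCENDING)
    publisher_list.append((name, [(b.findtext("title"), b.get("price")) for b in own]))
```

Unlike SORTBY, the Python sort key here reads the source book directly, so the sort value does not have to appear in the returned tuples.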