1 / 45

Managing XML and Semistructured Data

Managing XML and Semistructured Data. Part 3: Query Languages. In this section…. Lorel (A Lightweight Object REpository Language - developed at Standford) XPath specification data model Examples [ xpath , axis ] syntax XQuery FLWR expressions FOR and LET expressions

denna
Download Presentation

Managing XML and Semistructured Data

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Managing XML and Semistructured Data Part 3: Query Languages

  2. In this section… • Lorel (A Lightweight Object REpository Language - developed at Standford) • XPath specification • data model • Examples [xpath, axis] • syntax • XQuery • FLWR expressions • FOR and LET expressions • Collections and sorting • (XML-QL the earlier version in AT&T Labs) • Resources: The Lorel Query Language for Semistructured Data  by Abiteboul, Quass, McHugh, Widom, Wiener, in International Journal on Digital Libraries, 1997. A formal semantics of patterns in XSLT by Phil Wadler. XML Path Language (XPath) www.w3.org/TR/xpath XQuery: A Query Language for XML Chamberlin, Florescu, et al. W3C recommendation: www.w3.org/TR/xquery/

  3. Querying XML Data • A core query language (extracting + restructuring) • XPath (core expressions) allows simple navigation through the tree • XQuery is used as the SQL of XML • XSLT (Extensible Stylesheet Language Transformation) = recursive traversal based on pattern matching - will not discuss here

  4. Sample Data for Queries <biblio><paper> … </paper> <book><author> Smith </author> <date> 1999 </date> <title> Database Systems </title></book><book><author> Roux</author><author> Combalusier</author> <date> 1976 </date><title> Database Systems </title></book> </biblio>

  5. A Core Query Language A SQL-like language for querying semi-structured data biblio &o1 book paper book &o12 &o24 &o29 . . . title author date title author author date &30 &o52 &25 &96 Database Systems Roux 1976 &o50 Will illustrate with: XML DB = &o47 &o48 Combalusier Smith 1999 Database Systems

  6. Query 1: biblio &o1 book paper book &o12 &o24 &o29 . . . title author date author title author date &30 &o52 &25 &96 1976 Database Systems &o50 &o47 &o48 Roux Combalusier 1999 Database Systems Smith SELECT author: XFROM biblio.book.author X answer author Answer ={author: “Smith”, author: “Roux”, author: “Combalusier”} author author

  7. Query 2: biblio &o1 book paper book &o12 &o24 &o29 . . . title author date author title author date &30 &o52 &25 &96 1976 Database Systems &o50 &o47 &o48 Roux Combalusier 1999 Database Systems Smith SELECT row: XFROM biblio._XWHERE “Smith” in X.author row answer . . . Answer ={row: {author:“Smith”, date: 1999, title: “Database…”},row: …} row

  8. Query 3: biblio &o1 book paper book &o12 &o24 &o29 . . . title author date author title author date &30 &o52 &25 &96 1976 Database Systems &o50 &o47 &o48 Roux Combalusier 1999 Database Systems Smith SELECT row: ( SELECT author: Y FROM X.author Y)FROM biblio.book X row answer &a1 row Answer ={row: {author:“Smith”},row: {author:“Roux”,author:“Combalusier”,},} &a2 author author author

  9. Query 4: biblio &o1 book paper book &o12 &o24 &o29 . . . title author date author title author date &30 &o52 &25 &96 1976 Database Systems &o50 &o47 &o48 Roux Combalusier 1999 Database Systems Smith SELECT ( SELECT row: {author: Y, title: T} FROM X.author Y, X.title T)FROM biblio.book XWHERE “Roux” in X.author row answer Answer ={row: {author:“Roux”, title: “Database…”},row: {author:“Combalusier”, title: “Database…”},} &a1 row author &a2 title title author

  10. Lorel • Minor syntactic differences in regular path expressions (% instead of _, # instead of _*) • Common path convention:becomes: SELECT biblio.book.authorFROM biblio.bookWHERE biblio.book.year = 1999 SELECT X.authorFROM biblio.book XWHERE X.year = 1999

  11. Lorel • Existential variables: • What happens with books having multiple authors ? Author is existentially quantified: SELECT biblio.book.yearFROM biblio.bookWHERE biblio.book.author = “Roux” SELECT X.yearFROM biblio.book X, X.author YWHERE Y = “Roux”

  12. Lorel • Path variables. @P in: • What happens on graphs with cycles ? • Constructing new results • Several default rules • Casting between datatypes • Very useful in practice SELECT @PFROM biblio.# @P X

  13. XPath • http://www.w3.org/TR/xpath (11/99) • Building block for other W3C standards: • XSL Transformations (XSLT) • XML Link (XLink) • XML Pointer (XPointer) • XML Query • Was originally part of XSL

  14. XPath: Summary bib matches a bib element * matches any element / matches the root element /bib matches a bib element under root bib/paper matches a paper in bib bib//paper matches a paper in bib, at any depth //paper matches a paper at any depth paper|book matches a paper or a book @price matches a price attribute bib/book/@price matches price attribute in book, in bib bib/book/[@price<“55”]/author/lastname matches…

  15. Example for XPath Queries <bib><book> <publisher> Addison-Wesley </publisher> <author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <title> Foundations of Databases </title> <year> 1995 </year></book><bookprice=“55”> <publisher> Freeman </publisher> <author> Jeffrey D. Ullman </author> <title> Principles of Database and Knowledge Base Systems </title> <year> 1998 </year></book> </bib>

  16. bib Data Model for XPath The root The root element book book publisher author . . . . Addison-Wesley Serge Abiteboul

  17. XPath: Simple Expressions /bib/book/year Result: <year> 1995 </year> <year> 1998 </year> Result: empty (there were no papers) /bib/paper/year

  18. XPath: Restricted Kleene Closure //author Result:<author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <author> Jeffrey D. Ullman </author> Result: <first-name> Rick </first-name> /bib//first-name

  19. XPath: Text Nodes /bib/book/author/text() Result: Serge Abiteboul Jeffrey D. Ullman !Rick Hull doesn’t appear because he has firstname, lastname Functions in XPath: • text() = matches the text value • node() = matches any node (= * or @* or text()) • name() = returns the name of the current tag

  20. XPath: Wildcard Result: <first-name> Rick </first-name> <last-name> Hull </last-name> * Matches any element //author/*

  21. XPath: Attribute Nodes /bib/book/@price Result: “55” @price means that price is has to be an attribute

  22. XPath: Predicates /bib/book/author[firstname] Result: <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>

  23. XPath: More Predicates Result: <lastname> … </lastname> <lastname> … </lastname> /bib/book/author[firstname][address[//zip][city]]/lastname

  24. XPath: More Predicates /bib/book[@price < “60”] /bib/book[author/@age < “25”] /bib/book[author/text()]

  25. XQuery FLWOR (flower) Expressions • Based on Quilt(which is based on XML-QL) • http://www.w3.org/TR/xquery/2/2001 • XML Query data model • Ordered ! FOR ... LET... WHERE... ORDER BY… RETURN...

  26. XQuery Query: Find all book titles published after 1995: FOR$xINdocument("bib.xml")/bib/book WHERE$x/year > 1995 RETURN$x/title <bib><book> <publisher> Addison-Wesley </publisher> <author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <title> Foundations of Databases </title> <year> 1995 </year></book><bookprice=“55”> <publisher> Freeman </publisher> <author> Jeffrey D. Ullman </author> <title> Principles of Database and Knowledge Base Systems </title> <year> 1998 </year></book> </bib> * bib.xml is shown on slide 15 Result: <title> Principles of Database…</title>

  27. XQuery Query: Find book titles by the coauthors of “Foundations of Databases”: FOR$xINbib/book[title/text() = “Foundations …”]/author$yINbib/book[author/text() = $x/text()]/title RETURN <answer> $y/text() </answer> Result: <answer> Foundations … </ answer > < answer> Foundations …</ answer > The answer willcontain duplicates !

  28. XQuery Same as before, but eliminate duplicates: FOR$xINbib/book[title/text() = “Database Theory”]/author$yINdistinct(bib/book[author/text() = $x/text()]/title) RETURN <answer> $y/text() </answer> Result: < answer> Foundations …</ answer > distinct = a function that eliminates duplicates

  29. FOR$yIN /db/Company/row[city/text()=“Seattle”]$xIN /db/Product/row[maker/text()=$y/cid/text()]RETURN$x/name CoolXQuery SQL and XQuery Side-by-side Product(pid, name, maker)Company(cid, name, city) Query: Find all products made in Seattle FOR$xIN /db/Product/row$yIN /db/Company/rowWHERE$x/maker/text()=$y/cid/text() and $y/city/text() = “Seattle”RETURN$x/name SELECT x.nameFROM Product x, Company yWHERE x.maker=y.cid and y.city=“Seattle” SQL XQuery

  30. Result: <result> <author>Jones</author> <title> abc </title> <title> def </title> </result> <result> <author> Smith </author> <title> ghi </title> </result> XQuery: Nesting Query: For each author of a book by Morgan Kaufmann, list all books s/he published: FOR$aINdistinct(document("bib.xml")/bib/book[publisher=“Morgan Kaufmann”]/author) RETURN <result> { $a, FOR$tIN /bib/book[author=$a]/title RETURN$t} </result>

  31. XQuery • FOR$xIN expr -- binds $x to each value in the list expr • LET$x = expr -- binds $x to the entire list expr • Useful for common subexpressions and for aggregations <big_publishers> FOR$pINdistinct(document("bib.xml")//publisher) LET$b := document("bib.xml")/book[publisher = $p] WHEREcount($b) > 100 RETURN$p </big_publishers> count = a (aggregate) function that returns the number of elms

  32. XQuery Query: Find books whose price is larger than average: FOR$aIN /bib/book LET$b:=avg(/bib/book/price/text()) WHERE$a/price/text() > $b RETURN$a

  33. XQuery Query: Find all publishers that published more than 100 books: <big_publishers> { FOR$pINdistinct(//publisher/text()) LET$b := document("bib.xml")/book[publisher/text() = $p] WHEREcount($b) > 100 RETURN <publisher> $p </publisher> } </big_publishers> $bis a collection of elements, not a single element count= a (aggregate) function that returns the number of elements

  34. FOR v.s. LET FOR • Binds node variables iteration LET • Binds collection variables one value Examples Returns: <result> <book>...</book></result> <result> <book>...</book></result> <result> <book>...</book></result> ... FOR$xINdocument("bib.xml")/bib/book RETURN <result> $x </result> Returns: <result> <book>...</book> <book>...</book> <book>...</book> ... </result> LET$x:=document("bib.xml")/bib/book RETURN <result> $x </result>

  35. Sorting in XQuery <publisher_list> FOR$pINdistinct(document("bib.xml")//publisher) RETURN <publisher> {<name> $p/text() </name> , FOR$bIN document("bib.xml")//book[publisher = $p] RETURN <book> {$b/title , $b/@price} </book> SORTBY(priceDESCENDING) } </publisher> SORTBY(name) </publisher_list>

  36. Sorting in XQuery • Sorting arguments: refer to the name space of the RETURN clause, not the FOR clause • To sort on an element you don’t want to display, first return it, then remove it with an additional query. <publisher_list> FOR$pINdistinct(document("bib.xml")//publisher) RETURN <publisher> { <name> $p/text() </name> , FOR$bIN document("bib.xml")//book[publisher = $p] RETURN <book> { $b/title , $b/price } </book> ORDER BY priceDESCENDING } </publisher> ORDER BY name </publisher_list>

  37. Collections in XQuery • Ordered and unordered collections • /bib/book/author = an ordered collection • Distinct(/bib/book/author) = an unordered collection • LET$b = /bib/book $b is a collection • $b/author  a collection (several authors...) Returns: <result> <author>...</author> <author>...</author> <author>...</author> ... </result> RETURN <result> $b/author </result>

  38. If-Then-Else FOR$h IN //holding RETURN <holding> { $h/title, IF$h/@type = "Journal" THEN$h/editor ELSE$h/author } </holding> ORDER BYtitle

  39. Quantifiers FOR$b IN //book WHERESOME$p IN $b//paraSATISFIES contains($p, "sailing") AND contains($p, "windsurfing") RETURN$b/title Existential Quantifiers FOR$b IN //book WHEREEVERY$p IN $b//paraSATISFIES contains($p, "sailing") RETURN$b/title Universal Quantifiers

  40. Other Stuff in XQuery • BEFORE and AFTER • for dealing with order in the input • FILTER • deletes some edges in the result tree • Recursive functions • Currently: arbitrary recursion • Perhaps more restrictions in the future ?

  41. Group-By in XQuery ?? • No GROUPBY currently in XQuery • A recent proposal (next) • What do YOU think ?

  42. Group-By in XQuery ?? FOR$bIN document("http://www.bn.com")/bib/book, $yIN$b/@year WHERE$b/publisher="Morgan Kaufmann" RETURNGROUPBY $y WHERE count($b) > 10 IN <year> $y </year>  with GROUPBY SELECT year FROM Bib WHERE Bib.publisher="Morgan Kaufmann" GROUPBY year HAVING count(*) > 10 Equivalent SQL 

  43. Group-By in XQuery ?? FOR $b IN document("http://www.bn.com")/bib/book,$a IN $b/author,$y IN $b/@yearRETURN GROUPBY $a, $y IN <result> $a, <year> $y </year>, <total> count($b) </total> </result>  with GROUPBY FOR $Tup IN distinct(FOR $b IN document("http://www.bn.com")/bib,$a IN $b/author,$y IN $b/@year RETURN <Tup> <a> $a </a> <y> $y </y> </Tup>),$a IN $Tup/a/node(),$y IN $Tup/y/node() LET $b = document("http://www.bn.com")/bib/book[author=$a,@year=$y] RETURN <result> $a, <year> $y </year>, <total> count($b) </total> </result> Without GROUPBY 

  44. Group-By in XQuery ?? FOR$bIN document("http://www.bn.com")/bib/book,$aIN$b/author,$yIN$b/@year,$tIN$b/title,$pIN$b/publisherRETURNGROUPBY$p, $yIN <result> $p, <year> $y </year>,GROUPBY$aIN <authorEntry> $a,GROUPBY $tIN$t <authorEntry> </result>  Nested GROUPBY’s

  45. FOR/LET Clauses List of tuples of bounded variables WHERE Clause List of pruned tuples of bounded variables RETURN Clause Instance of XQuery data model XQuery Summary:[Demo] FOR-LET-WHERE-RETURN = FLWR

More Related