1 / 106

5 Querying XML

5 Querying XML. How to access various XML data sources? XQuery , XML Query Lang, W3C Rec , Jan '07 joint work by XML Query and XSL WGs with XPath 2.0 and XSLT 2.0 Started ~1999; 2 nd Ed. in Dec 2010 influenced by many research groups and query languages

sydney
Download Presentation

5 Querying XML

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 5 Querying XML • How to access various XML data sources? • XQuery, XML Query Lang, W3C Rec, Jan '07 • joint work by XML Query and XSL WGs • with XPath 2.0 and XSLT 2.0 • Started ~1999; 2nd Ed. in Dec 2010 • influenced by many research groups and query languages • Quilt, XPath, XQL, XML-QL, SQL, OQL, Lorel, ... • A query language for any XML-represented data: both documents and databases 5. Querying XML with XQuery

  2. Outline of this section • Quick overview of XQuery • Review of XPath, emphasizing XPath 2.0 vs 1.0 • items, types, and sequences • tree model, location path expressions; comparison operators • Central features of XQuery (over those of XPath 2.0) • element constructors, FLWOR expressions • use cases, user-defined functions, querying relational data • Comparison of XQuery and XSLT 1.0 • XQuery for problem solving • examples of application to puzzles • Summary 5. Querying XML with XQuery

  3. Capabilities of XQuery (1) • XQuery allows to select, reorganize and transform XML data • respecting document content, structure, hierarchy, and order • Selection, filtering, and search • Combine and join • data from different parts of a document, or from multiple documents • Sort, group, and aggregate • Transform, restructure and create XML data • Operate on numbers and dates • Manipulate content strings 5. Querying XML with XQuery

  4. Capabilities of XQuery (2) • Closure property: • Results of XML queries are also XML (well-formed document fragments) • > queries can be combined, without limit • Extensibility: • supports user-defined functions on all data types of the data model • In-place update of XML data not supported • specified in “XQuery Update Facility 1.0”, W3C Rec. March 2011 5. Querying XML with XQuery

  5. XQuery in a Nutshell • Functional expression language (lausekekieli) • Strongly-typed: (optional) type-checking of expressions, and validation of results (We’ll concentrate to processing) • predeclaredprefix for typenames:xs="http://www.w3.org/2001/XMLSchema" • Extends XPath 2.0 • XQuery 1.0 and XPath 2.0Functions and Operators, Rec. Jan. 2007 • over 100; for numbers, strings, dates and times, Booleans, documents & URIs, nodes, and sequences • XQuery XPath 2.0 + XSLT' + SQL' (roughly) 5. Querying XML with XQuery

  6. Example Query xquery version "1.0"; (: optionaldeclaration :) <cheapBooks> <Title>CheapBooks</Title> { for $b indoc("bib.xml")//book[@price < 50]orderby $b/titlereturn $b } </cheapBooks> • Syntax "concise and easilyunderstood" • XML-basedsyntax (XQueryX) hasalsobeenspecified • easier for applications, harder for humans 5. Querying XML with XQuery

  7. A possible result <?xml version="1.0" encoding="UTF-8"?><cheapBooks> <Title>Cheap Books</Title> <book price="26.50"> <title>Computing with Logic</title> <author>David Maier</author> <publisher>Benjamin Cummings</publisher> <year>1999</year> </book> <book price="40.00"> <title>Designing Internet applications</title> <author>Michael Leventhal</author> <publisher>Prentice Hall</publisher> <year>1998</year> </book></cheapBooks> 5. Querying XML with XQuery

  8. XQuery and XPath • XQuery is an extension of XPath (2.0) • Common data model, 108 functions and 68 operators • > review some XPath first • XPath used in several other contexts, too: • For pattern matching and selection in XSLT • in validation rules of Schematron • For uniqueness constraints in XML Schema • For addressing in XLink and XPointer 5. Querying XML with XQuery

  9. XPath in a Nutshell • XPath 1.0 (W3C Rec. 11/99) • a compact non-XML syntax for addressing parts of XML documents (as node-sets) • also operations on strings, numbers and truth values • XPath 2.0 (W3C Rec. 1/07) extends and generalizes: • data manipulated as sequences of items • Item = a node or an atomic value of a simple XML Schema datatype 5. Querying XML with XQuery

  10. Literal Atomic Values and Their Types • Examples: "-12" instance of xs:string -12 instance of xs:integer 1.2 instance of xs:decimal 1.2E3 instance of xs:double string(1.2E3) instance of xs:string number("+12") instance of xs:double xs:date("2009-05-11") instance of xs:date true() instance of xs:boolean 5. Querying XML with XQuery

  11. XPath 2.0/XQuery Type Hierarchy 5. Querying XML with XQuery

  12. XPath 2.0/XQuery Type Hierarchy (cont.) 5. Querying XML with XQuery

  13. XQuery/XPath 2.0 Sequences • Expressions operate on, and return sequences of • atomic values (of simple XML Schema types) and • nodes • an item a singleton sequence • sequences are flat: no sequences as items • (1, (2, 3), (), 1) = (1, 2, 3, 1) • sequences are ordered, and can contain duplicates • Unlimited combination of expressions, often with automatic type conversions (e.g. for arithmetics) 5. Querying XML with XQuery

  14. Sequence Expressions • Constant sequences constructed by listing values • comma (,) is a catenation operator • (1, (2, 3), (), 1) = (1, 2, 3, 1) • Range expressions for integer sequences: • 1 to 4 • 4 to 1 • reverse(1 to 4) ->(1, 2, 3, 4) ->() ->(4, 3, 2, 1) 5. Querying XML with XQuery

  15. Accessing Documents • XQuery operates on nodes accessible by input functions • fn:doc("URI") • document-node of the XML document available at URI • roughly same as document("URI") in XSLT 1.0 • fn:collection("URI") • sequence of nodes from URI • association defined by implementation • predeclared prefix for the default function namespace: fn="http://www.w3.org/2005/04/xpath-functions" 5. Querying XML with XQuery

  16. XQuery/XPath 2.0 Data Model • Documents are viewed as treesmade of six types of nodes: • document (additional root above document element) • element nodes • attribute nodes • text nodes • Comments and processing instructions • Obs 1: No entity nodes, and no CDATA sections • Obs 2: Namespace nodes have been deprecated 5. Querying XML with XQuery

  17. Document Trees • Defined in Sect. 5 of XPath 1.0 spec • for XSLT/XPath 2.0 & XQuery in their joint Data Model • Element nodes have elements, text nodes, comments and processing instructions of their (direct) content as their children • NB: attribute nodes are not children (but have a parent) • > they have no siblings either • the stringvalue of an document/element is the concatenation of its all text-node descendants 5. Querying XML with XQuery

  18. Document Order • Document order of nodes: • = their left-to-right pre-order • Document root first • Other nodes in the order of the first character of their XML markup in the document text • > an element precedes it's attribute nodes, which precede any content nodes of the element • Order btw nodes belonging to different trees is implementation dependent, but consistent and stable 5. Querying XML with XQuery

  19. Location Paths • XPath can select any parts of a document tree using … • Location paths • evaluated with respect to a context item (.) • assigned on path steps, after the first one • Path expression typically starts with $x or doc(…) • Result: sequence of nodes, in document order, without duplicates 5. Querying XML with XQuery

  20. Path Expressions • Similar to XPath 1.0: [/ [/]]Expr/… /Expr • but steps more liberal: • arbitrary expressions OK, but steps before the last one must produce node sequences • 6 (of 13 XPath) axes required: child, descendant, attribute, self, descendant-or-self, parent • others (except namespace) optional, available if the Full Axis Feature is supported • with document-order operators (<<, >>) sufficient for expressing queries (→ Exercises) 5. Querying XML with XQuery

  21. Location paths • Consist of location steps separated by '/' • each step produces a sequence of items • steps evaluated left-to-right, each item in turn as the context item • Complete location step: AxisName::NodeTest ([PredicateExpr])* • axis specifies the tree relationship between the context node and the selected nodes • node test restricts the type and and name of nodes • filtered further by 0 or more predicates 5. Querying XML with XQuery

  22. Location steps: Axes • In total 12 axes (~ directions in tree) • for staying at the context node: self • for going downwards: • child, descendant, descendant-or-self • for going upwards: • parent, ancestor, ancestor-or-self • for moving towards start/end of the document: • preceding-sibling, following-sibling, preceding, following • “Special” axes • attribute(namespace deprecated in XPath 2.0) • (Axes required in XQuery implementations underlined) 5. Querying XML with XQuery

  23. Notes on Location Paths (1) • XPath 2.0 allows unrestricted expressions as steps • but intermediate steps must produce nodes only • Numeric predicates support array-style access: $rows[$i] • Predicates evaluated step at a time. This sometimes causes confusion with shorthand notations: • doc("doc.xml")//title[3]  third title child of each parent (likely none!). Why? • = doc("doc.xml")/ descendant-or-self::node()/child::title[3] • To get the third title in the doc use(doc("doc.xml")//title)[3] 5. Querying XML with XQuery

  24. Notes on Location Paths (2) • References to attributes and subelements easy to use as predicates • Get divisions that are of class C or have a head:doc("doc.xml")//div[@class="C" or head] • Values are coerced to Booleans on demand • string/sequence  true iff non-empty • number  false if and only if zero or NaN • (but a single number as a predicate tests for equality with position()) 5. Querying XML with XQuery

  25. Filter Expressions • Location steps can be filtered by predicates: doc("foo.xml")/body/(chap | app)[last()]/title the title of the last chapter of appendix, whichever is last • Other sequences, too:(1 to 20)[. mod 5 eq 0] → (5, 10, 15, 20) • ('.' generalized from XPath 1.0 shorthand for self::node() into the context item) XPath 2.0 extendedstep 5. Querying XML with XQuery

  26. Path Steps as a Map operator • XPath 2.0 path exprs provide a kind-of map facility, to compute a new sequence by evaluating an expression for each item of the input sequence • Example: Get all salaries incremented by 20%: doc("emps.xml")//emp/@salary/(. * 1.2) • Useful tricks, like providing defaults for missing attributes: $emp/(@salary,text{0.0})[1]/(. * 1.2) 5. Querying XML with XQuery

  27. Path Steps as a Map (2) • Limitation: steps are applicable to node sequences only. Example: an invalidattemptto square numbers 1, 2, ..., 10: (1 to 10)/(. * .) • Work-around: translate items first to text nodes: (for $i in 1 to 10 return text{ $i })/(. * .) or simply: for $i in 1 to 10 return $i * $i • Function calls can also be used as steps: myFun:toTextNodes(1 to 10)/myFun:square(.) 5. Querying XML with XQuery

  28. $s1 union $s2 = $s1 intersect $s2 = $s1 except $s2 = Set Operations on Node Sequences • Assume variable bindings: $s1 $s2 a b c d e • Then: w.o. duplicates, in doc. order a b c d e c based on node indentity (node1is node2) a b 5. Querying XML with XQuery

  29. Node Comparisons • To compare single nodes, • for identity: is$book//chap[@id="ch1"] is ($book//chap)[1]true iff the chapter with id="ch1" is the first chap • for document order: <<and>> $book//chap[@id="ch2"] >> $book//title[. eq "Intro"]true iff the chapter with id="ch2" appears after<title>Intro</title> • if either operand is empty, then result is empty (~ false) 5. Querying XML with XQuery

  30. Comparing values of sequences and items • General comparisons btw sequences: • =, !=, <, <=, >, >= • existential semantics: true iffsome pair of values from operand sequences satisfy the condition • (1,2) = (2,3); (2,3) = (3,4); (1,2) != (3,4) • Same as in XPath 1.0: //book[author = "Aho"]→ books where some author is Aho • "Is (some) author of $book Ann or Bob?":$book/author = ("Ann", "Bob") • Slice of $seqfrom pos $s to $e: 5. Querying XML with XQuery

  31. Set operations for sequences of atomic items • General comparison as a predicate yields set operations: • Union of $A and $B: • Intersection of $A and $B: • Difference of $A and $B: • Above comparisons require items of compatible types 5. Querying XML with XQuery

  32. Value Comparisons • For comparing single values: • eq, ne, lt, le, gt, ge • 1 eq 3 - 2; 10 lt 20 • $books[@price le 100] • the last assumes that a numeric type has been assigned to @price by validation • otherwise it has type xs:untypedAtomic, which is cast to xs:string( TYPE ERROR)  general comparisons more convenient with unvalidated elements & attributes 5. Querying XML with XQuery

  33. Working with Untyped Values • Text values may receive a specific type in a schema-validated element or attribute; Otherwise their type is xs:untypedAtomic • Automatic atomization and casting make dealing with them easy. Example: let $elem := <e>2.718281828</e> return ( "Value of", concat(substring($elem, 1, 6), ".. is about"), round-half-to-even($elem, 2) ) -> Value of 2.71828.. is about 2.72 5. Querying XML with XQuery

  34. General vs Value Comparisons wrt Types • Comparisons atomize operands: nodes  typed values • Assume that $E := <E>007</E> • Generalcomparisonstry to cast xs:untypedAtomic operands to compatible types: $E < 6 (: false: xs:untypedAtomic(007) -> xs:double(007) = 7 :), $E < "6" (: true: xs:untypedAtomic(007) -> "007" :), • Value comparisons cast xs:untypedAtomic values to strings: $E lt "6" (: true: xs:string("007") lt "6" :), $E lt 6  TYPE ERROR: cannot compare xs:untypedAtomic to xs:integer 5. Querying XML with XQuery

  35. What does XQuery add to XPath 2.0? • A query is an expression (lauseke) • any XPath expression is a query • XQuery adds to XPath expressions • Element constructors ( XSLT templates) • FLWOR expressions (”flower”; for-let-where-order by-return) 5. Querying XML with XQuery

  36. Central XQuery Expressions • Path expressions • Sequence expressions • Comparison operators • Conditionals: if (..) then .. else .. • Quantified expressions (some/every $varin … satisfies …) • Element constructors ( XSLT templates) • FLWOR expressions (”flower”; for-let-where-order by-return) • XPath 2.0 has a simpler for-return expression also in XPath 2.0 5. Querying XML with XQuery

  37. Example: Quantified Expression • Find book elements which have at least 10 sections in each of their chapters : doc(”Books.xml”)//book[ every $c in .//chaptersatisfiescount($c//section) ge 10 ] 5. Querying XML with XQuery

  38. Element Constructors • Direct element constructors ~ XSLT templates: • start and end tag enclosing the content • literal fragments written directly, expressions enclosed in braces {and } ≈ XSLT 1.0 attribute value templates • often used inside another expression that binds variables used in the element constructor • (There is no 'current node' in XQuery) • See next 5. Querying XML with XQuery

  39. Example • An emp element with an empid attribute and child elements name and job, from values in variables $id, $n, and $j: <empempid="{$id}"> <name>{$n}</name> <job>{$j}</job> </emp> Also computed constructors: element {"emp"} { attribute {"empid"}{$id}, <name> {$n} </name>, <job> {$j} </job> } 5. Querying XML with XQuery

  40. Identity of Component Nodes • Each node has node identity, and at most one parent. Existing nodes are copied before they get a new parent. • Example: let $x := <e>Hi</e>, $y := <p>{$x}</p> return not($x is $y/e) and deep-equal($x, $y/e) -> true 5. Querying XML with XQuery

  41. FLWOR ("flower") Expressions • Constructed from for, let, where, order by and return clauses (~SQL select-from-where) • Syntax: (ForClause | LetClause)+ WhereClause? OrderByClause? "return" Expr • FLWOR binds variables to values, and uses these bindings to construct a result (an ordered sequence of items) 5. Querying XML with XQuery

  42. Flow of data in a FLWOR expression tuple = monikko/rivi sequnce of items 5. Querying XML with XQuery

  43. for clauses • for $V1inExp1 (, $V2inExp2, …) • associates each variable Vi with expression Expi (e.g. a path expression) • Result: list of tuples, each containing a binding for each of the variables • can be though of as loops iterating over the items returned by respective expressions 5. Querying XML with XQuery

  44. Example: for clause for $i in (1,2), $j in (1 to $i)return <tuple> <i>{$i}</i> <j>{$j}</j></tuple> Result: <tuple><i>1</i><j>1</j></tuple> <tuple><i>2</i><j>1</j></tuple> <tuple><i>2</i><j>2</j></tuple> 5. Querying XML with XQuery

  45. let clauses • let also binds variables to expressions • each variable gets the entire sequence as its value (without iterating over the items of the sequence) • results in binding a single sequence for each variable • Compare: • for $b in doc("bib.xml")//book-> many bindings (to single books) • let $bl := doc("bib.xml")//book-> a single binding (to sequence of books) 5. Querying XML with XQuery

  46. Example: let clauses let $s := (<one/>, <two/>, <three/>) return <out> {$s} </out> Result: <out> <one/> <two/> <three/> </out> for $s in (<one/>,<two/>,<three/>) return <out> {$s} </out> --><out><one/></out> <out><two/></out> <out><three/></out> 5. Querying XML with XQuery

  47. for/let clauses • A FLWOR expr may contain several fors and lets • each may refer to variables bound in previous clauses • the result of the for/let sequence: • an ordered list of tuples (monikko) of bound variables • number of tuples = product of the cardinalities of the sequences returned by the for expressions 5. Querying XML with XQuery

  48. where clause • binding tuples generated by for and let clauses are filtered by an optional where clause • tuples with a true condition are used to instantiate the return clause • the where clause may contain several predicates connected by and, or, and fn:not() • usually refer to the bound variables • sequences as Booleans (similarly to node-sets in XPath 1.0): empty ~ false; non-empty ~ true 5. Querying XML with XQuery

  49. where clause • for binds variables to single items ->value comparisons, e.g. $color eq"red" • let to whole sequences ->general comparisons, e.g. $colors = "red" (~ some $c in $colors satisfies $c eq "red") • a number of aggregation functions available: avg(), sum(), count(), max(), min() (also in XPath 1.0) 5. Querying XML with XQuery

  50. return clause • The return clause generates the output of the FLWOR expression • instantiated once for each binding tuple • often contains element constructors, references to bound variables, and nested sub-expressions 5. Querying XML with XQuery

More Related