
Query Languages for XML



Presentation Transcript


  1. Query Languages for XML XPath

  2. The XPath Data Model • Given an XML document, XPath operates on it and produces values that are sequences of items. • An item is either: • A value of primitive type, e.g., integer or string; • A node (defined next).

  3. Principal Kinds of Nodes • Documents represent entire XML documents. • Identified by a local path name or a URL. • Elements are pieces of a document consisting of some opening tag, its matching closing tag (if any), and everything in between. • Attributes are names that are given values inside opening tags.

  4. Document Nodes • Formed by doc(filename) • filename can be a local name or a URL. • Example: doc("univ.xml") or doc("/mydir/univ.xml") • All XPath queries refer to a doc node, either explicitly or implicitly. • Example: key definitions in XML Schema have XPath expressions that refer to the document described by the schema.

  5. Running Example <!DOCTYPE UNIV [ <!ELEMENT UNIV (STUDENT*, COURSE*)> <!ELEMENT STUDENT (NAME, MARK+)> <!ATTLIST STUDENT sno ID #REQUIRED> <!ELEMENT NAME (#PCDATA)> <!ELEMENT MARK (#PCDATA)> <!ATTLIST MARK theCNO IDREF #REQUIRED> <!ELEMENT COURSE (#PCDATA)> <!ATTLIST COURSE cno ID #REQUIRED> ]>

  6. Example Document <UNIV> <STUDENT sno = "007"> <NAME>James Bond</NAME> <MARK theCNO = "CS123">80</MARK> <MARK theCNO = "CS456">98</MARK> </STUDENT> … <COURSE cno = "CS123"> DB</COURSE>… </UNIV> • Callouts on the slide mark an element node and an attribute node. • The document node is all of this, plus the header ( <? xml version… ?>).

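  7. Nodes as Semistructured Data [Figure: the example document drawn as a tree. The doc node univ.xml has the child UNIV. UNIV has the children STUDENT (sno = "007") and COURSE (cno = "CS123", text DB). STUDENT has the children NAME (text James Bond), MARK (theCNO = "CS123", text 80), and MARK (theCNO = "CS456", text 98).]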

  8. Paths in XML Documents • XPath is a language for describing paths in XML documents. • The result of the described path is a sequence of items.

  9. Path Expressions • Simple path expressions are sequences of slashes (/) and tags, starting with /. • Example: /UNIV/STUDENT/MARK • Construct the result by starting with just the doc node and processing each tag in turn from the left.

  10. Evaluating a Path Expression • Assume the first tag is the root. • Processing the doc node by this tag results in a sequence consisting of only the root element. • e.g., /UNIV • Suppose we have obtained a sequence of items from processing the previous tags, and the next tag is X. • For each item that is an element node, replace the element by all its subelements with tag X.
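The evaluation procedure on slides 9 and 10 can be sketched in Python. This is only an illustration, assuming the standard library's xml.etree.ElementTree as the tree representation; the helper name eval_simple_path and the trimmed sample document (elided parts left out) are not from the slides.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slides' example document (elided parts left out).
UNIV_XML = """<UNIV>
  <STUDENT sno="007">
    <NAME>James Bond</NAME>
    <MARK theCNO="CS123">80</MARK>
    <MARK theCNO="CS456">98</MARK>
  </STUDENT>
  <COURSE cno="CS123">DB</COURSE>
</UNIV>"""


def eval_simple_path(root, path):
    """Evaluate a simple path such as /UNIV/STUDENT/MARK (hypothetical helper)."""
    tags = path.strip("/").split("/")
    # The first tag must name the root element; otherwise nothing matches.
    items = [root] if root.tag == tags[0] else []
    for tag in tags[1:]:
        # Replace each element by all of its subelements with this tag.
        items = [child for item in items for child in item if child.tag == tag]
    return items


root = ET.fromstring(UNIV_XML)
marks = eval_simple_path(root, "/UNIV/STUDENT/MARK")
print([m.text for m in marks])
```

Each pass of the loop corresponds to processing one tag from the left, so the intermediate value of items is exactly the sequence described on slide 10.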

  11. Example: /UNIV • Result: one item, the UNIV element. <UNIV> <STUDENT sno = "007"> <NAME>James Bond</NAME> <MARK theCNO = "CS123">80</MARK> <MARK theCNO = "CS456">98</MARK> </STUDENT> … <COURSE cno = "CS123"> DB</COURSE>… </UNIV>

  12. Example: /UNIV/STUDENT • Result: this STUDENT element, followed by all the other STUDENT elements. <UNIV> <STUDENT sno = "007"> <NAME>James Bond</NAME> <MARK theCNO = "CS123">80</MARK> <MARK theCNO = "CS456">98</MARK> </STUDENT> … <COURSE cno = "CS123"> DB</COURSE>… </UNIV>

  13. Example: /UNIV/STUDENT/MARK • Result: these MARK elements, followed by the MARK elements of all the other STUDENTs. <UNIV> <STUDENT sno = "007"> <NAME>James Bond</NAME> <MARK theCNO = "CS123">80</MARK> <MARK theCNO = "CS456">98</MARK> </STUDENT> … <COURSE cno = "CS123"> DB</COURSE>… </UNIV>

  14. Relative Path • We can use XPath expressions that are relative to the current node or sequence of nodes. • Do not start with /. • Example • If we have arrived at node /UNIV, then we can use relative path STUDENT/NAME or COURSE to describe its subelements. Lu Chaojun, SJTU

  15. Attributes in Paths • Instead of going to subelements with a given tag, you can go to an attribute of the elements you already have. • An attribute is indicated by putting @ in front of its name.

  16. Evaluating Attributes • When a path expression ends in an attribute, the result is typically a sequence of values of primitive type.

  17. Example: /UNIV/STUDENT/MARK/@theCNO • These attributes contribute "CS123" "CS456" to the result, followed by other theCNO values. <UNIV> <STUDENT sno = "007"> <NAME>James Bond</NAME> <MARK theCNO = "CS123">80</MARK> <MARK theCNO = "CS456">98</MARK> </STUDENT> … <COURSE cno = "CS123"> DB</COURSE>… </UNIV>
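A rough Python counterpart of /UNIV/STUDENT/MARK/@theCNO, for readers who want to try it. ElementTree supports only a subset of XPath and cannot end a path in an attribute step, so this sketch selects the MARK elements and then reads the attribute from each. The trimmed sample document is an assumption.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slides' example document (elided parts left out).
UNIV_XML = """<UNIV>
  <STUDENT sno="007">
    <NAME>James Bond</NAME>
    <MARK theCNO="CS123">80</MARK>
    <MARK theCNO="CS456">98</MARK>
  </STUDENT>
  <COURSE cno="CS123">DB</COURSE>
</UNIV>"""

root = ET.fromstring(UNIV_XML)  # root is the UNIV element

# Step 1: the element part of the path, relative to UNIV.
mark_elements = root.findall("./STUDENT/MARK")

# Step 2: the attribute step, done by hand; the result is a list of
# plain strings rather than nodes.
cnos = [m.get("theCNO") for m in mark_elements]
print(cnos)
```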

  18. Paths that Begin Anywhere • If the path begins with //X, then the first step can begin at the root or any subelement of the root, as long as the tag is X. • In fact, //X can appear anywhere in a path. • e.g., /UNIV//NAME • // is the shorthand of a kind of axis. (see next slide)

  19. Axes: Modes of Navigation • In general, path expressions allow us to start at the root and execute steps to find a sequence of nodes. • At each step, we may follow any one of several axes. • The default axis is child:: (go to all the children of the current set of nodes). • Shorthand: omit the axis name, so /X means /child::X.

  20. Example: Axes • /UNIV/STUDENT is really shorthand for /child::UNIV/child::STUDENT. • @ is really shorthand for the attribute:: axis. • Thus, /UNIV/STUDENT/@sno is shorthand for /child::UNIV/child::STUDENT/attribute::sno

  21. More Axes • Some other useful axes are: • parent:: = parent(s) of the current node(s) • Shorthand: .. • self:: = the current node(s) • Shorthand: . (a dot) • descendant-or-self:: = the current node(s) and all descendants • Shorthand: // (strictly, // abbreviates /descendant-or-self::node()/) • ancestor::, ancestor-or-self::, following-sibling::, etc.
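The // and .. shorthands can be tried out in Python. This is a sketch under the assumption that ElementTree's limited XPath subset is close enough for illustration: it has no named axes such as parent::, but it does accept the abbreviated forms. The trimmed sample document is also an assumption.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slides' example document (elided parts left out).
UNIV_XML = """<UNIV>
  <STUDENT sno="007">
    <NAME>James Bond</NAME>
    <MARK theCNO="CS123">80</MARK>
    <MARK theCNO="CS456">98</MARK>
  </STUDENT>
  <COURSE cno="CS123">DB</COURSE>
</UNIV>"""

root = ET.fromstring(UNIV_XML)

# // : NAME elements at any depth below UNIV (like /UNIV//NAME).
names = [n.text for n in root.findall(".//NAME")]

# .. : step from every MARK back up to its parent element.
parents = root.findall(".//MARK/..")
parent_snos = [p.get("sno") for p in parents]

print(names, parent_snos)
```

Both MARK elements share one parent, and that STUDENT appears only once in the result.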

  22. Wild-Card * • A star (*) in place of a tag represents any one tag. • Example: /*/*/NAME represents all NAME elements at the third level of nesting. • @* represents any attribute.

  23. Example: /UNIV/* • Result: this STUDENT element, all other STUDENT elements, the COURSE element, all other COURSE elements. <UNIV> <STUDENT sno = "007"> <NAME>James Bond</NAME> <MARK theCNO = "CS123">80</MARK> <MARK theCNO = "CS456">98</MARK> </STUDENT> … <COURSE cno = "CS123"> DB</COURSE>… </UNIV>

  24. Selection Conditions • A condition inside […] may follow a tag. • If so, then only paths that have that tag and also satisfy the condition are included in the result of a path expression. • Comparisons have an implied “there exists” sense: two sequences are related if any pair of items, one from each sequence, are related by the given comparison operator.

  25. Example: Selection Condition /UNIV/STUDENT/MARK[. < 90] • The condition that the MARK be < 90 makes the CS123 mark (80), but not the CS456 mark (98), part of the result. <UNIV> <STUDENT sno = "007"> <NAME>James Bond</NAME> <MARK theCNO = "CS123">80</MARK> <MARK theCNO = "CS456">98</MARK> </STUDENT> … <COURSE cno = "CS123"> DB</COURSE>… </UNIV>

  26. Example: Attribute in Selection /UNIV/STUDENT/MARK[@theCNO = "CS123"] • Now, the CS123 MARK element is selected, along with any other marks for CS123. <UNIV> <STUDENT sno = "007"> <NAME>James Bond</NAME> <MARK theCNO = "CS123">80</MARK> <MARK theCNO = "CS456">98</MARK> </STUDENT> … <COURSE cno = "CS123"> DB</COURSE>… </UNIV>
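The two selection examples can be checked with a short Python sketch. It assumes ElementTree, whose XPath subset handles attribute-equality predicates directly but has no numeric comparison such as [. < 90], so that condition is applied as an ordinary Python filter. The trimmed sample document is an assumption.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slides' example document (elided parts left out).
UNIV_XML = """<UNIV>
  <STUDENT sno="007">
    <NAME>James Bond</NAME>
    <MARK theCNO="CS123">80</MARK>
    <MARK theCNO="CS456">98</MARK>
  </STUDENT>
  <COURSE cno="CS123">DB</COURSE>
</UNIV>"""

root = ET.fromstring(UNIV_XML)

# MARK[@theCNO = "CS123"]: supported directly by ElementTree.
cs123_marks = root.findall("./STUDENT/MARK[@theCNO='CS123']")

# MARK[. < 90]: select all MARK elements, then filter on the numeric value.
below_90 = [m for m in root.findall("./STUDENT/MARK") if float(m.text) < 90]

print([m.text for m in cs123_marks], [m.get("theCNO") for m in below_90])
```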

  27. Other Forms of Conditions • Here are some useful forms of conditions: • X[i] is true for an X that is the ith X child of its parent. • X[T] is true for an X having a subelement with tag T. • X[@A] is true for an X having attribute A.
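The positional, subelement, and attribute conditions all happen to fall inside the XPath subset that Python's ElementTree implements, so they can be tried as written. The trimmed sample document is an assumption.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slides' example document (elided parts left out).
UNIV_XML = """<UNIV>
  <STUDENT sno="007">
    <NAME>James Bond</NAME>
    <MARK theCNO="CS123">80</MARK>
    <MARK theCNO="CS456">98</MARK>
  </STUDENT>
  <COURSE cno="CS123">DB</COURSE>
</UNIV>"""

root = ET.fromstring(UNIV_XML)

# X[i]: the second MARK child of each STUDENT (positions start at 1).
second_marks = [m.text for m in root.findall("./STUDENT/MARK[2]")]

# X[T]: children of UNIV that have a NAME subelement.
with_name = [e.tag for e in root.findall("./*[NAME]")]

# X[@A]: children of UNIV that carry a cno attribute.
with_cno = [e.tag for e in root.findall("./*[@cno]")]

print(second_marks, with_name, with_cno)
```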

  28. Query Languages for XML XQuery

  29. XQuery • XQuery extends XPath to a query language that has power similar to SQL. • Uses the same sequence-of-items data model. • XQuery is an expression/functional language. • Any XQuery expression can be an argument of any other XQuery expression. • Like relational algebra

  30. More About Item Sequences • XQuery will sometimes form sequences of sequences. • All sequences are flattened. • Example: (1, 2, (), (3, 4)) = (1, 2, 3, 4), where () is the empty sequence.
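Flattening can be mimicked in Python with nested lists standing in for nested sequences. The function name flatten is hypothetical. XQuery does this automatically, so the sketch only shows what the rule means.

```python
def flatten(seq):
    """Return the items of seq with all nesting removed, in order."""
    out = []
    for item in seq:
        if isinstance(item, (list, tuple)):
            # A nested sequence contributes its items, not itself.
            out.extend(flatten(item))
        else:
            out.append(item)
    return out


result = flatten([1, 2, [], [3, 4]])
print(result)
```

The empty inner list disappears entirely, which matches the slide's example.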

  31. FLWR Expressions • One or more for and/or let clauses. • Then an optional where clause. • Exactly one return clause.

  32. Semantics of FLWR Expressions • Each for creates a loop. • let produces only a local definition. • At each iteration of the nested loops, if any, evaluate the where clause. • If the where clause returns TRUE, invoke the return clause, and append its value to the output.

  33. FOR Clauses for var in exp, ... • Variables begin with $. • A for-variable takes on each item in the sequence denoted by the expression, in turn. • Whatever follows this for is executed once for each value of the variable.

  34. Example: FOR for $c in doc("univ.xml")/UNIV/COURSE/@cno return <CNO>{data($c)}</CNO> • Braces mean: "Expand the enclosed string by replacing variables and path exps. by their values." • $c ranges over the cno attributes of all courses in our example document. • data($c) extracts the attribute's value; a bare {$c} would attach the attribute node itself to the CNO element instead of producing text. • Result is a sequence of CNO elements: <CNO>CS123</CNO> <CNO>CS456</CNO> . . .

  35. Use of Braces • When a variable name like $x, or an expression, could be text, we need to surround it by braces to avoid having it interpreted literally. • Example: <A>$x</A> is an A-element with value ”$x”. <A>{$x}</A> is correct. • But return $x is unambiguous. • You cannot return an untagged string without quoting it, as return ”$x”.

  36. LET Clauses let var := exp, ... • Value of the variable becomes the sequence of items defined by the expression. • Note let does not cause iteration; for does.

  37. Example: LET let $d := doc("univ.xml") let $c := $d/UNIV/COURSE/@cno return <CNO>{data($c)}</CNO> • data($c) extracts the attribute values as text. • Returns one element with all the course numbers, like: <CNO>CS123 CS456 …</CNO>
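The difference between for and let can be imitated in Python: for behaves like a loop that produces one result per item, while let binds the whole sequence once. This is only an analogy and not how an XQuery engine works. The second course (CS456, "AI") is made up so that the two results differ visibly.

```python
import xml.etree.ElementTree as ET

# Sample data: the CS456 course and its text "AI" are hypothetical.
UNIV_XML = """<UNIV>
  <COURSE cno="CS123">DB</COURSE>
  <COURSE cno="CS456">AI</COURSE>
</UNIV>"""

root = ET.fromstring(UNIV_XML)
cnos = [c.get("cno") for c in root.findall("./COURSE")]

# "for $c in ... return <CNO>...</CNO>": one CNO element per binding of $c.
for_result = [f"<CNO>{c}</CNO>" for c in cnos]

# "let $c := ... return <CNO>...</CNO>": $c is the whole sequence,
# so the return clause runs once and yields a single element.
let_result = f"<CNO>{' '.join(cnos)}</CNO>"

print(for_result)
print(let_result)
```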

  38. Order-By Clauses • FLWR is really FLWOR: an order-by clause can precede the return. • Form: order by <expression> • With optional ascending or descending. • The expression is evaluated for each assignment to variables. • Determines placement in output sequence.

  39. Example: Order-By • List all marks for CS123, lowest first. let $d := doc("univ.xml") for $p in $d/UNIV/STUDENT/MARK[@theCNO="CS123"] order by $p return $p • The for clause generates bindings for $p to MARK elements. • order by orders those bindings by the values inside the elements (automatic coercion). • Each binding is evaluated for the output. • The result is a sequence of MARK elements.
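A Python sketch of the same query, assuming the marks should be compared as numbers. The second student (sno 008) is invented so that there is something to sort, and the explicit float conversion stands in for whatever coercion the query engine applies.

```python
import xml.etree.ElementTree as ET

# Sample data: the second student is hypothetical.
UNIV_XML = """<UNIV>
  <STUDENT sno="007">
    <NAME>James Bond</NAME>
    <MARK theCNO="CS123">80</MARK>
    <MARK theCNO="CS456">98</MARK>
  </STUDENT>
  <STUDENT sno="008">
    <NAME>Jane Doe</NAME>
    <MARK theCNO="CS123">75</MARK>
  </STUDENT>
</UNIV>"""

root = ET.fromstring(UNIV_XML)

# for $p in .../MARK[@theCNO="CS123"]
bindings = root.findall("./STUDENT/MARK[@theCNO='CS123']")

# order by $p (ascending is the default), then return $p
ordered = sorted(bindings, key=lambda mark: float(mark.text))

print([m.text for m in ordered])
```

The result is still a list of MARK elements; only their order changes.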

  40. Predicates • Normally, conditions imply existential quantification. • e.g., for two sequences of items to be equal, we have only to find any pair of items, one from each side, that equate.

  41. Strict Comparisons • To require that the things being compared are sequences of only one item, use the comparison operators eq, ne, lt, le, gt, ge. • Example: $x/NAME eq "James Bond" is true if $x has exactly one NAME subelement and its value is "James Bond".
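The contrast between the existential = and the strict eq can be modeled on plain Python lists. The function names general_eq and value_eq are hypothetical, and the sketch ignores type promotion and atomization. In this model an empty operand gives None and an operand with more than one item raises an error.

```python
def general_eq(left, right):
    """Existential '=': true if any pair of items, one per side, is equal."""
    return any(a == b for a in left for b in right)


def value_eq(left, right):
    """Strict 'eq': each side must be a single item."""
    if len(left) == 0 or len(right) == 0:
        return None  # models an empty result
    if len(left) > 1 or len(right) > 1:
        raise TypeError("eq needs at most one item on each side")
    return left[0] == right[0]


print(general_eq(["CS123", "CS456"], ["CS456"]))
print(value_eq(["James Bond"], ["James Bond"]))
```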

  42. Boolean Values in XQuery • The effective boolean value (EBV) of an expression is: • The actual value if the expression is of type boolean. • FALSE if the expression evaluates to 0, ”” [the empty string], or () [the empty sequence]. • TRUE otherwise.
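The three EBV rules above translate into a small Python function. The name ebv is hypothetical and the sketch follows only the rules on this slide; the full language definition has more cases. An empty list or tuple stands in for the empty sequence.

```python
def ebv(value):
    """Effective boolean value, following the slide's three rules."""
    # Rule 1: a boolean is its own EBV.
    if isinstance(value, bool):
        return value
    # Rule 2: 0, the empty string, and the empty sequence are FALSE.
    if isinstance(value, (int, float)) and value == 0:
        return False
    if value == "" or value == () or value == []:
        return False
    # Rule 3: everything else is TRUE.
    return True


print([ebv(v) for v in (False, 0, "", (), "x", 7)])
```

The bool check comes first because Python treats False as equal to 0, and the slide lists booleans as their own case.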

  43. Comparison of Elements and Values • When an element is compared to a primitive value, the element is treated as its value, if that value is atomic. • Example: /UNIV/STUDENT[@sno="007"]/MARK[@theCNO="CS123"] eq "80" is true if student 007 got 80 for CS123.

  44. Comparison of Two Elements • It is insufficient that two elements look alike. • Example: /A[@C=”C1”]/B eq /A[@C=”C2”]/B is false. • For elements to be equal, they must be the same, physically, in the implied document.

  45. Getting Data From Elements • Suppose we want to compare the values of elements, rather than their location in documents. • To extract just the value (e.g., the mark itself) from an element E, use data(E).

  46. Eliminating Duplicates • Use function distinct-values applied to a sequence. • Subtlety: this function strips tags away from elements and compares the string values. • But it doesn't restore the tags in the result. • Example: return distinct-values( let $d := doc("univ.xml") return $d/UNIV/STUDENT/MARK )
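The effect of distinct-values on the MARK elements can be approximated in Python. The second student is made up so that a duplicate mark exists. The sketch keeps first-occurrence order, which is a choice made here for readability. As on the slide, the result holds bare values, not MARK elements.

```python
import xml.etree.ElementTree as ET

# Sample data: the second student is hypothetical.
UNIV_XML = """<UNIV>
  <STUDENT sno="007">
    <NAME>James Bond</NAME>
    <MARK theCNO="CS123">80</MARK>
    <MARK theCNO="CS456">98</MARK>
  </STUDENT>
  <STUDENT sno="008">
    <NAME>Jane Doe</NAME>
    <MARK theCNO="CS456">80</MARK>
  </STUDENT>
</UNIV>"""

root = ET.fromstring(UNIV_XML)
marks = root.findall("./STUDENT/MARK")

# Strip the tags (take the text), then drop repeated values.
distinct = list(dict.fromkeys(m.text for m in marks))

print(distinct)
```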

  47. Quantifier Expressions some $x in E1 satisfies E2 • Evaluate the sequence E1. • Let $x (any variable) be each item in the sequence, and evaluate E2. • Return TRUE if E2 has EBV TRUE for at least one $x. • Analogously: every $x in E1 satisfies E2
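Python's built-in any and all behave much like some and every. The sketch below checks whether some mark, or every mark, of the example student exceeds 90; the threshold is chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# The example student from the slides.
STUDENT_XML = """<STUDENT sno="007">
  <NAME>James Bond</NAME>
  <MARK theCNO="CS123">80</MARK>
  <MARK theCNO="CS456">98</MARK>
</STUDENT>"""

student = ET.fromstring(STUDENT_XML)
marks = [float(m.text) for m in student.findall("MARK")]

# some $m in $s/MARK satisfies $m > 90
some_above_90 = any(m > 90 for m in marks)

# every $m in $s/MARK satisfies $m > 90
every_above_90 = all(m > 90 for m in marks)

# Over an empty sequence, "every" holds vacuously and "some" fails.
every_on_empty = all(m > 90 for m in [])
some_on_empty = any(m > 90 for m in [])

print(some_above_90, every_above_90, every_on_empty, some_on_empty)
```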

  48. Aggregations • Take a sequence as argument, and return count, sum, max, etc. • Example: let $d := doc("univ.xml") for $s in $d/UNIV/STUDENT where count($s/MARK) > 100 return $s

  49. Branching Expressions • if (E1) then E2 else E3 is evaluated by: • Compute the EBV of E1. • If true, the result is E2; else the result is E3. • Example: if ($x/@sno eq "007") then $x/NAME else ()

  50. Query Languages for XML XSLT
