Before the Class • What are our roles in this subject? • A researcher • A developer • A problem solver • IT-related technologies • What has really confused you? • XML: a markup language for modelling specific data from a specific domain • The architecture of an XML application • The process of using related (XML-based) technologies to solve problems
XML Path Language (XPath) • Addressing the nodes of XML document trees
Introduction • Nodes • Location Paths • Axes & Node Tests • Using Axes and Node Tests • Syntax and Expression • Operations and Data Types • Examples • Functions • Summary
Introduction – Why XPath? • During the development of Extensible Stylesheet Language Transformations (XSLT), another member of the XML family, known as XPointer, was being defined. • XPointer takes the idea of anchor tags to a new level. • Both XPointer and XSLT needed a way to point to various parts of a document. • XSLT needs it to select the parts of a document to be transformed. • XPointer needs it to link two documents. • The solution was to provide a common syntax and semantics that both XSLT and XPointer could use.
Introduction – Why XPath? (cont.) • XML provides a rich, flexible and efficient way to mark up data, • but XML itself does not provide a way to locate specific parts or nodes within a given document. • XML processing steps vs. XPath • Without XPath, an XML document has to be parsed and then searched element by element to reach specific parts (DOM, SAX) • Trouble: for large documents, this is inefficient and error-prone • XPath: provides an expression syntax that other XML technologies use to locate specific parts • Effectively and efficiently • This common subset was called XPath
Introduction • What is XPath? • The XML Path Language (XPath) is a set of syntax and semantic rules for referring to portions of XML documents • A language for finding information in an XML document • XPath is a string-based expression language, not a structure-based (markup) language like XML • A syntax for defining parts of an XML document • Uses path expressions to navigate elements and attributes (nodes) in XML documents • XPath contains a library of standard functions (for accessing the nodes) • Intended to be used by other specifications such as XSL Transformations (XSLT), the XML Pointer Language (XPointer) and others.
Introduction (cont.) • The XPath specification is the foundation for a variety of specifications, • including XSLT and linking/addressing specifications like XPointer. • An understanding of XPath is fundamental to many advanced uses of XML technology. • XPath core: Path Expressions • XPath uses path expressions to select nodes or node-sets in an XML document. • These path expressions look very much like the paths you see when you work with a traditional computer file system. (XPath uses a notation with forward slashes (/), similar to the UNIX shell.) • The basic idea to recall: a tree is much like the structure of files and folders on a hard drive
Introduction (cont.) • XPath Standard Functions • XPath 2.0 includes over 100 built-in functions (refer to the W3C XPath documents); XPath 1.0 defines a much smaller core library • There are functions for: • string values, numeric values, date and time comparison, node and QName manipulation, sequence manipulation, Boolean values, and more • XPath is Used in XSLT • A major element in the XSLT standard. • XSLT uses XPath extensively for matching, that is, testing whether a node matches a given pattern. • XSLT specifies the context used by XPath. • You should understand XPath if you want to work with XSLT • Without XPath knowledge you will not be able to create XSLT documents (different documents).
Introduction (cont.) • XPath is a W3C Standard • XPath became a W3C Recommendation on 16 November 1999. • XPath was designed to be used by XSLT, XPointer and other XML parsing software • XPath vs. XSLT, XPointer, XQuery, … • [Diagram: XPath finds the locations of document structures or data in an XML document; XSLT, XPointer, XQuery, … use XPath to find the information, process it, and generate various documents.]
XPath Version • XPath 2.0 • XPath 2.0 is a superset of XPath 1.0 and became a W3C Recommendation in January 2007. • It was developed jointly by two W3C working groups: the XML Query Working Group and the XSL Working Group. • XPath 2.0 has more power and is more robust because it supports a broader set of data types. • XPath 2.0 values use XML Schema types rather than simple strings, numbers, or booleans. • XPath 2.0 is backward-compatible, so 1.0 expressions behave the same in 2.0, with exceptions listed in the specification.
XPath Nodes – used to model a document tree • An XML document • Is treated as a tree of nodes • Each node represents part of the XML document • Seven types of node • Root • Element • Attribute • Text • Comment • Processing instruction • Namespace • Attributes and namespaces are not children of their parent node • They describe their parent nodes
XPath Nodes • An XPath tree has a single root node, which represents the whole document (the XML declaration <?xml version = "1.0"?> is not itself a node) • Each XPath tree node has a string representation • Called a string-value; for an element it is built only from its descendant text() nodes, not from attribute and namespace nodes • Example I: <book> Web Programming II </book> • Example II (what is its string-value?): <book> <title> Web </title> Text Book </book> • Document order: nodes in an XPath tree have an ordering • Determined by the order in which they appear in the original XML document • Expanded-name of certain nodes (see Fig. 11.5) • Local part: the tag name without its prefix • Namespace URI: the URI bound to the prefix
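The string-value rule for the two examples above can be checked with a small sketch. It uses Python's standard xml.etree.ElementTree module, which is an assumption of this example and not something the slides prescribe. ElementTree has no string() function, so the helper string_value (a hypothetical name) joins itertext() as a stand-in for the XPath string-value of an element.

```python
import xml.etree.ElementTree as ET


def string_value(element):
    """Approximate the XPath string-value of an element:
    the concatenation of all descendant text nodes in document order."""
    return "".join(element.itertext())


# Example I: the element contains a single text node.
book1 = ET.fromstring("<book> Web Programming II </book>")

# Example II: the text is split between the <title> child and <book> itself.
book2 = ET.fromstring("<book> <title> Web </title> Text Book </book>")

value1 = string_value(book1)
value2 = string_value(book2)
title_value = string_value(book2.find("title"))

# Whitespace is part of the string-value; it is collapsed here only for display.
print(" ".join(value1.split()))       # Web Programming II
print(" ".join(value2.split()))       # Web Text Book
print(" ".join(title_value.split()))  # Web
```

For Example II the string-value of book includes the text of its title child, because every descendant text node contributes.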
An Example of Presenting an XML Document in XPath

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="AuctionItemSummary-Base.xsl"?>
<list xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://MyXPathExample.com AuctionItemList.xsd">
   <item type="BidOnly" private="false" id="itemId0001">
      <bidIncrement>1</bidIncrement>
      <currentPrice currency="USD">5</currentPrice>
      <endOfAuction>56</endOfAuction>
      <description>Miles Smiles album, CD</description>
      <sellerId>miles1965</sellerId>
   </item>
   <item type="FixedPrice" private="true" id="itemId0302">
      <bidIncrement>0</bidIncrement>
      <currentPrice currency="GBP">70</currentPrice>
      <endOfAuction>2</endOfAuction>
      <description>New 256m MP3 player</description>
      <sellerId>jimmy</sellerId>
   </item>
   ………………………………
</list>
<?xml version = "1.0"?>

<!-- Fig. 11.1 : simple.xml -->
<!-- Simple XML document -->

<book title = "C++ How to Program" edition = "3">

   <sample>
      <![CDATA[

         // C++ comment
         if ( this->getX() < 5 && value[ 0 ] != 3 )
            cerr << this->displayError();
      ]]>
   </sample>

   C++ How to Program by Deitel &amp; Deitel
</book>

Fig. 11.1 Simple XML document. (Callouts in the figure: root node, comment nodes, element nodes, attribute nodes, text nodes.)
Fig. 11.2 XPath tree for Fig. 11.1.
Root
   Comment: Fig. 11.1 : simple.xml
   Comment: Simple XML document
   Element: book
      Attribute: title = C++ How to Program
      Attribute: edition = 3
      Element: sample
         Text: // C++ comment if ( this->getX() < 5 && value[ 0 ] != 3 ) cerr << this->displayError();
      Text: C++ How to Program by Deitel & Deitel
<?xml version = "1.0"?>

<!-- Fig. 11.3 : simple2.xml -->
<!-- Processing instructions and namespaces -->

<html xmlns = "http://www.w3.org/TR/REC-html40">

   <head>
      <title>Processing Instruction and Namespace Nodes</title>
   </head>

   <?deitelprocessor example = "fig11_03.xml"?>

   <body>

      <deitel:book deitel:edition = "1"
         xmlns:deitel = "http://www.deitel.com/xmlhtp1">
         <deitel:title>XML How to Program</deitel:title>
      </deitel:book>

   </body>

</html>

Fig. 11.3 XML document with processing-instruction and namespace nodes. (Callouts in the figure: root node, comment nodes, namespace nodes, processing-instruction node, element nodes, text nodes, attribute nodes.)
Fig. 11.4 Tree diagram of an XML document with a processing-instruction node.
Root
   Comment: Fig. 11.3 : simple2.xml
   Comment: Processing instructions and namespaces
   Element: html
      Namespace: http://www.w3.org/TR/REC-html40
      Element: head
         Element: title
            Text: Processing Instruction and Namespace Nodes
Continued on next slide...
Fig. 11.4 Tree diagram of an XML document with a processing-instruction node. (Part 2)
Continued from previous slide
      Processing Instruction: deitelprocessor example = "fig11_03.xml"
      Element: body
         Element: book
            Attribute: edition = 1
            Namespace: http://www.deitel.com/xmlhtp1
            Element: title
               Text: XML How to Program
Location Paths • What do we already know? • The structure of an XML document in XPath • How can we use the structure to locate particular parts of a document? (Location path) • Just as we locate a specific file or folder in a file system • The most useful and widely used feature of XPath • An expression specifying how to navigate the XPath tree
Location Paths (cont.) • A location path is composed of location steps • Each location step has the following form: axis-name :: node-test [predicate]* • Axis: specifies the tree relationship between the nodes selected by the location step and the context node • Node test: specifies the node type and expanded-name of the nodes selected by the location step • Predicate: uses arbitrary expressions to further refine the set of nodes selected by the location step
Location Paths (cont.) • A location path is evaluated relative to a context node and identifies the nodes that you are trying to find. • In XSLT, for example, a template with match="/" sets the context to the root. • A location path can be written in an abbreviated or an unabbreviated syntax. • The abbreviated syntax is the most widely used; the unabbreviated syntax is more verbose • You might need to check which one your parser supports. (From the May 2000 release, the MSXML parser supports both.) • An example of the unabbreviated syntax: • child::para selects the para element children of the context node
Location Paths (cont.) • Two kinds of location path: relative location paths and absolute location paths • A relative location path is a sequence of location steps separated by /. For example: list/item[ currentPrice < 20.0 ] • This location path consists of two location steps: • the first, list, selects a set of nodes relative to the context node; • the second, item[currentPrice < 20.0], selects a set of nodes relative to each node selected by the first step; • and so on, if there are more steps.
Location Paths (cont.) • An absolute location path consists of a /, optionally followed by a relative location path, • with / referring to the root node. • An absolute location path is basically a relative location path evaluated in the context of the root node, for example: /list/item[ currentPrice < 20.0 ] • With absolute location paths (location paths that start with /), the context node isn't meaningful because the path is always evaluated from the root node
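The two-step path list/item[ currentPrice < 20.0 ] can be approximated as follows. This is a sketch under stated assumptions: it uses Python's standard xml.etree.ElementTree module, whose XPath subset has no numeric comparison operators, so the predicate is emulated in plain Python. The data is a trimmed copy of the auction list shown earlier, and the list element itself plays the role of the context node.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the auction item list (only the fields needed here).
AUCTION_XML = """
<list>
  <item id="itemId0001"><currentPrice currency="USD">5</currentPrice></item>
  <item id="itemId0302"><currentPrice currency="GBP">70</currentPrice></item>
</list>
"""

root = ET.fromstring(AUCTION_XML)  # context node: the <list> element

# Location step "item": the item children of the context node.
items = root.findall("item")

# Predicate [currentPrice < 20.0], emulated in Python because ElementTree
# does not support relational operators inside predicates.
cheap = [item for item in items if float(item.findtext("currentPrice")) < 20.0]

cheap_ids = [item.get("id") for item in cheap]
print(cheap_ids)  # ['itemId0001']
```

A full XPath 1.0 engine would evaluate the predicate itself; the list comprehension only mirrors what the predicate does, namely keeping the nodes for which the test is true.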
Axes • XPath searches are made relative to the context node • The context for a query is the node in the source XML document currently being processed • Context node: the reference node where you start to locate specific parts of a document • Example 1: in xsl:template match="/", we are in the context of the root of the XML document • Example 2: in an xsl:for-each loop, the context is whichever node we are currently looping through • Axis • Indicates which nodes, relative to the context node, are included in the search • Dictates node ordering in the set • Forward axes select nodes that follow the context node • Reverse axes select nodes that precede the context node
Fig. 11.6 XPath axes. Summarizes the 13 XPath axes, their ordering, and a description of each.
Node Tests • Node tests • A set of nodes selected by an axis is refined with node tests • Rely upon the axis' principal node type • For the attribute axis, the principal node type is attribute • For the namespace axis, the principal node type is namespace • For all other axes, the principal node type is element • The principal node type corresponds to the type of node the axis can select
Location Paths Using Axes and Node Tests • Location step • Axis and node test separated by a double colon (::) • Optional predicate enclosed in square brackets ([ ]) • Some examples: • Select all element-node children of the context node: child::* • Select all text-node children of the context node: child::text() • Select all text-node grandchildren of the context node: child::*/child::text()
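The effect of the context node can be seen by evaluating the same relative path from different nodes. The sketch below assumes Python's standard xml.etree.ElementTree module and a small, trimmed book list invented for illustration. The loop plays the role of an xsl:for-each, with each book becoming the context node in turn.

```python
import xml.etree.ElementTree as ET

# Small illustrative book list (an assumption of this sketch).
BOOKS_XML = """
<books>
  <book>
    <title>Java How to Program</title>
    <translation edition="1">Spanish</translation>
    <translation edition="2">French</translation>
  </book>
  <book>
    <title>C++ How to Program</title>
    <translation edition="1">Korean</translation>
  </book>
</books>
"""

root = ET.fromstring(BOOKS_XML)

counts = {}
for book in root.findall("book"):               # each <book> is the context node in turn
    translations = book.findall("translation")  # same relative path, different context
    counts[book.findtext("title")] = len(translations)

print(counts)  # {'Java How to Program': 2, 'C++ How to Program': 1}
```

The path translation never changes; only the node it is evaluated from does, which is why the two results differ.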
<?xml version = "1.0"?>

<!-- Fig. 11.9 : books.xml -->
<!-- XML book list -->

<books>

   <book>
      <title>Java How to Program</title>
      <translation edition = "1">Spanish</translation>
      <translation edition = "1">Chinese</translation>
      <translation edition = "1">Japanese</translation>
      <translation edition = "2">French</translation>
      <translation edition = "2">Japanese</translation>
   </book>

   <book>
      <title>C++ How to Program</title>
      <translation edition = "1">Korean</translation>
      <translation edition = "2">French</translation>
      <translation edition = "2">Spanish</translation>
      <translation edition = "3">Italian</translation>
      <translation edition = "3">Japanese</translation>
   </book>

</books>

Fig. 11.9 XML document that marks up book translations.
Fig. 11.10 XPath tree for books.xml
Other nodes…
Element: book
   Element: title
      Text: Java How to Program
   Element: translation
      Attribute: edition = 1
      Text: Spanish
   Element: translation
      Attribute: edition = 1
      Text: Chinese
Continued on next slide…
Fig. 11.10 XPath tree for books.xml
Continued from previous slide…
   Element: translation
      Attribute: edition = 1
      Text: Japanese
   Element: translation
      Attribute: edition = 2
      Text: French
   Element: translation
      Attribute: edition = 2
      Text: Japanese
Other nodes…
Location Paths Using Axes and Node Tests (cont.) • Which books have Japanese translations? • Use the root node of the XPath tree as the context node • Use a predicate • A Boolean expression for filtering nodes from the search • Compare the string-value of the current node to the string 'Japanese': /books/book/translation[. = 'Japanese']/../title
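The query above can be tried against books.xml with a short sketch. It assumes Python 3.7 or later and the standard xml.etree.ElementTree module, which implements only a subset of XPath. Absolute paths are not accepted on an element there, so the path is written relative to the books root element (./book/... instead of /books/book/...). The helper name titles_with_translation is hypothetical.

```python
import xml.etree.ElementTree as ET

# Content of books.xml (Fig. 11.9).
BOOKS_XML = """
<books>
  <book>
    <title>Java How to Program</title>
    <translation edition="1">Spanish</translation>
    <translation edition="1">Chinese</translation>
    <translation edition="1">Japanese</translation>
    <translation edition="2">French</translation>
    <translation edition="2">Japanese</translation>
  </book>
  <book>
    <title>C++ How to Program</title>
    <translation edition="1">Korean</translation>
    <translation edition="2">French</translation>
    <translation edition="2">Spanish</translation>
    <translation edition="3">Italian</translation>
    <translation edition="3">Japanese</translation>
  </book>
</books>
"""


def titles_with_translation(root, language):
    """Return the titles of books that have a translation in `language`."""
    # translation[.='X'] keeps translations whose text equals X,
    # '..' steps back up to the parent book, 'title' selects its title.
    path = f"./book/translation[.='{language}']/../title"
    return [title.text for title in root.findall(path)]


root = ET.fromstring(BOOKS_XML)
japanese = titles_with_translation(root, "Japanese")
korean = titles_with_translation(root, "Korean")

print(japanese)  # ['Java How to Program', 'C++ How to Program']
print(korean)    # ['C++ How to Program']
```

The Java book has two Japanese translations but its title is returned once, because both translations lead back to the same parent node.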
XPath Syntax • XPath uses path expressions to select nodes or node-sets in an XML document. • The node is selected by following a path or steps • The XML Example Document
XPath Syntax (cont.) • Selecting Nodes • XPath uses path expressions to select nodes in an XML document. • The node is selected by following a path or steps. • The most useful path expressions are listed below
XPath Syntax (cont.) • Examples • In the table below we have listed some path expressions and the result of the expressions
XPath Syntax (cont.) • Predicates • Predicates are used to find a specific node or a node that contains a specific value. • Predicates are always embedded in square brackets [ ] • Examples • In the table below we have listed some path expressions with predicates and the result of the expressions
XPath Syntax (cont.) • Selecting Unknown Nodes • XPath wildcards can be used to select unknown XML elements.
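A few typical predicates can be tried out with the sketch below. It assumes Python's standard xml.etree.ElementTree module, which supports position predicates such as [1] and [last()] and attribute tests such as [@edition='2'], but not the full XPath predicate syntax. The data is a trimmed subset of the book translation list, chosen for this illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed subset of the book translation list (an assumption of this sketch).
BOOKS_XML = """
<books>
  <book>
    <title>Java How to Program</title>
    <translation edition="1">Spanish</translation>
    <translation edition="2">French</translation>
    <translation edition="2">Japanese</translation>
  </book>
  <book>
    <title>C++ How to Program</title>
    <translation edition="1">Korean</translation>
    <translation edition="3">Italian</translation>
  </book>
</books>
"""

root = ET.fromstring(BOOKS_XML)


def texts(path):
    """Text content of every element selected by `path` from the root element."""
    return [element.text for element in root.findall(path)]


first = texts("./book/translation[1]")                 # first translation of each book
last = texts("./book/translation[last()]")             # last translation of each book
edition2 = texts("./book/translation[@edition='2']")   # translations of edition 2

print(first)     # ['Spanish', 'Korean']
print(last)      # ['Japanese', 'Italian']
print(edition2)  # ['French', 'Japanese']
```

Position predicates are counted per parent, so [1] yields one translation for each book and not a single node for the whole document.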
XPath Syntax (cont.) • Examples • In the table below we have listed some path expressions and the result of the expressions:
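The element wildcard * can be illustrated with a minimal sketch, assuming Python's standard xml.etree.ElementTree module. That module supports * for elements but not the @* and node() wildcards of full XPath. The one-book document is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# One-book document used only for illustration.
BOOKS_XML = """
<books>
  <book>
    <title>Java How to Program</title>
    <translation edition="1">Spanish</translation>
    <translation edition="2">French</translation>
  </book>
</books>
"""

root = ET.fromstring(BOOKS_XML)

# All element children of every book, whatever their names.
child_tags = [e.tag for e in root.findall("./book/*")]

# All descendant elements of the root element, in document order.
descendant_tags = [e.tag for e in root.findall(".//*")]

# Any element, at any depth, that carries an edition attribute.
with_edition = [e.text for e in root.findall(".//*[@edition]")]

print(child_tags)       # ['title', 'translation', 'translation']
print(descendant_tags)  # ['book', 'title', 'translation', 'translation']
print(with_edition)     # ['Spanish', 'French']
```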
XPath Syntax (cont.) • Selecting Several Paths • By using the | operator in an XPath expression you can select several paths. • Examples • In the table below we have listed some path expressions and the result of the expressions:
Predicates • Predicates are used in location paths to filter the current set of nodes. • A predicate contains a Boolean expression (or an expression that can be easily converted to Boolean) • Each member of the current node-set is tested against the Boolean expression and kept if the expression is true; otherwise, it is rejected. • A predicate is enclosed in square brackets, [ ].
Predicates (cont.) • Have a look at the following location path: list/item/currentPrice[@currency = "EUR"] • During evaluation, all currentPrice elements reached through list/item are first placed in the selected node-set. • Then the @currency = "EUR" predicate is evaluated, and the currentPrice elements whose currency attribute is not equal to EUR are rejected. • Predicates can also use the relational operators >, <, >=, <=, and !=. They can also use Boolean operators.
XPath Operator • An XPath expression returns either a node-set, a string, a Boolean, or a number • A list of the operators that can be used in XPath expressions
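The attribute predicate can be sketched as follows, assuming Python's standard xml.etree.ElementTree module. The first two items are trimmed from the auction list shown earlier; the third item, priced in EUR, is hypothetical and was added only so that the predicate has something to select. The path is written relative to the list root element.

```python
import xml.etree.ElementTree as ET

AUCTION_XML = """
<list>
  <item id="itemId0001"><currentPrice currency="USD">5</currentPrice></item>
  <item id="itemId0302"><currentPrice currency="GBP">70</currentPrice></item>
  <!-- Hypothetical item, added for this illustration only. -->
  <item id="itemId9999"><currentPrice currency="EUR">12</currentPrice></item>
</list>
"""

root = ET.fromstring(AUCTION_XML)

# Step 1: every currentPrice reached through list/item.
all_prices = root.findall("./item/currentPrice")

# Step 2: the predicate keeps only those whose currency attribute equals EUR.
eur_prices = root.findall("./item/currentPrice[@currency='EUR']")
eur_values = [price.text for price in eur_prices]

# '..' steps from each surviving currentPrice back to its parent item.
eur_item_ids = [
    item.get("id")
    for item in root.findall("./item/currentPrice[@currency='EUR']/..")
]

print(len(all_prices))  # 3
print(eur_values)       # ['12']
print(eur_item_ids)     # ['itemId9999']
```

The predicate tests the attribute for equality with the whole value EUR; prices in USD and GBP are dropped from the node-set.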