XML Data Management 5. Extracting Data from XML: XPath

Werner Nutt

based on slides by Sara Cohen, Jerusalem


Extracting Data from XML

  • Data stored in an XML document must be extracted to use it with various applications

  • Data can be extracted by a program …

  • … or using a declarative language: XPath

  • XPath is used extensively in other languages, e.g.,

    • XSL

    • XML Schema

    • XQuery

    • XPointer

  • Versions: XPath 1.0 (allows for efficient execution), XPath 2.0 (not yet widely supported)
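The difference between extracting data "by a program" and "using a declarative language" can be sketched in Python. This is an illustration under assumptions: it uses the standard-library module xml.etree.ElementTree, which understands only a small subset of XPath 1.0 and is not a full XPath engine. The names CATALOG_XML and titles_by_program are made up for this example. The document is the catalog shown on the next slide, without its XML declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical constant holding the catalog document (XML declaration omitted).
CATALOG_XML = """<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG_XML)

def titles_by_program(node):
    """Imperative extraction: walk the tree by hand and collect <title> text."""
    found = []
    for child in node:
        if child.tag == "title":
            found.append(child.text)
        found.extend(titles_by_program(child))  # recurse into the descendants
    return found

# Declarative extraction: only state *what* to select. The ElementTree path
# ".//title" plays the role of the XPath expression /catalog//title.
titles_by_path = [t.text for t in root.findall(".//title")]

print(titles_by_program(root))
print(titles_by_path)
```

The imperative version fixes how the tree is traversed; the path version only says which nodes to return and leaves the traversal to the engine.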


<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
  <cd country="UK">
    <title>Dark Side of the Moon</title>
    <artist>Pink Floyd</artist>
    <price>10.90</price>
  </cd>
  <cd country="UK">
    <title>Space Oddity</title>
    <artist>David Bowie</artist>
    <price>9.90</price>
  </cd>
  <cd country="USA">
    <title>Aretha: Lady Soul</title>
    <artist>Aretha Franklin</artist>
    <price>9.90</price>
  </cd>
</catalog>

Our XML document


The XML document catalog.xml as a DOM tree

[Figure: catalog.xml as a DOM tree. The root catalog has three cd children (country = UK, UK, USA), each with title, artist and price children.]
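To see the tree shape programmatically, here is a small sketch that prints the parsed catalog as an indented tree. Assumption: it uses Python's xml.etree.ElementTree, whose model differs from the W3C DOM (text is stored in an element's .text attribute rather than as separate text nodes), so the hypothetical helper tree_lines prints text values on their own lines to mimic the text leaves of the figure.

```python
import xml.etree.ElementTree as ET

# Hypothetical constant holding the catalog document (XML declaration omitted).
CATALOG_XML = """<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>"""

def tree_lines(elem, depth=0):
    """Return one indented line per element, attribute and non-blank text value."""
    indent = "  " * depth
    lines = [indent + elem.tag]
    for name, value in elem.attrib.items():
        lines.append(f"{indent}  @{name} = {value}")        # attribute
    if elem.text and elem.text.strip():
        lines.append(f'{indent}  "{elem.text.strip()}"')     # text value
    for child in elem:
        lines.extend(tree_lines(child, depth + 1))           # child elements
    return lines

lines = tree_lines(ET.fromstring(CATALOG_XML))
print("\n".join(lines))
```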


XPath: Ideas

A language of path expressions:

  • a document D is a tree

  • an expression E specifies possible paths in D

  • E returns the nodes in D that can be reached from the root by walking along an E-path

Path expressions specify

  • navigation in documents

  • tests on nodes
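The idea of "walking along an E-path from the root" fits in a few lines of Python. The sketch below is a hypothetical evaluator (eval_child_path is not a library function) that handles only absolute paths built from /, element names and *; it has no //, predicates or attributes. Parsing uses Python's xml.etree.ElementTree, an assumption of this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical constant holding the catalog document (XML declaration omitted).
CATALOG_XML = """<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>"""

def eval_child_path(root, expr):
    """Evaluate an absolute path such as /catalog/cd/price (child steps only).

    The first step is tested against the root element itself; each later
    step replaces the current node list by the matching children of its nodes.
    """
    steps = expr.strip("/").split("/")
    current = [root] if steps[0] in (root.tag, "*") else []
    for step in steps[1:]:
        current = [child for node in current for child in node
                   if step in (child.tag, "*")]
    return current

root = ET.fromstring(CATALOG_XML)
prices = [p.text for p in eval_child_path(root, "/catalog/cd/price")]
print(prices)   # ['10.90', '9.90', '9.90']
```

A path whose first step does not match the root element, such as /cd on this document, returns an empty list.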


XPath Syntax: Path Expressions

  • / at the beginning of an XPath expression represents the root of the document

  • / between element names represents a parent-child relationship

  • // represents an ancestor-descendant relationship

  • foo (an element name): the path has to go through an element named foo, e.g., /cd

  • * wildcard, represents any element

  • @ marks an attribute


XPath Syntax: Conditions and Built-Ins

  • [condition] specifies a condition, e.g., /cd[price < 10]

  • [N] position of a child, e.g., /cd[2]

  • contains(s1,s2) true if string s1 contains string s2, e.g., /cd[contains(title, "Moon")]

  • name() name of an element, e.g., /*[name()="cd"] is equivalent to /cd
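Python's xml.etree.ElementTree (assumed here as a stand-in; it is not a full XPath processor) has neither contains() nor name(), so this sketch expresses both built-ins as ordinary Python tests. Since the slide's examples start at /cd, which selects nothing in catalog.xml (its root element is catalog), the sketch adapts them to /catalog/cd. CATALOG_XML is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical constant holding the catalog document (XML declaration omitted).
CATALOG_XML = """<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG_XML)

# /catalog/cd[contains(title, "Moon")]
# ElementTree has no contains(), so the predicate becomes a substring test.
moon_cds = [cd for cd in root.findall("cd")
            if "Moon" in (cd.findtext("title") or "")]

# /catalog/*[name()="cd"], equivalent to /catalog/cd
# In ElementTree an element's name is available as its .tag.
named_cds = [e for e in root if e.tag == "cd"]

print([cd.findtext("artist") for cd in moon_cds])   # ['Pink Floyd']
print(named_cds == root.findall("cd"))              # True
```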


/catalog

[Figure: catalog.xml as a DOM tree]

Getting the top element of the document
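In Python's xml.etree.ElementTree (assumed here as a stand-in for an XPath processor), the element selected by /catalog is simply the root of the parsed tree. CATALOG_XML is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical constant holding the catalog document (XML declaration omitted).
CATALOG_XML = """<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>"""

tree = ET.ElementTree(ET.fromstring(CATALOG_XML))
top = tree.getroot()   # the element selected by /catalog

# ElementTree paths are evaluated relative to the element they are called on,
# so a path such as "cd" called on `top` corresponds to /catalog/cd.
print(top.tag)         # catalog
```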


/catalog/cd

[Figure: catalog.xml as a DOM tree]

Finding child nodes
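The same child step, sketched with Python's xml.etree.ElementTree (an assumption; its paths are relative to the element they are called on, here the catalog element). CATALOG_XML is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical constant holding the catalog document (XML declaration omitted).
CATALOG_XML = """<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG_XML)   # the <catalog> element

# /catalog/cd: the cd children of the catalog element
cds = root.findall("cd")
print(len(cds), [cd.get("country") for cd in cds])   # 3 ['UK', 'UK', 'USA']
```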


/catalog/cd/price

[Figure: catalog.xml as a DOM tree]

Finding descendant nodes
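A two-step path in the same Python sketch style (xml.etree.ElementTree assumed; CATALOG_XML is an illustrative name). Note that the result contains three nodes even though two of them carry the same value 9.90: XPath selects nodes, not distinct values.

```python
import xml.etree.ElementTree as ET

# Hypothetical constant holding the catalog document (XML declaration omitted).
CATALOG_XML = """<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG_XML)

# /catalog/cd/price: two child steps below the catalog element
prices = root.findall("cd/price")
print([p.text for p in prices])   # ['10.90', '9.90', '9.90']
```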


/catalog/cd[price<10]

[Figure: catalog.xml as a DOM tree]

Condition on elements
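Python's xml.etree.ElementTree (assumed in this sketch) supports only equality tests in predicates, so the numeric condition price<10 is written as a Python filter. As in XPath 1.0, the text of price is converted to a number, and the condition holds if some price child satisfies it. CATALOG_XML is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical constant holding the catalog document (XML declaration omitted).
CATALOG_XML = """<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG_XML)

# /catalog/cd[price<10]: keep a cd if one of its price children is below 10
cheap = [cd for cd in root.findall("cd")
         if any(float(p.text) < 10 for p in cd.findall("price"))]
print([cd.findtext("title") for cd in cheap])   # ['Space Oddity', 'Aretha: Lady Soul']
```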


/catalog//title

//title

[Figure: catalog.xml as a DOM tree]

// represents any top-down path in the document
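The difference between // and / can be checked with a short Python sketch (xml.etree.ElementTree assumed; in its path syntax ".//" stands for the descendant step below the current element). Since catalog has no title children, the child-only path finds nothing. CATALOG_XML is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical constant holding the catalog document (XML declaration omitted).
CATALOG_XML = """<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG_XML)

# /catalog//title: title elements at any depth below catalog
anywhere = root.findall(".//title")
# /catalog/title: direct children only
direct = root.findall("title")

print(len(anywhere), len(direct))   # 3 0
```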


/catalog/cd/*

[Figure: catalog.xml as a DOM tree]

* represents any element name in the document
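The wildcard step in a Python sketch, assuming xml.etree.ElementTree (which supports * in its path subset). CATALOG_XML is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical constant holding the catalog document (XML declaration omitted).
CATALOG_XML = """<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG_XML)

# /catalog/cd/*: every child element of every cd, whatever its name
children = root.findall("cd/*")
print([e.tag for e in children[:3]])   # ['title', 'artist', 'price']
print(len(children))                   # 9
```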


What do the following expressions return?

/*/*

//*

//*[price=9.90]

//*[price=9.90]/*

[Figure: catalog.xml as a DOM tree]

* represents any element name in the document
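To check your answers, the sketch below evaluates the four expressions over the catalog, using Python's xml.etree.ElementTree plus plain Python where its path subset falls short (an assumption of this example, not an XPath engine). It prints only how many nodes each expression selects. CATALOG_XML and q1 to q4 are illustrative names.

```python
import xml.etree.ElementTree as ET

# Hypothetical constant holding the catalog document (XML declaration omitted).
CATALOG_XML = """<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG_XML)

q1 = list(root)             # /*/*  : children of the root element
q2 = list(root.iter())      # //*   : all elements, the root included
# //*[price=9.90]: elements with a price child whose value is the number 9.90
# (XPath compares numerically here, so the text is converted with float).
q3 = [e for e in root.iter()
      if any(float(p.text) == 9.90 for p in e.findall("price"))]
q4 = [child for e in q3 for child in e]   # //*[price=9.90]/* : their children

print(len(q1), len(q2), len(q3), len(q4))   # 3 13 2 6
```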


/catalog/cd[1]

/catalog/cd[last()]

[Figure: catalog.xml as a DOM tree]

Position-based condition
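Python's xml.etree.ElementTree (assumed in this sketch) accepts both positional predicates from the slide, including last(). Positions start at 1, as in XPath. CATALOG_XML is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical constant holding the catalog document (XML declaration omitted).
CATALOG_XML = """<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG_XML)

first_cd = root.find("cd[1]")        # /catalog/cd[1]  (1-based, not 0-based)
last_cd = root.find("cd[last()]")    # /catalog/cd[last()]

print(first_cd.findtext("title"))    # Dark Side of the Moon
print(last_cd.findtext("title"))     # Aretha: Lady Soul
```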


(//title | //price)

[Figure: catalog.xml as a DOM tree]

| stands for union
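Python's xml.etree.ElementTree (assumed in this sketch) has no | operator, so the two node sets are computed separately and merged. Like an XPath union, the merge keeps document order and drops duplicates, here by filtering a single document-order traversal. CATALOG_XML is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical constant holding the catalog document (XML declaration omitted).
CATALOG_XML = """<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG_XML)

# (//title | //price): identities of the nodes in either operand
selected = {id(e) for e in root.findall(".//title") + root.findall(".//price")}
# Walk the document once so the result comes out in document order.
union = [e for e in root.iter() if id(e) in selected]

print([e.tag for e in union])
# ['title', 'price', 'title', 'price', 'title', 'price']
```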


/catalog/cd[@country="UK"]

[Figure: catalog.xml as a DOM tree]

@ marks attributes
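Attribute-equality predicates are part of the small XPath subset in Python's xml.etree.ElementTree (assumed in this sketch), so the slide's expression carries over almost verbatim. CATALOG_XML is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical constant holding the catalog document (XML declaration omitted).
CATALOG_XML = """<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG_XML)

# /catalog/cd[@country="UK"]
uk = root.findall("cd[@country='UK']")
print([cd.findtext("title") for cd in uk])   # ['Dark Side of the Moon', 'Space Oddity']
```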


/catalog/cd/data(@country)

[Figure: catalog.xml as a DOM tree]

@ marks attributes
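Using data() as the last step of a path is XPath 2.0 syntax; in XPath 1.0 one would select the attribute nodes with /catalog/cd/@country. ElementTree paths (assumed in this Python sketch) return elements only, so the attribute values are read with Element.get. CATALOG_XML is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical constant holding the catalog document (XML declaration omitted).
CATALOG_XML = """<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG_XML)

# /catalog/cd/data(@country): the values of the country attributes
countries = [cd.get("country") for cd in root.findall("cd")]
print(countries)   # ['UK', 'UK', 'USA']
```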


How would you write the price of the CDs whose artist is David Bowie?

[Figure: catalog.xml as a DOM tree]
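One possible answer is /catalog/cd[artist="David Bowie"]/price. Python's xml.etree.ElementTree (a stand-in assumed in this sketch, not a full XPath processor) supports child-text equality predicates of the form [tag='text'], so the expression can be tried almost verbatim. CATALOG_XML is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical constant holding the catalog document (XML declaration omitted).
CATALOG_XML = """<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG_XML)

# /catalog/cd[artist="David Bowie"]/price
bowie_prices = root.findall("cd[artist='David Bowie']/price")
print([p.text for p in bowie_prices])   # ['9.90']
```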


Navigational Axes (plural of “axis”)

  • We have discussed the following axes:

    • child (/)

    • descendant (//)

    • attribute (@)

  • These symbols are actually shorthands, e.g.,

    /cd//price selects the same nodes as

    /child::cd/descendant::price

    (strictly, // abbreviates /descendant-or-self::node()/)

  • There are additional shorthands, e.g.,

    • self (/.)

    • parent (/..)
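Python's xml.etree.ElementTree (assumed in this sketch) accepts the abbreviated forms . and .. but not explicit axis names such as child:: or parent::, so the sketch sticks to the shorthands. CATALOG_XML is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical constant holding the catalog document (XML declaration omitted).
CATALOG_XML = """<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG_XML)

# /catalog//price/.. : the parents of all price elements (the cd elements)
price_parents = root.findall(".//price/..")
# /catalog/cd/. : "." is the self step, so this equals /catalog/cd
selves = root.findall("cd/.")

print([e.tag for e in price_parents], selves == root.findall("cd"))
# ['cd', 'cd', 'cd'] True
```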


Additional Axes

  • ancestor: contains all ancestors (parent, grandparent, etc.) of the current node

  • ancestor-or-self: contains the current node plus all its ancestors (parent, grandparent, etc.)

  • descendant-or-self: contains the current node plus all its descendants (children, grandchildren, etc.)

  • following: contains everything in the document after the closing tag of the current node

  • following-sibling: contains all siblings after the current node

  • preceding: contains everything in the document that ends before the start tag of the current node (ancestors are excluded)

  • preceding-sibling: contains all siblings before the current node
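Python's xml.etree.ElementTree (the library assumed in this sketch) implements none of these axes, but they are easy to emulate once each element can find its parent. The helper names below (ancestors, following_siblings and so on) are hypothetical; they only approximate the XPath axes for element nodes (attributes and text are ignored) and return results in document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical constant holding the catalog document (XML declaration omitted).
CATALOG_XML = """<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG_XML)
# ElementTree elements do not know their parent, so build a child -> parent map.
parent = {child: p for p in root.iter() for child in p}

def ancestors(node):
    """ancestor axis: parent, grandparent, ... up to the root element."""
    result = []
    while node in parent:
        node = parent[node]
        result.append(node)
    return result

def siblings(node):
    """All children of node's parent (empty for the root element)."""
    return list(parent[node]) if node in parent else []

def following_siblings(node):
    """following-sibling axis: later children of the same parent."""
    sibs = siblings(node)
    return sibs[sibs.index(node) + 1:] if sibs else []

def preceding_siblings(node):
    """preceding-sibling axis, listed in document order."""
    sibs = siblings(node)
    return sibs[:sibs.index(node)] if sibs else []

def following(node):
    """following axis: elements after node's end tag (its descendants excluded)."""
    order = list(root.iter())
    inside = {id(e) for e in node.iter()}
    return [e for e in order[order.index(node) + 1:] if id(e) not in inside]

first_cd = root.find("cd")
first_price = first_cd.find("price")
print([e.tag for e in ancestors(first_price)])            # ['cd', 'catalog']
print([e.tag for e in preceding_siblings(first_price)])   # ['title', 'artist']
print([e.findtext("title") for e in following_siblings(first_cd)])
# ['Space Oddity', 'Aretha: Lady Soul']
```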


Info and Tools

You will find more info in the next lecture and:

  • XPath 1.0 specification at W3C

    (there is also XPath 2.0, which is not yet widely supported)

  • XPath tutorial at W3Schools

  • Mulberry XPath Quick Reference

Tools for our course:

  • XPath plugin for Eclipse

  • Saxon XSLT and XQuery Processor

  • Kernow front end for Saxon (I’ll let you know the code for unlocking it)

  • XMLQuire XML and XPath Editor and Visualizer

