XML And XPath

DSA Term 2

Week 14

DSA/2006/week 14

Lecture overview
  • Matters arising
    • Character coding
  • Well-formed XML
  • Creating simple XML files
      • Placename to BBC code
  • Introduction to XPath

Character Coding
  • Character set
    • ISO 8859 - 1 byte per character
      • 0-127 are ASCII
      • 128-255 vary depending on the part of the standard
      • 15 different character maps
        • ISO-8859-1 - Latin-1 - the default for HTML
        • ISO-8859-2 - Central European
      • A document must be in one encoding
        • problem of mixing characters e.g. an Arabic quotation in a Cyrillic text
    • UTF-8 - a variable-length Unicode encoding (1-4 bytes per character) that supports a huge range of international languages in a single encoding
      • ASCII is included as characters 0-127
      • Lets one document mix scripts, so the internet can be truly multilingual
      • Key invention by Ken Thompson: self-synchronisation, which allows character boundaries to be detected from any point in a byte stream
  • Character references in HTML
    • Named &deg;
    • Decimal &#176;
    • Hexadecimal &#xB0;
    • All three produce the degree sign °
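
The points on this slide can be checked with a short Python sketch. It is only an illustration using the standard library: the sample string and the helper names `is_continuation` and `char_starts` are made up for this example and are not part of the lecture material.

```python
import html

degree = "\u00b0"  # DEGREE SIGN, the character used on the slide

# One byte in ISO-8859-1, two bytes in UTF-8.
latin1_bytes = degree.encode("iso-8859-1")
utf8_bytes = degree.encode("utf-8")
print(latin1_bytes.hex(), utf8_bytes.hex())  # b0 c2b0


def is_continuation(byte):
    """True for UTF-8 continuation bytes, which always have the form 10xxxxxx."""
    return byte & 0b11000000 == 0b10000000


def char_starts(data):
    """Offsets at which a character begins (self-synchronisation in action)."""
    return [i for i, b in enumerate(data) if not is_continuation(b)]


# Hypothetical sample mixing 1-, 2- and 3-byte characters.
sample = "20\u00b0C \u20ac5".encode("utf-8")
print(char_starts(sample))

# Named, decimal and hexadecimal character references decode to the same character.
references = ["&deg;", "&#176;", "&#xB0;"]
decoded = [html.unescape(ref) for ref in references]
print(decoded)
```

Because every continuation byte carries the `10` prefix, a reader that lands in the middle of a stream can skip forward to the next byte without it and be sure a character starts there.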

Defining the Encoding
  • Encodings in HTML
    • In a meta-tag
      • <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
    • In the XML declaration (XML and XHTML documents)
      • <?xml version="1.0" encoding="ISO-8859-1"?>
    • In the HTTP content header
      • Content-Type: text/html; charset=ISO-8859-1
  • Setting Encoding in PHP
    • header("Content-type: text/html; charset=UTF-8");
  • Setting encoding in the Browser
    • Firefox
      • View/Character Encoding
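
To see why the declared encoding matters, here is a small Python sketch (standard library only) that parses the same text stored under two different encodings. The element name `temp` and its content are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# The same logical document, stored as bytes in two different encodings.
# In each case the XML declaration tells the parser how to decode the bytes.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><temp>20\xb0</temp>'
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><temp>20\u00b0</temp>'.encode("utf-8")

latin1_text = ET.fromstring(latin1_doc).text
utf8_text = ET.fromstring(utf8_doc).text

# Different bytes on disk, identical characters after parsing.
print(latin1_doc == utf8_doc, latin1_text == utf8_text)
```

If the declaration and the actual bytes disagree, the parser either reports an error or produces the wrong characters, which is the usual cause of stray symbols in a page.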

Design a simple XML file
  • Design an XML vocabulary to represent pairs of place names and codes
    • Bristol 1263
    • Bath 1123
  • First review XML structure

Example

<MapSet>
  <Map id="P2" desc="P Block level 2">
    <room id="2P2">
      <area shape="rect" coords="118,39,138,68"/>
      <type>Staff Room</type>
      <occupant>Tony Solomonides</occupant>
    </room>
    <room id="2P3">
      <area shape="rect" coords="141,40,162,69"/>
      <type>Staff Room</type>
      <occupant>Richard Lawson</occupant>
    </room>
    <room id="2P4">
      <area shape="poly" coords="201,40,234,40,234,118,164,119,163,71,200,71"/>
      <type>Office</type>
      <occupant>Eleanor Gibbons</occupant>
      <occupant>Dee Evans</occupant>
      <occupant>Ali Jack</occupant>
    </room>
    ….
  </Map>
</MapSet>


Well-formed XML documents (1)

Every XML document must be well-formed and must therefore adhere to the following rules (among others):

  • Every start-tag must have a matching end tag.
  • Elements may nest but must not overlap.
    • <name>Anna<em>Coffey</em></name> - √
    • <name><em>Anna</name>Coffey</em> - ×
  • There must be exactly one root element.
  • Attribute values must be quoted.
  • An element may not have two attributes with the same name.
  • Comments and processing instructions may not appear inside tags.
  • No unescaped < or & signs may occur in the character data of an element.
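
Any XML parser will enforce these rules. As a rough sketch, the Python function below (the name `is_well_formed` and the sample fragments are made up for this example) reports whether a string parses, using the standard library parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False on a well-formedness error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Each sample exercises one rule from the list above.
samples = {
    "nested": "<name>Anna<em>Coffey</em></name>",
    "overlapping": "<name><em>Anna</name>Coffey</em>",
    "two roots": "<a/><b/>",
    "unquoted attribute": "<a id=1/>",
    "duplicate attribute": '<a x="1" x="2"/>',
    "bare ampersand": "<a>fish & chips</a>",
    "escaped ampersand": "<a>fish &amp; chips</a>",
}

results = {label: is_well_formed(xml) for label, xml in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'NOT well-formed'}")
```

Only the "nested" and "escaped ampersand" samples are accepted. Note that a parser stops at the first error it meets, so a file with several faults has to be fixed and re-checked repeatedly.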


Well-formed XML documents (2)

Element names are case sensitive - <NAME>, <name>, <Name> & <NaMe> are four different element types.

No white space in element names - <First Name> not allowed; <First_Name> OK.

Element names cannot start with the letters "xml" in any combination of upper and lower case - these are reserved.

Element names must start with a letter or an underscore. Element names cannot start with a number but numbers may be embedded within an element name - <2you> not allowed; <me2you> is OK.

Attribute names are constrained by the above rules for element names.

Entity references are used to substitute specific characters. There are five predefined entities built into XML:

Entity   Char   Notes
&amp;    &      Do not use inside processing instructions.
&lt;     <      Use in character data and inside attribute values.
&gt;     >      Use after ]] in normal text and inside processing instructions.
&quot;   "      Use inside attribute values quoted with ".
&apos;   '      Use inside attribute values quoted with '.
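
In practice the escaping is usually left to a library. The sketch below uses Python's `xml.sax.saxutils` to escape element text and an attribute value, then parses the result to confirm the original characters come back; the `item` element and its content are invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

raw_text = "fish & chips < 5"
raw_attr = '5" disk'

# escape() replaces &, < and > with their entity references.
text = escape(raw_text)

# quoteattr() escapes the value and chooses a quote style that is safe for it.
attr = quoteattr(raw_attr)

doc = f"<item size={attr}>{text}</item>"
print(doc)

# Parsing reverses the substitution, so the application sees the raw characters.
elem = ET.fromstring(doc)
print(elem.text, "|", elem.get("size"))
```

Here `quoteattr` wraps the value in single quotes because it contains a double quote; if a value contains both kinds, it falls back to double quotes and writes the inner ones as `&quot;`.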

Errors
  • Look at the listing of the XML file and identify all the places which prevent this XML from being well-formed


<Map id=P2 desc="P Block level 2'>

<room id="2P2">

This is a nice big office

<area rect coords="118,39,138,68">

<typo>Staff Room</typo>

<occupant>Tony Solomonides</occupant>

</Room>

<room id="2P3">

<area rect coords="141,40,162,69"></area>

<typo>Staff Room</typo>

<occupant>”Richard Lawson”</occupant>

</Room>

<room id="2P4">

<area poly coords="201,40,234,40,234,118,164,119,163,71,200,71"/>

<typo>Office</typo>

<occupant>Eleanor Gibbons</occupant>

<person>Dee Evans</person>

<occupant>Ali Jack</occupant

</Room>

---

Task
  • Draw the structure
    • Use ER notation
      • Attributes in the Entity
      • Crow's-foot notation for one-to-many and optional relationships
      • Identify any restricted sets of values (enumerated types)
    • In the lab, QSEE will allow you to define the structure and generate the schema definition (XML Schema or DTD)

XPath
  • Core language for selecting nodes in XML
  • Version 1.0 used in XSLT 1.0
    • client-side in browsers
    • Xalan engine
    • W3Schools tutorial covers XPath 1.0
    • SimpleXML in PHP
  • Version 2.0 used in XSLT 2.0
    • Saxon processor
    • XQuery 1.0
  • Differences
    • Core data structure in 2.0 is a sequence of items (nodes or atomic values)
    • Full support for all XML schema datatypes
    • Two kinds of equality operators
    • Larger function library

XPath Language
  • Not a programming language
  • Expressions to be evaluated
  • Focus on
    • Navigation in a tree structure
      • Multiple directions or ‘axes’
        • Down to children (child axis)
        • Up to parent (parent axis)
        • Down to attributes (attribute axis)
        • Across to siblings (following-sibling and preceding-sibling axes)
    • Operators
    • Functions

XPath operators
  • Arithmetic operators

+ - * div idiv mod

  • Value comparisons

eq, ne, lt, le, gt, ge

  • Sequence comparisons

= , !=

= is true if any item on the left equals any item on the right

!= is true if any item on the left differs from any item on the right

(1,2,3) = (2,3,4) is true

(1,2,3) != (2,3,4) is also true

not ((1,2,3) = (2,3,4) ) is false

  • Logical operators

and, or, not()
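
The sequence comparisons are easier to follow once you see that they test every pair of items and succeed if any pair passes. The Python sketch below emulates that rule; it is not an XPath engine, and `general_compare` is a name invented for this illustration.

```python
import operator
from itertools import product


def general_compare(left, right, op):
    """Emulate an XPath 2.0 sequence comparison: true if ANY pair of items satisfies op."""
    return any(op(a, b) for a, b in product(left, right))


left, right = (1, 2, 3), (2, 3, 4)

equal = general_compare(left, right, operator.eq)      # 2 = 2 is one matching pair
not_equal = general_compare(left, right, operator.ne)  # 1 != 2 is one matching pair
negated = not equal                                    # not(...) negates the whole test

print(equal, not_equal, negated)
```

This is why `=` and `!=` can both be true for the same pair of sequences, and why `not(a = b)` is the expression to use when you mean "no item in common".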

Large function library
  • count(seq), max(seq), min(seq), avg(seq)
    • count((1,2,3)) = 3
  • string functions
    • string-length('abc')
    • tokenize('a,b,c', ',')
    • string-join(('a','b','c'), ', ')
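
For readers who know Python better than XPath, the rough equivalents below may help fix what each function does. They are analogies from the Python standard library only, not XPath itself, and the variable names are chosen just for this sketch.

```python
import re
from statistics import mean

seq = (1, 2, 3)

# count(seq), max(seq), min(seq), avg(seq)
count_result = len(seq)
max_result = max(seq)
min_result = min(seq)
avg_result = mean(seq)

# string-length('abc')
length_result = len("abc")

# tokenize('a,b,c', ',') - the second argument is a regular expression in XPath too
tokens = re.split(",", "a,b,c")

# string-join(('a','b','c'), ', ')
joined = ", ".join(("a", "b", "c"))

print(count_result, max_result, min_result, avg_result)
print(length_result, tokens, joined)
```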

Using the eXist database
  • eXist database as an XPath / XQuery engine.
    • REST interface
      • ..exist/rest/db/chriswallace/rooms?_query=//Map
    • Java client
    • Sandbox (using Ajax to do dynamic syntax checking)
      • Context is the whole database
  • The demo database includes
    • the whole text of Romeo and Juliet
    • the Mondial world database

Examples
  • All rooms
    • /MapSet/Map/room
    • //room
  • Room 2P5
    • //room[@id='2P5']
  • The occupants of room 2P4
    • //room[@id='2P4']/occupant
  • The roomNo of the room which Colin Fudge occupies
    • //room[occupant = 'Colin Fudge']/@id
  • The number of occupants of 2P4
    • count(//room[@id='2P4']/occupant)
  • The floor of Ali Jack's room
    • //room[occupant = 'Ali Jack']/../@desc
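
These expressions can be tried without a database. The sketch below runs close equivalents against the three rooms from the earlier example using Python's `xml.etree.ElementTree`, which implements only a subset of XPath 1.0: paths start with `.` instead of `/`, attribute steps such as `/@id` become `.get("id")`, and `count()` becomes `len()`. Treat it as an approximation of what eXist or SimpleXML would do with the full expressions.

```python
import xml.etree.ElementTree as ET

ROOMS_XML = """<MapSet>
  <Map id="P2" desc="P Block level 2">
    <room id="2P2">
      <area shape="rect" coords="118,39,138,68"/>
      <type>Staff Room</type>
      <occupant>Tony Solomonides</occupant>
    </room>
    <room id="2P3">
      <area shape="rect" coords="141,40,162,69"/>
      <type>Staff Room</type>
      <occupant>Richard Lawson</occupant>
    </room>
    <room id="2P4">
      <area shape="poly" coords="201,40,234,40,234,118,164,119,163,71,200,71"/>
      <type>Office</type>
      <occupant>Eleanor Gibbons</occupant>
      <occupant>Dee Evans</occupant>
      <occupant>Ali Jack</occupant>
    </room>
  </Map>
</MapSet>"""

root = ET.fromstring(ROOMS_XML)  # root is the MapSet element

# //room
room_ids = [room.get("id") for room in root.findall(".//room")]

# //room[@id='2P4']/occupant
occupants = [o.text for o in root.findall(".//room[@id='2P4']/occupant")]

# count(//room[@id='2P4']/occupant)
occupant_count = len(occupants)

# //room[occupant = 'Ali Jack']/@id  (matches if ANY occupant child equals the name)
ali_room = root.find(".//room[occupant='Ali Jack']").get("id")

# //room[occupant = 'Ali Jack']/../@desc  (.. steps up to the parent Map)
floor = root.find(".//room[occupant='Ali Jack']/..").get("desc")

print(room_ids, occupants, occupant_count, ali_room, floor)
```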

Notes
  • Note how = tests if a person is amongst the occupants
  • To ‘serialise’ an attribute use string()
  • See how ../ allows navigation to the parent element

Examples for you
  • The room number for Richard Lawson
  • The coordinates of room 2P2
  • All rooms with poly shape
  • Who are Ali Jack’s office mates?

XML design
  • The rooms file is a mixture of text elements and attributes.
  • Could be all attributes - what would change?
  • Could be no attributes - what would change?
  • For the workshop exercise use elements instead of attributes - it's simpler even if more verbose
  • Generally, what do the experts recommend?

Workshop
  • Create a simple XML file containing pairs of Place names and BBC codes
  • Change the PHP script to accept a placename
  • Read the new XML file and look up the name to get its code using the PHP SimpleXML interface and xpath('')
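
The workshop itself is in PHP, but the shape of the lookup can be sketched in Python for comparison. Everything about the vocabulary here is an assumption: the element names `places`, `place`, `name` and `code` and the function `bbc_code` are hypothetical, and only the two name/code pairs come from the earlier slide. Your own design may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary, elements only (no attributes), as the workshop suggests.
PLACES_XML = """<places>
  <place><name>Bristol</name><code>1263</code></place>
  <place><name>Bath</name><code>1123</code></place>
</places>"""


def bbc_code(xml_text, placename):
    """Return the code for a place name, or None if the name is not listed.

    The equivalent XPath would be //place[name='Bristol']/code.
    """
    root = ET.fromstring(xml_text)
    for place in root.findall("place"):
        if place.findtext("name") == placename:
            return place.findtext("code")
    return None


print(bbc_code(PLACES_XML, "Bristol"))
print(bbc_code(PLACES_XML, "Weston"))
```

The loop compares the name in code rather than pasting user input into an XPath string, which avoids trouble with place names that contain a quote character. Whichever approach you take in PHP, decide what the script should do when the place is not found.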
