XML 101: A Technical Introduction to XML. 20 November 2002 Bank of Montreal Database Users Group Ian GRAHAM IT Strategy, IBS, Technology and Solutions, BMO Financial Group E: <ian.graham@bmo.com> T: (416) 513.5656 / F: (416) 513.5590


XML 101: A Technical Introduction to XML

20 November 2002

Bank of Montreal Database Users Group

Ian GRAHAM

IT Strategy, IBS, Technology and Solutions, BMO Financial Group

E: <ian.graham@bmo.com>

T: (416) 513.5656 / F: (416) 513.5590

To download this talk: http://www.utoronto.ca/ian/talks/

Presentation Outline
  • What is XML (basic introduction)
  • Defining language dialects and constraints
    • DTDs, namespaces, and schemas
  • XML processing
    • Parsers and parser interfaces; XML processing tools
  • XML databases
    • High-level issues, and references
  • XML messaging / web services
    • Why, and some issues/example
  • Conclusions
What is XML?
  • A base-level syntax
    • for encoding structured, text-based information (words, characters, ...)
  • A text-based syntax
    • XML is written using printable Unicode characters. Explicit binary data is not allowed
  • Supports extensible data formats
    • XML lets you define your own elements (essentially data types), within the constraints of the syntax rules
  • Designed as a universal format
    • The syntax rules ensure that all XML processing software MUST handle a given piece of XML data identically.

If you can read and process it, so can anybody else

XML: A Simple Example

The first line is the XML declaration ("this is XML"); its encoding attribute flags the character encoding used in the file.

<?xml version="1.0" encoding="iso-8859-1"?>
<partorders
    xmlns="http://myco.org/Spec/partorders">
  <order ref="x23-2112-2342"
         date="25aug1999-12:34:23h">
    <desc> Gold sprockel grommets,
           with matching hamster
    </desc>
    <part number="23-23221-a12" />
    <quantity units="gross"> 12</quantity>
    <deliveryDate date="27aug1999-12:00h" />
  </order>
  <order ref="x23-2112-2342"
         date="25aug1999-12:34:23h">
    . . . Order something else . . .
  </order>
</partorders>

Color key on the original slide: black marks XML tags and markup; blue marks encoded text data.

Example Revisited

<partorders
    xmlns="http://myco.org/Spec/partorders">
  <order ref="x23-2112-2342"
         date="25aug1999-12:34:23h">
    <desc> Gold sprockel grommets,
           with matching hamster
    </desc>
    <part number="23-23221-a12" />
    <quantity units="gross"> 12 </quantity>
    <deliveryDate date="27aug1999-12:00h" />
  </order>
  <order ref="x23-2112-2342"
         date="25aug1999-12:34:23h">
    . . . Order something else . . .
  </order>
</partorders>

Callouts on the original slide:
  • "element" and "tags" (marking an element and its start and end tags)
  • "attribute of this quantity element" (marking units="gross")
  • "Hierarchical, structured data"

XML Data Model - A Tree

<partorders xmlns="...">
  <order date="..."
         ref="...">
    <desc> ..text..
    </desc>
    <part />
    <quantity />
    <delivery-date />
  </order>
  <order ref=".." .../>
</partorders>

[Figure: the same document drawn as a tree. The root partorders (attribute xmlns=) has two order children (attributes ref= and date=); the first order has the children desc, part, quantity and delivery-date, with text nodes as leaves.]
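The tree view can be sketched in a few lines of code. This example is not from the talk: it uses Python's standard xml.etree.ElementTree module, and the helper names local_name and outline are made up for illustration. It parses the part-order example and prints one indented line per element, with its attribute names.

```python
import xml.etree.ElementTree as ET

# The partorders example from the slides, written with straight quotes.
DOC = b"""<?xml version="1.0" encoding="iso-8859-1"?>
<partorders xmlns="http://myco.org/Spec/partorders">
  <order ref="x23-2112-2342" date="25aug1999-12:34:23h">
    <desc> Gold sprockel grommets, with matching hamster </desc>
    <part number="23-23221-a12" />
    <quantity units="gross"> 12</quantity>
    <deliveryDate date="27aug1999-12:00h" />
  </order>
</partorders>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on element names."""
    return tag.rsplit("}", 1)[-1]


def outline(element, depth=0):
    """Return one indented line per element, listing its attribute names."""
    attrs = " ".join(sorted(element.attrib))
    line = "  " * depth + local_name(element.tag) + (f" [{attrs}]" if attrs else "")
    lines = [line]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(DOC)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Note that ElementTree treats the xmlns declaration as part of each element's name rather than as an ordinary attribute, so it does not show up in the attribute list.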

XML: Design goals
  • Simple but reliable
    • Strict syntax rules, to eliminate syntax errors
    • Syntax defines structure (hierarchically), and names structural parts (element names) -- it is self-describing data
  • Extensible and ‘mixable’
    • Can create your own language of tags/elements
    • Can mix one language with another, and still reliably separate / process the data
  • Designed for a distributed environment
    • Can have remote (‘webbed’) data, and retrieve and use it reliably
XML Processing: The XML Parser

  • The parser must verify that the XML is syntactically correct
  • Such data is said to be well-formed
    • The minimal requirement to “be” XML
  • A parser MUST stop processing if the data isn’t well-formed
    • E.g., stop processing and “throw an exception” to the XML-based application. The XML 1.0 spec requires this behaviour
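As a rough illustration of the "stop and throw an exception" behaviour, here is a minimal sketch using Python's standard xml.etree.ElementTree (an assumption of this example, not a tool named in the talk). The helper try_parse is hypothetical; it parses one well-formed document and one with an unclosed tag.

```python
import xml.etree.ElementTree as ET

WELL_FORMED = '<order ref="x23"><part number="a12" /></order>'
NOT_WELL_FORMED = '<order ref="x23"><part number="a12"></order>'  # <part> is never closed


def try_parse(text):
    """Return (True, root tag) on success, or (False, error message) on failure."""
    try:
        return True, ET.fromstring(text).tag
    except ET.ParseError as err:
        # The parser refuses to hand over any data once it hits the error.
        return False, str(err)


ok_result = try_parse(WELL_FORMED)
bad_result = try_parse(NOT_WELL_FORMED)
print(ok_result)
print(bad_result)
```

The application only ever sees data from the first document; for the second it gets an exception and nothing else.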

[Figure: XML data is read by the XML parser, which passes it through the parser interface to the XML-based application.]

Special Issues: Characters and Charsets
  • XML specification defines characters allowed as whitespace in tags: <element id = "23.112" />
  • You cannot use the EBCDIC character ‘NEL’ as whitespace
    • Must make sure to not do so!
  • What if you want to include characters not defined in the encoding charset (e.g., Greek characters in an ISO-Latin-1 document):
    • Use character references. For example: &#9824; -- the spades character (♠), the 9824th character in the Unicode character set
  • Also, a reminder that binary data is forbidden
    • It must be encoded as printable characters (e.g. using Base64)
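Both points can be tried out with a short sketch, assuming Python's standard library (base64 and xml.etree.ElementTree). The element names suit and payload and the sample bytes are invented for the example.

```python
import base64
import xml.etree.ElementTree as ET

# An ISO-8859-1 document cannot hold the spade character directly,
# so it is written as the numeric character reference &#9824;.
DOC = b'<?xml version="1.0" encoding="iso-8859-1"?><suit>&#9824;</suit>'
suit = ET.fromstring(DOC).text

# Binary data (arbitrary sample bytes) is carried as Base64 text.
raw = bytes([0, 255, 16, 128])
payload = ET.Element("payload")
payload.text = base64.b64encode(raw).decode("ascii")
serialized = ET.tostring(payload, encoding="unicode")

# The receiver parses the XML and decodes the text back into bytes.
decoded = base64.b64decode(ET.fromstring(serialized).text)
print(hex(ord(suit)), serialized, decoded == raw)
```

The parser resolves the character reference on its own; the Base64 step is the application's job on both the sending and the receiving side.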
Parsers and DTDs

  • A DTD can define external parts (entities) to be ‘included’ in the document
  • But ... what if the parser can’t find the external parts (firewall?)?
  • That depends on the type: there are two types of XML parsers
    • one that MUST retrieve all parts
    • one that can ignore them (if it can’t find them)

[Figure: XML data and its DTD are read by the parser, which passes the result through the parser interface to the XML-based application.]

Two types of XML parsers

  • Validating
    • Must retrieve all entities and process all of the DTD. Will stop processing and indicate a failure if it cannot
    • It must also test and verify other things in the DTD -- instructions that define syntactic document rules (allowed elements, attributes, etc.).
  • Non-validating (well-formed only)
    • Tries to retrieve all ‘parts’, but will cease processing the DTD content at the first part (entity) it can’t find.
    • But this is not an error -- the parser simply makes available the XML data (and the names of any unresolved ‘parts’) to the application.

Application behavior will depend on parser type

Many parsers can operate in either mode (config)
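To see what "non-validating" means in practice, here is a small sketch that assumes Python's bundled expat-based parsers (not something the talk covers). The note/to/from document is invented: it is well-formed but breaks its own DTD, and the parser accepts it anyway.

```python
import xml.etree.ElementTree as ET
from xml.sax import SAXNotSupportedException, make_parser
from xml.sax.handler import feature_validation

# The internal DTD says <note> may contain only <to>, yet the document
# uses <from>: well-formed, but not valid against its own DTD.
DOC = """<!DOCTYPE note [
  <!ELEMENT note (to)>
  <!ELEMENT to (#PCDATA)>
]>
<note><from>Ian</from></note>"""

# ElementTree's expat-based parser is non-validating: no error is raised.
root = ET.fromstring(DOC)
child_tags = [child.tag for child in root]

# Asking the stock SAX reader to validate is refused in current CPython.
reader = make_parser()
try:
    reader.setFeature(feature_validation, True)
    validation_available = True
except SAXNotSupportedException:
    validation_available = False

print(child_tags, validation_available)
```

A validating parser would reject this document; getting that behaviour in Python means using a third-party library, which is outside the scope of this sketch.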

Defining constraints / languages
  • Two ways of doing so:
    • XML Document Type Definition (DTD) -- Part of core XML spec.
    • XML Schema (often called XSD) -- New specification (2001), which allows for richer constraints on XML documents.
  • What DTDs and/or schemas specify:
    • Allowed element and attribute names, hierarchical nesting rules; element content/type restrictions
  • Adding dialect specifications implies two classes of XML data
    • Well-formed: XML that is syntactically correct
    • Valid: XML that is well-formed and consistent with a specific DTD (or Schema)
  • Schemas are more powerful than DTDs
    • Often used for type validation, or for defining low-level type constraints on values (integer, varchar, datetime, etc.).
DTD Example

<!DOCTYPE transfers [
  <!ELEMENT transfers (fundsTransfer)+>
  <!ELEMENT fundsTransfer (from, to)>
  <!ATTLIST fundsTransfer
            date CDATA #REQUIRED>
  <!ELEMENT from (amount, transitID?, accountID,
                  acknowledgeReceipt)>
  <!ATTLIST from
            type (intrabank|internal|other) #REQUIRED>
  <!ELEMENT amount (#PCDATA)>
  . . . Omitted DTD content . . .
  <!ELEMENT to EMPTY>
  <!ATTLIST to
            account CDATA #REQUIRED>
]>
<transfers>
  <fundsTransfer date="20010923T12:34:34Z">
    . . . As with previous example . . .

XML Namespaces
  • Mechanism for identifying different “spaces” for XML names
    • That is, element or attribute names
  • This is a way of identifying different language dialects, consisting of names that have specific semantic (and processing) meanings.
  • For example, <key/> in one language (e.g. a security key) can be distinguished from <key/> in another language (a database key)
  • Mechanism uses a special xmlns attribute to define namespaces.
    • The namespace is a URL string
    • But the URL does not reference anything in particular (there may be nothing there!)
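The <key/> example can be made concrete with a short sketch, again assuming Python's xml.etree.ElementTree. Both namespace URLs and the record element are hypothetical, chosen only to mirror the security key / database key case.

```python
import xml.etree.ElementTree as ET

# Both namespace URLs are made up for this sketch; nothing needs to exist there.
DOC = """<record xmlns:sec="http://example.org/ns/security"
                 xmlns:db="http://example.org/ns/database">
  <sec:key>public-key-material</sec:key>
  <db:key>42</db:key>
</record>"""

root = ET.fromstring(DOC)

# ElementTree expands each prefix into '{namespace-url}localname',
# so the two <key/> elements end up with different full names.
tags = [child.tag for child in root]

# Look up each element by its namespace-qualified name.
ns = {"sec": "http://example.org/ns/security", "db": "http://example.org/ns/database"}
security_key = root.find("sec:key", ns).text
database_key = root.find("db:key", ns).text
print(tags, security_key, database_key)
```

The prefixes sec and db are only local shorthand; what distinguishes the two elements is the URL string each prefix is bound to.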
Mixing languages together

Namespaces let you do this relatively easily:

<?xml version="1.0" encoding="utf-8" ?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:mt="http://www.w3.org/1998/Math/MathML">
  <head>
    <title> Title of XHTML Document </title>
  </head><body>
    <div class="myDiv">
      <h1> Heading of Page </h1>
      <mt:mathml>
        <mt:title> ... MathML markup . . . </mt:title>
      </mt:mathml>
      <p> more html stuff goes here </p>
    </div>
  </body>
</html>

Default ‘space’ is xhtml

mt: prefix indicates ‘space’ mathml (a different language)

XML Schemas
  • A specification for defining XML validation rules
    • Specs: http://www.w3.org/XML/Schema
    • Best-practice: http://www.xfront.com/BestPracticesHomepage.html
  • Uses pure XML (plus namespaces) to do this
  • More powerful than DTDs - can specify things like integer types, date strings, real numbers in a given range, etc.
  • Often used for type validation, or for relating database schemas to XML models
  • They don’t, however, let you declare entities -- those can only be declared in DTDs
  • The following slide shows the XML schema equivalent to our DTD
XML Schema version of our DTD (Portion)

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <xs:element name="accountID" type="xs:string"/>
  <xs:element name="acknowledgeReceipt" type="xs:string"/>
  <xs:complexType name="amountType">
    <xs:simpleContent>
      <xs:restriction base="xs:string">
        <xs:attribute name="currency" use="required">
          <xs:simpleType>
            <xs:restriction base="xs:NMTOKEN">
              <xs:enumeration value="USD"/>
              . . . (some stuff omitted) . . .
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
      </xs:restriction>
    </xs:simpleContent>
  </xs:complexType>
  <xs:complexType name="fromType">
    <xs:sequence>
      <xs:element name="amount" type="amountType"/>
      <xs:element ref="transitID" minOccurs="0"/>
      <xs:element ref="accountID"/>
      <xs:element ref="acknowledgeReceipt"/>
    </xs:sequence>
    . . . And still more !!! . . .

XML Software
  • XML parsers ...
    • Read in XML data, check for syntactic (and possibly DTD/Schema) constraints, and make data available to an application. There are three 'generic' parser APIs
      • SAX Simple API for XML (event-based)
      • DOM Document Object Model (object/tree based)
      • JDOM Java Document Object Model (object/tree based)
      • Pull evolving API (new) (pull-based / object + tree)
    • Lots of XML parsers and interface software available
      • Unix, Linux, Windows 2000/XP, z/OS, etc
    • SAX-based parsers are fast (often as fast as you can stream data)
    • DOM is slower and more memory intensive (creates an in-memory version of the entire document)
    • Validating can be much slower than non-validating
Parser API: SAX

A) SAX: Simple API for XML

    • http://www.megginson.com/SAX/index.html
    • An event-based interface (a push parser API)
    • Parser reports events whenever it sees a tag/attribute/text node/unresolved external entity/other (driven by input stream)
    • Programmer attaches “event handlers” to handle the event
  • Advantages
    • Simple to use
    • Very fast (not doing very much before you get the tags and data)
    • Low memory footprint (doesn’t read an XML document entirely into memory)
  • Disadvantages
    • Not doing very much for you -- you have to do everything yourself
    • Not useful if you have to dynamically modify the document once it’s in memory (since you’ll have to do all the work to put it in memory yourself!)
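A minimal sketch of the event-handler style, using Python's standard xml.sax module as a stand-in for the Java SAX API the slide points to. The OrderHandler class and the trimmed part-order document are assumptions made for this example.

```python
import xml.sax

DOC = b"""<partorders>
  <order ref="x23-2112-2342">
    <part number="23-23221-a12" />
    <quantity units="gross"> 12</quantity>
  </order>
</partorders>"""


class OrderHandler(xml.sax.ContentHandler):
    """Event handlers: the parser calls these as it streams through the input."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Fired for each start tag; attrs behaves like a read-only mapping.
        self.events.append(("start", name, dict(attrs)))

    def characters(self, content):
        # Fired for text nodes; whitespace-only chunks are skipped here.
        if content.strip():
            self.events.append(("text", content.strip()))

    def endElement(self, name):
        self.events.append(("end", name))


handler = OrderHandler()
xml.sax.parseString(DOC, handler)
for event in handler.events:
    print(event)
```

Nothing is kept in memory unless the handler stores it, which is where both the speed and the "you have to do everything yourself" drawback come from.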
Parser API: DOM

B) DOM: Document Object Model

    • http://www.w3.org/DOM/
    • An object-based interface
    • Parser generates an in-memory tree corresponding to the document
    • DOM interface defines methods for accessing and modifying the tree
  • Advantages
    • Very useful for dynamic modification of, and access to, the tree
    • Useful for querying (i.e. looking for data) that depends on the tree structure [e.g. element.childNodes.item(2).getAttribute("units")]
    • Same interface for many programming languages (C++, Java, ...)
  • Disadvantages
    • Can be slow (needs to produce the tree), and may need lots of memory
    • DOM programming interface is a bit awkward, not terribly object oriented
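For comparison with SAX, here is a minimal DOM sketch using Python's standard xml.dom.minidom (one DOM implementation among many; the talk does not name it). The trimmed order document and the added note element are invented for the example.

```python
from xml.dom import minidom

DOC = """<order ref="x23-2112-2342">
  <part number="23-23221-a12" />
  <quantity units="gross">12</quantity>
</order>"""

dom = minidom.parseString(DOC)  # builds the whole tree in memory
order = dom.documentElement

# Access: query the tree by element name and read an attribute.
quantity = order.getElementsByTagName("quantity")[0]
units_before = quantity.getAttribute("units")

# Modification: change an attribute and append a new element.
quantity.setAttribute("units", "dozen")
note = dom.createElement("note")
note.appendChild(dom.createTextNode("rush order"))
order.appendChild(note)

result = order.toxml()
print(units_before)
print(result)
```

The method names (getElementsByTagName, getAttribute, createElement, appendChild) come from the W3C DOM interface, which is why the same calls look familiar across languages.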
DOM Parser Processing Model

[Figure: XML data is read by the parser, which builds a Document “object” (an in-memory tree: partorders, order, desc, text, part, quantity, delivery-date); the application reaches that tree through the DOM parser interface.]
Parser API: JDOM

B2) JDOM: Java Document Object Model

    • http://www.jdom.org
    • A Java-specific object-oriented interface
    • Parser generates an in-memory tree corresponding to the document
    • JDOM interface has methods for accessing and modifying the tree
  • Advantages
    • Very useful for dynamic modification of the tree
    • Useful for querying (i.e. looking for data) that depends on the tree structure
    • Much nicer Object Oriented programming interface than DOM
  • Disadvantages
    • Can be slow (make that tree...), and can take up lots of memory
    • New, and not entirely cooked (but close)
    • Only works with Java
Parser API: Pull

C) Pull Interfaces

    • http://www.xmlpull.org/ (Java); there is also a .NET pull API
    • A pull-parser interface
    • API uses expressions / methods to ‘pull’ specific chunks of XML data, or to iterate over the XML
    • Can be built on top of a DOM model
  • Advantages
    • Easier to write applications that need to read in and process XML data (‘easier’ model than a push API, in many cases)
    • Has proven a very popular component in the .NET toolkit
  • Disadvantages
    • Can be slow if you do lots of iteration over the XML input data
    • No common API across different languages (although xmlpull.org tries to be similar to the .NET API); not yet a ‘real’ standard (still being worked on; not part of most commercial environments)
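The pull style can be sketched as follows. This is not the xmlpull.org or .NET API; it uses iterparse from Python's standard xml.etree.ElementTree as a stand-in, with an invented two-order document. The application's own loop asks for the next event, rather than having handlers called by the parser.

```python
import io
import xml.etree.ElementTree as ET

DOC = b"""<partorders>
  <order ref="a1"><quantity units="gross">12</quantity></order>
  <order ref="b2"><quantity units="each">3</quantity></order>
</partorders>"""

quantities = {}
# iterparse hands back (event, element) pairs only when the loop asks
# for the next one; here we pull just the end of each element.
for event, elem in ET.iterparse(io.BytesIO(DOC), events=("end",)):
    if elem.tag == "order":
        qty = elem.find("quantity")
        quantities[elem.get("ref")] = (int(qty.text), qty.get("units"))
        elem.clear()  # release the subtree once it has been processed

print(quantities)
```

Because control stays in the application's loop, it can stop reading early or skip parts of the input, which is much of what makes this model feel easier than a push API.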
XML Processing: XSLT

D) XSLT: eXtensible Stylesheet Language -- Transformations

    • http://www.w3.org/TR/xslt
    • An XML language for processing/transforming XML
    • Does tree transformations -- takes XML and an XSLT style sheet as input, and produces a new XML document with a different structure
  • Advantages
    • Very useful for tree transformations -- much easier than DOM or SAX for this purpose
    • Can be used to query a document (XSLT pulls out the part you want)
  • Disadvantages
    • Can be slow for large documents or stylesheets
    • Can be difficult to debug stylesheets (poor error detection; much better if you use schemas)
XSLT processing model
  • D) Processing model

[Figure: the XML data in and the XSLT style sheet in are each read by an XML parser (each optionally checked against a schema), producing document “objects” for data and style sheet. The XSLT processor transforms the input tree (partorders, order, desc, text, part, quantity, delivery-date) into a differently structured output tree and writes it as data out (XML).]

XML Processing Toolkits

Lots of them …

  • Java
    • JAXP ( http://java.sun.com/xml/jaxp/faq.html )
    • dom4j ( http://www.dom4j.org )
  • .NET (part of .NET framework)
  • … others …
  • Provide DOM, SAX, (JDOM) interfaces, plus lots of other useful tools in a standardized way (loading parsers, performing XSLT transformations, etc.)
  • JAXP is standard Java, and thus integrated with WebSphere
XML and databases
  • So where do you stick XML data?
    • Inside a database!?!
    • But how to do this, and which database type to use:
        • RDBMS, ORDBMS, ODB, XML??
  • How you do so depends on the use cases you have for the data. Some good-to-ask questions are:
    • Am I talking about storing documents, or data?
        • Is the XML format integral to the application (e.g. XHTML, DocBook)?
    • How will the database be queried?
        • Queried by XML structure, or by standard SQL?
        • What ‘parts’ of the document need to be queried?
        • Do I need a text index?
    • How will the data be used/retrieved?
        • Passed to XML processing tools (e.g. XSLT), or used at ‘atomic’ simple type level?
    • The answers determine
        • What database to choose, how to map XML to tables (O-R or table mappings), store as BLOB or broken up ...
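The "store as BLOB or broken up" choice can be illustrated with a small sketch. It assumes an in-memory SQLite database via Python's standard sqlite3 module; the table names order_docs and order_items, the column layout, and the one-order document are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = '<order ref="x23"><part number="a12" /><quantity units="gross">12</quantity></order>'
order = ET.fromstring(DOC)

db = sqlite3.connect(":memory:")
# Option 1: keep the whole document as one opaque value.
db.execute("CREATE TABLE order_docs (ref TEXT PRIMARY KEY, doc TEXT)")
db.execute("INSERT INTO order_docs VALUES (?, ?)", (order.get("ref"), DOC))
# Option 2: break the document up into typed columns.
db.execute("CREATE TABLE order_items (ref TEXT, part TEXT, quantity INTEGER, units TEXT)")
qty = order.find("quantity")
db.execute(
    "INSERT INTO order_items VALUES (?, ?, ?, ?)",
    (order.get("ref"), order.find("part").get("number"), int(qty.text), qty.get("units")),
)

# Broken-up data answers ordinary SQL queries directly ...
row = db.execute("SELECT part, quantity FROM order_items WHERE units = 'gross'").fetchone()
# ... while the stored document must be re-parsed before its structure can be queried.
stored = db.execute("SELECT doc FROM order_docs WHERE ref = 'x23'").fetchone()[0]
reparsed_units = ET.fromstring(stored).find("quantity").get("units")
print(row, reparsed_units)
```

Which option fits depends on the questions above: whole-document storage keeps the XML intact for tools such as XSLT, while the broken-up form suits queries on individual values.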
XML and databases
  • Upcoming technologies
    • XML Query – a query language for querying XML datasets (and databases)
      • Uses XML schema for type casting, and validation
      • Info: http://www.w3.org/XML/Query
  • Useful XML Database references
    • http://www.xml.com/pub/a/2001/10/31/nativexmldb.html Introductory article
    • http://www.rpbourret.com/xml/XMLAndDatabases.htm XML and databases
    • http://www.rpbourret.com/xml/XMLDatabaseProds.htm Products list
    • http://www.xmldb.org/resources.html Docs / resource list
XML Messaging
  • Use XML as the format for sending messages between systems
  • Advantages:
    • Common syntax; self-describing (easier to parse)
    • Can use common/existing transport mechanisms to “move” the XML data (HTTP, HTTPS, SMTP (email), MQ, IIOP (CORBA), JMS, ...)
  • Requirements
    • Shared understanding of dialects for transport (requires a registry [namespace!] for identifying dialects)
    • Shared acceptance of messaging contract
  • Disadvantages
    • Asynchronous transport; no guarantee of delivery, no guarantee that partner (external) shares acceptance of contract.
    • Messages will be much larger than binary (10x or more) [can compress]
Common messaging model
  • XML over HTTP
    • Use HTTP to transport XML messages

POST /path/to/interface.pl HTTP/1.1
Referer: http://www.foo.org/myClient.html
User-agent: db-server-olk
Accept-encoding: gzip
Accept-charset: iso-8859-1, utf-8, ucs
Content-type: application/xml; charset=utf-8
Content-length: 13221
. . .

<?xml version="1.0" encoding="utf-8" ?>
<message> . . . Markup in message . . . </message>
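A request like the one above could be assembled as in the following sketch, which assumes Python's standard urllib.request and xml.etree.ElementTree (Python 3.8 or later). The host example.org is hypothetical, since the slide gives only the path, and the note element inside the message is a placeholder. The request is built but not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Build the message body; the <note> content is a placeholder.
message = ET.Element("message")
ET.SubElement(message, "note").text = "Markup in message"
body = ET.tostring(message, encoding="utf-8", xml_declaration=True)

# Hypothetical endpoint: the slide gives only the path, not a host.
request = urllib.request.Request(
    "http://example.org/path/to/interface.pl",
    data=body,
    method="POST",
    headers={
        "Content-Type": "application/xml; charset=utf-8",
        "Accept-Charset": "iso-8859-1, utf-8",
    },
)

# Nothing is sent here; urllib.request.urlopen(request) would do that
# and add the Content-Length header automatically.
print(request.get_method(), request.full_url)
print(request.get_header("Content-type"))
print(body.decode("utf-8"))
```

The XML itself is just the request body; everything HTTP-specific lives in the method, URL and headers, which is why the same message could equally travel over SMTP or a message queue.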
Some standards for message format
  • Define dialects designed to “wrap” remote invocation messages
  • XML-RPC http://www.xmlrpc.com
    • Very simple way of encoding function/method call name, and passed parameters, in an XML message.
  • SOAP (Simple Object Access Protocol) http://www.soapware.org
    • More complex wrapper, which lets you specify schemas for interfaces; more complex rules for handling/proxying messages, etc. This is a core component of Microsoft’s .NET strategy, and is integrated into more recent versions of WebSphere and other commercial packages. The activity of the W3C (which sets the SOAP spec) is outlined at: http://www.w3.org/2000/xp/Group/
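To show how little XML-RPC adds on top of plain XML, here is a sketch using Python's standard xmlrpc.client module. The method name placeOrder and its three arguments are hypothetical, chosen to echo the part-order example; no network call is made.

```python
import xmlrpc.client

# Hypothetical remote method and arguments, echoing the part-order example.
payload = xmlrpc.client.dumps(("23-23221-a12", 12, "gross"), methodname="placeOrder")
print(payload)

# The receiving side decodes the same XML back into parameters and a method name.
params, method = xmlrpc.client.loads(payload)
print(method, params)
```

The printed payload is a methodCall document holding the method name and one typed value per parameter; SOAP wraps the same idea in a richer envelope.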
XML Messaging + Processing
  • XML as a universal format for data exchange

[Figure: a Factory application, through its SOAP API, places an order (XML/EDI) using SOAP over HTTP with the SOAP interfaces of several Suppliers, and receives a response (XML/EDI) using SOAP over HTTP. Transport: HTTP(S), SMTP, other ...]

Web “Services” Model
  • SOAP plus higher-level modeling for how services are ‘advertised’, ‘exposed’ and ‘found’
    • Uses an XML dialect, WSDL (Web Services Description Language) to define a service
      • WSDL can use XML Schema to define how data is passed between a service provider and requestor
    • Uses an XML dialect, UDDI (Universal Description, Discovery and Integration) for
      • Describing services (high-level)
      • Discovering services (registry services, metadata)
      • UDDI defined using XML Schema
    • Core technology for application integration
      • Microsoft .NET
      • IBM WebSphere
      • Oracle
      • …. Many others
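Web Services Code Development

[Figure: from the WSDL and XML schema, an automated code generator produces a proxy for the client side and a skeleton for the server side; proxy and skeleton exchange SOAP requests/responses through WS/SOAP layers. The developer writes the application: the client code on one side and, behind the skeleton, the middle tier (validation, business logic, routing, logging, more…), which connects through code adapters to back-end systems (labelled Product System and MECH).]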

XML (and related) Specifications

[Figure: map of XML-related specifications. Legend: W3C rec, W3C draft, industry std, ‘Open’ std. Areas: XML Core, APIs, Style, Protocols, Web Services, Application areas. Specifications shown: XML 1.0, XML names, Infoset, XML base, Canonical, XML signature, Xpath, Xpointer, Xlink, Xfragment, XML schema, XML query, DOM 1, DOM 2, DOM 3, SAX 1, SAX 2, JDOM, JAXP, XSL, XSLT, CSS 1, CSS 2, CSS 3, SOAP, XML-RPC, WSDL, UDDI, XHTML 1.0, XHTML basic, Modularized XHTML, XHTML events, Xforms, MathML, SMIL 1 & 2, SVG, RDF, FinXML, FpML, IFX, Biztalk, ebXML, dirXML, WDDX, XMI, and 100's more.]

XML 101: A Technical Introduction to XML

The End.

Ian GRAHAM

IT Strategy, IBS, Technology and Solutions, BMO Financial Group

E: <ian.graham@bmo.com>

T: (416) 513.5656 / F: (416) 513.5590