Xml introduction

XML Introduction - PowerPoint PPT Presentation


PowerPoint Slideshow about 'XML Introduction' - noelani-robertson


Introducing XML

  • XML stands for Extensible Markup Language. A markup language specifies the structure and content of a document.

  • Because it is extensible, XML can be used to create a wide variety of document types.

Introducing XML

  • XML is a subset of the Standard Generalized Markup Language (SGML), which was introduced in the 1980s. SGML is very complex and can be costly to implement.

  • These drawbacks led to the creation of the Hypertext Markup Language (HTML), a markup language that is easier to use. XML can be seen as sitting between SGML and HTML: easier to learn than SGML, but more robust than HTML.

The Limits of HTML

  • HTML was designed for formatting text on a Web page. It was not designed for dealing with the content of a Web page. Additional features have been added to HTML, but they do not solve data description or cataloging issues in an HTML document.

  • Because HTML is not extensible, it cannot be modified to meet specific needs. Browser developers have added features making HTML more robust, but this has resulted in a confusing mix of different HTML standards.

Introducing XML

  • HTML cannot be applied consistently. Different browsers support different standards, so the same document can look different in one browser than in another.

Introduction to XML Markup

  • XML document (intro.xml)

    • Marks up message as XML

    • Commonly stored in text files

      • Extension .xml

    <?xml version = "1.0"?>

    <!-- Fig. 5.1 : intro.xml -->
    <!-- Simple introduction to XML markup -->

    <myMessage>
       <message>Welcome to XML!</message>
    </myMessage>

  • The document begins with a declaration that specifies XML version 1.0.

  • Element message is a child element of the root element myMessage.
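To check the structure described above, the following sketch loads intro.xml with Python's standard xml.etree.ElementTree module and reads back the root and its child. ElementTree is an illustrative choice because the slides do not name a parser. The document is embedded as a string so the example runs on its own.

```python
import xml.etree.ElementTree as ET

# intro.xml from the slide, held in a string so the sketch is self-contained
INTRO_XML = """<?xml version = "1.0"?>
<!-- Fig. 5.1 : intro.xml -->
<!-- Simple introduction to XML markup -->
<myMessage>
   <message>Welcome to XML!</message>
</myMessage>"""

root = ET.fromstring(INTRO_XML)   # the default parser skips the comments
message = root.find("message")    # the child element of the root

print(root.tag)       # myMessage
print(message.text)   # Welcome to XML!
```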

Introduction to XML Markup (cont.)

  • XML documents

    • Must contain exactly one root element

      • Attempting to create more than one root element is erroneous

    • Elements must be nested properly

      • Incorrect: <x><y>hello</x></y>

      • Correct: <x><y>hello</y></x>

    • Must be well-formed
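You can observe these rules directly: a conforming parser rejects improperly nested tags and a second root element. This minimal sketch uses Python's ElementTree. The helper name is_well_formed is hypothetical and is not a library function.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<x><y>hello</y></x>"))   # True: properly nested
print(is_well_formed("<x><y>hello</x></y>"))   # False: improperly nested
print(is_well_formed("<a/><b/>"))              # False: two root elements
```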

XML Parsers

  • An XML processor (also called XML parser) evaluates the document to make sure it conforms to all XML specifications for structure and syntax.

  • XML parsers are strict. It is this rigidity built into XML that ensures XML code accepted by the parser will work the same everywhere.

Structure of a Well-formed XML Document

<?xml version="1.0" ?>

<!DOCTYPE publication [

<!ELEMENT publications (journals, conferences, books)>


<!ELEMENT author (#PCDATA)>

<!ELEMENT issue (#PCDATA)>


<!ENTITY JSI " <journal>Journal of Systems Integration</journal>

<publisher>Kluwer Academic Publishers</publisher>">








XML Parsers

  • Microsoft’s parser is called MSXML and is built directly into Internet Explorer 5.0 and later.

  • Netscape 6.0 and later, which is built on the Mozilla code base, also includes its own XML parser.

Parsers and Well-formed XML Documents (cont.)

  • XML parsers support

    • Document Object Model (DOM)

      • Builds tree structure containing document data in memory

    • Simple API for XML (SAX)

      • Generates events when tags, comments, etc. are encountered

        • (Events are notifications to the application)
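To contrast the two APIs, this sketch parses one small document both ways with Python's standard library. xml.dom.minidom builds a tree in memory, while xml.sax calls handler methods as it encounters markup. The handler class EventLogger is a hypothetical name used only for illustration.

```python
import xml.sax
from xml.dom import minidom

DOC = "<myMessage><message>Welcome to XML!</message></myMessage>"

# DOM: the whole document is parsed into an in-memory tree that can be navigated later
dom = minidom.parseString(DOC)
root = dom.documentElement
print(root.tagName)                      # myMessage
print(root.firstChild.firstChild.data)   # Welcome to XML!


class EventLogger(xml.sax.ContentHandler):
    """SAX: the parser calls these methods as it meets each piece of markup."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Text may arrive in several chunks, so each call is recorded separately
        self.events.append(("text", content))


handler = EventLogger()
xml.sax.parseString(DOC.encode("utf-8"), handler)
print(handler.events)
```

The DOM version keeps the whole tree available after parsing. The SAX version only records what it is told, in document order, which suits large documents that do not need to be held in memory.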

Parsing an XML Document with MSXML

  • XML document

    • Contains data

    • Does not contain formatting information

    • Load XML document into Internet Explorer 5.0

      • Document is parsed by msxml.

      • Places plus (+) or minus (-) signs next to container elements

        • Plus sign indicates that all child elements are hidden

        • Clicking plus sign expands container element

          • Displays children

        • Minus sign indicates that all child elements are visible

        • Clicking minus sign collapses container element

          • Hides children

      • An error is generated if the document is not well formed

Character Set

  • XML documents may contain

    • Carriage returns

    • Line feeds

    • Unicode characters

      • Enables computers to process characters for several languages

Characters vs. Markup

  • XML must differentiate between

    • Markup text

      • Enclosed in angle brackets (< and >)

        • e.g., child elements

    • Character data

      • Text between start tag and end tag

        • Welcome to XML!

    • Elements versus Attributes

White Space, Entity References and Built-in Entities

  • Whitespace characters

    • Spaces, tabs, line feeds and carriage returns

      • Significant (preserved by application)

      • Insignificant (not preserved by application)

        • Normalization

          • Whitespace collapsed into single whitespace character

          • Sometimes whitespace removed entirely

            <markup>This   is   character      data</markup>

            after normalization, becomes

            <markup>This is character data</markup>

White Space, Entity References and Built-in Entities (cont.)

  • XML-reserved characters

    • Ampersand (&)

    • Left-angle bracket (<)

    • Right-angle bracket (>)

    • Apostrophe (')

    • Double quote (")

  • Entity references

    • Allow XML-reserved characters to be used in character data

      • Begin with an ampersand (&) and end with a semicolon (;)

    • Prevent the parser from misinterpreting character data as markup

White Space, Entity References and Built-in Entities (cont.)

  • Built-in entities

    • Ampersand (&amp;)

    • Left-angle bracket (&lt;)

    • Right-angle bracket (&gt;)

    • Apostrophe (&apos;)

    • Quotation mark (&quot;)

    • Example: to mark up the characters <>& in element message, write &lt;&gt;&amp;
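Here is a short sketch of entity references in practice, using Python's xml.sax.saxutils helpers. escape() replaces the reserved characters with built-in entity references, and the parser turns them back into characters when it reads the element. By default escape() handles only &, < and >. To handle the quote characters, you must pass extra mappings for them.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

raw = "<>&"

# escape() replaces &, < and > with &amp;, &lt; and &gt;
escaped = escape(raw)
print(escaped)                     # &lt;&gt;&amp;

# Inside an element, the parser turns the entity references back into characters
message = ET.fromstring("<message>" + escaped + "</message>")
print(message.text)                # <>&

# The optional dict argument covers the apostrophe and quotation mark as well
print(escape("'\"", {"'": "&apos;", '"': "&quot;"}))   # &apos;&quot;
print(unescape("&lt;&gt;&amp;"))   # <>&
```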


Document Object Model (DOM)

  • XML Document Object Model (DOM)

    • Build tree structure in memory for XML documents

    • DOM-based parsers parse documents into these structures

      • Exist in several languages (Java, C, C++, Python, Perl, C#, VB.NET, VB, etc.)

Document Object Model (DOM)

  • DOM tree

    • Each node represents an element, attribute, etc.

      <?xml version ="1.0"?><message from = "Paul" to = "Tem"> <body>Hi, Tim!</body></message>

      • Node created for element message

        • Element message has child node for body element

        • Element body has child node for text "Hi, Tim!"

        • Attributes from and to also have nodes in tree
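You can inspect the tree described above with Python's xml.dom.minidom, a DOM implementation in the standard library, used here only as an illustration. The sketch parses the same message document. It confirms that attributes are reached through the element, while body and its text are child nodes.

```python
from xml.dom import minidom

DOC = '<?xml version ="1.0"?><message from = "Paul" to = "Tem"> <body>Hi, Tim!</body></message>'

dom = minidom.parseString(DOC)
message = dom.documentElement                       # node for element message

# Attribute nodes belong to the element but are not in its childNodes list
print(sorted(message.attributes.keys()))            # ['from', 'to']
print(message.getAttribute("to"))                   # Tem

# Children of message: a whitespace text node (the space before <body>) and body
body = message.getElementsByTagName("body")[0]
print(body.firstChild.nodeType == body.TEXT_NODE)   # True
print(body.firstChild.data)                         # Hi, Tim!
```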

DOM Implementations

  • DOM-based parsers

    • Microsoft’s MSXML

    • Microsoft .NET System.Xml namespace

    • Sun Microsystems’ JAXP

Creating Nodes

  • Create XML document at run time
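Here is a minimal sketch of creating nodes at run time with xml.dom.minidom. The document, the root element, a child element and a text node are created through the DOM and then linked with appendChild. The lang attribute is a hypothetical addition for illustration.

```python
from xml.dom import minidom

# Create an empty document whose root element is myMessage
impl = minidom.getDOMImplementation()
doc = impl.createDocument(None, "myMessage", None)
root = doc.documentElement

# Create the child element, give it an attribute and a text node, then attach it
message = doc.createElement("message")
message.setAttribute("lang", "en")            # hypothetical attribute, for illustration
message.appendChild(doc.createTextNode("Welcome to XML!"))
root.appendChild(message)

print(doc.toxml())
```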

Traversing the DOM

  • Use DOM to traverse XML document

    • Output element nodes

    • Output attribute nodes

    • Output text nodes
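One way to traverse a DOM tree is a recursive, depth-first walk that checks each node's type. This sketch assumes xml.dom.minidom and the message document from the DOM tree slide. traverse is a hypothetical helper. It lists attributes separately because the DOM does not treat them as child nodes, and it skips whitespace-only text.

```python
from xml.dom import minidom

DOC = '<message from="Paul" to="Tem"><body>Hi, Tim!</body></message>'


def traverse(node, depth=0, out=None):
    """Depth-first walk that records element, attribute and text nodes."""
    if out is None:
        out = []
    pad = "  " * depth
    if node.nodeType == node.ELEMENT_NODE:
        out.append(f"{pad}element: {node.tagName}")
        # Attribute nodes are not children, so they are visited here explicitly
        for name, value in sorted(node.attributes.items()):
            out.append(f"{pad}  attribute: {name}={value}")
    elif node.nodeType == node.TEXT_NODE and node.data.strip():
        out.append(f"{pad}text: {node.data.strip()}")
    for child in node.childNodes:
        traverse(child, depth + 1, out)
    return out


for line in traverse(minidom.parseString(DOC).documentElement):
    print(line)
```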

DOM Components

  • Manipulate XML document


  • XML Path Language (XPath)

    • Syntax for locating information in XML document

      • e.g., attribute values

    • String-based language of expressions

      • Not structural language like XML

    • Used by other XML technologies

      • XSLT

XPath - Nodes

  • XML document

    • Tree structure with nodes

    • Each node represents part of XML document

      • Seven types

        • Root

        • Element

        • Attribute

        • Text

        • Comment

        • Processing instruction

        • Namespace

      • Attributes and namespaces are not children of their parent node

        • They describe their parent node

Location Paths

  • Location path

    • Expression specifying how to navigate XPath tree

    • Composed of location steps

      • Each location step composed of

        • Axis

        • Node test

        • Predicate


  • XPath searches are made relative to context node

  • Axis

    • Indicates which nodes are included in search

      • Relative to context node

    • Dictates node ordering in set

      • Forward axes select nodes that follow context node

      • Reverse axes select nodes that precede context node

Node Tests

  • Node tests

    • Refine set of nodes selected by axis

      • Rely on the axis's principal node type

        • Corresponds to the type of node the axis can select

Node-set Operators and Functions (cont.)

  • Location-path expressions

    • Combine node-set operators and functions

      • Select all head and body children element nodes

        head | body

      • Select last title element node in head element node

        head/title[ last() ]

      • Select third book element

        book[ position() = 3 ]

        • Or alternatively

          book[ 3 ]

      • Return total number of element-node children

        count( * )

      • Select all book element nodes in document

        //book
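Python's ElementTree supports only a subset of XPath: child steps, .//, and predicates such as [3], [last()] and [@attr]. This sketch maps the expressions above onto that subset. It falls back to plain Python where the subset stops, for the union operator and count(). The library element and its contents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document shaped after the slide's examples (head, title, body, book)
DOC = """<library>
  <head><title>First title</title><title>Second title</title></head>
  <body/>
  <book>A</book><book>B</book><book>C</book><book>D</book>
</library>"""

root = ET.fromstring(DOC)   # context node: the library element

# head | body: ElementTree has no union operator, so filter the children by name
head_or_body = [child.tag for child in root if child.tag in ("head", "body")]

last_title = root.find("head/title[last()]").text   # head/title[ last() ]
third_book = root.find("book[3]").text              # book[ 3 ], i.e. position() = 3
element_count = len(root.findall("*"))              # count( * )
all_books = root.findall(".//book")                 # //book, searched below the root element

print(head_or_body, last_title, third_book, element_count, len(all_books))
# ['head', 'body'] Second title C 6 4
```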


Sample Data for Queries

<bib>
  <book>
    <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book price="55">
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>


Data Model for XPath

(Figure: tree diagram of the sample data, from the root node and the root element down to text leaves such as "Serge Abiteboul")

XPath: Simple Expressions

/bib/book/year

Result: <year> 1995 </year>

        <year> 1998 </year>

/bib/paper/year

Result: empty (there were no papers)
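You can reproduce the two results above with ElementTree, assuming the expressions were /bib/book/year and /bib/paper/year. ElementTree does not accept absolute paths on an element. The paths are therefore written relative to the bib element that fromstring() returns. The sample data is embedded with the price attribute in straight quotes and the closing </bib> tag added.

```python
import xml.etree.ElementTree as ET

# The sample data, with price="55" quoted properly and </bib> closed
BIB = """<bib>
<book> <publisher> Addison-Wesley </publisher> <author> Serge Abiteboul </author>
  <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
  <author> Victor Vianu </author> <title> Foundations of Databases </title>
  <year> 1995 </year></book>
<book price="55"> <publisher> Freeman </publisher> <author> Jeffrey D. Ullman </author>
  <title> Principles of Database and Knowledge Base Systems </title>
  <year> 1998 </year></book>
</bib>"""

root = ET.fromstring(BIB)   # root is the bib element itself

# /bib/book/year, written relative to bib
years = [y.text.strip() for y in root.findall("book/year")]
print(years)                          # ['1995', '1998']

# /bib/paper/year: there are no paper elements, so the result is empty
print(root.findall("paper/year"))     # []
```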



XML Document Type Definitions

  • Declarations

    • Definition of elements and attributes

  • Content model (regular expressions)

    • Association of attributes with elements

    • Association of elements with other elements

    • Order and cardinality constraints

Element Declarations

  • Basic form

    – <!ELEMENT elementname (contentmodel)>

    – The content model determines which child elements and character data the element may contain

    – It is given by a regular expression

  • Atomic contents

    • Element content

      <!ELEMENT example ( a )>

    • Text content

      <!ELEMENT example (#PCDATA)>

    • Empty element

      <!ELEMENT example EMPTY>

    • Arbitrary content

      <!ELEMENT example ANY>

Element Declarations

  • Sequence

    <!ELEMENT example ( a, b )>

  • Alternative

    <!ELEMENT example ( a | b )>

  • Optional (zero or one)

    <!ELEMENT example ( a )?>

  • Optional and repeatable (zero or more)

    <!ELEMENT example ( a )*>

  • Required and repeatable (one or more)

    <!ELEMENT example ( a )+>

  • Mixed content

    <!ELEMENT example (#PCDATA | a)*>

  • Content model can be grouped by parentheses

  • Cyclic element containment is allowed
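Because a content model is a regular expression over child element names, a checker can be sketched by translating the model into a Python regex. Sequence becomes concatenation, while |, ?, * and + keep the same meaning. This illustrates the idea under stated limits and is not a DTD validator: it handles only parenthesized element content, not #PCDATA, EMPTY or ANY. content_model_to_regex and children_match are hypothetical names.

```python
import re


def content_model_to_regex(model):
    """Translate a parenthesized DTD element-content model into a compiled regex.

    Each child element name is encoded as "name " (name plus one space), so the
    model ( a, b ) must match the string "a b " exactly.
    """
    if "#PCDATA" in model:
        raise ValueError("mixed content is not handled by this sketch")
    tokens = re.findall(r"[A-Za-z_][\w.:-]*|[(),|?*+]", model)
    parts = []
    for tok in tokens:
        if tok == "(":
            parts.append("(?:")                          # group without capturing
        elif tok == ")":
            parts.append(")")
        elif tok == ",":
            continue                                     # sequence = concatenation
        elif tok in {"|", "?", "*", "+"}:
            parts.append(tok)                            # same meaning in both notations
        else:
            parts.append("(?:" + re.escape(tok) + " )")  # one child element
    return re.compile("".join(parts))


def children_match(model, children):
    """Return True if the list of child element names satisfies the model."""
    encoded = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(encoded) is not None


print(children_match("( a, b )", ["a", "b"]))   # True
print(children_match("( a, b )", ["b", "a"]))   # False
print(children_match("( a )+", []))             # False
```

The trailing space after each name keeps one element name from matching as a prefix of another, for example a against ab.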

Attribute Declarations

  • Each element can be associated with an arbitrary number of attributes

  • Basic form

    – <!ATTLIST Elementname Attributename Type Default

    Attributename Type Default

    ... >

  • Example:

    Document Type Definition

    <!ELEMENT shipTo ( #PCDATA)>

    <!ATTLIST shipTo country CDATA #REQUIRED

    state CDATA #IMPLIED

    version CDATA #FIXED "1.0"

    payment (cash|creditCard) "cash">


    XML document

    <shipTo country="Switzerland" payment="creditCard"> … </shipTo>

Attribute Declarations - Types


  • CDATA

    – String

    – <!ATTLIST example HREF CDATA

  • Enumeration

    – Token from given set of values, Default

    – <!ATTLIST example selection (

  • Possible Defaults

    – Required attribute: #REQUIRED

    – Optional attribute: #IMPLIED

    – Fixed attribute: #FIXED

    – Default for enumeration: "value"
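You can summarize the kinds of default as a small checking routine. This is a hypothetical sketch, not a parser API. The ATTLIST for shipTo from the earlier example is written out by hand as a dict. The function rejects a missing #REQUIRED attribute, a #FIXED attribute with another value and a value outside an enumeration, then fills in fixed and literal defaults. Undeclared attributes are not checked.

```python
# Hypothetical hand-written form of the shipTo ATTLIST:
# attribute name -> (type, default keyword or None, default value or None)
SHIPTO_ATTLIST = {
    "country": ("CDATA", "#REQUIRED", None),
    "state": ("CDATA", "#IMPLIED", None),
    "version": ("CDATA", "#FIXED", "1.0"),
    "payment": (("cash", "creditCard"), None, "cash"),  # enumeration with literal default
}


def apply_attlist(attrs, attlist):
    """Check declared attributes and return a copy with defaults filled in."""
    result = dict(attrs)
    for name, (att_type, keyword, value) in attlist.items():
        if name not in result:
            if keyword == "#REQUIRED":
                raise ValueError(f"missing required attribute {name!r}")
            if value is not None:   # #FIXED or a literal default supplies the value
                result[name] = value
            continue                # #IMPLIED with no value: attribute stays absent
        if keyword == "#FIXED" and result[name] != value:
            raise ValueError(f"attribute {name!r} must be {value!r}")
        if isinstance(att_type, tuple) and result[name] not in att_type:
            raise ValueError(f"attribute {name!r} must be one of {att_type}")
    return result


print(apply_attlist({"country": "Switzerland", "payment": "creditCard"}, SHIPTO_ATTLIST))
# {'country': 'Switzerland', 'payment': 'creditCard', 'version': '1.0'}
```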


ID/IDREF Example

    – ID is a unique identifier within the document

    – IDREF is a reference to an ID

    – Referential integrity is checked by a validating parser

    – ID values are determined by the application

    – <!ATTLIST example identity ID #IMPLIED

    reference IDREF #IMPLIED>
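Python's standard-library parsers do not validate against a DTD. This sketch therefore checks the two ID/IDREF rules by hand: ID values must be unique, and every IDREF must name an existing ID. The attribute names identity and reference come from the declaration above. The examples wrapper element and the check_id_refs helper are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_id_refs(root, id_attr="identity", ref_attr="reference"):
    """Return a list of problems: duplicate IDs and IDREFs with no matching ID."""
    problems = []
    ids = set()
    for elem in root.iter():                # first pass: collect IDs
        value = elem.get(id_attr)
        if value is not None:
            if value in ids:
                problems.append(f"duplicate ID {value!r}")
            ids.add(value)
    for elem in root.iter():                # second pass: resolve references
        ref = elem.get(ref_attr)
        if ref is not None and ref not in ids:
            problems.append(f"dangling IDREF {ref!r}")
    return problems


DOC = """<examples>
  <example identity="e1"/>
  <example identity="e2" reference="e1"/>
  <example reference="e9"/>
</examples>"""

print(check_id_refs(ET.fromstring(DOC)))   # ["dangling IDREF 'e9'"]
```

Two passes are used so that an IDREF may point forward to an ID that appears later in the document.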

Inclusion of XML Document Type Definitions

  • External DTD Declaration

    <?xml version="1.0" encoding="ISO-8859-1"?>

    <!DOCTYPE test PUBLIC "-//Test AG//DTD test V1.0//EN"

    "http://www.test.org/test.dtd">

    <test> "test" is a document element </test>

  • Internal DTD Declaration

    <!DOCTYPE test [ <!ELEMENT test EMPTY> ]>


  • Mixed usage

    <!DOCTYPE test SYSTEM "http://www.test.org/test.dtd" [

    <!ENTITY hello "hello world"> ]>
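To see an internal entity in action, this sketch parses a document whose internal subset declares the hello entity from the example above. It relies on ElementTree's underlying expat parser. Expat expands entities declared in the internal subset but does not fetch external DTDs, so the sketch leaves out the external SYSTEM reference.

```python
import xml.etree.ElementTree as ET

# Internal subset only: expat expands entities declared here
DOC = """<?xml version="1.0"?>
<!DOCTYPE test [
  <!ENTITY hello "hello world">
]>
<test>&hello;</test>"""

root = ET.fromstring(DOC)
print(root.text)   # hello world
```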


Working with Namespaces

  • Name collision occurs when elements from two or more documents share the same name.

  • Name collision isn’t a problem if you are not concerned with validation. The document content only needs to be well-formed.

  • However, name collision will keep a document from being validated.

Name Collision

This figure shows two documents each with a Name element

Using Namespaces to Avoid Name Collision

This figure shows how to use a namespace to avoid collision

Declaring a Namespace

  • A namespace is a defined collection of element and attribute names.

  • Names that belong to the same namespace must be unique. Elements can share the same name if they reside in different namespaces.

  • Namespaces must be declared before they can be used.

Declaring a Namespace

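  • A namespace can be declared in the prolog or as an element attribute. The syntax to declare a namespace in the prolog is:

    <?xml:namespace ns="URI" prefix="prefix"?>

  • Where URI is a Uniform Resource Identifier that assigns a unique name to the namespace, and prefix is a string of letters that associates each element or attribute in the document with the declared namespace.

  • The prolog form is an early, non-standard syntax recognized by Internet Explorer 5. The W3C Namespaces in XML Recommendation declares namespaces with the xmlns:prefix="URI" attribute instead.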

Declaring a Namespace

  • For example,

    <?xml:namespace ns="http://uhosp/patients/ns" prefix="pat"?>

  • Declares a namespace with the prefix "pat" and the URI http://uhosp/patients/ns.

  • The URI serves only as a unique name; it does not have to be a working Web address. A URI can identify a physical or an abstract resource.

    <?xml version = "1.0"?>

    <!-- Fig. 5.9 : defaultnamespace.xml -->
    <!-- Using Default Namespaces -->

    <directory xmlns = "urn:deitel:textInfo"
               xmlns:image = "urn:deitel:imageInfo">

       <file filename = "book.xml">
          <description>A book list</description>
       </file>

       <image:file filename = "funny.jpg">
          <image:description>A funny picture</image:description>
          <image:size width = "200" height = "100"/>
       </image:file>

    </directory>
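A parser resolves every prefix to its namespace URI, so the two file elements above end up with different names. This sketch uses ElementTree, which writes expanded names as {URI}local-name. The prefixes text and image in the ns map are our own choice and need not match those in the document.

```python
import xml.etree.ElementTree as ET

DOC = """<directory xmlns = "urn:deitel:textInfo"
           xmlns:image = "urn:deitel:imageInfo">
   <file filename = "book.xml">
      <description>A book list</description>
   </file>
   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100"/>
   </image:file>
</directory>"""

root = ET.fromstring(DOC)

# ElementTree expands every element name to {namespace-URI}local-name
print(root.tag)                    # {urn:deitel:textInfo}directory
print([child.tag for child in root])
# ['{urn:deitel:textInfo}file', '{urn:deitel:imageInfo}file']

# A prefix map lets find() use short prefixes; unprefixed attributes have no namespace
ns = {"text": "urn:deitel:textInfo", "image": "urn:deitel:imageInfo"}
print(root.find("image:file", ns).get("filename"))         # funny.jpg
print(root.find("text:file/text:description", ns).text)    # A book list
```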


<part-catalog xmlns:nw="http://www.nutware.com/" xmlns="http://www.bobco.com/" >

<nw:entry nw:number="1327"> <nw:description>torque-balancing hexnut</nw:description>


<part id="555">

<name>type 4 wingnut</name>



Schemas

  • A schema is an XML document that defines the content and structure of one or more XML documents.

  • To avoid confusion, the XML document containing the content is called the instance document.

  • It represents a specific instance of the structure defined in the schema.

Comparing Schemas and DTDs

This figure compares schemas and DTDs

Schema Dialects

  • There is no single schema form.

  • Several schema “dialects” have been developed in the XML language.

  • Support for a particular schema depends on the XML parser being used for validation.

Starting a Schema File

  • A schema is placed in a separate XML document, which the instance document usually references.

Schema Types

  • XML Schema recognizes two categories of element types: complex and simple.

  • A complex type element has one or more attributes, or is the parent of one or more child elements.

  • A simple type element contains only character data and has no attributes.

Schema Types

This figure shows types of elements

Understanding Data Types

  • XML Schema supports two categories of data types: built-in and user-derived.

  • A built-in data type is part of the XML Schema specifications and is available to all XML Schema authors.

  • A user-derived data type is created by the XML Schema author for specific data values in the instance document.

Understanding Data Types

  • A primitive data type, also called a base type, is one of 19 fundamental data types not defined in terms of other types.

  • A built-in derived data type is one of 25 data types that the XML Schema developers defined in terms of the 19 primitive types.

Example Document – Sequence Constructor

  • XML Document

    <USAddress country="US">
       <name>Alice Smith</name>
       <street>123 Maple Street</street>
       <city>Mill Valley</city>
       <state>CA</state>
       <zip>90952</zip>
    </USAddress>

  • DTD

    <!ELEMENT USAddress (name, street, city, state, zip)>

    <!ATTLIST USAddress country CDATA #FIXED "US">

    <!ELEMENT name (#PCDATA)> etc.

Example Document – Sequence Constructor

  • XML Schema

    <xsd:complexType name="USAddress">
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="street" type="xsd:string"/>
        <xsd:element name="city" type="xsd:string"/>
        <xsd:element name="state" type="xsd:string"/>
        <xsd:element name="zip" type="xsd:decimal"/>
      </xsd:sequence>
      <xsd:attribute name="country" type="xsd:NMTOKEN"
                     fixed="US"/>
    </xsd:complexType>

Anonymous Types and User-Defined Simple Types

<xsd:complexType name="Items">
  <xsd:sequence>
    <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="productName" type="xsd:string"/>
          <xsd:element name="quantity">
            <xsd:simpleType>
              <xsd:restriction base="xsd:positiveInteger">
                <xsd:maxExclusive value="100"/>
              </xsd:restriction>
            </xsd:simpleType>
          </xsd:element>
          <xsd:element name="USPrice" type="xsd:decimal"/>
          <xsd:element ref="comment" minOccurs="0"/>
          <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
        </xsd:sequence>
        <xsd:attribute name="partNum" type="SKU"/>
      </xsd:complexType>
    </xsd:element>
  </xsd:sequence>
</xsd:complexType>