XML Introduction - PowerPoint PPT Presentation


XML Introduction

Introducing XML

  • XML stands for Extensible Markup Language. A markup language specifies the structure and content of a document.

  • Because it is extensible, XML can be used to create a wide variety of document types.

Introducing XML

  • XML is a subset of the Standard Generalized Markup Language (SGML), which was introduced in the 1980s. SGML is very complex and can be costly.

  • These reasons led to the creation of Hypertext Markup Language (HTML), a more easily used markup language. XML can be seen as sitting between SGML and HTML – easier to learn than SGML, but more robust than HTML.

The Limits of HTML

  • HTML was designed for formatting text on a Web page. It was not designed for dealing with the content of a Web page. Additional features have been added to HTML, but they do not solve data description or cataloging issues in an HTML document.

  • Because HTML is not extensible, it cannot be modified to meet specific needs. Browser developers have added features making HTML more robust, but this has resulted in a confusing mix of different HTML standards.

Introducing XML

  • HTML cannot be applied consistently. Different browsers support different standards, so the same document can look different from one browser to another.

Introduction to XML Markup

  • XML document (intro.xml)

    • Marks up message as XML

    • Commonly stored in text files

      • Extension .xml

Fig. 5.1: intro.xml

    1 <?xml version = "1.0"?>
    2
    3 <!-- Fig. 5.1 : intro.xml -->
    4 <!-- Simple introduction to XML markup -->
    5
    6 <myMessage>
    7    <message>Welcome to XML!</message>
    8 </myMessage>

  • Document begins with declaration that specifies XML version 1.0

  • Element message is child element of root element myMessage

  • Line numbers are not part of XML document. We include them for clarity.

Introduction to XML Markup (cont.)

  • XML documents

    • Must contain exactly one root element

      • Attempting to create more than one root element is erroneous

    • Elements must be nested properly

      • Incorrect: <x><y>hello</x></y>

      • Correct: <x><y>hello</y></x>

    • Must be well-formed
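
The rules above can be checked mechanically. The following is a minimal sketch, assuming Python's standard-library ElementTree parser; the helper name is_well_formed is hypothetical and only reports whether parsing succeeds.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised for improper nesting, more than one root element, etc.
        return False


print(is_well_formed("<x><y>hello</y></x>"))  # properly nested
print(is_well_formed("<x><y>hello</x></y>"))  # improperly nested
print(is_well_formed("<a/><b/>"))             # two root elements
```

This only tests well-formedness; it does not validate the document against a DTD or schema.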

XML Parsers

  • An XML processor (also called XML parser) evaluates the document to make sure it conforms to all XML specifications for structure and syntax.

  • XML parsers are strict. It is this rigidity built into XML that ensures XML code accepted by the parser will work the same everywhere.

XML Architecture

Structure of a Well-formed XML Document

<?xml version="1.0" ?>
<!DOCTYPE publication [
  <!ELEMENT publications (journals, conferences, books)>
  ...
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT issue (#PCDATA)>
  ...
  <!ENTITY JSI " <journal>Journal of Systems Integration</journal>
  <publisher>Kluwer Academic Publishers</publisher>">
  ...

XML Parsers

  • Microsoft’s parser is called MSXML and is built directly into Internet Explorer versions 5.0 and above.

  • Netscape versions 6.0 and above are built on the Mozilla code base, which includes its own XML parser.

Parsers and Well-formed XML Documents (cont.)

  • XML parsers support

    • Document Object Model (DOM)

      • Builds tree structure containing document data in memory

    • Simple API for XML (SAX)

      • Generates events when tags, comments, etc. are encountered

        • (Events are notifications to the application)
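
To make the contrast concrete, here is a small sketch using Python's standard library (xml.dom.minidom for DOM, xml.sax for SAX). The EventLogger class is a hypothetical name introduced only for this illustration, and the sample document reuses the myMessage example.

```python
import xml.sax
from xml.dom import minidom

DOC = "<myMessage><message>Welcome to XML!</message></myMessage>"

# DOM: the whole document is built into a tree held in memory.
dom = minidom.parseString(DOC)
root_name = dom.documentElement.tagName


# SAX: the parser notifies the application as it meets each piece of markup.
class EventLogger(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # Text may arrive in several chunks; whitespace-only chunks are skipped.
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


logger = EventLogger()
xml.sax.parseString(DOC.encode("utf-8"), logger)

print(root_name)
for event in logger.events:
    print(event)
```

DOM suits documents that must be navigated or modified after parsing; SAX keeps memory use low because nothing is retained unless the handler stores it.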

Parsing an XML Document with MSXML

  • XML document

    • Contains data

    • Does not contain formatting information

    • Load XML document into Internet Explorer 5.0

      • Document is parsed by MSXML

      • Places plus (+) or minus (-) signs next to container elements

        • Plus sign indicates that all child elements are hidden

        • Clicking plus sign expands container element

          • Displays children

        • Minus sign indicates that all child elements are visible

        • Clicking minus sign collapses container element

          • Hides children

      • An error is generated if the document is not well-formed

XML document shown in IE6.

Character Set

  • XML documents may contain

    • Carriage returns

    • Line feeds

    • Unicode characters

      • Enables computers to process characters for several languages

Characters vs. Markup

  • XML must differentiate between

    • Markup text

      • Enclosed in angle brackets (< and >)

        • e.g., child elements

    • Character data

      • Text between start tag and end tag

        • Welcome to XML!

    • Elements versus Attributes

White Space, Entity References and Built-in Entities

  • Whitespace characters

    • Spaces, tabs, line feeds and carriage returns

      • Significant (preserved by application)

      • Insignificant (not preserved by application)

        • Normalization

          • Whitespace collapsed into single whitespace character

          • Sometimes whitespace removed entirely

            <markup>This   is   character   data</markup>

            after normalization, becomes

            <markup>This is character data</markup>
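
A minimal sketch of the normalization step in Python. The function name normalize_space is hypothetical. Note that str.split() treats a few more characters as whitespace than XML's four (space, tab, line feed, carriage return), so this is an approximation of the idea rather than a parser's exact behavior.

```python
def normalize_space(text: str) -> str:
    """Collapse each run of whitespace to one space and trim both ends."""
    return " ".join(text.split())


raw = "This   is \t character\n   data"
print(normalize_space(raw))
```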

White Space, Entity References and Built-in Entities (cont.)

  • XML-reserved characters

    • Ampersand (&)

    • Left-angle bracket (<)

    • Right-angle bracket (>)

    • Apostrophe (')

    • Double quote (")

  • Entity references

    • Allow XML-reserved characters to be used as character data

      • Begin with ampersand (&) and end with semicolon (;)

    • Prevent character data from being misinterpreted as markup

White Space, Entity References and Built-in Entities (cont.)

  • Built-in entities

    • Ampersand (&amp;)

    • Left-angle bracket (&lt;)

    • Right-angle bracket (&gt;)

    • Apostrophe (&apos;)

    • Quotation mark (&quot;)

    • Mark up characters "<>&" in element message: <message>&lt;&gt;&amp;</message>
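
The substitution can be done programmatically. This sketch assumes Python's xml.sax.saxutils.escape, which replaces &, < and > by default; the QUOTES mapping and the helper escape_all are hypothetical names added here to cover the two quote characters as well.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Extra replacements for the two quote characters (not escaped by default).
QUOTES = {'"': "&quot;", "'": "&apos;"}


def escape_all(text: str) -> str:
    """Replace all five XML-reserved characters with built-in entities."""
    return escape(text, QUOTES)


message = "<message>" + escape("<>&") + "</message>"
print(message)

# A parser turns the entity references back into the original characters.
print(ET.fromstring(message).text)
```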


Document Object Model (DOM)

  • XML Document Object Model (DOM)

    • Builds tree structure in memory for XML documents

    • DOM-based parsers parse these structures

      • Exist in several languages (Java, C, C++, Python, Perl, C#, VB.NET, VB, etc)

Document Object Model (DOM)

  • DOM tree

    • Each node represents an element, attribute, etc.

      <?xml version = "1.0"?>
      <message from = "Paul" to = "Tim">
         <body>Hi, Tim!</body>
      </message>

      • Node created for element message

        • Element message has child node for body element

        • Element body has child node for text "Hi, Tim!"

        • Attributes from and to also have nodes in tree
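
The node relationships listed above can be inspected with any DOM implementation. A small sketch, assuming Python's xml.dom.minidom and the message document from this slide (written without whitespace between tags so that no whitespace-only text nodes appear):

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<message from="Paul" to="Tim"><body>Hi, Tim!</body></message>'
)

message = doc.documentElement                        # node for element message
body = message.getElementsByTagName("body")[0]       # child node for body
text = body.firstChild                               # text node under body

# Attribute nodes hang off the element but are not among its children.
attrs = {name: message.getAttribute(name) for name in message.attributes.keys()}

print(message.tagName, body.tagName, repr(text.data), attrs)
```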

DOM Implementations

  • DOM-based parsers

    • Microsoft’s msxml

    • Microsoft .NET System.Xml namespace

    • Sun Microsystems’ JAXP

Creating Nodes

  • Create XML document at run time
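
As an illustration, the following sketch builds a document at run time with Python's xml.dom.minidom. The element names reuse the earlier myMessage example; other DOM implementations offer equivalent factory methods under the same names.

```python
from xml.dom import minidom

# Start from an empty document and create every node through it.
doc = minidom.Document()

root = doc.createElement("myMessage")
doc.appendChild(root)

message = doc.createElement("message")
message.appendChild(doc.createTextNode("Welcome to XML!"))
root.appendChild(message)

xml_text = root.toxml()
print(xml_text)
```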

Traversing the DOM

  • Use DOM to traverse XML document

    • Output element nodes

    • Output attribute nodes

    • Output text nodes
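
One possible traversal is a recursive walk that reports each node type in turn. This is a sketch assuming Python's xml.dom.minidom; the function name walk and the output format are hypothetical.

```python
from xml.dom import minidom, Node


def walk(node, depth=0, out=None):
    """Collect one line per element, attribute and non-blank text node."""
    out = [] if out is None else out
    indent = "  " * depth
    if node.nodeType == Node.ELEMENT_NODE:
        out.append(f"{indent}element: {node.tagName}")
        for name, value in sorted(node.attributes.items()):
            out.append(f"{indent}  attribute: {name}={value}")
        for child in node.childNodes:
            walk(child, depth + 1, out)
    elif node.nodeType == Node.TEXT_NODE and node.data.strip():
        out.append(f"{indent}text: {node.data.strip()}")
    return out


doc = minidom.parseString(
    '<message from="Paul" to="Tim"><body>Hi, Tim!</body></message>'
)
lines = walk(doc.documentElement)
print("\n".join(lines))
```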

DOM Components

  • Manipulate XML document



  • XML Path Language (XPath)

    • Syntax for locating information in XML document

      • e.g., attribute values

    • String-based language of expressions

      • Not structural language like XML

    • Used by other XML technologies

      • XSLT

XPath - Nodes

  • XML document

    • Tree structure with nodes

    • Each node represents part of XML document

      • Seven types

        • Root

        • Element

        • Attribute

        • Text

        • Comment

        • Processing instruction

        • Namespace

      • Attributes and namespaces are not children of their parent node

        • They describe their parent node

XPath node types

XPath node types. (Part 2)

Location Paths

  • Location path

    • Expression specifying how to navigate XPath tree

    • Composed of location steps

      • Each location step composed of

        • Axis

        • Node test

        • Predicate

  • XPath searches are made relative to context node

  • Axis

    • Indicates which nodes are included in search

      • Relative to context node

    • Dictates node ordering in set

      • Forward axes select nodes that follow context node

      • Reverse axes select nodes that precede context node

Node Tests

  • Node tests

    • Refine set of nodes selected by axis

      • Rely upon axis’ principal node type

        • Corresponds to type of node axis can select

Node-set Operators and Functions (cont.)

  • Location-path expressions

    • Combine node-set operators and functions

      • Select all head and body children element nodes

        head | body

      • Select last title element node in head element node

        head/title[ last() ]

      • Select third book element

        book[ position() = 3 ]

        • Or alternatively

          book[ 3 ]

      • Return total number of element-node children

        count( * )

      • Select all book element nodes in document

        //book
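
Some of these expressions can be tried with Python's standard-library ElementTree, which implements only a subset of XPath. In this sketch the library and magazine element names are hypothetical; the union operator (|) and count() are not available in ElementTree, so count( * ) is approximated with plain Python.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<library>"
    "<book>A</book><book>B</book><book>C</book><book>D</book>"
    "<magazine>M</magazine>"
    "</library>"
)

third = root.find("book[3]")         # same as book[ position() = 3 ]
last = root.find("book[last()]")     # last book child
child_count = len(list(root))        # stands in for count( * )
all_books = root.findall(".//book")  # every book below the context node

print(third.text, last.text, child_count, len(all_books))
```

A full XPath 1.0 engine would accept the expressions exactly as written on the slide.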


Sample Data for Queries

<bib>
  <book>
    <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book price="55">
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>


Data Model for XPath

[Figure: tree view of the sample data, with "The root" above "The root element" and text leaves such as "Serge Abiteboul"]


XPath: Simple Expressions

Result: <year> 1995 </year> <year> 1998 </year>

Result: empty (there were no papers)
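
The query expressions for these two results did not survive in the transcript. The sketch below assumes they were simple child paths such as /bib/book/year and /bib/paper/year, and runs the equivalent relative paths with Python's ElementTree against a shortened copy of the sample data.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample data (one author per book).
BIB = """
<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>
"""

bib = ET.fromstring(BIB)  # bib is the root element, so paths start below it

years = [y.text for y in bib.findall("book/year")]  # assumed /bib/book/year
papers = bib.findall("paper/year")                  # assumed /bib/paper/year
priced = [b.findtext("title") for b in bib.findall("book[@price='55']")]

print(years, papers, priced)
```

A path that matches nothing yields an empty result rather than an error.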



XML Document Type Definitions

  • Declarations

    • Definition of elements and attributes

  • Content Model (regular expressions)

    • Association of attributes with elements

    • Association of elements with other elements

    • Order and cardinality constraints

Element Declarations

  • Basic form

    • <!ELEMENT elementname (contentmodel)>

    • The content model determines which content the element may contain

    • Given by a regular expression

  • Atomic contents

    • Element content

      <!ELEMENT example ( a )>

    • Text content

      <!ELEMENT example (#PCDATA)>

    • Empty element

      <!ELEMENT example EMPTY>

    • Arbitrary content

      <!ELEMENT example ANY>

Element Declarations

  • Sequence

    <!ELEMENT example ( a, b )>

  • Alternative

    <!ELEMENT example ( a | b )>

  • Optional (zero or one)

    <!ELEMENT example ( a )?>

  • Optional and repeatable (zero or more)

    <!ELEMENT example ( a )*>

  • Required and repeatable (one or more)

    <!ELEMENT example ( a )+>

  • Mixed content

    <!ELEMENT example (#PCDATA | a)*>

  • Content model can be grouped by parentheses

  • Cyclic element containment is allowed
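
Because content models are regular expressions over child element names, their meaning can be illustrated with an ordinary regex engine. This is only a sketch: the CONTENT_MODELS table and the matches helper are hypothetical, each child is written as a single letter, and a validating parser does not necessarily work this way internally.

```python
import re

# Each DTD content model paired with an equivalent regex over child names.
CONTENT_MODELS = {
    "( a, b )": r"ab",    # sequence
    "( a | b )": r"a|b",  # alternative
    "( a )?": r"a?",      # zero or one
    "( a )*": r"a*",      # zero or more
    "( a )+": r"a+",      # one or more
}


def matches(model: str, children: str) -> bool:
    """True if the string of child names satisfies the content model."""
    return re.fullmatch(CONTENT_MODELS[model], children) is not None


print(matches("( a, b )", "ab"))  # a followed by b
print(matches("( a, b )", "ba"))  # wrong order
print(matches("( a )+", ""))      # at least one a is required
```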

Attribute Declarations

  • Each element can be associated with an arbitrary number of attributes

  • Basic form

    <!ATTLIST Elementname Attributename Type Default
                          Attributename Type Default
                          ... >

  • Example:

    Document Type Definition

    <!ELEMENT shipTo (#PCDATA)>

    <!ATTLIST shipTo country CDATA #REQUIRED
                     state   CDATA #IMPLIED
                     version CDATA #FIXED "1.0"
                     payment (cash|creditCard) "cash">

    XML document

    <shipTo country="Switzerland"
            payment="creditCard"> … </shipTo>

Attribute Declarations - Types


  • CDATA

    – String

    – <!ATTLIST example HREF CDATA

  • Enumeration

    – Token from given set of values, Default

    – <!ATTLIST example selection (

  • Possible Defaults

    – Required attribute: #REQUIRED

    – Optional attribute: #IMPLIED

    – Fixed attribute: #FIXED

    – Default for enumeration: "value"


ID and IDREF

  • ID is a unique identifier within the document

  • IDREF is a reference to an ID

  • Referential integrity is checked by a validating parser

  • ID values are determined by the application

  • <!ATTLIST example identity  ID    #IMPLIED
                       reference IDREF #IMPLIED>
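
Python's standard-library parsers do not validate against a DTD, so the following is a hand-written sketch of the integrity check a validating parser would perform. The function name dangling_references is hypothetical; the attribute names identity and reference come from the ATTLIST above, and the doc root element is made up for the demonstration.

```python
import xml.etree.ElementTree as ET


def dangling_references(root, id_attr="identity", ref_attr="reference"):
    """Return IDREF values that point at no ID; reject duplicate IDs."""
    ids = [e.get(id_attr) for e in root.iter() if e.get(id_attr) is not None]
    if len(ids) != len(set(ids)):
        raise ValueError("ID values must be unique within the document")
    refs = {e.get(ref_attr) for e in root.iter() if e.get(ref_attr) is not None}
    return sorted(refs - set(ids))


doc = ET.fromstring(
    '<doc>'
    '<example identity="a1"/>'
    '<example identity="a2" reference="a1"/>'
    '<example reference="zz"/>'
    '</doc>'
)
print(dangling_references(doc))
```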

Inclusion of XML Document Type Definitions

  • External DTD Declaration

    <?xml version="1.0" encoding="ISO-8859-1"?>

    <!DOCTYPE test PUBLIC "-//Test AG//DTD test V1.0//EN"
                          "http://www.test.org/test.dtd">

    <test> "test" is a document element </test>

  • Internal DTD Declaration

    <!DOCTYPE test [ <!ELEMENT test EMPTY> ]>


  • Mixed usage

    <!DOCTYPE test SYSTEM "http://www.test.org/test.dtd" [

    <!ENTITY hello "hello world"> ]>


Working with Namespaces

  • Name collision occurs when elements from two or more documents share the same name.

  • Name collision isn’t a problem if you are not concerned with validation. The document content only needs to be well-formed.

  • However, name collision will keep a document from being validated.

Name Collision

This figure shows two documents each with a Name element

Using Namespaces to Avoid Name Collision

This figure shows how to use a namespace to avoid collision

Declaring a Namespace

  • A namespace is a defined collection of element and attribute names.

  • Names that belong to the same namespace must be unique. Elements can share the same name if they reside in different namespaces.

  • Namespaces must be declared before they can be used.

Declaring a Namespace

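  • A namespace can be declared in the prolog or as an element attribute. The prolog form comes from an early draft of the namespaces proposal; the Namespaces in XML Recommendation defines only the attribute form (xmlns). The draft syntax to declare a namespace in the prolog is:

    <?xml:namespace ns="URI" prefix="prefix"?>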

  • Where URI is a Uniform Resource Identifier that assigns a unique name to the namespace, and prefix is a string of letters that associates each element or attribute in the document with the declared namespace.

Declaring a Namespace

  • For example,

    <?xml:namespace ns="http://uhosp/patients/ns" prefix="pat"?>

  • Declares a namespace with the prefix "pat" and the URI http://uhosp/patients/ns.

  • The URI does not have to be a Web address that resolves to a page. A URI identifies a physical or an abstract resource.

Fig. 5.9: defaultnamespace.xml

    <?xml version = "1.0"?>

    <!-- Fig. 5.9 : defaultnamespace.xml -->
    <!-- Using Default Namespaces -->

    <directory xmlns = "urn:deitel:textInfo"
       xmlns:image = "urn:deitel:imageInfo">

       <file filename = "book.xml">
          <description>A book list</description>
       </file>

       <image:file filename = "funny.jpg">
          <image:description>A funny picture</image:description>
          <image:size width = "200" height = "100"/>
       </image:file>

    </directory>
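
A namespace-aware parser keeps the two file elements apart even though they share a local name. The sketch below assumes Python's ElementTree, which expands each name to {URI}local form. The prefix "text" in the lookup table is an assumption of this example (the document itself uses a default namespace without a prefix).

```python
import xml.etree.ElementTree as ET

DOC = """<directory xmlns="urn:deitel:textInfo"
           xmlns:image="urn:deitel:imageInfo">
  <file filename="book.xml">
    <description>A book list</description>
  </file>
  <image:file filename="funny.jpg">
    <image:description>A funny picture</image:description>
    <image:size width="200" height="100"/>
  </image:file>
</directory>"""

root = ET.fromstring(DOC)

# Prefixes used only for searching; they need not match the document's.
ns = {"text": "urn:deitel:textInfo", "image": "urn:deitel:imageInfo"}

text_files = [f.get("filename") for f in root.findall("text:file", ns)]
image_files = [f.get("filename") for f in root.findall("image:file", ns)]

print(root.tag)
print(text_files, image_files)
```

What identifies a namespace is the URI, not the prefix, which is why the lookup works with a prefix the document never declares.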




<part-catalog xmlns:nw="http://www.nutware.com/" xmlns="http://www.bobco.com/">
  <nw:entry nw:number="1327">
    <nw:description>torque-balancing hexnut</nw:description>
  </nw:entry>
  <part id="555">
    <name>type 4 wingnut</name>
  </part>
</part-catalog>





  • A schema is an XML document that defines the content and structure of one or more XML documents.

  • To avoid confusion, the XML document containing the content is called the instance document.

  • It represents a specific instance of the structure defined in the schema.

Comparing Schemas and DTDs

This figure compares schemas and DTDs

Schema Dialects

  • There is no single schema form.

  • Several schema “dialects” have been developed in the XML language.

  • Support for a particular schema depends on the XML parser being used for validation.

Starting a Schema File

  • A schema is always placed in a separate XML document that is referenced by the instance document.

Schema Types

  • XML Schema recognizes two categories of element types: complex and simple.

  • A complex type element has one or more attributes, or is the parent to one or more child elements.

  • A simple type element contains only character data and has no attributes.

Schema Types

This figure shows types of elements

Understanding Data Types

  • XML Schema supports two data types: built-in and user-derived.

  • A built-in data type is part of the XML Schema specifications and is available to all XML Schema authors.

  • A user-derived data type is created by the XML Schema author for specific data values in the instance document.

Understanding Data Types

  • A primitive data type, also called a base type, is one of 19 fundamental data types not defined in terms of other types.

  • A derived data type is one of 25 data types that the XML Schema developers defined in terms of the 19 primitive types.

Example Document – Sequence Constructor

  • XML Document

    <USAddress country="US">

    <name>Alice Smith</name>

    <street>123 Maple Street</street>

    <city>Mill Valley</city>



    </USAddress >

  • DTD

    <!ELEMENT USAddress (name, street, city, state, zip)>

    <!ATTLIST USAddress country CDATA #FIXED "US">

    <!ELEMENT name (#PCDATA)> etc.

Example Document – Sequence Constructor

  • XML Schema

    <xsd:complexType name="USAddress">
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="street" type="xsd:string"/>
        <xsd:element name="city" type="xsd:string"/>
        <xsd:element name="state" type="xsd:string"/>
        <xsd:element name="zip" type="xsd:decimal"/>
      </xsd:sequence>
      <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
    </xsd:complexType>


Anonymous Types and User-Defined Simple Types

<xsd:complexType name="Items">
  <xsd:sequence>
    <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="productName" type="xsd:string"/>
          <xsd:element name="quantity">
            <xsd:simpleType>
              <xsd:restriction base="xsd:positiveInteger">
                <xsd:maxExclusive value="100"/>
              </xsd:restriction>
            </xsd:simpleType>
          </xsd:element>
          <xsd:element name="USPrice" type="xsd:decimal"/>
          <xsd:element ref="comment" minOccurs="0"/>
          <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
        </xsd:sequence>
        <xsd:attribute name="partNum" type="SKU"/>
      </xsd:complexType>
    </xsd:element>
  </xsd:sequence>
</xsd:complexType>
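
The anonymous simple type on quantity restricts xsd:positiveInteger with maxExclusive 100, which means whole numbers from 1 to 99. The sketch below spells that rule out in plain Python; the function name is_valid_quantity is hypothetical, and a schema validator would apply the same constraint automatically.

```python
import re


def is_valid_quantity(text: str) -> bool:
    """Check a quantity value: a positive integer strictly below 100."""
    text = text.strip()
    # Lexical form assumed here: an optional plus sign followed by digits.
    if re.fullmatch(r"\+?[0-9]+", text) is None:
        return False
    value = int(text)
    return 1 <= value < 100  # positiveInteger, maxExclusive = 100


for sample in ["1", "99", "100", "0", "-5", "abc"]:
    print(sample, is_valid_quantity(sample))
```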




