XML Validation I: DTDs

Robin Burke

ECT 360

Winter 2004


Outline

  • History

  • Grammars / Regular expressions

  • DTDs

    • elements

    • attributes

    • entities

  • Declarations


Validation

  • Why bother?


The idea

  • Language consists of terminals

    • a, b, c

  • Set of productions

    • beginning with non-terminals

      • A, B, C

    • rules specifying how to generate sequences of terminals


Example

  • A → aB

  • A → aBA

  • B → b

  • generates strings

    • ababab etc.
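
The productions above can be run mechanically. The following is a minimal sketch, assuming the two rules for A are alternatives; the names `PRODUCTIONS` and `generate` and the depth limit are illustrative choices, not part of the slides.

```python
import random

# Productions from the slide: A -> aB | aBA, B -> b.
# Upper-case letters are non-terminals, lower-case letters are terminals.
PRODUCTIONS = {
    "A": ["aB", "aBA"],
    "B": ["b"],
}

def generate(symbol="A", rng=random, max_depth=5):
    """Expand a symbol into a string of terminals by applying productions."""
    if symbol not in PRODUCTIONS:          # terminal: emit as-is
        return symbol
    options = PRODUCTIONS[symbol]
    # Once the depth budget is used up, take the first (non-recursive) rule.
    rule = options[0] if max_depth <= 0 else rng.choice(options)
    return "".join(generate(s, rng, max_depth - 1) for s in rule)

# Collect a few generated strings; each one is "ab" repeated one or more times.
samples = {generate(rng=random.Random(seed)) for seed in range(20)}
print(sorted(samples, key=len))
```

The depth limit only exists to guarantee termination; the grammar itself places no bound on the length of the strings.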


Grammar

  • Can be used to efficiently parse a language

    • basis of all modern programming language parsing since Algol-60

    • Java Language Specification is completely in EBNF grammar


Grammar

  • XML

    • grammar-based syntax

    • the syntax of XML itself is specified in EBNF

  • SGML

    • SGML had a more complex language definition syntax

    • HTML is defined the SGML way


Regular expressions

  • Language for expressing patterns

  • Basic components

    • pattern elements

    • optional element = ?

    • repetition (1 or more) = +

    • repetition (0 or more) = *

    • choice = |

    • grouping = ( )

    • sequence = ,


Examples

  • (a, b)*

    • all strings "ab" "abab" etc. (and the empty string)

  • (a | b | c)+, q, (b | c)*

    • aaqb

    • bq

    • bqcccccccc
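
These patterns can be checked with an ordinary regular expression engine. The sketch below reads the second pattern with a final group of (b | c)*, which is the form the three listed strings satisfy. The helper name `dtd_pattern_to_regex` is an illustrative assumption, and it only handles single-letter symbols.

```python
import re

def dtd_pattern_to_regex(pattern):
    """Translate a DTD-style pattern over single-letter symbols into a Python regex.

    The sequence operator "," and whitespace are dropped, because plain
    concatenation already means "followed by" in Python's regex syntax.
    """
    return re.compile(pattern.replace(",", "").replace(" ", ""))

first = dtd_pattern_to_regex("(a, b)*")
second = dtd_pattern_to_regex("(a | b | c)+, q, (b | c)*")

# Print each candidate string with whether the whole string matches.
for text in ["", "ab", "abab", "aba"]:
    print(repr(text), bool(first.fullmatch(text)))
for text in ["aaqb", "bq", "bqcccccccc", "q"]:
    print(repr(text), bool(second.fullmatch(text)))
```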


Note

  • Regular expressions are different in different applications

    • Perl

    • JavaScript

    • XML Schemas

  • DTDs only support

    • ? + * | , ( )


EBNF

  • EBNF is a more compact version of BNF

    • it uses regular expressions to simplify grammar expression

  • A → aB

  • A → aBA

  • turns into

    • A → aB(A)?

  • only one production per non-terminal allowed
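
With a single production per non-terminal, each rule maps naturally onto one function. The following is a small recognizer sketch for A → aB(A)? with B → b (the rule for B is taken from the earlier example slide); the function names are illustrative assumptions.

```python
def parse_B(text, pos):
    """B -> b. Return the position after B, or None if it does not match."""
    return pos + 1 if text[pos:pos + 1] == "b" else None

def parse_A(text, pos=0):
    """A -> a B (A)?. Return the position after A, or None on failure."""
    if text[pos:pos + 1] != "a":
        return None
    pos = parse_B(text, pos + 1)
    if pos is None:
        return None
    # The optional (A)? part: try it, and keep the old position if it fails.
    after = parse_A(text, pos)
    return pos if after is None else after

def accepts(text):
    """True when the whole string is derivable from A."""
    return parse_A(text) == len(text)

print([s for s in ["ab", "abab", "aba", "", "ba"] if accepts(s)])
```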


DTDs

  • Use EBNF to specify structure of XML documents

  • Plus

    • attributes

    • entities

  • Syntax

    • holdover from SGML

    • Ugly


DTD Syntax

  • <!ELEMENT element-name content-model>

  • Content model contains the RHS of the production rule

  • Example

    <!ELEMENT name (firstName, lastName)>
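
Because a content model is a regular expression over child element names, it can be checked with a regex engine. This is a rough sketch, not a validating parser: the helpers `content_model_to_regex` and `children_match` are hypothetical, and they only handle element-only models (no #PCDATA, ANY, or EMPTY).

```python
import re
import xml.etree.ElementTree as ET

def content_model_to_regex(model):
    """Turn an element-only content model into a regex over child names.

    Each element name becomes a group matching "name;", commas and
    whitespace are dropped, and ? + * | ( ) keep their usual meaning.
    """
    body = re.sub(r"[A-Za-z_][\w.\-]*",
                  lambda m: "(?:" + re.escape(m.group()) + ";)", model)
    return re.compile(re.sub(r"[,\s]", "", body))

def children_match(element, model):
    """Check the sequence of child element names against the content model."""
    names = "".join(child.tag + ";" for child in element)
    return bool(content_model_to_regex(model).fullmatch(names))

NAME_MODEL = "(firstName, lastName)"
good = ET.fromstring("<name><firstName>Jane</firstName><lastName>Doe</lastName></name>")
bad = ET.fromstring("<name><lastName>Doe</lastName></name>")
print(children_match(good, NAME_MODEL), children_match(bad, NAME_MODEL))
```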


DTD Syntax cont'd

  • Not XML

    • <! begins a declaration

    • No "content"

    • Empty elements not indicated with />


Simple content models

  • Content can be any text

    • #PCDATA

  • Content can be anything at all

    • (useful for debugging)

    • ANY

  • Element has no content

    • EMPTY


Example

<grades>
  <grade>
    <student>Jane Doe</student>
    <assigned-grade>A</assigned-grade>
  </grade>
  <grade>
    <student>John Doe</student>
    <assigned-grade>A-</assigned-grade>
  </grade>
</grades>


Example

<grades>
  <grade>
    <student>Jane Doe</student>
    <assigned-grade>A</assigned-grade>
  </grade>
  <grade>
    <student>John Doe</student>
    <assigned-grade>A-</assigned-grade>
  </grade>
  <grade>
    <student>Wayne Doe</student>
    <assigned-grade>I</assigned-grade>
    <reason>Alien abduction</reason>
  </grade>
</grades>
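
One way to describe this document is with an optional reason element. The DTD text in `GRADES_DTD` is an assumption (the slides do not show it), and the check below is a hand-translated regex rather than a real validator, since Python's standard library does not validate against DTDs.

```python
import re
import xml.etree.ElementTree as ET

# A possible DTD for the grades document (an assumption, not from the slides).
GRADES_DTD = """
<!ELEMENT grades (grade*)>
<!ELEMENT grade (student, assigned-grade, reason?)>
<!ELEMENT student (#PCDATA)>
<!ELEMENT assigned-grade (#PCDATA)>
<!ELEMENT reason (#PCDATA)>
"""

# The content model of grade, hand-translated to a regex over child names.
GRADE_CHILDREN = re.compile(r"student,assigned-grade(,reason)?")

def grade_is_valid(grade):
    """Check that the children of one grade element appear in the declared order."""
    return bool(GRADE_CHILDREN.fullmatch(",".join(c.tag for c in grade)))

doc = ET.fromstring("""
<grades>
  <grade><student>Jane Doe</student><assigned-grade>A</assigned-grade></grade>
  <grade><student>Wayne Doe</student><assigned-grade>I</assigned-grade>
         <reason>Alien abduction</reason></grade>
  <grade><reason>Alien abduction</reason></grade>
</grades>
""")
results = [grade_is_valid(g) for g in doc.findall("grade")]
print(results)
```

The third grade is deliberately malformed to show a rejection.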


Mixed content

  • Legal to have a content model with text and element data

    <story category="national" byline="Karen Wheatley">
      <headline>President Meets with Congress</headline>
      <![CDATA[
      The President met with Congressional leaders today in an effort to jump-start
      faltering budget negotiations. Sources described the mood of the meeting
      as "cordial".
      ]]>
      <full_text ref="news801" />
      <image src="img2071.jpg" />
      <image src="img2072.jpg" />
      <image src="img2073.jpg" />
    </story>


CDATA?

  • Forgot to mention last week

  • Content inside a CDATA section is not parsed as markup

    • Can include arbitrary text including <, &, etc.

  • Only restriction

    • the content cannot contain the termination sequence

    • ]]>
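
A quick way to see this is to parse a CDATA section with a standard parser. This sketch uses Python's `xml.etree.ElementTree`; the element name `note` and its content are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The characters < and & would be errors in ordinary element content,
# but inside a CDATA section they are passed through as plain text.
doc = ET.fromstring("<note><![CDATA[if (a < b && b < c) { <not-a-tag> }]]></note>")

print(doc.text)   # the raw text, with no escaping applied
print(len(doc))   # number of child elements found inside the section
```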


Mixed content, cont'd

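  • <!ELEMENT story (#PCDATA | headline | full_text | image)*>

    • a mixed content model must start with #PCDATA and take the form (#PCDATA | a | b)*, so the DTD cannot fix the order or number of the child elements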

  • Mixed content makes handling XML complex

    • necessary for many applications
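
The extra complexity shows up as soon as a program has to walk such an element. In Python's `xml.etree.ElementTree`, text before the first child is stored in `text`, while text after a child is stored on that child's `tail`. The shortened story document and the helper `pieces` below are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

story = ET.fromstring(
    "<story><headline>President Meets with Congress</headline>"
    "The President met with leaders today."
    "<image src='img2071.jpg'/> More text.</story>"
)

def pieces(element):
    """Yield the text runs and child elements of a mixed-content element in order."""
    if element.text:
        yield ("text", element.text)
    for child in element:
        yield ("element", child.tag)
        if child.tail:                 # text that follows this child
            yield ("text", child.tail)

print(list(pieces(story)))
```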


Recursion

  • Unlike grammars

    • recursive formulation ≠ repetition

  • Difference between

    • <!ELEMENT students (student+)>

    • <!ELEMENT students (student, students?)>
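
The two declarations accept differently shaped documents: the first a flat list, the second a chain of nested students elements. The sketch below builds one document of each shape by hand (both sample documents are assumptions) and compares their nesting depth.

```python
import xml.etree.ElementTree as ET

# Shape accepted by (student+): all student elements are siblings.
flat = ET.fromstring("<students><student/><student/><student/></students>")

# Shape accepted by (student, students?): each further student sits one level deeper.
nested = ET.fromstring(
    "<students><student/><students><student/>"
    "<students><student/></students></students></students>"
)

def depth(element):
    """Number of levels in the tree rooted at element."""
    return 1 + max((depth(child) for child in element), default=0)

print(depth(flat), depth(nested))
```

Both documents hold three student elements, but only the nested one grows deeper with every student added.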


Restriction

  • The grammar cannot be ambiguous

    • A → (a, b) | (a, c)

    • after reading a, the parser cannot tell which branch it is in without looking ahead, which makes the parser implementation difficult

  • Usually easy to make non-ambiguous

    • A → a, (b | c)
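
A simplified way to spot this kind of problem is to compare the first symbol of each alternative in a choice. This is only a sketch of the idea under that simplification, not the full determinism rule from the XML specification; `first_symbols_clash` is a hypothetical helper.

```python
def first_symbols_clash(alternatives):
    """Return the symbols that start more than one alternative of a choice."""
    seen, clashes = set(), set()
    for sequence in alternatives:
        head = sequence[0]
        # A head symbol seen before means the parser cannot pick a branch.
        (clashes if head in seen else seen).add(head)
    return clashes

# (a, b) | (a, c): both branches start with a.
ambiguous = [["a", "b"], ["a", "c"]]
# a, (b | c): the only choice left is between b and c.
factored_choice = [["b"], ["c"]]

print(first_symbols_clash(ambiguous), first_symbols_clash(factored_choice))
```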


Attribute lists

  • Declared separately from elements

    • can be anywhere in the DTD

  • Specification includes

    • name of the element

    • name of the attribute

    • attribute type

    • default


Attribute types

  • Character data

    • CDATA

    • different from XML CDATA section!

  • Enumerated

    • (yes|no)

  • ID

    • must be unique in the document

  • IDREF

    • must refer to an id in the document

  • NMTOKEN

    • restricts CDATA to a single "word" (a name token with no whitespace)

  • Also IDREFS and NMTOKENS


Default declaration

  • #REQUIRED

  • #IMPLIED

    • means optional

  • Value

    • this becomes the default

  • #FIXED

    • a value is provided and the document may not change it


Examples

<!ATTLIST img
    src   CDATA               #REQUIRED
    alt   CDATA               #REQUIRED
    align (left|right|center) "left"
    id    ID                  #IMPLIED
>

<!ATTLIST timestamp
    time-zone NMTOKEN #IMPLIED>
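
Default values have a visible effect even with a non-validating parser. The sketch below places the img declaration in an internal DTD subset and parses it with Python's `xml.etree.ElementTree`, whose underlying Expat parser reports declared defaults as if they had been written in the document. The `page` wrapper element and the sample attribute values are assumptions.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE page [
  <!ATTLIST img
      src   CDATA               #REQUIRED
      alt   CDATA               #REQUIRED
      align (left|right|center) "left"
      id    ID                  #IMPLIED>
]>
<page>
  <img src="a.jpg" alt="first"/>
  <img src="b.jpg" alt="second" align="right"/>
</page>"""

images = list(ET.fromstring(DOC).iter("img"))
for img in images:
    # align falls back to the declared default; id is #IMPLIED, so it stays absent.
    print(img.get("src"), img.get("align"), img.get("id"))
```

Note that this parser does not enforce #REQUIRED or the enumerated values; that needs a validating parser.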


Entities

  • Like macros

    • content to be inserted

    • indicated with &name;

  • Predefined general entities

    • &amp; &lt; &gt; &apos; &quot;

    • essential part of XML

  • User-defined general entities

    • &disclaimer;


Entities, cont'd

  • Parameter entities

    • can also be used to simplify DTD creation

    • or to combine DTDs

    • indicated with a %

  • More on this next week


Defining general entities

<!ENTITY name "content">

  • Example

    <!ENTITY disclaimer "This is a work of fiction. Any resemblance to persons living or dead is unintentional.">
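
To see the macro-like behavior, a general entity can be declared in an internal DTD subset and referenced from the document body. This is a minimal sketch using Python's `xml.etree.ElementTree`; the `book` and `legal` element names and the shortened disclaimer text are assumptions.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE book [
  <!ENTITY disclaimer "This is a work of fiction.">
]>
<book><legal>&disclaimer; &amp; that is all.</legal></book>"""

# The parser replaces &disclaimer; with its declared content and
# &amp; with the predefined character it stands for.
legal = ET.fromstring(DOC).find("legal")
print(legal.text)
```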


Unparsed data

  • What about non-text data?

    • images, audio files

  • In XML

    • we define a notation

      • create a name and associate an application

    • suggestion to the application

      • how to interpret the unparsed data

      • not part of parsing operation


Using Notation

  • <!NOTATION name SYSTEM "url">

  • Example

    • <!NOTATION jpeg SYSTEM "IExplore.exe">

    • declares the jpeg notation

  • Example

    • <!ENTITY photo53 SYSTEM "photo53.jpg" NDATA jpeg>
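
A parser only reports these declarations; it never opens the file. The sketch below uses the Expat bindings in Python's standard library to collect the notation and the unparsed entity, with the entity name written unquoted as XML requires. The `album` root element and the handler names are assumptions.

```python
from xml.parsers import expat

DOC = """<!DOCTYPE album [
  <!NOTATION jpeg SYSTEM "IExplore.exe">
  <!ENTITY photo53 SYSTEM "photo53.jpg" NDATA jpeg>
]>
<album/>"""

notations, unparsed = {}, {}

def on_notation(name, base, system_id, public_id):
    """Record each declared notation and the system identifier attached to it."""
    notations[name] = system_id

def on_entity(name, is_parameter, value, base, system_id, public_id, notation_name):
    """Record entities that carry NDATA, that is, unparsed entities."""
    if notation_name is not None:
        unparsed[name] = (system_id, notation_name)

parser = expat.ParserCreate()
parser.NotationDeclHandler = on_notation
parser.EntityDeclHandler = on_entity
parser.Parse(DOC, True)
print(notations, unparsed)
```

What to do with photo53.jpg and the jpeg notation is then left to the application.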


Notation, cont'd

  • Note that the content is defined in the DTD

    • not the document

    • the document refers to the unparsed entity by name; the binary data itself stays in an external file

  • Not that useful in practice

    • more likely to use URLs


Typical Example

<story category="national" byline="Karen Wheatley">
  ...
  <full_text ref="news801" />
  <image src="img2071.jpg" />
  <image src="img2072.jpg" />
  <image src="img2073.jpg" />
</story>

  • Now it is up to the application to do something appropriate with the src attribute


A better solution

  • Use XLink

  • We'll talk about this later


DTD limitations

  • Not in XML

    • need a special parser for the DTD

  • No content type restrictions

    • #PCDATA can be anything

  • Element names must be globally unique

    • each element name has a single declaration, so a common term such as name cannot be reused with different content at different places in the document

      • course-name

      • professor-name


DTD benefits

  • Relatively easy to write and understand

    • wait until you see XML Schema!

  • Possible to modularize and combine DTDs

    • more next week


Next week

  • More DTDs

    • Modularization and parameterization

    • on-line reading

  • Beginning Schemas

    • 4.1-4.30


Lab


  • Login