XML Validation I: DTDs

Robin Burke, ECT 360, Winter 2004


Outline
  • History
  • Grammars / Regular expressions
  • DTDs
    • elements
    • attributes
    • entities
  • Declarations
Validation
  • Why bother?
The idea
  • Language consists of terminals
    • a, b, c
  • Set of productions
    • beginning with non-terminals
      • A, B, C
    • rules specifying how to generate sequences of terminals
Example
  • A → aB
  • A → aBA
  • B → b
  • generates strings
    • ab, abab, ababab, etc.
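
To see the productions at work, here is a minimal Python sketch that expands the leftmost non-terminal until only terminals remain. The names `PRODUCTIONS` and `generate`, and the length cap of 6, are choices made for this illustration and are not from the slides.

```python
from collections import deque

# Productions from the example: A -> aB | aBA, B -> b.
# Upper-case letters are non-terminals, lower-case letters are terminals.
PRODUCTIONS = {
    "A": ["aB", "aBA"],
    "B": ["b"],
}

def generate(start="A", max_len=6):
    """Return every terminal string of length <= max_len derivable from start."""
    results = set()
    queue = deque([start])
    seen = {start}
    while queue:
        form = queue.popleft()
        # Find the leftmost non-terminal, if there is one.
        idx = next((i for i, ch in enumerate(form) if ch in PRODUCTIONS), None)
        if idx is None:
            results.add(form)  # only terminals left: a generated string
            continue
        for rhs in PRODUCTIONS[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            # No production shrinks a form, so over-long forms can be dropped.
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(results, key=len)

strings = generate()
print(strings)
```

Raising `max_len` yields longer repetitions of "ab"; the grammar never produces anything else.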
Grammar
  • Can be used to efficiently parse a language
    • basis of all modern programming language parsing since Algol-60
    • the Java Language Specification defines Java's syntax with a BNF-style grammar
Grammar
  • XML
    • grammar-based syntax
    • the XML specification defines it in EBNF
  • SGML
    • SGML had a more complex language definition syntax
    • HTML (through version 4) is defined the SGML way
Regular expressions
  • Language for expressing patterns
  • Basic components
    • pattern elements
    • optional element = ?
    • repetition (1 or more) = +
    • repetition (0 or more) = *
    • choice = |
    • grouping = ( )
    • sequence = ,
Examples
  • (a, b)*
    • the empty string and all strings "ab", "abab", etc.
  • (a | b | c)+, q, (b | c)*
    • aaqb
    • bq
    • bqcccccccc
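
These patterns can be checked with Python's `re` module. The sketch below assumes single-letter symbols, so a DTD-style sequence becomes plain juxtaposition once commas are dropped. It also assumes the last group of the second pattern is the choice (b | c)*, which is what the listed strings require. The helper names are made up for this illustration.

```python
import re

def content_model_to_regex(model):
    """Translate a DTD-style pattern over single-letter symbols into a Python regex.

    The DTD notation writes sequence with ',' while Python uses juxtaposition,
    so removing commas and whitespace is enough for this toy case.
    """
    return re.compile(re.sub(r"[,\s]", "", model))

def matches(model, text):
    """True if the whole of text is described by the pattern."""
    return content_model_to_regex(model).fullmatch(text) is not None

FIRST = "(a, b)*"
SECOND = "(a | b | c)+, q, (b | c)*"

for candidate in ["aaqb", "bq", "bqcccccccc", "q"]:
    print(candidate, matches(SECOND, candidate))
```

With a sequence (b, c)* at the end instead, "aaqb" would be rejected, because a lone b is not a complete "bc" pair.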
Note
  • Regular expressions are different in different applications
    • Perl
    • JavaScript
    • XML Schema
  • DTDs only support
    • ? + * | , ( )
EBNF
  • EBNF is a more compact version of BNF
    • it uses regular expressions to simplify grammar expression
  • A → aB
  • A → aBA
  • turns into
    • A → aB(A)?
  • only one production per non-terminal allowed
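
Having one production per non-terminal maps neatly onto one function per non-terminal. The following is a small recursive-descent recognizer for A → aB(A)?, assuming B → b as in the earlier example. The function names are illustrative only.

```python
def parse_B(s, i):
    """B -> b. Return the index just past B, or None if B does not match at i."""
    return i + 1 if s[i:i + 1] == "b" else None

def parse_A(s, i=0):
    """A -> a B (A)?. Return the index just past A, or None on failure."""
    if s[i:i + 1] != "a":
        return None
    j = parse_B(s, i + 1)
    if j is None:
        return None
    # The optional trailing A: take it if it parses, otherwise stop here.
    k = parse_A(s, j)
    return k if k is not None else j

def accepts(s):
    """True if the whole string is derivable from A."""
    return parse_A(s) == len(s)

print(accepts("ababab"), accepts("aba"))
```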
DTDs
  • Use EBNF to specify structure of XML documents
  • Plus
    • attributes
    • entities
  • Syntax
    • holdover from SGML
    • Ugly
DTD Syntax
  • <!ELEMENT element-name content-model>
  • Content model contains the RHS of the production rule
  • Example

<!ELEMENT name (firstName, lastName)>

DTD Syntax, cont'd
  • Not XML
    • <! begins a declaration
    • No "content"
    • Empty elements not indicated with />
Simple content models
  • Content can be any text
    • #PCDATA
  • Content can be anything at all
    • (useful for debugging)
    • ANY
  • Element has no content
    • EMPTY
Example

<grades>
  <grade>
    <student>Jane Doe</student>
    <assigned-grade>A</assigned-grade>
  </grade>
  <grade>
    <student>John Doe</student>
    <assigned-grade>A-</assigned-grade>
  </grade>
</grades>

Example

<grades>
  <grade>
    <student>Jane Doe</student>
    <assigned-grade>A</assigned-grade>
  </grade>
  <grade>
    <student>John Doe</student>
    <assigned-grade>A-</assigned-grade>
  </grade>
  <grade>
    <student>Wayne Doe</student>
    <assigned-grade>I</assigned-grade>
    <reason>Alien abduction</reason>
  </grade>
</grades>
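
The slides do not show a DTD for these documents, so the content models below are an assumption: grades holds one or more grade elements, and a grade holds a student, an assigned-grade, and an optional reason. The sketch turns each model into a regular expression over child element names and checks a parsed tree against it. It is a toy built on the standard library (whose parsers do not validate), not a real validating parser; it ignores attributes and does not check for stray text.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content models, written the way a DTD would write them.
CONTENT_MODELS = {
    "grades": "(grade+)",
    "grade": "(student, assigned-grade, reason?)",
    "student": "(#PCDATA)",
    "assigned-grade": "(#PCDATA)",
    "reason": "(#PCDATA)",
}

def model_to_regex(model):
    """Turn an element-content model into a regex over 'name;' tokens."""
    compact = re.sub(r"\s+", "", model)
    # Wrap each element name as a non-capturing group ending in ';'.
    wrapped = re.sub(r"[A-Za-z_][\w.\-]*",
                     lambda m: "(?:" + re.escape(m.group()) + ";)", compact)
    # Sequence is juxtaposition in Python regexes, so commas are dropped.
    return re.compile(wrapped.replace(",", ""))

def is_valid(element):
    """Check element and all its descendants against CONTENT_MODELS."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        return False  # undeclared element
    children = list(element)
    if model == "(#PCDATA)":
        ok = not children  # text only, no child elements
    else:
        child_names = "".join(child.tag + ";" for child in children)
        ok = model_to_regex(model).fullmatch(child_names) is not None
    return ok and all(is_valid(child) for child in children)

GOOD = ("<grades>"
        "<grade><student>Jane Doe</student><assigned-grade>A</assigned-grade></grade>"
        "<grade><student>Wayne Doe</student><assigned-grade>I</assigned-grade>"
        "<reason>Alien abduction</reason></grade>"
        "</grades>")
# Same elements, but student and assigned-grade are in the wrong order.
BAD = ("<grades><grade><assigned-grade>A</assigned-grade>"
       "<student>Jane Doe</student></grade></grades>")

print(is_valid(ET.fromstring(GOOD)), is_valid(ET.fromstring(BAD)))
```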

Mixed content
  • Legal to have a content model with text and element data

<story category="national" byline="Karen Wheatley">
  <headline>President Meets with Congress</headline>
  <![CDATA[
The President met with Congressional leaders today in an effort to jump-start
faltering budget negotiations. Sources described the mood of the meeting
as "cordial".
  ]]>
  <full_text ref="news801" />
  <image src="img2071.jpg" />
  <image src="img2072.jpg" />
  <image src="img2073.jpg" />
</story>

CDATA?
  • Forgot to mention last week
  • Content that appears in a CDATA section will not be parsed
    • Can include arbitrary text including <, &, etc.
  • Only restriction
    • the content cannot contain the termination sequence
    • ]]>
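
A quick way to see that CDATA content is passed through unparsed is to load a small document with Python's standard `xml.etree.ElementTree`. The `note` element and its text are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# The markup-looking characters inside the CDATA section are plain text.
doc = "<note><![CDATA[if a < b && b > c then <go/>]]></note>"
note = ET.fromstring(doc)
cdata_text = note.text
child_count = len(note)  # <go/> was not parsed as an element

# The same text written with entity references instead of a CDATA section.
escaped = ET.fromstring(
    "<note>if a &lt; b &amp;&amp; b &gt; c then &lt;go/&gt;</note>")

print(cdata_text)
```

Both forms give the application the same character data; the CDATA section is only a more convenient way to write it.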
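Mixed content, cont'd
  • <!ELEMENT story (#PCDATA | headline | full_text | image)*>
    • #PCDATA must come first, in a repeated choice; a DTD cannot fix the order or number of the child elements in mixed content
  • Mixed content makes handling XML complex
    • necessary for many applications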
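
One reason mixed content is awkward to handle: the text is scattered between the child elements. In Python's `xml.etree.ElementTree`, for instance, text before the first child is in `.text` of the parent, while text after a child is in that child's `.tail`. The shortened story below and the helper `text_fragments` are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

story = ET.fromstring(
    '<story><headline>President Meets with Congress</headline>'
    'The President met with Congressional leaders today.'
    '<image src="img2071.jpg"/></story>'
)

def text_fragments(element):
    """Yield the character data directly inside element, in document order."""
    if element.text:
        yield element.text
    for child in element:
        if child.tail:
            yield child.tail

fragments = list(text_fragments(story))
print(fragments)
```

Here the body text belongs to the story, yet the tree stores it as the tail of the headline element.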
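Recursion
  • Unlike grammars
    • recursive formulation ≠ repetition
    • each recursive use of an element adds a level of nesting to the document
  • Difference between
    • <!ELEMENT students (student+)>
      • a flat list of student elements
    • <!ELEMENT students (student, students?)>
      • a student followed by an optional nested students element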
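
The two declarations describe differently shaped documents, as the sketch below shows. `FLAT` is an instance of the (student+) model and `NESTED` an instance of the (student, students?) model. Both documents and the `depth` helper are written for this illustration.

```python
import xml.etree.ElementTree as ET

# Three students under (student+): all siblings.
FLAT = "<students><student/><student/><student/></students>"

# Three students under (student, students?): each list nests the rest.
NESTED = ("<students><student/>"
          "<students><student/>"
          "<students><student/></students>"
          "</students></students>")

def depth(element):
    """Number of element levels from element down to its deepest descendant."""
    return 1 + max((depth(child) for child in element), default=0)

flat_root = ET.fromstring(FLAT)
nested_root = ET.fromstring(NESTED)
print(depth(flat_root), depth(nested_root))
```

Both documents contain three student elements, but the recursive model grows one level deeper per student.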
Restriction
  • The grammar cannot be ambiguous
    • A → (a, b) | (a, c)
    • after reading a, the parser cannot tell which branch it is in, which makes the parser implementation difficult
  • Usually easy to make non-ambiguous
    • A → a, (b | c)
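
The rewrite does not change which sequences are accepted, only how much lookahead is needed. As a rough check, the sketch below compares both forms as Python regular expressions over single-letter symbols. Python's backtracking matcher accepts the ambiguous form without complaint, so this only illustrates the equivalence, not the restriction itself.

```python
import re

AMBIGUOUS = re.compile(r"(ab)|(ac)")    # A -> (a, b) | (a, c)
DETERMINISTIC = re.compile(r"a(b|c)")   # A -> a, (b | c)

def same_language(candidates):
    """True if both patterns agree on every candidate string."""
    return all(
        (AMBIGUOUS.fullmatch(s) is None) == (DETERMINISTIC.fullmatch(s) is None)
        for s in candidates
    )

CANDIDATES = ["ab", "ac", "a", "abc", "b", ""]
print(same_language(CANDIDATES))
```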
Attribute lists
  • Declared separately from elements
    • can be anywhere in the DTD
  • Specification includes
    • name of the element
    • name of the attribute
    • attribute type
    • default
Attribute types
  • Character data
    • CDATA
    • different from an XML CDATA section!
  • Enumerated
    • (yes|no)
  • ID
    • must be unique in the document
  • IDREF
    • must refer to an ID in the document
  • NMTOKEN
    • a restriction of CDATA to a single "word"
  • Also IDREFS and NMTOKENS
Default declaration
  • #REQUIRED
  • #IMPLIED
    • means optional
  • Value
    • this becomes the default
  • #FIXED
    • value provided, and the attribute cannot take any other value
Examples

<!ATTLIST img
  src   CDATA               #REQUIRED
  alt   CDATA               #REQUIRED
  align (left|right|center) "left"
  id    ID                  #IMPLIED
>

<!ATTLIST timestamp
  time-zone NMTOKEN #IMPLIED>
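
Attribute defaults are visible even to a non-validating parser when the declarations sit in the document's internal DTD subset. The sketch below uses Python's standard `xml.etree.ElementTree`, which is built on expat. The `<!ELEMENT img EMPTY>` declaration and the file name `logo.png` are assumptions added to make a complete document.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE img [
  <!ELEMENT img EMPTY>
  <!ATTLIST img
    src   CDATA               #REQUIRED
    alt   CDATA               #REQUIRED
    align (left|right|center) "left"
    id    ID                  #IMPLIED>
]>
<img src="logo.png" alt="Logo"/>"""

img = ET.fromstring(DOC)

# align was not written in the document; the parser supplies the default.
print(img.get("align"))
# id is #IMPLIED (optional) and has no default, so it is simply absent.
print(img.get("id"))
```

Because this parser does not validate, leaving out a #REQUIRED attribute such as src would go unreported here; a validating parser is needed to catch that.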

Entities
  • Like macros
    • content to be inserted
    • indicated with &name;
  • Predefined general entities
    • &amp; &lt;
    • essential part of XML
  • User-defined general entities
    • &disclaimer;
Entities, cont'd
  • Parameter entities
    • can also be used to simplify DTD creation
    • or to combine DTDs
    • indicated with a %
  • More on this next week
Defining general entities

<!ENTITY name "content">

  • Example

<!ENTITY disclaimer
  "This is a work of fiction. Any resemblance to persons living or dead is unintentional.">

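To see the macro-like expansion, the disclaimer entity can be declared in an internal DTD subset and referenced from the document body. The `book` and `note` element names are made up for this sketch, which uses Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE book [
  <!ENTITY disclaimer "This is a work of fiction. Any resemblance to persons living or dead is unintentional.">
]>
<book><note>&disclaimer;</note></book>"""

book = ET.fromstring(DOC)
# The parser has already replaced &disclaimer; with its content.
note_text = book.find("note").text
print(note_text)
```
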
Unparsed data
  • What about non-text data?
    • images, audio files
  • In XML
    • we define a notation
      • create a name and associate an application
    • suggestion to the application
      • how to interpret the unparsed data
      • not part of parsing operation
Using Notation
  • <!NOTATION name SYSTEM "url">
  • Example
    • <!NOTATION jpeg SYSTEM "IExplore.exe">
    • declares the jpeg notation
  • Example
    • <!ENTITY photo53 SYSTEM "photo53.jpg" NDATA jpeg>
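
A parser does not open the unparsed file; it only reports the declarations and leaves the rest to the application. The sketch below registers two handlers from Python's `xml.parsers.expat` to collect them. The `album` element and its `photo` attribute are assumptions added so that the entity is used somewhere.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE album [
  <!ELEMENT album EMPTY>
  <!ATTLIST album photo ENTITY #REQUIRED>
  <!NOTATION jpeg SYSTEM "IExplore.exe">
  <!ENTITY photo53 SYSTEM "photo53.jpg" NDATA jpeg>
]>
<album photo="photo53"/>"""

notations = {}
unparsed_entities = {}

def on_notation(name, base, system_id, public_id):
    """Record each <!NOTATION ...> declaration."""
    notations[name] = system_id

def on_entity(name, is_parameter_entity, value, base,
              system_id, public_id, notation_name):
    """Record entities that carry an NDATA notation, i.e. unparsed ones."""
    if notation_name is not None:
        unparsed_entities[name] = (system_id, notation_name)

parser = xml.parsers.expat.ParserCreate()
parser.NotationDeclHandler = on_notation
parser.EntityDeclHandler = on_entity
parser.Parse(DOC, True)

print(notations, unparsed_entities)
```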
Notation, cont'd
  • Note that the content is defined in the DTD
    • not the document
    • binary data embedded in XML document
  • Not that useful in practice
    • more likely to use URLs
Typical Example

<story category="national" byline="Karen Wheatley">
  ...
  <full_text ref="news801" />
  <image src="img2071.jpg" />
  <image src="img2072.jpg" />
  <image src="img2073.jpg" />
</story>

  • Now it is up to the application to do something appropriate with the src attribute
A better solution
  • Use XLink
  • We'll talk about this later
DTD limitations
  • Not in XML
    • need a special parser for the DTD
  • No content type restrictions
    • #PCDATA can be anything
  • Element names must be globally unique
    • cannot reuse a common term such as name with different content at different places in the document, so names get prefixed
      • course-name
      • professor-name
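
Since #PCDATA can be anything, any restriction on the text itself has to live in the application. As a sketch, the check below applies a hypothetical rule for the assigned-grade element from the earlier grades example: a letter A to D or F with an optional + or -, or I for incomplete. The rule and the function name are assumptions for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule: A-D or F with an optional +/-, or I for incomplete.
GRADE_PATTERN = re.compile(r"(?:[A-DF][+-]?|I)")

def bad_grades(xml_text):
    """Return the assigned-grade values that the application rejects."""
    root = ET.fromstring(xml_text)
    return [g.text for g in root.iter("assigned-grade")
            if not GRADE_PATTERN.fullmatch(g.text or "")]

SAMPLE = ("<grades>"
          "<grade><student>Jane Doe</student><assigned-grade>A-</assigned-grade></grade>"
          "<grade><student>John Doe</student><assigned-grade>purple</assigned-grade></grade>"
          "</grades>")

print(bad_grades(SAMPLE))
```

A DTD would accept both grade elements, because each assigned-grade contains only text.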
DTD benefits
  • Relatively easy to write and understand
    • wait until you see XML Schema!
  • Possible to modularize and combine DTDs
    • more next week
Next week
  • More DTDs
    • Modularization and parameterization
    • on-line reading
  • Beginning Schemas
    • 4.1-4.30