dtd document type definition l.
Skip this Video
Loading SlideShow in 5 Seconds..
DTD (Document Type Definition) PowerPoint Presentation
Download Presentation
DTD (Document Type Definition)

Loading in 2 Seconds...

play fullscreen
1 / 34

DTD (Document Type Definition) - PowerPoint PPT Presentation

  • Uploaded on

DTD (Document Type Definition). Imposing Structure on XML Documents ( W3Schools on DTDs ). Motivation. A DTD adds syntactical requirements in addition to the well-formed requirement It helps in eliminating errors when creating or editing XML documents It clarifies the intended semantics

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'DTD (Document Type Definition)' - lenka

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
dtd document type definition

DTD(Document Type Definition)

Imposing Structure on

XML Documents

(W3Schools on DTDs)

  • A DTD adds syntactical requirements in addition to the well-formed requirement
  • It helps in eliminating errors when creating or editing XML documents
  • It clarifies the intended semantics
  • It simplifies the processing of XML documents
an example
An Example
  • In an address book, where can a phone number appear?
    • Under <person>, under <name> or under both?
  • If we have to check for all possibilities, processing takes longer and it may not be clear to whom a phone belongs
document type definitions
Document Type Definitions
  • Document Type Definitions (DTDs) impose structure on XML documents
  • There is some relationship between a DTD and a schema, but it is not close – hence the need for additional “typing” systems (XML schemas)
  • The DTD is a syntactic specification
example an address book


At most one greeting

As many address

lines as needed

(in order)

Mixed telephones and faxes

As many

as needed

Example: An Address Book



<greet>Dr. H. Simpson</greet>

<addr>1234 Springwater Road</addr>

<addr>Springfield USA, 98765</addr>

<tel>(321) 786 2543</tel>

<fax>(321) 786 2544</fax>

<tel>(321) 786 2544</tel>



specifying the structure
Specifying the Structure
  • name to specify a name element
  • greet? to specify an optional (0 or 1) greet elements
  • name, greet? to specify a name followed by an optional greet
specifying the structure cont d
Specifying the Structure (cont’d)
  • addr* to specify 0 or more address lines
  • tel | fax a telor a fax element
  • (tel | fax)* 0 or more repeats of tel or fax
  • email* 0 or more email elements
specifying the structure cont d8
Specifying the Structure (cont’d)
  • So the whole structure of a person entry is specified by

name, greet?, addr*, (tel | fax)*, email*

  • This is known as a regular expression
element type definition
Element Type Definition
  • for each element type E, a declaration of the form:
  • <!ELEMENT E P>
  • where P is a regular expression, i.e.,
  • P ::= EMPTY | ANY | #PCDATA | E’ |
  • P1, P2 | P1 | P2 | P? | P+ | P*
    • E’: element type
    • P1 , P2: concatenation
    • P1 | P2: disjunction
    • P?: optional
    • P+: one or more occurrences
    • P*: the Kleene closure
summary of regular expressions
Summary of Regular Expressions
  • A The tag (i.e., element) A occurs
  • e1,e2 The expression e1 followed by e2
  • e* 0 or more occurrences of e
  • e? Optional: 0 or 1 occurrences
  • e+ 1 or more occurrences
  • e1 | e2 either e1 or e2
  • (e) grouping
the definition of an element consists of exactly one of the following
The Definition of an Element Consists of Exactly One of the Following
  • A regular expression(as defined earlier)
  • EMPTY means that the element has no content
  • ANY means that content can be any mixture of PCDATA and elements defined in the DTD
  • Mixed content which is defined as described on the next slide
  • (#PCDATA)
the definition of mixed content
The Definition of Mixed Content
  • Mixed content is described by a repeatable OR group

(#PCDATA | element-name | …)*

    • Inside the group, no regular expressions – just element names
    • #PCDATA must be first followed by 0 or more element names, separated by |
    • The group can be repeated 0 or more times
an address book xml document with an internal dtd

The name of

the DTD is


“Internal” means that the DTD and the

XML Document are in the same file

An Address-Book XML Document with an Internal DTD

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE addressbook [

<!ELEMENT addressbook (person*)>

<!ELEMENT person

(name, greet?, address*, (fax | tel)*, email*)>


<!ELEMENT greet (#PCDATA)>

<!ELEMENT address (#PCDATA)>



<!ELEMENT email (#PCDATA)>


The syntax of a DTD is not XML syntax

the rest of the address book xml document
The Rest of theAddress-Book XML Document
  • <addressbook>
  • <person>
  • <name> Jeff Cohen </name>
  • <greet> Dr. Cohen </greet>
      • <email> jc@penny.com </email>
  • </person>
  • </addressbook>
regular expressions




Regular Expressions
  • Each regular expression determines a corresponding finite-state automaton
  • Let’s start with a simpler example:

name, addr*, email

A double circle denotes an accepting state

This suggests a simple parsing program

another example










Another Example

name,address*,(tel | fax)*,email*

some things are hard to specify
Some Things are Hard to Specify

Each employee element should contain name, age and ssn elements in some order

<!ELEMENT employee

( (name, age, ssn) | (age, ssn, name) |

(ssn, name, age) | ...


Suppose that there were many more fields!

some things are hard to specify cont d
Some Things are Hard to Specify (cont’d)

<!ELEMENT employee

( (name, age, ssn) | (age, ssn, name) |

(ssn, name, age) | ...


Suppose there were many more fields!

There are n! different

orders of n elements

It is not even polynomial

specifying attributes in the dtd
Specifying Attributes in the DTD

<!ELEMENT height (#PCDATA)>

<!ATTLIST height


accuracy CDATA #IMPLIED >

The dimension attribute is required

The accuracy attribute is optional

CDATA is the “type” of the attribute – it means “character data,”and may take any literal string as a value

the format of an attribute definition
The Format of an Attribute Definition
  • <!ATTLIST element-nameattr-nameattr-typedefault-value>
  • The default value is given inside quotes
  • attribute types:
    • CDATA
summary of attribute default values
Summary of AttributeDefault Values
  • #REQUIRED means that the attribute must by included in the element
  • #FIXED “value”
    • The given value (inside quotes) is the only possible one
  • “value”
    • The default value of the attribute if none is given
recursive dtds
Recursive DTDs

Each person

should have

a father and a

mother. This

leads to either

infinite data or

a person that

is a descendent

of herself.

<DOCTYPE genealogy [

<!ELEMENT genealogy (person*)>

<!ELEMENT person (



person, -- mother

person )> -- father



What is the problem with this?

A parser does not notice it!

recursive dtds cont d
Recursive DTDs (cont’d)

If a person only

has a father,

how can you

tell that he has

a father and

does not have

a mother?

<DOCTYPE genealogy [

<!ELEMENT genealogy (person*)>

<!ELEMENT person (



person?, -- mother

person? )> -- father



What is now the problem with this?

using id and idref attributes
Using ID and IDREF Attributes

<!DOCTYPE family [

<!ELEMENT family (person)*>

<!ELEMENT person (name)>


<!ATTLIST person






ids and idrefs
IDs and IDREFs
  • ID attribute: unique within the entire document.
    • An element can have at most one ID attribute.
    • No default (fixed default) value is allowed.
      • #required: a value must be provided
      • #implied: a value is optional
  • IDREF attribute: its value must be some other element’s ID value in the document.
  • IDREFS attribute: its value is a set, each element of the set is the ID value of some other element in the document.

<person id=“898” father=“332” mother=“336”

children=“982 984 986”>

some conforming data
Some Conforming Data


<personid=“lisa” mother=“marge” father=“homer”>

<name> Lisa Simpson </name>


<personid=“bart” mother=“marge” father=“homer”>

<name> Bart Simpson </name>


<personid=“marge” children=“bartlisa”>

<name> Marge Simpson </name>


<personid=“homer” children=“bartlisa”>

<name> Homer Simpson </name>



id references do not have types
ID References do not Have Types
  • The attributes mother and father are references to IDs of other elements
  • However, those are not necessarily person elements!
  • The mother attribute is not necessarily a reference to a female person
an alternative specification
An Alternative Specification

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE family [

<!ELEMENT family (person)*>

<!ELEMENT person (name, mother?, father?, children?)>







<!ELEMENT children EMPTY>

<!ATTLIST children idrefs IDREFS #REQUIRED>


the revised data

<person id="marge">

<name> Marge

Simpson </name>

<children idrefs="bart lisa"/>


<person id="homer">

<name> Homer

Simpson </name>

<children idrefs="bart lisa"/>


<person id="bart">

<name> Bart Simpson </name>

<mother idref="marge"/>

<father idref="homer"/>


<person id="lisa">

<name> Lisa

Simpson </name>

<mother idref="marge"/>

<father idref="homer"/>



The Revised Data
consistency of id and idref attribute values
Consistency of ID and IDREF Attribute Values
  • If an attribute is declared as ID
    • The associated value must be distinct, i.e., different elements (in the given document) must have different values for the ID attribute (no confusion)
      • Even if the two elements have different element names
  • If an attribute is declared as IDREF
    • The associated value must exist as the value of some ID attribute (no dangling “pointers”)
  • Similarly for all the values of an IDREFS attribute
  • ID, IDREF and IDREFS attributes are not typed
adding a dtd to the document
Adding a DTD to the Document
  • A DTD can be internal
    • The DTD is part of the document file
  • or external
    • The DTD and the document are on separate files
    • An external DTD may reside
      • In the local file system

(where the document is)

      • In a remote file system
connecting a document with its dtd
Connecting a Document with its DTD
  • An internal DTD:


<!DOCTYPE db [<!ELEMENT ...> … ]>

<db> ... </db>

  • A DTD from the local file system:

<!DOCTYPE db SYSTEM "schema.dtd">

  • A DTD from a remote file system:

<!DOCTYPE db SYSTEM "http://www.schemaauthority.com/schema.dtd">

well formed xml documents
Well-Formed XML Documents
  • An XML document (with or without a DTD) is well-formed if
    • Tags are syntactically correct
    • Every tag has an end tag
    • Tags are properly nested
    • There is a root tag
    • A start tag does not have two occurrences of the same attribute

An XML document must be well formed

valid documents
Valid Documents
  • A well-formed XML document isvalid if it conforms to its DTD, that is,
    • The document conforms to the regular-expression grammar,
    • The types of attributes are correct, and
    • The constraints on references are satisfied