
Introduction to DTDs

phuoc
Presentation Transcript


  1. Introduction to DTDs • A formal syntax that states precisely which elements and attributes can appear in a document • Dictates element order • A validating parser compares a document to its DTD and lists the places where validation fails. • Validation errors aren’t necessarily fatal

  2. Validation • A document can be well formed but not valid • A valid document must be well formed • A valid document adheres to the rules of either an internal or external DTD. • Everything not expressly permitted in a DTD is forbidden.

  3. What a DTD Does NOT Indicate • The document root element • The exact number of times an element can appear in a document • Character data content • The semantic meaning of an element

  4. Dissecting a Simple DTD • See example 3-1 on page 29 • DTDs are generally stored in separate files so they can be referenced by multiple XML files. • External DTD files generally use the .dtd extension. • Web servers return DTDs using the MIME media type “application/xml-dtd”

  5. Dissecting a DTD Part Two • Each line of 3-1 is an element declaration • You can have more than one declaration per line and declarations can span lines. • The asterisk after the profession element indicates “zero or more” • The element order must be adhered to; the name element must appear before the profession element. • Both elements must be contained within a person element. • See valid and invalid examples on page 29.
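
A DTD matching the bullets above might look like the following sketch. It is reconstructed from the slide text rather than copied from the book's Example 3-1, so the first_name and last_name children and the sample data are assumptions.

```xml
<!-- person.dtd: hypothetical reconstruction, not the book's listing -->
<!ELEMENT person     (name, profession*)>
<!ELEMENT name       (first_name, last_name)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT last_name  (#PCDATA)>
<!ELEMENT profession (#PCDATA)>
```

Under these declarations, a person with one name followed by any number of profession elements is valid:

```xml
<person>
  <name>
    <first_name>Alan</first_name>
    <last_name>Turing</last_name>
  </name>
  <profession>mathematician</profession>
  <profession>cryptographer</profession>
</person>
```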

  6. Dissecting a DTD Part 3 • You cannot use elements that are not declared in the DTD • You cannot use mixed content unless the DTD declares it. • See invalid examples on page 30 • #PCDATA is short for parsed character data: raw text that can contain entity references but no tags or child elements.
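
As an illustration of these restrictions, the sketch below assumes a hypothetical DTD in which person is declared as (name, profession*) and both children are (#PCDATA). The document is well formed but invalid for two separate reasons, marked in the comments.

```xml
<!-- Assumed DTD: <!ELEMENT person (name, profession*)>, children are (#PCDATA) -->
<person>
  <name>Alan Turing</name>
  <!-- Invalid: stray text inside person, which was not declared as mixed content -->
  was a
  <profession>mathematician</profession>
  <!-- Invalid: publication is not declared anywhere in the assumed DTD -->
  <publication>On Computable Numbers</publication>
</person>
```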

  7. Document Type Declarations • Should not be confused with Document Type Definitions. • A document type declaration either contains the internal DTD or references the external DTD. • <!DOCTYPE person SYSTEM "http://www.cafeconleche.org/dtds/person.dtd"> • This document type declaration indicates that the DTD for the XML file with the root element person can be found at the indicated URL. • Document type declarations are located in an XML document’s prolog, the area after the XML declaration but before the root element. • Relative URLs are acceptable.
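
The placement described above can be sketched as follows. The URL is the one from the slide; the element content is a placeholder, and the relative form in the comment assumes a person.dtd stored next to the document.

```xml
<?xml version="1.0" standalone="no"?>
<!-- Relative alternative: <!DOCTYPE person SYSTEM "person.dtd"> -->
<!DOCTYPE person SYSTEM "http://www.cafeconleche.org/dtds/person.dtd">
<person>
  <!-- content governed by the referenced DTD goes here -->
</person>
```

Note that the system identifier must be quoted, and that the name after DOCTYPE must match the root element.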

  8. Internal DTD Subsets • The document type declaration may contain the DTD rather than reference it externally (Example 3-4 on page 32) • You can combine the two methods (See the bottom of page 32) • The internal and external DTD subsets together make up the complete DTD. • Neither can override the element declarations made by the other. • When you use an external DTD, set the XML declaration’s standalone attribute value to “no” (See page 33)
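
A minimal sketch of an internal DTD subset, using hypothetical simplified declarations in which name holds text directly:

```xml
<?xml version="1.0"?>
<!DOCTYPE person [
  <!ELEMENT person     (name, profession*)>
  <!ELEMENT name       (#PCDATA)>
  <!ELEMENT profession (#PCDATA)>
]>
<person>
  <name>Alan Turing</name>
  <profession>mathematician</profession>
</person>
```

To combine both methods, the declaration would name an external file before the bracketed subset, for example <!DOCTYPE person SYSTEM "name.dtd" [ ... ]>, where name.dtd is an assumed file that declares the elements missing from the internal subset.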

  9. Validating a Document • Web browsers generally do not validate documents, but any DTD they read must still be free of syntax errors. • Online validators are available. The URLs are on the bottom of page 33. • XML Spy home edition (altova.com) will validate XML documents.

  10. Element Declarations • Every element used in a document must be declared in the document’s DTD using the format <!ELEMENT element_name content_specification> • The element name can be any legal XML name • The content_specification specifies the children the element may or must have in order.

  11. #PCDATA • The simplest content specification • Parsed character data • Cannot contain child elements • <!ELEMENT phone_number (#PCDATA)>

  12. Child Elements • If ?, *, or + does not follow the element name, the child element must appear exactly once • ? Zero or one allowed • * Zero or more allowed • + One or more required • See examples on 37-38. • Multiple child elements are separated with commas to form a sequence. • The elements must appear in the specified order.
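
The suffixes can be combined in a single sequence. The declaration below is a sketch with hypothetical element names:

```xml
<!-- name: exactly once; nickname: zero or one;
     profession: zero or more; email: one or more; all in this order -->
<!ELEMENT person (name, nickname?, profession*, email+)>
```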

  13. Choices • A list of element names separated by vertical bars; exactly one of them must appear • A choice can be marked with +, *, or ? • Parentheses can be nested to give more complex options. • A choice can list any number of alternatives. • See examples on 38-39.
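
A few sketches of choices, all with hypothetical element names:

```xml
<!-- Exactly one of phone or email -->
<!ELEMENT contact (phone | email)>

<!-- A choice nested inside a sequence: center, then radius or diameter -->
<!ELEMENT circle (center, (radius | diameter))>

<!-- A repeated choice: title, then any mix of the three in any order -->
<!ELEMENT article (title, (paragraph | list | figure)*)>
```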

  14. Mixed Content • Declared using <!ELEMENT definition (#PCDATA | term)*> • This means that any number of term children, including none, can appear in a definition element along with parsed character data. • You can list additional child elements, but #PCDATA must always be listed first • This is the only way to indicate mixed content.
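
As a sketch, the document below extends the slide's declaration with a second, hypothetical child element (emphasis) and shows text and children interleaved freely:

```xml
<?xml version="1.0"?>
<!DOCTYPE definition [
  <!-- #PCDATA comes first; the closing asterisk is required -->
  <!ELEMENT definition (#PCDATA | term | emphasis)*>
  <!ELEMENT term       (#PCDATA)>
  <!ELEMENT emphasis   (#PCDATA)>
]>
<definition>A <term>Turing machine</term> is an <emphasis>abstract</emphasis>
model of computation.</definition>
```

Mixed content cannot constrain the order or number of the child elements; it only lists which ones may appear.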

  15. Empty Elements • <!ELEMENT image EMPTY> • Generally written as a single empty-element tag that acts as both opening and closing tag. • Even though no data appears between the tags, a separate closing tag can be used. • An empty element contains nothing, not even whitespace.
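
Both spellings of an empty element can be sketched as follows. The gallery wrapper, the source attribute on this particular DTD, and the file names are assumptions for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE gallery [
  <!ELEMENT gallery (image*)>
  <!ELEMENT image EMPTY>
  <!ATTLIST image source CDATA #REQUIRED>
]>
<gallery>
  <image source="bus.jpg"/>
  <image source="car.jpg"></image>
  <!-- Would be invalid: a start-tag and end-tag with a space between them -->
</gallery>
```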

  16. ANY • Indicates that any type of content is allowed • <!ELEMENT page ANY> • All elements must still be declared • Useful when DTDs are in design stage • Bad form to use in finished DTDs. • Not used often

  17. Attribute Declarations • Valid documents declare all attributes • Done with ATTLIST declarations • Each attribute must be declared for EACH element it relates to. • <!ATTLIST image source CDATA #REQUIRED> indicates that the image element has a source attribute that is character data and must be included. • A single ATTLIST declaration can declare multiple attributes for the same element. See example on page 41.
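
A single ATTLIST declaring several attributes might look like this sketch. Only source comes from the slide; width, height, and alt are hypothetical additions.

```xml
<!ELEMENT image EMPTY>
<!ATTLIST image source CDATA #REQUIRED
                width  CDATA #REQUIRED
                height CDATA #REQUIRED
                alt    CDATA #IMPLIED>
```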

  18. Attribute Types • CDATA • NMTOKEN • NMTOKENS • Enumeration • ENTITY • ENTITIES • ID • IDREF • IDREFS • NOTATION

  19. CDATA • Can contain any text string acceptable in a well-formed attribute value • See example on the top of page 43.

  20. NMTOKEN • Name token • Similar to an XML name • The same characters are allowed as in XML names. However, any allowed character can be the first character of a name token • Every XML name is an XML name token, but not every XML name token is an XML name.

  21. NMTOKENS • Contains one or more XML name tokens separated by white space. • See example near the bottom of page 43.
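
The difference between the two types can be sketched with hypothetical elements and values:

```xml
<!ELEMENT journal EMPTY>
<!ATTLIST journal year NMTOKEN #REQUIRED>
<!-- Valid: <journal year="2024"/>  ("2024" is a name token but not a name) -->

<!ELEMENT performances EMPTY>
<!ATTLIST performances dates NMTOKENS #REQUIRED>
<!-- Valid: <performances dates="08-21-2001 08-23-2001 08-27-2001"/> -->
```

A value such as "21 August" would fail the NMTOKEN check because a single name token cannot contain whitespace.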

  22. Enumeration • The only attribute type that isn’t an XML keyword • Lists all possible attribute values • Values are separated by vertical bars (pipes) • Possible values must be name tokens • See examples on page 44
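
A short sketch of an enumerated attribute, using a hypothetical task element:

```xml
<!ELEMENT task (#PCDATA)>
<!-- priority must be exactly one of the three listed name tokens -->
<!ATTLIST task priority (low | medium | high) #REQUIRED>
```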

  23. ID • Must contain a unique XML name • No other ID-type attribute in the XML document can have the same value • Attributes of other types are not considered when checking uniqueness • Each element can have only one ID-type attribute. • Used to assign unique identifiers to elements • ID numbers are tricky since a number isn’t a valid XML name. An ID number can be prefixed with a valid name start character, such as an underscore or a letter

  24. IDREF • Refers to the ID-type attribute of some element in the document • See sample on page 45 • An IDREF can’t be constrained to the IDs of a particular element type. Any ID in the document is suitable. • Therefore, true referential integrity cannot be enforced.

  25. IDREFS • Whitespace-separated list of XML names. • Each name must be the ID of an element in the document. • See example on page 46.
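
The three ID-related types can be sketched together. The staff, employee, and project elements and their data are hypothetical; the underscore prefix shows one way to turn a number into a legal XML name.

```xml
<?xml version="1.0"?>
<!DOCTYPE staff [
  <!ELEMENT staff    (employee+, project+)>
  <!ELEMENT employee (#PCDATA)>
  <!ATTLIST employee id ID #REQUIRED>
  <!ELEMENT project  (#PCDATA)>
  <!ATTLIST project  lead IDREF  #REQUIRED
                     team IDREFS #IMPLIED>
]>
<staff>
  <employee id="_101">Ada</employee>
  <employee id="_102">Grace</employee>
  <!-- lead and team must match IDs that exist somewhere in this document -->
  <project lead="_101" team="_101 _102">Compiler</project>
</staff>
```

A validating parser would reject lead="_999" here because no element carries that ID, but it has no way to require that lead point at an employee rather than at some other element with an ID.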

  26. ENTITY/ENTITIES • ENTITY contains the name of an unparsed entity declared elsewhere in the DTD • ENTITIES contains the names of one or more unparsed entities declared elsewhere in the DTD, separated by whitespace.
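
A sketch of an ENTITY-type attribute, with hypothetical names throughout. The unparsed entity is declared with NDATA and refers to a declared notation.

```xml
<?xml version="1.0"?>
<!DOCTYPE movie [
  <!ELEMENT movie EMPTY>
  <!ATTLIST movie source ENTITY #REQUIRED>
  <!NOTATION quicktime SYSTEM "video/quicktime">
  <!-- trailer.mov is an assumed file; NDATA marks the entity as unparsed -->
  <!ENTITY trailer SYSTEM "trailer.mov" NDATA quicktime>
]>
<!-- The attribute value is the entity's name, written without & or ; -->
<movie source="trailer"/>
```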

  27. Attribute Defaults • #IMPLIED – Optional attribute. No default is provided. • #REQUIRED – Required. No default is provided. • #FIXED – Attribute always has the value given in the DTD, whether or not the document specifies it • Literal – Default value given as quoted string. • See examples on page 48.
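
All four kinds of default can be sketched in one declaration. The element and attribute names are hypothetical.

```xml
<!ELEMENT person (#PCDATA)>
<!ATTLIST person born   CDATA #IMPLIED
                 id     ID    #REQUIRED
                 source CDATA #FIXED "census"
                 status CDATA "active">
<!-- born:   may be omitted, no default is supplied
     id:     must be present on every person
     source: always "census"; any other value in a document is invalid
     status: "active" unless the document says otherwise -->
```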

  28. General Entity Declarations • Defined with an ENTITY declaration in the DTD. • The entity name must be an XML name. • The value in quotes is the replacement text. • See example on the bottom of page 48. • Replacement text can contain markup as well. • Replacement text must be well-formed once it is merged with the document. • Quotes inside the replacement text must be of a different kind than the quotes that delimit it. • Replacement text can contain references to other entities, which are resolved when the entity is expanded in the document; circular references are not allowed.
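
These rules can be sketched in one small document. The entity names, element names, and text are all hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE footer [
  <!ELEMENT footer (#PCDATA | email)*>
  <!ELEMENT email  (#PCDATA)>
  <!ENTITY company "Example Widgets Inc.">
  <!-- Replacement text containing markup -->
  <!ENTITY contact "<email>info@example.com</email>">
  <!-- Single quotes delimit the value so it can contain double quotes -->
  <!ENTITY motto 'Built to "last"'>
  <!-- Replacement text that refers to other entities -->
  <!ENTITY signature "&company; (&contact;)">
]>
<footer>Published by &signature;. &motto;</footer>
```

After expansion the footer contains plain text plus one email child, which matches its mixed-content declaration.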

  29. External Parsed General Entities • Web sites usually store repeated content in external files. • External files are referenced through a general entity declaration. • URLs indicate the location of the external file. • Relative and absolute URLs are acceptable. • An external entity is referenced inside an element the same way as an internal entity (&entityname;) • References to external parsed entities are NOT allowed in attribute values • Once the entity is inserted, the document must still be well formed in order to be parsed.
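
A sketch of an external parsed entity. The path boilerplate/footer.xml is an assumed file, and element declarations are omitted, so this illustrates well-formedness only, not validity.

```xml
<?xml version="1.0"?>
<!DOCTYPE page [
  <!-- SYSTEM plus a URL makes the entity external; the file must hold well-formed content -->
  <!ENTITY footer SYSTEM "boilerplate/footer.xml">
]>
<page>
  <body>Page-specific content</body>
  &footer;
  <!-- Not allowed: using the external entity reference inside an attribute value -->
</page>
```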

  30. Parameter Entities • There are times when elements “share” attributes • You can use parameter entities to avoid defining the same list of attributes over and over again. • See example in the middle of page 53. • You create a parameter entity by giving a name to the replacement text that holds all of the repeated declarations. • Adding an attribute to the single parameter entity adds the attribute to all elements that use the parameter entity.

  31. Parameter Entities Part Two • Parameter entities are necessary since general entity references can’t provide replacement text for declarations inside the DTD, such as attribute lists; they work only in XML document content. • Parameter entities act like general entities and are declared almost the same way. • Use a % instead of an &. • They can only be used in a DTD.
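
Sharing an attribute list through a parameter entity can be sketched as follows, with hypothetical names. Because the references appear inside declarations, this has to live in an external DTD file; in the internal subset, parameter entity references are only allowed between declarations.

```xml
<!-- shared.dtd: hypothetical external DTD -->
<!ENTITY % common_atts
  "id    ID    #IMPLIED
   class CDATA #IMPLIED">

<!ELEMENT heading (#PCDATA)>
<!ATTLIST heading %common_atts;>

<!ELEMENT paragraph (#PCDATA)>
<!ATTLIST paragraph %common_atts;
                    align (left | right | center) 'left'>
```

Adding a third attribute to common_atts would make it available on both heading and paragraph.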

  32. Parameter Entities Part Three • See example on page 53. • See implementation on page 54. • An internal DTD subset can specify replacement text for an externally defined parameter entity, since the internal DTD subset takes precedence.
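
The override can be sketched with a hypothetical external DTD whose content model is built from a parameter entity:

```xml
<!-- listing.dtd: hypothetical external DTD -->
<!ENTITY % listing_content "address, price">
<!ELEMENT listing (%listing_content;)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT price   (#PCDATA)>
<!ELEMENT agent   (#PCDATA)>
```

A document can then redefine the entity in its internal subset. The internal subset is read first, and the first declaration of an entity is the one that counts, so listing now requires an agent as well:

```xml
<?xml version="1.0" standalone="no"?>
<!DOCTYPE listing SYSTEM "listing.dtd" [
  <!ENTITY % listing_content "address, price, agent">
]>
<listing>
  <address>12 Example Lane</address>
  <price>250000</price>
  <agent>J. Doe</agent>
</listing>
```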

  33. External DTD Subsets • Industry-standard DTDs can be quite large. • However, DTDs can be broken up into multiple files. • The files are combined at validation time using external parameter entity references. • http://xmlwriter.net/xml_guide/doctype_declaration.shtml
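
Splitting a DTD across files might look like the sketch below. The file names are hypothetical, and the example assumes names.dtd declares name and addresses.dtd declares address.

```xml
<!-- main.dtd: hypothetical driver file that pulls in two modules -->
<!ENTITY % names SYSTEM "names.dtd">
%names;

<!ENTITY % addresses SYSTEM "addresses.dtd">
%addresses;

<!ELEMENT contact (name, address)>
```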

  34. Conditional Inclusion • The IGNORE directive causes a section of declarations to be skipped, much like commenting it out • The INCLUDE directive indicates that a section of declarations should be included. • Both directives are of little use on their own; they become useful once you toggle between them using a parameter entity. • See the example on page 55.
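
The toggle can be sketched as follows, with hypothetical names. Conditional sections are only allowed in the external DTD subset, so this belongs in an external file.

```xml
<!-- drafts.dtd: hypothetical external DTD -->
<!ENTITY % notes "INCLUDE">

<!-- Expands to <![INCLUDE[ ... ]]> unless notes is redefined as IGNORE -->
<![%notes;[
  <!ELEMENT editor_note (#PCDATA)>
]]>
```

A document that wants the section skipped could declare <!ENTITY % notes "IGNORE"> in its internal DTD subset, which is read before the external file.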

  35. Locating Standard DTDs • There are many standard DTDs for different professions • It’s better to use an established DTD than design your own. • There is no central DTD repository • See three attempts at repository creation by visiting links listed in text on page 58. • It’s likely that any group that uses IT heavily has created a DTD.
