What Are Real DTDs Like

Presentation Transcript


  1. What Are Real DTDs Like Group Members : Xijie Zeng, Peiyu Cai Presenter : Xijie Zeng

  2. Outline • Overview • Introduction • Local properties • Global properties

  3. Overview • XML is widely used in a variety of areas • DTDs with different structures define XML documents for different uses • A survey of a number of real-world DTDs

  4. Introduction • The DTDs come from the XML.org DTD repository • Three DTD categories : • app : describe objects interchanged between programs/applications • data : describe data stored in databases • meta : describe the structure of document markup • 60 DTDs : 7 are app, 13 are data, 40 are meta

  5. Introduction (cont.) • A DTD can be described as a collection of element declarations of the form e → α, where e is the element name and α is the content model. • The content model is defined by α ::= ε | pcdata | e | (α, α) | (α | α) | α* | α+ | α?
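The content-model grammar on this slide maps naturally onto a small tree data type. The following is a minimal sketch of one possible encoding, not something taken from the survey: the class names (`Empty`, `PCData`, `Elem`, `Seq`, `Alt`, `Rep`) and the helper `show` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Empty:            # ε
    pass


@dataclass(frozen=True)
class PCData:           # pcdata
    pass


@dataclass(frozen=True)
class Elem:             # an element name e
    name: str


@dataclass(frozen=True)
class Seq:              # α, α
    items: Tuple["Model", ...]


@dataclass(frozen=True)
class Alt:              # α | α
    items: Tuple["Model", ...]


@dataclass(frozen=True)
class Rep:              # α*, α+ or α?
    body: "Model"
    op: str             # one of "*", "+", "?"


Model = Union[Empty, PCData, Elem, Seq, Alt, Rep]


def show(m: Model) -> str:
    """Render a content model in the slide's notation."""
    if isinstance(m, Empty):
        return "ε"
    if isinstance(m, PCData):
        return "pcdata"
    if isinstance(m, Elem):
        return m.name
    if isinstance(m, Rep):
        return show(m.body) + m.op
    sep = ", " if isinstance(m, Seq) else " | "
    return "(" + sep.join(show(i) for i in m.items) + ")"


# The content model of the "head" element from the email DTD.
head = Seq((Elem("from"), Rep(Elem("to"), "+"), Rep(Elem("cc"), "*"), Elem("subject")))
print(show(head))
```

Frozen dataclasses are used here only so that content models can be compared and hashed; any tree representation would serve equally well.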

  6. Introduction (cont.) Email DTD
     <!ELEMENT email (head, body)>
     <!ELEMENT head (from, to+, cc*, subject)>
     <!ELEMENT from EMPTY>
     <!ATTLIST from name CDATA #IMPLIED address CDATA #REQUIRED>
     <!ELEMENT to EMPTY>
     <!ATTLIST to name CDATA #IMPLIED address CDATA #REQUIRED>
     <!ELEMENT cc EMPTY>
     <!ATTLIST cc name CDATA #IMPLIED address CDATA #REQUIRED>
     <!ELEMENT subject (#PCDATA)>
     <!ELEMENT body (text, attachment*)>
     <!ELEMENT text (#PCDATA)>
     <!ELEMENT attachment EMPTY>
     <!ATTLIST attachment encoding (mime|binhex) "mime" file CDATA #REQUIRED>
     The same DTD as element declarations:
     email → (head, body)
     head → (from, to+, cc*, subject)
     from → (ε)
     to → (ε)
     cc → (ε)
     subject → (pcdata)
     body → (text, attachment*)
     text → (pcdata)
     attachment → (ε)
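The step from the raw DTD to the abstract element declarations can be approximated with a regular expression. This is only a rough sketch under simplifying assumptions (no comments, parameter entities or conditional sections in the DTD text); the function name `abstract` is hypothetical, and attribute declarations are ignored, as in the survey.

```python
import re

# Matches <!ELEMENT name content-model>; content models never contain ">".
ELEMENT_RE = re.compile(r"<!ELEMENT\s+(\S+)\s+(.*?)>", re.S)


def abstract(dtd_text):
    """Map each declared element name to its content model in slide notation."""
    rules = {}
    for name, model in ELEMENT_RE.findall(dtd_text):
        model = " ".join(model.split())          # normalise whitespace
        if model == "EMPTY":
            model = "(ε)"                        # EMPTY becomes ε
        rules[name] = model.replace("#PCDATA", "pcdata")
    return rules


sample = """
<!ELEMENT email (head, body)>
<!ELEMENT from EMPTY>
<!ATTLIST from name CDATA #IMPLIED address CDATA #REQUIRED>
<!ELEMENT subject (#PCDATA)>
"""
for name, model in abstract(sample).items():
    print(name, "→", model)
```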

  7. Introduction (cont.) • Local properties Describe content models in individual element declarations • Global properties Describe the graph-theoretic structure of the whole DTD

  8. Local properties • Content model classification • (1) pcdata • (2) ε • (3) any • No restriction on subelements • (4) Mixed content • Examples on the slide : body → (text, attachment*), text → (pcdata); body1 → (pcdata, attachment*) • (5) “|” only but not mixed content • (6) “,” only • (7) Complex content • Contains both “|” and “,” : directory → (dirname, dirinfo?, dirdesc?, (file | directory)*) • (8) List • α* • α+ • (9) Single • α?

  9. Local properties (cont.) • Content model classification

  10. Local properties (cont.) • Syntactic complexity
     depth(ε) = 0;
     depth(e) = depth(pcdata) = 1;
     depth(α*) = depth(α+) = depth(α?) = depth(α) + 1;
     depth(α1, α2, …, αn) = depth(α1 | α2 | … | αn) = max(depth(αi)) + 1;

  11. Local properties (cont.) • An example head → (from, to+, cc*, subject) depth(from, to+, cc*, subject) = max(depth(from), depth(to+), depth(cc*), depth(subject)) + 1 = depth(cc*) + 1 = depth(cc) + 1 + 1 = 1 + 1 + 1 = 3
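The depth computation in this example can be reproduced with a short recursive function. The sketch below assumes the rules implied by the worked example: ε has depth 0, an element name or pcdata has depth 1, each of "*", "+", "?" adds 1, and a sequence or choice is one more than its deepest member. The nested-tuple encoding of content models is a hypothetical choice made for brevity.

```python
def depth(model):
    """Syntactic depth of a content model.

    Encoding (an assumption of this sketch):
      None                    -> ε
      "name" or "pcdata"      -> element name / pcdata
      ("*", m), ("+", m), ("?", m)
      (",", m1, ..., mn)      -> sequence
      ("|", m1, ..., mn)      -> choice
    """
    if model is None:
        return 0
    if isinstance(model, str):
        return 1
    op, *args = model
    if op in ("*", "+", "?"):
        return depth(args[0]) + 1
    if op in (",", "|"):
        return max(depth(a) for a in args) + 1
    raise ValueError(f"unknown operator: {op!r}")


head = (",", "from", ("+", "to"), ("*", "cc"), "subject")
print(depth(head))  # 3, matching the slide
```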

  12. Local properties (cont.) • Determinism A content model is deterministic if it can be parsed without lookahead. Non-deterministic content model : (a, b) | (a, c) Deterministic content model : a, (b | c) • Result 5 non-deterministic content models were detected in 4 DTDs.
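One standard way to test determinism, which is not necessarily the method used in the survey, is the position-automaton (Glushkov) criterion: number every symbol occurrence, compute which positions can come first and which can follow each position, and reject the model if two different positions carrying the same name compete. The sketch below uses a hypothetical nested-tuple encoding (`None` for ε, strings for names, `("*", m)`, `(",", m1, ...)`, `("|", m1, ...)`).

```python
def is_deterministic(model):
    """True if no two same-named positions compete in first or follow sets."""
    symbols = []   # symbols[p] = name at position p
    follow = {}    # follow[p] = positions that may directly follow p

    def walk(m):
        """Return (nullable, first, last) for m and fill in follow."""
        if m is None:
            return True, set(), set()
        if isinstance(m, str):
            p = len(symbols)
            symbols.append(m)
            follow[p] = set()
            return False, {p}, {p}
        op, *args = m
        if op in ("*", "+", "?"):
            nullable, first, last = walk(args[0])
            if op in ("*", "+"):              # repetition loops back
                for p in last:
                    follow[p] |= first
            return nullable or op != "+", first, last
        parts = [walk(a) for a in args]
        if op == "|":
            return (any(n for n, _, _ in parts),
                    set().union(*(f for _, f, _ in parts)),
                    set().union(*(l for _, _, l in parts)))
        # op == ",": concatenate the parts from left to right
        nullable, first, last = True, set(), set()
        for n, f, l in parts:
            for p in last:
                follow[p] |= f
            if nullable:
                first |= f
            last = (last | l) if n else set(l)
            nullable = nullable and n
        return nullable, first, last

    def clash(positions):
        names = [symbols[p] for p in positions]
        return len(names) != len(set(names))

    _, first, _ = walk(model)
    return not clash(first) and not any(clash(f) for f in follow.values())


print(is_deterministic(("|", (",", "a", "b"), (",", "a", "c"))))  # False
print(is_deterministic((",", "a", ("|", "b", "c"))))              # True
```

In the first model both occurrences of a can start the string, so a parser cannot choose a branch without looking ahead; in the second there is a single a, followed by the distinct names b and c.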

  13. Local properties (cont.) • Ambiguity Definition : An expression R is ambiguous if and only if some string s in the language of R can be parsed in more than one way. An example : partner → (name?, onetime?, partnrid?, partnrtype?, syncind?, name*, parentid?, partnridx?, partnrratg*), where a single name can match either name? or name*. • Result 2 ambiguous content models were detected.

  14. Global properties • Reachability Definition : An element name e′ is reachable from e, denoted by e ⇒ e′, if either e → α and e′ occurs in α, or e ⇒ e″ and e″ ⇒ e′. An example : given email → (head, body) and head → (from, to+, cc*, subject), we have email ⇒ head and head ⇒ subject, hence email ⇒ subject. Definition : An element name e is reachable if r ⇒ e, where r is the name of the root element. Otherwise element name e is called unreachable or useless.
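Reachability as defined on this slide is a transitive closure over the "occurs in the content model of" relation, so it can be computed with a plain graph traversal. The sketch below assumes a hypothetical encoding of a DTD as a dictionary from each element name to the names occurring in its content model; the function names are illustrative.

```python
EMAIL = {
    "email": ["head", "body"],
    "head": ["from", "to", "cc", "subject"],
    "from": [], "to": [], "cc": [], "subject": [],
    "body": ["text", "attachment"],
    "text": [], "attachment": [],
}


def reachable_from(dtd, start):
    """Element names reachable from start in one or more steps."""
    seen = set()
    stack = list(dtd.get(start, []))
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(dtd.get(name, []))
    return seen


def unreachable(dtd, root):
    """Declared element names that the root cannot reach (useless names)."""
    return set(dtd) - reachable_from(dtd, root) - {root}


print("subject" in reachable_from(EMAIL, "email"))
print(unreachable({**EMAIL, "orphan": []}, "email"))
```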

  15. Global properties (cont.) • Reachability Unreachable element names in DTDs

  16. Global properties (cont.) • Recursions Definition : A content model α is derivable from an element name e, denoted by e ⇒ α, if either e → α, or e ⇒ α′, e′ → α″, and α = α′[e′/α″], where α′[e′/α″] denotes the content model obtained by substituting α″ for all occurrences of e′ in α′. An example : from email → (head, body) and head → (from, to+, cc*, subject), we get email ⇒ (from, to+, cc*, subject, body). Definition : A DTD is recursive if and only if it has an element name e such that e ⇒ e and e is reachable.

  17. Global properties (cont.) • Recursions Definition : A DTD is linear recursive if and only if it is recursive and, for any reachable element name e and any e ⇒ α, e occurs at most once in α and the occurrence is not enclosed in “*” or “+”. A DTD is said to be non-linear recursive if it is recursive but is not linear recursive. An example of non-linear recursive : directory → (dirname, dirinfo?, dirdesc?, (file | directory)*) An example of linear recursive : e → (pcdata | e) • Result No linear recursive DTD is found in the sample DTDs. There are 7, 2 and 26 non-linear recursive DTDs in the app, data and meta categories respectively.
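Whether a DTD is recursive at all can be checked by looking for a cycle through a used element name. The sketch below covers only that test, not the linear versus non-linear distinction, and assumes a hypothetical encoding of a DTD as a dictionary from each element name to the names occurring in its content model. The root is treated as reachable here, which is an assumption of the sketch.

```python
def reachable_from(dtd, start):
    """Element names reachable from start in one or more steps."""
    seen = set()
    stack = list(dtd.get(start, []))
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(dtd.get(name, []))
    return seen


def is_recursive(dtd, root):
    """True if some element name used under the root can reach itself."""
    used = reachable_from(dtd, root) | {root}
    return any(e in reachable_from(dtd, e) for e in used)


DIRECTORY = {
    "directory": ["dirname", "dirinfo", "dirdesc", "file", "directory"],
    "dirname": [], "dirinfo": [], "dirdesc": [], "file": [],
}
EMAIL_HEAD = {
    "email": ["head", "body"],
    "head": ["from", "to", "cc", "subject"],
    "from": [], "to": [], "cc": [], "subject": [], "body": [],
}

print(is_recursive(DIRECTORY, "directory"))  # True
print(is_recursive(EMAIL_HEAD, "email"))     # False
```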

  18. Global properties (cont.) • Chain of stars An example : entity → (name*, contact*, location*, phone*, fax*) location → (city*, otherinfo?) There is a chain of 2 stars : location* in entity, then city* in location.

  19. Global properties (cont.) • Chain of stars

  20. Global properties (cont.) • Hubs Definition : The fan-in of an element name e is the cardinality of the set {e′ | e′ → α and e occurs in α}. An element name with a large fan-in value is called a hub. An example : email → (head, body) head → (from, to+, cc*, subject) from → (ε) to → (ε) cc → (ε) subject → (pcdata) body → (text, attachment*) text → (pcdata) attachment → (ε) The fan-in value of the email element is 0, and the fan-in value of every other element in this DTD is 1.
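Fan-in counts, for each element name, how many distinct element declarations mention it. A minimal sketch follows, again assuming a hypothetical dictionary encoding of the DTD (element name to the names occurring in its content model); `fan_in` is an illustrative name.

```python
EMAIL = {
    "email": ["head", "body"],
    "head": ["from", "to", "cc", "subject"],
    "from": [], "to": [], "cc": [], "subject": [],
    "body": ["text", "attachment"],
    "text": [], "attachment": [],
}


def fan_in(dtd):
    """Number of distinct declarations whose content model mentions each name."""
    counts = {name: 0 for name in dtd}
    for parent, children in dtd.items():
        for child in set(children):      # count each parent once per child
            counts[child] = counts.get(child, 0) + 1
    return counts


for name, value in fan_in(EMAIL).items():
    print(name, value)
```

For the email DTD this gives 0 for email and 1 for every other element, in line with the slide; a hub would show up as a name with a much larger count.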

  21. Global properties (cont.) Result : Fan-in of elements in data DTDs Fan-in of elements in meta DTDs

  22. Summary • Local properties • Content model classification • Syntactic complexity • Determinism • Ambiguity • Global properties • Reachability • Recursions • Chain of stars • Hubs • One drawback of this survey • It does not study any properties of attributes
