DTD++ 2.0: Adding support for co-constraints

Davide Fiorello

Nicola Gessa

Paolo Marinelli

Fabio Vitali

University of Bologna


Two sales pitches here

  • DTDs aren’t dead yet and should not be

  • Co-constraints are important, and the very next step in validation


The war of schema languages

DTD?

XML Schema?

Relax NG?

Schematron?

ISO/IEC 19757 DSDL (especially Part 9: “Data type- and namespace-aware DTDs”)


My own story

  • The project NormeInRete (http://www.normeinrete.it): XML-ization of national and regional laws and basically any kind of normative document in Italy

  • Supported by the Italian Office of the Prime Minister, the Ministry of Justice and the department for Informatics in Public Administration. All national laws and regional laws from 3 (soon 7) of the 20 regions are now available in XML and locatable through URNs.

  • Yours truly is the main author of the DTDs and documentation manuals providing guidance for conversion.

  • The document type contains 150+ elements and 50+ attributes, dealing with content, meta-content, evolution in time and space, non-ASCII characters. By the end of the year we will deal with judicial documents.


NormeInRete: DTD or XML Schema?

  • Started in 1999; the first version of the rules was ready in 2000: necessarily DTD!

  • The syntax is clear, easy to look up and use, well-known by the users and tool implementers.

  • The birth of XML Schema created many discussions on whether to switch:

    • “All my friends use XML Schema”

    • “XML Spy creates very nice drawings of an XML Schema”

    • “XML Schema is the future”

    • “Admit you don’t know the first thing about XML Schema”

  • In truth, there is very little real reason to switch: DTDs are fine for our purposes.

  • So far, the two sides are evenly balanced. European integration may provide the necessary pressure.


But…

… is the switch inevitable?


Are DTDs dead?

  • The need for an XML-based syntax

    • For automatic processing and generation

  • The presence of strong competition

    • XML Schema

    • Relax NG

  • The absence of many important features

    Yes, but …

  • DTDs are easier to learn,

  • DTDs are easier to read,

  • DTDs are easier to use

  • Many people still think in terms of DTDs


So: DTD++ 1.0 (Extreme Markup 2003)

  • The idea: create a DTD-like language that is as powerful as the most powerful validation language: XML Schema.

  • Syntax from DTD, structures and concepts from XML Schema:

    • Namespace support

    • Complex types for managing markup structures

    • Simple types for managing constraints on data containers

  • Use as much as possible of DTD syntax, invent as little as possible, recycle concepts with new meanings.


What about XML-based syntax?

  • Semantic equivalence to another XML-based schema language means this is no longer a problem.

    Just convert it!

  • All human tasks use the original DTD++ form; all computer tasks use the corresponding XSD version. Conversion is easy and fast.


A taste of DTD++ (1)

  • Anonymous complex types in XSD are content models

    <!ELEMENT X (A?, (B | C)[2-5], D*) >

  • Predefined simple types are predefined keywords

    <!ELEMENT A (#PCDATA)> or <!ELEMENT A (#STRING)>

    <!ELEMENT B (#INTEGER)>

    <!ELEMENT C (#DATE)>

  • Anonymous simple types add facets to predefined simple types. Syntax for facets uses well-known mathematical constructs: for instance {} for lengths and [] for ranges.

    <!ELEMENT D (#INTEGER[,100])>
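
The facet notation above can be read as ordinary interval and length checks. The following is a minimal Python sketch of that reading, not part of DTD++ or its tooling: the function names are hypothetical, square brackets are assumed to denote inclusive bounds, and an omitted bound is assumed to mean "unbounded".

```python
from typing import Optional


def check_integer_range(text: str, low: Optional[int] = None,
                        high: Optional[int] = None) -> bool:
    """Mimic a range facet such as #INTEGER[,100] (bounds assumed inclusive)."""
    try:
        value = int(text.strip())
    except ValueError:
        return False  # the content is not an integer at all
    if low is not None and value < low:
        return False
    if high is not None and value > high:
        return False
    return True


def check_string_length(text: str, min_len: Optional[int] = None,
                        max_len: Optional[int] = None) -> bool:
    """Mimic a length facet written with braces (bounds assumed inclusive)."""
    if min_len is not None and len(text) < min_len:
        return False
    if max_len is not None and len(text) > max_len:
        return False
    return True
```

Under these assumptions, `check_integer_range("100", high=100)` accepts the content of an element declared as `(#INTEGER[,100])`, while `"101"` or `"abc"` would be rejected.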


A taste of DTD++ (2)

  • Named types are named entities using different characters to differentiate themselves

    <!ENTITY # myInt "(#INTEGER[0,100])">

    <!ELEMENT D #myInt; >

    <!ENTITY @ myType "(A?, (B | C)[2-5], D*)" >

    <!ELEMENT X @myType; >

  • Complex types that specify attributes have an additional block of quotes:

    <!ENTITY @ myType "(A?, (B | C)[2-5], D*)"

    "anAttr #STRING{10} #IMPLIED">

    <!ELEMENT X @myType; >


A taste of DTD++ (3)

  • Mixed content models extend the DTD syntax to allow any structure allowable with XSD:

    <!ENTITY @ myType "#PCDATA (A?, (B | C)[2-5], D*)" >

    <!ELEMENT X @myType; >

  • The ANY structure is extended

    <!ELEMENT comment ANY[0,3]{http://www.foo.org}>

  • Target namespaces use the newly introduced TARGETNS structure

    <!TARGETNS "http://www.foo.org">

    <!TARGETNS ns "http://www.bar.org">

    <!ELEMENT name (ns:firstname)>

    <!ELEMENT ns:firstname (#PCDATA)>


Limits

  • No support (yet) for keys, keyrefs, uniques.

  • No local elements

  • No support for refs

  • Only two design styles supported:

    • Salami slices

    • Garden of Eden.

  • No redefine or include (but no need for them)


Co-constraints and what are they for

Better constraints

Real-life constraints

Constraints difficult to formalize


Is DTD++ 1.0 enough, then?

  • No, since XML Schema is not enough

  • XML Schema cannot express all the structure and data constraints that document designers may need:

    • Mutual exclusion (“element x may have either the a attribute or the b attribute, but not both”)

    • Deep exclusions (“element x cannot contain, at any level of its subtree, element y”)

    • Structure-dependent structures (“if the item is gratis, i.e., the attribute gratis is present, then no price should be specified, i.e., the element price should be absent”)

    • Data-dependent structures (“if the address is a PO box, then the address must include a PO box number, otherwise it must include a street name and a street number”)

  • These kinds of constraints are known as co-constraints, or co-occurrence constraints. Most real life XML document types have one or more of those constraints.
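
To make the first three kinds of co-constraint concrete, here is a minimal Python sketch that checks them by hand on a parsed document, using only the standard library. The function names are hypothetical, and the element and attribute names (`x`, `a`, `b`, `y`, `gratis`, `price`) are taken from the quoted examples.

```python
import xml.etree.ElementTree as ET


def mutually_exclusive(elem: ET.Element, attr_a: str, attr_b: str) -> bool:
    """Mutual exclusion: True unless both attributes are present."""
    return not (attr_a in elem.attrib and attr_b in elem.attrib)


def no_deep_descendant(elem: ET.Element, tag: str) -> bool:
    """Deep exclusion: True if no element named `tag` occurs below `elem`."""
    # iter() yields `elem` itself first, so it is skipped explicitly.
    return all(d is elem or d.tag != tag for d in elem.iter())


def gratis_excludes_price(item: ET.Element) -> bool:
    """Structure-dependent structure: a gratis attribute forbids a price child."""
    return not ("gratis" in item.attrib and item.find("price") is not None)
```

For instance, `mutually_exclusive(ET.fromstring('<x a="1" b="2"/>'), "a", "b")` is false, which is exactly the case that a grammar declaring both attributes as optional cannot reject.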


For example…

  • XHTML

    • “a elements cannot contain other a elements” (appendix B)

    • Neither the normative DTD nor the non-normative XML Schema can fully express this requirement (they only express a weaker form: “a elements cannot directly contain other a elements”)

  • XSLT

    • “In a template element at least one of the match and name attributes must be present”

    • Again, the DTD and the XML Schema cannot express this requirement, and specify both attributes as optional.

  • XML Schema itself

    • “An element definition must either contain a ref or a name attribute, but not both. Furthermore, if the name attribute is present, then the type attribute or one of the simpleType or complexType elements must be present, but not two.”

    • The normative XML Schema can only specify all these elements and attributes as optional.

  • … and plenty more…


Who cares?

[Figure: validation pipeline, with labels: XML doc, DOM parser (not well-formed), DOM tree, rules, schema validator (invalid), DOM tree + PSVI, downstream application; three boxes are marked “?”.]

  • Documents that violate these rules are still considered valid by the XML Schema validator.

  • Three solutions:

    • Hope for the best (“It won’t happen”) - subject to Murphy’s Law

    • Provide a default behavior (“If both attributes are present, consider the first only”)

    • Provide validation code within the downstream application
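
As an illustration of the third option, the sketch below shows what hand-written validation code in a downstream application might look like for two of the W3C rules quoted earlier. It is a minimal Python example using only the standard library; the function names are hypothetical, and the un-namespaced tag `a` is an assumption (a real XHTML document would use the XHTML namespace).

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def anchors_containing_anchors(root: ET.Element, tag: str = "a") -> list:
    """Return every `tag` element that has another `tag` in its subtree."""
    offenders = []
    for anchor in root.iter(tag):
        # iter(tag) also yields `anchor` itself, so it is excluded here.
        if any(inner is not anchor for inner in anchor.iter(tag)):
            offenders.append(anchor)
    return offenders


def templates_missing_match_and_name(root: ET.Element) -> list:
    """Return every xsl:template with neither a match nor a name attribute."""
    return [t for t in root.iter(f"{{{XSL_NS}}}template")
            if "match" not in t.attrib and "name" not in t.attrib]
```

Every application that consumes such documents has to carry checks like these, which is the duplication that moving co-constraints into the schema is meant to avoid.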


SchemaPath and DTD++ 2.0

  • At the WWW2004 conference, we presented SchemaPath, our proposal to minimally extend XML Schema to handle co-constraints.

  • The idea is to find a way to conditionally assign types to elements and attributes. Furthermore, a non-satisfiable type is added for specifying error conditions to avoid.

  • SchemaPath maintains the XML Schema syntax, adds only ONE construct and ONE pre-defined simple type, maintains important XML Schema properties (the validation theorem and round-tripping and reverse round-tripping properties), and does not impact the PSVI for valid documents.

  • DTD++ 2.0 is the DTD-like syntax for SchemaPath


DTD++ 2.0

  • Conditional assignment of types

    • Multiple definitions of the same element, each conditioned by an XPath expression. Implicit and explicit priorities are used.

    • Each condition is tested on the instance element, and the one that holds with the highest priority is selected.

    • The type specified by the selected definition is assigned to the element.

    • This is NOT a way to provide conditional types: types are just plain old DTD++ 1.0 (XML Schema) types.

  • The #ERROR simple type

    • When we want to specify the non-validity of a condition, we assign the element the #ERROR type.

    • The #ERROR type is a non-satisfiable type, whose presence in the instance document always and automatically signals a validation error.
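
The selection step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the DTD++ 2.0 implementation: conditions are plain Python callables standing in for XPath expressions, only explicit priorities are modelled, and the names `Definition`, `assign_type` and `X_DEFINITIONS` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional
import xml.etree.ElementTree as ET

ERROR = "#ERROR"  # stand-in for the non-satisfiable type


@dataclass
class Definition:
    condition: Callable[[ET.Element], bool]  # plays the role of the XPath test
    type_name: str                           # type assigned if this one wins
    priority: int                            # explicit priority; higher wins


def assign_type(elem: ET.Element,
                definitions: List[Definition]) -> Optional[str]:
    """Return the type of the highest-priority definition whose condition holds."""
    holding = [d for d in definitions if d.condition(elem)]
    if not holding:
        return None  # no definition applies to this element
    return max(holding, key=lambda d: d.priority).type_name


# Mutual exclusion: x may carry attribute a or attribute b, but not both.
X_DEFINITIONS = [
    Definition(lambda e: "a" in e.attrib and "b" in e.attrib, ERROR, 1),
    Definition(lambda e: True, "myType", 0),  # empty condition: always holds
]
```

With these definitions, an `x` element carrying both attributes is assigned `#ERROR`, and any other `x` is assigned `myType`. The element would then still have to be validated against the selected type, a step this sketch leaves out.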


Examples

  • Mutual exclusion

    • “Element x may have either the a attribute or the b attribute but not both”. Suppose we have defined a type myType with both a and b attributes as optional

      <xsd:element name="x">
        <xsd:alt cond="(@a and @b)" type="xsd:error"/>
        <xsd:alt type="myType"/>
      </xsd:element>

      <!ELEMENT x "(@a and @b)" #ERROR>

      <!ELEMENT x "" @myType;>

  • Data-dependent structures

    • “The element quantity must be an integer if the unit element is ‘items’, and it must be a decimal value if the unit element is ‘meters’”. Suppose we have already defined the data type for the unit element to only contain the values “meters” or “items”.

      <xsd:element name="quantity">
        <xsd:alt cond="../unit='items'" type="xsd:integer"/>
        <xsd:alt cond="../unit='meters'" type="xsd:decimal"/>
      </xsd:element>
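
For comparison, the same data-dependent rule written as a hand-coded check might look like the Python sketch below. The function name is hypothetical, the caller is assumed to pass the common parent of `unit` and `quantity`, and the regular expressions follow the XML Schema lexical forms of `xsd:integer` and `xsd:decimal`. Rejecting any other unit value is a choice made for this sketch.

```python
import re
import xml.etree.ElementTree as ET

INTEGER = re.compile(r"[+-]?[0-9]+")
DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def quantity_is_valid(parent: ET.Element) -> bool:
    """Check the quantity child against the type selected by its unit sibling."""
    unit = (parent.findtext("unit") or "").strip()
    quantity = (parent.findtext("quantity") or "").strip()
    if unit == "items":
        return INTEGER.fullmatch(quantity) is not None
    if unit == "meters":
        return DECIMAL.fullmatch(quantity) is not None
    return False  # unit outside the two values the example assumes
```

The conditional logic that the two `xsd:alt` alternatives state declaratively ends up here as an `if` chain that every consuming application would need to repeat.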


One possible solution to the W3C problems (1)

  • XHTML

    • “a elements cannot contain other a elements” (appendix B)

      <!ELEMENT a ".//a" (#ERROR)>

      <!ELEMENT a "" (@inlineType;)>

  • XSLT

    • “In a template element at least one of the match and name attributes must be present”

      <!ELEMENT template "not(@match) and not(@name)" (#ERROR) >

      <!ELEMENT template "" (@templateType;) >

      <!ENTITY @ templateType "%templateContent;"

      "match (#patternType;) name(#NCName;)">


One possible solution to the W3C problems (2)

  • XML Schema

    • “An element definition must either contain a ref or a name attribute, but not both. Furthermore, if the name attribute is present, then the type attribute or one of the simpleType or complexType elements must be present, but not two.”

      <!ELEMENT simpleType (@localSimpleType;)>

      <!ELEMENT complexType (@localComplexType;)>

      <!ENTITY @ element "(simpleType|complexType)"

      "name (#NCName;) #IMPLIED

      ref (#QName;) #IMPLIED

      type (#QName;) #IMPLIED">

      <!ELEMENT element "@name and @ref":4 (#ERROR)>

      <!ELEMENT element "(@type or @ref) and (xsd:simpleType or xsd:complexType)":3 (#ERROR)>

      <!ELEMENT element "../xsd:schema and @ref":2 (#ERROR)>

      <!ELEMENT element "not(@ref) and not(@name)":1 (#ERROR)>

      <!ELEMENT element "":0 (@element;)>


The “Trojan Milestones” requirements

“1. the element must be empty exactly when its sID or eID attribute is set.

2. when eID is present, no other attributes are permitted.

3. each sID/eID value should occur only twice (once on sID and once on eID)

4. empty elements with matching sID and eID values should match up in proper pairs and in order.

Note that because of the second rule above, no attributes may be required for milestoneable elements.

Schema languages that can make attributes optional or required depending on the presence of other attributes (in this case eID) do not suffer this problem.”

[DeRose, Extreme Markup 2004]
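
The four rules can be read operationally as a single pass over the document. The Python sketch below is one such reading, not DeRose's or the authors' formulation: the function name is hypothetical, "empty" is taken to mean no child elements and no text at all, and "in order" is taken to mean that an eID must follow the sID carrying the same value.

```python
import xml.etree.ElementTree as ET
from typing import List


def milestone_errors(root: ET.Element, tag: str) -> List[str]:
    """Collect violations of the four milestone rules for elements named `tag`."""
    errors = []
    open_ids = set()    # sID values seen and not yet matched by an eID
    closed_ids = set()  # values whose sID/eID pair is complete
    for elem in root.iter(tag):  # iter() walks in document order
        sid, eid = elem.get("sID"), elem.get("eID")
        is_marker = sid is not None or eid is not None
        is_empty = len(elem) == 0 and not elem.text
        if is_marker != is_empty:
            errors.append("rule 1: empty exactly when sID or eID is set")
        if eid is not None and len(elem.attrib) > 1:
            errors.append("rule 2: eID must be the only attribute")
        if sid is not None:
            if sid in open_ids or sid in closed_ids:
                errors.append(f"rule 3: sID {sid!r} used more than once")
            else:
                open_ids.add(sid)
        if eid is not None:
            if eid in open_ids:
                open_ids.remove(eid)
                closed_ids.add(eid)
            elif eid in closed_ids:
                errors.append(f"rule 3: eID {eid!r} used more than once")
            else:
                errors.append(f"rule 4: eID {eid!r} has no preceding sID")
    errors.extend(f"rule 3: sID {s!r} is never closed" for s in sorted(open_ids))
    return errors
```

Rules 1 and 2 depend only on the element itself, while rules 3 and 4 need state carried across the document, which is why they call for conditions that look at other elements.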


A DTD++ 2.0 solution to the Trojan Milestones requirements

<!ENTITY @ startMarker "EMPTY"

"sID ID #REQUIRED %regularAtts;">

<!ENTITY @ endMarker "EMPTY"

"eID IDREF #REQUIRED">

<!ELEMENT X "":0 %regularCM; >

<!ATTLIST X "":0 %regularAtts;>

<!ELEMENT X [email protected]:2 @startMarker;>

<!ELEMENT X [email protected] = preceding::[email protected]:3 #ERROR>

<!ELEMENT X [email protected]=preceding::[email protected]:4 @endMarker;>

<!ELEMENT X [email protected] = preceding::[email protected]:3 #ERROR>

<!ELEMENT X [email protected]:2 #ERROR>


Implementation of the DTD++2.0 parser

  • A DTD++ 2.0 validator exists and can be tested online at http://tesi.fabio.web.cs.unibo.it/dpp

  • It is a Java application and a plain XML Schema validating engine (tested with Xalan and MS XML parsers)

  • The application is a pre-processor to any XML Schema validator, and, given an XML document X and a DTD++ document D,

    • It converts D into one or more equivalent SchemaPath files SP

    • It converts SP into a plain XML Schema file XS

    • It converts X into a different XML file X’, so that

    • XS validates X’ if and only if SP validates X and thus if and only if D validates X


… but who cares for DTD anyway?

This part is not in the published paper

  • On July 21st, 2004 we ran a test on the relative speed and precision of DTD++ and XML Schema.

  • 14 volunteers (10 M, 4 F) were recruited, all 3rd- and 4th-year computer science students versed in both DTD and XML Schema (all had passed with good marks both the Web Technologies exam and, specifically, its questions on DTDs and XML Schema).

  • The volunteers were divided into two groups and given 15 questions. Half had to solve them using XML Schema, half using DTD++.


The test

  • The 15 questions were identical in both tests, and covered:

    • Write XML: apply the rules from a schema to write valid XML fragments (5 questions)

    • Validate XML: apply the rules from a schema to find errors in XML fragments (5 questions)

    • Write Schemas: write a fragment of schema given a plain text description of the problem (5 questions)


A sample question

  • Verify whether the fragment:

    <order>
      <to id="125">John Smith</to>
      <lines>
        <line>
          <art>130</art>
          <description>Some nice stuff</description>
          <col>Red</col>
          <price>0,65</price>
          <quant>130</quant>
        </line>
      </lines>
    </order>

    is valid with respect to the following DTD++ fragment:

    <!ELEMENT order (to, lines) >
    <!ELEMENT to (#STRING)>
    <!ATTLIST to id ID #REQUIRED>
    <!ELEMENT lines (line+) >
    <!ELEMENT line (art, col, price, quant)>
    <!ELEMENT art (#PCDATA{,20}) >
    <!ENTITY # colors "(red | blue | green | yellow)" >
    <!ELEMENT col (#colors;) >
    <!ELEMENT quant (#INTEGER]0,]) >
    <!ELEMENT price (#DECIMAL]0,]) >


The results

  • DTD++ emerged as the clear winner in all categories

    • 36% faster on group A (Write XML)

    • 53% faster on group B (Validate XML)

    • Twice as fast (99%) on group C (Write Schemas)

    • The question on the previous slide was answered on average in 0:01:33 with DTD++ and in 0:03:03 with XML Schema.

    • Errors were slightly more frequent with DTD++ than with XML Schema (123%), but this might be because the language was brand new.

  • Of course the volunteers are very few, and the test might be considered non-significant, but it gives at least an initial approximate measure of the relative value of the two languages.

  • Interestingly, one of the volunteers converted the XML Schema into DTD fragments with textual annotations before answering each question.
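
As a worked check of the two average times quoted for the sample question, the short Python calculation below converts them to seconds and expresses the difference as a percentage. The slides do not say how the "% faster" figures were computed; reading them as the ratio of the two times minus one is an assumption of this sketch, and the helper name is hypothetical.

```python
def to_seconds(h_mm_ss: str) -> int:
    """Convert an h:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in h_mm_ss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


dtdpp = to_seconds("0:01:33")       # 93 seconds
xml_schema = to_seconds("0:03:03")  # 183 seconds

# Assumed reading of "X% faster": ratio of the two times, minus one.
speedup_percent = (xml_schema / dtdpp - 1) * 100
print(f"{speedup_percent:.0f}% faster")  # prints "97% faster"
```

Under that reading, this single question was answered almost twice as fast with DTD++.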


Demo

  • A demo of the validating engine and the full result of the tests can be found at

    http://tesi.fabio.web.cs.unibo.it/dpp

  • Time for a demo?


Conclusions

  • DTDs are faster to learn and use

  • XML Schema is powerful and expressive

  • Schematron-like co-constraints are even more expressive

  • Why learn three languages?

    • DTD++ 1.0 is semantically equivalent to a relevant subset of XML Schema

    • SchemaPath provides co-constraints with a very limited syntax and the new idea of conditional assignment of types (rather than conditional typing)

    • DTD++ 2.0 uses the same principle with a DTD-like syntax

  • What now? Maybe ISO/IEC 19757 - DSDL:

    Part 5: Data types

    Part 9: Data type- and namespace-aware DTDs
