1 / 86

XML Schema Languages

XML Schema Languages. Dongwon Lee, Ph.D. The Pennsylvania State University IST 516 / Fall 2010. http://www.practicingsafetechs.com/TechsV1/XMLSchemas/. In the Last Class. Today. DUE: Proj #1 Planning Doc Team Based Assignment: Lab #1 Individual. Course Objectives.

khayden
Download Presentation

XML Schema Languages

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. XML Schema Languages Dongwon Lee, Ph.D. The Pennsylvania State University IST 516 / Fall 2010 http://www.practicingsafetechs.com/TechsV1/XMLSchemas/

  2. In the Last Class

  3. Today • DUE: Proj #1 Planning Doc • Team Based • Assignment: Lab #1 • Individual

  4. Course Objectives • Understand the purpose of using schemas • Study regular expression as the framework to formalize schema languages • Understand details of DTD and XML Schema • Learn the concept of schema validation

  5. Outline • Schema Language in general • Content Model • DTD • XML Schema (XSD) • Schema Validation

  6. Motivation • The company “Nittany Vacations, LLC” wants to standardize all internal documents using the XML-based format  nvML • Gather requirements from all employees • Informal description (ie, narrative) of how nvML should look like • Q • How to describe formally and unambiguously? • How to validate an nvML document?

  7. Motivation: Schema = Rules • XML Schemas is all about expressing rules: • Rules about what data is allowed • Rules about how the data must be organized • Rules about the relationships between data

  8. Motivation: sep-9.xml <Vacation date=“2010-09-09” guide-by=“Lee”> <Trip segment="1" mode="air"> <Transportation>airplane<Transportation> </Trip> <Trip segment="2" mode="water"> <Transportation>boat</Transportation> </Trip> <Trip segment="3" mode="ground"> <Transportation>car</Transportation> </Trip> </Vacation> Example modified from Roger L. Costello’s slides @ xfront.com

  9. Motivation: Validate <Vacation date=“2010-09-09” guide-by=“Lee”> <Segment id="1" mode="air"> <Transportation>airplane</Transportation> </Segment> <Segment id="2" mode="water"> <Transportation>boat</Transportation> </Segment> <Segment id="3" mode="ground"> <Transportation>car</Transportation> </Segment> </Vacation> Validate the XML document against the XML Schema XML Schema = RULES nvML.dtd or nvML.xsd Rule 1: A vacation has segments. Rule 2: Each segment is uniquely identified. Rule 3: There are three modes of transportation: air, water, gound. Rule 4: Each segment has a mode of transportation. Rule 5: Each segment must identify the specific mode used.

  10. Schema Languages • Schema: a formal description of structures / constraints • Eg, relational schema describes tables, attributes, keys, .. • Schema Language: a formal language to describe schemas • Eg, SQL DDL for relational model CREATE TABLE employees ( id INTEGER PRIMARY KEY, first_name CHAR(50) NULL, last_name CHAR(75) NOT NULL, dateofbirth DATE NULL );

  11. Rules in Formal Schema Lang. • Why bother formalizing the syntax with a schema? • A formal definition provides a precise but human-readable reference • Schema processing can be done with existing implementations • One’s own tools for own language can benefit: by piping input documents through a schema processor, one can assume that the input is valid and defaults have been inserted

  12. Schema Processing http://www.brics.dk/~amoeller/XML/schemas/schemas.html

  13. Requirements for Schema Lang. • Expressiveness • Efficiency • Comprehensibility

  14. Regular Expressions (RE) • Commonly used to describe sequences of characters or elements in schema languages • RE to capture content models • Σ: a finite Alphabet • α in Σ: set only containing the character α • α ?: matches zero or one α • α *: matches zero or more α’s • α +: matches one ore more α’s • α β: matches the concatenation of α and β • α | β: matches the union of α and β

  15. RE Examples • a|b* denotes {ε, a, b, bb, bbb, ...} • (a|b)* denotes the set of all strings with no symbols other than a and b, including the empty string: {ε, a, b, aa, ab, ba, bb, aaa, ...} • ab*(c|ε) denotes the set of strings starting with a, then zero or more bs and finally optionally a c: {a, ac, ab, abc, abb, abbc, ...}

  16. RE Examples • Valid integers: • 0 | -? (1|2|3|4|5|6|7|8|9) (1|2|3|4|5|6|7|8|9) * • Valid contents of table element in XHTML: • caption ? (col * | colgroups *) thead ? tfoot ? (tbody * | tr *)

  17. Which Schema Language? • Many proposals competing for acceptance • W3C Proposals: DTD, XML Data, DCD, DDML, SOX, XML-Schema, … • Non-W3C Proposals: Assertion Grammars, Schematron, DSD, TREX, RELAX, XDuce, RELAX-NG, … • Different applications have different needs from a schema language

  18. By Rick Jelliffe

  19. Expressive Power (content model) DTD XML-Schema XDuce, RELAX-NG “Taxonomy of XML Schema Languages using Formal Language Theory”, Makoto Murata, Dongwon Lee, Murali Mani, Kohsuke Kawaguchi, In ACM Trans. on Internet Technology (TOIT), Vol. 5, No. 4, page 1-45, November 2005

  20. Closure (content model) DTD XML-Schema XDuce, RELAX-NG Closed under INTERSECT Closed under INTERSECT, UNION, DIFFERENCE

  21. DTD: Document Type Definition • XML DTD is a subset of SGML DTD • XML DTD is the standard XML Schema Language of the past (and present maybe…) • It is one of the simplest and least expressive schema languages proposed for XML model • It does not use XML tag notation, but use its own weird notation • It cannot express relatively complex constraint (eg, key with scope) well • It is being replaced by XML-Schema of W3C and RELAX-NG of OASIS

  22. DTD: Elements • <!ELEMENT element-namecontent-model> • Associates a content model to all elements of the given name content models • EMPTY: no content is allowed • ANY: any content is allowed • Mixed content: (#PCDATA | e1 | … | en)* • arbitrary sequence of character data and listed elements

  23. DTD: Elements • Eg: “Name” element consists of an • optional FirstName, followed by • mandatory LastName elements, where • Both are text string <!ELEMENT Name (FirstName? , LastName) <!ELEMENT FirstName (#PCDATA)> <!ELEMENT LastName (#PCDATA)

  24. DTD: Attributes • <!ATTLIST element-nameattr-nameattr-typeattr-default ...> • Declares which attributes are allowed or required in which elements attribute types: • CDATA: any value is allowed (the default) • (value|...): enumeration of allowed values • ID, IDREF, IDREFS: ID attribute values must be unique (contain "element identity"), IDREF attribute values must match some ID (reference to an element) • ENTITY, ENTITIES, NMTOKEN, NMTOKENS, NOTATION: consider them obsolete…

  25. DTD: Attributes • Attribute defaults: • #REQUIRED: the attribute must be explicitly provided • #IMPLIED: attribute is optional, no default provided • "value": if not explicitly provided, this value inserted by default • #FIXED "value": as above, but only this value is allowed

  26. DTD: Attributes • Eg: “Name” element consists of an • optional FirstName, followed by • mandatory LastName attributes, where • Both are text string <!ELEMENT Name (EMPTY)> <!ATTLIST Name FirstName CDATA #IMPLIED LastName CDATA#REQUIRED>

  27. DTD: Attributes • ID vs. IDREF/IDREFS • ID: document-wide unique ID (like key in DB) • IDREF: referring attribute (like foreign key in DB) <!ELEMENT employee (…)> <!ATTLIST employee eID ID #REQUIRED boss IDREF #IMPLIED> … <employee eID=“a”>…</>…. <employee eID=“b” boss=“a”>…</>

  28. <?xml version="1.0"?> <!DOCTYPE event SYSTEM “../../dir/event.dtd” <event eID=“sigmod02”> <acronym>SIGMOD</acronym> <society>ACM</society> <url>www.sigmod02.org</url> <loc> <city>Madison</city> <state>WI</state> </loc> <year>2002</year> </event> DTD Example XML document that conforms to “event.dtd”

  29. DTD: Example // event.dtd <!ELEMENT event (acronym, society*,url?, loc, year)> <!ATTLIST event eID ID #REQUIRED> <!ELEMENT acronym (#PCDATA)> <!ELEMENT society (#PCDATA)> <!ELEMENT url (#PCDATA)> <!ELEMENT loc (city, state)> <!ELEMENT city (#PCDATA)> <!ELEMENT state (#PCDATA)> <!ELEMENT year (#PCDATA)>

  30. DTD: Type Declaration • Associate a DTD schema with an XML document • At the beginning lines of an XML document // DTD File: event.dtd <!ELEMENT …. > // XMLFile: event.xml <?xml version="1.0"?> <!DOCTYPE event SYSTEM “http://foo.com/event.dtd">

  31. Exercise: Citation Authors Papers Publish *AID Name Title *PID Area Venues Attend Appear *Vname Year City 31

  32. Exercise: Citation To RDBMS: Authors(*AID, Name, Title) Papers(*PID, Area, *Vname) Venues(*Vname, Year, City) Publish(*AID, *PID) Attend(*AID, *Pname) Appear(*PID, *Vname) 32

  33. Exercise: Citation Relational  XML: Create a dummy root element Make entities as 1st-level children Make columns of entities as attributes of those Relationship as attributes No violation of 1st normal form for many-many relationship like RDBMS One => IDREF, Many => IDREFS 33

  34. Exercise: Citation “Lee” publishes two papers “p1” and “p2” which appear in venues “X” and “Y” in 2006, respectively, and attend only “Y”. “p2” is co-authored by “John” who attends “X”. <Author AID=‘1’ Name=‘Lee’ Title=‘Prof.’/> <Author AID=‘2’ Name=‘John’ Title=‘Prof.’/> <Paper PID=‘p1’ Area=‘DB’/> <Paper PID=‘p2’ Area=‘DB’/> <Venue Vname=‘X’ Year=‘2006’ … /> <Venue Vname=‘Y’ Year=‘2006’ … /> 34

  35. Exercise: Citation “Lee” publishes two papers “p1” and “p2” which appear in venues “X” and “Y” in 2006, respectively, and attend only “Y”. “p2” is co-authored by “John” who attends “X”. <Author AID=‘1’ Name=‘Lee’ Title=‘Prof.’/> <Author AID=‘2’ Name=‘John’ Title=‘Prof.’/> <Paper PID=‘p1’ Area=‘DB’ Vname=‘X’ /> <Paper PID=‘p2’ Area=‘DB’ Vname=‘Y’ /> <Venue Vname=‘X’ Year=‘2006’ … /> <Venue Vname=‘Y’ Year=‘2006’ … /> 35

  36. Exercise: Citation “Lee” publishes two papers “p1” and “p2” which appear in venues “X” and “Y” in 2006, respectively, and attend only “Y”. “p2” is co-authored by “John” who attends “X”. <Author AID=‘1’ Name=‘Lee’ Title=‘Prof.’ Publish=‘p1 p2’ Attend=‘Y’ /> <Author AID=‘2’ Name=‘John’ Title=‘Prof.’ Publish=‘p2’ Attend=‘X’ /> <Paper PID=‘p1’ Area=‘DB’ Vname=‘X’ /> <Paper PID=‘p2’ Area=‘DB’ Vname=‘Y’ /> <Venue Vname=‘X’ Year=‘2006’ … /> <Venue Vname=‘Y’ Year=‘2006’ … /> 36

  37. Exercise: Citation <Dummy> <Author AID=‘1’ Name=‘Lee’ Title=‘Prof.’ Publish=‘p1 p2’ Attend=‘Y’ /> <Author AID=‘2’ Name=‘John’ Title=‘Prof.’ Publish=‘p2’ Attend=‘X’ /> <Paper PID=‘p1’ Area=‘DB’ Vname=‘X’ /> <Paper PID=‘p2’ Area=‘DB’ Vname=‘Y’ /> <Venue Vname=‘X’ Year=‘2006’ … /> <Venue Vname=‘Y’ Year=‘2006’ … /> </Dummy> 37

  38. Exercise: Citation <!ELEMENT Dummy (Author*|Paper*|Venue*)> <!ELEMENT Author EMPTY> <!ATTLIST Author AID ID #REQUIRED Name CDATA #IMPLIED Title CDATA #IMPLIED PublishIDREFS#IMPLIED AttendIDREFS#IMPLIED> <!ELEMENT Paper EMPTY> <!ATTLIST Paper PID ID #REQUIRED Area CDATA #IMPLIED Title CDATA #IMPLIED VnameIDREF#REQUIRED> <!ELEMENT Venue EMPTY> <!ATTLIST Venue Vname ID #REQUIRED Year CDATA #IMPLIED City CDATA #IMPLIED> 38

  39. XML Schema • New XML schema language from W3C • Successor of DTD • Unlike DTD, XML Schema is in XML syntax • http://www.w3.org/XML/Schema <xsd:complexType name="PurchaseOrderType"> <xsd:sequence> <xsd:element name="shipTo" type="USAddress"/> <xsd:element name="billTo" type="USAddress"/> <xsd:element ref="comment" minOccurs="0"/> <xsd:element name="items" type="Items"/> </xsd:sequence> <xsd:attribute name="orderDate" type="xsd:date"/> </xsd:complexType>

  40. XML Schema vs. DTD: What’s New • XML Schemas are extensible to future additions • XML Schema V 1.0  1.1  … • XML Schemas are richer and more powerful than DTDs • XML Schemas are written in XML • No <!ELEMENT …> or <!ATTLIST ..> notation • XML Schemas support data types • XML Schemas support namespaces

  41. New: Data Types • XML Schema support data types. Easier to: • Describe allowable document content • Validate the correctness of data • Work with data from a database • Define data facets (restrictions on data) • Define data patterns (data formats) • Convert data between different data types • Eg, <date type="date">2010-09-11</date> • Ensures a mutual understanding of the content • The XML data type "date" requires the format “YYYY-MM-DD”

  42. New: in XML Notation • XML Schema uses XML notation • <> and </> • XML Schema file itself IS an XML file, too • No need to learn a new language • No need to use new tools • Use an XML editor to edit XML Schema files • Use XML parser to parse XML Schema files • Manipulate an XML Schema using DOM • Transform an XML Schema with XSLT

  43. New: Extensibility • XML Schema is extensible because XML is extensible • XML Schema lets you: • Reuse your schema in other schemas • Create your own data types derived from the standard types  Inheritance • Reference multiple schemas in the same document

  44. Well-Formed: Not Enough • Well-Formed: a document conforms to XML syntax rules such as: • Begin with XML decl. • One unique root • Case-sensitive • Matching Start / End tags • Properly nested • Well-formed documents can still contain semantic errors or inconsistencies •  Need VALID documents according to schema

  45. note.xml <?xml version="1.0"?> // Reference to schema goes here <note> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>

  46. note.dtd <!ELEMENT note (to, from, heading, body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)>

  47. note.xml with Reference to DTD • <?xml version="1.0"?> • <!DOCTYPE note SYSTEM"http://www.w3schools.com/dtd/note.dtd"> • <note> • <to>Tove</to> • <from>Jani</from> • <heading>Reminder</heading> • <body>Don't forget me this weekend!</body> • </note>

  48. note.xsd <?xml version="1.0"?> <xs:schema xmlns:xs= “http://www.w3.org/2001/XMLSchema” targetNamespace= “http://www.w3schools.com” xmlns= “http://www.w3schools.com” elementFormDefault= "qualified"> <xs:element name="note"> <xs:complexType> <xs:sequence> <xs:element name="to" type="xs:string"/> <xs:element name="from" type="xs:string"/> <xs:element name="heading" type="xs:string"/> <xs:element name="body" type="xs:string"/> </xs:sequence> </xs:complexType> </xs:element> </xs:schema>

  49. <schema> element <?xml version="1.0"?> <xs:schema xmlns:xs = “http://www.w3.org/2001/XMLSchema” targetNamespace = “http://www.w3schools.com” xmlns = “http://www.w3schools.com” elementFormDefault= "qualified"> . . . </xs:schema> • <schema> element is the root element of every XML Schema

  50. <schema> element <?xml version="1.0"?> <xs:schema xmlns:xs = “http://www.w3.org/2001/XMLSchema” targetNamespace = “http://www.w3schools.com” xmlns = “http://www.w3schools.com” elementFormDefault= "qualified"> . . . </xs:schema> • Elements & data types in this schema file come from http://www.w3.org/2001/XMLSchemanamespace • They are to be prefixed with “xs:”

More Related