
XML Lecture 2

XML Lecture 2. Specifying the structure of an XML file: DTD & XML Schema Monica Farrow email : M.Farrow@hw.ac.uk. Topics – specifying the structure. Problem – how to specify the structure Using a Document Type Definition Using XML Schema Simple nested example Use of namespaces



  1. XML Lecture 2 Specifying the structure of an XML file: DTD & XML Schema Monica Farrow email : M.Farrow@hw.ac.uk DTD & XML Schema

  2. Topics – specifying the structure
     • Problem – how to specify the structure
     • Using a Document Type Definition
     • Using XML Schema
       • Simple nested example
       • Use of namespaces
       • Extra constraints using simple types
       • Named simple and complex types
       • Specifying links between elements using keys and keyrefs
     08/03/11

  3. Example: An Address Book
     <?xml version="1.0" encoding="UTF-8"?>
     <people>                                         <!-- top-level element -->
       <person ssn="4444">                            <!-- element containing sub-elements and an attribute -->
         <title> Mr </title>                          <!-- optional element -->
         <name> Homer Simpson </name>                 <!-- required element -->
         <tel> 2543 </tel>                            <!-- at least one tel number -->
         <tel> 2544 </tel>
         <email> homer@math.springfield.edu </email>  <!-- up to 2 emails, optional -->
       </person>
       . . . . .                                      <!-- any number of person elements -->
     </people>

  4. Reminder of XML syntax
     • An XML file consists of elements
     • Each element has a tagname, and is surrounded by start and end tags. Tagnames should be chosen to describe the data.
       • E.g. <name>Lisa Simpson</name>
     • Elements can contain data or sub-elements.
       • A person element, for example, contains title, name, tel and email sub-elements
     • Elements can have attributes, which provide additional information
       • E.g. <person ssn="123 4589">

  5. Defining the structure of an XML file
     • We can check if an XML file is well-formed (i.e. correct use of XML syntax)
       • By looking at it, maybe
       • By loading it into a browser
         • If well-formed, it will be displayed
     • However, how can we check that the well-formed file contains the correct elements in the correct quantities?
       • We need to write a specification for the XML file
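
The gap between "well-formed" and "correct" can be seen in a small sketch. The file below is illustrative only (the shoesize element is invented here): a browser would display it without complaint, yet it is not an acceptable address book.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Well-formed: one root element, every tag closed and properly nested. -->
<!-- Still not a usable address book: the person has no name and no tel,  -->
<!-- and contains an element (shoesize) that nobody planned for.          -->
<people>
  <person ssn="4444">
    <shoesize>12</shoesize>
  </person>
</people>
```

Only a written specification (a DTD or a schema) lets a tool reject this file automatically.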

  6. Defining the structure of an XML file
     • There are 2 main alternatives
       • Document Type Definitions (DTDs)
         • Original and relatively simple
       • XML Schema
         • More versatile and complex
     • We will look at both
       • Concentrating on XML Schema

  7. DTD - Specifying the Structure
     • In a Document Type Definition, we can specify the permitted content for each element, using regular expressions.
     • A regular expression describes the ‘pattern’ of a string in a concise and flexible way.
     • Regular expressions are very powerful. This module just uses simple examples.

  8. What’s in a person Element?
     • For a person element, the regular expression is
       • name, title?, tel+, email*
     • This means
       • name = there must be a name element
       • title? = there is an optional title element (i.e., 0 or 1 title elements)
       • name, title? = the name element is followed by an optional title element
       • tel+ = there are 1 or more tel elements
       • email* = there are 0 or more email elements
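
To make the expression concrete, here is a hedged sketch of three person elements checked against name, title?, tel+, email*. The ssn values and the second email address are invented for illustration. Note that the commas fix the order, so name must come before title.

```xml
<people>
  <!-- matches: title and email left out, one tel -->
  <person ssn="1111">
    <name>Lisa Simpson</name>
    <tel>2543</tel>
  </person>
  <!-- matches: everything present, in the order the expression lists -->
  <person ssn="2222">
    <name>Homer Simpson</name>
    <title>Mr</title>
    <tel>2543</tel>
    <tel>2544</tel>
    <email>homer@math.springfield.edu</email>
  </person>
  <!-- does NOT match: title comes before name, and there is no tel -->
  <person ssn="3333">
    <title>Mrs</title>
    <name>Marge Simpson</name>
    <email>marge@example.org</email>
  </person>
</people>
```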

  9. The DTD for the address file
     • The DTD shown on the next slide specifies that:
       • The top level element is called people
       • The people element consists of 1 or more person elements
       • The person element consists of name, title, tel and email elements, with the constraints mentioned in the previous slide
       • The name, title, tel and email elements each consist of character data
       • The person element has a required attribute consisting of character data, called ssn

  10. DTD For the Address Book
      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE people [
        <!ELEMENT people (person+)>
        <!ELEMENT person (name, title?, tel+, email*)>
        <!ELEMENT name (#PCDATA)>
        <!ELEMENT title (#PCDATA)>
        <!ELEMENT tel (#PCDATA)>
        <!ELEMENT email (#PCDATA)>
        <!ATTLIST person ssn CDATA #REQUIRED>
      ]>
      PCDATA means parsed character data
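
A minimal sketch of a complete file, with the DTD as an internal subset followed by data that satisfies it. This is an illustration rather than one of the lecture files. Two details matter: the keyword in the ATTLIST is written #REQUIRED, and the sample person lists name before title because that is the order the content model demands.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE people [
  <!ELEMENT people (person+)>
  <!ELEMENT person (name, title?, tel+, email*)>
  <!ELEMENT name   (#PCDATA)>
  <!ELEMENT title  (#PCDATA)>
  <!ELEMENT tel    (#PCDATA)>
  <!ELEMENT email  (#PCDATA)>
  <!ATTLIST person ssn CDATA #REQUIRED>
]>
<people>
  <person ssn="4444">
    <name>Homer Simpson</name>
    <title>Mr</title>
    <tel>2543</tel>
    <tel>2544</tel>
    <email>homer@math.springfield.edu</email>
  </person>
</people>
```

If libxml2's xmllint happens to be installed, saving this as addresses.xml (an assumed filename) and running xmllint --valid --noout addresses.xml should report nothing, which means the file is valid.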

  11. DTD Problems
      • DTDs are rather weak specifications by DB & programming-language standards
      • Some limitations:
        • Only one base type for element content – PCDATA
        • No easy way of specifying constraints, e.g. a range of values or an exact number of occurrences
        • Cannot be processed with standard XML tools (since DTD syntax is not itself XML)
      • DTDs are now being superseded by XML Schema.

  12. XML Schema
      • XML Schemas are more precise and therefore more complicated than DTDs
      • They were designed to replace DTDs, but DTDs are very well established, and simpler
      • http://www.w3schools.com/schema

  13. XML Schema features
      • XML schemas provide the following features
      • They are written using XML syntax
        • So they can be parsed and validated with standard XML tools
        • XML Schema specific tags are used, such as xs:element to describe an element in the document.
        • xs: is a prefix associated with a namespace (explained soon)
      • Data types other than #PCDATA
        • There are some basic built-in simple types such as xs:string, xs:decimal, xs:integer, xs:ID
        • You can also define your own types.

  14. XML Schema features contd
      • There is greater control over the permitted constructs
        • Maximum and minimum occurrences of each element can be easily specified.
          • The default value of both minOccurs and maxOccurs is 1
        • A set of permitted values can be specified
        • Regular expressions can be used to set patterns to be matched
      • Modularity and inheritance are supported
        • Types can be named and referred to

  15. Example of specification for one element
      • The title element is specified like this:
          <xs:element name="title" type="xs:string" minOccurs="0" />
      • The xs:element tag is used to specify elements.
      • The tagname is specified using the name attribute
      • The type is specified using the type attribute and the built-in XML Schema type for strings, xs:string
      • The maximum occurrence is not specified, so the default value 1 will be used.
      • xs:element is an empty element here (i.e. no data or sub-elements), so it is closed using a forward slash ( /> ) instead of an end tag.

  16. Complex elements
      • Elements can be simple, as in the title element shown on the previous slide
      • Elements can be more complex, such as the person element shown on the next slide.
      • A person element has a complex type. It consists of
        • a sequence of other elements (title, name, tel and email)
        • a required attribute ssn. Separate slide on attributes.

  17. person element example
      Details of the person element
      <xs:element name="person" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="title" type="xs:string" minOccurs="0" />
            <xs:element name="name" type="xs:string" />
            <xs:element name="tel" type="xs:string" maxOccurs="unbounded" />
            <xs:element name="email" type="xs:string" minOccurs="0" maxOccurs="2" />
          </xs:sequence>
          <xs:attribute name="ssn" type="xs:positiveInteger" use="required" />
        </xs:complexType>
      </xs:element>
      tel needs maxOccurs="unbounded": without it the default maximum of 1 applies, and the address book example has two tel elements.

  18. More on attribute specification
      • For optional attributes
        • omit use="required"
      • Attributes can have a default value
        • see http://www.w3schools.com/schema
      • There are special xsd types for id and idref attributes
        • xs:ID, xs:IDREF
        • However, using xs:key and xs:keyref is now considered better – discussed at end.
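
A hedged sketch of the three common cases side by side. The country and nickname attributes are invented for illustration and do not appear in the lecture files.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
      </xs:sequence>
      <!-- required: every person must carry an ssn -->
      <xs:attribute name="ssn" type="xs:positiveInteger" use="required"/>
      <!-- optional (use omitted), with a default used when the attribute is absent -->
      <xs:attribute name="country" type="xs:string" default="UK"/>
      <!-- optional, no default -->
      <xs:attribute name="nickname" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Attribute declarations come after the xs:sequence inside the complex type, as in the person example.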

  19. Complete XML Schema for the addresses
      • The complete XML Schema for our XML file of addresses is supplied in a separate file, and the outer section is shown on the next slide.
      • The first 2 lines are standard.
      • The top-level element is called people. It is a complex type consisting of a sequence of person elements
      • There will be at least one person element (default value of minOccurs), and there is no upper limit to the number of person elements

  20. Simple Schema Example – see addresses1.xsd
      <?xml version="1.0" ?>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:element name="people">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="person" maxOccurs="unbounded">
                <!-- details of the person element should be here -->
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:schema>
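
Putting the outer section and the person details together gives roughly the following. This is a sketch assembled from the slides, not a copy of the actual addresses1.xsd, and it assumes tel may repeat (maxOccurs="unbounded") because the address book example has two tel elements.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="people">
    <xs:complexType>
      <xs:sequence>
        <!-- one or more person elements -->
        <xs:element name="person" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string" minOccurs="0"/>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="tel" type="xs:string" maxOccurs="unbounded"/>
              <xs:element name="email" type="xs:string" minOccurs="0" maxOccurs="2"/>
            </xs:sequence>
            <xs:attribute name="ssn" type="xs:positiveInteger" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With libxml2's xmllint, if available, a data file (here assumed to be called addresses.xml) can be checked with xmllint --noout --schema addresses1.xsd addresses.xml.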

  21. Referring to a schema
      • Save your schema in a file with the extension xsd.
      • Linking the schema definition with our own XML document is done using two attributes on the root element of the XML document. xmlns:xsi declares the schema-instance namespace, and xsi:noNamespaceSchemaLocation names the schema file:
          <people xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                  xsi:noNamespaceSchemaLocation="addresses1.xsd">
      • In this example
        • The specification file, addresses1.xsd, is in the same directory as our xml file
        • The attribute xsi:noNamespaceSchemaLocation is used because our xml file does not have its own namespace

  22. The xml file
      <?xml version="1.0" encoding="UTF-8"?>
      <people xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:noNamespaceSchemaLocation="addresses1.xsd">
        <person ssn="4444">
          <title> Mr </title>
          <name> Homer Simpson </name>
          <tel> 2543 </tel>
          <tel> 2544 </tel>
          <email> homer@math.springfield.edu </email>
        </person>
        . . . . . any number of person elements
      </people>

  23. Exercise 1
      • Write a sample data xml file for the schema on the following slide.
      • Don’t worry about the first 2 standard lines
      • Include 2 owner elements and demonstrate variations in occurrence.

  24. <xs:element name="dogs">
        <xs:complexType>
          <xs:sequence>
            <xs:element maxOccurs="unbounded" name="owner">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="ownername" type="xs:string" />
                  <xs:element name="ownertel" type="xs:string" />
                  <xs:element maxOccurs="unbounded" name="dog">
                    <xs:complexType>
                      <xs:sequence>
                        <xs:element name="breed" type="xs:string" />
                        <xs:element name="dogname" type="xs:string" />
                        <xs:element minOccurs="0" name="age" type="xs:nonNegativeInteger" />
                        <xs:element maxOccurs="2" name="colour" type="xs:string" />
                      </xs:sequence>
                      <xs:attribute name="id" type="xs:unsignedByte" use="required" />
                    </xs:complexType>
      .....closing tags omitted (no space!)

  25. Tools
      • Tools to generate a basic XML Schema file. For the simplest, clearest files, use:
        • Visual Studio (in Windows labs) is best
        • Online http://www.freeformatter.com/xsd-generator.html
        • Beware of others – see next slide!
      • An XML-aware text editor, to edit the basic schema to include simple types and alter max/min occurrences and data types if necessary.
      • Validators, as referred to in a later slide

  26. Using Tools - warning
      • BEWARE that many tools automatically create schemas from your xml file but....
      • They may insert extra unnecessary or incorrect details. E.g.
        • msdata Ordinal type – not used in this course
        • xs:choice, implying a choice rather than a sequence
        • Enumerated lists assuming that your data will never change
        • Complex types where it is not necessary

  27. Validating
      • Validator
        • http://www.utilities-online.info/xsdvalidation/#.UQcFuWfG18E
          • insert xsd and xml file
          • Validate separately and against each other
        • http://www.corefiling.com/opensource/schemaValidate.html
          • Upload xsd and xml file
        • Others also on the web

  28. Namespaces
      • You’ll see namespaces when using XML schemas and stylesheets.
      • There is a namespace associated with the tags used that lets them be used unambiguously.
        • e.g. a schema element or a chemical element?
        • An html table or furniture in a room?
      • A namespace is identified by
        • A unique URL (strictly a URI), which is the real name of the namespace
        • A short prefix, e.g. xs (as used in xsd files), which stands for that URL within the document
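
The table example can be sketched as follows. The XHTML namespace URL is the real one; the furniture namespace, the room element and the f:material and f:legs elements are invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<room xmlns:h="http://www.w3.org/1999/xhtml"
      xmlns:f="http://example.org/furniture">
  <!-- an HTML table: rows and cells -->
  <h:table>
    <h:tr><h:td>Chairs</h:td><h:td>4</h:td></h:tr>
  </h:table>
  <!-- a piece of furniture that happens to share the tagname -->
  <f:table>
    <f:material>oak</f:material>
    <f:legs>4</f:legs>
  </f:table>
</room>
```

The two table elements have the same local name but belong to different namespaces, so software processing the file can tell them apart.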

  29. Namespace declaration
      • So at the start of a document we must specify what namespaces we are using.
      • In the schema example, we are using the XML Schema namespace, which we identify with the xs prefix
      • We declare this namespace in an attribute in the top-level element
          <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  30. Namespace declaration continued
      • We then use the xs prefix in all the XML Schema elements and built-in types, e.g. complexType, sequence, element, string etc
        • You may see alternative prefixes used, e.g. xsd
      • We could use a namespace for our own XML file, and this is recommended if it will be widely used.
        • Use your company/project URL, and invent a suitable prefix.
        • Or use multiple namespaces, your own and/or those defined by others.
        • It is not necessary in this course.

  31. Example of using own namespace
      • In the XML schema
          <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
                     xmlns="http://example.org/pp"
                     targetNamespace="http://example.org/pp"
                     elementFormDefault="qualified" attributeFormDefault="qualified">
            <xs:element name="people">
            ...etc
      • In the XML file (the pp prefix must be declared with xmlns:pp, and because attributeFormDefault is "qualified" the ssn attribute needs the prefix too)
          <?xml version="1.0" encoding="UTF-8"?>
          <pp:people
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xmlns:pp="http://example.org/pp"
              xsi:schemaLocation="http://example.org/pp addresses1.xsd">
            <pp:person pp:ssn="4444">
              <pp:title> Mr </pp:title>
              ...etc

  32. Constraints on data 1
      • You can also constrain the values of the data in
        • a range
            <xs:minInclusive value="0"/>
            <xs:maxInclusive value="120"/>
        • a length (also minLength, maxLength)
            <xs:length value="3"/>
        • a pattern
            <xs:pattern value="([a-z])*"/>
          • Means 0 or more lowercase alphabetic chars

  33. Constraints on data 2
      • an enumerated list
          <xs:enumeration value="Audi"/>
          <xs:enumeration value="Golf"/>
          <xs:enumeration value="BMW"/>
      • These constraints should be applied to a basic type, as shown on the next slide.
      • See http://www.w3.org/TR/xmlschema-2/ for which constraints can be applied to which basic types

  34. Declaring your own types – see addresses2.xsd
      • Named types can be used for elements or attributes. Here’s an example which specifies constraints on the attribute
      • A named type is declared
          <xs:simpleType name="ssstype">
            <xs:restriction base="xs:positiveInteger">
              <xs:maxInclusive value="999999"/>
            </xs:restriction>
          </xs:simpleType>
      • And then used as the attribute type
          <xs:attribute name="ssn" type="ssstype" use="required"/>
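
The same approach works for elements and for the other kinds of constraint. The sketch below is illustrative only: titleType, initialsType and the initials element are hypothetical names that do not come from addresses2.xsd.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- an enumerated list applied to a string -->
  <xs:simpleType name="titleType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Mr"/>
      <xs:enumeration value="Mrs"/>
      <xs:enumeration value="Ms"/>
      <xs:enumeration value="Dr"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- a pattern: between one and three capital letters -->
  <xs:simpleType name="initialsType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-Z]{1,3}"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- the named types are then used like built-in ones -->
  <xs:element name="title" type="titleType"/>
  <xs:element name="initials" type="initialsType"/>
</xs:schema>
```

One thing to watch: xs:string keeps leading and trailing spaces, so <title> Mr </title> would not match the value "Mr". Basing the restriction on xs:token instead collapses surrounding whitespace before the comparison.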

  35. Exercise 2
      • Alter the schema given in the lecture notes by creating a simple type for the tel element. Specify that there must be between 1 and 4 telephone numbers, each of which must be in the range 1000 – 9999

  36. Homes with shared contact details
      <homelist>
        <homes>
          <home id="1">
            <hname>Rose Cottage</hname>
            <location>Inverness</location>
            <cID>C1</cID>                  <!-- keyref -->
            <cID>C2</cID>
          </home>
          . . . More homes
        </homes>
        <contacts>
          <contactdetails cID="C1">        <!-- key -->
            <cname>John Smith</cname>
            <phone>0131 123 1234</phone>
          </contactdetails>
          . . . More contact details
        </contacts>
      </homelist>
      Store full contact details in separate ‘contactdetails’ elements, just refer to them in the ‘home’ element. Declare in the schema which are the keys and which are the keyrefs (next slide)

  37. Keys and keyrefs
      • In XML Schema, keys and keyrefs can be defined (instead of using the ids and idrefs of DTDs, not discussed).
      • A key is an element or attribute whose value must be present and unique
      • A keyref is another element or attribute which contains the value of one of these keys, and can be used to refer to it.
      • For example, each holiday home has some people who are contacts. The contact details are stored separately from the homes, to avoid duplication.

  38. Defining the key
      • Give the key a name
        • contactKey
      • Use selector to define which element has a key
        • the contact details in the list of contacts
      • Use field to define which element or attribute is the key
        • the cID attribute
      <xs:key name="contactKey">
        <xs:selector xpath="homelist/contacts/contactdetails"/>
        <xs:field xpath="@cID"/>
      </xs:key>

  39. Defining the keyref
      • Give the keyref a name
        • contactRef
      • Specify which key it refers to
        • contactKey
      • Use selector to define which element contains the reference
        • The home element
      • Use field to define which element or attribute is the reference
        • the cID element
      <xs:keyref name="contactRef" refer="contactKey">
        <xs:selector xpath="homelist/homes/home"/>
        <xs:field xpath="cID"/>
      </xs:keyref>
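
A hedged sketch of where these declarations could sit in a whole schema. It is not the lecture's HolidayHomes file, and the element types are assumptions. Two choices differ from the slides and are worth checking against the real listing. First, selector paths are evaluated relative to the element whose declaration holds the key, so with the key placed inside homelist the paths start at contacts and homes. Second, a field may select at most one node per selected element, and a home can have several cID children, so the keyref selects the cID elements themselves and uses "." as the field.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="homelist">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="homes">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="home" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="hname" type="xs:string"/>
                    <xs:element name="location" type="xs:string"/>
                    <xs:element name="cID" type="xs:string" maxOccurs="unbounded"/>
                  </xs:sequence>
                  <xs:attribute name="id" type="xs:positiveInteger" use="required"/>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="contacts">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="contactdetails" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="cname" type="xs:string"/>
                    <xs:element name="phone" type="xs:string"/>
                  </xs:sequence>
                  <xs:attribute name="cID" type="xs:string" use="required"/>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <!-- identity constraints come after the complexType, inside the element -->
    <xs:key name="contactKey">
      <xs:selector xpath="contacts/contactdetails"/>
      <xs:field xpath="@cID"/>
    </xs:key>
    <xs:keyref name="contactRef" refer="contactKey">
      <xs:selector xpath="homes/home/cID"/>
      <xs:field xpath="."/>
    </xs:keyref>
  </xs:element>
</xs:schema>
```

With this in place, a home that mentions a cID with no matching contactdetails element, or two contactdetails elements sharing one cID, should be reported as invalid.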

  40. More designs of schemas
      • addresses1.xsd shows a totally nested schema.
      • addresses2.xsd shows one named type declared first, the main schema still nested
      • To make the schema easier to maintain:
        • Declare all the simple elements first and then refer to them in the body of the document
        • Name the declaration of simple and complex types, which can then be used later in the document, more than once if necessary
      • http://www.w3schools.com/Schema/schema_example.asp
      • http://www.xfront.com/GlobalVersusLocal.html
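
Both maintenance techniques can be sketched on the address book. This is one possible arrangement, not a lecture file, and personType is a hypothetical name.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- simple elements declared once, at the top level -->
  <xs:element name="title" type="xs:string"/>
  <xs:element name="name" type="xs:string"/>
  <xs:element name="tel" type="xs:string"/>
  <xs:element name="email" type="xs:string"/>

  <!-- a named complex type that refers to them with ref -->
  <xs:complexType name="personType">
    <xs:sequence>
      <xs:element ref="title" minOccurs="0"/>
      <xs:element ref="name"/>
      <xs:element ref="tel" maxOccurs="unbounded"/>
      <xs:element ref="email" minOccurs="0" maxOccurs="2"/>
    </xs:sequence>
    <xs:attribute name="ssn" type="xs:positiveInteger" use="required"/>
  </xs:complexType>

  <!-- the body of the schema is now short -->
  <xs:element name="people">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="person" type="personType" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Occurrence limits go on the reference, not on the top-level declaration. A side effect of this design is that any top-level element, not just people, is allowed as the root of a document.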

  41. Fully named types
      • The HolidayHomes1.xsd shows a schema with several named types.
      • Look at the full listing. It looks a bit like this....
      • A homelist has some keys and keyrefs,
      • and is a complex type with a sequence:
        • a ‘homes’ element which is a sequence of ‘home’ elements defined by ‘homeType’.
        • a ‘contacts’ element which is a sequence of ‘contactdetails’ elements defined by ‘contactDetailsType’.
      • The cID attribute is also defined by a simple type

  42. Fully named types
      • The HolidayHomes2.xsd shows a similar schema but all elements are named types.
      • Look at the full listing.

  43. XML Schema summary
      • XML Schema provides a flexible way of describing the structure of an XML document and the constraints on its values
      • This course introduces the basics.
      • There is more, of course. For example, schemas can be extended and a schema can be split across multiple documents.

  44. XML: Summary
      • XML lets you choose application specific element names and define special purpose document types.
      • A document type definition or schema is needed to define the permitted markup.
      • What can we do with our valid document? – next 2 lectures
