XML Lecture 2 Specifying the structure of an XML file: DTD & XML Schema Monica Farrow email : M.Farrow@hw.ac.uk DTD & XML Schema
Topics – specifying the structure • Problem – how to specify the structure • Using a Document Type Definition • Using XML Schema • Simple nested example • Use of namespaces • Extra constraints using simple types • Named simple and complex types • Specifying links between elements using keys and keyrefs (08/03/11)
Example: An Address Book
<?xml version="1.0" encoding="UTF-8"?>
<people>                                <!-- top-level element -->
  <person ssn="4444">                   <!-- element containing sub-elements and an attribute -->
    <title> Mr </title>                 <!-- optional element -->
    <name> Homer Simpson </name>        <!-- required element -->
    <tel> 2543 </tel>                   <!-- at least one tel number -->
    <tel> 2544 </tel>
    <email> homer@math.springfield.edu </email>   <!-- up to 2 emails, optional -->
  </person>
  <!-- . . . any number of person elements -->
</people>
Reminder of XML syntax • An XML file consists of elements • Each element has a tagname, and its content is enclosed by start and end tags. Tagnames should be chosen to describe the data. • E.g. <name>Lisa Simpson</name> • Elements can contain data or sub-elements. • A person element, for example, contains title, name, tel and email sub-elements • Elements can have attributes, which provide additional information • E.g. <person ssn="123 4589">
Defining the structure of an XML file • We can check if an XML file is well-formed (i.e. correct use of XML syntax) • By looking at it, maybe • By loading it into a browser • If well-formed, it will be displayed • However, how can we check that the well-formed file contains the correct elements in the correct quantities? • We need to write a specification for the XML file
Defining the structure of an XML file • There are 2 main alternatives • Document Type Definitions (DTDs) • Original and relatively simple • XML Schema • More versatile and complex • We will look at both • Concentrating on XML Schema
DTD - Specifying the Structure • In a Document Type Definition, we can specify the permitted content for each element, using regular expressions. • A regular expression describes the ‘pattern’ of a string in a concise and flexible way. • Regular expressions are very powerful. This module just uses simple examples.
What’s in a person Element? • For a person element, the regular expression is • title?, name, tel+, email* • This means • title? = there is an optional title element (i.e., 0 or 1 title elements) • name = there must be a name element • title?, name = the optional title element, if present, comes before the name element • tel+ = there are 1 or more tel elements • email* = there are 0 or more email elements
The DTD for the address file • The DTD shown on the next slide specifies that: • The top level element is called people • The people element consists of 1 or more person elements • The person element consists of title, name, tel and email elements, with the constraints mentioned in the previous slide • The title, name, tel and email elements each consist of character data • The person element has a required attribute consisting of character data, called ssn
DTD For the Address Book
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE people [
  <!ELEMENT people (person+)>
  <!ELEMENT person (title?, name, tel+, email*)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT tel (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!ATTLIST person ssn CDATA #REQUIRED>
]>
PCDATA means parsed character data
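As a rough sketch, the DTD and some data can be combined in a single file that a validating parser could check. The optional title is listed first in the person content model here, so that the address book data shown earlier fits it. The second person and its values are invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE people [
  <!ELEMENT people (person+)>
  <!ELEMENT person (title?, name, tel+, email*)>
  <!ELEMENT title  (#PCDATA)>
  <!ELEMENT name   (#PCDATA)>
  <!ELEMENT tel    (#PCDATA)>
  <!ELEMENT email  (#PCDATA)>
  <!ATTLIST person ssn CDATA #REQUIRED>
]>
<people>
  <!-- Every optional part is present: a title, two tel elements, one email -->
  <person ssn="4444">
    <title>Mr</title>
    <name>Homer Simpson</name>
    <tel>2543</tel>
    <tel>2544</tel>
    <email>homer@math.springfield.edu</email>
  </person>
  <!-- Minimal valid person: no title, one tel, no email (invented data) -->
  <person ssn="5555">
    <name>Lisa Simpson</name>
    <tel>2545</tel>
  </person>
</people>
```

Removing the tel element from the second person, or dropping its ssn attribute, would leave the file well-formed but no longer valid against this DTD.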
DTD Problems • DTDs are rather weak specifications by DB & programming-language standards • Some limitations: • Only one base type for element content – PCDATA • No easy way of specifying constraints, e.g. a range of values or an exact number of occurrences • Not easily parsed (since they are not XML) • DTDs are now being superseded by XML Schema.
XML Schema • XML Schemas are more precise, and therefore more complicated, than DTDs • They were designed to replace DTDs, but DTDs are very well established, and simpler • http://www.w3schools.com/schema
XML Schema features • XML schemas provide the following features • They are written using XML syntax • So they can be parsed and validated with standard XML tools • XML Schema specific tags are used, such as xs:element to describe an element in the document. • xs: is a prefix associated with a namespace (explained soon) • Data types other than #PCDATA • There are some basic built-in simple types such as xs:string, xs:decimal, xs:integer, xs:ID • You can also define your own types.
XML Schema features contd • There is greater control over the permitted constructs • Maximum and minimum occurrences of each element can be easily specified. • The default value of minOccurs and maxOccurs is 1 • A set of permitted values can be specified • Regular expressions can be used to set patterns to be matched • Modularity and inheritance are supported • Types can be named and referred to
Example of specification for one element • The title element is specified like this: <xs:element name="title" type="xs:string" minOccurs="0"/> • The xs:element tag is used to specify elements. • The tagname is specified using the name attribute • The type is specified using the type attribute and the built-in XML schema type for strings, xs:string • The maximum occurrence is not specified, so the default value 1 will be used. • xs:element is an empty element here (i.e. no data or sub-elements), so it is closed using a forward slash (/>) instead of an end tag.
Complex elements • Elements can be simple, as in the title element shown on the previous slide • Elements can be more complex, such as the person element shown on the next slide. • A person element has a complex type. It consists of • a sequence of other elements (title, name, tel and email) • A required attribute ssn. Separate slide on attributes.
person element example • Details of the person element
<xs:element name="person" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="title" type="xs:string" minOccurs="0"/>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="tel" type="xs:string" maxOccurs="unbounded"/>
      <xs:element name="email" type="xs:string" minOccurs="0" maxOccurs="2"/>
    </xs:sequence>
    <xs:attribute name="ssn" type="xs:positiveInteger" use="required"/>
  </xs:complexType>
</xs:element>
More on attribute specification • For optional attributes • omit use="required" • Attributes can have a default value • see http://www.w3schools.com/schema • There are special XML Schema types for id and idref attributes • xs:ID, xs:IDREF • However, using xs:key and xs:keyref is now considered better – discussed at end.
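A minimal sketch of the first two points, using a hypothetical member element whose attribute names (other than ssn) are invented for illustration: a required attribute, an optional one, and one with a default value.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- 'member', 'nickname' and 'country' are hypothetical names -->
  <xs:element name="member">
    <xs:complexType>
      <!-- Required: must appear on every member element -->
      <xs:attribute name="ssn" type="xs:positiveInteger" use="required"/>
      <!-- Optional: no use attribute is given, so it may be left out -->
      <xs:attribute name="nickname" type="xs:string"/>
      <!-- Default: treated as "UK" by a schema-aware processor when absent -->
      <xs:attribute name="country" type="xs:string" default="UK"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With this schema, <member ssn="4444"/> would be accepted, while <member nickname="Bart"/> would be rejected because ssn is missing.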
Complete XML Schema for the addresses • The complete XML Schema for our XML file of addresses is supplied in a separate file, and the outer section is shown on the next slide. • The first 2 lines are standard. • The top-level element is called people. It is a complex type consisting of a sequence of person elements • There will be at least one person element (default value of minOccurs), and there is no upper limit to the number of person elements
Simple Schema Example – see addresses1.xsd
<?xml version="1.0" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="people">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="person" maxOccurs="unbounded">
          <!-- details of the person element should be here -->
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
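Putting the outer section and the person details together gives something like the following. This is a sketch assembled from the slides, not a copy of addresses1.xsd, which may differ. In particular, tel is given maxOccurs="unbounded" here on the assumption that several telephone numbers are allowed, as in the address book example.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="people">
    <xs:complexType>
      <xs:sequence>
        <!-- minOccurs defaults to 1, so at least one person is required -->
        <xs:element name="person" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <!-- 0 or 1 title -->
              <xs:element name="title" type="xs:string" minOccurs="0"/>
              <!-- exactly 1 name (both defaults apply) -->
              <xs:element name="name" type="xs:string"/>
              <!-- 1 or more tel (assumption, see above) -->
              <xs:element name="tel" type="xs:string" maxOccurs="unbounded"/>
              <!-- 0, 1 or 2 email -->
              <xs:element name="email" type="xs:string" minOccurs="0" maxOccurs="2"/>
            </xs:sequence>
            <!-- attributes are declared after the sequence -->
            <xs:attribute name="ssn" type="xs:positiveInteger" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The occurrence settings line up with the DTD operators: minOccurs="0" plays the role of ?, maxOccurs="unbounded" the role of +, and the two together the role of *. An upper limit such as maxOccurs="2" has no simple DTD equivalent.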
Referring to a schema • Save your schema in a file with the extension xsd. • The schema is linked to our own XML document by declaring the xsi namespace (xmlns:xsi) on the root element of the document and naming the schema file in the xsi:noNamespaceSchemaLocation attribute: <people xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="addresses1.xsd"> • In this example • The specification file, addresses1.xsd, is in the same directory as our xml file • The attribute xsi:noNamespaceSchemaLocation is used because our xml file does not have its own namespace
The xml file
<?xml version="1.0" encoding="UTF-8"?>
<people xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="addresses1.xsd">
  <person ssn="4444">
    <title> Mr </title>
    <name> Homer Simpson </name>
    <tel> 2543 </tel>
    <tel> 2544 </tel>
    <email> homer@math.springfield.edu </email>
  </person>
  <!-- . . . any number of person elements -->
</people>
Exercise 1 • Write a sample data xml file for the schema on the following slide. • Don’t worry about the first 2 standard lines • Include 2 owner elements and demonstrate variations in occurrence.
<xs:element name="dogs">
  <xs:complexType>
    <xs:sequence>
      <xs:element maxOccurs="unbounded" name="owner">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="ownername" type="xs:string" />
            <xs:element name="ownertel" type="xs:string" />
            <xs:element maxOccurs="unbounded" name="dog">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="breed" type="xs:string" />
                  <xs:element name="dogname" type="xs:string" />
                  <xs:element minOccurs="0" name="age" type="xs:nonNegativeInteger" />
                  <xs:element maxOccurs="2" name="colour" type="xs:string" />
                </xs:sequence>
                <xs:attribute name="id" type="xs:unsignedByte" use="required" />
              </xs:complexType>
..... closing tags omitted (no space!)
Tools • Tools to generate basic XML Schema file. For simplest clearest files, use: • Visual Studio (in Windows labs) is best • Online http://www.freeformatter.com/xsd-generator.html • Beware of others – see next slide! • XML aware text editor, to edit the basic schema to include simple types and alter max/min occurrences and data types if necessary. • Validators, as referred to in later slide
Using Tools - warning • BEWARE that many tools automatically create schemas from your xml file but.... • They may insert extra unnecessary or incorrect details. E.g. • msdata Ordinal type – not used in this course • xs:choice, implying a choice rather than a sequence • Enumerated lists assuming that your data will never change • Complex types where it is not necessary
Validating • Validator • http://www.utilities-online.info/xsdvalidation/#.UQcFuWfG18E • insert xsd and xml file • Validate separately and against each other • http://www.corefiling.com/opensource/schemaValidate.html • Upload xsd and xml file • Others also on the web
Namespaces • You’ll see namespaces when using XML schemas and stylesheets. • There is a namespace associated with the tags used that lets them be used unambiguously. • e.g. a schema element or a chemical element? • An html table or furniture in a room? • A namespace is identified by • A short prefix e.g. xs (as used in xsd files) • A unique URL
Namespace declaration • So at the start of a document we must specify what namespaces we are using. • In the schema example, we are using the XML schema namespace which we identify with the xs prefix • We declare this namespace in an attribute in the top-level element <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
Namespace declaration continued • We then use the xs prefix in all the XML Schema elements and built-in types e.g. complexType, sequence, element, string etc • You may see alternative prefixes used e.g. xsd • We could use a namespace for our own XML file, and this is recommended if it will be widely used. • Use your company/project URL, and invent a suitable prefix. • Or use multiple namespaces, your own and/or those defined by others. • It is not necessary in this course.
Example of using own namespace
In XML schema:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.org/pp"
           targetNamespace="http://example.org/pp"
           elementFormDefault="qualified" attributeFormDefault="qualified">
  <xs:element name="people">
  ...etc
In XML file:
<?xml version="1.0" encoding="UTF-8"?>
<pp:people
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:pp="http://example.org/pp"
    xsi:schemaLocation="http://example.org/pp addresses1.xsd">
  <pp:person pp:ssn="4444">
    <pp:title> Mr </pp:title>
    ...etc
Constraints on data 1 • You can also constrain the values of the data in • a range • <xs:minInclusive value="0"/> <xs:maxInclusive value="120"/> • a length (also minLength, maxLength) • <xs:length value="3"/> • a pattern • <xs:pattern value="([a-z])*"/> • Means 0 or more lowercase alphabetic chars
Constraints on data 2 • an enumerated list • <xs:enumeration value="Audi"/> <xs:enumeration value="Golf"/> <xs:enumeration value="BMW"/> • These constraints should be applied to a basic type, as shown on the next slide. • See http://www.w3.org/TR/xmlschema-2/ for which constraints can be applied to which basic types
Declaring your own types – see addresses2.xsd • Named types can be used for elements or attributes. Here’s an example which specifies constraints on the attribute • A named type is declared
<xs:simpleType name="ssstype">
  <xs:restriction base="xs:positiveInteger">
    <xs:maxInclusive value="999999"/>
  </xs:restriction>
</xs:simpleType>
• And then used as the attribute type
<xs:attribute name="ssn" type="ssstype" use="required"/>
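The other constraints from the two previous slides can be wrapped in named simple types in the same way. The sketch below is illustrative only: the type names (ageType, codeType, carType) and the elements that use them are invented, and the base types are assumptions.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Range: whole numbers from 0 to 120 inclusive -->
  <xs:simpleType name="ageType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="120"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Length plus pattern: exactly three lowercase letters -->
  <xs:simpleType name="codeType">
    <xs:restriction base="xs:string">
      <xs:length value="3"/>
      <xs:pattern value="[a-z]*"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Enumeration: only the listed values are accepted -->
  <xs:simpleType name="carType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Audi"/>
      <xs:enumeration value="Golf"/>
      <xs:enumeration value="BMW"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Hypothetical elements that use the named types -->
  <xs:element name="age" type="ageType"/>
  <xs:element name="code" type="codeType"/>
  <xs:element name="car" type="carType"/>
</xs:schema>
```

Several constraints inside one restriction must all hold at once, which is why codeType accepts abc but rejects both ab and ABC.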
Exercise 2 • Alter the schema given in the lecture notes by creating a simple type for the tel element. Specify that there must be between 1 and 4 telephone numbers, each of which must be in the range 1000 – 9999
Homes with shared contact details
<homelist>
  <homes>
    <home id="1">
      <hname>Rose Cottage</hname>
      <location>Inverness</location>
      <cID>C1</cID>                  <!-- keyref -->
      <cID>C2</cID>
    </home>
    <!-- . . . More homes -->
  </homes>
  <contacts>
    <contactdetails cID="C1">        <!-- key -->
      <cname>John Smith</cname>
      <phone>0131 123 1234</phone>
    </contactdetails>
    <!-- . . . More contact details -->
  </contacts>
</homelist>
Store full contact details in separate ‘contactdetails’ elements, just refer to them in the ‘home’ element. Declare in schema which are the keys and which are the keyrefs (next slide)
Keys and keyrefs • In XML schema, keys and keyrefs can be defined (instead of using ids and idrefs of DTDs, not discussed). • A key is any unique non-null element or attribute • A keyref is another element or attribute which contains the value of one of these keys, and can be used to refer to it. • For example, each holiday home has some people who are contacts. The contact details are stored separately from the homes, to avoid duplication.
Defining the key • Give the key a name • contactKey • Use selector to define which element has a key • the contact details in the list of contacts • Use field to define which element or attribute is the key • the cID attribute • The selector path is relative to the element in whose declaration the key appears, here homelist
<xs:key name="contactKey">
  <xs:selector xpath="contacts/contactdetails"/>
  <xs:field xpath="@cID"/>
</xs:key>
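Defining the keyref • Give the keyref a name • contactRef • Specify which key it refers to • contactKey • Use selector to define which elements hold a reference • each cID element inside a home element • Use field to define which element or attribute is the reference • the cID element itself, written as "." • A field may select at most one value per selected element, so a home with several cID elements needs the selector to reach each cID individually • The selector path is relative to the element in whose declaration the keyref appears, here homelist
<xs:keyref name="contactRef" refer="contactKey">
  <xs:selector xpath="homes/home/cID"/>
  <xs:field xpath="."/>
</xs:keyref>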
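To show where the key and keyref sit in a schema, here is a sketch of a nested schema for the homelist example. It is not the HolidayHomes1.xsd listing: the element types (xs:string, and xs:positiveInteger for id) are assumptions. The identity constraints are placed inside the homelist element declaration, after its complex type, so their selector paths start from homelist. Each cID element is selected individually, so that a home with several contacts is handled.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="homelist">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="homes">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="home" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="hname" type="xs:string"/>
                    <xs:element name="location" type="xs:string"/>
                    <!-- one or more references to contacts -->
                    <xs:element name="cID" type="xs:string" maxOccurs="unbounded"/>
                  </xs:sequence>
                  <xs:attribute name="id" type="xs:positiveInteger" use="required"/>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="contacts">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="contactdetails" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="cname" type="xs:string"/>
                    <xs:element name="phone" type="xs:string"/>
                  </xs:sequence>
                  <xs:attribute name="cID" type="xs:string" use="required"/>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <!-- Key: every contactdetails element has a unique cID attribute -->
    <xs:key name="contactKey">
      <xs:selector xpath="contacts/contactdetails"/>
      <xs:field xpath="@cID"/>
    </xs:key>
    <!-- Keyref: every cID element in a home must match some contactKey value -->
    <xs:keyref name="contactRef" refer="contactKey">
      <xs:selector xpath="homes/home/cID"/>
      <xs:field xpath="."/>
    </xs:keyref>
  </xs:element>
</xs:schema>
```

Against this sketch, a home that lists <cID>C9</cID> with no contactdetails element carrying cID="C9" would be reported as invalid, as would two contactdetails elements sharing the same cID.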
More designs of schemas • addresses1.xsd shows a totally nested schema. • addresses2.xsd shows one named type declared first, the main schema still nested • To make the schema easier to maintain: • Declare all the simple elements first and then refer to them in the body of the document • Name the declaration of simple and complex types, which can then be used later in the document, more than once if necessary • http://www.w3schools.com/Schema/schema_example.asp • http://www.xfront.com/GlobalVersusLocal.html
Fully named types • The HolidayHomes1.xsd shows a schema with several named types. • Look at the full listing. It looks a bit like this.... • A homelist has some keys and keyrefs, • and is a complex type with a sequence: • a ‘homes’ element which is a sequence of ‘home’ elements defined by ‘homeType’. • a ‘contacts’ element which is a sequence of ‘contactDetail’ elements defined by ‘contactDetailsType’. • The cID attribute is also defined by a simple type
Fully named types • The HolidayHomes2.xsd shows a similar schema but all elements are named types. • Look at the full listing.
XML Schema summary • XML Schema provides a flexible way of describing the structure and the constraints on the values of an XML document • This course introduces the basics. • There is more, of course. For example, schemas can be extended and multiple documents used.
XML: Summary • XML lets you choose application specific element names and define special purpose document types. • A document type definition or schema is needed to define the permitted markup. • What can we do with our valid document? – next 2 lectures