
ACE104 Lecture 2

ACE104 Lecture 2. XML, Simple XML Schema. XML in messaging. Most modern languages have a method of representing structured data. Typical flow of events in an application: read data (file, db, socket); unmarshal into objects; manipulate in program; marshal back out (file, db, socket).

gavin

Presentation Transcript


  1. ACE104 Lecture 2: XML, Simple XML Schema

  2. XML in messaging
     • Most modern languages have a method of representing structured data.
     • Typical flow of events in an application:
         • Read data (file, db, socket)
         • Unmarshal into objects
         • Manipulate in program
         • Marshal back out (file, db, socket)
     • Many language-specific technologies reduce these steps: RMI, object serialization in any language, CORBA (actually somewhat language neutral), MPI, etc.
     • XML provides a very appealing alternative that hits the sweet spot for many applications.
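The read, convert, manipulate, convert-back flow can be sketched in Python with the standard library. This is only an illustration: the `Student` dataclass and the helper names `student_from_xml` and `student_to_xml` are assumptions made for this sketch, not part of the lecture.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Student:
    name: str
    ssn: str
    age: int
    gpa: float


def student_from_xml(xml_text: str) -> Student:
    """Convert serialized XML text into an in-memory Student object."""
    root = ET.fromstring(xml_text)
    return Student(
        name=root.findtext("name"),
        ssn=root.findtext("ssn"),
        age=int(root.findtext("age")),
        gpa=float(root.findtext("gpa")),
    )


def student_to_xml(student: Student) -> str:
    """Convert a Student object back into XML text."""
    root = ET.Element("Student")
    for field in ("name", "ssn", "age", "gpa"):
        ET.SubElement(root, field).text = str(getattr(student, field))
    return ET.tostring(root, encoding="unicode")


# In a real application this text would come from a file, database, or socket.
incoming = (
    "<Student><name>Andrew</name><ssn>123-45-6789</ssn>"
    "<age>39</age><gpa>2.0</gpa></Student>"
)
student = student_from_xml(incoming)  # read data and build the object
student.age += 1                      # manipulate in program
outgoing = student_to_xml(student)    # serialize again for writing back out
```

Because the wire format is plain XML text, the program on the other end of the file or socket could just as well be written in Java, C, or Fortran.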

  3. User-defined types in programming languages
     • One view of XML is as a text-based, programming-language-neutral way of representing structured information. Compare:

     Fortran:
       type Student
         character(len=:), allocatable :: name
         character(len=:), allocatable :: ssn
         integer :: age
         real :: gpa
       end type Student

     Java:
       class Student {
         public String name;
         public String ssn;
         public int age;
         public float gpa;
       }

     C:
       struct Student {
         char* name;
         char* ssn;
         int age;
         float gpa;
       };

  4. Sample XML Schema
     • In XML, a common kind of datatype description is called an XML Schema.
     • DTD and RELAX NG are other common alternatives.
     • The example below uses XML Schema just for illustration purposes.
     • Note that the schema itself is written in XML.

     <?xml version="1.0" encoding="UTF-8"?>
     <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
                elementFormDefault="qualified" attributeFormDefault="unqualified">
       <xs:element name="student">
         <xs:complexType>
           <xs:sequence>
             <xs:element name="name" type="xs:string"/>
             <xs:element name="ssn" type="xs:string"/>
             <xs:element name="age" type="xs:integer"/>
             <xs:element name="gpa" type="xs:decimal"/>
           </xs:sequence>
         </xs:complexType>
       </xs:element>
     </xs:schema>

     Slide callout: "Ignore this for now"

  5. Alternative schema
     • In this example studentType is defined separately rather than anonymously.

     <xs:schema>
       <xs:element name="student" type="studentType"/>
       <xs:complexType name="studentType">
         <xs:sequence>
           <xs:element name="name" type="xs:string"/>
           <xs:element name="ssn" type="xs:string"/>
           <xs:element name="age" type="xs:integer"/>
           <xs:element name="gpa" type="xs:decimal"/>
         </xs:sequence>
       </xs:complexType>
     </xs:schema>

     Slide callout: new type defined separately

  6. Alternative: DTD
     • Can also use a DTD (Document Type Definition), which is much simpler than a schema but also much less powerful (notice the lack of types).

     <!DOCTYPE Student [
       <!-- Each XML file is stored in a document whose name is the same as the root node -->
       <!ELEMENT Student (name,ssn,age,gpa)>
       <!-- Student has four child elements -->
       <!ELEMENT name (#PCDATA)>
       <!-- name is parsed character data -->
       <!ELEMENT ssn (#PCDATA)>
       <!ELEMENT age (#PCDATA)>
       <!ELEMENT gpa (#PCDATA)>
     ]>

  7. Another alternative: RELAX NG • Gaining in popularity • Can be very simple to write and at the same time has many more features than DTD • Still much less common than XML Schema

  8. Creating instances of types
     In programming languages, we instantiate objects:

     C:
       struct Student s1, s2;
       s1.name = "Andrew";
       s1.ssn = "123-45-6789";

     Java:
       Student s1 = new Student();
       s1.name = "Andrew";
       s1.ssn = "123-45-6789";
       ...

     Fortran:
       type(Student) :: s1
       s1%name = 'Andrew'
       ...

  9. Creating XML documents
     • XML is not a programming language!
     • In XML we make a Student “object” in an XML file (Student.xml):

     <Student>
       <name>Andrew</name>
       <ssn>123-45-6789</ssn>
       <age>39</age>
       <gpa>2.0</gpa>
     </Student>

     • Think of this as being like a serialized object.

  10. XML and Schema • Note that there are two parts to what we did • Defining the “structure” layout • Defining an “instance” of the structure • The first is done with an appropriate Schema or DTD. • The second is the XML part • Both can go in the same file, or an XML file can refer to an external Schema or DTD (typical) • From this point on we use only Schema • Exercise 1

  11. ? • Question: What can we do with such a file? • Some answers: • Write corresponding Schema to define its content • Write XSL transformation to display • Parse into a programming language

  12. Exercise 1

  13. Exercise 1 Solution

      <?xml version="1.0" encoding="UTF-8"?>
      <cars>
        <car>
          <make>dodge</make>
          <model>ram</model>
          <color>red</color>
          <year>2004</year>
          <mileage>22000</mileage>
        </car>
        <car>
          <make>Ford</make>
          <model>Pinto</model>
          <color>white</color>
          <year>1980</year>
          <mileage>100000</mileage>
        </car>
      </cars>
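One of the answers given earlier to "what can we do with such a file?" was to parse it into a programming language. A minimal sketch of that for the cars document, using Python's standard `xml.etree.ElementTree` module; the variable names and the choice of a list of dictionaries are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

CARS_XML = """<cars>
  <car>
    <make>dodge</make><model>ram</model><color>red</color>
    <year>2004</year><mileage>22000</mileage>
  </car>
  <car>
    <make>Ford</make><model>Pinto</model><color>white</color>
    <year>1980</year><mileage>100000</mileage>
  </car>
</cars>"""

root = ET.fromstring(CARS_XML)

# One dictionary per <car>, keyed by child element name.
cars = [{field.tag: field.text for field in car} for car in root.findall("car")]

# XML carries only text, so numeric fields must be converted explicitly.
total_mileage = sum(int(car["mileage"]) for car in cars)
oldest = min(cars, key=lambda car: int(car["year"]))
```

Note that without a schema the parser has no idea that `year` and `mileage` are numbers; that knowledge lives in the program.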

  14. Some sample XML documents

  15. Order / Whitespace
      • Note that element order is significant, but the whitespace used to lay out the elements generally is not. However the document below is indented or line-wrapped, the parser sees the same element structure:

      <Article>
        <Headline>Direct Marketer Offended by Term 'Junk Mail' </Headline>
        <authors>
          <author> Joe Garden</author>
          <author> Tim Harrod</author>
        </authors>
        <abstract>Dan Spengler, CEO of the direct-mail-marketing firm Mailbox of
          Savings, took umbrage Monday at the use of the term <it>junk mail</it>
        </abstract>
        <body type="url"> http://www.theonion.com/archive/3-11-01.html </body>
      </Article>
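A small sketch of the order and whitespace point, using Python's standard library. The parser does hand layout whitespace through as text, so this sketch strips it before comparing; the helper name `shape` is an assumption made for the illustration.

```python
import xml.etree.ElementTree as ET

compact = "<authors><author>Joe Garden</author><author>Tim Harrod</author></authors>"
indented = """<authors>
    <author> Joe Garden</author>
    <author> Tim Harrod</author>
</authors>"""
reordered = "<authors><author>Tim Harrod</author><author>Joe Garden</author></authors>"


def shape(xml_text):
    """Return (tag, stripped text) pairs in document order."""
    root = ET.fromstring(xml_text)
    return [(el.tag, (el.text or "").strip()) for el in root.iter()]


same_despite_layout = shape(compact) == shape(indented)     # layout differs only
same_despite_reorder = shape(compact) == shape(reordered)   # element order differs
```

Here `same_despite_layout` is true and `same_despite_reorder` is false: once layout whitespace is ignored, only the change in element order makes the documents differ.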

  16. Molecule Example
      XML is extremely useful for standardizing data sharing within specialized domains. Below is a part of the Chemical Markup Language describing a water molecule and its constituents:

      <?xml version="1.0"?>
      <CML>
        <MOL TITLE="Water">
          <ATOMS>
            <ARRAY BUILTIN="ELSYM">H O H</ARRAY>
          </ATOMS>
          <BONDS>
            <ARRAY BUILTIN="ATID1">1 2</ARRAY>
            <ARRAY BUILTIN="ATID2">2 3</ARRAY>
            <ARRAY BUILTIN="ORDER">1 1</ARRAY>
          </BONDS>
        </MOL>
      </CML>

  17. Rooms example
      A typical example showing a few more XML features:

      <?xml version="1.0"?>
      <rooms>
        <room name="Red">
          <capacity>10</capacity>
          <equipmentList>
            <equipment>Projector</equipment>
          </equipmentList>
        </room>
        <room name="Green">
          <capacity>5</capacity>
          <equipmentList />
          <features>
            <feature>No Roof</feature>
          </features>
        </room>
      </rooms>

  18. Suggestion • Try building each of those documents in an XML builder tool (XMLSpy, Oxygen, etc.) or at least an XML-aware editor. • Note: it is not required to create a schema to do this. Just create a new XML document and start building.

  19. Dissecting an XML Document

  20. Things that can appear in an XML document
      • Elements: simple, complex, empty, or mixed content model; attributes.
      • The XML declaration
      • Processing instructions (PIs): <? …?>
          • Most common is <?xml-stylesheet …?>
          • <?xml-stylesheet type="text/css" href="mys.css"?>
      • Comments: <!-- comment text -->
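To see these pieces as a parser reports them, here is a sketch using Python's standard `xml.dom.minidom`, which keeps top-level processing instructions and comments as nodes. The sample document and the `KIND` lookup table are assumptions made for this illustration.

```python
from xml.dom import Node, minidom

DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="mys.css"?>
<!-- comment text -->
<rooms><room name="Red"/></rooms>"""

# Human-readable names for the node types of interest.
KIND = {
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
    Node.COMMENT_NODE: "comment",
    Node.ELEMENT_NODE: "element",
}

document = minidom.parseString(DOC)
kinds = [KIND.get(node.nodeType, "other") for node in document.childNodes]

stylesheet_pi = document.childNodes[0]
root = document.documentElement
room_name = root.getElementsByTagName("room")[0].getAttribute("name")
```

The XML declaration does not show up in `kinds`: it is not a processing instruction, even though it looks like one.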

  21. Parts of an XML document

      <?xml version="1.0"?>
      <CML>
        <MOL TITLE="Water">
          <ATOMS>
            <ARRAY BUILTIN="ELSYM">H O H</ARRAY>
          </ATOMS>
          <BONDS>
            <ARRAY BUILTIN="ATID1">1 2</ARRAY>
            <ARRAY BUILTIN="ATID2">2 3</ARRAY>
            <ARRAY BUILTIN="ORDER">1 1</ARRAY>
          </BONDS>
        </MOL>
      </CML>

      Parts labeled on the slide: declaration; tags (begin tags, end tags); attributes; attribute values.
      An XML element is everything from (including) the element's start tag to (including) the element's end tag.

  22. XML and Trees
      • Tags give the structure of a document. They divide the document up into elements, starting at the topmost element, the root element. The stuff inside an element is its content; content can include other elements along with ‘character data’.

      CML (root element)
        MOL
          ATOMS
            ARRAY: H O H
          BONDS
            ARRAY: 1 2
            ARRAY: 2 3
            ARRAY: 1 1

      The leaf values are the character data (labeled "CDATA sections" on the slide).

  23. XML and Trees

      <?xml version="1.0"?>
      <CML>
        <MOL TITLE="Water">
          <ATOMS>
            <ARRAY BUILTIN="ELSYM">H O H</ARRAY>
          </ATOMS>
          <BONDS>
            <ARRAY BUILTIN="ATID1">1 2</ARRAY>
            <ARRAY BUILTIN="ATID2">2 3</ARRAY>
            <ARRAY BUILTIN="ORDER">1 1</ARRAY>
          </BONDS>
        </MOL>
      </CML>

      CML (root element)
        MOL
          ATOMS
            ARRAY: H O H
          BONDS
            ARRAY: 1 2
            ARRAY: 2 3
            ARRAY: 1 1

      The leaf values are labeled "Data sections" on the slide.

  24. XML and Trees

      rooms
        room
          capacity: 10
          equipmentList
            equipment: Projector
        room
          capacity: 5
          equipmentList
          features
            feature: No Roof
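The tree view can be produced mechanically. The following sketch walks the rooms document recursively with Python's standard `xml.etree.ElementTree` and prints an indented outline; the function name `outline` is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

ROOMS_XML = """<rooms>
  <room name="Red">
    <capacity>10</capacity>
    <equipmentList><equipment>Projector</equipment></equipmentList>
  </room>
  <room name="Green">
    <capacity>5</capacity>
    <equipmentList />
    <features><feature>No Roof</feature></features>
  </room>
</rooms>"""


def outline(element, depth=0):
    """Return one indented line per element, depth-first in document order."""
    text = (element.text or "").strip()
    label = element.tag + (": " + text if text else "")
    lines = ["  " * depth + label]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(ROOMS_XML))
print("\n".join(tree_lines))
```

Each level of nesting in the XML becomes one level of indentation, and leaf elements carry their character data.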

  25. More detail on elements

  26. Element relationships
      • Book is the root element.
      • Title, prod, and chapter are child elements of book.
      • Book is the parent element of title, prod, and chapter.
      • Title, prod, and chapter are siblings (or sister elements) because they have the same parent.

      <book>
        <title>My First XML</title>
        <prod id="33-657" media="paper"></prod>
        <chapter>Introduction to XML
          <para>What is HTML</para>
          <para>What is XML</para>
        </chapter>
        <chapter>XML Syntax
          <para>Elements must have a closing tag</para>
          <para>Elements must be properly nested</para>
        </chapter>
      </book>
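These relationships can be queried from a parsed document. A minimal sketch with Python's standard `xml.etree.ElementTree`; since that library stores no parent pointer, the sketch builds a child-to-parent map first. The names `parent_of` and `siblings` are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<book>
  <title>My First XML</title>
  <prod id="33-657" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>"""

root = ET.fromstring(BOOK_XML)

# Children: iterating an element yields its direct child elements.
children = [child.tag for child in root]

# Parents: map every element to the element that contains it.
parent_of = {child: parent for parent in root.iter() for child in parent}

# Siblings: the other children of the same parent.
title = root.find("title")
siblings = [el.tag for el in parent_of[title] if el is not title]
```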

  27. Well formed XML

  28. Well-formed vs Valid • An XML document is said to be well-formed if it obeys basic semantic and syntactic constraints. • This is different from a valid XML document, which (as we will see in more depth) properly matches a schema.

  29. Rules for Well-Formed XML
      • An XML document is considered well-formed if it obeys the following rules:
      • There must be one element that contains all others (the root element).
      • All tags must be balanced:
          <BOOK>...</BOOK>
          <BOOK />
      • Tags must be nested properly:
          <BOOK> <LINE> This is OK </LINE> </BOOK>
          <LINE> <BOOK> This is </LINE> definitely NOT </BOOK> OK
      • Element names are case-sensitive, so this is not well-formed:
          <P>This is not ok, even though we do it all the time in HTML!</p>

  30. More Rules for Well-Formed XML
      • Attribute values in a tag must be in quotes:
          <ITEM CATEGORY="Home and Garden" Name="hoe-matic t500">
      • Comments are allowed:
          <!-- They are done just as in HTML… -->
      • A document should begin with an XML declaration (recommended, though optional in XML 1.0):
          <?xml version='1.0' ?>
      • Special characters must be escaped; the most common are < " ' > &
          <formula> x &lt; y+2x </formula>
          <cd title="&quot; mmusic">
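Any conforming XML parser enforces the well-formedness rules, so a simple way to test them is to try parsing. The sketch below uses Python's standard library; the helper `is_well_formed` and the small test strings are assumptions made for this illustration, adapted from the examples in the rules.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


checks = {
    "balanced": is_well_formed("<BOOK><LINE>This is OK</LINE></BOOK>"),
    "empty_element": is_well_formed("<BOOK />"),
    "bad_nesting": is_well_formed("<LINE><BOOK>This is</LINE> definitely NOT</BOOK>"),
    "case_mismatch": is_well_formed("<P>not ok</p>"),
    "unquoted_attribute": is_well_formed("<ITEM CATEGORY=Home />"),
    "two_roots": is_well_formed("<A/><B/>"),
    "unescaped_less_than": is_well_formed("<formula> x < y+2x </formula>"),
}

# escape() replaces &, < and > ; extra characters can be mapped explicitly.
formula = "<formula>" + escape(" x < y+2x ") + "</formula>"
title_value = escape('" mmusic', {'"': "&quot;"})
```

Only the first two entries in `checks` come out true; each of the others breaks one of the rules listed above.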

  31. Naming Rules • Naming rules for XML elements • Names may contain letters, numbers, and other characters • Names must not start with a number or punctuation character • Names must not start with the letters xml (or XML or Xml ..) • Names cannot contain spaces • Any name can be used, no words are reserved, but the idea is to make names descriptive. Names with an underscore separator are typical • Examples: <first_name>, <date_of_birth>, etc.

  32. XML Tools • XML can be created with any text editor • Normally we use an XML-friendly editor • e.g. XMLSpy • nXML emacs extensions • MSXML on Windows • Oxygen • Etc etc. • To check and validate XML, use either these tools and/or xmllint on Unix systems.

  33. Another View • XML-as-data is one way to introduce XML • Another is as a markup language similar to html. • One typically says that html has a fixed tag set, whereas XML allows the definition of arbitrary tags • This analogy is particularly useful when the goal is to use XML for text presentation -- that is, when most of our data fields contain text • Note that mixed element/text fields are permissible in XML

  34. Article example

      <Article>
        <Headline>Direct Marketer Offended by Term 'Junk Mail' </Headline>
        <authors>
          <author> Joe Garden</author>
          <author> Tim Harrod</author>
        </authors>
        <abstract>Dan Spengler, CEO of the direct-mail-marketing firm Mailbox of
          Savings, took umbrage Monday at the use of the term <it>junk mail</it>.
        </abstract>
        <body type="url"> http://www.theonion.com/archive/3-11-01.html </body>
      </Article>

  35. More uses of XML • There is more! • A very popular use of XML is as a base syntax for programming languages (the elements become program control structures) • XSLT, BPEL, ant, etc. are good examples • XML is ubiquitous, and a deep understanding of it is needed to be efficient and productive • Many other current and potential uses -- up to the creativity of the programmer

  36. XML Schema • There are many details to cover of schema specification. It is extremely rich, flexible, and somewhat complex • We will do this in detail next lecture • Now we begin with a brief introduction

  37. XML Schema • XML itself does not restrict what elements exist in a document. • In a given application, you want to fix a vocabulary -- what elements make sense, what their types are, etc. • Use a Schema to define an XML dialect • MusicXML, ChemXML, VoiceXML, ADXML, etc. • Restrict documents to those tags. • A Schema can be used to validate a document -- i.e., to see if it obeys the rules of the dialect.

  38. Schema determine … • What sort of elements can appear in the document. • What elements MUST appear • Which elements can appear as part of another element • What attributes can appear or must appear • What kind of values can/must be in an attribute.

  39. Reverse engineering a schema from a sample document
      • We start with a sample XML document and reverse engineer a schema as a simple example.

      <?xml version="1.0" encoding="UTF-8"?>
      <library>
        <book id="b0836217462" available="true">
          <isbn> 0836217462 </isbn>
          <title lang="en"> Being a Dog is a Full-Time Job </title>
          <author id="CMS">
            <name> Charles Schulz </name>
            <born> 1922-11-26 </born>
            <dead> 2000-02-12 </dead>
          </author>
          <character id="PP">
            <name> Peppermint Patty </name>
            <born> 1966-08-22 </born>
            <qualification> bold,brash, and tomboyish </qualification>
          </character>
          <character id="Snoopy">
            <name> Snoopy</name>
            <born>1950-10-04</born>
            <qualification>extroverted beagle</qualification>
          </character>
          <character id="Schroeder">
            <name>Schroeder</name>
            <born>1951-05-30</born>
            <qualification>brought classical music to the Peanuts Strip</qualification>
          </character>
          <character id="Lucy">
            <name>Lucy</name>
            <born>1952-03-03</born>
            <qualification>bossy, crabby, and selfish</qualification>
          </character>
        </book>
      </library>

      • First identify the elements: author, book, born, character, dead, isbn, library, name, qualification, title
      • Next categorize by content model:
          • Empty: contains nothing
          • Simple: only text nodes
          • Complex: only sub-elements
          • Mixed: text nodes + sub-elements
      • Note: the content model is independent of comments, attributes, or processing instructions!

  40. Content models • Simple content model: name, born, title, dead, isbn, qualification • Complex content model: library, character, book, author

  41. Content Types • We further distinguish between complex and simple content Types: • Simple Type: An element with only text nodes and no child elements or attributes • Complex Type: All other cases • We also say (and require) that all attributes themselves have simple type

  42. Content Types • Simple content type: name, born, dead, isbn, qualification • Complex content type: library, character, book, author, title

  43. Exercise 2 answer • In the previous example <book>: • book has element content, because it contains other elements. • chapter has mixed content because it contains both text and other elements. • para has simple content (or text content) because it contains only text. • prod has empty content, because it carries no information.
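The four content models can be told apart mechanically by looking at whether an element has child elements, non-whitespace text, both, or neither. A sketch with Python's standard library follows; the function name `content_model` is an assumption, and treating whitespace-only text as "no text" is a simplification made for this illustration. As noted on the content-model slide, attributes are ignored.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<book>
  <title>My First XML</title>
  <prod id="33-657" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
</book>"""


def content_model(element):
    """Classify an element as empty, simple, complex, or mixed."""
    has_children = len(element) > 0
    # Text can sit before the first child (.text) or after any child (.tail).
    has_text = bool((element.text or "").strip()) or any(
        (child.tail or "").strip() for child in element
    )
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "complex"   # element content only
    if has_text:
        return "simple"    # text content only
    return "empty"


root = ET.fromstring(BOOK_XML)
models = {el.tag: content_model(el) for el in root.iter()}
```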

  44. Building the schema
      • Schemas are XML documents.
      • They must contain a schema root element, as follows:

      <?xml version="1.0"?>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
                 targetNamespace="http://www.w3schools.com"
                 xmlns="http://www.w3schools.com"
                 elementFormDefault="qualified">
        ...
        ...
      </xs:schema>

      • We will discuss details in a bit -- note that the part highlighted in yellow on the slide can be left out for now.

  45. Flat schema for library
      Start by defining all of the simple types (including attributes):

      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:element name="name" type="xs:string"/>
        <xs:element name="qualification" type="xs:string"/>
        <xs:element name="born" type="xs:date"/>
        <xs:element name="dead" type="xs:date"/>
        <xs:element name="isbn" type="xs:string"/>
        <xs:attribute name="id" type="xs:ID"/>
        <xs:attribute name="available" type="xs:boolean"/>
        <xs:attribute name="lang" type="xs:language"/>
        …/…
      </xs:schema>
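Because a schema is itself an XML document, it can be read with an ordinary XML parser. The sketch below loads the simple declarations with Python's standard `xml.etree.ElementTree` and lists them; it only inspects the schema and does not validate anything. The names `XS`, `elements`, and `attributes` are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree spells namespaced names as "{namespace-uri}localname".
XS = "{http://www.w3.org/2001/XMLSchema}"

FLAT_SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="name" type="xs:string"/>
  <xs:element name="qualification" type="xs:string"/>
  <xs:element name="born" type="xs:date"/>
  <xs:element name="dead" type="xs:date"/>
  <xs:element name="isbn" type="xs:string"/>
  <xs:attribute name="id" type="xs:ID"/>
  <xs:attribute name="available" type="xs:boolean"/>
  <xs:attribute name="lang" type="xs:language"/>
</xs:schema>"""

schema = ET.fromstring(FLAT_SCHEMA)

# Global (top-level) declarations are direct children of xs:schema.
elements = {el.get("name"): el.get("type") for el in schema.findall(XS + "element")}
attributes = {at.get("name"): at.get("type") for at in schema.findall(XS + "attribute")}
```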

  46. Complex types with simple content
      Now to complex types with simple content:

      <title lang="en"> Being a Dog is … </title>

      <xs:element name="title">
        <xs:complexType>
          <xs:simpleContent>
            <xs:extension base="xs:string">
              <xs:attribute ref="lang"/>
            </xs:extension>
          </xs:simpleContent>
        </xs:complexType>
      </xs:element>

      “the element named title has a complex type which is a simple content obtained by extending the predefined datatype xs:string by adding the attribute defined in this schema and having the name lang.”

  47. Complex Types
      All other types are complex types with complex content. For example:

      <xs:element name="library">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="book" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>

      <xs:element name="author">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="name"/>
            <xs:element ref="born"/>
            <xs:element ref="dead" minOccurs="0"/>
          </xs:sequence>
          <xs:attribute ref="id"/>
        </xs:complexType>
      </xs:element>
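To make the meaning of minOccurs and maxOccurs concrete, the sketch below reads the occurrence bounds out of the author declaration and checks child counts in sample instances. It is a hand-rolled illustration, not a schema validator: it ignores element order, types, and attributes, and the helper names `occurrence_bounds` and `counts_ok` are assumptions. Both bounds default to 1 when omitted.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

AUTHOR_DECL = """<xs:element xmlns:xs="http://www.w3.org/2001/XMLSchema" name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>"""


def occurrence_bounds(declaration):
    """Map each referenced child element to (min, max); max None means unbounded."""
    bounds = {}
    for particle in declaration.iter(XS + "element"):
        ref = particle.get("ref")
        if ref is None:  # skip the outer declaration itself
            continue
        low = int(particle.get("minOccurs", "1"))
        high = particle.get("maxOccurs", "1")
        bounds[ref] = (low, None if high == "unbounded" else int(high))
    return bounds


def counts_ok(instance, bounds):
    """Check only how many times each child occurs, not order or type."""
    for tag, (low, high) in bounds.items():
        count = len(instance.findall(tag))
        if count < low or (high is not None and count > high):
            return False
    return True


bounds = occurrence_bounds(ET.fromstring(AUTHOR_DECL))

with_dead = ET.fromstring(
    "<author id='CMS'><name>Charles Schulz</name>"
    "<born>1922-11-26</born><dead>2000-02-12</dead></author>"
)
# Hypothetical instances for illustration only.
without_dead = ET.fromstring("<author><name>A</name><born>1950-01-01</born></author>")
without_born = ET.fromstring("<author><name>A</name></author>")
```

With these bounds, `dead` may be left out (minOccurs="0") but `born` may not.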

  48. Complete flat schema for library

      <?xml version="1.0" encoding="UTF-8"?>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:element name="name" type="xs:string"/>
        <xs:element name="qualification" type="xs:string"/>
        <xs:element name="born" type="xs:date"/>
        <xs:element name="dead" type="xs:date"/>
        <xs:element name="isbn" type="xs:string"/>
        <xs:attribute name="id" type="xs:ID"/>
        <xs:attribute name="available" type="xs:boolean"/>
        <xs:attribute name="lang" type="xs:language"/>
        <xs:element name="title">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string">
                <xs:attribute ref="lang"/>
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
        <xs:element name="library">
          <xs:complexType>
            <xs:sequence>
              <xs:element maxOccurs="unbounded" ref="book"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="author">
          <xs:complexType>
            <xs:sequence>
              <xs:element ref="name"/>
              <xs:element ref="born"/>
              <xs:element ref="dead" minOccurs="0"/>
            </xs:sequence>
            <xs:attribute ref="id"/>
          </xs:complexType>
        </xs:element>
        <xs:element name="book">
          <xs:complexType>
            <xs:sequence>
              <xs:element ref="isbn"/>
              <xs:element ref="title"/>
              <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
              <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
            </xs:sequence>
            <xs:attribute ref="available"/>
            <xs:attribute ref="id"/>
          </xs:complexType>
        </xs:element>
        <xs:element name="character">
          <xs:complexType>
            <xs:sequence>
              <xs:element ref="name"/>
              <xs:element ref="born"/>
              <xs:element ref="qualification"/>
            </xs:sequence>
            <xs:attribute ref="id"/>
          </xs:complexType>
        </xs:element>
      </xs:schema>

  49. Nested schema for library
      Same schema but with everything defined locally!

      <?xml version="1.0" encoding="UTF-8"?>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:element name="library">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="book" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="isbn" type="xs:integer"/>
                    <xs:element name="title">
                      <xs:complexType>
                        <xs:simpleContent>
                          <xs:extension base="xs:string">
                            <xs:attribute name="lang" type="xs:language"/>
                          </xs:extension>
                        </xs:simpleContent>
                      </xs:complexType>
                    </xs:element>
                    <xs:element name="author" minOccurs="0" maxOccurs="unbounded">
                      <xs:complexType>
                        <xs:sequence>
                          <xs:element name="name" type="xs:string"/>
                          <xs:element name="born" type="xs:date"/>
                          <xs:element name="dead" type="xs:date"/>
                        </xs:sequence>
                        <xs:attribute name="id" type="xs:ID"/>
                      </xs:complexType>
                    </xs:element>
                    <xs:element name="character" minOccurs="0" maxOccurs="unbounded">
                      <xs:complexType>
                        <xs:sequence>
                          <xs:element name="name" type="xs:string"/>
                          <xs:element name="born" type="xs:date"/>
                          <xs:element name="qualification" type="xs:string"/>
                        </xs:sequence>
                        <xs:attribute name="id" type="xs:ID"/>
                      </xs:complexType>
                    </xs:element>
                  </xs:sequence>
                  <xs:attribute type="xs:ID" name="id"/>
                  <xs:attribute name="available" type="xs:boolean"/>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:schema>

  50. Next Lecture • Even with this simple example there are many design issues to discuss • When is a flat layout better • When is a nested layout better • What are scoping rules • When to use ref= vs. defining new type • Schema in depth is topic of next lecture
