1 / 50

Towards a Knowledge Society

XML. Towards a Knowledge Society. Why & How?. What information can we see…. WWW2002 The eleventh international world wide web conference Sheraton waikiki hotel Honolulu, hawaii, USA 7-11 may 2002 1 location 5 days learn interact Registered participants coming from

jeanne
Download Presentation

Towards a Knowledge Society

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. XML Towards a Knowledge Society Why & How?

  2. What information can we see… WWW2002 The eleventh international world wide web conference Sheraton waikiki hotel Honolulu, hawaii, USA 7-11 may 2002 1 location 5 days learn interact Registered participants coming from australia, canada, chile denmark, france, germany, ghana, hong kong, india, ireland, italy, japan, malta, new zealand, the netherlands, norway, singapore, switzerland, the united kingdom, the united states, vietnam, zaire Register now On the 7th May Honolulu will provide the backdrop of the eleventh international world wide web conference. This prestigious event … Speakers confirmed Tim berners-lee Tim is the well known inventor of the Web, … Ian Foster Ian is the pioneer of the Grid, the next generation internet …

  3. What information can a machine see… WWW2002 The eleventh international world wide web conference Sheraton waikiki hotel Honolulu, hawaii, USA 7-11 may 2002 1 location 5 days learn interact Registered participants coming from australia, canada, chile denmark, france, germany, ghana, hong kong, india, ireland, italy, japan, malta, new zealand, the netherlands, norway, singapore, switzerland, the united kingdom, the united states, vietnam, zaire Register now On the 7th May Honolulu will provide the backdrop of the eleventh international world wide web conference. This prestigious event … Speakers confirmed Tim berners-lee Tim is the well known inventor of the Web, … Ian Foster Ian is the pioneer of the Grid, the next generation internet …

  4. Solution: markup with “meaningful” tags? <name>WWW2002 The eleventh international world wide webcon</name> <location>Sheraton waikiki hotel Honolulu, hawaii, USA</location> <date>7-11 may 2002</date> <slogan>1 location 5 days learn interact</slogan> <participants>Registered participants coming from australia, canada, chile denmark, new zealand, the netherlands, norway, singapore, switzerland, the united kingdom, the united states, vietnam, zaire</participants> <introduction>Register now On the 7th May Honolulu will provide the backdrop of the eleventh prestigious Speakers confirmed</introduction> <speaker>Tim berners-lee</speaker> <bio>Tim is the well known inventor of the Web,</bio>…

  5. Structured Web Documents in XML XML, a language that lets one write structured Web documents with a user-defined vocabulary

  6. Web page which contains information about a particular book in html <h2>Nonmonotonic Reasoning: Context-Dependent Reasoning</h2> <i>by <b>V. Marek</b> and <b>M. Truszczynski</b></i><br> Springer 1993<br> ISBN 0387976892

  7. A typical representation in xml <book> <title> Nonmonotonic Reasoning: Context-Dependent Reasoning </title> <author>V. Marek</author> <author>M. Truszczynski</author> <publisher>Springer</publisher> <year>1993</year> <ISBN>0387976892</ISBN> </book>

  8. Imagine an intelligent agent trying to retrieve the authors of the particular book • From html • From xml

  9. XML allows to represent information that is also machine-accessible. • XML separates content from use and presentation.

  10. Another Example <h2>Relationship matter-energy</h2> <i> E = M × c2 </i> <equation> <meaning>Relationship matter-energy</meaning> <leftside> E </leftside> <rightside> M × c2 </rightside> </equation>

  11. XML is a meta-language: it does not have a fixed set of tags, but allows users to define tags of their own. • applications on the WWW must agree on common vocabularies if they need to communicate and collaborate • Communities and business sectors are in the process of defining their specialized vocabularies, creating XML applications

  12. mathematics (MathML) • bioinformatics (BSML) • human resources (HRML) • astronomy (AML) • news (NewsML) • investment (IRML) • SBML (System Biology) • Bioinformatic Sequence Markup Language (BSML) • MicroArray and Gene Expression Markup Language (MAGE-ML)

  13. Chemical Markup Language • Molecular Dynamics [Markup] Language (MoDL) • StarDOM - Transforming Scientific Data into XML • Bioinformatic Sequence Markup Language (BSML) • BIOpolymer Markup Language (BIOML) • CellML • Gene Expression Markup Language (GEML) • GeneX Gene Expression Markup Language (GeneXML) • Genome Annotation Markup Elements (GAME) • Microarray Markup Language (MAML) • XML for Multiple Sequence Alignments (MSAML) • Systems Biology Markup Language (SBML) • OMG Gene Expression RFP • Protein Extensible Markup Language (PROXIML)

  14. The XML Language An XML document consists of • a prolog • a number of elements • and attributes

  15. Prolog of an XML Document The prolog consists of • an XML declaration and • an optional reference to external structuring documents <?xml version="1.0" encoding="UTF-16"?> <!DOCTYPE book SYSTEM "book.dtd">

  16. XML Elements • The “things” the XML document talks about • E.g. books, authors, publishers • An element consists of: • an opening tag • the content • a closing tag <lecturer>David Billington</lecturer>

  17. XML Elements (continue) • Tag names can be chosen almost freely. • The first character must be a letter, an underscore, or a colon • No name may begin with the string “xml” in any combination of cases • E.g. “Xml”, “xML”

  18. Content of XML Elements • Content may be text, or other elements, or nothing <lecturer> <name>David Billington</name> <phone> +61 − 7 − 3875 507 </phone> </lecturer> • If there is no content, then the element is called empty; it is abbreviated as follows: <lecturer/> for <lecturer></lecturer>

  19. XML Attributes • An empty element is not necessarily meaningless • It may have some properties in terms of attributes • An attribute is a name-value pair inside the opening tag of an element <lecturer name="David Billington" phone="+61 − 7 − 3875 507“ />

  20. XML Attributes: An Example <order orderNo="23456" customer="John Smith" date="October 15, 2002“> <item itemNo="a528" quantity="1"/> <item itemNo="c817" quantity="3"/> </order>

  21. The Same Example without Attributes <order> <orderNo>23456</orderNo> <customer>John Smith</customer> <date>October 15, 2002</date> <item> <itemNo>a528</itemNo> <quantity>1</quantity> </item> <item> <itemNo>c817</itemNo> <quantity>3</quantity> </item> </order>

  22. XML Elements vs Attributes • Attributes can be replaced by elements • When to use elements and when attributes is a matter of taste and need • But attributes cannot be nested

  23. Further Components of XML Docs • Comments • A piece of text that is to be ignored by parser • <!-- This is a comment --> • Processing Instructions (PIs) • Define procedural attachments • <?stylesheet type="text/css" href="mystyle.css"?>

  24. Well-Formed XML Documents • Syntactically correct documents • Some syntactic rules: • Only one outermost element (called root element) • Each element contains an opening and a corresponding closing tag • Tags may not overlap • <author><name>Lee Hong</author></name> • Attributes within an element have unique names • Element and tag names must be permissible

  25. The Tree Model of XML Documents: An Example <email> <head> <from name="Michael Maher" address="michaelmaher@cs.gu.edu.au"/> <to name="Grigoris Antoniou" address="grigoris@cs.unibremen.de"/> <subject>Where is your draft?</subject> </head> <body> Grigoris, where is the draft of the paper you promised me last week? </body> </email>

  26. The Tree Model of XML Documents: An Example

  27. The Tree Model of XML Docs • The tree representation of an XML document is an ordered labeled tree: • There is exactly one root • There are no cycles • Each non-root node has exactly one parent • Each node has a label. • The order of elements is important • … but the order of attributes is not important

  28. XML is not enough to ensure valid data structure! • Any XML document which conforms to the XML syntax (such as every tag must have a corresponding closing tag is considered) to be well-formed • However, this does not mean that all the structure of the data is what you wanted. For instance you may want to enforce: • That a particular data field is present for each child • Data fields in each child appear in the same order • That a data field may not be present more than once in a child node • How machines know about structure they process?

  29. Issues • Validation and Interoperability • How application can verify whether the data you receive from the outside world? • Is xml document follows the specified structure? • Is xml document follows the restrictions on the elements and attributes? • How all the xml document follows that particular structure?

  30. Structuring XML Documents • An XML document is valid if • it is well-formed • respects the structuring information it uses • There are two ways of defining the structure of XML documents: • DTDs (the older and more restricted way) • XML Schema (offers extended possibilities)

  31. 2nd Lecture

  32. DTD: Document Type Definition • DTD specifies grammar rules for an XML document enforcing the structure • This allows several XML documents prepared from various sources can be validated using a single set of grammar rules • An XML document that adheres to a DTD is called valid. • DTD specifies rules for elements (child nodes) and how it can be expanded into sub elements (child nodes) • DTD consists - Element declarations, Attribute list, Data types etc. • DTDs : difficult to create! CCTM: Course material developed by James King (james.king@londonmet.ac.uk)

  33. DTD: Element Type Definition <lecturer> <name>David Billington</name> <phone> +61 − 7 − 3875 507 </phone> </lecturer> DTD for above element (and all lecturer elements): <!ELEMENT lecturer (name,phone)> <!ELEMENT name (#PCDATA)> <!ELEMENT phone (#PCDATA)>

  34. The Meaning of the DTD • The element types lecturer, name, and phone may be used in the document • A lecturer element contains a name element and a phone element, in that order (sequence) • A name element and a phone element may have any content • In DTDs, #PCDATA is the only atomic type for elements

  35. DTD: Disjunction in Element Type Definitions • We express that a lecturer element contains either a name element or a phone element as follows: <!ELEMENT lecturer (name|phone)> • A lecturer element contains a name element and a phone element in any order. <!ELEMENT lecturer((name,phone)|(phone,name))>

  36. Example of an XML Element <order orderNo="23456" customer="John Smith" date="October 15, 2002"> <item itemNo="a528" quantity="1"/> <item itemNo="c817" quantity="3"/> </order>

  37. The Corresponding DTD <!ELEMENT order (item+)> <!ATTLIST order orderNo ID #REQUIRED customer CDATA #REQUIRED date CDATA #REQUIRED> <!ELEMENT item EMPTY> <!ATTLIST item itemNo ID #REQUIRED quantity CDATA #REQUIRED comments CDATA #IMPLIED>

  38. Comments on the DTD • The item element type is defined to be empty • + (after item) is a cardinality operator: • ?: appears zero times or once • *: appears zero or more times • +: appears one or more times • No cardinality operator means exactly once

  39. Comments on the DTD (cont.) • In addition to defining elements, we define attributes • This is done in an attribute list containing: • Name of the element type to which the list applies • A list of triplets of attribute name, attribute type, and value type • Attribute name: A name that may be used in an XML document using a DTD

  40. DTD: Attribute Types • Similar to predefined data types, but limited selection • The most important types are • CDATA, a string (sequence of characters) • ID, a name that is unique across the entire XML document • IDREF, a reference to another element with an ID attribute carrying the same value as the IDREF attribute • IDREFS, a series of IDREFs • (v1| . . . |vn), an enumeration of all possible values • Limitations: no dates, number ranges etc.

  41. DTD: Attribute Value Types • #REQUIRED • Attribute must appear in every occurrence of the element type in the XML document • #IMPLIED • The appearance of the attribute is optional • #FIXED "value" • Every element must have this attribute • "value" • This specifies the default value for the attribute

  42. Referencing with IDREF and IDREFS <!ELEMENT family (person*)> <!ELEMENT person (name)> <!ELEMENT name (#PCDATA)> <!ATTLIST person id ID #REQUIRED mother IDREF #IMPLIED father IDREF #IMPLIED children IDREFS #IMPLIED>

  43. An XML Document Respecting the DTD <family> <person id="bob" mother="mary" father="peter"> <name>Bob Marley</name> </person> <person id="bridget" mother="mary"> <name>Bridget Jones</name> </person> <person id="mary" children="bob bridget"> <name>Mary Poppins</name> </person> <person id="peter" children="bob"> <name>Peter Marley</name> </person> </family>

  44. A DTD for an Email Element <!ELEMENT email (head,body)> <!ELEMENT head (from,to+,cc*,subject)> <!ELEMENT from EMPTY> <!ATTLIST from name CDATA #IMPLIED address CDATA #REQUIRED> <!ELEMENT to EMPTY> <!ATTLIST to name CDATA #IMPLIED address CDATA #REQUIRED>

  45. A DTD for an Email Element (cont..) <!ELEMENT cc EMPTY> <!ATTLIST cc name CDATA #IMPLIED address CDATA #REQUIRED> <!ELEMENT subject (#PCDATA)> <!ELEMENT body (text,attachment*)> <!ELEMENT text (#PCDATA)> <!ELEMENT attachment EMPTY> <!ATTLIST attachment encoding (mime|binhex) "mime" file CDATA #REQUIRED>

  46. Interesting Parts of the DTD • A head element contains (in that order): • a from element • at least one to element • zero or more cc elements • a subject element • In from, to, and cc elements • the name attribute is not required • the address attribute is always required

  47. Interesting Parts of the DTD (cont..) • A body element contains • a text element • possibly followed by a number of attachment elements • The encoding attribute of an attachment element must have either the value “mime” or “binhex” • “mime” is the default value

  48. Remarks on DTDs • Recursive definitions possible in DTDs • <!ELEMENT bintree ((bintree root bintree)|emptytree)>

  49. DTD Problems • Attribute Value • It is possible to specify all the values an attribute can have or allow it to have any value. • It is not possible to perform type checking on an attribute’s value so you cannot specify an attributes value is a integer or float… • Does not follow xml syntax

  50. Next Lecture • XML Schema

More Related