Web data exchange formats

derry

  1. Web data exchange formats Introduction and Overview

  2. Web data exchange formats • XML • JSON • YAML

  3. XML outline • What is XML & Why XML • The rules of XML documents • XML schema and validation • XML processing • DOM • SAX • JAXP • JAXB • Digester

  4. Before XML • HTML, Hyper-Text Markup Language, the most successful markup language of all time • First definition, HTML 1.0 – 1992 • Latest version, HTML 4.01 – 1999 • Fixed collection of markup tags • <head>, <body>, <h1>, <br>, etc…

  5. What is XML? • XML, Extensible Markup Language, is a framework for defining markup languages • Created by the World Wide Web Consortium (W3C) to overcome the limitations of HTML • Like HTML, XML is based on SGML - Standard Generalized Markup Language • XML was designed with the Web in mind!

  6. XML design goals • XML shall be straightforwardly usable over the Internet • XML shall support a wide variety of applications • XML shall be compatible with SGML • It shall be easy to write programs which process XML documents • The number of optional features in XML is to be kept to the absolute minimum, ideally zero

  7. XML design goals • XML documents should be human-legible and reasonably clear • The XML design should be prepared quickly • The design of XML shall be formal and concise • XML documents shall be easy to create • Terseness in XML markup is of minimal importance

  8. Typical XML usages • Web development and content management • Data exchange • Data storage • Configuration files • Web services

  9. Historical outline • The development of XML began in the mid-90s • Initial XML draft – November 1996 • XML 1.0, W3C recommendation – February 1998 • XML 1.1 – February 2004

  10. More about XML • XML lets us define our own tags • Each XML language is targeted to a particular application domain • XML specification says nothing about the semantics of the markup tags • XML is internationalized and platform independent

  11. XML specification • Is located at: • XML 1.0: http://www.w3.org/TR/REC-xml/ • XML 1.1: http://www.w3.org/TR/xml11/ • Defines the basic rules for XML documents

  12. Sample XML document <?xml version="1.0" encoding="UTF-8"?> <people> <person id="person_1"> <name>David</name> <surname>Gilmour</surname> </person> <person id="person_2"> <name>Richard</name> <surname>Wright</surname> </person> <person id="person_3"> <name>Nick</name> <surname>Mason</surname> </person> </people>

  13. Examples of XML markups • XHTML • WML - Wireless Markup Language • MathML – Mathematical Markup Language • ebXML - Electronic Business XML • CML - Chemical Markup Language • MusicXML – Musical Scores Markup Language • ThML - Theological Markup Language See more at http://en.wikipedia.org/wiki/List_of_XML_markup_languages

  14. XHTML versus HTML • XHTML 1.0 is W3C’s XMLification of HTML 4.01 • The most notable differences: • HTML allows certain elements to omit the end tag (forbidden in XML) • Element and attribute names must be lowercase • Attribute values in XHTML must be present and they must be surrounded by quotes

  15. XML document rules • The creators of XML decided to enforce document structure from the beginning • The XML specification requires a parser to reject any XML document that doesn't follow the basic rules • A parser is a piece of code that attempts to read a document and interpret its contents

  16. Three kinds of XML documents • Invalid documents • Don't follow the syntax rules defined by XML specification or DTD/schema • Valid documents • Follow both the XML syntax rules and the rules defined in their DTD/schema • Well-formed documents • Follow the XML syntax rules but don't have a DTD/schema

  17. How to check an XML document? An easy way to check whether an XML document is well-formed: • Simply open it in a browser
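
The same check can also be done programmatically. The following is a minimal sketch, assuming the JAXP DOM parser that ships with the JDK; the class name `WellFormedCheck` and the method `isWellFormed` are made up for this illustration. It relies on the rule from slide 15: a parser must reject any document that breaks the basic rules.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of any library.
class WellFormedCheck {

    /** Returns true if the text parses as XML, false if the parser rejects it. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the document: it breaks a syntax rule.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Could not run the parser", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<greeting>Hello, World!</greeting>"));
        System.out.println(isWellFormed("<greeting>Hello</greeting><greeting>Hola</greeting>"));
    }
}
```

The rules on the following slides (single root element, no overlapping elements, required end tags, case sensitivity, quoted attribute values) can each be tried out by passing a small fragment to `isWellFormed`.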

  18. XML main notions • There are three common terms used to describe parts of an XML document: • tags • elements • attributes <people> <person id="person_1"> <name>David</name> <surname>Gilmour</surname> </person> </people>

  19. Rule: The root element An XML document must be contained in a single element <?xml version="1.0"?> <!-- A well-formed document --> <greeting> Hello, World! </greeting> <?xml version="1.0"?> <!-- An invalid document --> <greeting> Hello, World! </greeting> <greeting> Hola, el Mundo! </greeting>

  20. Rule: Elements can't overlap Invalid XML documents: <?xml version="1.0"?> <!-- An invalid document --> <person><name>John Brown</person></name> <?xml version="1.0"?> <!-- An invalid document --> <p> <b>My name is <i>John Brown</b>.</i> </p>

  21. Rule: End tags are required • You can't leave out any end tags • If an element has no content at all it is called an empty element • In empty elements in XML documents, you can put the closing slash in the start tag <!-- NOT legal XML markup --> <p>My name is John Brown <p>I am 25 years old <p>... <!-- Two equivalent break elements --> <br></br> <br />

  22. Rule: Elements are case sensitive In HTML, <h1> and <H1> are the same; in XML, they're not <!-- NOT legal XML markup --> <Person> Elements are case sensitive </person> <!-- legal XML markup --> <person> Elements are case sensitive </person>

  23. Rule: Quoted attribute values There are two rules for attributes in XML documents: • Attributes must have values • Those values must be enclosed within quotation marks (single or double) <!-- NOT legal XML markup --> <ol compact> <!-- legal XML markup --> <ol compact="yes">

  24. XML declarations • Most XML documents start with an XML declaration that provides basic information about the document to the parser • An XML declaration is recommended, but not required <?xml version="1.0" encoding="UTF-8" standalone="no"?>

  25. XML document as a tree • Conceptually, an XML document is a hierarchical structure called an XML tree • Although there is no consensus on the terminology used on XML trees, at least two standard terminologies exist: • XPath Data Model • XML Information Set http://www.ibm.com/developerworks/xml/library/x-hands-on-xsl/

  26. Namespaces • Different XML languages may use the same tags • Namespaces • a solution for a name clashing problem <?xml version="1.0"?> <customer_summary xmlns:addr="http://www.xyz.com/addresses/" xmlns:books="http://www.zyx.com/books/" xmlns:mortgage="http://www.yyz.com/mortage/"> ... <addr:title>Mrs.</addr:title> ... ... <books:title>Lord of the Rings</books:title> ... ... <mortgage:title>NC2948-388-1983</mortgage:title> ...

  27. Namespaces • XML namespaces are similar to Java packages • The string in a namespace definition looks like a URL, but it’s just a string! • For simplicity, unprefixed element names are assigned a default namespace (xmlns="") • Can be overridden using a declaration of the form xmlns="URI"
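
A short sketch of how a namespace-aware parser tells the two `title` elements apart. It assumes the JDK's JAXP DOM parser; the class `NamespaceDemo`, the method `titleIn` and the shortened document are illustrative only, with the namespace strings taken from the slide above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; the document is a shortened version of the slide's example.
class NamespaceDemo {

    private static final String XML =
            "<customer_summary"
          + " xmlns:addr='http://www.xyz.com/addresses/'"
          + " xmlns:books='http://www.zyx.com/books/'>"
          + "<addr:title>Mrs.</addr:title>"
          + "<books:title>Lord of the Rings</books:title>"
          + "</customer_summary>";

    /** Returns the text of the first title element in the given namespace, or null. */
    static String titleIn(String namespaceUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Without this flag the parser treats "addr:title" as a plain name.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            // Lookup is by namespace string plus local name; the prefix is irrelevant.
            NodeList titles = doc.getElementsByTagNameNS(namespaceUri, "title");
            return titles.getLength() == 0 ? null : titles.item(0).getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the sample document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titleIn("http://www.xyz.com/addresses/"));
        System.out.println(titleIn("http://www.zyx.com/books/"));
    }
}
```

The lookup never opens the URL-like string; it is compared character by character, which is what "it's just a string" means in practice.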

  28. Defining document content • The elements of particular XML language have to be defined in some way • A schema is a formal definition of the syntax of an XML-based language • Two main schema languages: • DTD • XML Schema

  29. DTD - Document Type Definition • Built-in schema language since the first XML working draft • DTD is not itself written in XML notation <!-- address.dtd --> <!ELEMENT address (name, street, city, postal-code)> <!ELEMENT name (title?, first-name, last-name)> <!ELEMENT title (#PCDATA)> <!ELEMENT first-name (#PCDATA)> <!ELEMENT last-name (#PCDATA)> <!ELEMENT street (#PCDATA)> <!ELEMENT city (#PCDATA)> <!ELEMENT postal-code (#PCDATA)>

  30. Document Type Declaration • An XML document may contain a reference to a DTD schema: <?xml version="1.1"?> <!DOCTYPE people SYSTEM "http://www.music.com/people.dtd"> • XHTML documents often contain: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

  31. DTD – Element declaration • An element declaration looks as follows: <!ELEMENT element-name content-model> • Content model defines the validity requirements of the contents (the sequence of its immediate child nodes) of all elements of the given name

  32. DTD – Content model Constructs used in content model description:

  33. DTD: example <!ELEMENT people (person+)> <!ELEMENT person (name, surname, birthdate?, address*)> <!ELEMENT name (#PCDATA)> <!ELEMENT surname (#PCDATA)> <!ELEMENT birthdate (#PCDATA)> <!ELEMENT address (#PCDATA)>
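
As a rough illustration of what this DTD enforces, the sketch below places it in an internal DTD subset and asks the JDK's JAXP parser to validate against it. The class `DtdValidation`, the method `validate` and the test documents are assumptions made for the example; only the element declarations come from the slide.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; the DTD is the one from the slide above.
class DtdValidation {

    private static final String DOCTYPE =
            "<!DOCTYPE people ["
          + "<!ELEMENT people (person+)>"
          + "<!ELEMENT person (name, surname, birthdate?, address*)>"
          + "<!ELEMENT name (#PCDATA)>"
          + "<!ELEMENT surname (#PCDATA)>"
          + "<!ELEMENT birthdate (#PCDATA)>"
          + "<!ELEMENT address (#PCDATA)>"
          + "]>";

    /** Returns the validity errors found in the given people element (empty if valid). */
    static List<String> validate(String peopleXml) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // switch on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    // Validity errors are recoverable; collect them and keep parsing.
                    errors.add(e.getMessage());
                }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + peopleXml)));
            return errors;
        } catch (SAXException e) {
            throw new IllegalArgumentException("Document is not well-formed", e);
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Could not run the parser", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(validate(
                "<people><person><name>David</name><surname>Gilmour</surname></person></people>"));
        System.out.println(validate(
                "<people><person><surname>Gilmour</surname></person></people>"));
    }
}
```

The second call reports an error because `person` requires a `name` before `surname`. A well-formedness problem, by contrast, is fatal and surfaces here as an exception rather than a list entry.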

  34. DTD: Attribute-List declarations • An attribute-list declaration looks as follows: <!ATTLIST element-name attribute-definitions> • attribute-definitions is a list, each item of the form: attribute-name attribute-type default-declaration • Default declarations:

  35. DTD: examples <!ELEMENT rectangle EMPTY> <!ATTLIST rectangle length CDATA "0px" width CDATA "0px"> <rectangle width="80px" length="40px"/> <!ELEMENT img EMPTY> <!ATTLIST img alt CDATA #REQUIRED src CDATA #REQUIRED width CDATA #IMPLIED height CDATA #IMPLIED> <img src="xmlj.jpg" alt="XMLJ Image" width="300"/> <!ELEMENT address (#PCDATA)> <!ATTLIST address country CDATA #FIXED "USA"> <address country="USA"> 123 15th St. Troy NY 12180</address>
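
To see what a default value in an attribute-list declaration does, the sketch below parses a `rectangle` element together with the rectangle declarations from this slide and reads an attribute back through DOM. It assumes the JDK's JAXP parser, which reads the internal DTD subset and fills in declared defaults; the class `AttributeDefaults` and the method `attributeOf` are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class; the declarations are the rectangle ones from the slide.
class AttributeDefaults {

    private static final String DOCTYPE =
            "<!DOCTYPE rectangle ["
          + "<!ELEMENT rectangle EMPTY>"
          + "<!ATTLIST rectangle length CDATA '0px' width CDATA '0px'>"
          + "]>";

    /** Parses one rectangle element and returns the value of the named attribute. */
    static String attributeOf(String rectangleXml, String attributeName) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new InputSource(new StringReader(DOCTYPE + rectangleXml)));
            // Attributes omitted in the document carry the default from the ATTLIST.
            return doc.getDocumentElement().getAttribute(attributeName);
        } catch (SAXException e) {
            throw new IllegalArgumentException("Document is not well-formed", e);
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Could not run the parser", e);
        }
    }

    public static void main(String[] args) {
        // length is not written in the document, so the declared default applies.
        System.out.println(attributeOf("<rectangle width='80px'/>", "length"));
    }
}
```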

  36. XML Schema • Shortly after XML 1.0, the W3C initiated the development of the next generation schema language to attack the problems with DTD • Some judicious guiding design principles, that the new schema language should be: • More expressive than XML DTD • Expressed in XML • Self-describing • Simple enough

  37. XML Schema Specification • Published in 2001 • Specification consists of the following parts: • Part 0 - Primer: http://w3.org/TR/xmlschema-0 • Part 1 - Document structures: http://w3.org/TR/xmlschema-1 • Part 2 - Datatypes: http://w3.org/TR/xmlschema-2

  38. XML Schema • Unfortunately, the resulting language does not fulfill the original requirements • It provides good support for namespaces, modularization and datatypes, but: • It is not simple – Part 1 alone is more than 160 pages, and even XML experts do not find it human-readable • It is not fully self-describing – there is a schema for XML Schema, but it doesn’t capture all syntactical aspects of the language

  39. XML Schema advantages Several advantages over DTDs • XML schemas use XML syntax • You can process a schema just like any other document • XML schemas support datatypes • Integers, floating point numbers, dates, times, strings, URLs • XML schemas are extensible • User-defined datatypes, derived datatypes • XML schemas have more expressive power • XML schemas support namespaces

  40. XSD An XML Schema instance is an XML Schema Definition (XSD) and typically has the filename extension ".xsd" <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <xsd:element name="country" type="Country"/> <xsd:complexType name="Country"> <xsd:sequence> <xsd:element name="name" type="xsd:string"/> <xsd:element name="population" type="xsd:decimal"/> </xsd:sequence> </xsd:complexType> </xsd:schema>

  41. Example: people.xsd <?xml version="1.0" encoding="UTF-8"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified"> <xsd:element name="people" type="peopleType"/> <xsd:complexType name="peopleType"> <xsd:sequence maxOccurs="unbounded"> <xsd:element name="person" type="personType"/> </xsd:sequence> </xsd:complexType> <xsd:complexType name="personType"> <xsd:sequence> <xsd:element name="name" type="xsd:string"/> <xsd:element name="surname" type="xsd:string"/> </xsd:sequence> <xsd:attribute name="id" type="xsd:string"/> </xsd:complexType> </xsd:schema>

  42. Declaring XML Schema To declare that people.xml uses the people.xsd schema, add the following: <?xml version="1.0" encoding="UTF-8"?> <!-- schema is located in the same folder --> <people xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="people.xsd"> . . . </people> <?xml version="1.0" encoding="UTF-8"?> <!-- schema location specified as URL --> <people xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation= "http://www.ante.lv/lab01-music-serverside/data/people.xsd"> . . . </people>

  43. XML Schema: Defining elements • To define an element is to define its name and content model (type) • A type can be simple or complex • A simple type cannot contain elements or attributes in its value • A complex type can create the effect of embedding elements in other elements or it can associate attributes with an element

  44. Simple, non-nested elements An element that does not contain attributes or other elements can be defined to be of a • simple type • predefined • user-defined <element name='name' type='string'/> <element name='birthday' type='date'/> <element name='age' type='integer'/> <element name='price' type='decimal'/> http://www.ibm.com/developerworks/xml/library/xml-schema/sidetable2.html

  45. Complex types • Elements with attributes must have a complex type • Elements that embed other elements must have a complex type <complexType name="personType"> <sequence> <element name="name" type="string"/> <element name="surname" type="string"/> </sequence> <attribute name="id" type="string"/> </complexType>

  46. Expressing constraints on elements • XML Schema offers greater flexibility than DTD for expressing constraints on the content model of elements • For example, element occurrence definition: • DTD: * + ? • XML Schema: • maxOccurs • minOccurs <element name='Book'> <complexType> <sequence> <element ref='Title' minOccurs='0'/> <element ref='Author' maxOccurs='2'/> </sequence> </complexType> </element>

  47. XML validation • Online XML validator against XML Schema: http://tools.decisionsoft.com/schemaValidate/ • The Java API also provides a way to make an XML parser validate a document
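
One way the Java route might look, as a sketch rather than the course's own code: the `javax.xml.validation` package from the JDK compiles a schema and validates a document against it. The schema text is people.xsd from slide 41 (attribute quotes switched to single quotes so it fits in a Java string); the class `XsdValidation` and the method `isValid` are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper class; the schema is people.xsd from slide 41.
class XsdValidation {

    private static final String PEOPLE_XSD =
            "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
          + " elementFormDefault='qualified'>"
          + "<xsd:element name='people' type='peopleType'/>"
          + "<xsd:complexType name='peopleType'>"
          + "<xsd:sequence maxOccurs='unbounded'>"
          + "<xsd:element name='person' type='personType'/>"
          + "</xsd:sequence>"
          + "</xsd:complexType>"
          + "<xsd:complexType name='personType'>"
          + "<xsd:sequence>"
          + "<xsd:element name='name' type='xsd:string'/>"
          + "<xsd:element name='surname' type='xsd:string'/>"
          + "</xsd:sequence>"
          + "<xsd:attribute name='id' type='xsd:string'/>"
          + "</xsd:complexType>"
          + "</xsd:schema>";

    /** Returns true if the document is valid against people.xsd. */
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(
                    new StreamSource(new StringReader(PEOPLE_XSD)));
            Validator validator = schema.newValidator();
            try {
                validator.validate(new StreamSource(new StringReader(xml)));
                return true;
            } catch (SAXException e) {
                // With no error handler set, the first violation is thrown.
                return false;
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("Could not set up validation", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
                "<people><person id='person_1'><name>David</name>"
              + "<surname>Gilmour</surname></person></people>"));
    }
}
```

Here the schema is passed to the validator explicitly, so the document does not need the `xsi:noNamespaceSchemaLocation` attribute from slide 42.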

  48. XML processing APIs • The three basic XML parsing interfaces are: • Document Object Model (DOM) • Simple API for XML (SAX) • Streaming API for XML (StAX) • Java API for XML Processing (JAXP) • Provides common interfaces for processing XML documents (using DOM, SAX or StAX) • XML to Java classes binding • Java Architecture for XML Binding (JAXB) • Digester

  49. DOM • The Document Object Model defines a set of interfaces to the parsed version of an XML document • The parser reads in the entire document and builds an in-memory tree • Your code can then use the DOM interfaces to manipulate the tree

  50. DOM Using the DOM API you can • move through the tree to see what the original document contained • delete sections of the tree • rearrange the tree • add new branches • and so on . . .
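
The operations listed above can be sketched with the standard `org.w3c.dom` interfaces. The example assumes the JDK's JAXP parser and uses the people document from slide 12 with whitespace removed; the class `DomEdit`, the method `editedNames` and the added fourth person are illustrative and not part of the original slides.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class built around the sample document from slide 12.
class DomEdit {

    private static final String PEOPLE =
            "<people>"
          + "<person id='person_1'><name>David</name><surname>Gilmour</surname></person>"
          + "<person id='person_2'><name>Richard</name><surname>Wright</surname></person>"
          + "<person id='person_3'><name>Nick</name><surname>Mason</surname></person>"
          + "</people>";

    /** Edits the tree in memory and returns the name values in document order. */
    static List<String> editedNames() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(PEOPLE)));
            Element root = doc.getDocumentElement();

            // Delete a section of the tree: drop the second person.
            root.removeChild(root.getElementsByTagName("person").item(1));

            // Add a new branch (illustrative data, not from the slides).
            Element person = doc.createElement("person");
            person.setAttribute("id", "person_4");
            Element name = doc.createElement("name");
            name.setTextContent("Roger");
            Element surname = doc.createElement("surname");
            surname.setTextContent("Waters");
            person.appendChild(name);
            person.appendChild(surname);
            root.appendChild(person);

            // Rearrange the tree: inserting an attached node moves it.
            root.insertBefore(person, root.getFirstChild());

            // Move through the tree to see what it now contains.
            List<String> names = new ArrayList<>();
            NodeList nameNodes = root.getElementsByTagName("name");
            for (int i = 0; i < nameNodes.getLength(); i++) {
                names.add(nameNodes.item(i).getTextContent());
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the sample document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(editedNames()); // prints [Roger, David, Nick]
    }
}
```

All of the edits happen on the in-memory tree; the original text is untouched unless the tree is serialized again.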
