
Programming with XML

Programming with XML. Written by: Adam Carmi, Zvika Gutterman. Agenda: About XML; Review of XML syntax; Document Object Model (DOM); JAXP; W3C XML Schema; Validating Parsers. About XML: XML (EXtensible Markup Language) is designed to describe data.


Presentation Transcript


  1. Programming with XML Written by: Adam Carmi Zvika Gutterman

  2. Agenda • About XML • Review of XML syntax • Document Object Model (DOM) • JAXP • W3C XML Schema • Validating Parsers

  3. About XML • XML – EXtensible Markup Language • Designed to describe data • Provides semantic and structural information • Extensible • Human-readable and computer-manipulable • Software- and hardware-independent • Open and standardized by the W3C (World Wide Web Consortium, founded in 1994 by Tim Berners-Lee) • Ideal for data exchange

  4. offenders.xml Information is marked up with structural and semantic information. XML tags are not pre-defined. <offenders> <!-- Lists all traffic offenders --> <offender id="024378449 "> <firstName> David </firstName> <middleName>Reuven</middleName> <lastName>Harel</lastName> <violation id='12'> <code num="232" category="traffic"/> <issueDate>2001-11-02</issueDate> <issueTime>10:32:00</issueTime> Ran a red light at Arik &amp; Bentz st. </violation> </offender> </offenders> Callouts: comment, tag, character data.

  5. offenders.xml: Tags XML tags are case sensitive. An XML document may have only one root tag. <offenders> <!-- Lists all traffic offenders --> <offender id="024378449 "> <firstName> David </firstName> <middleName>Reuven</middleName> <lastName>Harel</lastName> <violation id='12'> <code num="232" category="traffic"/> <issueDate>2001-11-02</issueDate> <issueTime>10:32:00</issueTime> Ran a red light at Arik &amp; Bentz st. </violation> </offender> </offenders> Callouts: root tag, start tag, end tag; <code num=.../> is shorthand for <code num=...></code>.

  6. offenders.xml: Elements Elements mark up information. Element x begins with a start-tag <x> and ends with an end-tag </x>. XML elements must be properly nested: <x>...<y>...</y>...</x>. XML documents must contain exactly one root element. <offenders> <!-- Lists all traffic offenders --> <offender id="024378449 "> <firstName> David </firstName> <middleName>Reuven</middleName> <lastName>Harel</lastName> <violation id='12'> <code num="232" category="traffic"/> <issueDate>2001-11-02</issueDate> <issueTime>10:32:00</issueTime> Ran a red light at Arik &amp; Bentz st. </violation> </offender> </offenders> Callout: root element.

  7. offenders.xml: Content The content of an element is all the text that lies between its start and end tags. An XML parser is required to pass all characters in a document, including whitespace characters, on to the application. <offenders> <!-- Lists all traffic offenders --> <offender id="024378449"> <firstName>David</firstName> <middleName>Reuven</middleName> <lastName>Harel</lastName> <violation id='12'> <code num="232" category="traffic"/> <issueDate>2001-11-02</issueDate> <issueTime>10:32:00</issueTime> Ran a red light at Arik &amp; Benz st. </violation> </offender> </offenders> Callout: whitespace.

  8. offenders.xml: Attributes Attributes are used to provide additional information about elements. Attribute values must always be enclosed in quotes (" or '). The characters & and < are reserved and can't appear literally in character data; >, ' and " may also need escaping depending on context (for example, a quote character inside an attribute value delimited by that same quote). Use &amp;, &lt;, &gt;, &apos; and &quot; instead. <offenders> <!-- Lists all traffic offenders --> <offender id="024378449"> <firstName>David</firstName> <middleName>Reuven</middleName> <lastName>Harel</lastName> <violation id='12'> <code num="232" category="traffic"/> <issueDate>2001-11-02</issueDate> <issueTime>10:32:00</issueTime> Ran a red light at Arik &amp; Benz st. </violation> </offender> </offenders>
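
The entity substitutions listed on this slide can be sketched in Java as follows. This is only an illustration and is not part of the original slides; the class and method names (XmlEscapeSketch, escape) are hypothetical, and the method simply replaces all five characters everywhere, which is more conservative than XML strictly requires.

```java
final class XmlEscapeSketch {

    /** Replaces each of the five characters that have a predefined XML entity. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '\'': out.append("&apos;"); break;
                case '"':  out.append("&quot;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Character data from the violation element of offenders.xml.
        System.out.println(escape("Ran a red light at Arik & Bentz st."));
    }
}
```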

  9. DOM™ • DOM™ – Document Object Model • A standard hierarchy of objects, recommended by the W3C, that corresponds to XML documents. • Each element, attribute, comment, etc., in an XML document is represented by a Node in the DOM tree. • The DOM API (Application Programming Interface) allows data in an XML document to be accessed and modified by manipulating the nodes in a DOM tree.

  10. offenders.xml: DOM tree (whitespace-only :Text nodes, drawn as empty :Text boxes on the slides, sit between sibling nodes and are not listed)
      :Document
        :Element offenders
          :Comment Lists all traffic offenders
          :Element offender
            :Attribute id
              :Text 024378449
            :Element firstName
              :Text David

  11. Example: offenders DOM (continued under offenders > offender; the element "middleName" was skipped)
            :Element lastName
              :Text Harel
            :Element violation
              :Attribute id
                :Text 12
              :Element code
                :Attribute num
                  :Text 232
                :Attribute category
                  :Text traffic
              :Element issueDate
                :Text 2001-11-02

  12. Example: offenders DOM (continued under offenders > offender > violation)
              :Element issueTime
                :Text 10:32:00
              :Text Ran a red light at Arik & Benz st.

  13. DOM Class Hierarchy (partial) • Interfaces shown: NodeList, Node, NamedNodeMap, Document, CharacterData, Element, Text, Comment • Document, Element and CharacterData extend Node; Text and Comment extend CharacterData.
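
As a rough illustration of how these interfaces are used to access and modify data, the sketch below parses a shortened version of offenders.xml from a string, reads an attribute and a text value, and changes the text node. The class name DomAccessSketch, its helper methods and the shortened document are assumptions made for this example; only standard org.w3c.dom and JAXP calls are used.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class DomAccessSketch {

    // Shortened, hypothetical variant of offenders.xml.
    static final String XML =
          "<offenders><offender id=\"024378449\">"
        + "<firstName>David</firstName><lastName>Harel</lastName>"
        + "</offender></offenders>";

    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Reads the id attribute of the first offender element. */
    static String offenderId(String xml) {
        Element offender =
            (Element) parse(xml).getElementsByTagName("offender").item(0);
        return offender.getAttribute("id");
    }

    /** Reads the character data of the first firstName element. */
    static String firstName(String xml) {
        Node firstName = parse(xml).getElementsByTagName("firstName").item(0);
        return firstName.getTextContent();
    }

    /** Modifies the Text child of firstName and returns the new content. */
    static String renameFirst(String xml, String newName) {
        Node firstName = parse(xml).getElementsByTagName("firstName").item(0);
        firstName.getFirstChild().setNodeValue(newName); // Text node holds the data
        return firstName.getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(offenderId(XML) + " " + firstName(XML));
        System.out.println(renameFirst(XML, "Dan"));
    }
}
```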

  14. JAXP • JAXP – Java™ API for XML Processing • JAXP enables applications to parse and transform XML documents using an API that is independent of a particular XML processor implementation. • JAXP provides two parser types: • SAX (Simple API for XML) parser: event-driven • DOM document builder: constructs DOM trees by parsing XML documents.
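
The slides go on to use the DOM document builder. For contrast, the event-driven SAX style might look like the sketch below, which records the name of every element as its start tag is reported. The class SaxSketch and its method are hypothetical names for this illustration; the JAXP and SAX calls themselves are standard.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxSketch {

    /** Returns the element names in the order their start tags are encountered. */
    static List<String> elementNames(String xml) {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                names.add(qName); // called once per start tag; no tree is built
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames(
            "<offenders><offender><firstName>David</firstName></offender></offenders>"));
    }
}
```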

  15. Creating a DOM Builder • Create a DocumentBuilderFactory object: DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); • Configure the factory object: dbf.setIgnoringComments(true); • Create a builder instance using the factory: DocumentBuilder docBuilder = dbf.newDocumentBuilder(); A ParserConfigurationException is thrown if a DocumentBuilder that satisfies the requested configuration cannot be created.

  16. Building a DOM Document • A DOM document can be built manually from within the application: Document doc = docBuilder.newDocument(); Element offenders = doc.createElement("offenders"); doc.appendChild(offenders); Element offender = doc.createElement("offender"); offender.setAttribute("id", "024378449 "); offenders.appendChild(offender); Element firstName = doc.createElement("firstName"); Text text = doc.createTextNode(" David "); firstName.appendChild(text); ... A DOMException is raised if an illegal character appears in a name, an illegal child is appended to a node, etc.
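
A self-contained variant of the manual construction above is sketched below. It additionally attaches firstName to offender (one plausible continuation of the elided part of the slide) and serializes the tree with the JAXP transformation API so the result can be inspected. The class name BuildDomSketch and the serialization step are assumptions added for this example, and the whitespace inside the values is dropped for brevity.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class BuildDomSketch {

    /** Builds a small offenders document by hand and returns it as XML text. */
    static String buildAndSerialize() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();

            Element offenders = doc.createElement("offenders");
            doc.appendChild(offenders);

            Element offender = doc.createElement("offender");
            offender.setAttribute("id", "024378449");
            offenders.appendChild(offender);

            Element firstName = doc.createElement("firstName");
            firstName.appendChild(doc.createTextNode("David"));
            offender.appendChild(firstName); // assumed continuation of the slide

            // Identity transform: copies the DOM tree to a character stream.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not build document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildAndSerialize());
    }
}
```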

  17. Building a DOM Document • A DOM representation of an XML document can be built automatically by parsing the XML document: Document doc = docBuilder.parse(new File(xmlFile)); A SAXParseException or SAXException is raised to report parse errors.

  18. DumpDom.java (1 of 5): creating and traversing a DOM document
      import org.w3c.dom.Document;
      import org.w3c.dom.NodeList;
      import org.w3c.dom.NamedNodeMap;
      import org.w3c.dom.Node;
      import org.xml.sax.SAXException;
      import org.xml.sax.SAXParseException;
      import javax.xml.parsers.DocumentBuilderFactory;
      import javax.xml.parsers.DocumentBuilder;
      import javax.xml.parsers.ParserConfigurationException;
      import java.io.File;
      import java.io.IOException;

  19. DumpDom.java (2 of 5)
      public class DumpDom {
          private int indent = 0; // text indentation level

          public DumpDom(String xmlFile) {
              try {
                  DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
                  DocumentBuilder docBuilder = dbf.newDocumentBuilder();
                  Document doc = docBuilder.parse(new File(xmlFile));
                  recursiveDump(doc);
              } catch (ParserConfigurationException pce) {
                  System.err.println("Failed to create document builder");
              } catch (SAXParseException spe) {
                  System.err.println("Error: Line=" + spe.getLineNumber() +
                                     ": " + spe.getMessage());
              } catch (SAXException se) {
                  System.err.println("Parse error found: " + se);
              } catch (IOException e) {
                  e.printStackTrace();
              }
          }

  20. DumpDom.java (3 of 5)
          private void recursiveDump(Node node) {
              switch (node.getNodeType()) {
              case Node.DOCUMENT_NODE:
                  dumpNode("document", node);
                  break;
              case Node.COMMENT_NODE:
                  dumpNode("comment", node);
                  break;
              case Node.ATTRIBUTE_NODE:
                  dumpNode("attribute", node);
                  break;
              case Node.TEXT_NODE:
                  dumpNode("text", node);
                  break;
              case Node.ELEMENT_NODE:
                  dumpNode("element", node);
                  indent += 2;

  21. DumpDom.java (4 of 5)
                  NamedNodeMap atts = node.getAttributes();
                  for (int i = 0 ; i < atts.getLength() ; ++i)
                      recursiveDump(atts.item(i));
                  indent -= 2;
                  break;
              default:
                  System.err.println("Unknown node: " + node);
                  System.exit(1);
              }

              // print children of the input node (if there are any)
              indent += 2;
              for (Node child = node.getFirstChild() ; child != null ;
                   child = child.getNextSibling()) {
                  recursiveDump(child);
              }
              indent -= 2;
          }

  22. DumpDom.java (5 of 5)
          private void dumpNode(String type, Node node) {
              for (int i = 0 ; i < indent ; ++i)
                  System.out.print(" ");
              System.out.print("[" + type + "]: ");
              System.out.print(node.getNodeName());
              if (node.getNodeValue() != null)
                  System.out.print("=\"" + node.getNodeValue() + "\"");
              System.out.print("\n");
          }

          public final static void main(String[] args) {
              DumpDom dumper = new DumpDom(args[0]);
          }
      }

  23. XML Schema • The purpose of an XML Schema is to define a class of XML documents. • An XML document that is syntactically correct is considered well formed. If it also conforms to an XML schema, it is considered valid. • An XML document is not required to have a corresponding schema. • XML Schemas are expected to replace the DTD (Document Type Definition, which uses an EBNF-like form) as the primary means of describing document structure.

  24. XML Schema (cont.) • XML Schema documents are themselves XML documents. • Can be manipulated as such • XML Schema is a language with an XML syntax. • An XML document may explicitly reference the schema document that validates it. • A schema document can itself be validated, for example against the DTD for schemas that the W3C publishes. • Several schema models exist. In this course we will use the W3C XML Schema (a W3C recommendation since 2001).

  25. W3C XML Schema <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> ... </xsd:schema> • A W3C XML Schema consists of a schema element and a variety of sub-elements which determine the appearance of elements and their content in instance documents. • Each of the elements (and predefined simple types) in the schema has (by convention) a prefix xsd: which is associated with the W3C XML Schema namespace.

  26. Elements & Attribute Declarations • Elements are declared using the element element: <xsd:element name="firstName" type="xsd:NMTOKEN"/> <xsd:element name="offenders" type="Offenders"/> • Attributes are declared using the attribute element: <xsd:attribute name="id" type="xsd:positiveInteger"/> Callout: a pre-defined (simple) type.

  27. Element & Attribute Types • Elements that contain sub-elements or carry attributes are said to have complex types. • Elements that contain only text (e.g. numbers, strings, dates etc.) and do not contain any sub-elements or attributes are said to have simple types. • Attributes always have simple types. • Many simple types (e.g. string, date, integer etc.) are pre-defined.

  28. A Few Built-in Simple Types (table not preserved in this transcript) • Footnote: the types marked in the table should only be used as attribute types

  29. Derived Simple Types • New simple types may be defined by deriving them from existing simple types (built-in and derived). • New simple types are derived by restricting the range of permitted values for an existing simple type. • A new simple type is defined using the simpleType element.

  30. Derived Simple Types (cont.) • Example: Numeric Restriction<xsd:simpleType name="ViolationID"> <xsd:restriction base="xsd:integer"> <xsd:minInclusive value="100"/> </xsd:restriction></xsd:simpleType> • Example: Enumeration<xsd:simpleType name="ViolationCategory"> <xsd:restriction base="xsd:string"> <xsd:enumeration value="traffic"/> <xsd:enumeration value="criminal"/> <xsd:enumeration value="civil"/> </xsd:restriction></xsd:simpleType>
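
To see these two restrictions in action, one option is to validate tiny instance documents against them. The sketch below uses the javax.xml.validation API, which is a different route from the JAXP parser properties shown later in the slides; the class name SimpleTypeSketch and the two test elements num and category are assumptions introduced only for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class SimpleTypeSketch {

    // The two simple types from the slide, plus two hypothetical elements using them.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "  <xsd:element name='num' type='ViolationID'/>"
        + "  <xsd:element name='category' type='ViolationCategory'/>"
        + "  <xsd:simpleType name='ViolationID'>"
        + "    <xsd:restriction base='xsd:integer'>"
        + "      <xsd:minInclusive value='100'/>"
        + "    </xsd:restriction>"
        + "  </xsd:simpleType>"
        + "  <xsd:simpleType name='ViolationCategory'>"
        + "    <xsd:restriction base='xsd:string'>"
        + "      <xsd:enumeration value='traffic'/>"
        + "      <xsd:enumeration value='criminal'/>"
        + "      <xsd:enumeration value='civil'/>"
        + "    </xsd:restriction>"
        + "  </xsd:simpleType>"
        + "</xsd:schema>";

    /** Returns true if the document is valid against XSD, false otherwise. */
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            // With no ErrorHandler set, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // schema violation or malformed input
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<num>232</num>"));              // at least 100
        System.out.println(isValid("<num>12</num>"));               // below minInclusive
        System.out.println(isValid("<category>parking</category>")); // not enumerated
    }
}
```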

  31. Complex Types • Complex types are defined using the complexType element. • Elements with complex types may carry attributes. • The content of elements with complex types is categorized as follows: • Empty: no content is allowed. • Simple: content must be of simple type. • Element: content must include only child elements. • Mixed: both element and character content is allowed.

  32. Complex Types: Attributes • Attributes may be declared, using the use attribute, as required, optional (default) or prohibited. • Default values for attributes are declared using the default attribute. • Allowed only for optional attributes • The fixed attribute is used to ensure that an attribute is set to a particular value. • Appearance of the attribute is optional. • fixed and default are mutually exclusive.

  33. Complex Types: Attributes (cont.) • Example: use, fixed <xsd:complexType name="Code"> <xsd:attribute name="num" type="ViolationID" use="required"/> <xsd:attribute name="category" type="ViolationCategory" fixed="traffic"/> </xsd:complexType> • Example: use, default <xsd:complexType name="IssueTime"> ... <xsd:attribute name="accuracy" type="Accuracy" use="optional" default="accurate"/> ... </xsd:complexType>

  34. Complex Types: Empty Content • Example: schema <xsd:complexType name="Code"> <xsd:attribute name="num" type="ViolationID" use="required"/> <xsd:attribute name="category" type="ViolationCategory" fixed="traffic"/> </xsd:complexType> • Example: instance document <code num="232" category="traffic"/> <code num="232" category="traffic"></code> <code num="232"/>

  35. Complex Types: Simple Content • Example: element with no attributes <xsd:element name="firstName" type="xsd:NMTOKEN"/> • Example: element with attributes <xsd:complexType name="IssueTime"> <xsd:simpleContent> <xsd:extension base="xsd:time"> <xsd:attribute name="accuracy" type="Accuracy" use="optional" default="accurate"/> </xsd:extension> </xsd:simpleContent> </xsd:complexType> Simple type

  36. Complex Types: Element Content • Element Occurrence Constraints • The minimum number of times an element may appear is specified by the value of the optional attribute minOccurs. • The maximum number of times an element may appear is specified by the value of the optional attribute maxOccurs. • The value unbounded indicates that the maximum number of occurrences is unbounded. • The default value of minOccurs and maxOccurs is 1.

  37. Complex Types: Element Content (cont.) • The sequence element is used to specify a sequence of sub-elements. • Elements must appear in the same order that they are declared. <xsd:complexType> <xsd:sequence> <xsd:element name="firstName" type="xsd:NMTOKEN"/> <xsd:element name="middleName" type="xsd:NMTOKEN" minOccurs="0"/> <xsd:element name="lastName" type="xsd:NMTOKEN"/> <xsd:element name="violation" type="Violation" minOccurs="0" maxOccurs="unbounded"/> ... </xsd:sequence> ... </xsd:complexType>
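
The effect of sequence, minOccurs and maxOccurs can be checked with a few small instance documents. The sketch below is an illustration under stated assumptions: it wraps the sequence from the slide in a hypothetical offender element, types violation as xsd:string instead of the Violation complex type to stay short, and uses the javax.xml.validation API rather than the parser configuration shown in the slides. The class name SequenceSketch is likewise made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class SequenceSketch {

    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + " <xsd:element name='offender'>"
        + "  <xsd:complexType>"
        + "   <xsd:sequence>"
        + "    <xsd:element name='firstName' type='xsd:NMTOKEN'/>"
        + "    <xsd:element name='middleName' type='xsd:NMTOKEN' minOccurs='0'/>"
        + "    <xsd:element name='lastName' type='xsd:NMTOKEN'/>"
        + "    <xsd:element name='violation' type='xsd:string'"
        + "                 minOccurs='0' maxOccurs='unbounded'/>"
        + "   </xsd:sequence>"
        + "  </xsd:complexType>"
        + " </xsd:element>"
        + "</xsd:schema>";

    /** Returns true if the offender document satisfies the sequence above. */
    static boolean isValid(String xml) {
        try {
            Validator validator = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)))
                .newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // wrong order, missing or surplus element, etc.
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // middleName omitted (minOccurs=0): accepted.
        System.out.println(isValid(
            "<offender><firstName>David</firstName><lastName>Harel</lastName></offender>"));
        // Declared order violated: rejected.
        System.out.println(isValid(
            "<offender><lastName>Harel</lastName><firstName>David</firstName></offender>"));
    }
}
```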

  38. Complex Types: Mixed Content • The optional Boolean attribute mixed is used to specify mixed content: <xsd:complexType name="Violation" mixed="true"> <xsd:sequence> <xsd:element name="code" type="Code"/> <xsd:element name="issueDate" type="xsd:date"/> <xsd:element name="issueTime" type="IssueTime"/> </xsd:sequence> ... </xsd:complexType>

  39. Global Elements/Attributes • Global elements and global attributes are created by declarations that appear as the children of the schema element. • A global element is allowed to appear as the root element of an instance document. • The attribute ref of element/attribute elements may be used (instead of the name attribute) to reference a global element/attribute. • Cardinality constraints cannot be placed on global declarations, although they can be placed on local declarations that reference global declarations.

  40. Global Elements/Attributes (cont.) • Example: global declarations <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <xsd:element name="offenders" type="Offenders"/> <xsd:element name="comment" type="xsd:string"/> <xsd:attribute name="id" type="xsd:positiveInteger"/> ... • Example: ref attribute <xsd:element ref="comment" minOccurs="0"/> <xsd:attribute ref="id" use="required"/>

  41. Anonymous Type Definitions • When a type is referenced only once, or contains very few constraints, it can be more succinctly defined as an anonymous type. • Saves the overhead of naming the type and explicitly referencing it.

  42. Anonymous Type Definitions (cont.) <xsd:element name="offender" maxOccurs="unbounded"> <xsd:complexType> <xsd:sequence> <xsd:element name="firstName" type="xsd:NMTOKEN"/> <xsd:element name="middleName" type="xsd:NMTOKEN" minOccurs="0"/> <xsd:element name="lastName" type="xsd:NMTOKEN"/> <xsd:element name="violation" type="Violation" minOccurs="0" maxOccurs="unbounded"/> <xsd:element ref="comment" minOccurs="0"/> </xsd:sequence> <xsd:attribute ref="id" use="required"/> </xsd:complexType> </xsd:element> Callouts: "Is this a global declaration?"; "Anonymous" (the unnamed complexType).

  43. offenders.xsd (1 of 4) Schema for offenders XML documents <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <xsd:element name="offenders" type="Offenders"/> <xsd:element name="comment" type="xsd:string"/> <xsd:attribute name="id" type="xsd:positiveInteger"/> <xsd:complexType name="IssueTime"> <xsd:simpleContent> <xsd:extension base="xsd:time"> <xsd:attribute name="accuracy" type="Accuracy" use="optional" default="accurate"/> </xsd:extension> </xsd:simpleContent> </xsd:complexType> <xsd:complexType name="Code"> <xsd:attribute name="num" type="ViolationID" use="required"/> <xsd:attribute name="category" type="ViolationCategory" fixed="traffic"/> </xsd:complexType>

  44. offenders.xsd (2 of 4) <xsd:complexType name="Offenders"> <xsd:sequence> <xsd:element name="offender" maxOccurs="unbounded"> <xsd:complexType> <xsd:sequence> <xsd:element name="firstName" type="xsd:NMTOKEN"/> <xsd:element name="middleName" type="xsd:NMTOKEN" minOccurs="0"/> <xsd:element name="lastName" type="xsd:NMTOKEN"/> <xsd:element name="violation" type="Violation" minOccurs="0" maxOccurs="unbounded"/> <xsd:element ref="comment" minOccurs="0"/> </xsd:sequence> <xsd:attribute ref="id" use="required"/> </xsd:complexType> </xsd:element> </xsd:sequence> </xsd:complexType>

  45. offenders.xsd (3 of 4) <xsd:complexType name="Violation" mixed="true"> <xsd:sequence> <xsd:element name="code" type="Code"/> <xsd:element name="issueDate" type="xsd:date"/> <xsd:element name="issueTime" type="IssueTime"/> </xsd:sequence> <xsd:attribute ref="id" use="required"/> </xsd:complexType> <xsd:simpleType name="ViolationID"> <xsd:restriction base="xsd:integer"> <xsd:minInclusive value="100"/> </xsd:restriction> </xsd:simpleType>

  46. offenders.xsd (4 of 4) <xsd:simpleType name="ViolationCategory"> <xsd:restriction base="xsd:string"> <xsd:enumeration value="traffic"/> <xsd:enumeration value="criminal"/> <xsd:enumeration value="civil"/> </xsd:restriction> </xsd:simpleType> <xsd:simpleType name="Accuracy"> <xsd:restriction base="xsd:string"> <xsd:enumeration value="accurate"/> <xsd:enumeration value="approx"/> </xsd:restriction> </xsd:simpleType> </xsd:schema>

  47. Validating Parsers • A validating parser is capable of reading a Schema specification or DTD and determining whether or not XML documents conform to it. • A non-validating parser is capable of reading a Schema / DTD but cannot check XML documents for conformity. • Limited to syntax checking

  48. Creating a Validating DOM Parser • Create a DocumentBuilderFactory object: DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); • Configure the factory object to produce a validating parser: dbf.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaLanguage", "http://www.w3.org/2001/XMLSchema"); dbf.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaSource", new File(xmlSchema)); dbf.setValidating(true); W3C XML Schema validation also generally requires a namespace-aware parser: dbf.setNamespaceAware(true); • Create a builder instance and set its error handler: DocumentBuilder docBuilder = dbf.newDocumentBuilder(); docBuilder.setErrorHandler(new MyErrorHandler());
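
Put together, the configuration steps above might look like the following runnable sketch. To stay self-contained it supplies the schema as an in-memory stream instead of a File (the schemaSource property accepts either), and it uses a hypothetical ThrowingHandler in place of MyErrorHandler, whose code is not shown in the slides. The class name, the one-element schema and the isValid helper are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class ValidatingDomSketch {

    static final String SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";

    // Hypothetical one-element schema reusing ViolationID from the slides.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "  <xsd:element name='num' type='ViolationID'/>"
        + "  <xsd:simpleType name='ViolationID'>"
        + "    <xsd:restriction base='xsd:integer'>"
        + "      <xsd:minInclusive value='100'/>"
        + "    </xsd:restriction>"
        + "  </xsd:simpleType>"
        + "</xsd:schema>";

    /** Hypothetical handler: turns every reported error into an exception. */
    static final class ThrowingHandler implements ErrorHandler {
        public void warning(SAXParseException e) { /* warnings are ignored */ }
        public void error(SAXParseException e) throws SAXException { throw e; }
        public void fatalError(SAXParseException e) throws SAXException { throw e; }
    }

    static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dbf.setValidating(true);
            // schemaLanguage is set before schemaSource.
            dbf.setAttribute(SCHEMA_LANGUAGE, XMLConstants.W3C_XML_SCHEMA_NS_URI);
            dbf.setAttribute(SCHEMA_SOURCE,
                new ByteArrayInputStream(XSD.getBytes(StandardCharsets.UTF_8)));
            DocumentBuilder docBuilder = dbf.newDocumentBuilder();
            docBuilder.setErrorHandler(new ThrowingHandler());
            docBuilder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validity or well-formedness problem
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<num>232</num>"));
        System.out.println(isValid("<num>12</num>"));
    }
}
```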

  49. Handling Parsing Errors • By default, JAXP parsers do not throw exceptions when documents are found to be invalid. • JAXP provides the interface ErrorHandler so that users will be able to implement their own error-handling semantics.
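
One possible error-handling policy, sketched below, is to record validity errors and let parsing continue, so that a caller can inspect everything the parser reported. This is not the BoundedErrorPrinter from the slides; the names CollectingHandlerSketch, Collector and validate, and the one-element schema, are hypothetical and exist only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class CollectingHandlerSketch {

    // Hypothetical schema: a single element restricted to three categories.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "  <xsd:element name='category'>"
        + "    <xsd:simpleType>"
        + "      <xsd:restriction base='xsd:string'>"
        + "        <xsd:enumeration value='traffic'/>"
        + "        <xsd:enumeration value='criminal'/>"
        + "        <xsd:enumeration value='civil'/>"
        + "      </xsd:restriction>"
        + "    </xsd:simpleType>"
        + "  </xsd:element>"
        + "</xsd:schema>";

    /** Records warnings and errors instead of throwing; fatal errors still abort. */
    static final class Collector implements ErrorHandler {
        final List<String> messages = new ArrayList<>();
        public void warning(SAXParseException e) {
            messages.add("warning: " + e.getMessage());
        }
        public void error(SAXParseException e) {
            messages.add("error: " + e.getMessage()); // validity errors arrive here
        }
        public void fatalError(SAXParseException e) throws SAXException {
            throw e; // the document is not well formed
        }
    }

    /** Parses with validation switched on and returns the recorded messages. */
    static List<String> validate(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dbf.setValidating(true);
            dbf.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                XMLConstants.W3C_XML_SCHEMA_NS_URI);
            dbf.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaSource",
                new ByteArrayInputStream(XSD.getBytes(StandardCharsets.UTF_8)));
            DocumentBuilder docBuilder = dbf.newDocumentBuilder();
            Collector collector = new Collector();
            docBuilder.setErrorHandler(collector);
            docBuilder.parse(new InputSource(new StringReader(xml)));
            return collector.messages;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(validate("<category>traffic</category>"));
        System.out.println(validate("<category>parking</category>"));
    }
}
```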

  50. BoundedErrorPrinter.java (1 of 3)
      import org.xml.sax.ErrorHandler;
      import org.xml.sax.SAXException;
      import org.xml.sax.SAXParseException;

      /**
       * An error handler that prints to the standard error stream a specified
       * number of errors. Once the specified number of errors is detected,
       * parsing is aborted.
       */
      public class BoundedErrorPrinter implements ErrorHandler {
          private int errorCount = 0;
          private int errorsToPrint;

          public BoundedErrorPrinter(int errorsToPrint) {
              this.errorsToPrint = errorsToPrint;
          }
