Processing XML with Java

  1. Processing XML with Java Representation and Management of Data on the Internet

  2. XML • XML is eXtensible Markup Language • It is a metalanguage: • A language used to describe other languages using “markup” tags that describe properties of the data • Designed to be structured • Strict rules about how data can be formatted • Designed to be extensible • Can define own terms and markup

  3. XML Family • XML is an official recommendation of the W3C • Aims to accomplish what HTML cannot and be simpler to use and implement than SGML • [Figure: the XML family, showing SGML, HTML, XML and XHTML]

  4. The Essence of XML • Syntax: The permitted arrangement or structure of letters and words in a language, as defined by a grammar (XML) • Semantics: The meaning of letters or words in a language • XML uses syntax to add semantics to documents

  5. Using XML • In XML there is a separation of the content from the display • XML can be used for: • Data representation • Data exchange

  6. Databases and XML • Database content can be presented in XML • XML processor can access DBMS or file system and convert data to XML • Web server can serve content as either XML or HTML

  7. HTML vs. XML • Nesting: HTML tolerates <B><I>improper nesting</B></I>; XML requires <B><I>proper nesting</I></B> • End tags: HTML allows start tags without end tags, like <BR>; in XML, empty tags must have a trailing slash, as in <BR/> • Attributes: HTML accepts <font color=blue>unquoted attribute values</font>; XML requires <font color="blue">quoted attribute values</font> • Case: <B>HTML is case insensitive</b>; <b>XML is case sensitive</b> • Whitespace: ignored in HTML; important in XML • Start of document: HTML begins with <html>; XML begins with <?xml version='1.0' ?>

  8. HTML vs. XML • HTML has a well-defined set of tags; in XML you can use any tag you like • HTML tags have a known meaning; XML tags have no known meaning

  9. Some Things in Common • Comments are allowed - <!-- … --> • Special characters must be escaped (e.g., &gt; for >)

  10. Processing XML – The Idea

  11. Sample Document
      <transaction>
        <account>89-344</account>
        <buy shares="100">
          <ticker exch="NASDAQ">WEBM</ticker>
        </buy>
        <sell shares="30">
          <ticker exch="NYSE">GE</ticker>
        </sell>
      </transaction>

  12. DOM Parser • DOM = Document Object Model • Parser creates a tree object out of the document • User accesses data by traversing the tree • The API allows for constructing, accessing and manipulating the structure and content of XML documents

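  13. Document as Tree • Methods like: getRoot, getChildren, getAttributes, etc. • [Figure: tree of the sample document. The root transaction has the children account (text 89-344), buy (attribute shares = 100) and sell (attribute shares = 30); buy contains ticker (attribute exch = NASDAQ, text WEBM) and sell contains ticker (attribute exch = NYSE, text GE)]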
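
The tree view can be tried out with the DOM parser that ships with the JDK. The following is a minimal sketch, not the code from the slides: the class name DomTreeWalk and its helper methods are made up for illustration, and the slide's getRoot and getChildren are assumed to correspond to the W3C DOM methods getDocumentElement and getChildNodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomTreeWalk {
    // The sample document from the slides, written on one line.
    static final String SAMPLE =
        "<transaction><account>89-344</account>"
        + "<buy shares=\"100\"><ticker exch=\"NASDAQ\">WEBM</ticker></buy>"
        + "<sell shares=\"30\"><ticker exch=\"NYSE\">GE</ticker></sell>"
        + "</transaction>";

    /** Parses the whole document into an in-memory DOM tree. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    /** Appends an indented outline of the subtree rooted at node. */
    static void outline(Node node, int depth, StringBuilder out) {
        StringBuilder pad = new StringBuilder();
        for (int i = 0; i < depth; i++) pad.append("  ");
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.append(pad).append(node.getNodeName());
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                out.append(' ').append(a.getNodeName()).append('=').append(a.getNodeValue());
            }
            out.append('\n');
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                outline(children.item(i), depth + 1, out);
            }
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) out.append(pad).append('"').append(text).append("\"\n");
        }
    }

    /** Convenience wrapper: parse, then outline from the root element. */
    static String outline(String xml) {
        StringBuilder out = new StringBuilder();
        outline(parse(xml).getDocumentElement(), 0, out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(outline(SAMPLE));
    }
}
```

Because the whole tree is held in memory, outline could be called again on any node without re-reading the document.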

  14. Advantages and Disadvantages • Advantages: • Natural and relatively easy to use • Can repeatedly traverse tree • Disadvantages: • High memory requirements – the whole document is kept in memory • Must parse the whole document before use

  15. SAX Parser • SAX = Simple API for XML • Parser creates “events” while reading through the document • Parser calls methods (that you write) to deal with the events • Similar to an I/O stream: goes in one direction only

  16. Document as Events • Start tag: transaction • Start tag: account • Text: 89-344 • End tag: account • Start tag: buy • Attribute: shares Value: 100 • <transaction> <account>89-344</account> <buy shares="100"> <ticker exch="NASDAQ">WEBM</ticker> </buy> <sell shares="30"> <ticker exch="NYSE">GE</ticker> </sell> </transaction>
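
The event view can be reproduced with a few lines of Java. This is only a sketch under assumptions: the class name SaxEventLog is hypothetical, and the parser is obtained through the JDK's SAXParserFactory instead of the Xerces setup used later in the slides. The handler records one line per event in the wording of the slide.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventLog {
    /** Parses xml and returns the events in the order the parser reported them. */
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("Start tag: " + qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    log.add("Attribute: " + atts.getQName(i) + " Value: " + atts.getValue(i));
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("End tag: " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Skip whitespace-only text between elements.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) log.add("Text: " + text);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return log;
    }

    public static void main(String[] args) {
        String xml = "<transaction><account>89-344</account>"
                + "<buy shares=\"100\"><ticker exch=\"NASDAQ\">WEBM</ticker></buy></transaction>";
        for (String event : events(xml)) System.out.println(event);
    }
}
```

Nothing is kept after an event has been delivered, which is why this style needs little memory but cannot go back in the document.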

  17. Advantages and Disadvantages • Advantages: • Requires little memory • Fast • Disadvantages: • Cannot reread • Less natural for object oriented programmers (perhaps)

  18. Which should we use? DOM vs. SAX • If your document is very large and you only need a few elements - use SAX • If you need to manipulate (i.e., change) the XML - use DOM • If you need to access the XML many times - use DOM

  19. XML Parsers

  20. XML Parsers • There are several different ways to categorise parsers: • Validating versus non-validating parsers • DOM parsers versus SAX parsers • Parsers written in a particular language (Java, C++, Perl, etc.)

  21. Validating Parsers • A validating parser makes sure that the document conforms to the specified DTD • This is time consuming, so a non-validating parser is faster
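
The difference between the two kinds of parser can be observed directly. The sketch below is an illustration only: the class name DtdValidationDemo and the tiny DTD are invented for this example, and the parser comes from the JDK's SAXParserFactory. Validity violations are reported to the error method of the handler, so the code simply counts them.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class DtdValidationDemo {
    // A small internal DTD (an assumption made for this example):
    // a transaction contains exactly one account.
    static final String DTD =
        "<!DOCTYPE transaction [<!ELEMENT transaction (account)>"
        + "<!ELEMENT account (#PCDATA)>]>";

    /** Returns how many validity errors the parser reported for xml. */
    static int countErrors(String xml, boolean validating) {
        final int[] errors = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors[0]++;  // a validity violation; the parser keeps going
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(validating);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return errors[0];
    }

    public static void main(String[] args) {
        String invalid = DTD + "<transaction><bogus/></transaction>";
        System.out.println("validating:     " + countErrors(invalid, true));
        System.out.println("non-validating: " + countErrors(invalid, false));
    }
}
```

The invalid document is still well formed, so the non-validating parser accepts it without complaint.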

  22. Using an XML Parser • Three basic steps • Create a parser object • Pass the XML document to the parser • Process the results • Generally, writing out XML is not in the scope of parsers (though some may implement proprietary mechanisms)

  23. SAX – Simple API for XML

  24. The SAX Parser • SAX parser is an event-driven API • An XML document is sent to the SAX parser • The XML file is read sequentially • The parser notifies the registered handler object when events happen, including errors • The events are handled by the handler methods that the programmer implemented

  25. • ContentHandler: handles document events (start tag, end tag, etc.) • XMLReaderFactory: used to create a SAX parser • ErrorHandler: handles parser errors • DTDHandler and EntityResolver: handle DTDs and entities

  26. Problem • The SAX interface is an accepted standard • There are many implementations • We would like to be able to change the implementation used without changing any code in the program • How is this done?

  27. Factory Design Pattern • Have a “Factory” class that creates the actual Parsers. • The Factory checks the value of a system property that states which implementation should be used • In order to change the implementation, simply change the system property

  28. Creating a SAX Parser • Import the following packages: • org.xml.sax.*; • org.xml.sax.helpers.*; • Set the following system property: • System.setProperty("org.xml.sax.driver", "org.apache.xerces.parsers.SAXParser"); • Create the instance from the Factory: • XMLReader reader = XMLReaderFactory.createXMLReader();
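
On current JDKs the same factory idea is also available through JAXP, and XMLReaderFactory has been deprecated since Java 9. The following sketch shows that alternative route, not the slide's own code; the class name ReaderFactoryDemo is hypothetical. With JAXP, the implementation can be switched through the system property javax.xml.parsers.SAXParserFactory, again without touching the program.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

public class ReaderFactoryDemo {
    /** Asks the factory for whichever SAX implementation is configured. */
    static XMLReader newReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("no usable SAX parser found", e);
        }
    }

    public static void main(String[] args) {
        // Prints the concrete class chosen by the factory; the program
        // itself only ever refers to the XMLReader interface.
        System.out.println(newReader().getClass().getName());
    }
}
```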

  29. Receiving Parsing Information • A SAX Parser calls methods such as “startDocument”, “startElement”, etc., as it runs • In order to react to such events we must: • implement the ContentHandler interface • set the parser’s content handler with an instance of our class

  30. ContentHandler • Methods (partial list): • public void startDocument(); • public void endDocument(); • public void characters(char[] ch, int start, int length); • public void startElement(String namespaceURI, String localName, String qName, Attributes atts); • public void endElement(String namespaceURI, String localName, String qName);

  31. Namespaces and Element Names <?xml version='1.0' encoding='utf-8'?> <forsale date="12/2/03" xmlns:xhtml = "urn:http://www.w3.org/1999/xhtml"> <book> <title> <xhtml:em> DBI: </xhtml:em> The Course I Wish I never Took </title> <comment> My <xhtml:b> favorite </xhtml:b> book! </comment> </book> </forsale>

  32. Namespaces and Element Names namespaceURI = "" localName = book qName = book <?xml version='1.0' encoding='utf-8'?> <forsale date="12/2/03" xmlns:xhtml = "urn:http://www.w3.org/1999/xhtml"> <book> <title> <xhtml:em> DBI: </xhtml:em> The Course I Wish I never Took </title> <comment> My <xhtml:b> favorite </xhtml:b> book! </comment> </book> </forsale> namespaceURI = urn:http://www.w3.org/1999/xhtml localName = em qName = xhtml:em
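
The three names can be printed for every element to check the values shown on the slide. This is a sketch with assumed names (the class NamespaceNames is hypothetical) that uses a namespace-aware parser from the JDK's SAXParserFactory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceNames {
    /** Returns one line per start tag with the three names the parser reports. */
    static List<String> names(String xml) {
        final List<String> seen = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String namespaceURI, String localName,
                                     String qName, Attributes atts) {
                seen.add("namespaceURI=" + namespaceURI
                        + " localName=" + localName + " qName=" + qName);
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);  // needed for namespaceURI and localName
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        String xml = "<forsale xmlns:xhtml=\"urn:http://www.w3.org/1999/xhtml\">"
                + "<book><title><xhtml:em>DBI:</xhtml:em></title></book></forsale>";
        for (String line : names(xml)) System.out.println(line);
    }
}
```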

  33. Receiving Parsing Information (cont.) • An easy way to implement the ContentHandler interface is to extend DefaultHandler, which implements this interface (and a few others) with empty methods • To actually parse a document, create an InputSource from the document and supply the input source to the parse method of the XMLReader

  34. import java.io.*;
      import org.xml.sax.*;
      import org.xml.sax.helpers.*;
      public class InfoWithSax extends DefaultHandler {
          public static void main(String[] args) {
              // Select the SAX implementation (here: Apache Xerces)
              System.setProperty("org.xml.sax.driver",
                                 "org.apache.xerces.parsers.SAXParser");
              try {
                  XMLReader reader = XMLReaderFactory.createXMLReader();
                  // Register this class to receive the document events
                  reader.setContentHandler(new InfoWithSax());
                  reader.parse(new InputSource(new FileReader(args[0])));
              } catch (Exception e) {
                  e.printStackTrace();
              }
          }

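  35. // Called once, before any other event
      public void startDocument() throws SAXException {
          System.out.println("START DOCUMENT");
      }
      // Called once, after the last event
      public void endDocument() throws SAXException {
          System.out.println("END DOCUMENT");
      }
      int depth;             // current nesting depth of elements
      String indent = "  ";  // printed once per nesting level
      // Prints "header: value", indented according to the current depth
      private void println(String header, String value) {
          for (int i = 0; i < depth; i++)
              System.out.print(indent);
          System.out.println(header + ": " + value);
      }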

  36. public void characters(char buf[], int offset, int len) throws SAXException {
          // Ignore text that consists of whitespace only
          String s = (new String(buf, offset, len)).trim();
          if (!"".equals(s))
              println("CHARACTERS", s);
      }
      public void endElement(String namespaceURI, String localName,
                             String name) throws SAXException {
          depth--;  // back to the level of the enclosing element
          String elementName = name;
          if (!"".equals(namespaceURI) && !"".equals(localName))
              elementName = namespaceURI + ":" + localName;
          println("END ELEMENT", elementName);
      }

  37. public void startElement(String namespaceURI, String localName,
                               String name, Attributes attrs) throws SAXException {
          // Use "namespaceURI:localName" when the parser reports namespace information
          String elementName = name;
          if (!"".equals(namespaceURI) && !"".equals(localName))
              elementName = namespaceURI + ":" + localName;
          println("START ELEMENT", elementName);
          if (attrs != null && attrs.getLength() > 0) {
              for (int i = 0; i < attrs.getLength(); i++)
                  println("ATTRIBUTE", attrs.getLocalName(i) + "=" + attrs.getValue(i));
          }
          depth++;  // children are printed one level deeper
      }
      } // end of class InfoWithSax
      [Figure: example input and example output]

  38. Bachelor Tags • What do you think happens when the parser parses a bachelor tag? <rating stars="five" />
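
One way to answer the question is to try it. The sketch below (the class name EmptyElementEvents is hypothetical, and the parser is the JDK's default one) records the element events for the bachelor tag. A SAX parser reports a start event followed immediately by an end event, exactly as for <rating stars="five"></rating>.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EmptyElementEvents {
    /** Returns the element events reported for xml, in order. */
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("startElement " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("endElement " + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(events("<rating stars=\"five\" />"));
    }
}
```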

  39. Attributes Interface • Elements may have attributes • There is no distinction between attributes that are written explicitly in the document and those that are supplied by the DTD (with a default value)

  40. Attributes Interface (cont.) • int getLength(); • String getQName(int i); • String getType(int i); • String getValue(int i); • String getType(String qname); • String getValue(String qname); • etc.

  41. Attribute Types • The following are possible types for attributes: • "CDATA", • "ID", • "IDREF", "IDREFS", • "NMTOKEN", "NMTOKENS", • "ENTITY", "ENTITIES", • "NOTATION"
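
The Attributes methods and the type names can be seen together in a small experiment. This sketch rests on assumptions: the class name AttributeInfo and the DTD are invented for the example, and the behaviour shown is that of the parser bundled with the JDK. The document writes only the id attribute, while exch comes from a DTD default; both arrive through the same interface.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class AttributeInfo {
    // Hypothetical DTD: id is of type ID, exch defaults to "NYSE".
    static final String XML =
        "<!DOCTYPE ticker [<!ELEMENT ticker (#PCDATA)>"
        + "<!ATTLIST ticker id ID #IMPLIED exch CDATA \"NYSE\">]>"
        + "<ticker id=\"t1\">GE</ticker>";

    /** Returns one line per attribute of every element: name, type and value. */
    static List<String> describe(String xml) {
        final List<String> lines = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    lines.add(atts.getQName(i) + " type=" + atts.getType(i)
                            + " value=" + atts.getValue(i));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe(XML)) System.out.println(line);
    }
}
```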

  42. Setting Features • It is possible to set the features of a parser using the setFeature method. • Examples: • reader.setFeature("http://xml.org/sax/features/namespaces", true) • reader.setFeature("http://xml.org/sax/features/validation", false) • For a full list, see: http://www.saxproject.org/?selected=get-set
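
A feature that has been set can be read back with getFeature, which is a convenient way to check what a parser actually accepted. The sketch below is illustrative only: the class name FeatureDemo is hypothetical, and the reader comes from the JDK's SAXParserFactory. A parser may reject a feature by throwing SAXNotRecognizedException or SAXNotSupportedException, which this sketch simply wraps.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

public class FeatureDemo {
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    static final String VALIDATION = "http://xml.org/sax/features/validation";

    /** Sets the two features from the slide and returns the values read back. */
    static boolean[] configure() {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(NAMESPACES, true);   // report namespace URIs and local names
            reader.setFeature(VALIDATION, false);  // do not validate against a DTD
            return new boolean[] {
                reader.getFeature(NAMESPACES), reader.getFeature(VALIDATION)
            };
        } catch (Exception e) {
            throw new IllegalStateException("feature not accepted by this parser", e);
        }
    }

    public static void main(String[] args) {
        boolean[] state = configure();
        System.out.println("namespaces=" + state[0] + " validation=" + state[1]);
    }
}
```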

  43. ErrorHandler Interface • We implement ErrorHandler to receive error events (similar to implementing ContentHandler) • DefaultHandler implements ErrorHandler in an empty fashion, so we can extend it (as before) • An ErrorHandler is registered with • reader.setErrorHandler(handler); • Three methods: • void error(SAXParseException ex); • void fatalError(SAXParseException ex); • void warning(SAXParseException ex);

  44. Extending the InfoWithSax Program
      public void warning(SAXParseException err) throws SAXException {
          System.out.println("Warning in line " + err.getLineNumber()
                             + " and column " + err.getColumnNumber());
      }
      public void error(SAXParseException err) throws SAXException {
          System.out.println("Oy va'avoi, an error!");
      }
      public void fatalError(SAXParseException err) throws SAXException {
          System.out.println("OY VA'AVOI, a fatal error!");
      }
      Will these methods be called in the case of a problem?
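
Error methods are only called on a handler that has been registered with setErrorHandler. The following sketch makes that step explicit; the class name ErrorHandlerDemo is hypothetical and the reader is taken from the JDK's SAXParserFactory. It feeds the parser a document that is not well formed and records which method was called.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

public class ErrorHandlerDemo {
    /** Parses xml and returns the error callbacks that were invoked. */
    static List<String> problems(String xml) {
        final List<String> calls = new ArrayList<>();
        ErrorHandler handler = new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                calls.add("warning at line " + e.getLineNumber());
            }

            @Override
            public void error(SAXParseException e) {
                calls.add("error at line " + e.getLineNumber());
            }

            @Override
            public void fatalError(SAXParseException e) {
                calls.add("fatalError at line " + e.getLineNumber());
            }
        };
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setErrorHandler(handler);  // without this line the handler is never called
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // The parser gives up after a fatal error; the callback was already recorded.
        } catch (Exception e) {
            throw new IllegalStateException("could not set up or read the document", e);
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(problems("<a><b></a>"));  // mismatched tags
    }
}
```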

  45. Lexical Events • Lexical events have to do with the way that a document was written and not with its content • Examples: • A comment is a lexical event (<!-- comment -->) • The use of an entity is a lexical event (&gt;) • These can be dealt with by implementing the LexicalHandler interface, and set on a parser by • reader.setProperty("http://xml.org/sax/properties/lexical-handler", mylexicalhandler);

  46. LexicalHandler • Methods (partial list): • public void startEntity(String name); • public void endEntity(String name); • public void comment(char[] ch, int start, int length); • public void startCDATA(); • public void endCDATA();
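
A lexical handler can be tried with the helper class org.xml.sax.ext.DefaultHandler2, which implements LexicalHandler with empty methods. The sketch below is an illustration under assumptions: the class name LexicalEvents is hypothetical and the reader is the JDK's default one. It records comments and CDATA boundaries, which a plain ContentHandler never sees.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class LexicalEvents {
    static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";

    /** Returns the lexical events reported while parsing xml. */
    static List<String> lexicalEvents(String xml) {
        final List<String> seen = new ArrayList<>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void comment(char[] ch, int start, int length) {
                seen.add("comment: " + new String(ch, start, length).trim());
            }

            @Override
            public void startCDATA() {
                seen.add("startCDATA");
            }

            @Override
            public void endCDATA() {
                seen.add("endCDATA");
            }
        };
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            // Lexical handlers are registered as a property, not through a setter.
            reader.setProperty(LEXICAL_HANDLER, handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(lexicalEvents("<a><!-- note --><![CDATA[x<y]]></a>"));
    }
}
```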

  47. DOM – Document Object Model

  48. Creating a DOM Tree • How can we create a DOM Tree independently of the implementation chosen? • Creating a DOM Tree using the Apache Xerces package: • Import: org.apache.xerces.parsers.DOMParser; • Import: org.w3c.dom.*; • Use the following lines of code: • DOMParser dom = new DOMParser(); • dom.parse(fileName); • Document doc = dom.getDocument();
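
One possible answer to the question on this slide is the JAXP DocumentBuilderFactory, which plays the same role for DOM that the factory plays for SAX. The following is a sketch of that approach, not the Xerces code above; the class name JaxpDomDemo and the helper setShares are made up for illustration. It also changes an attribute in the tree, the kind of manipulation DOM is suited for.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class JaxpDomDemo {
    /** Builds a DOM tree with whichever DOM implementation the factory selects. */
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    /** Changes the shares attribute of the first buy element in the tree. */
    static void setShares(Document doc, String shares) {
        Element buy = (Element) doc.getElementsByTagName("buy").item(0);
        buy.setAttribute("shares", shares);
    }

    public static void main(String[] args) {
        Document doc = parse("<transaction><buy shares=\"100\">"
                + "<ticker exch=\"NASDAQ\">WEBM</ticker></buy></transaction>");
        setShares(doc, "250");
        Element buy = (Element) doc.getElementsByTagName("buy").item(0);
        System.out.println(doc.getDocumentElement().getTagName()
                + " buy shares=" + buy.getAttribute("shares"));
    }
}
```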

  49. Using a DOM Tree • [Figure: an XML File is read by the DOM Parser, which builds a DOM Tree; the Application works with the tree through the API]

  50. Nodes in a DOM Tree • [Figure, as it appears in “The XML Companion” by Neil Bradley: the Node interface and the node types derived from it: DocumentFragment, Document, CharacterData (with Text, CDATASection and Comment), Attr, Element, DocumentType, Notation, Entity, EntityReference and ProcessingInstruction; shown alongside are the collection types NodeList and NamedNodeMap]
