
3.4 Streaming API for XML (StAX)


aleta

Presentation Transcript


  1. 3.4 Streaming API for XML (StAX)
     • Could we process XML documents more conveniently than with SAX, and yet more efficiently?
     • A: Yes, with the Streaming API for XML (StAX)
       • general introduction
       • an example
       • comparison with SAX

  2. StAX: General
     • Latest of the standard Java XML parser interfaces
       • Origin: the XMLPull API (A. Slominski, ~2000)
       • developed through the Java Community Process, led by BEA Systems (2003)
       • included in JAXP 1.4, in Java WSDP 1.6, and in Java SE 6 (JDK 1.6)
     • An event-driven streaming API, like SAX
       • does not build an in-memory representation
     • A "pull API"
       • lets the application ask for individual events
       • unlike a "push API" such as SAX

  3. Advantages of Pull Parsing
     • A pull API provides events, on demand, from the chosen stream
       • can cancel parsing, say, after processing the header of a long message
       • can read multiple documents simultaneously
       • application-controlled access (~ iterator design pattern) is usually simpler than SAX-style callbacks (~ observer design pattern)
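The first advantage above, cancelling the parse once the interesting part has been seen, can be sketched as follows. This is a minimal illustration, not code from the slides: the class name `HeaderPeek` and the element names `message`, `header` and `body` are made up for the example.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example: element names are invented for illustration.
final class HeaderPeek {

    /** Returns the text of the first <header> element, or null if there is none. */
    static String readHeader(String xml) throws XMLStreamException {
        XMLStreamReader xsr = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (xsr.hasNext()) {
                if (xsr.next() == XMLStreamConstants.START_ELEMENT
                        && "header".equals(xsr.getLocalName())) {
                    // getElementText() reads up to the matching end tag.
                    return xsr.getElementText();
                }
            }
            return null; // reached the end without finding a header
        } finally {
            // The application simply stops asking for events;
            // no further events are pulled from the rest of the document.
            xsr.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<message><header>urgent</header>"
                   + "<body>a very long body ...</body></message>";
        System.out.println(readHeader(xml));
    }
}
```

With a push API such as SAX, stopping early typically means throwing an exception out of a callback; here the loop just returns.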

  4. Cursor and Iterator APIs
     • StAX consists of two sets of APIs
       • (1) cursor APIs, and (2) iterator APIs
       • they differ by their representation of parse events
     • (1) The cursor API XMLStreamReader
       • lower-level
       • methods hasNext() and next() to scan events, represented as int constants START_DOCUMENT, START_ELEMENT, ...
       • access methods, depending on the current event type: getName(), getAttributeValue(..), getText(), ...
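A small sketch of the cursor API may help here. It assumes an in-memory document and a made-up attribute name `id`; the class name `CursorDemo` is likewise only illustrative. It shows that next() returns an int event type and that the available access methods depend on that type.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class CursorDemo {

    /** Describes each start tag, text chunk and end tag as a short string. */
    static List<String> describe(String xml) throws XMLStreamException {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        // Ask for adjacent character data to be merged into one event.
        xif.setProperty(XMLInputFactory.IS_COALESCING, true);
        XMLStreamReader xsr = xif.createXMLStreamReader(new StringReader(xml));
        List<String> out = new ArrayList<>();
        while (xsr.hasNext()) {
            switch (xsr.next()) { // the event type is an int constant
                case XMLStreamConstants.START_ELEMENT:
                    // Attribute access is valid only on START_ELEMENT.
                    out.add("start " + xsr.getLocalName()
                            + " id=" + xsr.getAttributeValue(null, "id"));
                    break;
                case XMLStreamConstants.CHARACTERS:
                    out.add("text " + xsr.getText());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    out.add("end " + xsr.getLocalName());
                    break;
                default:
                    break; // other event types are ignored in this sketch
            }
        }
        xsr.close();
        return out;
    }

    public static void main(String[] args) throws XMLStreamException {
        describe("<item id=\"42\">hello</item>").forEach(System.out::println);
    }
}
```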

  5. (2) XMLEventReader Iterator API
     • XMLEventReader provides the contents of an XML document to the application using an event object iterator
     • Parse events are represented as immutable XMLEvent objects
       • received using methods hasNext() and nextEvent()
       • event properties accessed through their methods
       • can be stored (if needed)
       • require more resources than the cursor API (see later)
     • Event lookahead, without advancing in the stream, with XMLEventReader.peek() and XMLStreamReader.getEventType()
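The lookahead mentioned above can be illustrated with a short sketch. The task (finding elements whose start tag is immediately followed by their end tag) and the class name `PeekDemo` are assumptions chosen for the example, not part of the original slides.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

final class PeekDemo {

    /** Returns the local names of elements that have no content at all. */
    static List<String> emptyElements(String xml) throws XMLStreamException {
        XMLEventReader xer = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        while (xer.hasNext()) {
            XMLEvent event = xer.nextEvent();
            if (event.isStartElement()) {
                // peek() returns the next event without consuming it.
                XMLEvent next = xer.peek();
                if (next != null && next.isEndElement()) {
                    names.add(event.asStartElement().getName().getLocalPart());
                }
            }
        }
        xer.close();
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(emptyElements("<a><b/><c>text</c><d></d></a>"));
    }
}
```

Because the events are objects, `event` remains usable after the reader has moved on, which is what makes storing them possible.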

  6. Writing APIs
     • StAX is a bidirectional API
       • also allows writing XML data
       • through an XMLStreamWriter or an XMLEventWriter
     • Useful for "marshaling" data structures into XML
     • Writers are not required to enforce well-formedness (not to mention validity)
       • they provide some support: escaping of reserved characters like & and <, and adding unclosed end tags
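As a rough sketch of the writer support described above, the following uses an XMLStreamWriter to marshal two strings. The element and attribute names (`book`, `title`, `note`) and the class name `WriterDemo` are invented for the example, and the exact serialization may vary between StAX implementations.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class WriterDemo {

    /** Writes a tiny document; element and attribute names are illustrative. */
    static String write(String title, String note) throws XMLStreamException {
        StringWriter sw = new StringWriter();
        XMLStreamWriter xsw = XMLOutputFactory.newInstance().createXMLStreamWriter(sw);
        xsw.writeStartElement("book");
        xsw.writeAttribute("title", title); // reserved characters get escaped
        xsw.writeStartElement("note");
        xsw.writeCharacters(note);          // likewise for character data
        // No explicit writeEndElement() calls: writeEndDocument()
        // closes the start tags that are still open.
        xsw.writeEndDocument();
        xsw.close();
        return sw.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(write("R&D", "a < b"));
    }
}
```

Note that the writer would not object to, for instance, two root elements; well-formedness remains largely the application's responsibility.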

  7. Example of Using StAX (1/6)
     • Use the StAX iterator interfaces to
       • fold element tag names to uppercase, and to
       • strip comments
     • Outline:
       • Initialize
         • an XMLEventReader for the input document
         • an XMLEventWriter (for System.out)
         • an XMLEventFactory for creating modified StartElement and EndElement events
       • Use them to read all input events, and to write some of them, possibly modified

  8. StAX example (2/6)
     • First import the relevant interfaces & classes:

         import java.io.*;
         import javax.xml.stream.*;
         import javax.xml.stream.events.*;
         import javax.xml.namespace.QName;

         public class capitalizeTags {
           public static void main(String[] args)
               throws FactoryConfigurationError, XMLStreamException, IOException {
             if (args.length != 1) System.exit(1);
             InputStream input = new FileInputStream(args[0]);

  9. StAX example (3/6)
     • Initialize XMLEventReader/Writer/Factory:

             XMLInputFactory xif = XMLInputFactory.newInstance();
             xif.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, true);
             XMLEventReader xer = xif.createXMLEventReader(input);
             XMLOutputFactory xof = XMLOutputFactory.newInstance();
             XMLEventWriter xew = xof.createXMLEventWriter(System.out);
             XMLEventFactory xef = XMLEventFactory.newInstance();

  10. StAX example (4/6)
      • Iterate over events of the InputStream:

             while (xer.hasNext()) {
               XMLEvent inEvent = xer.nextEvent();
               if (inEvent.isStartElement()) {
                 StartElement se = (StartElement) inEvent;
                 QName inQName = se.getName();
                 String localName = inQName.getLocalPart();
                 xew.add(xef.createStartElement(
                     inQName.getPrefix(), inQName.getNamespaceURI(),
                     localName.toUpperCase(),
                     se.getAttributes(), se.getNamespaces()));

  11. StAX example (5/6)
      • Event iteration continues, to capitalize end tags:

               } else if (inEvent.isEndElement()) {
                 EndElement ee = (EndElement) inEvent;
                 QName inQName = ee.getName();
                 String localName = inQName.getLocalPart();
                 xew.add(xef.createEndElement(
                     inQName.getPrefix(), inQName.getNamespaceURI(),
                     localName.toUpperCase(),
                     ee.getNamespaces()));

  12. StAX example (6/6)
      • Output other events, except for comments; finish when input ends:

               } else if (inEvent.getEventType() != XMLStreamConstants.COMMENT) {
                 xew.add(inEvent);
               }
             } // while (xer.hasNext())
             xer.close();
             input.close();
             xew.flush();
             xew.close();
           } // main()
         } // class capitalizeTags

  13. Efficiency of Streaming APIs?
      • An experiment of SAX vs StAX for scanning documents
      • Task: Count and report the number of elements, attributes, character fragments, and total character length
      • Inputs: Similar prose-oriented documents of different sizes
        • repeated fragments of the W3C XML Schema Rec (Part 1)
      • Tested on OpenJDK 1.6.0 (different updates), with
        • Red Hat Linux 6.0.52, 3 GHz Pentium, 1 GB RAM ("OLD")
        • 64-bit CentOS Linux 5, 2.93 GHz Intel Core 2 Duo, 4 GB RAM ("NEW")

  14. Essentials of the SAX Solution
      • Obtain and use a JAXP SAX parser:

          String docFile;  // initialized from the command line
          SAXParserFactory spf = SAXParserFactory.newInstance();
          spf.setValidating(validate);  // from a command-line option
          spf.setNamespaceAware(true);
          SAXParser sp = spf.newSAXParser();
          CountHandler ch = new CountHandler();
          sp.parse(new File(docFile), ch);
          ch.printResult();  // print the statistics

  15. SAX Solution: CountHandler

          public static class CountHandler extends DefaultHandler {
            // Instance variables for statistics:
            int elemCount = 0, charFragCount = 0,
                totalCharLen = 0, attrCount = 0;

            // Called once per start tag
            public void startElement(String nsURI, String locName,
                                     String qName, Attributes atts) {
              elemCount++;
              attrCount += atts.getLength();
            }

            // Called once per fragment of character data
            public void characters(char[] buf, int start, int length) {
              charFragCount++;
              totalCharLen += length;
            }

            // printResult() omitted
          }
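For readers who want to try the SAX approach, here is a self-contained sketch along the lines of the two slides above. It parses an in-memory string instead of a file and returns the counters as an array; both choices, and the class name `SaxCount`, are assumptions made to keep the example short.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxCount {

    /** Collects the statistics in callbacks invoked by the parser. */
    static final class CountHandler extends DefaultHandler {
        int elemCount, attrCount, charFragCount, totalCharLen;

        @Override
        public void startElement(String nsURI, String locName,
                                 String qName, Attributes atts) {
            elemCount++;
            attrCount += atts.getLength();
        }

        @Override
        public void characters(char[] buf, int start, int length) {
            charFragCount++; // the parser decides how text is split into fragments
            totalCharLen += length;
        }
    }

    /** Returns {elements, attributes, character fragments, total character length}. */
    static int[] count(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        SAXParser sp = spf.newSAXParser();
        CountHandler ch = new CountHandler();
        // The parser drives the handler: this is the "push" style.
        sp.parse(new InputSource(new StringReader(xml)), ch);
        return new int[] {ch.elemCount, ch.attrCount, ch.charFragCount, ch.totalCharLen};
    }

    public static void main(String[] args) throws Exception {
        int[] r = count("<a x=\"1\" y=\"2\"><b>hello</b></a>");
        System.out.println("elements=" + r[0] + " attributes=" + r[1]
                + " fragments=" + r[2] + " chars=" + r[3]);
    }
}
```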

  16. Essentials of the StAX Solution
      • First, initialize:

          XMLInputFactory xif = XMLInputFactory.newInstance();
          xif.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, true);
          InputStream input = new FileInputStream(docFile);
          int elemCount = 0, charFragCount = 0,
              totalCharLen = 0, attrCount = 0;

      • Then parse the InputStream, using (a) the cursor API, or (b) the event iterator API

  17. (a) StAX Cursor API Solution (1)

          XMLStreamReader xsr = xif.createXMLStreamReader(input);
          while (xsr.hasNext()) {
            int eventType = xsr.next();
            switch (eventType) {
              case XMLEvent.START_ELEMENT:
                elemCount++;
                attrCount += xsr.getAttributeCount();
                break;

  18. (a) StAX Cursor API Solution (2)

              case XMLEvent.CHARACTERS:
                charFragCount++;
                totalCharLen += xsr.getTextLength();
                break;
              default:
                break;
            } // switch
          } // while (xsr.hasNext())
          xsr.close();
          input.close();

  19. (b) StAX Iterator API Solution (1)

          XMLEventReader xer = xif.createXMLEventReader(input);
          while (xer.hasNext()) {
            XMLEvent event = xer.nextEvent();
            if (event.isStartElement()) {
              elemCount++;
              Iterator attrs = event.asStartElement().getAttributes();
              while (attrs.hasNext()) {
                attrs.next();
                attrCount++;
              }
            } // if (event.isStartElement())

  20. (b) StAX Iterator API Solution (2)

            if (event.isCharacters()) {
              charFragCount++;
              totalCharLen += ((Characters) event).getData().length();
            }
          } // while (xer.hasNext())
          xer.close();
          input.close();
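The two StAX variants above can be put side by side in one runnable sketch. It reads from a string rather than a file and leaves out the character fragment counter, since the number of fragments reported may differ between parsers; the class and method names are invented for the example.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.Iterator;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.XMLEvent;

final class StaxCount {

    /** Returns {elements, attributes, total character length} via the cursor API. */
    static int[] countWithCursor(String xml) throws XMLStreamException {
        XMLStreamReader xsr = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int elems = 0, attrs = 0, chars = 0;
        while (xsr.hasNext()) {
            switch (xsr.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    elems++;
                    attrs += xsr.getAttributeCount(); // no per-attribute objects
                    break;
                case XMLStreamConstants.CHARACTERS:
                    chars += xsr.getTextLength();
                    break;
                default:
                    break;
            }
        }
        xsr.close();
        return new int[] {elems, attrs, chars};
    }

    /** Returns the same statistics via the event iterator API. */
    static int[] countWithIterator(String xml) throws XMLStreamException {
        XMLEventReader xer = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        int elems = 0, attrs = 0, chars = 0;
        while (xer.hasNext()) {
            XMLEvent event = xer.nextEvent(); // one object per event
            if (event.isStartElement()) {
                elems++;
                Iterator<?> it = event.asStartElement().getAttributes();
                while (it.hasNext()) {
                    it.next();
                    attrs++;
                }
            } else if (event.isCharacters()) {
                chars += event.asCharacters().getData().length();
            }
        }
        xer.close();
        return new int[] {elems, attrs, chars};
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<a x=\"1\" y=\"2\"><b>hello</b><c/></a>";
        System.out.println(Arrays.toString(countWithCursor(xml)));
        System.out.println(Arrays.toString(countWithIterator(xml)));
    }
}
```

The iterator variant allocates an event object (and attribute objects) for every step, which is the overhead the following measurements refer to.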

  21. Efficiency of SAX vs StAX

  22. Efficiency of SAX vs StAX (NEW)

  23. Observations
      • The StAX cursor API is the most efficient
      • Overhead of XMLEvent objects makes the StAX iterator some 50–80% slower
      • On small documents, SAX is ~40–100% slower than the StAX cursor API
      • Overhead of DTD validation adds ~5–10% to SAX parsing time
      • StAX loses its advantage with bigger documents:

  24. Times on Larger Documents
      • Why? Let's take a look at memory usage

  25. Memory Usage of SAX vs StAX
      • < 6 MB
      • The StAX implementation has a memory leak! (Should get fixed in future releases)

  26. Memory Usage of SAX vs StAX (NEW)
      • Memory leak also in the SAX implementation!

  27. Circumventing the Memory Leak
      • The bug appears to be related to a DOCTYPE declaration with an external DTD
      • Without a DOCTYPE declaration
        • In the first experiment, each API uses less than 6 MB
        • In the second experiment, the StAX event objects still require increasing amounts of memory; see next

  28. SAX vs StAX memory need (without DTD)

  29. Speed on documents without DTD

  30. Speed on documents without DTD (NEW)

  31. StAX: Summary
      • Event-based streaming pull API for XML documents
      • More convenient than SAX
        • and often more efficient, esp. the cursor API with small documents
      • Also supports writing of XML data
      • A potential substitute for SAX
        • once some implementation bugs (in JDK 1.6) get eliminated
        • NB: the Sun Java Streaming XML Parser (in JDK 1.6) is non-validating (but the API allows validation, too)
