Streaming API for XML (StAX) Cheng-Chia Chen
XML API styles
  • Push: SAX, XNI
  • Tree: DOM, JDOM, XOM, ElectricXML, dom4j, Sparta
  • Data binding: Castor, Zeus, JAXB
  • Pull: XMLPULL, StAX, NekoPull
  • Transform: XSLT, TrAX, XQuery
What is pull parsing?
  • SAX: push parsing (event driven)
    • views an XML document as a sequence of events and calls preconfigured event-handling methods while visiting these events in order.
  • StAX: pull parsing (token based)
    • behaves like a traditional lexical analyzer in a compiler.
    • cursor-based.
    • views an XML document as a sequence of passive tokens (or events); the application decides when to fetch the next token via methods such as next() and nextTag().
  • Traits of pull parsing:
    • fast, memory efficient, streamable, read-only
    • suits JAXB and JAX-RPC, which require more flexible, context-dependent processing.
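
The pull style described above can be sketched as a loop that the application drives itself. This is a minimal illustration, not code from the slides; the class name PullLoopSketch and the sample document are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class: counts start tags with a pull loop.
class PullLoopSketch {
    static int countElements(String xml) throws XMLStreamException {
        XMLStreamReader parser =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            // The application asks for each token; the parser never calls back.
            while (parser.hasNext()) {
                if (parser.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
        } finally {
            parser.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(countElements("<a><b/><b/></a>"));
    }
}
```

Because the loop belongs to the application, it can stop early, skip ahead, or hand the parser to another method, which is harder to arrange with SAX callbacks.
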
Major Classes and Interfaces in StAX
  • XMLStreamReader: an interface that represents the parser; cursor-based, event info stored in the parser
  • XMLEventReader: an interface that represents the parser; event-based, event info stored in the returned event
  • XMLInputFactory: the factory class for instantiating an XMLStreamReader or an XMLEventReader
  • XMLStreamException: the generic exception class for everything other than an IOException that might go wrong when parsing an XML document.
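well-formedness checking (full source)
import java.io.IOException;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

try {
    InputStream in = …
    XMLInputFactory factory = XMLInputFactory.newInstance();
    XMLStreamReader parser = factory.createXMLStreamReader(in);
    while (true) {
        int event = parser.next(); // move cursor to the next XML token
        if (event == XMLStreamConstants.END_DOCUMENT) {
            break;
        }
    }
    parser.close(); // close once; this does not close the underlying stream
    // If we get here, no exception was thrown
    System.out.println("The input is well-formed");
} catch (XMLStreamException ex) {
    System.out.println("The input is not well-formed");
} catch (IOException ex) {
    System.out.println("IO error");
}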
Interface XMLStreamConstants
  • Defines 15 event codes that XMLStreamReader.next() returns to tell you what kind of event the parser encountered:
    • START_DOCUMENT, END_DOCUMENT
    • START_ELEMENT, END_ELEMENT
    • ATTRIBUTE, CHARACTERS
    • CDATA, SPACE // ignorable whitespace
    • NAMESPACE, PROCESSING_INSTRUCTION
    • COMMENT
    • ENTITY_REFERENCE
    • NOTATION_DECLARATION
    • ENTITY_DECLARATION
    • DTD
  • Depending on the event just read, different methods are available on the XMLStreamReader for fetching additional information about it (state-based).
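
A short sketch of the state-based style: the code switches on the event code and only calls the accessors that are valid for that event. The class name EventTraceSketch and the output strings are illustrative assumptions. Coalescing is switched on here so that adjacent character data arrives as a single CHARACTERS event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class: records one line per event.
class EventTraceSketch {
    static List<String> trace(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader r = factory.createXMLStreamReader(new StringReader(xml));
        List<String> out = new ArrayList<>();
        int event = r.getEventType(); // a fresh reader is positioned on START_DOCUMENT
        while (true) {
            switch (event) {
                case XMLStreamConstants.START_DOCUMENT:
                    out.add("START_DOCUMENT");
                    break;
                case XMLStreamConstants.START_ELEMENT:
                    out.add("START_ELEMENT " + r.getLocalName());
                    break;
                case XMLStreamConstants.CHARACTERS:
                    out.add("CHARACTERS " + r.getText());
                    break;
                case XMLStreamConstants.COMMENT:
                    out.add("COMMENT " + r.getText());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    out.add("END_ELEMENT " + r.getLocalName());
                    break;
                case XMLStreamConstants.END_DOCUMENT:
                    out.add("END_DOCUMENT");
                    break;
                default:
                    out.add("OTHER " + event); // event codes not handled in this sketch
            }
            if (!r.hasNext()) {
                break;
            }
            event = r.next();
        }
        r.close();
        return out;
    }

    public static void main(String[] args) throws XMLStreamException {
        trace("<a>hi<!--c--></a>").forEach(System.out::println);
    }
}
```

Calling an accessor in the wrong state, for example getLocalName() while positioned on CHARACTERS, raises an IllegalStateException, which is why the switch keeps each call next to the event it belongs to.
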
Possible parsing events
  • For a well-formed XML document, only the following events can be generated by XMLStreamReader#next(); bracketed items are reported as part of the preceding event rather than as events of their own:
    • START_DOCUMENT [XML declaration]
    • DTD [NOTATION_DECLARATION, ENTITY_DECLARATION]
    • END_DOCUMENT
    • START_ELEMENT [ATTRIBUTE, NAMESPACE]
    • END_ELEMENT
    • CHARACTERS, CDATA, SPACE // ignorable whitespace
    • PROCESSING_INSTRUCTION
    • COMMENT
    • ENTITY_REFERENCE
XML Event Hierarchy
  • javax.xml.stream.XMLStreamConstants
    • javax.xml.stream.events.XMLEvent
      • StartDocument
      • DTD, NotationDeclaration, EntityDeclaration
      • StartElement
      • Attribute, Namespace (Namespace extends Attribute)
      • Characters, Comment
      • EntityReference, ProcessingInstruction
      • EndElement
      • EndDocument
XMLStreamReader: event content queries
  • For the element name
    • getName(): QName
    • getLocalName(): String
    • getNamespaceURI(): String
  • For declared namespaces
    • getNamespaceCount(): int
    • getNamespaceURI(int)
    • getNamespacePrefix(int)
  • For in-scope namespaces
    • getNamespaceURI(String prefix)
    • getNamespaceContext(): NamespaceContext
  • For attached attributes
    • getAttributeCount(): int
    • getAttributeName(int): QName
    • getAttributePrefix(int): String
    • getAttributeNamespace(int): String
    • getAttributeLocalName(int): String
    • getAttributeType(int): String
    • getAttributeValue(int): String
    • getAttributeValue(URI, localName): String
  • For text or character data
    • hasText(): boolean; getText(): String
    • getTextCharacters(): char[] // read-only, valid until the next call to next()
    • getTextStart(): int; getTextLength(): int
    • getTextCharacters(int sourceStart, char[] target, int targetStart, int length): int // sourceStart is relative to the event's text; 0 copies from the position given by getTextStart()
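
The query methods listed above can be exercised on a single start tag. The sketch below is only an illustration: the class name CursorQuerySketch, the map keys, and the sample document are assumptions, and it expects a document without a DTD so that nextTag() can move straight to the root element.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class: collects what the cursor exposes for the root element.
class CursorQuerySketch {
    static Map<String, String> describeRoot(String xml) throws XMLStreamException {
        XMLStreamReader r =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        Map<String, String> info = new LinkedHashMap<>();
        try {
            r.nextTag(); // move from START_DOCUMENT to the root START_ELEMENT
            info.put("localName", r.getLocalName());
            info.put("namespaceURI", r.getNamespaceURI());
            // Namespaces declared on this start tag
            for (int i = 0; i < r.getNamespaceCount(); i++) {
                info.put("xmlns:" + r.getNamespacePrefix(i), r.getNamespaceURI(i));
            }
            // Attributes attached to this start tag (namespace declarations excluded)
            for (int i = 0; i < r.getAttributeCount(); i++) {
                info.put("@" + r.getAttributeLocalName(i), r.getAttributeValue(i));
            }
            r.next(); // step to the element content
            if (r.hasText()) {
                info.put("text", r.getText());
            }
        } finally {
            r.close();
        }
        return info;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<p:item xmlns:p=\"urn:example\" id=\"7\" lang=\"en\">hello</p:item>";
        describeRoot(xml).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```
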
interface javax.xml.stream.events.XMLEvent
  • XMLEvents are value objects used to communicate the XML 1.0 InfoSet to the application.
  • Events may be cached and referenced after the parse has completed.
  • Methods
    • int getEventType();
    • Location getLocation();
    • QName getSchemaType(); // optional
    • void writeAsEncodedUnicode(Writer writer) // writes this XMLEvent to writer as Unicode characters
    • Xxxx asXxxx() // Xxxx: StartElement, EndElement, or Characters
    • boolean isXxxx() // Xxxx: all events except Comment, DTD, NotationDeclaration and EntityDeclaration
StartDocument, EndDocument and Namespace
  • StartDocument
    • public String getSystemId(); // default is ""
    • public String getCharacterEncodingScheme(); // default is "UTF-8"
    • public boolean encodingSet(); // true if the encoding was set in the XML declaration
    • public boolean isStandalone(); // default is false
    • public boolean standaloneSet(); // true if standalone was set in the XML declaration
    • public String getVersion(); // "1.0" or "1.1"
  • EndDocument extends XMLEvent { } // no special methods
  • Namespace
    • String getPrefix();
    • String getNamespaceURI();
    • boolean isDefaultNamespaceDeclaration();
StartElement and EndElement
  • StartElement
    • QName getName();
    • Iterator getAttributes(); // of javax.xml.stream.events.Attribute
    • Iterator getNamespaces(); // of javax.xml.stream.events.Namespace: namespaces declared or undeclared in this start tag
    • Attribute getAttributeByName(QName name);
    • String getNamespaceURI(String prefix); // looks up the namespace URI bound to the given prefix
    • NamespaceContext getNamespaceContext(); // contains all namespaces in scope
  • EndElement
    • public QName getName();
    • public Iterator getNamespaces(); // namespaces going out of scope
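
As a rough illustration of the StartElement methods, the sketch below walks an XMLEventReader and looks up an attribute by QName. The class name StartElementSketch, the attribute name "id", and the sample document are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical example class: collects the "id" attribute of matching elements.
class StartElementSketch {
    static List<String> idsOf(String xml, String elementName) throws XMLStreamException {
        XMLEventReader reader =
            XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        List<String> ids = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (!event.isStartElement()) {
                    continue;
                }
                StartElement start = event.asStartElement();
                if (!start.getName().getLocalPart().equals(elementName)) {
                    continue;
                }
                // getAttributeByName returns null when the attribute is absent
                Attribute id = start.getAttributeByName(new QName("id"));
                if (id != null) {
                    ids.add(id.getValue());
                }
            }
        } finally {
            reader.close();
        }
        return ids;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(idsOf("<list><item id='1'/><item/><item id='3'/></list>", "item"));
    }
}
```

Unlike the cursor API, each StartElement here is a self-contained object, so it could be stored in a list and inspected after the loop has moved on.
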
The NamespaceContext Interface
package javax.xml.namespace;
  • NamespaceContext
    • String getNamespaceURI(String prefix);
    • String getPrefix(String namespaceURI);
    • Iterator getPrefixes(String namespaceURI);
javax.xml.namespace.QName
public class QName { // immutable objects
    public QName([String namespaceURI,] String localPart [, String prefix]);
    // default URI: javax.xml.XMLConstants.NULL_NS_URI = ""
    // default prefix: javax.xml.XMLConstants.DEFAULT_NS_PREFIX = ""
    public String getLocalPart();
    public String getPrefix();
    public String getNamespaceURI();
    public int hashCode();
    public boolean equals(Object object); // true iff same URI and same local part
    public String toString(); // = "{" + namespaceURI + "}" + localPart
    public static QName valueOf(String qNameAsString); // inverse of toString()
}
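
A few lines are enough to check the QName rules summarized above: the prefix is ignored by equals(), and valueOf() reverses toString(). The class name QNameSketch, the describe helper, and the urn:example namespace are illustrative assumptions.

```java
import javax.xml.XMLConstants;
import javax.xml.namespace.QName;

// Hypothetical example class: pokes at QName equality and string forms.
class QNameSketch {
    // Renders the string form plus the prefix, which toString() leaves out.
    static String describe(QName name) {
        return name.toString() + " prefix=" + name.getPrefix();
    }

    public static void main(String[] args) {
        QName prefixed = new QName("urn:example", "item", "p");
        QName unprefixed = new QName("urn:example", "item");
        QName local = new QName("item");

        // Equality compares namespace URI and local part only.
        System.out.println(prefixed.equals(unprefixed));
        // A local-only QName falls back to the empty-string defaults.
        System.out.println(XMLConstants.NULL_NS_URI.equals(local.getNamespaceURI()));
        System.out.println(XMLConstants.DEFAULT_NS_PREFIX.equals(local.getPrefix()));
        // valueOf parses the "{uri}local" form produced by toString().
        System.out.println(QName.valueOf(prefixed.toString()).equals(prefixed));
        System.out.println(describe(prefixed));
    }
}
```
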
Location
package javax.xml.stream;
  • Location
    • int getLineNumber();
    • int getColumnNumber();
    • int getCharacterOffset();
    • String getPublicId();
    • String getSystemId();
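
A small sketch of how a Location might be read from the cursor API. The class name LocationSketch and the lineOf helper are assumptions for this example. The exact position reported for an event depends on the parser implementation (typically the point the parser has reached), so the sketch only relies on line numbers increasing from one line to the next.

```java
import java.io.StringReader;
import javax.xml.stream.Location;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class: reports the line on which an element starts.
class LocationSketch {
    // Returns the line reported at the first start tag with the given name, or -1.
    static int lineOf(String xml, String name) throws XMLStreamException {
        XMLStreamReader r =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && r.getLocalName().equals(name)) {
                    Location loc = r.getLocation();
                    return loc.getLineNumber();
                }
            }
            return -1;
        } finally {
            r.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<a>\n<b/>\n</a>";
        System.out.println("a starts on line " + lineOf(xml, "a"));
        System.out.println("b starts on line " + lineOf(xml, "b"));
    }
}
```

The same Location object is available from XMLStreamException.getLocation(), which is useful for pointing at the spot where a document stopped being well-formed.
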
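XMLEventReader
package javax.xml.stream;
public interface XMLEventReader extends Iterator {
    Object next();     // inherited from Iterator
    boolean hasNext();
    void remove();     // inherited from Iterator
    public XMLEvent nextEvent() throws XMLStreamException; // typed equivalent of next()
    public XMLEvent peek() throws XMLStreamException;
    public String getElementText() throws XMLStreamException;
    public XMLEvent nextTag() throws XMLStreamException; // skips insignificant whitespace up to the next start or end tag
    public Object getProperty(String name);
}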
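
The sketch below shows one way the iterator-style methods and getElementText() might be combined. The class name EventReaderTextSketch, the element name "title", and the sample document are assumptions. getElementText() is only valid right after a StartElement of a text-only element, and it consumes events up to the matching end tag.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical example class: collects the text of every <title> element.
class EventReaderTextSketch {
    static List<String> titles(String xml) throws XMLStreamException {
        XMLEventReader reader =
            XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        List<String> result = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals("title")) {
                    // Reads the character content and the matching end tag
                    result.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<books><book><title>A</title></book><book><title>B</title></book></books>";
        System.out.println(titles(xml));
    }
}
```
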
Attribute and Characters
  • Attribute
    • QName getName();
    • String getValue();
    • String getDTDType(); // default is "CDATA"
    • boolean isSpecified(); // false if the value was defaulted from the DTD
  • Characters
    • String getData();
    • boolean isWhiteSpace();
    • boolean isCData(); // true for a CDATA section; false if it was coalesced with surrounding text
    • boolean isIgnorableWhiteSpace(); // isWhiteSpace() && child of an element-only element
Comment and ProcessingInstruction
  • Comment
    • getText(): String
  • ProcessingInstruction
    • getTarget(): String
    • getData(): String
DTD and NotationDeclaration
  • DTD
    • String getDocumentTypeDeclaration(); // as a string
    • Object getProcessedDTD(); // a representation of the DTD
    • List getNotations(); // of NotationDeclaration
    • List getEntities(); // unparsed entities
  • NotationDeclaration
    • getName(): String
    • getPublicId(): String
    • getSystemId(): String
EntityDeclaration and EntityReference
  • (general unparsed ?) EntityDeclaration
    • String getPublicId();
    • String getSystemId();
    • String getName();
    • public String getNotationName();
    • public String getReplacementText();
  • (Unexpanded general) EntityReference
    • EntityDeclaration getDeclaration(); // returns the declaration of this entity
    • String getName(); // the name of the entity
    • Event reported only if the javax.xml.stream.isReplacingEntityReferences property is set to false
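
The last point can be tried out with the cursor API by turning entity replacement off on the factory. This is a hedged sketch: the class name EntityReferenceSketch and the sample DTD are assumptions, and the behaviour shown is what the JDK's bundled parser is expected to do with an internal entity; other StAX implementations may differ.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class: lists entity references that were left unexpanded.
class EntityReferenceSketch {
    static List<String> unexpandedEntities(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // With replacement off, references surface as ENTITY_REFERENCE events.
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.FALSE);
        XMLStreamReader r = factory.createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.ENTITY_REFERENCE) {
                    names.add(r.getLocalName()); // the entity name
                }
            }
        } finally {
            r.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<!DOCTYPE a [<!ENTITY who \"world\">]><a>hello &who;</a>";
        System.out.println(unexpandedEntities(xml));
    }
}
```
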