5 Processing XML
Overview • Parsing XML documents • Document Object Model (DOM) • Simple API for XML (SAX) • Class generation
What's the Problem?
  <?xml version="1.0"?>
  <books>
    <book>
      <title>The XML Handbook</title>
      <author>Goldfarb</author>
      <author>Prescod</author>
      <publisher>Prentice Hall</publisher>
      <pages>655</pages>
      <isbn>0130811521</isbn>
      <price currency="USD">44.95</price>
    </book>
    <book>
      <title>XML Design</title>
      <author>Spencer</author>
      <publisher>Wrox Press</publisher>
      ...
    </book>
  </books>
Figure: the application needs a Book object, not this text. How does it get from one to the other?
Parsing XML Documents
Figure: the parser reads the document together with its DTD/Schema. It either builds a document tree (DOM) or reports events (startDocument, startElement, ..., endElement, endDocument) to an application that implements DocumentHandler (SAX).
Parser • Project X (Sun Microsystems) • Ælfred (Microstar Software) • XML4J (IBM) • Lark (Tim Bray) • MSXML (Microsoft) • XJ (Data Channel) • Xerces (Apache) • ...
The Document Object Model: XML Document Structure
  <?xml version="1.0"?>
  <books>
    <book>
      <title>The XML Handbook</title>
      <author>Goldfarb</author>
      <author>Prescod</author>
      <publisher>Prentice Hall</publisher>
      <pages>655</pages>
      <isbn>0130811521</isbn>
      <price currency="USD">44.95</price>
    </book>
    <book>
      <title>XML Design</title>
      <author>Spencer</author>
      <publisher>Wrox Press</publisher>
      ...
    </book>
  </books>
Figure: the same document as a tree. The root books has two book children; the first book has the children title ("The XML Handbook"), author ("Goldfarb"), author ("Prescod"), publisher ("Prentice Hall"), pages ("655"), isbn, ...
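To see the same hierarchy from code, here is a minimal sketch using Python's standard xml.dom.minidom (our choice of DOM implementation; the slides do not name one). It parses a shortened copy of the books document and prints one line per element, indented by depth. The helper name tree_lines is hypothetical.

```python
from xml.dom import minidom

BOOKS_XML = """<?xml version="1.0"?>
<books>
  <book>
    <title>The XML Handbook</title>
    <author>Goldfarb</author>
    <author>Prescod</author>
  </book>
  <book>
    <title>XML Design</title>
    <author>Spencer</author>
  </book>
</books>"""


def tree_lines(node, depth=0):
    """Return one line per element below node, indented by depth."""
    lines = []
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            # Text directly inside this element; whitespace-only text becomes "".
            text = "".join(t.data for t in child.childNodes
                           if t.nodeType == t.TEXT_NODE).strip()
            label = child.tagName + (f' "{text}"' if text else "")
            lines.append("  " * depth + label)
            lines.extend(tree_lines(child, depth + 1))
    return lines


doc = minidom.parseString(BOOKS_XML)
print("\n".join(tree_lines(doc)))
```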
The Document Object Model • Provides a standard interface for accessing and manipulating XML structures. • Represents a document as a hierarchy of nodes. • Is platform- and programming-language-neutral. • Is a W3C Recommendation (DOM Level 1, October 1, 1998). • Is implemented by many parsers.
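DOM - Structure Model
Figure: the books tree annotated with DOM interfaces. The whole tree is a Document; every item in it (books, book, title, author, publisher, pages, isbn, and the texts "The XML Handbook", "Goldfarb", "Prescod", "Prentice Hall", "655", ...) is a Node; element nodes such as book are Elements; the children of a node are returned as a NodeList.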
The Document Interface
Method                          Result
doctype                         DocumentType
implementation                  DOMImplementation
documentElement                 Element
getElementsByTagName(String)    NodeList
createTextNode(String)          Text
createComment(String)           Comment
createElement(String)           Element
createCDATASection(String)      CDATASection
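A minimal sketch of the Document interface, again using Python's xml.dom.minidom as a stand-in DOM implementation (an assumption; any DOM Level 1 implementation offers the same names). It reads documentElement and getElementsByTagName, then uses the factory methods to add a new book. The comment text and the CDATA content are made-up sample values.

```python
from xml.dom import minidom

doc = minidom.parseString("<books><book><title>XML Design</title></book></books>")

root = doc.documentElement                  # Element: <books>
titles = doc.getElementsByTagName("title")  # NodeList of all <title> elements
print(root.tagName, titles.length)          # books 1

# Factory methods create nodes owned by this document;
# they only become part of the tree once appended somewhere.
book = doc.createElement("book")
title = doc.createElement("title")
title.appendChild(doc.createTextNode("The XML Handbook"))
book.appendChild(title)
book.appendChild(doc.createComment(" added by script "))
book.appendChild(doc.createCDATASection("price < 50"))
root.appendChild(book)

print(doc.doctype)  # None: the document has no <!DOCTYPE ...>
print(root.toxml())
```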
The Node Interface
Method                              Result
nodeName                            String
nodeValue                           String
nodeType                            short
parentNode                          Node
childNodes                          NodeList
firstChild                          Node
lastChild                           Node
previousSibling                     Node
nextSibling                         Node
attributes                          NamedNodeMap
insertBefore(Node new, Node ref)    Node
replaceChild(Node new, Node old)    Node
removeChild(Node)                   Node
hasChildNodes()                     boolean
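The Node members can be exercised on a book element with two children. This sketch, assuming xml.dom.minidom, reads the navigation attributes and then calls insertBefore, replaceChild, and removeChild. Note that an element's own nodeValue is null (None in Python); its text lives in a child text node.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<book><title>XML Design</title><author>Spencer</author></book>")
book = doc.documentElement

title = book.firstChild
author = book.lastChild
print(title.nodeName, title.nodeValue)   # title None  (elements carry no value)
print(title.firstChild.nodeValue)        # XML Design  (held by the text node)
print(author.previousSibling is title)   # True
print(title.parentNode is book)          # True

# insertBefore(new, ref): put a publisher element between title and author.
publisher = doc.createElement("publisher")
publisher.appendChild(doc.createTextNode("Wrox Press"))
book.insertBefore(publisher, author)

# replaceChild(new, old) and removeChild(old) return the node that left the tree.
old_title = book.replaceChild(doc.createElement("title"), title)
book.removeChild(author)

print([n.nodeName for n in book.childNodes])                 # ['title', 'publisher']
print(book.hasChildNodes(), old_title.firstChild.nodeValue)  # True XML Design
```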
Node Types / Node Names
Node field                      nodeType   nodeName
ELEMENT_NODE                    1          tagName
ATTRIBUTE_NODE                  2          name of attribute
TEXT_NODE                       3          "#text"
CDATA_SECTION_NODE              4          "#cdata-section"
ENTITY_REFERENCE_NODE           5          name of entity referenced
ENTITY_NODE                     6          entity name
PROCESSING_INSTRUCTION_NODE     7          target
COMMENT_NODE                    8          "#comment"
DOCUMENT_NODE                   9          "#document"
DOCUMENT_TYPE_NODE              10         document type name
DOCUMENT_FRAGMENT_NODE          11         "#document-fragment"
NOTATION_NODE                   12         notation name
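To check the table against a real implementation, this sketch (assuming xml.dom.minidom) obtains one node of several types and prints each nodeType with its nodeName. The constants themselves are available on Python's xml.dom.Node class. The stylesheet processing instruction and its href are hypothetical sample values.

```python
from xml.dom import Node, minidom

doc = minidom.parseString('<book isbn="0130811521"><title>XML Design</title></book>')
book = doc.documentElement

nodes = [
    doc,                                    # DOCUMENT_NODE
    book,                                   # ELEMENT_NODE
    book.getAttributeNode("isbn"),          # ATTRIBUTE_NODE
    book.firstChild.firstChild,             # TEXT_NODE inside <title>
    doc.createCDATASection("a < b"),        # CDATA_SECTION_NODE
    doc.createProcessingInstruction("xml-stylesheet", 'href="books.css"'),
    doc.createComment(" note "),            # COMMENT_NODE
    doc.createDocumentFragment(),           # DOCUMENT_FRAGMENT_NODE
]

for node in nodes:
    print(node.nodeType, node.nodeName)

print(Node.ELEMENT_NODE, Node.COMMENT_NODE)  # 1 8
```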
The NodeList Interface
Method       Result
length       int
item(int)    Node
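A NodeList is read through length and item(index). In minidom (assumed here) a NodeList is also a Python list, but this sketch sticks to the language-neutral members.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<book><author>Goldfarb</author><author>Prescod</author></book>")
authors = doc.getElementsByTagName("author")  # NodeList

# The DOM way: length plus item(index).
for i in range(authors.length):
    print(i, authors.item(i).firstChild.data)

# item() returns None (it does not raise) for an index past the end.
print(authors.item(authors.length))           # None
```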
The Element Interface
Method                                     Result
tagName                                    String
getAttribute(String)                       String
setAttribute(String name, String value)    void
removeAttribute(String)                    void
getAttributeNode(String)                   Attr
setAttributeNode(Attr)                     Attr
removeAttributeNode(Attr)                  Attr
getElementsByTagName(String)               NodeList
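A short sketch of the Element members on the price element from the books document, assuming xml.dom.minidom. The net attribute and the EUR value are invented for illustration.

```python
from xml.dom import minidom

doc = minidom.parseString('<price currency="USD">44.95</price>')
price = doc.documentElement

print(price.tagName, price.getAttribute("currency"))  # price USD
price.setAttribute("currency", "EUR")                 # overwrites the value
price.setAttribute("net", "yes")                      # adds a new attribute

attr = price.getAttributeNode("net")                  # the Attr node itself
print(attr.name, attr.value)                          # net yes
price.removeAttributeNode(attr)                       # takes an Attr
price.removeAttribute("currency")                     # takes a name

print(price.toxml())                                  # <price>44.95</price>
print(price.hasAttribute("currency"))                 # False
```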
DOM Methods for Navigation • parentNode • previousSibling • nextSibling • firstChild • lastChild • childNodes (length, item()) • getElementsByTagName
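One practical catch when navigating: a parser that keeps whitespace turns the indentation between tags into text nodes, so firstChild and nextSibling may land on "#text" nodes. This sketch, assuming xml.dom.minidom (which keeps such nodes), walks the children of books with firstChild/nextSibling and skips everything that is not an element. The helper element_children is hypothetical.

```python
from xml.dom import minidom

BOOKS_XML = """<books>
  <book><title>The XML Handbook</title></book>
  <book><title>XML Design</title></book>
</books>"""

doc = minidom.parseString(BOOKS_XML)
root = doc.documentElement

# The indentation is kept as whitespace-only text nodes,
# so the first child of <books> is a "#text" node, not a <book>.
print(root.firstChild.nodeName)  # #text


def element_children(node):
    """Follow firstChild / nextSibling, yielding only element nodes."""
    child = node.firstChild
    while child is not None:
        if child.nodeType == child.ELEMENT_NODE:
            yield child
        child = child.nextSibling


books = list(element_children(root))
second = books[1]
print(second.parentNode is root)                                 # True
print(second.previousSibling.previousSibling is books[0])        # True
print(second.getElementsByTagName("title")[0].firstChild.data)   # XML Design
```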
DOM Methods for Manipulation • appendChild • insertBefore • replaceChild • removeChild • createElement • createAttribute • createTextNode
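The manipulation methods combine as follows in a minimal sketch, assuming xml.dom.minidom: createElement, createAttribute, and createTextNode build a new book, insertBefore places it first, and removeChild drops the old one.

```python
from xml.dom import minidom

doc = minidom.parseString("<books><book><title>XML Design</title></book></books>")
root = doc.documentElement

# Build <book isbn="0130811521"><title>The XML Handbook</title></book>.
book = doc.createElement("book")
isbn = doc.createAttribute("isbn")   # a free-standing Attr node
isbn.value = "0130811521"
book.setAttributeNode(isbn)
title = doc.createElement("title")
title.appendChild(doc.createTextNode("The XML Handbook"))
book.appendChild(title)

# insertBefore makes the new book the first child of <books>.
root.insertBefore(book, root.firstChild)
print([b.firstChild.firstChild.data for b in root.childNodes])
# ['The XML Handbook', 'XML Design']

# removeChild drops the original book, which is now the last child.
root.removeChild(root.lastChild)
print(root.toxml())
# <books><book isbn="0130811521"><title>The XML Handbook</title></book></books>
```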
Example
Figure: navigating from the DOM object (doc) to the root node (books), then to its first book, that book's authors, the second author, and finally the first text subnode thereof. The tree holds the authors Goldfarb and Prescod under the first book and Spencer under the second.
doc.documentElement.childNodes.item(0).getElementsByTagName("author").item(1).childNodes.item(0).data
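The navigation path can be replayed in Python with xml.dom.minidom (an assumption; the slide's script uses MSXML). Because minidom keeps whitespace-only text nodes, the sample is written without indentation so that childNodes.item(0) really is the first book.

```python
from xml.dom import minidom

# No indentation, so childNodes of <books> contains only the two <book> elements.
doc = minidom.parseString(
    "<books>"
    "<book><title>The XML Handbook</title>"
    "<author>Goldfarb</author><author>Prescod</author></book>"
    "<book><title>XML Design</title><author>Spencer</author></book>"
    "</books>")

# Same path as the slide: root -> first book -> its authors
# -> second author -> first text subnode -> its data.
name = (doc.documentElement.childNodes.item(0)
        .getElementsByTagName("author").item(1)
        .childNodes.item(0).data)
print(name)  # Prescod
```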
Script
<HTML>
<HEAD><TITLE>DOM Example</TITLE></HEAD>
<BODY>
<H1>DOM Example</H1>
<SCRIPT LANGUAGE="JavaScript">
  // Internet Explorer only: load books.xml with the MSXML ActiveX control.
  var doc, root, book1, authors, author2;
  doc = new ActiveXObject("Microsoft.XMLDOM");
  doc.async = false;                 // wait until the document is fully loaded
  doc.load("books.xml");
  if (doc.parseError.errorCode != 0) // non-zero error code: parsing failed
    alert(doc.parseError.reason);
  else {
    root = doc.documentElement;
    document.write("Name of Root node: " + root.nodeName + "<BR>");
    document.write("Type of Root node: " + root.nodeType + "<BR>");
    // MSXML drops whitespace-only text by default, so item(0) is the first book.
    book1 = root.childNodes.item(0);
    authors = book1.getElementsByTagName("author");
    document.write("Number of authors: " + authors.length + "<BR>");
    author2 = authors.item(1);
    document.write("Name of second author: " + author2.childNodes.item(0).data);
  }
</SCRIPT>
</BODY>
</HTML>
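Outside Internet Explorer the ActiveXObject call is unavailable. A rough Python counterpart of the script, assuming xml.dom.minidom for the DOM and a temporary books.xml written by the sketch itself, produces the same four report lines. The name describe is a hypothetical helper, and catching ExpatError stands in for checking doc.parseError.

```python
import os
import tempfile
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def describe(path):
    """Return the lines the slide's script writes, or a parse-error line."""
    try:
        doc = minidom.parse(path)         # synchronous, like doc.async = false
    except ExpatError as err:             # plays the role of doc.parseError
        return [f"Parse error: {err}"]
    root = doc.documentElement
    book1 = root.getElementsByTagName("book")[0]   # skips whitespace text nodes
    authors = book1.getElementsByTagName("author")
    author2 = authors.item(1)
    return [
        f"Name of Root node: {root.nodeName}",
        f"Type of Root node: {root.nodeType}",
        f"Number of authors: {authors.length}",
        f"Name of second author: {author2.childNodes.item(0).data}",
    ]


# Hypothetical sample file standing in for the slide's books.xml.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "books.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0"?>\n<books>\n  <book>\n'
                '    <title>The XML Handbook</title>\n'
                '    <author>Goldfarb</author>\n    <author>Prescod</author>\n'
                '  </book>\n</books>\n')
    report = describe(path)

print("\n".join(report))
```

The sketch picks the first book with getElementsByTagName("book") rather than childNodes.item(0), because minidom keeps the whitespace text nodes that MSXML discards by default.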
SAX - Simple API for XML
Figure: the parser reads the document (and its DTD) and calls the application once per event: startDocument, startElement, ..., endElement, endDocument.
SAX - Simple API for XML • Event-driven parsing model • "Don't call the DOM, the parser calls you." • Developed by members of the XML-DEV mailing list • SAX 1.0 released on May 11, 1998 • Supported by many parsers ... • ... but Ælfred is the saxon king.
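Python's standard xml.sax module (our choice of SAX implementation) follows SAX 2, so the handler class is ContentHandler rather than the SAX 1 DocumentHandler shown in the diagram. This sketch records the callbacks in the order the parser makes them, dropping whitespace-only character data. EventLogger is a hypothetical name.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records the parser's callbacks, in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def startElement(self, name, attrs):
        self.events.append(f"startElement {name}")

    def characters(self, content):
        if content.strip():  # ignore whitespace between tags
            self.events.append(f"characters {content.strip()!r}")

    def endElement(self, name):
        self.events.append(f"endElement {name}")

    def endDocument(self):
        self.events.append("endDocument")


handler = EventLogger()
xml.sax.parseString(b"<book><title>XML Design</title></book>", handler)
print("\n".join(handler.events))
```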
Procedure • DOM: create a parser instance; parse the whole document; process the DOM tree. • SAX: create a parser instance; register event handlers with the parser; the parser calls the event handlers during parsing.
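Both procedures side by side, as a sketch assuming Python's xml.dom.minidom and xml.sax: the DOM version builds the whole tree and then queries it, while the SAX version creates a parser, registers a handler, and collects titles as the events arrive. The handler buffers character data because a parser may deliver one run of text in several characters calls. TitleHandler, titles_dom, and titles_sax are hypothetical names.

```python
import xml.sax
from xml.dom import minidom

BOOKS_XML = b"""<books>
  <book><title>The XML Handbook</title></book>
  <book><title>XML Design</title></book>
</books>"""


# DOM: parse the whole document into a tree, then process the tree.
def titles_dom(data):
    doc = minidom.parseString(data)
    return [t.firstChild.data for t in doc.getElementsByTagName("title")]


# SAX: the parser calls the handler while reading; no tree is kept.
class TitleHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles, self._buf, self._inside = [], [], False

    def startElement(self, name, attrs):
        if name == "title":
            self._inside, self._buf = True, []

    def characters(self, content):
        if self._inside:
            self._buf.append(content)  # text may arrive in several chunks

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buf))
            self._inside = False


def titles_sax(data):
    parser = xml.sax.make_parser()     # 1. create a parser instance
    handler = TitleHandler()
    parser.setContentHandler(handler)  # 2. register the event handler
    parser.feed(data)                  # 3. parser calls the handler while parsing
    parser.close()
    return handler.titles


print(titles_dom(BOOKS_XML))
print(titles_sax(BOOKS_XML))
```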
Namespace Support
  <?xml version="1.0"?>
  <order xmlns="http://www.net-standard.com/namespaces/order"
         xmlns:bk="http://www.net-standard.com/namespaces/books"
         xmlns:cust="http://www.net-standard.com/namespaces/customer">
    ...
    <bk:book>
      <bk:title>XML Handbook</bk:title>
      <bk:isbn>0130811521</bk:isbn>
    </bk:book>
    ...
  </order>
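With a namespace-aware DOM (DOM Level 2), elements are looked up by namespace URI and local name rather than by prefix. A minimal sketch using minidom's getElementsByTagNameNS (minidom's parseString is namespace-aware by default); only the URIs from the slide are used.

```python
from xml.dom import minidom

ORDER_XML = """<order xmlns="http://www.net-standard.com/namespaces/order"
       xmlns:bk="http://www.net-standard.com/namespaces/books"
       xmlns:cust="http://www.net-standard.com/namespaces/customer">
  <bk:book>
    <bk:title>XML Handbook</bk:title>
    <bk:isbn>0130811521</bk:isbn>
  </bk:book>
</order>"""

BK = "http://www.net-standard.com/namespaces/books"
doc = minidom.parseString(ORDER_XML)

# Match by namespace URI and local name; the prefix used in the file does not matter.
book = doc.getElementsByTagNameNS(BK, "book")[0]
title = book.getElementsByTagNameNS(BK, "title")[0]
print(title.firstChild.data)             # XML Handbook

# The root has no prefix but still belongs to the default (order) namespace.
print(doc.documentElement.namespaceURI)  # http://www.net-standard.com/namespaces/order
```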
DOM Level 2 / SAX 2.0: Access to Qualified Elements
For the element bk:book:
DOM Level 2 (interface Node)   SAX 2.0 (method startElement)   Value
nodeName                       qName                           bk:book
namespaceURI                   uri                             http://www.net-standard.com/namespaces/books
prefix                         (none)                          bk
localName                      localName                       book
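Both columns of the table can be observed from Python, assuming xml.sax and xml.dom.minidom. Python's SAX reader only reports namespaces after feature_namespaces is switched on, and it then calls startElementNS with a (uri, localName) pair. With the built-in expat reader its qname argument may be None, so this sketch does not rely on it. NSLogger is a hypothetical name.

```python
import xml.sax
from xml.dom import minidom
from xml.sax.handler import feature_namespaces

ORDER_XML = b"""<order xmlns="http://www.net-standard.com/namespaces/order"
       xmlns:bk="http://www.net-standard.com/namespaces/books">
  <bk:book><bk:title>XML Handbook</bk:title></bk:book>
</order>"""
BK = "http://www.net-standard.com/namespaces/books"


class NSLogger(xml.sax.ContentHandler):
    """Collects (namespace URI, local name) for every element start."""

    def __init__(self):
        super().__init__()
        self.elements = []

    # With namespace processing on, the parser calls startElementNS
    # instead of startElement; name is a (uri, localname) pair.
    def startElementNS(self, name, qname, attrs):
        uri, localname = name
        self.elements.append((uri, localname))


# SAX 2.0 side: uri and localName.
parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)  # off by default in Python's SAX
handler = NSLogger()
parser.setContentHandler(handler)
parser.feed(ORDER_XML)
parser.close()
for uri, localname in handler.elements:
    print(localname, uri)

# DOM Level 2 side: nodeName, namespaceURI, prefix, localName.
book = minidom.parseString(ORDER_XML).getElementsByTagNameNS(BK, "book")[0]
print(book.nodeName, book.namespaceURI, book.prefix, book.localName)
```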
Generation of Data Structures
Figure: class generation turns the DTD/Schema for 'yacht' into a data structure (01 yacht, 05 name, 05 details, 10 type); processing a yacht document then fills an object of that structure (01 yacht, 05 VENTANA, 05 details, 10 GULFSTAR 55).
  <?xml version="1.0"?>
  <yacht yachtid='147'>
    <name>Mona Lisa</name>
    <image file='yacht147.jpg'/>
    <description> Any text describing this yacht 147</description>
    <details>
      <type>GULFSTAR 55</type>
      <length>1700</length>
      <width>480</width>
      <draft>170</draft>
      <sailsurface>112</sailsurface>
      <motor>84</motor>
      <headroom>202</headroom>
      <bunks>8</bunks>
    </details>
  </yacht>
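The generated structure can be imitated with Python dataclasses. In this sketch the classes Yacht and Details and the functions text_of and load_yacht are hypothetical and written by hand to mirror the figure, not produced by a generator; the mapping reads only a few of the detail fields and uses xml.dom.minidom as an assumed parser.

```python
from dataclasses import dataclass
from xml.dom import minidom

YACHT_XML = """<?xml version="1.0"?>
<yacht yachtid='147'>
  <name>Mona Lisa</name>
  <details>
    <type>GULFSTAR 55</type>
    <length>1700</length>
    <bunks>8</bunks>
  </details>
</yacht>"""


# Hypothetical classes mirroring the figure's structure:
# 01 yacht / 05 name / 05 details / 10 type (plus two more detail fields).
@dataclass
class Details:
    type: str
    length: int
    bunks: int


@dataclass
class Yacht:
    yachtid: int
    name: str
    details: Details


def text_of(parent, tag):
    """Text of the first <tag> element below parent."""
    return parent.getElementsByTagName(tag)[0].firstChild.data


def load_yacht(xml_text):
    root = minidom.parseString(xml_text).documentElement
    d = root.getElementsByTagName("details")[0]
    return Yacht(
        yachtid=int(root.getAttribute("yachtid")),
        name=text_of(root, "name"),
        details=Details(type=text_of(d, "type"),
                        length=int(text_of(d, "length")),
                        bunks=int(text_of(d, "bunks"))),
    )


yacht = load_yacht(YACHT_XML)
print(yacht.name, yacht.details.type)  # Mona Lisa GULFSTAR 55
```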
Summary • To avoid costly hand-written text processing, applications use an XML parser that creates a DOM tree of the document. • The DOM provides a standardized API for accessing and manipulating the content of documents. • Alternatively or additionally, applications can process documents event by event through the SAX interface, which many parsers provide.