Java and XML (DOM and SAX) - PowerPoint PPT Presentation

Presentation Transcript

  1. Java and XML (DOM and SAX)
     Some of the material for these slides came from the following sources:
     • “XML: A Manager’s Guide” by Kevin Dick
     • “The XML Companion” by Bradley
     • Java documentation from Sun Microsystems
     • “XML and Java” by Maruyama, Tamura and Uramoto
     • On and off the internet…
     Internet Technologies

  2. Java and XML (DOM and SAX)
     • Overview of parser operations with DOM and SAX
     • Processing XML with SAX (locally and on the internet)
     • Processing XML with DOM (locally and on the internet)

  3. FixedFloatSwap.xml
     <?xml version="1.0" encoding="UTF-8"?>
     <!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
     <FixedFloatSwap>
       <Notional>100</Notional>
       <Fixed_Rate>5</Fixed_Rate>
       <NumYears>3</NumYears>
       <NumPayments>6</NumPayments>
     </FixedFloatSwap>

  4. FixedFloatSwap.dtd
     <?xml version="1.0" encoding="utf-8"?>
     <!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments)>
     <!ELEMENT Notional (#PCDATA)>
     <!ELEMENT Fixed_Rate (#PCDATA)>
     <!ELEMENT NumYears (#PCDATA)>
     <!ELEMENT NumPayments (#PCDATA)>

  5. Operation of a Tree-based Parser
     [Diagram: the XML document and its DTD feed a tree-based parser; for a valid document the parser hands a document tree to the application logic.]

  6. Tree Benefits
     • Some data preparation tasks require early access to data that appears later in the document (e.g. extracting titles to build a table of contents)
     • Constructing a new tree is easier (e.g. XSLT works from a tree to convert FpML to WML)
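The table-of-contents case can be sketched with the JDK's built-in DOM parser. This is an illustrative sketch only: the class name `TitleIndex`, the `<title>` element name and the sample document are assumptions, not part of the slides.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class TitleIndex {

    // Builds the whole tree first, then collects every <title> in document order.
    // Checked exceptions are wrapped to keep the sketch short.
    static List<String> titles(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("title");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document with two chapters.
        String xml = "<book><chapter><title>SAX</title><p>...</p></chapter>"
                   + "<chapter><title>DOM</title><p>...</p></chapter></book>";
        System.out.println(titles(xml));
    }
}
```

Because the tree is complete before the lookup runs, the titles are available no matter how far into the document they appear.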

  7. Operation of an Event Based Parser
     [Diagram: the XML document and its DTD feed an event-based parser, which passes events from a valid document to the application logic.]

  8. Operation of an Event Based Parser
     [Diagram: the XML document and its DTD feed an event-based parser, which calls methods in the application logic as it reads the document.]
     Callbacks for a valid document:
       public void startDocument()
       public void endDocument()
       public void startElement(…)
       public void endElement(…)
       public void characters(…)
     Callback for a validity error:
       public void error(SAXParseException e) throws SAXException {
           System.out.println("\n\n--Invalid document ---" + e);
       }

  9. Event-Driven Benefits
     • No memory is needed to hold a tree
     • Parsing can be faster because no tree is constructed
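A minimal sketch of the memory point, using the JDK's JAXP SAX parser: the handler below keeps three integers no matter how large the document is. The class name `ElementCounter` and the sample input are assumptions for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ElementCounter extends DefaultHandler {
    private int count;     // elements seen so far
    private int depth;     // current nesting level
    private int maxDepth;  // deepest nesting level seen

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        count++;
        depth++;
        maxDepth = Math.max(maxDepth, depth);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
    }

    // Returns {element count, maximum depth}. No tree is ever built.
    static int[] countAndDepth(String xml) {
        ElementCounter handler = new ElementCounter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return new int[] {handler.count, handler.maxDepth};
    }

    public static void main(String[] args) {
        int[] result = countAndDepth("<a><b><c/></b><b/></a>");
        System.out.println("elements=" + result[0] + " maxDepth=" + result[1]);
    }
}
```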

  10. XML APIs w/jaxpack

  11. Important SAX interfaces and classes
      • class InputSource -- a single input source for an XML entity
      • interface XMLReader -- defines parser behavior (implemented by Xerces’ SAXParser)
      • Four core SAX2 handler interfaces, all implemented by class DefaultHandler:
        • EntityResolver
        • DTDHandler
        • ContentHandler
        • ErrorHandler

  12. Processing XML with SAX
      • interface XMLReader -- defines parser behavior (implemented by Xerces’ SAXParser)
      • XMLReader is the interface that an XML parser's SAX2 driver must implement. This interface allows an application to set and query features and properties in the parser, to register event handlers for document processing, and to initiate a document parse.
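The three XMLReader duties named above (set and query features, register handlers, start a parse) can be sketched as follows. The sketch obtains the reader from the JDK's JAXP factory rather than from Xerces directly; the class name `ReaderFeatures` is an assumption, while the feature URI is the standard SAX2 namespaces feature.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ReaderFeatures {
    // Standard SAX2 feature identifier for namespace processing.
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    // Returns the local name of every element, in document order.
    static List<String> localNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(NAMESPACES, true);                  // set a feature
            if (!reader.getFeature(NAMESPACES)) {                 // query it back
                throw new IllegalStateException("Namespace processing was not enabled");
            }
            reader.setContentHandler(new DefaultHandler() {       // register a handler
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    names.add(localName);
                }
            });
            reader.parse(new InputSource(new StringReader(xml))); // initiate the parse
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(localNames("<a><b/></a>"));
    }
}
```

Local names are only guaranteed when namespace processing is on, which is why the feature is set before the handler reads `localName`.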

  13. Processing XML with SAX
      We will look at the following interfaces and classes and then study an example:
      • interface ContentHandler -- reports on document events
      • interface ErrorHandler -- reports on validity errors
      • class DefaultHandler -- implements both of the above plus two others (EntityResolver and DTDHandler)

  14. public interface ContentHandler
      Receive notification of general document events. This is the main interface that most SAX applications implement: if the application needs to be informed of basic parsing events, it implements this interface and registers an instance with the SAX parser using the setContentHandler method. The parser uses the instance to report basic document-related events like the start and end of elements and character data.

  15. Some methods from the ContentHandler interface
      • void characters(…) -- receive notification of character data
      • void endDocument() -- receive notification of the end of a document
      • void endElement(…) -- receive notification of the end of an element
      • void startDocument() -- receive notification of the beginning of a document
      • void startElement(…) -- receive notification of the beginning of an element
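To see the order in which these callbacks arrive, the following sketch records each one in a list. The class name `EventTrace` and the event labels are assumptions; text is buffered because a parser may deliver one run of character data in several `characters` calls.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EventTrace extends DefaultHandler {
    private final List<String> events = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startDocument() { events.add("startDocument"); }

    @Override
    public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        events.add("end:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be one of several chunks
    }

    // Emits buffered text as a single event at the next element boundary.
    private void flushText() {
        if (text.length() > 0) {
            events.add("text:" + text);
            text.setLength(0);
        }
    }

    static List<String> trace(String xml) {
        EventTrace handler = new EventTrace();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        trace("<a><b>hi</b></a>").forEach(System.out::println);
    }
}
```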

  16. public interface ErrorHandler
      Basic interface for SAX error handlers. If a SAX application needs to implement customized error handling, it must implement this interface and then register an instance with the SAX parser. The parser will then report all errors and warnings through this interface.
      For XML processing errors, a SAX driver must use this interface instead of throwing an exception: it is up to the application to decide whether to throw an exception for different types of errors and warnings. Note, however, that there is no requirement that the parser continue to provide useful information after a call to fatalError.

  17. public interface ErrorHandler
      Some methods are:
      • void error(SAXParseException exception) -- receive notification of a recoverable error
      • void fatalError(SAXParseException exception) -- receive notification of a non-recoverable error
      • void warning(SAXParseException exception) -- receive notification of a warning
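A sketch of a custom ErrorHandler, assuming the JDK's JAXP parser with validation switched on. Validity errors are collected and parsing continues; a fatal (well-formedness) error is rethrown and ends the parse. The class name `ValidityReport` and the shortened two-element DTD are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

public class ValidityReport {
    // Hypothetical cut-down DTD, embedded as an internal subset.
    static final String DTD =
        "<!DOCTYPE FixedFloatSwap ["
      + "<!ELEMENT FixedFloatSwap (Notional, NumYears)>"
      + "<!ELEMENT Notional (#PCDATA)>"
      + "<!ELEMENT NumYears (#PCDATA)>"
      + "]>";

    // Returns one line per problem reported by the parser; empty if none.
    static List<String> validate(String xml) {
        List<String> problems = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // validity errors are only reported when validating
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) {
                    problems.add("warning: " + e.getMessage());
                }
                @Override
                public void error(SAXParseException e) {
                    problems.add("error: " + e.getMessage()); // recoverable: keep parsing
                }
                @Override
                public void fatalError(SAXParseException e) throws SAXException {
                    throw e; // non-recoverable: stop the parse
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            problems.add("fatal: " + e.getMessage());
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Could not run the parser", e);
        }
        return problems;
    }

    public static void main(String[] args) {
        // NumYears is missing, so the content model is violated.
        String invalid = DTD + "<FixedFloatSwap><Notional>100</Notional></FixedFloatSwap>";
        validate(invalid).forEach(System.out::println);
    }
}
```

Whether to continue after `error` is the application's choice; this sketch continues so that all validity problems appear in one report.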

  18. public class DefaultHandler extends java.lang.Object
          implements EntityResolver, DTDHandler, ContentHandler, ErrorHandler
      Default base class for handlers. This class implements the default behaviour for four SAX interfaces: EntityResolver, DTDHandler, ContentHandler, and ErrorHandler.

  19. FixedFloatSwap.dtd (input DTD)
      <?xml version="1.0" encoding="utf-8"?>
      <!ELEMENT FixedFloatSwap (Bank, Notional, Fixed_Rate, NumYears, NumPayments)>
      <!ELEMENT Bank (#PCDATA)>
      <!ELEMENT Notional (#PCDATA)>
      <!ATTLIST Notional currency (dollars | pounds) #REQUIRED>
      <!ELEMENT Fixed_Rate (#PCDATA)>
      <!ELEMENT NumYears (#PCDATA)>
      <!ELEMENT NumPayments (#PCDATA)>

  20. FixedFloatSwap.xml (input XML)
      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd" [
        <!ENTITY bankname "Pittsburgh National Corporation">
      ]>
      <FixedFloatSwap>
        <Bank>&bankname;</Bank>
        <Notional currency = "pounds">100</Notional>
        <Fixed_Rate>5</Fixed_Rate>
        <NumYears>3</NumYears>
        <NumPayments>6</NumPayments>
      </FixedFloatSwap>

  21. Processing
      // NotifyStr.java
      // Adapted from XML and Java by Maruyama, Tamura and Uramoto
      import java.io.*;
      import org.xml.sax.*;
      import org.xml.sax.helpers.*;
      import javax.xml.parsers.*;
      public class NotifyStr extends DefaultHandler {

  22.     public static void main(String[] argv) throws IOException, SAXException {
              if (argv.length != 1) {
                  System.err.println("Usage: java NotifyStr filename.xml");
                  System.exit(1);
              }
              XMLReader reader = XMLReaderFactory.createXMLReader(
                  "org.apache.xerces.parsers.SAXParser");
              InputSource inputSource = new InputSource(argv[0]);
              reader.setContentHandler(new NotifyStr());
              reader.parse(inputSource);
              System.exit(0);
          }

  23.     public NotifyStr() {}
          public void startDocument() throws SAXException {
              System.out.println("startDocument called:");
          }
          public void endDocument() throws SAXException {
              System.out.println("endDocument called:");
          }

  24.     public void startElement(String namespaceURI, String localName,
                                   String qName, Attributes aMap) throws SAXException {
              System.out.println("startElement called: element name =" + localName);
              // examine the attributes
              for (int i = 0; i < aMap.getLength(); i++) {
                  String attName = aMap.getLocalName(i);
                  String type = aMap.getType(i);
                  String value = aMap.getValue(i);
                  System.out.println(" attribute name = " + attName
                      + " type = " + type + " value = " + value);
              }
          }

  25.     public void characters(char[] ch, int start, int length) throws SAXException {
              // build String from char array
              String dataFound = new String(ch, start, length);
              System.out.println("characters called:" + dataFound);
          }
      }

  26. Output
      C:\McCarthy\www\95-733\examples\sax>java NotifyStr FixedFloatSwap.xml
      startDocument called:
      startElement called: element name =FixedFloatSwap
      startElement called: element name =Bank
      characters called:Pittsburgh National Corporation
      startElement called: element name =Notional
       attribute name = currency type = dollars|pounds value = pounds
      characters called:100
      startElement called: element name =Fixed_Rate
      characters called:5
      startElement called: element name =NumYears
      characters called:3
      startElement called: element name =NumPayments
      characters called:6
      endDocument called:

  27. Accessing the swap from the internet
      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE FixedFloatSwap [
        <!ENTITY bankname "Pittsburgh National Corporation">
      ]>
      <FixedFloatSwap>
        <Bank>&bankname;</Bank>
        <Notional currency = "pounds">100</Notional>
        <Fixed_Rate>5</Fixed_Rate>
        <NumYears>3</NumYears>
        <NumPayments>6</NumPayments>
      </FixedFloatSwap>
      Saved under webapps/sax/fpml/FixedFloatSwap.xml

  28. The Deployment Descriptor (webapps/sax/WEB-INF/web.xml)
      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN"
          "http://java.sun.com/j2ee/dtds/web-app_2.2.dtd">
      <web-app>
        <servlet>
          <servlet-name>SaxExample</servlet-name>
          <servlet-class>GetXML</servlet-class>
        </servlet>
        <servlet-mapping>
          <servlet-name>SaxExample</servlet-name>
          <url-pattern>/GetXML/*</url-pattern>
        </servlet-mapping>
      </web-app>

  29. // This servlet file is stored under Tomcat in
      // webapps/sax/WEB-INF/classes/GetXML.java
      // This servlet reads a user-selected XML file from the
      // webapps/sax/fpml directory and returns it as a string to the client.
      import java.io.*;
      import java.util.*;
      import javax.servlet.*;
      import javax.servlet.http.*;
      public class GetXML extends HttpServlet { // Servlet

  30.     public void doGet(HttpServletRequest req, HttpServletResponse res)
                  throws ServletException, IOException {
              System.out.println("doGet called with " + req.getPathInfo());
              String theData = "";
              String extraPath = req.getPathInfo();
              extraPath = extraPath.substring(1); // drop the leading '/'
              // read the file
              try {
                  // open the requested file
                  FileInputStream theFile = new FileInputStream(
                      "D:\\jakarta-tomcat-4.0.1\\webapps\\sax\\fpml\\" + extraPath);

  31.             InputStreamReader is = new InputStreamReader(theFile);
                  BufferedReader br = new BufferedReader(is);
                  // read the file into the string theData
                  String thisLine;
                  while ((thisLine = br.readLine()) != null) {
                      theData += thisLine + "\n";
                  }
              }
              catch (Exception e) {
                  System.err.println("Error " + e);
              }

  32.         PrintWriter out = res.getWriter();
              out.write(theData);
              System.out.println("Wrote document to client");
              //System.out.println(theData);
              out.close();
          }
      }

  33. // TomcatNotifyStr.java
      // Adapted from XML and Java by Maruyama, Tamura and Uramoto
      import java.io.*;
      import org.xml.sax.*;
      import org.xml.sax.helpers.*;
      import javax.xml.parsers.*;
      public class TomcatNotifyStr extends DefaultHandler { // Client
          public static void main(String[] argv) throws IOException, SAXException {
              if (argv.length != 1) {
                  System.err.println("Usage: java TomcatNotifyStr filename.xml");
                  System.exit(1);
              }

  34.         XMLReader reader = XMLReaderFactory.createXMLReader(
                  "org.apache.xerces.parsers.SAXParser");
              String serverString = "http://localhost:8080/sax/GetXML/";
              String fileName = argv[0];
              InputSource inputSource = new InputSource(serverString + fileName);
              reader.setContentHandler(new TomcatNotifyStr());
              reader.parse(inputSource);
              System.exit(0);
          }

  35.     public TomcatNotifyStr() {}
          public void startDocument() throws SAXException {
              System.out.println("startDocument called:");
          }
          public void endDocument() throws SAXException {
              System.out.println("endDocument called:");
          }

  36.     public void startElement(String namespaceURI, String localName,
                                   String qName, Attributes aMap) throws SAXException {
              System.out.println("startElement called: element name =" + localName);
              // examine the attributes
              for (int i = 0; i < aMap.getLength(); i++) {
                  String attName = aMap.getLocalName(i);
                  String type = aMap.getType(i);
                  String value = aMap.getValue(i);
                  System.out.println(" attribute name = " + attName
                      + " type = " + type + " value = " + value);
              }
          }

  37.     public void characters(char[] ch, int start, int length) throws SAXException {
              // build String from char array
              String dataFound = new String(ch, start, length);
              System.out.println("characters called:" + dataFound);
          }
      }

  38. Being served by the servlet
      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE FixedFloatSwap [
        <!ENTITY bankname "Pittsburgh National Corporation">
      ]>
      <FixedFloatSwap>
        <Bank>&bankname;</Bank>
        <Notional currency = "pounds">100</Notional>
        <Fixed_Rate>5</Fixed_Rate>
        <NumYears>3</NumYears>
        <NumPayments>6</NumPayments>
      </FixedFloatSwap>

  39. Output
      C:\McCarthy\www\95-733\examples\sax>java TomcatNotifyStr FixedFloatSwap.xml
      startDocument called:
      startElement called: element name =FixedFloatSwap
      characters called:
      startElement called: element name =Bank
      characters called:Pittsburgh National Corporation
      characters called:
      startElement called: element name =Notional
       attribute name = currency type = CDATA value = pounds
      characters called:100
      characters called:
      startElement called: element name =Fixed_Rate
      characters called:5
      characters called:
      startElement called: element name =NumYears
      characters called:3
      characters called:
      startElement called: element name =NumPayments
      characters called:6
      characters called:
      characters called:
      endDocument called:

  40. Let’s Add Back the DTD…
      <?xml version="1.0" encoding="utf-8"?>
      <!ELEMENT FixedFloatSwap (Bank, Notional, Fixed_Rate, NumYears, NumPayments)>
      <!ELEMENT Bank (#PCDATA)>
      <!ELEMENT Notional (#PCDATA)>
      <!ATTLIST Notional currency (dollars | pounds) #REQUIRED>
      <!ELEMENT Fixed_Rate (#PCDATA)>
      <!ELEMENT NumYears (#PCDATA)>
      <!ELEMENT NumPayments (#PCDATA)>

  41. And reference the DTD in the XML
      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd" [
        <!ENTITY bankname "Pittsburgh National Corporation">
      ]>
      <FixedFloatSwap>
        <Bank>&bankname;</Bank>
        <Notional currency = "pounds">100</Notional>
        <Fixed_Rate>5</Fixed_Rate>
        <NumYears>3</NumYears>
        <NumPayments>6</NumPayments>
      </FixedFloatSwap>

  42. We get new output
      C:\McCarthy\www\95-733\examples\sax>java TomcatNotifyStr FixedFloatSwap.xml
      startDocument called:
      startElement called: element name =FixedFloatSwap
      startElement called: element name =Bank
      characters called:Pittsburgh National Corporation
      startElement called: element name =Notional
       attribute name = currency type = dollars|pounds value = pounds
      characters called:100
      startElement called: element name =Fixed_Rate
      characters called:5
      startElement called: element name =NumYears
      characters called:3
      startElement called: element name =NumPayments
      characters called:6
      endDocument called:
      How many times did we visit the servlet? Twice: once for the XML and a second time for the DTD.

  43. We don’t have to go through a servlet… Tomcat can send the files
      String serverString = "http://localhost:8080/sax/fpml/";
      String fileName = argv[0];
      InputSource is = new InputSource(serverString + fileName);
      But the servlet illustrates that the XML data can be generated dynamically.

  44. The InputSource Class
      The SAX and DOM parsers need XML input. The “output” produced by these parsers amounts to a series of method calls (SAX) or an application programmer interface to the tree (DOM). An InputSource object can be used to provide input to the parser.
      [Diagram: an InputSource feeds a SAX or DOM parser, which delivers events (SAX) or a tree (DOM) to the application.]
      So, how do we build an InputSource object?

  45. The InputSource Class
      Some InputSource constructors:
        InputSource(String systemId);          // a path or URL naming the document
        InputSource(InputStream byteStream);
        InputSource(Reader characterStream);
      For example:
        String text = "<a>some xml</a>";
        StringReader sr = new StringReader(text);
        InputSource is = new InputSource(sr);
        :
        myParser.parse(is);
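A runnable sketch of the three constructors, using the JDK's DOM parser so that the result (the root element name) is easy to inspect. The class name `InputSourceDemo` is an assumption, and a temporary file stands in for a real document in the system-identifier case.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class InputSourceDemo {

    // Parses whatever the InputSource points at and returns the root element name.
    static String rootName(InputSource source) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(source);
            return doc.getDocumentElement().getTagName();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "<a>some xml</a>";
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);

        // 1. From a character stream (Reader).
        System.out.println(rootName(new InputSource(new StringReader(text))));

        // 2. From a byte stream (InputStream); the parser detects the encoding.
        System.out.println(rootName(new InputSource(new ByteArrayInputStream(bytes))));

        // 3. From a system identifier, given here as a file: URI.
        Path file = Files.createTempFile("sample", ".xml");
        try {
            Files.write(file, bytes);
            System.out.println(rootName(new InputSource(file.toUri().toString())));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```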

  46. But what about the DTD?
      public interface EntityResolver
      Basic interface for resolving entities. If a SAX application needs to implement customized handling for external entities, it must implement this interface and register an instance with the SAX parser using the parser's setEntityResolver method. The parser will then allow the application to intercept any external entities (including the external DTD subset and external parameter entities, if any) before including them.

  47. EntityResolver
      public InputSource resolveEntity(String publicId, String systemId) {
          // Add this method to the client above. The systemId String
          // holds the path to the DTD as specified in the XML document.
          // We may now access the DTD from a servlet and return an
          // InputSource, or return null and let the parser resolve the
          // external entity.
          System.out.println("Attempting to resolve"
              + " Public id: " + publicId
              + " System id: " + systemId);
          return null;
      }
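The non-null branch can be sketched as follows: the resolver below hands the parser an in-memory DTD instead of letting it fetch the system identifier. The class name `InMemoryDtdResolver`, the `example.invalid` URL and the shortened DTD are assumptions for illustration; the sketch relies on the JDK parser reading the external DTD subset in its default configuration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class InMemoryDtdResolver extends DefaultHandler {
    // Hypothetical cut-down DTD that also declares the bankname entity.
    static final String DTD =
        "<!ELEMENT FixedFloatSwap (Bank)>\n"
      + "<!ELEMENT Bank (#PCDATA)>\n"
      + "<!ENTITY bankname \"Pittsburgh National Corporation\">\n";

    final List<String> requested = new ArrayList<>(); // system ids the parser asked for
    final StringBuilder text = new StringBuilder();   // character data seen

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        requested.add(systemId);
        if (systemId != null && systemId.endsWith("FixedFloatSwap.dtd")) {
            InputSource dtd = new InputSource(new StringReader(DTD));
            dtd.setSystemId(systemId);
            return dtd;  // the parser reads this instead of opening the URL
        }
        return null;     // let the parser resolve anything else itself
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    static InMemoryDtdResolver parse(String xml) {
        InMemoryDtdResolver handler = new InMemoryDtdResolver();
        try {
            // SAXParser.parse registers the DefaultHandler as entity resolver too.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE FixedFloatSwap SYSTEM \"http://example.invalid/FixedFloatSwap.dtd\">"
            + "<FixedFloatSwap><Bank>&bankname;</Bank></FixedFloatSwap>";
        InMemoryDtdResolver result = parse(xml);
        System.out.println("requested: " + result.requested);
        System.out.println("text: " + result.text);
    }
}
```

The same hook could return an InputSource backed by a servlet response, a cache or a classpath resource; the parser only sees the stream it is given.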

  48. Processing XML with DOM
      • The following examples were tested using Sun’s JAXP (Java API for XML Parsing). This is available at http://www.javasoft.com/ (click on XML).

  49. XML DOM
      • The World Wide Web Consortium’s Document Object Model
      • Provides a common vocabulary to use in manipulating XML documents.
      • May be used from C, Java, Perl, Python, or VB
      • Things may be quite different “under the hood”.
      • The interface to the document will be the same.

  50. The XML File “cats.xml”
      <?xml version = "1.0" ?>
      <!DOCTYPE TopCat SYSTEM "cats.dtd">
      <TopCat>
        I am The Cat in The Hat
        <LittleCatA>
          I am Little Cat A
        </LittleCatA>
        <LittleCatB>
          I am Little Cat B
          <LittleCatC>
            I am Little Cat C
          </LittleCatC>
        </LittleCatB>
        <LittleCatD/>
      </TopCat>
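As a sketch of what a DOM parser produces for a document shaped like cats.xml, the following walks the tree and lists each element with its depth. The class name `CatTree` is an assumption, and the DOCTYPE line is left out of the sample string so that the example does not need cats.dtd on disk.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class CatTree {
    // Same structure as cats.xml, without the DOCTYPE declaration.
    static final String CATS =
        "<TopCat>I am The Cat in The Hat"
      + "<LittleCatA>I am Little Cat A</LittleCatA>"
      + "<LittleCatB>I am Little Cat B"
      + "<LittleCatC>I am Little Cat C</LittleCatC>"
      + "</LittleCatB>"
      + "<LittleCatD/>"
      + "</TopCat>";

    // Returns "depth:name" for every element, in document order.
    static List<String> outline(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            walk(doc.getDocumentElement(), 0, out);
            return out;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    // Depth-first walk; text nodes are children too, so they are skipped here.
    private static void walk(Node node, int depth, List<String> out) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return;
        }
        out.add(depth + ":" + node.getNodeName());
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        outline(CATS).forEach(System.out::println);
    }
}
```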