Apache Xerces is a collection of libraries for parsing XML documents. It provides two types of parsers: DOM (Document Object Model) and SAX (Simple API for XML). A DOM parser builds a tree of the document in memory, which suits small files or documents that must be traversed back and forth, while a SAX parser exposes an event-driven interface suited to very large files. Xerces is available for C++, Java, and Perl and conforms to XML 1.0, SAX 1.0/2.0, and DOM Level 1/2, letting developers read, write, parse, and validate XML documents.
Xerces: The Apache XML Project • Yvonne Yao
Introduction • Set of libraries that provides functionalities to parse XML documents • Promotes the use of XML • Why? • Two types of parsers • DOM (Document Object Model) parsers • SAX (Simple API for XML) parsers
DOM & SAX • DOM • Implements the DOM API • Creates a DOM tree in memory • Good for small XML files, or when the document must be traversed back and forth • SAX • Implements the SAX API • Event-driven interface • Good for huge XML files
Xerces • Implements both DOM and SAX parsers • 3 subprojects • Xerces C++ • Xerces Java • Xerces Perl
Xerces C++ • Current version 2.7.0 • Provides functionalities to read, write, parse, and validate XML documents • Conforms with • XML 1.0 and XML 1.1 • SAX 1.0 and SAX 2.0 • DOM Level 1 and 2
Xerces Perl • Current version 2.7.0 • Implemented using the Xerces C++ API • Provides access to most of the C++ API, except • Functions in the C++ API which have better Perl counterparts (such as file I/O), or • Functions that manipulate internal C++ information that has no role in the Perl module • Conforms to the same set of Standards as Xerces C++
Xerces Java • Xerces Java • Current version 1.4.4 • Conforms with • XML 1.0 • SAX 1.0 and 2.0 • DOM Level 2 • Xerces2 Java • Current version 2.9.0 • Includes Xerces Native Interface, a new framework for building parser components and configurations
Example 1 - DOM
<?xml version="1.0" encoding="UTF-8"?>
<Personnel>
  <Employee type="permanent">
    <Name>Seagull</Name>
    <Id>3674</Id>
    <Age>34</Age>
  </Employee>
  <Employee type="contract">
    <Name>Robin</Name>
    <Id>3675</Id>
    <Age>25</Age>
  </Employee>
  <Employee type="permanent">
    <Name>Crow</Name>
    <Id>3676</Id>
    <Age>28</Age>
  </Employee>
</Personnel>
Example 1 - DOM
• Create a DOM object
  // needs javax.xml.parsers.* and org.w3c.dom.*
  DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
  DocumentBuilder db = dbf.newDocumentBuilder();
  Document dom = db.parse("employees.xml");
• Get a list of employee elements from DOM
  Element docEle = dom.getDocumentElement();
  NodeList nl = docEle.getElementsByTagName("Employee");
• Get node value from element
  // element is a leaf such as <Name>; its first child is the text node
  element.getFirstChild().getNodeValue();
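The three DOM steps above can be combined into one runnable sketch. It is an illustration rather than the presenter's original program: the class name `DomEmployeeDemo`, the helper `textOf`, and the inlined two-employee document are assumptions made so the code runs without an `employees.xml` file. It uses the standard JAXP API shown on the slide; which parser implementation is loaded depends on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DomEmployeeDemo {
    // Shortened copy of the Personnel document, inlined so no file is needed (assumption).
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<Personnel>"
        + "<Employee type=\"permanent\"><Name>Seagull</Name><Id>3674</Id><Age>34</Age></Employee>"
        + "<Employee type=\"contract\"><Name>Robin</Name><Id>3675</Id><Age>25</Age></Employee>"
        + "</Personnel>";

    // Hypothetical helper: text of the first descendant element with the given tag.
    static String textOf(Element parent, String tag) {
        NodeList nl = parent.getElementsByTagName(tag);
        // The element's first child is the text node holding its value.
        return nl.item(0).getFirstChild().getNodeValue();
    }

    // Parses the XML into a DOM tree and returns "Name (type)" for each Employee.
    static List<String> employeeNames(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document dom = db.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            Element docEle = dom.getDocumentElement();
            NodeList nl = docEle.getElementsByTagName("Employee");

            List<String> names = new ArrayList<>();
            for (int i = 0; i < nl.getLength(); i++) {
                Element emp = (Element) nl.item(i);
                names.add(textOf(emp, "Name") + " (" + emp.getAttribute("type") + ")");
            }
            return names;
        } catch (Exception e) {
            // Wrapped so callers of this sketch need not handle checked exceptions.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(employeeNames(XML));
    }
}
```

Because the whole tree is held in memory, the loop could just as well visit the employees in reverse or revisit a node later, which is the back-and-forth traversal that DOM is suited to.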
Example 2 - SAX
• SAX parsing is event based: the parser calls a handler method whenever it encounters a start tag or an end tag
  public void startElement(String uri, String localName, String qName, Attributes attributes)
  public void endElement(String uri, String localName, String qName)
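To make the callback order concrete, here is a minimal sketch of a handler that only records the events it receives. The class name `SaxEventLog`, the `trace` helper, and the one-employee input string are assumptions for illustration; the overridden methods are the standard `org.xml.sax.helpers.DefaultHandler` callbacks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventLog extends DefaultHandler {
    // Events in the order the parser reports them.
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        events.add("start:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Skip whitespace-only chunks; a parser may also split text into several chunks.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("text:" + text);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    // Hypothetical helper: parses the given XML string and returns the recorded events.
    static List<String> trace(String xml) {
        try {
            SAXParser sp = SAXParserFactory.newInstance().newSAXParser();
            SaxEventLog handler = new SaxEventLog();
            sp.parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(trace("<Employee type=\"permanent\"><Name>Seagull</Name></Employee>"));
    }
}
```

Nothing is kept once an event has been handled, which is why this style copes with very large documents: memory use depends on what the handler chooses to store, not on the document size.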
Example 2 - SAX
• Create a SAX parser
  SAXParserFactory spf = SAXParserFactory.newInstance();
  SAXParser sp = spf.newSAXParser();
  sp.parse("employees.xml", this);  // this: the DefaultHandler subclass defining the callbacks
• Create Employee object when <Employee> is found
  if (qName.equalsIgnoreCase("Employee")) tempEmp = new Employee();
• Set Employee properties when an end tag is found
  if (qName.equalsIgnoreCase("Name")) tempEmp.setName(tempVal);
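The fragments above fit together roughly as follows. This is a sketch under stated assumptions, not the presenter's code: the nested `Employee` class is a hypothetical stand-in for the one the slide refers to, the document is inlined instead of read from `employees.xml`, and `tempVal` is a `StringBuilder` filled in `characters()`, since the slide does not show how `tempVal` is populated.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEmployeeDemo extends DefaultHandler {
    // Hypothetical minimal Employee; the slide's real class is not shown.
    static class Employee {
        String name;
        String type;
        void setName(String name) { this.name = name; }
    }

    final List<Employee> employees = new ArrayList<>();
    private Employee tempEmp;
    // Accumulates text because characters() may be called more than once per element.
    private final StringBuilder tempVal = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        tempVal.setLength(0); // start collecting text for the new element
        if (qName.equalsIgnoreCase("Employee")) {
            tempEmp = new Employee();
            tempEmp.type = attributes.getValue("type");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        tempVal.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equalsIgnoreCase("Employee")) {
            employees.add(tempEmp); // the employee is complete at its end tag
        } else if (qName.equalsIgnoreCase("Name")) {
            tempEmp.setName(tempVal.toString().trim());
        }
    }

    // Hypothetical helper: parses the XML string and returns the employees found.
    static List<Employee> parseEmployees(String xml) {
        try {
            SAXParser sp = SAXParserFactory.newInstance().newSAXParser();
            SaxEmployeeDemo handler = new SaxEmployeeDemo();
            sp.parse(new InputSource(new StringReader(xml)), handler);
            return handler.employees;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Personnel>"
            + "<Employee type=\"permanent\"><Name>Seagull</Name><Id>3674</Id></Employee>"
            + "<Employee type=\"contract\"><Name>Robin</Name><Id>3675</Id></Employee>"
            + "</Personnel>";
        for (Employee e : parseEmployees(xml)) {
            System.out.println(e.name + " (" + e.type + ")");
        }
    }
}
```

The `Id` and `Age` elements would be handled with further `else if` branches in `endElement`, following the same pattern as `Name`.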