
Xerces2: The Sequel With No Equal


Presentation Transcript


  1. Xerces2: The Sequel With No Equal • Andy Clark • ApacheCon US - Las Vegas, Nevada

  2. Introduction • Speaker: worked for IBM; currently unemployed • Parser: first developed in IBM’s Tokyo research lab; maintained and expanded in California; donated to Apache; work continues in Toronto

  3. Agenda • Xerces1 Overview: design and problems • Xerces2 Overview: challenges and design • Q & A

  4. Xerces1 Overview: Design and Problems • Andy Clark

  5. Design • XML4J/Xerces1 designed for performance • Parser implementation: parsing pipeline; custom reader implementations • StringPool: defers transcoding of byte buffers until needed; symbol table for common document strings

  6. Pipeline Configuration • Intended to be generic [Diagram labels: Scanner, Validator, Parser, XML API]

  7. Pipeline Configuration Problems • Hard-coded dependencies on implementation • Inconsistent interfaces [Diagram labels: Scanner, Validator, Parser, XML API; legend: Dependency, Different Interfaces]

  8. Custom Readers [Diagram labels: Entity Handler, Scanner, UTF-8 Reader, scanName, scanAttValue, scanContent, …, Reader Stack, UCS Reader, EBCDIC Reader, Generic Reader]

  9. Custom Readers Problems • Duplicated code • Allows more bugs to appear • Bugs are different based on encoding because code is not shared • More complicated

  10. Deferred Transcoding • StringPool methods: addString(StringProducer,int,int):int; addString(String):int; toString(int):String [Diagram labels: XML Parser Component, Reader, String Producer, Data Buffer, …, Data Buffer]
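
The three StringPool signatures on this slide suggest a pool that hands out integer handles and only decodes bytes when a string is actually requested. The following is a minimal sketch of that idea, not the Xerces1 implementation: `SketchStringPool`, `ByteBufferProducer`, and the single-method shape of `StringProducer` are assumptions made for illustration.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the slide's StringProducer: decodes a byte range on demand. */
interface StringProducer {
    String produce(int offset, int length);
}

/** Hypothetical producer backed by a raw byte buffer in a known encoding. */
final class ByteBufferProducer implements StringProducer {
    private final byte[] data;
    private final Charset charset;
    int decodeCount = 0; // how many times bytes were actually transcoded

    ByteBufferProducer(byte[] data, Charset charset) {
        this.data = data;
        this.charset = charset;
    }

    @Override
    public String produce(int offset, int length) {
        decodeCount++;
        return new String(data, offset, length, charset);
    }
}

/** Sketch of a pool that returns int handles and transcodes lazily. */
final class SketchStringPool {
    private static final class Entry {
        String value;            // null until the string is first requested
        StringProducer producer; // null for eager entries, or once decoded
        int offset;
        int length;
    }

    private final List<Entry> entries = new ArrayList<>();

    /** Eager path: the string already exists, so just assign it a handle. */
    int addString(String s) {
        Entry e = new Entry();
        e.value = s;
        entries.add(e);
        return entries.size() - 1;
    }

    /** Deferred path: remember where the bytes live, decode later. */
    int addString(StringProducer producer, int offset, int length) {
        Entry e = new Entry();
        e.producer = producer;
        e.offset = offset;
        e.length = length;
        entries.add(e);
        return entries.size() - 1;
    }

    /** Decodes on first access and caches the result. */
    String toString(int handle) {
        Entry e = entries.get(handle);
        if (e.value == null) {
            e.value = e.producer.produce(e.offset, e.length);
            e.producer = null; // drop the reference to the byte buffer
        }
        return e.value;
    }
}
```

Every consumer of a handle needs access to the pool to turn it back into a `String`, and the pool keeps the byte buffer reachable until the entry is decoded.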

  11. Deferred Transcoding Problems • All components need reference to StringPool • Strings not immediately available to methods • Must make call to StringPool to query String • Memory management is complicated • Responsibility of callee to free resources • Uses more memory

  12. Xerces2 Overview: Challenges and Design • Andy Clark

  13. Challenges • Requirements: simple design and implementation; easy to maintain; more modularity and configurability; support current and future features • Design Decisions: always transcode bytes into Unicode characters; removes StringPool and dependencies; clean architecture
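
The design decision to always transcode bytes into Unicode characters can be illustrated with a rough sketch in which the encoding-specific work lives entirely in a standard `java.io.InputStreamReader`, and a single scanning routine operates on decoded characters. `SketchCharScanner` and its simplified name rule are assumptions for illustration; this is not the Xerces2 scanner and does not implement the full XML Name production.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

/** Hypothetical scanner: one scanName implementation shared by every encoding. */
final class SketchCharScanner {
    private static final int EMPTY = -2; // marks "no character buffered"

    private final Reader reader;
    private int lookahead = EMPTY;

    SketchCharScanner(byte[] document, Charset encoding) {
        // All encoding-specific work is confined to the decoder.
        this.reader = new InputStreamReader(new ByteArrayInputStream(document), encoding);
    }

    /** Returns the next decoded character without consuming it, or -1 at end of input. */
    private int peek() {
        if (lookahead == EMPTY) {
            try {
                lookahead = reader.read();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return lookahead;
    }

    /** Consumes one character if it matches the expected one. */
    boolean skipChar(char expected) {
        if (peek() == expected) {
            lookahead = EMPTY;
            return true;
        }
        return false;
    }

    /** Scans a run of name characters and returns it as a ready-to-use String. */
    String scanName() {
        StringBuilder name = new StringBuilder();
        int c;
        while ((c = peek()) != -1 && isNameChar(c)) {
            name.append((char) c);
            lookahead = EMPTY;
        }
        return name.toString();
    }

    /** Simplified rule (assumption): letters, digits, and a few punctuation marks. */
    private static boolean isNameChar(int c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '-' || c == '.' || c == ':';
    }
}
```

Because the scanner only ever sees characters, the same `scanName` code runs for UTF-8, UTF-16, or any other charset the platform supports, and callers receive a `String` immediately rather than a handle into a pool.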

  14. Xerces Native Interface (XNI) • “Streaming” Information Set: similar to SAX; no loss of document information* • Parser configuration and layering • Future extensions: native pull-parser, tree model, etc. (* Does not preserve all document information, but communicates more information to the application than DOM or SAX.)

  15. [Package diagram] • org.apache.xerces.xni: Augmentations, NamespaceContext, QName, XMLAttributes, XMLDocumentFragmentHandler, XMLDocumentHandler, XMLDTDContentModelHandler, XMLDTDHandler, XMLLocator, XMLResourceIdentifier, XMLString, XNIException • org.apache.xerces.xni.parser: XMLComponent, XMLComponentManager, XMLConfigurationException, XMLDocumentFilter, XMLDocumentScanner, XMLDocumentSource, XMLDTDContentModelFilter, XMLDTDContentModelSource, XMLDTDFilter, XMLDTDScanner, XMLDTDSource, XMLEntityResolver, XMLErrorHandler, XMLInputSource, XMLParseException, XMLParserConfiguration, XMLPullParserConfiguration • java.lang: RuntimeException • Legend: Interface, Package, Class, Extends

  16. Parsing Pipeline • Handlers communicate information between parser components [Diagram labels: Scanner, Validator, Parser, XML API]
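
A pipeline in which handlers carry information from one component to the next can be sketched as a chain of source, filter, and final handler. The interfaces below are heavily reduced, hypothetical versions loosely modeled on the source/filter/handler names listed on slide 15; the real XNI interfaces have many more callbacks and arguments.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, reduced handler: receives document events. */
interface SketchDocumentHandler {
    void startElement(String name);
    void characters(String text);
    void endElement(String name);
}

/** A stage that produces events for a downstream handler. */
interface SketchDocumentSource {
    void setDocumentHandler(SketchDocumentHandler handler);
}

/** A middle stage is both a handler (upstream side) and a source (downstream side). */
interface SketchDocumentFilter extends SketchDocumentHandler, SketchDocumentSource {
}

/** Stand-in for a scanner: emits a fixed event sequence instead of reading XML. */
final class SketchScanner implements SketchDocumentSource {
    private SketchDocumentHandler handler;

    @Override
    public void setDocumentHandler(SketchDocumentHandler handler) {
        this.handler = handler;
    }

    void scan() {
        handler.startElement("greeting");
        handler.characters("  hello  ");
        handler.endElement("greeting");
    }
}

/** Stand-in for a middle stage such as a validator: trims text, forwards everything. */
final class SketchTrimFilter implements SketchDocumentFilter {
    private SketchDocumentHandler next;

    @Override
    public void setDocumentHandler(SketchDocumentHandler handler) {
        this.next = handler;
    }

    @Override
    public void startElement(String name) {
        next.startElement(name);
    }

    @Override
    public void characters(String text) {
        next.characters(text.trim());
    }

    @Override
    public void endElement(String name) {
        next.endElement(name);
    }
}

/** Stand-in for the final parser stage: records events for the application. */
final class SketchEventLog implements SketchDocumentHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String name) {
        events.add("start:" + name);
    }

    @Override
    public void characters(String text) {
        events.add("text:" + text);
    }

    @Override
    public void endElement(String name) {
        events.add("end:" + name);
    }
}
```

Wiring is two calls, `scanner.setDocumentHandler(filter)` and `filter.setDocumentHandler(log)`. Since every stage speaks the same handler interface, a stage can be inserted or removed without touching its neighbors.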

  17. Handler Overview [Diagram labels: Document Scanner, DTD Scanner, Validator, Parser, XML API; handlers: XMLDocumentHandler, XMLDTDHandler, XMLDTDContentModelHandler]

  18. Parser Layout • Components and Manager [Diagram labels: Symbol Table, Grammar Pool, Datatype Factory, Scanner, Validator, Entity Manager, Error Reporter, Configurable Components, Component Manager, Regular Components]
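
One way to read the components-and-manager layout is that a manager owns shared objects (such as an error reporter) and each configurable component fetches what it needs from the manager when it is reset. The sketch below assumes that reading; all type names and the property identifier are hypothetical, and the interfaces are far smaller than XNI's XMLComponent and XMLComponentManager.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical manager: looks up shared objects by identifier. */
interface SketchComponentManager {
    Object getProperty(String id);
}

/** Hypothetical configurable component: re-reads its dependencies on reset. */
interface SketchComponent {
    void reset(SketchComponentManager manager);
}

/** A "regular" shared object that knows nothing about the manager. */
final class SketchErrorReporter {
    final List<String> errors = new ArrayList<>();

    void report(String message) {
        errors.add(message);
    }
}

/** A configurable component that depends on the shared error reporter. */
final class SketchValidator implements SketchComponent {
    private SketchErrorReporter reporter;

    @Override
    public void reset(SketchComponentManager manager) {
        reporter = (SketchErrorReporter) manager.getProperty(SketchConfiguration.ERROR_REPORTER);
    }

    /** Toy check: reports an error when the root element is not the expected one. */
    void checkRoot(String expected, String actual) {
        if (!expected.equals(actual)) {
            reporter.report("expected <" + expected + "> but found <" + actual + ">");
        }
    }
}

/** The manager: holds properties and resets every registered component. */
final class SketchConfiguration implements SketchComponentManager {
    static final String ERROR_REPORTER = "sketch/error-reporter"; // hypothetical identifier

    private final Map<String, Object> properties = new HashMap<>();
    private final List<SketchComponent> components = new ArrayList<>();

    void setProperty(String id, Object value) {
        properties.put(id, value);
    }

    void addComponent(SketchComponent component) {
        components.add(component);
    }

    @Override
    public Object getProperty(String id) {
        return properties.get(id);
    }

    /** Called before a parse so components pick up the current settings. */
    void resetAll() {
        for (SketchComponent component : components) {
            component.reset(this);
        }
    }
}
```

Components never reference each other directly in this arrangement; swapping the error reporter means setting one property and calling `resetAll()` again.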

  19. Reader Management [Diagram labels: Entity Scanner, Entity Manager, Scanner, UTF-8 Reader, scanName, scanAttValue, scanContent, …, Reader Stack, UCS Reader, EBCDIC Reader, Generic Reader]

  20. Parser Configuration • Before: parser pipeline is part of the document parser base class; required duplication to re-configure the parser and still take advantage of API generator code [Diagram labels: XML, Scanner, Validator, Document Parser, DOM Parser, SAX Parser]

  21. Parser Configuration • After: parser pipeline and settings are specified in a separate parser configuration object; allows re-use of the framework without rewriting existing code [Diagram labels: XML, Parser Configuration, Scanner, Validator, Document Parser, DOM Parser, SAX Parser]

  22. API Generators • Different APIs can be generated from the same document parser [Diagram labels: XNI, Document Parser, DOM Parser, SAX Parser, JavaBean Parser, …]
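
The separation between a parser configuration and the API generators built on top of it can be sketched as follows: one configuration object drives the parse, and two unrelated front ends (an event list and a tree) consume the same events. Everything here is hypothetical, including the toy "a/b/c" input format, which stands in for real XML only to keep the example short.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical event sink shared by all API generators. */
interface SketchEvents {
    void startElement(String name);
    void endElement(String name);
}

/** Hypothetical configuration: owns the pipeline and drives a parse. */
interface SketchParserConfiguration {
    void parse(String input, SketchEvents sink);
}

/** Toy configuration: "a/b/c" is treated as element a containing b containing c. */
final class SketchPathConfiguration implements SketchParserConfiguration {
    @Override
    public void parse(String input, SketchEvents sink) {
        String[] names = input.split("/");
        for (String name : names) {
            sink.startElement(name);
        }
        for (int i = names.length - 1; i >= 0; i--) {
            sink.endElement(names[i]);
        }
    }
}

/** Event-style API generator: exposes the parse as a flat list of events. */
final class SketchEventListParser implements SketchEvents {
    private final SketchParserConfiguration config;
    private List<String> events;

    SketchEventListParser(SketchParserConfiguration config) {
        this.config = config;
    }

    List<String> parse(String input) {
        events = new ArrayList<>();
        config.parse(input, this);
        return events;
    }

    @Override
    public void startElement(String name) {
        events.add("start " + name);
    }

    @Override
    public void endElement(String name) {
        events.add("end " + name);
    }
}

/** Minimal tree node used by the tree-style generator. */
final class SketchNode {
    final String name;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String name) {
        this.name = name;
    }
}

/** Tree-style API generator: builds nodes from the same events. */
final class SketchTreeParser implements SketchEvents {
    private final SketchParserConfiguration config;
    private final Deque<SketchNode> stack = new ArrayDeque<>();
    private SketchNode root;

    SketchTreeParser(SketchParserConfiguration config) {
        this.config = config;
    }

    SketchNode parse(String input) {
        stack.clear();
        root = null;
        config.parse(input, this);
        return root;
    }

    @Override
    public void startElement(String name) {
        SketchNode node = new SketchNode(name);
        if (stack.isEmpty()) {
            root = node;
        } else {
            stack.peek().children.add(node);
        }
        stack.push(node);
    }

    @Override
    public void endElement(String name) {
        stack.pop();
    }
}
```

Passing a different `SketchParserConfiguration` to either front end changes what is parsed and how, while the event-list and tree code stay untouched.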

  23. Sample Parser Configuration #1 • HTML parser • Available as NekoHTML download [Diagram labels: HTML, HTML Parser Configuration, HTML Scanner, Tag Balancer, Document Parser, DOM Parser, SAX Parser]

  24. Sample Parser Configuration #2 • Non-validating parser (for performance) • Available with Xerces download [Diagram labels: XML, Non-Validating Parser Configuration, Scanner / Namespace Binder, Document Parser, DOM Parser, SAX Parser]

  25. Sample Parser Configuration #3 • XInclude processing • Not yet implemented [Diagram labels: XML, XInclude Parser Configuration, Scanner, XInclude, Validator, Document Parser, DOM Parser, SAX Parser]

  26. Sample Parser Configuration #4 • Database result set converted to XML • Not yet implemented [Diagram labels: DB, Database Parser Configuration, Database Query, Validator, Document Parser, DOM Parser, SAX Parser]

  27. That’s All, Folks! • Questions and Answers: any questions? • Links: http://www.apache.org/~andyc/xml/present/ • http://xml.apache.org/xerces2-j/ • http://www.apache.org/~andyc/neko/
