
TDX: a High-Performance Table-Driven XML Parser

TDX: a High-Performance Table-Driven XML Parser. Wei Zhang, Robert van Engelen. Department of Computer Science, Florida State University. Outline: Motivation, Introduction, Recent Work, Table-Driven XML Parsing – TDX, TDX Construction Toolkit, Results and Preliminary Conclusion.


Presentation Transcript


  1. TDX: a High-Performance Table-Driven XML Parser Wei Zhang Robert van Engelen Department of Computer Science Florida State University

  2. Outline • Motivation • Introduction • Recent Work • Table-Driven XML Parsing – TDX • TDX Construction Toolkit • Results and Preliminary Conclusion

  3. Motivation • Enhance performance for XML-based Web Services • Provide flexibility • Offer high-level modularity

  4. Roadmap • Motivation • Introduction • Recent Work • Table-Driven XML parsing – TDX • TDX construction Tool Kit • Experiment Results and Preliminary Conclusion

  5. Introduction • Validating XML Parsing • Three stages • Well-formedness • Validation • Data conversion • Frequent access to schema • Separation introduces overhead and requires frequent access to schema [Figure: XML passes through well-formedness, validation, and data conversion stages before reaching the application]

  6. Introduction (cont’d) • Schema-specific XML parsing (SSP) • Merges well-formedness and validation • No need for frequent access to schema • Data conversion remains a separate stage in implemented SSPs [Figure: stages labeled Well-formedness, Validation, Data Conversion]

  7. Roadmap • Motivation • Introduction • Recent Work • Table-Driven XML parsing – TDX • TDX construction Tool Kit • Experiment Results and Preliminary Conclusion

  8. Recent Work • Chiu: “A compiler-based approach to schema-specific XML parsing” • Merges parsing and validation by constructing a PDA • No namespace support • Conversion from NFA to DFA may result in exponentially growing space requirements

  9. Recent Work (cont'd) • van Engelen: “Constructing finite automata for high-performance web services” • Integrates parsing and validation into one stage, with parsing actions encoded by a DFA • Cannot process cyclic XML schemas

  10. Recent Work (cont'd) • van Engelen: “The gSOAP toolkit for web services and peer-to-peer computing networks” • Namespace support • Merges parsing and validation • Implements recursive-descent parsing • Disadvantages of recursive descent • Code size and function-call overhead

  11. Roadmap • Motivation • Introduction • Recent Work • Table-Driven XML parsing – TDX • TDX construction Tool Kit • Experiment Results and Preliminary Conclusion

  12. Table-Driven XML Parsing (TDX) • An LL(1) grammar can be derived from a schema • XML documents can be parsed and validated using the LL(1) grammar • Well-formedness (parsing) can be verified through grammar rules • Validation can be accomplished using semantic actions • Application-specific events can also be encoded as semantic actions

  13. Illustrating Example
      Schema:
        <schema>
          <element name="book" type="bookType"/>
          <complexType name="bookType">
            <sequence>
              <element name="title" type="string"/>
              <element name="author" type="string"/>
            </sequence>
          </complexType>
        </schema>
      LL(1) Grammar:
        s → '<book>' t '</book>'
        t → t1 t2
        t1 → '<title>' DATA //imp_s(s.val) '</title>'
        t2 → '<author>' DATA //imp_s(s.val) '</author>'

  14. Illustrating Example (cont'd)
      (a) An XML Instance:
        <book> <title> XML Tech </title> <author> Bob </author> </book>
      (b) Predictive Parsing (parse tree):
        s → '<book>' t '</book>'
        t → t1 t2
        t1 → '<title>' DATA '</title>', with action imp_s("XML Tech")
        t2 → '<author>' DATA '</author>', with action imp_s("Bob")

  15. Roadmap • Recent Work • Table-Driven XML parsing – TDX • Illustrating example • Architecture • Token generation • Mapping schema to LL(1) • Parsing table • Parsing engine • Scanner/tokenizer • TDX construction Tool Kit • Experiment Results and Preliminary Conclusion

  16. TDX - Architecture [Figure: Modules: Tokens, LL(1) Parsing Table, LL(1) Grammar Productions and Actions. Data flow: <XML> input → Scanner/Tokenizer (DFA) → Token and CDATA → Parsing Engine (TDX) → Events to the application, or "Error: invalid"]

  17. Roadmap • Recent Work • Table-Driven XML parsing – TDX • Illustrating example • Architecture • Token generation • Mapping schema to LL(1) • Parsing table • Parsing engine • Scanner/Tokenizer • TDX construction Tool Kit • Experiment Results and Preliminary Conclusion

  18. Token Generation • Defined by • <namespace, tag> • Element name (opening and closing) • Attribute name • Some data types • Such as enumeration • Namespace binding • Identical tag names under different namespaces are represented as different tokens • Normalized tokens

  19. Roadmap • Recent Work • Table-Driven XML parsing – TDX • Illustrating example • Architecture • Token generation • Mapping schema to LL(1) • Parsing table • Parsing engine • Scanner/Tokenizer • TDX construction Tool Kit • Experiment Results and Preliminary Conclusion

  20. Mapping Schema to LL(1) Grammar • Structural constraints are mapped to rules • Validation constraints are mapped to semantic actions • Note that many types of validation constraints are mapped to rules • Such as occurrence, enumeration

  21. Mapping Example(1)
      <simpleType name="state">
        <restriction base="string">
          <enumeration value="OFF"/>
          <enumeration value="ON"/>
        </restriction>
      </simpleType>
      state → "OFF" | "ON"

      <simpleType name="value">
        <restriction base="integer">
          <minInclusive value="10"/>
          <maxInclusive value="250"/>
        </restriction>
      </simpleType>
      value → DATA //imp_i(char *s)
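The second mapping leaves the range restriction to a semantic action, while the enumeration is handled entirely by a grammar rule. A minimal Python sketch of what such an action might do is shown here. Only the name `imp_i` comes from the slide; the body, the `ValidationError` class, and the keyword parameters are assumptions made for illustration, not the TDX implementation.

```python
import re


class ValidationError(Exception):
    """Raised when character data violates a schema facet (hypothetical name)."""


def imp_i(s, min_inclusive=10, max_inclusive=250):
    """Convert character data to an integer and check the range facets.

    The default bounds mirror minInclusive/maxInclusive of the `value` type.
    """
    text = s.strip()
    # Accept an optional sign followed by ASCII digits only.
    if not re.fullmatch(r"[+-]?[0-9]+", text):
        raise ValidationError(f"not an integer: {text!r}")
    value = int(text)
    if not (min_inclusive <= value <= max_inclusive):
        raise ValidationError(
            f"{value} is outside [{min_inclusive}, {max_inclusive}]"
        )
    return value


# The enumeration needs no action: membership is decided by the rule
# state -> "OFF" | "ON", so any other token simply has no table entry.
STATE_ALTERNATIVES = ("OFF", "ON")
```

Calling `imp_i(" 42 ")` returns the integer 42, while `imp_i("251")` raises `ValidationError`.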

  22. c’’2 c’2 c’’2 c’’2 Mapping Example(2) <complexType name=“example”> <choice> <element name=“id” type=“id_type” minOccurs=“0”/> <element name=“value” type=“value_type” minOccurs=“2” maxOccurs=“unbounded”/> </choice> </complexType> example c1| c2 c1‘<id>’ id_type ‘</id>’ c1 c’2‘<value>’ value_type ‘</value>’ c2c’2c’2c’’2 <sequence> example  c1c2

  23. Roadmap • Recent Work • Table-Driven XML parsing – TDX • Illustrating example • Architecture • Token generation • Mapping schema to LL(1) • Parsing table • Parsing engine • Scanner/Tokenizer • TDX construction Tool Kit • Experiment Results and Preliminary Conclusion

  24. LL(1) Parsing Table • Constructed from LL(1) grammar • Indexed by nonterminals and terminals • Contains either index of grammar production or error entry
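The construction can be sketched with the textbook FIRST/FOLLOW algorithm. The Python below is a generic illustration, not the TDX construction itself: the function name `build_ll1_table`, the dictionary-based table, and the choice to represent error entries as missing keys are assumptions. The sample grammar is the book grammar from the illustrating example, without its semantic actions.

```python
EPS = "ε"  # internal marker: "this sequence can derive the empty string"
END = "$"  # end-of-input terminal


def build_ll1_table(productions, start):
    """Map (nonterminal, terminal) to a production index; absent keys are errors.

    `productions` is a list of (lhs, rhs) pairs where rhs is a tuple of symbols.
    """
    nonterminals = {lhs for lhs, _ in productions}
    first = {n: set() for n in nonterminals}
    follow = {n: set() for n in nonterminals}
    follow[start].add(END)

    def first_of(symbols):
        """FIRST set of a symbol sequence, using the current `first` estimates."""
        result = set()
        for sym in symbols:
            sym_first = first[sym] if sym in nonterminals else {sym}
            result |= sym_first - {EPS}
            if EPS not in sym_first:
                return result
        result.add(EPS)
        return result

    changed = True
    while changed:  # iterate until FIRST and FOLLOW stop growing
        changed = False
        for lhs, rhs in productions:
            new_first = first_of(rhs)
            if not new_first <= first[lhs]:
                first[lhs] |= new_first
                changed = True
            for i, sym in enumerate(rhs):
                if sym not in nonterminals:
                    continue
                rest = first_of(rhs[i + 1:])
                new_follow = rest - {EPS}
                if EPS in rest:
                    new_follow |= follow[lhs]
                if not new_follow <= follow[sym]:
                    follow[sym] |= new_follow
                    changed = True

    table = {}
    for index, (lhs, rhs) in enumerate(productions):
        rhs_first = first_of(rhs)
        lookaheads = rhs_first - {EPS}
        if EPS in rhs_first:
            lookaheads |= follow[lhs]
        for terminal in lookaheads:
            if (lhs, terminal) in table:
                raise ValueError(f"grammar is not LL(1) at ({lhs}, {terminal})")
            table[(lhs, terminal)] = index
    return table


BOOK_GRAMMAR = [
    ("s", ("<book>", "t", "</book>")),
    ("t", ("t1", "t2")),
    ("t1", ("<title>", "DATA", "</title>")),
    ("t2", ("<author>", "DATA", "</author>")),
]
BOOK_TABLE = build_ll1_table(BOOK_GRAMMAR, "s")
```

Because every nonterminal in the book grammar has a single production, `BOOK_TABLE` holds exactly four entries, one per nonterminal; every other (nonterminal, terminal) pair is an error entry.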

  25. Roadmap • Recent Work • Table-Driven XML parsing – TDX • Illustrating example • Architecture • Token generation • Mapping schema to LL(1) • Parsing table • Parsing engine • Scanner/Tokenizer • TDX construction Tool Kit • Experiment Results and Preliminary Conclusion

  26. Parsing Engine • Schema Independent • Maintains • Parsing table • Production table • Action table • Stack
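The four structures listed above are enough to drive a schema-independent loop. The Python sketch below runs the book grammar from the illustrating example. The names `run_engine` and `InvalidDocument`, the `#imp_s` marker convention for embedding actions in a production, and the (kind, text) token pairs are assumptions for illustration; the real engine is generated C code whose layout the slides do not show.

```python
END = "$"


class InvalidDocument(Exception):
    """Raised when the token stream does not match the grammar."""


# Production table: right-hand sides may contain action markers ("#...").
PRODUCTIONS = [
    ("s", ("<book>", "t", "</book>")),
    ("t", ("t1", "t2")),
    ("t1", ("<title>", "DATA", "#imp_s", "</title>")),
    ("t2", ("<author>", "DATA", "#imp_s", "</author>")),
]

# Parsing table: (nonterminal, lookahead) -> production index.
PARSING_TABLE = {
    ("s", "<book>"): 0,
    ("t", "<title>"): 1,
    ("t1", "<title>"): 2,
    ("t2", "<author>"): 3,
}


def run_engine(tokens, parsing_table, productions, actions, start="s"):
    """Predictive parse of `tokens`, a list of (kind, text) pairs."""
    nonterminals = {lhs for lhs, _ in productions}
    stack = [END, start]
    stream = list(tokens) + [(END, "")]
    pos = 0
    last_text = ""  # text of the most recently matched terminal
    while stack:
        top = stack.pop()
        if top in actions:  # semantic action: fire it, consume no input
            actions[top](last_text)
            continue
        kind, text = stream[pos]
        if top in nonterminals:
            index = parsing_table.get((top, kind))
            if index is None:
                raise InvalidDocument(f"unexpected {kind} while expanding {top}")
            # Push the right-hand side so its first symbol ends up on top.
            stack.extend(reversed(productions[index][1]))
        elif top == kind:
            last_text = text
            pos += 1
        else:
            raise InvalidDocument(f"expected {top}, got {kind}")
    return True


def demo():
    """Parse the XML instance from the illustrating example."""
    collected = []
    actions = {"#imp_s": lambda s: collected.append(s.strip())}
    tokens = [
        ("<book>", ""), ("<title>", ""), ("DATA", " XML Tech "),
        ("</title>", ""), ("<author>", ""), ("DATA", " Bob "),
        ("</author>", ""), ("</book>", ""),
    ]
    run_engine(tokens, PARSING_TABLE, PRODUCTIONS, actions)
    return collected
```

Nothing in `run_engine` refers to books, titles, or authors; swapping in tables generated from another schema changes the accepted language without touching the loop, which is the sense in which the engine is schema independent.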

  27. Roadmap • Recent Work • Table-Driven XML parsing – TDX • Illustrating example • Architecture • Token generation • Mapping schema to LL(1) • Parsing table • Parsing engine • Scanner/Tokenizer • TDX construction Tool Kit • Experiment Results and Preliminary Conclusion

  28. Scanner/Tokenizer • Constructed from schema • Schema provides DFA states information • Element name • Has attribute? • Attribute name • Root element needs special care • Schema information

  29. Scanner/Tokenizer example
      <book xmlns:x="http://www.x.org" xmlns:y="http://www.y.org" targetnamespace="http://www.x.org">
        <title>XML Bible</title>
        <author>
          <name> Bob </name>
          <y:title> professor</y:title>
        </author>
      </book>
      Tokens for <title>XML Bible</title>: <"www.x.org", "title"> DATA <"www.x.org", "/title">
      Token for <y:title>: <"www.y.org", "title">
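The normalization shown on this slide can be imitated with a small regular-expression scanner. This Python sketch is far simpler than the flex-generated DFA the toolkit produces and rests on several assumptions: prefixes are bound once and never go out of scope, unprefixed names take the namespace named by the `targetnamespace` attribute as in the slide's document, tokens carry the full namespace URI, and comments, entities, and CDATA sections are ignored. The names `tokenize` and `SLIDE_DOC` are hypothetical.

```python
import re

# Either a tag (optional "/", optional prefix, local name, raw attributes)
# or a run of character data.
TOKEN = re.compile(
    r"<(/?)(?:([A-Za-z_][\w.-]*):)?([A-Za-z_][\w.-]*)([^<>]*)>|([^<]+)"
)
ATTRIBUTE = re.compile(r'([\w:.-]+)\s*=\s*"([^"]*)"')


def tokenize(xml):
    """Return <namespace, tag> pairs for tags and ("DATA", text) for text."""
    bindings = {}  # prefix -> namespace URI; "" holds the default namespace
    tokens = []
    for match in TOKEN.finditer(xml):
        closing, prefix, local, attrs, text = match.groups()
        if text is not None:
            if text.strip():  # skip whitespace between tags
                tokens.append(("DATA", text.strip()))
            continue
        for name, value in ATTRIBUTE.findall(attrs):
            if name.startswith("xmlns:"):
                bindings[name[len("xmlns:"):]] = value
            elif name in ("xmlns", "targetnamespace"):
                bindings[""] = value
        key = prefix or ""
        if key not in bindings:
            raise ValueError(f"unbound namespace prefix: {key!r}")
        tokens.append((bindings[key], closing + local))
    return tokens


SLIDE_DOC = (
    '<book xmlns:x ="http://www.x.org" xmlns:y ="http://www.y.org" '
    'targetnamespace ="http://www.x.org"> <title>XML Bible</title> '
    "<author> <name> Bob </name> <y:title> professor</y:title> </author> </book>"
)
```

Running `tokenize(SLIDE_DOC)` gives the two `title` elements different tokens because their namespaces differ, which is the point of the slide.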

  30. Roadmap • Motivation • Introduction • Recent Work • Table-Driven XML parsing – TDX • TDX construction Tool Kit • Experiment Results and Preliminary Conclusion

  31. TDX Construction Toolkit [Figure: wsdl2TDX takes Service.wsdl as input and produces Service_flex.l, Service_TDX.h, and Service_TDX.c; flex turns Service_flex.l into tab.yy.c]

  32. Roadmap • Motivation • Introduction • Recent Work • Table-Driven XML parsing – TDX • TDX construction Tool Kit • Experiment Results and Preliminary Conclusion

  33. Experiment Setup • Compared with • DFA-based parser • gSOAP 2.7 • eXpat 1.2 • Xerces 2.7.0 • Memory-resident XML message • Elapsed real time measured using gettimeofday()

  34. Parsing Performance(1)

  35. Parsing Performance (2)

  36. Conclusion • Enhanced parsing speed • Flexible framework • Encodes value-based validation and application-specific events as semantic rules • Combines structural, syntactic, and semantic constraints in one pass • High level of modularity
