
XML Data Binding: Encoding for High-Performance Content-Based Event Routing

Gail Kaiser, Phil Gross. Columbia University Programming Systems Lab. Overview: PSL Intro; MEET Project; Encoding Conversion Efficiency; Encoding Size Efficiency; Encoding Classification Efficiency.





Presentation Transcript


  1. XML Data Binding: Encoding for High-Performance Content-Based Event Routing • Gail Kaiser, Phil Gross • Columbia University Programming Systems Lab

  2. Overview • PSL Intro • MEET Project • Encoding Conversion Efficiency • Encoding Size Efficiency • Encoding Classification Efficiency

  3. Programming Systems Lab • “PSL conducts research on Web technologies, collaborative work, virtual worlds, process/workflow, extended transaction models, software development environments and tools, software engineering, information management, and distributed programming systems” • Lately, lots of XML stuff

  4. PSL XML-related Research • FlexML: Flexible XML • Open-ended XML streams that may include “new” tags • Dynamic schema and semantics discovery and composition • XUES: XML-based Universal Event Service • Event Packager: Data mining over XML structured data • Event Distiller: XML event poset pattern matching • Learning new application-domain events to recognize • DISCUS: Decentralized Information Spaces for Composition and Unification of Services • Rapid and secure application composition using Web Services • Trust Evolution: PGP Trust + KeyNote + real-world business

  5. MEET • Multiply Extensible Event Transport • Content-based multicast routing • Must be efficient enough for embedded and high-performance applications

  6. MEET Motivations • Personal Life Recorder (sensor oriented) • GroupWork Recorder (computer/DB oriented) • Parallel/Grid computing • Distributed simulation • Battlefield C4I • Last, but not least: • Dissertation submission

  7. Relationship to Other Work • Communication is generally modeled like this: [diagram labels: Machine A, Relational, Machine B, XML] • What actually goes over the line is an afterthought • But with N-Way Internet-scale communication • Millions of publishers and subscribers • We can (must!) do better than ASCII text… • Line speed => ≈250 assembly instructions per packet

  8. MEET Extensibility • Want to scale up, to millions of pubs and subs • Want to scale down, to embedded and wireless • No single solution satisfactory at all scales • Composed of hot-swappable subsystems • Router, transports, clock/causality, types, etc.

  9. Why Types • Event data is not just an opaque bag of bits • Subscriptions are Boolean functions over events • Type safety would be nice • What type system to use?
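
To illustrate "subscriptions are Boolean functions over events", here is a minimal Python sketch. The `TemperatureEvent` type, its fields, and the `route` helper are hypothetical names chosen for illustration; they are not part of MEET.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TemperatureEvent:
    """Hypothetical typed event; the fields are assumptions for this sketch."""
    sensor_id: int
    celsius: float


# A subscription is just a Boolean function over an event.
Subscription = Callable[[TemperatureEvent], bool]


def route(event: TemperatureEvent, subscriptions: Dict[str, Subscription]) -> List[str]:
    """Return the names of all subscriptions whose predicate accepts the event."""
    return [name for name, predicate in subscriptions.items() if predicate(event)]


subscriptions: Dict[str, Subscription] = {
    "hot": lambda e: e.celsius > 30.0,
    "sensor7": lambda e: e.sensor_id == 7,
    "hot_and_7": lambda e: e.celsius > 30.0 and e.sensor_id == 7,
}

matched = route(TemperatureEvent(sensor_id=7, celsius=25.0), subscriptions)
```

Because the event carries typed fields rather than an opaque byte string, each predicate can be checked against the event type before it is ever installed in a router.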

  10. Initial MEET Type Design • Initial design calls for supporting Java, C#, and XML Schema defined objects “out of the box” • XML Schema used as Ur-language/Esperanto for conversions • Subscriptions are arbitrary Boolean functions on datatypes • XML Schema is not an ideal ur-type • Excessively complex, verbose, etc.

  11. Encodings for Efficiency • Java, C#, XML, ASN.1 have well-defined but proprietary encodings for instances • Would be nice to have an independent encoding scheme with some desirable properties missing from the above • Fast serialization/deserialization • Elimination of redundant information from message sequences • Data organized for rapid classification/routing

  12. Conversion Efficiency • Need to get to and from wire format as fast as possible • Leverage homogeneity to eliminate unnecessary conversions, e.g., network byte order • ECho system from Eisenhauer et al., Georgia Tech • Using “native data” for ultra-low latency • Necessary for HPC
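
One way to avoid an unconditional conversion to network byte order is to let the sender write in its own byte order and tag the message with that order, so a receiver with the same architecture has nothing to swap. The sketch below shows the idea in Python; the one-byte order tag and the function names are assumptions made for this example, not the ECho wire format or API.

```python
import struct
import sys

LITTLE, BIG = 0, 1
NATIVE = LITTLE if sys.byteorder == "little" else BIG


def encode(values, order=NATIVE):
    """Pack 32-bit signed ints in the given byte order, prefixed by a one-byte order tag."""
    prefix = "<" if order == LITTLE else ">"
    return bytes([order]) + struct.pack(f"{prefix}{len(values)}i", *values)


def needs_swap(buf):
    """True only when the sender's byte order differs from this machine's."""
    return buf[0] != NATIVE


def decode(buf):
    """Unpack using the byte order recorded by the sender."""
    prefix = "<" if buf[0] == LITTLE else ">"
    count = (len(buf) - 1) // 4
    return list(struct.unpack(f"{prefix}{count}i", buf[1:]))
```

In Python, `struct` hides the cost either way. In a native implementation, the `needs_swap(...) == False` case would be a plain memory copy, which is where the latency saving between homogeneous machines comes from.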

  13. Size Efficiency • Ideal for a single message is self-describing data • With multiple messages of the same type, one can pull out redundant type info, e.g., schema • Goal is to go further: If 90% of the content of messages is the same, generate a new subtype with fixed values • From self-describing to all-schema is a continuum
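
The "new subtype with fixed values" idea can be sketched as follows. This simplified version treats a field as fixed only when it is identical across every sample message, and represents messages as plain dictionaries; both choices, along with all function names, are assumptions for illustration.

```python
def derive_fixed(messages):
    """Find the fields whose value is identical in every sample message."""
    first = messages[0]
    return {
        key: value
        for key, value in first.items()
        if all(key in m and m[key] == value for m in messages)
    }


def compress(message, fixed):
    """Drop the fixed fields; only the varying residue goes on the wire."""
    for key, value in fixed.items():
        if key not in message or message[key] != value:
            raise ValueError(f"message does not belong to the subtype: {key!r}")
    return {k: v for k, v in message.items() if k not in fixed}


def expand(residual, fixed):
    """Rebuild the full message on the receiving side."""
    return {**fixed, **residual}


samples = [
    {"unit": "C", "site": "lab", "value": 21.5},
    {"unit": "C", "site": "lab", "value": 22.0},
]
fixed = derive_fixed(samples)
wire = compress(samples[1], fixed)
```

Here the subtype definition (`fixed`) is exchanged once, and each subsequent message shrinks to its varying fields, which is one point on the continuum between self-describing and all-schema encodings.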

  14. Classification Efficiency • When bits start arriving serially at the router, would like to begin cut-through routing as soon as possible • Avoid the curse of IP/IPv6: source address first • Want key routing bits as close to the front as possible • Want data in fixed locations

  15. Fast Classifying: First Things First • In the packet, type info first (after magic) • Would like to represent type codes as bit string with “most significant” info e.g. parent type first, followed by subtype identifier, sub-subtype, etc. • Need access to type hierarchy • Popular classification fields at the front • Need to tag with popularity metadata • “subscribers will want to select on me”
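
Putting the parent type first means a router can classify a packet by comparing a prefix of the type code. A minimal sketch, assuming one byte per hierarchy level, a one-byte depth field, and a hypothetical magic value `b"MEET"` (none of these details come from the slides):

```python
MAGIC = b"MEET"  # hypothetical magic number for this sketch

# Hypothetical type codes: one byte per hierarchy level, parent first.
SENSOR = b"\x01"
SENSOR_TEMPERATURE = b"\x01\x02"
LOG = b"\x02"


def make_packet(type_code: bytes, payload: bytes) -> bytes:
    """Layout: magic, depth of the type code, type code, then the payload."""
    return MAGIC + bytes([len(type_code)]) + type_code + payload


def matches(packet: bytes, wanted: bytes) -> bool:
    """True if the packet's type is `wanted` or any subtype of it."""
    if not packet.startswith(MAGIC):
        return False
    depth = packet[len(MAGIC)]
    start = len(MAGIC) + 1
    type_code = packet[start:start + depth]
    return type_code.startswith(wanted)


packet = make_packet(SENSOR_TEMPERATURE, b"21.5")
```

A subscription to `SENSOR` matches every subtype without consulting the rest of the packet, so the decision can be made as soon as the first few bytes arrive.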

  16. Fast Classifying: Fixed Positions • Would like to avoid scanning through long or variable-length fields • Long/Variable data needs to be in a separate channel/section • Primitives and fixed-length references at the front • References point into data section • Classifier can jump over large, uninteresting data quickly
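
The layout described above can be sketched with a fixed-size header that holds the primitives plus an (offset, length) reference into a trailing data section. The specific fields (`sensor_id`, `reading`, a free-form note) are hypothetical and chosen only for this example.

```python
import struct

# Fixed-size header: sensor_id (uint32), reading (float64),
# note_offset (uint32), note_length (uint32). "!" means network order, no padding.
HEADER = struct.Struct("!IdII")


def encode(sensor_id: int, reading: float, note: bytes) -> bytes:
    """Primitives and the reference go first; variable-length data follows."""
    return HEADER.pack(sensor_id, reading, HEADER.size, len(note)) + note


def classify(packet: bytes, threshold: float) -> bool:
    """Decide using header fields only; the data section is never scanned."""
    _sensor_id, reading, _offset, _length = HEADER.unpack_from(packet, 0)
    return reading > threshold


def read_note(packet: bytes) -> bytes:
    """Follow the fixed-length reference into the data section."""
    _sensor_id, _reading, offset, length = HEADER.unpack_from(packet, 0)
    return packet[offset:offset + length]


packet = encode(7, 31.5, b"x" * 1000)
```

`classify` reads the same 20 header bytes regardless of how large the note is, which is the property the classifier needs.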

  17. Plus: Schema Format • We’d like the schema format to be amenable to programmatic manipulation and analysis • For instance, when negotiating formats, we’d like to be able to compute how our original format offer differs from the counter-offer • XML Schema is pretty good for this
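
As a rough illustration of computing how an offer differs from a counter-offer, the sketch below models each format as a flat mapping from field name to type name. That representation is a deliberate simplification and an assumption of this example; a real XML Schema comparison would have to handle nesting, derivation, and facets.

```python
def diff_formats(offer: dict, counter: dict) -> dict:
    """Compare two format descriptions given as {field name: type name}."""
    return {
        "added": {k: counter[k] for k in counter if k not in offer},
        "removed": {k: offer[k] for k in offer if k not in counter},
        "changed": {
            k: (offer[k], counter[k])
            for k in offer
            if k in counter and offer[k] != counter[k]
        },
    }


offer = {"sensor_id": "xs:int", "reading": "xs:double", "note": "xs:string"}
counter = {"sensor_id": "xs:long", "reading": "xs:double", "unit": "xs:string"}
delta = diff_formats(offer, counter)
```

The resulting `delta` is itself plain data, so a negotiating party can inspect it programmatically and decide whether to accept, reject, or amend the counter-offer.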

  18. Conclusions • Efficient instance transfer is an interesting case for data-binding • Special needs for efficiency • But we can negotiate our own format among the communicating parties • Some explicit support for this in a general data-binding solution could help acceptance
