
XPipe - An XML Processing Methodology

XPipe - An XML Processing Methodology. XML SIG, NY, USA, Feb 12, 2002. Sean McGrath, CTO, Propylon. What is XPipe? It is an architecture / methodology / framework for developing robust, scalable, manageable XML processing systems.


Presentation Transcript


  1. XPipe - An XML Processing Methodology • XML SIG, NY, USA • Feb 12, 2002 • Sean McGrath, CTO, Propylon

  2. What is XPipe? • It is an architecture / methodology / framework for developing robust, scalable, manageable XML processing systems. • Based on proven mechanical manufacturing techniques. Specifically: • The Assembly Line Principle • Component assembly and component re-use

  3. What is XPipe? • An open source project hosted on SourceForge • http://xpipe.sourceforge.net • A contribution to the blossoming meme of using pipeline based processing to tame the burgeoning complexity of XML transformations • (If you do not find XML transformation complicated, you are not sufficiently well informed.) • (And no, XSLT does not solve all your problems)

  4. What is XPipe? • A way of thinking about systems that focuses on structured dataflows rather than Object APIs • It is also: • A Scandinavian sewage treatment technology • An exhaust pipe system for high performance engines • A VT100 based strategy game for DEC’s VAX/VMS Operating System

  5. Contents of this talk • The XPipe philosophy • Major functional elements • Some examples • The XGrid and Commoditized XML Processing • Some anticipated objections (and answers) • Relationship to other technologies

  6. Contents of this talk • Current status • Current problems • Future plans • Some (contentious) musings • Something cold to drink

  7. XPipe Philosophy • XML is all about (potentially) complex, hierarchical data structures

  8. XPipe Philosophy • Cars are complex, hierarchical structures • [Image: Henry Ford’s Model T Ford Assembly Line, 1914]

  9. XPipe Philosophy • Lunch is a complex, hierarchical structure • [Image: Lunch Assembly Line, NY, 2002]

  10. XPipe Philosophy • We are complex, hierarchical structures

  11. XPipe philosophy • What have these scenes got in common? • Complex construction of cars, tuna melts and tendons made possible and efficient through • assembly line manufacturing • re-usable component processes and component materials • Why not apply this approach to XML “manufacturing”?

  12. XPipe philosophy • Why does the assembly line approach work? • Transformation task decomposition • Re-usable transformation components • Transformation decomposition is the key to complexity management. Just ask: • Henry Ford • Herbert Simon (The Two Watchmakers, “The Architecture of Complexity”) • George Miller (7+/-2) • Adam Smith (An Inquiry into the Nature and Causes of the Wealth of Nations, 1776) • Any electrical or chemical engineer.

  13. XPipe philosophy • Component re-use is the key to productivity • Ask any form of engineer (electrical, chemical etc.) apart from software engineers… • Component re-use remains a holy grail in software engineering • XPipe is yet another attempt…

  14. XPipe philosophy • A lot of data processing for the foreseeable future will consist of XML to XML transformation • A lot of non-XML data processing can consist of XML to XML transformations with the addition of top and tail transformations • Mantra • Get data into XML as quickly as possible • Keep it in XML until the last possible minute • Bring all your XML tools to bear on solving the data processing problem

  15. XPipe philosophy • [Diagram: Non-XML Input → Top Transformation → Input XML → XML to XML transformations → Output XML → Tail Transformation → Non-XML Output]
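
The top-and-tail idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CSV layout, the `rows`/`row` element names and the functions `top`, `middle` and `tail` are all invented here and are not part of XPipe. It also assumes the CSV column headers are valid XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET

def top(csv_text):
    """Top transformation (hypothetical): CSV text -> XML tree."""
    rows = ET.Element("rows")
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = ET.SubElement(rows, "row")
        for key, value in record.items():
            ET.SubElement(row, key).text = value
    return rows

def middle(rows):
    """XML to XML step: upper-case the text of every <name> element."""
    for name in rows.iter("name"):
        name.text = (name.text or "").upper()
    return rows

def tail(rows):
    """Tail transformation (hypothetical): XML tree -> plain text report."""
    return "\n".join(
        f"{row.findtext('name')}: {row.findtext('qty')}" for row in rows.iter("row")
    )

report = tail(middle(top("name,qty\nwidget,3\ngadget,5\n")))
print(report)
```

Only `top` and `tail` know about the non-XML formats; everything between them sees XML and could be swapped for any other XML to XML step.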

  16. XPipe philosophy • The philosophy hinges on the fact that every complex XML transformation can be broken down into a series of smaller ones that can be chained together
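
A minimal sketch of chaining small transformations, written in Python with the standard library's ElementTree. The stage names (`strip_attributes`, `uppercase_text`) and the `pipeline` helper are assumptions made for illustration; XPipe's own orchestration is not shown in these slides.

```python
import xml.etree.ElementTree as ET
from functools import reduce

def strip_attributes(root):
    """Stage: remove every attribute from every element."""
    for el in root.iter():
        el.attrib.clear()
    return root

def uppercase_text(root):
    """Stage: upper-case the text directly inside each element."""
    for el in root.iter():
        if el.text:
            el.text = el.text.upper()
    return root

def pipeline(*stages):
    """Chain stages left to right into one transformation (hypothetical helper)."""
    return lambda root: reduce(lambda doc, stage: stage(doc), stages, root)

def run(xml_text, transform):
    """Parse, transform, and serialize back to a string."""
    return ET.tostring(transform(ET.fromstring(xml_text)), encoding="unicode")

clean_and_shout = pipeline(strip_attributes, uppercase_text)
print(run('<doc id="1"><foo>baz</foo></doc>', clean_and_shout))
```

Each stage has an XML input and an XML output, so stages can be tested on their own and recombined in a different order or in a different pipe.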

  17. XPipe philosophy • Only so many ways to re-arrange an XML tree structure • A finite number of fundamental transformations, from which all higher order transformations can be derived

  18. XPipe philosophy • Transformation Decomposition leads to • a series of small, manageable, “stand alone” problems with an XML input “spec” and an XML output “spec”. • Can build, test, use and then re-use these transformation components • Very team development friendly • High cohesion, loose coupling – just like the professor advised

  19. XPipe philosophy • Pipeline approach means you can mix ‘n’ match black-box components that internally use whatever paradigm best suits the problem • Lexical • SAX • DOM • XSLT • XDuce, Pyxie, Haskell, AF-NG…

  20. Sample XPipe • [Diagram: a pipeline fed from a DB/CMS. Stage labels: Character Set Mods; Add Doctype + validate + strip doctype; Re-arrange Elements; Validation; Stats + FTP; SQL Replace; XHTML Generate. Technology labels: Lexical, DOM, Schematron/RelaxNG, Rhino, Jython, Java, XSLT]

  21. XPipe philosophy • Assertion : developers would use a component based approach to XML processing if they did not have to write the plumbing (orchestration, exception handling) themselves • “Gee, this problem is complex. Maybe I’ll do it in multiple stages! Gee, now I have to orchestrate the stages somehow. Batch files/shell scripts/driver program – all ugly and error prone. Maybe I’ll just write a single program after all…”

  22. XPipe philosophy • “Professional developers spend 50 percent of their time writing plumbing” – Adam Bosworth • XPipe aims to look after the plumbing letting developers concentrate on the interesting stuff

  23. Philosophy Summary • Preambles • Make things as complex as necessary but not more complex than necessary • Solve all the world’s problems – but only one at a time • Don’t even think about performance until it is too late – then it will look after itself • Only increase complexity linearly w.r.t. functionality and only in “elevator pitch sized” functionality quanta

  24. Philosophy Summary – 1/2 • Data processing == data transformation w.r.t. time. • XML is the current runaway winner in the self-descriptive data stakes and a very good QDDL (Quiescent Data Description Language)

  25. Philosophy Summary – 2/2 • Inside every complex XML transformation is a sequence of simpler XML transformations trying to get out – a Pipe • Decomposed transformation = new transformations + already componentized transformations -> Component Reuse • Inside every graph transformation (read “workflow” or “business process model”) is a combination of simple Pipes trying to get out

  26. XPipe Philosophy • Leveled architecture: levels build on one another, but any level is usable independently of higher levels • Level 2 - XRigs • Level 1 - XPipes • Level 0 - XComponents • [Diagram: each level shown with In and Out ports]

  27. Major Functional Elements – XComponents • Developed in any language that runs on the Java Virtual Machine (Jython, Java, XSLT, Rhino (JavaScript) etc.) • All XComponents are standalone programs of the form • [Name] [InputXML] [OutputXML] [ErrorXML] [Optional Args]
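
The calling convention above might look like the following in Python (which Jython could also run). This is a hedged sketch, not XPipe code: the `main` function, the `<error>` element and the meaning given to the optional argument (restrict the work to one element name) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def main(argv):
    """Hypothetical component: argv = [name, input, output, error, *optional args]."""
    _name, in_path, out_path, err_path, *opts = argv
    only_tag = opts[0] if opts else None  # assumed optional argument; None means all elements
    try:
        tree = ET.parse(in_path)
        for el in tree.getroot().iter(only_tag):
            if el.text:
                el.text = el.text.upper()
        tree.write(out_path, encoding="utf-8")
        return 0
    except (ET.ParseError, OSError) as exc:
        # Report the failure as XML so the next tool in line can read it.
        err = ET.Element("error")
        err.text = str(exc)
        ET.ElementTree(err).write(err_path, encoding="utf-8")
        return 1
```

Wired to the command line with `sys.exit(main(sys.argv))`, this becomes a standalone program that any executive can call without knowing what it does internally.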

  28. Major Functional Elements - XComponents • XComponents described in XML form. An XComponent consists of: • Metadata (keywords etc.) • Documentation • Pre and Post Conditions • Unit Tests (input,output XML stream pairs + Pre/Post Conditions) • Code (Java / Jython / XSLT / Exec)

  29. Major Functional Elements – XPipes • A linear assembly of XComponents that together achieve some useful transformation function • Described in XML • Documentation • Metadata (keywords etc.) • Pre/Post conditions • Unit Tests (input,output XML stream pairs + Pre/Post Conditions) • References to XComponents (URIs) which are resolved when the XPipe is installed/executed

  30. Major Functional Elements – XRigs • An assembly of XPipes that together achieve some useful transformation function • Described in XML • Documentation • Metadata (keywords etc.) • Pre/Post conditions • Unit Tests (input,output XML stream pairs + Pre/Post Conditions) • References to XPipes (URIs) which are resolved when the XRig is installed/executed

  31. Major Functional Elements • Unit Testers • XComponent, XPipe and XRig level Test Harnesses • Executives • XComponent, XPipe and XRig level Execution Environments (on-the-fly, disk install, compiled, web service…) • (Executing an XComponent is identical to executing an XPipe of arity 1, is identical to executing an XRig of arity 1…)

  32. Major Functional Elements • Executives • Uniprocessor Execution • Executed on 1 CPU, possibly with separate threads for each instantiated X* • Multiprocessor Execution (Vapor) • XML based protocol to implement “Job Shop” work distribution over a P2P network (XJCL)

  33. Major Functional Elements – XPipe Monitor (Vapor)

  34. Major Functionality Elements – Miscellany (Vapor) • Whizzy GUI Component and Pipe Editors • XComponent Creators • “Wrap” Java, XSLT etc. into XComponent compliant XML, Ant build target • XComponent Proxies – “pretend” to be a simple XComponent but invoke some external functionality – from Windows DLL to SOAP end-point • XPipe masquerading as XComponent – this could be a very powerful paradigm

  35. Major Functionality Elements – Miscellany (Vapor) • Compilers / Packers • Pack XPipes/XRigs into standalone XPipes/XRigs for distribution (with or without an executive) • Compile pure XSLT XPipe into a self contained translet (self contained or as an XComponent) • “Compile away”/optimize intermediate files via a variety of tricks (Jackson Inversion, Java IO hook, shadow marshalling etc.)

  36. Simple XComponent examples • Fundamental Operation – Rename Element • Rename • Input : <foo>baz</foo> • Output: <bar>baz</bar>
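
The Rename operation is small enough to sketch directly. The function name `rename_element` is an assumption for illustration, not an XPipe API.

```python
import xml.etree.ElementTree as ET

def rename_element(xml_text, old, new):
    """Rename every <old> element (including the root) to <new>."""
    root = ET.fromstring(xml_text)
    for el in list(root.iter(old)):
        el.tag = new
    return ET.tostring(root, encoding="unicode")

print(rename_element("<foo>baz</foo>", "foo", "bar"))
```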

  37. Simple XComponent examples • Fundamental Operation - Peel • Input : <foo><bar>baz</bar></foo> • Output: <foo>baz</foo>
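
One way Peel could be written with ElementTree is sketched below. The function name `peel` and the decision to leave the root element alone are assumptions. Most of the code is bookkeeping for ElementTree's `text`/`tail` model, so that mixed content survives when a wrapper is removed.

```python
import xml.etree.ElementTree as ET

def peel(xml_text, tag):
    """Remove every <tag> wrapper below the root, keeping its content in place."""
    root = ET.fromstring(xml_text)
    # Reversed document order visits descendants before their ancestors.
    for parent in reversed(list(root.iter())):
        for child in list(parent):
            if child.tag != tag:
                continue
            idx = list(parent).index(child)
            grandchildren = list(child)
            if grandchildren:
                # Text after the wrapper now follows the wrapper's last child.
                last = grandchildren[-1]
                last.tail = (last.tail or "") + (child.tail or "")
                lead = child.text or ""
            else:
                lead = (child.text or "") + (child.tail or "")
            # Leading text joins the previous sibling's tail or the parent's text.
            if idx == 0:
                parent.text = (parent.text or "") + lead
            else:
                parent[idx - 1].tail = (parent[idx - 1].tail or "") + lead
            parent.remove(child)
            for offset, grandchild in enumerate(grandchildren):
                parent.insert(idx + offset, grandchild)
    return ET.tostring(root, encoding="unicode")

print(peel("<foo><bar>baz</bar></foo>", "bar"))
```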

  38. Simple XComponent examples • Compound Operation - Matryoshka • Input: • <foo><bar>baz</bar></foo> • Output: • <foo></foo><bar></bar>baz

  39. Simple XComponent examples • KlingonCloak • Input: • <foo><bar>baz</bar></foo> • Output: • <tag name="foo"><tag name="bar">baz</tag></tag>
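
A sketch of KlingonCloak, assuming the original element name moves into a `name` attribute as in the output shown on the slide. The function name `klingon_cloak` is hypothetical, and an existing attribute called `name` would be overwritten by this simple version.

```python
import xml.etree.ElementTree as ET

def klingon_cloak(xml_text):
    """Hide every element name behind a generic <tag name="..."> element."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        el.set("name", el.tag)  # remember the real name as an attribute
        el.tag = "tag"
    return ET.tostring(root, encoding="unicode")

print(klingon_cloak("<foo><bar>baz</bar></foo>"))
```

Once cloaked, a document with any vocabulary can flow through components that only understand the single generic element.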

  40. Sample XComponents • Once you start thinking in terms of Pipes – components appear everywhere: • Regular fragmentations • Doctype changer • Namespace normalizer • Character set transcoder • Hash generator • Architectural Forms • RelaxNG/Schematron etc • A validator can be thought of as a component in an XPipe that mirrors its input on its output

  41. Sample XComponents • Reading a file is an XML to XML transformation • <file>lewisscarrol.xml</file> • <poem><line>Twas brillig, and the slithy tomes, did gyre and gimbal in the wave</line>…</poem>

  42. Sample XComponents • Arithmetic is an XML to XML transformation • <expr>1 + 2</expr> • <res>3</res>

  43. Sample XComponents • Unix pipe utilities e.g. tr • hello world • HELLO WORLD

  44. Sample XComponents • Conditionals are XML to XML transformation “tee junctions” triggered by XPaths • [Diagram: input enters an “if XPath” junction and leaves on either the TRUE branch or the FALSE branch]
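
A tee junction can be sketched as a stage that picks one of two downstream stages. The names `tee` and `mark`, and the `route` attribute, are assumptions for illustration. Note that ElementTree supports only a limited subset of XPath, so a fuller implementation would need a complete XPath engine.

```python
import xml.etree.ElementTree as ET

def tee(xpath, if_true, if_false):
    """Route the document to if_true when xpath matches anything, else to if_false."""
    def stage(root):
        branch = if_true if root.find(xpath) is not None else if_false
        return branch(root)
    return stage

def mark(value):
    """Trivial branch stage: record which route the document took."""
    def stage(root):
        root.set("route", value)
        return root
    return stage

route = tee(".//order[@priority='high']", mark("express"), mark("standard"))
urgent = route(ET.fromstring("<orders><order priority='high'/></orders>"))
normal = route(ET.fromstring("<orders><order priority='low'/></orders>"))
print(urgent.get("route"), normal.get("route"))
```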

  45. Validation as an XComponent • [Diagram labels: XML A (Input); XML A’ (Output); Validation Log (Error); XComponent implemented with RelaxNG, Schematron, Jython/Java/JACL]

  46. Some related open technologies • | - Unix Pipes • SAX Filters • TrAX • XBeans • Cocoon • AxKit • Ant • JXTA • Translets • TupleSpaces

  47. The XGrid • Grid Technologies – computational power “on tap” (http://www.gridforum.org) • The XGrid – computational power “on tap” to execute XPipes/XRigs

  48. The XGrid • [Diagram: In and Out flows crossing a DMZ]

  49. Some objections (with some answers) • It will be slow • No it won’t - Premature optimization is the root of all evil! • Speed is a three headed monster. I’m old enough to have left the X axis and am currently heading for Y through Z • [Figure: The 3 Axes to Speed]

  50. Some objections (with some answers) • It will be slow (cont.) • Massive Parallelism will kill all von Neumann throughput arguments • Documents per second, not seconds per document – throughput is the true measure of XML processing speed • Document fulcra – Locality of reference (Denning) applies to XML processing (more on this later) • A myriad of “compile time” optimizations on XPipes possible • Keep the architecture simple – and speed will sort itself out
