An Algorithm for Streaming XPath Processing with Forward and Backward Axes
Charles Barton, Philippe Charles, Deepak Goyal, Mukund Raghavachari (IBM T. J. Watson Research Center); Marcus Fontoura, Vanja Josifovski (IBM Almaden Research Center)
Published at ICDE 2003
Presented by Amir Bar-or, Technion
Overview
• Background information
  • Evolution of query processing
  • XML processing
  • Example document
• Concepts used
  • X-tree
  • X-dag
  • XAOS
• Algorithm
  • Filtering events
  • Building matching structures
  • Emitting output
• Walk-through
• Experimental results
The evolution of query processing
Classical
• Update model: transactional; low to medium update rate; disk-resident data
• Query model: transactional; instant; accurate; static optimizations; indexes
Publish/subscribe
• Update model: transactional; low to medium update rate; disk-resident data
• Query model: transactional or non-transactional; continuous; accurate; static optimizations; indexes
The evolution of query processing: streaming
• Update model: non-transactional; high update rate; data too large to be stored efficiently on disk
• Query model: non-transactional; continuous; approximate; dynamic optimizations; limited buffering
The closest relatives of streaming algorithms are one-pass algorithms.
XML processing
[Figure: XML parser -> build DOM tree -> process DOM tree (XPath, XQuery, ...)]
• DOM approach
  • Build an in-core (in-memory) representation of the document
  • Process it as needed through a standard API
  • Disadvantages:
    • Scalability: cannot process large documents
    • Locality: requires multiple traversals
    • Algorithmic inefficiency: the APIs perform unnecessary traversals
• SAX approach
  • Use a streaming, event-based API to parse XML on the fly
  • Disadvantages:
    • Programmability: low-level event handling
    • No support for XPath, especially for the parent and ancestor axes
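XAOS approach
XAOS (pronounced "chaos"): an acronym for XML Analysis, Optimization, and Stuff.
[Figure: an XML parser turns the XML document into parsing events (SAX, DOM, or custom); a specialized XPath processor, made of a Filter stage followed by a Match stage, consumes these events together with the XPath expression and produces the results.]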
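The front of this pipeline (parser events feeding a filter) can be sketched with Python's standard `xml.sax` module. The filter below is a hypothetical illustration, not the XAOS filter: it assumes that events for elements whose names match no node test in the query can be dropped, and it records each element's level from the full event stream before filtering. `EventCollector`, `parse_events`, and `filter_events` are names made up for this sketch.

```python
import xml.sax


class EventCollector(xml.sax.ContentHandler):
    """Turns SAX callbacks into (kind, tag, level) tuples; the top element is level 1."""

    def __init__(self):
        super().__init__()
        self.level = 0
        self.events = []

    def startElement(self, name, attrs):
        self.level += 1
        self.events.append(("start", name, self.level))

    def endElement(self, name):
        self.events.append(("end", name, self.level))
        self.level -= 1


def parse_events(xml_text):
    handler = EventCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


def filter_events(events, nodetests):
    """Hypothetical filter stage: keep only events whose tag occurs as a node test.

    Levels come from the unfiltered stream, so a later match stage can still
    tell a child (exactly one level deeper) from a deeper descendant.
    """
    return [ev for ev in events if ev[1] in nodetests]


if __name__ == "__main__":
    events = parse_events("<X><Y><Q/><U/></Y></X>")
    for ev in filter_events(events, {"Y", "U", "W", "Z", "V"}):
        print(ev)
```

Here `<X>` and `<Q/>` are dropped because no node test of `/descendant::Y[child::U]/descendant::W[ancestor::Z/child::V]` can match them, while `Y` keeps level 2 and `U` keeps level 3.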
Background information
• Restricted XPath subset:
  • Location path: a sequence of steps separated by /
  • Predicate: [ ]
  • Node test
  • Axis specifier: ancestor, parent, child, descendant
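A small recursive-descent parser makes the restricted subset concrete: an absolute location path is a `/`-separated list of `axis::nodetest` steps, and each predicate is itself a relative location path. This is a sketch under our own assumptions (explicit axes only, no abbreviations, no wildcards or functions); `Step`, `PathParser`, and `parse` are hypothetical names, not part of XAOS.

```python
from dataclasses import dataclass, field
from typing import List

# Axes allowed by the restricted XPath subset on the slide.
AXES = {"ancestor", "parent", "child", "descendant"}


@dataclass
class Step:
    axis: str
    nodetest: str
    predicates: List[List["Step"]] = field(default_factory=list)


class PathParser:
    """Parses /axis::test[pred]/axis::test..., where each pred is a relative path."""

    def __init__(self, text):
        self.text = "".join(text.split())  # whitespace is not significant here
        self.pos = 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, token):
        if not self.text.startswith(token, self.pos):
            raise ValueError(f"expected {token!r} at position {self.pos}")
        self.pos += len(token)

    def name(self):
        start = self.pos
        while self.pos < len(self.text) and (
            self.text[self.pos].isalnum() or self.text[self.pos] in "_-."
        ):
            self.pos += 1
        if start == self.pos:
            raise ValueError(f"expected a name at position {self.pos}")
        return self.text[start:self.pos]

    def step(self):
        axis = self.name()
        if axis not in AXES:
            raise ValueError(f"axis {axis!r} is outside the restricted subset")
        self.expect("::")
        step = Step(axis, self.name())
        while self.peek() == "[":  # zero or more predicates
            self.expect("[")
            step.predicates.append(self.relative_path())
            self.expect("]")
        return step

    def relative_path(self):
        steps = [self.step()]
        while self.peek() == "/":
            self.expect("/")
            steps.append(self.step())
        return steps

    def absolute_path(self):
        self.expect("/")
        steps = self.relative_path()
        if self.pos != len(self.text):
            raise ValueError(f"unexpected input at position {self.pos}")
        return steps


def parse(xpath):
    return PathParser(xpath).absolute_path()
```

For example, `parse("/descendant::Y[child::U]/descendant::W[ancestor::Z/child::V]")` returns two top-level steps (`descendant::Y` and `descendant::W`), with `[child::U]` and `[ancestor::Z/child::V]` attached as their predicates.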
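Example document
Each element is labeled Nodename(id, level): id is the element's position in document order and level is its depth, with the virtual Root at Root(0,0).

<X>
  <Y>
    <Z>
      <V/>
      <V/>
      <W>
        <W/>
      </W>
    </Z>
    <U/>
  </Y>
  <Y>
    <Z>
      <W/>
    </Z>
  </Y>
</X>

Root(0,0)
  X(1,1)
    Y(2,2)
      Z(3,3)
        V(4,4)
        V(5,4)
        W(6,4)
          W(7,5)
      U(8,3)
    Y(9,2)
      Z(10,3)
        W(11,4)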
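The (id, level) labels can be assigned in a single pass over SAX events: a counter hands out ids at each start event, and a depth counter goes up on start events and down on end events. The sketch below assumes Python's standard `xml.sax` module; `Labeler` and `label` are hypothetical names.

```python
import xml.sax

# The example document from the slide.
DOC = ("<X><Y><Z><V/><V/><W><W/></W></Z><U/></Y>"
       "<Y><Z><W/></Z></Y></X>")


class Labeler(xml.sax.ContentHandler):
    """Assigns (id, level) to every element while streaming."""

    def __init__(self):
        super().__init__()
        self.next_id = 1               # the virtual Root takes id 0
        self.level = 0                 # Root sits at level 0
        self.labels = [("Root", 0, 0)]

    def startElement(self, name, attrs):
        self.level += 1                # one level below the enclosing element
        self.labels.append((name, self.next_id, self.level))
        self.next_id += 1              # ids follow document order

    def endElement(self, name):
        self.level -= 1


def label(xml_text):
    handler = Labeler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.labels


if __name__ == "__main__":
    for name, ident, level in label(DOC):
        print(f"{name}({ident},{level})")
```

Running it prints the labels in document order, from `Root(0,0)` through `W(11,4)`, matching the tree above.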
X-tree
Example: /descendant::Y[child::U]/descendant::W[ancestor::Z/child::V]
• An XPath expression is transformed into a rooted tree, the x-tree
• The vertices of an x-tree are called x-nodes
• Each node test in the expression is translated into an x-node
• Every x-node except the root has a unique incoming edge, labeled with the specified axis
• One x-node is marked as the output x-node
[Figure: x-tree with edges Root -descendant-> Y, Y -child-> U, Y -descendant-> W, W -ancestor-> Z, Z -child-> V; W, the node test of the last step, is the output x-node]
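A minimal way to represent this x-tree is a node that stores its node test, the axis on its single incoming edge, and its children. The class and helper names (`XNode`, `add`, `edges`) are hypothetical. The tree is built by hand for the example expression rather than generated by a parser.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class XNode:
    nodetest: str
    axis: Optional[str] = None  # label of the unique incoming edge (None for Root)
    children: List["XNode"] = field(default_factory=list)
    is_output: bool = False

    def add(self, axis, nodetest, is_output=False):
        """Create a child x-node reached from this one along `axis`."""
        child = XNode(nodetest, axis, is_output=is_output)
        self.children.append(child)
        return child


def edges(node):
    """List (parent, axis, child) triples in depth-first order."""
    out = []
    for child in node.children:
        out.append((node.nodetest, child.axis, child.nodetest))
        out.extend(edges(child))
    return out


# x-tree for /descendant::Y[child::U]/descendant::W[ancestor::Z/child::V]
root = XNode("Root")
y = root.add("descendant", "Y")
y.add("child", "U")                           # predicate [child::U]
w = y.add("descendant", "W", is_output=True)  # last step: the output x-node
z = w.add("ancestor", "Z")                    # predicate [ancestor::Z/child::V]
z.add("child", "V")

if __name__ == "__main__":
    for edge in edges(root):
        print(edge)
```

Because each x-node has exactly one incoming edge, storing the axis on the child is enough to recover every labeled edge.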
X-dag
• The x-dag is generated from the x-tree by rewriting every reverse axis as a forward axis:
  • Reverse the direction of the edge
  • ancestor becomes descendant
  • parent becomes child
• Handle orphan nodes:
  • Add a descendant edge from Root to every x-node left without an incoming edge
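[Figure: x-tree (left) and x-dag (right) for /descendant::Y[child::U]/descendant::W[ancestor::Z/child::V]. The x-tree edges are Root -descendant-> Y, Y -child-> U, Y -descendant-> W, W -ancestor-> Z, Z -child-> V. In the x-dag, the ancestor edge becomes Z -descendant-> W, and the orphaned Z gets a new edge Root -descendant-> Z.]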
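The two rewriting rules fit in a few lines when the x-tree is an edge list. This sketch assumes x-node names are unique (true for this example; a real implementation would use x-node identifiers). `to_xdag` and `XTREE` are hypothetical names.

```python
# Forward replacement for each reverse axis, as listed on the X-dag slide.
REVERSE_TO_FORWARD = {"ancestor": "descendant", "parent": "child"}

# x-tree edges (parent x-node, axis, child x-node) for
# /descendant::Y[child::U]/descendant::W[ancestor::Z/child::V]
XTREE = [
    ("Root", "descendant", "Y"),
    ("Y", "child", "U"),
    ("Y", "descendant", "W"),
    ("W", "ancestor", "Z"),
    ("Z", "child", "V"),
]


def to_xdag(xtree_edges, root="Root"):
    """Rewrite reverse-axis edges as forward ones, then attach orphans to root.

    Assumes x-node names are unique, which holds for this example.
    """
    dag = []
    for src, axis, dst in xtree_edges:
        if axis in REVERSE_TO_FORWARD:
            # e.g. W -ancestor-> Z becomes Z -descendant-> W
            dag.append((dst, REVERSE_TO_FORWARD[axis], src))
        else:
            dag.append((src, axis, dst))
    nodes = {n for src, _, dst in xtree_edges for n in (src, dst)}
    has_incoming = {dst for _, _, dst in dag}
    # Orphans: x-nodes left with no incoming edge get a descendant edge from root.
    for orphan in sorted(nodes - has_incoming - {root}):
        dag.append((root, "descendant", orphan))
    return dag


if __name__ == "__main__":
    for edge in to_xdag(XTREE):
        print(edge)
```

Reversing `W -ancestor-> Z` leaves Z without an incoming edge, so Z is the only orphan and receives `Root -descendant-> Z`. After the rewrite W has two parents (Y and Z), which is why the result is a DAG rather than a tree.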
Matching
• A matching for an x-tree X over a document D is a partial mapping from the x-nodes of X to the elements of D such that:
  • Every mapped x-node satisfies its node test
  • For every edge between two mapped x-nodes, the mapped elements stand in the relationship given by the edge's axis
• A matching is total if every x-node of the x-tree is mapped.
• It is easy to show that an element e is in the result of evaluating the XPath expression iff there is a total matching of the corresponding x-tree that maps the output x-node to e (with the root x-node mapped to the document root). The same argument holds for the x-dag.
• In an x-tree, a total matching of the subtree rooted at an x-node v is composed of total matchings at each of v's children.
• This does not hold for an x-dag node: an x-node such as W can have several incoming edges, so the matchings found below different children must agree on the elements they assign to the x-nodes they share.
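The matching definition can be checked by brute force on the example document, using the x-tree property that a total matching at an x-node is built from independent total matchings at its children. This is an illustrative, non-streaming sketch: the document is stored as an id-to-(tag, parent id) table, and `DOC`, `related`, `matches`, and `evaluate` are hypothetical names.

```python
# Example document from the slides: id -> (tag, parent id); Root is id 0.
DOC = {
    0: ("Root", None), 1: ("X", 0), 2: ("Y", 1), 3: ("Z", 2), 4: ("V", 3),
    5: ("V", 3), 6: ("W", 3), 7: ("W", 6), 8: ("U", 2), 9: ("Y", 1),
    10: ("Z", 9), 11: ("W", 10),
}


def ancestors(e):
    p = DOC[e][1]
    while p is not None:
        yield p
        p = DOC[p][1]


def related(a, axis, b):
    """True if element b is reached from element a along axis."""
    if axis == "child":
        return DOC[b][1] == a
    if axis == "parent":
        return DOC[a][1] == b
    if axis == "descendant":
        return a in ancestors(b)
    if axis == "ancestor":
        return b in ancestors(a)
    raise ValueError(axis)


# x-tree for /descendant::Y[child::U]/descendant::W[ancestor::Z/child::V]
# Each x-node: (nodetest, is_output, [(axis, child x-node), ...])
V = ("V", False, [])
Z = ("Z", False, [("child", V)])
W = ("W", True, [("ancestor", Z)])
U = ("U", False, [])
Y = ("Y", False, [("child", U), ("descendant", W)])
ROOT = ("Root", False, [("descendant", Y)])


def matches(xnode, e, out):
    """Does the subtree at xnode have a total matching with xnode -> e
    that also maps the output x-node to element `out`?"""
    test, is_output, kids = xnode
    if DOC[e][0] != test or (is_output and e != out):
        return False
    # x-tree property: each child is matched independently of its siblings.
    return all(
        any(related(e, axis, f) and matches(child, f, out) for f in DOC)
        for axis, child in kids
    )


def evaluate():
    """Elements e for which a total matching maps the output x-node to e."""
    return [e for e in DOC if matches(ROOT, 0, e)]


if __name__ == "__main__":
    print([f"{DOC[e][0]}({e})" for e in evaluate()])
```

The result is W(6) and W(7). Both lie below Y(2), which has the child U(8), and both have the ancestor Z(3), which has a child V. W(11) is rejected because its Y ancestor, Y(9), has no child U.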
[Figure: the x-tree and x-dag for /descendant::Y[child::U]/descendant::W[ancestor::Z/child::V], shown next to a second example document:]

<Y>
  <U/>
  <W/>
  <Z>
    <W/>
    <V/>
  </Z>
</Y>
<Y>
  <Z>
    <W></W>
    <Q/>
  </Z>
</Y>
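The second document shows why x-dag matchings do not compose. Each child of Root has a total matching of its own sub-dag, but the two matchings map the shared x-node W to different elements, so they cannot be combined. The sketch wraps the two Y elements under a virtual Root and enumerates candidate mappings by brute force, purely for illustration. The element ids and all names (`DOC`, `XDAG`, `consistent`, `total_matchings`) are our own assumptions.

```python
import itertools

# Second example, wrapped under a virtual Root (id 0): id -> (tag, parent id)
DOC = {
    0: ("Root", None), 1: ("Y", 0), 2: ("U", 1), 3: ("W", 1), 4: ("Z", 1),
    5: ("W", 4), 6: ("V", 4), 7: ("Y", 0), 8: ("Z", 7), 9: ("W", 8), 10: ("Q", 8),
}

# x-dag for /descendant::Y[child::U]/descendant::W[ancestor::Z/child::V]
XDAG = [
    ("Root", "descendant", "Y"), ("Root", "descendant", "Z"),
    ("Y", "child", "U"), ("Y", "descendant", "W"),
    ("Z", "descendant", "W"), ("Z", "child", "V"),
]
XNODES = ["Y", "U", "W", "Z", "V"]


def is_ancestor(a, b):
    """True if element a is a proper ancestor of element b."""
    p = DOC[b][1]
    while p is not None:
        if p == a:
            return True
        p = DOC[p][1]
    return False


def related(a, axis, b):
    # An x-dag uses forward axes only.
    if axis == "child":
        return DOC[b][1] == a
    if axis == "descendant":
        return is_ancestor(a, b)
    raise ValueError(axis)


def consistent(m):
    """Check node tests and every x-dag edge whose two ends are both mapped."""
    if any(DOC[e][0] != x for x, e in m.items()):
        return False
    return all(related(m[p], ax, m[c]) for p, ax, c in XDAG if p in m and c in m)


def total_matchings():
    # Candidates per x-node: only elements that pass its node test.
    cands = [[e for e in DOC if DOC[e][0] == x] for x in XNODES]
    for combo in itertools.product(*cands):
        m = dict(zip(XNODES, combo), Root=0)
        if consistent(m):
            yield m


# Each child of Root has a total matching of its own sub-dag...
y_side = {"Root": 0, "Y": 1, "U": 2, "W": 3}
z_side = {"Root": 0, "Z": 4, "W": 5, "V": 6}
# ...but they disagree on W, so they do not combine into one total matching.

if __name__ == "__main__":
    print(consistent(y_side), consistent(z_side), y_side["W"], z_side["W"])
    print([m["W"] for m in total_matchings()])
```

The only total matching maps W to element 5, the W inside the first Y's Z. The W directly under the first Y has no Z ancestor, and the second Y has neither a child U nor a Z with a child V.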
XAOS properties (streaming model)
• Update model: non-transactional; high update rate; data too large to be stored efficiently on disk
• Query model: non-transactional; continuous; approximate; dynamic optimizations; limited buffering