A Query Algebra for Fragmented XML Stream Data

Sujoe Bose

Leonidas Fegaras

David Levine

Vamsi Chaluvadi

University of Texas at Arlington

Processing Streamed XML Data

Most web servers are pull-based:

A client submits a request, the server returns the requested data.

This doesn’t scale well to a large number of clients or to large query results.

Alternative method: push-based dissemination

  • The server broadcasts/multicasts data in a continuous stream
  • The client connects to multiple streams and evaluates queries locally
  • No handshaking, no error-correction
  • All processing is done at the client side
  • The only task performed by the server is slicing, scheduling, and broadcasting data:
    • Critical data may be repeated more often than non-critical data
    • Invalid data may be revoked
    • New updates may be broadcast as soon as they become available.
A Framework for Processing XML Streams
  • The server slices an XML data source into XML fragments. Each fragment:
    • is a filler that fills a hole
    • may contain holes which can be filled by other fragments
    • is wrapped with control information, such as its unique hole ID, the path that reaches this fragment, etc.
  • The client opens connections to streams and evaluates XQueries against these streams
    • For large streams, it’s a bad idea to reconstruct the streamed data in the client’s memory
      • need to process fragments as soon as they become available from the server
    • There are blocking operators that require unbounded memory:
      • Sorting
      • Joins between two streams or self-joins
      • Group-by with aggregation.
The Fragmented Hole-Filler Model



<name> Wal-Mart </name>
<stream:hole id="10" tsid="5"/>
<stream:hole id="20" tsid="5"/>

<stream:filler id="10" tsid="5">
<name> PDA </name>
<make> HP </make>
<model> PalmPilot </model>
<price currency="USD">315.25</price>
</stream:filler>

<stream:filler id="20" tsid="5">
<name> Calculator </name>
<make> Casio </make>
<model> FX-100 </model>
<price currency="USD">50.25</price>
</stream:filler>
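As a rough illustration of the hole-filler model, the sketch below parses a run of fillers and lists the holes each one contains. The namespace URI, the `vendor` and `item` wrapper elements, the root filler with id 0, and the helper names `parse_fragments` and `hole_ids` are assumptions made for this example; the slides do not specify them.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the slides use the "stream:" prefix but give no URI.
STREAM_NS = "urn:example:stream"


def parse_fragments(text):
    """Parse a run of <stream:filler> elements into {filler_id: element}."""
    wrapped = f'<root xmlns:stream="{STREAM_NS}">{text}</root>'
    root = ET.fromstring(wrapped)
    return {int(f.get("id")): f for f in root.findall(f"{{{STREAM_NS}}}filler")}


def hole_ids(fragment):
    """Return the ids of all holes that appear anywhere inside a fragment."""
    return [int(h.get("id")) for h in fragment.iter(f"{{{STREAM_NS}}}hole")]


# Sample data modelled on the slide; wrapper elements and filler 0 are assumed.
SAMPLE = """
<stream:filler id="0" tsid="1">
  <vendor>
    <name> Wal-Mart </name>
    <stream:hole id="10" tsid="5"/>
    <stream:hole id="20" tsid="5"/>
  </vendor>
</stream:filler>
<stream:filler id="10" tsid="5">
  <item><name> PDA </name><make> HP </make></item>
</stream:filler>
"""

fragments = parse_fragments(SAMPLE)
print(sorted(fragments))        # ids of the fillers received so far
print(hole_ids(fragments[0]))   # holes that filler 0 still depends on
```

Here filler 0 refers to holes 10 and 20, but only filler 10 has arrived, so a client can already tell which part of the document is still missing.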



An Algebra for Stored XML Data

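Based on the nested-relational algebra:

$\mathrm{source}_v(T)$   access the XML data source $T$ using $v$

$\sigma_{pred}(X)$   select fragments from $X$ that satisfy $pred$

$\pi_{v_1,\ldots,v_n}(X)$   project

$X \cup Y$   merge

$X \bowtie_{pred} Y$   join

$\mu_{pred}^{v,path}(X)$   unnest (retrieve descendants of elements)

$\Delta_{pred}^{\oplus,h}(X)$   apply $h$ and reduce by $\oplus$

$\Gamma_{gs,pred}^{v,\oplus,h}(X)$   group by $gs$, apply $h$ to each group, and reduce each group by $\oplus$


$\mathrm{source}_v(T) = \{\, \langle v = T \rangle \,\}$

$\sigma_{pred}(X) = \{\, t \mid t \in X,\ pred(t) \,\}$

$\pi_{v_1,\ldots,v_n}(X) = \{\, \langle v_1 = t.v_1, \ldots, v_n = t.v_n \rangle \mid t \in X \,\}$

$X \cup Y = X \mathbin{+\!\!+} Y$

$X \bowtie_{pred} Y = \{\, t_x \circ t_y \mid t_x \in X,\ t_y \in Y,\ pred(t_x, t_y) \,\}$

$\mu_{pred}^{v,path}(X) = \{\, t \circ \langle v = w \rangle \mid t \in X,\ w \in \mathrm{PATH}(t, path),\ pred(t, w) \,\}$

$\Delta_{pred}^{\oplus,h}(X) = \oplus / \{\, h(t) \mid t \in X,\ pred(t) \,\}$

$\Gamma_{gs,pred}^{v,\oplus,h}(X) = \ldots$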
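The operator definitions above can be read as comprehensions over collections of tuples. The following is a minimal sketch, assuming tuples are Python dicts and collections are lists; the function names, the `zero` identity argument of the reduction, and the sample data are illustrative choices rather than part of the slides. Group-by is left out because its definition is elided above.

```python
from functools import reduce as fold


def select(pred, xs):
    """Keep the tuples of xs that satisfy pred."""
    return [t for t in xs if pred(t)]


def project(names, xs):
    """Keep only the named fields of each tuple."""
    return [{v: t[v] for v in names} for t in xs]


def merge(xs, ys):
    """Merge is list concatenation (X ++ Y)."""
    return xs + ys


def join(pred, xs, ys):
    """Concatenate every pair of tuples that satisfies pred."""
    return [{**tx, **ty} for tx in xs for ty in ys if pred(tx, ty)]


def unnest(v, path_fn, xs, pred=lambda t, w: True):
    """Extend t with v=w for every w reachable from t through path_fn."""
    return [{**t, v: w} for t in xs for w in path_fn(t) if pred(t, w)]


def reduce_op(acc, zero, h, xs, pred=lambda t: True):
    """Apply h to each qualifying tuple and fold the results with acc."""
    return fold(acc, (h(t) for t in xs if pred(t)), zero)


books = [
    {"title": "A", "year": 1992, "authors": ["x", "y"]},
    {"title": "B", "year": 1990, "authors": ["z"]},
]
recent = select(lambda t: t["year"] > 1991, books)
pairs = unnest("a", lambda t: t["authors"], recent)
count = reduce_op(lambda x, y: x + y, 0, lambda t: 1, pairs)
print(project(["title"], recent), count)
```

Lists are used instead of sets so that duplicates and order survive, which matches the definition of merge as concatenation.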

Example #1 (cont.)


$b/publisher = "Addison-Wesley" and $b/@year > 1991

Example #2

for $u in document("users.xml")//user_tuple
return <user> { $u/name }
  { for $b in document("bids.xml")//bid_tuple[userid=$u/userid]/itemno,
        $i in document("items.xml")//item_tuple[itemno=$b]
    return <bid> { $i/description/text() } </bid>
    sortby(.) }
  </user>



Algebraic plan (diagram not preserved); surviving operator annotations:

sort, elem("bid",$i/description/text())

sort($u/name), elem("user",$u/name++)

XPath Expressions
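  • Path evaluation is central to the algebra:

$\mathrm{PATH} : (\text{XML-data},\ \text{simple-XPath}) \rightarrow \mathrm{set}(\text{XML-data})$

  • Some rules for stored XML data:

$\mathrm{PATH}(\texttt{<A>}x\texttt{</A>},\ A/path) = \mathrm{PATH}(x,\ path)$

$\mathrm{PATH}(\texttt{<A>}x\texttt{</A>},\ A) = \{\, \texttt{<A>}x\texttt{</A>} \,\}$

$\mathrm{PATH}(x_1\ x_2,\ path) = \mathrm{PATH}(x_1,\ path) \cup \mathrm{PATH}(x_2,\ path)$

$\mathrm{PATH}(x,\ path) = \emptyset$   otherwise

  • Predicates have existential semantics:

$\$v/A/B = \text{"text"} \;\equiv\; \exists x \in \mathrm{PATH}(v,\ A/B) : x = \text{"text"}$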
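The PATH rules above can be sketched as a short recursive function. This is an illustration under assumptions: XML data is modelled as a list of nodes, each node being either a text string or a `(tag, children)` pair; a simple XPath is a non-empty list of tag names; results are returned as a list rather than a set; and the helpers `text_of` and `exists_eq` (including the whitespace stripping) are hypothetical conveniences for the existential predicate.

```python
def path(data, steps):
    """Evaluate a simple path (non-empty list of tag names) over a node list."""
    out = []
    for node in data:  # PATH(x1 x2, p) is the union of PATH(x1, p) and PATH(x2, p)
        if isinstance(node, tuple) and node[0] == steps[0]:
            if len(steps) == 1:
                out.append(node)  # PATH(<A>x</A>, A) = { <A>x</A> }
            else:
                out.extend(path(node[1], steps[1:]))  # PATH(<A>x</A>, A/p) = PATH(x, p)
        # every other node contributes the empty set
    return out


def text_of(node):
    """Concatenate all text below a node."""
    if isinstance(node, str):
        return node
    return "".join(text_of(child) for child in node[1])


def exists_eq(data, steps, value):
    """Existential semantics: true if some node reached by the path equals value."""
    return any(text_of(x).strip() == value for x in path(data, steps))


doc = [("book", [("publisher", ["Addison-Wesley"]), ("publisher", ["Other"])])]
print(path(doc, ["book", "publisher"]))
print(exists_eq(doc, ["book", "publisher"], "Addison-Wesley"))
```

The second call succeeds because one of the two publishers matches, even though the other does not, which is what existential semantics means in practice.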

The Streamed XML Algebra

Much like the stored XML algebra, but works on streams.

A stream $s$ takes one of two forms:

  • $t ; s'$   a fragment $t$ followed by the rest of the stream $s'$
  • $eos$   end-of-stream

Each stored XML algebraic operator has a streamed counterpart, e.g.:

$\overline{\sigma}_{pred}(t ; s) = t ; \overline{\sigma}_{pred}(s)$   if $pred$ is true for $t$

$\overline{\sigma}_{pred}(t ; s) = \overline{\sigma}_{pred}(s)$   otherwise

$\overline{\sigma}_{pred}(eos) = eos$

but we may not be able to validate $pred$ due to holes in $t$.
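Ignoring holes for the moment, the three selection rules above map naturally onto a lazy generator: each fragment is either passed on or dropped, and the generator ends when the input does. This is a sketch only; the name `stream_select` and the integer stand-ins for fragments are assumptions for the example.

```python
from itertools import count, islice


def stream_select(pred, stream):
    """Streamed selection: pass on each fragment t for which pred(t) holds."""
    for t in stream:      # the stream has the form t ; rest
        if pred(t):
            yield t       # emit t, then continue with the rest of the stream
    # reaching the end of the input corresponds to eos


# Works on an unbounded stream because nothing is buffered.
evens = stream_select(lambda n: n % 2 == 0, count())
print(list(islice(evens, 3)))
```

Because the operator keeps no state, its memory use does not grow with the length of the stream; the complication introduced by holes is that `pred` may not be decidable when `t` arrives.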

Streamed Algebra Semantics
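  • To keep the suspended fragments, each streamed algebraic operator has
    • one state $S_0$ for the output and
    • optional state(s) $S_1$/$S_2$ for the input(s)
  • The result of PATH may now be unspecified:

$\mathrm{PATH}(\texttt{<hole id="m" …>},\ path) = \mathrm{PATH}(S_1(m),\ path)$   if $m \in S_1$

$\mathrm{PATH}(\texttt{<hole id="m" …>},\ path) = \{\, \bot \,\}$   otherwise

  • When used in predicates, $\bot$ requires 3-valued logic
  • Incomplete fragments are suspended when necessary, e.g.:

$\overline{\sigma}_{pred}(t ; s) = t ; \overline{\sigma}_{pred}(s)$   if $true \in \mathrm{PATH}(t,\ pred)$

$\overline{\sigma}_{pred}(t ; s) = \overline{\sigma}_{pred}(s)$   otherwise

$S_0 \leftarrow S_0 \cup \{t\}$   if $\bot \in \mathrm{PATH}(t,\ pred)$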


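Much like main-memory symmetric join

  • states:
    • $S_0$: all suspended output tuples due to unfilled holes
    • $S_1$: all tuples from left stream
    • $S_2$: all tuples from right stream
  • a tuple from left stream:

$(t_1 ; s_1) \,\overline{\bowtie}_{pred}\, s_2 = \{\, t_1 \circ t_2 \mid t_2 \in S_2,\ true \in \mathrm{PATH}(t_1 \circ t_2,\ pred) \,\} ; (s_1 \,\overline{\bowtie}_{pred}\, s_2)$

$S_1 \leftarrow S_1 \cup \{t_1\}$

$S_0 \leftarrow S_0 \cup \{\, t_1 \circ t_2 \mid t_2 \in S_2,\ \bot \in \mathrm{PATH}(t_1 \circ t_2,\ pred) \,\}$

  • a tuple from right stream:

$s_1 \,\overline{\bowtie}_{pred}\, (t_2 ; s_2) = \{\, t_1 \circ t_2 \mid t_1 \in S_1,\ true \in \mathrm{PATH}(t_1 \circ t_2,\ pred) \,\} ; (s_1 \,\overline{\bowtie}_{pred}\, s_2)$

$S_2 \leftarrow S_2 \cup \{t_2\}$

$S_0 \leftarrow S_0 \cup \{\, t_1 \circ t_2 \mid t_1 \in S_1,\ \bot \in \mathrm{PATH}(t_1 \circ t_2,\ pred) \,\}$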

Reconstructing the XML Data

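$\rho$ : set(int $\times$ XML-data) is an environment that binds filler ids to XML.

$x \triangleleft \rho$ replaces holes with fillers in $x$ using the environment $\rho$:

$\texttt{<A>}\, x\, \texttt{</A>} \triangleleft \rho = \texttt{<A>}\, (x \triangleleft \rho)\, \texttt{</A>}$

$(x_1\ x_2) \triangleleft \rho = (x_1 \triangleleft \rho)\ (x_2 \triangleleft \rho)$

$\texttt{<hole id="m" …>} \triangleleft \rho = \rho[m]$   if $m \in \rho$

$x \triangleleft \rho = x$   otherwise

$R(s)$ returns a pair $(a, \rho)$, where $\rho$ is the environment and $a$ is $\rho[0]$ (the reconstructed data):

if $R(s) = (a, \rho)$ then

$$R(\texttt{<filler id="m">}\, x\, \texttt{</filler>} ; s) = \begin{cases} (x \triangleleft \rho,\ \rho) & \text{if } m = 0 \\ (a',\ \rho') & \text{if } m \neq 0, \text{ where } \rho' = \{(m,\ x \triangleleft \rho)\} \cup \rho[m/x] \end{cases}$$

$R(eos) = (\emptyset, \emptyset)$

Basically, $R(t ; s) = f(R(s))$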
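Hole substitution and reconstruction can be illustrated with a simplified two-pass version: first collect every filler into an environment, then fill holes recursively starting from filler 0. This differs from the recursive stream-at-a-time definition above and assumes a finite stream in which each filler id appears once and fillers do not refer to each other cyclically. The node model (`str`, `(tag, children)` pairs, `Hole`), the function names, and the sample data are assumptions for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hole:
    id: int


def fill(x, env):
    """Replace holes in the node list x with their fillers from env."""
    out = []
    for node in x:
        if isinstance(node, Hole):
            if node.id in env:
                out.extend(fill(env[node.id], env))  # fillers may contain holes
            else:
                out.append(node)                     # unfilled holes stay as they are
        elif isinstance(node, tuple):
            tag, children = node
            out.append((tag, fill(children, env)))
        else:
            out.append(node)                         # text node
    return out


def reconstruct(stream):
    """Return (document, environment) for a finite stream of (id, content) fillers."""
    env = {m: x for m, x in stream}
    return fill(env.get(0, []), env), env


stream = [
    (0, [("vendor", [("name", ["Wal-Mart"]), Hole(10), Hole(20)])]),
    (20, [("item", [("name", ["Calculator"])])]),   # arrives out of sequence
    (10, [("item", [("name", ["PDA"])])]),
]
doc, env = reconstruct(stream)
print(doc)
```

Because holes are matched to fillers by id, the out-of-order arrival of fillers 20 and 10 does not affect the reconstructed document.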

Equivalence Between Stored & Streamed Algebras

If we reconstruct the XML document from the streamed fragments and evaluate a query using the stored algebra, we get the same result as when we use the equivalent streamed algebra over the streamed XML fragments and reconstruct the result.


[Diagram labels: XML document, stored XML algebra, streamed XML algebra, XML fragments]

Proof sketch: We prove $R(\overline{p}(s)) = p(R(s))$ inductively, where $\overline{p}$ is the stream version of $p$. If $true \in \mathrm{PATH}(t,\ pred)$, then $R(\overline{p}(t ; s)) = R(t ; \overline{p}(s)) = f(R(\overline{p}(s))) = f(p(R(s))) = p(f(R(s))) = p(R(t ; s))$ …

  • Fragmented XML data are easier to handle and synchronize than an infinitely long stream
  • Associating holes with fillers takes care of out-of-sequence transmission, repetitions, replacements, and removals
  • Our streamed algebra has operators similar to those of our stored algebra, but with different semantics
  • Our algebra can capture most non-recursive XQueries
  • Our future work includes
    • the development of main-memory algorithms for processing XML data streams under memory and power constraints
    • the development of a comprehensive approach to optimizing XQueries that utilizes our main-memory algorithms.