
A Unified Model for XQuery Evaluation over XML Data Streams

Jinhui Jian, Hong Su, Elke A. Rundensteiner. Worcester Polytechnic Institute. ER 2003.


Presentation Transcript


  1. A Unified Model for XQuery Evaluation over XML Data Streams. Jinhui Jian, Hong Su, Elke A. Rundensteiner. Worcester Polytechnic Institute. ER 2003.

  2. Need for Stream Processing • New environment: data sources are everywhere; data requests are everywhere • New applications: sensor networks; analysis of XML web logs; selective dissemination of XML information (e.g., news)

  3. Specific Challenges for XML Streams • Token-by-token access manner • Pattern retrieval + filtering/restructuring • Token: not a direct counterpart of a tuple. Sample stream (tokens arrive along a timeline): <biditems> <book year="2001"> <title>Dream Catcher</title> <author><last>King</last><first>S.</first></author> <publisher>Bt Bound</publisher> <price>20</price> </book> … Query: FOR $b in doc(biditems.xml)//book LET $p := $b/price/text(), $t := $b/title WHERE $p < 30 RETURN <Inexpensive>$t</Inexpensive>
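To make the token-by-token access manner concrete, here is a minimal Python sketch that turns an incrementally arriving XML text into a flat stream of start, text, and end tokens. It is not taken from the talk: the `tokenize` function, the `(kind, value)` token shape, and the abbreviated sample data are assumptions made for illustration, and text is only reported for leaf elements.

```python
import xml.etree.ElementTree as ET

# Sample data abbreviated from the slide's <biditems> example.
SAMPLE = (
    "<biditems><book year='2001'><title>Dream Catcher</title>"
    "<price>20</price></book></biditems>"
)


def tokenize(chunks):
    """Yield (kind, value) tokens as XML text arrives chunk by chunk.

    Simplification (an assumption of this sketch): element text is emitted
    just before the end token, which is only correct for leaf elements.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                yield ("start", elem.tag)
            else:
                text = (elem.text or "").strip()
                if text:
                    yield ("text", text)
                yield ("end", elem.tag)


if __name__ == "__main__":
    # Feed the document in small slices to imitate a stream.
    pieces = [SAMPLE[i:i + 16] for i in range(0, len(SAMPLE), 16)]
    for token in tokenize(pieces):
        print(token)
```

No single token here corresponds to a tuple such as (title, price): the values that belong together are spread over several tokens, which is the challenge the slide points at.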

  4. Two Computation Paradigms • Automata-based [yfilter02, x-scan01, xsm02, xsq03, xpush03, …] • Algebraic [niagara00, …] • This project intends to integrate both paradigms into one.

  5. Automata Paradigm • Auxiliary structures for: buffering data; evaluating predicates; restructuring buffered data; … Query: FOR $b in stream(biditems.xml)//book LET $p := $b/price/text(), $t := $b/title WHERE $p < 30 RETURN <Inexpensive>$t</Inexpensive> [Figure: automaton with states 1 to 5 and transitions labeled *, book, title, price, and text(), recognizing the patterns //book, //book/title, and //book/price/text()]
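The pattern-retrieval part of the automata paradigm can be sketched as a small nondeterministic automaton run over start and end tokens with a stack of active state sets. This is only an illustration under assumptions: the state numbering loosely follows the slide's figure, the text() step is left out, and the names `TRANSITIONS`, `ACCEPTING`, and `run` are hypothetical rather than part of any system cited in the talk.

```python
# Transition table for the patterns //book, //book/title and //book/price.
# State numbers are an assumption loosely based on the slide's figure.
TRANSITIONS = {
    (1, "book"): 2,
    (2, "title"): 4,
    (2, "price"): 3,
}
SELF_LOOP = {1}  # the "*" self-loop on state 1 encodes the leading "//"
ACCEPTING = {2: "//book", 4: "//book/title", 3: "//book/price"}

SAMPLE_TOKENS = [
    ("start", "biditems"), ("start", "book"),
    ("start", "title"), ("text", "Dream Catcher"), ("end", "title"),
    ("start", "price"), ("text", "20"), ("end", "price"),
    ("end", "book"), ("end", "biditems"),
]


def run(tokens):
    """Yield (pattern, tag) whenever a start tag reaches an accepting state."""
    stack = [{1}]  # one set of active states per open element
    for kind, value in tokens:
        if kind == "start":
            active = set()
            for state in stack[-1]:
                if state in SELF_LOOP:
                    active.add(state)
                target = TRANSITIONS.get((state, value))
                if target is not None:
                    active.add(target)
            stack.append(active)
            for state in sorted(active):
                if state in ACCEPTING:
                    yield (ACCEPTING[state], value)
        elif kind == "end":
            stack.pop()  # restore the states of the parent element


if __name__ == "__main__":
    for match in run(SAMPLE_TOKENS):
        print(match)
```

The sketch only reports matches. Buffering the matched data, testing `$p < 30`, and building the `<Inexpensive>` result are exactly the auxiliary structures the slide lists as extra work around the automaton.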

  6. Algebraic Computation • Selection push-down enabled. Query: FOR $b in doc(biditems.xml)//book LET $p := $b/price/text(), $t := $b/title WHERE $p < 30 RETURN <Inexpensive>$t</Inexpensive> [Figure: two algebraic plans built from the operators Navigate //book, /price; Navigate //book, /title; Select price < 30; and Tagger, evaluated over an XML tree of book elements with title, author (last, first), publisher, and price children]
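As a rough illustration of the algebraic paradigm, the sketch below evaluates the slide's query over a fully parsed document with three tuple-at-a-time operators. The operator names follow the slide, but their Python signatures, the dictionary-based tuples, and the second sample book are assumptions of this sketch, not the NIAGARA or Raindrop implementation.

```python
import xml.etree.ElementTree as ET

# The second book is made-up sample data so that the selection filters something.
DOC = ET.fromstring(
    "<biditems>"
    "<book><title>Dream Catcher</title><price>20</price></book>"
    "<book><title>Costly Tome</title><price>45</price></book>"
    "</biditems>"
)


def navigate(tuples, src, path, dst):
    """For each tuple, bind column dst to every node reached from column src."""
    for t in tuples:
        for node in t[src].findall(path):
            yield {**t, dst: node}


def select(tuples, predicate):
    """Keep only the tuples that satisfy the predicate."""
    return (t for t in tuples if predicate(t))


def tagger(tuples, tag, col):
    """Wrap the node bound to col in a new element named tag."""
    for t in tuples:
        out = ET.Element(tag)
        out.append(t[col])
        yield out


def plan(doc):
    """Plan with selection push-down: filter on price before fetching titles."""
    books = navigate([{"root": doc}], "root", ".//book", "b")
    priced = navigate(books, "b", "price", "p")
    cheap = select(priced, lambda t: float(t["p"].text) < 30)
    titled = navigate(cheap, "b", "title", "t")
    return tagger(titled, "Inexpensive", "t")


if __name__ == "__main__":
    for result in plan(DOC):
        print(ET.tostring(result, encoding="unicode"))
```

Because every operator consumes and produces tuples, reordering `select` and the title `navigate` is a local change to `plan`. The sketch also shows the limitation the talk raises: it needs the whole tree in memory before the first operator can run.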

  7. Observations • Automata paradigm: well suited to, and long studied for, pattern retrieval on tokens; needs patches for complex filtering and restructuring • Algebraic paradigm: well suited to, and long studied for, expressing and optimizing query plans on sets of tuples; does not yet accommodate tokenized inputs • Each paradigm has deficiencies • The two paradigms complement each other

  8. Research Challenges • How to integrate the two models? • How to optimize a query within the integrated query model?

  9. Raindrop Approach: Uniform Modeling in an Algebraic Framework

  10. Uniform Algebraic Plan [Figure: the XML data stream feeds the algebraic plan, which produces the query answer]

  11. Uniform Algebraic Plan [Figure: the XML data stream feeds the token-based plan (automata plan), which emits a tuple stream to the tuple-based plan, which produces the query answer]

  12. Modeling the Automata in Algebraic Plan: Black Box [xscan] vs. White Box. Query: FOR $b in stream(biditems.xml)//book LET $p := $b/price/text(), $t := $b/title WHERE $p < 30 RETURN <Inexpensive>$t</Inexpensive> • Black box: a single Xscan operator producing $b := //book, $p := $b/price, $t := $b/title • White box: SJoin //book over Extract //book/price and Extract //book/title

  13. A Unified Process at the Logical View. Query: FOR $b in doc(biditems.xml)//book LET $p := $b/price/text(), $t := $b/title WHERE $p < 30 RETURN <Inexpensive>$t</Inexpensive> [Figure: the query plan split into a tuple-based plan and a token-based plan (automata plan)]

  14. A Unified Process at the Logical View. Query: FOR $b in doc(biditems.xml)//book LET $p := $b/price/text(), $t := $b/title WHERE $p < 30 RETURN <Inexpensive>$t</Inexpensive> [Figure: plan with the operators SJoin //book; Extract $p, //book/price; Extract $t, //book/title; and the tuple-based plan]

  15. A Unified Process at the Logical View. Query: FOR $b in doc(biditems.xml)//book LET $p := $b/price/text(), $t := $b/title WHERE $p < 30 RETURN <Inexpensive>$t</Inexpensive> [Figure: plan with the operators Navigate //book, //book/title; Select //book/price > 50; SJoin //book; Extract //book/price; Extract //book/title]

  16. The Algebra Core • Relational-like operators • XML-specific operators [Figure: operator symbols; only the label SJ survives]

  17. Extract Operator. Example: Extract //book/title [Figure: automaton with a * self-loop and transitions labeled book and title, run over the stream <bib> <book> <title>Dream Catcher</title> … </book> …]

  18. Structural Join Operator. Query: FOR $b in doc(biditems.xml)//book LET $p := $b/price/text(), $t := $b/title WHERE $p < 30 RETURN <Inexpensive>$t</Inexpensive> Plan: SJoin //book over Extract //book/title and Extract //book/price [Figure: automaton with states 1 to 4 and transitions labeled *, book, title, and price, run over the stream <biditems> <book> <title>Dream Catcher</title> … </book> …]
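The interplay of the Extract and structural join operators can be sketched in a few lines of Python over a token stream. This is a heavily simplified reading of the two slides, not the Raindrop operators themselves: the extracted values are plain text rather than subtrees, nested books are ignored, and `sjoin_books`, the token shape, and the second sample book are assumptions made for illustration.

```python
# Token stream for two books; the second one (made-up data) has no price.
TOKENS = [
    ("start", "biditems"),
    ("start", "book"),
    ("start", "title"), ("text", "Dream Catcher"), ("end", "title"),
    ("start", "price"), ("text", "20"), ("end", "price"),
    ("end", "book"),
    ("start", "book"),
    ("start", "title"), ("text", "Untitled Draft"), ("end", "title"),
    ("end", "book"),
    ("end", "biditems"),
]


def sjoin_books(tokens):
    """On each </book>, join the titles and prices extracted inside that book."""
    stack = []               # open tags, outermost first
    titles, prices = [], []  # buffers filled by the two Extract operators
    for kind, value in tokens:
        if kind == "start":
            stack.append(value)
        elif kind == "text":
            if stack[-2:] == ["book", "title"]:
                titles.append(value)   # Extract //book/title
            elif stack[-2:] == ["book", "price"]:
                prices.append(value)   # Extract //book/price
        elif kind == "end":
            stack.pop()
            if value == "book":        # SJoin //book fires when the book closes
                for t in titles:
                    for p in prices:
                        yield {"t": t, "p": p}
                titles, prices = [], []


if __name__ == "__main__":
    for row in sjoin_books(TOKENS):
        print(row)
```

The join condition here is structural rather than value-based: a title and a price are paired because they were extracted between the same `<book>` start and end tokens. The output is a stream of ordinary tuples that relational-like operators such as Select can consume.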

  19. Optimization via Query Rewriting

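  20. In or Out? Should pattern retrieval run inside the token-based plan (automata plan) or in the tuple-based plan? [Figure: the XML data stream feeds the token-based plan (automata plan), which emits a tuple stream to the tuple-based plan, which produces the query answer]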

  21. Plan Alternatives • The pull-out plan: Tagger; Select price < 30; Navigate /price; Navigate book/title; Extract //book • The push-in plan: Tagger; Select price < 30; SJoin //book; Extract //book/title; Extract //book/price
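The two plan shapes can be compared with a small Python sketch. It assumes that "pull-out" means extracting whole book elements from the stream and navigating to title and price afterwards, and that "push-in" means extracting titles and prices during the scan and joining them when a book closes. The function names, the sample data, and the use of `iterparse` as a stand-in for the automaton are all assumptions of this sketch.

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample data: one book under the price threshold, one above it.
XML = (
    "<biditems>"
    "<book><title>Dream Catcher</title><price>20</price></book>"
    "<book><title>Costly Tome</title><price>45</price></book>"
    "</biditems>"
)


def end_events(xml_text):
    """Yield each element once it has been read completely from the stream."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _, elem in ET.iterparse(source, events=("end",)):
        yield elem


def pull_out(xml_text):
    """Extract whole //book subtrees; navigate to price and title afterwards."""
    answers = []
    for elem in end_events(xml_text):
        if elem.tag == "book":                                   # Extract //book
            price = elem.find("price")                           # Navigate /price
            if price is not None and float(price.text) < 30:     # Select
                answers.append(elem.find("title").text)          # Navigate /title
    return answers


def push_in(xml_text):
    """Extract titles and prices during the scan; join them at each </book>.

    Simplification: elements are matched by tag name only, not by full path.
    """
    answers, titles, prices = [], [], []
    for elem in end_events(xml_text):
        if elem.tag == "title":                                  # Extract //book/title
            titles.append(elem.text)
        elif elem.tag == "price":                                # Extract //book/price
            prices.append(elem.text)
        elif elem.tag == "book":                                 # SJoin //book
            answers += [t for t in titles for p in prices if float(p) < 30]
            titles, prices = [], []
    return answers


if __name__ == "__main__":
    print(pull_out(XML))
    print(push_in(XML))
```

Both functions return the same answers for this input. They differ in where the work happens, which is the trade-off the talk's experiments on selectivity and number of patterns examine.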

  22. Pattern Retrieval Alternatives • In automata (/title, /price) • Out of automata (/title, /price) [Figure: the sample stream <book year="2001"> <title>Dream Catcher</title> <author> <last>King</last> <first>S.</first> </author> <publisher>Bt Bound</publisher> <price>20</price> </book> processed under both alternatives, each with its own automaton; the outputs shown are <book>, <title>, and <price> elements, with an SJ operator combining them]

  23. Experiment [Figure: two result plots, one for selectivity = 5% and one for selectivity = 90%]

  24. Related Work

  25. Camp 1: Complete Automata Model [XSQ, XSM, XPush]. Query: FOR $x in $R/a RETURN FOR $y in $x/b RETURN <res>$y, $x</res> [Figure: the complete automaton for this query, with states labeled by triples such as 0,0,0; 1,0,0; 2,1,0; 2,2,1 and low-level transitions of the form condition|actions, e.g. *r=<a>|w(x,sx),w(x,<a>),r++ and *r=</b>|w(x,</b>),w(o,</b>),r++]

  26. Camp 1: Complete Automata Model [XSQ, XSM, XPush] • All details are presented at the same (low) level • Hard to understand • Not suitable for optimizing at different levels • Little study of automata as a query processing paradigm

  27. Camp 2: Automata-Algebra Loosely Coupled Model [Tukwila, YFilter] • Fixed interface for automata computation (all pattern retrieval pushed down) • No opportunity for pushing computation into, or pulling it out of, the automata • Bloated, black box operator • Algebraic rewriting impossible for internal optimization [Figure: an automata plan computing $b := //book, $p := //book/price, $t := //book/title and passing $b, $p, $t upward]

  28. Contributions • Combining automata and algebra leads to a powerful query processing model • Modeling: uniform, simple logical view, giving better understandability • Optimization: uniform rewriting, giving more optimization opportunities (e.g., push-in/pull-out) • The need for optimization is verified by experiments

  29. Project page: http://davis.wpi.edu/dsrg/raindrop/ (project overview, publications, talks). Email: suhong@cs.wpi.edu

  30. Experiment 2 [Figure: two result plots, one for number of patterns = 2 and one for number of patterns = 20]
