
RAINDROP: XML Stream Processing Engine

Murali Mani, WPI @UPenn, DB seminar June 08, 2006. Partially Supported by NSF grant IIS 0414567.


Presentation Transcript


  1. RAINDROP: XML Stream Processing Engine
     Murali Mani, WPI
     @UPenn, DB seminar, June 08, 2006
     Partially Supported by NSF grant IIS 0414567

  2. Acknowledgements
     • NSF for the financial support
     • Joint work with several others
     • Prof. Elke A. Rundensteiner
     • Graduate students – Hong Su, Ming Li, Mingzhu Wei, Shoushen Wang, Jinhui Jian
     • Undergraduate students – Drew Ditto, Bogomil Tselkov
     • …
     DSRG, WPI

  3. Applications
     • Need for efficient stream data processing
     • Monitor patient data in real time
     • Sensor networks – fire detection; battlefield deployment; traffic congestion
     • Others – news delivery, monitor network traffic, …

  4. XML Stream Processing
     Token-by-token access manner. Pattern retrieval + Filtering + Restructuring.

     Query:
       for $a in open_auctions/auction[privacy]
       let $e := $a/description/emph
       where $e = "French Impressionism"
       return <InterestedAuction> $a, $e </InterestedAuction>

     Input stream:
       <open_auctions>
         <auction>
           <privacy>No</privacy>
           <description> Calendar of <emph>French Impressionism</emph> by <emph>Monet</emph> </description>
           <initial> $20 </initial>
         </auction>
         …

     Tokens shown on the timeline: <open_auctions> <auction> <privacy> No
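The token-by-token access manner on this slide can be sketched in Python. This is only an illustration, not Raindrop's tokenizer: the function name `tokenize` and the `(kind, value)` tuple shape are assumptions made for the example, and the standard-library expat parser stands in for whatever the engine actually uses. The input is fed in chunks, and tokens are handed downstream before the rest of the stream has been read.

```python
from xml.parsers import expat


def tokenize(chunks):
    """Yield ('start' | 'text' | 'end', value) tokens as chunks of XML text arrive."""
    pending = []
    parser = expat.ParserCreate()
    parser.buffer_text = True  # coalesce adjacent character data within one Parse call

    def on_text(data):
        if data.strip():  # skip whitespace-only text between tags
            pending.append(("text", data.strip()))

    parser.StartElementHandler = lambda name, attrs: pending.append(("start", name))
    parser.EndElementHandler = lambda name: pending.append(("end", name))
    parser.CharacterDataHandler = on_text

    for chunk in chunks:
        parser.Parse(chunk, False)  # not final: more input may follow
        yield from pending          # hand tokens downstream before reading more
        pending.clear()
    parser.Parse("", True)          # signal the end of the stream
    yield from pending


# Hypothetical chunking of the auction document from the slide.
stream = [
    "<open_auctions><auction>",
    "<privacy>No</privacy>",
    "<description>Calendar of <emph>French Impressionism</emph></description>",
    "</auction></open_auctions>",
]
for token in tokenize(stream):
    print(token)
```

The first tokens produced correspond to the ones on the slide's timeline: the start of open_auctions, the start of auction, the start of privacy, then the text No.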

  5. Option 1: Automata-Based Pattern Retrieval
     [Figure: automaton with states 0 to 5; transitions labelled auctions, auction, privacy, description, emph]

     for $a in open_auctions/auction[privacy]
     let $e := $a/description/emph
     where $e = "French Impressionism"
     return <Auction> $a, $e </Auction>

     • Additional Data Structures for
       • Buffering
       • Filtering
       • Restructuring
       • …
     When patterns are retrieved depends on the data
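A minimal sketch of automaton-based pattern retrieval for the query on this slide, assuming tokens are `(kind, value)` tuples. The state numbering, the names `TRANSITIONS`, `ACCEPTING` and `run_automaton`, and the variable `$p` for the privacy pattern are choices made for this example (the figure's exact numbering is not recoverable from the transcript). A stack of states follows the nesting of the stream, so a pattern is reported at whatever moment the data happens to deliver it.

```python
# Transitions for open_auctions/auction, auction/privacy and auction/description/emph.
TRANSITIONS = {
    (0, "open_auctions"): 1,
    (1, "auction"): 2,
    (2, "privacy"): 3,
    (2, "description"): 4,
    (4, "emph"): 5,
}
# States whose entry means a pattern of the query has been found.
ACCEPTING = {2: "$a", 3: "$p", 5: "$e"}


def run_automaton(tokens):
    """Return (variable, tag) pairs in the order the patterns are recognised."""
    stack = [0]      # one state per currently open element
    matches = []
    for kind, value in tokens:
        if kind == "start":
            state = TRANSITIONS.get((stack[-1], value))  # None acts as a dead state
            stack.append(state)
            if state in ACCEPTING:
                matches.append((ACCEPTING[state], value))
        elif kind == "end":
            stack.pop()  # return to the state of the parent element
    return matches


tokens = [
    ("start", "open_auctions"), ("start", "auction"),
    ("start", "privacy"), ("text", "No"), ("end", "privacy"),
    ("start", "description"), ("text", "Calendar of"),
    ("start", "emph"), ("text", "French Impressionism"), ("end", "emph"),
    ("text", "by"),
    ("start", "emph"), ("text", "Monet"), ("end", "emph"),
    ("end", "description"),
    ("start", "initial"), ("text", "$20"), ("end", "initial"),
    ("end", "auction"), ("end", "open_auctions"),
]
matches = run_automaton(tokens)
print(matches)
```

The sketch only recognises patterns. The buffering, filtering and restructuring that the slide lists as additional data structures are left out.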

  6. Option 2: "DOM" Based Pattern Retrieval
     for $a in open_auctions/auction[privacy]
     let $e := $a/description/emph
     where $e = "French Impressionism"
     return <Auction> $a, $e </Auction>

     Logic Plan (top-down):
       Tagger
       Select $e = "French Impressionism"
       Navigate $a, /privacy -> $p
       Navigate $a, /description/emph -> $e

     Rewrite by "pushing down selection" gives the Rewritten Logic Plan:
       Tagger
       Navigate $a, /privacy -> $p
       Select $e = "French Impressionism"
       Navigate $a, /description/emph -> $e

     Choose low-level implementation alternatives to obtain the Physical Plan:
       Tagger
       Navigate-Scan $a, /privacy -> $p
       Select $e = "French Impressionism"
       Navigate-Index $a, /description/emph -> $e

     When patterns are retrieved depends on other patterns
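The "pushing down selection" rewrite on this slide can be illustrated with a small sketch. The classes `Navigate` and `Select` and the function `push_down_selections` are hypothetical names for this example, not Raindrop's API. A plan is modelled as a bottom-up list of operators (the first one runs first), with the Tagger left out for brevity. The rewrite places each Select directly after the Navigate that binds the variable it tests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Navigate:
    source: str   # variable the path starts from, e.g. "$a"
    path: str     # path expression, e.g. "/description/emph"
    binds: str    # variable bound to the result, e.g. "$e"


@dataclass(frozen=True)
class Select:
    var: str       # variable being tested
    constant: str  # value it is compared with


def push_down_selections(plan):
    """Return the plan with every Select moved right after the Navigate binding its variable."""
    pending = [op for op in plan if isinstance(op, Select)]
    result = []
    for op in plan:
        if isinstance(op, Select):
            continue  # re-inserted below, next to its Navigate
        result.append(op)
        if isinstance(op, Navigate):
            result.extend(s for s in pending if s.var == op.binds)
            pending = [s for s in pending if s.var != op.binds]
    return result + pending  # Selects on unbound variables keep their place at the end


logic_plan = [
    Navigate("$a", "/description/emph", "$e"),
    Navigate("$a", "/privacy", "$p"),
    Select("$e", "French Impressionism"),
]
rewritten = push_down_selections(logic_plan)
for op in rewritten:
    print(op)
```

After the rewrite the selection on $e runs before the navigation to /privacy, so auctions that fail the selection never pay for the second navigation.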

  7. Which paradigm is better?
     Minimal pushdown plans win over maximal pushdown when selectivity < 50%

  8. Problem
     • How to provide the framework to choose between these paradigms?
     • Model both paradigms uniformly as algebraic operators.
     • Use a cost model to choose optimal plan given data statistics.

  9. Automaton as TokenNav
     [Figure: automaton with states 0 to 5; transitions labelled auctions, auction, privacy, description, emph]

     for $a in open_auctions/auction[privacy]
     let $e := $a/description/emph
     where $e = "French Impressionism"
     return <Auction> $a, $e </Auction>

     Plan operators (from the figure):
       StructuralJoin $a
       Select $e = "French …"
       Select non-empty($b)
       Extract $a
       Extract $b
       Extract $e
       TokenNav $a, /privacy -> $b
       TokenNav $a, /desc/emph -> $e
       TokenNav $s, /auctions/auction -> $a

  10. DOM Navigation as NodeNav
      [Figure: automaton with states 0 to 2; transitions labelled auctions, auction]

      for $a in open_auctions/auction[privacy]
      let $e := $a/description/emph
      where $e = "French Impressionism"
      return <Auction> $a, $e </Auction>

      Plan operators (from the figure):
        Select $e = "French …"
        Select non-empty($b)
        NodeNav $a, /privacy -> $b
        NodeNav $a, /desc/emph -> $e
        Extract $a
        TokenNav $s, /auctions/auction -> $a

  11. Exploring the Search Space
      • A pattern can be retrieved inside the automaton or outside the automaton
      • However there are dependencies
          for $a in …/a, $b in $a/…, $c in $b/…
          NodeNav for $b => NodeNav for $c
          TokenNav for $b => TokenNav/NodeNav for $c
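The dependency rule on this slide prunes the search space. A short sketch, with the hypothetical helper `valid_assignments`, enumerates every way to assign TokenNav or NodeNav to a chain of variables in which each one navigates from the previous one, keeping only the assignments in which a NodeNav is never followed by a TokenNav.

```python
from itertools import product


def valid_assignments(chain):
    """chain: variables ordered so that each one navigates from the one before it."""
    plans = []
    for choice in product(("TokenNav", "NodeNav"), repeat=len(chain)):
        # NodeNav for a variable forces NodeNav for everything navigated from it.
        allowed = all(
            not (parent == "NodeNav" and child == "TokenNav")
            for parent, child in zip(choice, choice[1:])
        )
        if allowed:
            plans.append(dict(zip(chain, choice)))
    return plans


plans = valid_assignments(["$b", "$c"])
for plan in plans:
    print(plan)
```

For the chain $b, $c this leaves three of the four combinations. In general a chain of n variables has n + 1 valid assignments rather than 2^n, because the only freedom is the point at which the plan switches from TokenNav to NodeNav.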

  12. Run-time Optimization
      • Statistics unknown before data arrives
      • Statistics could change over time
      • We need techniques for efficient statistics monitoring, search space exploration and plan migration (safe points for migration)

  13. Run-time Optimization
      • Create an initial plan
      • Run the initial plan and collect statistics at the same time
      • Generate a new plan using the statistics collected
      • Pause receiving stream
      • Migrate to new plan
      • Resume receiving stream
      [Figure: components labelled Stream, Query plan executor, statistics, Query Optimizer, Plan Migrator, Initial Query plan, New Query plan]
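The six steps above can be sketched as a loop. Everything concrete here is an assumption made for illustration: the plan names, the function names `choose_plan` and `run_with_migration`, treating a batch boundary as a safe point for migration, and reusing the 50% selectivity rule from the "Which paradigm is better?" slide in place of a real cost model.

```python
def choose_plan(selectivity):
    """Stand-in optimizer: the 50% rule from slide 7, not a real cost model."""
    return "minimal-pushdown" if selectivity < 0.5 else "maximal-pushdown"


def run_with_migration(batches, predicate, initial_plan="maximal-pushdown"):
    """Return (plan used for each batch, plan in force at the end)."""
    plan = initial_plan                  # step 1: create an initial plan
    seen = passed = 0
    history = []
    for batch in batches:
        history.append(plan)             # this batch runs under the current plan
        for item in batch:               # step 2: run and collect statistics
            seen += 1
            passed += bool(predicate(item))
        if seen:
            new_plan = choose_plan(passed / seen)  # step 3: generate a new plan
            if new_plan != plan:
                # steps 4 to 6: the stream is paused at the batch boundary,
                # the plan is swapped, and the next batch resumes under it
                plan = new_plan
    return history, plan


batches = [[1, 0, 0, 0], [0, 0], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
history, final_plan = run_with_migration(batches, bool)
print(history, final_plan)
```

In this run the observed selectivity after the first batch is 25%, so the second and third batches execute under the other plan. The statistics shift again during the third batch, which is the kind of change over time the previous slide refers to.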

  14. Executing a Raindrop Plan

  15. Key Ideas
      • Minimum Memory requirements
      • Discard data early
      • Output data early

  16. In-Time Structural Join
      [Figure: automaton with states 0 to 5; transitions labelled auctions, auction, privacy, description, emph]

      for $a in open_auctions/auction[privacy]
      let $e := $a/description/emph
      where $e = "French Impressionism"
      return <Auction> $a, $e </Auction>

      Plan operators (from the figure):
        StructuralJoin $a
        Select $e = "French …"
        Select non-empty($b)
        Extract $a
        Extract $b
        Extract $e
        TokenNav $a, /privacy -> $b
        TokenNav $a, /desc/emph -> $e
        TokenNav $s, /auctions/auction -> $a
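A rough sketch of the idea, assuming "in-time" means that the structural join for $a is invoked as soon as the end tag of the current auction arrives, so the buffers never hold more than one auction. The function name `in_time_join` and the `(kind, value)` token tuples are assumptions for this example, and only $b and $e are buffered (the extraction of the whole $a element is omitted).

```python
def in_time_join(tokens):
    """Yield one joined result per qualifying auction, right at its end tag."""
    path = []                  # names of the currently open elements
    privacy, emphs = [], []    # buffers for $b and $e of the current auction
    text = []
    for kind, value in tokens:
        if kind == "start":
            path.append(value)
            text = []
        elif kind == "text":
            text.append(value)
        elif kind == "end":
            if path == ["open_auctions", "auction", "privacy"]:
                privacy.append("".join(text))
            elif path == ["open_auctions", "auction", "description", "emph"]:
                emphs.append("".join(text))
            elif path == ["open_auctions", "auction"]:
                # The end of the auction triggers the join on $a.
                if privacy and "French Impressionism" in emphs:
                    yield {"privacy": list(privacy), "emph": list(emphs)}
                privacy.clear()   # discard data early: buffers are purged
                emphs.clear()     # before the next auction starts
            path.pop()
            text = []


sample = [
    ("start", "open_auctions"),
    ("start", "auction"),
    ("start", "privacy"), ("text", "No"), ("end", "privacy"),
    ("start", "description"), ("text", "Calendar of"),
    ("start", "emph"), ("text", "French Impressionism"), ("end", "emph"),
    ("text", "by"),
    ("start", "emph"), ("text", "Monet"), ("end", "emph"),
    ("end", "description"),
    ("end", "auction"),
    ("start", "auction"),      # second auction has no privacy child
    ("start", "description"),
    ("start", "emph"), ("text", "French Impressionism"), ("end", "emph"),
    ("end", "description"),
    ("end", "auction"),
    ("end", "open_auctions"),
]
results = list(in_time_join(sample))
print(results)
```

Only the first auction is emitted. The second one fails the non-empty($b) selection, and its buffered emph value is dropped as soon as its end tag is seen.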

  17. Better than In-Time Structural Join
      [Figure: automaton with states 0 to 3; transitions labelled root, a, b]

      for $r in /root
      return <root> <a>$r/a</a> <b>$r/b</b> </root>

      Plan operators (from the figure):
        StructuralJoin $r
        Extract $a
        Extract $b
        TokenNav $r, /a -> $a
        TokenNav $r, /b -> $b
        TokenNav $s, /root -> $r

      "a" tokens need not be stored

  18. Evaluating Predicates
      [Figure: automaton with states 0 to 3; transitions labelled root, a, b]

      for $r in /root
      where $r/a = "value"
      return <root> <b>$r/b</b> </root>

      Plan operators (from the figure):
        StructuralJoin $r
        Extract $b
        Select $a = "value"
        Extract $a
        TokenNav $r, /b -> $b
        TokenNav $r, /a -> $a
        TokenNav $s, /root -> $r

      Once $a = "value" is satisfied, "b" tokens need not be stored

  19. Using schema knowledge
      [Figure: automaton with states 0 to 3; transitions labelled root, a, b]

      Schema: root -> (a*, b*)

      for $r in /root
      return <root> <a>$r/a</a> <b>$r/b</b> </root>

      Plan operators (from the figure):
        StructuralJoin $a
        Extract $a
        Extract $b
        TokenNav $r, /a -> $a
        TokenNav $r, /b -> $b
        TokenNav $s, /root -> $r

      "a", "b" tokens need not be stored
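The difference between this slide and slide 17 can be sketched as follows. The function `restructure`, its flag `a_before_b`, and the simplification of the stream to `(tag, text)` pairs for the children of one root element are all assumptions made for the example. Without schema knowledge, "a" elements are written out on arrival while "b" elements wait, because another "a" might still follow. With the schema root -> (a*, b*), the first "b" proves that no further "a" can arrive, so "b" elements are written out immediately as well.

```python
def restructure(children, a_before_b=False):
    """Return (output, peak number of buffered elements) for one <root>.

    children:   (tag, text) pairs in arrival order.
    a_before_b: True when the schema guarantees root -> (a*, b*); the input
                is assumed to conform to that schema in this case.
    """
    out = ["<root><a>"]
    buffered_b = []
    peak = 0
    closed_a = False
    for tag, text in children:
        if tag == "a":
            out.append(f"<a>{text}</a>")          # output early, nothing stored
        elif tag == "b":
            if a_before_b:
                if not closed_a:                  # first b: the a-part is complete
                    out.append("</a><b>")
                    closed_a = True
                out.append(f"<b>{text}</b>")      # stream b straight through
            else:
                buffered_b.append(f"<b>{text}</b>")  # must wait for </root>
                peak = max(peak, len(buffered_b))
    if not closed_a:
        out.append("</a><b>")
    out.extend(buffered_b)
    out.append("</b></root>")
    return "".join(out), peak


children = [("a", "1"), ("a", "2"), ("b", "3")]
print(restructure(children))
print(restructure(children, a_before_b=True))
```

Both calls produce the same output document. Only the second one does so without ever holding a "b" element in memory.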

  20. Using Schema Knowledge for Predicates
      [Figure: automaton with states 0 to 3; transitions labelled root, a, b]

      Schema: root -> (b*, a*, c)

      for $r in /root
      where $r/a = "value"
      return <root> <b>$r/b</b> </root>

      Plan operators (from the figure):
        StructuralJoin $r
        Extract $b
        Select $a = "value"
        Extract $a
        TokenNav $r, /b -> $b
        TokenNav $r, /a -> $a
        TokenNav $s, /root -> $r

      Once "c" is seen and $a = "value" is not yet satisfied, "b" tokens can be discarded
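A sketch that combines the buffering rule of slide 18 with the discard rule of this slide. The function `filter_root` and its `terminator` parameter are hypothetical, and the stream is again simplified to `(tag, text)` pairs for the children of one root element. "b" elements are held back until some "a" equals "value"; from then on they stream straight to the output. If the schema names an element after which no "a" can occur (here "c"), the buffer is dropped as soon as that element is seen.

```python
def filter_root(children, terminator=None):
    """Return (output, discarded_early) for the children of one <root>.

    terminator: tag after which the schema allows no further "a" element,
                or None when no schema knowledge is available.
    """
    out, buffered_b = [], []
    satisfied = False
    for tag, text in children:
        if tag == "a" and text == "value" and not satisfied:
            satisfied = True
            out.append("<root><b>")
            out.extend(buffered_b)      # flush what was held back so far
            buffered_b.clear()
        elif tag == "b":
            piece = f"<b>{text}</b>"
            # After the predicate holds, b tokens need not be stored.
            (out if satisfied else buffered_b).append(piece)
        elif tag == terminator and not satisfied:
            buffered_b.clear()          # the predicate can no longer hold
            return "", True
    if not satisfied:
        return "", False                # buffer dropped only at </root>
    out.append("</b></root>")
    return "".join(out), False


print(filter_root([("b", "1"), ("a", "value"), ("b", "2")]))
print(filter_root([("b", "1"), ("a", "x"), ("c", "")], terminator="c"))
```

The second call returns an empty result and reports that the buffered "b" elements were discarded at "c" rather than at the end of root.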

  21. Conclusions
      • Raindrop integrates automaton and "DOM" navigation into one algebraic framework.
      • Cost-based optimization possible.
      • Execution minimizes memory requirements.

  22. Ongoing Work
      • Load shedding in XML stream processing.
      • Utilizing dynamic schema changes for optimization.

  23. Fragment of XQuery supported
      • FLWR expressions (no conditionals/user-defined functions)
      • Path expressions use only forward axes (child, descendant, descendant-or-self, attribute)
      • Predicates supported are of the form: pathExpr relOp constant

  24. Issues with correlated queries
      for $r in /root
      return <root>
               for $a in $r/a
               return <a>$r/b</a>
             </root>
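The slide does not spell out the issues, so the following is only one plausible reading. The inner return clause refers to $r/b, which belongs to the outer variable, so every "a" needs all "b" children of the same root, including those that arrive after it. A small sketch (the function name `correlated` and the `(tag, text)` input shape are assumptions) shows that no result for the first "a" can be completed before the whole root element has been read.

```python
def correlated(children):
    """Evaluate the correlated query over the (tag, text) children of one <root>."""
    a_count = 0
    all_b = []
    for tag, text in children:
        if tag == "a":
            a_count += 1                    # each a yields one <a>…</a> result
        elif tag == "b":
            all_b.append(f"<b>{text}</b>")  # every b must be kept until </root>
    # Only now is the full set of b children known; it is repeated once per a.
    inner = "<a>" + "".join(all_b) + "</a>"
    return "<root>" + inner * a_count + "</root>"


print(correlated([("a", "x"), ("b", "1"), ("a", "y"), ("b", "2")]))
```

Under this reading, the early output and early discard techniques from the previous slides do not apply directly: the "b" buffer has to survive until the end of root and is replicated for each "a".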

  25. Visit our XQuery engine over XML stream project (RAINDROP) website http://davis.wpi.edu/dsrg/raindrop/
