
Eddies for Continuous Queries

This project aims to make processing many continuous queries over endless data streams more efficient by adapting Eddies: sharing common work between modules, reducing the memory burden, and supporting intra-query scheduling. It also notes the need for new operators to deal with endless streams.


Presentation Transcript


  1. Eddies for Continuous Queries
  Sam Madden, CS286 Project, S01

  2. Motivation
  • Want many queries over continuous streams of data
  • Current Eddies:
    • Thread per query
    • Scanner per source
  • Share common work between modules
  • Reduce memory burden
  • Intra-query scheduling
  • (Not focusing on joins – need new operators to deal with endless streams)

  3. Data Structures
  • One Eddy per Telegraph instance
  • Only one Source module for each source (over all queries)
  • One Filter per source field (over all queries)
  • Per-Source State:
    • Source -> reachable modules
    • Query -> completion bitmask
  • Per-Tuple State:
    • Output query mask
  • Per-Query State:
    • Output queues
    • Aggregate information
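A minimal Python sketch of the shared state described above; all class and field names are illustrative, not Telegraph's actual data structures, and bitmasks are modeled as Python sets for readability.

    from dataclasses import dataclass, field

    @dataclass
    class EddyState:
        # per-source: ids of the filter modules reachable from each source
        reachable_modules: dict = field(default_factory=dict)   # source_id -> set of module ids
        # per-query: modules that must all be applied before the query is complete
        completion_mask: dict = field(default_factory=dict)     # query_id -> set of module ids
        # per-query output queues and running aggregate state
        output_queues: dict = field(default_factory=dict)       # query_id -> list of output tuples
        aggregates: dict = field(default_factory=dict)          # query_id -> aggregate information

    @dataclass
    class TupleState:
        source_id: int                                          # tag assigned when the tuple arrives
        values: dict                                            # field name -> value
        output_query_mask: set = field(default_factory=set)    # queries still eligible for output
        applied_modules: set = field(default_factory=set)      # filters already applied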

  4. Tuple Flow
  • Tuple arrives and is tagged with its source id
  • Routing policy chooses a filter to route to, based on the modules reachable from that source
  • The filter marks the query state as “output” for tuples which don’t pass
  • The tuple is output to queries which have completed, using the source’s completion bitmask
  • If there are more filters to check, the tuple is re-inserted into the eddy
  • Works for joins too (somewhat inefficiently?):
    • Extend the reachability graph across joins
    • Project out unused sources when tuples are output
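The loop below is a simplified, self-contained Python sketch of this flow, not Telegraph code: a filter is modeled as (field, predicate, affected queries), the routing choice is arbitrary, and a query receives the tuple once every module in its completion mask has been applied without its predicate failing.

    def route_tuple(tup, source_id, reachable, completion_mask, predicates):
        eligible = set(completion_mask)        # queries that may still receive the tuple
        applied = set()
        todo = set(reachable[source_id])       # filter modules reachable from this source
        while todo:
            m = todo.pop()                     # routing policy: arbitrary choice here
            fld, pred, queries = predicates[m]
            if not pred(tup[fld]):
                eligible -= queries            # these queries will never output this tuple
            applied.add(m)
        # output to queries whose required modules have all been applied and that still pass
        return {q for q in eligible if completion_mask[q] <= applied}

    # hypothetical usage: two queries over one source, filters on s.a and s.b
    reachable = {0: {0, 1}}
    completion_mask = {1: {0}, 2: {0, 1}}      # query 1 needs filter 0; query 2 needs 0 and 1
    predicates = {0: ("a", lambda v: v > 30, {1, 2}),
                  1: ("b", lambda v: v > 30, {2})}
    print(route_tuple({"a": 40, "b": 10}, 0, reachable, completion_mask, predicates))  # -> {1}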

  5. Combining Filters
  • Given a Filter F over some field S.a, with n predicates generalized to ranges [a, b] (plus not-equals)
  • Interval tree for >, >=, <, <= predicates, inserting the interval (a, ∞), [a, ∞), (-∞, b), or (-∞, b] (O(log n))
  • When a tuple arrives, find the intervals it intersects (O(n))
  • For = and ≠, use a hash table
    • For ≠, output all tuples except those in the table
  • Saves routing and tuple-parsing cost
  • Simplifies the optimization space
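A simplified Python sketch of a combined filter over one field, for illustration only: range predicates are stored as interval bounds and checked with a linear scan here, where the slide proposes an interval tree for the logarithmic case; equality predicates use a hash table. Not-equals is omitted for brevity, and all names are hypothetical.

    import math

    class CombinedFilter:
        """All of one field's predicates (across all queries) grouped into one module."""
        def __init__(self, fld):
            self.fld = fld
            self.ranges = []   # (lo, lo_incl, hi, hi_incl, query_id)
            self.equals = {}   # value -> set of query ids with fld = value

        def add_range(self, query_id, lo=-math.inf, lo_incl=False, hi=math.inf, hi_incl=False):
            self.ranges.append((lo, lo_incl, hi, hi_incl, query_id))

        def add_equals(self, query_id, value):
            self.equals.setdefault(value, set()).add(query_id)

        def passing_queries(self, value):
            """Return the queries whose predicate on this field the value satisfies."""
            out = set(self.equals.get(value, ()))
            for lo, lo_incl, hi, hi_incl, q in self.ranges:
                above = value > lo or (lo_incl and value == lo)
                below = value < hi or (hi_incl and value == hi)
                if above and below:
                    out.add(q)
            return out

    # hypothetical usage: query 1: s.a > 30, query 2: s.a <= 10, query 3: s.a = 42
    f = CombinedFilter("a")
    f.add_range(1, lo=30)
    f.add_range(2, hi=10, hi_incl=True)
    f.add_equals(3, 42)
    print(f.passing_queries(42))   # -> {1, 3}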

  6. Routing Policy
  • Random policy: route to each module with equal probability
  • Ticket policy (from the Eddy paper):
    • Route to the modules with the highest selectivity
    • Estimate selectivity from the ratio of in/out tuples
    • Use back-pressure to adjust delivery rates
  • Multi-query Ticket policy:
    • Estimate selectivity from the ratio of (number of applied predicates / number of passed predicates)
    • Based on Shankar’s implementation: back-pressure not applied properly
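A small Python sketch in the spirit of the ticket policy above; the exact mechanics here are an assumption, not the Eddy paper's algorithm. Each filter tracks tuples routed in versus tuples passed, and the eddy prefers the filter with the lowest observed pass rate (i.e. the most selective one), with occasional random exploration so estimates stay fresh.

    import random

    class FilterStats:
        def __init__(self):
            self.tuples_in = 0
            self.tuples_passed = 0

        def pass_rate(self):
            # optimistic prior of 1.0 until the filter has seen any tuples
            return self.tuples_passed / self.tuples_in if self.tuples_in else 1.0

    def choose_filter(candidates, stats, explore=0.1):
        """Pick among not-yet-applied filters, favoring the most selective one."""
        if random.random() < explore:
            return random.choice(list(candidates))          # explore to refresh estimates
        return min(candidates, key=lambda f: stats[f].pass_rate())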

  7. Preliminary Results
  • Simple, four-query test:
    from s select s.index where s.a > 30
    from s select s.index where s.b > 30 and s.a > 30
    from s select s.index where s.c > 30 and s.b > 30 and s.a > 30
    from s select s.index where s.d > 30 and s.c > 30 and s.b > 30 and s.a > 30
  • Becomes five modules: one scanner and four filters
