
Characterizing Memory Requirements for Queries over Continuous Data Streams



Presentation Transcript


  1. Characterizing Memory Requirements for Queries over Continuous Data Streams • Arvind Arasu, Brian Babcock*, Shivnath Babu, Jon McAlister, Jennifer Widom • Stanford University • *Speaker

  2. Continuous Data Streams • Network traffic data • Transaction logs • Call records, Web logs, ... • Financial data • Sensor networks • Scientific data • Astronomy, Biology, ...

  3. A DBMS for Data Streams? • Lots of existing work in data streams • Mostly special-purpose applications • We’re building a general-purpose “data stream management system” (DSMS) http://www-db.stanford.edu/stream/

  4. RDBMS vs. DSMS • RDBMS: relations stored on disk / DSMS: tuples read once & discarded • RDBMS: data access patterns controlled by the RDBMS / DSMS: data must be processed as it arrives • RDBMS: query answers are relations / DSMS: query answers are streams

  5. Query Execution Model • 1. Client registers query with the DSMS • 2. Tuples arrive on streams (S, T), are read and discarded, and answers are returned to the client • Limited-size “scratch space” available in memory

  6. Our Problem: Given a data stream query, determine how much memory is required to evaluate it.

  7. Queries We Consider • SPJ Queries: π_L(σ_P(S1 × S2 × … × Sn)) • Projection is either duplicate-preserving or duplicate-eliminating • Selection predicates are conjunctions of: • Si.A Op Sj.B -or- Si.A Op k • Op ∈ {>, ≥, =, <, ≤} • All attributes are integers

  8. An example with no joins • SELECT cust_id FROM orders WHERE amt > 5 • Requires no “scratch” memory • Each tuple is independent • Tuples in the answer are streamed away • SELECT DISTINCT cust_id FROM orders WHERE amt > 5 • Requires unbounded memory • All cust_ids must be remembered • SELECT DISTINCT cust_id FROM orders WHERE amt > 5 AND cust_id > 1000 AND cust_id < 9999 • Requires bounded memory • Remember cust_ids from 1000-9999
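The three cases on this slide can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names and the `(cust_id, amt)` tuple layout are assumptions made here, and each generator stands in for a continuous query whose answers are streamed away as they are produced.

```python
def select_all(orders):
    """SELECT cust_id ... WHERE amt > 5: no scratch memory, each tuple is independent."""
    for cust_id, amt in orders:
        if amt > 5:
            yield cust_id


def select_distinct(orders):
    """SELECT DISTINCT cust_id ... WHERE amt > 5: `seen` can grow without bound."""
    seen = set()  # every qualifying cust_id must be remembered
    for cust_id, amt in orders:
        if amt > 5 and cust_id not in seen:
            seen.add(cust_id)
            yield cust_id


def select_distinct_bounded(orders):
    """Same query with 1000 < cust_id < 9999: `seen` is limited by the size of that range."""
    seen = set()  # can only hold cust_ids that pass the range predicate
    for cust_id, amt in orders:
        if amt > 5 and 1000 < cust_id < 9999 and cust_id not in seen:
            seen.add(cust_id)
            yield cust_id


# Hypothetical input stream of (cust_id, amt) tuples.
stream = [(1500, 10), (1500, 7), (20, 9), (1500, 3), (9999, 8)]
plain = list(select_all(stream))
distinct = list(select_distinct(stream))
bounded = list(select_distinct_bounded(stream))
```

The only difference between the last two functions is the range predicate, which is what caps the size of the `seen` set.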

  9. An example with an equijoin SELECT R.prod_id FROM orders O, returns R WHERE O.order_num = R.order_num AND R.prod_id >= 100 AND R.prod_id < 199 AND O.order_num > 1000 AND O.order_num < 1103

  10. An example with an inequality • SELECT [DISTINCT] O.prod_id FROM orders O, inventory I WHERE O.amt > I.qty AND O.prod_id >= 100 AND O.prod_id < 300 • Table (prod_id, MAX(amt)): (100, 45), (101, 21), ..., (299, 2) • Table (prod_id, MIN(amt)): (100, 17), (101, 12), ..., (299, 36)

  11. “Locally Totally Ordered” Queries • LTO Queries: SPJ queries with additional predicates applied • For each stream, stipulate a total order for all attributes in the stream & all constants • Only allow tuples whose attribute values follow that ordering • All SPJ queries can be written as a union of LTO queries

  12. Example of an LTO query • Stream S: (A, B) • Stream T: (C, D) • SPJ query: SELECT S.A, T.C FROM S, T WHERE S.B > 12 • LTO query: SELECT S.A, T.C FROM S, T WHERE S.B > 12 AND S.A = S.B AND T.D < T.C AND T.C < 12

  13. MinRef and MaxRef • For each stream S in the query: MinRef(S) = { S.A : S.A < T.B is a necessary inequality in the predicate } • Example: SELECT T.C FROM S, T WHERE S.A < 5 AND 5 < T.C AND S.A < T.C • {S.A < 5, 5 < T.C} => S.A < T.C, so S.A < T.C is not a necessary inequality.
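One way to read "necessary inequality" is "not implied by the remaining inequalities". The sketch below tests that by transitivity over strict `<` predicates and then collects MinRef(S) as defined on the slide. It is a simplified illustration under assumptions made here: predicates are `(lo, hi)` pairs meaning `lo < hi`, attributes are strings such as `"S.A"`, constants are Python ints, and the names `implied`, `necessary_inequalities`, and `min_ref` are hypothetical. It handles only strict inequalities and is not the paper's full procedure.

```python
def implied(target, others):
    """True if the strict inequality target = (lo, hi) follows from `others` by transitivity."""
    lo, hi = target
    edges = {}
    for a, b in others:
        edges.setdefault(a, set()).add(b)
    # Distinct integer constants are ordered among themselves (e.g. 3 < 5).
    consts = sorted({t for pair in others for t in pair if isinstance(t, int)})
    for k1, k2 in zip(consts, consts[1:]):
        edges.setdefault(k1, set()).add(k2)
    # Depth-first search for a chain lo < ... < hi.
    stack, seen = [lo], {lo}
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt == hi:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def necessary_inequalities(preds):
    """Keep the inequalities that are not implied by the other ones."""
    return [p for i, p in enumerate(preds)
            if not implied(p, preds[:i] + preds[i + 1:])]


def min_ref(stream, preds):
    """Attributes S.A of `stream` with a necessary S.A < T.B against another stream's attribute."""
    result = set()
    for lo, hi in necessary_inequalities(preds):
        if (isinstance(lo, str) and isinstance(hi, str)
                and lo.startswith(stream + ".")
                and not hi.startswith(stream + ".")):
            result.add(lo)
    return result


# The slide's example: S.A < 5 AND 5 < T.C AND S.A < T.C
example = [("S.A", 5), (5, "T.C"), ("S.A", "T.C")]
kept = necessary_inequalities(example)
refs = min_ref("S", example)
```

For the slide's query, `S.A < T.C` is dropped as redundant, so MinRef(S) comes out empty.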

  14. Bounded-Memory Conditions • 1. All attributes in the projection list must be bounded. • 2. All attributes participating in equijoins must be bounded. • 3. In each stream S, |MinRef(S)| + |MaxRef(S)| must be: = 0 for SELECT; ≤ 1 for SELECT DISTINCT
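The three conditions can be written as a small checker. This is a sketch under stated assumptions rather than the paper's algorithm: the function name `is_bounded_memory` and its parameters are hypothetical, the set of bounded attributes and the MinRef/MaxRef sets are taken as inputs because the slides do not spell out how to derive them, and the DISTINCT bound is read as "at most 1".

```python
def is_bounded_memory(projection, equijoin_attrs, bounded, refs, distinct):
    """Check the three slide conditions for one query.

    projection     -- attributes in the SELECT list
    equijoin_attrs -- attributes that participate in equijoins
    bounded        -- set of attributes assumed to be bounded
    refs           -- {stream: (MinRef set, MaxRef set)}, supplied by the caller
    distinct       -- True for SELECT DISTINCT, False for plain SELECT
    """
    # Condition 1: every projected attribute is bounded.
    if not all(attr in bounded for attr in projection):
        return False
    # Condition 2: every equijoin attribute is bounded.
    if not all(attr in bounded for attr in equijoin_attrs):
        return False
    # Condition 3: per-stream limit on |MinRef| + |MaxRef| (assumed <= 1 for DISTINCT).
    limit = 1 if distinct else 0
    return all(len(mn) + len(mx) <= limit for mn, mx in refs.values())


# SELECT DISTINCT cust_id ... with cust_id restricted to a range: bounded.
range_query = is_bounded_memory(
    ["cust_id"], [], {"cust_id"}, {"orders": (set(), set())}, distinct=True)

# SELECT DISTINCT T.E ... WHERE S.A < T.C AND S.B < T.D: two references in S.
two_refs_query = is_bounded_memory(
    ["T.E"], [], {"T.E"}, {"S": ({"S.A", "S.B"}, set())}, distinct=True)
```

In the second call only MinRef(S) is filled in, which already exceeds the limit of one reference per stream.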

  15. An unbounded example • SELECT DISTINCT T.E FROM S, T WHERE T.E = 10 AND S.A < T.C AND S.B < T.D • S: (A, B): (0, 0), (-1, 1), (-2, 2), …, (-c, c), ... • T: (C, D, E): (1-c, 1+c, 10)
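A short Python check illustrates why this input forces unbounded memory: for every c, the T tuple (1-c, 1+c, 10) joins with exactly one S tuple, (-c, c), so no S tuple seen so far can be safely forgotten. The helper names `matches` and `witnesses` and the stream length `n` are assumptions made for this sketch.

```python
def matches(s, t):
    """Predicate of the query: T.E = 10 AND S.A < T.C AND S.B < T.D."""
    a, b = s
    c, d, e = t
    return e == 10 and a < c and b < d


def witnesses(n):
    """For each c in 0..n-1, list the S tuples that join with T = (1-c, 1+c, 10)."""
    s_stream = [(-i, i) for i in range(n)]  # (0, 0), (-1, 1), (-2, 2), ...
    return {c: [s for s in s_stream if matches(s, (1 - c, 1 + c, 10))]
            for c in range(n)}


result = witnesses(6)
```

Each entry of `result` holds a single S tuple, and a different one for each c, so whether the value 10 appears in the answer depends on which individual S tuples have arrived.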

  16. Conclusion • We consider SPJ queries over data streams • We identify which queries can and cannot be evaluated using bounded memory • For queries that can, we provide an execution strategy based on synopses. • For queries that cannot, we provide examples of “bad” input streams. Full paper at http://www-db.stanford.edu/??? E-mail: babcock@cs.stanford.edu
