
Panel on Stream Query Languages: The Aurora View

Stan Zdonik, Brown University


Presentation Transcript


  1. Panel on Stream Query Languages: The Aurora View
     Stan Zdonik, Brown University

  2. Aurora Queries
     • We do not have an SQL-like language.
     • We have a GUI for dataflow diagrams.
       • Boxes = operators
       • Arrows = streams
     • Rationale:
       • CSE (common subexpression elimination) is tough for thousands of queries.
       • Workflow is more natural.
       • Easier for users to extend what’s been done.
       • Best to understand implementation first.

  3. Aurora Operators
     • Very relational in spirit.
       • Filter, Map, Union, Join, Aggregate
     • Adds Windows (everyone seems to agree) … with some wrinkles that we will get to.
     • Adds a few operators.
       • Wsort
       • Resample

  4. Simple Aggregation
     Generalized aggregate:
       Aggregate Agg(init, incr, final)
       Window(on C, size = 2, offset = 1)
       GroupBy A, B
     [Figure: example (A, B, C) tuples entering and leaving the Aggregate box.]
     • init: called when window opens
     • incr: called for each new value
     • final: called when window closes
     • One or more open windows per group.
     • Size and Offset given in:
       • #tuples, attribute interval, or time interval
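
The init / incr / final protocol above can be sketched in Python. This is only an illustration, not Aurora code: the function name `windowed_aggregate` is hypothetical, size and offset are assumed to be counted in tuples, and "offset" is read as the number of tuples between the starts of consecutive windows in a group.

```python
from collections import defaultdict

def windowed_aggregate(stream, key, size, offset, init, incr, final):
    """Yield (group, final(state)) each time a count-based window closes.

    Assumptions (not from the slides): size and offset are in #tuples,
    and a new window opens in a group every `offset` tuples.
    """
    seen = defaultdict(int)           # tuples seen so far, per group
    open_windows = defaultdict(list)  # per group: list of [state, tuple count]
    for t in stream:
        g = key(t)
        if seen[g] % offset == 0:     # time to open a new window for this group
            open_windows[g].append([init(), 0])
        seen[g] += 1
        still_open = []
        for w in open_windows[g]:
            w[0] = incr(w[0], t)      # incr: called for each new value
            w[1] += 1
            if w[1] == size:          # final: called when the window closes
                yield g, final(w[0])
            else:
                still_open.append(w)
        open_windows[g] = still_open

# Sum of C over windows of 2 tuples, advancing by 1, grouped by (A, B).
tuples = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 1, 3), (1, 2, 2)]
sums = list(windowed_aggregate(
    tuples, key=lambda t: (t[0], t[1]), size=2, offset=1,
    init=lambda: 0, incr=lambda s, t: s + t[2], final=lambda s: s))
# sums == [((1, 1), 3), ((1, 1), 5), ((1, 2), 3)]
```

With size = 2 and offset = 1, a group can have two windows open at once, which matches the "one or more open windows per group" remark.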

  5. Query 1
     Generate the stream of packets whose length is greater than twice the average packet length over the last 1 hour.
     Input: (pID, length, time)
     Aggregate agg(init, incr, final)
       Window(on time, size = 1 hr, offset = 1 tuple)
       State = (sum int, num int, endtime int)
       init = {sum := 0; num := 0}
       incr(p) = {sum := sum + p.length; num := num + 1; endtime := p.time}
       final = emit (time2 = endtime, avgLen = sum/num)
     Join
       Match (length > 2 * avgLen and time = time2)
     Map
       f(t): (t.ID, t.length, t.time)
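
For readers who prefer code to boxes and arrows, Query 1 can be approximated in a few lines of Python. This is a sketch under assumptions, not the Aurora plan itself: `long_packets` is a hypothetical name, timestamps are assumed to be seconds, packets are assumed to arrive in time order, and the Aggregate and Join boxes are folded into one loop.

```python
from collections import deque

HOUR = 3600  # assumption: timestamps are in seconds

def long_packets(packets, window=HOUR):
    """Yield (pID, length, time) for packets longer than twice the
    average length of the packets seen in the last `window` seconds.

    Assumes packets arrive in non-decreasing time order.
    """
    recent = deque()  # (time, length) of the packets inside the window
    total = 0         # running sum of lengths in `recent`
    for pid, length, time in packets:
        recent.append((time, length))
        total += length
        # Evict packets older than the window; the current packet always stays.
        while recent[0][0] <= time - window:
            _, old_length = recent.popleft()
            total -= old_length
        avg_len = total / len(recent)
        if length > 2 * avg_len:
            yield pid, length, time

packets = [(1, 100, 0), (2, 100, 10), (3, 100, 20), (4, 1000, 30), (5, 100, 4000)]
flagged = list(long_packets(packets))
# flagged == [(4, 1000, 30)]
```

Keeping a running sum avoids rescanning the window on every packet, which is the same idea as the incremental sum and num fields in the slide's State.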

  6. Query 2
     Create an alert when more than 20 type 'A' squirrels are in Jennifer's backyard. Assume squirrels report every p sec.
     Inputs: (sID1, region, time); ST (sID2, type)
     Join
       Match (sID1 = sID2)
     Filter
       region = JWY and type = "A"
     Aggregate agg(count)
       Window(on time, size = p sec, offset = p sec)
     Filter
       count > 20
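
A rough Python equivalent of Query 2 follows. It is a sketch only: `squirrel_alerts` is a hypothetical name, the ST table is assumed to be a dict from squirrel ID to type, reports are assumed to arrive in time order, and the end of the input is assumed to close the last window.

```python
def squirrel_alerts(reports, squirrel_types, p, threshold=20):
    """Yield (window_start, count) for each p-second window in which more
    than `threshold` reports come from type "A" squirrels in region JWY.

    reports: iterable of (sID, region, time), assumed in time order.
    squirrel_types: dict sID -> type, standing in for the ST table.
    """
    current, count = None, 0
    for sid, region, time in reports:
        # Join with ST on sID, then Filter on region and type.
        if region != "JWY" or squirrel_types.get(sid) != "A":
            continue
        w = int(time // p)            # index of the tumbling window
        if w != current:
            if count > threshold:     # final Filter: count > threshold
                yield current * p, count
            current, count = w, 0
        count += 1
    if count > threshold:             # assumption: end of input closes the window
        yield current * p, count

types = {1: "A", 2: "A", 3: "A", 4: "B"}
reports = [(1, "JWY", 0), (2, "JWY", 1), (3, "JWY", 2), (4, "JWY", 3),
           (1, "JWY", 10), (2, "ELSE", 11), (3, "JWY", 12)]
alerts = list(squirrel_alerts(reports, types, p=10, threshold=2))
# alerts == [(0, 3)]
```

Because size and offset are both p seconds and each squirrel reports every p seconds, each squirrel contributes about one report per window, so counting reports approximates counting squirrels.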

  7. Query 3
     Stream an event each time 3 different squirrels within a pairwise distance of 5 meters from each other chirp within 10 seconds of each other.
     Input: (sID, loc, time), fed to ports 1 and 2 of the first Join and to port 2 of the second Join
     Join (ports 1 and 2: (sID, loc, time))
       Match (1.sID not= 2.sID and dist(1.loc, 2.loc) < 5 m)
       Window(on time, size = 5 sec, offset = 1 tuple)
     Join (port 1: output of the first Join; port 2: (sID, loc, time))
       Match (dist(1.1.loc, 2.loc) < 5 m and dist(1.2.loc, 2.loc) < 5 m and 1.1.sID not= 2.sID and 1.2.sID not= 2.sID)
       Window(on time, size = 5 sec, offset = 1 tuple)

  8. Super-bonus Query Create a log of flow information from a stream of packets. A flow (simple definition) from a source S to a destination D ends when no packet from S to D is seen for at least 2 minutes after the last packet from S to D. The next packet from S to D starts a new flow. The flow log contains the source, destination, count of packets, and total length of packets for each flow. Are you kidding!!!!

  9. Actually, it’s Pretty Easy
     Input: (pID, src, dest, length, time)
     [Figure: timeline of packets from S to D, with a 2 min gap separating flows.]
     Aggregate Aggr = (init1, incr1, final1)
       Window(size = 2 tuples, offset = 1)
       GroupBy (src, dest)
       State1 = (flow#: int, first packet, second packet)
       init1 = {flow# := 0; first := null; second := null}
       incr1(p) = {first := second; second := p; if second.time - first.time > 2 then flow# := flow# + 1}
       final1 = emit (second.src, second.dest, second.length, second.time, flow#)
     Aggregate Aggr = (init2, incr2, final2)
       Window(on flow#, size = 1, offset = 1)
       GroupBy (src, dest)
       State2 = (count int, len int)
       init2 = {count := 0; len := 0}
       incr2(p) = {count := count + 1; len := len + p.length}
       final2 = emit (src, dest, len, count)
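
The two-stage plan above (number the flows, then aggregate per flow number) can be mimicked in Python. This is a sketch under assumptions, not Aurora's implementation: `flow_log` is a hypothetical name, times are assumed to be in minutes and in arrival order, and the end of the input is assumed to close every flow that is still open.

```python
def flow_log(packets, gap=2):
    """Yield (src, dest, len, count) for each finished flow.

    packets: iterable of (pID, src, dest, length, time), time in minutes
    (assumption), in arrival order. A flow ends when the next packet for
    the same (src, dest) arrives more than `gap` minutes after the last.
    """
    flow_no = {}    # stage 1 state: (src, dest) -> current flow#
    last_time = {}  # stage 1 state: (src, dest) -> time of previous packet
    totals = {}     # stage 2 state: (src, dest) -> [flow#, count, len]
    for pid, src, dest, length, time in packets:
        key = (src, dest)
        # Stage 1: bump flow# when the gap to the previous packet is too long.
        n = flow_no.get(key, 0)
        if key in last_time and time - last_time[key] > gap:
            n += 1
        flow_no[key] = n
        last_time[key] = time
        # Stage 2: a change in flow# closes the previous flow's window.
        if key in totals and totals[key][0] != n:
            _, count, total_len = totals.pop(key)
            yield src, dest, total_len, count
        entry = totals.setdefault(key, [n, 0, 0])
        entry[1] += 1
        entry[2] += length
    # Assumption: end of input closes every flow that is still open.
    for (src, dest), (_, count, total_len) in totals.items():
        yield src, dest, total_len, count

packets = [(1, "S", "D", 10, 0.0), (2, "S", "D", 20, 1.0), (3, "X", "Y", 5, 1.5),
           (4, "S", "D", 30, 5.0), (5, "S", "D", 40, 5.5)]
flows = sorted(flow_log(packets))
# flows == [("S", "D", 30, 2), ("S", "D", 70, 2), ("X", "Y", 5, 1)]
```

As in the slide, a flow is only reported once the first packet of the next flow shows up; on a live stream that wait can be arbitrarily long.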

  10. … but this is not enough!
     • What if it was really important that I know about the squirrels within 1 minute of the intrusion?
     => Queries need Quality-of-Service support.
     In fact, QoS is an integral part of the declarative spec. of the query.

  11. …but it gets worse!
     • Networks (e.g., mobile) can arbitrarily delay or lose tuples.
     => Operators can’t block arbitrarily waiting.
     A corollary of latency-based QoS.

  12. …and worse!
     • Tuples may not arrive at an operator in sort order.
       • The network can reorder them.
       • Operators themselves can shuffle them.
       • Priority scheduling might force them out of order.
     • This complicates things.
       • windows
       • aggregates

  13. Our Solution
     • Problem has to do with when to close windows.
     • Tradeoff: Latency (QoS) vs. Accuracy
     • Define additional parameters on windows that determine termination.
       • might result in lost data.

  14. Our Solution (cont.)
     • For blocking (late tuples) => Timeout
     • For disorder (early tuples) => Slack
     [Figure: two timelines of arriving tuples, one annotated with a timeout interval (time) and one with a slack interval (#tuples).]
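
One way to picture the two parameters is the Python sketch below. The semantics are assumptions made for illustration and are not taken from the slides: `windowed_sums` is a hypothetical name, slack is read as the number of tuples beyond a window's end that are tolerated before the window closes, timeout as the longest a window may stay open, and timeouts are only checked when a tuple arrives.

```python
def windowed_sums(arrivals, size, slack, timeout):
    """Sum values over tumbling windows of width `size` on `ts`.

    arrivals: iterable of (arrival_time, ts, value) in arrival order.
    A window closes when more than `slack` tuples beyond its end have
    arrived, or when it has been open for at least `timeout`.
    Returns (results, dropped): closed windows as (start, sum), and
    tuples that arrived after their window had closed.
    """
    open_windows = {}  # start -> [sum, arrival time when opened, tuples seen beyond end]
    closed = set()
    results, dropped = [], []
    for now, ts, value in arrivals:
        # Disorder: count this tuple against windows it lies beyond.
        for start, w in open_windows.items():
            if ts >= start + size:
                w[2] += 1
        # Close windows whose slack is used up or that have timed out.
        for start in sorted(open_windows):
            total, opened_at, beyond = open_windows[start]
            if beyond > slack or now - opened_at >= timeout:
                results.append((start, total))
                closed.add(start)
                del open_windows[start]
        # Route the tuple; data for an already closed window is lost.
        start = (ts // size) * size
        if start in closed:
            dropped.append((now, ts, value))
            continue
        w = open_windows.setdefault(start, [0, now, 0])
        w[0] += value
    # Assumption: end of input closes the remaining windows.
    results.extend(sorted((s, w[0]) for s, w in open_windows.items()))
    return results, dropped

# Slack of 1 absorbs one out-of-order tuple (ts=3) but not the second (ts=4).
arrivals = [(0, 1, 5), (1, 12, 1), (2, 3, 7), (3, 15, 2), (4, 4, 9)]
results, dropped = windowed_sums(arrivals, size=10, slack=1, timeout=100)
# results == [(0, 12), (10, 3)]; dropped == [(4, 4, 9)]
```

Raising slack or timeout trades latency for accuracy: windows stay open longer and fewer tuples land in `dropped`.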

  15. Status
     • Now:
       • users supply values for timeout and slack.
       • As in examples, not always needed.
     • Goal:
       • automatically insert / adjust these values based on QoS specs.
