1 / 10

A Q uery A D atabase S ystem for O rder

A Q uery A D atabase S ystem for O rder. Dennis Shasha joint work with Alberto Lerner {lerner,shasha}@cs.nyu.edu. Idea. Whatever can be done on a table can be done on an ordered table ( arrable ). Not vice - versa. A Q uery – query language on arrables Expresses many queries easily

gautam
Download Presentation

A Q uery A D atabase S ystem for O rder

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. AQuery A Database System for Order Dennis Shashajoint work withAlberto Lerner {lerner,shasha}@cs.nyu.edu

  2. Idea • Whatever can be done on a table can be done on an ordered table (arrable). Not vice-versa. • AQuery – query language on arrables • Expresses many queries easily • Easy to optimize

  3. Packets pID a b c d e f g len 10 5 12 32 5 7 10 time 3 4 7 10 14 22 23 Query 1 • Find the packets whose length is greater than twice the average packet length over the last 1 hour SELECT *FROM Packets ASSUMING ORDER timeWHERE len > 2*avgs(range(3600,time),len)) vectors ordering vector for this query Semantics are column-oriented as opposedto row-oriented

  4. Vector Expressions ... WHERE len > 2*avgs(range(3600,time),len)) time 3 4 7 10 14 22 23 v1 0 1 2 3 3 1 2 v2 10 7.5 9 14.7 13.5 6 7.3 len 10 5 12 32 5 7 10 WHERE F F F T F F F 3 positionsand thecurrent * v1=range(3600,time)windows’ ranges v2=avgs(v1,len)last hour avg len len > 2*v2filter out false positions * Using range 10 here for the sake of the example

  5. Query 2 • Find when more than 20 type ‘A’ squirrels were at Jennifer’s backyard. Suppose a flag +1 signals squirrel entry, and –1, exit. SquirrelSensor sID ... region ... time ... flag ... SquirrelType sID ... type ... SELECT time[index(sums(flag)>20)]FROM SquirrelSensor SS, SquirrelType ST ASSUMING ORDER time WHERE SS.sID=ST.sID AND ST.type = ‘A’ AND SS.region = ‘JWyard’

  6. Vector Indexing SELECT time[index(sums(flag)>20)] ... squirrelin squirrelout flag +1 +1 -1 +1 +1 ... +1 +1 -1 -1 sums 1 2 1 2 3 ... 20 21 20 19 >20 F F F F F ... F T F F time time[i] Jennifer’s backyard i-th positionis true

  7. Query 3 • Find when 3 different squirrels within a pair-wise distance of 5 meters from each other chirp within 10 seconds of each other SquirrelChirps sID ... loc ... time ... SELECT S1.sID, S1.loc, S2.sID, S2.loc, S3.sID, S3.locFROM SquirrelChirps S1, SquirrelChirps S2, SquirrelChirps S3 WHERE S1.sID<>s2.sID AND S1.sID<>s3.sID AND s2.sID<>s3.SID AND S1.time-S2.time < 10 AND S1.time-S3.time < 10 AND S2.time-S3.time < 10 AND distance(S1.loc,S2.loc)<5 AND distance(S1.loc,S3.loc)<5 AND distance(S2.loc,S3.loc)<5

  8. Vector Fields r gbya (r) Flatten(gbya (r)) a z y y x z x y b 1 2 3 4 5 6 7 b 1 5 2 3 7 4 6 a z z y y y x x b 1 5 2 3 7 4 6 a z y x 4 6 Non-grouped columns become vectorfields respecting order. Flatten brings the arrable back to 1NF

  9. Query 4 – use of vector fields • Create a log of flow information. A flow from src to dest ends after a 2-minutes silence Packets pID ... src ... dest ... len ... time ... SELECT source, dest, count(*), sum(len)FROM Packets ASSUMING ORDER time GROUP BY source, dest, sums(deltas(time)) > 120)

  10. And Streams? • AQuery has no special facilities for streaming data, but it is expressive enough. • Idea for streaming data is to split the tables into tables that are indexed with old data and a buffer table with recent data. • Optimizer works over both transparently.

More Related