
Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565


Presentation Transcript


  1. Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

  2. Streams Here, There, Everywhere! [Illustration: a bit stream 1 1 0 0 1 0 1 1 1 0 1] • Network Traffic Engineering • Call Record Analysis • Sensor Data Analysis • Medical, Financial Monitoring • Etc.

  3. Problem Definition • Data Stream Environment • One Pass • Each data element is a value • Φ-quantile (Φ ∈ (0, 1]): the element with rank ⌈ΦN⌉ in an ordered sequence of N data elements.

  4. Example: N = 16, sorted sequence 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 11, 11, 11, 12 • The 0.5-quantile returns the element of rank 8 (0.5 × 16), which is 8 • The 0.75-quantile returns the element of rank 12 (0.75 × 16), which is 10
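The example above can be reproduced with a few lines of Python. This is a minimal sketch of the exact (non-streaming) definition only; the function name `exact_quantile` and the small tolerance used to absorb floating-point round-off in Φ·N are assumptions made for illustration.

```python
import math

def exact_quantile(values, phi):
    """Return the element of rank ceil(phi * N) in sorted order (ranks start at 1)."""
    if not 0 < phi <= 1:
        raise ValueError("phi must lie in (0, 1]")
    ordered = sorted(values)
    # The tiny tolerance guards against floating-point round-off in phi * N.
    rank = max(1, math.ceil(phi * len(ordered) - 1e-9))
    return ordered[rank - 1]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 11, 11, 11, 12]
print(exact_quantile(data, 0.5))   # 8  (rank 8)
print(exact_quantile(data, 0.75))  # 10 (rank 12)
```

Sorting the full sequence needs all N elements in memory, which is exactly what the one-pass summaries in the rest of the talk try to avoid.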

  5. 3 Models • Data Stream Model: compute the Φ-quantile over all data items seen so far • Sliding Window Model: compute the Φ-quantile over the N most recent elements of the data stream • n-of-N Model: for any n ≤ N, compute the Φ-quantile over the n most recent elements of the data stream
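To make the difference between the sliding window and n-of-N models concrete, here is an exact baseline that simply stores the N most recent elements. It is only a reference point under the stated assumption of O(N) memory, not one of the summaries discussed in the talk; the class and method names are hypothetical.

```python
import math
from collections import deque

def _quantile(values, phi):
    ordered = sorted(values)
    rank = max(1, math.ceil(phi * len(ordered) - 1e-9))  # rank ceil(phi * n)
    return ordered[rank - 1]

class ExactWindow:
    """Keeps the N most recent elements verbatim, so it needs O(N) space."""

    def __init__(self, n_max):
        # deque(maxlen=N) drops the oldest element automatically on overflow.
        self.window = deque(maxlen=n_max)

    def insert(self, x):
        self.window.append(x)

    def sliding_window_quantile(self, phi):
        """Quantile over all elements currently in the window."""
        return _quantile(self.window, phi)

    def n_of_N_quantile(self, phi, n):
        """Quantile over only the n most recent elements, 1 <= n <= window size."""
        if not 1 <= n <= len(self.window):
            raise ValueError("n must be between 1 and the current window size")
        return _quantile(list(self.window)[-n:], phi)

w = ExactWindow(4)
for x in [5, 1, 9, 3, 7, 2]:
    w.insert(x)
print(w.sliding_window_quantile(0.5))  # window is [9, 3, 7, 2]
print(w.n_of_N_quantile(0.5, 2))       # two most recent are [7, 2]
```

The data stream model corresponds to calling `_quantile` on everything seen so far, with no expiry at all.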

  6. Sliding Window Model • Most recent N elements [Figure: a bit stream ...1 0 1 0 0 0 1 0 1 1 1 1 1 1 0 0 0 1 0 1 0 0 1 1... with time increasing to the right; a window of size N ends at the current time]

  7. Sliding Window Model (example) • Window size = 12 • The 0.5-quantile returns 10 at time t11 • The 0.5-quantile returns 6 at time t15

  8. n-of-N Model (example) • N = 12 • The 0.5-quantile returns 8 at time t11 for n = 8 • The 0.5-quantile returns 3 at time t15 for n = 4

  9. Applications - Sliding Window Model in Data Streams • Useful for Network Traffic Management, Sensor Data. • To find out Top Ranked Web pages from Most Recently accessed N pages • In the financial market, investors are often interested in finding out the most recent N bids.

  10. Previous Work on Approximating Quantiles in One Scan of Data • G. S. Manku, S. Rajagopalan, and B. G. Lindsay. Approximate medians and other quantiles in one pass and with limited memory. Space: O((1/ε) log²(εN)) • G. S. Manku, S. Rajagopalan, and B. G. Lindsay. Random sampling techniques for space efficient online computation of order statistics of large datasets. • M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. Space: O((1/ε) log(εN)) (the GK algorithm) • The GK algorithm is the most efficient of these because it uses the least space and does not require advance knowledge of N

  11. Definitions • Φ-quantile: a Φ-quantile (Φ ∈ (0, 1]) of an ordered sequence of N data elements is the element with rank ⌈ΦN⌉. • Quantile query: given Φ, find the data element with rank ⌈ΦN⌉ among all elements in the stream. • Variation: the N most recent elements (sliding window model). • ε-approximate: for the target rank r = ⌈ΦN⌉, find an element whose rank lies within the interval [r − εN, r + εN].

  12. Computation of Quantile Summaries over Sliding Windows: 2 Methods • Continuously Maintaining Quantile Summaries of the Most Recent N Elements over a Data Stream. Xuemin Lin, Hongjun Lu, Jian Xu, Jeffrey Xu Yu. IEEE, 2004 (LLXY04) • Approximating frequency counts and quantiles using sliding window model. Arvind Arasu, Gurmeet Singh Manku. Stanford University, 2004

  13. Computation of Quantile Summaries over Sliding Windows: LLXY04 • GK algorithm + concept of aging (computing quantiles over a sliding window of the most recent N elements) • Under the sliding window model, a summary is maintained for the N most recently seen data elements. • Eliminating exactly the outdated elements requires O(N) space.

  14. ε-approximate • A quantile summary for a data sequence is ε-approximate if, for any given rank r, it returns a value whose rank r' is guaranteed to be within the interval [r − εN, r + εN] • Example: for a data stream with 100 elements, a 0.5-quantile query with ε = 0.1 returns a value v; the true rank of v is within [40, 60]
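The guarantee can be checked mechanically once the true rank of the returned value is known. The helper below is a small illustration; the name `within_eps` and the floating-point tolerance are assumptions, not part of the slides.

```python
import math

def within_eps(true_rank, phi, n, eps):
    """True if true_rank lies in [r - eps*n, r + eps*n] for r = ceil(phi * n)."""
    target = math.ceil(phi * n - 1e-9)
    # A tiny tolerance keeps boundary ranks inside despite float round-off.
    return abs(true_rank - target) <= eps * n + 1e-9

# Example from the slide: N = 100, phi = 0.5, eps = 0.1, so r = 50 and eps*N = 10.
print(within_eps(40, 0.5, 100, 0.1))  # True, lower end of [40, 60]
print(within_eps(60, 0.5, 100, 0.1))  # True, upper end of [40, 60]
print(within_eps(61, 0.5, 100, 0.1))  # False, just outside
```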

  15. Quantile Sketch • Data structure: { (v_i, r_i^-, r_i^+) : 1 ≤ i ≤ m } • Each value v_i is one of the elements seen so far • r_i^- is the lower bound on the rank of v_i • r_i^+ is the upper bound on the rank of v_i • v_i ≤ v_{i+1}, for 1 ≤ i ≤ m − 1 • r_i^- ≤ r_{i+1}^-, for 1 ≤ i ≤ m − 1 • r_i^- ≤ r_i ≤ r_i^+, where r_i is the rank of v_i

  16. Example Quantile sketch consisting of 6 tuples {(1,1,1), (2,2,9), (3,3,10), (5,4,10), (10,10,10), (12,16,16)}

  17. ε-approximate sketch • Theorem: suppose that • 1. r_1^+ ≤ εN + 1, • 2. r_m^- ≥ (1 − ε)N, • 3. for 2 ≤ i ≤ m, r_i^+ ≤ r_{i−1}^- + 2εN. • Then sketch S is ε-approximate. That is, for each Φ ∈ (0, 1], there is a (v_i, r_i^-, r_i^+) in S such that ⌈ΦN⌉ − εN ≤ r_i^- ≤ r_i^+ ≤ ⌈ΦN⌉ + εN

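  18. Query • Quantile sketch consisting of 6 tuples, ε = 0.25, N = 16: {(1,1,1), (2,2,9), (3,3,10), (5,4,10), (10,10,10), (12,16,16)} • A 0.5-quantile query asks for the v_i of rank r = 8, with εN = 4 • Find the first tuple that satisfies r − εN ≤ r_i^- ≤ r_i^+ ≤ r + εN, that is, whose rank bounds lie within [4, 12], and return v_i • The first such tuple is (5,4,10), so the query returns 5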
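The query rule can be sketched in a few lines. The code below assumes the sketch is a list of (v, r_minus, r_plus) tuples sorted by value, and it assumes the third theorem condition has the form r_i^+ ≤ r_{i−1}^- + 2εN; both function names are hypothetical and the tolerance only absorbs floating-point round-off.

```python
import math

TOL = 1e-9  # absorbs float round-off in eps * n

def satisfies_theorem(sketch, n, eps):
    """Check the three sufficient conditions for an eps-approximate sketch."""
    first_ok = sketch[0][2] <= eps * n + 1 + TOL
    last_ok = sketch[-1][1] >= (1 - eps) * n - TOL
    gaps_ok = all(
        sketch[i][2] <= sketch[i - 1][1] + 2 * eps * n + TOL
        for i in range(1, len(sketch))
    )
    return first_ok and last_ok and gaps_ok

def query(sketch, n, eps, phi):
    """Return v of the first tuple whose rank bounds lie in [r - eps*n, r + eps*n]."""
    r = math.ceil(phi * n - TOL)
    for v, r_minus, r_plus in sketch:
        if r - eps * n - TOL <= r_minus and r_plus <= r + eps * n + TOL:
            return v
    return None  # cannot happen if the sketch really is eps-approximate

sketch = [(1, 1, 1), (2, 2, 9), (3, 3, 10), (5, 4, 10), (10, 10, 10), (12, 16, 16)]
print(satisfies_theorem(sketch, 16, 0.25))  # True
print(query(sketch, 16, 0.25, 0.5))         # 5
```

With the sketch as listed, the first tuple whose bounds fit inside [4, 12] is (5, 4, 10), so the 0.5-quantile query returns 5.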

  19. One-Pass Summary for Sliding Windows • Continuously divide the stream into buckets based on the arrival order of data elements • The capacity of each bucket is ⌊εN/2⌋ • For each bucket, an ε/4-approximate sketch is maintained continuously by the GK algorithm • Once a bucket is full, its ε/4-approximate sketch is compressed into an ε/2-approximate sketch • The oldest bucket expires when the total number of elements reaches N + 1

  20. Summary Technique [Figure: the most recent N elements divided into buckets, from the expired bucket on the left to the current bucket on the right; the current bucket is maintained by GK, and each full bucket holds a compressed approximate sketch]

  21. Example • N = 8, ε = 1, bucket capacity = 4 • Stream: 1 2 3 4 5 6 7 8 9 [Figure: the current bucket advances as elements arrive; each bucket is compressed once it is full, and the oldest bucket expires when element 9 arrives]
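The bucket bookkeeping in this example can be simulated directly. The sketch below assumes a bucket capacity of ⌊εN/2⌋ (which gives 4 for N = 8, ε = 1, as in the example) and stores raw elements in each bucket as a stand-in for the GK and compressed sketches; the class name `BucketedWindow` is hypothetical.

```python
import math
from collections import deque

class BucketedWindow:
    """Arrival-order buckets with whole-bucket expiry (raw lists replace sketches)."""

    def __init__(self, n, eps):
        self.n = n
        # Assumed capacity floor(eps * N / 2); tolerance guards float round-off.
        self.capacity = max(1, math.floor(eps * n / 2 + 1e-9))
        self.buckets = deque([[]])  # rightmost bucket is the current one
        self.count = 0              # elements held across all buckets

    def insert(self, x):
        if len(self.buckets[-1]) == self.capacity:
            # Current bucket is full: this is where compression would happen.
            self.buckets.append([])
        self.buckets[-1].append(x)
        self.count += 1
        if self.count == self.n + 1:
            # Total reached N + 1: expire the whole oldest bucket.
            expired = self.buckets.popleft()
            self.count -= len(expired)

w = BucketedWindow(n=8, eps=1.0)
for x in range(1, 10):
    w.insert(x)
print(list(w.buckets))  # [[5, 6, 7, 8], [9]]
```

After element 9 arrives, the bucket holding 1 to 4 is dropped as a whole even though 2, 3 and 4 are still inside the window of the 8 most recent elements.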

  22. Compress • Compress an ε/4-approximate sketch into an ε/2-approximate sketch • Memory space is at most O(1/ε) per compressed sketch • Why not maintain an ε/2-approximate sketch in each bucket directly? • The compress technique keeps about half the number of tuples of an ε/2-approximate sketch maintained directly

  23. Merge • There are h data streams D_i, and each D_i has N_i data elements. Suppose each S_i is an ε-approximate sketch of D_i. • S_merge is a sketch of D_1 ∪ ... ∪ D_h • |S_merge| = |S_1| + ... + |S_h| • If each S_i is an ε-approximate sketch, then S_merge is also an ε-approximate sketch
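One standard way to combine rank bounds from several sketches is shown below. It is a sketch under stated assumptions, not necessarily the merge procedure of the paper: all values are assumed distinct, each sketch is a list of (v, r_minus, r_plus) tuples sorted by value, and the function name `merge_sketches` is hypothetical.

```python
from bisect import bisect_left, bisect_right

def merge_sketches(sketches, sizes):
    """Merge sketches of disjoint sequences; sizes[k] is N_k for sketches[k]."""
    values = [[t[0] for t in s] for s in sketches]
    merged = []
    for j, s_j in enumerate(sketches):
        for v, r_minus, r_plus in s_j:
            lo, hi = r_minus, r_plus
            for k, s_k in enumerate(sketches):
                if k == j:
                    continue
                below = bisect_left(values[k], v)   # tuples with value < v
                above = bisect_right(values[k], v)  # index of first value > v
                if below > 0:
                    # At least this many elements of D_k precede v.
                    lo += s_k[below - 1][1]
                if above < len(s_k):
                    # Elements from the next larger tuple onward cannot precede v.
                    hi += s_k[above][2] - 1
                else:
                    hi += sizes[k]  # all of D_k might precede v
            merged.append((v, lo, hi))
    merged.sort()
    return merged

s1 = [(1, 1, 1), (4, 2, 2), (7, 3, 3), (10, 4, 4)]  # exact sketch of [1, 4, 7, 10]
s2 = [(2, 1, 1), (8, 3, 3), (12, 5, 5)]             # partial sketch of [2, 3, 8, 9, 12]
merged = merge_sketches([s1, s2], sizes=[4, 5])
print(merged)
```

The merged sketch has |S_1| + |S_2| = 7 tuples, and every tuple's bounds contain the true rank of its value in the union 1, 2, 3, 4, 7, 8, 9, 10, 12.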

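  24. Another Problem • Example: stream 1, 2, 3, 4, 5, 6, 7, 8, 9 with ε = 1 and N = 8; the oldest bucket has expired, but the window of the 8 most recent elements still contains 2, 3 and 4 • The first tuple in S_merge is for the value 5 and carries the rank bounds from its own bucket, but the rank of 5 in the window is 4 • S_merge is therefore not a valid approximate sketch of the window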

  25. Lift • To solve the previous problem, we use a "lift" operation that raises the value of r_i^+ by ⌊εN/2⌋ for each tuple i • If S is an ε/2-approximate sketch, then S_lift is an ε-approximate sketch • That is why the bucket size is ⌊εN/2⌋ and an ε/2-approximate sketch is maintained for each bucket summary
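The lift step itself is a one-line transformation. The sketch below assumes the lift amount is ⌊εN/2⌋, the capacity of one bucket, so that the upper rank bounds also cover the elements of the expired bucket that are still inside the window; the function name and the sample tuples are hypothetical.

```python
import math

def lift(sketch, n, eps):
    """Raise every upper rank bound by floor(eps * n / 2)."""
    delta = math.floor(eps * n / 2 + 1e-9)  # assumed lift amount: one bucket
    return [(v, r_minus, r_plus + delta) for v, r_minus, r_plus in sketch]

# Hypothetical merged sketch after the bucket holding 1..4 expired (N = 8, eps = 1).
s_merge = [(5, 1, 1), (9, 5, 5)]
s_lift = lift(s_merge, n=8, eps=1.0)
print(s_lift)  # [(5, 1, 5), (9, 5, 9)]
```

In the window 2, 3, ..., 9 the value 5 has rank 4, which falls outside [1, 1] but inside the lifted bounds [1, 5].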

  26. Query • Step 1. Merge the local sketches of all buckets, including the current bucket, into S_merge • Step 2. Lift S_merge to obtain S_lift • Step 3. For a given rank r = ⌈ΦN⌉, find the first tuple in S_lift such that r − εN ≤ r_i^- ≤ r_i^+ ≤ r + εN, and return v_i

  27. Space: Sliding Window, LLXY04 • O(1/ε² + log(ε²N)/ε) • Reason: the sketch in each bucket produced by the GK algorithm takes O(log(ε²N)/ε) space, which is compressed to O(1/ε) once the bucket is full • There are O(1/ε) buckets
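The two terms of the bound can be read off as follows. This is a short worked derivation under the assumption that each bucket holds ⌊εN/2⌋ elements, so that about 2/ε buckets cover the window.

```latex
\[
\underbrace{\frac{N}{\lfloor \varepsilon N/2 \rfloor}}_{O(1/\varepsilon)\ \text{buckets}}
\times
\underbrace{O\!\left(\frac{1}{\varepsilon}\right)}_{\text{compressed sketch per full bucket}}
\;+\;
\underbrace{O\!\left(\frac{\log(\varepsilon^{2} N)}{\varepsilon}\right)}_{\text{GK sketch of the current bucket}}
\;=\;
O\!\left(\frac{1}{\varepsilon^{2}} + \frac{\log(\varepsilon^{2} N)}{\varepsilon}\right)
\]
```

Only the current bucket pays the GK cost, because every full bucket has already been compressed.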

  28. Performance Studies • Sliding window model • Compare with the ARS-algorithm • Avg Errors • Space Consumption • Distributions • n-of-N model • Compare with the heuristic algorithm nN’ • Avg Errors • Space Consumption • Query performance

  29. Conclusion • The work presented is among the first attempts to develop space-efficient, one-pass, deterministic quantile summary algorithms with performance guarantees under the sliding window model of data streams

  30. Approximating Quantiles Using the Sliding Window Model: Arasu and Manku • GK algorithm + concept of aging • Improves over LLXY04 • LLXY04 space: O(1/ε² + log(ε²N)/ε) • Arasu and Manku space: O((1/ε) log(1/ε) log N) • A state is maintained such that, at any point in time, ε-approximate quantiles for any Φ ∈ (0, 1] over the current contents of the sliding window can be computed from it. • The goal is to minimize the space required for maintaining this state.

  31-38. Overview [Figure-only slides: a step-by-step illustration over a window of size N; no further text survives]

  39. Details [Figure: the window of size N divided into blocks at several levels, labelled 0, 1, 2, ... up to log(1/ε); block size = O(εN)]

  40. Space Requirement: O((1/ε) log(1/ε) log N) • Space required for the GK algorithm = (1/ε) log(εN) • Space required for level-ℓ blocks = (size of a quantile sketch) × (number of "active" blocks) [per-level formula lost in extraction] • Total = O((1/ε) log(1/ε) log N)

  41. Conclusion • The work presented is better than the first method with respect to space. • This paper also provides a randomized quantile finding algorithm with further improvement in space.

  42. Any Questions?
