CS 361A (Advanced Data Structures and Algorithms)
Lecture 15 (Nov 14, 2005): Hashing for Massive/Streaming Data
Rajeev Motwani
New Topic: novel hashing techniques + randomized data structures, motivated by massive/streaming data applications
Game Plan
(Synopsis Data Structures)
Memory: poly(1/ε, log N)
Query/Update Time: poly(1/ε, log N)
N: # items so far, or window size
Example queries on a frequency histogram over elements 1 to 20:
Top-k most frequent elements
Find elements that occupy 0.1% of the tail
Mean + Variance?
Find all elements with frequency > 0.1%
What is the frequency of element 3?
What is the total frequency of elements between 8 and 14?
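To pin down what four of the queries above ask, here is a minimal exact baseline in Python. It stores the full frequency table, which is exactly what a synopsis structure is meant to avoid, so treat it only as a reference for the definitions. The stream contents, the function names, and the reading of "frequency > 0.1%" as a fraction of the stream length are assumptions made for illustration.

```python
from collections import Counter


def top_k(counts, k):
    """Return the k most frequent elements, most frequent first."""
    return [item for item, _ in counts.most_common(k)]


def heavy_hitters(counts, threshold):
    """Return all elements whose count exceeds threshold * (stream length)."""
    n = sum(counts.values())
    return sorted(x for x, c in counts.items() if c > threshold * n)


def point_query(counts, x):
    """Return the frequency of a single element (0 if never seen)."""
    return counts[x]


def range_query(counts, lo, hi):
    """Return the total frequency of elements in the closed range [lo, hi]."""
    return sum(c for x, c in counts.items() if lo <= x <= hi)


# Hypothetical toy stream over the domain 1..20 (12 items in total).
stream = [3] * 5 + [8] * 3 + [10] * 2 + [14] + [20]
counts = Counter(stream)

print(top_k(counts, 2))            # [3, 8]
print(heavy_hitters(counts, 0.2))  # [3, 8]
print(point_query(counts, 3))      # 5
print(range_query(counts, 8, 14))  # 6
```

This baseline uses memory proportional to the number of distinct elements; the point of the streaming setting is to answer the same questions approximately within poly(1/ε, log N) space.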
Analytics on Packet Headers – IP Addresses
How many elements have non-zero frequency?
… and thereby understand model definitions
(e.g., when counting distinct words in a web crawl)
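As a rough illustration of how hashing can count distinct elements in small memory, the sketch below uses the k-minimum-values idea: hash every item to a pseudo-random point in [0, 1), keep only the k smallest distinct hash values, and estimate the count from the k-th smallest. These lines do not say which estimator the lecture develops, so this is one well-known technique swapped in for illustration. The names `hash01` and `estimate_distinct`, the use of SHA-256 as the hash function, and the default k are all assumptions.

```python
import hashlib
import heapq


def hash01(item):
    """Map an item to a pseudo-random float in [0, 1) (SHA-256 is an assumption)."""
    digest = hashlib.sha256(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_distinct(stream, k=64):
    """Estimate the number of distinct items using O(k) memory."""
    heap = []        # negated hash values: heap[0] is minus the largest one kept
    members = set()  # hash values currently kept, to skip repeated items
    for item in stream:
        h = hash01(item)
        if h in members:
            continue  # this value is already among the k smallest
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # h beats the largest kept value: insert it and evict that value.
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct hashes seen: count is exact
    # If d distinct hashes are spread uniformly over [0, 1), the k-th smallest
    # is near k/d, so (k - 1) / (k-th smallest) estimates d.
    return (k - 1) / (-heap[0])


words = ["the", "web", "the", "crawl", "web", "the"]
print(estimate_distinct(words))  # 3 (exact, since fewer than k distinct words)
```

Repeated items hash to the same value and are ignored, so the estimate depends only on the set of distinct items, not on how often each one occurs. Larger k trades memory for accuracy.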
(need to hold 1 block from each group in memory)
Theorem: For any …, E has relative error … with probability at least ….
P[ X(i)=V | X(1)=X(2)=…=X(i-1)=V ]
Theorem: Deterministic algorithms need M = Ω(n log m)