
Presented by: Sailesh Kumar

An Improved Construction for Counting Bloom Filters. Flavio Bonomi, Michael Mitzenmacher, Rina Panigrahy, Sushil Singh, George Varghese.


Presentation Transcript


  1. An Improved Construction for Counting Bloom Filters. Flavio Bonomi, Michael Mitzenmacher, Rina Panigrahy, Sushil Singh, George Varghese. Presented by: Sailesh Kumar

  2. Bloom Filter • Store a set S = {x1, x2, x3, …, xn} drawn from some universe U, so that we are able to answer queries of the form: Is x a member of S? • A Bloom filter is a technique that can answer this • Small amount of space, independent of element size • Constant query time • False positive probability (some probability of wrongly answering "yes") • Alternative to hashing with some interesting trade-offs

  3. Bloom Filter [Figure: element X is hashed by H1, H2, H3, H4, …, Hk to positions in the m-bit array; those positions hold 1]

  4. Bloom Filter [Figure: element Y is hashed by H1, H2, H3, H4, …, Hk to positions in the m-bit array; more positions now hold 1]

  5. Bloom Filter [Figure: query for X; every position given by H1, H2, H3, H4, …, Hk in the m-bit array holds 1, so the result is a match]

  6. Bloom Filter [Figure: query for W; every position given by H1, H2, H3, H4, …, Hk in the m-bit array happens to hold 1, so the result is a match (false positive)]
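
The insert and query steps illustrated on slides 3 to 6 can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' code: the class name `BloomFilter` is hypothetical, and the k hash functions H1 … Hk are simulated here by salting SHA-256 with the function index, which is an assumption of this sketch.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an m-bit array probed by k hash functions."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.bits = [0] * m

    def _positions(self, item: str):
        # Assumption: H1..Hk are simulated by salting SHA-256 with the
        # index of the hash function.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        # Insert: set every bit selected by the k hash functions.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # Query: True means "possibly in S" (false positives can occur),
        # False means "definitely not in S".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=64, k=4)
bf.add("X")
bf.add("Y")
```

After the two inserts, `"X" in bf` and `"Y" in bf` are both true. A query for an element that was never inserted, such as `"W"`, usually returns false but may return true if all of its positions were set by other elements.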

  7. How many Hash Functions? • k = no. of hash functions • n = total no. of elements • m = no. of bits in the array • Objective is to pick k so that we minimize the false positive probability (fpp) • It is fairly simple to derive that the optimum is k = (ln 2) · m/n • For the optimal k, the fpp is approx. (0.6185)^(m/n)

  8. How many Hash Functions? • Example: m/n = 8 • Optimal k = 8 ln 2 ≈ 5.5
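
The numbers on slides 7 and 8 can be checked with a short calculation. The sketch below assumes the usual approximation for the false positive probability, (1 - e^(-kn/m))^k, which is not written out on the slides; the function names are hypothetical.

```python
import math


def false_positive_rate(m: float, n: float, k: float) -> float:
    # Assumption: the standard approximation (1 - e^(-kn/m))^k.
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(m: float, n: float) -> float:
    # The minimizer quoted on slide 7: k = (ln 2) * m / n.
    return math.log(2) * m / n


m_over_n = 8
k_opt = optimal_k(m_over_n, 1)                       # about 5.5, as on slide 8
fpp_opt = false_positive_rate(m_over_n, 1, k_opt)    # compare with 0.6185 ** 8

# k must be an integer in practice, so compare the two neighbours of k_opt.
fpp_5 = false_positive_rate(m_over_n, 1, 5)
fpp_6 = false_positive_rate(m_over_n, 1, 6)
```

Both integer choices give a false positive rate slightly above the theoretical optimum, and the curve is flat near its minimum, so rounding k costs little.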

  9. Counting Bloom Filter • Bloom filters do not support deletes • Use a counting Bloom filter instead • Use counters instead of bits in the array • Instead of setting the bits, increment the counters • During a query, a counter > 0 means the corresponding bit is treated as set

  10. Counting Bloom Filter [Figure: element X is hashed by H1, H2, H3, H4, …, Hk to positions in the m-counter array; those counters hold 1]

  11. Counting Bloom Filter [Figure: element Y is hashed by H1, H2, H3, H4, …, Hk to positions in the m-counter array; counters hit more than once hold 2] • Deletes are straightforward: just decrement the counters
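
The counter-based variant from slides 9 to 11 can be sketched as follows. As before, the class name is hypothetical and the hash functions are simulated with salted SHA-256, an assumption of this sketch. Counters are plain Python integers here, so overflow of a fixed-width counter is not modelled.

```python
import hashlib


class CountingBloomFilter:
    """Minimal counting Bloom filter: m counters, k hash functions."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.counters = [0] * m

    def _positions(self, item: str):
        # Assumption: H1..Hk are simulated by salting SHA-256.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        # Increment instead of setting a bit.
        for pos in self._positions(item):
            self.counters[pos] += 1

    def remove(self, item: str) -> None:
        # Only valid for an item that was added earlier; deleting an item
        # that was never inserted would corrupt the counters.
        for pos in self._positions(item):
            self.counters[pos] -= 1

    def __contains__(self, item: str) -> bool:
        # A counter > 0 plays the role of a set bit.
        return all(self.counters[pos] > 0 for pos in self._positions(item))


cbf = CountingBloomFilter(m=64, k=4)
cbf.add("X")
cbf.add("Y")
cbf.remove("X")
```

After `cbf.remove("X")`, the element `"Y"` is still reported as present because its counters remain positive.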

  12. Improved Counting Bloom Filter • 4-bit counters ensure, with very high probability, that counters do not overflow • 4x increase in space compared to a Bloom filter • Construct an alternative counting Bloom filter that is twice as compact as the CBF • Based on d-left hashing and a fingerprinting technique • We need to understand d-left hashing and fingerprinting

  13. Fingerprinting • Temporarily assume that we have a perfect hash function h • Use some random function to compute c-bit fingerprints • F() : U -> [2^c] • False positive prob. = 1/2^c • 2x more compact than a Bloom filter • Not easy to compute the perfect hash function h • Use near-perfect hashing (d-left) instead [Figure: Elements 1 to 5 are mapped by h to distinct slots holding Fingerprint(4), Fingerprint(5), Fingerprint(2), Fingerprint(1), Fingerprint(3)]

  14. d-left hashing • Use d equal-sized tables • Use d different hash functions and choose one bucket from each table • A bucket can store multiple elements • Store the element in the least loaded bucket (break ties to the left) • Interesting properties: • Very small maximum load, O(log log n) • Maximum load is close to the average load even for small d such as 4 • 80% space utilization with d = 4
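
The placement rule on slide 14 can be tried out with a small simulation. This sketch assumes ideal, uniformly random hash functions (modelled with Python's `random` module) and only tracks bucket loads, not the stored elements; the function name and parameters are hypothetical.

```python
import random


def simulate_d_left(n_items: int, d: int, buckets_per_table: int, seed: int = 0):
    """Return load[i][b], the number of items in bucket b of table i."""
    rng = random.Random(seed)
    load = [[0] * buckets_per_table for _ in range(d)]
    for _ in range(n_items):
        # One candidate bucket per table, chosen independently.
        choices = [rng.randrange(buckets_per_table) for _ in range(d)]
        # min() returns the first minimum, so ties go to the leftmost table.
        best = min(range(d), key=lambda i: load[i][choices[i]])
        load[best][choices[best]] += 1
    return load


n = 4 * 2048 * 6  # an average of 6 items per bucket
d_left = simulate_d_left(n, d=4, buckets_per_table=2048)
single = simulate_d_left(n, d=1, buckets_per_table=4 * 2048)

max_d_left = max(max(table) for table in d_left)
max_single = max(max(table) for table in single)
```

With the same total number of buckets, the maximum load with d = 4 stays close to the average of 6, while a single hashed table (d = 1) shows a much larger maximum.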

  15. Improved Counting Bloom Filter • Use d-left hashing • d hash tables, each containing B buckets • Note that a bucket contains multiple cells; a cell can store a fingerprint and a small counter • In order to store an element, we compute its fingerprint • The fingerprint consists of two components • Bucket index: in [1, B] • Remainder: in [1, R], thus log2 R bits, stored explicitly • We use a separate bucket index for each table but identical remainders • Use the d-left insertion policy; augment each fingerprint with a counter; if the fingerprint already matches a stored one, increment its counter

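  16. Improved Counting Bloom Filter [Figure: two tables holding the stored remainders 5, 7, 7] • Each H(·) lists a (bucket, remainder) pair for table 1, then for table 2 • Element x: H(x) = (3, 7), (4, 7); we store the element in the first table • Element y: H(y) = (1, 5), (5, 5); we store the element in the first table • Element z: H(z) = (1, 7), (4, 7); we store the element in the second table • Now, if we try to delete x, both of its candidate cells hold remainder 7, so we do not know whether the fingerprint in table 1 or table 2 has to be removed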
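
The ambiguity on slide 16 can be reproduced directly from the values given there. The sketch below reads each pair as (bucket, remainder), which is an interpretation of the slide, and uses a hypothetical dictionary layout keyed by (table, bucket).

```python
# Cells occupied after inserting x, y and z, keyed by (table, bucket).
stored = {
    (1, 3): [7],  # x, stored in the first table
    (1, 1): [5],  # y, stored in the first table
    (2, 4): [7],  # z, stored in the second table
}

# Candidate (table, bucket, remainder) locations of x from H(x) = (3, 7), (4, 7).
x_candidates = [(1, 3, 7), (2, 4, 7)]

# A delete has to find the cell that belongs to x by matching the remainder.
matches = [
    (table, bucket)
    for table, bucket, remainder in x_candidates
    if remainder in stored.get((table, bucket), [])
]
```

Here `matches` contains both candidate locations, so the delete cannot tell x's own cell in table 1 from z's cell in table 2.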

  17. Improved Counting Bloom Filter • Solve the problem by breaking the hash operation into two phases • 1st phase: compute a single true fingerprint • 2nd phase: to obtain the d locations, apply permutations P1, …, Pd to the true fingerprint • A permutation of a set is a one-to-one map of the set onto itself • This simple modification enables proper delete operations

  18. Improved Counting Bloom Filter • Claim: When deleting an element in the set, only one remainder corresponding to the element will exist in the table. • Proof: • Suppose not. Then there is some element x ∈ S to be deleted whose remainder is stored in table j, and at the same time another element y ∈ S such that Pi(fx) = Pi(fy) for some i ≠ j. • Since the Pi are permutations, we must have fx = fy, so x and y share the same true fingerprint. • Suppose, without loss of generality, that x was inserted before y; in this case, when y was inserted, the counter in table j associated with the remainder of x would have been incremented instead of a new remainder being stored, contradicting our assumption.
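
The two-phase construction from slides 15 to 18 can be put together in a compact sketch. Several details below are assumptions of this illustration and not taken from the slides: the class name `DLeftCBF`, the use of SHA-256 for the true fingerprint, buckets modelled as Python dictionaries, and the choice of permutations Pi(f) = ai · f mod 2^q with odd ai, which is a permutation because odd numbers are invertible modulo a power of two. Counter overflow (a 2-bit counter in the slides) is not modelled.

```python
import hashlib
import random


class DLeftCBF:
    """Sketch of a d-left counting Bloom filter (dlCBF)."""

    def __init__(self, d=4, buckets=2048, remainder_bits=14, cells=8, seed=1):
        if buckets & (buckets - 1):
            raise ValueError("buckets must be a power of two in this sketch")
        self.R = 1 << remainder_bits
        self.size = buckets * self.R  # number of distinct true fingerprints
        self.cells = cells
        rng = random.Random(seed)
        # Assumption: P_i(f) = a_i * f mod 2^q with a random odd a_i.
        self.multipliers = [rng.randrange(1, self.size, 2) for _ in range(d)]
        # tables[i][b] maps remainder -> counter for the cells of one bucket.
        self.tables = [[{} for _ in range(buckets)] for _ in range(d)]

    def _locations(self, item: str):
        # Phase 1: a single true fingerprint (bucket index plus remainder).
        digest = hashlib.sha256(item.encode()).digest()
        f = int.from_bytes(digest[:8], "big") % self.size
        # Phase 2: one permuted fingerprint per table.
        locations = []
        for i, a in enumerate(self.multipliers):
            p = (a * f) % self.size
            locations.append((i, p // self.R, p % self.R))
        return locations

    def add(self, item: str) -> None:
        locations = self._locations(item)
        # If the fingerprint is already stored, only bump its counter.
        for i, b, r in locations:
            if r in self.tables[i][b]:
                self.tables[i][b][r] += 1
                return
        # Otherwise use the least loaded bucket, ties broken to the left.
        i, b, r = min(locations, key=lambda loc: len(self.tables[loc[0]][loc[1]]))
        if len(self.tables[i][b]) >= self.cells:
            raise OverflowError("bucket is full")
        self.tables[i][b][r] = 1

    def remove(self, item: str) -> None:
        # By the claim on slide 18, at most one location can match.
        for i, b, r in self._locations(item):
            bucket = self.tables[i][b]
            if r in bucket:
                bucket[r] -= 1
                if bucket[r] == 0:
                    del bucket[r]
                return
        raise KeyError(item)

    def __contains__(self, item: str) -> bool:
        return any(r in self.tables[i][b] for i, b, r in self._locations(item))


f = DLeftCBF(d=4, buckets=16, remainder_bits=8, cells=8)
for item in ["x", "y", "z", "x"]:
    f.add(item)
```

Because every Pi is a permutation, a stored cell in table i matches an element only if the two share the same true fingerprint, so `remove` always decrements the right counter. Removing the four inserted items again leaves every bucket empty.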

  19. Simulation Results • Target is fpp < 0.002 • dlCBF configuration • d = 4 tables with 2048 buckets each • Each bucket has 8 cells • Target load = 0.75 (6 items per bucket) • 14-bit fingerprint remainder r • 2-bit counter to handle identical fingerprints • Total size of structure = 2^20 bits; total items = 3 × 2^14 • CBF configuration • 13.5 counters per element (9 hash functions) • For 3 × 2^14 elements, we will need about 2.5 × 2^20 bits, 2.5 times the dlCBF
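
The sizes quoted on slide 19 follow from the configuration by simple arithmetic, which the sketch below reproduces. It assumes 4-bit counters for the CBF, as stated on slide 12; the variable names are hypothetical.

```python
# dlCBF configuration from slide 19.
d, buckets, cells = 4, 2048, 8
remainder_bits, counter_bits = 14, 2
target_load = 0.75

dlcbf_bits = d * buckets * cells * (remainder_bits + counter_bits)
items = int(d * buckets * cells * target_load)

# CBF configuration: 13.5 counters per element.
# Assumption: 4 bits per counter, as on slide 12.
cbf_bits = 13.5 * 4 * items

ratio = cbf_bits / dlcbf_bits
```

The dlCBF occupies exactly 2^20 bits for 3 × 2^14 items, and the CBF needs roughly 2.5 times as much space for the same number of elements.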

  20. Questions?
