Optimal Fast Hashing

Yossi Kanizo (Technion, Israel)
Joint work with Isaac Keslassy (Technion, Israel) and David Hay (Hebrew Univ., Israel)

Hash Tables for Networking Devices
Hash tables and hash-based structures are often used in high-speed devices:
• Heavy-hitter flow identification
• Flow state keeping
• Flow counter management
• Virus signature scanning
• IP address lookup algorithms

Hash Tables
• In theory, hash tables are particularly suitable: O(1) memory accesses per operation (element insertion/query/deletion) for reasonable load
• But in practice, there is a big difference between an average of 1.1 memory accesses per operation and an average of 4
• Why not only 1 memory access?
  • Collisions

Hash Tables for Networking Devices
• Collisions are unavoidable → wasted memory accesses
• For load ≤ 1, let a and d be the average and worst-case time (number of memory accesses) per element insertion
• Objective: minimize a and d
[Figure: elements hashed into a memory with buckets 1 to 9]

Why We Care
• On-chip memory: memory accesses → power consumption
• Off-chip memory: memory accesses → lost on/off-chip pin capacity
• Datacenters: memory accesses → network & server load
• Parallelism does not help reduce these costs
  • d serial or parallel memory accesses have the same cost

Traditional Hash Table Schemes
• Example 1: linked lists (chaining)
[Figure: elements 1 to 9 chained in a memory with buckets 1 to 9]

Traditional Hash Table Schemes
• Example 1: linked lists (chaining)
• Example 2: linear probing (open addressing)
• Problem: the worst-case time cannot be bounded by a constant d
[Figure: elements placed by linear probing in a memory with buckets 1 to 9]
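To make the unbounded worst case concrete, here is a minimal sketch of linear probing that counts memory accesses per insertion. The function names, the table size, and the use of a seeded random number as a stand-in for a hash function are assumptions for illustration, not part of the talk.

```python
import random

def linear_probe_insert(table, key, slot):
    """Insert key starting at slot; return the number of memory accesses used."""
    m = len(table)
    for probes in range(1, m + 1):
        idx = (slot + probes - 1) % m
        if table[idx] is None:
            table[idx] = key
            return probes
    raise RuntimeError("table is full")

def probe_counts(m, n, seed=0):
    """Insert n keys into m slots; a seeded RNG stands in for the hash function."""
    rng = random.Random(seed)
    table = [None] * m
    return [linear_probe_insert(table, key, rng.randrange(m)) for key in range(n)]

counts = probe_counts(m=1000, n=950)
average = sum(counts) / len(counts)   # plays the role of a
worst = max(counts)                   # plays the role of d, not bounded in advance
```

Comparing `average` with `worst` at high load shows the gap between the two quantities that the talk sets out to control.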

High-Speed Hardware
• Enable overflows: if time exceeds d → overflow list
  • Can be stored in expensive CAM
  • Otherwise, overflow elements = lost elements
• Bucket contains h elements
  • E.g.: 128-bit memory word → h=4 elements of 32 bits
• Assumption: access cost (read & write word) = 1 cycle
[Figure: memory with buckets 1 to 9 of height h, plus a CAM holding the overflow elements]

Possible Settings
• Static setting: insertions and queries only
• Dynamic setting: insertions, deletions, and queries
• Generalized setting: balancing between the buckets' load

Problem Formulation
Given the average a and worst-case d number of memory accesses per operation, minimize the overflow rate γ.
[Figure: memory with buckets 1 to 9 of height h, plus a CAM]

Example: Power of d-Random Choices
• d hash functions: pick the least loaded bucket
• Break ties u.a.r. [Azar et al.]
• Intuition: can reach low overflow … but average time a = worst-case time d → wasted memory accesses
[Figure: memory with buckets 1 to 9 of height h, plus a CAM]
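The d-random rule can be sketched as a small simulation. This is an illustrative reading of the slide, not the authors' code: the function names are hypothetical, random draws stand in for the d hash functions, and each insertion is charged d accesses because all d buckets are read.

```python
import random

def insert_d_random(buckets, h, d, rng):
    """Read d random buckets and use the least loaded; return (stored, accesses)."""
    choices = [rng.randrange(len(buckets)) for _ in range(d)]
    least = min(buckets[i] for i in choices)
    candidates = [i for i in choices if buckets[i] == least]
    target = rng.choice(candidates)      # ties broken uniformly at random
    if buckets[target] < h:
        buckets[target] += 1
        return True, d
    return False, d                      # all d buckets full: element overflows

def simulate_d_random(m, h, d, load, seed=0):
    """Return the overflow fraction after inserting load*m*h elements."""
    rng = random.Random(seed)
    buckets = [0] * m
    n = round(load * m * h)
    stored = sum(insert_d_random(buckets, h, d, rng)[0] for _ in range(n))
    return 1 - stored / n
```

Every insertion costs d accesses regardless of outcome, which is the a = d drawback noted above.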

Other Examples
• d-left [Vöcking]
  • Same as d-random, but break ties to the left
• Cuckoo [Pagh et al.]
  • Whenever a collision occurs, moves stored elements to their other choices
  • Typically uses much more than d memory accesses on average

Outline
• Static Case
  • Overflow Lower Bound
  • Optimal Schemes: SIMPLE, GREEDY, MHT
• Dynamic Case
  • Comparison with Static Case
  • Overflow Lower Bound
  • Overflow Fraction Depending on d

Overflow Lower Bound
Objective: given any online scheme with average a and worst-case d, find a lower bound on the overflow γ.
[Figure: lower-bound curve with the region marked "No scheme can achieve" (capacity region); h=4, load=n/(mh)=0.95, fixed d]

Overflow Lower Bound
• Result: closed-form lower-bound formula, given n elements in m buckets of height h
  • Valid also for non-uniform hashes
• For n=m and h=1, the bound is simply e^{-a}
• Defines a capacity region for high-throughput hashing
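The talk's closed-form expression is not reproduced here. As a hedged illustration only, the sketch below computes a bound of the same flavor from the relaxation described later in the talk (n elements with a accesses each are treated as na distinct elements), under the additional assumption that bucket occupancy is Poisson with mean a·c·h, where c = n/(mh). The function name and the Poisson approximation are assumptions, not the authors' formula.

```python
import math

def relaxed_overflow_bound(a, c, h):
    """Overflow bound 1 - E[min(h, X)] / (c*h), with X ~ Poisson(a*c*h)."""
    lam = a * c * h                 # mean number of accesses landing in one bucket
    pmf = math.exp(-lam)            # P(X = 0)
    expected = 0.0                  # accumulates E[min(h, X)]
    cdf = 0.0
    for k in range(h):
        expected += k * pmf
        cdf += pmf
        pmf *= lam / (k + 1)        # P(X = k + 1)
    expected += h * (1 - cdf)       # a bucket stores at most h elements
    return max(0.0, 1 - expected / (c * h))
```

For h=1 and c=1 this reduces to e^{-a}, matching the static-case figure quoted in the numerical example.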

Lower-Bound Example
For a 3% overflow rate, throughput can be at most 1/a = 2/3 of the memory rate.
[h=4, load=n/(mh)=0.95]

Overflow Lower Bound
Example: the d-left scheme achieves low overflow γ, but at a high average memory access rate a.
[h=4, load=n/(mh)=0.95, m=5,000]

The SIMPLE Scheme
• SIMPLE scheme: single hash function
• Looks like a truncated linked list
[Figure: memory with buckets 1 to 9 of height h; elements that do not fit go to the CAM]
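A minimal simulation sketch of SIMPLE, assuming a seeded random draw in place of the single hash function; the function name and parameters are illustrative.

```python
import random

def simulate_simple(m, h, load, seed=0):
    """One hash function, one memory access per element (a = 1)."""
    rng = random.Random(seed)
    buckets = [0] * m
    n = round(load * m * h)
    overflow = 0
    for _ in range(n):
        i = rng.randrange(m)        # the single hash value
        if buckets[i] < h:
            buckets[i] += 1
        else:
            overflow += 1           # bucket full: element goes to the overflow list
    return overflow / n
```

With h=1 and load 1, the returned fraction should be close to e^{-1}, roughly 0.37.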

Performance of SIMPLE Scheme
The lower bound can actually be achieved for a=1.
[h=4, load=0.95, m=5,000]

The GREEDY Scheme
Using uniform hashes, try to insert each element greedily until it is either inserted or d buckets have been tried.
[Figure: example with d=2; memory with buckets 1 to 9 of height h, plus a CAM]
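The greedy rule can be sketched as follows. This is a simplified reading of the slide with hypothetical names; random draws stand in for the d uniform hash functions, and any extra mechanism the authors use to cap the average a is not modeled.

```python
import random

def simulate_greedy(m, h, d, load, seed=0):
    """Try up to d buckets in turn; stop at the first one with free space."""
    rng = random.Random(seed)
    buckets = [0] * m
    n = round(load * m * h)
    accesses = overflow = 0
    for _ in range(n):
        for _attempt in range(d):
            i = rng.randrange(m)
            accesses += 1
            if buckets[i] < h:
                buckets[i] += 1
                break
        else:
            overflow += 1           # all d attempts hit full buckets
    return overflow / n, accesses / n   # (overflow fraction, average a)
```

Unlike d-random, an element that fits on its first try costs a single access, so the average a stays below d.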

Performance of GREEDY Scheme
The GREEDY scheme is always optimal until the cut-off point a_co.
[d=4, h=4, load=0.95, m=5,000]

Performance of GREEDY Scheme
Overflow rate worse than 4-left, but better throughput (1/a).
[d=4, h=4, load=0.95, m=5,000]

The MHT Scheme
MHT (Multi-Level Hash Table) [Broder & Karlin]: d successive subtables with their d hash functions.
[Figure: memory split into a 1st, 2nd, and 3rd subtable of height h, plus a CAM]
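A minimal sketch of the multi-level idea: each element tries the subtables in order and stops at the first bucket with room. The subtable sizes below (roughly halving) are a hypothetical choice for illustration; the talk only states that the sizes fall geometrically.

```python
import random

def simulate_mht(sizes, h, n, seed=0):
    """sizes[k] is the number of buckets in subtable k; each has its own hash."""
    rng = random.Random(seed)
    tables = [[0] * size for size in sizes]
    accesses = overflow = 0
    for _ in range(n):
        for table in tables:
            i = rng.randrange(len(table))   # stand-in for this subtable's hash
            accesses += 1
            if table[i] < h:
                table[i] += 1
                break
        else:
            overflow += 1                   # rejected by every subtable
    return overflow / n, accesses / n

# Hypothetical sizes summing to m = 5,000 buckets, with d = 4 subtables.
sizes = [2667, 1333, 667, 333]
gamma, a = simulate_mht(sizes, h=4, n=19000)
```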

Performance of MHT Scheme
• Optimality of MHT until cut-off point a_co(MHT)
• Proof that subtable sizes fall geometrically
  • Confirmed in simulations
• Overflow rate close to 4-left, with much better throughput (1/a)
[d=4, h=4, load=0.95, m=5,000]

Dynamic vs. Static
• Dynamic hash tables are harder to model than static ones [Kirsch et al.]
• But past studies show the same asymptotic behavior with infinite buckets (insertions only vs. alternations)
  • Traditional hashing using linked lists: maximum bucket size of approx. log n / log log n [Gonnet, 1981]
  • d-random, d-left schemes: maximum bucket size of log log n / log 2 + O(1) [Azar et al., 1994; Vöcking, 1999]
• As a designer, using the static model seems natural
  • Even if real-life devices have finite buckets

Degradation with Finite Buckets
• Finite buckets are used
• Surprising result: degradation in performance
[Figure: finite vs. infinite buckets 1 to 4, with H(1) = 3 and H(2) = 3; after "Remove 1", element "2" is lost although its corresponding bucket is empty]
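The example on this slide can be replayed in a few lines. The class below is a hypothetical sketch that assumes h=1 for the finite case and no CAM, so an element that finds its bucket full is simply lost.

```python
class BucketTable:
    """Single-hash table with buckets of height h and no overflow memory."""

    def __init__(self, m, h, hash_fn):
        self.buckets = [[] for _ in range(m)]
        self.h = h
        self.hash_fn = hash_fn

    def insert(self, key):
        bucket = self.buckets[self.hash_fn(key)]
        if len(bucket) < self.h:
            bucket.append(key)
            return True
        return False                      # bucket full: element is lost

    def delete(self, key):
        bucket = self.buckets[self.hash_fn(key)]
        if key in bucket:
            bucket.remove(key)

    def contains(self, key):
        return key in self.buckets[self.hash_fn(key)]

H = {1: 3, 2: 3}                          # H(1) = H(2) = 3, as on the slide
hash_fn = lambda key: H[key] - 1          # buckets 1..4 mapped to indices 0..3

finite = BucketTable(4, 1, hash_fn)
infinite = BucketTable(4, float("inf"), hash_fn)
for table in (finite, infinite):
    table.insert(1)
    table.insert(2)
    table.delete(1)
```

After the deletion, `finite.contains(2)` is false even though bucket 3 is empty, while `infinite.contains(2)` is true.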

Comparing Static and Dynamic
• Static setting: insertions only
  • n = number of elements
  • m = number of buckets
• Dynamic setting: alternations between element insertions and deletions of randomly chosen elements
  • Fixed load of c = n / (mh)
• Fair comparison
  • Given an average number of memory accesses a, minimize the overflow fraction γ

Overflow Lower Bound
• Closed-form overflow lower bound, expressed in terms of r = ach
• Also holds for non-uniformly distributed hash functions (under some constraints)
• The lower bound is tight (SIMPLE, GREEDY)

Numerical Example
• For h=1 and c=1 (100% load), we get a lower bound of 1/(1+a)
  • To get an overflow fraction of 1%, one needs at least 99 memory accesses per element
  • Infeasible for high-speed networking devices
• Compared to a tight upper bound of e^{-a} in the static case [Kanizo et al., INFOCOM 2009]
  • Needs ~4.6 memory accesses
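The two figures follow from solving each bound for a at a 1% overflow target. A short check, with variable names chosen for illustration:

```python
import math

target = 0.01                               # desired overflow fraction (1%)

# Dynamic case: 1 / (1 + a) <= target  =>  a >= 1 / target - 1
dynamic_min_a = 1 / target - 1

# Static case: e^(-a) <= target  =>  a >= ln(1 / target)
static_min_a = math.log(1 / target)
```

Here `dynamic_min_a` evaluates to 99 and `static_min_a` to about 4.6.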

Overflow Fraction Depending on d
• So far, we relaxed the constraint on d
  • We considered n elements with an average of a memory accesses as na distinct elements
• To take d into account, we must consider each element along with its own hash values

Graph Theory Approach
• Consider a bipartite graph
  • Left vertices = elements
  • Right vertices = buckets (assume h=1)
  • Edge = the bucket is one of the element's d choices

Graph Theory Approach
• We get a random bipartite graph where each left vertex has degree d
• Expected maximum size matching = expected number of elements that can be inserted into the table, that is, a lower bound
• We derived an explicit expression for d=2
• Upper bound can be achieved by Cuckoo hashing (equivalent to finding a maximum size matching)
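The matching view can be explored numerically. The sketch below builds a random bipartite graph with d choices per element (h=1) and computes a maximum matching with the standard augmenting-path method (Kuhn's algorithm). The function names are hypothetical, random draws replace hash functions, and this is not the authors' explicit expression for d=2.

```python
import random

def max_matching(choices, m):
    """choices[e] lists the buckets of element e; returns the matching size."""
    owner = [None] * m                    # owner[b] = element stored in bucket b

    def try_place(e, seen):
        for b in choices[e]:
            if b in seen:
                continue
            seen.add(b)
            # Use a free bucket, or move its current owner to another choice.
            if owner[b] is None or try_place(owner[b], seen):
                owner[b] = e
                return True
        return False

    return sum(try_place(e, set()) for e in range(len(choices)))

def overflow_with_d_choices(n, m, d, seed=0):
    """Fraction of elements left out of a maximum matching (best possible)."""
    rng = random.Random(seed)
    choices = [[rng.randrange(m) for _ in range(d)] for _ in range(n)]
    return 1 - max_matching(choices, m) / n
```

The displacement step in `try_place` mirrors what Cuckoo hashing does when it moves a stored element to one of its other choices. The recursion depth is bounded by the number of buckets, so keep m well below Python's recursion limit.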

Summary
• We found lower and upper bounds on the achievable overflow fraction, for both the static and dynamic cases
• Static models are not necessarily exact with dynamic hash tables
• Improved lower bound for d=2 and a characterization of the performance of Cuckoo hashing

Thank you.
