History-Independent Cuckoo Hashing

Gil Segev and Moni Naor (Weizmann Institute, Israel), Udi Wieder (Microsoft Research Silicon Valley)

Election Day
• Elections for class president
• Each student whispers in Mr. Drew’s ear
• Mr. Drew writes down the votes
• Problem: Mr. Drew’s notebook leaks sensitive information
  • First student voted for Carol
  • Second student voted for Alice
  • …
• May compromise the privacy of the elections
[Figure: Mr. Drew’s notebook, listing the votes in arrival order: Carol, Alice, Alice, Bob]

Election Day
• What about more involved applications?
  • Write-in candidates
  • Votes which are subsets or rankings
  • …
• A simple solution:
  • Lexicographically sorted list of candidates
  • Unary counters
[Figure: the votes Carol, Alice, Alice, Bob stored as the sorted tally Alice 1 1, Bob 1, Carol 1]
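The simple solution on this slide can be sketched in Python as follows. This is only an illustration: the names CanonicalTally and tally are hypothetical, and a string of "1" marks stands in for the unary counter.

```python
from bisect import bisect_left


class CanonicalTally:
    """Votes kept as a lexicographically sorted list of [name, unary counter]."""

    def __init__(self):
        self.rows = []  # always sorted by candidate name

    def vote(self, name):
        names = [row[0] for row in self.rows]
        i = bisect_left(names, name)
        if i < len(self.rows) and self.rows[i][0] == name:
            self.rows[i][1] += "1"            # one more unary mark
        else:
            self.rows.insert(i, [name, "1"])  # new (possibly write-in) candidate

    def memory(self):
        """What an observer of the notebook would see."""
        return [tuple(row) for row in self.rows]


def tally(votes):
    t = CanonicalTally()
    for v in votes:
        t.vote(v)
    return t.memory()


# Two different voting orders leave the same notebook behind.
print(tally(["Carol", "Alice", "Alice", "Bob"]))
print(tally(["Alice", "Bob", "Alice", "Carol"]))
```

Because the notebook depends only on the multiset of votes, it reveals nothing about who voted first.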

Learning From History
• The two levels of a data structure
  • “Legitimate” interface
  • Memory representation
• History independence: the memory representation should not reveal information that cannot be obtained using the legitimate interface
• A simple example: sorted list
  • Canonical memory representation
  • Not really efficient...
[Figure: sorted list Alice, Bob, Carol]

Typical Applications
• Incremental cryptography [BGG94, Mic97]
• Voting [MKSW06, MNS07]
• Set comparison & reconciliation [MNS08]
• Computational geometry [BGV08]
• ...

Our Contribution
A history-independent (HI) dictionary that simultaneously achieves the following:
• Efficiency:
  • Lookup time: O(1) worst case
  • Update time: O(1) expected amortized
  • Memory utilization 50% (25% with deletions)
• Strongest notion of history independence
• Simple and fast

Notions of History Independence
Naor and Teague (2001), following Micciancio (1997)
• Weak history independence
  • Memory revealed at the end of an activity period
  • Any two sequences of operations S1 and S2 that lead to the same content induce the same distribution on the memory representation
• Strong history independence
  • Memory revealed several times during an activity period
  • Any two sets of breakpoints along S1 and S2 with the same content at each breakpoint induce the same distributions on the memory representation at all these points
  • Completely randomizing memory after each operation is not good enough
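The two notions can be written out symbolically. The notation is an assumption introduced here, not taken from the slides: C(S) is the content reached by running the operation sequence S from the empty structure, S^{(i)} is the prefix of S of length i, M(S) is the (random) memory representation after S, and ≡ means "identically distributed".

```latex
% Weak history independence (WHI)
C(S_1) = C(S_2) \;\Longrightarrow\; M(S_1) \equiv M(S_2)

% Strong history independence (SHI):
% for all breakpoints i_1 < \dots < i_\ell in S_1 and j_1 < \dots < j_\ell in S_2
\bigl(\forall k:\; C(S_1^{(i_k)}) = C(S_2^{(j_k)})\bigr)
\;\Longrightarrow\;
\bigl(M(S_1^{(i_1)}), \dots, M(S_1^{(i_\ell)})\bigr)
\equiv
\bigl(M(S_2^{(j_1)}), \dots, M(S_2^{(j_\ell)})\bigr)
```

SHI compares joint distributions over all breakpoints, whereas WHI is the special case of a single breakpoint at the end of each sequence.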

Notions of History Independence
• We consider strong history independence
• Canonical representation (up to initial randomness) implies SHI
  • Other direction shown to hold for reversible data structures [HHMPR05]
• Weak & strong are not equivalent
  • WHI for reversible data structures is possible without a canonical representation
  • Provable efficiency gaps [BP06] (in restricted models)

SHI Dictionaries

Scheme                    Update time     Lookup time       Memory utilization
Naor & Teague ‘01         O(1) expected   O(1) worst case   99% (mem. util. < 50%)
Blelloch & Golovin ‘07    O(1) expected   O(1) expected     99% (mem. util. < 50%)
Blelloch & Golovin ‘07    O(1) expected   O(1) worst case   < 9%
This work                 O(1) expected   O(1) worst case   < 25% (< 50%)

[The marks in the “Deletions” and “Practical?” columns did not survive extraction; only a “?” against the first Blelloch & Golovin row remains.]

Our Approach
• Cuckoo hashing [PR01]: a simple & practical scheme with worst-case constant lookup time
• Force a canonical representation on cuckoo hashing
  • No significant loss in efficiency
• Avoid rehashing by using a small stash
• What happens when hash functions fail?
  • Rehashing is problematic in SHI data structures
    • All hash functions need to be sampled in advance (theoretical problem)
    • When an item is deleted, may need to roll back to previous functions
  • We use a secondary storage to reduce the failure probability exponentially [KMW08]

Cuckoo Hashing
• Tables T1 and T2 with hash functions h1 and h2
• Store x in one of T1[h1(x)] and T2[h2(x)]
Insert(x):
• Greedily insert in T1 or T2
• If both are occupied then store x in T1
• Repeat in other table with the previous occupant
[Figure: tables T1 and T2 before and after a successful insertion]

Cuckoo Hashing
• Tables T1 and T2 with hash functions h1 and h2
• Store x in one of T1[h1(x)] and T2[h2(x)]
Insert(x):
• Greedily insert in T1 or T2
• If both are occupied then store x in T1
• Repeat in other table with the previous occupant
[Figure: an insertion into T1 and T2 that fails. Failure: rehash required]
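The Insert(x) procedure on these slides can be sketched in Python as follows. This is plain cuckoo hashing, not yet the history-independent variant. The class name CuckooTable, the max_kicks cutoff used to detect failure, and the 0-based hash ranges are assumptions made for the sketch.

```python
class CuckooTable:
    """Two tables of r cells each; x may live only in T1[h1(x)] or T2[h2(x)]."""

    def __init__(self, r, h1, h2, max_kicks=32):
        self.h = (h1, h2)                      # hash functions into range(r)
        self.T = ([None] * r, [None] * r)      # T[0] is T1, T[1] is T2
        self.max_kicks = max_kicks             # assumed cutoff for giving up

    def lookup(self, x):
        # Worst-case constant time: exactly two probes.
        return any(self.T[i][self.h[i](x)] == x for i in (0, 1))

    def insert(self, x):
        """Return True on success, False when a rehash would be required."""
        if self.lookup(x):
            return True
        # Greedy step: take a free candidate cell if there is one.
        for i in (0, 1):
            pos = self.h[i](x)
            if self.T[i][pos] is None:
                self.T[i][pos] = x
                return True
        # Both occupied: store x in T1, then move the previous occupant
        # to its cell in the other table, and repeat.
        side = 0
        for _ in range(self.max_kicks):
            pos = self.h[side](x)
            x, self.T[side][pos] = self.T[side][pos], x
            if x is None:
                return True
            side = 1 - side
        return False  # the last evicted item is left without a cell


h1 = lambda x: x % 4            # toy hash functions, for illustration only
h2 = lambda x: (x // 4) % 4
t = CuckooTable(4, h1, h2)
for key in (0, 5, 4, 20):       # inserting 20 evicts 0 from T1 into T2
    t.insert(key)
```

In this sketch the final layout depends on the insertion order, which is exactly what the canonical representation is meant to remove.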

The Cuckoo Graph
• Set S ⊆ U containing n keys
• h1, h2 : U → {1,...,r}
• Bipartite graph with sets of size r
• Edge (h1(x), h2(x)) for every x ∈ S
• S is successfully stored if and only if every connected component has at most one cycle
• Main theorem: if r ≥ (1 + ε)n and h1, h2 are log(n)-wise independent, then the failure probability is Θ(1/n)
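The storability condition on this slide can be checked with a union-find pass over the edges. The sketch below is an illustration under stated assumptions: the function name storable is hypothetical, and the hash functions are taken to map into 0..r-1 rather than 1..r.

```python
def storable(keys, h1, h2, r):
    """True iff every component of the cuckoo graph has at most one cycle."""
    parent = list(range(2 * r))      # nodes 0..r-1 are T1 cells, r..2r-1 are T2 cells
    has_cycle = [False] * (2 * r)    # meaningful at component roots only

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for x in keys:
        a, b = find(h1(x)), find(r + h2(x))
        if a == b:
            if has_cycle[a]:
                return False         # a second cycle in the same component
            has_cycle[a] = True      # this edge closes the first cycle
        else:
            if has_cycle[a] and has_cycle[b]:
                return False         # merged component would hold two cycles
            parent[a] = b
            has_cycle[b] = has_cycle[a] or has_cycle[b]
    return True
```

Equivalently, a component is fine as long as it has no more edges (keys) than nodes (cells), since each key needs a cell of its own.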

The Canonical Representation
• Assume that S can be stored using h1 and h2
• We force a canonical representation on the cuckoo graph
  • Suffices to consider a single connected component
• Assume that S forms a tree in the cuckoo graph (the typical case)
  • One location must be empty
  • The choice of the empty location uniquely determines the location of all elements
• Rule: h1(minimal element) is empty
[Figure: a tree component with elements a, b, c, d, e]
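For the tree case, the rule on this slide can be sketched as a traversal outward from the empty cell. This is an illustrative batch computation, not the incremental update procedure; the function name canonical_tree_layout and the ("T1", i) / ("T2", i) cell labels are assumptions.

```python
from collections import defaultdict, deque


def canonical_tree_layout(keys, h1, h2):
    """Canonical cell for each key, assuming the keys form one tree component."""
    adj = defaultdict(list)                  # cell -> [(neighbouring cell, key)]
    for x in keys:
        u, v = ("T1", h1(x)), ("T2", h2(x))
        adj[u].append((v, x))
        adj[v].append((u, x))

    root = ("T1", h1(min(keys)))             # rule: this cell stays empty
    layout, seen, queue = {}, {root}, deque([root])
    while queue:
        u = queue.popleft()
        for v, x in adj[u]:
            if v not in seen:
                seen.add(v)
                layout[v] = x                # x takes its endpoint farther from the root
                queue.append(v)
    return layout
```

Since a tree has a unique path from the empty cell to every other cell, the result depends only on the set of keys and the hash functions, not on the order in which the keys are listed.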

The Canonical Representation
• Assume that S can be stored using h1 and h2
• We force a canonical representation on the cuckoo graph
  • Suffices to consider a single connected component
• Assume that S has one cycle
  • Two ways to assign elements in the cycle
  • Each choice uniquely determines the location of all elements
• Rule: minimal element in cycle lies in T1
[Figure: a component with one cycle, elements a, b, c, d, e]

The Canonical Representation
• Updates efficiently maintain the canonical representation
• Insertions:
  • New leaf: check if new element is smaller than current min
  • New cycle:
    • Same component…
    • Merging two components…
  • All cases straightforward
• Deletions:
  • Find the new min, split component, …
  • Requires connecting all elements in the component with a sorted cyclic list
    • Memory utilization drops to 25%
  • All cases straightforward
• Update time < size of component = expected (small) constant

Rehashing
• What if S cannot be stored using h1 and h2?
  • Happens with probability Θ(1/n)
• Can we simply pick new functions?
  • Rare, but very bad worst-case performance
  • Canonical memory implies we need to sample all hash functions in advance (theoretical problem)
  • Whenever an item is deleted, need to check whether we must roll back to previous hash functions
  • A bad item which is repeatedly inserted and deleted would cause a rehash every operation!

Using a Stash
• Whenever an insert fails, put a ‘bad’ item in a secondary data structure
  • Bad item: smallest item that belongs to a cycle
  • Secondary data structure must be SHI in itself
• Theorem [KMW08]: Pr[|stash| > s] < n^(-s)
• In practice keeping the stash as a sorted list is probably the best solution
  • Effectively the query time is constant with (very) high probability
• In theory the stash could be any SHI data structure with constant lookup time
  • A deterministic hashing scheme, where the elements are rehashed whenever the content changes [AN96, HMP01]
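The sorted-list stash suggested on this slide might look like the following sketch. The names SortedStash and lookup are hypothetical, and the rule for choosing which item goes to the stash is left out.

```python
from bisect import bisect_left


class SortedStash:
    """Stash kept as a sorted list, so its layout depends only on its content."""

    def __init__(self):
        self.items = []

    def _find(self, x):
        i = bisect_left(self.items, x)
        return i, i < len(self.items) and self.items[i] == x

    def add(self, x):
        i, present = self._find(x)
        if not present:
            self.items.insert(i, x)

    def remove(self, x):
        i, present = self._find(x)
        if present:
            del self.items[i]

    def __contains__(self, x):
        return self._find(x)[1]


def lookup(x, T1, T2, h1, h2, stash):
    """Probe the two candidate cells, then fall back to the stash."""
    return T1[h1(x)] == x or T2[h2(x)] == x or x in stash
```

Taking the slide's bound at face value, with n = 10^6 keys the probability that the stash holds more than 2 items is below 10^-12, so the extra binary search over the stash is almost always over a very short list.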

Conclusions and Problems
• Cuckoo hashing is a robust and flexible hashing scheme
  • Easily ‘molded’ into a history independent data structure
• We don’t know how to do this for CH with more than 2 hash functions and/or more than 1 element per bucket
  • Better memory utilization, better performance, but...
  • Expected size of connected component is not constant
• Full performance analysis
