
History-Independent Cuckoo Hashing

Gil Segev, Moni Naor, Udi Wieder. Weizmann Institute, Israel; Microsoft Research Silicon Valley.


Presentation Transcript


  1. History-Independent Cuckoo Hashing. Gil Segev, Moni Naor, Udi Wieder. Weizmann Institute, Israel; Microsoft Research Silicon Valley

  2. Election Day
     • Elections for class president
     • Each student whispers in Mr. Drew’s ear
     • Mr. Drew writes down the votes
     [Figure: Mr. Drew’s notebook listing the votes in order: Carol, Alice, Alice, Bob]
     • Problem: Mr. Drew’s notebook leaks sensitive information
       • First student voted for Carol
       • Second student voted for Alice
       • …
     • May compromise the privacy of the elections

  3. Election Day
     • What about more involved applications?
       • Write-in candidates
       • Votes which are subsets or rankings
       • …
     • A simple solution:
       • Lexicographically sorted list of candidates
       • Unary counters
     [Figure: the votes Carol, Alice, Alice, Bob stored as Alice: 1 1, Bob: 1, Carol: 1]
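The simple solution on this slide can be sketched in a few lines of Python. This is only an illustration: the class name `SortedTally` and the helper `tally` are made up here, and the unary counter is modelled as an integer that is rendered as a string of 1s.

```python
import bisect


class SortedTally:
    """Vote tally whose layout depends only on the current counts (hypothetical name)."""

    def __init__(self):
        self.names = []   # candidate names, always in lexicographic order
        self.counts = []  # counts[i] is the number of votes for names[i]

    def vote(self, candidate):
        # Binary search for the candidate's slot in the sorted list.
        i = bisect.bisect_left(self.names, candidate)
        if i < len(self.names) and self.names[i] == candidate:
            self.counts[i] += 1
        else:
            # New (for example write-in) candidate: insert at the sorted position.
            self.names.insert(i, candidate)
            self.counts.insert(i, 1)

    def layout(self):
        # Render each counter in unary, as on the slide.
        return [(name, "1" * count) for name, count in zip(self.names, self.counts)]


def tally(votes):
    t = SortedTally()
    for v in votes:
        t.vote(v)
    return t.layout()


if __name__ == "__main__":
    print(tally(["Carol", "Alice", "Alice", "Bob"]))
    print(tally(["Alice", "Bob", "Carol", "Alice"]))
```

Both calls print the same layout, so the notebook no longer reveals who voted first.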

  4. Learning From History
     • The two levels of a data structure
       • “Legitimate” interface
       • Memory representation
     • History independence: the memory representation should not reveal information that cannot be obtained using the legitimate interface
     • A simple example: sorted list (Alice, Bob, Carol)
       • Canonical memory representation
       • Not really efficient...
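To make the sorted-list example concrete, the sketch below runs two different operation histories that end with the same set. The function names and the two histories are invented for illustration. An append-order list keeps traces of the history, while the sorted list ends in one canonical layout.

```python
import bisect


def run_append_list(ops):
    """Non-canonical representation: items are kept in arrival order."""
    mem = []
    for op, x in ops:
        if op == "insert" and x not in mem:
            mem.append(x)
        elif op == "delete" and x in mem:
            mem.remove(x)
    return mem


def run_sorted_list(ops):
    """Canonical representation: the layout is determined by the content alone."""
    mem = []
    for op, x in ops:
        i = bisect.bisect_left(mem, x)
        present = i < len(mem) and mem[i] == x
        if op == "insert" and not present:
            mem.insert(i, x)
        elif op == "delete" and present:
            mem.pop(i)
    return mem


# Two hypothetical histories with the same final content {Alice, Bob, Carol}.
history_a = [("insert", "Carol"), ("insert", "Alice"), ("insert", "Bob")]
history_b = [("insert", "Bob"), ("insert", "Dave"), ("insert", "Alice"),
             ("delete", "Dave"), ("insert", "Carol")]

if __name__ == "__main__":
    print(run_append_list(history_a), run_append_list(history_b))
    print(run_sorted_list(history_a), run_sorted_list(history_b))
```

The price is the one the slide names: every update of the sorted list may shift a linear number of entries.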

  5. Typical Applications
     • Incremental cryptography [BGG94, Mic97]
     • Voting [MKSW06, MNS07]
     • Set comparison & reconciliation [MNS08]
     • Computational geometry [BGV08]
     • ...

  6. Our Contribution
     A history-independent (HI) dictionary that simultaneously achieves the following:
     • Efficiency:
       • Lookup time: O(1) worst case
       • Update time: O(1) expected amortized
       • Memory utilization: 50% (25% with deletions)
     • Strongest notion of history independence
     • Simple and fast

  7. Notions of History Independence
     Naor and Teague (2001), following Micciancio (1997)
     • Weak history independence
       • Memory revealed at the end of an activity period
       • Any two sequences of operations S1 and S2 that lead to the same content induce the same distribution on the memory representation
     • Strong history independence
       • Memory revealed several times during an activity period
       • Any two sets of breakpoints along S1 and S2 with the same content at each breakpoint induce the same distributions on the memory representation at all these points
       • Completely randomizing memory after each operation is not good enough
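The two notions can be written a little more formally. The notation is introduced here for illustration and does not appear on the slides: C(S) is the content after running the sequence S, M(S) is the (random) memory representation after S, S^{(i)} is the prefix consisting of the first i operations of S, and ≡ means "identically distributed".

```latex
% Weak history independence
C(S_1) = C(S_2) \;\Longrightarrow\; M(S_1) \equiv M(S_2)

% Strong history independence: for breakpoints
% i_1 < \dots < i_\ell in S_1 and j_1 < \dots < j_\ell in S_2,
C\bigl(S_1^{(i_k)}\bigr) = C\bigl(S_2^{(j_k)}\bigr) \ \text{for all } k
\;\Longrightarrow\;
\Bigl(M\bigl(S_1^{(i_1)}\bigr), \dots, M\bigl(S_1^{(i_\ell)}\bigr)\Bigr)
\equiv
\Bigl(M\bigl(S_2^{(j_1)}\bigr), \dots, M\bigl(S_2^{(j_\ell)}\bigr)\Bigr)
```

The strong notion compares joint distributions over all breakpoints. This is why re-randomizing memory after each operation is not enough: two snapshots of the same content would differ and reveal that operations took place in between.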

  8. Notions of History Independence
     • We consider strong history independence (SHI)
     • Canonical representation (up to initial randomness) implies SHI
       • Other direction shown to hold for reversible data structures [HHMPR05]
     • Weak & strong are not equivalent
       • WHI for reversible data structures is possible without a canonical representation
       • Provable efficiency gaps [BP06] (in restricted models)

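  9. SHI Dictionaries
     Table columns: memory utilization, update time, lookup time, deletions, practical?
     • Naor & Teague ‘01: update O(1) expected; lookup O(1) worst case; memory utilization 99% (mem. util. < 50%)
     • Blelloch & Golovin ‘07: update O(1) expected; lookup O(1) expected; memory utilization 99% (mem. util. < 50%); practical: ?
     • Blelloch & Golovin ‘07: update O(1) expected; lookup O(1) worst case; memory utilization < 9%
     • This work: update O(1) expected; lookup O(1) worst case; memory utilization < 25% (< 50%)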

  10. Our Approach
      • Cuckoo hashing [PR01]: a simple & practical scheme with worst-case constant lookup time
      • Force a canonical representation on cuckoo hashing
        • No significant loss in efficiency
      • Avoid rehashing by using a small stash
      • What happens when hash functions fail?
        • Rehashing is problematic in SHI data structures
        • All hash functions need to be sampled in advance (theoretical problem)
        • When an item is deleted, may need to roll back to previous functions
      • We use a secondary storage to reduce the failure probability exponentially [KMW08]

  11. Cuckoo Hashing
      • Tables T1 and T2 with hash functions h1 and h2
      • Store x in one of T1[h1(x)] and T2[h2(x)]
      Insert(x):
      • Greedily insert in T1 or T2
      • If both are occupied then store x in T1
      • Repeat in other table with the previous occupant
      [Figure: tables T1 and T2 holding V, W, X, Y, Z before and after a successful insertion]

  12. Cuckoo Hashing
      • Tables T1 and T2 with hash functions h1 and h2
      • Store x in one of T1[h1(x)] and T2[h2(x)]
      Insert(x):
      • Greedily insert in T1 or T2
      • If both are occupied then store x in T1
      • Repeat in other table with the previous occupant
      [Figure: tables T1 and T2 holding U, V, X, Y, Z; failure, rehash required]
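The insertion procedure on the two Cuckoo Hashing slides can be sketched as follows. This is plain (not yet history-independent) cuckoo hashing. The class name, the `max_kicks` cutoff used to detect failure, and the convention of returning the item left without a cell are assumptions made for this example. Cells are indexed from 0.

```python
class CuckooTable:
    """Minimal two-table cuckoo hashing sketch (hypothetical class, not the authors' code)."""

    def __init__(self, r, h1, h2, max_kicks=32):
        self.tables = [[None] * r, [None] * r]  # T1 and T2
        self.hashes = [h1, h2]                  # h1 and h2 map keys to 0..r-1
        self.max_kicks = max_kicks              # assumed cutoff before declaring failure

    def lookup(self, x):
        # Worst-case constant time: x can only live in one of two cells.
        return (self.tables[0][self.hashes[0](x)] == x
                or self.tables[1][self.hashes[1](x)] == x)

    def insert(self, x):
        """Return None on success, or the item left without a cell (rehash required)."""
        if self.lookup(x):
            return None
        # Greedy step: use either candidate cell if it is free.
        for side in (0, 1):
            pos = self.hashes[side](x)
            if self.tables[side][pos] is None:
                self.tables[side][pos] = x
                return None
        # Both occupied: store x in T1 and move the previous occupant to the
        # other table, repeating until a free cell is found.
        side = 0
        for _ in range(self.max_kicks):
            pos = self.hashes[side](x)
            x, self.tables[side][pos] = self.tables[side][pos], x
            if x is None:
                return None
            side = 1 - side
        return x


if __name__ == "__main__":
    t = CuckooTable(4, lambda x: x % 4, lambda x: (x // 4) % 4)
    for key in (0, 1, 4, 5):
        print(key, t.insert(key))
    print(t.tables)
```

Note that the final layout of this plain version depends on the insertion order, which is exactly what the canonical representation later in the talk removes.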

  13. The Cuckoo Graph
      • Set S ⊆ U containing n keys
      • h1, h2 : U → {1,...,r}
      • Bipartite graph with sets of size r
      • Edge (h1(x), h2(x)) for every x ∈ S
      • S is successfully stored if and only if every connected component has at most one cycle
      • Main theorem: if r ≥ (1 + ε)n and h1, h2 are log(n)-wise independent, then the failure probability is Θ(1/n)
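The storability condition can be checked directly: a connected component has at most one cycle exactly when it has no more edges (keys) than vertices (cells). The sketch below uses a small union-find for this. The function name `is_storable` is hypothetical, keys are assumed distinct, and cells are indexed 0..r-1 rather than 1..r.

```python
def is_storable(keys, h1, h2, r):
    """True if every component of the cuckoo graph has at most one cycle."""
    # Vertices 0..r-1 are the cells of T1, vertices r..2r-1 the cells of T2.
    parent = list(range(2 * r))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # Each key x is an edge between T1[h1(x)] and T2[h2(x)].
    for x in keys:
        a, b = find(h1(x)), find(r + h2(x))
        if a != b:
            parent[a] = b

    # Count vertices and edges per component.
    vertices, edges = {}, {}
    for v in range(2 * r):
        root = find(v)
        vertices[root] = vertices.get(root, 0) + 1
    for x in keys:
        root = find(h1(x))
        edges[root] = edges.get(root, 0) + 1

    # At most one cycle per component means edges <= vertices.
    return all(edges[c] <= vertices[c] for c in edges)


if __name__ == "__main__":
    h1 = lambda x: x % 4
    h2 = lambda x: (x // 4) % 4
    print(is_storable([0, 16], h1, h2, 4))      # two keys sharing two cells
    print(is_storable([0, 16, 32], h1, h2, 4))  # three keys sharing two cells
```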

  14. The Canonical Representation
      • Assume that S can be stored using h1 and h2
      • We force a canonical representation on the cuckoo graph
        • Suffices to consider a single connected component
      • Assume that S forms a tree in the cuckoo graph (the typical case)
        • One location must be empty
        • The choice of the empty location uniquely determines the location of all elements
      [Figure: tree component with elements a, b, c, d, e]
      Rule: h1(minimal element) is empty
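For the tree case, the rule can be turned into a small procedure: root the tree at the cell T1[h1(min)], leave it empty, and put every element in the endpoint of its edge that is farther from the root. The sketch below is one way to do that with a breadth-first search. The function name and the `("T1", i)` / `("T2", j)` cell labels are assumptions of this example, and it recomputes the whole layout rather than updating it incrementally.

```python
from collections import deque


def canonical_tree_layout(keys, h1, h2):
    """Canonical placement for keys whose cuckoo-graph component is a single tree."""
    # adjacency: cell -> list of (key, cell at the other end of the key's edge)
    adj = {}
    for x in keys:
        u, v = ("T1", h1(x)), ("T2", h2(x))
        adj.setdefault(u, []).append((x, v))
        adj.setdefault(v, []).append((x, u))

    # Rule: the T1 cell of the minimal element stays empty.
    root = ("T1", h1(min(keys)))
    layout = {}
    seen = {root}
    queue = deque([root])
    while queue:
        cell = queue.popleft()
        for x, other in adj[cell]:
            if other not in seen:
                seen.add(other)
                layout[other] = x  # x sits in the endpoint away from the root
                queue.append(other)

    if len(layout) != len(keys):
        raise ValueError("the keys do not form a single tree")
    return layout


if __name__ == "__main__":
    h1 = lambda x: x % 4
    h2 = lambda x: (x // 4) % 4
    print(canonical_tree_layout([5, 1, 0], h1, h2))
```

Because the output is a function of the key set alone, any insertion order leads to the same layout.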

  15. The Canonical Representation
      • Assume that S can be stored using h1 and h2
      • We force a canonical representation on the cuckoo graph
        • Suffices to consider a single connected component
      • Assume that S has one cycle
        • Two ways to assign elements in the cycle
        • Each choice uniquely determines the location of all elements
      [Figure: component with one cycle and elements a, b, c, d, e]
      Rule: minimal element in cycle lies in T1
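For a component with exactly one cycle, every cell is occupied, and one possible way to compute the canonical layout is by peeling: a cell touched by a single remaining element must hold it, and once only the cycle is left the rule fixes its orientation. The sketch below assumes its input is one connected component with exactly one cycle. The function name and cell labels are hypothetical, and this is a from-scratch computation, not the incremental update from the talk.

```python
def canonical_cycle_layout(keys, h1, h2):
    """Canonical placement for a connected component with exactly one cycle."""
    ends = {x: (("T1", h1(x)), ("T2", h2(x))) for x in keys}
    incident = {}  # cell -> set of still unplaced keys touching it
    for x, (u, v) in ends.items():
        incident.setdefault(u, set()).add(x)
        incident.setdefault(v, set()).add(x)
    if len(incident) != len(keys):
        raise ValueError("expected as many cells as keys")

    layout = {}
    # Peel the trees hanging off the cycle, starting from the leaf cells.
    leaves = [c for c, xs in incident.items() if len(xs) == 1]
    while leaves:
        cell = leaves.pop()
        if len(incident[cell]) != 1:
            continue
        (x,) = incident[cell]
        layout[cell] = x
        for c in ends[x]:
            incident[c].discard(x)
            if len(incident[c]) == 1:
                leaves.append(c)

    # What remains is the cycle. Rule: its minimal element lies in T1.
    placed = set(layout.values())
    x, side = min(k for k in keys if k not in placed), 0
    while True:
        mine, other = ends[x][side], ends[x][1 - side]
        layout[mine] = x
        incident[mine].discard(x)
        incident[other].discard(x)
        if not incident[other]:
            break                   # walked all the way around the cycle
        (x,) = incident[other]      # the other cycle element must take that cell
        side = 1 - side

    if len(layout) != len(keys):
        raise ValueError("component does not have exactly one cycle")
    return layout


if __name__ == "__main__":
    h1 = lambda x: x % 4
    h2 = lambda x: (x // 4) % 4
    print(canonical_cycle_layout([4, 2, 5, 0, 1], h1, h2))
```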

  16. The Canonical Representation
      • Updates efficiently maintain the canonical representation
      • Insertions:
        • New leaf: check if new element is smaller than current min
        • New cycle:
          • Same component…
          • Merging two components…
        • All cases straightforward
      • Deletions:
        • Find the new min, split component,…
        • Requires connecting all elements in the component with a sorted cyclic list
          • Memory utilization drops to 25%
        • All cases straightforward
      • Update time < size of component = expected (small) constant

  17. Rehashing
      • What if S cannot be stored using h1 and h2?
        • Happens with probability Θ(1/n)
      • Can we simply pick new functions?
        • Rare, but very bad worst-case performance
        • Canonical memory implies we need to sample all hash functions in advance (theoretical problem)
        • Whenever an item is deleted, need to check whether we must roll back to previous hash functions
        • A bad item which is repeatedly inserted and deleted would cause a rehash every operation!

  18. Using a Stash
      • Whenever an insert fails, put a ‘bad’ item in a secondary data structure
        • Bad item: smallest item that belongs to a cycle
        • Secondary data structure must be SHI in itself
      • Theorem [KMW08]: Pr[|stash| > s] < n^(-s)
      • In practice keeping the stash as a sorted list is probably the best solution
        • Effectively the query time is constant with (very) high probability
      • In theory the stash could be any SHI data structure with constant lookup time
        • A deterministic hashing scheme, where the elements are rehashed whenever the content changes [AN96, HMP01]
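A sorted-list stash is itself canonical, so it satisfies the "SHI in itself" requirement. A minimal sketch follows, where the class name `SortedStash` is made up for this example. Choosing which item of a failed insertion goes into the stash (the smallest item on a cycle) is left to the caller.

```python
import bisect


class SortedStash:
    """Tiny secondary store kept as a sorted list (hypothetical name)."""

    def __init__(self):
        self.items = []  # always sorted, so the layout depends only on the content

    def _find(self, x):
        i = bisect.bisect_left(self.items, x)
        return i, i < len(self.items) and self.items[i] == x

    def add(self, x):
        i, present = self._find(x)
        if not present:
            self.items.insert(i, x)

    def remove(self, x):
        i, present = self._find(x)
        if present:
            self.items.pop(i)

    def __contains__(self, x):
        return self._find(x)[1]


if __name__ == "__main__":
    stash = SortedStash()
    for bad_item in (42, 7, 19):
        stash.add(bad_item)
    stash.remove(19)
    print(stash.items, 7 in stash, 19 in stash)
```

A lookup first probes the two cuckoo cells and then the stash. As a worked instance of the bound on the slide, n = 1000 and s = 3 give n^(-s) = 10^(-9), so the stash is almost always tiny.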

  19. Conclusions and Problems
      • Cuckoo hashing is a robust and flexible hashing scheme
        • Easily ‘molded’ into a history-independent data structure
      • We don’t know how to do this for CH with more than 2 hash functions and/or more than 1 element per bucket
        • Better memory utilization, better performance, but…
        • Expected size of connected component is not constant
      • Full performance analysis
