CS 221 Guest lecture: Cuckoo Hashing

CS 221 Guest lecture: Cuckoo Hashing, Shannon Larson, March 11, 2011

Learning Goals • Describe the cuckoo hashing principle • Analyze the space and time complexity of cuckoo hashing • Apply the insert and lookup algorithms in a cuckoo hash table • Construct the graph for a cuckoo table

Remember Graphs? • A set of nodes • A set of edges • Here:

Graph Cycles • A graph cycle is a path of edges such that the first and last vertices are the same

Recall Hashing • A hash function h • Takes the target x • Hashes x to a bucket h(x) • Perfect hashing is ideal: • O(1) lookup • O(1) insert • Perfect hashing is not realistic!

Cuckoo Hashing: the idea • Remember the cuckoo bird? • Shares a nest with other species… • …then kicks the other species out! • Same idea with cuckoo hashing • When we insert x, we “kick out” whatever occupies its nest • The displaced item then finds a new, alternate home

Why is this cool? • Perfect hashing guarantees • O(1) lookup, O(1) insert • Cuckoo hashing guarantees • O(1) lookup • O(1) insert** • Other hashing strategies can’t guarantee this! • Also, it’s an option for your final project ** There’s a caveat here, but we’ll see it later

Cuckoo Hashing: Two Nests • Suppose we have TWO hash tables • They each have a hash function (h1 and h2) • We prefer the first table, but if we have to move we’ll go to the second • If we’re in the second table and have to move, we’ll go back to the first • This is our collision strategy for cuckoo hashing • Different from linear probing/open addressing • Different from trees

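Cuckoo Hashing: Example • We want to insert x • There are no conflicts anywhere [Figure: the two tables, containing x]

Cuckoo Hashing: Example • Now we want to insert y • There are no conflicts anywhere [Figure: the two tables, containing x and y]

Cuckoo Hashing: Example • To insert z, we first have to displace the item already in its slot • Move the displaced item to its alternate slot [Figure: z collides with an occupied slot: “oh no!”]

Cuckoo Hashing: Example • Now we insert z into the freed slot [Figure: x, y and z all placed: “NOW we’re fine!”]

Cuckoo Hashing: Example • The final table after inserting x, y, z in order [Figure: the two tables, containing x, y and z]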
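The two-table walkthrough above can be sketched roughly as follows. This is a minimal illustration, not the lecture's code: the class name `TwoTableCuckoo`, the `max_kicks` bound, and the toy hash functions in the usage lines are all assumptions made for the example.

```python
class TwoTableCuckoo:
    """Two tables, one hash function each (hypothetical sketch)."""

    def __init__(self, size, h1, h2, max_kicks=16):
        self.tables = [[None] * size, [None] * size]
        self.hashes = [h1, h2]          # hashes[t] indexes into tables[t]
        self.max_kicks = max_kicks      # assumed bound on displacements

    def lookup(self, key):
        # At most two places to look: one slot per table.
        return any(self.tables[t][self.hashes[t](key)] == key for t in (0, 1))

    def insert(self, key):
        if self.lookup(key):
            return
        which = 0                       # always start in the preferred table
        for _ in range(self.max_kicks):
            pos = self.hashes[which](key)
            # Take the nest and pick up whatever was living there.
            key, self.tables[which][pos] = self.tables[which][pos], key
            if key is None:
                return                  # the nest was empty, so we are done
            which = 1 - which           # the displaced item tries the other table
        raise RuntimeError(f"too many displacements; {key!r} is left without a nest")


# Toy hash functions chosen so that all three keys collide in the first table.
table = TwoTableCuckoo(8, h1=lambda k: k % 8, h2=lambda k: (k // 8) % 8)
for key in (3, 11, 19):
    table.insert(key)
```

With these toy functions every key prefers slot 3 of the first table, so each newcomer takes that slot and pushes the previous occupant into the second table.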

Why two tables? • Two tables, one for each hash function • Simple to visualize, simple to implement • But, why two? • One table works just as well! • Just as simple to implement (all one table)

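One Table Example • Let’s insert x again, this time into a single table • Again, the h1 slot is preferred [Figure: x in the table]

One Table Example • Now insert y • No conflicts, no problem [Figure: x and y in the table]

One Table Example • Now insert z • But, another conflict, this time with y: oh no! [Figure: z collides with y]

One Table Example • First, move y to its alternate slot

One Table Example • Now we move z into the slot y left

One Table Example • Final table after inserting x, y, z in order [Figure: slot 1 holds x, slot 2 holds z, slot 3 holds y]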
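The one-table example can be replayed with a few lines of Python. The `(h1, h2)` slot pairs in `SLOTS` are assumptions picked so that the run ends with x, z, y in slots 1, 2, 3; the slides do not state the actual hash values. Note that this sketch places the newcomer first and then relocates the displaced item, which reaches the same final table.

```python
# Hypothetical (h1, h2) slot pairs; h1 is the preferred slot.
SLOTS = {"x": (1, 2), "y": (2, 3), "z": (2, 4)}


def alternate(item, pos):
    """Return the other candidate slot for an item currently at pos."""
    h1, h2 = SLOTS[item]
    return h2 if pos == h1 else h1


def insert(table, item, max_moves=8):
    """Place item and return the list of (item, slot) placements made."""
    moves = []
    pos = SLOTS[item][0]
    for _ in range(max_moves):
        moves.append((item, pos))
        # Put item in the slot and pick up the previous occupant, if any.
        item, table[pos] = table.get(pos), item
        if item is None:
            return moves
        pos = alternate(item, pos)      # the displaced item moves to its other slot
    raise RuntimeError("too many displacements")


table = {}                              # slot number -> item
trace = {name: insert(table, name) for name in "xyz"}
```

Here `trace["z"]` records that z took slot 2 and y then moved to slot 3.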

Graph Representation • How can we represent our table? • Why not a graph? • Nodes are every possible table entry • Edges are inserted entries • This is a directed graph • Direction from current location TO alternate location

Graph Example • Remember our one-table example? [Figure: the table with slots 1-4 holding x, z, y, beside its graph on nodes 1-4]
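A small sketch of how the graph could be built from a table: one node per slot, and one directed edge per stored item, pointing from the slot it occupies to its alternate slot. The `SLOTS` values are assumed for illustration and are not taken from the slides.

```python
# Hypothetical (h1, h2) slot pairs and the resulting one-table layout.
SLOTS = {"x": (1, 2), "y": (2, 3), "z": (2, 4)}
table = {1: "x", 2: "z", 3: "y"}        # slot number -> item


def cuckoo_graph(table, slots):
    """Return directed edges (current_slot, alternate_slot, item)."""
    edges = []
    for pos, item in sorted(table.items()):
        h1, h2 = slots[item]
        alt = h2 if pos == h1 else h1   # where the item would go if kicked out
        edges.append((pos, alt, item))
    return edges


edges = cuckoo_graph(table, SLOTS)
```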

Infinite Insert • Suppose we insert something, and we end up in an infinite loop • Or, “too many” displacements • Some pre-defined maximum based on table size

Example: Loops • Remember our one-table example? [Figure: slots 1-4 holding x, z, y, with slot 4 empty, beside the graph on nodes 1-4]

Example: Loops • Let’s insert w: no conflicts still [Figure: w goes into slot 4]

Example: Loops • Now let’s insert a: displace x [Figure: a arrives at slot 1, which holds x]

Example: Loops • Now x is placed, and z is displaced (put in 4) [Figure: a in 1, x in 2, y in 3; z heads to slot 4, which holds w]

Example: Loops • Now z is placed, and w is displaced (put in 3) [Figure: a in 1, x in 2, z in 4; w heads to slot 3, which holds y]

Example: Loops • Notice what happens to the graph • We keep going and going and going… [Figure: graph on nodes 1-4 containing a closed loop]
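To see the loop concretely, the sketch below replays the insertion of w and then a, and stops as soon as the whole state (item in hand, target slot, table contents) repeats. The slot pairs are assumptions: those for x, z and w follow the moves described in the example, while those for y and the second slot of a are made up. Tracking full states is only for illustration; a practical table would use a displacement limit instead.

```python
# Hypothetical (h1, h2) slot pairs for a four-slot table.
SLOTS = {"x": (1, 2), "y": (2, 3), "z": (2, 4), "w": (4, 3), "a": (1, 3)}


def insert_detecting_loops(table, item):
    """Return the number of placements made; raise if a state repeats."""
    seen = set()
    pos = SLOTS[item][0]
    moves = 0
    while True:
        state = (item, pos, tuple(sorted(table.items())))
        if state in seen:               # same situation as before: a closed loop
            raise RuntimeError(f"loop after {moves} moves")
        seen.add(state)
        # Take the slot and pick up the previous occupant, if any.
        item, table[pos] = table.get(pos), item
        moves += 1
        if item is None:
            return moves
        h1, h2 = SLOTS[item]
        pos = h2 if pos == h1 else h1   # displaced item goes to its other slot


table = {1: "x", 2: "z", 3: "y"}
moves_for_w = insert_detecting_loops(table, "w")
try:
    insert_detecting_loops(table, "a")
    outcome = "placed"
except RuntimeError as err:
    outcome = str(err)
```

Five items cannot fit in four slots, so inserting a can never succeed here; the displacements cycle back to the starting arrangement.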

Analysis: Loops • Remember infinite loops in a new insert? • In the graph, this is a closed loop • We might forever re-do the same displacements • The probability of getting a loop increases dramatically once we’ve inserted N/2 elements • N is the number of buckets (size of table) • This is from the research on cuckoo hashing

Analysis: Loops • What can we do once we get a loop? • Rebuild, same size (ok solution) • Double table size (better solution) • We’ll need new hash functions for both
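One possible way to obtain "new hash functions" on a rebuild is to re-salt a generic hash with fresh random bits. This is only a sketch under that assumption: `make_hash` and `new_hash_pair` are hypothetical helpers, and salting Python's built-in `hash` is a convenience, not a hash family with proven guarantees.

```python
import random


def make_hash(size, rng):
    """Return a randomly salted hash function mapping keys to 0..size-1."""
    salt = rng.getrandbits(64)
    return lambda key: hash((salt, key)) % size


def new_hash_pair(old_size, rng, double=True):
    """Pick the table size for the rebuild and two fresh hash functions."""
    size = 2 * old_size if double else old_size
    return size, make_hash(size, rng), make_hash(size, rng)


rng = random.Random(221)                # fixed seed so runs are repeatable
size, h1, h2 = new_hash_pair(8, rng)    # "double table size" option
```

After choosing the new size and functions, every stored item has to be reinserted into the fresh table.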

Analysis • Lookup has O(1) time • At MOST two places to look, ever • One location per hash function • Insert has amortized O(1) time • Think of this as “in the long run” • In practice we see O(1) time insert • You’ll see amortized analysis in CPSC 320 • Remember the “grass and trees” analysis?

Lookup: The Code • Return the position of x (either h1(x) or h2(x)) • Otherwise, return false
lookup(x)
  return T[h1(x)] = x or T[h2(x)] = x
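A direct Python rendering of the lookup, as a sketch. The pseudocode returns true or false, while the slide text mentions returning the position, so both variants are shown; passing `T`, `h1` and `h2` as arguments and the toy hash functions are choices made for this example.

```python
def lookup(T, h1, h2, x):
    """True if x sits in either of its two candidate slots."""
    return T[h1(x)] == x or T[h2(x)] == x


def position(T, h1, h2, x):
    """Slot index holding x, or None if x is not in the table."""
    for pos in (h1(x), h2(x)):
        if T[pos] == x:
            return pos
    return None


# Toy table of four slots with toy hash functions.
h1 = lambda k: k % 4
h2 = lambda k: (k // 4) % 4
T = [None, 5, 9, None]                  # 5 is at h1(5) = 1, 9 is at h2(9) = 2
```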

Insert: The Code • Given a table (array) T and item x to insert:
insert(x)
  if lookup(x) return;        // if it’s already here, done
  pos <- h1(x);               // store h1(x)
  for i <- 1 to M             // loop at most M times
    if T[pos] empty
      T[pos] <- x
      return;                 // if T[pos] empty, done
    swap x and T[pos];        // put x in T[pos]
    if pos = h1(x)            // now we’re displacing
      pos <- h2(x)
    else
      pos <- h1(x)
  rehash();                   // if we couldn’t stop, rehash
  insert(x);                  // then insert currently displaced
end
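The pseudocode translates fairly directly into Python. The version below is a sketch with several assumptions that the slides leave open: M is taken to be the current table size, `rehash` doubles the table, and the hash functions are built by salting Python's `hash` with random bits.

```python
import random


class CuckooTable:
    def __init__(self, size=8, seed=0):
        self.rng = random.Random(seed)
        self.T = [None] * size
        self._new_hashes()

    def _new_hashes(self):
        # Assumed construction: two independently salted hash functions.
        n = len(self.T)
        s1, s2 = self.rng.getrandbits(64), self.rng.getrandbits(64)
        self.h1 = lambda x: hash((s1, x)) % n
        self.h2 = lambda x: hash((s2, x)) % n

    def lookup(self, x):
        return self.T[self.h1(x)] == x or self.T[self.h2(x)] == x

    def insert(self, x):
        if self.lookup(x):
            return                            # already here, done
        pos = self.h1(x)
        for _ in range(len(self.T)):          # loop at most M times
            if self.T[pos] is None:
                self.T[pos] = x
                return
            x, self.T[pos] = self.T[pos], x   # x is now the displaced item
            pos = self.h2(x) if pos == self.h1(x) else self.h1(x)
        self.rehash()                         # we could not stop, so rebuild
        self.insert(x)                        # then place the item still in hand

    def rehash(self):
        old = [item for item in self.T if item is not None]
        self.T = [None] * (2 * len(self.T))   # assumed policy: double the size
        self._new_hashes()
        for item in old:
            self.insert(item)


t = CuckooTable(size=4)
for k in range(20):
    t.insert(k)
```

Starting from four slots, inserting twenty keys forces several rebuilds, which is where the amortized cost of insert comes from.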

Analysis: Load Factor • What is load? • The average fill factor (% full) of the table • What about cuckoo hash tables? • For two hash functions, the load factor has to stay below about 50% • Remember loops? • For three hash functions, we get about 91% • That’s pretty great, actually!

More hash functions • What would this look like? • We would have three tables (simple case) • One hash function per table • Or, we would have two alternates (one table)

More hash functions • What would this look like? • Each entry has TWO alternates, not one [Figure: one table holding x, z, y, each with two alternate slots]

More hash functions • When something comes in new (insert) • Put it in h1(x) • If it’s displaced, check h2(x) • If that’s full, go to h3(x) • To lookup, we just look in h1(x), h2(x) or h3(x) • Still constant time!
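With three hash functions, lookup checks three slots instead of two. The sketch below shows that, plus the "try each candidate in order" step; the function names and toy hash functions are assumptions, and displacement of an occupant when all three slots are full is left out.

```python
def lookup3(T, hashes, x):
    """True if x is in any of its three candidate slots (constant work)."""
    return any(T[h(x)] == x for h in hashes)


def place3(T, hashes, x):
    """Put x in its first free candidate slot; return that slot or None."""
    for h in hashes:
        pos = h(x)
        if T[pos] is None:
            T[pos] = x
            return pos
    return None                         # all three full: an occupant must be displaced


# Toy hash functions over an eight-slot table.
hashes = [lambda k: k % 8, lambda k: (k // 8) % 8, lambda k: (k // 64) % 8]
T = [None] * 8
placed = [place3(T, hashes, k) for k in (1, 9, 73)]
```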

Even better load? • Currently we’ve only put one item per bucket • What if we had two cells per bucket? [Figure: buckets holding “x, w”, “z” and “y, a”]

Even better load? • Currently we’ve only put one item per bucket • What if we had two cells per bucket? • What about collision strategies? • Round-robin (cells take turns swapping out) • FIFO (oldest resident gets kicked out)
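A rough sketch of two cells per bucket with the FIFO strategy: when a bucket is full, the newcomer joins and the oldest resident is kicked out to its other bucket. The class name, the `max_kicks` bound and the toy hash functions are assumptions for illustration.

```python
from collections import deque


class BucketedCuckoo:
    def __init__(self, n_buckets, h1, h2, cells=2, max_kicks=32):
        self.buckets = [deque() for _ in range(n_buckets)]
        self.h1, self.h2 = h1, h2
        self.cells = cells
        self.max_kicks = max_kicks      # assumed bound on displacements

    def lookup(self, x):
        # Still two buckets to check, each holding at most `cells` items.
        return x in self.buckets[self.h1(x)] or x in self.buckets[self.h2(x)]

    def insert(self, x):
        if self.lookup(x):
            return
        pos = self.h1(x)
        for _ in range(self.max_kicks):
            bucket = self.buckets[pos]
            bucket.append(x)
            if len(bucket) <= self.cells:
                return                  # there was a free cell
            x = bucket.popleft()        # FIFO: the oldest resident is kicked out
            pos = self.h2(x) if pos == self.h1(x) else self.h1(x)
        raise RuntimeError(f"too many displacements; {x!r} is left without a nest")


table = BucketedCuckoo(4, h1=lambda k: k % 4, h2=lambda k: (k // 4) % 4)
for key in (1, 5, 9):
    table.insert(key)
```

All three keys prefer bucket 1 here, so the third insert evicts the oldest key (1) into its alternate bucket.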

Even better load?

Links & Resources • http://en.wikipedia.org/wiki/Cuckoo_hashing • http://www.ru.is/faculty/ulfar/CuckooHash.pdf • http://www.it-c.dk/people/pagh/papers/cuckoo-undergrad.pdf • No neat animations on the internet…yet! • Possible personal project? • Brownie points? • Pre-coop project?
