
CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man

CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man. Steve Wolfman Winter Quarter 2000. Today’s Outline. Discuss the midterm Hashing and Hash Tables. Dictionary operations create destroy insert find delete Stores values associated with user-specified keys


Presentation Transcript


  1. CSE 326: Data Structures, Lecture #14: Whoa… Good Hash, Man. Steve Wolfman, Winter Quarter 2000

  2. Today’s Outline • Discuss the midterm • Hashing and Hash Tables

  3. Reminder: Dictionary ADT
     • Dictionary operations: create, destroy, insert, find, delete
     • Stores values associated with user-specified keys
       • values may be any (homogeneous) type
       • keys may be any (homogeneous) comparable type
     • Example entries:
       • Zasha: interesting ID, but not enough ooomph!
       • Bone: more oomph, less high-scoring Scrabble action
       • Wolf: the perfect mix of oomph and Scrabble value
     • insert(Darth, formidable) adds an entry; find(Wolf) returns “Wolf: the perfect mix of oomph and Scrabble value”

  4. Implementations So Far
                                        insert     find       delete
     • Unsorted list                    O(1)       O(n)       O(n)
     • Trees                            O(log n)   O(log n)   O(log n)
     • Special case: integer keys
       between 0 and k                  O(1)       O(1)       O(1)
     How about O(1) insert/find/delete for any key type?

  5. Hash Table Goal
     • We can do: a[2] = “Andrew”
     • We want to do: a[“Steve”] = “Andrew”
     [Figure: an array indexed 0, 1, 2, 3, …, k-1 beside a table indexed by “Zasha”, “Nic”, “Steve”, “Brad”, …, “Ed”; slot 2 and slot “Steve” each hold Andrew]

  6. Hash Table Approach
     [Figure: keys Zasha, Steve, Nic, Brad, Ed mapped through f(x) into table slots]
     But… is there a problem in this pipe-dream?

  7. Hash Table Dictionary Data Structure
     • Hash function: maps keys to integers
       • result: can quickly find the right spot for a given entry
     • Unordered and sparse table
       • result: cannot efficiently list all entries
     [Figure: keys Zasha, Steve, Nic, Brad, Ed mapped through f(x) into table slots]

  8. Hash Table Terminology
     • keys: Zasha, Steve, Nic, Brad, Ed
     • hash function: f(x)
     • collision: two keys mapped to the same slot
     • load factor λ = (# of entries in table) / tableSize

  9. Hash Table Code: First Pass
     Value & find(Key & key) {
       // Map the key to a slot, then return the entry stored in that slot.
       int index = hash(key) % tableSize;
       return Table[index];
     }
     • What should the hash function be?
     • What should the table size be?
     • How should we resolve collisions?

  10. A Good Hash Function… • is easy (fast) to compute (O(1) and practically fast). • distributes the data evenly (hash(a) % size ≠ hash(b) % size). • uses the whole hash table (for all 0 ≤ k < size, there’s an i such that hash(i) % size = k).

  11. Good Hash Function for Integers
     • Choose
       • tableSize is prime
       • hash(n) = n
     • Example: tableSize = 7
       • insert(4), insert(17), find(12), insert(9), delete(17)
       [Figure: empty table with slots 0 through 6]
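A minimal sketch of the slot computation for the integer example above, assuming the identity hash reduced modulo the table size. The name slotFor is hypothetical, and the handling of negative keys is an added assumption the slide does not discuss.

```cpp
#include <cassert>

// Hypothetical helper: hash(n) = n, reduced modulo the (prime) table size.
int slotFor(int n, int tableSize) {
    assert(tableSize > 0);
    // In C++, n % tableSize can be negative when n is negative,
    // so shift the remainder back into the range [0, tableSize).
    return ((n % tableSize) + tableSize) % tableSize;
}
```

With tableSize = 7, the keys 4, 17, 12, and 9 from the slide land in slots 4, 3, 5, and 2 respectively.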

  12. Good Hash Function for Strings?
     • Think of the string as a base-128 number.
     • Let s = s_1 s_2 s_3 s_4 … s_n: choose
       $\mathrm{hash}(s) = s_1 + s_2 \cdot 128 + s_3 \cdot 128^2 + s_4 \cdot 128^3 + \dots + s_n \cdot 128^{n-1}$
     • Problems:
       • hash(“really, really big”) = well… something really, really big
       • hash(“one thing”) % 128 = hash(“other thing”) % 128
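The second problem can be checked directly. The sketch below treats the string as a base-128 number, as the slide suggests. The name naiveBase128 is hypothetical, and the use of wrapping unsigned 64-bit arithmetic is an assumption made so that long strings do not trigger undefined behaviour.

```cpp
#include <cstdint>
#include <string>

// Hypothetical name: s[0] + s[1]*128 + s[2]*128^2 + ...
// The arithmetic is unsigned 64-bit, so it wraps silently for long strings.
std::uint64_t naiveBase128(const std::string& s) {
    std::uint64_t h = 0;
    std::uint64_t power = 1;  // 128^i for the current position i
    for (unsigned char c : s) {
        h += c * power;
        power *= 128;
    }
    return h;
}
```

Every term after the first is a multiple of 128, so the hash modulo 128 depends only on the first character. “one thing” and “other thing” both start with “o”, which is why they collide in a table of size 128.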

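  13. Making the String Hash Easy to Compute
     • Use Horner’s Rule
     int hash(String s) {
       int h = 0;
       // Work from the last character back to the first, reducing at each
       // step so the intermediate value stays small.
       for (int i = s.length() - 1; i >= 0; i--) {
         h = (s[i] + 128*h) % tableSize;
       }
       return h;
     }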

  14. Making the String Hash Cause Few Conflicts • Ideas?

  15. Good Hashing: Multiplication Method
     • Hash function is defined by size plus a parameter A
       hA(k) = floor(size * (k*A mod 1)) where 0 < A < 1
     • Example: size = 10, A = 0.485
       hA(50) = floor(10 * (50*0.485 mod 1)) = floor(10 * (24.25 mod 1)) = floor(10 * 0.25) = floor(2.5) = 2
     • no restriction on size!
     • if we’re building a static table, we can try several As
     • more computationally intensive than a single mod
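The multiplication method might be sketched as follows, assuming double-precision arithmetic is acceptable for the fractional part. The name multiplicationHash is hypothetical, and the final floor is the step that turns 2.5 into 2 in the slide's example.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: floor(size * (k*A mod 1)) with 0 < A < 1.
int multiplicationHash(unsigned k, double A, int size) {
    assert(A > 0.0 && A < 1.0 && size > 0);
    double frac = std::fmod(k * A, 1.0);  // fractional part of k*A
    return static_cast<int>(std::floor(size * frac));
}
```

Calling multiplicationHash(50, 0.485, 10) reproduces the example above. Because the computation goes through floating point, results for keys whose k*A lies extremely close to an integer may depend on rounding.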

  16. Good Hashing: Universal Hash Function
     • Parameterized by prime size and vector: a = <a_0 a_1 … a_r> where 0 <= a_i < size
     • Represent each key as r + 1 integers where k_i < size
       • size = 11, key = 39752 ==> <3,9,7,5,2>
       • size = 29, key = “hello world” ==> <8,5,12,12,15,23,15,18,12,4>
       $h_a(k) = \left( \sum_{i=0}^{r} a_i k_i \right) \bmod \mathrm{size}$

  17. Universal Hash Function: Example • Context: hash strings of length 3 in a table of size 131 let a = <35, 100, 21> ha(“xyz”) = (35*120 + 100*121 + 21*122) % 131 = 129
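The worked example can be reproduced with a short sketch, assuming each character's code is used as one k_i. The names universalHash and charCodes are hypothetical helpers introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: (a_0*k_0 + a_1*k_1 + ... + a_r*k_r) % size.
int universalHash(const std::vector<int>& a, const std::vector<int>& k, int size) {
    assert(a.size() == k.size() && size > 0);
    long long sum = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += 1LL * a[i] * k[i];
    }
    return static_cast<int>(sum % size);
}

// Hypothetical helper: one integer per character (its character code).
std::vector<int> charCodes(const std::string& s) {
    std::vector<int> codes;
    for (unsigned char c : s) {
        codes.push_back(c);
    }
    return codes;
}
```

With a = <35, 100, 21> and size = 131, the key “xyz” (codes 120, 121, 122) gives 18862 % 131 = 129, matching the slide.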

  18. Universal Hash Function • Strengths: • works on any type as long as you can form ki’s • if we’re building a static table, we can try many a’s • a random a has guaranteed good properties no matter what we’re hashing • Weaknesses • must choose prime table size larger than any ki

  19. Alternate Universal Hash Function
     • Parameterized by k, a, and b:
       • k * size should fit into an int
       • a and b must be less than size
       $h_{k,a,b}(x) = \left\lfloor \dfrac{(a x + b) \bmod (k \cdot \mathrm{size})}{k} \right\rfloor$

  20. Alternate Universe Hash Function: Example • Context: hash integers in a table of size 16 let k = 32, a = 100, b = 200 hk,a,b(1000) = ((100*1000 + 200) % (32*16)) / 32 = (100200 % 512) / 32 = 360 / 32 = 11
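A sketch of the same computation, assuming non-negative inputs and using long long so the intermediate product a*x + b does not overflow. The name alternateHash is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: ((a*x + b) % (k*size)) / k, using integer division.
int alternateHash(long long x, long long k, long long a, long long b, long long size) {
    assert(x >= 0 && a >= 0 && b >= 0 && k > 0 && size > 0);
    return static_cast<int>(((a * x + b) % (k * size)) / k);
}
```

alternateHash(1000, 32, 100, 200, 16) follows the steps above: 100200 % 512 = 360, and 360 / 32 = 11 under integer division. When k and size are powers of 2, the mod and the division could be replaced by a bit mask and a shift.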

  21. Universal Hash Function • Strengths: • if we’re building a static table, we can try many a’s • random a,b has guaranteed good properties no matter what we’re hashing • can choose any size table • very efficient if k and size are powers of 2 • Weaknesses • still need to turn non-integer keys into integers

  22. Collisions
     • Pigeonhole principle says we can’t avoid all collisions
       • try to hash without collision m keys into n slots with m > n
       • try to put 10 pigeons into 5 holes
     • What do we do when two keys hash to the same entry?
       • open hashing: put little dictionaries in each entry (shove extra pigeons in one hole!)
       • closed hashing: pick a next entry to try
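One way to picture open hashing is a table whose entries are small lists. The sketch below is an illustration under stated assumptions, not the course's implementation: the class name ChainedTable is hypothetical, keys are non-negative ints hashed by key % tableSize, and values are strings.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical class: each table entry is a "little dictionary" (a list).
class ChainedTable {
public:
    explicit ChainedTable(std::size_t tableSize) : buckets_(tableSize) {
        assert(tableSize > 0);
    }

    // Insert a new pair, or overwrite the value if the key is already present.
    void insert(int key, const std::string& value) {
        auto& bucket = buckets_[slot(key)];
        for (auto& entry : bucket) {
            if (entry.first == key) {
                entry.second = value;
                return;
            }
        }
        bucket.emplace_back(key, value);
    }

    // Return a pointer to the stored value, or nullptr if the key is absent.
    const std::string* find(int key) const {
        for (const auto& entry : buckets_[slot(key)]) {
            if (entry.first == key) {
                return &entry.second;
            }
        }
        return nullptr;
    }

private:
    // Assumes key >= 0.
    std::size_t slot(int key) const {
        return static_cast<std::size_t>(key) % buckets_.size();
    }

    std::vector<std::list<std::pair<int, std::string>>> buckets_;
};
```

In a table of size 5, keys 4 and 9 both map to slot 4, and both remain findable because they share that slot's list.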

  23. To Do • Form your team and start Project III • Read chapter 5 in the book

  24. Coming Up • More hash tables • Disjoint-set union-find ADT • Fourth Quiz (February 10th)
