Lecture 12: Collisions

Presentation Transcript


  1. CSC 213 – Large Scale Programming Lecture 12: Collisions

  3. Today’s Goal
     • Review when, where, & why we use Maps
        • Why Sequence-based approach causes problems
        • How hash can help solve these problems
        • What is inappropriate and incorrect about hash jokes
     • Discover hash’s problems & what must be done
        • What happens if keys hash to the same index
        • Ways of handling this situation so hashing still works
        • To remove data, using null may not be the best option
        • Dark secrets of hashing, exposed at lecture’s end

  4. Map Performance
     • In many situations, can be a matter of life-or-death
        • 911 operators immediately need addresses
        • Google’s search performance in TB/s
     • O(log n) time too slow for these uses
     • Would love to use arrays
        • Convert key to int with hash function
        • With result of hash, have index in table to examine
        • put, remove & get take only O(1) time
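The "convert key to int with hash function, then use the result as an index" step can be sketched in Java as below. This is a minimal illustration under stated assumptions, not the course's code: the caller supplies the table length, and `Math.floorMod` is used so that a negative `hashCode()` still yields a valid index (plain `%` can return a negative value in Java). The names `HashIndex` and `indexFor` are hypothetical.

```java
// Hypothetical helper: map any key to an index in a table of the given length
class HashIndex {
    // hashCode() turns the key into an int; floorMod keeps the result in [0, length)
    static int indexFor(Object key, int length) {
        return Math.floorMod(key.hashCode(), length);
    }

    public static void main(String[] args) {
        // Integer.hashCode() is the int value itself, so 44 lands at 44 mod 13 = 5
        System.out.println(indexFor(44, 13));            // prints 5
        // Plain -7 % 13 would be -7 in Java; floorMod gives 6
        System.out.println(indexFor(-7, 13));            // prints 6
        // Any object with a hashCode() works; the exact index depends on String.hashCode()
        System.out.println(indexFor("911 Main St", 13));
    }
}
```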

  5. Hash Table
     • Array locations either:
        • null
        • Reference to Entry
        • Marker value*
     • Table will contain gaps
        • Better when spread out
     • Hash key to index
        • Always start with hash

  7. Ideal World
     • key hashed to unique index
     • Hash and done, Entry is there
     And then… You wake up

  8. Collisions
     • Occur when 2 keys hash to the same index
     • Ideal hash spreads keys out evenly across table
        • As nice side effect, this limits collisions
        • Small table size also important, since RAM is limited
     • Unfortunately, no such thing as ideal hash
        • Must handle collisions to get O(1) efficiency buzz
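To make "two keys hash to the same index" concrete, the sketch below groups the keys used in this lecture's later examples by h(x) = x mod 13 and reports every index that receives more than one key. The keys and hash function come from the example slides; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo: which keys collide under h(x) = x mod length?
class CollisionDemo {
    // Map each index to the keys that hash there (TreeMap keeps indices sorted)
    static Map<Integer, List<Integer>> groupByIndex(int[] keys, int length) {
        Map<Integer, List<Integer>> byIndex = new TreeMap<>();
        for (int k : keys) {
            byIndex.computeIfAbsent(Math.floorMod(k, length), i -> new ArrayList<>()).add(k);
        }
        return byIndex;
    }

    public static void main(String[] args) {
        int[] keys = {15, 18, 32, 76, 44, 20, 22, 31};
        groupByIndex(keys, 13).forEach((index, ks) -> {
            if (ks.size() > 1) {
                System.out.println("collision at " + index + ": " + ks); // collision at 5: [18, 44, 31]
            }
        });
    }
}
```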

  9. Bad Hash
     • Perfect hash does not exist
        • Cannot know all keys beforehand
        • Keys may cluster around a few indices
        • Or all keys may hash to the same index
     • Handling bad hash is a necessity
        • Even when given an Entry, always check its key
        • Store multiple Entrys with same hash
     • (Shot of adrenaline restarts heart)

  10. Bucket Arrays
     • Make hash table an array of linked list Nodes
        • First node aliased by the array location
     • Whenever we have a collision, we “chain” Entrys
        • Create new Node to store the Entry
        • The linked list will have new Node at its front
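A bucket array can be sketched as follows, assuming Java generics and keys with consistent `equals`/`hashCode`. Each array slot aliases the first `Node` of its chain, a colliding key gets a new `Node` at the front, and every lookup compares keys because sharing an index does not mean sharing a key. `ChainedHashMap` and its methods are hypothetical names for this illustration, not the textbook's classes.

```java
// Hypothetical bucket-array (separate chaining) map, for illustration only
class ChainedHashMap<K, V> {
    // One link in a bucket's chain
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Each array slot aliases the first Node of its chain (or is null)
    private final Node<K, V>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashMap(int capacity) {
        buckets = (Node<K, V>[]) new Node[capacity];
    }

    private int indexFor(K key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    V get(K key) {
        for (Node<K, V> n = buckets[indexFor(key)]; n != null; n = n.next) {
            // Same index does not mean same key, so always compare keys
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }

    V put(K key, V value) {
        int i = indexFor(key);
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) {          // key already present: replace its value
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        buckets[i] = new Node<>(key, value, buckets[i]); // chain new Node at the front
        return null;
    }

    public static void main(String[] args) {
        ChainedHashMap<Integer, String> map = new ChainedHashMap<>(13);
        map.put(44, "forty-four");
        map.put(31, "thirty-one");            // 31 mod 13 = 5, same bucket as 44
        System.out.println(map.get(44));      // prints forty-four
        System.out.println(map.get(31));      // prints thirty-one
    }
}
```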

  12. Bucket Arrays
     • But what if we have a really bad hash?
        • Hashes to same index in every situation
        • All Entrys now found in single linked list
        • O(n) execution times would now be required
     • (Also get bad case of the munchies)
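The "really bad hash" case can be measured directly. This sketch, with hypothetical names, chains the keys 0 to n-1 into a 13-slot bucket array and reports the longest chain. With h(x) = x the 1000 keys split into chains of at most 77, while a hash that always returns 7 puts all 1000 keys into one linked list, so a search there becomes O(n).

```java
import java.util.LinkedList;
import java.util.function.IntUnaryOperator;

// Hypothetical demo: chain length under a good vs. a constant hash function
class BadHashDemo {
    // Insert keys 0..n-1 into a bucket array and return the longest chain's size
    static int longestChain(int n, int capacity, IntUnaryOperator hash) {
        @SuppressWarnings("unchecked")
        LinkedList<Integer>[] buckets = new LinkedList[capacity];
        int longest = 0;
        for (int key = 0; key < n; key++) {
            int i = Math.floorMod(hash.applyAsInt(key), capacity);
            if (buckets[i] == null) {
                buckets[i] = new LinkedList<>();
            }
            buckets[i].addFirst(key);                   // new node at the front of the chain
            longest = Math.max(longest, buckets[i].size());
        }
        return longest;
    }

    public static void main(String[] args) {
        System.out.println(longestChain(1000, 13, k -> k)); // prints 77: keys spread out
        System.out.println(longestChain(1000, 13, k -> 7)); // prints 1000: one long chain
    }
}
```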

  13. Collisions
     • Normally, table holds one Entry per index
     • Need to be smarter when keys collide
        • Efficiency matters, is important, is critical
        • If we do not care, use Sequence-based approach
     • Several common schemes used to provide speed
        • Each of these schemes has strengths & weaknesses
        • Silver bullets do not exist in CSC; must balance needs
        • If all-powerful answers desired, try Religious Studies

  14. Linear Probing
     • Musical chairs uses this algorithm
        • At index where key hashed, examine Entry
        • Circle through array until empty index found
     • Algorithm is very simple
        • But creates clusters of Entrys

  15. Linear Probe Example
     • h(x) = x mod 13
     • Now add: 44 (h(44) = 5), 20 (h(20) = 7), 22 (h(22) = 9), 31 (h(31) = 5)
     [Figure: hash table with indices 0 to 12, already holding 15, 18, 32, 76; the four new keys are probed into it]
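The linear-probe example can be replayed in code. This sketch assumes what the figure suggests: 15, 18, 32, and 76 were inserted first (each at its home index, since none of them collide), then 44, 20, 22, and 31 are added with linear probing and h(x) = x mod 13. Keys are plain `Integer`s rather than `Entry` objects, and `LinearProbeExample` is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical replay of the linear-probe example
class LinearProbeExample {
    static final int LENGTH = 13;

    // h(x) = x mod 13, as on the slide
    static int h(int x) {
        return Math.floorMod(x, LENGTH);
    }

    // Linear probing: try h(x), h(x)+1, h(x)+2, ... (wrapping around) until a slot is empty.
    // Returns the index used; assumes the table is not full.
    static int insert(Integer[] table, int key) {
        int i = h(key);
        while (table[i] != null) {
            i = (i + 1) % table.length;
        }
        table[i] = key;
        return i;
    }

    // Assumption: 15, 18, 32, 76 were stored before the four new keys
    static Integer[] runExample() {
        Integer[] table = new Integer[LENGTH];
        for (int key : new int[] {15, 18, 32, 76, 44, 20, 22, 31}) {
            insert(table, key);
        }
        return table;
    }

    public static void main(String[] args) {
        // prints [null, null, 15, null, null, 18, 32, 44, 20, 22, 31, 76, null]
        System.out.println(Arrays.toString(runExample()));
    }
}
```

Under these assumptions 44 lands at index 7, 20 at 8, 22 at 9, and 31 at 10 after checking six slots; the solid run of filled slots from 5 to 11 is the clustering the previous slide warns about.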

  20. Probing Reaction
     Oh, ****! Adding to hash table still O(n)

  21. Quadratic Probe
     • Avoids primary clustering problems
        • But does create secondary clustering (no one cares)
     • Quadratic probe still simple (like linear probe)
        • Examine Entry at index k, where key is hashed
        • Check (k + j²) % length: k+1, k+4, k+9, k+16, …
        • Continue probing until unused array slot found
     • Guaranteed to work when:
        • Table size is a prime number (needed for probes to get around the table)
        • Table is under 50% full, so many open slots exist

  22. Quadratic Probe Example
     • h(x) = x mod 13
     • Now add: 44 (h(44) = 5), 20 (h(20) = 7), 22 (h(22) = 9), 31 (h(31) = 5)
     [Figure: hash table with indices 0 to 12, already holding 15, 18, 32, 76; the four new keys are probed into it]
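The same replay with quadratic probing follows, under the same assumptions (15, 18, 32, 76 inserted first at their home indices; `Integer` keys instead of `Entry` objects; hypothetical class name). The probe at step j checks (k + j²) % length, where k is the home index.

```java
import java.util.Arrays;

// Hypothetical replay of the quadratic-probe example
class QuadraticProbeExample {
    static final int LENGTH = 13;

    static int h(int x) {
        return Math.floorMod(x, LENGTH);
    }

    // Quadratic probing: check k, k+1, k+4, k+9, ... (mod length), k = home index.
    // With a prime length and a table under half full an empty slot is guaranteed;
    // the loop bound just stops the search if that condition is violated.
    static int insert(Integer[] table, int key) {
        int k = h(key);
        for (int j = 0; j < table.length; j++) {
            int i = (k + j * j) % table.length;
            if (table[i] == null) {
                table[i] = key;
                return i;
            }
        }
        throw new IllegalStateException("no empty slot found for " + key);
    }

    static Integer[] runExample() {
        Integer[] table = new Integer[LENGTH];
        for (int key : new int[] {15, 18, 32, 76, 44, 20, 22, 31}) {
            insert(table, key);
        }
        return table;
    }

    public static void main(String[] args) {
        // prints [null, 31, 15, null, null, 18, 32, 20, null, 44, 22, 76, null]
        System.out.println(Arrays.toString(runExample()));
    }
}
```

Here 44 goes to index 9 (after 5 and 6, it tries 5 + 4), 20 to its home index 7, 22 to 10, and 31 to 1, since (5 + 9) % 13 = 1.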

  27. Quadratic Probing Reaction
     Darn it to heck. Adding to hash table still O(n)

  28. Double Hashing
     • Solve bad hash with even more hash
        • Use 2nd hash function very different from first
        • 2nd hash function not allowed to return zero
     • Re-hash key using 2nd function after the collision
        • Check index equal to sum of the two hash values (mod table length)
        • Re-add 2nd hash to this sum to continue probing
     • Guaranteed to work when:
        • Still must get around: table size is a prime number

  29. Double Hash Example
     • h(x) = x mod 13
     • h2(x) = 5 - (x mod 5)
     • Now add: 44 (h(44) = 5), 20 (h(20) = 7), 22 (h(22) = 9), 31 (h(31) = 5)
     [Figure: hash table with indices 0 to 12, already holding 15, 18, 32, 76; the four new keys are probed into it]
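A replay of the double-hashing example, again assuming 15, 18, 32, 76 were inserted first at their home indices and using `Integer` keys with a hypothetical class name. The probe sequence is h(x), h(x) + h2(x), h(x) + 2·h2(x), … (mod 13), with h2(x) = 5 - (x mod 5), which is always between 1 and 5 and so never zero.

```java
import java.util.Arrays;

// Hypothetical replay of the double-hash example
class DoubleHashExample {
    static final int LENGTH = 13;

    static int h(int x) {
        return Math.floorMod(x, LENGTH);
    }

    // Second hash from the slide: result is in 1..5, never 0
    static int h2(int x) {
        return 5 - Math.floorMod(x, 5);
    }

    // Double hashing: start at h(x), then keep adding h2(x) (mod length).
    // With a prime length every index is visited within LENGTH tries.
    static int insert(Integer[] table, int key) {
        int i = h(key);
        int step = h2(key);
        for (int tries = 0; tries < table.length; tries++) {
            if (table[i] == null) {
                table[i] = key;
                return i;
            }
            i = (i + step) % table.length;
        }
        throw new IllegalStateException("table is full");
    }

    static Integer[] runExample() {
        Integer[] table = new Integer[LENGTH];
        for (int key : new int[] {15, 18, 32, 76, 44, 20, 22, 31}) {
            insert(table, key);
        }
        return table;
    }

    public static void main(String[] args) {
        // prints [31, null, 15, null, null, 18, 32, 44, null, 22, null, 76, 20]
        System.out.println(Arrays.toString(runExample()));
    }
}
```

With these assumptions 44 goes to index 7 (step 1), 20 to 12 (step 5), 22 to its home index 9, and 31 to 0 (step 4: 5, 9, then 13 % 13 = 0).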

  34. Double Probing Reaction
     Sweet! Double hashing keeps put O(n)

  35. Probing and Searching
     • Search starts at index where key hashed
        • If cannot place Entry at that index
        • The array must keep being probed
        • Stop only at a usable index
     • May need to probe every index!
        • Searching takes O(n) even with hash
        • May need to reallocate & rehash table
        • Worst case O(n) put even with perfect hash
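Searching follows the same probe sequence as insertion, so a lookup may visit every index before giving up, and growing the table means re-inserting every key because each index depends on the table length. The sketch below assumes linear probing on `Integer` keys with h(x) = x mod length; `find`, `rehash`, and the new length 29 (a prime) are illustrative choices, not taken from the slides.

```java
// Hypothetical sketch: probing search and rehashing into a larger table
class ProbeSearch {
    // Linear-probing search: returns the index holding key, or -1.
    // Stops at the first null slot or after visiting every index (worst case O(n)).
    static int find(Integer[] table, int key) {
        int i = Math.floorMod(key, table.length);
        for (int probes = 0; probes < table.length; probes++) {
            if (table[i] == null) {
                return -1;                  // empty slot: key is not in the table
            }
            if (table[i] == key) {          // unboxes the non-null Integer
                return i;
            }
            i = (i + 1) % table.length;
        }
        return -1;                          // probed every index
    }

    // Reallocate a larger array and re-insert every key, since indices depend on the length
    static Integer[] rehash(Integer[] old, int newLength) {
        Integer[] table = new Integer[newLength];
        for (Integer key : old) {
            if (key == null) {
                continue;
            }
            int i = Math.floorMod(key, newLength);
            while (table[i] != null) {
                i = (i + 1) % newLength;
            }
            table[i] = key;
        }
        return table;
    }

    public static void main(String[] args) {
        // Table as it would look after the linear-probe example
        Integer[] table = {null, null, 15, null, null, 18, 32, 44, 20, 22, 31, 76, null};
        System.out.println(find(table, 31));   // prints 10 (checks indices 5 through 10)
        System.out.println(find(table, 57));   // prints -1 (57 mod 13 = 5; stops at null index 12)
        Integer[] bigger = rehash(table, 29);
        System.out.println(find(bigger, 31));  // prints 2 (31 mod 29)
    }
}
```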

  41. Post-Removal Operations
     • What happens when we remove an Entry?
        • Set index to null in most structures
        • Consider if we call remove(44)
     • get(31) called, what would happen?
        • First checks index it is hashed to
        • Checks first probe index… & stops at null
     [Figure: hash table with indices 0 to 12 holding 15, 18, 20, 32, 76, 22, 31; the slot that held 44 is now null]

  42. *Marker Value Explained
     • Mark cleared indices in hash table
        • Since collision could have happened, continue search
        • Index can be used to store new Entry
     • Ways to show that array index is clear
        • Entry with null key could be used if one is careful
        • Could try to make a key which is never used
        • Use static final field of type Entry
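Below is a sketch of the marker approach, assuming linear probing, `int` keys, and a hypothetical minimal `Entry` class (the course's own `Entry` type is not shown in the slides). The marker is a `static final Entry` compared by reference, as the last bullet suggests. Replaying the removal example with linear probing, `remove(44)` leaves the marker at index 7 instead of `null`, so `get(31)` keeps probing and still finds 31 at index 10; with a plain `null` the search would stop at index 7 and wrongly report that 31 is missing.

```java
// Hypothetical open-addressing map that marks removed slots instead of nulling them
class MarkerProbingMap {
    // Minimal stand-in for the course's Entry type (assumption)
    static final class Entry {
        final int key;
        String value;

        Entry(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Marker for a cleared slot; only its identity matters, never its key
    static final Entry AVAILABLE = new Entry(Integer.MIN_VALUE, null);

    private final Entry[] table;

    MarkerProbingMap(int length) {
        table = new Entry[length];
    }

    private int home(int key) {
        return Math.floorMod(key, table.length);
    }

    // Keep probing past AVAILABLE (a collision may have pushed the key further along);
    // stop only at null or after visiting every slot
    private int indexOf(int key) {
        int i = home(key);
        for (int probes = 0; probes < table.length; probes++) {
            Entry e = table[i];
            if (e == null) {
                return -1;
            }
            if (e != AVAILABLE && e.key == key) {
                return i;
            }
            i = (i + 1) % table.length;
        }
        return -1;
    }

    String get(int key) {
        int i = indexOf(key);
        return i < 0 ? null : table[i].value;
    }

    void put(int key, String value) {
        int i = indexOf(key);
        if (i >= 0) {
            table[i].value = value;
            return;
        }
        // New key: reuse the first null or AVAILABLE slot along the probe sequence
        int j = home(key);
        for (int probes = 0; probes < table.length; probes++) {
            if (table[j] == null || table[j] == AVAILABLE) {
                table[j] = new Entry(key, value);
                return;
            }
            j = (j + 1) % table.length;
        }
        throw new IllegalStateException("table is full");
    }

    String remove(int key) {
        int i = indexOf(key);
        if (i < 0) {
            return null;
        }
        String old = table[i].value;
        table[i] = AVAILABLE;   // not null, so later searches continue past this slot
        return old;
    }

    public static void main(String[] args) {
        MarkerProbingMap map = new MarkerProbingMap(13);
        for (int key : new int[] {15, 18, 32, 76, 44, 20, 22, 31}) {
            map.put(key, "v" + key);
        }
        map.remove(44);
        System.out.println(map.get(31)); // prints v31: search probes past the marker at index 7
    }
}
```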

  43. Why Use Hash Table & Probes?
     • Hash tables can require O(n) complexity
        • Provide O(1) time if you are really good
     • Ultimately depends on hash function used
        • Choose wisely and be rich

  44. Before Next Lecture…
     • Get updated lab project into SVN directory
        • No need to e-mail; I will collect directories at 5PM
     • Finish working on week #4 assignment
        • Due at usual time tomorrow afternoon/evening
     • Start thinking of your design for the project
        • Preliminary copy of this design due Friday
     • Read sections 9.3 - 9.3.1 & 9.3.3 of the book
        • What should we do if many values for 1 key?
