Hashing and Hash Tables

Hashing and Hash Tables. Binhai Zhu, Computer Science Department, Montana State University. Motivation. What are the dictionary operations? (1) Insert (2) Delete (3) Search (most of the time, we will be focusing on search).


Presentation Transcript


  1. Hashing and Hash Tables. Binhai Zhu, Computer Science Department, Montana State University

  3. Motivation • What are the dictionary operations? (1) Insert (2) Delete (3) Search (most of the time, we will be focusing on search)

  5. Objective • Searching takes Θ(n) time in the worst case when the data are unorganized. • Even with binary search on sorted data, it takes Θ(log n) time. • Our objective? O(1) time on average using hashing, under a reasonable assumption.

  6. Definitions • A hash table generalizes an ordinary array, which supports direct addressing, so let's first talk about the direct-address table. • Universe of keys U = {0,1,2,…,m-1}; no two elements have the same key. • To represent a dynamic set, we use an array, or direct-address table, T[0..m-1], in which each position (slot) corresponds to a key in the universe.

  7. Definitions • [Figure: direct-address table T with slots 0 to 9. The actual keys K = {2, 3, 5, 8}, drawn from the universe U = {0, 1, …, 9}, occupy slots 2, 3, 5, and 8, each holding a key and its satellite data; the remaining slots are empty (/).]

  12. With a direct-address table T[0..m-1], how do we search/insert/delete an element x with key k? Direct-Address-Search(T,k): return T[k]. Direct-Address-Insert(T,x): T[key[x]] ← x. Direct-Address-Delete(T,x): T[key[x]] ← Nil. Each operation takes O(1) time!
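The three direct-address operations can be sketched in Python as follows. This is only an illustration, not the author's code: the class name `DirectAddressTable`, its method names, and the use of `None` for Nil are assumptions, and the keys 2, 3, 5, 8 are the ones that appear in the slide figure.

```python
from typing import Any, List, Optional


class DirectAddressTable:
    """Direct-address table T[0..m-1]: slot k holds the element with key k."""

    def __init__(self, m: int) -> None:
        # None plays the role of Nil (an empty slot).
        self.slots: List[Optional[Any]] = [None] * m

    def search(self, k: int) -> Optional[Any]:
        # Direct-Address-Search(T, k): return T[k]
        return self.slots[k]

    def insert(self, key: int, element: Any) -> None:
        # Direct-Address-Insert(T, x): T[key[x]] <- x
        self.slots[key] = element

    def delete(self, key: int) -> None:
        # Direct-Address-Delete(T, x): T[key[x]] <- Nil
        self.slots[key] = None


table = DirectAddressTable(10)
for key in (2, 3, 5, 8):
    table.insert(key, f"satellite data for key {key}")
```

Each method is a single array access, which is where the O(1) bound comes from. The table always allocates one slot per key in the universe, whether or not that key is in use.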

  13. With a direct-address table T[0..m-1], Direct-Address-Search, Direct-Address-Insert, and Direct-Address-Delete each take O(1) time. Problem?

  14. Hash Table • With direct addressing, an element with key k is stored in slot k. With hashing, it is stored in slot h(k); h is called a hash function. • h maps the universe U of keys into the slots of a hash table T[0..m-1]. h : U → {0,1,…,m-1} [Figure: h maps the actual keys K, drawn from the universe U, into slots of T.]

  15. Hash Table • If h(5) = h(8), keys 5 and 8 are mapped to the same slot: collision!

  16. Collision • When two keys hash to the same slot, we have a collision. • While collisions are hard to avoid, a carefully designed hash function can at least decrease the chance of a collision (and in some cases may avoid collisions entirely).

  21. Collision Resolution by Chaining. Chained-Hash-Insert(T,x): insert x at the head of list T[h(key[x])]. Chained-Hash-Search(T,k): search for an element with key k in list T[h(k)]. Chained-Hash-Delete(T,x): delete x from the list T[h(key[x])]. Time?

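  22. Collision Resolution by Chaining. Example: Let h(k) = k mod 11, insert 5, 28, 19, 15, 20, 33, 12, 17, 39, 11 into T[0..10]. Resulting chains (head of each list first):
      T[0]: 11 → 33
      T[1]: 12
      T[2]: /
      T[3]: /
      T[4]: 15
      T[5]: 5
      T[6]: 39 → 17 → 28
      T[7]: /
      T[8]: 19
      T[9]: 20
      T[10]: /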
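The chaining example can be reproduced with a short Python sketch. The class name `ChainedHashTable` and its method names are assumptions made for illustration. Plain Python lists stand in for linked lists, so inserting at the head here is not a constant-time operation as it would be with a real linked list.

```python
from typing import List


class ChainedHashTable:
    """Hash table with chaining; each slot holds a list of keys (the chain)."""

    def __init__(self, m: int) -> None:
        self.m = m
        self.slots: List[List[int]] = [[] for _ in range(m)]

    def h(self, k: int) -> int:
        # Division method from the example: h(k) = k mod m.
        return k % self.m

    def insert(self, k: int) -> None:
        # Chained-Hash-Insert: put the key at the head of its chain.
        self.slots[self.h(k)].insert(0, k)

    def search(self, k: int) -> bool:
        # Chained-Hash-Search: scan only the chain that k hashes to.
        return k in self.slots[self.h(k)]

    def delete(self, k: int) -> None:
        # Chained-Hash-Delete: remove k from its chain.
        # Raises ValueError if k is not present.
        self.slots[self.h(k)].remove(k)


table = ChainedHashTable(11)
for key in (5, 28, 19, 15, 20, 33, 12, 17, 39, 11):
    table.insert(key)
```

After the loop, `table.slots[6]` is `[39, 17, 28]` and `table.slots[0]` is `[11, 33]`, matching the chains listed in the example.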

  23. Hash function • A hash function that causes no collisions is called a perfect hash function. • A good hash function is one that satisfies simple uniform hashing: each key is equally likely to hash to any of the m slots. (It is difficult to check this condition, though.) • Now let's see some examples of hash functions. Assume that all the keys can be represented as natural numbers.

  29. Famous Examples of Hash Functions • Division: h(k) = k mod m; m should be a prime number, preferably not too close to an exact power of 2. • Multiplication: h(k) = ⌊m(kA mod 1)⌋, A = (√5 − 1)/2 = 0.61803… Example. k = 123456, m = 10000. h(k) = ⌊10000 × (123456 × 0.61803… mod 1)⌋ = ⌊10000 × (76300.0041151… mod 1)⌋ = ⌊10000 × 0.0041151…⌋ = ⌊41.151…⌋ = 41
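The division and multiplication methods can be checked numerically with a small Python sketch. The function names `division_hash` and `multiplication_hash` are assumptions. The sketch uses double-precision floating point, which is adequate for keys of this size but loses the fractional part for very large keys.

```python
import math

# A = (sqrt(5) - 1) / 2 = 0.61803..., the constant used on the slide.
A = (math.sqrt(5) - 1) / 2


def division_hash(k: int, m: int) -> int:
    """Division method: h(k) = k mod m."""
    return k % m


def multiplication_hash(k: int, m: int) -> int:
    """Multiplication method: h(k) = floor(m * (k*A mod 1))."""
    fractional_part = (k * A) % 1.0  # k*A mod 1
    return math.floor(m * fractional_part)


slot = multiplication_hash(123456, 10000)  # the worked example: 41
```

With k = 123456 and m = 10000, the product k·A is about 76300.0041151, its fractional part is about 0.0041151, and the result is slot 41, as in the worked example.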

  31. Famous Examples of Hash Functions • Folding: The key is divided into several parts. These parts are combined, or folded, together and transformed in a certain way to create the target address. Example 1. Shift folding: 123-456-789 (SSN); 123 + 456 + 789 = 1368; 1368 mod 1000 = 368. Example 2. Boundary folding: 123-456-789 (SSN), with the middle part reversed; 123 + 654 + 789 = 1566; 1566 mod 1000 = 566.

  33. Famous Examples of Hash Functions • Mid-square function: the key is squared and the middle part of the result is taken as the address. Example. k = 3121, 3121² = 9740641, so h(k) = 406. You can also encode the square in binary and take the middle bits.

  34. Famous Examples of Hash Functions • Extraction: Only a part of the key is used to compute the address. Example: for 123456789, the first 4 digits are 1234 and the last 4 digits are 6789; concatenating the first 2 digits of 1234 with the last 2 digits of 6789 gives 1289.

  37. Famous Examples of Hash Functions • Radix Transformation: k is transformed into another number base. Example: 345₁₀ = 423₉, then 423 mod 100 = 23. But 264₁₀ = 323₉, and 323 mod 100 = 23 as well. Collision is hard to avoid in the worst case!

  38. Famous Examples of Hash Functions • Division: h(k) = k mod m; m should be a prime number, preferably not too close to an exact power of 2. • Multiplication: h(k) = ⌊m(kA mod 1)⌋, A = (√5 − 1)/2 = 0.61803… • Folding: The key is divided into several parts. These parts are combined, or folded, together and transformed in a certain way to create the target address. • Mid-square function: the key is squared and the middle part of the result is taken as the address. • Extraction: Only a part of the key is used to compute the address. • Radix Transformation: k is transformed into another number base.
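The folding, mid-square, extraction, and radix-transformation examples from the preceding slides can be reproduced with the Python sketch below. All function names and parameters (`part_len`, `width`, `base`) are assumptions introduced for illustration. The extraction rule (first two digits joined with the last two) is the one implied by the slide's result of 1289, and `to_base` only handles bases up to 10.

```python
def split_parts(key: str, part_len: int) -> list:
    """Split a digit string such as '123-456-789' into equal-length parts."""
    digits = key.replace("-", "")
    return [digits[i:i + part_len] for i in range(0, len(digits), part_len)]


def shift_fold(key: str, part_len: int, m: int) -> int:
    """Shift folding: add the parts as they are, then reduce mod m."""
    return sum(int(p) for p in split_parts(key, part_len)) % m


def boundary_fold(key: str, part_len: int, m: int) -> int:
    """Boundary folding: reverse every other part before adding."""
    parts = split_parts(key, part_len)
    folded = [p[::-1] if i % 2 == 1 else p for i, p in enumerate(parts)]
    return sum(int(p) for p in folded) % m


def mid_square(k: int, width: int) -> int:
    """Mid-square: take the middle `width` decimal digits of k squared."""
    squared = str(k * k)
    start = (len(squared) - width) // 2
    return int(squared[start:start + width])


def extract(key: str) -> int:
    """Extraction: first two digits joined with the last two digits."""
    return int(key[:2] + key[-2:])


def to_base(k: int, base: int) -> str:
    """Digits of k written in the given base (base <= 10)."""
    digits = ""
    while k > 0:
        digits = str(k % base) + digits
        k //= base
    return digits or "0"


def radix_hash(k: int, base: int, m: int) -> int:
    """Radix transformation: read the base-`base` digits as decimal, mod m."""
    return int(to_base(k, base)) % m
```

For instance, `radix_hash(345, 9, 100)` and `radix_hash(264, 9, 100)` both evaluate to 23, which is the collision shown in the radix-transformation example.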

  40. Open Addressing • In some applications, it is hard to dynamically allocate the additional space that chaining requires. • So it is natural to handle collisions in a different way, in which all elements are stored in the hash table itself. Then, instead of following pointers, we simply compute the sequence of slots to be examined. Let's use insertion as an example. • To perform insertion using open addressing, we successively examine, or probe, the hash table until we find an empty slot in which to put the element. Moreover, the sequence of positions probed depends on the key being inserted; i.e., h : U × {0,1,…,m-1} → {0,1,…,m-1}

  41. Open Addressing • We require that, for every key k, the probe sequence <h(k,0), h(k,1),…,h(k,m-1)> be a permutation of <0,1,…,m-1>, so that every position in the hash table is eventually considered as a slot for a new key as the table fills up. • Now, for simplicity, assume that each element x is just its key k (no satellite data), and that there is no deletion.

  42. Open Addressing
      Hash-Insert(T, k)
        i ← 0
        repeat
          j ← h(k, i)
          if T[j] == Nil
            then T[j] ← k
                 return j
            else i ← i + 1
        until i == m
        error “hash table overflow”

  43. Open Addressing • Example. Insert keys 10, 22, 31, 4, 15, 28, 17, 88, 59 into T[0..10] using Hash-Insert, with h(k,i) = [h’(k) + i] mod m, h’(k) = k mod m, and m = 11.
      Probes: h(10,0) = (10+0) mod 11 = 10; h(22,0) = 0; h(31,0) = 9; h(4,0) = 4; h(15,0) = (4+0) mod 11 = 4 (occupied), h(15,1) = (4+1) mod 11 = 5.
      Final table:
      T[0]: 22
      T[1]: 88
      T[2]: /
      T[3]: /
      T[4]: 4
      T[5]: 15
      T[6]: 28
      T[7]: 17
      T[8]: 59
      T[9]: 31
      T[10]: 10
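Hash-Insert with the probe function from the example can be sketched in Python as below. The names `h` and `hash_insert`, the use of `None` for Nil, and the choice of `OverflowError` for the overflow case are assumptions made for illustration.

```python
from typing import List, Optional

M = 11  # table size used in the example


def h(k: int, i: int, m: int = M) -> int:
    """Probe function from the example: h(k, i) = (h'(k) + i) mod m."""
    return (k % m + i) % m


def hash_insert(table: List[Optional[int]], k: int) -> int:
    """Probe slots h(k,0), h(k,1), ... until an empty one is found."""
    m = len(table)
    for i in range(m):
        j = h(k, i, m)
        if table[j] is None:  # empty slot: store the key here
            table[j] = k
            return j
    raise OverflowError("hash table overflow")


table: List[Optional[int]] = [None] * M
for key in (10, 22, 31, 4, 15, 28, 17, 88, 59):
    hash_insert(table, key)
```

Key 59 needs the longest probe sequence here: it starts at slot 4 and finds slots 4 through 7 occupied before landing in slot 8.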

  48. Open Addressing
      Hash-Search(T, k)
        i ← 0
        repeat
          j ← h(k, i)
          if T[j] == k
            then return j
          i ← i + 1
        until T[j] == Nil or i == m
        return Nil
      Example. Search 15 in T, with h(k,i) = [h’(k) + i] mod m, h’(k) = k mod m.
        i = 0: j ← h(15,0) = 4, T[4] ≠ 15
        i = 1: j ← h(15,1) = 5, T[5] = 15, return 5

  50. Open Addressing • How about deletion? You can simply use Hash-Search to find the key first. Then what?
        i ← 0
        repeat
          j ← h(k, i)
          if T[j] != Nil and T[j] == k
            then T[j] ← Nil?
                 exit
          i ← i + 1
        until T[j] == Nil or i == m
      Example. Delete 4, 15 in T, with h(k,i) = [h’(k) + i] mod m, h’(k) = k mod m.
        Deleting 4 sets T[4] ← Nil.
        Delete 15: i = 0, j ← h(15,0) = 4, T[4] = Nil, exit. The probe stops at the emptied slot 4, so key 15 in slot 5 is never reached.
      Table after deleting 4: T[0] = 22, T[1] = 88, T[4] = Nil, T[5] = 15, T[6] = 28, T[7] = 17, T[8] = 59, T[9] = 31, T[10] = 10.
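The deletion problem can be demonstrated with the Python sketch below, which also shows one common remedy. The remedy, marking a deleted slot with a special `DELETED` value (a tombstone) instead of Nil, is not taken from the slides above; it is a standard technique added here as an assumption, and every name in the code is hypothetical.

```python
from typing import List, Optional

M = 11
DELETED = object()  # hypothetical tombstone marker, distinct from None (Nil)
KEYS = (10, 22, 31, 4, 15, 28, 17, 88, 59)


def h(k: int, i: int, m: int = M) -> int:
    """Linear probing: h(k, i) = (k mod m + i) mod m."""
    return (k % m + i) % m


def hash_insert(table: List[object], k: int) -> int:
    """Insert k into the first empty or tombstoned slot on its probe path."""
    m = len(table)
    for i in range(m):
        j = h(k, i, m)
        if table[j] is None or table[j] is DELETED:
            table[j] = k
            return j
    raise OverflowError("hash table overflow")


def hash_search(table: List[object], k: int) -> Optional[int]:
    """Return the slot holding k, or None; stops at a truly empty slot."""
    m = len(table)
    for i in range(m):
        j = h(k, i, m)
        if table[j] is None:
            return None
        if table[j] == k:  # a tombstone never equals a key
            return j
    return None


def build() -> List[object]:
    """Build the table from the insertion example."""
    table: List[object] = [None] * M
    for key in KEYS:
        hash_insert(table, key)
    return table


# Naive deletion: overwrite the slot with Nil.
naive = build()
naive[hash_search(naive, 4)] = None
lost = hash_search(naive, 15)  # None: the probe stops at empty slot 4

# Tombstone deletion: overwrite the slot with DELETED instead.
marked = build()
marked[hash_search(marked, 4)] = DELETED
found = hash_search(marked, 15)  # 5: the probe continues past the tombstone
```

With tombstones, search treats a deleted slot as occupied and keeps probing, while insertion is free to reuse it. The cost is that search time no longer depends only on how many keys are currently stored.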
