
HASH TABLE


eve-whitney

Presentation Transcript


  1. HASH TABLE

  2. Hashing • The transformation of a string of characters into a usually shorter, fixed-length value or key that represents the original string. Example: a group of people could be arranged in a database like this: Allen, Jane; Moore, Sarah; Smith, Dan.

  3. HASHING [Figure: a HASH FUNCTION maps the HASH KEYS (Allen, Jane; Moore, Sarah; Smith, Dan) to the HASH VALUES (7864, 9802, 1990) that index the hash table.]

  4. Hash Table • stores things and allows 3 operations: insert, search and delete. • associated with a set of records

  5. [Figure: a hash function H maps the key "John Smith" to index 5 in a table of records: Bob Miller, 34; Sally Wood, 21; John Smith, 29.]

  6. Each slot of a hash table is called a bucket, and hash values are called bucket indices. [Figure: the entry "7864 Allen, Jane", with its parts labeled BUCKET and BUCKET INDEX.]

  7. HASH FUNCTION • Maps keys to indices of a hash table • Composed of 2 maps: Hash code map: key → integer. Compression map: integer → [0, N-1]
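A minimal sketch of the two-stage composition described above, written in Python. The function names and the base-31 polynomial used for the hash code map are illustrative assumptions; the slides do not specify a particular hash code.

```python
def hash_code(key: str) -> int:
    """Hash code map: key -> integer (assumed base-31 polynomial)."""
    code = 0
    for ch in key:
        code = code * 31 + ord(ch)
    return code


def compress(code: int, n: int) -> int:
    """Compression map: integer -> [0, n-1]."""
    return code % n


def hash_index(key: str, n: int) -> int:
    """Compose the two maps to get a table index for the key."""
    return compress(hash_code(key), n)


for name in ("Allen, Jane", "Moore, Sarah", "Smith, Dan"):
    print(name, "->", hash_index(name, 13))
```

Keeping the two stages separate means the same hash code can be reused when the table size N changes.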

  8. DIVISION • Map a key k into one of m slots by using this function: h(k) = k mod m. Example: if table size m = 12 and key k = 100, then h(100) = 100 mod 12 = 4.
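The division method is a single remainder operation. The sketch below reproduces the slide's example; the function name is chosen here for illustration.

```python
def division_hash(k: int, m: int) -> int:
    """Division method: h(k) = k mod m."""
    return k % m


print(division_hash(100, 12))  # 4, as in the example above
```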

  9. MID-SQUARE FUNCTION • The key is squared and the middle part of the result is used as the address. Ex. k = 3121, then $3121^2 = 9740641$, thus h(3121) = 406.
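One possible way to code the mid-square idea is to work on the decimal digits of the square. The `digits` parameter and the rule for centring the window (rounding toward the left when it cannot be centred exactly) are assumptions made for this sketch.

```python
def mid_square_hash(k: int, digits: int = 3) -> int:
    """Square the key and return the middle `digits` decimal digits."""
    squared = str(k * k)
    start = (len(squared) - digits) // 2
    return int(squared[start:start + digits])


print(mid_square_hash(3121))  # 406, the middle of 9740641
```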

  10. Folding • The key is divided into several parts, which are then combined • 2 types: 1. shift folding 2. boundary folding

  11. Shift Folding Ex. (SSN) 123-45-6789
      With three parts: 1. Divide into 3 parts: 123, 456 and 789. 2. Add them: 123 + 456 + 789 = 1368. 3. h(k) = k mod M where M = 1000: h(1368) = 1368 mod 1000 = 368.
      With five parts: 1. Divide into 5 parts: 12, 34, 56, 78 and 9. 2. Add them: 12 + 34 + 56 + 78 + 9 = 189. 3. h(k) = k mod M where M = 1000: h(189) = 189 mod 1000 = 189.
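The two shift-folding calculations above can be sketched as follows. The function name and the `part_size` parameter are illustrative; non-digit characters such as the hyphens in the SSN are simply dropped.

```python
def shift_fold_hash(key: str, part_size: int, m: int) -> int:
    """Split the key's digits into fixed-size parts, add them, reduce mod m."""
    digits = "".join(ch for ch in key if ch.isdigit())
    parts = [int(digits[i:i + part_size])
             for i in range(0, len(digits), part_size)]
    return sum(parts) % m


print(shift_fold_hash("123-45-6789", 3, 1000))  # 368
print(shift_fold_hash("123-45-6789", 2, 1000))  # 189
```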

  12. Extraction • Only a part of the key is used to compute the address. Ex. (SSN) 123-45-6789 • First 4 digits = 1234 • Last 4 digits = 6789 • First 2 combined with the last 2 = 1289 (address)
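As a small illustration of extraction, the sketch below implements the third variant from the slide (first two digits combined with the last two). The function name is hypothetical.

```python
def extract_hash(key: str) -> int:
    """Use only the first two and last two digits of the key as the address."""
    digits = "".join(ch for ch in key if ch.isdigit())
    return int(digits[:2] + digits[-2:])


print(extract_hash("123-45-6789"))  # 1289
```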

  13. Hash Method: Folding • Chop the key into two parts • Add the two parts to generate the hash • Any leading (carry) digit of the sum is ignored
      Example:
        Key     3205     7148     2345
        Parts   32 05    71 48    23 45
        H(x)    37       19       68
      Option: rotate (reverse) the digits of the second part
        Parts   32 50    71 84    23 54
        H(x)    82       55       77
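The table above can be reproduced with a short sketch. It assumes four-digit keys and reads "rotate" as reversing the two digits of the second part, which is what the slide's numbers suggest; the function and parameter names are made up for the example.

```python
def fold_hash(key: int, reverse_second: bool = False) -> int:
    """Fold a four-digit key: add its two halves and drop any carry digit."""
    text = f"{key:04d}"
    first, second = text[:2], text[2:]
    if reverse_second:
        second = second[::-1]  # optional variant: reverse the second part
    return (int(first) + int(second)) % 100


print([fold_hash(k) for k in (3205, 7148, 2345)])        # [37, 19, 68]
print([fold_hash(k, True) for k in (3205, 7148, 2345)])  # [82, 55, 77]
```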

  14. Radix Transformation • K is transformed into another number base: $212_{10} = 255_9$ • With M = 100, H(k) = k mod M: H(255) = 255 mod 100 = 55

  15. To convert 212 to base 9, divide 212 by 9. • 9 divides into 212 23 times with remainder 5: 212 = 9(23) + 5 • 9 divides into 23 twice with remainder 5: 23 = 9(2) + 5 • $212 = 9(9(2) + 5) + 5 = 2(9^2) + 5(9) + 5$, so $212_{10} = 255_9$.
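The repeated division above translates directly into a loop. This sketch assumes a base of at most 10, so that the converted digits can be read back as an ordinary decimal number; the function names are illustrative.

```python
def to_base(k: int, base: int) -> int:
    """Write k in `base` (base <= 10) and read the digits as a decimal int."""
    digits = []
    while k > 0:
        k, remainder = divmod(k, base)
        digits.append(str(remainder))
    return int("".join(reversed(digits)) or "0")


def radix_hash(k: int, base: int = 9, m: int = 100) -> int:
    """Radix transformation followed by the division method."""
    return to_base(k, base) % m


print(to_base(212, 9))  # 255
print(radix_hash(212))  # 55
```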

  16. Hash Collision • Occurs when different keys happen to have the same hash value

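  17. [Figure: Collision! The record Jane Depp, 18 hashes to index 2, which is already occupied, in a table that also holds Bob Miller, 34; Sally Wood, 21; and John Smith, 29.]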

  18. Collision Resolution • There are two kinds of collision resolution: 1 – Chaining makes each entry a linked list so that when a collision occurs the new entry is added to the end of the list. 2 – Open Addressing uses probing to discover an empty spot.

  19. Collision Resolution – Open Addressing • The table is probed for an open slot when the first one already holds an element. • Linear probing: the interval between probes is fixed, often at 1. • Quadratic probing: the interval between probes increases linearly (hence, the indices are described by a quadratic function). • Double hashing: the interval between probes is fixed for each record but is computed by another hash function.

  20. Linear Probing • A scheme for resolving hash collisions by sequentially searching the hash table for a free location • Uses two values: a starting value and an interval (step size) between successive probes • newLocation = (startingValue + stepSize) % arraySize • H(x, i) = (H(x) + i) mod M

  21. Linear Probing - Example • Insert 15, 17, 8 into a table with slots 0 to 9: H(15) = 15 mod 10 = 5; H(17) = 17 mod 10 = 7; H(8) = 8 mod 10 = 8
        Slot   0   1   2   3   4   5   6   7   8   9
        Key    -   -   -   -   -   15  -   17  8   -

  22. Insert 35: H(35) = 35 mod 10 = 5 (occupied); (5 + 1) mod 10 = 6 (free).
      Insert 25: H(25) = 25 mod 10 = 5 (occupied); (5 + 1) mod 10 = 6 (occupied); (6 + 1) mod 10 = 7 (occupied); (7 + 1) mod 10 = 8 (occupied); (8 + 1) mod 10 = 9 (free).
      Insert 75: H(75) = 75 mod 10 = 5 (occupied); (5 + 1) mod 10 = 6; (6 + 1) mod 10 = 7; (7 + 1) mod 10 = 8; (8 + 1) mod 10 = 9 (all occupied); (9 + 1) mod 10 = 0 (free).
        Slot   0   1   2   3   4   5   6   7   8   9
        Key    75  -   -   -   -   15  35  17  8   25
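The linear probing example can be checked with a few lines of Python. This is a sketch under the slides' assumptions (H(x) = x mod 10, step size 1); the function name and the use of `None` for an empty slot are choices made here.

```python
def linear_probe_insert(table: list, key: int) -> int:
    """Insert key with linear probing: H(x, i) = (H(x) + i) mod M."""
    m = len(table)
    start = key % m
    for i in range(m):
        slot = (start + i) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


table = [None] * 10
for key in (15, 17, 8, 35, 25, 75):
    linear_probe_insert(table, key)
print(table)  # [75, None, None, None, None, 15, 35, 17, 8, 25]
```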

  23. Has anyone spotted the flaw in the linear probing technique? Think about this: what would happen if we now inserted 85, then 95, then 55?

  24. Each one would probe exactly the same positions as its predecessors. This is known as clustering. It leads to inefficient operations, because it causes the number of collisions to be much greater than it need be.

  25. Quadratic Probing • Eliminates primary clustering • $p(K, i) = c_1 i^2 + c_2 i + c_3$ • Simplest case: $p(K, i) = i^2$ (i.e., $c_1 = 1$, $c_2 = 0$, and $c_3 = 0$)

  26. Quadratic Probing - Example • Table size is 11 (slots 0..10) • Hash function: h(x) = x mod 11 • Insert keys:
      • 20 mod 11 = 9
      • 30 mod 11 = 8
      • 2 mod 11 = 2
      • 13 mod 11 = 2 → $2 + 1^2 = 3$
      • 25 mod 11 = 3 → $3 + 1^2 = 4$
      • 24 mod 11 = 2 → $2 + 1^2 = 3$ (occupied), $2 + 2^2 = 6$
      • 10 mod 11 = 10
      • 9 mod 11 = 9 → $9 + 1^2 = 10$ (occupied), $(9 + 2^2)$ mod 11 = 2 (occupied), $(9 + 3^2)$ mod 11 = 7
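The quadratic probing example can be traced with the sketch below, which assumes the simple probe function p(K, i) = i^2 and h(x) = x mod 11. The function name and the bound of M probes are illustrative choices.

```python
def quadratic_probe_insert(table: list, key: int) -> int:
    """Insert key by probing slots (h(key) + i*i) mod M for i = 0, 1, 2, ..."""
    m = len(table)
    home = key % m
    for i in range(m):
        slot = (home + i * i) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    # The probe sequence may miss free slots, so this is not "table full".
    raise RuntimeError("no free slot found on the probe sequence")


table = [None] * 11
slots = [quadratic_probe_insert(table, k)
         for k in (20, 30, 2, 13, 25, 24, 10, 9)]
print(slots)  # [9, 8, 2, 3, 4, 6, 10, 7]
```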

  27. Not all hash table slots will be on the probe sequence • Using $p(K, i) = i^2$ gives particularly inconsistent results • If all slots on that probe sequence happen to be full, the record cannot be inserted at all, even when other slots are free!

  28. Double Hashing • Increment P not by a constant but by an amount that depends on the key. Instead of P = (1 + P) mod TABLE_SIZE, use P = (P + INCREMENT(Key)) mod TABLE_SIZE

  29. Double Hashing - Example • P = (P + INCR(Key)) mod TABLE_SIZE • Suppose INCR(Key) = 1 + (Key mod 7) • Adding 1 guarantees it is never 0! • Insert 15, 17, 8:

  30. Insert 35: • P = H(35) = 5 (occupied). • P = (5 + (1 + 35 mod 7)) mod 10 = 6. Insert 25: • P = H(25) = 5 (occupied). • P = (5 + (1 + 25 mod 7)) mod 10 = 0.

  31. Let’s try! Insert 75: P = (P + INCR(Key)) mod TABLE_SIZE, where INCR(Key) = 1 + (Key mod 7). [Figure: the hash table, with slots labeled 10 down to 0.]
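To check the double hashing steps, including the exercise above, the update rule can be run directly. This sketch assumes H(Key) = Key mod 10, which matches H(35) = 5 and H(25) = 5 in the worked steps, and TABLE_SIZE = 10; the function names are illustrative.

```python
def incr(key: int) -> int:
    """Second hash: INCR(Key) = 1 + (Key mod 7), never 0."""
    return 1 + key % 7


def double_hash_insert(table: list, key: int) -> int:
    """Insert key, stepping by incr(key) after each collision."""
    m = len(table)
    p = key % m  # assumed primary hash H(Key) = Key mod TABLE_SIZE
    for _ in range(m):
        if table[p] is None:
            table[p] = key
            return p
        p = (p + incr(key)) % m
    raise RuntimeError("no free slot found on the probe sequence")


table = [None] * 10
for key in (15, 17, 8, 35, 25, 75):
    print(key, "-> slot", double_hash_insert(table, key))
```

Because the step depends on the key, 35, 25 and 75 all start at slot 5 but follow different probe sequences.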

  32. Chaining/Separate Chaining • uses an array as the primary hash table • an array of lists of entries

  33. Chaining • One way to handle collisions is to store the collided records in a linked list. The array now stores pointers to such lists. If no key maps to a certain hash value, that array entry points to nil. [Figure: an array indexed 0 to HASHMAX; empty entries point to nil, and one entry points to a list node holding Key: 9903030, name: tom, score: 73.]

  34. [Figure: separate chaining example in which the keys 29, 16, 14, 99, 127 and 129 are stored in chains.]
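A minimal separate-chaining table supporting insert, search and delete might look like the sketch below. It uses Python lists in place of linked lists and Python's built-in `hash`, both of which are simplifications assumed here; the class name and table size are hypothetical.

```python
class ChainedHashTable:
    """Hash table whose buckets are lists of (key, value) pairs."""

    def __init__(self, size: int = 8):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))  # collision or empty: append to chain

    def search(self, key):
        for existing, value in self._bucket(key):
            if existing == key:
                return value
        return None

    def delete(self, key) -> bool:
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                del bucket[i]
                return True
        return False


records = ChainedHashTable()
records.insert(9903030, {"name": "tom", "score": 73})
print(records.search(9903030))
```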

  35. Coalesced Hashing • A collision resolution method that uses pointers to connect the elements of a synonym chain. • A hybrid of separate chaining and open addressing. • Linked lists within the hash table handle collisions. • This strategy is effective, efficient and very easy to implement.

  36. [Figure: coalesced hashing example. Insert A5, A2, A3; then B5, A9, B2; then B9, C2. The table is shown after each group of insertions.]

  37. [Figure: a second coalesced hashing example with the same insertions (A5, A2, A3; then B5, A9, B2; then B9, C2), with the colliding elements placed in different slots.]

  38. Bucket Addressing • Uses additional space. • A bucket can be defined as a block of space that can store multiple elements that hash to the same position.

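  39. [Figure: bucket addressing example. Insert A5, A2, A3, B5, A9, B2, B9; elements that hash to the same position share a bucket: A2 B2 | A3 | A5 B5 | A9 B9.]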

  40. DELETION • Deleting a record must not hinder later searches. • The search process must still pass through the newly emptied slot to reach records whose probe sequence passed through this slot. • Deletion should therefore not simply mark the slot as empty. • The freed slot should be available to a future insertion. • Solution: mark the freed slot with a TOMBSTONE.
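The deletion rules above can be sketched with a tombstone marker on a linear-probing table. This example assumes h(k) = k mod 8 on an 8-slot table and distinct keys (insert does not check for duplicates); the keys are borrowed from the slides, but the layout is not claimed to match the original figure.

```python
EMPTY = None
TOMBSTONE = object()  # marks a slot whose record was deleted


def probe(table, key):
    """Yield the linear probe sequence for key."""
    m = len(table)
    for i in range(m):
        yield (key % m + i) % m


def insert(table, key):
    for slot in probe(table, key):
        if table[slot] is EMPTY or table[slot] is TOMBSTONE:
            table[slot] = key  # tombstoned slots are reusable
            return slot
    raise RuntimeError("hash table is full")


def search(table, key):
    for slot in probe(table, key):
        if table[slot] is EMPTY:
            return None  # a truly empty slot ends the search
        if table[slot] is not TOMBSTONE and table[slot] == key:
            return slot  # tombstones are skipped, not treated as empty
    return None


def delete(table, key) -> bool:
    slot = search(table, key)
    if slot is None:
        return False
    table[slot] = TOMBSTONE
    return True


table = [EMPTY] * 8
for key in (25, 28, 83, 75, 35):
    insert(table, key)
delete(table, 83)
print(search(table, 75))  # still found by probing past the tombstone
```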

  41. [Figure: tombstone example on a table with slots 0 to 7. The keys 25, 28, 83, 75 and 35 are inserted, with the probing sequence shown for each collision. A deletion then leaves a TOMBSTONE, and a later search probes past it until a match is found.]

  42. [Figure: insertion and deletion example using the keys A1, A4, A2, B4 and B1.]

  43. Perfect Hash Functions • Quick to compute • Distributes keys uniformly throughout the table • No collisions • Very rare (birthday paradox)

  44. A Perfect Hash Function for Strings • R. J. Cichelli gave an algorithm for finding perfect hash functions for strings. • He proposed the hash function: h(s) = (size + g(s.charAt(0)) + g(s.charAt(size-1))) % n where size = s.length(). • The function g is to be constructed so that h(s) is unique for each string s.
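To illustrate the shape of this hash function, the sketch below evaluates it on a tiny word set. The table `g` was picked by hand for these four words and is purely hypothetical; it is not the output of Cichelli's search procedure, which the slides do not describe.

```python
def cichelli_hash(s: str, g: dict, n: int) -> int:
    """h(s) = (len(s) + g(first char) + g(last char)) mod n."""
    return (len(s) + g[s[0]] + g[s[-1]]) % n


# Hand-picked letter values (an assumption for this example only).
g = {"a": 2, "d": 0, "e": 0, "o": 0}
words = ["and", "end", "do", "else"]

for word in words:
    print(word, "->", cichelli_hash(word, g, len(words)))
```

With this `g`, each of the four words lands in a different slot of a four-slot table, which is what makes the function perfect for this particular set.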
