1 / 81

Map ADT, Dictionary ADT, and Hash Tables

Chapter 9. Map ADT, Dictionary ADT, and Hash Tables. Map. Container that stores key-value pairs called entries Entries must be stored in a way that allows them to be located with the key “Mapping” refers to the association between a key and a value Not necessary to store the entries in order

Download Presentation

Map ADT, Dictionary ADT, and Hash Tables

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 9 Map ADT, Dictionary ADT,and Hash Tables

  2. Map Container that stores key-value pairs called entries Entries must be stored in a way that allows them to be located with the key “Mapping” refers to the association between a key and a value Not necessary to store the entries in order Unordered dictionary Ordered dictionary Map (Goodrich, 364–365)‏

  3. Map ADT How are the data related? Stored by an identifier called a “key” Operations int size()‏ boolean isEmpty()‏ value get( key )‏ value put( key, value )‏ value remove( key )‏ coll keys()‏ coll values()‏ coll entries()‏ Map (Goodrich, 369)‏

  4. Hash Table An implementation of the map ADT An array in which each entry’s key is mapped to its location by a hash function Keys are not ordered The most efficient way to implement a map Expected case efficiency = O(1)‏ Map (Goodrich, 372–388; Dale, 753)‏

  5. A Simple Hash Table Bucket array Each cell in the array is a “bucket” that holds an entry The array index == the entry key Example Small company with less than 100 employees Each employee has an ID number in the range 0–99 Store employee records in an array, so that the employee ID number matches the array index EMPTY 01 Turing, A. 02 Babbage, C. EMPTY 04 Gates, W. Map (Goodrich, 372; Dale, 620)‏ 0 1 2 3 4 A //More array cells… What’s the Big-Oh for myHash.put(k,v)?

  6. Searching in a Hash Table Look at the key to discover the location in the array where the value is (or where it should be)‏ Go directly there Ideally, we never need more than one comparison Map (Goodrich, 372–388)‏

  7. Hash Function Function that maps each key to a number within the range of the array index Example Small company with less than 100 employees Already uses a 5-digit ID number EMPTY 55301 Turing, A. 81202 Babbage, C. EMPTY 77404 Gates, W. Map (Goodrich, 373; Dale, 620)‏ 0 1 2 3 4 A //More array cells…

  8. Hash Function Function that maps each key to a number within the range of the array index Example Small company with less than 100 employees Already uses a 5-digit ID number EMPTY 55301 Turing, A. 81202 Babbage, C. EMPTY 77404 Gates, W. Map (Goodrich, 373; Dale, 620)‏ A simple hash function for this example is ( ID % 100 )‏ int hashValue( int key ){ return (key % 100); } 0 1 2 3 4 A //More array cells…

  9. Collisions Two different entries are mapped to the same bucket Best approach – minimize collisions by picking a good hash function Example Previous hash function of ( ID % 100 ) is too likely to cause collisions EMPTY 55301 Turing, A. 81202 Babbage, C. EMPTY 77404 Gates, W. Map (Goodrich, 379–383; Dale, 620)‏ ! 38104 McNealy, S. 0 1 2 3 4 A //More array cells…

  10. Hash Function Actions Desirable qualities – minimize collisions, fast computation Separate these two actions: Map the key to an integer called the “hash code” Map the hash code to an integer in the range of the array index numbers (This step is called the “compression function”)‏ Map (Goodrich, 372)‏

  11. Hash Code Some integer that is mapped to the entry’s key Does not have to be in the range of the array index numbers Must avoid collisions as much as possible Different methods are used to calculate hash codes for keys with different data types Goal = calculate the hash code with a function that derives a unique hash code and no two different keys will map to the same one Map (Goodrich, 374–377)‏

  12. Hash Code for “int” Keys Return the int value Map (Goodrich, 374)‏ public int hashCode( int key ){ return key; }

  13. Hash Code for“char” or “byte" or “short” Keys Cast it to an int Return the hash code for the int type Map (Goodrich, 374)‏ public int hashCode( int key ){ return key; } public int hashCode( char key ){ return hashCode( int(key) ); }

  14. Hash Code for “float” Keys For hashing, treat the number as a series of bits IEEE 754 single precision float is 32 bits Convert the 32 bits to an int, which is also 32 bits Return the hash code for the int type Map (Goodrich, 374)‏ public int hashCode( float key ){ int i = Float.floatToIntBits(key); return hashCode( i ); }

  15. Hash Code for“long” or “double” Keys The long int has twice as many bits as the int datatype, e.g. 32 bits for int, 64 bits for long Treat the high-order bits as an integer and the low-order bits as an integer, then sum them Map (Goodrich, 375)‏ public int hashCode( long key ){ int i = (int)( key >> 32 ) + (int)key; return hashCode( i ); } • The double also has 32 bits, so convert it totwo ints

  16. Hash Code for string Keys One approach is to sum the ASCII values of all the chars in the string Problem: too many collisions because many different words will have the same result For example, stop, tops, pots, spot Map (Goodrich, 375–376)‏ ASCII s = 115 t = 116 o = 111 p = + 112 Hashcode = 454

  17. Polynomial Hash Code Better approach for string keys Modify each char’s ASCII value by a number based on its position in the string Then sum the results Where x represents a char, k is the total number of chars, and a is a constant (but not 1), the following formula can be used: x0ak-1 + x1ak-2 + … + xk-2a + xk-1 s = 115 * 103 = 115000 t = 116 * 102 = 11600 o = 111 * 101 = 1110 p = 112 * 100 = + 112 Hashcode = 127822 Example, assume that the string is “stop” and a = 10 Map (Goodrich, 375–376)‏

  18. Cyclic Shift Hash Code Used for string keys Variation of the polynomial hash code Instead of multiplying by the constant a, it uses a cyclic bit shift Number of collisions in experiments – p. 376 Map (Goodrich, 375–376)‏ public int hashCode( String s ){ int h = 0; for( int i=0; i<length(); ++i ){ h = (h << 5) | (h >>> 27); h += (int)s.charAt(i); } return hashCode( int(h) ); }

  19. Cyclic Shift Details(Just for fun)‏ “Mixes up” the bits in a systematic way 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1 1 0 1 0 1 0 0 Map (Goodrich, 375–376)‏ public int hashCode( String s ){ int h = 0; for( int i=0; i<length(); ++i ){ h = (h << 5) | (h >>> 27); h += (int)s.charAt(i); } return hashCode( int(h) ); } Assume that the key is “stop” h + ‘s’ = h >>> 27 = h << 5 = bitwise OR h + ‘t’ =

  20. Compression Functions The second hash function action Once we have a hash code, we need to map it to an integer in the range of array index numbers Two methods Division method MAD method Map (Goodrich, 377–378)‏

  21. Division Method private int k = hashCode( key ); private int index = Math.abs(k) % CAPACITY; The array capacity should be a prime number Reduces the number of collisions Spreads out the distribution of hashed values Example Keys = {200,210,220,230,…,600} Array size = 100 //i.e., a non-prime number Produces three collisions for each hash code Map (Goodrich, 378)‏

  22. MAD Method private int k = hashCode( key ); private int i = (Math.abs(a * k + b) % prime) % CAPACITY; MAD stands for “Multiply, Add, and Divide” a and b are non-negative integers (a % CAPACITY) must not be 0 a and b are chosen at random when the program is written May be more effective at avoiding collisions Map (Goodrich, 378)‏

  23. Collisions Two different keys are mapped to the same location in the array Best approach – minimize collisions by picking a good hash function Example A bad hash function is ( key % 100 ) because it is too likely to cause collisions Map (Goodrich, 379–383; Dale, 620)‏

  24. Collision Handling Two different approaches: Separate chaining Open addressing (Does not prevent collisions! Just handles them when they occur.)‏ Map (Goodrich, 379–383)‏

  25. Separate Chaining Each location in the hash table holds a reference variable that represents a list Each list can hold many items As long as the hash function is good, the lists will be small because there will be few collisions What is the Big-Oh? Map (Goodrich, 379–383)‏

  26. Separate Chaining Example Map (Goodrich, 379–383)‏ A 0 1 41 next 28 next 54 next null 2 3 4 18 next null 5 6 7 8 9 36 next 10 next null 10 11 90 next 12 next 38 next 25 next null 12

  27. Separate Chaining – Algorithms Assume the hash table is implemented as an array named myData Assume that myData is an array of NodeList objects: private NodePositionList<Entry<Integer,Integer>>[] myData = (NodePositionList<Entry<Integer,Integer>>) new Object[11]; What are the pseudocode algorithms for the following Map operations? put( k, e )‏ get( k )‏ remove( k )‏ Map (Goodrich, 232–233, 379)‏

  28. Separate Chaining – put(k,v)‏ //Given: private int hash(int k, int cap){ return (k % cap); } public static final int CAPACITY = 11; private NodePositionList<Entry<Integer,Integer>>[] myData = (NodePositionList<Entry<Integer,Integer>>) new Object[11]; Map (Goodrich, 232–233, 379)‏

  29. Separate Chaining – put(k,v)‏ //Given: private int hash(int k, int cap){ return (k % cap); } public static final int CAPACITY = 11; private NodePositionList<Entry<Integer,Integer>>[] myData = (NodePositionList<Entry<Integer,Integer>>) new Object[11]; Map (Goodrich, 232–233, 379)‏ Algorithm put( k, v )‏ In: an integer key and a value Out: void Get the index of the location in the hash table Add the item in the list at that location

  30. Separate Chaining – put(k,v)‏ //Given: private int hash(int k, int cap){ return (k % cap); } public static final int CAPACITY = 11; private NodePositionList<Entry<Integer,Integer>>[] myData = (NodePositionList<Entry<Integer,Integer>>) new Object[11]; Map (Goodrich, 232–233, 379)‏ Algorithm put( k, v )‏ In: an integer key and a value Out: void Get the index of the location in the hash table Add the item in the list at that location int i = hash(k,CAPACITY)‏ myData[i].addLast( new Entry<Integer,Integer>(k,v) )‏ n++ //keep track of the size

  31. Separate Chaining – put(k,v)‏ //Given: private int hash(int k, int cap){ return (k % cap); } public static final int CAPACITY = 11; private NodePositionList<Entry<Integer,Integer>>[] myData = (NodePositionList<Entry<Integer,Integer>>) new Object[11]; Map (Goodrich, 232–233, 379)‏ Algorithm put( k, v )‏ In: an integer key and a value Out: void Get the index of the location in the hash table Add the item in the list at that location int i = hash(k,CAPACITY)‏ myData[i].addLast( new Entry<Integer,Integer>(k,v) )‏ n++ //keep track of the size What’s the Big-Oh?

  32. Separate Chaining – get(k)‏ //Given: private int hash(int k, int cap){ return (k % cap); } public static final int CAPACITY = 11; private NodePositionList<Entry<Integer,Integer>>[] myData = (NodePositionList<Entry<Integer,Integer>>) new Object[11]; Map (Goodrich, 232–233, 379)‏ Algorithm get( k )‏ In: an integer key Out: a value ? What’s the Big-Oh?

  33. Separate Chaining – remove(k)‏ //Given: private int hash(int k, int cap){ return (k % cap); } public static final int CAPACITY = 11; private NodePositionList<Entry<Integer,Integer>>[] myData = (NodePositionList<Entry<Integer,Integer>>) new Object[11]; Map (Goodrich, 232–233, 379)‏ Algorithm remove( k )‏ In: an integer key Out: void ? What’s the Big-Oh?

  34. Open Addressing Always store one entry per location in the array Collision resolution methods Linear probing Quadratic probing Double hashing Map (Goodrich, 381–383)‏

  35. Linear Probing If a bucket is already occupied, then try the next available bucket EMPTY 55301 Turing, A. 81202 Babbage, C. EMPTY 77404 Gates, W. Map (Goodrich, 381–383)‏ ! 38104 McNealy, S. 0 1 2 3 4 A //All other cells filled…

  36. Linear Probing If a bucket is already occupied, then try the next available bucket EMPTY 55301 Turing, A. 81202 Babbage, C. EMPTY 77404 Gates, W. 38104 McNealy, S. 55301 Turing, A. 81202 Babbage, C. EMPTY 77404 Gates, W. Map (Goodrich, 381–383)‏ ! 38104 McNealy, S. 0 1 2 3 4 A //All other cells filled… 0 1 2 3 4 A //All other cells filled…

  37. Linear Probing – put(k,v)‏ If a location is already occupied, then try the next available location Example: hash(key) is implemented as key % 11 Put the following keys into hash table A {13,26,5,37,16,21,15} 0 1 2 3 4 5 6 7 8 9 10 A Map (Goodrich, 381–383)‏ 13

  38. Linear Probing – put(k,v)‏ If a location is already occupied, then try the next available location Example: hash(key) is implemented as key % 11 Put the following keys into hash table A {13,26,5,37,16,21,15} 0 1 2 3 4 5 6 7 8 9 10 A Map (Goodrich, 381–383)‏ 13 26 5

  39. Linear Probing – put(k,v)‏ If a location is already occupied, then try the next available location Example: hash(key) is implemented as key % 11 Put the following keys into hash table A {13,26,5,37,16,21,15} 0 1 2 3 4 5 6 7 8 9 10 A Map (Goodrich, 381–383)‏ 13 26 5 37

  40. Linear Probing – put(k,v)‏ If a location is already occupied, then try the next available location Example: hash(key) is implemented as key % 11 Put the following keys into hash table A {13,26,5,37,16,21,15} 0 1 2 3 4 5 6 7 8 9 10 A Map (Goodrich, 381–383)‏ 13 26 5 37 16 15 21

  41. Linear Probing – put(k,v)‏ If a location is already occupied, then try the next available location Example: hash(key) is implemented as key % 11 Put the following keys into hash table A {13,26,5,37,16,21,15} 0 1 2 3 4 5 6 7 8 9 10 A Map (Goodrich, 381–383)‏ 13 26 5 37 16 15 21 What’s the Big-Oh?

  42. Algorithm put( k, v )‏ In: an integer key and a value Out: void Get the index of the location in the hash table int index = hashCode(k) % CAPACITY loop examine location: if it’s not used, insert an entry there else try the next location Linear Probing – put(k,v)‏ 0 1 2 3 4 5 6 7 8 9 10 A 13 26 5 37 16 15 21 Map (Goodrich, 381–383)‏

  43. Algorithm put( k, v )‏ In: an integer key and a value Out: void Get the index of the location in the hash table int index = hashCode(k) % CAPACITY loop examine location: if it’s not used, insert an entry there else try the next location Linear Probing – put(k,v)‏ 0 1 2 3 4 5 6 7 8 9 10 A 13 26 5 37 16 15 21 Map (Goodrich, 381–383)‏ myData[index] = new Entry<Integer,Integer>(k,v)‏ sz++ break index = (index + 1) % CAPACITY

  44. Algorithm put( k, v )‏ In: an integer key and a value Out: void Get the index of the location in the hash table int index = hashCode(k) % CAPACITY loop examine location: if it’s not used, insert an entry there else try the next location Linear Probing – put(k,v)‏ 0 1 2 3 4 5 6 7 8 9 10 A 13 26 5 37 16 15 21 Map (Goodrich, 381–383)‏ int start = index while index != start myData[index] = new Entry<Integer,Integer>(k,v)‏ sz++ break index = (index + 1) % CAPACITY

  45. Linear Probing – get(k)‏ Look at the key to discover the location in the array where the entry should be If the location is empty, assume the entry is not there Else compare the key at that location with the target key, if they match then the entry is found, else compare with the key at the next location Examples: get( 13 )‏ get( 12 )‏ get( 15 )‏ 0 1 2 3 4 5 6 7 8 9 10 A Map (Goodrich, 381–383)‏ 13 26 5 37 16 15 21 What’s the Big-Oh?

  46. Linear Probing – get(k)‏ Algorithm get( k )‏ In: an integer key Out: value Get the index of the location in the hash table int i = hashCode(k) % CAPACITY int start = i loop examine location: if it’s empty return null else if ( myData[i].key == k )‏ return myData[i].getValue()‏ else try the next location i = (i + 1) % CAPACITY while i != start 0 1 2 3 4 5 6 7 8 9 10 A Map (Goodrich, 381–383)‏ 13 26 5 37 16 15 21

  47. Linear Probing – remove(k)‏ findEntry(k)‏ Look at the key to discover the location in the array where the item should be If the location is empty, assume the entry is not there Else compare key with the target, if not found then compare with key at next location When the target is found, remove the entry Examples: remove( 5 )‏ 0 1 2 3 4 5 6 7 8 9 10 A Map (Goodrich, 381–383)‏ 13 26 5 37 16 15 21

  48. Linear Probing – remove(k)‏ findEntry(k)‏ Look at the key to discover the location in the array where the item should be If the location is empty, assume the entry is not there Else compare key with the target, if not found then compare with key at next location When the target is found, remove the entry Examples: remove( 5 )‏ 0 1 2 3 4 5 6 7 8 9 10 A Map (Goodrich, 381–383)‏ 13 26 37 16 15 21

  49. Linear Probing – remove(k)‏ findEntry(k)‏ Look at the key to discover the location in the array where the item should be If the location is empty, assume the entry is not there Else compare key with the target, if not found then compare with key at next location When the target is found, remove the entry Examples: remove( 5 )‏ 0 1 2 3 4 5 6 7 8 9 10 A Map (Goodrich, 381–383)‏ Problem: What happens when we search for 16 now? 13 26 37 16 15 21

  50. Linear Probing – Using Lazy Deletes Problem: If the findEntry() operation is looking for a key, it stops looking when it gets to an empty location and assumes the key isn’t there If multiple items with the same key are stored in the hash table with linear probing and then one of them is deleted, a “hole” is created, and findEntry() might stop prematurely We need to shift all the elements with the same key so that the hole is filled! 0 1 2 3 4 5 6 7 8 9 10 A Map (Goodrich, 381–383)‏ 13 26 37 16 15 21

More Related