1 / 19

Hashing II – Other Probing Methods

Hashing II – Other Probing Methods We have seen that one of the drawbacks to linear probing is clustering. We can attempt to avoid clustering by using different probing techniques. In the following discussion, we assume that the table has size s.

Download Presentation

Hashing II – Other Probing Methods

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Hashing II – Other Probing Methods We have seen that one of the drawbacks to linear probing is clustering. We can attempt to avoid clustering by using different probing techniques. In the following discussion, we assume that the table has size s. Probing every Rth location: if a collision occurs at location h, we could probe every Rth location: h, h + R, h + 2R, h + 3R, …, h + (s – 1)*R (all mod s)

  2. Choosing a table size that is prime In order to guarantee that we end up probing every location in the table, the table size must not be divisible by R. Think about what would happen if s was 100, h was 6 and R was 2 – if a collision occurred, we would probe locations 6, 8, 10, …, 100, 2, 4, 6, … we never probe locations with an odd index. We can guarantee that this will not occur if s is a prime number.

  3. Quadratic Probing: if a collision occurs at location h, we probe locations in the following order: h, h + 12, h + 22, h + 32, …, h + ( s – 1 ) 2 (all mod s ) until we find an empty slot. How can we avoid the possibility of probing the same set of locations when using quadratic probing? Theorem: If quadratic probing is used and the size of the table is a prime number, then a new element can always be inserted provided the table is at least half empty. Under these conditions, during the course of insertion, no location is probed more than once.

  4. Random Probing: we first generate a sequence of random numbers (which are saved so that they can be used when retrieving values from the table – later we will need to know in what order to probe the table in order to find the item we are looking for): r1, r2, …, rs-1 then, if a collision occurs at location h, we probe the locations: h + r1, h + r2, …, h + rs-1 (all mod s) until we find an empty slot.

  5. Rehashing: we define a number of hash functions: f1, f2, …, fm If a collision occurs with key k, rather than probing to find an empty slot, we apply different hash functions until we find an empty slot: f1(k), f2(k), …, fm(k) In all of the above methods, care must be taken to ensure that our collision resolution strategy attempts to probe all available locations in the array.

  6. All of the strategies we have examined so far fall into the category of open address hashing – if the hash index that we first compute leads to a collision, we attempt to insert the value at another index (address) in the array.

  7. Chaining or Closed Address Hashing When a collision occurs, rather than probing for an empty location somewhere else in the array, we can use a chained hash table. Such a table is a table of linked lists. When we want to insert a key/value pair into the table, we use a hash function just as before in order to determine a hash index. We then insert the key/value pair into the linked list pointed to by the given hash index. Each list is referred to as a chain or bucket. Suppose we have 5 digit key values and a table of size 5 and that we create a hash function that takes the middle 3 digits of the key mod 5 in order to generate a hash code. Now suppose that we insert values with the corresponding keys into the hash table: 12540, 51288, 90100, 41233, 54991, 45329, 14236

  8. [0] [1] [2] [3] [4] Suppose we have 5 digit key values and a table of size 5 and that we create a hash function that takes the middle 3 digits of the key mod 5 in order to generate a hash code. Now suppose that we insert values with the corresponding keys into the hash table: 12540, 51288, 90100, 41233, 54991, 45329, 14236. Our hash table will look as follows:

  9. Now when we search for an item, we apply the hash function to its key in order to get a hash index and then we perform a linear search of the linked list. If the item is not in the list, then it’s not in the table. If the table size is sufficient and we have a perfect hash function, each chain will have length 1 giving O(1) access. In practice, collisions will occur leading to longer chains.

  10. Advantages of chaining: • most efficient hashing technique in general- each cluster is a separate list so secondary clustering cannot occur- size of table does not have to be twice the number of items- deletion is simple – no need to mark slots as empty, used or dirty- collision handling is simpleDisadvantages of chaining: • if the key/value pair is very small, the overhead of pointers in the linked list can be quite large

  11. A thought on chaining: • is there another organization other than a linked-list that we can use for each bucket?

  12. Performance of Hashing • The performance of hashing depends on: • the quality of the hash function- the collision resolution algorithm- the available space in the hash tableTo estimate how full a table is, we can calculate the load factor,  where:  = ( # entries in table ) / ( table size ) • hence for open address hashing 0 <=  < 1 but for closed address hashing 0 <=  < .

  13. D. Knuth has estimated the expected number of probes for a successful search as a function of the load factor, :

  14. # of probes linear probing 6 random probing chaining 1 load factor 1 Graphically:

  15. Implementation Details We will briefly examine some of the implementation details of theMapHashclass. This class uses open address hashing with linear probing for collision resolution. The table in this case is an array of structs declared as: enum Entry_flag { EMPTY, FULL, DIRTY };template <class Key, class Value, class Hasher>struct MapLinearHashEntry{ Entry_flag flag; Key key; Value value;}; Note that thestructis templatized and that it has three template parameters. The first two represent the data type of the key and value.

  16. The last template parameter represents a class that simply defines the hash function – hence the name hasher. In order to make our MapHash class as re-usable as possible, we do not want to hard-wire a particular hash function into the code – we’d like the client to be able to specify the hash function. We have already seen how one function can be passed to another as a parameter – recall our traverse functions for traversing a linked list or binary tree: void traverse( node* head, void (*visit)( node* ) ); Here we are seeing a different mechanism for passing a function – it will be passed as a public member function of a class. The class will have no data members associated with it at all – it’s only purpose is to define the hash function.

  17. Such an object is called a function object or functor. In the case of thehasherclass, the class will declare a single public member function namedhash. Here’s an example: class MyHash{ public: int hash( const string& someKey ) const // Post: returns a hash value created from someKey { int len = someKey.size(); int value = 0; while( len > 0 ) { len--; value = value + int( someKey[ len ] ); } return value; }};

  18. We can now declare an instance of theMapHashclass: MapHash< string, Employee, MyHash > myTable; The MapHashclass will have a private data member of typeHasher: Hasher theHasher; Any of the member functions of theMapHashclass can now generate a hash index from a key value as follows: hashCode = theHasher.hash( someKey ); We now have a very convenient way of passing functions to a class via a template parameter. Note that the same mechanism can also be used to pass functions to other functions…

  19. Suppose we want to traverse a linked list and print out the data in every node: class Visitor{public: void visit( Node* current ) // Post: data in current node printed on screen { cout << current->data << endl; }}; Our traverse function now looks like:void traverse( Node* head, Visitor functor ){ for( Node* cursor = head; cursor != NULL; cursor = cursor->next ) functor.visit( cursor );}

More Related