Data Structures CSCI 132, Spring 2014 Lecture 33 Hash Tables


Tables with complicated index functions
• Index functions are not always simple functions that compute an integer value from integer inputs.
• Often, the key used for table lookup is not a number but an object or a string.
• Example: keys that consist of 8-character words.
• Problem: there are 26^8 = 208,827,064,576 ≈ 2 x 10^11 possible arrangements of 8 letters. There is not enough memory for a table with one position for each possible word. Furthermore, only a few of the table positions would be filled; it would be a sparse table.

Hash Tables
• Hash tables use an index function (the hash function) that maps many possible keys to the same location.
• If the table is sparse, then most of the time only one key will map to each location.
• If two records are assigned to the same location (a collision), we use a method for finding another location for the second record (collision resolution).
[Figure: A hash table]

The Hash Table Algorithm
Insertion:
1) Calculate the hash function of the key of the record to be inserted.
2) If the location is empty, insert the record there.
3) If the location already contains a record with the same key, do not insert.
4) If the location contains a record with a different key, use the collision resolution method to find a new location for insertion.
Retrieval:
1) Calculate the hash function of the key.
2) If the record is at that location, retrieve it.
3) Otherwise, follow the collision resolution method to find the record.
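The insertion and retrieval steps above can be sketched in C++ as follows. This is a minimal illustration, not the lecture's implementation: it assumes non-negative integer keys, a hypothetical table of 11 positions, std::optional to mark empty slots, and linear probing (try the next slot) as the collision resolution method.

```cpp
#include <array>
#include <optional>

// Hypothetical small table for illustration; keys are non-negative ints.
constexpr int kTableSize = 11;
std::array<std::optional<int>, kTableSize> table;  // empty slot == std::nullopt

int hash_key(int key) { return key % kTableSize; }

enum class Result { inserted, duplicate, full };

// Insertion steps 1-4, using linear probing (an assumed choice) for collisions.
Result insert(int key) {
    int probe = hash_key(key);                      // 1) hash the key
    for (int count = 0; count < kTableSize; ++count) {
        if (!table[probe]) {                        // 2) empty: insert here
            table[probe] = key;
            return Result::inserted;
        }
        if (*table[probe] == key)                   // 3) same key: do not insert
            return Result::duplicate;
        probe = (probe + 1) % kTableSize;           // 4) different key: try next slot
    }
    return Result::full;                            // every slot was examined
}

// Retrieval follows the same probe sequence as insertion.
bool retrieve(int key) {
    int probe = hash_key(key);                      // 1) hash the key
    for (int count = 0; count < kTableSize; ++count) {
        if (!table[probe]) return false;            // an empty slot ends the search
        if (*table[probe] == key) return true;      // 2) found at this location
        probe = (probe + 1) % kTableSize;           // 3) follow the probe sequence
    }
    return false;
}
```

For example, keys 5 and 16 both hash to position 5; the second one lands in position 6, and a later retrieve(16) finds it by following the same sequence.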

Creating Hash Functions
Hash functions should:
1) Be easy and quick to compute.
2) Achieve an even distribution of keys across the table.
Methods:
• Truncation
• Folding
• Modular arithmetic
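The three methods can be illustrated on an integer key such as 21296876. The sketch below is one possible version of each, under stated assumptions: truncation keeps only the last three decimal digits, folding adds three-digit groups, and modular arithmetic divides by the prime 997 used later in this lecture. The function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical table size; 997 is the prime used in the Hash_table specification.
constexpr std::uint32_t kHashSize = 997;

// Truncation: ignore part of the key and use the rest directly.
// Assumed choice: keep only the last three decimal digits.
std::uint32_t truncation_hash(std::uint32_t key) {
    return key % 1000;
}

// Folding: split the key into three-digit groups and add the groups together.
std::uint32_t folding_hash(std::uint32_t key) {
    std::uint32_t sum = 0;
    while (key > 0) {
        sum += key % 1000;  // take the low-order group
        key /= 1000;        // drop it and continue with the rest
    }
    return sum;
}

// Modular arithmetic: the remainder after dividing by a prime table size.
std::uint32_t modular_hash(std::uint32_t key) {
    return key % kHashSize;
}
```

For 21296876 these give 876, 876 + 296 + 21 = 1193, and 21296876 % 997 = 956. Truncation and folding can produce values outside the table, so in practice their result is often reduced with % as well.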

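A Hash Function Example
class Key: public String {
public:
    char key_letter(int position) const;  // Character of the key at the given position.
    void make_blank( );                   // Make this a blank key.
    // Add constructors and other methods.
};

// Combines the first 8 characters of the key. hash_size is the table size,
// declared with the Hash_table specification and needed before this function.
int hash(const Key &target)
{
    int value = 0;
    for (int position = 0; position < 8; position++)
        value = 4 * value + target.key_letter(position);
    return value % hash_size;
}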
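The same hash function can be tried without the course's String and Key classes. In this sketch std::string replaces Key, and keys shorter than 8 characters are assumed to be padded with blanks (an assumption about key_letter, not something the slide states). Characters are read as unsigned char so the running value never goes negative. The name hash_key is hypothetical.

```cpp
#include <string>

constexpr int hash_size = 997;  // same prime as the Hash_table specification

// Mirrors the slide's hash(): combine the first 8 characters, multiplying the
// running value by 4 each time, then reduce modulo hash_size.
// Assumption: positions past the end of the string count as blanks.
int hash_key(const std::string &key) {
    int value = 0;
    for (int position = 0; position < 8; position++) {
        char letter = position < static_cast<int>(key.size()) ? key[position] : ' ';
        value = 4 * value + static_cast<unsigned char>(letter);
    }
    return value % hash_size;  // value is at most 255 * 21845, so int cannot overflow
}
```

Only the first 8 characters matter, so "abcdefghX" and "abcdefghY" collide; an empty key hashes to 32 * 21845 % 997 = 143.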

Collision Resolution
Methods:
• Linear probing
• Quadratic probing
• Key-dependent increments
• Random probing
• Chaining
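The four open-addressing methods in the list differ only in which positions they examine after the home position h. The sketch below lists those positions for each method. The formulas for key-dependent increments (step = 1 + key % (size - 1)) and random probing (increments from std::mt19937 seeded with the key) are assumed choices for illustration; the function names are hypothetical.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Each function lists the first `count` positions examined for a key whose
// home (hashed) position is `home`, in a table with `size` positions.
// Assumes non-negative keys and size >= 2.

std::vector<int> linear_probes(int home, int size, int count) {
    std::vector<int> seq;
    for (int i = 0; i < count; ++i)
        seq.push_back((home + i) % size);                         // h, h+1, h+2, ...
    return seq;
}

std::vector<int> quadratic_probes(int home, int size, int count) {
    std::vector<int> seq;
    for (long long i = 0; i < count; ++i)
        seq.push_back(static_cast<int>((home + i * i) % size));   // h, h+1, h+4, h+9, ...
    return seq;
}

// Key-dependent increments: the step is computed from the key, so keys that
// share a home position usually follow different paths. Assumed formula;
// it never yields a step of 0.
std::vector<int> key_dependent_probes(int home, int key, int size, int count) {
    const int step = 1 + key % (size - 1);
    std::vector<int> seq;
    for (long long i = 0; i < count; ++i)
        seq.push_back(static_cast<int>((home + i * step) % size));
    return seq;
}

// Random probing: a pseudorandom generator seeded with the key supplies the
// increments, so a given key reproduces the same sequence every time.
std::vector<int> random_probes(int home, int key, int size, int count) {
    std::mt19937 gen(static_cast<std::uint32_t>(key));
    std::uniform_int_distribution<int> step(1, size - 1);
    std::vector<int> seq;
    int probe = home;
    for (int i = 0; i < count; ++i) {
        seq.push_back(probe);
        probe = (probe + step(gen)) % size;
    }
    return seq;
}
```

With home position 5 in a table of 11, linear probing examines 5, 6, 7, 8, ... while quadratic probing examines 5, 6, 9, 3, ..., spreading colliding keys apart instead of clustering them. Random probing in this form may revisit a position.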

Chaining
Chaining uses a table of linked lists. Collisions are resolved by inserting the new element into the list at the shared location.
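A chained table can be sketched with a std::vector of std::list objects, one list per table position. This is a minimal illustration assuming non-negative integer keys hashed with key % size; the class name ChainedTable is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <vector>

// Each table position holds a linked list of the keys that hash there.
class ChainedTable {
public:
    explicit ChainedTable(std::size_t size) : slots_(size) {}

    // Returns false if the key is already present.
    bool insert(int key) {
        auto &chain = slots_[index(key)];
        if (std::find(chain.begin(), chain.end(), key) != chain.end()) return false;
        chain.push_front(key);  // colliding keys simply share the list
        return true;
    }

    bool contains(int key) const {
        const auto &chain = slots_[index(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

    // Deletion only unlinks one node; no other entries move.
    bool erase(int key) {
        auto &chain = slots_[index(key)];
        auto it = std::find(chain.begin(), chain.end(), key);
        if (it == chain.end()) return false;
        chain.erase(it);
        return true;
    }

    // Length of the list at the position that `key` hashes to.
    std::size_t chain_length(int key) const { return slots_[index(key)].size(); }

private:
    std::size_t index(int key) const {  // assumes key >= 0
        return static_cast<std::size_t>(key) % slots_.size();
    }
    std::vector<std::list<int>> slots_;
};
```

In a table of 7 positions, keys 3, 10 and 17 all hash to position 3 and end up in the same list; removing 10 leaves a list of length 2.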

Advantages and disadvantages of chaining
• Advantages:
  • The table is an array of addresses (pointers) rather than records. If the records are large, this saves considerable space.
  • Collision handling is simple: insert colliding records into a list.
  • More records can be stored than the table has positions.
  • Deletion of records is easy.
• Disadvantages:
  • If the table holds many records relative to its size, some locations may have long lists. This slows retrieval because the list must be searched for the record.
  • Pointers take up memory space. This may be wasteful if the records are small.
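How long the lists get can be estimated with the load factor, the number of records n divided by the number of positions. Assuming the hash function spreads keys uniformly, the standard estimates for chaining are about 1 + λ/2 comparisons for a successful search and about λ for an unsuccessful one:

```latex
\lambda = \frac{n}{\text{hash\_size}}, \qquad
S(\lambda) \approx 1 + \frac{\lambda}{2}, \qquad
U(\lambda) \approx \lambda
```

For example, storing n = 2991 records in a table with hash_size = 997 gives λ = 3, so a successful search needs about 2.5 comparisons and an unsuccessful one about 3. Retrieval cost grows linearly with λ, which is why a heavily loaded chained table slows down.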

The C++ Hash Table Specification
const int hash_size = 997;  // A prime number of appropriate size.

class Hash_table {
public:
    Hash_table( );
    void clear( );
    Error_code insert(const Record &new_entry);
    Error_code retrieve(const Key &target, Record &found) const;
private:
    Record table[hash_size];
};
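One reason hash_size is prime concerns quadratic probing as used in insert( ): the increments 1, 3, 5, ... add up to k^2 after k steps, so the k-th probe is (h + k^2) % hash_size. When hash_size is prime, these probes reach exactly (hash_size + 1)/2 distinct positions, which is where the bound on probe_count comes from. The sketch below counts the distinct positions reached; the function name is hypothetical.

```cpp
#include <set>

// Follows the probe loop of insert( ): increment starts at 1 and grows by 2,
// so after k steps the probe is (home + k*k) % hash_size.
// Returns how many distinct positions are reached in hash_size steps.
int distinct_quadratic_positions(int hash_size, int home) {
    std::set<int> seen;
    int probe = home % hash_size, increment = 1;
    for (int step = 0; step < hash_size; ++step) {
        seen.insert(probe);
        probe = (probe + increment) % hash_size;
        increment += 2;
    }
    return static_cast<int>(seen.size());
}
```

For hash_size = 997 this gives 499 = (997 + 1)/2 from any home position, while a composite size such as 1000 reaches far fewer positions, so more insertions fail even when the table has room.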

Implementation of insert( )
Error_code Hash_table :: insert(const Record &new_entry)
{
    Error_code result = success;
    int probe_count,  // Counter to be sure that table is not full.
        increment,    // Increment used for quadratic probing.
        probe;        // Position currently probed in the hash table.
    Key null;         // Null key for comparison purposes.
    null.make_blank( );
    probe = hash(new_entry);  // Find location to insert new_entry.
    probe_count = 0;
    increment = 1;


insert( ) continued
    while (table[probe] != null                      // Is the location empty?
           && table[probe] != new_entry              // Duplicate key?
           && probe_count < (hash_size + 1) / 2) {   // Has overflow occurred?
        probe_count++;
        probe = (probe + increment) % hash_size;
        increment += 2;  // Prepare increment for next iteration.
    }
    if (table[probe] == null) table[probe] = new_entry;  // Insert new entry.
    else if (table[probe] == new_entry) result = duplicate_error;
    else result = overflow;  // No empty position reachable: treat the table as full.
    return result;
}
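The specification also declares retrieve( ), which follows the same probe sequence as insert( ). Below is a self-contained sketch of both operations under several assumptions: records are reduced to std::string keys, an empty std::optional stands in for the blank null key, the Error_code value not_present is assumed, and the hash follows the lecture's scheme with blank padding (also assumed). Factoring the shared loop into a helper, find_slot, is a design choice of this sketch, not the lecture's code.

```cpp
#include <array>
#include <optional>
#include <string>

// success, duplicate_error and overflow appear in the lecture; not_present is assumed.
enum Error_code { success, not_present, overflow, duplicate_error };

constexpr int hash_size = 997;

class Hash_table {
public:
    Error_code insert(const std::string &new_entry) {
        int probe = find_slot(new_entry);
        if (!table[probe]) { table[probe] = new_entry; return success; }
        if (*table[probe] == new_entry) return duplicate_error;
        return overflow;  // no empty position reached
    }

    Error_code retrieve(const std::string &target, std::string &found) const {
        int probe = find_slot(target);
        if (table[probe] && *table[probe] == target) {
            found = *table[probe];
            return success;
        }
        return not_present;  // reached an empty slot or ran out of probes
    }

private:
    // The probe loop of insert( ): stop at an empty slot, at the key itself,
    // or after (hash_size + 1) / 2 probes.
    int find_slot(const std::string &key) const {
        int probe = hash(key), probe_count = 0, increment = 1;
        while (table[probe] && *table[probe] != key
               && probe_count < (hash_size + 1) / 2) {
            probe_count++;
            probe = (probe + increment) % hash_size;
            increment += 2;
        }
        return probe;
    }

    // The lecture's hash scheme; blank padding of short keys is assumed.
    static int hash(const std::string &key) {
        int value = 0;
        for (int position = 0; position < 8; position++) {
            char letter = position < static_cast<int>(key.size()) ? key[position] : ' ';
            value = 4 * value + static_cast<unsigned char>(letter);
        }
        return value % hash_size;
    }

    std::array<std::optional<std::string>, hash_size> table{};  // all slots start empty
};
```

Because retrieve( ) stops at the first empty slot, this scheme does not support simply clearing a slot to delete a record; that would cut the probe sequences of keys inserted after it.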
