
Data Structures CSCI 132, Spring 2014 Lecture 34 Analyzing Hash Tables


Presentation Transcript


  1. Data Structures, CSCI 132, Spring 2014, Lecture 34: Analyzing Hash Tables

  2. Recall Hash Tables
     • Hash tables use an index function that maps many possible keys to a single location.
     • If the table is sparse, then most of the time only 1 key will go to each location.
     • If 2 records do get assigned to the same location (a collision), we use a method for reassigning the second record (collision resolution).
     [Figure: a hash table]

  3. The C++ Hash Table Specification
     const int hash_size = 997;   // a prime number of appropriate size

     class Hash_table {
     public:
        Hash_table( );
        void clear( );
        Error_code insert(const Record &new_entry);
        Error_code retrieve(const Key &target, Record &found) const;
     private:
        Record table[hash_size];
     };

  4. Implementation of insert( )
     Error_code Hash_table :: insert(const Record &new_entry)
     {
        Error_code result = success;
        int probe_count,   // Counter to be sure that table is not full.
            increment,     // Increment used for quadratic probing.
            probe;         // Position currently probed in the hash table.
        Key null;          // Null key for comparison purposes.
        null.make_blank( );
        probe = hash(new_entry);   // Find location to insert new_entry.
        probe_count = 0;
        increment = 1;

  5. insert( ) continued
        while (table[probe] != null                    // Is the location empty?
               && table[probe] != new_entry            // Duplicate key?
               && probe_count < (hash_size + 1)/2) {   // Has overflow occurred?
           probe_count++;
           probe = (probe + increment) % hash_size;
           increment += 2;   // Prepare increment for next iteration.
        }
        if (table[probe] == null) table[probe] = new_entry;   // Insert new entry.
        else if (table[probe] == new_entry) result = duplicate_error;
        else result = overflow;   // The table is full.
        return result;
     }
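Because the increment grows by 2 on every pass (1, 3, 5, ...), the loop in insert( ) visits positions h, h + 1, h + 4, h + 9, ... modulo hash_size, which is quadratic probing. The sketch below only traces that sequence. The helper name probe_sequence is an assumption made for illustration and is not part of the lecture's Hash_table class.

```cpp
#include <vector>

const int hash_size = 997;   // same prime table size as in the specification

// Hypothetical helper (not from the lecture): returns the first `count`
// positions that the insert( ) loop would examine when probing starts
// at position `start`.
std::vector<int> probe_sequence(int start, int count)
{
    std::vector<int> positions;
    int probe = start;
    int increment = 1;
    for (int i = 0; i < count; i++) {
        positions.push_back(probe);
        probe = (probe + increment) % hash_size;   // wrap around the table
        increment += 2;   // increments 1, 3, 5, ... give offsets 1, 4, 9, ...
    }
    return positions;
}
```

For example, probe_sequence(10, 4) yields 10, 11, 14, 19, and a start near the end of the table wraps around: probe_sequence(995, 3) yields 995, 996, 2.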

  6. Likelihood of collisions
     • How many people have to be in a room before the probability that two of them have the same birthday reaches 50%?
     • P = 1 - (364/365)*(363/365)*(362/365)* ... *((365-m+1)/365) > 0.5 when m >= 23
     • The calculation for the probability of a collision in a table is similar.
     • The table does not have to be very full for the probability of a collision to reach at least 50%.
     • Therefore: Collisions happen! We must handle them efficiently.
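The birthday calculation can be checked numerically. The sketch below assumes each key is assigned independently and uniformly to one of `slots` positions. The function names are chosen here for illustration and do not come from the lecture.

```cpp
// Probability that at least two of m keys land in the same position,
// assuming each key goes independently and uniformly to one of `slots` positions.
double collision_probability(int m, int slots)
{
    double all_distinct = 1.0;
    for (int i = 0; i < m; i++)
        all_distinct *= static_cast<double>(slots - i) / slots;
    return 1.0 - all_distinct;
}

// Smallest m for which the collision probability exceeds 0.5.
int smallest_m_for_half(int slots)
{
    int m = 1;
    while (collision_probability(m, slots) <= 0.5)
        m++;
    return m;
}
```

With 365 slots, smallest_m_for_half returns 23, matching the birthday result. With slots = 997 (the hash_size used in this lecture), the threshold is well under 100 keys, so a collision becomes likely while the table is still less than 10% full.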

  7. Counting Probes
     • We can analyze the running time of hash tables by counting comparisons.
     • Comparisons take place when "probing" an entry: looking at an entry and comparing its key to a target.
     • The number of probes done depends on how full the table is.
     • n = number of entries in the table
     • t = total number of positions in the table (= hash_size)
     • λ = n/t = load factor
     • λ = 0 means no entries in the table
     • λ = 0.5 means the table is 1/2 full
     • λ <= 1 for a contiguous table without chaining (open addressing)
     • λ can be greater than 1 if using chaining

  8. Number of comparisons for chaining
     • Unsuccessful searches:
        • If entries are distributed evenly over the table, then the expected number of entries in each chain is n/t = λ (the load factor).
        • For an unsuccessful search, we must do one probe for each entry in the chain, so the average number of probes (or comparisons) is λ.
     • Successful searches:
        • The average number of comparisons for a sequential search of a list with k items is (k + 1)/2.
        • The node we are looking for is in our chain, and the other n-1 nodes are distributed evenly over the table, so the average number of nodes in that chain is k = (n-1)/t + 1 ~ n/t + 1 = λ + 1.
        • The average number of comparisons is therefore (λ + 1 + 1)/2 = λ/2 + 1.
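As a quick numeric check of the chaining estimates, the sketch below evaluates both approximations from n and t. The function names are assumptions made for illustration.

```cpp
// Expected comparisons for a chained hash table with n entries and t chains,
// using the approximations above (load factor = n / t).
double chaining_unsuccessful(int n, int t)
{
    return static_cast<double>(n) / t;   // one comparison per entry in the chain
}

double chaining_successful(int n, int t)
{
    return 1.0 + 0.5 * n / t;   // (load factor)/2 + 1
}
```

A table with t = 40 and n = 20 gives 0.5 comparisons for an unsuccessful search and 1.25 for a successful one. A table with t = 4000 and n = 2000 gives exactly the same values, since only the ratio n/t enters the formulas.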

  9. Open addressing (without chaining)
     Approximate number of comparisons for evenly distributed entries, where λ = n/t is the load factor:
     Random probing:
     • Successful case: (1/λ) ln(1/(1 - λ))
     • Unsuccessful case: 1/(1 - λ)
     Linear probing:
     • Successful case: 0.5(1 + 1/(1 - λ))
     • Unsuccessful case: 0.5(1 + 1/(1 - λ)^2)
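To see how quickly these estimates grow as the table fills, the four approximations can be written as small functions of the load factor. This is a sketch: the function names are assumptions, and each formula is only meaningful for a load factor strictly between 0 and 1.

```cpp
#include <cmath>

// Approximate expected comparisons for open addressing.
// `lambda` is the load factor n/t and must satisfy 0 < lambda < 1.
double random_successful(double lambda)
{
    return (1.0 / lambda) * std::log(1.0 / (1.0 - lambda));
}

double random_unsuccessful(double lambda)
{
    return 1.0 / (1.0 - lambda);
}

double linear_successful(double lambda)
{
    return 0.5 * (1.0 + 1.0 / (1.0 - lambda));
}

double linear_unsuccessful(double lambda)
{
    double empty_fraction = 1.0 - lambda;
    return 0.5 * (1.0 + 1.0 / (empty_fraction * empty_fraction));
}
```

At a load factor of 0.5, an unsuccessful search costs about 2 comparisons with random probing and 2.5 with linear probing. At 0.9 the same estimates are about 10 and 50.5, which shows how badly linear probing degrades as the table approaches full.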

  10. Theoretical and empirical results

  11. Hash Tables vs. Other Methods
     • Speed of retrieval from a hash table does not depend on the total number of entries, but on the ratio of entries to table size (the load factor λ).
     • A table of size 40 with 20 entries has the same performance as a table of size 4000 with 2000 entries.
     • Sequential search: Θ(n)
     • Binary search: Θ(lg n)
     • Hash table retrieval: O(1) for small λ
     • Read section 9.8 on choosing a method for storage and retrieval of data.

  12. Radix sort
     • Radix sort creates a table of queues.
     • Each queue corresponds to a letter of the alphabet.
     • Sort from the least significant letter to the most significant letter.

  13. Implementation of Radix Sort
     const int key_size = 5;
     const int max_chars = 28;

     template <class Record>
     void Sortable_list<Record> :: radix_sort( )
     {
        Record data;
        Queue queues[max_chars];
        for (int position = key_size - 1; position >= 0; position--) {
           // Loop from the least to the most significant position.
           while (remove(0, data) == success) {
              int queue_number = alphabetic_order(data.key_letter(position));
              queues[queue_number].append(data);   // Queue operation.
           }
           rethread(queues);   // Reassemble the list.
        }
     }
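The slide code depends on the course's Sortable_list, Queue, and Record classes, plus alphabetic_order and rethread, none of which are shown. The following is a self-contained sketch of the same idea using std::queue and std::string in their place. The character mapping in alphabetic_order (blank to 0, letters to 1 through 26, anything else to 27) is an assumption chosen to fit max_chars = 28, and the inner loop that empties the queues stands in for rethread.

```cpp
#include <queue>
#include <string>
#include <vector>

const int max_chars = 28;   // blank, 26 letters, and one slot for anything else

// Assumed mapping (the slides do not show alphabetic_order).
int alphabetic_order(char c)
{
    if (c == ' ') return 0;
    if ('a' <= c && c <= 'z') return c - 'a' + 1;
    if ('A' <= c && c <= 'Z') return c - 'A' + 1;
    return 27;
}

// Sorts keys that all have exactly key_size characters.
void radix_sort(std::vector<std::string> &keys, int key_size)
{
    std::queue<std::string> queues[max_chars];
    // Loop from the least to the most significant position.
    for (int position = key_size - 1; position >= 0; position--) {
        // Distribute every key into the queue for its current letter.
        for (const std::string &key : keys)
            queues[alphabetic_order(key[position])].push(key);
        // Reassemble the list by emptying the queues in order.
        keys.clear();
        for (int q = 0; q < max_chars; q++) {
            while (!queues[q].empty()) {
                keys.push_back(queues[q].front());
                queues[q].pop();
            }
        }
    }
}
```

Queues preserve the order in which keys arrive, so each pass is stable. That stability is what lets a pass on a more significant letter keep the ordering established by earlier passes on less significant letters.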
