
Introduction to Hashing - Hash Functions



Presentation Transcript


  1. Introduction to Hashing - Hash Functions Sections 5.1, 5.2, and 5.6

  2. Hashing
     • Data items are stored in an array of some fixed size, called the hash table
     • Search is performed using some part of the data item, called the key
     • Used for performing insertions, deletions, and finds in constant average time
     • Operations that require ordering information, such as findMin and findMax, are not supported efficiently

  3. Hash Table Example

  4. Hash Table Applications
     • Comparing search efficiency of different data structures:
       • Vector, list: O(N)
       • AVL search tree: O(log(N))
       • Hash table: O(1) expected time
     • Compilers, to keep track of declared variables
       • Symbol tables
       • Mapping from name to id
     • On-line spelling checkers
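The name-to-id mapping mentioned for compilers can be sketched with a hash table from the C++ standard library. This is a minimal illustration only: the class name `SymbolTable` and its member functions are hypothetical, and it assumes a C++11 compiler with `std::unordered_map`.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical symbol table: gives each distinct name the next unused id.
class SymbolTable {
public:
    // Returns the id for name, inserting it with a fresh id on first sight.
    int Intern(const std::string& name) {
        // emplace leaves the table unchanged if the name is already present.
        auto result = ids_.emplace(name, static_cast<int>(ids_.size()));
        return result.first->second;
    }

    // Returns true if name has been declared; O(1) expected time.
    bool Contains(const std::string& name) const {
        return ids_.find(name) != ids_.end();
    }

private:
    std::unordered_map<std::string, int> ids_;
};
```

Both operations run in constant expected time, which is the property that makes hash tables attractive for symbol tables.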

  5. Hash Functions
     • Map keys to integers (which represent table indices)
       • Hash(Key) = Integer
     • Index values should be evenly distributed
       • Even if the input data is not evenly distributed
     • What happens if multiple keys are mapped to the same integer (same position)?
       • Collision management (discussed in detail later)
       • Collisions are likely to be reduced if keys are evenly distributed over the hash table

  6. Simple Hash Functions
     • Assumptions:
       • K: an unsigned 32-bit integer
       • M: the number of buckets (the number of entries in the hash table)
     • Goal:
       • If one bit of K changes, every bit of Hash(K) should be equally likely to change
       • So that items are evenly distributed in the hash table
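One way to probe the goal above is to flip a single bit of K and count how many bits of Hash(K) change. The sketch below does this for a plain modulo hash; the function names `ModHash` and `ChangedOutputBits` are hypothetical and chosen only for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical example hash: plain modulo by the number of buckets m.
std::uint32_t ModHash(std::uint32_t k, std::uint32_t m) {
    assert(m > 0);
    return k % m;
}

// Number of output bits that change when bit `bit` of k is flipped.
int ChangedOutputBits(std::uint32_t k, int bit, std::uint32_t m) {
    assert(bit >= 0 && bit < 32);
    const std::uint32_t flipped = k ^ (std::uint32_t{1} << bit);
    const std::bitset<32> diff(ModHash(k, m) ^ ModHash(flipped, m));
    return static_cast<int>(diff.count());
}
```

With M = 16, flipping any bit of K above bit 3 leaves Hash(K) unchanged, so a modulo hash with a power-of-two table size misses the goal: the high bits of the key never influence the index.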

  7. A Simple Function
     • What if Hash(K) = K % M, where M is any integer?
     • What is wrong?
       • Values of K may not be evenly distributed
       • But Hash(K) needs to be evenly distributed
     • Suppose M = 10 and K = 10, 20, 30, 40
       • Then K % M = 0, 0, 0, 0

  8. Another Simple Function
     • What if Hash(K) = K % P, where P is a prime number?
     • Suppose P = 11 and K = 10, 20, 30, 40
       • Then K % P = 10, 9, 8, 7
       • More uniform distribution
     • So hash tables often have a prime number of entries
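The effect of the table size on the keys 10, 20, 30, 40 can be checked by counting how many keys land in each bucket. This is a small sketch under the assumption that the index is simply key % table_size; the helper names `BucketCounts` and `CollisionCount` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many of the given keys land in each of table_size buckets
// when the index is computed as key % table_size.
std::vector<std::size_t> BucketCounts(const std::vector<unsigned int>& keys,
                                      std::size_t table_size) {
    assert(table_size > 0);
    std::vector<std::size_t> counts(table_size, 0);
    for (unsigned int key : keys) {
        ++counts[key % table_size];
    }
    return counts;
}

// Number of keys that arrive at a bucket that is already occupied.
std::size_t CollisionCount(const std::vector<std::size_t>& counts) {
    std::size_t collisions = 0;
    for (std::size_t c : counts) {
        if (c > 1) collisions += c - 1;
    }
    return collisions;
}
```

For the four keys from the slides, a table of size 10 puts every key in bucket 0 (three collisions), while a table of size 11 gives each key its own bucket.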

  9. A Simple Hash for Strings

       #include <string>
       using std::string;

       unsigned int Hash(const string& Key) {
           unsigned int hash = 0;
           // Sum the character codes of the key.
           for (string::size_type j = 0; j != Key.size(); ++j) {
               hash += Key[j];
           }
           return hash;
       }

     • Problem: short keys produce small sums, so they reach only a small fraction of a large hash table
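The size of the problem can be estimated by bounding the sum. The sketch below assumes 7-bit ASCII keys, so each character contributes at most 127; the function names `AdditiveHash` and `MaxAdditiveHash` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>

// Additive hash as on the slide: sums the character codes of the key.
unsigned int AdditiveHash(const std::string& key) {
    unsigned int hash = 0;
    for (unsigned char c : key) hash += c;
    return hash;
}

// Largest value AdditiveHash can return for ASCII keys (codes 0..127)
// of at most max_length characters.
unsigned int MaxAdditiveHash(std::size_t max_length) {
    // Guard against overflow of the product below.
    assert(max_length <= std::numeric_limits<unsigned int>::max() / 127);
    return 127u * static_cast<unsigned int>(max_length);
}
```

For keys of at most 8 characters the largest possible sum is 127 × 8 = 1016, so a table with around 10,000 entries would leave roughly nine tenths of its slots unreachable. Anagrams such as "abc" and "cba" also always collide, because addition ignores character order.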

  10. Another Simple Hash Function

        #include <string>
        using std::string;

        // Assumes Key has at least three characters.
        unsigned int Hash(const string& Key) {
            return Key[0] + 27 * Key[1] + 729 * Key[2];
        }

      • Problem: English does not use random strings, so the hash values are not uniformly distributed
      • Using more characters of the key can improve the hash function
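A second consequence of reading only three characters is that all keys sharing a three-character prefix collide. The variant below is a sketch, not the function from the slides: it treats missing characters as 0 so that keys shorter than three characters are safe to hash, and the name `ThreeCharHash` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Three-character hash with weights 1, 27, 729 (= 27^0, 27^1, 27^2).
// Characters beyond the end of a short key contribute 0.
unsigned int ThreeCharHash(const std::string& key) {
    const unsigned int weights[3] = {1, 27, 729};
    unsigned int hash = 0;
    for (std::size_t j = 0; j < 3 && j < key.size(); ++j) {
        hash += weights[j] * static_cast<unsigned char>(key[j]);
    }
    return hash;
}
```

With this function, "hashing" and "hashtable" produce the same value because both begin with "has".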

  11. A Better Hash Function

        #include <string>
        using std::string;

        // TableSize (the number of table entries) is assumed to be defined elsewhere.
        unsigned int Hash(const string& Key) {
            unsigned int hash = 0;
            for (string::size_type j = 0; j != Key.size(); ++j)
                hash = 37 * hash + (Key[j] - 'a' + 1);
            return hash % TableSize;
        }

      • The for loop computes $\sum_{i=0}^{n} a_i \cdot 37^{n-i}$ using Horner's rule, where $a_i$ has the value 1 for 'a', 2 for 'b', etc.
      • For example, $a_3 + 37a_2 + 37^2 a_1 + 37^3 a_0 = 37(37(37a_0 + a_1) + a_2) + a_3$
      • The for loop implicitly performs arithmetic modulo $2^k$, where $k$ is the number of bits in an unsigned int
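The equivalence between the Horner form and the explicit polynomial can be checked directly. The sketch below assumes lowercase keys and omits the final reduction modulo the table size; the function names `HornerHash` and `PolynomialHash` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Horner's rule form, as in the loop on the slide (without % TableSize).
unsigned int HornerHash(const std::string& key) {
    unsigned int hash = 0;
    for (std::size_t j = 0; j != key.size(); ++j) {
        hash = 37 * hash + static_cast<unsigned int>(key[j] - 'a' + 1);
    }
    return hash;
}

// Explicit polynomial form: the last character is multiplied by 37^0,
// the one before it by 37^1, and so on.
unsigned int PolynomialHash(const std::string& key) {
    unsigned int hash = 0;
    unsigned int power = 1;  // 37^0, 37^1, ... (wraps modulo 2^k)
    for (std::size_t j = key.size(); j-- > 0;) {
        hash += static_cast<unsigned int>(key[j] - 'a' + 1) * power;
        power *= 37;
    }
    return hash;
}
```

For "abcd" both forms give 1·37³ + 2·37² + 3·37 + 4 = 53506. For longer keys the intermediate values exceed the range of unsigned int, but unsigned arithmetic wraps modulo 2^k in both functions, so the results still agree.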

  12. STL Hash Tables
      • STL extensions
        • hash_set
        • hash_map
      • The key type, hash function, and equality operator may need to be provided
      • Available in the new standard as unordered_set and unordered_map
        • <tr1/unordered_map> or <unordered_map>
        • <tr1/unordered_set> or <unordered_set>
      • Example: Lec24/hashmapex.cpp
      • Reference
        • www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1456.html
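Supplying the hash function and equality operator for a user-defined key might look like the sketch below. It is not the course example in Lec24/hashmapex.cpp; the types `Point`, `PointHash`, `PointEqual`, and the function `MakeExample` are hypothetical, and it assumes the C++11 header `<unordered_map>`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical key type with no built-in hash or equality.
struct Point {
    int x;
    int y;
};

// Hash functor: combines the two coordinates with the multiplier 37.
struct PointHash {
    std::size_t operator()(const Point& p) const {
        const std::size_t h = static_cast<std::size_t>(p.x);
        return 37 * h + static_cast<std::size_t>(p.y);
    }
};

// Equality functor: two points are equal if both coordinates match.
struct PointEqual {
    bool operator()(const Point& a, const Point& b) const {
        return a.x == b.x && a.y == b.y;
    }
};

using PointNames =
    std::unordered_map<Point, std::string, PointHash, PointEqual>;

// Builds a small table mapping points to names.
PointNames MakeExample() {
    PointNames names;
    names[Point{0, 0}] = "origin";
    names[Point{1, 0}] = "unit x";
    return names;
}
```

The third and fourth template arguments of `std::unordered_map` are the hash and equality types. For built-in key types such as `int` or `std::string` they can be omitted, because the library provides defaults.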
