
WEEK 1 Hashing Part I

WEEK 1 Hashing Part I. CE222 – Data Structures & Algorithms II Chapter 5.1-5.3 (based on the book by M. A. Weiss, Data Structures and Algorithm Analysis in C++, 3rd edition, 2006). GOAL. Develop a structure that will allow users to insert / delete / find records in


Presentation Transcript


  1. WEEK 1 Hashing Part I CE222 – Data Structures & Algorithms II Chapter 5.1-5.3 (based on the book by M. A. Weiss, Data Structures and Algorithm Analysis in C++, 3rd edition, 2006)

  2. GOAL
     • Develop a structure that will allow users to insert / delete / find records in constant average time (i.e., O(1)).
     • The structure will be a table (relatively small).
     • The table is completely contained in memory.
     • Implemented by an array.
     • Capitalizes on the ability to access any element of the array in constant time.
     CE 222-Data Structures & Algorithms II, Izmir University of Economics

  3. General Idea
     • A stored item needs to have a data member, called the key, that will be used in computing the index value for the item.
     • The key could be an integer, a string, etc., e.g. a name or ID that is part of a large employee structure.
     • If the size of the array is N, the items that are stored in the hash table are indexed by values from 0 to N – 1.
     • Each key is mapped into some number in the range 0 to N – 1.
     • The mapping is called a hash function.

  4. Example Hash Table
     [Figure: a hash function maps the key of each item (linda 25000, joe 31250, dave 27500, mary 28200) to one cell of a ten-cell table indexed 0 to 9.]

  5. Hash Function
     • Determines the position of keys in the array (maps items to cells in the array).
     • The hash function:
       • must be simple to compute.
       • must distribute the keys evenly among the cells.
     • If all the keys were known in advance, it would be possible to write a perfect hash function; in general they are not known, so this is not possible.

  6. An example 1/2
     • Assume that keys are non-negative integers between 0 and MAX_INT and the table size N is 5.
     • x: the key
     • hash(x): the hashing function
     • hash(x) = x mod N, so here hash(x) = x % 5

  7. An example 2/2
     hash(x) = x mod N, so hash(x) = x % 5
     Assume that the keys are 23, 14, 25, 46, 82, in that order. Steps:
     1. hash(23) = 23 % 5 = 3
     2. hash(14) = 14 % 5 = 4
     3. hash(25) = 25 % 5 = 0
     4. hash(46) = 46 % 5 = 1
     5. hash(82) = 82 % 5 = 2
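The steps above can be sketched in a few lines of C++. The name `hashInt` is a hypothetical helper chosen for this illustration (it is not from the slides), and the sketch assumes non-negative keys, as the example does.

```cpp
#include <cassert>

// Hypothetical helper: maps a non-negative integer key to a cell index
// in the range 0 .. tableSize - 1 using the remainder operator.
int hashInt(int key, int tableSize) {
    assert(key >= 0 && tableSize > 0);  // the slides assume non-negative keys
    return key % tableSize;
}
```

With a table size of 5, `hashInt(23, 5)` gives 3 and `hashInt(82, 5)` gives 2, matching steps 1 and 5.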

  8. Hash Functions
     Problems:
     • Keys may not be numeric.
     • The number of possible keys is much larger than the space available in the table.
     • Different keys may map into the same location. (What happens if the keys are 25, 30, and 40 in the previous example? All three hash to cell 0.)
     • The hash function is not one-to-one => collision.
     • If there are too many collisions, the performance of the hash table will suffer dramatically.

  9. Hash Functions
     • If the input keys are integers, then simply using key mod TableSize is a reasonable general strategy, unless the keys happen to have some undesirable property.
     • Make the table size a prime! (Assume that the table size is 10 and all keys are of the form 10*i: every key then hashes to cell 0.)
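A small experiment can make the point about prime table sizes concrete. The function below is a hypothetical helper written for this illustration: it counts how many distinct cells the keys 0, 10, 20, ... occupy under key mod TableSize.

```cpp
#include <cassert>
#include <set>

// Hypothetical experiment: how many distinct cells do the keys
// 0, 10, 20, ... (numKeys of them) occupy in a table of the given size?
int cellsUsedByMultiplesOfTen(int numKeys, int tableSize) {
    assert(numKeys >= 0 && tableSize > 0);
    std::set<int> cells;
    for (int i = 0; i < numKeys; ++i)
        cells.insert((10 * i) % tableSize);  // same rule as key mod TableSize
    return static_cast<int>(cells.size());
}
```

For 100 such keys, a table of size 10 uses a single cell, while a table of the prime size 11 uses all 11 cells.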

  10. Hash Functions
      • If the input keys are strings, then the hash function needs to convert the keys into numeric values.
      • How to convert a string to a numeric value?
      • Use the ASCII codes of the characters (values from 0 to 127).

  11. Hash Function for Strings 1
      • Add up the ASCII values of all characters of the key.
      Example: tableSize = N and key = "john"
      hashVal = 106 + 111 + 104 + 110 = 431
      index = 431 % N
      int hash(const string &key, int tableSize)
      {
          int hashVal = 0;
          for (int i = 0; i < key.length(); i++)
              hashVal += key[i];
          return hashVal % tableSize;
      }

  12. Hash Function for Strings 1
      int hash(const string &key, int tableSize)
      {
          int hashVal = 0;
          for (int i = 0; i < key.length(); i++)
              hashVal += key[i];
          return hashVal % tableSize;
      }
      • Easy to implement.
      • However, if the table size is large, the function does not distribute the keys well.
      • e.g. with table size = 10000 and key length <= 8, the hash function can only assume values between 0 and 8 * 127 = 1016.
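The additive hash and its limited range can be checked with a self-contained sketch. The name `hashSum` is hypothetical (chosen to avoid clashing with `std::hash`), and the sketch assumes plain ASCII keys.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-alone version of the additive string hash:
// sums the character codes and reduces the sum modulo the table size.
int hashSum(const std::string &key, int tableSize) {
    assert(tableSize > 0);
    int hashVal = 0;
    for (unsigned char ch : key)  // unsigned char keeps every code non-negative
        hashVal += ch;
    return hashVal % tableSize;
}
```

With a table of size 10000, `hashSum("john", 10000)` is 431, and an 8-character key made entirely of the largest ASCII code (127) reaches only 1016, so cells above 1016 are never used by keys of that length.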

  13. Hash Function for Strings 2
      • Examine only the first 3 characters of the key.
      • In English we have 26 different letters.
      int hash(const string &key, int tableSize)
      {
          return (key[0] + 27 * key[1] + 729 * key[2]) % tableSize;  // 729 = 27 * 27
      }
      • In theory, 26 * 26 * 26 = 17576 different combinations (ignoring blanks) can be generated. However, English is not random; only 2851 different combinations actually occur.
      • Thus this function, although easily computable, is also not appropriate if the hash table is reasonably large.
      • e.g. with TableSize = 10007, even without any collisions only about 28.5% (2851/10007) of the table can be hashed to.

  14. Hash Function for Strings 3
      int hash(const string &key, int tableSize)
      {
          int hashVal = 0;
          for (int i = 0; i < key.length(); i++)
              hashVal = 37 * hashVal + key[i];
          hashVal %= tableSize;
          if (hashVal < 0)   /* in case overflow occurs */
              hashVal += tableSize;
          return hashVal;
      }

  15. Hash Function for Strings 3
      Key "ali": KeySize = 3; TableSize = 10007
        i        0    1    2
        key[i]   a    l    i
        ASCII    97   108  105
      // hashVal = 0;
      // for (int i = 0; i < key.length(); i++)
      //     hashVal = 37 * hashVal + key[i];
      hashVal = 0;
      hashVal = 37 * 0 + key[0];               // 0 + 97
      hashVal = 37 * 97 + key[1];              // 37*97 + 108
      hashVal = 37 * (37*97 + 108) + key[2];   // 37*37*97 + 37*108 + 105
      hash("ali") = (105 * 1 + 108 * 37 + 97 * 37^2) % 10007 = 6803
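The polynomial hash can also be written with unsigned arithmetic, so that any overflow wraps around in a well-defined way and the negative-value check becomes unnecessary. This is a sketch of that variant, not the slides' version; `hashHorner` is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Hypothetical variant of the polynomial string hash (Horner's rule, base 37).
// Unsigned arithmetic wraps on overflow, so no negative result can appear.
unsigned hashHorner(const std::string &key, unsigned tableSize) {
    assert(tableSize > 0);
    unsigned hashVal = 0;
    for (unsigned char ch : key)
        hashVal = 37 * hashVal + ch;  // multiply the running value by 37, add the next code
    return hashVal % tableSize;
}
```

For the key "ali", whose ASCII codes are 97, 108 and 105, `hashHorner("ali", 10007)` evaluates (97 * 37 * 37 + 108 * 37 + 105) % 10007, which is 6803.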

  16. Hash Function: Collision
      • Let hash(x) = x % 15.
      • Then:
          x         25   129   35   2501   47   36
          hash(x)   10     9    5     11    2    6
      • Storing the keys in the array is straightforward.
      • Thus delete and find can be done in O(1), and also insert, except…

  17. Hash Function: Collision
      • What happens when you try to insert x = 65?
      • hash(65) = 65 % 15 = 5, but cell 5 is already occupied by 35.
      • If, when an element is inserted, it hashes to the same value as an already inserted element, this is called a collision.

  18. Handling Collisions
      • Separate Chaining
      • Open Addressing
        • Linear Probing
        • Quadratic Probing
        • Double Hashing

  19. Separate Chaining
      • The idea is to keep a list of all elements that hash to the same value.
      • The array elements are pointers to the first nodes of the lists.
      • A new item is inserted at the front of the list.
      • Advantages:
        • Better space utilization for large items.
        • Simple collision handling: searching a linked list.
        • Overflow: we can store more items than the hash table size.
        • Deletion is quick and easy: deletion from the linked list.

  20. Separate Chaining Example
      Keys: 0, 1, 4, 9, 16, 25, 36, 49, 64, 81
      hash(key) = key % 10.
        Cell   Chain
        0      0
        1      81 -> 1
        2      (empty)
        3      (empty)
        4      64 -> 4
        5      25
        6      36 -> 16
        7      (empty)
        8      (empty)
        9      49 -> 9
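The example table can be reproduced with standard containers. This is only a sketch: `buildChains` is a hypothetical helper, `std::forward_list` stands in for the hand-written linked list used later in the slides, and the keys are assumed to be distinct and non-negative.

```cpp
#include <cassert>
#include <forward_list>
#include <vector>

// Hypothetical helper: builds a separate-chaining table with
// hash(key) = key % tableSize, inserting each new key at the front of its chain.
std::vector<std::forward_list<int>> buildChains(const std::vector<int> &keys,
                                                int tableSize) {
    assert(tableSize > 0);
    std::vector<std::forward_list<int>> table(tableSize);
    for (int key : keys) {
        assert(key >= 0);                        // non-negative keys, as in the example
        table[key % tableSize].push_front(key);  // new item goes to the front
    }
    return table;
}
```

Called with the keys 0, 1, 4, 9, 16, 25, 36, 49, 64, 81 and a table size of 10, cell 6 holds the chain 36, 16 (the later key first) and cells 2, 3, 7 and 8 stay empty.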

  21. Separate Chaining: Operations
      • Initialization: all entries are set to NULL.
      • Find:
        • Locate the cell using the hash function.
        • Do a sequential search on the linked list in that cell.
      • Insertion:
        • Locate the cell using the hash function.
        • (If the item does not exist) insert it as the first item in the list.
      • Deletion:
        • Locate the cell using the hash function.
        • Delete the item from the linked list.

  22. Separate Chaining: Disadvantages
      • Parts of the array might never be used.
      • As chains get longer, search time increases to O(N) in the worst case.
      • Constructing new chain nodes is relatively expensive (still constant time, but the constant is high).
      • Is there a way to use the "unused" space in the array instead of using chains to make more space? (Later!)

  23. Analysis of Separate Chaining
      • Collisions are very likely.
      • How likely, and what is the average length of the lists?
      • Definition of the load factor λ:
        • The ratio of the number of elements (N) in a hash table to the hash TableSize,
        • i.e. λ = N/TableSize.
      • The average length of a list is also λ.
      • For chaining, λ is not bounded by 1; it can be > 1.

  24. Cost of searching 1/4
      Search time (or cost) = time to evaluate the hash function + time to traverse the list
      • A search can be either unsuccessful or successful.

  25. Cost of searching 2/4
      Unsuccessful search:
      • We have to traverse the entire list, so we need to compare λ nodes on average.

  26. Cost of searching 3/4
      Successful search:
      • Time to traverse the list = the node searched for + half the expected number of other nodes in the list.
      • N = number of elements; M = number of lists (TableSize).
      • Expected number of other nodes = (N – 1)/M = λ – 1/M (which is essentially λ, since M is presumed large).
      • On average, we need to check half of the other nodes while searching for a certain element.
      • Thus the average search cost = 1 + λ/2.
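Writing λ = N/M for the load factor, the successful-search estimate above can be spelled out in one line; the approximation in the last step relies on the slide's assumption that M is large.

```latex
\[
  \text{average cost}
  \;=\; 1 + \frac{1}{2}\cdot\frac{N-1}{M}
  \;=\; 1 + \frac{\lambda}{2} - \frac{1}{2M}
  \;\approx\; 1 + \frac{\lambda}{2}.
\]
```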

  27. Cost of searching 4/4
      Observation: the table size is not important, but the load factor is.
      For separate chaining, make λ ≈ 1.

  28. How to implement Hashing? EXAMPLE

  29. Implementation: Example p1/3
      class Node
      {
      public:                    // for simplicity, all members are public
          int key;
          Node *next;
          Node(int a) { key = a; next = NULL; }
      };
      class List
      {
      public:
          Node *head;
          List() { head = NULL; }
          bool searchList(int x)
          {
              Node *p = head;    // declared outside the loop so it can be tested afterwards
              while (p != NULL && p->key != x)
                  p = p->next;
              return p != NULL;
          }
          void insertList(int x) // insert at the front of the list
          {
              Node *p = new Node(x);
              p->next = head;
              head = p;
          }
      };

  30. Implementation: Example p2/3
      const int TABLE_SIZE = 5;
      int hash(int x);   // hash function to generate an index number between 0 and TABLE_SIZE - 1
      class HashTable
      {
      public:
          HashTable();
          void makeEmpty();     // remove all entries in the table
          void insert(int x);   // insert x into the table
          void remove(int x);   // remove x from the table
      private:
          List table[TABLE_SIZE];
      };

  31. Implementation: Example p3/3
      void HashTable::insert(int x)
      {
          int value = hash(x);   // table[value] is the list for this cell
          if (table[value].searchList(x) == false)
              table[value].insertList(x);
      }
      void HashTable::remove(int x)
      {
          int value = hash(x);   // table[value] is the list for this cell
          if (table[value].searchList(x) == false)
              cout << "cannot remove";
          else
          {
              // What goes in here?
          }
      }
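One possible answer to the question left open on the last slide is to unlink the matching node from the chain and free it. The sketch below is an assumption about how that might look, not the course's solution: `removeFromList` is a hypothetical free function, and `Node` is redeclared here as a plain struct so that the example stands alone.

```cpp
#include <cassert>

// Stand-alone node type for this sketch (mirrors the slides' Node: key + next).
struct Node {
    int key;
    Node *next;
};

// Hypothetical helper: unlinks and frees the first node holding x in the
// chain that starts at head. Returns true if a node was removed.
bool removeFromList(Node *&head, int x) {
    assert(head == nullptr || head != head->next);  // a node must not point to itself
    Node **link = &head;                // the link that may need rewiring
    while (*link != nullptr && (*link)->key != x)
        link = &(*link)->next;          // advance to the next link in the chain
    if (*link == nullptr)
        return false;                   // x is not in this chain
    Node *victim = *link;
    *link = victim->next;               // bypass the node being removed
    delete victim;
    return true;
}
```

Walking a pointer to the link, rather than a pointer to the node, avoids a special case for removing the first node of the chain. In a member-function version, the else branch could call such a routine on `table[value].head`.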
