Design and Analysis of Algorithms: Hash Tables

Haidong Xue
Fall 2013, at GSU

Dictionary operations
Operation   Very likely   Worst case
INSERT      O(1)          O(1)
DELETE      O(1)          O(1)
SEARCH      O(1)          Θ(n)
"A hash table is an effective data structure for implementing dictionaries" (textbook, page 253)

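Direct-address tables
• Direct addressing: use keys as addresses. For example, the keys 2, 3, 6, 1, 7, 5 are stored in the slots with the same numbers in a direct-address table with slots 1 to 10.
• SEARCH(S, 6): O(1)
• INSERT(S, 4): O(1)
• DELETE(S, 7): O(1)
• What's the problem here? The storage requirement is Θ(|U|), where U is the universe of keys. When keys range over [1, 30000], the table needs 30000 slots, no matter how few keys are actually stored.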
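A minimal sketch of a direct-address table, assuming integer keys in {0, 1, ..., universe_size − 1}. The class name `DirectAddressTable` and its methods are hypothetical. Every operation is a single array access, but the array must have one slot per possible key.

```python
class DirectAddressTable:
    """Hypothetical direct-address table: the key itself is the array index."""

    def __init__(self, universe_size):
        # One slot per possible key, so storage is Theta(|U|) regardless of how many keys are stored.
        self.slots = [None] * universe_size

    def insert(self, key, value=None):
        # O(1): the key is the address.
        self.slots[key] = (key, value)

    def search(self, key):
        # O(1): returns the stored (key, value) pair, or None if the slot is empty.
        return self.slots[key]

    def delete(self, key):
        # O(1): clear the slot addressed by the key.
        self.slots[key] = None


table = DirectAddressTable(11)  # slots 0..10, so keys 1..10 fit
for k in [2, 3, 6, 1, 7, 5]:
    table.insert(k)
table.insert(4)
table.delete(7)
print(table.search(6))  # (6, None)
print(table.search(7))  # None

big = DirectAddressTable(30001)  # keys in [1, 30000]
for k in [2, 3, 6, 1, 7, 5]:
    big.insert(k)
print(len(big.slots))  # 30001 slots allocated to hold 6 keys
```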

Hash tables
• Can we have O(1) INSERT, DELETE, and SEARCH with less storage? Yes!
• Hash function: h(x) = x mod 3, with a hash table of 3 slots (0, 1, 2)
• h(2) = 2 mod 3 = 2
• h(3) = 3 mod 3 = 0
• h(6) = 6 mod 3 = 0 → collision! Multiple elements in one slot
• h(1) = 1 mod 3 = 1
• h(7) = 7 mod 3 = 1 → collision with 1
• h(5) = 5 mod 3 = 2 → collision with 2
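The slot assignments above can be checked with a few lines of Python. This is a sketch that groups the slide's keys by h(x) = x mod 3, which makes the collisions visible as slots holding more than one key.

```python
from collections import defaultdict


def h(x, m=3):
    # Hash function from the slide: h(x) = x mod 3
    return x % m


keys = [2, 3, 6, 1, 7, 5]
slots = defaultdict(list)
for k in keys:
    slots[h(k)].append(k)  # keys that hash to the same slot collide

for s in sorted(slots):
    print(s, slots[s])
# 0 [3, 6]
# 1 [1, 7]
# 2 [2, 5]
```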

Hash tables
• A common method is to put the colliding elements into a linked list, i.e., chaining
• Hash table: slot 0: 3, 6; slot 1: 1, 7; slot 2: 2, 5
• What is the upper bound on the length of a list? What is the average length?
• SEARCH(S, 6): h(6) = 6 mod 3 = 0; search the slot-0 linked list: O(1) + 2 (2 is the length of the linked list)
• INSERT(S, 4): h(4) = 4 mod 3 = 1; insert into the slot-1 linked list: O(1) + O(1) = O(1)
• DELETE(S, 7): h(7) = 7 mod 3 = 1; delete from the slot-1 linked list: O(1) + O(1) = O(1), given a pointer to the node holding 7 in a doubly linked list
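A minimal sketch of chaining with h(k) = k mod m, reproducing the slide's operations. `ChainedHashTable` is a hypothetical name. As a simplification, each chain is a Python list rather than a linked list. Deleting by key therefore scans the chain, instead of unlinking a known node in O(1).

```python
class ChainedHashTable:
    """Hypothetical hash table with chaining and h(k) = k mod m."""

    def __init__(self, m):
        self.m = m
        self.table = [[] for _ in range(m)]  # one chain per slot

    def _h(self, key):
        return key % self.m

    def insert(self, key):
        # O(1): compute the slot, then append the key to that slot's chain.
        self.table[self._h(key)].append(key)

    def search(self, key):
        # O(1) to locate the chain, plus time proportional to the chain's length to scan it.
        chain = self.table[self._h(key)]
        for i, k in enumerate(chain):
            if k == key:
                return (self._h(key), i)  # (slot, position within the chain)
        return None

    def delete(self, key):
        # With Python lists this scans the chain; a doubly linked list with a node pointer is O(1).
        chain = self.table[self._h(key)]
        if key in chain:
            chain.remove(key)


t = ChainedHashTable(3)
for k in [2, 3, 6, 1, 7, 5]:
    t.insert(k)
print(t.table)      # [[3, 6], [1, 7], [2, 5]]
print(t.search(6))  # (0, 1): slot 0, second element of the chain
t.insert(4)
t.delete(7)
print(t.table)      # [[3, 6], [1, 4], [2, 5]]
```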

Analysis of hash tables
• Load factor α = n/m, where n is the number of elements stored and m is the number of slots (0, 1, 2, …, m−1)
• Uniform hashing: "each key is equally likely to hash to any of the m slots"
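The sketch below computes α and the chain-length statistics for two tables. The lengths of all chains always sum to n, so the average chain length is exactly α = n/m for any hash function. Uniform hashing is what keeps the longest chain close to that average. The `skewed` table is a hypothetical input in which every key is a multiple of 3, so h(k) = k mod 3 sends all of them to slot 0.

```python
def load_factor(n, m):
    # alpha = n / m
    return n / m


def chain_stats(table):
    # Returns (average chain length, longest chain length) over all m slots.
    lengths = [len(chain) for chain in table]
    return sum(lengths) / len(lengths), max(lengths)


uniform = [[3, 6], [1, 7], [2, 5]]        # the slide's table, h(k) = k mod 3
skewed = [[3, 6, 9, 12, 15, 18], [], []]  # hypothetical keys that are all multiples of 3
print(load_factor(6, 3))     # 2.0
print(chain_stats(uniform))  # (2.0, 2)
print(chain_stats(skewed))   # (2.0, 6): same average, much longer worst chain
```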

Analysis of hash tables
With the assumption of uniform hashing (chaining, load factor α = n/m):
• Theorem 11.1: an unsuccessful search takes Θ(1 + α) time on average
• Theorem 11.2: a successful search takes Θ(1 + α) time on average
• Since α = n/m, T(n) = Θ(1 + n/m). If n = O(m), then T(n) = Θ(1 + O(m)/m) = O(1)
How to get uniform hashing?
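The two theorems can be illustrated by counting the list elements a search examines in the slide's table (m = 3, n = 6, α = 2). This is a sketch, assuming chains stored as Python lists and h(k) = k mod m. The absent keys 9, 4, 8 are chosen arbitrarily, one per slot. An unsuccessful search scans a whole chain (α elements here). A successful search stops at the key, and both costs stay within 1 + α once the O(1) hash computation is counted.

```python
def elements_examined(table, key):
    # h(k) = k mod m selects the chain; count the elements compared during the search.
    chain = table[key % len(table)]
    if key in chain:
        return chain.index(key) + 1  # successful: stop at the key
    return len(chain)                # unsuccessful: scan the whole chain


table = [[3, 6], [1, 7], [2, 5]]  # m = 3, n = 6, alpha = 2
stored = [3, 6, 1, 7, 2, 5]
absent = [9, 4, 8]                # 9 mod 3 = 0, 4 mod 3 = 1, 8 mod 3 = 2

avg_unsuccessful = sum(elements_examined(table, k) for k in absent) / len(absent)
avg_successful = sum(elements_examined(table, k) for k in stored) / len(stored)
print(avg_unsuccessful)  # 2.0, i.e. alpha elements
print(avg_successful)    # 1.5
```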

Hash functions
How do we get uniform hashing, i.e., "each key is equally likely to hash to any of the m slots"? Several hashing methods have been proposed to approach this goal:
• Division hashing
• Multiplication hashing
• Universal hashing

Hash functions – division hashing
• h(k) = k mod m, where k is the value of the key and m is the number of slots
• E.g.:
  • Final grades of all my students, with a hash table of 10 slots
  • Items in grocery stores, with a hash table of 10 slots: 99 cents, large soda; $1.99, ground beef; $6.99, lamb
• What's the problem here? What if we still use 10 slots?
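A sketch of the grocery example, assuming prices are stored as integer cents (a hypothetical representation). With m = 10, h(k) = k mod 10 depends only on the last decimal digit of the key, so every price ending in 9 lands in the same slot.

```python
def division_hash(k, m):
    # Division method: h(k) = k mod m
    return k % m


prices_in_cents = [99, 199, 699]  # large soda, ground beef, lamb
print([division_hash(p, 10) for p in prices_in_cents])  # [9, 9, 9]: all three collide
```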

Hash functions – division hashing
• h(k) = k mod m
• Choose m as a prime number: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, …
• A prime m is sometimes not very convenient to implement
• E.g., with m = 7: 99 mod 7 = 1, 199 mod 7 = 3, 699 mod 7 = 6
• What's the problem here?
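A sketch of picking a prime table size and rehashing the same prices. `is_prime` and `next_prime` are hypothetical helpers that take the smallest prime at or above a desired size. Trial division is used only because it is simple. With m = 7 (as on the slide) or m = 11, the three prices no longer share a slot.

```python
def is_prime(n):
    # Trial division: adequate for small table sizes.
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def next_prime(n):
    # Smallest prime >= n, used as the number of slots m.
    while not is_prime(n):
        n += 1
    return n


prices_in_cents = [99, 199, 699]
print([p % 7 for p in prices_in_cents])  # [1, 3, 6], as on the slide
m = next_prime(10)
print(m)                                 # 11
print([p % m for p in prices_in_cents])  # [0, 1, 6]
```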

Hash functions – multiplication hashing
• h(k) = floor(m(kA mod 1)), where m is the number of slots and A is a constant in (0, 1)
• E.g., A = 0.123, m = 10:
  • 99 × 0.123 = 12.177, so h(99) = floor(10 × 0.177) = 1
  • 199 × 0.123 = 24.477, so h(199) = floor(10 × 0.477) = 4
  • 699 × 0.123 = 85.977, so h(699) = floor(10 × 0.977) = 9
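A floating-point sketch of the multiplication method that reproduces the slide's numbers. `multiplication_hash` is a hypothetical name. For non-negative x, `x % 1.0` gives the fractional part "kA mod 1". The second call uses A = (√5 − 1)/2, a commonly cited choice attributed to Knuth. Real implementations usually use integer fixed-point arithmetic instead of floats.

```python
import math


def multiplication_hash(k, m, A=0.123):
    # h(k) = floor(m * (k*A mod 1)); (k * A) % 1.0 is the fractional part for k >= 0.
    return math.floor(m * ((k * A) % 1.0))


print([multiplication_hash(k, 10) for k in [99, 199, 699]])  # [1, 4, 9], as on the slide

KNUTH_A = (math.sqrt(5) - 1) / 2  # about 0.618
print([multiplication_hash(k, 10, KNUTH_A) for k in [99, 199, 699]])  # [1, 9, 0]
```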

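Hash functions – universal hashing
• ℋ is a finite set of hash functions, each mapping the universe U of keys to {0, 1, …, m−1}
• At the beginning of each execution, randomly choose a hash function h from ℋ
• Universal: for each pair of distinct keys k, l ∈ U, the number of hash functions h ∈ ℋ with h(k) = h(l) is at most |ℋ|/m, where k and l are keys and m is the number of slots
• Theorem 11.3 (chaining, load factor α, h chosen from a universal ℋ; n_h(k) is the length of the list that key k hashes to):
  • If key k is not in the table, E[n_h(k)] ≤ α
  • If key k is in the table, E[n_h(k)] ≤ 1 + α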
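One well-known universal family is h_{a,b}(k) = ((a·k + b) mod p) mod m, where p is a prime larger than every key, a ∈ {1, …, p−1}, and b ∈ {0, …, p−1}. The slides do not name a specific family, so this choice is an assumption. The sketch draws a random member at the start of an execution. For a small p, it also checks the universality condition exhaustively. `make_universal_hash` and `max_collisions` are hypothetical names.

```python
import random


def make_universal_hash(p, m, rng=random):
    # Randomly pick h_{a,b}(k) = ((a*k + b) mod p) mod m; assumes p is prime and p > every key.
    a = rng.randint(1, p - 1)
    b = rng.randint(0, p - 1)
    return lambda k: ((a * k + b) % p) % m


def max_collisions(p, m):
    # For every pair of distinct keys k1 != k2 in {0, ..., p-1}, count the functions
    # in the family with h(k1) == h(k2); return the largest such count.
    worst = 0
    for k1 in range(p):
        for k2 in range(k1 + 1, p):
            count = sum(
                1
                for a in range(1, p)
                for b in range(p)
                if ((a * k1 + b) % p) % m == ((a * k2 + b) % p) % m
            )
            worst = max(worst, count)
    return worst


p, m = 17, 6
family_size = (p - 1) * p                        # 272 functions in the family
print(max_collisions(p, m) <= family_size / m)   # True: the universality bound holds
h = make_universal_hash(p, m, random.Random(2013))  # chosen once per execution
print([h(k) for k in [2, 3, 6, 1, 7, 5]])
```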

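Another method to deal with collisions: open addressing
• No linked lists: all elements are stored in the table itself
• Hash functions include the probe number i: h(k, i), for i = 0, 1, …, m−1
• Linear probing: h(k, i) = (h'(k) + i) mod m
• Quadratic probing: h(k, i) = (h'(k) + c1·i + c2·i²) mod m
• Double hashing: h(k, i) = (h1(k) + i·h2(k)) mod m
• When slot h(k, i) does not work (it is already occupied), use h(k, i+1)
• Under uniform hashing with load factor α = n/m < 1, the expected number of probes for an unsuccessful search is at most 1/(1−α)
• The expected number of probes for a successful search is at most (1/α) ln(1/(1−α))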
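A sketch of the three probe sequences and the two probe-count bounds. The auxiliary hashes are illustrative assumptions: h'(k) = h1(k) = k mod m, h2(k) = 1 + (k mod (m − 2)), and c1 = 1, c2 = 3. For prime m, this h2 lies in [1, m − 2], so it is relatively prime to m. Note that the quadratic sequence revisits slot 5: with arbitrary c1, c2, and m, quadratic probing need not reach every slot.

```python
import math


def linear_probe(k, i, m):
    # h(k, i) = (h'(k) + i) mod m, with h'(k) = k mod m (assumed)
    return (k % m + i) % m


def quadratic_probe(k, i, m, c1=1, c2=3):
    # h(k, i) = (h'(k) + c1*i + c2*i^2) mod m; c1 and c2 are illustrative constants
    return (k % m + c1 * i + c2 * i * i) % m


def double_hash_probe(k, i, m):
    # h(k, i) = (h1(k) + i*h2(k)) mod m, with h1(k) = k mod m and h2(k) = 1 + (k mod (m - 2)) (assumed)
    h1 = k % m
    h2 = 1 + (k % (m - 2))
    return (h1 + i * h2) % m


def probe_sequence(probe, k, m, count):
    return [probe(k, i, m) for i in range(count)]


m, k = 13, 14
print(probe_sequence(linear_probe, k, m, 4))       # [1, 2, 3, 4]
print(probe_sequence(quadratic_probe, k, m, 4))    # [1, 5, 2, 5]
print(probe_sequence(double_hash_probe, k, m, 4))  # [1, 5, 9, 0]


def unsuccessful_probe_bound(alpha):
    # Expected probes, unsuccessful search, uniform hashing, alpha < 1
    return 1 / (1 - alpha)


def successful_probe_bound(alpha):
    # Expected probes, successful search, uniform hashing, 0 < alpha < 1
    return (1 / alpha) * math.log(1 / (1 - alpha))


for alpha in (0.5, 0.9):
    print(alpha, round(unsuccessful_probe_bound(alpha), 2), round(successful_probe_bound(alpha), 2))
# 0.5 2.0 1.39
# 0.9 10.0 2.56
```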

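Another method to deal with collisions: open addressing
Example: linear probing with h(k, i) = ((k mod 3) + i) mod 10, a table of 10 slots (0 to 9), inserting 2, 3, 6, 1:
• h(2, 0) = ((2 mod 3) + 0) mod 10 = 2 → 2 goes into slot 2
• h(3, 0) = ((3 mod 3) + 0) mod 10 = 0 → 3 goes into slot 0
• h(6, 0) = ((6 mod 3) + 0) mod 10 = 0 → occupied; h(6, 1) = ((6 mod 3) + 1) mod 10 = 1 → 6 goes into slot 1
• h(1, 0) = ((1 mod 3) + 0) mod 10 = 1 → occupied; h(1, 1) = ((1 mod 3) + 1) mod 10 = 2 → occupied; h(1, 2) = ((1 mod 3) + 2) mod 10 = 3 → 1 goes into slot 3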
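The worked example can be reproduced with a small linear-probing table. This is a sketch: h'(k) = k mod 3 and m = 10 come from the slide, while `OpenAddressTable` and its methods are hypothetical names. Deletion is omitted. Simply emptying a slot would cut probe sequences that pass through it, so open addressing usually marks deleted slots instead.

```python
def h_prime(k):
    # Auxiliary hash from the slide example: h'(k) = k mod 3
    return k % 3


def probe(k, i, m=10):
    # Linear probing: h(k, i) = (h'(k) + i) mod m
    return (h_prime(k) + i) % m


class OpenAddressTable:
    def __init__(self, m=10):
        self.m = m
        self.slots = [None] * m

    def insert(self, k):
        # Try h(k, 0), h(k, 1), ... until an empty slot is found.
        for i in range(self.m):
            j = probe(k, i, self.m)
            if self.slots[j] is None:
                self.slots[j] = k
                return j
        raise OverflowError("hash table overflow")

    def search(self, k):
        # Follow the same probe sequence; an empty slot means k is not in the table.
        for i in range(self.m):
            j = probe(k, i, self.m)
            if self.slots[j] is None:
                return None
            if self.slots[j] == k:
                return j
        return None


t = OpenAddressTable()
print([t.insert(k) for k in [2, 3, 6, 1]])  # [2, 0, 1, 3]
print(t.slots)      # [3, 6, 2, 1, None, None, None, None, None, None]
print(t.search(1))  # 3
print(t.search(4))  # None
```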
