Design and Analysis of Algorithms
Hash Tables
Haidong Xue, Fall 2013, at GSU
Dictionary operations
Operation   Very likely   Worst case
INSERT      O(1)          O(1)
DELETE      O(1)          O(1)
SEARCH      O(1)          Θ(n)
"A hash table is an effective data structure for implementing dictionaries" (textbook, page 253)
Direct-address tables
Direct addressing: use keys as addresses. Storing the keys 2, 3, 6, 1, 7, 5 in a direct-address table with slots 1 to 10 puts each key k in slot k:
• SEARCH(S, 6): O(1)
• INSERT(S, 4): O(1)
• DELETE(S, 7): O(1)
What's the problem here? The storage requirement is Θ(|U|), where U is the universe of keys. When the keys range over [1, 30000], the table needs 30000 slots even if only a few keys are stored.
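A minimal Python sketch of a direct-address table, assuming integer keys drawn from a universe {0, 1, …, u−1}; the class name DirectAddressTable is hypothetical. It shows why every operation is a single array access, and also why storage grows with the size of the universe rather than with the number of stored keys.

```python
from typing import List, Optional


class DirectAddressTable:
    """Direct addressing: key k is stored in slot k, so the array must span
    the whole universe of keys U = {0, 1, ..., universe_size - 1}."""

    def __init__(self, universe_size: int) -> None:
        # Storage is Theta(|U|), no matter how few keys are actually stored.
        self.slots: List[Optional[int]] = [None] * universe_size

    def insert(self, key: int) -> None:
        self.slots[key] = key  # O(1): one array write

    def delete(self, key: int) -> None:
        self.slots[key] = None  # O(1): one array write

    def search(self, key: int) -> Optional[int]:
        return self.slots[key]  # O(1): one array read


S = DirectAddressTable(11)  # universe {0, ..., 10}; slot 0 is simply unused
for k in (2, 3, 6, 1, 7, 5):
    S.insert(k)
print(S.search(6))  # 6
S.insert(4)
S.delete(7)
print(S.search(7))  # None
```

For keys in [1, 30000], DirectAddressTable(30001) would reserve 30001 slots even if only six keys were ever inserted.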
Hash tables
• Can we have O(1) INSERT, DELETE, and SEARCH with less storage? Yes!
Hash function: h(x) = x mod 3, so the hash table has slots 0, 1, 2.
h(2) = 2 mod 3 = 2
h(3) = 3 mod 3 = 0
h(6) = 6 mod 3 = 0   Collision! Multiple elements in one slot
h(1) = 1 mod 3 = 1
h(7) = 7 mod 3 = 1
h(5) = 5 mod 3 = 2
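To see how quickly collisions appear, the sketch below hashes the six keys with h(x) = x mod 3 and groups them by slot; the variable names are illustrative only. Six keys cannot fit one per slot into three slots, so collisions are unavoidable, and with these particular keys every slot ends up holding two.

```python
from collections import defaultdict


def h(x: int) -> int:
    # The slide's hash function h(x) = x mod 3: three slots, 0, 1 and 2
    return x % 3


keys = [2, 3, 6, 1, 7, 5]
slots = defaultdict(list)  # slot number -> keys that hash there
for x in keys:
    slots[h(x)].append(x)

for slot in sorted(slots):
    members = slots[slot]
    flag = "  <- collision" if len(members) > 1 else ""
    print(f"slot {slot}: {members}{flag}")
```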
Hash tables
A common method is to put the colliding elements into a linked list, i.e. chaining. With h(x) = x mod 3, the hash table is:
slot 0: 3 → 6
slot 1: 1 → 7
slot 2: 2 → 5
• SEARCH(S, 6): h(6) = 6 mod 3 = 0, so search the slot-0 linked list: O(1) + 2 (2 is the length of that list)
• INSERT(S, 4): h(4) = 4 mod 3 = 1, so insert into the slot-1 linked list: O(1) + O(1) = O(1)
• DELETE(S, 7): h(7) = 7 mod 3 = 1, so delete from the slot-1 linked list: O(1) + O(1) = O(1)
What is the upper bound on a list's length? What is the average length?
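The chaining scheme can be sketched in Python as follows. This is an illustrative sketch, not the course's code: ChainedHashTable is a hypothetical name, and each chain is a Python list rather than a linked list. Because of that, delete scans the chain and costs O(1) plus the chain length; the O(1) DELETE above assumes a doubly linked list and a pointer to the element being removed.

```python
from typing import List


class ChainedHashTable:
    """Collision resolution by chaining with h(k) = k mod m.

    Each slot holds a Python list used as the chain (an assumption;
    the slides use a linked list)."""

    def __init__(self, m: int) -> None:
        self.m = m
        self.table: List[List[int]] = [[] for _ in range(m)]

    def _h(self, key: int) -> int:
        return key % self.m

    def insert(self, key: int) -> None:
        # Amortized O(1): append to the chain of slot h(key), no duplicate check
        self.table[self._h(key)].append(key)

    def search(self, key: int) -> bool:
        # O(1) to locate the chain, then a scan over its length
        return key in self.table[self._h(key)]

    def delete(self, key: int) -> None:
        # Scans the chain to find key; raises ValueError if key is absent
        self.table[self._h(key)].remove(key)


S = ChainedHashTable(3)
for k in (2, 3, 6, 1, 7, 5):
    S.insert(k)
print(S.table)      # [[3, 6], [1, 7], [2, 5]]
print(S.search(6))  # True
S.insert(4)
S.delete(7)
print(S.table)      # [[3, 6], [1, 4], [2, 5]]
```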
Analysis of hash tables
Load factor: α = n/m, where n is the number of stored elements and m is the number of slots (0, 1, 2, …, m−1) in the hash table.
Uniform hashing: "each key is equally likely to hash to any of the m slots"
Analysis of hash tables
With the assumption of uniform hashing:
• Theorem 11.1, unsuccessful search: Θ(1 + α)
• Theorem 11.2, successful search: Θ(1 + α)
So T(n) = Θ(1 + α). If n = O(m), then α = n/m = O(m)/m = O(1), and T(n) = Θ(1 + O(1)) = O(1).
How do we get uniform hashing?
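A small simulation can make Theorem 11.1 concrete. This sketch assumes that hashing random keys from a large range with k mod m behaves like uniform hashing, and the function name average_unsuccessful_cost is hypothetical. It counts the elements an unsuccessful search examines (the whole chain of the key's slot; the "1 +" in Θ(1 + α) is the cost of computing the hash). Keeping n = 2m, the average should stay near α = 2 as the table grows, which is the O(1) behavior expected when n = O(m).

```python
import random


def average_unsuccessful_cost(n: int, m: int, trials: int, seed: int = 0) -> float:
    """Average number of chain elements examined by an unsuccessful search
    in a chained table holding n distinct keys in m slots.

    Assumption: random keys from [0, 10**9) hashed with k mod m stand in
    for uniform hashing."""
    rng = random.Random(seed)
    chain_length = [0] * m
    stored = set()
    while len(stored) < n:
        k = rng.randrange(10**9)
        if k not in stored:
            stored.add(k)
            chain_length[k % m] += 1

    examined = 0
    for _ in range(trials):
        k = rng.randrange(10**9)
        while k in stored:  # ensure the search really is unsuccessful
            k = rng.randrange(10**9)
        examined += chain_length[k % m]  # the whole chain must be scanned
    return examined / trials


for m in (100, 1000, 10000):
    n = 2 * m  # n = O(m), so alpha stays at 2
    cost = average_unsuccessful_cost(n, m, trials=20000)
    print(f"n={n:>6}  m={m:>6}  alpha={n / m:.1f}  avg examined={cost:.2f}")
```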
Hash functions
How do we get uniform hashing, where "each key is equally likely to hash to any of the m slots"? Many hashing methods have been proposed to achieve this goal:
• Division hashing
• Multiplication hashing
• Universal hashing
Hash functions: division hashing
• h(k) = k mod m, where k is the value of the key and m is the number of slots
• E.g.:
• Final grades of all my students, with a hash table of 10 slots
• Items in a grocery store, with a hash table of 10 slots:
   • 99 cents, large soda
   • $1.99, ground beef
   • $6.99, lamb
What's the problem here? What if we still use 10 slots?
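Assuming the prices are used as keys in integer cents (the slide does not say how keys are encoded), a 10-slot division hash keeps only the last decimal digit of each key, so every price ending in 99 lands in the same slot. The function name division_hash is illustrative.

```python
def division_hash(k: int, m: int) -> int:
    # Division method: h(k) = k mod m
    return k % m


# Assumption: prices are used as keys in integer cents
prices_in_cents = {"large soda": 99, "ground beef": 199, "lamb": 699}
for item, k in prices_in_cents.items():
    print(f"{item:<12} {k:>4} cents -> slot {division_hash(k, 10)}")
# every item lands in slot 9: with m = 10, only the last digit matters
```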
Hash functions: division hashing
• h(k) = k mod m
• Choose m to be a prime number: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, …
• This is sometimes not very convenient to implement
E.g., with m = 7: 99 mod 7 = 1, 199 mod 7 = 3, 699 mod 7 = 6
What's the problem here?
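One way to soften the inconvenience of a prime m is a helper that rounds the desired table size up to the next prime. The names is_prime and next_prime are hypothetical, and trial division is only meant for small table sizes. Using the grocery prices as integer cents (an assumption), both m = 7 and the next prime after 10, m = 11, send the three keys to different slots.

```python
def is_prime(n: int) -> bool:
    # Trial division; adequate for choosing small table sizes
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def next_prime(n: int) -> int:
    # Hypothetical helper: smallest prime >= n, used as the slot count m
    while not is_prime(n):
        n += 1
    return n


prices_in_cents = [99, 199, 699]
for m in (7, next_prime(10)):
    print(m, [k % m for k in prices_in_cents])
# 7 [1, 3, 6]
# 11 [0, 1, 6]
```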
Hash functions: multiplication hashing
• h(k) = floor(m · (kA mod 1)), where m is the number of slots and A is a constant in (0, 1); kA mod 1 is the fractional part of kA
• E.g., A = 0.123, m = 10:
   • 99 × 0.123 = 12.177, so h(99) = floor(10 × 0.177) = 1
   • 199 × 0.123 = 24.477, so h(199) = floor(10 × 0.477) = 4
   • 699 × 0.123 = 85.977, so h(699) = floor(10 × 0.977) = 9
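A direct Python translation of the multiplication method, reproducing the example with A = 0.123 and m = 10. It uses floating-point arithmetic, which is fine for small keys like these; for large keys the product kA loses precision, so treat this as a sketch rather than a production hash. The function name multiplication_hash is illustrative.

```python
import math


def multiplication_hash(k: int, m: int, A: float) -> int:
    # h(k) = floor(m * (k*A mod 1)), with 0 < A < 1
    frac = (k * A) % 1.0  # fractional part of k*A
    return math.floor(m * frac)


for k in (99, 199, 699):
    print(k, multiplication_hash(k, 10, 0.123))
# 99 1
# 199 4
# 699 9
```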
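Hash functions: universal hashing
• H is a set of hash functions
• At the beginning of each execution, randomly choose a hash function h from H
• H is universal if, for each pair of distinct keys k, l ∈ U, the number of hash functions h ∈ H with h(k) = h(l) is at most |H|/m, where k and l are keys and m is the number of slots
Theorem 11.3 (chaining, load factor α = n/m, h chosen at random from a universal H):
• If key k is not in the table, the expected length of the list that k hashes to is at most α
• If key k is in the table, the expected length of the list containing k is at most 1 + α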
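No specific universal family is named here. As one concrete instance, the sketch below uses the standard construction from CLRS (Section 11.3.3): h_{a,b}(k) = ((a·k + b) mod p) mod m, with a prime p larger than every key, 1 ≤ a ≤ p−1 and 0 ≤ b ≤ p−1. The helper make_universal_hash and the choice p = 2^31 − 1 are assumptions for illustration.

```python
import random
from typing import Callable


def make_universal_hash(p: int, m: int, rng: random.Random) -> Callable[[int], int]:
    """Randomly pick h_{a,b}(k) = ((a*k + b) mod p) mod m from the family
    H_{p,m}; p must be a prime larger than every key (assumption)."""
    a = rng.randrange(1, p)  # a in {1, ..., p-1}
    b = rng.randrange(0, p)  # b in {0, ..., p-1}

    def h(k: int) -> int:
        return ((a * k + b) % p) % m

    return h


P = 2**31 - 1  # a Mersenne prime; assumes every key is smaller than P
h = make_universal_hash(P, 10, random.Random())  # chosen once per execution
print([h(k) for k in (99, 199, 699)])  # varies from run to run
```

Because h is drawn at random at start-up, no fixed set of keys (such as prices that all end in 99) can reliably force collisions across runs.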
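Another method to deal with collisions: open addressing
• No linked lists: every element is stored in the table itself
• The hash function includes the probe number i: h(k, i), for i = 0, 1, …, m−1
   • Linear probing: h(k, i) = (h′(k) + i) mod m
   • Quadratic probing: h(k, i) = (h′(k) + c₁i + c₂i²) mod m
   • Double hashing: h(k, i) = (h₁(k) + i·h₂(k)) mod m
• When slot h(k, i) does not work (it is occupied), use h(k, i+1)
With uniform hashing and load factor α = n/m < 1:
• The expected number of probes for an unsuccessful search is at most 1/(1 − α)
• The expected number of probes for a successful search is at most (1/α) ln(1/(1 − α))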
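A sketch of the three open-addressing probe sequences (linear, quadratic, double hashing) and the two probe-count bounds under uniform hashing. The auxiliary functions h1(k) = k mod m and h2(k) = 1 + (k mod (m − 1)), the constants c1 = 1 and c2 = 3, and all function names are assumptions chosen for illustration; with m prime, h2(k) is never 0 and is relatively prime to m, so double hashing visits every slot.

```python
import math
from typing import Callable, List


# Hypothetical auxiliary hash functions (assumptions for illustration)
def h1(k: int, m: int) -> int:
    return k % m


def h2(k: int, m: int) -> int:
    # In 1..m-1, so never 0; relatively prime to m when m is prime
    return 1 + (k % (m - 1))


def linear_probe(k: int, i: int, m: int) -> int:
    return (h1(k, m) + i) % m  # (h'(k) + i) mod m


def quadratic_probe(k: int, i: int, m: int, c1: int = 1, c2: int = 3) -> int:
    return (h1(k, m) + c1 * i + c2 * i * i) % m  # (h'(k) + c1*i + c2*i^2) mod m


def double_hash_probe(k: int, i: int, m: int) -> int:
    return (h1(k, m) + i * h2(k, m)) % m  # (h1(k) + i*h2(k)) mod m


def probe_sequence(probe: Callable[[int, int, int], int], k: int, m: int) -> List[int]:
    # Slots tried for probe numbers i = 0, 1, ..., m-1
    return [probe(k, i, m) for i in range(m)]


def max_probes_unsuccessful(alpha: float) -> float:
    # Bound 1 / (1 - alpha), for 0 <= alpha < 1
    return 1 / (1 - alpha)


def max_probes_successful(alpha: float) -> float:
    # Bound (1/alpha) * ln(1 / (1 - alpha)), for 0 < alpha < 1
    return (1 / alpha) * math.log(1 / (1 - alpha))


for name, probe in [("linear", linear_probe), ("quadratic", quadratic_probe),
                    ("double", double_hash_probe)]:
    print(f"{name:>9}: {probe_sequence(probe, 27, 11)}")
print(max_probes_unsuccessful(0.5), max_probes_successful(0.5))
```

For k = 27 and m = 11 the double-hashing sequence is a permutation of all 11 slots, while this choice of quadratic constants tries slot 2 at both i = 3 and i = 4, which is why c1, c2 and m must be chosen with care. At α = 0.5 the bounds are 2 probes (unsuccessful) and about 1.39 probes (successful).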
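Another method to deal with collisions: open addressing
Example: linear probing with h(k, i) = ((k mod 3) + i) mod 10, in a table with slots 0 to 9. Inserting 2, 3, 6, 1:
h(2, 0) = ((2 mod 3) + 0) mod 10 = 2 → slot 2
h(3, 0) = ((3 mod 3) + 0) mod 10 = 0 → slot 0
h(6, 0) = ((6 mod 3) + 0) mod 10 = 0 (occupied)
h(6, 1) = ((6 mod 3) + 1) mod 10 = 1 → slot 1
h(1, 0) = ((1 mod 3) + 0) mod 10 = 1 (occupied)
h(1, 1) = ((1 mod 3) + 1) mod 10 = 2 (occupied)
h(1, 2) = ((1 mod 3) + 2) mod 10 = 3 → slot 3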
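The linear-probing example can be replayed with a short sketch. It hard-codes the probe function h(k, i) = ((k mod 3) + i) mod m with m = 10; insert and search are hypothetical helper names. Deletion is left out because open addressing needs a special "deleted" marker for it, and search can stop at the first empty slot only because nothing is ever deleted here.

```python
from typing import List, Optional


def h(k: int, i: int, m: int = 10) -> int:
    # Linear probing as in the example: h(k, i) = ((k mod 3) + i) mod m
    return ((k % 3) + i) % m


def insert(table: List[Optional[int]], k: int) -> int:
    m = len(table)
    for i in range(m):  # probe h(k, 0), h(k, 1), ... until a free slot
        j = h(k, i, m)
        if table[j] is None:
            table[j] = k
            return j
    raise OverflowError("hash table overflow")


def search(table: List[Optional[int]], k: int) -> Optional[int]:
    m = len(table)
    for i in range(m):
        j = h(k, i, m)
        if table[j] is None:  # empty slot: k cannot be further along
            return None
        if table[j] == k:
            return j
    return None


table: List[Optional[int]] = [None] * 10
for key in (2, 3, 6, 1):
    insert(table, key)
print(table)             # [3, 6, 2, 1, None, None, None, None, None, None]
print(search(table, 1))  # 3
print(search(table, 4))  # None
```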