Hashing Techniques and Applications
Hashing is utilized for fast INSERT, SEARCH, and DELETE operations. Direct-Address and Hash tables are the two main categories, each with its own advantages and challenges. Understanding hash functions, collisions, and implementation methods is crucial for effective use of hashing.
Presentation Transcript
Hashing Jeff Chastine
Hashing
• Many applications require INSERT, SEARCH, and DELETE operations
• On average, hashing performs each of these in O(1) time
• Based on keys
• Falls under two general categories:
  • Direct-address tables
  • Hash tables
Direct-Addressing
• Good when the universe U of keys is small
• U = {0, 1, …, m - 1}, where m is not large
• All elements have unique keys
• Table T[0..m - 1], where each slot corresponds to one key
• All operations take only O(1) time, even in the worst case
Direct Implementation
[Figure: direct-address table T with slots 0 through 9. Each actual key in K, a subset of the universe of keys U, indexes the slot that holds its key and satellite data.]
Direct-Addressing Operations
DIRECT-ADDRESS-SEARCH(T, k)
    return T[k]
DIRECT-ADDRESS-INSERT(T, x)
    T[key[x]] ← x
DIRECT-ADDRESS-DELETE(T, x)
    T[key[x]] ← NIL
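The three operations above translate almost line for line into Python. This is a minimal sketch, assuming integer keys in 0..m - 1 and using None for NIL; the class name DirectAddressTable and the (key, data) tuple layout are illustrative choices, not part of the slides.

```python
class DirectAddressTable:
    """Direct-address table T[0..m-1]: the key itself is the slot index."""

    def __init__(self, m):
        # One slot per possible key; None plays the role of NIL.
        self.slots = [None] * m

    def search(self, k):
        # DIRECT-ADDRESS-SEARCH: return T[k]
        return self.slots[k]

    def insert(self, key, data):
        # DIRECT-ADDRESS-INSERT: T[key[x]] <- x
        self.slots[key] = (key, data)

    def delete(self, key):
        # DIRECT-ADDRESS-DELETE: T[key[x]] <- NIL
        self.slots[key] = None


table = DirectAddressTable(10)
table.insert(3, "satellite data for key 3")
print(table.search(3))   # the stored (key, data) pair
table.delete(3)
print(table.search(3))   # None, since the slot is empty again
```

Every operation is a single list index, which is why each one runs in O(1) time.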
Hash Tables
• What are potential problems with direct addressing?
  • |U| may be so large that a table with |U| slots is impractical
  • The set of actual keys may be small relative to U
  • Example: Social Security numbers (SSNs)
• Here, hash tables require much less storage
• Only catch: O(1) is the average-case time instead of the worst-case time
How it works
• With direct addressing, an element with key k goes into slot k
• With hashing, it goes into slot h(k), where h is a hash function
• Hash functions try to "randomize" where keys land
• A hash function maps U to the slots of T[0..m - 1]:
  h : U → {0, 1, …, m - 1}
• Instead of |U| slots, the table needs only m slots
Hash Implementation
[Figure: hash table T with slots 0 through m - 1. The hash function h maps the actual keys K = {k1, …, k5} from the universe U to slots h(k1), h(k3), and h(k4); keys k2 and k5 collide, since h(k2) = h(k5).]
Collisions
• A collision occurs when two keys hash to the same slot
• Because |U| > m, the pigeonhole principle applies
  • Therefore, collisions must exist
• We often talk of the load factor α = n/m (n stored elements, m slots)
• Pick a good hash function
  • Near random, yet deterministic
• Can chain colliding elements together in a list
  • This is where the worst case comes from
• Can use open addressing instead
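The pigeonhole argument can be seen on a small example. The sketch below assumes the hash function h(k) = k mod m and an arbitrary list of integer keys; both are illustrative assumptions, as is the helper name find_collisions.

```python
def find_collisions(keys, m):
    """Return (earlier_key, later_key, slot) for every key that lands in an occupied slot."""
    first_in_slot = {}   # slot -> first key that hashed there
    collisions = []
    for k in keys:
        slot = k % m     # assumed hash function: h(k) = k mod m
        if slot in first_in_slot:
            collisions.append((first_in_slot[slot], k, slot))
        else:
            first_in_slot[slot] = k
    return collisions


m = 7
keys = [10, 22, 31, 4, 15, 28, 17, 88]   # n = 8 keys for m = 7 slots
collisions = find_collisions(keys, m)
load_factor = len(keys) / m               # alpha = n / m

print(collisions)
print(load_factor)
```

With more keys than slots, at least one collision is unavoidable no matter which hash function is used; a good hash function only spreads the collisions evenly.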
Chaining
[Figure: chaining. Each slot of table T holds a linked list of the actual keys (k1, k2, k3, k4, k5, k7 from K, a subset of the universe U) that hash to that slot.]
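A chained hash table can be sketched in a few lines. The version below assumes non-negative integer keys, the hash function h(k) = k mod m, and Python lists standing in for linked lists; the class name ChainedHashTable is hypothetical.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining."""

    def __init__(self, m=8):
        self.m = m                            # number of slots
        self.n = 0                            # number of stored elements
        self.slots = [[] for _ in range(m)]   # one chain per slot

    def _h(self, key):
        # Assumed hash function: division method.
        return key % self.m

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                      # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))
        self.n += 1

    def search(self, key):
        # Walk only the chain for this key's slot.
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                self.n -= 1
                return True
        return False

    def load_factor(self):
        return self.n / self.m                # alpha = n / m


table = ChainedHashTable(m=4)
for key in (1, 5, 9):                         # all three hash to slot 1
    table.insert(key, f"value {key}")
print(table.slots[1])
print(table.search(5))
```

If every key lands in the same slot, a search has to walk a chain of length n, which is the worst case mentioned on the Collisions slide.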
Hash Functions
• What makes a good hash function?
  • Each key is equally likely to hash to any of the m slots
• If keys are random real numbers in [0, 1), take h(k) = floor(km)
• Convert strings to numbers via their ASCII codes to hash?
• Most hash functions involve mod
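Two of the ideas above can be sketched directly. The first function scales a key in [0, 1) to a slot; the second treats an ASCII string as a radix-128 integer, which is one common way (assumed here, not stated in the slides) to turn a string into a number before applying mod. The function names are illustrative.

```python
import math


def hash_uniform(k, m):
    """h(k) = floor(k * m) for a real key k in [0, 1)."""
    if not 0 <= k < 1:
        raise ValueError("key must lie in [0, 1)")
    return math.floor(k * m)


def string_to_key(s):
    """Interpret an ASCII string as an integer written in radix 128."""
    key = 0
    for ch in s:
        code = ord(ch)
        if code >= 128:
            raise ValueError("non-ASCII character")
        key = key * 128 + code
    return key


def hash_string(s, m):
    # Reduce the (possibly very large) integer key to a slot with mod.
    return string_to_key(s) % m


print(hash_uniform(0.5, 10))   # 5
print(string_to_key("pt"))     # 112 * 128 + 116 = 14452
print(hash_string("pt", 701))
```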
Hash Functions
• Division method: h(k) = k mod m
• Multiplication method: let 0 < A < 1, then h(k) = floor(m (kA mod 1)), where kA mod 1 is the fractional part of kA
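Both methods fit in a few lines of Python. This is a sketch for non-negative integer keys; the default A = (sqrt(5) - 1) / 2 ≈ 0.618 is a commonly suggested constant and is an assumption here, since the slides only require 0 < A < 1.

```python
import math


def hash_division(k, m):
    """Division method: h(k) = k mod m."""
    return k % m


def hash_multiplication(k, m, A=(math.sqrt(5) - 1) / 2):
    """Multiplication method: h(k) = floor(m * (k*A mod 1))."""
    fractional_part = (k * A) % 1.0   # k*A mod 1
    return math.floor(m * fractional_part)


print(hash_division(100, 12))              # 100 = 8 * 12 + 4, so slot 4
print(hash_multiplication(123456, 16384))
```

Floating-point arithmetic makes the multiplication method approximate for very large keys; a production version would typically use fixed-width integer arithmetic instead.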
Open Addressing
• Systematically examine, or probe, slots until the item is found
• No lists and no elements stored outside the table; thus α ≤ 1
• Instead of following pointers, we compute the sequence of slots to probe
• Instead of a fixed order, the probe sequence depends on the key
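The sketch below shows insertion and search with open addressing. It assumes non-negative integer keys and a simple probe sequence that starts at k mod m and steps forward one slot at a time; the class name OpenAddressTable and the choice of probe sequence are assumptions made for illustration.

```python
class OpenAddressTable:
    """Open-addressing hash table: every key lives in the table itself."""

    def __init__(self, m):
        self.m = m
        self.slots = [None] * m   # None marks an empty slot

    def _probe(self, key, i):
        # i-th slot of the probe sequence for this key (assumed: step by one).
        return (key % self.m + i) % self.m

    def insert(self, key):
        for i in range(self.m):
            j = self._probe(key, i)
            if self.slots[j] is None or self.slots[j] == key:
                self.slots[j] = key
                return j          # slot where the key ended up
        raise OverflowError("hash table is full")

    def search(self, key):
        for i in range(self.m):
            j = self._probe(key, i)
            if self.slots[j] is None:
                return None       # empty slot reached: key is absent
            if self.slots[j] == key:
                return j
        return None               # probed every slot without success


table = OpenAddressTable(5)
for key in (3, 8, 13):            # all start probing at slot 3
    print(key, "->", table.insert(key))
print(table.search(13))
print(table.search(18))
```

Deletion is left out on purpose: simply emptying a slot would cut the probe sequence for keys stored after it, so a real table needs a special "deleted" marker.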
Kinds of Open Addressing
• Linear probing: h(k, i) = (h'(k) + i) mod m
• Quadratic probing: h(k, i) = (h'(k) + c1 i + c2 i^2) mod m
• Double hashing: h(k, i) = (h1(k) + i h2(k)) mod m
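The three formulas can be compared by listing the slots each one probes for the same key. The sketch assumes h'(k) = h1(k) = k mod m, h2(k) = 1 + (k mod (m - 1)), and the constants c1 = 1, c2 = 3; none of these choices come from the slides, and i is the probe number 0, 1, …, m - 1.

```python
def linear_probe(k, i, m):
    # h(k, i) = (h'(k) + i) mod m
    return (k % m + i) % m


def quadratic_probe(k, i, m, c1=1, c2=3):
    # h(k, i) = (h'(k) + c1*i + c2*i^2) mod m
    return (k % m + c1 * i + c2 * i * i) % m


def double_hash_probe(k, i, m):
    # h(k, i) = (h1(k) + i*h2(k)) mod m
    h1 = k % m
    h2 = 1 + (k % (m - 1))   # never 0, so the sequence always moves
    return (h1 + i * h2) % m


def probe_sequence(probe, k, m):
    """Slots visited for key k, in probe order i = 0, 1, ..., m - 1."""
    return [probe(k, i, m) for i in range(m)]


m = 13
for probe in (linear_probe, quadratic_probe, double_hash_probe):
    print(probe.__name__, probe_sequence(probe, 14, m))
```

With m prime, the double-hashing sequence visits every slot exactly once, because h2(k) is always between 1 and m - 1. The quadratic sequence is not guaranteed to cover the whole table for arbitrary c1, c2, and m.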