Hashing Techniques and Applications

Hashing Jeff Chastine

Hashing
• Many applications require INSERT, SEARCH and DELETE operations
• Hashing supports all three in O(1) time on average
• Based on keys
• Falls under two general categories:
  • Direct-Address Tables
  • Hash Tables

Direct-Addressing
• Good when the universe U of keys is small
• U = {0, 1, …, m – 1}, where m is not large
• All elements have unique keys
• Table T[0..m – 1], where each slot corresponds to one key
• All operations take only O(1) time

Direct Implementation
[Figure: a direct-address table T with slots 0 to 9. The universe of keys is U = {0, 1, …, 9} and the actual keys are K = {2, 3, 5, 8}; slots 2, 3, 5 and 8 each hold an element (key plus satellite data).]

Direct-Addressing Operations
DIRECT-ADDRESS-SEARCH (T, k)
    return T[k]
DIRECT-ADDRESS-INSERT (T, x)
    T[key[x]] ← x
DIRECT-ADDRESS-DELETE (T, x)
    T[key[x]] ← NIL
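The three operations above translate almost line for line into Python. The following is a minimal sketch, not part of the original slides: the class name `DirectAddressTable`, the `(key, data)` tuple used for an element, and the sample keys are assumptions chosen for illustration.

```python
class DirectAddressTable:
    """Direct-address table over the universe {0, 1, ..., m - 1}."""

    def __init__(self, m):
        # One slot per possible key; None plays the role of NIL.
        self.slots = [None] * m

    def search(self, k):
        # DIRECT-ADDRESS-SEARCH: the key is the index.
        return self.slots[k]

    def insert(self, key, data):
        # DIRECT-ADDRESS-INSERT: store the element in the slot named by its key.
        self.slots[key] = (key, data)

    def delete(self, key):
        # DIRECT-ADDRESS-DELETE: reset the slot to NIL.
        self.slots[key] = None


# Hypothetical usage with universe {0, ..., 9} and keys {2, 3, 5, 8}.
table = DirectAddressTable(10)
for key in (2, 3, 5, 8):
    table.insert(key, f"data-{key}")
table.delete(5)
```

Each method is a single list index operation, which is why every operation costs O(1) regardless of how many elements are stored.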

Hash Tables
• What are potential problems with direct addressing?
  • |U| may be so large that a table of size |U| is impractical
  • The set of keys actually stored may be small relative to U
  • Example: SSNs
• Here, hash tables require much less storage
• Only catch: O(1) is the average-case time instead of the worst-case time!

How it works
• With direct addressing, an element with key k goes into slot k
• With hashing, it goes into slot h(k), where h is a hash function
• Hash functions try to “randomize”
• A hash function maps U to the slots of T[0..m – 1]:
  h : U → {0, 1, …, m – 1}
• Instead of |U| values, the table needs only m values

Hash Implementation
[Figure: a hash table T with slots 0 to m – 1. Keys k1, k2, k3, k4, k5 from K (actual keys), a subset of U (universe of keys), map to slots h(k1), h(k4), h(k2) = h(k5) and h(k3); k2 and k5 share a slot.]

Collisions
• A collision occurs when two keys hash to the same slot
• Because |U| > m, the pigeonhole principle guarantees that collisions must exist
• We often talk of the load factor α = n/m (n stored elements, m slots)
• Pick a good hash function
  • Near random, yet deterministic
• Can chain collisions together
  • This is where the worst case comes from
• Can use open addressing

Chaining
[Figure: collision resolution by chaining. Each slot of table T points to a list of the keys from K (actual keys) that hash to that slot; keys shown: k1, k2, k3, k4, k5, k7.]
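To make chaining concrete, here is a minimal Python sketch. It is an illustration rather than the slides' own implementation: the class name `ChainedHashTable`, the use of Python lists as chains, and the choice of `hash(key) % m` as the hash function are all assumptions.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining (illustrative sketch)."""

    def __init__(self, m=8):
        self.m = m                            # number of slots
        self.n = 0                            # number of stored elements
        self.slots = [[] for _ in range(m)]   # one chain per slot

    def _h(self, key):
        # Assumed hash function: Python's built-in hash reduced mod m.
        return hash(key) % self.m

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                      # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))
        self.n += 1

    def search(self, key):
        # Cost is proportional to the length of one chain.
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                self.n -= 1
                return True
        return False

    def load_factor(self):
        return self.n / self.m                # alpha = n / m


# Keys 3, 11 and 19 all reduce to slot 3 when m = 8, so they share one chain.
t = ChainedHashTable(m=8)
for key, value in ((3, "a"), (11, "b"), (19, "c")):
    t.insert(key, value)
```

If every key lands in the same chain, as in this usage example, a search degenerates into a linear scan of n elements, which is the worst case mentioned for chaining.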

Hash Functions
• What makes a good hash function?
  • Each key is equally likely to hash to any of the m slots
• If keys are random numbers uniformly distributed in [0, 1), then take h(k) = floor(k m)
• Convert strings to integers (e.g., via ASCII codes) before hashing?
• Most hash functions involve mod
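Two of the ideas above can be sketched in a few lines of Python. The function names `hash_unit_interval` and `string_to_int` are hypothetical, and treating a string as a radix-128 number is one common way to turn ASCII text into an integer key, assumed here for illustration.

```python
import math


def hash_unit_interval(k, m):
    """h(k) = floor(k * m) for a key k in [0, 1)."""
    if not 0 <= k < 1:
        raise ValueError("key must lie in [0, 1)")
    return math.floor(k * m)


def string_to_int(s, radix=128):
    """Interpret an ASCII string as an integer written in the given radix."""
    value = 0
    for ch in s:
        code = ord(ch)
        if code >= radix:
            raise ValueError("character outside the assumed ASCII range")
        value = value * radix + code
    return value
```

For example, `string_to_int("pt")` evaluates 112 · 128 + 116, because `p` and `t` have ASCII codes 112 and 116. The resulting integer can then be fed to any integer hash function, such as one based on mod.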

Hash Functions
• Division method:
  h(k) = k mod m
• Multiplication method: let 0 < A < 1
  h(k) = floor(m (k A mod 1))   // k A mod 1 is the fractional part of k A
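Both methods are short enough to write out directly. In this sketch the function names are hypothetical, and the default A = (√5 – 1)/2 ≈ 0.618 is an assumption: the slides only require 0 < A < 1, and this particular value is a commonly cited suggestion rather than something the slides prescribe.

```python
import math


def hash_division(k, m):
    """Division method: h(k) = k mod m."""
    return k % m


def hash_multiplication(k, m, A=(math.sqrt(5) - 1) / 2):
    """Multiplication method: h(k) = floor(m * (k*A mod 1))."""
    if not 0 < A < 1:
        raise ValueError("A must satisfy 0 < A < 1")
    frac = (k * A) % 1.0          # fractional part of k*A
    return math.floor(m * frac)
```

With the division method the choice of m matters, since h(k) depends only on k mod m. With the multiplication method the result always falls in 0..m – 1 because the fractional part is strictly less than 1. Floating-point rounding can affect the multiplication method for very large keys, so this sketch is best suited to moderately sized integers.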

Open Addressing
• Systematically examine, or probe, slots until the item is found (or an empty slot shows it is absent)
• No lists and no elements stored outside the table; thus α ≤ 1
• Instead of following pointers, we compute the sequence of slots to probe
• Instead of a fixed order, the probe sequence depends on the key
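The probing idea can be sketched as an insert and a search that walk the same computed sequence of slots. This is a minimal illustration, not the slides' code: the class name `OpenAddressTable` is hypothetical, the probe function assumes the simplest possible sequence (start at `hash(key) mod m` and step by one), and deletion is left out because it needs extra care in open addressing.

```python
class OpenAddressTable:
    """Open-addressing hash table storing keys directly in its slots (sketch)."""

    def __init__(self, m):
        self.m = m
        self.slots = [None] * m   # None marks an empty slot

    def _probe(self, key, i):
        # Assumption: the i-th probe is (hash(key) + i) mod m.
        return (hash(key) + i) % self.m

    def insert(self, key):
        # Probe until an empty slot is found; no duplicate check is made.
        for i in range(self.m):
            j = self._probe(key, i)
            if self.slots[j] is None:
                self.slots[j] = key
                return j
        raise OverflowError("hash table overflow")

    def search(self, key):
        # Follow the same probe sequence that insert used for this key.
        for i in range(self.m):
            j = self._probe(key, i)
            if self.slots[j] is None:
                return None       # empty slot: the key was never inserted
            if self.slots[j] == key:
                return j
        return None               # every slot probed without success


# Keys 10, 17 and 24 all start at slot 3 when m = 7.
table = OpenAddressTable(7)
placed = [table.insert(k) for k in (10, 17, 24)]
```

Because every element lives in the table itself, at most m keys fit, which is why the load factor cannot exceed 1.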

Kinds of Open Addressing
• Linear Probing
  h(k, i) = (h’(k) + i) mod m
• Quadratic Probing
  h(k, i) = (h’(k) + c1 i + c2 i²) mod m
• Double Hashing
  h(k, i) = (h1(k) + i h2(k)) mod m
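The three formulas can be compared by generating their probe sequences for one key. The sketch below makes several assumptions that the slides do not specify: the auxiliary hash functions are h’(k) = h1(k) = k mod m and h2(k) = 1 + (k mod (m – 1)), the quadratic constants default to c1 = c2 = 1, and all function names are hypothetical.

```python
def linear_probe(k, i, m):
    """h(k, i) = (h'(k) + i) mod m, with h'(k) = k mod m assumed."""
    return (k % m + i) % m


def quadratic_probe(k, i, m, c1=1, c2=1):
    """h(k, i) = (h'(k) + c1*i + c2*i^2) mod m, with h'(k) = k mod m assumed."""
    return (k % m + c1 * i + c2 * i * i) % m


def double_hash_probe(k, i, m):
    """h(k, i) = (h1(k) + i*h2(k)) mod m with assumed h1 and h2."""
    h1 = k % m
    h2 = 1 + (k % (m - 1))   # never zero, so the sequence always advances
    return (h1 + i * h2) % m


def probe_sequence(probe, k, m):
    """List the slots visited for key k on probes i = 0, 1, ..., m - 1."""
    return [probe(k, i, m) for i in range(m)]


# Compare the three schemes for key 14 in a table of m = 13 slots.
m = 13
linear = probe_sequence(linear_probe, 14, m)
quadratic = probe_sequence(quadratic_probe, 14, m)
double = probe_sequence(double_hash_probe, 14, m)
```

With a prime m, the double-hashing sequence here visits every slot exactly once, because the step h2(k) is relatively prime to m. The quadratic sequence with these particular constants is not guaranteed to cover the whole table, so c1, c2 and m generally have to be chosen together.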
