Hash Functions

Hash Functions Andy Wang Data Structures, Algorithms, and Generic Programming

Introduction • Hash function • Maps keys to integers (buckets) • Hash(Key) = Integer • Ideally in a random-like manner • Evenly distributed bucket values • Even if the input data is not evenly distributed

An Example • ID Number Generation • Key = your name • Hash(Key) = a number • Not a great hash function… • Two people with the same name will have the same number…

Simple Hash Functions • Assumptions: • K: an unsigned 32-bit integer • M: the number of buckets (the number of entries in a hash table) • Goal: • If a bit is changed in K, all bits are equally likely to change for Hash(K)

A Simple Hash Function… • What if K = M? • Hash(K) = K • What is wrong? • Your student ID = SSN • I can’t use your SSN to post your grades…

Another Simple Function • If K > M • Hash(K) = K % M • What is wrong? • Suppose M = 4, K = 2, 4, 6, 8 • K % M = 2, 0, 2, 0

Yet Another Simple Function • If K > P, P = prime number • Hash(K) = K % P • Suppose P = 3, K = 2, 4, 6, 8 • K % P = 2, 1, 0, 3 • More uniform distribution…but still problematic for other cases

More on Prime Numbers • K > P1 > P2, P1 and P2 are prime numbers • Hash(K) = (K % P1) % P2 • Suppose P1 = 5, P2 = 3, K = 2, 4, 6, 8, 10 • (K % 5) = 2, 4, 1, 3, 0 • (K % 5) % 3 = 2, 1, 1, 0, 0 • Still uniform distribution

Polynomial Functions • If K > P, P = prime number • Hash(K) = K(K + 3) % P • Slightly better than pure modulo functions

How About… • Hash(K) = rand() • What is wrong? • Not repeatable

How About… • K > P, P = prime number • Hash(K) = rand(K) % P • Better randomness • Can be expensive to compute random numbers

Pre-generated Randomness • Two prime numbers: P1 and P2 • K > P1 and K > P2 • A table R[P1], with R[i] pre-initialized to rand(i) % P2 • Hash(K) = R[K % P1] • Slight Problem: Possible duplicate mapping

To Avoid Duplicate Mapping… • Two prime numbers: P1 and P2 • K > P1 and K > P2 • A table R[P1], with R[i] pre-initialized to unique random numbers • Hash(K) = R[K % P1]

An Example • K = 0…232, P1 = 3, P2 = 5 • R[3] = {0, 4, 1} • Hash(K) = R[K % 3]

Hashing a Sequence of Keys • K = {K1, K2, …, Kn) • E.g., Hash(“test”) = 98157 • Design Principles • Use the entire key • Use the ordering information • Use pre-generated randomness

Use the Entire Key unsigned int Hash(const char *Key) { unsigned int hash = 0; for (unsigned int j = 0; j < K; j++) { hash = hash ^ Key[j] } return hash; } • Problem: Hash(“ab”) == Hash(“ba”)

Use the Ordering Information unsigned int Hash(const char *Key) { unsigned int hash = 0; for (unsigned int j = 0; j < K; j++) { hash = hash ^ Key[j] hash = /* hash with some shiftings */ } return hash; } • Problem: H(short keys) will not perturb all 32-bits (clustering)

Use Pre-generated Randomness unsigned int Hash(const char *Key) { unsigned int hash = 0; for (unsigned int j = 0; j < K; j++) { hash = hash ^ R[Key[j]] hash = /* hash with some shiftings */ } return hash; }

CRC Variant • Do 5-bit circular shift of hash • XOR hash and K[j] … for (…) { highorder = hash & 0xf8000000; hash = hash << 5; hash = hash ^ (highorder >> 27) hash = hash ^ K[j]; } …

CRC Variant + For long keys, all 32-bits are exercised + More randomness toward lower bits - Not all bits are changed for short keys

BUZ Hash • Set up an array R to store precomputed random numbers … for (…) { highorder = hash & 0x80000000; hash = hash << 1; hash = hash ^ (highorder >> 31) hash = hash ^ R[K[j]]; } …

References • Aho, Sethi, and Ullman. Compilers: Principles, Techniques, and Tools, 1986. • Cormen, Leiserson, River. Introduction to Algorithms, 1990 • Knuth. The Art of Computer Programming, 1973 • Kuenning. Hash Functions, 2003.

Hash Functions

Hash Functions

Presentation Transcript

Hash Functions

Cryptographic Hash Functions

Hash functions

Cryptographic Hash Functions

Hash Functions

Hash Functions

Hash Functions

Hash Functions

Designing Hash functions

Cryptographic Hash Functions

Hash Functions

Hash Functions

Cryptographic Hash Functions

Hash functions

Cryptographic Hash Functions

Cryptographic Hash Functions

Cryptographic Hash Functions

Hash Functions

Cryptographic Hash Functions

Cryptographic Hash Functions

Cryptographic Hash Functions