Hash Functions

1 / 22

# Hash Functions - PowerPoint PPT Presentation

Hash Functions. Andy Wang Data Structures, Algorithms, and Generic Programming. Introduction. Hash function Maps keys to integers (buckets) Hash(Key) = Integer Ideally in a random-like manner Evenly distributed bucket values Even if the input data is not evenly distributed. An Example.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## Hash Functions

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

### Hash Functions

Andy Wang

Data Structures, Algorithms, and Generic Programming

Introduction
• Hash function
• Maps keys to integers (buckets)
• Hash(Key) = Integer
• Ideally in a random-like manner
• Evenly distributed bucket values
• Even if the input data is not evenly distributed
An Example
• ID Number Generation
• Key = your name
• Hash(Key) = a number
• Not a great hash function…
• Two people with the same name will have the same number…
Simple Hash Functions
• Assumptions:
• K: an unsigned 32-bit integer
• M: the number of buckets (the number of entries in a hash table)
• Goal:
• If a bit is changed in K, all bits are equally likely to change for Hash(K)
A Simple Hash Function…
• What if K = M?
• Hash(K) = K
• What is wrong?
• Your student ID = SSN
• I can’t use your SSN to post your grades…
Another Simple Function
• If K > M
• Hash(K) = K % M
• What is wrong?
• Suppose M = 4, K = 2, 4, 6, 8
• K % M = 2, 0, 2, 0
Yet Another Simple Function
• If K > P, P = prime number
• Hash(K) = K % P
• Suppose P = 3, K = 2, 4, 6, 8
• K % P = 2, 1, 0, 3
• More uniform distribution…but still problematic for other cases
More on Prime Numbers
• K > P1 > P2, P1 and P2 are prime numbers
• Hash(K) = (K % P1) % P2
• Suppose P1 = 5, P2 = 3, K = 2, 4, 6, 8, 10
• (K % 5) = 2, 4, 1, 3, 0
• (K % 5) % 3 = 2, 1, 1, 0, 0
• Still uniform distribution
Polynomial Functions
• If K > P, P = prime number
• Hash(K) = K(K + 3) % P
• Slightly better than pure modulo functions
• Hash(K) = rand()
• What is wrong?
• Not repeatable
• K > P, P = prime number
• Hash(K) = rand(K) % P
• Better randomness
• Can be expensive to compute random numbers
Pre-generated Randomness
• Two prime numbers: P1 and P2
• K > P1 and K > P2
• A table R[P1], with R[i] pre-initialized to rand(i) % P2
• Hash(K) = R[K % P1]
• Slight Problem: Possible duplicate mapping
To Avoid Duplicate Mapping…
• Two prime numbers: P1 and P2
• K > P1 and K > P2
• A table R[P1], with R[i] pre-initialized to unique random numbers
• Hash(K) = R[K % P1]
An Example
• K = 0…232, P1 = 3, P2 = 5
• R[3] = {0, 4, 1}
• Hash(K) = R[K % 3]
Hashing a Sequence of Keys
• K = {K1, K2, …, Kn)
• E.g., Hash(“test”) = 98157
• Design Principles
• Use the entire key
• Use the ordering information
• Use pre-generated randomness
Use the Entire Key

unsigned int Hash(const char *Key) {

unsigned int hash = 0;

for (unsigned int j = 0; j < K; j++) {

hash = hash ^ Key[j]

}

return hash;

}

• Problem: Hash(“ab”) == Hash(“ba”)
Use the Ordering Information

unsigned int Hash(const char *Key) {

unsigned int hash = 0;

for (unsigned int j = 0; j < K; j++) {

hash = hash ^ Key[j]

hash = /* hash with some shiftings */

}

return hash;

}

• Problem: H(short keys) will not perturb all 32-bits (clustering)
Use Pre-generated Randomness

unsigned int Hash(const char *Key) {

unsigned int hash = 0;

for (unsigned int j = 0; j < K; j++) {

hash = hash ^ R[Key[j]]

hash = /* hash with some shiftings */

}

return hash;

}

CRC Variant
• Do 5-bit circular shift of hash
• XOR hash and K[j]

for (…) {

highorder = hash & 0xf8000000;

hash = hash << 5;

hash = hash ^ (highorder >> 27)

hash = hash ^ K[j];

}

CRC Variant

+ For long keys, all 32-bits are exercised

+ More randomness toward lower bits

- Not all bits are changed for short keys

BUZ Hash
• Set up an array R to store precomputed random numbers

for (…) {

highorder = hash & 0x80000000;

hash = hash << 1;

hash = hash ^ (highorder >> 31)

hash = hash ^ R[K[j]];

}

References
• Aho, Sethi, and Ullman. Compilers: Principles, Techniques, and Tools, 1986.
• Cormen, Leiserson, River. Introduction to Algorithms, 1990
• Knuth. The Art of Computer Programming, 1973
• Kuenning. Hash Functions, 2003.