hash functions n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Hash Functions PowerPoint Presentation
Download Presentation
Hash Functions

Loading in 2 Seconds...

play fullscreen
1 / 22

Hash Functions - PowerPoint PPT Presentation


  • 183 Views
  • Uploaded on

Hash Functions. Andy Wang Data Structures, Algorithms, and Generic Programming. Introduction. Hash function Maps keys to integers (buckets) Hash(Key) = Integer Ideally in a random-like manner Evenly distributed bucket values Even if the input data is not evenly distributed. An Example.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Hash Functions' - morna


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
hash functions

Hash Functions

Andy Wang

Data Structures, Algorithms, and Generic Programming

introduction
Introduction
  • Hash function
    • Maps keys to integers (buckets)
      • Hash(Key) = Integer
    • Ideally in a random-like manner
      • Evenly distributed bucket values
      • Even if the input data is not evenly distributed
an example
An Example
  • ID Number Generation
    • Key = your name
    • Hash(Key) = a number
  • Not a great hash function…
    • Two people with the same name will have the same number…
simple hash functions
Simple Hash Functions
  • Assumptions:
    • K: an unsigned 32-bit integer
    • M: the number of buckets (the number of entries in a hash table)
  • Goal:
    • If a bit is changed in K, all bits are equally likely to change for Hash(K)
a simple hash function
A Simple Hash Function…
  • What if K = M?
  • Hash(K) = K
  • What is wrong?
  • Your student ID = SSN
    • I can’t use your SSN to post your grades…
another simple function
Another Simple Function
  • If K > M
  • Hash(K) = K % M
  • What is wrong?
  • Suppose M = 4, K = 2, 4, 6, 8
  • K % M = 2, 0, 2, 0
yet another simple function
Yet Another Simple Function
  • If K > P, P = prime number
  • Hash(K) = K % P
  • Suppose P = 3, K = 2, 4, 6, 8
  • K % P = 2, 1, 0, 3
  • More uniform distribution…but still problematic for other cases
more on prime numbers
More on Prime Numbers
  • K > P1 > P2, P1 and P2 are prime numbers
  • Hash(K) = (K % P1) % P2
  • Suppose P1 = 5, P2 = 3, K = 2, 4, 6, 8, 10
  • (K % 5) = 2, 4, 1, 3, 0
  • (K % 5) % 3 = 2, 1, 1, 0, 0
  • Still uniform distribution
polynomial functions
Polynomial Functions
  • If K > P, P = prime number
  • Hash(K) = K(K + 3) % P
  • Slightly better than pure modulo functions
how about
How About…
  • Hash(K) = rand()
  • What is wrong?
  • Not repeatable
how about1
How About…
  • K > P, P = prime number
  • Hash(K) = rand(K) % P
  • Better randomness
  • Can be expensive to compute random numbers
pre generated randomness
Pre-generated Randomness
  • Two prime numbers: P1 and P2
  • K > P1 and K > P2
  • A table R[P1], with R[i] pre-initialized to rand(i) % P2
  • Hash(K) = R[K % P1]
  • Slight Problem: Possible duplicate mapping
to avoid duplicate mapping
To Avoid Duplicate Mapping…
  • Two prime numbers: P1 and P2
  • K > P1 and K > P2
  • A table R[P1], with R[i] pre-initialized to unique random numbers
  • Hash(K) = R[K % P1]
an example1
An Example
  • K = 0…232, P1 = 3, P2 = 5
  • R[3] = {0, 4, 1}
  • Hash(K) = R[K % 3]
hashing a sequence of keys
Hashing a Sequence of Keys
  • K = {K1, K2, …, Kn)
  • E.g., Hash(“test”) = 98157
  • Design Principles
    • Use the entire key
    • Use the ordering information
    • Use pre-generated randomness
use the entire key
Use the Entire Key

unsigned int Hash(const char *Key) {

unsigned int hash = 0;

for (unsigned int j = 0; j < K; j++) {

hash = hash ^ Key[j]

}

return hash;

}

  • Problem: Hash(“ab”) == Hash(“ba”)
use the ordering information
Use the Ordering Information

unsigned int Hash(const char *Key) {

unsigned int hash = 0;

for (unsigned int j = 0; j < K; j++) {

hash = hash ^ Key[j]

hash = /* hash with some shiftings */

}

return hash;

}

  • Problem: H(short keys) will not perturb all 32-bits (clustering)
use pre generated randomness
Use Pre-generated Randomness

unsigned int Hash(const char *Key) {

unsigned int hash = 0;

for (unsigned int j = 0; j < K; j++) {

hash = hash ^ R[Key[j]]

hash = /* hash with some shiftings */

}

return hash;

}

crc variant
CRC Variant
  • Do 5-bit circular shift of hash
  • XOR hash and K[j]

for (…) {

highorder = hash & 0xf8000000;

hash = hash << 5;

hash = hash ^ (highorder >> 27)

hash = hash ^ K[j];

}

crc variant1
CRC Variant

+ For long keys, all 32-bits are exercised

+ More randomness toward lower bits

- Not all bits are changed for short keys

buz hash
BUZ Hash
  • Set up an array R to store precomputed random numbers

for (…) {

highorder = hash & 0x80000000;

hash = hash << 1;

hash = hash ^ (highorder >> 31)

hash = hash ^ R[K[j]];

}

references
References
  • Aho, Sethi, and Ullman. Compilers: Principles, Techniques, and Tools, 1986.
  • Cormen, Leiserson, River. Introduction to Algorithms, 1990
  • Knuth. The Art of Computer Programming, 1973
  • Kuenning. Hash Functions, 2003.