Experiments with Hashing 15-451 Feb. 15, 2001


Presentation Transcript


  1. Experiments with Hashing 15-451 Feb. 15, 2001
     • Some Hash Functions
     • Bucket Size Distribution
     • Maximum Bucket Sizes
     http://www.cs.cmu.edu/~bryant

  2. Parameters
     • Keys
       • /usr/dict/words
       • N = 45,402 English words
       • From “Aarhus” to “Zurich”
       • 1–28 characters long
         • “antidisestablishmentarianism”
     • Hashing
       • Into M buckets
       • Load = N/M
       • 8 different hash functions

  3. Hash Functions
     • Key x = c1 c2 … clen(x)
     • Functions
       • h1(x) = c1 mod M
         • This is really bad!
         • Since there are only 52 possible characters
       • h2(x) = Σ ci mod M
         • Hashes “not” and “ton” to the same bucket
       • h3(x) = Σ (ai * ci) mod M
         • ai’s are random 22-bit numbers
         • This should be a good function
       • h4(x) = Σ (ai * ci + bi) mod M
         • ai’s, bi’s are random 22-bit numbers
         • This should be an even better function
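
The four functions above can be sketched in Python as follows. This is only an illustration under stated assumptions: each character ci is taken to be its code point via `ord()`, the "mod M" is applied to the whole sum, and the random coefficients come from a seeded `random.Random`. The names `make_h3`, `make_h4`, `MAX_LEN`, and `seed` are hypothetical and do not appear in the slides.

```python
import random

MAX_LEN = 28  # longest key length reported on the Parameters slide


def h1(key, M):
    """First character only, so very few distinct hash values are possible."""
    return ord(key[0]) % M


def h2(key, M):
    """Plain sum of character codes, which ignores character order."""
    return sum(ord(c) for c in key) % M


def make_h3(M, seed=0):
    """Build h3: a weighted sum with one random 22-bit multiplier per position."""
    rng = random.Random(seed)  # assumption: fixed seed, for repeatability only
    a = [rng.getrandbits(22) for _ in range(MAX_LEN)]

    def h3(key):
        # zip() stops at the shorter input, so keys longer than MAX_LEN
        # would have their tail ignored in this sketch.
        return sum(ai * ord(c) for ai, c in zip(a, key)) % M

    return h3


def make_h4(M, seed=0):
    """Build h4: like h3, plus one random 22-bit offset b_i per position."""
    rng = random.Random(seed)
    a = [rng.getrandbits(22) for _ in range(MAX_LEN)]
    b = [rng.getrandbits(22) for _ in range(MAX_LEN)]

    def h4(key):
        return sum(ai * ord(c) + bi for ai, bi, c in zip(a, b, key)) % M

    return h4
```

With these definitions, `h2("not", 128)` and `h2("ton", 128)` are equal because addition is commutative, which is the collision the slide points out.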

  4. More Hash Functions
     • h5(x) = Σ (ai * ci) mod M
       • ai’s are random 22-bit numbers
       • All sums & products computed modulo p = 524,287
       • This should be a good function
     • h6(x) = Σ (ai * ci + bi) mod M
       • ai’s, bi’s are random 22-bit numbers
       • All sums & products computed modulo p = 524,287
       • This should be the best function
     • h7(x) = h6(first 5 characters of x)
       • Hashes “botch”, “botches”, “botching”, and “botched” to the same bucket
     • h8(x) = random(0..M-1)
       • Not a real hash function
       • Should represent the ideal case
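
A possible reading of h6, h7, and h8 is sketched below; h5 is the same as h6 without the bi terms. The assumptions are that characters are mapped with `ord()`, that every product and running sum is reduced modulo p = 524,287 (which equals 2^19 - 1) before the final reduction modulo M, and that coefficients come from a seeded generator. The helper names `make_h6`, `make_h7`, and `seed` are hypothetical.

```python
import random

P = 524_287   # modulus from the slide; equal to 2**19 - 1
MAX_LEN = 28  # longest key length reported on the Parameters slide


def make_h6(M, seed=0):
    """Build h6: sum of (a_i * c_i + b_i), all arithmetic done modulo P."""
    rng = random.Random(seed)  # assumption: fixed seed, for repeatability only
    a = [rng.getrandbits(22) for _ in range(MAX_LEN)]
    b = [rng.getrandbits(22) for _ in range(MAX_LEN)]

    def h6(key):
        total = 0
        for ai, bi, c in zip(a, b, key):
            # Reduce the product and the running sum modulo P at every step.
            total = (total + (ai * ord(c)) % P + bi) % P
        return total % M

    return h6


def make_h7(M, seed=0):
    """Build h7: h6 applied to the first five characters only."""
    h6 = make_h6(M, seed=seed)

    def h7(key):
        return h6(key[:5])

    return h7


def h8(key, M, rng=random):
    """Not a real hash function: ignores the key and picks a random bucket."""
    return rng.randrange(M)
```

Because h7 sees only `key[:5]`, every key that starts with "botch" lands in the same bucket regardless of the coefficients chosen.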

  5. Bucket Size Distribution
     • Experiment
       • Hash 45,402 keys into 128 buckets
       • Load = 354.7
         • Average number of keys per bucket
     • Measure
       • Range of bucket sizes
       • Normalize as count/load
         • Average = 1.0
       • Determines how well hash function does at distributing keys
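
The measurement described above could be reproduced along these lines. This is a minimal sketch: `bucket_sizes` and `normalized_range` are hypothetical helper names, and the five-word key list in the usage lines is a stand-in for /usr/dict/words, not the slide's data.

```python
from collections import Counter


def bucket_sizes(keys, hash_fn, M):
    """Return a list of length M with the number of keys in each bucket."""
    counts = Counter(hash_fn(k) for k in keys)
    return [counts.get(b, 0) for b in range(M)]


def normalized_range(keys, hash_fn, M):
    """Return (min, max) bucket size divided by the load N/M."""
    sizes = bucket_sizes(keys, hash_fn, M)
    load = len(keys) / M  # average number of keys per bucket
    return min(sizes) / load, max(sizes) / load


# The slide's load figure: 45,402 keys in 128 buckets.
slide_load = 45_402 / 128  # about 354.7

# Toy usage with a character-sum hash (order-insensitive, like h2).
toy_keys = ["not", "ton", "cat", "act", "dog"]
toy_sizes = bucket_sizes(toy_keys, lambda k: sum(map(ord, k)) % 4, 4)
```

Dividing by the load makes results comparable across different values of M, since a perfectly even distribution always normalizes to 1.0.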

  6. Bucket Size Distribution Results

  7. Bucket Size Dist. Results (closeup)

  8. Distribution Observations
     • Load = 354.7
     • h1 is really bad
       • Only uses 52 buckets
       • Largest one has 4532 elements
     • h7 is pretty bad too
       • Good function, but only over first 5 characters
       • Largest has 529 elements
     • Rest look fairly decent
       • h2: 441 max. [Ignores order of characters]
       • h4: 428 max. [Why not better than h3?]
       • h6: 409 max. [Why not better than h5?]
       • h8: 403 max. [Random]
         • Hey! This should be the best!
       • h3: 402 max. [Mod p helps]
       • h5: 400 max. [Mod p helps]

  9. Maximum Bucket Size
     • Experiment
       • Hash 45,402 keys into M buckets
       • M powers of 2 from 128 to 65,536
       • Load 354.7 to 0.69
     • Measure
       • Maximum bucket size
       • Normalize as count/load
       • Determines worst case access time
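
One way to run this sweep is sketched below. The function name `max_bucket_sweep` and the `make_hash` callback (which takes M and returns a hash function of one key) are hypothetical, and the three-word key list is only a placeholder for the real word list.

```python
from collections import Counter


def max_bucket_sweep(keys, make_hash, exponents=range(7, 17)):
    """For M = 2**7 .. 2**16, return {M: (max bucket size, max size / load)}."""
    results = {}
    for e in exponents:
        M = 1 << e                    # 128, 256, ..., 65,536
        hash_fn = make_hash(M)
        counts = Counter(hash_fn(k) for k in keys)
        largest = max(counts.values())
        load = len(keys) / M
        results[M] = (largest, largest / load)
    return results


# Toy usage with a first-character hash in the style of h1.
toy_keys = ["apple", "avocado", "banana"]
toy_results = max_bucket_sweep(toy_keys, lambda M: (lambda k: ord(k[0]) % M))

# Load at the two ends of the slide's range.
load_low_M = 45_402 / 128      # about 354.7
load_high_M = 45_402 / 65_536  # about 0.69
```

In the toy run the largest bucket stays at 2 for every M, mirroring the observation that a first-character hash cannot benefit from more buckets.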

  10. Max. Bucket Size Results

  11. Max. Bucket Size Results (Closeup)

  12. Bucket Size Observations
     • h1 is really bad
       • Only uses 52 buckets
       • Largest one has 4532 elements, independent of M
     • h7 is pretty bad too
       • Good function, but only over first 5 characters
       • Largest bucket with M = 65,536 has 197 elements
     • h2 doesn’t do very well
       • Ignores order of characters
       • Largest bucket with M = 65,536 has 163 elements
     • Rest are comparable
       • 6–7 elements in largest bucket for M = 65,536
     • Compare to theory
       • When M = N, E[largest bucket size] = log N / log log N
       • For M = 65,536, this would be 16/4 = 4 (taking logs base 2)
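
The arithmetic in the theory comparison, and a quick random-placement check against it, can be sketched as follows. The formula is an order-of-growth estimate rather than an exact prediction, and the base-2 logarithm is taken from the slide's own 16/4 calculation. The names `slide_estimate` and `simulate_max_bucket` are hypothetical, and the simulation uses Python's seeded pseudo-random generator rather than any of h1 through h8.

```python
import math
import random
from collections import Counter


def slide_estimate(n):
    """log N / log log N, using base-2 logarithms as on the slide."""
    return math.log2(n) / math.log2(math.log2(n))


def simulate_max_bucket(n_keys, M, seed=0):
    """Throw n_keys into M buckets uniformly at random; return the fullest count."""
    rng = random.Random(seed)  # assumption: fixed seed, for repeatability only
    counts = Counter(rng.randrange(M) for _ in range(n_keys))
    return max(counts.values())


estimate = slide_estimate(65_536)  # log2 = 16, log2(16) = 4, so 16 / 4
```

Calling `simulate_max_bucket(45_402, 65_536)` gives one random trial at the slide's largest table size, which can be set beside the 6–7 elements reported for the better hash functions.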
