
Hash

Discrete Mathematics and Its Applications. Baojian Hua, bjhua@ustc.edu.cn


Presentation Transcript


  1. Hash Discrete Mathematics and Its Applications Baojian Hua bjhua@ustc.edu.cn

  2. Searching • A dictionary-like data structure • contains a collection of tuples: • <k1, v1>, <k2, v2>, … • keys are comparable and pairwise distinct • supports these operations: • new () • insert (dict, k, v) • lookup (dict, k) • delete (dict, k)
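As a baseline for the four operations above, here is a minimal sketch of a dictionary kept as an unsorted array of pairs. It assumes int keys and values and a fixed capacity; the names (`Dict`, `dict_new`, `dict_insert`, `dict_lookup`, `dict_delete`, `DICT_CAP`) are hypothetical. Every operation scans the array, so each costs O(n), which is the cost that hashing tries to avoid.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical baseline: an unsorted array of (key, value) pairs.
   Every operation scans the array, so each one costs O(n). */
#define DICT_CAP 64

typedef struct {
    int keys[DICT_CAP];
    int values[DICT_CAP];
    int size;
} Dict;

Dict *dict_new(void) {
    Dict *d = malloc(sizeof *d);
    if (d) d->size = 0;
    return d;
}

/* Returns the slot holding key k, or -1 if absent (linear scan). */
static int dict_find(const Dict *d, int k) {
    for (int i = 0; i < d->size; i++)
        if (d->keys[i] == k) return i;
    return -1;
}

/* Inserts or overwrites, keeping keys pairwise distinct;
   returns false only when the array is full. */
bool dict_insert(Dict *d, int k, int v) {
    int i = dict_find(d, k);
    if (i >= 0) { d->values[i] = v; return true; }
    if (d->size == DICT_CAP) return false;
    d->keys[d->size] = k;
    d->values[d->size] = v;
    d->size++;
    return true;
}

/* Stores the value in *out and returns true if k is present. */
bool dict_lookup(const Dict *d, int k, int *out) {
    int i = dict_find(d, k);
    if (i < 0) return false;
    *out = d->values[i];
    return true;
}

/* Removes k by moving the last pair into its slot. */
void dict_delete(Dict *d, int k) {
    int i = dict_find(d, k);
    if (i < 0) return;
    d->size--;
    d->keys[i] = d->keys[d->size];
    d->values[i] = d->values[d->size];
}
```

Overwriting on a repeated key is one design choice that keeps the "pairwise distinct" invariant; rejecting the insert would be another.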

  3. Examples

  4. Summary So Far

  5. What’s the Problem? • For every mapping (k, v) • after we insert it into the dictionary dict, we don’t know its position! • Ex: insert (d, “li”, 97), (d, “wang”, 99), (d, “zhang”, 100), … • and then lookup (d, “zhang”); where is (“zhang”, 100)?

  6. Basic Plan • Start from the array-based approach • Use an array A to hold the (k, v) pairs • For every key k: • if we know its position (array index) i from k • then lookup, insert and delete are simple: just access A[i] • each is done in constant time O(1)
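The plan above works directly when the key already is a small array index. A minimal sketch of such a direct-address table, assuming every key is an int in [0, DT_SIZE); the names (`DirectTable`, `dt_*`, `DT_SIZE`) are hypothetical. Each operation touches only A[k], so each is O(1).

```c
#include <stdbool.h>

/* Direct addressing: assumes every key is an int in [0, DT_SIZE),
   so the key itself is the array index. */
#define DT_SIZE 128

typedef struct {
    bool used[DT_SIZE];   /* used[k]: does slot k hold a pair? */
    int value[DT_SIZE];
} DirectTable;

void dt_init(DirectTable *t) {
    for (int i = 0; i < DT_SIZE; i++) t->used[i] = false;
}

/* O(1): write A[k] directly. */
void dt_insert(DirectTable *t, int k, int v) {
    t->used[k] = true;
    t->value[k] = v;
}

/* O(1): read A[k] directly; returns false if k is absent. */
bool dt_lookup(const DirectTable *t, int k, int *out) {
    if (!t->used[k]) return false;
    *out = t->value[k];
    return true;
}

/* O(1): mark A[k] as empty. */
void dt_delete(DirectTable *t, int k) {
    t->used[k] = false;
}
```

This only works because the keys are already indexes; string keys such as “li” have no position, which is exactly the problem the next slides raise.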

  7. Example • Ex: insert (d, “li”, 97), (d, “wang”, 99), (d, “zhang”, 100), …; and then lookup (d, “zhang”); • Problem #1: How do we calculate the index from the given key?

  8. Example • Ex: insert (d, “li”, 97), (d, “wang”, 99), (d, “zhang”, 100), …; and then lookup (d, “zhang”); • Problem #2: How long should the array be?

  9. Basic Plan • Save the (k, v) pairs in an array, with index i calculated from key k • Hash function: a method for computing an index from a given key, e.g. hash (“li”) gives the index for (“li”, 97)

  10. Hash Function • Given any key, compute an index • Efficiently computable • Ideal goals: indexes are uniformly distributed over the keys • different keys map to different indexes • However, designing such a function is a hard research problem :-( • Next, we assume that the array has infinite length, so the hash function has type: • int hash (key k); • To get some idea, we next perform a “case analysis” on how different key types affect “hash”

  11. Hash Function On “int”
      // If the key is of type "int", the hash
      // function is trivial:
      int hash (int i) { return i; }

  12. Hash Function On “char”
      // If the key is of type "char", the hash
      // function is just a type conversion:
      int hash (char c) { return c; }

  13. Hash Function On “float”
      // Also a type conversion:
      int hash (float f) { return (int)f; }
      // How do we deal with 0.aaa, say 0.5? The cast truncates
      // every value in (-1, 1) to 0, so they all collide.
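One common answer to the 0.5 question (Java's `Float.hashCode`, for instance, returns the float's bit pattern) is to hash the raw bits instead of the truncated value. A minimal sketch, assuming a 32-bit IEEE 754 `float`; the name `hash_float` is hypothetical. Because 0.0f and -0.0f compare equal but have different bits, the sketch normalizes them first so equal keys get equal hashes.

```c
#include <stdint.h>
#include <string.h>

/* Assumes float is 32 bits (IEEE 754 single precision). */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hash a float by its bit pattern, so 0.5f and 0.75f no longer
   both map to 0 as they do with (int)f. */
int hash_float(float f) {
    if (f == 0.0f) f = 0.0f;          /* merge +0.0f and -0.0f (equal keys) */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* reinterpret bytes without aliasing UB */
    return (int)bits;                 /* values above INT_MAX convert in an
                                         implementation-defined way; the later
                                         index step clears the sign bit anyway */
}
```

For example, 1.0f has the bit pattern 0x3f800000, so `hash_float(1.0f)` is 1065353216 on such a platform.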

  14. Hash Function On “string”
      // Example: "BillG":
      // A trivial one, but not so good (it ignores character order):
      int hash (char *s) {
          int i = 0, sum = 0;
          while (s[i]) {
              sum += s[i];
              i++;
          }
          return sum;
      }
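Because the summing hash ignores character order, anagrams such as "BillG" and "GillB" always collide. A sketch of one common alternative, a polynomial hash computed as h = 31*h + c (the same recurrence Java's `String.hashCode` uses). The names `hash_sum` and `hash_poly` are hypothetical, and unsigned arithmetic is an assumption made here because signed overflow is undefined in C.

```c
/* The slide's summing hash, in unsigned arithmetic: order-insensitive. */
unsigned hash_sum(const char *s) {
    unsigned sum = 0;
    for (; *s; s++) sum += (unsigned char)*s;
    return sum;
}

/* Polynomial hash: each step multiplies the previous result by 31,
   so a character's contribution depends on its position.
   Unsigned overflow simply wraps modulo 2^32 (well defined in C). */
unsigned hash_poly(const char *s) {
    unsigned h = 0;
    for (; *s; s++) h = 31u * h + (unsigned char)*s;
    return h;
}
```

For "ab" this gives 97 * 31 + 98 = 3105. The multiplier 31 is a conventional choice, not a requirement; any odd multiplier that spreads the bits reasonably would serve.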

  15. Hash Function On “Point”
      // Suppose we have a user-defined type:
      struct Point2d {
          int x;
          int y;
      };
      int hash (struct Point2d pt) {
          // ???
      }
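One possible way to fill in the `// ???`, offered as a hedged sketch rather than the lecture's answer: fold each field into an accumulator with a multiply-and-add step, so that (1, 2) and (2, 1) get different codes. The name `hash_point` and the constants 17 and 31 are assumptions; unsigned arithmetic avoids signed-overflow undefined behavior.

```c
struct Point2d { int x; int y; };

/* Combine the field values: h = 31*h + field, starting from 17.
   Field order matters, unlike x + y (which maps (1,2) and (2,1) together). */
int hash_point(struct Point2d pt) {
    unsigned h = 17u;
    h = 31u * h + (unsigned)pt.x;
    h = 31u * h + (unsigned)pt.y;
    return (int)h;   /* small results fit in int unchanged */
}
```

For (1, 2): 31 * 17 + 1 = 528, then 31 * 528 + 2 = 16370. The same pattern extends to any struct: hash each field with a function suited to its type and fold the results in.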

  16. From “int” Hash to Index • Recall the type: • int hash (T data); • Problems with the “int” return type: • at any time, the array is finite • there is no negative index (say -10) • Our goal: • map int i ==> [0, N-1] • OK, that’s easy! It’s just: abs(i) % N

  17. Bug! • Note that the range of “int” is -2^31 ~ 2^31-1 • So abs(-2^31) would be 2^31, which does not fit in an int • Overflow! (In C, abs(INT_MIN) is undefined behavior.) • The key step is to clear the sign bit: int t = i & 0x7fffffff; int hc = t % N; • In summary: hc = (i & 0x7fffffff) % N;
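The summary formula as a small function, with a check that the problematic value -2^31 now lands in range. A sketch assuming two's-complement `int` (universal in practice and required since C23) and N > 0; the name `to_index` is hypothetical.

```c
/* Map any int hash code into [0, N-1] by clearing the sign bit first.
   Unlike abs(i) % N, this is well defined even for i == INT_MIN.
   Assumes two's-complement ints and N > 0. */
int to_index(int i, int N) {
    int t = i & 0x7fffffff;   /* t is in [0, 2^31 - 1], never negative */
    return t % N;
}
```

Worked values with N = 16: -2^31 has only the sign bit set, so t = 0 and the index is 0; -10 becomes 0x7ffffff6 = 2^31 - 10, and since 2^31 is a multiple of 16 the index is 16 - 10 = 6. Note this differs from abs(-10) % 16 = 10, which is fine: any fixed, in-range mapping works.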

  18. Collision • Given two keys k1 and k2, we compute two hash codes hc1, hc2 ∈ [0, N-1] • If k1 != k2 but hc1 == hc2, then a collision occurs: (k1, v1) and (k2, v2) both map to the same index i

  19. Collision Resolution • Open Addressing • Re-hash • Chaining (Multi-map)
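The slides develop chaining below; for contrast, here is a hedged sketch of the first strategy listed, open addressing, using linear probing (one common probing rule among several). On a collision at slot i it tries i+1, i+2, … (wrapping around) until it finds the key or an empty slot. It assumes int keys, a fixed capacity and no deletion; the names (`ProbeTable`, `pt_*`, `PT_CAP`) are hypothetical.

```c
#include <stdbool.h>

#define PT_CAP 8

typedef struct {
    bool used[PT_CAP];
    int key[PT_CAP];
    int value[PT_CAP];
} ProbeTable;

/* Home slot, using the sign-bit-clearing index formula. */
static int slot_of(int k) { return (k & 0x7fffffff) % PT_CAP; }

void pt_init(ProbeTable *t) {
    for (int i = 0; i < PT_CAP; i++) t->used[i] = false;
}

/* Probe from the home slot until an empty slot or the same key.
   Returns false if the table is full and k is not present. */
bool pt_insert(ProbeTable *t, int k, int v) {
    int i = slot_of(k);
    for (int n = 0; n < PT_CAP; n++, i = (i + 1) % PT_CAP) {
        if (!t->used[i] || t->key[i] == k) {
            t->used[i] = true;
            t->key[i] = k;
            t->value[i] = v;
            return true;
        }
    }
    return false;
}

/* An empty slot ends the search: without deletion, k would have been
   placed no later than the first empty slot on its probe path. */
bool pt_lookup(const ProbeTable *t, int k, int *out) {
    int i = slot_of(k);
    for (int n = 0; n < PT_CAP && t->used[i]; n++, i = (i + 1) % PT_CAP) {
        if (t->key[i] == k) { *out = t->value[i]; return true; }
    }
    return false;
}
```

For example, keys 3 and 11 both have home slot 3 when PT_CAP is 8, so 11 ends up in slot 4. Supporting deletion would need "tombstone" markers so that removing a key does not cut off the probe path of keys stored after it.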

  20. Chaining • For a collision index i, we keep a separate linear list (chain) at index i that holds every pair whose key maps to i, e.g. both (k1, v1) and (k2, v2)

  21. General Scheme [figure: a bucket array whose chains hold the keys k8, k1, k5, k43, k2]

  22. Load Factor • loadFactor = numItems / numBuckets • defaultLoadFactor: default value of the load factor; when the load factor reaches it, insert extends the buckets and re-hashes the items

  23. “hash” ADT: interface
      #ifndef HASH_H
      #define HASH_H

      typedef void *poly;
      typedef poly key;
      typedef poly value;
      typedef struct hashStruct *hash;

      hash newHash ();
      hash newHash2 (double lf);
      void insert (hash h, key k, value v);
      poly lookup (hash h, key k);
      void delete (hash h, key k);

      #endif

  24. Hash Implementation
      #include "hash.h"

      #define EXT_FACTOR 2
      #define INIT_BUCKETS 16

      struct hashStruct {
          linkedList *buckets;
          int numBuckets;
          int numItems;
          double loadFactor;
      };

  25. In Figure [figure: hash h with fields buckets, numBuckets, numItems and loadFactor; the chains hold the keys k8, k1, k5, k43, k2]

  26. “newHash ()”
      hash newHash () {
          hash h = (hash)malloc (sizeof (*h));
          h->buckets = malloc (INIT_BUCKETS * sizeof (linkedList));
          for (…) {
              // init the array
          }
          h->numBuckets = INIT_BUCKETS;
          h->numItems = 0;
          h->loadFactor = 0.25;
          return h;
      }

  27. “newHash2 ()”
      hash newHash2 (double lf) {
          hash h = (hash)malloc (sizeof (*h));
          h->buckets = (linkedList *)malloc (INIT_BUCKETS * sizeof (linkedList));
          for (…) {
              // init the array
          }
          h->numBuckets = INIT_BUCKETS;
          h->numItems = 0;
          h->loadFactor = lf;
          return h;
      }

  28. “lookup (hash, key)”
      value lookup (hash h, key k) {
          int i = k->hashCode (); // how to perform this?
          int hc = (i & 0x7fffffff) % (h->numBuckets);
          value t = linkedListSearch ((h->buckets)[hc], k);
          return t;
      }

  29. Ex: lookup (ha, k43)
      hc = (hash (k43) & 0x7fffffff) % 8;  // hc = 1
      [figure: table ha and its buckets; the chains hold the keys k8, k1, k5, k43, k2]

  30. Ex: lookup (ha, k43)
      hc = (hash (k43) & 0x7fffffff) % 8;  // hc = 1
      compare k43 with k8: no match, move to the next node

  31. Ex: lookup (ha, k43)
      hc = (hash (k43) & 0x7fffffff) % 8;  // hc = 1
      compare k43 with k43: found!

  32. “insert”
      void insert (hash h, poly k, poly v) {
          if (1.0 * h->numItems / h->numBuckets >= h->loadFactor) {
              // buckets extension & items re-hash
          }
          int i = k->hashCode (); // how to perform this?
          int hc = (i & 0x7fffffff) % (h->numBuckets);
          tuple t = newTuple (k, v);
          linkedListInsertHead ((h->buckets)[hc], t);
          h->numItems++;  // keep the load factor up to date
          return;
      }

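  33. Ex: insert (ha, k13)
      hc = (hash (k13) & 0x7fffffff) % 8;  // suppose hc == 4
      [figure: table ha and its buckets before the insertion; the chains hold k8, k1, k5, k43, k2]

  34. Ex: insert (ha, k13)
      hc = (hash (k13) & 0x7fffffff) % 8;  // suppose hc == 4
      [figure: ha after k13 is inserted at the head of chain 4; the chains hold k8, k13, k5, k43, k1, k2]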
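To show the whole scheme end to end, here is a self-contained sketch of chaining with the elided "buckets extension & items re-hash" step filled in by one possible choice: double the bucket count (matching EXT_FACTOR 2 above) and re-link every node into its new chain. It deliberately replaces the generic `poly`/`linkedList` types with concrete ones as an assumption: C-string keys, int values, and a polynomial string hash (h = 31*h + c). All names (`HashTable`, `ht_*`, `str_hash`, `bucket_of`) are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct Node {
    const char *key;      /* not copied: the caller keeps it alive */
    int value;
    struct Node *next;
} Node;

typedef struct {
    Node **buckets;       /* buckets[i] is the head of chain i */
    int numBuckets;
    int numItems;
    double loadFactor;
} HashTable;

static unsigned str_hash(const char *s) {
    unsigned h = 0;
    for (; *s; s++) h = 31u * h + (unsigned char)*s;
    return h;
}

/* (hash & 0x7fffffff) % n, as on the slides. */
static int bucket_of(const char *k, int n) {
    return (int)((str_hash(k) & 0x7fffffffu) % (unsigned)n);
}

HashTable *ht_new(double lf) {
    HashTable *h = malloc(sizeof *h);
    if (!h) return NULL;
    h->numBuckets = 16;
    h->buckets = calloc(h->numBuckets, sizeof *h->buckets); /* all chains empty */
    if (!h->buckets) { free(h); return NULL; }
    h->numItems = 0;
    h->loadFactor = lf;
    return h;
}

/* Doubles the bucket array and re-hashes every node into it. */
static bool ht_grow(HashTable *h) {
    int n = h->numBuckets * 2;
    Node **nb = calloc(n, sizeof *nb);
    if (!nb) return false;
    for (int i = 0; i < h->numBuckets; i++) {
        Node *p = h->buckets[i];
        while (p) {
            Node *next = p->next;
            int j = bucket_of(p->key, n);
            p->next = nb[j];          /* head insertion into the new chain */
            nb[j] = p;
            p = next;
        }
    }
    free(h->buckets);
    h->buckets = nb;
    h->numBuckets = n;
    return true;
}

/* Walks one chain comparing keys; returns the node or NULL. */
Node *ht_find(const HashTable *h, const char *k) {
    for (Node *p = h->buckets[bucket_of(k, h->numBuckets)]; p; p = p->next)
        if (strcmp(p->key, k) == 0) return p;
    return NULL;
}

/* Inserts or overwrites; returns false on allocation failure. */
bool ht_insert(HashTable *h, const char *k, int v) {
    Node *p = ht_find(h, k);
    if (p) { p->value = v; return true; }
    if ((double)h->numItems / h->numBuckets >= h->loadFactor)
        ht_grow(h);                   /* if growth fails, chains just get longer */
    p = malloc(sizeof *p);
    if (!p) return false;
    int i = bucket_of(k, h->numBuckets);
    p->key = k;
    p->value = v;
    p->next = h->buckets[i];          /* head insertion, as on the slide */
    h->buckets[i] = p;
    h->numItems++;
    return true;
}

void ht_free(HashTable *h) {
    for (int i = 0; i < h->numBuckets; i++)
        for (Node *p = h->buckets[i], *next; p; p = next) { next = p->next; free(p); }
    free(h->buckets);
    free(h);
}
```

With load factor 0.25 and 16 buckets, the check 4 / 16 >= 0.25 fires on the fifth distinct insert, so the table grows to 32 buckets. Overwriting an existing key skips the growth check because the item count does not change.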

  35. Complexity
