Hash · Discrete Mathematics and Its Applications · Baojian Hua, bjhua@ustc.edu.cn
Searching • A dictionary-like data structure • contains a collection of tuples: • <k1, v1>, <k2, v2>, … • keys are comparable and pairwise distinct • supports these operations: • new () • insert (dict, k, v) • lookup (dict, k) • delete (dict, k)
What’s the Problem? • For every mapping (k, v): • after we insert it into the dictionary dict, we don’t know its position! • Ex: insert (d, “li”, 97), (d, “wang”, 99), (d, “zhang”, 100), … • and then lookup (d, “zhang”); where is (“zhang”, 100)? [Figure: the pairs (“li”, 97), (“wang”, 99), (“zhang”, 100) stored at unknown positions]
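To make the problem concrete, here is a minimal sketch (not from the slides) of the naive alternative: an unsorted array of pairs with string keys and int values. The names arrayDict, arrayDictInsert, arrayDictLookup and the capacity MAX_PAIRS are hypothetical. Since the key says nothing about where its pair sits, lookup has to scan the array, costing O(n).

```c
#include <string.h>

#define MAX_PAIRS 100   /* hypothetical fixed capacity */

struct pair {
    const char *key;
    int value;
};

struct arrayDict {
    struct pair pairs[MAX_PAIRS];
    int size;            /* number of pairs stored so far */
};

/* Append at the end: the position has nothing to do with the key.
   Returns 0 on success, -1 if the array is full. */
int arrayDictInsert (struct arrayDict *d, const char *k, int v) {
    if (d->size == MAX_PAIRS)
        return -1;
    d->pairs[d->size].key = k;
    d->pairs[d->size].value = v;
    d->size++;
    return 0;
}

/* Scan every pair until the key matches: O(n) time.
   Stores the value in *out and returns 1 if found, otherwise returns 0. */
int arrayDictLookup (const struct arrayDict *d, const char *k, int *out) {
    for (int i = 0; i < d->size; i++) {
        if (strcmp (d->pairs[i].key, k) == 0) {
            *out = d->pairs[i].value;
            return 1;
        }
    }
    return 0;
}
```

Looking up “zhang” after the three insertions in the example compares against “li” and “wang” first; with n pairs the worst case is n comparisons.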
Basic Plan • Start from the array-based approach • Use an array A to hold the (k, v) pairs • For every key k: • if we can compute its position (array index) i from k, • then lookup, insert and delete are simple: just access A[i] • each is done in constant time O(1)
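The plan works directly in the easiest case. A minimal sketch, assuming keys are already small non-negative ints below a fixed TABLE_SIZE, so that the key itself serves as the index i (a direct-address table). The names directTable, directInsert, directLookup and directDelete are hypothetical. Computing i from arbitrary keys, and choosing the array length, are exactly what this sketch assumes away.

```c
#include <stdbool.h>

#define TABLE_SIZE 128   /* assumption: every key k satisfies 0 <= k < TABLE_SIZE */

struct directTable {
    bool used[TABLE_SIZE];   /* used[i] is true if slot i holds a pair */
    int values[TABLE_SIZE];  /* values[i] is the value stored under key i */
};

/* Each operation touches only slot A[k], so it runs in O(1) time.
   The caller must guarantee that k is in range; no bounds check is done. */
void directInsert (struct directTable *t, int k, int v) {
    t->used[k] = true;
    t->values[k] = v;
}

/* Stores the value in *out and returns true if key k is present. */
bool directLookup (const struct directTable *t, int k, int *out) {
    if (!t->used[k])
        return false;
    *out = t->values[k];
    return true;
}

void directDelete (struct directTable *t, int k) {
    t->used[k] = false;
}
```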
Example • Ex: insert (d, “li”, 97), (d, “wang”, 99), (d, “zhang”, 100), …; and then lookup (d, “zhang”); • Problem #1: how do we calculate the index from a given key? • Problem #2: how long should the array be?
Basic Plan • Save the (k, v) pairs in an array, at an index i calculated from key k • Hash function: a method for computing an index from a given key [Figure: (“li”, 97) is stored at index hash (“li”)]
Hash Function • Given any key, compute an index • Efficiently computable • Ideal goals: indexes are spread uniformly over the array • different keys map to different indexes • However, designing such a function is a hard research problem :-( • For now, we assume the array has infinite length, so the hash function has type: • int hash (key k); • To get some intuition, we next do a “case analysis” of how different key types affect “hash”
Hash Function On “int”
// If the key of hash is of “int” type, the hash
// function is trivial:
int hash (int i)
{
  return i;
}
Hash Function On “char”
// If the key of hash is of “char” type, the hash
// function comes with type conversion:
int hash (char c)
{
  return c;
}
Hash Function On “float”
// Also type conversion:
int hash (float f)
{
  return (int)f;
}
// Problem: every value 0.aaa (say 0.5) truncates to 0.
// How do we deal with such values?
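One common answer, sketched here as an assumption rather than the slides' solution: hash the float's bit pattern instead of its truncated value. This sketch assumes float is a 32-bit IEEE 754 type; hashFloatTrunc and hashFloatBits are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

_Static_assert (sizeof (float) == sizeof (uint32_t), "sketch assumes a 32-bit float");

/* Truncation, as on the slide: every float in (-1, 1) maps to 0. */
int hashFloatTrunc (float f) {
    return (int)f;
}

/* Hash the raw bit pattern, so 0.5f and 0.25f get different codes.
   memcpy copies the bytes without violating aliasing rules. */
int hashFloatBits (float f) {
    uint32_t bits;
    memcpy (&bits, &f, sizeof bits);
    /* Clearing the sign bit keeps the result non-negative; it also makes
       0.0f and -0.0f (which compare equal) share one hash code. */
    return (int)(bits & 0x7fffffff);
}
```

For example, 0.5f has bit pattern 0x3F000000 while 0.25f has 0x3E800000, so the two no longer collide. The price of clearing the sign bit is that f and -f always collide.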
Hash Function On “string”
// Example: “BillG”:
// A trivial one, but not so good (anagrams such as
// “ab” and “ba” always collide):
int hash (char *s)
{
  int i = 0, sum = 0;
  while (s[i]) {
    sum += s[i];
    i++;
  }
  return sum;
}
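To see why summing is weak, compare it with a position-dependent hash. A minimal sketch, assuming the widely used multiply-by-31 scheme (the same multiplier as Java's String.hashCode), h = h * 31 + c; hashSum and hashPoly are hypothetical names.

```c
/* The slide's version: adds up character codes, so "ab" and "ba" collide. */
int hashSum (const char *s) {
    int i = 0, sum = 0;
    while (s[i]) {
        sum += s[i];
        i++;
    }
    return sum;
}

/* Polynomial hash: each character's contribution depends on its position.
   Unsigned arithmetic wraps around modulo 2^32 instead of overflowing. */
int hashPoly (const char *s) {
    unsigned int h = 0;
    for (const char *p = s; *p; p++)
        h = h * 31u + (unsigned char)*p;
    return (int)(h & 0x7fffffff);   /* the masked value fits in int */
}
```

With ASCII codes 'a' = 97 and 'b' = 98, hashSum gives 195 for both strings, whereas hashPoly gives 97 * 31 + 98 = 3105 for "ab" and 98 * 31 + 97 = 3135 for "ba".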
Hash Function On “Point”
// Suppose we have a user-defined type:
struct Point2d
{
  int x;
  int y;
};

int hash (struct Point2d pt)
{
  // ???
}
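One possible way to fill in the “???”, offered as an assumption and not as the slides' answer: fold each field into the hash with the same multiply-by-31 step commonly used for strings, so that field order matters. The seed 17 and the name hashPoint are arbitrary choices made for illustration.

```c
struct Point2d { int x; int y; };

/* Combine the fields one at a time: h = h * 31 + field.
   Converting int to unsigned is well defined (it wraps modulo 2^32). */
int hashPoint (struct Point2d pt) {
    unsigned int h = 17;                 /* arbitrary non-zero seed */
    h = h * 31u + (unsigned int)pt.x;
    h = h * 31u + (unsigned int)pt.y;
    return (int)(h & 0x7fffffff);
}
```

Here (1, 2) hashes to (17 * 31 + 1) * 31 + 2 = 16370, and (2, 1) hashes to 16400. Simply adding x + y would map both points to 3.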
From “int” Hash to Index • Recall the type: • int hash (T data); • Problems with the “int” return type: • at any time, the array is finite • there is no negative index (say -10) • Our goal: • map int i into [0, N-1] • OK, that’s easy! It’s just: abs(i) % N
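Bug! • Note that a (32-bit) “int” ranges from $-2^{31}$ to $2^{31}-1$ • So abs($-2^{31}$) would be $2^{31}$, which is not representable • Overflow! (in C, abs(INT_MIN) is undefined behavior) • The key step is to wipe off the sign bit: int t = i & 0x7fffffff; int hc = t % N; • In summary: hc = (i & 0x7fffffff) % N;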
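The summary formula can be packaged as a small helper. A minimal sketch, assuming 32-bit two's-complement int and N > 0; hashIndex is a hypothetical name.

```c
/* Map any int hash code to a bucket index in [0, n-1], for n > 0.
   Masking off the sign bit never overflows, unlike abs(INT_MIN). */
int hashIndex (int i, int n) {
    return (i & 0x7fffffff) % n;
}
```

Note that masking is not the same as abs: with n = 16, -10 becomes 0x7FFFFFF6 and lands in bucket 6, not 10, and INT_MIN becomes 0. That is fine, because the only requirement is a valid, deterministic index. The same helper also shows a collision: 3 and 19 both map to bucket 3 when n = 16.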
Collision • Given two keys k1 and k2, we compute two hash codes hc1, hc2 ∈ [0, N-1] • If k1 ≠ k2 but hc1 == hc2, then a collision occurs: both (k1, v1) and (k2, v2) map to the same index i
Collision Resolution • Open Addressing • Re-hash • Chaining (Multi-map)
Chaining • For each index i, we keep a separate linked list (chain) holding every pair whose key hashes to i [Figure: (k1, v1) and (k2, v2) collide at index i, and both sit in the chain at index i]
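A single chain can be sketched as a singly linked list of (k, v) nodes. This sketch assumes int keys and values. The names node, chainInsertHead, chainSearch and chainFree are hypothetical, standing in for the generic linkedList module that the slides use.

```c
#include <stdlib.h>

struct node {
    int key;
    int value;
    struct node *next;
};

/* Insert at the head of the chain in O(1).
   Returns 0 on success, -1 if malloc fails (chain unchanged). */
int chainInsertHead (struct node **head, int k, int v) {
    struct node *n = malloc (sizeof *n);
    if (n == NULL)
        return -1;
    n->key = k;
    n->value = v;
    n->next = *head;
    *head = n;
    return 0;
}

/* Walk the chain comparing keys: O(length of the chain). */
struct node *chainSearch (struct node *head, int k) {
    for (struct node *p = head; p != NULL; p = p->next)
        if (p->key == k)
            return p;
    return NULL;
}

/* Release every node in the chain. */
void chainFree (struct node *head) {
    while (head != NULL) {
        struct node *next = head->next;
        free (head);
        head = next;
    }
}
```

For example, with 16 buckets, keys 3 and 19 collide at index 3. Both can live in that bucket's chain, and each is still found by comparing keys.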
General Scheme [Figure: a bucket array whose chains hold the keys k8, k1, k5, k43, k2]
Load Factor • loadFactor = numItems / numBuckets • defaultLoadFactor: the default threshold for the load factor; once numItems / numBuckets reaches it, the bucket array is extended
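The threshold test can be written as one line. A minimal sketch, with needsExtension as a hypothetical name. The 1.0 * factor forces floating-point division, because integer division would give 0 whenever numItems < numBuckets.

```c
#include <stdbool.h>

/* True when the current load factor has reached the threshold. */
bool needsExtension (int numItems, int numBuckets, double loadFactor) {
    return 1.0 * numItems / numBuckets >= loadFactor;
}
```

With 16 buckets and a threshold of 0.25, the table is still fine at 3 items (3/16 = 0.1875) but should grow once it holds 4 (4/16 = 0.25). After doubling to 32 buckets, the same 4 items give only 0.125.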
“hash” ADT: interface
#ifndef HASH_H
#define HASH_H

typedef void *poly;
typedef poly key;
typedef poly value;
typedef struct hashStruct *hash;

hash newHash (void);
hash newHash2 (double lf);
void insert (hash h, key k, value v);
poly lookup (hash h, key k);
void delete (hash h, key k);

#endif
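Hash Implementation
#include <stdlib.h>
#include "hash.h"

#define EXT_FACTOR 2      // the bucket array grows by this factor
#define INIT_BUCKETS 16   // initial number of buckets

struct hashStruct
{
  linkedList *buckets;    // array of chains
  int numBuckets;
  int numItems;
  double loadFactor;      // threshold for extension
};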
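In Figure [Figure: h points to a hashStruct with fields buckets, numBuckets, numItems and loadFactor; buckets points to the array of chains holding k8, k1, k5, k43, k2]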
“newHash ()”
hash newHash ()
{
  hash h = (hash)malloc (sizeof (*h));
  h->buckets = malloc (INIT_BUCKETS * sizeof (linkedList));
  for (…) {
    // init each bucket of the array
  }
  h->numBuckets = INIT_BUCKETS;
  h->numItems = 0;
  h->loadFactor = 0.25;
  return h;
}
“newHash2 ()”
hash newHash2 (double lf)
{
  hash h = (hash)malloc (sizeof (*h));
  h->buckets = (linkedList *)malloc (INIT_BUCKETS * sizeof (linkedList));
  for (…) {
    // init each bucket of the array
  }
  h->numBuckets = INIT_BUCKETS;
  h->numItems = 0;
  h->loadFactor = lf;
  return h;
}
“lookup (hash, key)”
value lookup (hash h, key k, compTy cmp)
{
  int i = k->hashCode ();  // how to perform this?
  int hc = (i & 0x7fffffff) % (h->numBuckets);
  value t = linkedListSearch ((h->buckets)[hc], k);
  return t;
}
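A concrete version helps answer “how to perform this?” for one key type. A minimal sketch, assuming string keys and int values in a fixed table of 8 buckets. The multiply-by-31 string hash stands in for k->hashCode (), and strcmp plays the role of the key comparison. All names (strTable, strTableInsert, strTableLookup, hashString, bucketOf) are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define NUM_BUCKETS 8

struct entry {
    const char *key;       /* not copied: the caller keeps the string alive */
    int value;
    struct entry *next;
};

struct strTable {
    struct entry *buckets[NUM_BUCKETS];   /* one chain per bucket */
};

/* Assumed key hash: h = h * 31 + c, computed in unsigned arithmetic. */
static int hashString (const char *s) {
    unsigned int h = 0;
    for (; *s; s++)
        h = h * 31u + (unsigned char)*s;
    return (int)(h & 0x7fffffff);
}

/* The slides' index formula. */
static int bucketOf (const char *k) {
    return (hashString (k) & 0x7fffffff) % NUM_BUCKETS;
}

/* Insert at the head of the key's chain; returns 0, or -1 if malloc fails. */
int strTableInsert (struct strTable *t, const char *k, int v) {
    struct entry *e = malloc (sizeof *e);
    if (e == NULL)
        return -1;
    int hc = bucketOf (k);
    e->key = k;
    e->value = v;
    e->next = t->buckets[hc];
    t->buckets[hc] = e;
    return 0;
}

/* Same steps as lookup above: hash, reduce to an index, search that chain.
   Returns a pointer to the value, or NULL if the key is absent. */
int *strTableLookup (struct strTable *t, const char *k) {
    int hc = bucketOf (k);
    for (struct entry *e = t->buckets[hc]; e != NULL; e = e->next)
        if (strcmp (e->key, k) == 0)
            return &e->value;
    return NULL;
}
```

Comparing with strcmp rather than ==, as this sketch does, is essential. Two equal strings may live at different addresses, so pointer comparison on void * keys would miss them. This is why a generic table needs a comparison function such as cmp.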
Ex: lookup (ha, k43)
hc = (hash (k43) & 0x7fffffff) % 8;  // hc = 1
[Figure: ha and its bucket array, with keys k8, k1, k5, k43, k2 in the chains]
• walk the chain at bucket 1: compare k43 with k8 (no match), then compare k43 with k43: found!
“insert”
void insert (hash h, poly k, poly v)
{
  if (1.0 * h->numItems / h->numBuckets >= h->loadFactor) {
    // buckets extension & items re-hash
  }
  int i = k->hashCode ();  // how to perform this?
  int hc = (i & 0x7fffffff) % (h->numBuckets);
  tuple t = newTuple (k, v);
  linkedListInsertHead ((h->buckets)[hc], t);
  h->numItems++;           // keep the load factor up to date
  return;
}
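The step left as a comment, “buckets extension & items re-hash”, can be sketched for int keys and int values as follows. The sketch assumes the slides' INIT_BUCKETS of 16 and EXT_FACTOR of 2, and, like the slides' insert, assumes the key is not already present. The names intHash, intHashNew, intHashInsert, intHashLookup, extend, newBuckets and bucketIndex are hypothetical.

```c
#include <stdlib.h>

#define INIT_BUCKETS 16
#define EXT_FACTOR 2     /* the bucket array grows by this factor */

struct node { int key; int value; struct node *next; };

struct intHash {
    struct node **buckets;   /* array of chains */
    int numBuckets;
    int numItems;
    double loadFactor;       /* extension threshold */
};

/* Same index formula as the slides: clear the sign bit, then reduce mod n. */
static int bucketIndex (int k, int n) {
    return (k & 0x7fffffff) % n;
}

/* Allocate n empty chains; returns NULL if malloc fails. */
static struct node **newBuckets (int n) {
    struct node **b = malloc ((size_t)n * sizeof *b);
    if (b != NULL)
        for (int i = 0; i < n; i++)
            b[i] = NULL;
    return b;
}

struct intHash *intHashNew (double lf) {
    struct intHash *h = malloc (sizeof *h);
    if (h == NULL)
        return NULL;
    h->buckets = newBuckets (INIT_BUCKETS);
    if (h->buckets == NULL) {
        free (h);
        return NULL;
    }
    h->numBuckets = INIT_BUCKETS;
    h->numItems = 0;
    h->loadFactor = lf;
    return h;
}

/* Buckets extension & items re-hash: move every node into an array
   EXT_FACTOR times larger, recomputing its index from the new size.
   Nodes are relinked, not copied. If allocation fails, the old array
   is kept, so chains merely get longer. */
static void extend (struct intHash *h) {
    int newN = h->numBuckets * EXT_FACTOR;
    struct node **nb = newBuckets (newN);
    if (nb == NULL)
        return;
    for (int i = 0; i < h->numBuckets; i++) {
        struct node *p = h->buckets[i];
        while (p != NULL) {
            struct node *next = p->next;
            int hc = bucketIndex (p->key, newN);
            p->next = nb[hc];
            nb[hc] = p;
            p = next;
        }
    }
    free (h->buckets);
    h->buckets = nb;
    h->numBuckets = newN;
}

/* Returns 0 on success, -1 if the new node cannot be allocated. */
int intHashInsert (struct intHash *h, int k, int v) {
    if (1.0 * h->numItems / h->numBuckets >= h->loadFactor)
        extend (h);
    struct node *n = malloc (sizeof *n);
    if (n == NULL)
        return -1;
    int hc = bucketIndex (k, h->numBuckets);
    n->key = k;
    n->value = v;
    n->next = h->buckets[hc];   /* insert at the head of the chain */
    h->buckets[hc] = n;
    h->numItems++;
    return 0;
}

/* Returns a pointer to k's value, or NULL if k is absent. */
int *intHashLookup (struct intHash *h, int k) {
    for (struct node *p = h->buckets[bucketIndex (k, h->numBuckets)]; p != NULL; p = p->next)
        if (p->key == k)
            return &p->value;
    return NULL;
}
```

With a threshold of 0.25 and 16 initial buckets, the first extension happens just before the fifth insertion (4/16 = 0.25). After 100 insertions of distinct keys, the table has 512 buckets. Each extension costs time linear in the number of items, but doubling keeps the amortized cost per insertion constant.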
Ex: insert (ha, k13)
hc = (hash (k13) & 0x7fffffff) % 8;  // suppose hc == 4
[Figure: k13 is added at the head of the chain at bucket 4; the table now holds k8, k13, k5, k43, k1, k2]