110 likes | 243 Views
In this insightful lecture by Charles Morris, discover the fundamentals of hash tables, an array-like data structure that efficiently associates keys with values using hashing functions. Understand the significance of hash functions, collision resolution techniques, and their applications in cryptography and data storage. Learn how to mitigate collisions and the importance of having a well-designed hashing algorithm. Explore practical examples, including a hashing function used in Perl, and deepen your knowledge about this essential data structure.
E N D
A lecture on hashes By Charles Morris
What is a hash table? • A hash table is an array-like data structure that associates its input (the key) with the associated output (the record, or value). • They use a ‘Hashing Function’ to create the association; more on this later. • Hashes were first known as “Associative Arrays” (and you may think of them as so); but using seven syllables is tiring.
Why a hash? • Hashes have many distinct benefits. • Hashes are designed so that you may find a variable without knowing its location. • Computational complexity for lookup varies; Almost always O(1) or O(2), but in the case of `c` collisions, (where c <= n) it is O(c) (like searching an array).
What is a ‘hash function’? • A hash function is simply a function that generates a fingerprint based on it’s input. • Hashing functions are used in many fields, in cryptography one-way hash functions are used to create a small pseudo-random checksum out of (normally) much larger data. This checksum is used for authentication and secure network transmissions, amongst other applications.
Hash functions and Collisions • In a one-way cryptographic hash function, the amount of collisions is not as important as the randomness of the collisions. These functions try to minimize the computational feasibility of finding a key ‘j’ such that j = f(k); where k is the original key.
Hash functions and Collisions • In a hash function used to store data in a hash table, collisions should be minimized as much as possible; as every time a collision happens for key ‘j’ where j = f(k); it increases the O(lookup of f(k)). • This assumes that your hash table watches for collisions, if it doesn’t, the old value will be trampled with the new value.
Collision mitigation • As was stated, if two keys ‘k’ and ‘j’ both hash to the same index ( f(j) == f(k) ), this causes a collision. • Collisions are avoided by using a good hashing algorithm, however they always happen to some degree when a hash function is given enough data to run on.
Collision resolution • Chaining • Instead of one value being at the location f(k), there is a chain (maybe a linked-list) of values. • Linear or Quadratic Probing is a similar solution, where space is reserved at certain locations. • Double hashing • Double hashing requires another hash computation, O(2n), however the records become so sparse that collisions become very rare.
Hash function example • The hashing function used in Perl 5.005: • (close relative of the popular ‘djb2’ algorithm) // (Defined by the PERL_HASH macro in hv.h) // ported by Charles Morris (me) for C++ programmers unsigned long hashingfunction( string key ) { unsigned long fingerprint = 0; for( int j = 0; j < key.length(); j++) //for each letter in the string { fingerprint = fingerprint * 33 + (int)key.at(j); //sum } return fingerprint; }
Hash function example • hf(‘abc’) using the previous function fingerprint = 0 //fingerprint = fingerprint * 33 + (int)’a’; fingerprint = 0 * 33 + 97; //fingerprint = 97 //fingerprint = fingerprint * 33 + (int)’b’; fingerprint = (97 * 33) + 98; //fingerprint = 3299 //fingerprint = fingerprint * 33 + (int)’c’; fingerprint = (3299 * 33) + 99; //fingerprint = 108966