hash table
A hash table consists of two major components: a bucket array and a hash function.
Performance is expected to be O(1).
bucket array
• A bucket array is an array A of size N
• A[i] is a bucket, i.e. a collection of <key,value> pairs
• N is the capacity of A
• In the simplest case the key is used as the index: <k,e> is inserted in A[k]
• if keys are well distributed between 0 .. N-1
• if keys are unique integers in range 0 .. N-1
  • then each bucket holds at most one entry
  • consequently O(1) for get, insert, delete
• downside: space is proportional to N
  • if N is much larger than n (number of entries) we waste space
• downside: keys must be integers in range 0 .. N-1
  • this may not be the case (think of a matriculation number)
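The simple case above, where unique integer keys in 0 .. N-1 are used directly as indices, might be sketched as follows. The class name `DirectAddressTable` and the use of String values are assumptions made for illustration; they do not come from the slides.

```java
// Hypothetical sketch: a bucket array in which the key itself is the index.
final class DirectAddressTable {
    private final String[] buckets;   // buckets[k] holds the value for key k, or null if empty

    DirectAddressTable(int capacity) {
        buckets = new String[capacity];   // space is proportional to N, however few entries are stored
    }

    // O(1): the key is used directly as the array index.
    void put(int key, String value) {
        checkRange(key);
        buckets[key] = value;
    }

    // O(1): returns null when the bucket is empty.
    String get(int key) {
        checkRange(key);
        return buckets[key];
    }

    // O(1): empties the bucket and returns what it held.
    String remove(int key) {
        checkRange(key);
        String old = buckets[key];
        buckets[key] = null;
        return old;
    }

    // Keys outside 0 .. N-1 cannot be stored at all: the second downside listed above.
    private void checkRange(int key) {
        if (key < 0 || key >= buckets.length)
            throw new IllegalArgumentException("key must be in 0 .. " + (buckets.length - 1));
    }

    public static void main(String[] args) {
        DirectAddressTable t = new DirectAddressTable(11);
        t.put(1, "D");
        t.put(7, "Q");
        System.out.println(t.get(7));   // Q
        System.out.println(t.get(4));   // null
    }
}
```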
bucket array
Bucket array of size 11 (indices 0 .. 10) for the entries (1,D), (3,C), (3,F), (6,C) and (7,Q)
[Figure: bucket 1 holds (1,D), bucket 3 holds (3,C), bucket 6 holds (6,C), bucket 7 holds (7,Q); (3,F) also maps to bucket 3]
If the hashed keys are unique integers in the range [0,10] then each bucket holds at most one entry. Otherwise we have a collision and need to deal with it.
collision
When two different entries map to the same bucket we have a collision.
It’s good to avoid collisions.
hash function
A hash function h maps each key to an integer in the range [0,N-1]
Given entry <k,e>, h(k) is the index into the bucket array: store entry <k,e> in A[h(k)]
• h is a good hash function if
  • h maps keys so as to minimise collisions
  • h is easy to compute/program
  • h is fast to compute
• h(k) has two actions
  • map k to a hash code
  • map the hash code into the range [0,N-1]
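The two actions of h(k) can be sketched as two small methods composed together. This is only an illustration: it assumes String keys, borrows Java's built-in `String.hashCode()` as the hash code, and uses a plain modulus as the compression step. The names `TwoStepHash`, `hashCodeOf`, `compress` and `h` are made up for this sketch.

```java
// Hypothetical sketch of h(k) = compress(hashCode(k)).
final class TwoStepHash {

    // Action 1: map the key to an integer hash code (any int, possibly negative).
    static int hashCodeOf(String key) {
        return key.hashCode();   // assumption: reuse Java's built-in String hash code
    }

    // Action 2: map the hash code into the range [0, N-1].
    // Math.floorMod never returns a negative result for a positive N.
    static int compress(int code, int n) {
        return Math.floorMod(code, n);
    }

    // The hash function proper: index of the bucket for this key.
    static int h(String key, int n) {
        return compress(hashCodeOf(key), n);
    }

    public static void main(String[] args) {
        int n = 11;   // capacity of the bucket array
        for (String key : new String[] {"spot", "pots", "tops", "post"})
            System.out.println(key + " -> bucket " + h(key, n));
    }
}
```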
hash function: hash codes in Java
But care should be taken, as this might not be “good”
af2
• Let A and B be sets
• A function is
  • a mapping from elements of A
  • to elements of B
  • and is a subset of A × B
  • i.e. can be defined by a set of tuples!
af2
• A is the domain
• B is the codomain
• f(x) = y
  • y is the image of x
  • x is a preimage of y
• There may be more than one preimage of y
• There is only one image of x
  • otherwise not a function
• There may be an element in the codomain with no preimage
• Range of f is the set of all images of A
  • the set of all results
Injection (aka one-to-one, 1-1)
[Figure: two mapping diagrams, one labelled "not an injection" and one labelled "injection"]
If f is an injection then preimages are unique
• Ideally we want our hash function to
  • be injective (no collisions)
  • have a small codomain and range
• may need to compress range
hash code & hash function
Just to clear this up (but let’s not make too big a deal about it) …
• We assume a hash code is an integer in the codomain
• The hash function brings hash codes into the range [0,N-1]
We will examine just a few hash functions, acting on strings
Polynomial hash codes
Assume we have a key s that is a character String
Here is a really dumb hash code

    public int dumbHash(String s) {
        int code = 0;
        // add up the character codes; the position of each character is ignored
        for (int i = 0; i < s.length(); i++)
            code = code + s.charAt(i);
        return code;
    }

• What would we get for
  • dumbHash("spot")
  • dumbHash("pots")
  • dumbHash("tops")
  • dumbHash("post")
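To see the answer to that question, the method can be run on the four words. They are anagrams of one another, so adding up their character codes gives the same total each time. The wrapper class `DumbHashDemo` is a name invented for this sketch; `dumbHash` is the method from the slide, made static so it can be called from `main`.

```java
// Hypothetical demo: the summing hash code cannot tell anagrams apart.
final class DumbHashDemo {

    // Same logic as the slide's dumbHash: sum of the character codes.
    static int dumbHash(String s) {
        int code = 0;
        for (int i = 0; i < s.length(); i++)
            code = code + s.charAt(i);
        return code;
    }

    public static void main(String[] args) {
        // 's' = 115, 'p' = 112, 'o' = 111, 't' = 116, so every word sums to 454.
        for (String word : new String[] {"spot", "pots", "tops", "post"})
            System.out.println(word + " -> " + dumbHash(word));   // each line ends in 454
    }
}
```

Four different keys, one hash code: every anagram of a key collides with it.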
Polynomial hash codes
Take into consideration the “position” of elements of the key.
So, this doesn’t look any different from an every-day number: it’s to the base a, and the coefficients are the components of the key.
Polynomial hash codes hash code & hash function Good values for a appear to be 33, 37, 39, 41
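A polynomial hash code of this kind is commonly written as s0·a^(n-1) + s1·a^(n-2) + … + s(n-1), where s0 … s(n-1) are the characters of the key, and is evaluated with Horner's rule. The sketch below assumes that coefficient order, and assumes 64-bit `long` arithmetic in which overflow simply wraps around; neither detail is confirmed by the slides. The names `PolynomialHash` and `polyHash` are illustrative.

```java
// Hypothetical sketch of a polynomial hash code evaluated with Horner's rule.
final class PolynomialHash {

    // Computes s0*a^(n-1) + s1*a^(n-2) + ... + s(n-1) in long arithmetic.
    // Overflow is not checked: large results wrap around and may be negative.
    static long polyHash(String s, int a) {
        long code = 0;
        for (int i = 0; i < s.length(); i++)
            code = code * a + s.charAt(i);   // shift everything up one "digit", then add the next one
        return code;
    }

    public static void main(String[] args) {
        // Unlike a plain sum, the position of each character now matters.
        for (String word : new String[] {"spot", "pots", "tops", "post"})
            System.out.println(word + " -> " + polyHash(word, 33));
    }
}
```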
Polynomial hash codes
• Small scale experiments on unix dictionary
  • a = 33
  • 25104 words/strings
  • minimum hash value -9165468936209580338
  • maximum hash value 8952279818009261254
  • collision count 7
Yikes! Look at that range!
Cyclic shift hash codes hash code & hash function Start moving bits around
Cyclic shift hash codes hash code & hash function Thanks to Arash Partow
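The code on these slides did not survive, so the following is only a sketch of one common way to build a cyclic shift hash code: before each character is added, the running value is rotated left by a fixed number of bits. It is not necessarily the variant the slides showed. The names `CyclicShiftHash` and `cyclicShiftHash`, and the 5-bit shift used in `main`, are assumptions.

```java
// Hypothetical sketch of a cyclic shift hash code over a 32-bit int.
final class CyclicShiftHash {

    // Rotates the running hash left by `shift` bits, then adds the next character.
    // Integer.rotateLeft moves the bits that fall off the top back in at the bottom.
    static int cyclicShiftHash(String s, int shift) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = Integer.rotateLeft(h, shift);
            h += s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"spot", "pots", "tops", "post"})
            System.out.println(word + " -> " + cyclicShiftHash(word, 5));
    }
}
```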
Compression Functions hash code & hash function So, you think you’ve found something that produces a good hash code … How do we compress its range to fit into our machine?
Compression Functions
Assume we want to limit storage to buckets in range [0,N-1]
The division method
NOTE: keep N prime

    // floorMod rather than %: in Java, % gives a negative index when hash(s) is negative
    int i = (int) Math.floorMod(hash(s), (long) N);
    S[i] = s;

… ideally, but there may be collisions
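One detail of the division method is worth checking in Java: the `%` operator keeps the sign of its left operand, so a negative hash code would give a negative (invalid) bucket index. A minimal sketch of a safe version, assuming 64-bit hash codes; the names `DivisionCompression` and `compress` are illustrative.

```java
// Hypothetical sketch of the division method with a non-negative result.
final class DivisionCompression {

    // Maps any long hash code into [0, n-1]; n should be a prime, as noted above.
    static int compress(long hashCode, int n) {
        return (int) Math.floorMod(hashCode, (long) n);
    }

    public static void main(String[] args) {
        System.out.println(-7 % 11);           // -7: not a valid bucket index
        System.out.println(compress(-7, 11));  // 4
        System.out.println(compress(25, 11));  // 3
    }
}
```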
Compression Functions
Assume we want to limit storage to buckets in range [0,N-1]
The multiply add and divide (MAD) method
• N is prime
• a > 1 is scaling factor
• b ≥ 0 is a shift
• a % N ≠ 0
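The formula for this slide is missing from the text. A common statement of the MAD method with these parameters is h(i) = |a·i + b| mod N, where i is the hash code. The sketch below assumes that form and takes the non-negative remainder with `Math.floorMod` instead of an absolute value, which is a substitution made here for safety in Java. The names `MadCompression` and `mad` are illustrative.

```java
// Hypothetical sketch of multiply, add and divide (MAD) compression.
final class MadCompression {

    // Maps a hash code into [0, n-1] as (a * hashCode + b) mod n.
    // Preconditions follow the slide: a > 1, b >= 0, a % n != 0; n should be prime.
    static int mad(long hashCode, long a, long b, int n) {
        if (a <= 1 || b < 0 || a % n == 0)
            throw new IllegalArgumentException("need a > 1, b >= 0 and a % n != 0");
        // a * hashCode may overflow and wrap; floorMod still returns a value in [0, n-1].
        return (int) Math.floorMod(a * hashCode + b, (long) n);
    }

    public static void main(String[] args) {
        System.out.println(mad(7, 3, 2, 11));    // (3*7 + 2) mod 11 = 1
        System.out.println(mad(-7, 3, 2, 11));   // (3*(-7) + 2) mod 11 = 3
    }
}
```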
hash tables Collision handling schemes
Collision handling schemes hash tables Separate Chaining
Separate Chaining
• bucket[i] is a small map
  • implemented as a list
• bucket[i] should be a short list
• It may be sorted
• It might be something other than a list
Separate Chaining
Let N be the number of buckets and n the number of entries stored; the load factor is n/N
• Upside:
  • simple
• Downside:
  • requires auxiliary data structures (to resolve collisions)
  • this may put additional burden on space
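A minimal sketch of separate chaining, assuming String keys and values, one unsorted linked list per bucket, and Java's built-in `String.hashCode()` compressed with a modulus. The class `ChainedHashMap` and its methods are illustrative names, not the implementation from the slides.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch of a hash table that resolves collisions by chaining.
final class ChainedHashMap {

    private static final class Entry {
        final String key;
        String value;
        Entry(String key, String value) { this.key = key; this.value = value; }
    }

    private final List<LinkedList<Entry>> buckets = new ArrayList<>();
    private int n = 0;   // number of entries stored

    ChainedHashMap(int capacity) {
        for (int i = 0; i < capacity; i++)
            buckets.add(new LinkedList<>());   // the auxiliary structure: one list per bucket
    }

    // Hash code, then compression into [0, N-1].
    private LinkedList<Entry> bucketFor(String key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    // Replaces the value if the key is already in the chain, otherwise appends a new entry.
    String put(String key, String value) {
        LinkedList<Entry> chain = bucketFor(key);
        for (Entry e : chain) {
            if (e.key.equals(key)) {
                String old = e.value;
                e.value = value;
                return old;
            }
        }
        chain.add(new Entry(key, value));
        n++;
        return null;
    }

    // Searches only the one chain the key hashes to.
    String get(String key) {
        for (Entry e : bucketFor(key))
            if (e.key.equals(key)) return e.value;
        return null;
    }

    // Load factor n/N: the average chain length.
    double loadFactor() {
        return (double) n / buckets.size();
    }
}
```

With short chains, `get` and `put` only walk a handful of entries, which is why the load factor is worth keeping small.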
Separate Chaining
A simple view: an array whose elements are linked lists (locations 0 .. 7, all lists initially empty)

put(Jon,plumber), hash(Jon) = 3
  locn 3: Jon,plumber

put(Fred,painter), hash(Fred) = 6
  locn 3: Jon,plumber
  locn 6: Fred,painter

put(Joe,prof), hash(Joe) = 1