Comprehensive Guide to Symbol Tables: Operations, Storage, and Hash Functions

Symbol Tables • Symbol tables are used by compilers to keep track of information about • variables • functions • class names • type names • temporary variables • etc. • Typical symbol table operations are Insert, Delete and Search • It's a dictionary structure!

Symbol Tables • What kind of information is usually stored in a symbol table? • type • storage class • size • scope • stack frame offset • register • We also need a way to keep track of reserved words.

Symbol Tables • Where is a symbol table stored? • array/linked list • simple, but linear lookup time • However, we may use a sorted array for reserved words, since they are generally few and known in advance. • balanced tree • O(lgn) lookup time • hash table • most common implementation • O(1) amortized time for dictionary operations

Hashing • Hash tables • use array of size m to store elements • given key k (the identifier name), use a function h to compute index h(k) for that key • collisions are possible • two keys hash into the same slot. • Hash functions • A good hash function • is easy to compute • avoids collisions (by breaking up patterns in the keys and uniformly distributing the hash values)

Hashing • In the following slides: • k is a key • h(k) is the hash function • m is the size of the hash table • n is the number of keys in the hash table

Hashing • What makes a good hash function? • It is easy to compute • It minimizes collisions. • hash = to chop into small pieces (Merriam- Webster) =to chop any patterns in the keys so that the results are uniformly distributed (cs311)

Hashing • When the key is a string, we generally use the ASCII values of its characters in some way: • Examples for k = c1c2c3...cx • h(k) = (c1128x-1+c2128x-2+...+cx1280) mod m • h(k) = (c1+c2+...+cx) mod m • h(k) = (h1(c1)+h2(c2)+...hx(cx)) mod m, where each hi is an independent hash function.

Hash functions Truncation • Ignore part of the key and use the remaining part directly as the index. • Example: if the keys are 8-digit numbers and the hash table has 1000 entries, then the first, fourth and eighth digit could make the hash function. • Not a very good method : does not distribute keys uniformly

Hash functions Folding • Break up the key in parts and combine them in some way. • Example : if the keys are 9 digit numbers, break up a key into three 3-digit numbers and add them up.

Hash functions Middle square • Compute k*k and pick some digits from the resulting number. • Example : given a 9-digit key k, and a hash table of size 1000 pick three digits from the middle of the number k*k. • Works fairly well in practice if the keys do not have many leading or trailing zeroes.

Hash functions Division • h(k)=k mod m • Fast • Not all values of m are suitable for this. For example powers of 2 should be avoided because then k mod m is just the least significant digits of k • Good values for m are prime numbers .

Hash functions Multiplication • h(k)=m (k  c- k  c)  , 0<c<1 • In English : • Multiply the key k by a constant c, 0<c<1 • Take the fractional part of kc • Multiply that by m • Take the floor of the result • The value of m does not make a difference • Some values of c work better than others • A good value is

Hash functions Multiplication • Example: Suppose the size of the table, m, is 1301. For k=1234, h(k)=850 For k=1235, h(k)=353 For k=1236, h(k)=115 For k=1237, h(k)=660 For k=1238, h(k)=164 For k=1239, h(k)=968 For k=1240, h(k)=471 nice distribution!

Hash functions Universal Hashing • Worst-case scenario: The chosen keys all hash to the same slot. This can be avoided if the hash function is not fixed: • Start with a collection of hash functions • Select one at random and use that. • Good performance on average: the probability that the randomly chosen hash function exhibits the worst-case behavior is very low.

Load factor • Given a hash table of size m, and n elements stored in it, we define theload factor of the table as=n/m • The load factor gives us an indication of how full the table is. • The possible values of the load factor depend on the method we use for resolving collisions.

Resolving collisions: Chaining • Chaining* • Put all the elements that collide in a chain (list) attached to the slot. • The hash table is an array of linked lists • The load factor indicates the average number of elements stored in a chain. It could be less than, equal to, or larger than 1. * a.k.a. closed addressing

Resolving collisions: Chaining • Insert/Delete/Lookup in expected O(1) time • Keep the list doubly-linked to facilitate deletions • Worst case of lookup time is linear. • However, this assumes that the chains are kept small. • If the chains start becoming too long, the table must be enlarged and all the keys rehashed.

Resolving collisions: Chaining • Assumption: simple uniform hashing • any given key is equally likely to hash into any of the m slots • Analysis of unsuccessful search: • average time to search unsuccessfully for key k = the average time to search to the end of a chain. • The average length of a chain is . • Total (average) time required : (1+ )

Resolving collisions: Chaining • Analysis of successful search: • Expected number e of elements examined during a successful search for key k= one more than the expected number of elements examined when k was inserted. • it makes no difference whether we insert at the beginning or the end of the list. • Take the average, over the n items in the table, of 1 plus the expected length of the chain to which the i th element was added:

Resolving collisions: Chaining Total time : (1+ )

Resolving collisions: Chaining • Both types of search take (1+ ) time on average. • If n=O(m), then =O(1) and the total time for Search is O(1) on average • Insert : O(1) in the worst case • Delete : O(1) in the worst case

Resolving collisions: Chaining • Storage for the elements may be allocated and deallocated within the hash table itself by linking all unused slots into a free list. • Insert: • if key k hashes into empty slot h(k), put it there and set a flag to indicate that this is the actual position where the element hashed. • if h(k) is not empty, and the element k1 it contains has its flag set, then use a slot off the free list to store k1. Its flag should be unset. • if h(k) is not empty, and the element k1 it contains has its flag unset, then move k1 to another slot and store k in h(k).

Comprehensive Guide to Symbol Tables: Operations, Storage, and Hash Functions