Lecture 11

Dictionary • Dictionary: • Dynamic-set data structure for storing items indexed using keys. • Supports operations Insert, Search, and Delete. • Applications: • Symbol table of a compiler. • Memory-management tables in operating systems. • Large-scale distributed systems. • Hash Tables: • Effective way of implementing dictionaries. • Generalization of ordinary arrays.

Direct-address Tables • Direct-address Tables are ordinary arrays. • Facilitate direct addressing. • Element whose key is k is obtained by indexing into the kth position of the array. • Applicable when we can afford to allocate an array with one position for every possible key. • i.e. when the universe of keys U is small. • Dictionary operations can be implemented to take O(1) time
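As a minimal sketch of the idea above, assuming integer keys drawn from a small universe {0, ..., u-1}, a direct-address table is just an array indexed by the key. The class name and the `universe_size` parameter are illustrative and do not come from the lecture.

```python
class DirectAddressTable:
    """Direct-address table for integer keys in range(universe_size)."""

    def __init__(self, universe_size):
        # One slot per possible key; None marks an empty slot.
        self.slots = [None] * universe_size

    def insert(self, key, value):
        self.slots[key] = value      # O(1): index directly by key

    def search(self, key):
        return self.slots[key]       # O(1): None if the key is absent

    def delete(self, key):
        self.slots[key] = None       # O(1)


table = DirectAddressTable(universe_size=10)
table.insert(3, "three")
table.insert(7, "seven")
table.delete(7)
```

The memory cost is proportional to the size of the universe rather than to the number of stored keys, which is why this only works when U is small.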

Hashing • Hash function h: a mapping from U to the slots of a hash table T[0..m–1]: h : U → {0, 1, …, m–1}. • With arrays, key k maps to slot A[k]. • With hash tables, key k maps or “hashes” to slot T[h(k)]. • h(k) is the hash value of key k.

Hash Function • Distribute keys among cells of the hash table as evenly as possible • A hash function has to be easy to compute

Hashing [Figure: the hash function h maps the actual keys K = {k1, …, k5}, a subset of the universe of keys U, to slots 0 through m–1 of the table; h(k2) = h(k5), so k2 and k5 share a slot.]

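Example • Keys: A, FOOL, AND, HIS, MONEY, SOON, PARTED. • Hash function: add the alphabet positions of a word's letters (A = 1, …, Z = 26) and take the sum mod 13. • SOON: (19 + 15 + 15 + 14) mod 13 = 63 mod 13 = 11.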

Issues with Hashing • Multiple keys can hash to the same slot, so collisions are possible. • Design hash functions such that collisions are minimized. • But collisions cannot be avoided entirely when there are more possible keys than slots (|U| > m). • Design collision-resolution techniques. • Search will cost Θ(n) time in the worst case. • However, all operations can be made to have an expected complexity of Θ(1).

Collision • Keys: A, FOOL, AND, HIS, MONEY, ARE, SOON, PARTED. • Hash function: sum of letter positions (A = 1, …, Z = 26) mod 13. • SOON and ARE collide: (19 + 15 + 15 + 14) mod 13 = 11 (SOON) and (1 + 18 + 5) mod 13 = 11 (ARE).
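The arithmetic above can be checked with a short sketch. It assumes the hash is the sum of the letters' alphabet positions (A = 1, ..., Z = 26) mod 13, which is inferred from the worked sums on the slide; the function name `letter_sum_hash` is illustrative.

```python
def letter_sum_hash(word, m=13):
    """Sum the alphabet positions of the letters (A=1, ..., Z=26), mod m."""
    return sum(ord(ch) - ord("A") + 1 for ch in word.upper()) % m


words = ["A", "FOOL", "AND", "HIS", "MONEY", "ARE", "SOON", "PARTED"]

# Group the words by the slot they hash to.
slots = {}
for word in words:
    slots.setdefault(letter_sum_hash(word), []).append(word)

# Any slot holding more than one word is a collision.
collisions = {slot: ws for slot, ws in slots.items() if len(ws) > 1}
print(collisions)   # {11: ['ARE', 'SOON']}
```

Only slot 11 receives more than one word; the other six words land in distinct slots.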

Methods of Resolution • Chaining: • Store all elements that hash to the same slot in a linked list. • Store a pointer to the head of the linked list in the hash table slot. • Open Addressing: • All elements stored in hash table itself. • When collisions occur, use a systematic (consistent) procedure to store elements in free slots of the table. [Figure: keys k1 through k8 stored directly in the slots of a table indexed 0 through m–1.]

Collision Resolution by Chaining [Figure: the actual keys K = {k1, …, k8} from universe U hashed into table slots 0 through m–1, with h(k1) = h(k4), h(k2) = h(k5) = h(k6), h(k3) = h(k7), and h(k8) in a slot of its own.]

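Collision Resolution by Chaining [Figure: the same table with each occupied slot pointing to a linked list of the keys that hash to it: one list holds k1 and k4, one holds k2, k5, and k6, one holds k3 and k7, and one holds k8.]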

Hashing with Chaining Dictionary Operations: • Chained-Hash-Insert (T, x) • Insert x at the head of list T[h(key[x])]. • Worst-case complexity – O(1). • Chained-Hash-Delete (T, x) • Delete x from the list T[h(key[x])]. • Worst-case complexity – proportional to the length of the list with singly linked lists; O(1) with doubly linked lists, since x is given as a pointer to the element. • Chained-Hash-Search (T, k) • Search for an element with key k in list T[h(k)]. • Worst-case complexity – proportional to the length of the list.
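The three operations can be sketched as follows. This is an illustration under stated assumptions, not the lecture's implementation: keys are integers, the hash function is h(k) = k mod m, and a `collections.deque` stands in for each linked list. The class and method names are hypothetical.

```python
from collections import deque


class ChainedHashTable:
    """Hash table with chaining; h(k) = k mod m is an assumed hash function."""

    def __init__(self, m):
        self.m = m
        # One chain per slot; a deque stands in for the linked list.
        self.table = [deque() for _ in range(m)]

    def _h(self, key):
        return key % self.m

    def insert(self, key, value):
        # Insert at the head of the chain: O(1), with no duplicate check.
        self.table[self._h(key)].appendleft((key, value))

    def search(self, key):
        # Scan the chain: time proportional to its length.
        for k, v in self.table[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        # Delete by key, so the chain must be scanned first.
        chain = self.table[self._h(key)]
        for entry in chain:
            if entry[0] == key:
                chain.remove(entry)
                return True
        return False


t = ChainedHashTable(13)
t.insert(11, "a")
t.insert(24, "b")      # 24 mod 13 = 11, so this collides with key 11
t.delete(11)
```

Unlike Chained-Hash-Delete (T, x), which receives the element itself, this `delete` takes a key and therefore pays for a search even where a doubly linked list would allow O(1) unlinking.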

Analysis on Chained-Hash-Search • Load factor α = n/m = average number of keys per slot. • m – number of slots. • n – number of elements stored in the hash table. • Worst-case complexity: Θ(n) + time to compute h(k). • Average case depends on how h distributes keys among the m slots. • Assume • Simple uniform hashing. • Any key is equally likely to hash into any of the m slots, independent of where any other key hashes to. • O(1) time to compute h(k). • Time to search for an element with key k is Θ(|T[h(k)]|). • Expected length of a linked list = load factor = α = n/m.
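A small simulation can make the load factor concrete. The sketch below models simple uniform hashing by drawing each key's slot uniformly at random from a seeded generator, which is an assumption made for illustration; the function name `chain_lengths` and the values of n and m are arbitrary.

```python
import random


def chain_lengths(n, m, seed=0):
    """Throw n keys into m slots uniformly at random; return each chain's length."""
    rng = random.Random(seed)
    lengths = [0] * m
    for _ in range(n):
        lengths[rng.randrange(m)] += 1
    return lengths


n, m = 1000, 100
lengths = chain_lengths(n, m)
alpha = n / m                   # load factor
average = sum(lengths) / m      # equals alpha: the chains hold n keys in total
longest = max(lengths)          # an individual chain can be longer than alpha
print(alpha, average, longest)
```

The average chain length always equals n/m, whatever the hash function does; what simple uniform hashing adds is that each individual chain has expected length n/m, so no slot is systematically overloaded.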

Open addressing • Another approach for collision resolution. • All elements are stored in the hash table itself (so no pointers are needed, unlike chaining). • To insert: if a slot is full, try another slot, and another, until an open slot is found (probing). • To search, follow the same sequence of probes that would be used when inserting the element.

Open Addressing • The key is first mapped to a slot: i_0 = h(k). • If there is a collision, subsequent probes are performed: i_(j+1) = (i_j + c) mod m, for j ≥ 0. • If the offset constant c and m are not relatively prime, we will not examine all the cells. Ex.: with m = 4 and c = 2, only every other slot is checked. • When c = 1, collision resolution is a linear search. This is known as linear probing.
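The effect of the offset constant can be seen by listing the slots a probe sequence visits. This sketch assumes each probe adds c to the previous slot modulo m, as the slide's wording suggests; the function name `probe_sequence` is illustrative.

```python
from math import gcd


def probe_sequence(home, c, m):
    """Slots visited over m probes, starting at home and stepping by c (mod m)."""
    slots = []
    i = home
    for _ in range(m):
        slots.append(i)
        i = (i + c) % m
    return slots


print(probe_sequence(0, 2, 4))   # [0, 2, 0, 2]: c and m share the factor 2
print(probe_sequence(0, 1, 4))   # [0, 1, 2, 3]: c = 1 is linear probing
print(probe_sequence(0, 3, 4))   # [0, 3, 2, 1]: gcd(3, 4) = 1 reaches every slot
```

In general the sequence reaches m / gcd(c, m) distinct slots, so every slot is examined exactly when c and m are relatively prime.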

Open Addressing • Linear probing: Given an auxiliary hash function h′, the probe sequence starts at slot h′(k) and continues sequentially through the table, wrapping after slot m − 1 to slot 0. Given key k and probe number i (0 ≤ i < m), h(k, i) = (h′(k) + i) mod m. • Quadratic probing: As in linear probing, the probe sequence starts at h′(k). Unlike linear probing, the offset from the original probe point grows quadratically with the probe number: h(k, i) = (h′(k) + c1·i + c2·i^2) mod m. With c1 = 0 and c2 = 1, it examines the cells 1, 4, 9, and so on, away from the original probe point.
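The two probe formulas can be compared side by side. In this sketch `home` is the value of the auxiliary hash function for some key, and the defaults c1 = 0, c2 = 1 are chosen to reproduce the offsets 1, 4, 9 mentioned above; the function names and the values m = 11, home = 3 are illustrative.

```python
def linear_probe(home, i, m):
    """Slot for probe number i under linear probing."""
    return (home + i) % m


def quadratic_probe(home, i, m, c1=0, c2=1):
    """Slot for probe number i under quadratic probing."""
    return (home + c1 * i + c2 * i * i) % m


m, home = 11, 3
linear = [linear_probe(home, i, m) for i in range(5)]
quadratic = [quadratic_probe(home, i, m) for i in range(5)]
print(linear)      # [3, 4, 5, 6, 7]
print(quadratic)   # [3, 4, 7, 1, 8]
```

Both sequences start at the same slot, but the quadratic one quickly jumps away from it instead of walking through its neighbours.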

Open Addressing • Even with a good hash function, linear probing has its problems: • The position of the initial mapping i_0 of key k is called the home position of k. • When several insertions map to the same home position, they end up placed contiguously in the table. This collection of keys with the same home position is called a cluster. • As clusters grow, the probability that a key will map to the middle of a cluster increases, increasing the rate of the cluster’s growth. This tendency of linear probing to place items together is known as primary clustering. • As these clusters grow, they merge with other clusters, forming even bigger clusters which grow even faster.

Quadratic Probing Quadratic probing solves the primary clustering problem, but it has the secondary clustering problem, in which elements that hash to the same position probe the same alternative cells. Secondary clustering is a minor theoretical blemish.

Insertion in hash table
HASH_INSERT(T, k)
  i ← 0
  repeat
      j ← h(k, i)
      if T[j] = NIL
          then T[j] ← k
               return j
          else i ← i + 1
  until i = m
  error “hash table overflow”

Searching in hash table
HASH_SEARCH(T, k)
  i ← 0
  repeat
      j ← h(k, i)
      if T[j] = k
          then return j
      i ← i + 1
  until T[j] = NIL or i = m
  return NIL
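The insertion and search procedures translate almost line for line into runnable code. This sketch assumes integer keys and linear probing with h(k) = k mod m, neither of which the pseudocode fixes; `None` plays the role of NIL, and the helper name `probe` is illustrative.

```python
def probe(k, i, m):
    """Linear probing with h(k) = k mod m (an assumed hash function)."""
    return (k % m + i) % m


def hash_insert(T, k):
    m = len(T)
    for i in range(m):
        j = probe(k, i, m)
        if T[j] is None:        # NIL: free slot found
            T[j] = k
            return j
    raise OverflowError("hash table overflow")


def hash_search(T, k):
    m = len(T)
    for i in range(m):
        j = probe(k, i, m)
        if T[j] == k:
            return j
        if T[j] is None:        # empty slot reached: k is not in the table
            return None
    return None                 # all m slots probed without success


T = [None] * 7
slots = [hash_insert(T, k) for k in (10, 3, 17, 24)]
print(slots)                # [3, 4, 5, 6]: all four keys have home slot 3
print(hash_search(T, 17))   # 5
print(hash_search(T, 31))   # None
```

All four keys share home slot 3, so each insertion probes one slot further than the last, and a search retraces exactly those probes.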

Worst case for inserting a key is Θ(n). • Worst case for searching is Θ(n). • The algorithms assume that keys are not deleted once they are inserted. • Deleting a key from an open-addressing table is difficult: simply emptying the slot would make a search stop early and miss keys stored further along the probe sequence. Instead, we can mark the slot as removed, which introduces three classes of entries: full, empty, and removed.
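One way to realize the "removed" class of entries is a sentinel value, often called a tombstone. The sketch below is an illustration under assumptions: integer keys, linear probing with h(k) = k mod m, keys inserted at most once, and a hypothetical `DELETED` marker.

```python
DELETED = object()   # sentinel marking a removed entry (a "tombstone")


def probe(k, i, m):
    """Linear probing with h(k) = k mod m (an assumed hash function)."""
    return (k % m + i) % m


def insert(T, k):
    m = len(T)
    for i in range(m):
        j = probe(k, i, m)
        if T[j] is None or T[j] is DELETED:   # empty and removed slots are reusable
            T[j] = k
            return j
    raise OverflowError("hash table overflow")


def search(T, k):
    m = len(T)
    for i in range(m):
        j = probe(k, i, m)
        if T[j] is None:                      # truly empty: k cannot be further on
            return None
        if T[j] is not DELETED and T[j] == k:
            return j                          # removed slots are skipped, not stops
    return None


def delete(T, k):
    j = search(T, k)
    if j is not None:
        T[j] = DELETED                        # mark as removed instead of emptying
    return j


T = [None] * 7
for k in (10, 3, 17):
    insert(T, k)                              # slots 3, 4, 5
removed_at = delete(T, 3)                     # slot 4 becomes a tombstone
found_at = search(T, 17)                      # still found, past the tombstone
reused_at = insert(T, 24)                     # the tombstone slot is reused
```

Had `delete` reset slot 4 to `None`, the search for 17 would have stopped there and reported the key as missing.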
