
Lecture 11


- Dictionary:
- Dynamic-set data structure for storing items indexed using keys.
- Supports operations Insert, Search, and Delete.
- Applications:
- Symbol table of a compiler.
- Memory-management tables in operating systems.
- Large-scale distributed systems.

- Hash Tables:
- Effective way of implementing dictionaries.
- Generalization of ordinary arrays.

- Direct-address Tables are ordinary arrays.
- Facilitate direct addressing.
- Element whose key is k is obtained by indexing into the kth position of the array.

- Applicable when we can afford to allocate an array with one position for every possible key.
- i.e. when the universe of keys U is small.

- Dictionary operations can be implemented to take O(1) time
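
A minimal Python sketch of a direct-address table, assuming integer keys drawn from a small universe {0, …, u–1}. The class name `DirectAddressTable` and its method names are illustrative and do not come from the lecture.

```python
class DirectAddressTable:
    """Direct-address table for integer keys in range(universe_size)."""

    def __init__(self, universe_size):
        # One slot per possible key; None marks an empty slot.
        self.slots = [None] * universe_size

    def insert(self, key, value):
        self.slots[key] = value      # O(1): index straight into slot `key`

    def search(self, key):
        return self.slots[key]       # O(1): None if the key is absent

    def delete(self, key):
        self.slots[key] = None       # O(1)


table = DirectAddressTable(universe_size=10)
table.insert(3, "three")
table.insert(7, "seven")
table.delete(7)
```

Each operation is a single array access, which is why this only pays off when the universe of keys is small enough to allocate in full.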

- Hash function h: a mapping from the universe U to the slots of a hash table T[0..m–1].
h : U → {0, 1, …, m–1}

- With arrays, key k maps to slot A[k].
- With hash tables, key k maps or “hashes” to slot T[h(k)].
- h(k) is the hash value of key k.

- Distribute keys among cells of the hash table as evenly as possible
- A hash function has to be easy to compute

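[Figure: the hash function h maps the actual keys K (k1, k2, k3, k4, …), a subset of the universe of keys U, to slots 0..m–1 of the hash table. Keys k2 and k5 collide because h(k2) = h(k5).]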

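Example keys: A, FOOL, AND, HIS, MONEY, SOON, PARTED

Hash function: sum the alphabet positions of the letters (A = 1, …, Z = 26), then take the sum mod 13.

SOON: (19 + 15 + 15 + 14) mod 13 = 63 mod 13 = 11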
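
The SOON calculation suggests a hash function that adds up the alphabet positions of the letters and reduces the sum mod 13. A short Python sketch under that assumption follows; the name `letter_sum_hash` is made up for illustration.

```python
def letter_sum_hash(word, m=13):
    """Sum the alphabet positions of the letters (A=1, ..., Z=26), mod m."""
    return sum(ord(ch) - ord("A") + 1 for ch in word.upper()) % m


words = ["A", "FOOL", "AND", "HIS", "MONEY", "SOON", "PARTED"]
slots = {word: letter_sum_hash(word) for word in words}
```

With m = 13, these seven words land in seven different slots, so this particular key set produces no collisions.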

- Multiple keys can hash to the same slot – collisions are possible.
- Design hash functions such that collisions are minimized.
- But avoiding collisions is impossible.
- Design collision-resolution techniques.

- Search will cost Θ(n) time in the worst case.
- However, all operations can be made to have an expected complexity of Θ(1).

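Example keys: A, FOOL, AND, HIS, MONEY, ARE, SOON, PARTED

Hash function: sum the alphabet positions of the letters (A = 1, …, Z = 26), then take the sum mod 13.

Collision between SOON and ARE:

SOON: (19 + 15 + 15 + 14) mod 13 = 63 mod 13 = 11

ARE: (1 + 18 + 5) mod 13 = 24 mod 13 = 11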

- Chaining:
- Store all elements that hash to the same slot in a linked list.
- Store a pointer to the head of the linked list in the hash table slot.

- Open Addressing:
- All elements stored in hash table itself.
- When collisions occur, use a systematic (consistent) procedure to store elements in free slots of the table.

[Figure: collision resolution by chaining. Each slot 0..m–1 of the table holds a pointer to a linked list of the keys that hash to it: h(k1) = h(k4), h(k2) = h(k5) = h(k6), h(k3) = h(k7), and k8 alone at h(k8).]

Dictionary Operations:

- Chained-Hash-Insert (T, x)
- Insert x at the head of list T[h(key[x])].
- Worst-case complexity – O(1).

- Chained-Hash-Delete (T, x)
- Delete x from the list T[h(key[x])].
- Worst-case complexity – proportional to the length of the list with singly linked lists (the predecessor of x must be found first); O(1) with doubly linked lists.

- Chained-Hash-Search (T, k)
- Search for an element with key k in list T[h(k)].
- Worst-case complexity – proportional to length of list.
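
The three chained operations can be sketched in Python as follows. This is an illustration under assumptions, not the lecture's implementation: each chain is a `collections.deque` standing in for a linked list, the table stores bare keys instead of elements x with a key field, and the names `ChainedHashTable` and `letter_sum` are hypothetical.

```python
from collections import deque


class ChainedHashTable:
    """Hash table with chaining; each slot holds a deque used as its chain."""

    def __init__(self, m, hash_fn):
        self.m = m
        self.hash_fn = hash_fn
        self.slots = [deque() for _ in range(m)]

    def _chain(self, key):
        return self.slots[self.hash_fn(key) % self.m]

    def insert(self, key):
        self._chain(key).appendleft(key)   # insert at the head: O(1)

    def search(self, key):
        return key in self._chain(key)     # scans only this slot's chain

    def delete(self, key):
        chain = self._chain(key)
        if key in chain:
            chain.remove(key)              # proportional to chain length


def letter_sum(word):
    # Assumed hash: sum of alphabet positions (A=1, ..., Z=26).
    return sum(ord(ch) - ord("A") + 1 for ch in word.upper())


table = ChainedHashTable(13, letter_sum)
for word in ["A", "FOOL", "AND", "HIS", "MONEY", "ARE", "SOON", "PARTED"]:
    table.insert(word)
```

ARE and SOON both hash to slot 11, so they share one chain, with the later insertion (SOON) at its head.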

- Load factor α = n/m = average number of keys per slot.
- m – number of slots.
- n – number of elements stored in the hash table.

- Worst-case complexity: Θ(n) + time to compute h(k).
- Average depends on how h distributes keys among m slots.
- Assume
- Simple uniform hashing.
- Any key is equally likely to hash into any of the m slots, independent of where any other key hashes to.

- O(1) time to compute h(k).

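- Simple uniform hashing.
- Time to search for an element with key k is Θ(|T[h(k)]|), the length of the list in slot h(k).
- Expected length of a linked list = load factor α = n/m, because each of the n keys lands in a given slot with probability 1/m.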
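
As a small numeric check, the Python sketch below computes the load factor and the chain lengths for the eight-word example, assuming the letter-sum hash mod 13 (the helper name `letter_sum_hash` is illustrative). Note that the average chain length over all slots equals n/m for any hash function; simple uniform hashing is what makes n/m the expected length of each individual chain.

```python
from collections import Counter


def letter_sum_hash(word, m):
    """Assumed hash: sum of alphabet positions (A=1, ..., Z=26), mod m."""
    return sum(ord(ch) - ord("A") + 1 for ch in word.upper()) % m


words = ["A", "FOOL", "AND", "HIS", "MONEY", "ARE", "SOON", "PARTED"]
m = 13
n = len(words)

# chain_lengths[s] is the number of keys hashing to slot s (0 if none).
chain_lengths = Counter(letter_sum_hash(w, m) for w in words)

load_factor = n / m
average_chain = sum(chain_lengths[s] for s in range(m)) / m
longest_chain = max(chain_lengths.values())
```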

- Another approach for collision resolution.
- All elements are stored in the hash table itself (so no pointers involved as in chaining).
- To insert: if slot is full, try another slot, and another, until an open slot is found (probing)
- To search, follow same sequence of probes as would be used when inserting the element


- The key is first mapped to a slot: i_0 = h(k).
- If there is a collision, subsequent probes are performed: i_(j+1) = (i_j + c) mod m, for j ≥ 0.
- If the offset constant c and m are not relatively prime, not all cells will be examined. Example:
- Consider m = 4 and c = 2: only every other slot is checked.
- When c = 1, collision resolution is a linear search through the table. This is known as linear probing.
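
The effect of the offset constant can be checked with a few lines of Python. The sketch assumes each probe steps c slots forward modulo m from the previous one; the function name `slots_visited` is illustrative.

```python
from math import gcd


def slots_visited(home, c, m):
    """Distinct slots reached from `home` when each probe steps c slots (mod m)."""
    return sorted({(home + j * c) % m for j in range(m)})


half = slots_visited(home=1, c=2, m=4)   # gcd(2, 4) = 2: only half the table
full = slots_visited(home=1, c=1, m=4)   # c = 1 is linear probing
reachable = 4 // gcd(2, 4)               # number of slots reachable with c = 2
```

Choosing c relatively prime to m (for example, any c with a prime m) lets the probe sequence reach every slot.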

- Linear probing: Given an auxiliary hash function h, the probe sequence starts at slot h(k) and continues sequentially through the table, wrapping from slot m − 1 to slot 0. Given key k and probe number i (0 ≤ i < m),
h(k, i) = (h(k) + i) mod m.

- Quadratic probing: As in linear probing, the probe sequence starts at h(k). Unlike linear probing, it examines cells 1, 4, 9, and so on, away from the original probe point (the case c1 = 0, c2 = 1 of the general formula):
h(k, i) = (h(k) + c1·i + c2·i²) mod m
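
The two probe formulas can be compared side by side in Python. In this sketch the home slot h(k) is passed in directly as `home`, and the table size m = 11 and the constants c1 = 0, c2 = 1 are arbitrary choices for illustration.

```python
def linear_probe(home, i, m):
    """h(k, i) = (h(k) + i) mod m, with home = h(k)."""
    return (home + i) % m


def quadratic_probe(home, i, m, c1=0, c2=1):
    """h(k, i) = (h(k) + c1*i + c2*i^2) mod m, with home = h(k)."""
    return (home + c1 * i + c2 * i * i) % m


m = 11
linear = [linear_probe(3, i, m) for i in range(5)]
quadratic = [quadratic_probe(3, i, m) for i in range(5)]
```

Starting from home slot 3, linear probing visits 3, 4, 5, 6, 7, while quadratic probing visits 3, 4, 7, 1, 8: the offsets 0, 1, 4, 9, 16 reduced mod 11.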

- Even with a good hash function, linear probing has its problems:
- The position of the initial mapping i_0 of key k is called the home position of k.
- When several insertions map to the same home position, they end up placed contiguously in the table. This collection of keys with the same home position is called a cluster.
- As clusters grow, the probability that a key will map to the middle of a cluster increases, increasing the rate of the cluster’s growth. This tendency of linear probing to place items together is known as primary clustering.
- As these clusters grow, they merge with other clusters forming even bigger clusters which grow even faster.
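
Primary clustering is easy to reproduce. The Python sketch below assumes integer keys, the auxiliary hash h(k) = k mod m, and a table of 10 slots; all of these are illustrative choices, as is the function name `insert_linear`.

```python
def insert_linear(table, key):
    """Insert key with linear probing, h(k) = k mod m; return the slot used."""
    m = len(table)
    for i in range(m):
        j = (key % m + i) % m
        if table[j] is None:
            table[j] = key
            return j
    raise OverflowError("hash table overflow")


table = [None] * 10
# 10, 20, 30 share home position 0; 11 and 21 share home position 1.
placed = [insert_linear(table, k) for k in (10, 20, 30, 11, 21)]
```

Keys 11 and 21 have home position 1, but the cluster that grew from home position 0 already covers slots 0 to 2, so they are pushed to slots 3 and 4 and the two groups merge into one run of five occupied slots.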

Quadratic probing solves the primary clustering problem, but it has a secondary clustering problem: elements that hash to the same position probe the same alternative cells. Secondary clustering is a minor theoretical blemish.

HASH_INSERT(T, k)
    i ← 0
    repeat
        j ← h(k, i)
        if T[j] = NIL
            then T[j] ← k
                 return j
            else i ← i + 1
    until i = m
    error “hash table overflow”

HASH_SEARCH(T, k)
    i ← 0
    repeat
        j ← h(k, i)
        if T[j] = k
            then return j
        i ← i + 1
    until T[j] = NIL or i = m
    return NIL
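
A Python rendering of HASH_INSERT and HASH_SEARCH might look like the sketch below. It makes several assumptions: `None` plays the role of NIL, the probe function h(k, i) is passed in as a parameter named `probe`, and the helper `linear` (linear probing on integer keys with h(k) = k mod m) is only one possible choice.

```python
def hash_insert(table, key, probe):
    """Place key in the first empty slot of its probe sequence; return the slot."""
    m = len(table)
    for i in range(m):
        j = probe(key, i, m)
        if table[j] is None:
            table[j] = key
            return j
    raise OverflowError("hash table overflow")


def hash_search(table, key, probe):
    """Return the slot holding key, or None if the key is absent."""
    m = len(table)
    for i in range(m):
        j = probe(key, i, m)
        if table[j] is None:      # an empty slot ends the probe sequence
            return None
        if table[j] == key:
            return j
    return None


def linear(key, i, m):
    return (key % m + i) % m


table = [None] * 7
for key in (10, 24, 17):          # all three have home slot 3 when m = 7
    hash_insert(table, key, linear)
```

Search follows the same probe sequence as insertion, so it can stop at the first empty slot: had the key been inserted, it would have been placed there or earlier.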

- Worst case for inserting a key is Θ(n).
- Worst case for searching is Θ(n).
- Algorithm assumes that keys are not deleted once they are inserted
- Deleting a key from an open-addressing table is difficult: simply emptying the slot would cut off the probe sequences of keys stored beyond it. Instead, mark the slot as removed, which introduces a third class of entries: full, empty, and removed.
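
One way to realize the "removed" marker is a sentinel value, sketched below in Python. The details are assumptions made for illustration: linear probing on integer keys with h(k) = k mod m, `None` for empty slots, and a unique object `REMOVED` for deleted ones.

```python
EMPTY = None
REMOVED = object()   # sentinel marking a deleted entry (the "removed" state)


def probe(key, i, m):
    return (key % m + i) % m       # linear probing, assumed for illustration


def insert(table, key):
    m = len(table)
    for i in range(m):
        j = probe(key, i, m)
        if table[j] is EMPTY or table[j] is REMOVED:   # removed slots are reusable
            table[j] = key
            return j
    raise OverflowError("hash table overflow")


def search(table, key):
    m = len(table)
    for i in range(m):
        j = probe(key, i, m)
        if table[j] is EMPTY:      # only a truly empty slot stops the search
            return None
        if table[j] is not REMOVED and table[j] == key:
            return j
    return None


def delete(table, key):
    j = search(table, key)
    if j is not None:
        table[j] = REMOVED         # keep later keys in the sequence reachable
    return j


table = [EMPTY] * 7
for key in (10, 24, 17):           # all three have home slot 3 when m = 7
    insert(table, key)
delete(table, 24)
```

After deleting 24 from slot 4, a search for 17 skips the removed marker and still finds the key in slot 5; a later insertion with home slot 3 can reuse slot 4.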