1 / 25

# Hashing - PowerPoint PPT Presentation

Hashing. Motivating Applications. Large collection of datasets Datasets are dynamic (insert, delete) Goal: efficient searching/insertion/deletion Hashing is ONLY applicable for exact-match searching. Direct Address Tables. If the keys domain is U  Create an array T of size U

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about ' Hashing' - zoltin

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

### Hashing

• Large collection of datasets

• Datasets are dynamic (insert, delete)

• Goal: efficient searching/insertion/deletion

• Hashing is ONLY applicable for exact-match searching

• If the keys domain is U  Create an array T of size U

• For each key K  add the object to T[K]

• Supports insertion/deletion/searching in O(1)

return T[k]

T[key[x]] ← x

T[key[x]] ← NIL

• Running time for these operations: O(1)

Solution is to use hashing tables

Drawbacks

>> If U is large, e.g., the domain of integers, then T is large (sometimes infeasible)

>> Limited to integer values and does not support duplication

Example 1:

Example 2:

U is the domain

K is the actual number of keys

• A data structure that maps values from a certain domain or range to another domain or range

Hash function

3

15

Domain: String values

20

55

Domain: Integer values

• A data structure that maps values from a certain domain or range to another domain or range

Hash function

Range

0

…..

10000

Student IDs

950000

…..

960000

Domain: numbers [0 … 10,000]

Domain: numbers [950,000 … 960,000]

• When K is much smaller than U, a hash tablerequires much less space than a direct-address table

• Can reduce storage requirements to |K|

• Can still get O(1) search time, but on the average case, not the worst case

• Use a hash function h to compute the slot for each key k

• Store the element in slot h(k)

• Maintain a hash table of size m  T [0…m-1]

• A hash function h transforms a key into an index in a hash table T[0…m-1]:

h : U → {0, 1, . . . , m - 1}

• We say that k hashes to slot h(k)

Hash Table (of size m)

0

U

(universe of keys)

h(k1)

h(k4)

k1

K

(actual

keys)

h(k2) = h(k5)

k4

k2

k3

k5

h(k3)

m - 1

>> m is much smaller that U (m <<U)

>> m can be even smaller than |K|

• Back to the example of 100 students, each with 9-digit SSN

• All what we need is a hash table of size 100

0

U

(universe of keys)

h(k1)

h(k4)

k1

K

(actual

keys)

h(k2) = h(k5)

k4

k2

Collisions!

k3

k5

h(k3)

m - 1

• Collision means two or more keys will go to the same slot

• Many ways to handle it

• Chaining

• Linear probing

• Double hashing

• Put all elements that hash to the same slot into a linked list (Chain)

• Slot j contains a pointer to the head of the list of all elements that hash to j

• Choosing the size of the hash table

• Small enough not to waste space

• Large enough such that lists remain short

• Typically 10% -20% of the total number of elements

• How should we keep the lists: ordered or not?

• Usually each list is unsorted linked list

Alg.:CHAINED-HASH-INSERT(T, x)

insert x at the head of list T[h(key[x])]

• Worst-case running time is O(1)

• May or may not allow duplication based on the application

Alg.:CHAINED-HASH-DELETE(T, x)

delete x from the list T[h(key[x])]

• Need to find the element to be deleted.

• Worst-case running time:

• Deletion depends on searching the corresponding list

Alg.:CHAINED-HASH-SEARCH(T, k)

search for an element with key k in list T[h(k)]

• Running time is proportional to the length of the list of elements in slot h(k)

What is the worst case and average case??

0

chain

m - 1

Analysis of Hashing with Chaining:Worst Case

• All keys will go to only one chain

• Chain size is O(n)

• Searching is O(n) + time to apply h(k)

0

chain

chain

chain

chain

m - 1

Analysis of Hashing with Chaining:Average Case

• With good hash function and uniform distribution of keys

• Any given element is equally likely to hash into any of the m slots

• All chain will have similar sizes

• Assume n (total # of keys), m is the hash table size

• Average chain size  O (n/m)

Average Search Time O(n/m): The common case

Analysis of Hashing with Chaining:Average Case

• If m (# of slots) is proportional to n (# of keys):

• m = O(n)

• n/m = O(1)

 Searching takes constant time on average

### Hash Functions

• A hash function transforms a key (k) into a table address (0…m-1)

• What makes a good hash function?

(1) Easy to compute

(2) Approximates a random function: for every input, every output is equally likely (simple uniform hashing)

(3) Reduces the number of collisions

• Goal: Map a key k into one of the m slots in the hash table

• Make table size (m) a prime number

• Avoids even and power-of-2 numbers

• Common function

h(k) = F(k) mod m

• Some function or operation on K (usually generates an integer)

The output of the “mod” is number [0…m-1]

Collection of images

F(k): Sum of the pixels colors

h(k) = F(k) mod m

Collection of strings

F(k): Sum of the ascii values

h(k) = F(k) mod m

Collection of numbers

F(k): just return k

h(k) = F(k) mod m