Hashing
This presentation is the property of its rightful owner.
Sponsored Links
1 / 25

Hashing PowerPoint PPT Presentation


  • 100 Views
  • Uploaded on
  • Presentation posted in: General

Hashing. Motivating Applications. Large collection of datasets Datasets are dynamic (insert, delete) Goal: efficient searching/insertion/deletion Hashing is ONLY applicable for exact-match searching. Direct Address Tables. If the keys domain is U  Create an array T of size U

Download Presentation

Hashing

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Hashing

Hashing


Motivating applications

Motivating Applications

  • Large collection of datasets

  • Datasets are dynamic (insert, delete)

  • Goal: efficient searching/insertion/deletion

  • Hashing is ONLY applicable for exact-match searching


Direct address tables

Direct Address Tables

  • If the keys domain is U  Create an array T of size U

  • For each key K  add the object to T[K]

  • Supports insertion/deletion/searching in O(1)


Direct address tables1

Direct Address Tables

Alg.: DIRECT-ADDRESS-SEARCH(T, k)

return T[k]

Alg.: DIRECT-ADDRESS-INSERT(T, x)

T[key[x]] ← x

Alg.: DIRECT-ADDRESS-DELETE(T, x)

T[key[x]] ← NIL

  • Running time for these operations: O(1)

Solution is to use hashing tables

Drawbacks

>> If U is large, e.g., the domain of integers, then T is large (sometimes infeasible)

>> Limited to integer values and does not support duplication


Direct access tables example

Direct Access Tables: Example

Example 1:

Example 2:

U is the domain

K is the actual number of keys


Hashing1

Hashing

  • A data structure that maps values from a certain domain or range to another domain or range

Hash function

3

15

Domain: String values

20

55

Domain: Integer values


Hashing2

Hashing

  • A data structure that maps values from a certain domain or range to another domain or range

Hash function

Range

0

…..

10000

Student IDs

950000

…..

960000

Domain: numbers [0 … 10,000]

Domain: numbers [950,000 … 960,000]


Hash tables

Hash Tables

  • When K is much smaller than U, a hash tablerequires much less space than a direct-address table

    • Can reduce storage requirements to |K|

    • Can still get O(1) search time, but on the average case, not the worst case


Hash tables main idea

Hash Tables: Main Idea

  • Use a hash function h to compute the slot for each key k

  • Store the element in slot h(k)

  • Maintain a hash table of size m  T [0…m-1]

  • A hash function h transforms a key into an index in a hash table T[0…m-1]:

    h : U → {0, 1, . . . , m - 1}

  • We say that k hashes to slot h(k)


Hash tables main idea1

Hash Tables: Main Idea

Hash Table (of size m)

0

U

(universe of keys)

h(k1)

h(k4)

k1

K

(actual

keys)

h(k2) = h(k5)

k4

k2

k3

k5

h(k3)

m - 1

>> m is much smaller that U (m <<U)

>> m can be even smaller than |K|


Example

Example

  • Back to the example of 100 students, each with 9-digit SSN

  • All what we need is a hash table of size 100


What about collisions

What About Collisions

0

U

(universe of keys)

h(k1)

h(k4)

k1

K

(actual

keys)

h(k2) = h(k5)

k4

k2

Collisions!

k3

k5

h(k3)

m - 1

  • Collision means two or more keys will go to the same slot


Handling collisions

Handling Collisions

  • Many ways to handle it

    • Chaining

    • Open addressing

      • Linear probing

      • Quadratic probing

      • Double hashing


Chaining main idea

Chaining: Main Idea

  • Put all elements that hash to the same slot into a linked list (Chain)

    • Slot j contains a pointer to the head of the list of all elements that hash to j


Chaining discussion

Chaining - Discussion

  • Choosing the size of the hash table

    • Small enough not to waste space

    • Large enough such that lists remain short

    • Typically 10% -20% of the total number of elements

  • How should we keep the lists: ordered or not?

    • Usually each list is unsorted linked list


Insertion in hash tables

Insertion in Hash Tables

Alg.:CHAINED-HASH-INSERT(T, x)

insert x at the head of list T[h(key[x])]

  • Worst-case running time is O(1)

  • May or may not allow duplication based on the application


Deletion in hash tables

Deletion in Hash Tables

Alg.:CHAINED-HASH-DELETE(T, x)

delete x from the list T[h(key[x])]

  • Need to find the element to be deleted.

  • Worst-case running time:

    • Deletion depends on searching the corresponding list


Searching in hash tables

Searching in Hash Tables

Alg.:CHAINED-HASH-SEARCH(T, k)

search for an element with key k in list T[h(k)]

  • Running time is proportional to the length of the list of elements in slot h(k)

What is the worst case and average case??


Analysis of hashing with chaining worst case

T

0

chain

m - 1

Analysis of Hashing with Chaining:Worst Case

  • All keys will go to only one chain

  • Chain size is O(n)

  • Searching is O(n) + time to apply h(k)


Analysis of hashing with chaining average case

T

0

chain

chain

chain

chain

m - 1

Analysis of Hashing with Chaining:Average Case

  • With good hash function and uniform distribution of keys

    • Any given element is equally likely to hash into any of the m slots

  • All chain will have similar sizes

  • Assume n (total # of keys), m is the hash table size

    • Average chain size  O (n/m)

Average Search Time O(n/m): The common case


Analysis of hashing with chaining average case1

Analysis of Hashing with Chaining:Average Case

  • If m (# of slots) is proportional to n (# of keys):

    • m = O(n)

    • n/m = O(1)

       Searching takes constant time on average


Hash functions

Hash Functions


Hash functions1

Hash Functions

  • A hash function transforms a key (k) into a table address (0…m-1)

  • What makes a good hash function?

    (1) Easy to compute

    (2) Approximates a random function: for every input, every output is equally likely (simple uniform hashing)

    (3) Reduces the number of collisions


Hash functions2

Hash Functions

  • Goal: Map a key k into one of the m slots in the hash table

  • Make table size (m) a prime number

    • Avoids even and power-of-2 numbers

  • Common function

    h(k) = F(k) mod m

  • Some function or operation on K (usually generates an integer)

    The output of the “mod” is number [0…m-1]


    Examples of hash functions

    Examples of Hash Functions

    Collection of images

    F(k): Sum of the pixels colors

    h(k) = F(k) mod m

    Collection of strings

    F(k): Sum of the ascii values

    h(k) = F(k) mod m

    Collection of numbers

    F(k): just return k

    h(k) = F(k) mod m


  • Login