
Hashing - PowerPoint PPT Presentation


PowerPoint Slideshow about 'Hashing' - zoltin


Motivating Applications
  • Large collections of data
  • Datasets are dynamic (insertions and deletions)
  • Goal: efficient searching, insertion, and deletion
  • Hashing is ONLY applicable to exact-match searching (not, e.g., range queries)
Direct Address Tables
  • If the key domain is U → create an array T of size |U|
  • For each key K → add the object to T[K]
  • Supports insertion/deletion/searching in O(1)
Direct Address Tables

Alg.: DIRECT-ADDRESS-SEARCH(T, k)

return T[k]

Alg.: DIRECT-ADDRESS-INSERT(T, x)

T[key[x]] ← x

Alg.: DIRECT-ADDRESS-DELETE(T, x)

T[key[x]] ← NIL

  • Running time for these operations: O(1)
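The three pseudocode operations map almost line for line onto Python. This is a minimal sketch that assumes integer keys in range(universe_size) and objects with a `key` attribute. The names `Record` and `DirectAddressTable` are hypothetical and chosen only for illustration. Each operation is a single array access, which is where the O(1) bound comes from.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Record:
    # Hypothetical record type: any object with an integer `key` works.
    key: int
    value: Any


class DirectAddressTable:
    """Direct-address table T[0 ... |U|-1]; slot k holds the record with key k."""

    def __init__(self, universe_size: int) -> None:
        # One slot per possible key, so memory grows with |U|, not with |K|.
        self.slots: List[Optional[Record]] = [None] * universe_size

    def search(self, k: int) -> Optional[Record]:
        # DIRECT-ADDRESS-SEARCH(T, k): return T[k]
        return self.slots[k]

    def insert(self, x: Record) -> None:
        # DIRECT-ADDRESS-INSERT(T, x): T[key[x]] <- x
        # A second record with the same key overwrites the first (no duplicates).
        self.slots[x.key] = x

    def delete(self, x: Record) -> None:
        # DIRECT-ADDRESS-DELETE(T, x): T[key[x]] <- NIL
        self.slots[x.key] = None


table = DirectAddressTable(universe_size=10)
table.insert(Record(3, "alice"))
print(table.search(3))  # Record(key=3, value='alice')
table.delete(Record(3, "alice"))
print(table.search(3))  # None
```

Note that the overwrite in `insert` reflects the "no duplication" drawback of direct addressing.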

Drawbacks

  • If U is large (e.g., the domain of all integers), T is large and sometimes infeasible to allocate
  • Keys are limited to integer values, and duplicate keys are not supported

Solution: use hash tables

Direct Address Tables: Example

[Figure: Example 1 and Example 2 of direct-address tables]

  • U is the key domain
  • K is the set of keys actually stored

Hashing
  • Maps values from one domain (or range) to another domain (or range) by means of a hash function

[Figure: hash function examples for a domain of string values and a domain of integer values]


[Figure: a hash function maps student IDs in the domain [950,000 … 960,000] to the range [0 … 10,000]]
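One simple function that realizes the mapping in the figure is subtracting the smallest ID. This sketch assumes IDs are integers in [950,000 … 960,000]. The slides do not say which function the figure uses, and the names `ID_MIN`, `ID_MAX`, and `id_to_slot` are hypothetical.

```python
ID_MIN = 950_000  # smallest student ID in the domain (from the figure)
ID_MAX = 960_000  # largest student ID in the domain (from the figure)


def id_to_slot(student_id: int) -> int:
    """Map a student ID in [ID_MIN, ID_MAX] to a slot in [0, ID_MAX - ID_MIN]."""
    if not ID_MIN <= student_id <= ID_MAX:
        raise ValueError(f"student ID {student_id} is outside the domain")
    # Shifting the range keeps distinct IDs in distinct slots (no collisions),
    # because the domain is a dense block of 10,001 consecutive integers.
    return student_id - ID_MIN


print(id_to_slot(950_000))  # 0
print(id_to_slot(955_123))  # 5123
print(id_to_slot(960_000))  # 10000
```

This mapping is one-to-one, so it amounts to a direct-address table over a shifted range. General hash functions give up that guarantee so they can map a much larger domain into a small table.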

Hash Tables
  • When |K| is much smaller than |U|, a hash table requires much less space than a direct-address table
    • Can reduce storage requirements to Θ(|K|)
    • Can still get O(1) search time, but in the average case, not the worst case
Hash Tables: Main Idea
  • Use a hash function h to compute the slot for each key k
  • Store the element in slot h(k)
  • Maintain a hash table of size m: T[0…m-1]
  • A hash function h transforms a key into an index in a hash table T[0…m-1]:

h : U → {0, 1, . . . , m - 1}

  • We say that k hashes to slot h(k)
Hash Tables: Main Idea

[Figure: keys k1 … k5 from the actual key set K ⊆ U (universe of keys) are hashed into a table T[0…m-1] of size m; h(k2) = h(k5)]

  • m is much smaller than |U| (m << |U|)
  • m can be even smaller than |K|

Example
  • Back to the example of 100 students, each with a 9-digit SSN
  • All we need is a hash table of size 100
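For instance, a table of size 100 could use the last two digits of the SSN as the slot, i.e., h(k) = k mod 100. This hash function is an assumption for illustration, since the slide does not name one. The SSNs below are made-up values. The sketch also shows that two different students can land in the same slot.

```python
TABLE_SIZE = 100  # m: one slot per student on average


def h(ssn: int) -> int:
    """Assumed hash function: the last two digits of a 9-digit SSN."""
    return ssn % TABLE_SIZE


# Made-up 9-digit SSNs for illustration only.
ssns = [123456789, 987654321, 555001289]
for ssn in ssns:
    print(ssn, "->", h(ssn))
# 123456789 -> 89
# 987654321 -> 21
# 555001289 -> 89   (same slot as 123456789)
```

A table with 100 slots replaces a direct-address table with up to 10^9 entries. The price is that different keys may share a slot.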
What About Collisions?

[Figure: keys from K ⊆ U hashed into T[0…m-1]; k2 and k5 collide because h(k2) = h(k5)]

  • A collision occurs when two or more keys hash to the same slot
Handling Collisions
  • Many ways to handle it
    • Chaining
    • Open addressing
      • Linear probing
      • Quadratic probing
      • Double hashing
Chaining: Main Idea
  • Put all elements that hash to the same slot into a linked list (Chain)
    • Slot j contains a pointer to the head of the list of all elements that hash to j
Chaining - Discussion
  • Choosing the size of the hash table
    • Small enough not to waste space
    • Large enough such that lists remain short
    • Typically 10% to 20% of the total number of elements
  • Should the lists be kept ordered or unordered?
    • Usually each list is an unsorted linked list
Insertion in Hash Tables

Alg.: CHAINED-HASH-INSERT(T, x)

insert x at the head of list T[h(key[x])]

  • Worst-case running time is O(1)
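  • Duplicate keys may or may not be allowed, depending on the application (rejecting duplicates requires searching the list first, so insertion is then no longer O(1))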
Deletion in Hash Tables

Alg.: CHAINED-HASH-DELETE(T, x)

delete x from the list T[h(key[x])]

  • The element to be deleted must be found first
  • Worst-case running time:
    • Proportional to the length of list T[h(key[x])], since deletion depends on searching that list
Searching in Hash Tables

Alg.: CHAINED-HASH-SEARCH(T, k)

search for an element with key k in list T[h(k)]

  • Running time is proportional to the length of the list of elements in slot h(k)
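The three chained-hash operations can be sketched together in Python. This minimal sketch assumes integer keys and h(k) = k mod m, and the names `Node` and `ChainedHashTable` are hypothetical. Each slot holds the head of a singly linked list, and insertion adds at the head as in CHAINED-HASH-INSERT. Search and delete both walk the chain in slot h(k), so their cost is proportional to that chain's length. Unlike the slides, `delete` here takes a key rather than the element x.

```python
from typing import Any, List, Optional


class Node:
    """One element of a chain: a key, its value, and a link to the next node."""

    def __init__(self, key: int, value: Any, next_node: "Optional[Node]" = None) -> None:
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    def __init__(self, m: int) -> None:
        self.m = m
        # table[j] is the head of the chain of all elements that hash to slot j.
        self.table: List[Optional[Node]] = [None] * m

    def _h(self, key: int) -> int:
        # Assumed hash function for this sketch: h(k) = k mod m.
        return key % self.m

    def insert(self, key: int, value: Any) -> None:
        # CHAINED-HASH-INSERT: O(1), no duplicate check (duplicates allowed).
        j = self._h(key)
        self.table[j] = Node(key, value, self.table[j])

    def search(self, key: int) -> Optional[Node]:
        # CHAINED-HASH-SEARCH: walk the chain in slot h(k).
        node = self.table[self._h(key)]
        while node is not None and node.key != key:
            node = node.next
        return node

    def delete(self, key: int) -> bool:
        # CHAINED-HASH-DELETE: find the first matching node, then unlink it.
        j = self._h(key)
        prev, node = None, self.table[j]
        while node is not None and node.key != key:
            prev, node = node, node.next
        if node is None:
            return False  # key not present
        if prev is None:
            self.table[j] = node.next  # removing the head of the chain
        else:
            prev.next = node.next
        return True


t = ChainedHashTable(m=7)
for k, v in [(10, "a"), (17, "b"), (3, "c")]:  # 10, 17, and 3 all hash to slot 3
    t.insert(k, v)
print(t.search(17).value)  # b
print(t.delete(10))        # True
print(t.search(10))        # None
```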

What are the worst-case and average-case running times?

Analysis of Hashing with Chaining: Worst Case

[Figure: table T[0…m-1] in which all keys sit in a single chain]
  • All keys hash to the same slot, so they form a single chain
  • The chain has length n
  • Searching takes O(n) plus the time to compute h(k)
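The worst case is easy to reproduce with a deliberately bad hash function. In this sketch the constant function `bad_h` and the helper `build_chains` are hypothetical. `bad_h` exists only to force every key into slot 0, and chains are modeled as plain Python lists for brevity.

```python
from typing import Callable, Iterable, List


def build_chains(keys: Iterable[int], m: int, h: Callable[[int], int]) -> List[List[int]]:
    """Distribute keys into m chains (modeled as Python lists) using hash function h."""
    chains: List[List[int]] = [[] for _ in range(m)]
    for k in keys:
        chains[h(k)].append(k)
    return chains


def bad_h(k: int) -> int:
    # Hypothetical degenerate hash function: every key goes to slot 0.
    return 0


n, m = 1000, 100
chains = build_chains(range(n), m, bad_h)
print(len(chains[0]))               # 1000: one chain holds all n keys
print(sum(1 for c in chains if c))  # 1: only one slot is ever used
# An unsuccessful search must now scan all n keys in chains[0].
```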
Analysis of Hashing with Chaining: Average Case

[Figure: table T[0…m-1] with keys spread across many short chains]
  • With a good hash function and a uniform distribution of keys
    • Any given element is equally likely to hash into any of the m slots
  • All chains will have similar lengths
  • Let n be the total number of keys and m the hash table size
    • Average chain length: n/m

Average search time: O(1 + n/m), the common case (the 1 covers computing h(k))
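When a hash function spreads keys evenly, every chain ends up close to n/m long. The sketch below assumes the keys are consecutive integers and h(k) = k mod m. This is an idealized case in which the spread is perfectly even; real key sets only approximate it. The function name `chain_lengths` is hypothetical.

```python
from typing import List


def chain_lengths(n: int, m: int) -> List[int]:
    """Hash keys 0..n-1 with h(k) = k mod m and return each chain's length."""
    lengths = [0] * m
    for k in range(n):
        lengths[k % m] += 1
    return lengths


n, m = 1000, 100
lengths = chain_lengths(n, m)
print(n / m)                       # 10.0: the average chain length n/m
print(min(lengths), max(lengths))  # 10 10: every chain holds exactly n/m keys
```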

Analysis of Hashing with Chaining: Average Case
  • If m (# of slots) is proportional to n (# of keys):
    • m = Θ(n)
    • n/m = O(1)

⇒ Searching takes constant time on average

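The same argument can be written with the load factor α = n/m, a standard name that the slides do not use. Under simple uniform hashing, a search with chaining takes expected time Θ(1 + α). The 1 covers computing h(k), and α is the expected chain length. For example, n = 1,000 keys in m = 500 slots gives α = 2.

```latex
\begin{aligned}
\alpha &= \frac{n}{m}, \qquad m = \Theta(n) \;\Longrightarrow\; \alpha = O(1) \\
\mathbb{E}\left[T_{\text{search}}\right] &= \Theta(1 + \alpha) = \Theta(1)
\end{aligned}
```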
Hash Functions
  • A hash function transforms a key (k) into a table address (0…m-1)
  • What makes a good hash function?

(1) Easy to compute

(2) Approximates a random function: for every input, every output is equally likely (simple uniform hashing)

(3) Reduces the number of collisions

Hash Functions
  • Goal: map a key k into one of the m slots in the hash table
  • Make the table size (m) a prime number
    • Avoid even numbers and powers of 2
  • Common function: h(k) = F(k) mod m
    • F(k): some function or operation on k (usually producing an integer)
    • The output of "mod m" is a number in [0…m-1]
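A quick experiment shows why a power-of-2 table size can be risky. The sketch assumes keys that are all multiples of 4, a made-up but common pattern (e.g., aligned addresses). It compares m = 16 with the prime m = 17, using h(k) = k mod m and F(k) = k. The helper name `slots_used` is hypothetical.

```python
keys = [4 * i for i in range(16)]  # 0, 4, 8, ..., 60: all multiples of 4


def slots_used(keys, m):
    """Return how many distinct slots h(k) = k mod m occupies."""
    return len({k % m for k in keys})


print(slots_used(keys, 16))  # 4: only slots 0, 4, 8, 12 are ever hit
print(slots_used(keys, 17))  # 16: every key lands in its own slot
```

With m = 16, k mod 16 depends only on the lowest 4 bits of k, so keys that share the factor 4 crowd into a quarter of the slots. The prime 17 has no factor in common with the stride 4, so the same keys spread out.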

Examples of Hash Functions

  • Collection of images: F(k) = sum of the pixel colors; h(k) = F(k) mod m
  • Collection of strings: F(k) = sum of the ASCII values; h(k) = F(k) mod m
  • Collection of numbers: F(k) = k (just return k); h(k) = F(k) mod m
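The three example functions can be sketched as follows. These are illustrative assumptions: an image is modeled as a 2-D list of integer pixel values, strings use `ord` for the ASCII code, and the names `M`, `h_image`, `h_string`, and `h_number` are hypothetical. Summing character codes ignores order, so anagrams such as "abc" and "cab" always collide.

```python
from typing import List

M = 13  # assumed table size for this sketch (a prime)


def h_image(pixels: List[List[int]], m: int = M) -> int:
    # F(k): sum of the pixel colors
    return sum(sum(row) for row in pixels) % m


def h_string(s: str, m: int = M) -> int:
    # F(k): sum of the ASCII values of the characters
    return sum(ord(c) for c in s) % m


def h_number(k: int, m: int = M) -> int:
    # F(k): just return k
    return k % m


print(h_image([[10, 20], [30, 40]]))  # 9: 100 mod 13
print(h_string("abc"))                # 8: 294 mod 13
print(h_string("cab"))                # 8: anagrams always collide
print(h_number(42))                   # 3: 42 mod 13
```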
