Hashing

Hashing Chapter 10

The search time of each algorithm discussed so far depends on the number n of elements in the collection S of data.
• Hashing, or hash addressing, is a searching technique whose search time is essentially independent of n.
• Hashing uses a data structure called a hash table. Although hash tables provide fast insertion, deletion, and retrieval, operations that depend on the ordering of the keys, such as finding the minimum or maximum value, are not performed quickly.
• Hashing is also used in many encryption algorithms.

A hash table is a data structure in which keys are mapped to array positions by a hash function. An item can be found in O(1) time on average by using the hash function to form an address from the key.
• A hash function is a function that, when applied to a key, produces an integer that can be used as an address in a hash table.
• Perfect hash function
• Good hash function
• A collision occurs when two or more keys produce the same hash location, that is, when more than one element tries to occupy the same array position.
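To make the idea of a collision concrete, here is a minimal sketch. It assumes a simple remainder-based hash function (key % table_size), which the slides only introduce later; the function name find_collisions and the sample keys are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def find_collisions(keys, table_size):
    """Group keys by hash location and report the locations shared by 2+ keys.

    Assumption: the hash function is key % table_size.
    """
    slots = defaultdict(list)
    for key in keys:
        slots[key % table_size].append(key)
    # Keep only the locations that more than one key hashes to.
    return {slot: ks for slot, ks in slots.items() if len(ks) > 1}


collisions = find_collisions([5, 14, 23, 7], 9)
print(collisions)  # {5: [5, 14, 23]}
```

Here 5, 14, and 23 all leave remainder 5 when divided by 9, so all three compete for the same array position, while 7 has a slot to itself.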

The searching methods discussed previously relied on comparing keys as their main operation.
• Hashing searches in a different way: it calculates the position of a key from the value of the key itself.
• The search time is thereby reduced to O(1), from O(n) or O(log n).
• We need a function h that transforms a key K (a string, number, record, etc.) into an index in a table used for storing items of the same type as K.
• This function is called a hash function.

Example: Suppose we want to store a sequence of randomly generated numbers, the keys 5, 17, 37, 20, 42, 3. The array A, the hash table in which we want to store the numbers, has nine fields:

index |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |
A     |    |    |    |    |    |    |    |    |    |

We need a way of mapping the numbers to the array indexes, a hash function, that will let us store the numbers and later recompute the index when we want to retrieve them. There is a natural choice for this.

Our hash table has 9 fields, and the mod function, which sends every integer to its remainder modulo 9, maps an integer to a number between 0 and 8:

5 mod 9 = 5
17 mod 9 = 8
37 mod 9 = 1
20 mod 9 = 2
42 mod 9 = 6
3 mod 9 = 3

We store the values:

index |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |
A     |    | 37 | 20 |  3 |    |  5 | 42 |    | 17 |

In this case, computing the hash value k mod 9 of a number k to be stored costs a constant amount of time. So does the actual storage, because k is written directly into an array field.
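The example above can be sketched in a few lines of Python. The names TABLE_SIZE, h, build_table, and contains are hypothetical, and the sketch assumes that no two keys collide (which holds for the six keys used here); it raises an error otherwise rather than resolving the collision.

```python
TABLE_SIZE = 9  # number of fields in the example table


def h(key):
    """Hash function from the example: remainder modulo the table size."""
    return key % TABLE_SIZE


def build_table(keys):
    """Store each key at index h(key); assumes the keys do not collide."""
    table = [None] * TABLE_SIZE
    for key in keys:
        slot = h(key)
        if table[slot] is not None:
            raise ValueError(f"collision at index {slot}")
        table[slot] = key
    return table


def contains(table, key):
    """Recompute the index and look at that single field: constant time."""
    return table[h(key)] == key


table = build_table([5, 17, 37, 20, 42, 3])
print(table)                # [None, 37, 20, 3, None, 5, 42, None, 17]
print(contains(table, 42))  # True
print(contains(table, 14))  # False (index 5 holds the key 5, not 14)
```

Both storing and retrieving touch exactly one array field, which is why neither cost grows with the number of stored keys.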

10.1 Hash Functions
• Division
  • A hash function must guarantee that the number it returns is a valid index to one of the table entries.
  • The simplest way to do this is division modulo the table size: h(K) = K mod TSize, where TSize = sizeof(table).
  • It is best if TSize is a prime number.
  • The division method is usually the preferred choice of hash function if very little is known about the keys.
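The following sketch illustrates the division method and hints at why a prime TSize is recommended. The function name division_hash and the sample keys are assumptions made for illustration; the keys were picked to share a common factor with the non-prime table size 10.

```python
def division_hash(key, tsize):
    """Division method: h(K) = K mod TSize, always a valid index 0..tsize-1."""
    return key % tsize


keys = [10, 20, 30, 40, 50]  # hypothetical keys, all multiples of 10

# With TSize = 10 every key lands in the same slot.
slots_10 = sorted({division_hash(k, 10) for k in keys})
# With the prime TSize = 11 the same keys spread over five slots.
slots_11 = sorted({division_hash(k, 11) for k in keys})

print(slots_10)  # [0]
print(slots_11)  # [6, 7, 8, 9, 10]
```

This is only one pattern of keys, but it shows how a table size that shares factors with the keys can funnel them into a few positions.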

10.1 Hash Functions
• Folding
  • The key is divided into several parts. These parts are combined, or folded together, and are usually transformed in a certain way to create the target address.
  • Folding is simple and fast, especially when bit patterns are used instead of numerical values.
  • There are two types of folding:
    • Shift folding
    • Boundary folding

Shift folding
• The key is divided into several parts, and these parts are then combined using a simple operation such as addition.
• Example: the social security number (SSN) 123-45-6789 can be divided into three parts, 123, 456, and 789, and these parts can be added. The resulting 1,368 can then be divided modulo TSize.

Boundary folding
• The key is seen as being written on a piece of paper that is folded on the borders between the different parts of the key, so every other part is put in reverse order.
• Example: take the SSN with its three parts, 123, 456, and 789. The first part is taken in the same order, the second part in reverse order, and the third part in the same order. The result is 123 + 654 + 789 = 1,566.
• This process is simple and fast, especially when bit patterns are used.
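Both folding variants can be sketched as follows. The helper names (split_digits, shift_fold, boundary_fold) are hypothetical, and the final table size of 1,000 is an assumption, since the slides do not fix TSize for this example.

```python
def split_digits(key, part_len):
    """Drop non-digit characters and cut the digits into fixed-length parts."""
    digits = "".join(ch for ch in key if ch.isdigit())
    return [digits[i:i + part_len] for i in range(0, len(digits), part_len)]


def shift_fold(key, part_len):
    """Shift folding: add the parts as they are."""
    return sum(int(part) for part in split_digits(key, part_len))


def boundary_fold(key, part_len):
    """Boundary folding: reverse every other part (2nd, 4th, ...) before adding."""
    parts = split_digits(key, part_len)
    return sum(
        int(part[::-1]) if i % 2 == 1 else int(part)
        for i, part in enumerate(parts)
    )


TSIZE = 1000  # assumed table size, for illustration only

ssn = "123-45-6789"
print(shift_fold(ssn, 3))             # 1368
print(boundary_fold(ssn, 3))          # 1566
print(shift_fold(ssn, 3) % TSIZE)     # 368
print(boundary_fold(ssn, 3) % TSIZE)  # 566
```

The sums reproduce the two worked examples; the last step reduces each sum modulo the assumed table size to obtain a valid address.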

10.1 Hash Functions (cont'd)
• Mid-square function
  • The key is squared and the middle part of the result is used as the address.
  • Example: if the key is 3,121, then 3,121^2 = 9,740,641, and for a 1,000-cell table, h(3,121) = 406.
• Extraction
  • Only a part of the key is used to compute the address.
  • Example: for the SSN 123-45-6789, this method might use the first four digits (1234), the last four digits (6789), or the first two digits combined with the last two (1289).
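A minimal sketch of both methods, working on decimal digits as in the examples above. The function names mid_square and extract are hypothetical, and the rule for picking the "middle" when it cannot be centered exactly (lean toward the left) is an assumption that the slides do not specify.

```python
def mid_square(key, num_digits):
    """Square the key and return the middle num_digits decimal digits.

    Assumption: if the middle cannot be centered exactly, lean to the left.
    """
    squared = str(key * key)
    start = (len(squared) - num_digits) // 2
    return int(squared[start:start + num_digits])


def extract(key, positions):
    """Build an address from the digits at the given 0-based positions."""
    digits = "".join(ch for ch in key if ch.isdigit())
    return int("".join(digits[i] for i in positions))


print(mid_square(3121, 3))                    # 406
print(extract("123-45-6789", [0, 1, 2, 3]))   # 1234
print(extract("123-45-6789", [5, 6, 7, 8]))   # 6789
print(extract("123-45-6789", [0, 1, 7, 8]))   # 1289
```

Three middle digits give addresses from 0 to 999, which matches the 1,000-cell table in the mid-square example.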

10.1 Hash Functions (cont'd)
• Radix transformation
  • The key K is transformed into another number base; that is, K is expressed in a numerical system using a different radix.
  • Collisions cannot be avoided.
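The slides give no worked example for radix transformation, so the following is only a sketch under stated assumptions: the key is rewritten in base 9, the resulting digit string is read back as a decimal number, and that number is reduced modulo an assumed TSize of 100. The function name radix_transform, the radix, the table size, and the sample keys are all hypothetical.

```python
def radix_transform(key, radix, tsize):
    """Express key in the given radix, read the digits as a decimal number,
    then reduce modulo tsize (the final reduction is an assumption)."""
    if key < 0 or not 2 <= radix <= 10:
        raise ValueError("expects a non-negative key and a radix in 2..10")
    digits = []
    n = key
    while n > 0:
        digits.append(str(n % radix))  # least significant digit first
        n //= radix
    transformed = int("".join(reversed(digits)) or "0")
    return transformed % tsize


# 345 in base 9 is 423 (4*81 + 2*9 + 3), so the address is 423 mod 100.
print(radix_transform(345, 9, 100))  # 23
# 264 in base 9 is 323, which yields the same address: a collision.
print(radix_transform(264, 9, 100))  # 23
```

The two keys differ, yet they end up at the same address, which illustrates the point that this method does not avoid collisions.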
