
Hashing


rafe

Presentation Transcript


  1. Hashing • Suppose we want to search for a data item in a huge table of data records • How long will it take? • It depends on the data structure: • (unsorted) linked list: O(N) • (balanced) binary search tree (e.g., AVL tree): O(log₂ N) • unsorted array: O(N) • sorted array with binary search: O(log₂ N) • Can we do better than O(log N) (e.g., O(1))? • Yes, we can use “hash” functions

  2. Hashing Basic Idea • Use a hash function to map keys into positions in a hash table • Ideally: if element e has key k and h is the hash function, then e is stored in position h(k) of the table • To search for e, compute h(k) to locate the position; if there is no element at that position, the table does not contain e
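
The ideal case described on this slide can be sketched roughly as follows. This is a minimal illustration, not the deck's own code: the class name `IdealHashTable`, the integer keys, and the choice of `h` are all assumptions, and the sketch deliberately ignores collisions (a later key simply overwrites an earlier one in the same position).

```java
// Sketch only: assumes no two stored keys ever hash to the same position.
class IdealHashTable {
    private final Integer[] slots;   // null means "empty"

    IdealHashTable(int size) {
        slots = new Integer[size];
    }

    // Hypothetical hash function h(k); any function mapping keys to 0..size-1 would do.
    private int h(int key) {
        return Math.floorMod(key, slots.length);
    }

    // Store the element at position h(k).
    void insert(int key) {
        slots[h(key)] = key;
    }

    // Compute h(k) and look only at that one position.
    boolean contains(int key) {
        Integer stored = slots[h(key)];
        return stored != null && stored == key;
    }
}
```

Both operations touch a single array position, which is where the O(1) behaviour comes from when there are no collisions.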

  3. Hashing (cont’d) • The general idea behind hashing is to map each data item directly into a position in a table (e.g., an array) using some function h • Complex data items must have a “key” on which the hashing is performed; then position = h(key) • Example: a record with Name: Mike, Height: .., Address: … has key “Mike”, and h(“Mike”) = 1 • With perfect hashing, every key is mapped to a different position. What is the implication of this in terms of T, the size of the array?

  4. Some Hash Function Examples • Division • Folding

  5. Hash Function: Division • If the keys are integers, then key % T, where T is the table size, is generally a good hash function

  6. Example • The hash function is h(key) = key % 10 • Inserting entries 20, 25, 34, 76, 59, and 81 places them at positions 0, 5, 4, 6, 9, and 1, respectively, with no collisions
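
The example on this slide could be reproduced with a short sketch like the one below. The class and method names (`DivisionHashExample`, `build`) are made up for illustration, and the code assumes non-negative keys and does no collision handling, which is enough here because the six keys land in six different positions.

```java
import java.util.Arrays;

class DivisionHashExample {
    // Division hash from the slide: h(key) = key % T (assumes key >= 0).
    static int h(int key, int tableSize) {
        return key % tableSize;
    }

    // Places each key at position h(key); a colliding key would overwrite the earlier one.
    static Integer[] build(int[] keys, int tableSize) {
        Integer[] table = new Integer[tableSize];
        for (int key : keys) {
            table[h(key, tableSize)] = key;
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {20, 25, 34, 76, 59, 81};
        System.out.println(Arrays.toString(build(keys, 10)));
        // prints [20, 81, null, null, 34, 25, 76, null, null, 59]
    }
}
```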

  7. Hash Division • What happens if more than one key hashes to the same position? • In general, to make situations like this less likely, the table size T should be a prime number • Collision-handling methods will be discussed later
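
One way to see why a prime table size tends to help is to hash keys that share a common factor with a non-prime T. The sketch below uses illustrative keys that are all multiples of 10 (an assumption chosen to make the effect obvious) and counts how many distinct positions they occupy for T = 10 and for the prime T = 11.

```java
import java.util.HashSet;
import java.util.Set;

class TableSizeDemo {
    // Counts how many distinct positions the keys occupy under h(key) = key % tableSize.
    static int distinctPositions(int[] keys, int tableSize) {
        Set<Integer> positions = new HashSet<>();
        for (int key : keys) {
            positions.add(key % tableSize);
        }
        return positions.size();
    }

    public static void main(String[] args) {
        int[] keys = {10, 20, 30, 40, 50};  // illustrative keys, all multiples of 10
        System.out.println(distinctPositions(keys, 10)); // prints 1 (every key maps to position 0)
        System.out.println(distinctPositions(keys, 11)); // prints 5 (positions 10, 9, 8, 7, 6)
    }
}
```

A prime T does not eliminate collisions; it only makes this kind of systematic clustering less likely.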

  8. Hash Function: Folding • A key is divided into several parts, and the parts are combined in a certain way using a simple operation

  9. Hash Function: Folding • If the keys are strings, the hash function is some function of the characters in the string • One possibility is to simply add the ASCII values of the characters: h(str) = (sum of str[i] over all i) % T • Another possibility is to treat the string as a number in some arbitrary base b (b might also be a prime number) • Example: h(“ABC”) = (65·b^0 + 66·b^1 + 67·b^2) % T
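
The base-b variant might be written as in the sketch below. The class name `PolynomialStringHash` and the values b = 31 and T = 101 in `main` are assumptions picked for illustration. The first character gets the power b^0, matching the formula on the slide, and the running values are reduced modulo T at each step so they cannot overflow for long strings.

```java
class PolynomialStringHash {
    // h(s) = (s[0]*b^0 + s[1]*b^1 + s[2]*b^2 + ...) % tableSize
    static int h(String s, int b, int tableSize) {
        long hash = 0;
        long power = 1;  // holds b^i % tableSize
        for (int i = 0; i < s.length(); i++) {
            hash = (hash + s.charAt(i) * power) % tableSize;
            power = (power * b) % tableSize;
        }
        return (int) hash;
    }

    public static void main(String[] args) {
        System.out.println(h("ABC", 31, 101)); // prints 40
        System.out.println(h("CBA", 31, 101)); // prints 39
    }
}
```

Unlike a plain sum of character codes, this hash depends on character order, so "ABC" and "CBA" land in different positions here.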

  10. Folding Example
      int h(String x, int T) {
          int sum = 0;
          // add up the character codes of the string
          for (int i = 0; i < x.length(); i++) {
              sum += (int) x.charAt(i);
          }
          // reduce the sum to a table position in 0..T-1
          return sum % T;
      }
      • sums the ASCII values of the letters in the string • ASCII value for “A” = 65; the sum will be in the range 650-900 for 10 upper-case letters; good when T is around 100, for example • the order of the chars in the string has no effect, so anagrams always collide

  11. Collision • What happens if more than one key hashes to the same position? • The problem is called a “collision” • Solution #1: “open addressing” • Solution #2: “separate chaining”, a collision resolution approach in which all data items that hash to the same position are kept in a linked list; to find the item we are looking for, we have to search that list • Hash functions should try to achieve uniform coverage of the hash table while minimizing collisions

  12. Handling Collision • “Open addressing” resolves collisions by trying alternative slots in the hash table until an empty cell is found • In general, we try cells in the following order: H_i(key) = (h(key) + f(i)) % T for i = 0, 1, 2, …, with f(0) = 0

  13. Example: Handling Collision with Open Addressing • T = 10, after 0, 1, 4, 19, 16, 25, 36, 59, 64, and 81 have been inserted using open addressing • The hash function is h(key) = key % 10 and f(i) = i (linear probing)
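
The table for this example can be reconstructed with a short sketch of open addressing with f(i) = i. The class and method names are hypothetical, keys are assumed to be non-negative, and a key that finds no free cell is silently dropped, which a real table would have to handle (for instance by resizing).

```java
import java.util.Arrays;

class LinearProbingExample {
    // Inserts keys using H_i(key) = (key % T + i) % T, i.e. f(i) = i.
    static Integer[] build(int[] keys, int tableSize) {
        Integer[] table = new Integer[tableSize];
        for (int key : keys) {
            for (int i = 0; i < tableSize; i++) {
                int pos = (key % tableSize + i) % tableSize;
                if (table[pos] == null) {   // first empty cell in probe order
                    table[pos] = key;
                    break;
                }
            }
            // If no empty cell was found, the table is full and this sketch drops the key.
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {0, 1, 4, 19, 16, 25, 36, 59, 64, 81};
        System.out.println(Arrays.toString(build(keys, 10)));
        // prints [0, 1, 59, 81, 4, 25, 16, 36, 64, 19]
    }
}
```

Note how 59 wraps around from position 9 and probes positions 0 and 1 before settling at 2, and how 64 has to walk past four occupied cells to reach position 8.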

  14. Handling Collision: Chaining • With “separate chaining” all data items that hash to the same location are kept in a linked list in that location. • So, each entry in the hash table can be a linked list of arbitrary length. • To find a piece of data we use the hash function to find the correct list, and then we search the list.

  15. Chaining Example • T = 10, after 0, 1, 4, 19, 16, 25, 36, 59, 64, and 81 have been inserted
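
A sketch of separate chaining for the same keys follows. It assumes h(key) = key % 10 as in the earlier examples and appends each new key to the end of its list; the original figure may have inserted at the front instead, so the order within each chain is an assumption. The names `SeparateChainingExample`, `build`, and `contains` are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class SeparateChainingExample {
    // Builds a table in which every position holds a linked list of keys.
    static List<LinkedList<Integer>> build(int[] keys, int tableSize) {
        List<LinkedList<Integer>> table = new ArrayList<>();
        for (int i = 0; i < tableSize; i++) {
            table.add(new LinkedList<>());
        }
        for (int key : keys) {
            table.get(key % tableSize).addLast(key);  // assumes key >= 0
        }
        return table;
    }

    // Hash to find the right list, then search that list.
    static boolean contains(List<LinkedList<Integer>> table, int key) {
        return table.get(key % table.size()).contains(key);
    }

    public static void main(String[] args) {
        int[] keys = {0, 1, 4, 19, 16, 25, 36, 59, 64, 81};
        System.out.println(build(keys, 10));
        // prints [[0], [1, 81], [], [], [4, 64], [25], [16, 36], [], [], [19, 59]]
    }
}
```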

  16. Store colliding elements in the same position (bucket) of the table • When a bucket is full, open addressing can be used to place the overflowing element
