
Tutorial 10 Hashing


yeriel

Presentation Transcript


  1. Tutorial 10: Hashing

  2. Search Records • Traversal search → O(n) • Binary search → O(log n) • Can we do it in O(1)? • → Hashing
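
The three search costs above can be compared with a minimal sketch. The record list and the function names are hypothetical, and Python's built-in `dict` stands in for a hash table with expected O(1) lookup.

```python
from bisect import bisect_left

def linear_search(records, key):
    """Traversal search: check every record in turn, O(n)."""
    for i, r in enumerate(records):
        if r == key:
            return i
    return -1

def binary_search(sorted_records, key):
    """Binary search on a sorted list, O(log n)."""
    i = bisect_left(sorted_records, key)
    if i < len(sorted_records) and sorted_records[i] == key:
        return i
    return -1

records = [3, 8, 15, 23, 42, 57]                 # hypothetical sorted keys
index = {k: i for i, k in enumerate(records)}    # dict is a hash table

print(linear_search(records, 42))   # 4
print(binary_search(records, 42))   # 4
print(index.get(42, -1))            # 4
```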

  3. Hashing • Put a record into one of many buckets, chosen according to its key • When searching for a record by key, identify the bucket and search within that bucket • Main concepts • hash table, bucket, slot • hash function: maps keys into buckets

  4. Hashing • For a key k, h(k) is the home address (home bucket) • Two keys, k1 and k2, are said to be synonyms if h(k1)=h(k2) • Collision: the home bucket for a new record is already occupied by a record with a different key • Overflow: there is no space in the home bucket for the new record

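  5. Example • 10 buckets, each with 1 slot • h("apple") = 5, h("watermelon") = 3, h("grapes") = 8, h("peach") = 7, h("kiwi") = 0, h("strawberry") = 9, h("mango") = 6, h("banana") = 2 • Resulting table: 0 kiwi, 1 (empty), 2 banana, 3 watermelon, 4 (empty), 5 apple, 6 mango, 7 peach, 8 grapes, 9 strawberry • Insert "orange" with h("orange") = 7: bucket 7 already holds "peach", so this is a collision and, with one slot per bucket, an overflow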
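
The fruit example can be replayed with a minimal sketch. The home addresses are taken from the slide as a lookup table (the slide does not say how h is computed), and the `insert` helper and its return value are hypothetical.

```python
# Home addresses as listed on the slide; the real h is not given.
HOME = {"apple": 5, "watermelon": 3, "grapes": 8, "peach": 7, "kiwi": 0,
        "strawberry": 9, "mango": 6, "banana": 2, "orange": 7}

B, S = 10, 1                        # 10 buckets, 1 slot each
table = [[] for _ in range(B)]

def insert(key):
    """Try to place key in its home bucket; report (collision, overflow)."""
    bucket = table[HOME[key]]
    collision = any(other != key for other in bucket)  # different key present
    overflow = len(bucket) >= S                        # no free slot left
    if not overflow:
        bucket.append(key)
    return collision, overflow

for fruit in ["apple", "watermelon", "grapes", "peach",
              "kiwi", "strawberry", "mango", "banana"]:
    insert(fruit)

print(insert("orange"))   # (True, True): bucket 7 already holds "peach"
```

With more than one slot per bucket (S > 1), a collision would not necessarily be an overflow.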

  6. Avoid collisions • T: the size of the key space • n: number of records • b: number of buckets; s: slots per bucket • Key density: n/T • Loading density: α = n/(bs) • Method 1: use a table as large as the key space T to hold the n records • Method 2: design a mechanism to handle overflows
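
As a small worked illustration of the two densities, the sketch below plugs in the fruit table (8 records, 10 buckets, 1 slot each). The key-space size T is a made-up assumption (all strings of three lowercase letters), chosen only to show how small the key density typically is.

```python
def key_density(n, T):
    """Fraction of the key space actually in use: n / T."""
    return n / T

def loading_density(n, b, s):
    """Fraction of the table's slots in use: n / (b * s)."""
    return n / (b * s)

n, b, s = 8, 10, 1
T = 26 ** 3                      # hypothetical key space size

print(loading_density(n, b, s))  # 0.8
print(key_density(n, T))         # a very small number: most keys never occur
```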

  7. Hash functions • Objective • Fast to compute and minimizes the number of collisions • Perfect hash function • Uniform hash • Random: h(k)=i with probability 1/b • Division: h(k)=k%D • Mid-square • Folding

  8. Hash function: Division • h(k)=k % D • Choice of D • Should not be D=2^p or D=10^p • h(k) would then depend only on the lowest-order p bits (or p decimal digits) of k • Should not be an even number • All even keys go to even buckets and odd keys go to odd buckets • Should be a prime number, or at least an odd number
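
The advice on choosing D can be checked with a short sketch. The keys below are hypothetical and were picked so that they share their three lowest-order bits.

```python
def h(k, D):
    """Division hash."""
    return k % D

keys = [12, 20, 28, 36, 44]           # hypothetical keys, all ending in bits 100

# D = 2^3 keeps only the lowest 3 bits, so every key lands in bucket 4.
print([h(k, 8) for k in keys])        # [4, 4, 4, 4, 4]

# A prime D spreads the same keys over different buckets.
print([h(k, 7) for k in keys])        # [5, 6, 0, 1, 2]

# With an even D, the bucket number has the same parity as the key.
print(all(h(k, 10) % 2 == k % 2 for k in range(1000)))   # True
```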

  9. Hash function: mid-square • Square the key and use an appropriate number of bits from the middle of the square • Example • A hash table with b=2^r buckets uses r bits from the middle of the square
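
A minimal sketch of the mid-square idea follows. It assumes keys fit in a fixed width (`key_bits`, a hypothetical parameter), so that the square has at most twice that many bits and "the middle" is well defined. The key 1234 is made up.

```python
def mid_square(k, r, key_bits=16):
    """Return the middle r bits of k*k, assuming k fits in key_bits bits."""
    square = k * k                       # at most 2 * key_bits bits wide
    shift = (2 * key_bits - r) // 2      # bits to drop on the low-order side
    return (square >> shift) & ((1 << r) - 1)

# r = 4 gives bucket numbers in 0..15, i.e. a table with b = 2^4 buckets.
print(mid_square(1234, 4))   # 12
```

Because the middle bits of the square depend on all bits of the key, keys that differ only in their high or low digits still tend to land in different buckets.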

  10. Hash function: folding • Partition a key k into several parts, each of the same length except possibly the last one • Add the parts together to obtain the home bucket for k • Schemes • Shift folding • Folding at the boundaries • Example

  11. Hash function: folding
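
The two folding schemes can be sketched as follows. The key and the part width are illustrative choices, not taken from the slides, and the helper names are hypothetical. In folding at the boundaries, every second part is reversed before adding, as if the key were written on paper and folded at the part boundaries.

```python
def split_parts(key, width):
    """Cut the decimal digits of key into parts of the given width."""
    digits = str(key)
    return [digits[i:i + width] for i in range(0, len(digits), width)]

def shift_folding(key, width):
    """Add the parts as they are."""
    return sum(int(p) for p in split_parts(key, width))

def boundary_folding(key, width):
    """Reverse every second part, then add."""
    parts = split_parts(key, width)
    return sum(int(p[::-1]) if i % 2 else int(p) for i, p in enumerate(parts))

k = 12320324111220                 # parts: 123, 203, 241, 112, 20
print(shift_folding(k, 3))         # 123 + 203 + 241 + 112 + 20 = 699
print(boundary_folding(k, 3))      # 123 + 302 + 241 + 211 + 20 = 897
```

If the sum can exceed the number of buckets, a common extra step (an assumption here, not stated on the slide) is to reduce it modulo b.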

  12. Convert strings to integers • Add the ASCII codes of the characters • h(“ABC”)=65+66+67=198 • Problem: every permutation of the characters has the same hash value • Key a is an array of characters of length n • h(a)= • h(“ABC”)=65478
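
The permutation problem, and one way around it, can be sketched as below. The first function is the additive scheme from the slide. The second is an assumption: a positional (polynomial) hash with base 31, the multiplier used by Java's `String.hashCode`; the slides' own formula may differ.

```python
def additive_hash(s):
    """Sum of character codes; ignores character order."""
    return sum(ord(c) for c in s)

def polynomial_hash(s, base=31):
    """Weight each character by its position (assumed scheme, base 31)."""
    h = 0
    for c in s:
        h = h * base + ord(c)
    return h

print(additive_hash("ABC"), additive_hash("CBA"))      # 198 198
print(polynomial_hash("ABC"), polynomial_hash("CBA"))  # 64578 66498
```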

  13. Overflow handling – Linear Probing • Compute h(k) • Examine the hash table buckets in the order (h(k)+i) % b, for 0 ≤ i ≤ b−1, until one of the following happens • The bucket has a record whose key is k; k is found. • The bucket is empty; k is not in the table. • We return to the home bucket h(k); the table is full.
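
The search procedure can be sketched as follows, assuming one slot per bucket, h(k) = k % b, empty buckets marked with `None`, and no deletions. The small table is hypothetical.

```python
def search(table, k):
    """Return the bucket index holding k, or None if k is not in the table."""
    b = len(table)
    home = k % b                       # assumed hash function
    for i in range(b):
        j = (home + i) % b             # probe home, home+1, ... with wraparound
        if table[j] is None:
            return None                # empty bucket: k is not in the table
        if table[j] == k:
            return j                   # k is found
    return None                        # back at the home bucket: table is full

# Hypothetical table with b = 7; 10, 17 and 24 all have home bucket 3.
table = [None, None, None, 10, 17, 24, None]
print(search(table, 24))   # 5
print(search(table, 31))   # None (probes 3, 4, 5, then hits empty bucket 6)
```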

  14. Linear Probing • Let the hash table have b=17 buckets • Let the hash function be h(k)=k%b • Consider inserting 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45 • Linear probing tends to form clusters, blocks of contiguously occupied slots • The bigger a cluster is, the more likely it is to grow further when a new key hashes into it • The larger the cluster, the slower the performance
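
The insertion sequence above can be replayed with a short sketch, assuming one slot per bucket. The `insert` helper is hypothetical; it returns the number of buckets examined, which makes the cost of clustering visible.

```python
def insert(table, k):
    """Insert k by linear probing; return the number of buckets examined."""
    b = len(table)
    for i in range(b):
        j = (k % b + i) % b
        if table[j] is None:
            table[j] = k
            return i + 1
    raise OverflowError("hash table is full")

table = [None] * 17
probes = {k: insert(table, k)
          for k in [6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45]}

print(table)
# [34, 0, 45, None, None, None, 6, 23, 7, None, None, 28, 12, 29, 11, 30, 33]
print(probes[45])   # 9
```

Buckets 11 to 16 and, wrapping around, 0 to 2 form one cluster; key 45 has home bucket 11 and must walk through the whole cluster before it finds a free bucket.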

  15. Exercise • Consider a hash function h(k)=k % D, where D is not given. We want to figure out what value of D is being used. We wish to achieve this using as few attempts as possible, where an attempt consists of supplying the function with k and observing h(k). Indicate how this may be achieved in the following two cases. D is known to be a prime number in the range [10,20].

  16. Solution If D is a prime number in the range of [10, 20], then D must be 11, 13, 17, or 19. We can test the hash function with each of these.
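
One way to carry out this test is sketched below; the function names are hypothetical. Probing with a candidate c gives h(c) = 0 only when c is the true D, so three attempts are enough in the worst case. The second function is an additional observation rather than part of the slide's solution: the single key 20 leaves a different remainder for each of the four candidates, so one attempt suffices.

```python
CANDIDATES = (11, 13, 17, 19)

def make_hash(D):
    """Build the hidden hash function h(k) = k % D."""
    return lambda k: k % D

def find_D(h):
    """Probe with each candidate in turn; return (D, attempts used)."""
    attempts = 0
    for c in CANDIDATES[:-1]:
        attempts += 1
        if h(c) == 0:
            return c, attempts
    return CANDIDATES[-1], attempts    # only 19 remains

def find_D_one_probe(h):
    """20 % 11, 20 % 13, 20 % 17, 20 % 19 are 9, 7, 3, 1: all different."""
    return {20 % D: D for D in CANDIDATES}[h(20)]

print(find_D(make_hash(17)))            # (17, 3)
print(find_D_one_probe(make_hash(17)))  # 17
```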

  17. Exercise • Given several records {2341, 4234, 2839, 430, 33, 397, 3920} a hash table of size 7, and a hash function h(x)=x % 7, show the resulting tables after inserting all records with linear probing

  18. Solution • 2341 % 7 = 3 • 4234 % 7 = 6 • 2839 % 7 = 4 • 430 % 7 = 3 • 33 % 7 = 5 • 397 % 7 = 5 • 3920 % 7 = 0
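
The final table can be obtained by replaying the insertions. This is a minimal sketch that uses the record list exactly as given in the exercise statement and assumes one slot per bucket; the helper name is hypothetical.

```python
def linear_probe_insert(table, x):
    """Insert x at the first free bucket starting from x % len(table)."""
    b = len(table)
    for i in range(b):
        j = (x % b + i) % b
        if table[j] is None:
            table[j] = x
            return j
    raise OverflowError("hash table is full")

table = [None] * 7
for x in [2341, 4234, 2839, 430, 33, 397, 3920]:
    linear_probe_insert(table, x)

print(table)   # [33, 397, 3920, 2341, 2839, 430, 4234]
```

Only the first three records end up in their home buckets; 430, 33, 397 and 3920 are each displaced by earlier insertions.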

  19. Exercise • Suppose you could steal a system file with user names and hashed passwords and suppose you knew the hash function used for the passwords. Would this give you access to user accounts on the system? • Suppose you know someone’s login name and you know a password that is different from their password, but this other password has the same hash value as their password. Does that allow you to log in to their account?

  20. Solution • It wouldn’t give you direct access. Even though you know the user names, you don’t know the passwords. You only know the hashed passwords. When you log in, you don’t enter the hash of the password. You have to enter the password itself, and you don’t know it. • Yes. The system doesn’t know their true password. It only knows the hash value of their password. So if you enter your password and it hashes to the same hash value, then the system cannot tell the difference. The hashed password matches the one in the “password file” and it assumes you have the correct password. It lets you in.
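
The second answer can be illustrated with a toy login check. Everything here is hypothetical: the user name, the passwords, and especially the hash function, which is deliberately weak so that a collision is easy to find. Real systems use cryptographic hash functions, for which finding such a collision is intended to be infeasible.

```python
def toy_hash(password):
    """Deliberately weak hash: sum of character codes modulo 1000."""
    return sum(ord(c) for c in password) % 1000

# The system stores only the hash, never the password itself.
password_file = {"alice": toy_hash("secret")}

def login(user, attempt):
    """Accept the attempt if its hash matches the stored hash."""
    return password_file.get(user) == toy_hash(attempt)

print(login("alice", "secret"))   # True: the real password
print(login("alice", "terces"))   # True: different password, same hash
print(login("alice", "guess"))    # False
```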
