
Chapter 11 Hash




  1. Chapter 11 Hash
     Anshuman Razdan, Div of Computing Studies
     razdan@asu.edu
     http://dcst2.east.asu.edu/~razdan/cst230/

  2. Searching
     • Searching for a specific value among a collection of values is a common operation.
     • Complexity of search/find using:
        • array
        • linked list
        • ordered list
        • binary tree
        • BST
     CST 230 - Razdan et al.

  3. Linear Search
     • Search an array A of n elements for a specified element target:

         int i = 0;
         boolean found = false;
         while (i < n && !found) {
             if (A[i] == target)      // use equals() when the elements are objects
                 found = true;
             else
                 i++;
         }
         // if found, target is at position i; otherwise target is not in the array

  4. Complexity of Linear Search
     • Count the number of comparisons that must be done.
     • Worst Case
     • Average Case
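To make the comparison count concrete, here is a minimal sketch that counts element comparisons directly. The class and method names (`LinearSearchCount`, `comparisons`) are made up for this illustration. With n elements, an unsuccessful search examines all n of them, and a successful search averages (n + 1) / 2 comparisons if the target is equally likely to be at any position.

```java
class LinearSearchCount {
    // Returns the number of element comparisons a linear search makes
    // while looking for target in a.
    static int comparisons(int[] a, int target) {
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            count++;                      // one comparison per element examined
            if (a[i] == target) break;    // stop at the first match
        }
        return count;
    }

    public static void main(String[] args) {
        int[] a = {7, 3, 9, 1, 5};
        System.out.println(comparisons(a, 7));    // target is the first element
        System.out.println(comparisons(a, 42));   // target is absent: all n examined

        // Average over one successful search for each element.
        int total = 0;
        for (int x : a) total += comparisons(a, x);
        System.out.println((double) total / a.length);
    }
}
```

For the five-element array above, the successful searches cost 1 + 2 + 3 + 4 + 5 = 15 comparisons in total, an average of 3 = (5 + 1) / 2.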

  5. Binary Search
     • Search a sorted array A of n elements for a specified element target:

         // Searches the n elements of A that start at index first.
         // Returns the index of target, or -1 if target is not present.
         public static int BinarySearch(int[] A, int first, int n, int target) {
             int found;
             if (n <= 0)
                 found = -1;                      // empty range
             else {
                 int middle = first + n / 2;
                 if (target == A[middle])
                     found = middle;
                 else if (target < A[middle])
                     found = BinarySearch(A, first, n / 2, target);
                 else
                     found = BinarySearch(A, middle + 1, (n - 1) / 2, target);
             }
             return found;
         }

  6. Complexity of BinarySearch
     • The body of BinarySearch takes constant time, so we need to count the number of calls made to BinarySearch.
     • Find the depth of recursive calls: the length of the longest chain of recursive calls in the execution of an algorithm.
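The depth of recursive calls can be measured with a counter. The sketch below is an illustration under stated assumptions: `BinarySearchDepth`, `calls`, and `callsForMiss` are hypothetical names, and the search mirrors the recursive halving described above. A target smaller than every element always recurses into the left half (of size n/2), which gives the longest chain: floor(log2 n) + 2 calls, counting the final call on an empty range.

```java
class BinarySearchDepth {
    static int calls;   // calls made during the current search (illustrative counter)

    static int search(int[] a, int first, int n, int target) {
        calls++;
        if (n <= 0) return -1;                          // empty range
        int middle = first + n / 2;
        if (target == a[middle]) return middle;
        if (target < a[middle]) return search(a, first, n / 2, target);
        return search(a, middle + 1, (n - 1) / 2, target);
    }

    // Number of calls when the target is smaller than every element.
    static int callsForMiss(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = 2 * i;       // sorted: 0, 2, 4, ...
        calls = 0;
        search(a, 0, n, -1);
        return calls;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 8, 1000}) {
            System.out.println(n + " elements: " + callsForMiss(n) + " calls");
        }
    }
}
```

Going from 8 to 1000 elements raises the call count only from 5 to 11, which is the logarithmic growth the depth argument predicts.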

  7. Motivation: Direct Access is Fast
     • Suppose we have a large number of products to store and that each product has a unique product ID.
     • If n products have IDs in the range 0..n-1, we can store each product in an array at index prodID.
        • Time to find a product?
     • If the number of IDs is much smaller than the range of IDs, storing each product at index prodID is very space-inefficient.

  8. Hashing
     • Each element has a unique key that identifies the element.
     • We have: a large range of keys
     • We want: index of elements to be 0..numElem-1
     [Figure: keys key1, key2, key3, key4, ..., keyn are mapped by a hash function to array indices 0, 1, 2, 3, ..., n-1]

  9. Common hashing function: Mod
     • The mod function is a natural choice for hashing because x mod n always results in a number in the range 0 .. n-1.
     • E.g., insert the following numbers into a hash table of size 10: 432, 321, 17, 65, 9388, 200, 83, 564
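The exercise above can be checked with a few lines of Java. This is a minimal sketch, not the course's implementation: `ModHashDemo`, `hash`, and `buildTable` are hypothetical names, keys are assumed non-negative, and -1 marks an empty slot. It does not handle collisions and throws if one occurs; for these eight keys none does.

```java
import java.util.Arrays;

class ModHashDemo {
    static final int EMPTY = -1;   // marker for an unused slot (assumes keys >= 0)

    static int hash(int key, int tableSize) {
        return key % tableSize;    // in 0 .. tableSize-1 for non-negative keys
    }

    // Places each key at index key mod tableSize; no collision handling.
    static int[] buildTable(int[] keys, int tableSize) {
        int[] table = new int[tableSize];
        Arrays.fill(table, EMPTY);
        for (int key : keys) {
            int i = hash(key, tableSize);
            if (table[i] != EMPTY) {
                throw new IllegalStateException("collision at index " + i);
            }
            table[i] = key;
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {432, 321, 17, 65, 9388, 200, 83, 564};
        System.out.println(Arrays.toString(buildTable(keys, 10)));
        // prints [200, 321, 432, 83, 564, 65, -1, 17, 9388, -1]
    }
}
```

Each key lands at the index given by its last decimal digit, and indices 6 and 9 stay empty.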

  10. Collisions
     • A perfect hashing function will produce a different index for every key.
     • Unfortunately, mod is NOT perfect.
        • 20 mod 10 = 0
        • 520 mod 10 = 0
        • 1030 mod 10 = 0
        • etc.
     • When two (or more) distinct keys hash to the same index, we have a collision.
     • There are various methods used to deal with collisions.

  11. Open-address Hashing
     • One method to deal with collisions is open addressing:
        • Compute hash(key).
        • If data[hash(key)] is not occupied, insert key.
        • Otherwise, search forward starting at index hash(key) + 1 until a vacant position is found, and insert key there. (Note: the array is circular, so after the last index of the array is tried, index 0 is tried next.)
     • This method is also called "linear probing".

  12. Example
     • Insert keys 89, 18, 49, 58, and 9 into a hash table of size 10.
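The insertion sequence in this example can be traced with a short sketch of linear probing. The names `LinearProbingDemo` and `insertAll` are hypothetical. The code assumes non-negative integer keys, uses key mod capacity as the hash, marks empty slots with -1, and assumes there are no more keys than slots.

```java
import java.util.Arrays;

class LinearProbingDemo {
    static final int EMPTY = -1;   // marker for a vacant slot (assumes keys >= 0)

    // Inserts keys in order, probing forward circularly on a collision.
    // Assumes keys.length <= capacity, so a vacant slot always exists.
    static int[] insertAll(int[] keys, int capacity) {
        int[] table = new int[capacity];
        Arrays.fill(table, EMPTY);
        for (int key : keys) {
            int i = key % capacity;            // home index
            while (table[i] != EMPTY) {
                i = (i + 1) % capacity;        // wrap from the last index to 0
            }
            table[i] = key;
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {89, 18, 49, 58, 9};
        System.out.println(Arrays.toString(insertAll(keys, 10)));
        // prints [49, 58, 9, -1, -1, -1, -1, -1, 18, 89]
    }
}
```

89 and 18 go to their home indices 9 and 8. 49 collides at 9 and wraps to 0. 58 collides at 8, 9, and 0 and lands at 1. 9 collides at 9, 0, and 1 and lands at 2.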

  13. Hashing non-integer keys
     • Many applications require collections of objects with non-integer keys (often Strings).
     • An encoding function converts the key to an integer, and the hash function is applied to the encoding.
     • All Java classes (objects) include a method called hashCode.
     • Note: keys must be unique, so the encoding of keys must be unique as well. This is very important when designing an encoding scheme.
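As an illustration of encoding followed by hashing, the sketch below reduces a String's hashCode to a table index in two ways. `StringKeyHash`, `absIndex`, and `floorModIndex` are hypothetical names. `String.hashCode` is documented as s[0]*31^(n-1) + ... + s[n-1], so "abc" encodes to 97*961 + 98*31 + 99 = 96354. Two caveats: `Math.abs(Integer.MIN_VALUE)` is still negative, so the abs-then-mod form can produce a negative index, which `Math.floorMod` avoids. Also, Java does not guarantee distinct hash codes for distinct keys, which is one reason table code compares keys with `equals`.

```java
class StringKeyHash {
    // Abs-then-mod: negative when hashCode is Integer.MIN_VALUE.
    static int absIndex(int hashCode, int tableSize) {
        return Math.abs(hashCode) % tableSize;
    }

    // floorMod always returns a value in 0 .. tableSize-1 for tableSize > 0.
    static int floorModIndex(int hashCode, int tableSize) {
        return Math.floorMod(hashCode, tableSize);
    }

    public static void main(String[] args) {
        int code = "abc".hashCode();
        System.out.println(code);                       // 96354
        System.out.println(absIndex(code, 10));         // 4
        System.out.println(floorModIndex(code, 10));    // 4

        // Edge case: abs(MIN_VALUE) overflows and stays negative.
        System.out.println(absIndex(Integer.MIN_VALUE, 10));       // -8
        System.out.println(floorModIndex(Integer.MIN_VALUE, 10));  // 2

        // Distinct strings can share a hash code.
        System.out.println("Aa".hashCode() == "BB".hashCode());    // true
    }
}
```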

  14. Hashtable methods
     • Common Hashtable methods are:
        • put: puts a new object into the table
        • containsKey: searches for an object with a specified key (returns boolean)
        • get: retrieves an object for a specified key
        • remove: removes an object with a specified key

  15. Example Implementation

         public class Hashtable {
             private int manyItems;          // number of items currently in the table
             private Object[] keys;
             private Object[] data;          // data[i] is the element stored with keys[i]
             private boolean[] hasBeenUsed;  // true once index i has ever held a key

             private int hash(Object key) {
                 return Math.abs(key.hashCode()) % data.length;
             }

             private int nextIndex(int i) {
                 return (i + 1) % data.length;   // wraps around to index 0
             }
             ...

  16. Constructor

         public Hashtable(int capacity) {
             if (capacity <= 0)
                 throw new IllegalArgumentException("Capacity must be positive.");
             keys = new Object[capacity];
             data = new Object[capacity];
             hasBeenUsed = new boolean[capacity];
         }

  17. findIndex

         // Returns the index of key, or -1 if key is not in the table.
         // The search stops at the first slot that has never been used.
         private int findIndex(Object key) {
             int count = 0;
             int i = hash(key);
             int retVal = -1;
             while ((count < data.length) && hasBeenUsed[i] && (retVal == -1)) {
                 if (key.equals(keys[i]))
                     retVal = i;
                 count++;
                 i = nextIndex(i);
             }
             return retVal;
         }

  18. put

         // Returns the element previously stored under key, or null if there was none.
         public Object put(Object key, Object element) {
             int index = findIndex(key);
             Object answer = null;
             if (index != -1) {                       // key already present: replace
                 answer = data[index];
                 data[index] = element;
             } else if (manyItems < data.length) {    // probe for a vacant slot
                 index = hash(key);
                 while (keys[index] != null)
                     index = nextIndex(index);
                 keys[index] = key;
                 data[index] = element;
                 hasBeenUsed[index] = true;
                 manyItems++;
             } else
                 throw new IllegalStateException("Table is full");
             return answer;
         }

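  19. remove

         // Returns the removed element, or null if key was not in the table.
         public Object remove(Object key) {
             int index = findIndex(key);
             Object answer = null;
             if (index != -1) {
                 answer = data[index];
                 keys[index] = null;
                 data[index] = null;
                 // hasBeenUsed[index] stays true so later searches probe past this slot
                 manyItems--;
             }
             return answer;
         }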

  20. get

         public Object get(Object key) {
             int index = findIndex(key);
             Object answer = null;
             if (index != -1) {
                 answer = data[index];
             }
             return answer;
         }

  21. containsKey

         public boolean containsKey(Object key) {
         }

  22. Example
     • Show the state of the Hashtable after the following are performed (assume the hashCode of an integer is the integer itself):
        • construct Hashtable with capacity 10
        • put( new Integer(29), "Barb" )
        • put( new Integer(19), "Mateo" )
        • put( new Integer(9), "Eddie" )
        • remove( new Integer(19) )
        • containsKey( new Integer(9) )
        • put( new Integer(30), "Jerry" )
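The trace above can be reproduced with a compact, self-contained variant of the table. This is a sketch under stated assumptions, not the course's code: the class is named `ProbingTable` to avoid clashing with `java.util.Hashtable`, `keyAt` is a hypothetical helper for inspecting slots, `hash` uses `Math.floorMod` instead of abs-then-mod, and `containsKey` is one plausible body (a key is present exactly when `findIndex` finds it).

```java
class ProbingTable {
    private int manyItems;
    private final Object[] keys;
    private final Object[] data;
    private final boolean[] hasBeenUsed;

    ProbingTable(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("Capacity must be positive.");
        keys = new Object[capacity];
        data = new Object[capacity];
        hasBeenUsed = new boolean[capacity];
    }

    private int hash(Object key) { return Math.floorMod(key.hashCode(), data.length); }
    private int nextIndex(int i) { return (i + 1) % data.length; }

    // Index of key, or -1; stops at the first never-used slot.
    private int findIndex(Object key) {
        int i = hash(key);
        for (int count = 0; count < data.length && hasBeenUsed[i]; count++) {
            if (key.equals(keys[i])) return i;
            i = nextIndex(i);
        }
        return -1;
    }

    Object put(Object key, Object element) {
        int index = findIndex(key);
        if (index != -1) {                     // key present: replace its element
            Object old = data[index];
            data[index] = element;
            return old;
        }
        if (manyItems == data.length) throw new IllegalStateException("Table is full");
        index = hash(key);
        while (keys[index] != null) index = nextIndex(index);
        keys[index] = key;
        data[index] = element;
        hasBeenUsed[index] = true;
        manyItems++;
        return null;
    }

    Object remove(Object key) {
        int index = findIndex(key);
        if (index == -1) return null;
        Object answer = data[index];
        keys[index] = null;                    // hasBeenUsed[index] stays true
        data[index] = null;
        manyItems--;
        return answer;
    }

    boolean containsKey(Object key) { return findIndex(key) != -1; }

    Object keyAt(int i) { return keys[i]; }    // hypothetical inspection helper

    public static void main(String[] args) {
        ProbingTable t = new ProbingTable(10);
        t.put(29, "Barb");                     // home index 9
        t.put(19, "Mateo");                    // 9 is taken, wraps to 0
        t.put(9, "Eddie");                     // 9 and 0 are taken, lands at 1
        t.remove(19);                          // frees index 0
        System.out.println(t.containsKey(9));  // true: search probes past index 0
        t.put(30, "Jerry");                    // home index 0 is vacant again
        for (int i = 0; i < 10; i++) System.out.println(i + ": " + t.keyAt(i));
    }
}
```

Under these assumptions the final state is key 30 at index 0, key 9 at index 1, and key 29 at index 9. The search for 9 succeeds after the removal only because `hasBeenUsed[0]` remains true, so the probe does not stop at the emptied slot.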

  23. Linear probing and clustering
     • In linear probing, when several keys hash to the same index, a "cluster" of values forms around the index.
        • Elements take longer to find/add because we must move linearly through the entire cluster.
        • Elements are put farther and farther away from the desired index.
     • We need other methods that avoid clustering.

  24. Double Hashing
     • The most common technique to avoid clustering is double hashing:
        • Use hash function hash1 to determine the desired index of the element.
        • If a collision occurs, use hash function hash2 to determine the next index to search for an open spot.
     • In particular, if index i is occupied, the next index to examine is: (i + hash2(key)) % data.length

  25. Choosing hash2
     • As we step through the array, we must ensure that every array position is examined.
     • We must choose hash2 to prevent returning to the original hash index before visiting the entire array.
     • The array capacity and the hash2 value should be relatively prime. One way to accomplish this:
        • choose data.length as a prime number and have hash2 return values from the range 1 .. data.length - 1
     • Donald Knuth's suggestion:
        • both data.length and data.length - 2 should be prime numbers (called twin primes), e.g. 1231 and 1229
        • hash1(key) = Math.abs(key.hashCode()) % data.length
        • hash2(key) = 1 + (Math.abs(key.hashCode()) % (data.length - 2))
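The effect of the twin-prime choice can be seen by listing the probe order for one key. This is a minimal sketch with assumed details: the names `DoubleHashProbe`, `probeSequence`, and `visitsEverySlot` are hypothetical, keys are plain ints (so the key stands in for its hash code), and the capacity is the small twin prime 13 (with 11) rather than 1231.

```java
class DoubleHashProbe {
    static int hash1(int key, int capacity) {
        return Math.abs(key) % capacity;
    }

    // Step size in 1 .. capacity-2; never 0, so the probe always moves.
    static int hash2(int key, int capacity) {
        return 1 + Math.abs(key) % (capacity - 2);
    }

    // The first `capacity` indices examined for key, in probe order.
    static int[] probeSequence(int key, int capacity) {
        int[] seq = new int[capacity];
        int i = hash1(key, capacity);
        int step = hash2(key, capacity);
        for (int n = 0; n < capacity; n++) {
            seq[n] = i;
            i = (i + step) % capacity;
        }
        return seq;
    }

    // True if the probe order touches every index exactly once.
    static boolean visitsEverySlot(int key, int capacity) {
        boolean[] seen = new boolean[capacity];
        for (int i : probeSequence(key, capacity)) {
            if (seen[i]) return false;
            seen[i] = true;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(probeSequence(100, 13)));
        // prints [9, 11, 0, 2, 4, 6, 8, 10, 12, 1, 3, 5, 7]
        System.out.println(visitsEverySlot(100, 13));   // true
    }
}
```

For key 100, hash1 gives 9 and hash2 gives a step of 2. Because 2 and 13 are relatively prime, the probe reaches all 13 slots before it returns to index 9.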

  26. Chained Hashing
     • In chaining, we essentially allow collisions to occur, and store more than one element at a given array index.
     • How can we store more than one element?
        • list
        • ordered list
        • BST
     • If the hash function distributes keys evenly over the array, the chains at each index should be relatively short.
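A minimal sketch of chaining, assuming the plain-list option from the choices above: each array index holds a `LinkedList` of the keys that hash there. `ChainedTable` and its methods are hypothetical names, and keys are ints hashed with mod.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedTable {
    // One chain per index; colliding keys share a chain.
    private final List<LinkedList<Integer>> chains = new ArrayList<>();

    ChainedTable(int capacity) {
        for (int i = 0; i < capacity; i++) chains.add(new LinkedList<>());
    }

    private int hash(int key) { return Math.floorMod(key, chains.size()); }

    void add(int key) { chains.get(hash(key)).add(key); }

    // Searches only the chain at the key's index.
    boolean contains(int key) { return chains.get(hash(key)).contains(key); }

    int chainLength(int index) { return chains.get(index).size(); }

    public static void main(String[] args) {
        ChainedTable t = new ChainedTable(10);
        for (int key : new int[] {20, 520, 1030, 17}) t.add(key);
        System.out.println(t.chainLength(0));   // 3: 20, 520, and 1030 all hash to 0
        System.out.println(t.chainLength(7));   // 1
        System.out.println(t.contains(520));    // true
    }
}
```

The table never fills up, since a chain can keep growing, but a search costs time proportional to the length of the chain it lands in.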

  27. Time Analysis
     • The worst case for hashing is when all keys hash to the same index (linear).
     • The best case for hashing is when all keys hash to different indices (constant).
     • Average case analysis gives a better picture of what happens in reality.

  28. Load Factor
     • The load factor for a hash table is defined as: $\alpha = \frac{\text{number of elements in the table}}{\text{size of the table's array}}$
     • For open-address hashing, $\alpha \le 1$.
     • For chaining, $\alpha$ could be larger than 1.

  29. Average Time (Linear Probing)
     • In open-address hashing with linear probing, a nonfull hash table, and no removals, the average number of table elements examined in a successful search is about $\frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)$
     • For example, suppose we have 800 items in a table of capacity 1000. How many entries will we examine on average?

  30. Average Time (Double Hashing)
     • In open-address hashing with double hashing, a nonfull hash table, and no removals, the average number of table elements examined in a successful search is about $\frac{-\ln(1-\alpha)}{\alpha}$
     • How many comparisons for the previous example?

  31. Average Time (Chaining)
     • In chained hashing, the average number of table elements examined in a successful search is about $1 + \frac{\alpha}{2}$
     • How many for the previous example?
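The 800-in-1000 example can be worked numerically. The sketch below assumes the usual textbook approximations for a successful search, with load factor α = items / capacity: (1 + 1/(1 - α)) / 2 for linear probing, -ln(1 - α) / α for double hashing, and 1 + α/2 for chaining. The class and method names are hypothetical.

```java
class AverageProbes {
    static double loadFactor(int items, int capacity) {
        return (double) items / capacity;
    }

    // Linear probing, nonfull table, no removals (requires a < 1).
    static double linearProbing(double a) {
        return 0.5 * (1 + 1 / (1 - a));
    }

    // Double hashing, nonfull table, no removals (requires 0 < a < 1).
    static double doubleHashing(double a) {
        return -Math.log(1 - a) / a;
    }

    // Chained hashing; a may exceed 1.
    static double chaining(double a) {
        return 1 + a / 2;
    }

    public static void main(String[] args) {
        double a = loadFactor(800, 1000);                                 // 0.8
        System.out.printf("linear probing: %.2f%n", linearProbing(a));    // about 3.00
        System.out.printf("double hashing: %.2f%n", doubleHashing(a));    // about 2.01
        System.out.printf("chaining:       %.2f%n", chaining(a));         // about 1.40
    }
}
```

At a load factor of 0.8, linear probing examines about three entries per successful search, double hashing about two, and chaining about 1.4. The gap widens as α approaches 1, because the open-address formulas grow without bound.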

  32. Java Data Structures
     • The java.util package includes the following classes (see http://java.sun.com/j2se/1.4.2/docs/api/ ):
        • HashMap
        • Hashtable
        • LinkedList
     • as well as interfaces:
        • Iterator
        • ListIterator
