
Appendix I Hashing


Presentation Transcript


  1. Appendix I Hashing

  2. Chapter Scope • Hashing, conceptually • Using hashes to solve problems • Hash implementations Java Foundations, 3rd Edition, Lewis/DePasquale/Chase

  3. Hashing • In hashing, elements are stored in a hash table at a location determined by applying a hash function to the value to be stored • Each location is a cell or a bucket

  4. Ideally • In an ideal world, each value would hash to a unique address, one-to-one • If this were the case, the time to access or store data in a hash table would be O(1) • Factors that prevent this: • A less-than-perfect hash function • Limits on the size of the address space

  5. Example • Consider an array that will hold 26 elements • To store names, we create a simple hashing function that maps the first letter of each name to a separate cell • The access time to a particular element is then independent of the number of elements stored, so all operations would be O(1) • But this requires every element to map to a unique position • A function that achieves this is called a perfect hashing function

  6. Less than Perfect • A collision occurs when two or more elements map to the same location, for example two names that begin with the same letter • Collisions must be resolved with a technique for storing multiple elements that map to the same bucket • Even if a hashing function isn't perfect, a good hashing function can still result in O(1) operations on average

  7. Hash Table Size • How large should the table be? • With a dataset of size n and a perfect hashing function, we'd need a table of size n • Without a perfect hashing function, a good guideline is to make the table 150% of the dataset size • If we do not know the size of the dataset, we can rely on dynamic resizing: creating a larger hash table and transferring the elements into it

  8. Dynamic Resizing • Deciding when to resize is key • One possibility: when the table is full • But the performance of a hash table degrades seriously as it becomes full • A better approach is to use a load factor: a percentage of occupancy at which the table will be resized
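
The sizing guideline and the load-factor check from the two slides above can be sketched as follows. The class and method names are hypothetical, and the 0.75 load factor in the usage note is only an illustrative value, not one given in the slides.

```java
// A minimal sketch of table sizing and a load-factor check.
// All names here are hypothetical; the slides do not prescribe an API.
final class LoadFactorPolicy {
    private LoadFactorPolicy() {}

    // Guideline from the slides: make the table 150% of the dataset size.
    static int initialSize(int expectedElements) {
        return (int) Math.ceil(expectedElements * 1.5);
    }

    // True once occupancy exceeds the chosen load factor (e.g. 0.75 = 75%).
    static boolean shouldResize(int elementCount, int tableSize, double loadFactor) {
        return elementCount > tableSize * loadFactor;
    }
}
```

With a load factor of 0.75, a 10-cell table would be resized when the eighth element arrives, well before the table is actually full.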

  9. Hashing Functions • There are many good approaches to hashing functions • The method used in the name example is extraction: part of an element's key value is used to compute the location

  10. Hashing Function Examples • Extraction • Use only a part of the element’s value or key to compute the location at which to store the element • Example on page 1007 • Extract the first character of the value and calculate its offset from the letter ‘A’ to determine its location • ‘A’ maps to 0, ‘B’ maps to 1, etc.
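
A minimal sketch of the first-letter extraction described above, assuming names begin with an English letter and that case should be ignored (the slides do not say how lowercase or non-letter input is handled). The class name is hypothetical.

```java
// Extraction: only the first character of the name determines the cell.
final class ExtractionHash {
    private ExtractionHash() {}

    // Returns an index in 0..25 for a 26-cell table.
    static int index(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("name must be non-empty");
        }
        // Assumption: treat lowercase and uppercase first letters alike.
        char first = Character.toUpperCase(name.charAt(0));
        if (first < 'A' || first > 'Z') {
            throw new IllegalArgumentException("name must start with a letter A-Z");
        }
        return first - 'A'; // 'A' -> 0, 'B' -> 1, ..., 'Z' -> 25
    }
}
```

Two names that share a first letter, such as "Ann" and "Alice", get the same index, which is exactly the collision case discussed earlier.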

  11. Hashing Function Examples • Division • Use the remainder of the key divided by some positive integer p as the index for the element: Hashcode(key) = Math.abs(key) % p • The result is in the range 0 to p-1 • Use the table size as p so that every result is a valid table index • Example: key value = 79 and table size = 43 • Math.abs(79) % 43 yields 36 • Using a prime number p as both the table size and the divisor has been found to give a better distribution of keys
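
The division method can be sketched as below. One deliberate deviation from the slide's formula: the sketch takes the remainder first and the absolute value second, because `Math.abs(Integer.MIN_VALUE)` is still negative in Java. For every other key the result matches `Math.abs(key) % p`. The class name is hypothetical.

```java
// Division method: index = remainder of the key divided by p.
final class DivisionHash {
    private DivisionHash() {}

    static int index(int key, int p) {
        if (p <= 0) {
            throw new IllegalArgumentException("p must be positive");
        }
        // abs-after-remainder avoids the Integer.MIN_VALUE overflow of Math.abs(key).
        return Math.abs(key % p); // always in 0..p-1
    }
}
```

For the slide's example, `DivisionHash.index(79, 43)` gives 36.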

  12. Hashing Function Examples • Folding • The key is divided into parts, which are then combined to create the index • Each part has the same length as the desired index, except perhaps the last • Shift folding • The parts are added together to create the index • Key = 987-65-4321 • 987 + 654 + 321 => 1962 • Use extraction or division to yield a smaller index • Boundary folding • A slight variation of shift folding in which some of the parts of the key are reversed before adding • Key = 987-65-4321 • Parts 987, 654, 321 with the middle part reversed: 987 + 456 + 321 => 1764
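
Both folding variants can be sketched as follows. The sketch assumes the key is supplied as a string of digits and that boundary folding reverses every second part, which reproduces the slide's example; other reversal patterns are possible. All names are hypothetical.

```java
// Shift folding and boundary folding over a string of digits.
final class FoldingHash {
    private FoldingHash() {}

    // Strips separators such as the hyphens in "987-65-4321".
    static String digitsOf(String key) {
        return key.replaceAll("\\D", "");
    }

    // Shift folding: split into fixed-length parts and add them.
    static int shiftFold(String digits, int partLength) {
        return fold(digits, partLength, false);
    }

    // Boundary folding: as above, but every second part is reversed first.
    static int boundaryFold(String digits, int partLength) {
        return fold(digits, partLength, true);
    }

    private static int fold(String digits, int partLength, boolean reverseAlternate) {
        if (partLength <= 0) {
            throw new IllegalArgumentException("partLength must be positive");
        }
        int sum = 0;
        boolean reverse = false;
        for (int i = 0; i < digits.length(); i += partLength) {
            int end = Math.min(i + partLength, digits.length()); // last part may be shorter
            String part = digits.substring(i, end);
            if (reverseAlternate && reverse) {
                part = new StringBuilder(part).reverse().toString();
            }
            sum += Integer.parseInt(part);
            reverse = !reverse;
        }
        return sum;
    }
}
```

For the key 987-65-4321 with three-digit parts, shift folding gives 1962 and boundary folding gives 1764. Either sum would still need extraction or division to fit a smaller table.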

  13. Hashing Function Examples • Mid-Square Method • The key is multiplied by itself and then the extraction method is applied to the middle of the result • Example: key = 4321 • 4321 * 4321 => 18671041 • Assume we need a 3-digit index • Extract three digits from the middle: 671 or 710 • It's important that the same digit positions be extracted each time
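
A minimal sketch of the mid-square method, assuming the caller fixes which digit positions to extract (here a zero-based start offset and a length). The names are hypothetical, and the sketch does not pad short squares, so a key whose square has too few digits raises an exception.

```java
// Mid-square: square the key, then extract digits from fixed middle positions.
final class MidSquareHash {
    private MidSquareHash() {}

    // start is a zero-based offset into the decimal digits of key * key.
    static int index(int key, int start, int length) {
        long squared = (long) key * key; // long avoids int overflow
        String digits = Long.toString(squared);
        // Throws StringIndexOutOfBoundsException if the square is too short.
        return Integer.parseInt(digits.substring(start, start + length));
    }
}
```

For key 4321 the square is 18671041, so `index(4321, 3, 3)` extracts 710 and `index(4321, 2, 3)` extracts 671. Whichever offset is chosen must stay the same for every key.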

  14. Hashing Function Examples • Radix Transformation Method • Transform the key into another numeric base • Then use the division method: divide the converted key by the table size and use the remainder as the index • Example: key = 23 in base 10 • Convert to 32 in base 7 • With a table size of 17: Hashcode(23) = Math.abs(32) % 17 => index of 15
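
The radix transformation can be sketched as below. The sketch assumes a target base between 2 and 10 so that the converted digits can be read back as a decimal number, and it takes the remainder digit by digit so that long digit strings cannot overflow. The class name is hypothetical.

```java
// Radix transformation: rewrite the key in another base, read the digits
// as a base-10 number, then apply the division method.
final class RadixHash {
    private RadixHash() {}

    static int index(int key, int radix, int tableSize) {
        if (radix < 2 || radix > 10 || tableSize <= 0) {
            throw new IllegalArgumentException("radix must be 2..10 and tableSize positive");
        }
        // Widen to long first so Math.abs is safe for Integer.MIN_VALUE.
        String digits = Long.toString(Math.abs((long) key), radix); // 23 in base 7 -> "32"
        long remainder = 0;
        for (int i = 0; i < digits.length(); i++) {
            int digit = digits.charAt(i) - '0';
            // Same result as parsing the whole string and taking % tableSize.
            remainder = (remainder * 10 + digit) % tableSize;
        }
        return (int) remainder;
    }
}
```

For the slide's example, `RadixHash.index(23, 7, 17)` converts 23 to 32 and returns 32 % 17 = 15.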

  15. Hashing Functions • In the digit analysis method, the index is formed by extracting and then manipulating specific digits from the key • If the key is 1234567, we might select the digits in positions 2 through 4 yielding 234 • The manipulation could then take many forms: • reversing the digits (432) • performing a circular shift (423) • swapping each pair of digits (324)
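
The selection and the three manipulations above can be sketched as follows. The sketch assumes 1-based, inclusive digit positions (matching "positions 2 through 4"), a circular shift to the right by one digit, and pair swapping that leaves a trailing unpaired digit in place. These choices reproduce the slide's results, but the names and conventions are otherwise hypothetical.

```java
// Digit analysis: select specific digits, then manipulate them.
final class DigitAnalysisHash {
    private DigitAnalysisHash() {}

    // Positions are 1-based and inclusive, e.g. select(1234567, 2, 4) -> "234".
    static String select(long key, int from, int to) {
        return Long.toString(key).substring(from - 1, to);
    }

    static String reverse(String digits) {
        return new StringBuilder(digits).reverse().toString();
    }

    // Moves the last digit to the front: "234" -> "423".
    static String circularShiftRight(String digits) {
        if (digits.length() < 2) {
            return digits;
        }
        int last = digits.length() - 1;
        return digits.charAt(last) + digits.substring(0, last);
    }

    // Swaps each adjacent pair; an unpaired final digit is left as is: "234" -> "324".
    static String swapPairs(String digits) {
        char[] chars = digits.toCharArray();
        for (int i = 0; i + 1 < chars.length; i += 2) {
            char tmp = chars[i];
            chars[i] = chars[i + 1];
            chars[i + 1] = tmp;
        }
        return new String(chars);
    }
}
```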

  16. Hashing Functions • In the length-dependent method, the key and the length of the key are combined in some way to form either the index itself or an intermediate value • If our key is 8765, we might multiply the first two digits by the length and then divide by the last digit (87 * 4 / 5, using integer division), yielding 69 • If our table size is 43, we would then use the division method (69 % 43) to yield an index of 26
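
The particular combination used in the example above can be sketched as below. This is only one of many possible length-dependent schemes. The sketch assumes a key of at least two digits with a nonzero last digit, and the class name is hypothetical.

```java
// Length-dependent method: (first two digits * key length) / last digit,
// followed by the division method.
final class LengthDependentHash {
    private LengthDependentHash() {}

    static int index(int key, int tableSize) {
        String digits = Integer.toString(key);
        int length = digits.length();
        int last = digits.charAt(length - 1) - '0';
        if (key < 10 || last == 0 || tableSize <= 0) {
            throw new IllegalArgumentException(
                "need a key of at least two digits, a nonzero last digit, and a positive table size");
        }
        int firstTwo = Integer.parseInt(digits.substring(0, 2));
        int intermediate = firstTwo * length / last; // integer division: 87 * 4 / 5 = 69
        return intermediate % tableSize;             // 69 % 43 = 26
    }
}
```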

  17. Hashing Function Examples • The java.lang.Object hashCode method • Returns an integer that is typically based on the memory location of the object • This is generally not useful, but ensures that all objects have a hashCode method • A class may override the inherited version of hashCode to provide its own • The String and Integer classes define their own hashCode methods
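
A minimal sketch of a class that overrides the inherited hashCode. `GridPoint` is a hypothetical class invented for illustration. It overrides equals alongside hashCode so that equal objects report equal hash codes, which the hash-based collections rely on.

```java
import java.util.Objects;

// Hypothetical value class whose hash code depends on its fields,
// not on where the object happens to live in memory.
final class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof GridPoint)) {
            return false;
        }
        GridPoint that = (GridPoint) other;
        return x == that.x && y == that.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y); // combines both fields into one int
    }
}
```

With this override, two separately constructed `GridPoint(1, 2)` objects are equal and hash to the same value, so either one can be used to look up the other in a hash-based collection.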

  18. Resolving Collisions • As mentioned, without a perfect hashing function, collisions must be resolved • There are several techniques for this as well • Chaining • Treat the table as an array of linked lists • Open Addressing • linear probing • quadratic probing • double hashing

  19. Chaining with Links or Overflow Area
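
Chaining with links can be sketched as a table whose cells each hold a linked list. The class below is a hypothetical illustration that stores strings and uses `String.hashCode` with `Math.floorMod` to pick a bucket. It does not cover the overflow-area variant named in the slide title.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Chaining: each bucket is a linked list of the elements that hash to it.
final class ChainedTable {
    private final List<LinkedList<String>> buckets = new ArrayList<>();

    ChainedTable(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        for (int i = 0; i < size; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    private int indexFor(String value) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(value.hashCode(), buckets.size());
    }

    // Returns false if the value was already present.
    boolean add(String value) {
        LinkedList<String> chain = buckets.get(indexFor(value));
        if (chain.contains(value)) {
            return false;
        }
        chain.add(value); // a collision simply lengthens the chain
        return true;
    }

    boolean contains(String value) {
        return buckets.get(indexFor(value)).contains(value);
    }
}
```

Even a one-bucket table stays correct under this scheme, since every element lands in the same chain. It just degrades to a linear search.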

  20. Open Addressing • The open addressing method looks for another unused position in the table • The simplest approach is linear probing – if an element hashes to position p and that position is occupied, try position (p+1)%s where s is the size of the table • One problem with linear probing is the development of clusters of occupied cells • There are other approaches to open addressing • quadratic probing • double hashing
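
Linear probing can be sketched as below for integer keys, using the remainder of the key as the home position p and stepping to (p+1)%s on each collision. The class is hypothetical, has a fixed size, and omits removal, which would need extra bookkeeping so that later lookups do not stop early at a freed cell.

```java
// Open addressing with linear probing over a fixed-size table of int keys.
final class LinearProbingTable {
    private final Integer[] cells; // null marks an unused cell

    LinearProbingTable(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        cells = new Integer[size];
    }

    private int home(int key) {
        return Math.abs(key % cells.length);
    }

    // Returns the index where the key is stored, or -1 if the table is full.
    int add(int key) {
        int p = home(key);
        for (int probes = 0; probes < cells.length; probes++) {
            if (cells[p] == null) {
                cells[p] = key;
                return p;
            }
            if (cells[p] == key) {
                return p; // already present
            }
            p = (p + 1) % cells.length; // occupied: try the next cell, wrapping around
        }
        return -1;
    }

    boolean contains(int key) {
        int p = home(key);
        for (int probes = 0; probes < cells.length; probes++) {
            if (cells[p] == null) {
                return false; // an unused cell ends the probe sequence
            }
            if (cells[p] == key) {
                return true;
            }
            p = (p + 1) % cells.length;
        }
        return false;
    }
}
```

In a 5-cell table, the keys 3, 8, and 13 all have home position 3. They end up in cells 3, 4, and 0, a small instance of the clustering problem noted above.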

  21. Java Collections Hash Tables • The Java Collections API provides seven implementations of hashing • Three of these are: • Hashtable: key-value pairs, the oldest class, synchronized • HashMap: key-value pairs, unsynchronized, permits a null key and null values • HashSet: unique values only, unsynchronized, permits a null element • Note: The chaining method is used to resolve collisions.
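
A short sketch of the three classes in use. The demo class and its method names are hypothetical; the `java.util` calls are the standard ones. The last method illustrates the null-handling difference: `Hashtable` rejects null values with a `NullPointerException`, while `HashMap` accepts them.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Hashtable;
import java.util.Map;
import java.util.Set;

final class HashCollectionsDemo {
    private HashCollectionsDemo() {}

    // HashMap: key-value pairs, here a word mapped to its number of occurrences.
    static Map<String, Integer> countWords(String... words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // HashSet: values only, duplicates are dropped.
    static Set<String> distinct(String... words) {
        Set<String> unique = new HashSet<>();
        for (String word : words) {
            unique.add(word);
        }
        return unique;
    }

    // Hashtable: the older, synchronized class; it does not accept null values.
    static boolean hashtableRejectsNullValue() {
        Map<String, Integer> table = new Hashtable<>();
        try {
            table.put("key", null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }
}
```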
