1 / 22

Grokking Hash Tables

Grokking Hash Tables. A hash table is…. Just a data structure that’s used to establish a mapping between arbitrary objects Not to be confused with Map : An interface that specifies methods that a program could call to get/set/query key/value mappings

golda
Download Presentation

Grokking Hash Tables

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Grokking Hash Tables

  2. A hash table is… • Just a data structure that’s used to establish a mapping between arbitrary objects • Not to be confused with Map: • An interface that specifies methods that a program could call to get/set/query key/value mappings • Basically, defines what mappings are • A hash table is one way (but not the only) to make a Map • And your MondoHashTable is just implementation of that hash table data struct that happens to match the Map interface

  3. Example: /** * MondoHashTable is a hash table based * implementation of the Map interface */ import java.util.Map; public class MondoHashTable implements Map { … }

  4. So what’s a mapping? • A mapping is just a pair relationship between two things • In math, we write a mapping m: xy to denote that m establishes relationships between things of type x and things of type y • Big restriction: every uniquex must map to a singley • Essentially: every left hand side has one and only one right hand side

  5. Examples of mappings • 376.0827 (abs(sqrt(37))) • 79609  “Prof Lane’s office phone” • 123456789 studentRecord(“J. Student”) • -6.0827  36.999 • studentRecord(“J. Student”) gradeSet() • largeFunkyDataObject() otherObject() • “nigeria”  114 • “cs35”  12 • Left side is the key; right side is the value

  6. Small integers are easy… double[] mapArray=new double[200]; mapArray[37]=Math.sqrt(37); 37 6.0827

  7. What about non-integers? • Desire: have a table that gives quick lookup for arbitrary objects • Doesn’t require vast space • Answer: introduce an intermediate step • Hash function turns key into hash code • Use hash code to look up key/val pair in table • table[h(key)]=<key,value>

  8. The big picture… MondoHashTable “nigeria” h(“nigeria”) get(“nigeria”) nigeria 114 1945462417

  9. Some practical questions • 1.9 billion? Isn’t that a little much?

  10. Some practical questions • 1.9 billion? Isn’t that a little much? • A: reduce mod table size • int h=hashFunction(Object a) % table.length;

  11. Some practical questions • 1.9 billion? Isn’t that a little much? • A: reduce mod table size • int h=hashFunction(Object a) % table.length; • Where do the hash functions come from?

  12. Some practical questions • 1.9 billion? Isn’t that a little much? • A: reduce mod table size • int h=hashFunction(Object a) % table.length; • Where do the hash functions come from? • A: java.lang.Object.hashCode()

  13. Some practical questions • 1.9 billion? Isn’t that a little much? • A: reduce mod table size • int h=hashFunction(Object a) % table.length; • Where do the hash functions come from? • A: java.lang.Object.hashCode() • How big should the table be, initially?

  14. Some practical questions • 1.9 billion? Isn’t that a little much? • A: reduce mod table size • int h=hashFunction(Object a) % table.length; • Where do the hash functions come from? • A: java.lang.Object.hashCode() • How big should the table be, initially? • Good choice: pick a prime # • Ask the user (arg to constructor)

  15. Some practical questions • 1.9 billion? Isn’t that a little much? • A: reduce mod table size • int h=hashFunction(Object a) % table.length; • Where do the hash functions come from? • A: java.lang.Object.hashCode() • How big should the table be, initially? • Good choice: pick a prime # • Ask the user (arg to constructor) • What happens if the table gets too full?

  16. Some practical questions • 1.9 billion? Isn’t that a little much? • A: reduce mod table size • int h=hashFunction(Object a) % table.length; • Where do the hash functions come from? • A: java.lang.Object.hashCode() • How big should the table be, initially? • Good choice: pick a prime # • Ask the user (arg to constructor) • What happens if the table gets too full? • A: resize it!

  17. #1 killer question: Collisions • What happens if • (a.hashCode()%tSz)==(b.hashCode()%tSz) • Depends… • If a.equals(b), then these are the same key • If not… • This is a hash collision • Basically, you have two different keys pointing at the same location in the hash table • Have to resolve this somehow -- find unique storage for every key and don’t lose anything

  18. Collision strategy 1: Chaining • Make each cell in the hash table a “bucket” containing multiple key/value pairs h(“nigeria”) nigeria 114 viagra 29 h(“viagra”)

  19. Collision strategy 2: Open addressing • Each cell of the table actually holds a key/value pair • When you have a collision, rehash to find a new location for the new pair • Linear probing: try next cell in line • Quadratic probing: try cell h+1, then h+4, h+9, h+16, h+15 … • Double hashing: h(k)=(h1(k)+i*h2(k)) mod table.size() • Repeat probes until you find an empty spot

  20. Map.keySet() • A common operation on hash tables (Maps): get all of the keys • You’ll probably use this in the “dump” functionality of SpamBGon • Map requires: • Set keySet(): Returns a set view of the keys contained in this map. • What’s a view?

  21. Views of data • Different interface to the same underlying data • Doesn’t copy the data -- just provides new methods for accessing it • A “set view of the keys”, then, is an object that behaves like (i.e., implements) a set, but gives you access to all of the keys of the hash table.

  22. The set view picture MondoHashTable Set size() contains() iterator() keySet()

More Related