Hash Tables and Sets

inara

Presentation Transcript


  1. Hash Tables and Sets Lecture 3

  2. Sets • A set is simply a collection of elements • Unlike lists, elements are not ordered • Very abstract, general concept with broad usefulness: • The set of all Google search queries from the past 24 hours • The set of all photos with your face in them • The set of all files in a folder • How are sets represented in computers? • Consider the following problem: • We want to store a large set of approx. 10 million random numbers • The following operations are happening constantly: • Add – inserting a new number into the set • Delete – removing an existing element from the set • Lookup – checking if a new random number is in the set

  3. Representing Sets • Suppose we use an ArrayList for this heavily churning set: • Add (which must first check for duplicates), Delete, and Lookup are all O(n) • Suppose the ArrayList is sorted: • Lookup is O(log n) • Add/Delete are still O(n), because elements must be shifted • Cleverer algorithms: • Self-balancing trees: • Lookup, Add, and Delete are guaranteed O(log n) • Hash tables: • Lookup, Add, and Delete are worst-case O(n) • … but on average O(1)
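
To make the sorted-ArrayList trade-off concrete, here is a minimal sketch. The class name `SortedListSet` is hypothetical and not part of the lecture code; it uses the standard `Collections.binarySearch` for lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration: a set of ints stored in a sorted ArrayList.
class SortedListSet {
    // Invariant: ascending order, no duplicates.
    private final List<Integer> items = new ArrayList<>();

    // O(log n): binary search over the sorted list.
    boolean contains(int x) {
        return Collections.binarySearch(items, x) >= 0;
    }

    // O(n): the position is found in O(log n), but inserting into the
    // middle of an ArrayList shifts every later element.
    boolean add(int x) {
        int pos = Collections.binarySearch(items, x);
        if (pos >= 0) {
            return false; // already present
        }
        // A negative result encodes -(insertionPoint) - 1.
        items.add(-pos - 1, x);
        return true;
    }

    // O(n): removing from the middle also shifts later elements.
    boolean remove(int x) {
        int pos = Collections.binarySearch(items, x);
        if (pos < 0) {
            return false; // not present
        }
        items.remove(pos); // remove(int index)
        return true;
    }

    int size() {
        return items.size();
    }
}
```

Lookup becomes cheap, but every Add or Delete can still touch most of the list, which is why the slide still counts them as O(n).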

  4. Using Buckets • Let’s go back to ArrayLists, but use a different approach: • Create 2 ArrayLists • Even numbers go in the first list • Odd numbers go in the second list • Now, Add/Delete/Lookup only take half the work: • Check if the number is even or odd • Get the right ArrayList • Search through about 5 million entries instead of 10 million • This is promising! • … but still O(n)

  5. Using Buckets • Yet another approach: • Instead of two different ArrayLists, let’s use 4 • Multiples of 4 go in the first list • Multiples of 4 have the property (x % 4) == 0 • If (x % 4) == 1, then x goes in the second list • If (x % 4) == 2, then x goes in the third list • If (x % 4) == 3, then x goes in the fourth list • Now, Add/Delete/Lookup only take ¼ as much work: • Calculate the number mod 4 • Find the right list • Search through 2.5 million elements instead of 10 million • This is even better! • … but still O(n)
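
The four-list scheme above might look like the following sketch. The class name `FourBucketSet` is made up for illustration. One assumption beyond the slide: it uses `Math.floorMod` rather than `%`, since `%` returns a negative remainder for negative numbers in Java.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of splitting one big list into 4 buckets.
class FourBucketSet {
    private static final int BUCKETS = 4;
    private final List<List<Integer>> lists = new ArrayList<>();

    FourBucketSet() {
        for (int i = 0; i < BUCKETS; i++) {
            lists.add(new ArrayList<>());
        }
    }

    // Pick the list for x. floorMod keeps the index in 0..3 even when x < 0.
    private List<Integer> bucketFor(int x) {
        return lists.get(Math.floorMod(x, BUCKETS));
    }

    // Only one of the four lists is searched, so about 1/4 of the work,
    // but the search within that list is still linear.
    boolean contains(int x) {
        return bucketFor(x).contains(x);
    }

    boolean add(int x) {
        List<Integer> bucket = bucketFor(x);
        if (bucket.contains(x)) {
            return false; // sets hold no duplicates
        }
        bucket.add(x);
        return true;
    }

    boolean remove(int x) {
        // Integer.valueOf selects remove(Object) rather than remove(int index).
        return bucketFor(x).remove(Integer.valueOf(x));
    }
}
```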

  6. Using Buckets • Yet another approach: use 10 million buckets! • If the numbers are truly randomly distributed, then: • Some buckets may be empty • Some buckets may have 2 or more elements • On average, each bucket has close to 1 element • Suddenly, Add/Delete/Lookup become very cheap – O(1) on average • As long as we scale up the number of buckets to match the amount of data, we can maintain average O(1) lookup • This is a hash table!
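
A quick way to check the claim about bucket sizes is to simulate it. The sketch below is illustrative only: the class `BucketOccupancy`, the seed, and the use of 1 million (rather than 10 million) values are all assumptions chosen to keep the run small.

```java
import java.util.Random;

// Hypothetical simulation: throw n random numbers into `buckets` buckets
// and inspect how full the buckets get.
class BucketOccupancy {
    // counts[i] = how many of the n random values landed in bucket i.
    static int[] fill(int n, int buckets, long seed) {
        Random rng = new Random(seed);
        int[] counts = new int[buckets];
        for (int i = 0; i < n; i++) {
            int x = rng.nextInt(Integer.MAX_VALUE); // non-negative random value
            counts[x % buckets]++;
        }
        return counts;
    }

    // Size of the fullest bucket.
    static int max(int[] counts) {
        int best = 0;
        for (int c : counts) {
            best = Math.max(best, c);
        }
        return best;
    }

    // Fraction of buckets that received no value at all.
    static double emptyFraction(int[] counts) {
        int empty = 0;
        for (int c : counts) {
            if (c == 0) {
                empty++;
            }
        }
        return (double) empty / counts.length;
    }

    public static void main(String[] args) {
        int[] counts = fill(1_000_000, 1_000_000, 42L);
        System.out.println("fullest bucket: " + max(counts));
        System.out.println("empty fraction: " + emptyFraction(counts));
    }
}
```

With as many buckets as values, the average is exactly one element per bucket, a sizable fraction of buckets stay empty, and the fullest bucket holds only a handful of elements.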

  7. Hash Functions • In our example, we were only storing integers • We can use the same approach to store arbitrary data, as long as one thing is provided: • A hash function • What is a hash function? • A function that converts any data into an integer • This integer determines the bucket in which the data is stored • The hash function must ensure fairly even distribution in the table. More on this later.
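
In Java, every object already provides a `hashCode()` method, so the step from hash code to bucket can be sketched as below. The helper `bucketIndex` and its class are hypothetical names; `Math.floorMod` is used because hash codes can be negative.

```java
// Hypothetical helper: map any object to a bucket number.
class BucketIndex {
    // Reduce the object's hash code to an index in 0..bucketCount-1.
    static int bucketIndex(Object element, int bucketCount) {
        return Math.floorMod(element.hashCode(), bucketCount);
    }

    public static void main(String[] args) {
        // An Integer's hash code is its own value, so 42 lands in bucket 2 of 10.
        System.out.println(bucketIndex(42, 10));
        // Strings and other objects work the same way.
        System.out.println(bucketIndex("asdf", 10));
    }
}
```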

  8. Example Hash Function • Suppose we wish to store a set of strings instead of integers • We need a hash function • Here’s a simple one: • ‘a’ = 1, ‘b’ = 2, ‘c’ = 3, …, ‘z’ = 26 • Sum the value of each letter • “asdf”.hashCode() • = ‘a’ + ‘s’ + ‘d’ + ‘f’ • = 1 + 19 + 4 + 6 • = 30 • “asdf” goes in the 30th bucket
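
The letter-sum function above can be sketched as follows. The names `LetterSumHash` and `letterSum` are hypothetical, and the sketch assumes lowercase English letters only. Note that Java's built-in `String.hashCode()` uses a different formula; this is the slide's toy function, not the library's.

```java
// Hypothetical implementation of the slide's toy hash function.
class LetterSumHash {
    // 'a' = 1, 'b' = 2, ..., 'z' = 26; the hash is the sum over all letters.
    static int letterSum(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 'a' || c > 'z') {
                throw new IllegalArgumentException("only lowercase a-z supported: " + c);
            }
            sum += c - 'a' + 1;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(letterSum("asdf")); // 1 + 19 + 4 + 6 = 30
    }
}
```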

  9. Hash Collisions • This hash function has some problems: • It only deals with English letters • We can solve this by using the ASCII or Unicode value of the character instead of its index in the English alphabet • It is prone to collisions • A hash collision is when two or more distinct values have the same hash code • In the example hash function, all anagrams collide: • “least” → 12 + 5 + 1 + 19 + 20 = 57 • “steal” → 19 + 20 + 5 + 1 + 12 = 57 • “stale” → 19 + 20 + 1 + 12 + 5 = 57 • Therefore, this hash table would be very bad for storing sets of anagrams! • It would degenerate into a single ArrayList, as only one bucket would be used.
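
The anagram collisions can be reproduced in a few lines. The sketch below also shows one possible remedy that is not from the slides: an order-sensitive hash that multiplies the running value by 31 before adding each letter. All names here are hypothetical, and lowercase input is assumed.

```java
// Hypothetical demo: anagrams collide under a plain letter sum,
// but not under an order-sensitive variant.
class AnagramCollisions {
    // The slide's toy hash: sum of letter positions ('a' = 1 ... 'z' = 26).
    static int letterSum(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i) - 'a' + 1;
        }
        return sum;
    }

    // Assumed alternative: scaling by 31 at each step makes letter order matter.
    static int positional(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + (s.charAt(i) - 'a' + 1);
        }
        return h;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"least", "steal", "stale"}) {
            System.out.println(w + ": sum=" + letterSum(w) + ", positional=" + positional(w));
        }
    }
}
```

All three words share the letter sum 57, while the order-sensitive variant gives each a different value. Distinct strings can still collide under any hash function; the goal is only to make collisions rare.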

  10. Generalizing • What exactly is a hash table? • Given elements that have a hash function, hash tables are just arrays! • Each array element is an ArrayList in order to resolve collisions • Number of buckets is proportional to number of elements in the set • Exploit the time-memory tradeoff to get quick lookup times • Array is resized when hash table gets too “full” • Load factor: The ratio of filled hash table slots to total slots • Load factor is 0.0 when the hash table is empty and 1.0 when every bucket has at least one element • When load factor reaches a certain value, 0.75 in our case, the array gets larger to maintain sparseness • Hash tables can get much more complicated than this, but the fundamentals remain the same.
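
Putting these pieces together, a hash set with ArrayList buckets and load-factor resizing might be sketched as below. This is one possible design, not the lab's code: the class `ChainedHashSet`, the initial size of 8, and the doubling policy are assumptions. It follows the slide's definition of load factor (non-empty buckets divided by total buckets) and uses a list of lists instead of a raw array to sidestep Java's generic-array restrictions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a hash set with one ArrayList per bucket.
class ChainedHashSet<T> {
    private static final double MAX_LOAD = 0.75; // threshold named on the slide

    private List<List<T>> buckets = newBuckets(8); // initial size is an assumption
    private int usedBuckets = 0; // buckets holding at least one element
    private int size = 0;        // total number of elements

    private static <E> List<List<E>> newBuckets(int n) {
        List<List<E>> b = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            b.add(new ArrayList<>());
        }
        return b;
    }

    private List<T> bucketFor(T element) {
        return buckets.get(Math.floorMod(element.hashCode(), buckets.size()));
    }

    // Slide's definition: filled slots divided by total slots.
    double loadFactor() {
        return (double) usedBuckets / buckets.size();
    }

    boolean contains(T element) {
        return bucketFor(element).contains(element);
    }

    boolean add(T element) {
        List<T> bucket = bucketFor(element);
        if (bucket.contains(element)) {
            return false; // already in the set
        }
        if (bucket.isEmpty()) {
            usedBuckets++;
        }
        bucket.add(element);
        size++;
        if (loadFactor() >= MAX_LOAD) {
            resize(buckets.size() * 2); // grow to stay sparse
        }
        return true;
    }

    boolean remove(T element) {
        List<T> bucket = bucketFor(element);
        if (!bucket.remove(element)) {
            return false; // not present
        }
        if (bucket.isEmpty()) {
            usedBuckets--;
        }
        size--;
        return true;
    }

    // Rehash every element into a larger table; indices change with the bucket count.
    private void resize(int newCount) {
        List<List<T>> old = buckets;
        buckets = newBuckets(newCount);
        usedBuckets = 0;
        for (List<T> bucket : old) {
            for (T e : bucket) {
                List<T> target = bucketFor(e);
                if (target.isEmpty()) {
                    usedBuckets++;
                }
                target.add(e);
            }
        }
    }

    int size() {
        return size;
    }

    int bucketCount() {
        return buckets.size();
    }
}
```

Resizing costs O(n) when it happens, but doubling makes it rare enough that the average cost per Add stays constant.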

  11. The Lab • In this lab, we have implemented a very simple hash table • SimpleHashTable.java • It is so simple that it cannot handle collisions! • Each bucket isn’t an ArrayList – it’s just a single element when full, or “null” if empty • Your task is to modify the code and implement collision resolution • This means that each array slot should be an ArrayList instead of merely an Object

  12. Java Generics • You will see some strange angle-bracket notation: • ArrayList<T>, SimpleHashTable<T> • If parentheses indicate function arguments, then angle brackets indicate type arguments • Type arguments are a way of specifying data structures that work on various types: • ArrayList<String> has: • boolean add(String arg0) • String get(int index) • SimpleHashSet<Integer> has: • void add(Integer arg0) • boolean contains(Integer arg0)
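
As a small illustration of type arguments, the sketch below uses the standard `ArrayList` with two different element types and defines one generic method. The names `GenericsDemo` and `firstOrDefault` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of type arguments in Java.
class GenericsDemo {
    // T is a type parameter: the caller's list type decides what T is.
    static <T> T firstOrDefault(List<T> items, T fallback) {
        return items.isEmpty() ? fallback : items.get(0);
    }

    public static void main(String[] args) {
        ArrayList<String> words = new ArrayList<>();
        words.add("least");           // add(String)
        String first = words.get(0);  // get returns String, no cast needed

        ArrayList<Integer> numbers = new ArrayList<>();
        numbers.add(57);              // add(Integer), the int is boxed automatically
        int n = numbers.get(0);       // get returns Integer, unboxed to int

        // words.add(57);  // would not compile: 57 is not a String

        System.out.println(first + " " + n);
        System.out.println(firstOrDefault(new ArrayList<String>(), "none"));
    }
}
```

The same class works for any element type, and the compiler rejects elements of the wrong type before the program runs.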

  13. Operations to Implement • SimpleHashSet.java: • public void add(T element) • public boolean contains(T element) • public boolean remove(T element) • public void clear() • public boolean isEmpty() • public int size() • Some of these may remain unchanged • You will also have to edit the private members and reimplement some private methods
