
Universal Hashing



Presentation Transcript


  1. Universal Hashing • When attempting to foil a malicious adversary, randomize the algorithm • Universal hashing: pick a hash function at random when the algorithm begins • Guarantees good performance on average, no matter what keys the adversary chooses • Need a family of hash functions to choose from • Think of quicksort

  2. Universal Hashing • Let G be a (finite) collection of hash functions • …that map a given universe U of keys… • …into the range {0, 1, …, m − 1} • G is said to be universal if: • for each pair of distinct keys x, y ∈ U, the number of hash functions h ∈ G for which h(x) = h(y) is at most |G|/m • In other words: • With a random hash function from G, the chance of a collision between x and y (x ≠ y) is at most 1/m

  3. Universal Hashing • Theorem 11.3: • Choose h from a universal family of hash functions • Hash n keys into a table of m slots, n ≤ m • Then the expected number of collisions involving a particular key x is less than 1 • Proof: • For each pair of distinct keys y, z, let cyz = 1 if y and z collide, 0 otherwise • E[cyz] ≤ 1/m (by the definition of a universal family) • Let Cx be the total number of collisions involving key x • Then E[Cx] = Σy≠x E[cxy] ≤ (n − 1)/m • Since n ≤ m, we have E[Cx] ≤ (n − 1)/m < 1

  4. A Universal Hash Function • Choose table size m to be prime • Decompose key x into r+1 bytes, so that x = {x0, x1, …, xr} • Only requirement is that the max value of a byte < m • Let a = {a0, a1, …, ar} denote a sequence of r+1 elements chosen randomly from {0, 1, …, m − 1} • Define the corresponding hash function ha ∈ G: ha(x) = (a0x0 + a1x1 + … + arxr) mod m • With this definition, G has m^(r+1) members

  5. A Universal Hash Function • G is a universal collection of hash functions (Theorem 11.5) • How to use: • Pick r based on m and the range of keys in U • Pick a hash function by (randomly) picking the a’s • Use that hash function on all keys

  6. Example • Let m = 5, and split each key into 2-bit chunks (binary). Note the maximum value of a chunk is 3 < m = 5 • a = (1, 3), each element chosen at random from {0, 1, 2, 3, 4} • Example for x = 4 = (01, 00) in binary (note r = 1) • ha(4) = (1·1 + 3·0) mod 5 = 1
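As a concrete illustration, here is a minimal Python sketch of this dot-product family; the function name make_hash and the chunk-tuple interface are my own choices, not part of the slides.

```python
import random

def make_hash(m, r):
    """Pick a random member h_a of the family G: choose
    a = (a_0, ..., a_r) uniformly from {0, ..., m-1}."""
    a = [random.randrange(m) for _ in range(r + 1)]

    def h(chunks):
        # chunks = (x_0, ..., x_r); each chunk value must be < m
        assert len(chunks) == r + 1 and all(0 <= x < m for x in chunks)
        return sum(ai * xi for ai, xi in zip(a, chunks)) % m

    return h

# Mirrors the slide's example: m = 5, r = 1. There a = (1, 3) and
# x = 4 splits into 2-bit chunks (01, 00) = (1, 0), giving
# (1*1 + 3*0) mod 5 = 1. Here a is chosen at random.
h = make_hash(5, 1)
print(h((1, 0)))
```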

  7. Open Addressing • Basic idea (details in Section 12.4): • To insert: if slot is full, try another slot, …, until an open slot is found (probing) • To search, follow same sequence of probes as would be used when inserting the element • If reach element with correct key, return it • If reach a NULL pointer, element is not in table • Good for fixed sets (adding but no deletion) • Table needn’t be much bigger than n
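The slides defer the probing details to Section 12.4; the following is a minimal sketch of one common probe sequence (linear probing, my choice), using Python's built-in hash and assuming no deletions, purely to make the insert/search symmetry concrete.

```python
class OpenAddressTable:
    """Open addressing with linear probing; insert and search only."""

    def __init__(self, m):
        self.m = m
        self.slots = [None] * m            # None marks an open slot

    def _probe(self, key):
        start = hash(key) % self.m
        for i in range(self.m):            # probe start, start+1, ... (mod m)
            yield (start + i) % self.m

    def insert(self, key):
        for j in self._probe(key):
            if self.slots[j] is None or self.slots[j] == key:
                self.slots[j] = key
                return j
        raise RuntimeError("hash table overflow")

    def search(self, key):
        for j in self._probe(key):         # same probe sequence as insert
            if self.slots[j] is None:      # reached an open slot: not present
                return None
            if self.slots[j] == key:
                return j
        return None
```

Because search stops at the first open slot, deleting by simply emptying a slot would break later searches; this is why the slide restricts open addressing to sets with insertion but no deletion.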

  8. Lecture 11: Binary Search Trees Shang-Hua Teng

  9. Data Format • [Diagram: each entry consists of keys plus satellite data]

  10. Insertion and Deletion on dynamic sets • Insert(S,x) • A modifying operation that augments the set S with the element pointed to by x • Delete(S,x) • Given a pointer x to an element in the set S, removes x from S • Notice that this operation uses a pointer to an element x, not a key value

  11. Querying on dynamic sets • Search(S,k) • given a set S and a key value k, returns a pointer x to an element in S such that key[x] = k, or NIL if no such element belongs to S • Minimum(S) • on a totally ordered set S, returns a pointer to the element of S with the smallest key • Maximum(S) • Successor(S,x) • Given an element x whose key is from a totally ordered set S, returns a pointer to the next larger element in S, or NIL if x is the maximum element • Predecessor(S,x)

  12. Trees • Terminology (see diagram): root, parent, child, leaf, interior node, edge, path, subtree • Degree? Depth/Level? Height?

  13. Binary Tree • Tree: root • Node: data, left child, right child, parent (optional) • [Diagram: a node record with Parent, Data, Left, Right fields, and a root pointer]

  14. Trees • Full tree of height h • all leaves present at level h • all interior nodes full • total number of nodes in a full binary tree? (2^(h+1) − 1) • Complete tree of height h • all levels full except possibly the last, which is filled from the left

  15. Applications - Expression Trees • To represent infix expressions such as (5*3)+(8-4) • [Tree: + at the root; left subtree * with children 5 and 3; right subtree - with children 8 and 4]

  16. Applications - Parse Trees • Used in compilers to check syntax • [Parse tree for statements of the form “if cond then statement” and “if cond then statement else statement”]

  17. Application - Game Trees

  18. Binary Search Trees • Binary Search Trees (BSTs) are an important data structure for dynamic sets • In addition to satellite data, elements have: • key: an identifying field inducing a total ordering • left: pointer to a left child (may be NULL) • right: pointer to a right child (may be NULL) • p: pointer to a parent node (NULL for root)

  19. Binary Search Trees • BST property: key[leftSubtree(x)] ≤ key[x] ≤ key[rightSubtree(x)] • Example: [tree with root F; F’s children B and H; B’s children A and D; H’s right child K]

  20. Traversals for a Binary Tree • Pre order: visit the node, go left, go right • In order: go left, visit the node, go right • Post order: go left, go right, visit the node • Level order / breadth first: for d = 0 to height, visit the nodes at level d

  21. Traversal Examples • [Example tree: root A; A’s children B and C; B’s left child D with children G and H; C’s children E and F; F’s right child I] • Pre order: A B D G H C E F I • In order: G D H B A E C F I • Post order: G H D B E I F C A • Level order: A B C D E F G H I

  22. Traversal Implementation • recursive implementation of preorder • base case? • self reference • visit node • pre-order(left child) • pre-order(right child) • What changes need to be made for in-order, post-order?
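A recursive Python sketch of the three depth-first orders (the Node class is illustrative); the empty subtree is the base case, and only the position of the visit step changes.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def preorder(x, visit=print):
    if x is None:                  # base case: empty subtree
        return
    visit(x.key)                   # visit the node first
    preorder(x.left, visit)        # then recurse left
    preorder(x.right, visit)       # then recurse right

def inorder(x, visit=print):
    if x is None:
        return
    inorder(x.left, visit)
    visit(x.key)                   # visit between the two subtrees
    inorder(x.right, visit)

def postorder(x, visit=print):
    if x is None:
        return
    postorder(x.left, visit)
    postorder(x.right, visit)
    visit(x.key)                   # visit after both subtrees
```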

  23. Inorder Tree Walk • InorderTreeWalk(x): if x = NULL, return; otherwise InorderTreeWalk(left[x]); print(x); InorderTreeWalk(right[x]) • Prints elements in sorted (increasing) order

  24. Inorder Tree Walk • Example: [the F/B/H/A/D/K tree from slide 19] • How long will a tree walk take? • Inorder walk prints in monotonically increasing order. Why?

  25. Evaluating an expression tree • [The (5*3)+(8-4) expression tree from slide 15] • Walk the tree in postorder • When visiting a node, use the results of its children to evaluate it.
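A hedged sketch of that postorder evaluation for the (5*3)+(8-4) tree; the node representation (operator strings at interior nodes, numbers at leaves) is my own choice.

```python
import operator

class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def evaluate(x):
    if x.left is None and x.right is None:
        return x.value                     # leaf: a number
    left = evaluate(x.left)                # postorder: evaluate children first,
    right = evaluate(x.right)              # then combine them at this node
    return OPS[x.value](left, right)

# (5*3) + (8-4)
tree = Node('+', Node('*', Node(5), Node(3)),
                 Node('-', Node(8), Node(4)))
print(evaluate(tree))                      # 19
```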

  26. Operations on BSTs: Search • Given a key and a pointer to a node, returns an element with that key or NULL: TreeSearch(x, k) if (x = NULL or k = key[x]) return x; if (k < key[x]) return TreeSearch(left[x], k); else return TreeSearch(right[x], k);
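The same search as runnable Python, together with an iterative variant that makes the same comparisons without recursion (a common alternative, not from the slides):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def tree_search(x, k):
    """Recursive search, mirroring the pseudocode above."""
    if x is None or k == x.key:
        return x
    if k < x.key:
        return tree_search(x.left, k)
    return tree_search(x.right, k)

def tree_search_iter(x, k):
    """Iterative version: same probes, no call stack."""
    while x is not None and k != x.key:
        x = x.left if k < x.key else x.right
    return x
```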

  27. BST Search: Example • Search for D and C in the F/B/H/A/D/K tree:

  28. Operations on BSTs: Insert • Adds an element x to the tree so that the binary search tree property continues to hold • The basic algorithm • Like the search procedure above • Insert x in place of a NULL pointer • Use a “trailing pointer” to keep track of where you came from (like inserting into a singly linked list)
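A sketch of the insert with the trailing pointer (here y trails one step behind x on the way down, in the style of CLRS's TREE-INSERT; the Node class is illustrative):

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.parent = key, None, None, None

def tree_insert(root, z):
    """Insert node z into the tree rooted at root; returns the new root."""
    y = None                          # trailing pointer: will become z's parent
    x = root
    while x is not None:              # walk down exactly as in search
        y = x
        x = x.left if z.key < x.key else x.right
    z.parent = y
    if y is None:                     # tree was empty
        return z
    if z.key < y.key:                 # hang z off the NULL slot we reached
        y.left = z
    else:
        y.right = z
    return root
```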

  29. BST Insert: Example • Insert C into the F/B/H/A/D/K tree [C becomes D’s left child]

  30. BST Search/Insert: Running Time • What is the running time of TreeSearch() or TreeInsert()? • O(h), where h = height of tree • What is the height of a binary search tree? • worst case: h = O(n) when tree is just a linear string of left or right children • We’ll keep all analysis in terms of h for now • Later we’ll see how to maintain h = O(lg n)

  31. Sorting With Binary Search Trees • Informal code for sorting array A of length n: BSTSort(A) for i=1 to n TreeInsert(A[i]); InorderTreeWalk(root); • Argue that this is Ω(n lg n) • What will be the running time in the • Worst case? • Average case? (hint: remind you of anything?)
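A runnable sketch of BSTSort (a recursive insert is used here instead of the trailing-pointer version, purely to keep the block short):

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def bst_sort(a):
    root = None
    for x in a:                   # n inserts, O(h) each
        root = insert(root, x)
    out = []
    def inorder(node):            # inorder walk emits keys in sorted order
        if node is not None:
            inorder(node.left)
            out.append(node.key)
            inorder(node.right)
    inorder(root)
    return out

print(bst_sort([3, 1, 8, 2, 6, 7, 5]))   # [1, 2, 3, 5, 6, 7, 8]
```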

  32. Sorting With BSTs • for i=1 to n TreeInsert(A[i]); InorderTreeWalk(root); • Average case analysis • It’s a form of quicksort! • [Diagram: inserting 3 1 8 2 6 7 5 builds a BST rooted at 3; the partitions {1 2}, {8 6 7 5}, then {2}, {6 7 5}, then {5}, {7} mirror quicksort’s]

  33. Sorting with BSTs • Same partitions are done as with quicksort, but in a different order • In previous example • Everything was compared to 3 once • Then those items < 3 were compared to 1 once • Etc. • Same comparisons as quicksort, different order!

  34. Sorting with BSTs • Since run time is proportional to the number of comparisons, same expected time as quicksort: O(n lg n) • Which do you think is better, quicksort or BSTsort? Why?

  35. Sorting with BSTs • Since run time is proportional to the number of comparisons, same expected time as quicksort: O(n lg n) • Which do you think is better, quicksort or BSTSort? Why? • Answer: quicksort • Better constants • Sorts in place • Doesn’t need to build a data structure

  36. More BST Operations • BSTs are good for more than sorting; for example, they can implement a priority queue • A priority queue supports • Insert • Minimum • Extract-Min

  37. BST Operations: Minimum • How can we implement a Minimum() query? • What is the running time? • O(h)
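One possible answer, as a sketch: keep following left (respectively right) pointers until there is none, which costs one root-to-node path, i.e. O(h).

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def tree_minimum(x):
    while x.left is not None:      # the smallest key has no left child
        x = x.left
    return x

def tree_maximum(x):
    while x.right is not None:     # symmetric: the largest key has no right child
        x = x.right
    return x

# On the F/B/H/A/D/K example tree: minimum is A, maximum is K.
root = Node('F', Node('B', Node('A'), Node('D')), Node('H', None, Node('K')))
print(tree_minimum(root).key, tree_maximum(root).key)   # A K
```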

  38. BST Operations: Successor • For deletion, we will need a Successor() operation • What is the successor of node 3? Node 15? Node 13? • What are the general rules for finding the successor of node x? (hint: two cases)

  39. BST Operations: Successor • Two cases: • x has a right subtree: successor is the minimum node in the right subtree • x has no right subtree: successor is the first ancestor of x whose left child is also an ancestor of x • Intuition: as long as you move up and to the left (i.e., x is a right child), you’re visiting nodes smaller than x, so keep going • Predecessor: similar algorithm
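A sketch of both cases, assuming nodes shaped like the insert sketch's Node (key, left, right, parent):

```python
def tree_minimum(x):
    while x.left is not None:
        x = x.left
    return x

def tree_successor(x):
    if x.right is not None:
        return tree_minimum(x.right)     # case 1: minimum of the right subtree
    y = x.parent
    while y is not None and x is y.right:
        x, y = y, y.parent               # case 2: climb while x is a right child
    return y                             # first ancestor entered from its left
                                         # subtree; None if x held the maximum
```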

  40. BST Operations: Delete • Example: delete K or H or B from the F/B/H/A/D/K tree with C inserted (slide 29) • Deletion is a bit tricky • 3 cases: • x has no children: • Remove x • x has one child: • Splice out x • x has two children: • Swap x with its successor • Perform case 1 or 2 to delete it

  41. BST Operations: Delete • Why will case 2 (two children) always reduce to case 0 or case 1? • because when x has 2 children, its successor is the minimum of its right subtree, and that minimum has no left child, so it has at most one child • Could we swap x with the predecessor instead of the successor? • Of course
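A sketch covering all three cases, following the slides' "swap with successor" description (the successor's key, and any satellite data, is copied into x and the successor node is then spliced out); the Tree wrapper holding the root is my own addition.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.parent = key, None, None, None

class Tree:
    def __init__(self, root=None):
        self.root = root

def tree_delete(tree, z):
    """Remove node z from tree while preserving the BST property."""
    if z.left is not None and z.right is not None:
        # Case 3 (two children): find the successor -- the minimum of the
        # right subtree -- copy its key into z, and delete that node instead.
        # It has no left child, so one of the other two cases applies to it.
        s = z.right
        while s.left is not None:
            s = s.left
        z.key = s.key
        z = s
    # Cases 1 and 2: z has at most one child; splice z out.
    child = z.left if z.left is not None else z.right
    if child is not None:
        child.parent = z.parent
    if z.parent is None:
        tree.root = child                # z was the root
    elif z is z.parent.left:
        z.parent.left = child
    else:
        z.parent.right = child
```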

  42. Next Lecture • Up next: guaranteeing an O(lg n) height tree

  43. Animation • Animated Binary Tree
