Understand the intricacies of Binary Search methods and Binary Search Trees (BSTs), including sequential search, binary search, and implementing BSTs in C++. Discover efficient behaviors for BSTs and lookup complexities. Explore examples and learn about tree height to size ratio.
Binary Search and Trees Joe Meehean
Searching • Important and common problem • Given a collection, determine whether value v is a member • Common variation • given a collection of unique keys, each associated with a value • find the value v associated with key k • i.e., find the mapping
Examples • Dictionary • key => word • value => definition • Phonebook • key => name • value => phone number • Webpage • key => address • value => html files and pictures
Searching an Array • Problem: given an array of N values, determine if v is one of them • Two approaches • sequential search • binary search
Sequential Search • Look at each value in turn (iterate) • e.g., a[0], a[1], … • quit when v is found or end of array reached • worst case time: O(N) • What if a is sorted? • look at each value in turn • quit when v is found or end of array reached • OR, when current value is > v • worst case time still O(N)
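A minimal sketch of the two sequential-search variants above, assuming the values are ints stored in a std::vector. The function names sequentialSearch and sequentialSearchSorted are illustrative and do not come from the slides.

```cpp
#include <cstddef>
#include <vector>

// Unsorted input: check every element until v is found or the array ends.
bool sequentialSearch(const std::vector<int>& a, int v) {
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] == v) return true;
    }
    return false;
}

// Sorted input: also stop as soon as the current value exceeds v.
bool sequentialSearchSorted(const std::vector<int>& a, int v) {
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] == v) return true;
        if (a[i] > v) return false;  // v cannot appear later in a sorted array
    }
    return false;
}
```

The early exit helps on average, but a search for a value larger than every element still visits all N entries, so the worst case stays O(N).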
Binary Search • Array must be sorted • (a[0] <= a[1] <= a[2] … <= a[N-1]) • Algorithm • like the Clock Game on The Price Is Right
Binary Search • Array must be sorted • (a[0] <= a[1] <= a[2] … <= a[N-1]) • Algorithm • look at middle value x in array • if x == v, done: v is found • else eliminate ½ the array • if v < x, eliminate the right half • if v > x, eliminate the left half • repeat until v is found or no values remain
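The algorithm above can be sketched as a loop over a shrinking index range. This assumes a sorted std::vector of ints; binarySearch, lo, hi, and mid are names chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Returns true if v occurs in the sorted vector a.
bool binarySearch(const std::vector<int>& a, int v) {
    std::size_t lo = 0;         // first index still in play
    std::size_t hi = a.size();  // one past the last index still in play
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;  // middle of the remaining range
        if (a[mid] == v) return true;          // x == v: found
        if (v < a[mid]) hi = mid;              // eliminate the right half
        else            lo = mid + 1;          // eliminate the left half
    }
    return false;  // no remaining values
}
```

Using a half-open range [lo, hi) keeps the empty-array case and the final iteration free of special cases.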
Binary Search in C++ • Array or vector must be sorted • Data type must provide < operator • if !(a < b) && !(b < a) then a and b are treated as equal • Or supply a comparator
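One way to see both options is the standard library's std::binary_search, which accepts either a type with operator< or an explicit comparator. The Entry and ByLength types and both helper functions below are hypothetical, made up for this sketch.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record type whose operator< orders entries by name only.
struct Entry {
    std::string name;
    std::string phone;
    bool operator<(const Entry& other) const { return name < other.name; }
};

// Uses operator<; two entries match when neither is < the other.
bool containsName(const std::vector<Entry>& sorted, const std::string& name) {
    return std::binary_search(sorted.begin(), sorted.end(), Entry{name, ""});
}

// Hypothetical comparator: orders strings by length instead of alphabetically.
struct ByLength {
    bool operator()(const std::string& a, const std::string& b) const {
        return a.size() < b.size();
    }
};

// The vector must already be sorted with the same comparator.
bool containsLength(const std::vector<std::string>& sortedByLength,
                    const std::string& probe) {
    return std::binary_search(sortedByLength.begin(), sortedByLength.end(),
                              probe, ByLength());
}
```

Note that containsName ignores the phone field entirely: equality here is whatever the ordering says it is, which is the point of the !(a < b) && !(b < a) rule.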
Log2(N) • Number of times N can be divided by 2 • In Big O it is O(logN) • difference between log2 N and log N is a constant factor • Scales better than O(N) • for large N, O(logN) algorithms take far fewer steps than O(N) ones • If N = 1024, log2(N) is 10
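The "number of times N can be divided by 2" definition can be checked directly with a small counting loop. The function name halvings is an assumption for this sketch; with integer division it computes the floor of log2(N).

```cpp
// Counts how many times n can be halved (integer division) before reaching 1.
int halvings(unsigned long n) {
    int count = 0;
    while (n > 1) {
        n /= 2;
        ++count;
    }
    return count;
}
```

For N = 1024 this returns 10, and for N = 1,000,000 it returns 19, which is why a binary search over a million sorted entries needs only about 20 comparisons.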
Log2(N): Binary Search • [Figure: four successive views of the word array ant, bat, cat, elk, fox, owl, dog, rat during a search for bar] • Throws away half the entries at every compare
What if we made a special data structure that represents a binary search?
Binary Search Trees • Special kind of binary tree • Each node stores a key • sometimes an associated value • For each node n • all keys in n’s left subtree are < key at n • all keys in n’s right subtree are > key at n • if duplicate keys are allowed • keys equal to the key at n can go left XOR right (not both)
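The ordering rule applies to whole subtrees, not just to a node and its immediate children. A rough sketch of a checker that enforces this, assuming int keys, no duplicates, and a hypothetical plain Node struct (not the BinaryNode class from the slides):

```cpp
#include <cstddef>

// Hypothetical minimal node type for this sketch.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// True if every key in the subtree lies strictly between *lo and *hi.
// A NULL bound means "no bound on that side". Duplicates are rejected.
bool isBST(const Node* n, const int* lo = NULL, const int* hi = NULL) {
    if (n == NULL) return true;                        // empty subtree is fine
    if (lo != NULL && !(*lo < n->key)) return false;   // too small for this spot
    if (hi != NULL && !(n->key < *hi)) return false;   // too large for this spot
    return isBST(n->left, lo, &n->key)                 // left keys must be < n
        && isBST(n->right, &n->key, hi);               // right keys must be > n
}
```

Passing the bounds down is what catches a key such as 7 sitting somewhere inside the left subtree of 6, even when 7 is correctly placed relative to its own parent.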
Efficient behaviors for BSTs • Insert a key (and associated data) • Lookup a key (and associated data) • Remove a key (…) • Print all keys in sorted order • using an inorder traversal
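An inorder traversal visits the left subtree, then the node, then the right subtree, which for a BST yields the keys in ascending order. A minimal sketch, assuming int keys and a hypothetical Node struct, collecting keys into a vector rather than printing them:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal node type for this sketch.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Appends the keys of the subtree rooted at n to out in sorted order.
void inorder(const Node* n, std::vector<int>& out) {
    if (n == NULL) return;
    inorder(n->left, out);   // everything smaller than n
    out.push_back(n->key);   // n itself
    inorder(n->right, out);  // everything larger than n
}
```

Each node is visited exactly once, so the traversal is O(N) regardless of the tree's shape.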
Examples • [Figure: three example binary trees, each marked as a BST or not] • Yes: in-order traversal produces 2 4 5 6 9 • No: 7 is in the left subtree of 6, but 7 is not < 6
Implementing BSTs
// private inner class of BST<K>
class BinaryNode{
public:
  K key_;
  BinaryNode * left_;
  BinaryNode * right_;
  //constructors
};
Implementing BSTs
template <class K, class Compare = less<K> >
class BST{
private:
  BinaryNode* root_;
  Compare isLessThan;
public:
  BST() {root_ = NULL;}
  bool insert(const K& key);
  bool lookup(const K& key);
  // named remove because delete is a reserved word in C++
  void remove(const K& key);
};
Lookup • Key is in BST if it is in • the root • the left subtree • or the right subtree • Don’t need to look in both subtrees • just like binary search in an array
Lookup
// public driver method
// method of BST
bool lookup(const K& key){
  // private recursive helper method
  // on next slide
  return lookup(root_, key);
}
Lookup
// private method of BST
bool lookup(BinaryNode* n, const K& k){
  if( n == NULL ) return false;
  else if( isLessThan(k, n->key_) ) return lookup(n->left_, k);
  else if( isLessThan(n->key_, k) ) return lookup(n->right_, k);
  else return true;
}
Class Activity • Cases • empty (null) subtree • value found • next look left • next look right • Shout it out • lookup(4) • lookup(5) • lookup(3) • [Figure: BST rooted at 6 containing the keys 9, 4, 2, 5]
Time for Lookup • Always follows path from root down • Worst-case • goes to a leaf along longest path • proportional to tree height • Height related to size • best case tree is balanced • all non-leaf nodes have 2 children • all leaves at the same depth
Tree Height to Size • Best case tree is balanced • all non-leaf nodes have 2 children • all leaves at the same depth • height is log2N • Worst case tree is linear • all non-leaf nodes have a single child • height is N
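The two extremes can be checked with a small height function. This sketch assumes a hypothetical Node struct and counts height as the number of nodes on the longest root-to-leaf path; other texts count edges instead, which shifts every result by one.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical minimal node type for this sketch.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Number of nodes on the longest root-to-leaf path (empty tree: 0).
int height(const Node* n) {
    if (n == NULL) return 0;
    return 1 + std::max(height(n->left), height(n->right));
}
```

Under this convention a full 7-node tree has height 3, which is log2(7 + 1), while a 4-node chain has height 4, matching the log2N and N cases above.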
Log2(N): Binary Search Tree • lookup(2) • [Figure: BST rooted at 6 containing the keys 9, 4, 3, 5, 7, 15, 2, with the search path for 2 shown step by step] • Eliminates half the nodes at every compare
Lookup Complexity • Worst-case • O(height of tree) • Worst of worst • height is N • lookup is O(N) • Best worst-case • height is log2N • lookup is O(logN) • O(LogN) is waaaay better than O(N)
Insert • New values inserted as leaves • Must choose position to respect BST ordering • and to ensure we can find it with a lookup • Duplicate keys are not allowed
Insert • Traverse the tree • like a lookup • If we find a duplicate • return an error • If we end up at a null (child of a leaf) • make a new node with the key • make it the child of the leaf • Note the above two were our base cases for lookup too
Insert
// members of BST
// returns false if key is already in the tree
bool insert(const K& key){
  return insert(root_, key);
}
bool insert(BinaryNode*& n, const K& k){
  if( n == NULL ){
    n = new BinaryNode(k);
    return true;
  }else if( isLessThan(k, n->key_) ){
    return insert(n->left_, k);
  }else if( isLessThan(n->key_, k) ){
    return insert(n->right_, k);
  }else{
    // duplicate, do nothing
    return false;
  }
}
CLASS ACTIVITY • First names BST • You add your names
Time for Insert • Similar to lookup • worst-case follow path from root to leaf • O(logN) for a balanced tree • O(N) for a completely unbalanced tree
Delete Overview • Find the node n w/ key to be deleted • Different actions depending on n’s # of kids • Case 1: n has 0 kids (it’s a leaf) • set parent’s n-pointer (left or right) to null • Case 2: n has 1 kid • set parent’s n-pointer to point to n’s only kid • Case 3: n has 2 kids • replace n’s key with a key further down in the tree • delete that node
Delete Overview • What node value can replace n’s value? • new value of n must be: • > all values in left subtree • < all values in right subtree • Largest value from the left subtree • Smallest value from the right subtree • let’s choose this one (arbitrarily) • use findMin on root of right subtree
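A possible shape for findMin, sketched with a hypothetical Node struct and int keys. It returns the address of the pointer that refers to the minimum node, on the assumption that the caller wants to rewire that pointer afterwards, as the call findMin(&n->right) in the Delete Implementation slides suggests.

```cpp
#include <cstddef>

// Hypothetical minimal node type for this sketch.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Returns the address of the pointer that refers to the smallest node in
// the subtree *n. The smallest key is found by following left links until
// there are none. Precondition: *n is not NULL.
Node** findMin(Node** n) {
    while ((*n)->left != NULL) {
        n = &(*n)->left;
    }
    return n;
}
```

The minimum node can never have a left child, so removing it afterwards always falls under case 1 or case 2.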
Example: Case 1 • delete(17) • [Figure: BST rooted at 8 containing 15, 20, 18, 16, 17, with other subtrees elided as …; 17 is a leaf]
Example: Case 2 • delete(16) • [Figure: BST rooted at 8 containing 15, 20, 18, 16, 17, with other subtrees elided as …; 16 has a single child, 17]
Example: Case 3 • delete(15) • [Figure: BST rooted at 8 containing 15, 20, 18, 16, 17, with other subtrees elided as …; 16 is labeled "Smallest value in right subtree"]
Example: Case 3 • delete(15) • [Figure: the key 15 has been replaced by 16; the original 16 node is still present lower in the tree] • Case 2: 1 kid • Replace 16 with its only child 17
Delete Review • Find the node n w/ key to be deleted • Different actions depending on n’s # of kids • case 1: n has 0 kids (it’s a leaf) • set parent’s n-pointer (left or right) to null • case 2: n has 1 kid • set parent’s n-pointer to point to n’s only kid • case 3: n has 2 kids • replace n’s key with a key further down in the tree • delete that node
Delete Details • Case 1 (n is a leaf) and case 2 (n has 1 kid) both need to update the parent’s pointer • How? Pass a reference to that pointer
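A sketch of how the reference-to-pointer trick might handle cases 1 and 2 together. The slides name a helper removeNodeSimple but do not show its body, so the version below is a guess, written against a hypothetical Node struct with int keys.

```cpp
#include <cstddef>

// Hypothetical minimal node type for this sketch.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Unlinks and frees a node that has at most one child. Because n is a
// reference to the parent's pointer (or to the root pointer), assigning
// to n rewires the parent directly.
void removeNodeSimple(Node*& n) {
    Node* doomed = n;
    // Case 2: promote the only child. Case 1: both are NULL, so n becomes NULL.
    n = (n->left != NULL) ? n->left : n->right;
    delete doomed;
}
```

The caller never needs to know whether n was a left child, a right child, or the root, because the reference already names the exact pointer that has to change.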
Delete Implementation
// publicly visible method
// (named remove because delete is a reserved word in C++)
void BST<K>::remove(const K& key){
  remove(root_, key);
}
// private helper method
void BST<K>::remove(BinaryNode*& n, const K& k){
  // base case 1 (key not in tree)
  if( n == NULL ){ return; }
  ...
}
Delete Implementation
// private helper method
// (named remove because delete is a reserved word in C++)
void BST<K>::remove(BinaryNode*& n, const K& k){
  ...
  if( isLessThan(k, n->key_) ){
    remove(n->left_, k);
  }else if( isLessThan(n->key_, k) ){
    remove(n->right_, k);
  }
  ...
}
Delete Implementation
// private helper method
// (named remove because delete is a reserved word in C++)
void BST<K>::remove(BinaryNode*& n, const K& k){
  ...
  // case 3 (has two children)
  else if( n->left_ != NULL && n->right_ != NULL ){
    BinaryNode** tmp = findMin(&n->right_);
    n->key_ = (*tmp)->key_;
    // handles cases 1 & 2 for tmp
    removeNodeSimple(*tmp);
  }
  ...
}