**CSC 427: Data Structures and Algorithm Analysis**
Fall 2004

Heaps and heap sort
• complete tree, heap property
• min-heaps & max-heaps
• heap operations: insert, remove min/max
• heap implementation
• heap sort

**Tree balancing**
• as we saw last time, specialized binary tree structures & algorithms can ensure O(log N) tree height, and therefore O(log N)-cost operations
  • e.g., an AVL tree ensures height < 2 log(N) + 2
• of course, the IDEAL would be to maintain minimal height
• a complete tree is a tree in which
  • all leaves are on the same level or else on 2 adjacent levels
  • all leaves at the lowest level are as far left as possible
• a complete tree will have minimal height
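As a worked check of the minimal-height claim: a complete tree with N nodes has height ⌊log₂ N⌋, counting edges from the root to the deepest leaf. The sketch below computes this in C++; the function name `completeTreeHeight` is hypothetical, not part of the course code.

```cpp
#include <cassert>

// Height (edges on the longest root-to-leaf path) of a complete binary tree
// with n >= 1 nodes. Each level doubles the capacity of the tree, so the
// height is the number of times n can be halved before reaching 1,
// i.e. floor(log2(n)).
int completeTreeHeight(int n) {
    int height = 0;
    while (n > 1) {
        n /= 2;
        ++height;
    }
    return height;
}
```

For example, a complete tree with 1,000,000 nodes has height 19, while a degenerate, list-shaped tree holding the same nodes would have height 999,999.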

**Heaps**
• a heap is a complete binary tree in which
  • for every node, the value stored is ≥ every value stored in either subtree
  • technically, this is the definition of a max-heap, where the root is the max value in the heap
  • can also define a min-heap, where the root is the min value in the heap
• since complete, a heap has minimal height
  • can insert in O(height) = O(log N)
  • searching is O(N), so heaps are not good for general storage
• however, heaps are perfect for implementing priority queues
  • can access max value in O(1), remove max value in O(height) = O(log N)
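The C++ standard library already packages a heap as a priority queue: `std::priority_queue` (header `<queue>`) is a max-heap by default, and passing `std::greater<int>` as the comparator turns it into a min-heap. A small sketch; the helper `drainInOrder` and the aliases `MaxHeap`/`MinHeap` are illustrative names of our own, not from the course code.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Pops every element from a priority queue (taken by value, so the caller's
// copy is untouched) and returns the values in removal order.
template <class PQ>
std::vector<int> drainInOrder(PQ pq) {
    std::vector<int> out;
    while (!pq.empty()) {
        out.push_back(pq.top());  // top() is O(1): the root of the heap
        pq.pop();                 // pop() is O(log N): reheapify
    }
    return out;
}

// Max-heap (the default): largest value at the root.
using MaxHeap = std::priority_queue<int>;
// Min-heap: std::greater reverses the ordering, so the smallest value is at the root.
using MinHeap = std::priority_queue<int, std::vector<int>, std::greater<int>>;
```

Draining a max-heap built from {5, 1, 8, 3, 9} yields 9, 8, 5, 3, 1; the min-heap built from the same values yields 1, 3, 5, 8, 9.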

**Inserting into a heap**
• note: insertion maintains completeness and the heap property
  • worst case, if the largest value is added, it will have to swap all the way up to the root
  • but only nodes on the path are swapped, so O(height) = O(log N) swaps
• to insert into a heap
  • place new item in next open leaf position
  • if new value is bigger than its parent, then swap the two nodes
  • continue up toward the root, swapping with the parent, until a parent at least as big is found (or the root is reached)

see http://www.cs.oberlin.edu/classes/dragn/labs/heaps/heaps5.html
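A minimal sketch of these insertion steps on a plain `std::vector<int>` max-heap, assuming the level-by-level vector layout (root at index 0, parent of i at (i-1)/2). The function name `heapInsertCountSwaps` is hypothetical; it returns the number of swaps so the O(log N) worst case can be observed directly.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Inserts newItem into a vector-based max-heap and returns how many
// child/parent swaps were needed to restore the heap property.
int heapInsertCountSwaps(std::vector<int>& heap, int newItem) {
    heap.push_back(newItem);                       // next open leaf position
    int pos = static_cast<int>(heap.size()) - 1;
    int swaps = 0;
    while (pos > 0) {                              // stop at the root
        int parent = (pos - 1) / 2;
        if (heap[pos] <= heap[parent]) {
            break;                                 // parent at least as big: done
        }
        std::swap(heap[pos], heap[parent]);        // move new value up one level
        pos = parent;
        ++swaps;
    }
    return swaps;
}
```

Inserting 1, 2, ..., 15 in increasing order hits the worst case every time: the 15th value is the largest so far and swaps 3 times (the height of a 15-node complete tree) to reach the root. A value smaller than its parent, by contrast, stays at the leaf with 0 swaps.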

**Removing root of a heap**
• note: removing the root maintains completeness and the heap property
  • worst case, if the last value is the smallest, it will have to swap all the way down to a leaf
  • but only nodes on the path are swapped, so O(height) = O(log N) swaps
• to remove the max value (root) of a heap
  • replace the root with the last node on the bottom level (note if left or right subtree)
  • if the new root value is less than either child, swap it with the larger child
  • continue down toward the leaves, swapping with the larger child, until the value is at least as large as both children (or it reaches a leaf)

see http://www.cs.oberlin.edu/classes/dragn/labs/heaps/heaps5.html
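The removal steps can be sketched the same way on a plain `std::vector<int>` max-heap (children of i at 2i+1 and 2i+2). The function name `heapRemoveMax` and its `swapCount` output parameter are hypothetical, added only to expose how many swaps the sift-down performs.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Removes and returns the max (root) of a non-empty vector-based max-heap.
// The number of swaps performed while sifting down is stored in swapCount.
int heapRemoveMax(std::vector<int>& heap, int& swapCount) {
    assert(!heap.empty());
    int maxValue = heap[0];
    heap[0] = heap.back();                 // move the last leaf to the root
    heap.pop_back();
    int n = static_cast<int>(heap.size());
    int pos = 0;
    swapCount = 0;
    while (2 * pos + 1 < n) {              // while pos has at least a left child
        int child = 2 * pos + 1;
        if (child + 1 < n && heap[child + 1] > heap[child]) {
            ++child;                       // pick the larger of the two children
        }
        if (heap[pos] >= heap[child]) {
            break;                         // at least as large as both children
        }
        std::swap(heap[pos], heap[child]);
        pos = child;
        ++swapCount;
    }
    return maxValue;
}
```

For the heap {10, 9, 8, 7, 6, 5, 4, 1}, removing 10 moves the last leaf (1) to the root, which then swaps with 9 and then with 7, leaving {9, 7, 8, 1, 6, 5, 4}: 2 swaps, the height of the remaining 7-node tree.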

**Implementing a heap**
• a heap provides for O(log N) insertion and remove max
  • but so do AVL trees and other balanced binary search tree variants
• heaps also have a simple, vector-based implementation
  • since there are no holes in a heap, can store nodes in a vector, level-by-level
  • root is at index 0
  • last leaf is at index v.size()-1
  • for a node at index i, children are at 2*i+1 and 2*i+2, and the parent is at (i-1)/2
  • to add at next available leaf, simply push_back
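The index arithmetic above is all that is needed to navigate the tree. A short sketch with hypothetical helper names (`leftChild`, `rightChild`, `parent`, `isMaxHeap`); as a cross-check, the standard `std::is_heap` (in `<algorithm>`, C++11) tests the same max-heap property on a range.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Index arithmetic for a complete tree stored level-by-level in a vector.
int leftChild(int i)  { return 2 * i + 1; }
int rightChild(int i) { return 2 * i + 2; }
int parent(int i)     { return (i - 1) / 2; }   // only meaningful for i > 0

// True if every non-root node is <= its parent, i.e. v is a max-heap.
bool isMaxHeap(const std::vector<int>& v) {
    for (int i = 1; i < static_cast<int>(v.size()); ++i) {
        if (v[i] > v[parent(i)]) {
            return false;
        }
    }
    return true;
}
```

Checking each node only against its parent is enough, because the heap property holds for the whole tree exactly when it holds on every parent/child edge.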

**Heap class**

```cpp
#include <vector>
using namespace std;

template <class Comparable>
class Heap
{
  public:
    Heap() { }

    void push(const Comparable & newItem) { /* LATER SLIDE */ }

    void pop() { /* LATER SLIDE */ }

    Comparable top() const          // assumes the heap is non-empty
    {
        return items[0];
    }

    int size() const
    {
        return items.size();
    }

  private:
    vector<Comparable> items;       // nodes stored level-by-level

    void swapItems(int index1, int index2)
    {
        Comparable temp = items[index1];
        items[index1] = items[index2];
        items[index2] = temp;
    }
};
```

• we can define a templated Heap class to encapsulate heap operations
• it could then be used whenever a priority queue is needed

**push method**

```cpp
void push(const Comparable & newItem)
{
    items.push_back(newItem);               // new item at next available leaf
    int currentPos = items.size()-1;
    // test currentPos, not parentPos: in C++, (0-1)/2 == 0, never negative
    while (currentPos > 0) {
        int parentPos = (currentPos-1)/2;
        if (items[currentPos] > items[parentPos]) {
            swapItems(currentPos, parentPos);
            currentPos = parentPos;
        }
        else {
            break;
        }
    }
}
```

• push works by
  • adding the new item at the next available leaf (i.e., pushing onto the items vector)
  • following the path back toward the root, swapping if out of order
  • recall: position of parent node in vector is (currentPos-1)/2

**pop method**
• pop works by
  • replacing the root with the value at the last leaf (and popping from the back of items)
  • following the path down from the root, swapping with the larger child if out of order
  • recall: positions of child nodes in vector are 2*currentPos+1 and 2*currentPos+2

```cpp
void pop()                                  // assumes the heap is non-empty
{
    items[0] = items[items.size()-1];       // move last leaf to the root
    items.pop_back();
    int n = items.size();
    int currentPos = 0, childPos = 1;
    while (childPos < n) {
        if (childPos < n-1 && items[childPos] < items[childPos+1]) {
            childPos++;                     // right child is the larger one
        }
        if (items[currentPos] < items[childPos]) {
            swapItems(currentPos, childPos);
            currentPos = childPos;
            childPos = 2*currentPos + 1;
        }
        else {
            break;
        }
    }
}
```
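For reference, the Heap class, push, and pop fragments can be assembled into one compilable max-heap. This is a sketch, not the course's official file: it uses `std::swap` in place of `swapItems` and compares only with `operator<` (so `Comparable` needs nothing else), which gives the same results for types like `int`.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// The Heap class from the slides, assembled into one max-heap definition.
template <class Comparable>
class Heap {
  public:
    void push(const Comparable& newItem) {
        items.push_back(newItem);                 // next available leaf
        int currentPos = static_cast<int>(items.size()) - 1;
        while (currentPos > 0) {                  // sift up toward the root
            int parentPos = (currentPos - 1) / 2;
            if (!(items[parentPos] < items[currentPos])) break;
            std::swap(items[currentPos], items[parentPos]);
            currentPos = parentPos;
        }
    }

    void pop() {                                  // precondition: size() > 0
        items[0] = items.back();                  // last leaf becomes the root
        items.pop_back();
        int n = static_cast<int>(items.size());
        int currentPos = 0, childPos = 1;
        while (childPos < n) {                    // sift down toward the leaves
            if (childPos + 1 < n && items[childPos] < items[childPos + 1]) {
                ++childPos;                       // pick the larger child
            }
            if (!(items[currentPos] < items[childPos])) break;
            std::swap(items[currentPos], items[childPos]);
            currentPos = childPos;
            childPos = 2 * currentPos + 1;
        }
    }

    const Comparable& top() const { return items[0]; }  // precondition: size() > 0
    int size() const { return static_cast<int>(items.size()); }

  private:
    std::vector<Comparable> items;                // nodes stored level-by-level
};
```

Pushing 4, 10, 3, 5, 1 and then repeatedly reading `top()` and calling `pop()` yields 10, 5, 4, 3, 1, i.e. descending order, which is exactly the observation heap sort builds on.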

**Heap sort**
• the priority queue nature of heaps suggests an efficient sorting algorithm
  • start with the vector to be sorted
  • construct a heap out of the vector elements
  • repeatedly, remove the max element and put it back into the vector

```cpp
template <class Comparable>
void HeapSort(vector<Comparable> & items)
{
    Heap<Comparable> itemHeap;              // heap of the same element type
    for (int i = 0; i < (int)items.size(); i++) {
        itemHeap.push(items[i]);
    }
    // largest remaining value goes into the last unfilled slot
    for (int i = (int)items.size()-1; i >= 0; i--) {
        items[i] = itemHeap.top();
        itemHeap.pop();
    }
}
```

• N items in vector, each insertion can require O(log N) swaps to reheapify
  • construct heap in O(N log N)
• N items in heap, each removal can require O(log N) swaps to reheapify
  • copy back in O(N log N)
• thus, overall efficiency is O(N log N), which is as good as it gets for comparison-based sorting in the worst case!
• the version above uses an O(N) auxiliary heap, but heap sort can also be implemented so that the sorting is done in place, requiring only O(1) extra storage
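The in-place variant mentioned in the last bullet can be sketched as follows. The function names `siftDown` and `InPlaceHeapSort` are hypothetical. One deliberate difference from the slide's version: the initial heap is built bottom-up (sometimes called Floyd's method), sifting down each internal node from the last one back to the root, which takes O(N) rather than N separate pushes; the overall sort is still O(N log N).

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Sifts items[pos] down within the max-heap occupying items[0..n-1].
template <class Comparable>
void siftDown(std::vector<Comparable>& items, int pos, int n) {
    int child = 2 * pos + 1;
    while (child < n) {
        if (child + 1 < n && items[child] < items[child + 1]) {
            ++child;                               // larger of the two children
        }
        if (!(items[pos] < items[child])) return;  // heap property holds
        std::swap(items[pos], items[child]);
        pos = child;
        child = 2 * pos + 1;
    }
}

// In-place heap sort: ascending order, O(N log N) time, O(1) extra storage.
template <class Comparable>
void InPlaceHeapSort(std::vector<Comparable>& items) {
    int n = static_cast<int>(items.size());
    // Phase 1: turn the vector into a max-heap, sifting down each internal
    // node from the last one (index n/2 - 1) back to the root.
    for (int i = n / 2 - 1; i >= 0; --i) {
        siftDown(items, i, n);
    }
    // Phase 2: swap the max (root) into the last unsorted slot, shrink the
    // heap by one, and restore the heap property on the remaining prefix.
    for (int end = n - 1; end > 0; --end) {
        std::swap(items[0], items[end]);
        siftDown(items, 0, end);
    }
}
```

The sorted suffix grows from the back of the vector while the heap shrinks from the front, so no second container is needed. The standard library offers the same two phases as `std::make_heap` and `std::sort_heap` in `<algorithm>`.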

**Tuesday: TEST 2**
• SIMILAR TO TEST 1, will contain a mixture of question types
  • quick-and-dirty, factual knowledge
    • e.g., TRUE/FALSE, multiple choice
  • conceptual understanding
    • e.g., short answer, explain code
  • practical knowledge & programming skills
    • trace/analyze/modify/augment code
• cumulative, but will emphasize material since the last test
• study advice:
  • review lecture notes (if not mentioned in notes, will not be on test)
  • read text to augment conceptual understanding, see more examples
  • review quizzes and homeworks
  • review TEST 1 for question formats
  • feel free to review other sources (lots of C++/algorithms tutorials online)