CSC 336 – Algorithms and Data Structures

Heaps – Priority Queue – Array – Binary Tree?
Dr. Paige H. Meeker
Computer Science, Presbyterian College, Clinton, SC

Definition
The binary heap data structure is an array object that can be viewed as a nearly complete binary tree. Each node of the tree corresponds to an element of the array that stores the value in the node. The tree is completely filled on all levels except possibly the lowest, which is filled from the left up to a point.

Max and Min
There are two kinds of heaps: max-heaps and min-heaps.
• Max-heaps: must satisfy the property that for every node i other than the root, the value of node i is at most the value of its parent.
• Min-heaps: must satisfy the property that for every node i other than the root, the value of node i is at least the value of its parent.

Heap Property
• So, with heaps, either the minimum or the maximum element is at the root of the tree. Why not use an AVL tree and keep these elements at the far left/far right, respectively?
• Because finding them would then cost logarithmic time instead of constant time: the far-left (or far-right) node is reached only by following a path from the root down to a leaf.

So…
• Is every heap a BST?
• Is every heap an AVL tree?
• Are any heaps BSTs?
• Are any heaps AVL trees?
• Are any BSTs heaps?
• Are any AVL trees heaps?

Which are min-heaps?
[Figure: several candidate binary trees]

Which are min-heaps? (answers)
[Figure: the same candidate trees, four of them marked "wrong!"]

Which are max-heaps?
[Figure: several candidate binary trees]

Which are max-heaps? (answers)
[Figure: the same candidate trees, two of them marked "wrong!"]

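Heap height and runtime
• the height of an n-node complete tree is always $\lfloor \log_2 n \rfloor$, because a complete tree is always balanced
• this suggests that adds and removes will have O(log n) worst-case runtime (searching for an arbitrary element is still O(n), because the heap ordering does not tell us which subtree to look in)
• because of this, if we implement a priority queue using a heap, we can provide the O(log n) runtime required for the add and remove operations
• an n-node complete tree of height h satisfies $2^h \le n \le 2^{h+1} - 1$, so $h = \lfloor \log_2 n \rfloor$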
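A short derivation of the height formula, assuming the usual definition of a complete tree of height h (levels 0 through h - 1 completely full, level h holding at least one node): the node count is at least a full tree of height h - 1 plus one node, and at most a full tree of height h. Taking base-2 logarithms pins down h.

```latex
2^h \;\le\; n \;\le\; 2^{h+1} - 1 \;<\; 2^{h+1}
\quad\Longrightarrow\quad
h \;\le\; \log_2 n \;<\; h + 1
\quad\Longrightarrow\quad
h = \lfloor \log_2 n \rfloor
```

For example, a 10-node heap has height ⌊log₂ 10⌋ = 3, since 2³ = 8 ≤ 10 ≤ 15 = 2⁴ - 1.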

Adding to a heap
• when an element is added to a heap, it should be initially placed as the rightmost leaf (to maintain the completeness property)
• the heap ordering property may now be broken!
[Figure: the min-heap 10, 20, 80, 40, 60, 85, 99, 50, 700, 65 (level order), before and after 15 is added as the rightmost leaf, a child of 60]

Adding to a heap, cont'd.
• to restore the heap ordering property, the newly added element must be shifted upward ("bubbled up") until it reaches its proper place
• bubble up (aka "percolate up") by swapping with the parent
• how many bubble-ups could be necessary, at most?
[Figure: 15 bubbles up past 60 and then 20; the heap becomes 10, 15, 80, 40, 20, 85, 99, 50, 700, 65, 60 in level order]
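The add-and-bubble-up steps can be sketched in Python as follows. This is a minimal illustration, not code from the course: the function name `heap_add` is hypothetical, and the heap is assumed to be a Python list whose slot 0 is left empty, so the root is at index 1 and the parent of index i is i // 2.

```python
def heap_add(heap, value):
    """Add value to a min-heap stored in heap[1:] (heap[0] is an unused placeholder)."""
    heap.append(value)            # new rightmost leaf keeps the tree complete
    i = len(heap) - 1
    # Bubble up: swap with the parent at i // 2 while the parent is larger.
    while i > 1 and heap[i] < heap[i // 2]:
        heap[i], heap[i // 2] = heap[i // 2], heap[i]
        i //= 2

heap = [None]
for v in [10, 20, 80, 40, 60, 85, 99, 50, 700, 65]:
    heap_add(heap, v)             # added in this order, none of these needs a swap
heap_add(heap, 15)                # 15 bubbles up past 60 and 20, stopping below 10
print(heap[1:])                   # [10, 15, 80, 40, 20, 85, 99, 50, 700, 65, 60]
```

Each loop iteration moves up one level, so the number of swaps is bounded by the height of the tree.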

Adding to a max-heap
• same operations, but we must bubble up larger values to the top
[Figure: a new value bubbling up in a small max-heap holding 3, 5, 11, 16, 18]
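For a max-heap, only the comparison in the bubble-up loop changes. A minimal sketch under the same assumptions as a min-heap version (Python list, slot 0 unused, root at index 1); the function name `max_heap_add` and the input values are hypothetical.

```python
def max_heap_add(heap, value):
    """Add value to a max-heap stored in heap[1:]; heap[0] is an unused placeholder."""
    heap.append(value)                 # new rightmost leaf keeps the tree complete
    i = len(heap) - 1
    # Same bubble-up as for a min-heap, with the comparison flipped:
    # keep swapping while the new value is LARGER than its parent.
    while i > 1 and heap[i] > heap[i // 2]:
        heap[i], heap[i // 2] = heap[i // 2], heap[i]
        i //= 2

heap = [None]
for v in [16, 5, 11, 3, 18]:           # hypothetical example values
    max_heap_add(heap, v)
print(heap[1:])                        # [18, 16, 11, 3, 5]
```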

Heap practice problem
• Draw the state of the min-heap tree after adding the following elements to it: 6, 50, 11, 25, 42, 20, 104, 76, 19, 55, 88, 2

The peek operation
• peek on a min-heap is trivial; because of the heap properties, the minimum element is always the root
• peek is O(1)
• peek on a max-heap would be O(1) as well, but would return the maximum element rather than the minimum
[Figure: the min-heap 10, 20, 80, 40, 60, 85, 99, 50, 700, 65 (level order); peek returns the root, 10]

Removing from a min-heap
• min-heaps only support removal of the min element (the root)
• we must remove the root while maintaining the heap's completeness and ordering properties
• intuitively, the last leaf must disappear to keep it a heap
• initially, just swap the root with the last leaf and remove the old root (we'll fix the ordering next)
[Figure: the root 10 is replaced by the last leaf, 65]

Removing from heap, cont'd.
• we must fix the heap-ordering property; the root is out of order
• shift the root downward ("bubble down") until it's in place
• swap it with its smaller child each time
• What happens if we don't always swap with the smaller child?
[Figure: 65 bubbles down from the root, and 20 becomes the new root]
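The remove-and-bubble-down steps can be sketched in Python as follows. It is a minimal illustration under the same assumed layout (Python list, slot 0 unused, children of index i at 2i and 2i + 1); the function name `heap_remove_min` is hypothetical.

```python
def heap_remove_min(heap):
    """Remove and return the smallest value of a min-heap stored in heap[1:] (heap[0] unused)."""
    if len(heap) < 2:
        raise IndexError("remove from an empty heap")
    smallest = heap[1]
    last = heap.pop()                  # the last leaf disappears, keeping the tree complete
    if len(heap) > 1:
        heap[1] = last                 # the old last leaf takes the root's place...
        i, n = 1, len(heap) - 1
        while 2 * i <= n:              # ...and bubbles down while it has a child
            child = 2 * i              # left child
            if child + 1 <= n and heap[child + 1] < heap[child]:
                child += 1             # right child is smaller, so use it
            if heap[i] <= heap[child]:
                break                  # ordering restored
            heap[i], heap[child] = heap[child], heap[i]
            i = child
    return smallest

heap = [None, 10, 20, 80, 40, 60, 85, 99, 50, 700, 65]
smallest = heap_remove_min(heap)
print(smallest)     # 10
print(heap[1:])     # [20, 40, 80, 50, 60, 85, 99, 65, 700]
```

Picking the smaller child is what keeps the ordering valid: whichever child moves up becomes the parent of the other, so it must be the smaller of the two.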

Heap practice problem
• Assume you have the min-heap built in the previous problem, by adding these elements: 6, 50, 11, 25, 42, 20, 104, 76, 19, 55, 88, 2
• Show the state of the heap after remove() has been executed on it 3 times.

Turning any input into a heap
• we can quickly turn any complete tree of comparable elements into a heap with a buildHeap algorithm
• simply perform a "bubble down" operation on every node that is not a leaf, starting from the rightmost internal node and working back to the root
• why does this buildHeap operation work?
• how long does it take to finish? (big-Oh)
[Figure: a complete tree before and after buildHeap]
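A minimal Python sketch of buildHeap for a min-heap, assuming the array layout with slot 0 unused. In that layout the internal nodes are exactly indices 1 through n // 2, so the loop below visits the rightmost internal node first and finishes at the root. The names `bubble_down` and `build_heap` and the example input are hypothetical.

```python
def bubble_down(heap, i, n):
    """Sift heap[i] down within heap[1..n], swapping with the smaller child each time."""
    while 2 * i <= n:
        child = 2 * i
        if child + 1 <= n and heap[child + 1] < heap[child]:
            child += 1
        if heap[i] <= heap[child]:
            break
        heap[i], heap[child] = heap[child], heap[i]
        i = child

def build_heap(heap):
    """Turn heap[1:] into a min-heap in place (heap[0] is unused)."""
    n = len(heap) - 1
    # Indices n // 2 + 1 .. n are leaves; bubble down every other node,
    # from the rightmost internal node back to the root.
    for i in range(n // 2, 0, -1):
        bubble_down(heap, i, n)

heap = [None, 45, 6, 21, 18, 14, 60, 32]
build_heap(heap)
print(heap[1:])     # [6, 14, 21, 18, 45, 60, 32]
```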

Array tree implementation
• corollary: a complete binary tree can be implemented using an array (the example tree shown is not a heap)
• LeftChild(i) = 2*i
• RightChild(i) = 2*i + 1
[Figure: example tree with root Pam; Joe and Sue on the second level; Bob, Mike, Sam, and Tom on the third; Ann, Jane, and Mary on the bottom level]

 i   Item   Left child (2*i)   Right child (2*i + 1)
 1   Pam     2                  3
 2   Joe     4                  5
 3   Sue     6                  7
 4   Bob     8                  9
 5   Mike   10                 -1
 6   Sam    -1                 -1
 7   Tom    -1                 -1
 8   Ann    -1                 -1
 9   Jane   -1                 -1
10   Mary   -1                 -1
(-1 means that child does not exist)

Array binary tree - parent
• Parent(i) = ⌊i / 2⌋

 i   Item   Parent   ⌊i / 2⌋
 1   Pam    -1        0
 2   Joe     1        1
 3   Sue     1        1
 4   Bob     2        2
 5   Mike    2        2
 6   Sam     3        3
 7   Tom     3        3
 8   Ann     4        4
 9   Jane    4        4
10   Mary    5        5

Implementation of a heap
When implementing a complete binary tree, we actually can "cheat" and just use an array:
• index of root = 1 (leave 0 empty for simplicity)
• for any node n at index i,
  • index of n.left = 2i
  • index of n.right = 2i + 1
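The index arithmetic above fits in three one-line functions. This hedged Python sketch stores the Pam/Joe/Sue example tree from the array tables level by level starting at index 1; the helper names are hypothetical, and it uses None where the tables use -1 for "no such node".

```python
def left_child(i):
    return 2 * i

def right_child(i):
    return 2 * i + 1

def parent(i):
    return i // 2          # integer division = floor(i / 2); for the root this is the unused slot 0

# The example tree from the array tables, stored level by level from index 1.
tree = [None, "Pam", "Joe", "Sue", "Bob", "Mike", "Sam", "Tom", "Ann", "Jane", "Mary"]
n = len(tree) - 1          # number of nodes

def describe(i):
    """Return (node, left child, right child, parent), with None for a missing node."""
    l, r = left_child(i), right_child(i)
    return (tree[i],
            tree[l] if l <= n else None,
            tree[r] if r <= n else None,
            tree[parent(i)] if i > 1 else None)

print(describe(2))   # ('Joe', 'Bob', 'Mike', 'Pam')
print(describe(5))   # ('Mike', 'Mary', None, 'Joe')
```

No pointers are stored at all: an index comparison against n is enough to tell whether a child exists.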

Advantages of array heap
The "implicit representation" of a heap in an array makes several operations very fast:
• add a new node at the end (O(1))
• from a node, find its parent (O(1))
• swap parent and child (O(1))
• a lot of dynamic memory allocation of tree nodes is avoided
• the algorithms shown usually have elegant solutions

Hashing
Notes from Weiss, Ch. 20, and notes by Greg McCarra from Napier University: http://www.nada.kth.se/kurser/kth/2D1345/inda03/hashingReading.pdf

Introduction
• What is hashing? Why is it useful to us?
• Well, there are lots of applications out there that need to support ONLY the operations INSERT, SEARCH, and DELETE. These are known as “dictionary” operations.
• Hashing supports these operations in O(1) time on average (O(n) in the worst case) and is quite fast in practice. Let’s learn more…

Example
We have a small group of people (say, about 40 folks) who wish to join a club. If each of these people has an ID# associated with them (from 1 to 40), we could store their information in an array and access it using the ID# as the array index.

Example
Now, we have 7 of these clubs, with consecutive ID#s going up to 280. Now what?
• We COULD create a 280-element array, of which each club would use only 40 elements. (wasteful?)
• We COULD create a 40-element array for each club and calculate the index of each person using a mapping (for the seventh club, with ID#s 241 to 280: index = ID# - 240).

Example
Now, imagine that we are hosting a club on campus open to all students. We could use the PC ID# (8 digits long). How big should our array be?
THINGS TO CONSIDER:
• How many students do we expect to join?
• How can we create a key based on this number?

Hash Functions
• If we expect no more than 100 club members, we can use the last two digits of the PC ID# as our index into the table (the hash of the key). Do we see any problems with this?
• How do we get this number?
  • Take the remainder
  • (PC ID# % 100)
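In code, the division-remainder hash is a single `%`. This Python sketch is only an illustration: the function name `club_hash` and its `table_size` parameter are hypothetical, and the two IDs are the ones raised on the Collisions slide.

```python
def club_hash(pc_id, table_size=100):
    """Division-remainder hash: with table_size = 100 this keeps the last two digits."""
    return pc_id % table_size

print(club_hash(20061234))   # 34
print(club_hash(20071234))   # 34: a different student, same slot
```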

Hash Functions
• Taking the remainder is called the division-remainder technique and is an example of a uniform hash function.
• A uniform hash function is designed to distribute the keys roughly evenly into the available positions within the array (or hash table).

Collisions
• So what about students 20061234 and 20071234? They will hash to the same position in the table! What do we do?

Collisions
If no two keys can map into the same position in the hash table, we have what is known as ideal hashing. For the hash function f, each key k maps into position f(k). Then, to search for an element, we simply compute its hash function and look it up in the table.

Collisions
• Usually, ideal hashing is not possible (or at least not guaranteed). Some data is bound to hash to the same table element, in which case we have a collision.
• How do we solve this problem?

Collisions
• We can think of each table location as a “bucket” that contains several slots. Each slot is filled with one piece of data.
• This approach involves “chaining” the data, and is common when the hash table is used as disk storage. For each element of the table, a linked list (of sorts) is maintained to hold the data that map to that location. The list grows as items are entered, either unordered or kept in sorted order (for easier retrieval).
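A minimal Python sketch of chaining, assuming integer keys and the division-remainder hash. Python lists stand in for the linked lists, new items are appended (the unordered variant), and the class and method names are hypothetical.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining (key, value) pairs in each bucket."""

    def __init__(self, size=100):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        return self.buckets[key % len(self.buckets)]   # division-remainder hash

    def insert(self, key, value):
        bucket = self._bucket(key)
        for idx, (k, _) in enumerate(bucket):
            if k == key:
                bucket[idx] = (key, value)   # key already present: replace its value
                return
        bucket.append((key, value))          # unordered chain: add at the end

    def search(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        bucket = self._bucket(key)
        for idx, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[idx]
                return True
        return False

table = ChainedHashTable(100)
table.insert(20061234, "first student")
table.insert(20071234, "second student")   # collides with the first: both land in bucket 34
print(table.search(20071234))              # second student
print(len(table.buckets[34]))              # 2
```

A search only scans the one chain its key hashes to, so it stays fast as long as the chains stay short.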

Collisions
• Other solutions?
  • Linear Probing
  • Quadratic Probing
  • Designing a Good Hash Function

Linear Probing
• Have you ever been to a theatre or sports event where the tickets were numbered?
• Has someone ever sat in your seat?
• How did you resolve this problem?

Linear Probing
If the hashed location is already occupied, linear probing moves forward by 1 through the array (wrapping around to the beginning if necessary) until an open location is found.

Linear Probing
• Let’s say that we have 1000 numbered tickets to an event, but only sell 400. If we move the event to a smaller venue, we must also renumber the tickets. The hash function would work like this:
  • (ticket number) % 400
• How many folks can get the same hashed number? (Up to 3; for example, tickets 42, 442, and 842 all hash to 42.)

Linear Probing
• The idea is that even though these numbers hash to the same location, each one still needs a slot based on its hashed index. Using linear probing, the entries are placed into the next available position.

Linear Probing
• Consider inserting the data with keys 24, 42, 34, 62, 73 into a table of size 10. These entries are placed into the table at the following locations:

Linear Probing
• 24 % 10 = 4. Position is free. 24 placed into element 4
• 42 % 10 = 2. Position is free. 42 placed into element 2
• 34 % 10 = 4. Position is occupied. Try next place in the table (5). 34 placed into position 5.
• 62 % 10 = 2. Position is occupied. Try next place in the table (3). 62 placed into position 3.
• 73 % 10 = 3. Position is occupied. Try next place in the table (4). Same problem. Try (5). Then (6). 73 is placed into position 6.
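The trace above can be reproduced with a short Python sketch of linear-probing insertion. It assumes integer keys, the `key % size` hash, and None marking an empty slot; the function name is hypothetical.

```python
def linear_probe_insert(table, key):
    """Insert key into an open-addressing table (None = empty) using linear probing."""
    size = len(table)
    home = key % size
    for step in range(size):
        pos = (home + step) % size       # move forward by 1, wrapping around to the start
        if table[pos] is None:
            table[pos] = key
            return pos
    raise RuntimeError("table is full")

table = [None] * 10
placed = {k: linear_probe_insert(table, k) for k in [24, 42, 34, 62, 73]}
print(placed)   # {24: 4, 42: 2, 34: 5, 62: 3, 73: 6}
```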

Linear Probing
• How would it look if the numbers were:
  • 28, 19, 59, 68, 89?

Finding and Deleting
• Finding?
• Deleting?
  • we must be more careful. Having found the element, we can’t just remove it. Why?
  • Use lazy deletion
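Lazy deletion can be sketched in Python by marking a deleted slot with a special tombstone value instead of emptying it. This is an illustration under assumptions: `DELETED`, `lp_find`, and `lp_delete` are hypothetical names, and the table is the one produced by the linear probing example (keys 24, 42, 34, 62, 73, size 10). Insertion that reuses tombstone slots is not shown.

```python
DELETED = object()   # tombstone: "an item used to be here", so probing continues past it

def lp_find(table, key):
    """Return the slot holding key in a linear-probing table (None = never used), or -1."""
    size = len(table)
    home = key % size
    for step in range(size):
        pos = (home + step) % size
        slot = table[pos]
        if slot is None:              # a never-used slot ends the probe sequence
            return -1
        if slot is not DELETED and slot == key:
            return pos
    return -1

def lp_delete(table, key):
    """Lazy deletion: mark the slot with a tombstone instead of emptying it."""
    pos = lp_find(table, key)
    if pos == -1:
        return False
    table[pos] = DELETED
    return True

table = [None, None, 42, 62, 24, 34, 73, None, None, None]

broken = list(table)
broken[4] = None                 # naive deletion of 24: just empty the slot
print(lp_find(broken, 73))       # -1: the search stops at the hole and misses 73

lp_delete(table, 24)             # lazy deletion of 24
print(lp_find(table, 73))        # 6: probing continues past the tombstone
```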

Clustering
• Sometimes, data will cluster – this is caused when many elements hash to the same (or similar) location and linear probing has been used often. We can help with this problem by choosing our divisor carefully in our hash function and by carefully choosing our table size.

Designing a Good Hash Function
• If the divisor is even and there are more even than odd key values, the hash function will produce an excess of even values. The same is true for odd values if there is an excess of odd keys.
• However, if the divisor is odd, then either kind of excess of key values would still give a balanced distribution of odd/even results.
• Thus, the divisor should be odd. But, this is not enough.

Designing a Good Hash Function, cont'd.
• If the divisor itself is divisible by a small odd number (like 3, 5, or 7), the results are unbalanced again. Ideally, it should be a prime number. If no such prime number works for our table size (the divisor, remember?), we should use an odd number with no small factors.
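One way to follow this advice is to round the desired table size up to the next prime. The Python sketch below does that with simple trial division; the helper names `is_prime` and `choose_table_size` are hypothetical, not part of any library.

```python
def is_prime(n):
    """Trial division: n is prime if no d with d * d <= n divides it."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def choose_table_size(min_size):
    """Smallest prime >= min_size, to use as the hash divisor / table size."""
    n = max(2, min_size)
    while not is_prime(n):
        n += 1
    return n

print(choose_table_size(100))   # 101, instead of the even divisor 100
print(choose_table_size(400))   # 401, instead of the even divisor 400
```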

Problems with Linear Probing
• The majority of the problems are caused by clustering. These problems can be reduced by using quadratic probing instead.

Quadratic Probing
• Works like linear probing, but instead of checking the next position each time, it looks at the positions that are 1², 2², 3², etc. positions ahead of the hashed location.

Quadratic Probing
• Consider inserting the data with keys 24, 42, 34, 62, 73 into a table of size 10. These entries are placed into the table at the following locations:

Quadratic Probing
• 24 % 10 = 4. Position is free. 24 placed into element 4
• 42 % 10 = 2. Position is free. 42 placed into element 2
• 34 % 10 = 4. Position is occupied. Try the place 1² away in the table (5). 34 placed into position 5.
• 62 % 10 = 2. Position is occupied. Try the place 1² away in the table (3). 62 placed into position 3.
• 73 % 10 = 3. Position is occupied. Try the place 1² away in the table (4). Same problem. Try the place 2² away in the table (3 + 4 = 7). 73 is placed into position 7.
• Thus, we jumped over the existing cluster (positions 2 through 5).
• This doesn’t completely solve our problem, but it helps.
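The quadratic trace above can be reproduced with a minimal Python sketch, assuming integer keys, the `key % size` hash, and None for an empty slot; the function name is hypothetical. Only the offset differs from linear probing: the i-th probe looks i² positions past the hashed location.

```python
def quadratic_probe_insert(table, key):
    """Insert key into an open-addressing table (None = empty) using quadratic probing."""
    size = len(table)
    home = key % size
    for i in range(size):
        pos = (home + i * i) % size    # try home, home + 1², home + 2², ... (wrapping around)
        if table[pos] is None:
            table[pos] = key
            return pos
    # Unlike linear probing, these offsets need not visit every slot, so give up.
    raise RuntimeError("no free slot found on the probe sequence")

table = [None] * 10
placed = {k: quadratic_probe_insert(table, k) for k in [24, 42, 34, 62, 73]}
print(placed)   # {24: 4, 42: 2, 34: 5, 62: 3, 73: 7}
```

Only 73 ends up somewhere different from linear probing: position 7 (3 + 2²) rather than position 6.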
