Ch 13: Advanced Table Implementations

• As we saw in chapter 11, the ordered binary tree ADT offers a good compromise between
  • the rigid size and need for shifting in an array implementation, and
  • the need for sequential search in an ordered or unordered linked list
• But tree operations are only as efficient as the shape of the tree
  • the shape of a binary tree depends on the order in which elements are inserted and deleted, which is beyond our control
  • the tree performs best when it is height-balanced: with ~ log n levels for n nodes, insert/delete/retrieval is O(log n)
  • but a tree’s shape could be much worse, approaching a linear list, in which case operations deteriorate to O(n)
• So we have a vested interest in keeping our trees nicely shaped. How?
• Here, we explore ways to keep a tree balanced (or as close to height-balanced as possible)
  • these approaches are based on the idea that a tree can only become imbalanced during an insert or delete, so we enhance these operations to rotate nodes around and keep the tree height-balanced

Example: Height Balancing
• Here we see two example trees of the same values; the order in which the values were inserted dictates the tree’s shape
  • in the first tree, values were added in numeric order, resulting in a linear list
  • in the second, values were added in such a way that the tree keeps its balanced shape: they may have been inserted in the order 40, 20, 10, 30, 60, 50, 70 or 40, 20, 60, 10, 30, 50, 70
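To make the effect of insertion order concrete, here is a small Java sketch (the class and method names are illustrative, not from the text) that inserts the same seven values into an ordinary, unbalanced binary search tree in both orders and reports how many levels each produces.

```java
// Sketch only: the same values inserted into a plain (unbalanced) binary search tree
// in two different orders, showing that insertion order alone dictates the height.
class InsertionOrderDemo {
    // Hypothetical minimal node class for illustration
    static class Node {
        int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    // Standard recursive BST insert, with no balancing of any kind
    static Node insert(Node root, int value) {
        if (root == null) return new Node(value);
        if (value < root.value) root.left = insert(root.left, value);
        else root.right = insert(root.right, value);
        return root;
    }

    // Height measured in levels: an empty tree has 0 levels
    static int height(Node root) {
        if (root == null) return 0;
        return 1 + Math.max(height(root.left), height(root.right));
    }

    static Node build(int[] order) {
        Node root = null;
        for (int v : order) root = insert(root, v);
        return root;
    }

    public static void main(String[] args) {
        int[] sorted = {10, 20, 30, 40, 50, 60, 70};
        int[] balanced = {40, 20, 10, 30, 60, 50, 70};
        System.out.println("numeric order:  " + height(build(sorted)) + " levels");   // 7 levels
        System.out.println("balanced order: " + height(build(balanced)) + " levels"); // 3 levels
    }
}
```

Seven values in numeric order give 7 levels (a linear list), while either balanced order gives 3, which is about log2 of 7.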

2-3 Trees
• We start with the 2-3 tree
  • unlike a binary tree, a 2-3 tree has nodes that can contain either 1 or 2 data and can have either 2 or 3 children
  • if a non-leaf node has 1 datum, then it has 2 children
  • if a non-leaf node has 2 data, then it has 3 children
  • the relationship between the data and the subtrees is shown in the figure below
• The 2-3 tree’s insert and delete operations rotate nodes and values to make sure that the tree always has all of its leaf nodes at the same level
  • thus ensuring that the tree is always height-balanced

Properties of a 2-3 Tree
• Nodes with 2 children (1 datum) are 2-nodes
• Nodes with 3 children (2 data) are 3-nodes
• New values are always added at the leaf level
• All leaf nodes are at the same level (this ensures height balancing)
• If the leaf node being added to is a 2-node, just arrange the 2 data in the node appropriately
• If the leaf node being added to is a 3-node, it must first be split into two 2-nodes so that we can insert the new datum
The figure shows a 2-3 tree with 6 2-nodes and 5 3-nodes, for a total of 6 + 10 = 16 values
• if a 2-3 tree contains only 2-nodes, then it is equivalent to a full binary tree, with about log2(n + 1) levels
• if a 2-3 tree contains only 3-nodes, then there are about log3(n + 1) levels – why?
• so the height of a 2-3 tree ranges between about log3(n + 1) and log2(n + 1)

2-3 Tree as an ADT
• The 2-3 tree requires a 2-3 TreeNode, which will have
  • 2 data fields: firstDatum and secondDatum (or smallItem and largeItem)
  • 3 pointers to TreeNode objects: leftChild, middleChild and rightChild (called firstChild, middleChild and thirdChild in the code that follows)
  • if there is only 1 datum, it goes in firstDatum and middleChild is null
• We need either to add another variable indicating whether the node is a 2-node or a 3-node (perhaps called type), or to check whether middleChild is null
  • the second solution fails for leaf nodes, because every leaf node’s middleChild is null, so for a leaf we would have to check whether secondDatum is in use instead
  • if leftChild is also null, then the node is a leaf node; else if only middleChild is null, the node is a 2-node; otherwise it is a 3-node
• The 2-3 tree ADT requires methods to
  • search for a given value
  • traverse the entire tree (inorder, possibly also preorder and postorder)
  • insert a new value
  • delete a given value from the tree
  • create a new empty tree, destroy a tree, return whether the tree is empty, and possibly return the size and height of the tree
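A minimal sketch of the node described above, assuming the explicit type field approach. The field names follow the slide; the generic parameter, the constructors, isLeaf() and inferType() are hypothetical additions. inferType() shows the alternative null-check approach, including the fallback to secondDatum that leaf nodes need.

```java
// Sketch only: a 2-3 tree node with an explicit type field. Field names follow the
// slide; the generic parameter, constructors, isLeaf() and inferType() are hypothetical.
class Tree23Node<T extends Comparable<T>> {
    T firstDatum, secondDatum;                          // secondDatum is null in a 2-node
    Tree23Node<T> leftChild, middleChild, rightChild;   // middleChild is null in a 2-node
    int type;                                           // 2 or 3

    // A leaf 2-node holding one value
    Tree23Node(T datum) {
        firstDatum = datum;
        type = 2;
    }

    // A leaf 3-node; the two values are stored in ascending order
    Tree23Node(T a, T b) {
        if (a.compareTo(b) <= 0) { firstDatum = a; secondDatum = b; }
        else                     { firstDatum = b; secondDatum = a; }
        type = 3;
    }

    boolean isLeaf() { return leftChild == null; }

    // The alternative without a type field: middleChild is null in every leaf,
    // so a leaf has to be classified by whether secondDatum is in use instead
    int inferType() {
        if (isLeaf()) return (secondDatum == null) ? 2 : 3;
        return (middleChild == null) ? 2 : 3;
    }

    public static void main(String[] args) {
        Tree23Node<Integer> root = new Tree23Node<>(6);
        root.leftChild = new Tree23Node<>(2);
        root.rightChild = new Tree23Node<>(9, 7);       // stored as 7, 9
        System.out.println(root.inferType() + " " + root.rightChild.inferType());  // 2 3
    }
}
```

Storing type costs one int per node but keeps every later method (search, traversal, insert) to a single comparison when it needs to know the node kind.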

Search Method
• Search is more complicated than binary tree search, since we have to look at 2 values in a given node, both to see whether either one matches and to determine which subtree to follow
• value is declared as Comparable, since Object has no compareTo method

    public Tree23Node search(Tree23Node root, Comparable value) {
        if (root == null)
            return null;
        if (root.getType() == 3) {      // node is a 3-node: check all 5 cases
            int first = value.compareTo(root.getFirstDatum());
            if (first == 0) return root;
            if (first < 0) return search(root.getFirstChild(), value);
            int second = value.compareTo(root.getSecondDatum());
            if (second == 0) return root;
            if (second < 0) return search(root.getMiddleChild(), value);
            return search(root.getThirdChild(), value);
        } else {                        // node is a 2-node: check all 3 cases
            int first = value.compareTo(root.getFirstDatum());
            if (first == 0) return root;
            if (first < 0) return search(root.getFirstChild(), value);
            return search(root.getThirdChild(), value);
        }
    }

Traversal Method
• Here we consider only the inorder traversal
• Preorder and postorder are similar, and you should be able to figure them out on your own

    public void inorder(Tree23Node root) {
        if (root != null) {
            inorder(root.getFirstChild());
            System.out.println(root.getFirstDatum());
            // test the type, not middleChild: a leaf 3-node has a null middleChild
            // but still holds a secondDatum that must be printed
            if (root.getType() == 3) {
                inorder(root.getMiddleChild());
                System.out.println(root.getSecondDatum());
            }
            inorder(root.getThirdChild());
        }
    }

• For a preorder traversal, we visit the node’s data before the recursive calls; for a postorder traversal, after them
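The search and inorder methods above rely on a Tree23Node class these notes do not show. As a self-contained check, this sketch (hypothetical names, int values, an explicit type field) implements both and runs them on a small hand-built tree whose leaves are 3-nodes, which is exactly the situation where testing middleChild instead of the type would lose secondDatum.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: 2-3 tree search and inorder traversal over a hypothetical int-valued node.
class Tree23Demo {
    static class Node {
        int type;                 // 2 or 3
        int first, second;        // second is meaningful only when type == 3
        Node left, middle, right; // middle is null in a 2-node and in every leaf

        Node(int first) { this.type = 2; this.first = first; }
        Node(int first, int second) { this.type = 3; this.first = first; this.second = second; }
    }

    static Node search(Node root, int value) {
        if (root == null) return null;
        if (value == root.first) return root;
        if (value < root.first) return search(root.left, value);
        if (root.type == 3) {
            if (value == root.second) return root;
            if (value < root.second) return search(root.middle, value);
        }
        return search(root.right, value);
    }

    // Left subtree, first datum, [middle subtree, second datum], right subtree
    static void inorder(Node root, List<Integer> out) {
        if (root == null) return;
        inorder(root.left, out);
        out.add(root.first);
        if (root.type == 3) {     // not "middle != null": leaf 3-nodes have null middles
            inorder(root.middle, out);
            out.add(root.second);
        }
        inorder(root.right, out);
    }

    public static void main(String[] args) {
        // A 2-node root holding 30, with leaf 3-nodes 10/20 and 40/50 as its children
        Node root = new Node(30);
        root.left = new Node(10, 20);
        root.right = new Node(40, 50);

        List<Integer> values = new ArrayList<>();
        inorder(root, values);
        System.out.println(values);                          // [10, 20, 30, 40, 50]
        System.out.println(search(root, 40) == root.right);  // true
        System.out.println(search(root, 35));                // null
    }
}
```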

Inserting into a 2-3 Tree
• As already mentioned, inserts occur only in a leaf node; there are two possibilities, depending on whether the node being inserted into is
  • a 2-node: the new value is simply added, although this may require shifting firstDatum into secondDatum, depending on which value is greater
  • a 3-node: there is no room for the new value, since both firstDatum and secondDatum are in use, so
    • we split the node into two 2-nodes by creating an additional node
    • we move the middle of the three values up to the parent node
    • if the parent node is a 2-node, it becomes a 3-node: the new node is added to the parent as its middleChild, with the smallest and largest values positioned in the firstDatum slots of the old node and the new node
    • if the parent node was a 3-node, then we recursively split that node as well
• A split that occurs at an internal node is much the same, except that no physical insertion (new value) is done
• Splitting the root node creates 2 new nodes instead of 1, since there was no previous parent node to move the middle value into

Example Inserts
• Insert 39
• Insert 38: this requires a split, moving 39 up to its parent
  • here is how the split works: 38/39/40 must be split; 39 is moved to its parent, 38 stays in its own node, 40 is moved into a new node, and the 2 child nodes are arranged appropriately

Example Continued
• Inserting 37: no split is needed
• Inserting 36 requires a split of 36/37/38, with 37 moving up to 30/39; but this requires another split, moving 37 up to the root
• Now we insert 35 (added to the node with 36), followed by 33 (added to the node with 35/36, causing a split: 33 is now in one node, 36 in another, and 35 moves up to the parent)
• When we add 34, we have the tree shown here; what happens if we add 32?

Inserting 32
• Inserting 32 requires that the node with 33/34 be split and the middle value (33) moved up; but this in turn requires splitting 30/33/35, moving its middle value (33) up to 37/50, and this node also requires splitting
• The middle value (37) becomes the new root, adding 1 level, but the tree remains height-balanced

Insert Algorithm – Pictorially
• Pseudocode for the insert algorithm is given on pages 670-671; here we look at how the splits occur physically
1. Inserting a value into a 3-node that is the firstChild of a 2-node: create a new node and rotate the middle and largest values into the parent node and the new node, respectively, attaching the new node as the parent’s middleChild
2. Inserting a value into a 3-node that is the thirdChild of a 2-node: create a new node and rotate the smallest and middle values into the new node and the parent node, respectively, attaching the new node as the parent’s middleChild

Insert Continued
• If the value being moved up is being placed into a 3-node, just repeat the previous step recursively
• If the 3-node to be split is the root node, we have a special case:
  • we have split a lower node from a 3-node into two 2-nodes and moved one value up to the root node
  • create two new nodes, insert the middle value and the largest value into the new nodes, and redistribute the 3 children, plus the new node created by the lower split, into the firstChild and thirdChild of the root’s two children, as shown in the figure
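Putting the insert rules together, here is one possible sketch of bottom-up 2-3 insertion in Java, assuming int keys and ignoring duplicates; it is not the book’s pseudocode, and all names are hypothetical. Each recursive call returns the middle value and new node produced by a split (or null), and a split that reaches the root adds a new level, as described above. For simplicity the children sit in kids[0..n], so a 2-node uses kids[0] and kids[1] rather than firstChild and thirdChild.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: 2-3 tree insertion with bottom-up splitting (int keys, duplicates ignored).
class Tree23Insert {
    static class Node {
        int[] keys = new int[2];
        int n;                                  // 1 = 2-node, 2 = 3-node
        Node[] kids = new Node[3];              // kids[0..n] used; all null in a leaf
        Node(int key) { keys[0] = key; n = 1; }
        boolean isLeaf() { return kids[0] == null; }
    }

    // What a split hands to the parent: the middle value and the newly created node
    static class Split {
        final int up;
        final Node right;
        Split(int up, Node right) { this.up = up; this.right = right; }
    }

    Node root;

    void insert(int key) {
        if (root == null) { root = new Node(key); return; }
        Split s = insert(root, key);
        if (s != null) {                        // the root split: add a new root above it
            Node newRoot = new Node(s.up);
            newRoot.kids[0] = root;
            newRoot.kids[1] = s.right;
            root = newRoot;
        }
    }

    // Inserts below 'node'; returns a Split if 'node' itself had to be split, else null
    private Split insert(Node node, int key) {
        int i = 0;
        while (i < node.n && key > node.keys[i]) i++;
        if (i < node.n && key == node.keys[i]) return null;     // duplicate: ignore
        Node newChild = null;
        if (!node.isLeaf()) {
            Split s = insert(node.kids[i], key);
            if (s == null) return null;                        // nothing moved up
            key = s.up;                                        // middle value from below
            newChild = s.right;
        }
        // Merge the incoming value (and child, if any) into sorted temporary arrays
        int m = node.n + 1;
        int[] ks = new int[3];
        Node[] cs = new Node[4];
        for (int j = 0, src = 0; j < m; j++) ks[j] = (j == i) ? key : node.keys[src++];
        for (int j = 0, src = 0; j <= m; j++) cs[j] = (j == i + 1) ? newChild : node.kids[src++];
        if (m == 2) {                                          // was a 2-node: now a 3-node
            node.n = 2;
            System.arraycopy(ks, 0, node.keys, 0, 2);
            System.arraycopy(cs, 0, node.kids, 0, 3);
            return null;
        }
        // Was a 3-node: keep the smallest value, send the middle up, largest to a new node
        node.n = 1;
        node.keys[0] = ks[0];
        node.kids[0] = cs[0];
        node.kids[1] = cs[1];
        node.kids[2] = null;
        Node right = new Node(ks[2]);
        right.kids[0] = cs[2];
        right.kids[1] = cs[3];
        return new Split(ks[1], right);
    }

    void inorder(Node node, List<Integer> out) {
        if (node == null) return;
        for (int j = 0; j < node.n; j++) {
            inorder(node.kids[j], out);
            out.add(node.keys[j]);
        }
        inorder(node.kids[node.n], out);
    }

    // Every leaf is on the same level, so following the leftmost path gives the height
    int height() {
        int h = 0;
        for (Node x = root; x != null; x = x.kids[0]) h++;
        return h;
    }

    public static void main(String[] args) {
        Tree23Insert t = new Tree23Insert();
        for (int v : new int[] {10, 20, 30, 40, 50, 60, 70}) t.insert(v);
        List<Integer> out = new ArrayList<>();
        t.inorder(t.root, out);
        // prints [10, 20, 30, 40, 50, 60, 70], height 3, root 40
        System.out.println(out + ", height " + t.height() + ", root " + t.root.keys[0]);
    }
}
```

Inserting 10 through 70 in numeric order, the worst case for a plain binary tree, splits the root twice here, so the tree grows to only 3 levels while every leaf stays on the bottom level.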

Deleting from a 2-3 Tree
• The deletion algorithm is like deletion from a binary tree:
  • find the node containing the value to be deleted
  • find the value that comes next in the tree (its inorder successor)
  • swap the next value with the value to be deleted
  • delete the swapped value, which is now in a leaf node
  • that leaf is either a 3-node, in which case we do not have to physically delete a node (just possibly move secondDatum into firstDatum), or a 2-node, in which case we have to take care of removing the node from the tree
  • recall that all leaf nodes are on the bottom level; simply deleting a node would change this
• The insert required a “split” process; delete requires a “merge”
  • the pseudocode for deletion is given on pages 677-678
  • we won’t cover the details, since it is a very complicated process, but we examine the possible cases:
  • if the node containing the value to be removed is a 3-node, delete the value
  • if the node is a 2-node and has a sibling that is a 3-node, redistribute the values among the sibling, the parent and the current node
  • otherwise, merge the sibling with a value brought down from the parent, move up to the parent level, and perform the deletion recursively
  • if you recurse up to the root node, reset the root pointer to point at the new merged node

Delete Example
• We want to delete 70, so we swap 70 with its inorder successor (80); the top figure shows the tree after swapping 70 and 80
• Now we must delete 70 from its new position in a leaf node; however, it is in a 2-node, so we must merge a value from its parent (a 3-node) with a sibling, which leaves a 2-node at the parent level with two children

Example Continued
• Here, we delete 100, but we cannot collapse 90 into that node
• Rotating the values around also does not work, as seen below, since 80 would no longer be in its proper position with respect to the ordering property
• So instead, we redistribute the three values 60, 80 and 90 to form the new subtree; the figure shows the tree once we are done with the redistribution

Example Continued
• At this point, we delete 80 by swapping it with 90
• But how do we delete 80? It is in a leaf node, and there aren’t enough values to redistribute between 60 and 90
• First, merge 90 with 60
• Now, with a value missing, we collapse the height of the tree, bringing the root into a 3-node with 30 and attaching the subtree of 60/90 as one of the children

Analysis of 2-3 Tree
• There are two problems with the 2-3 tree
  • first, the code is very difficult, especially the delete
  • second, the 2-3 tree has a tendency to waste memory: every 2-3 tree node requires space for 2 data and 3 pointers, but may currently be storing only 1 datum and 2 pointers
  • a tree of n values could use n nodes, in which case we are only using 3 of every 5 slots, or 60% of the space set aside
• On the other hand, since the 2-3 tree is always height-balanced and contains between about log3(n + 1) and log2(n + 1) levels, all algorithms are bounded by O(log n)
  • except for traversal, which is O(n)
• So this ADT is the most efficient so far of all of our sorted or ordered list ADTs
• Can we improve? Sort of…

2-3-4 Tree
• The 2-3-4 tree is much like the 2-3 tree, except now nodes can store up to 3 data and 4 pointers (4-nodes)
• The height of a 2-3-4 tree ranges like that of a 2-3 tree, but in this case between about log4(n + 1) and log2(n + 1) levels, making it slightly more efficient (possibly)
• Space usage for a 2-3-4 tree node is now 3 data and 4 pointers
  • of which we might only be using 1 datum and 2 pointers, so we could waste as much as 4/7 of the space
• There are two advantages to the 2-3-4 tree
  • first, we reimplement the add and delete algorithms to make them somewhat simpler than those of the 2-3 tree
  • second, a 2-3-4 tree node can conceptually be thought of as a small group of binary tree nodes with a couple of special properties; this is known as a red-black tree, which we will study later

Example Tree
• Notice that this is the same set of values we previously saw in our first 2-3 tree after inserting 32
• This tree is shallower (depth of 3 instead of 4)
• Notice that there are several 2-nodes in this tree (nodes with 1 datum and 2 pointers), so this tree is less space efficient than the 2-3 tree even though it is very compact
  • there is only one 4-node, so only one node is using all of its space

2-3-4 Tree Implementation
• The 2-3-4 tree node extends the 2-3 tree node by having
  • a thirdDatum
  • a fourthChild
  • a type that can be 2, 3, or 4
• A 2-node will use only its firstChild and fourthChild
• A 3-node will use only its firstChild, secondChild and fourthChild
• The relationship between the values in a node and its subtrees is shown below
• The search method is similar to the 2-3 tree search, except that it now has additional cases, as you would expect, depending on whether the node is a 4-node
  • I’ll leave it up to you to consider how to implement the search method for the 2-3-4 tree
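One possible answer to the search exercise, as a sketch under a different layout assumption: instead of separate firstDatum/secondDatum/thirdDatum fields and one case per node type, this hypothetical node keeps its 1 to 3 values in a sorted array with a parallel child array, so a single loop covers 2-, 3- and 4-nodes. The names are illustrative, not from the text.

```java
// Sketch only: 2-3-4 tree search over a hypothetical array-based node.
class Tree234Search {
    static class Node {
        final int[] data;      // 1 to 3 values in ascending order
        final Node[] kids;     // data.length + 1 slots; all null in a leaf
        Node(int... data) {
            this.data = data;
            this.kids = new Node[data.length + 1];
        }
    }

    static Node search(Node node, int value) {
        while (node != null) {
            int i = 0;
            // Skip past every value smaller than the target
            while (i < node.data.length && value > node.data[i]) i++;
            if (i < node.data.length && value == node.data[i]) return node;  // found
            node = node.kids[i];   // the subtree between data[i-1] and data[i]
        }
        return null;
    }

    public static void main(String[] args) {
        // Root 30-50-70 (a 4-node) with children 10-15-20, 40, 60 and 80-90
        Node root = new Node(30, 50, 70);
        root.kids[0] = new Node(10, 15, 20);
        root.kids[1] = new Node(40);
        root.kids[2] = new Node(60);
        root.kids[3] = new Node(80, 90);
        System.out.println(search(root, 60) == root.kids[2]);  // true
        System.out.println(search(root, 65));                  // null
    }
}
```

A version with explicit fields would unroll that loop into the separate 2-, 3- and 4-node cases, just as the 2-3 search does for its two node types.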

2-3-4 Tree Insert: Splitting Nodes
• The main difference between the 2-3 and 2-3-4 trees is how we handle inserts and deletes
  • in the 2-3 tree, we inserted at the leaf and then worried about splitting nodes recursively back up the tree
  • in the 2-3-4 tree, we search from the root down to a leaf to find the proper position for the insert, but if we come across any 4-node during our search, we split that node immediately
• By splitting on the way down
  • we don’t have to work our way back up the tree, which makes it easier to implement
  • we more clearly separate the split mechanism from the insert mechanism: inserting is now a matter of adding to one of the three data slots, since no leaf we reach will be a 4-node (it would already have been split)

Types of Splits
• There are 6 cases for splitting a 4-node
  • the 4-node is the root (case 1); this turns out to be the simplest case
  • the 4-node is the child of a 2-node, with two subcases:
    • the 4-node is the first child (case 2)
    • the 4-node is the fourth child (case 3)
  • the 4-node is the child of a 3-node, with three subcases:
    • the 4-node is the first child (case 4)
    • the 4-node is the second child (case 5)
    • the 4-node is the fourth child (case 6)
• In all 6 cases, we have to
  • create a new node and shift the 3 values so that one is moved to the new node, one is moved to the parent, and one remains in the current node
  • then reattach the 4 children appropriately to the old node and the new node

Case By Case (1-3)
• The root split is easy: create 2 new nodes, move the 3 values (the middle value becomes the new root), and move the 3rd and 4th children to become the 1st and 4th children of the new node
• In cases 2 and 3, a new node is created and the 3 values are moved among the new node, the parent and the current node, with either the 3rd and 4th children or the 1st and 2nd children attached to the new node

Case By Case (4-6)
• As in cases 2 and 3, here a single new node is created, and the 3 values are distributed by moving one to the new node and one to the parent
• The pattern of reattaching the children is the same in cases 4 and 5: the 3rd and 4th children become the two children of the new node; in case 6, it is the 1st and 2nd children that are moved to the new node

Example
• Let’s start with a 2-3-4 tree of 1 node containing 10-30-60
• We want to insert 20, but first we split our root 4-node; now we add 20 easily
• We insert 40 into the node with 60 without having to split anything, and then we add 50 to the same node (still no splitting, since it was not a 4-node when we reached it)
• Now we add 70, but once we reach the node with 40-50-60, we have to split it, resulting in the tree shown; after that, adding 70 is easy

Example Continued
• After inserting 15 and 80, we want to insert 90, but in searching for its proper place we find 60-70-80 and need to split it (this is case 6), resulting in the tree on the left below; now we can add 90, as shown on the right
• Next, we insert 100, but first we have to split the root node, resulting in the tree on the left below; our final tree is shown on the right
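The top-down insertion can be sketched in Java as follows; names such as splitChild and show are hypothetical, values are ints, and duplicates are ignored. Any full node met on the way down is split before we descend into it, so the leaf reached at the end always has room. The six cases are not coded separately: the child lists handle the reattachment generically. Replaying the insertion order of the example (10, 30, 60, 20, 40, 50, 70, 15, 80, 90, 100) ends with 50 at the root.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: top-down 2-3-4 insertion that splits every 4-node met on the way down.
class Tree234Insert {
    static class Node {
        final List<Integer> data = new ArrayList<>();  // 1 to 3 values, ascending
        final List<Node> kids = new ArrayList<>();     // empty in a leaf, else data.size() + 1
        boolean isLeaf() { return kids.isEmpty(); }
    }

    Node root;

    // Splits the full child parent.kids[i]: its middle value moves up into the parent,
    // its largest value (and its 3rd and 4th children) move to a new right-hand node
    private static void splitChild(Node parent, int i) {
        Node child = parent.kids.get(i);
        Node right = new Node();
        right.data.add(child.data.get(2));
        int middle = child.data.get(1);
        if (!child.isLeaf()) {
            right.kids.add(child.kids.get(2));
            right.kids.add(child.kids.get(3));
            child.kids.subList(2, 4).clear();
        }
        child.data.subList(1, 3).clear();             // child keeps only its smallest value
        parent.data.add(i, middle);
        parent.kids.add(i + 1, right);
    }

    void insert(int value) {
        if (root == null) {
            root = new Node();
            root.data.add(value);
            return;
        }
        if (root.data.size() == 3) {                  // case 1: split a full root first
            Node newRoot = new Node();
            newRoot.kids.add(root);
            splitChild(newRoot, 0);
            root = newRoot;
        }
        Node node = root;
        while (true) {
            int i = 0;
            while (i < node.data.size() && value > node.data.get(i)) i++;
            if (i < node.data.size() && value == node.data.get(i)) return;  // duplicate
            if (node.isLeaf()) {                      // guaranteed not to be a 4-node
                node.data.add(i, value);
                return;
            }
            if (node.kids.get(i).data.size() == 3) {  // cases 2-6: split before descending
                splitChild(node, i);
                if (value == node.data.get(i)) return;
                if (value > node.data.get(i)) i++;
            }
            node = node.kids.get(i);
        }
    }

    // Compact text form: a node's values, then its children in parentheses
    static String show(Node n) {
        StringBuilder sb = new StringBuilder(n.data.toString());
        if (!n.isLeaf()) {
            sb.append("(");
            for (int j = 0; j < n.kids.size(); j++) {
                if (j > 0) sb.append(" ");
                sb.append(show(n.kids.get(j)));
            }
            sb.append(")");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Tree234Insert t = new Tree234Insert();
        for (int v : new int[] {10, 30, 60, 20, 40, 50, 70, 15, 80, 90, 100}) t.insert(v);
        // prints [50]([30]([10, 15, 20] [40]) [70]([60] [80, 90, 100]))
        System.out.println(show(t.root));
    }
}
```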

Deletions
• As with the 2-3 tree and the binary tree, to delete we
  • swap the value to be deleted with its inorder successor
  • delete the value from its new position, which will always be a leaf node
• We can safely remove a value from a leaf node if that node is not a 2-node
  • when searching for the node containing the value to be deleted, we merge (collapse) any 2-node we pass into a larger node
  • in this way, we can be assured that any physical removal takes place only from a 3-node or 4-node
• This is simpler than the various possible cases of the 2-3 tree, but there are many merge situations, just as there were 6 split situations
  • the cases depend on the type of the given 2-node’s parent and on an adjacent sibling to the left or right
  • we won’t be covering them here

Red-Black Trees
• There is an interesting relationship between a 2-3-4 tree node and a binary tree node:
  • if the 2-3-4 tree node is a 2-node, it is almost identical to a binary tree node (1 datum, 2 pointers); it just has extra space that is currently unused
  • if the 2-3-4 tree node is a 4-node, it can be thought of as a special case of a binary subtree:
    • secondDatum is the root of the subtree
    • firstDatum is the left child of secondDatum
    • thirdDatum is the right child of secondDatum
    • the firstChild and secondChild pointers are the left and right children of firstDatum
    • the thirdChild and fourthChild pointers are the left and right children of thirdDatum
  • in this case, firstDatum and thirdDatum really belong to the same node as secondDatum, but they can be implemented as three binary tree nodes
• To distinguish nodes that conceptually share one 2-3-4 node (as in a 4-node) from nodes that are separate (as in three 2-nodes), we label them red (part of the same node) or black (a true child)

What About 3-Nodes?
• Here, you can see the binary tree representations for 4-nodes
• For the 3-node, which of the two values is the root of the subtree?
  • if we choose firstDatum as the root, the other will be its right child
  • if we choose secondDatum as the root, the other will be its left child
• It doesn’t matter which implementation we use as long as we are consistent; that is, whenever we have a 3-node, we must use the implementation we chose (the left one or the right one below)

Example
• While our implementation is a binary tree, the binary tree node requires additional variables recording whether its left child is red or black and whether its right child is red or black
• Additionally, we need to implement new insert and delete methods that maintain the 2-3-4 tree operations for splitting and merging
• The 2-3-4 tree above is implemented as the binary tree to the right, where dotted lines denote red nodes (for instance, 37-50 are part of a 3-node and 32-33-34 are part of a 4-node) and solid lines represent true children (for instance, 30 is a true child of 37-50)
• Note: this figure in the book (p 687) is erroneous
• Searching the red-black tree is identical to searching a binary search tree
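Since searching is identical to binary search tree search, a red-black node needs only a color on top of an ordinary binary tree node. This sketch makes a layout assumption: rather than storing two flags in the parent (is the left child red? is the right child red?) as described above, each node stores the color of the link from its parent, which carries the same information. The tree in main is the one reached after adding 3 in the insertion example later in these notes; all names are illustrative.

```java
// Sketch only: a red-black node is a BST node plus one color bit.
// Assumption: each node stores the color of the link from its parent.
class RedBlackSearch {
    static final boolean RED = true, BLACK = false;

    static class Node {
        final int value;
        boolean color;      // RED: same 2-3-4 node as its parent; BLACK: a true child
        Node left, right;
        Node(int value, boolean color) { this.value = value; this.color = color; }
    }

    // Exactly the binary search tree search: colors are never consulted
    static Node search(Node root, int value) {
        Node node = root;
        while (node != null && node.value != value)
            node = (value < node.value) ? node.left : node.right;
        return node;
    }

    public static void main(String[] args) {
        // 7 (black) with black children 4 and 12; 3 is red under 4, 15 is red under 12
        Node root = new Node(7, BLACK);
        root.left = new Node(4, BLACK);
        root.left.left = new Node(3, RED);
        root.right = new Node(12, BLACK);
        root.right.right = new Node(15, RED);

        Node found = search(root, 15);
        System.out.println(found.value + " is " + (found.color == RED ? "red" : "black"));  // 15 is red
        System.out.println(search(root, 5));                                               // null
    }
}
```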

Inserting and Deleting
• To implement our 2-3-4 tree as a binary tree, the red-black insertion and deletion will in essence mimic what the 2-3-4 tree did
• To insert, follow the same steps as a binary tree insert: search from root to leaf and then insert the new value
  • however, we must maintain height balancing; how?
  • in the 2-3-4 tree, we split any 4-node on the way down the tree using one of 6 cases
  • we do the same in our red-black tree, implementing the 6 cases from the point of view of red-black nodes rather than 4-nodes
• For deletion, find the value to be deleted, swap it with its inorder successor, and delete the swapped value from its new location (a 2-3-4 leaf, which may or may not be a leaf in the red-black tree)
  • on the way down the tree, as we find 2-nodes, we collapse them into 3-nodes and 4-nodes
• Most of the splitting/collapsing is done by re-coloring nodes, but some cases require additional rotations

Split Cases 1-3
• To split a root 4-node into three 2-nodes (case 1), just change the color of the 1st and 3rd data from red to black
• When splitting the child of a 2-node (case 2 or case 3), the 2-3-4 tree creates a new node and rotates values around
• Here, moving the middle value up to the parent node merely requires recoloring the middle node from black to red and changing the 1st and 3rd data from red to black, as they are now in their own nodes

Split Cases 4 & 5
• For cases 4 and 5, moving a middle value up to its parent would move a value into a 3-node, creating a 4-node, so we have to rotate those 3 values
• That is, we can’t make M a red child of P, which is already a red child, so P and Q become red children of M, and S and L become black nodes

Split Case 6
• The last case is like cases 4 and 5; here the split occurs in a 4-node that is the fourth child of a 3-node
• When one value moves up to the parent 3-node, the middle value (Q in these figures) becomes the new root of the subtree, with the other two values (P and M) recolored red and the children (S and L) recolored black

Rotations
• In addition to the splits shown in the previous 3 slides, we may also have to rotate nodes as values are added to 2-nodes and 3-nodes
• If we add a value to a 2-node, it becomes a 3-node, but our representation of 3-nodes must be consistent
  • either firstDatum is always the root or secondDatum is
  • if we add a larger value to a 2-node and we want secondDatum to be the root, then we have to rotate the previous node with our new node
  • since our new node holds the larger of the two values, it is secondDatum and therefore should be the root, so we rotate the two nodes
• If we add a value to a 3-node, it becomes a 4-node
  • if the added value ends up below the 3-node’s red child rather than directly below its root (for example, a value greater than both when firstDatum is the root), we have to rotate the nodes around so that the middle value is now the root
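These rotations are ordinary binary tree rotations plus recoloring. A minimal sketch, with hypothetical method names and the same one-color-per-node assumption as before: rotateLeft and rotateRight restructure a chain of three nodes, fixRightRightChain handles a value added above both values of a 3-node whose firstDatum is the root (the “add 12” step of the example that follows), and splitRoot4Node is the split case 1 recoloring.

```java
// Sketch only: the rotation and recoloring steps used by red-black insertion.
// Method names are hypothetical; each node stores the color of the link from its parent.
class RedBlackRotations {
    static final boolean RED = true, BLACK = false;

    static class Node {
        final int value;
        boolean color;
        Node left, right;
        Node(int value, boolean color) { this.value = value; this.color = color; }
    }

    // h's right child x becomes the subtree root; x's old left subtree moves under h
    static Node rotateLeft(Node h) {
        Node x = h.right;
        h.right = x.left;
        x.left = h;
        return x;
    }

    // Mirror image: h's left child x becomes the subtree root
    static Node rotateRight(Node h) {
        Node x = h.left;
        h.left = x.right;
        x.right = h;
        return x;
    }

    // A 3-node stored as h (root) with a red right child, plus a new red value
    // greater than both: rotate so the middle value is the root of the 4-node
    static Node fixRightRightChain(Node h) {
        Node x = rotateLeft(h);
        x.color = h.color;   // the middle value takes over the old root's color
        h.color = RED;       // the old root is now the red left child
        return x;            // x.right was already red
    }

    // Split case 1: a root 4-node becomes three 2-nodes by recoloring alone
    static void splitRoot4Node(Node root) {
        root.left.color = BLACK;
        root.right.color = BLACK;
    }

    public static void main(String[] args) {
        // The "add 12" step: 4 (black) -> 7 (red) -> 12 (red), all down the right side
        Node h = new Node(4, BLACK);
        h.right = new Node(7, RED);
        h.right.right = new Node(12, RED);
        Node root = fixRightRightChain(h);
        System.out.println(root.value + " " + root.left.value + " " + root.right.value);  // 7 4 12
        splitRoot4Node(root);  // what adding 15 triggers next
        System.out.println(root.left.color == BLACK && root.right.color == BLACK);       // true
    }
}
```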

Example: Adding to a Red-Black Tree
• Start with a tree that has a single value, 4
• Add 7, a red child
• Add 12: this requires rotating, since 12 > 4 and 12 > 7; after rotation, both children remain red (the 3 values are a 4-node)
• Add 15: case 1 requires recoloring; the figure shows the tree after recoloring
• Add 3

Example Continued
• Add 5
• Add 14: again the 4-node needs rotating; the figure shows the tree after rotating 12-14-15
• Add 18: first recolor 12-14-15, moving 14 up into the node with 7 (case 3); now add 18 to the tree, as a red child of 15
• Add 16, which requires rotation of the new 4-node

Example Concluded
• After rotating 15-16-18, 16 is now a black node, and 18 becomes a red node
• Add 17: this is case 6, so first rotate 7, 14, 16, reattaching children appropriately (4, 12) and recoloring 7 and 16 to red
• Finally, add 17

Red-Black Tree Deletion
• First, find the node to delete
  • if it is a leaf node and a red node, delete it
  • otherwise, if the node is a black leaf node, we have to do some rotation to move a new node into the bottom level
  • otherwise the node is a non-leaf: find the node’s inorder successor (which will be a leaf), swap the successor value with the value to be deleted, delete the value (now in a leaf node), and ensure the tree is properly balanced by re-coloring and/or rotating nodes as necessary
• Here, rather than examining how to collapse nodes, we will look at the cases from an easier perspective; in each case, assume that
  • the node to be deleted is v, and v’s parent is x
  • v is a left child (since if v were x’s right child, it would not be the node being deleted), OR v is the only node in the subtree under x
  • v may have a right child, which we will call r (if it exists); r will be moved into v’s place as x’s child
  • x may have another child, which we will call y (if it exists)
(Figure labels: x, v, y, r; dotted lines here denote optional nodes)

Deletion: Case 1
(Figure labels: x, v, y, r, z; dotted lines here denote optional nodes)
• If y is black and has a red child z
  • we must rotate the nodes x, y and z
  • recall that x is the parent of the node to be deleted, whereas y and z are a child and a grandchild of x
  • rotate x, y and z so that the middle value of x, y and z becomes the root and the other two nodes are distributed appropriately
  • also make sure that r, another child of x, is attached appropriately
  • assign the following colors: the new root takes on the color that x had formerly, the two children are black, and r is made or kept black

Deletion: Case 2
(Figure labels: x, v, y, r, c1, c2)
• If y is black and both children of y are black (NOTE: null pointers are considered black)
  • here we have 1, 2 or 3 2-nodes, and what we want is to combine them into a 3-node or 4-node
  • this is done by recoloring: color r black and y red, and if x is red, color it black
  • that is, the parent becomes the root of a larger node with y as a red node within that larger node; r is kept as a separate node
• If x was red, recoloring it black completes the fix; if x was already black, the subtree under x is now one black node short, so we must move up to the parent of x and check Cases 1, 2 and 3 again
  • if Case 2 applies again, we must again check whether one of the 3 cases applies at the next parent
  • in the worst case, Case 2 continues to apply all the way up the tree!

Deletion: Case 3
(Figure labels: x, v, y, r, z)
• If y is red
  • we must perform a rotation on x, y and z (similar to Case 1)
  • in this case, y is the middle value among x, y and z
  • make y the parent, with x and z as children of y
  • also, r must be moved appropriately
  • make y black and x red; r remains black
• Case 1 or Case 2 may now apply to y and its parent, so we must move up to y and check again
  • if Case 2 does apply, it will not propagate any further up the tree (unlike Case 2 applying by itself), so we can stop after fixing y’s parent (if it is necessary to do so)

Deletion Example
• Starting from our previous tree, let’s delete 3: since 3 is a leaf and there is no node to move into its place, we are done after removing 3
• Now, let’s remove 12: while 12 is also a leaf, removing it leaves the tree unbalanced, since 7 and 12 were both black; this is Case 1 and is handled by rotation of 4-5-7
• Delete 17 just by removing it (same as with deleting 3)
• Deleting 18 causes an imbalance, handled by Case 2, recoloring 15 and 16
• But we don’t have to recolor 14, since it is the root

AVL Tree
• The final type of height-balanced tree that we explore is a binary tree that performs AVL rotations
  • AVL comes from the initials of its inventors, Adelson-Velsky and Landis
• The basic idea is that you have a normal binary tree implementation
  • but you add to each node a value storing the difference in height between the node’s left subtree and right subtree
• If, when inserting or deleting a node, the difference in heights of the two subtrees of any node becomes greater than 1 (in either direction)
  • then you need to rebalance the tree using one of the AVL rotations
  • there are a number of cases, each with its own rotation
  • once the rotation is done, update all affected nodes’ values (height differences)

Balance Factors
• Every node will have an added int, its balance factor (BF)
  • the BF is heightL - heightR
  • if the BF of a node becomes greater than 1 or less than -1, then the tree is no longer height-balanced
• Height balancing takes place by rotating the nodes around the lowest node whose BF is out of bounds
  • after an insertion, adjusting this lowest node also corrects the BF of any node higher up that was out of bounds
  • we call the node to be corrected the pivot
• In the figure, M is the pivot; to fix this problem, rotate M/P/N, which brings all BFs back within legal bounds (-1 to +1)
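To make the definition concrete, this sketch (hypothetical names) computes each BF directly from subtree heights and finds the lowest out-of-bounds node, the pivot. It is for illustration only: a real AVL tree stores and updates the BF (or height) in each node rather than recomputing heights, which here costs O(n) per node.

```java
// Sketch only: balance factors computed directly from subtree heights.
class BalanceFactors {
    static class Node {
        final int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    // Height in levels; an empty subtree has height 0
    static int height(Node n) {
        return (n == null) ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    // BF = heightL - heightR
    static int balanceFactor(Node n) {
        return height(n.left) - height(n.right);
    }

    // Returns the lowest node whose BF is outside -1..+1 (the pivot), or null.
    // Subtrees are checked first, so a deeper out-of-bounds node wins over its ancestors.
    static Node findPivot(Node n) {
        if (n == null) return null;
        Node below = findPivot(n.left);
        if (below == null) below = findPivot(n.right);
        if (below != null) return below;
        int bf = balanceFactor(n);
        return (bf > 1 || bf < -1) ? n : null;
    }

    public static void main(String[] args) {
        // 50 has children 30 and 60; 30 has left child 20; 20 has left child 10
        Node root = new Node(50);
        root.right = new Node(60);
        root.left = new Node(30);
        root.left.left = new Node(20);
        root.left.left.left = new Node(10);
        System.out.println(balanceFactor(root));       // 2
        System.out.println(balanceFactor(root.left));  // 2
        System.out.println(findPivot(root).value);     // 30
    }
}
```

Both 50 and 30 are out of bounds, but 30 is the lowest, so it is the pivot; rotating 30/20/10 also brings 50 back to a BF of 1.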

Case 1
• Insertion in the left subtree of the pivot’s left child
  • this may cause the pivot to go from a BF of -1 to a BF of 0 (no adjustment needed) or from a BF of +1 to +2
• The rotation makes the pivot’s left child the root of the subtree
  • with the pivot becoming its right child
  • and the child’s old right subtree becoming the pivot’s new left subtree
  • this gives both the child (now the parent) and the pivot a BF of 0
  • note: only the pivot and its old child have their BFs altered by the rotation
• Case 2 is the mirror image of case 1: the insertion is in the right subtree of the pivot’s right child, and the rotation moves the child up to become the pivot’s parent

Case 3
• Insertion in the right subtree of the pivot’s left child (for example, in the left subtree of that left child’s right child)
  • this may cause the pivot’s BF to go from +1 to +2
  • unlike case 1, the left child’s BF goes down instead of up, so a greater degree of rotation is needed to rebalance the pivot
• The grandchild becomes the parent, with the child and the pivot redistributed as its children, and the grandchild’s subtrees are attached to the child and the pivot; 3 BFs are modified
• Case 4 is the mirror image of case 3
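A compact sketch of AVL insertion covering cases 1 to 4. Assumptions: int keys, duplicates ignored, and each node stores its height rather than its BF (the BF is derived as heightL - heightR), which is a common variant rather than the text’s exact bookkeeping. A single rotation handles cases 1 and 2; cases 3 and 4 first rotate the child and then the pivot. Cases 5 and 6 appear to reduce to the same single rotations.

```java
// Sketch only: AVL insertion with the four classic rotation cases.
// Assumption: nodes store their height; the BF is derived as heightL - heightR.
class AvlInsert {
    static class Node {
        final int value;
        int height = 1;        // a new leaf has height 1
        Node left, right;
        Node(int value) { this.value = value; }
    }

    static int height(Node n) { return (n == null) ? 0 : n.height; }
    static int bf(Node n) { return height(n.left) - height(n.right); }
    static void update(Node n) { n.height = 1 + Math.max(height(n.left), height(n.right)); }

    // Case 1: the pivot's left child becomes the root of the subtree
    static Node rotateRight(Node pivot) {
        Node child = pivot.left;
        pivot.left = child.right;   // child's old right subtree becomes pivot's left subtree
        child.right = pivot;
        update(pivot);              // pivot is now below child, so update it first
        update(child);
        return child;
    }

    // Case 2 (mirror image): the pivot's right child becomes the root of the subtree
    static Node rotateLeft(Node pivot) {
        Node child = pivot.right;
        pivot.right = child.left;
        child.left = pivot;
        update(pivot);
        update(child);
        return child;
    }

    static Node insert(Node n, int value) {
        if (n == null) return new Node(value);
        if (value < n.value) n.left = insert(n.left, value);
        else if (value > n.value) n.right = insert(n.right, value);
        else return n;                                           // ignore duplicates
        update(n);
        int b = bf(n);
        if (b > 1) {                                             // n is a left-heavy pivot
            if (bf(n.left) < 0) n.left = rotateLeft(n.left);     // case 3: rotate child first
            return rotateRight(n);                               // case 1 (and the end of case 3)
        }
        if (b < -1) {                                            // n is a right-heavy pivot
            if (bf(n.right) > 0) n.right = rotateRight(n.right); // case 4: rotate child first
            return rotateLeft(n);                                // case 2 (and the end of case 4)
        }
        return n;
    }

    public static void main(String[] args) {
        Node root = null;
        for (int v = 1; v <= 7; v++) root = insert(root, v);  // sorted order: worst case for a plain BST
        System.out.println(root.value + " " + root.height);    // 4 3
    }
}
```

Inserting 1 through 7 in order, which produces a 7-level list in an ordinary binary tree, yields a perfectly balanced 3-level tree rooted at 4.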

Case 5
• Neither the pivot nor the pivot’s left child has a right child
  • if the insertion is into the left child’s subtree, the pivot goes from BF +1 to BF +2
  • a simple rotation and rebalance corrects this
• Case 6 is the mirror image, where the pivot has a right child and neither the pivot nor its right child has a left child

Example
• Consider adding 60 to the tree on the left; the result is shown on the right (this is case 2)
  • 20 is the pivot (its BF goes from -1 to -2, since BF = heightL - heightR and the insertion deepens its right subtree)
• The solution is to rotate the pivot’s right child up to become its parent, with the pivot becoming that child’s left child and the child’s subtrees attached appropriately
• The new tree is again height-balanced, with the old pivot’s BF = 0 and the child (now the new parent) also at BF = 0
