- 91 Views
- Uploaded on
- Presentation posted in: General

CS503: Eighth Lecture, Fall 2008 Binary Trees

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

CS503: Eighth Lecture, Fall 2008Binary Trees

Michael Barnathan

- General idea: something that won’t take more than a couple of weeks but that you can use in a portfolio. These are just my ideas; feel free to come up with your own.
- Web crawler.
- Sockets, Trees, Recursion.
- Caveats: dead links, status codes, redirects.

- Fast file indexer based on frequent words.
- File I/O, Trees, Analysis of Algorithms.
- Compression (storing the index).
- Works well because of Zipf/power law distributions.

- Caveats: binary files, errors opening files. Make sure you open read-only.

- Trend analysis tool using regression.
- Sorting, Recursion, Analysis.
- Predictors: I can help you with these.

- AI opponent for a simple game (such as checkers or Reversi)
- Trees, Storage, Recursion, Heuristic Search, Analysis of Algorithms.

- Distributed command server (to order around a bunch of machines).
- Sockets, File I/O, Priority Queues (especially for synchronization), Analysis of Algorithms.

- Not having exams is going to require adjusting the percentages a bit.
- Assignments 40%, Project 30%, Labs 20%, Participation 10%?

- Data structures:
- Binary Trees.
- Binary Search Trees.

- Theory:
- Tree traversals.
- Preorder.
- Inorder.
- Postorder.

- Balanced and complete trees.
- Recursion on Binary Trees.

- Tree traversals.

- Arrays, Linked Lists, Stacks, and Queues are linear data structures.
- Even circularly linked lists.

- One element follows another: there is always just one “next” element.
- As we mentioned, these usually yield recurrences of the form T(n) = T(n-1) + f(n).
- What if we violate this assumption? What if a structure had two next elements?
- In a way, we started with the most restrictive structure (arrays), which we are progressively relaxing.

It hardly even makes sense to talk about a node having more than one successor in an array. What would data[2] be here? It’s not clear.

4

2

1

5

Linked lists make more sense, but what is 1->next now? We need more information.

6

3

7

- There are now two next nodes, not one.
- Clearly, we need two pointers to model them.
- So let’s rotate that list…
- Let’s call the pointers “left” and “right”.
- This structure is known as a binary tree.
- Binary: branches into two nodes.
- Higher-order trees exist too; we’ll talk about these later.

- Why we call it a tree should be obvious.

- Binary: branches into two nodes.

1

Left

Right

3

2

6

7

4

5

- Some from botany, some from genealogy.
- Root: The “highest” node in the tree; that is, the one without a parent.
- Almost all tree algorithms start at the root.

- Child/Subtree: A node one level below the current node. Traversing the left or right pointers will bring you to a node’s “left” or “right” child.
- Leaf: A node at the “bottom” of the tree; i.e. one without children (or really with two null children).
- Parent: The node one level above.
- Siblings: Nodes with the same parent.
- “Complete” tree: a tree where every node has either 0 or 2 children and all leaves are at the same level. (Basically, it’s fully “filled in”).

- Binary trees have a nice recursive definition:
- A binary tree is a value, a left binary tree, and a right binary tree.
- Thus, each individual node is itself a tree.
- Base case: the empty tree.
- Leaves’ left and right children are both empty.
- We usually represent this with nulls.

- All you must store is a reference to the root.
- You can get to the rest of the nodes by traversing the tree.
- Example: Accessing 5.
- There’s a problem.
- Anyone see it?

Root

1

3

2

6

7

4

5

- We have no way of knowing where 5 is.
- In order to find it, we need to check every node in the tree.
- So what’s the complexity of access?
- “Check every” should ring alarm bells by now.

- This is a nonlinear data structure, so we have more than one way to traverse, however.
- There are three common tree traversals:
- Preorder, inorder, and postorder.

- There are some more exotic ones, too: traversals based on pointer inversion, threaded traversal, Robson traversal…
- Since binary trees are recursive structures, tree algorithms are usually recursive. Traversals are no exception.
- Remember how we reversed the output of printTo(n) by moving the output above or below the recursive call?
- It turns out you can change the order of the traversal in the same way.

- You can do anything to a node inside of a tree traversal algorithm!
- You certainly can search for a value.
- But you can also output its value, modify the its value, insert a node there, etc.
- This generic action is simply called “visiting” the node when discussing traversals.

- If I gave you a binary tree and asked you to search for an element, how would you go about it?
- You would check the value.
- You would search the left subtree.
- You would search the right subtree.

- This is how preorder traversal works.
- We check/output/do something with the node value.
- We recurse on the left subtree.
- We recurse on the right subtree.

- We stop once we run out of nodes.
- This is also called depth-first traversal, because it first focuses on traversing down specific nodes before broadly visiting others.
- Like taking a lot of CS courses first before satisfying a core curriculum.
- Preorder traversal is a special case of depth-first search on data structures called graphs, which we will discuss soon.

BinaryTree preorder(BinaryTree node, inttargetvalue) {

if (node == null)//Base case.

return null;

else if (node.value == targetvalue)//Found it.

return node;

BinaryTree lchild = preorder(node.left);//Traverse left.

if (lchild != null)

return lchild;//Found on the left.

return preorder(node.right);//Traverse right.

}

Root

1

3

2

6

7

4

5

Order: [1 2 4 5 3 6 7]

The root is always the first node to be visited.

- We have two recursive calls in the preorder traversal: left and right.
- In preorder, we checked the node before calling either of them.
- In an inorder traversal, we check in-between the two calls.
- We dive down all the way on the left before outputting, then we visit the right.
- To use recursive stack language, we output after popping on the left but before pushing on the right.

- There is no inherent advantage to choosing one traversal over another on a regular binary tree unless you deliberately want a certain ordering.
- However, inorder traversal is important on a variation of the binary tree. More on that in just a moment.

BinaryTree inorder(BinaryTree node, inttargetvalue) {

if (node == null)//Base case.

return null;

//All we did was swap the order of these two lines.

BinaryTree lchild = inorder(node.left);//Traverse left.

if (node.value == targetvalue)//Found it.

return node;

if (lchild != null)

return lchild;//Found on the left.

return inorder(node.right);//Traverse right.

}

Root

1

3

2

6

7

4

5

Order: [4 2 5 1 6 3 7]

The root is always the middle node.

- The obvious next step: output after both recursive calls.
- This causes the algorithm to dive down to the bottom of the tree and output/visit the node when going back up.
- Similar to what we did in printTo(), actually.
- We are outputting on the pop.

BinaryTree postorder(BinaryTree node, inttargetvalue) {

if (node == null)//Base case.

return null;

BinaryTree lchild = postorder(node.left);//Traverse left.

BinaryTree rchild = postorder(node.right);//Traverse right.

if (node.value == targetvalue)//Found it.

return node;

if (lchild != null)

return lchild;//Found on the left.

//Found on the right or not at all.

return rchild;

}

Root

1

3

2

6

7

4

5

Order: [4 5 2 6 7 3 1]

The root is always the last node to be visited.

- Insertion:?
- Access:?
- Updating an element:?
- Deleting an element:?
- Search/Traversal:O(n).
- All three traversals are linear: They visit every node in sequence.
- They each just follow different sequences.
- You can search by traversing, so search is also O(n).

- How long would it take to access a node, though?
- If I knew I wanted the left child’s left child, how many pointers would I need to follow to get to it?

- To analyze worst-case access, we need to talk about tree height.
- The height of a tree is the number of vertical levels it contains, not including the root level.
- Or you can think of it as the number of times you’d have to traverse down the tree to get from the root to the lowest leaf node.
- Nodes in the tree are said to have a depth, based on how many vertical levels they are down from the root.
- The root itself has a depth of 0.
- The root’s children have a depth of 1.
- Their children have a depth of 2…
- Etc.

- The height is thus also the depth of the lowest node.

Root

1

Depth = 0

3

2

Depth = 1

6

7

4

5

Depth = 2

Height = 2

Remember, don’t count the root level.

- A tree is considered balanced (or height-balanced) if the depth of the highest and lowest leaves differs by no more than 1.
- This turns out to be an important property because it forms a lower bound on the access time of the tree and lets us find the height.
- Question: If we have n nodes in a balanced binary tree, what is the height of the tree?
- floor(log2 n)
- Note that we had 7 nodes in the previous tree, but a height of 2. The tree was full; adding an 8th node would take the height to 3.

- The time to access a node depends on the height, thus we know it is O(log n) on a balanced tree.

- Trees with only left or right pointers degenerate into linked lists.
- Which gives you another perspective on why Quicksort became quadratic with everything on one side of the pivot.
- Access on linked lists is O(n).

- Performance gets worse even as we approach this condition, so we want to keep trees balanced.

1

1

2

2

3

3

- Insertion:?
- Access:O(log n).
- Updating an element:?
- Deleting an element:?
- Search/Traversal:O(n).
- Once we know where to insert, insertion is simple.
- Just add a new leaf there: O(1).

- However, discovering where to insert is a bit trickier.
- Anywhere that a null child used to be will work.
- We don’t want to upset the balance of the tree.
- A good strategy is to traverse down the tree based on the value of each node. This creates a partitioning at each level.

void insert(BinaryTree root, BinaryTree newtree) {

//This can only happen now if the user passes in an empty tree.

if (root == null)

root = newtree;//Empty. Insert the root.

else if (newtree.value < root.value) {//Go left if <.

if (root.left == null)//Found a place to insert.

root.left = newtree;

else

insert(root.left, newtree);//Keep traversing.

}

else {//Go right if >=.

if (root.right == null)

root.right = newtree;//Found a place to insert.

else

insert(root.right, newtree);//Keep traversing.

}

}

- This is similar to a traversal, but guided by the value of the node.
- We choose left or right based on whether the node is < or >=.
- We split into one subproblem of size n/2 each time we traverse.
- What recurrence would we have for this?
- What would be the solution?

- Insertion:O(log n).
- Access:O(log n).
- Updating an element:O(1).
- Deleting an element:?
- Search/Traversal:O(n).
- If we’re already at the element we need to update, we can just change the value, thus O(1).
- Note that we can say the same for insertion, but finding a place to put the node is usually considered part of it.

- Deletion is quite complex, on the other hand.
- If there are no children, just remove the node – O(1).
- If there is one child, just replace the node with its child – O(1).
- If there are two… well, that’s the tricky case.

- If we need to delete a node with two children, we need to find a suitable node to replace it with.
- One good choice is the inorder successor of the node, which will be the leftmost leaf of the right child we’re deleting from.
- Inorder successor meaning the next node in an inorder traversal.

- So our course is clear: inorder traverse, stop at the next node, swap.

void deleteWithTwoSubtrees(BinaryTreetargetnode) {

if (targetnode == null)//Deleting a null is a no-op.

return;

//Find the inorder successor and its parent.

BinaryTree inorder_succ;

BinaryTree inorder_parent = targetnode;

for (inorder_succ = targetnode.right; inorder_succ.left != null; inorder_succ = inorder_succ.left)

inorder_parent = inorder_succ;//Keep track of the parent.

//Set the value of the parent to that of the inorder successor…

targetnode.value = inorder_succ.value;

//Delete the inorder successor (here’s why we needed the parent):

inorder_parent.left = null;

}

- Insertion:O(log n).
- Access:O(log n).
- Updating an element:O(1).
- Deleting an element:O(log n).
- Search/Traversal:O(n).
- Finding the inorder successor requires time proportional to the height of the tree. If the tree is balanced, this is O(log n).

- Insertion:O(n).
- Access:O(n).
- Updating an element:O(1).
- Deleting an element:O(1).
- Search/Traversal:O(n).
- The worst sort of unbalanced tree is just a linked list.
- The deletion algorithm would always hit the second case (only one child), so we’d never experience O(log n) behavior…
- But the insertion algorithm is not as efficient as that of a linked list.
- Unless we check for this condition explicitly, in which case we get O(1).

- Binary Search Trees (BSTs) capture the notion of “splitting into two”.
- Or, to use the Quicksort term, partitioning.
- The value of a node is the pivot.
- The left tree contains elements < the pivot.
- The right tree contains elements >= the pivot.

- They are simply binary trees that are kept sorted in the manner stated above.

- Like priority queues and sorted arrays, binary search trees are inherently sorted containers.
- This means inserting a sequence of elements and then reading them back will get them out in sorted order.
- Ah, but this time we have three ways to read them back out. All three can’t give us the same order.
- The elements of an inorder traversal are sorted in binary search trees.

- It also means that we’ll have to do some extra work to ensure that this guarantee is true.
- But will this work influence the asymptotic performance?

Root

4

6

2

5

7

1

3

Inorder traversal: [1 2 3 4 5 6 7]

< on the left, >= on the right.

- Insertion:O(log n).
- Access:O(log n).
- Updating an element:?
- Deleting an element:O(log n).
- Search:O(log n).
- Traversal:O(n).
- Search and traversal are no longer the same operation!
- Traversal is analogous to linear search: look at every element, one at a time, and try to find the target.
- Search on a BST is analogous to binary search: the data is sorted around the value of the node we’re at, so it guides us to eliminate half of the remaining elements at each step.
- Just like other unsorted containers, we have to traverse to search a standard binary tree. And like other sorted containers, a BST lets us do a binary search.

- Remember, BSTs are sorted in an inorder traversal.
- Therefore, the deletion algorithm we previously specified will preserve the ordering.

- Use the same strategy we used in binary search:
- Compare the node.
- If the target value is less than the node’s value, go left (eliminates the right subtree).
- If the target value is greater than the node’s value, go right (eliminates the left subtree).
- If it’s equal, we’ve found the target.
- If we hit a NULL, the target isn’t in the tree.

- This exhibits the same performance: O(log n).
- If the tree is balanced. In the degenerate case, we are binary searching a linked list, which is O(n).

- The algorithm I gave you for insertion was actually the BST insertion algorithm as well.
- That was one of the reasons why I chose that strategy, although it does result in a fairly balanced tree if the data distribution is uniform.

- In order to keep elements partitioned around the pivot, we need to traverse left when the new element has a value < the pivot and right when it’s >=.
- It was O(log n) before, and it still is.

- I also gave you the BST deletion algorithm.
- As the inorder traversal is in sorted order, the inorder successor is the next element after the one we’re deleting in sorted order.
- If we replace the element we’re deleting with the next element in the sequence, the sequence is still sorted.
- e.g., [ 1 3 5 8 13 ] after deleting 3 -> [ 1 5 8 13].

- It was O(log n) before, and it still is.

- Ah, here’s something different.
- Updating unsorted containers is usually a constant-time operation, while updating sorted containers usually takes longer.
- When we change the value of a node in a BST, we may be required to change the node’s position in the tree to preserve the ordering.
- This is why updating sorted containers is usually a slow operation.

- No one seems to want to deal with updating these, so most sources (including your textbook) just define it as “delete and reinsert”.
- Which fully works and is very simple to do.
- Don’t be afraid to do “quick and dirty” things if they don’t harm your performance.

- So does this harm performance?
- Insertion is O(log n).
- Deletion is O(log n).
- Unless we can update in O(1) on a BST (we can’t), then no.

- Insertion:O(log n).
- Access:O(log n).
- Updating an element:O(log n).
- Deleting an element:O(log n).
- Search:O(log n).
- Traversal:O(n).
- This is the ultimate compromise data structure.
- Arrays, Lists, Stacks, and Queues all did some things in constant time and other things in linear time.
- This does everything (except traversal, which is inherently a linear operation) in logarithmic time.
- But remember, logarithmic time isn’t much worse than constant.
- So these are pretty good data structures.

- As usual, there’s a catch…

- Every operation on a tree begins to degenerate when balance is lost.
- And in the worst case, you end up with a less efficient linked list.

- Keeping the tree balanced is thus important.
- There is one who is prophesized to bring balance to the Force, but I don’t think that includes your trees.
- So the burden falls on you, my young padawan.

- Since BSTs are the structural analogue of Quicksort, you may have an idea of what insertion sequence will produce the worst case.
- Yep, sorted or inverse-sorted, just as in Quicksort.

- Most data is not arranged like this already, and on average, BSTs stay fairly well balanced.
- But this is enough of a problem where various self-balancing structures have been invented. We will discuss these next week.

- Although I put numbers in most of my examples, any sort of data can go in these.
- Strings, Objects, Employees.

- Caveat: When using Java’s sorted containers, make sure your class implements Comparable.
- Java doesn’t give you a BinaryTree class outright, but it does give you TreeSet and TreeMap.
- TreeMap in particular is very neat; check it out.
- We’ll do some things with these in Thursday’s lab.

- The study of trees goes very deep.
- We’ve just scratched the surface.
- We’ll come back to self-balancing trees, heaps, and perhaps splay trees.

- The lesson:
- Ideas are universal. They can come from your study. They can come from outside of your study. They can come from nature. They can come from anywhere.

- Next class: Linear-time sorting, B+ trees, lab.