CS503: Eighth Lecture, Fall 2008 Binary Trees
Michael Barnathan

Presentation Transcript
Project Ideas
  • General idea: something that won’t take more than a couple of weeks but that you can use in a portfolio. These are just my ideas; feel free to come up with your own.
  • Web crawler.
    • Sockets, Trees, Recursion.
    • Caveats: dead links, status codes, redirects.
  • Fast file indexer based on frequent words.
    • File I/O, Trees, Analysis of Algorithms.
    • Compression (storing the index).
      • Works well because of Zipf/power law distributions.
    • Caveats: binary files, errors opening files. Make sure you open read-only.
  • Trend analysis tool using regression.
    • Sorting, Recursion, Analysis.
    • Predictors: I can help you with these.
  • AI opponent for a simple game (such as checkers or Reversi).
    • Trees, Storage, Recursion, Heuristic Search, Analysis of Algorithms.
  • Distributed command server (to order around a bunch of machines).
    • Sockets, File I/O, Priority Queues (especially for synchronization), Analysis of Algorithms.
Grading
  • Not having exams is going to require adjusting the percentages a bit.
  • Assignments 40%, Project 30%, Labs 20%, Participation 10%?
Here’s what we’ll be learning:
  • Data structures:
    • Binary Trees.
    • Binary Search Trees.
  • Theory:
    • Tree traversals.
      • Preorder.
      • Inorder.
      • Postorder.
    • Balanced and complete trees.
    • Recursion on Binary Trees.
Linear Structures
  • Arrays, Linked Lists, Stacks, and Queues are linear data structures.
    • Even circularly linked lists.
  • One element follows another: there is always just one “next” element.
  • As we mentioned, these usually yield recurrences of the form T(n) = T(n-1) + f(n).
  • What if we violate this assumption? What if a structure had two next elements?
    • In a way, we started with the most restrictive structure (arrays), which we are progressively relaxing.
Frankenstein’s Data Structures

It hardly even makes sense to talk about a node having more than one successor in an array. What would data[2] be here? It’s not clear.

Linked lists make more sense, but what is 1->next now? We need more information.

[Diagram: the values 1 through 7 arranged so that each element would need two “next” elements, foreshadowing the tree on the next slide.]

Binary Trees
  • There are now two next nodes, not one.
  • Clearly, we need two pointers to model them.
  • So let’s rotate that list…
  • Let’s call the pointers “left” and “right”.
  • This structure is known as a binary tree.
    • Binary: branches into two nodes.
      • Higher-order trees exist too; we’ll talk about these later.
    • Why we call it a tree should be obvious.

[Diagram: a binary tree with root 1. The root’s Left pointer leads to child 2 and its Right pointer to child 3; 2’s children are 4 and 5, and 3’s children are 6 and 7.]

Nomenclature
  • Some from botany, some from genealogy.
  • Root: The “highest” node in the tree; that is, the one without a parent.
    • Almost all tree algorithms start at the root.
  • Child/Subtree: A node one level below the current node. Traversing the left or right pointers will bring you to a node’s “left” or “right” child.
  • Leaf: A node at the “bottom” of the tree; i.e. one without children (or really with two null children).
  • Parent: The node one level above.
  • Siblings: Nodes with the same parent.
  • “Complete” tree: a tree where every node has either 0 or 2 children and all leaves are at the same level. (Basically, it’s fully “filled in”).
Recursive Definition
  • Binary trees have a nice recursive definition:
  • A binary tree is a value, a left binary tree, and a right binary tree.
  • Thus, each individual node is itself a tree.
  • Base case: the empty tree.
    • Leaves’ left and right children are both empty.
    • We usually represent this with nulls.
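Putting that definition into code, here is a minimal sketch of such a node (the value/left/right field names match the code on the following slides; the constructor is just illustrative):

class BinaryTree {
    int value;          //The data stored at this node.
    BinaryTree left;    //Left subtree; null represents the empty tree.
    BinaryTree right;   //Right subtree; null represents the empty tree.

    BinaryTree(int value) {
        this.value = value;  //A new node starts as a leaf: both children are null.
    }
}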
Node Access
  • All you must store is a reference to the root.
  • You can get to the rest of the nodes by traversing the tree.
  • Example: Accessing 5.
  • There’s a problem.
    • Anyone see it?

[Diagram: a Root reference pointing to node 1; 1’s children are 2 and 3, 2’s children are 4 and 5, and 3’s children are 6 and 7. Node 5 is 2’s right child.]

Traversal
  • We have no way of knowing where 5 is.
  • In order to find it, we need to check every node in the tree.
  • So what’s the complexity of access?
    • “Check every” should ring alarm bells by now.
  • However, this is a nonlinear data structure, so we have more than one way to traverse it.
  • There are three common tree traversals:
    • Preorder, inorder, and postorder.
  • There are some more exotic ones, too: traversals based on pointer inversion, threaded traversal, Robson traversal…
  • Since binary trees are recursive structures, tree algorithms are usually recursive. Traversals are no exception.
  • Remember how we reversed the output of printTo(n) by moving the output above or below the recursive call?
    • It turns out you can change the order of the traversal in the same way.
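For instance, here is a sketch of three print-style traversals (hypothetical helpers, not from the slides); the only difference between them is where the println sits relative to the two recursive calls:

void printPreorder(BinaryTree node) {
    if (node == null) return;           //Base case: empty tree.
    System.out.println(node.value);     //Visit before either recursive call.
    printPreorder(node.left);
    printPreorder(node.right);
}

void printInorder(BinaryTree node) {
    if (node == null) return;
    printInorder(node.left);
    System.out.println(node.value);     //Visit between the two calls.
    printInorder(node.right);
}

void printPostorder(BinaryTree node) {
    if (node == null) return;
    printPostorder(node.left);
    printPostorder(node.right);
    System.out.println(node.value);     //Visit after both calls.
}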
Traversals and “Visiting”
  • You can do anything to a node inside of a tree traversal algorithm!
  • You certainly can search for a value.
  • But you can also output its value, modify its value, insert a node there, etc.
  • This generic action is simply called “visiting” the node when discussing traversals.
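As a sketch of that idea (the Visitor interface here is my own, not from the lecture), the traversal can take the action to perform as a parameter:

interface Visitor {
    void visit(BinaryTree node);        //Any action: print, modify, count...
}

void preorderVisit(BinaryTree node, Visitor visitor) {
    if (node == null) return;           //Base case: empty tree.
    visitor.visit(node);                //"Visit" the node: perform whatever action was passed in.
    preorderVisit(node.left, visitor);
    preorderVisit(node.right, visitor);
}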
Preorder Traversal
  • If I gave you a binary tree and asked you to search for an element, how would you go about it?
    • You would check the value.
    • You would search the left subtree.
    • You would search the right subtree.
  • This is how preorder traversal works.
    • We check/output/do something with the node value.
    • We recurse on the left subtree.
    • We recurse on the right subtree.
  • We stop once we run out of nodes.
  • This is also called depth-first traversal, because it goes as deep as it can along one branch before backing up to visit the others.
    • Like taking a lot of CS courses first before satisfying a core curriculum.
    • Preorder traversal is a special case of depth-first search on data structures called graphs, which we will discuss soon.
Preorder Traversal

BinaryTree preorder(BinaryTree node, int targetvalue) {
    if (node == null)                       //Base case.
        return null;
    else if (node.value == targetvalue)     //Found it.
        return node;

    BinaryTree lchild = preorder(node.left, targetvalue);   //Traverse left.
    if (lchild != null)
        return lchild;                      //Found on the left.

    return preorder(node.right, targetvalue);                //Traverse right.
}

Preorder Traversal: Illustration

[Diagram: the example tree with root 1, children 2 and 3, and grandchildren 4, 5, 6, 7.]
Order: [1 2 4 5 3 6 7]

The root is always the first node to be visited.

Inorder Traversal
  • We have two recursive calls in the preorder traversal: left and right.
  • In preorder, we checked the node before calling either of them.
  • In an inorder traversal, we check in-between the two calls.
  • We dive down all the way on the left before outputting, then we visit the right.
    • To use recursive stack language, we output after popping on the left but before pushing on the right.
  • There is no inherent advantage to choosing one traversal over another on a regular binary tree unless you deliberately want a certain ordering.
  • However, inorder traversal is important on a variation of the binary tree. More on that in just a moment.
Inorder Traversal

BinaryTree inorder(BinaryTree node, int targetvalue) {
    if (node == null)                       //Base case.
        return null;

    //All we did was move the value check after the left recursive call.
    BinaryTree lchild = inorder(node.left, targetvalue);    //Traverse left.

    if (node.value == targetvalue)          //Found it.
        return node;
    if (lchild != null)
        return lchild;                      //Found on the left.

    return inorder(node.right, targetvalue);                 //Traverse right.
}

Inorder Traversal: Illustration

[Diagram: the example tree with root 1, children 2 and 3, and grandchildren 4, 5, 6, 7.]
Order: [4 2 5 1 6 3 7]

The root is always the middle node.

Postorder Traversal
  • The obvious next step: output after both recursive calls.
  • This causes the algorithm to dive down to the bottom of the tree and output/visit the node when going back up.
    • Similar to what we did in printTo(), actually.
    • We are outputting on the pop.
Postorder Traversal

BinaryTree postorder(BinaryTree node, int targetvalue) {
    if (node == null)                       //Base case.
        return null;

    BinaryTree lchild = postorder(node.left, targetvalue);   //Traverse left.
    BinaryTree rchild = postorder(node.right, targetvalue);  //Traverse right.

    if (node.value == targetvalue)          //Found it.
        return node;
    if (lchild != null)
        return lchild;                      //Found on the left.

    return rchild;                          //Found on the right, or not at all.
}

Postorder Traversal: Illustration

[Diagram: the example tree with root 1, children 2 and 3, and grandchildren 4, 5, 6, 7.]
Order: [4 5 2 6 7 3 1]

The root is always the last node to be visited.

CRUD: Binary Trees.
  • Insertion: ?
  • Access: ?
  • Updating an element: ?
  • Deleting an element: ?
  • Search/Traversal: O(n).
  • All three traversals are linear: They visit every node in sequence.
    • They each just follow different sequences.
    • You can search by traversing, so search is also O(n).
  • How long would it take to access a node, though?
    • If I knew I wanted the left child’s left child, how many pointers would I need to follow to get to it?
Tree Height.
  • To analyze worst-case access, we need to talk about tree height.
  • The height of a tree is the number of vertical levels it contains, not including the root level.
  • Or you can think of it as the number of times you’d have to traverse down the tree to get from the root to the lowest leaf node.
  • Nodes in the tree are said to have a depth, based on how many vertical levels they are down from the root.
    • The root itself has a depth of 0.
    • The root’s children have a depth of 1.
    • Their children have a depth of 2…
    • Etc.
  • The height is thus also the depth of the lowest node.
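A recursive sketch of that computation (the method name is mine; the convention height(null) = -1 makes a single-node tree have height 0):

int height(BinaryTree node) {
    if (node == null)
        return -1;                              //Empty tree: one level below a leaf.
    int leftHeight = height(node.left);         //Deepest level on the left.
    int rightHeight = height(node.right);       //Deepest level on the right.
    return 1 + Math.max(leftHeight, rightHeight);  //Add one level for this node.
}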
Height

[Diagram: the example tree. Node 1 (the root) is at Depth = 0, nodes 2 and 3 are at Depth = 1, and nodes 4, 5, 6, 7 are at Depth = 2. Height = 2.]

Remember, don’t count the root level.

Height Balance
  • A tree is considered balanced (or height-balanced) if the depth of the highest and lowest leaves differs by no more than 1.
  • This turns out to be an important property: a balanced tree is as shallow as a binary tree on n nodes can be, which keeps access time small and lets us compute the height directly from n.
  • Question: If we have n nodes in a balanced binary tree, what is the height of the tree?
    • floor(log2 n)
    • Note that we had 7 nodes in the previous tree, but a height of 2. The tree was full; adding an 8th node would take the height to 3.
  • The time to access a node depends on the height, thus we know it is O(log n) on a balanced tree.
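Why floor(log2 n)? A quick counting argument (a sketch of the reasoning, not from the slides): level d of a binary tree holds at most 2^d nodes, so a tree of height h holds at most 2^0 + 2^1 + … + 2^h = 2^(h+1) - 1 nodes. A balanced tree fills each level before deepening, so n nodes need a height of only floor(log2 n). For example, n = 7 gives height 2, while n = 8 forces height 3.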
Degeneracy
  • Trees with only left or right pointers degenerate into linked lists.
    • Which gives you another perspective on why Quicksort became quadratic with everything on one side of the pivot.
    • Access on linked lists is O(n).
  • Performance gets worse even as we approach this condition, so we want to keep trees balanced.

[Diagram: two degenerate trees, each a chain 1 -> 2 -> 3, one built from only left pointers and one from only right pointers.]

CRUD: Balanced Binary Trees.
  • Insertion: ?
  • Access: O(log n).
  • Updating an element: ?
  • Deleting an element: ?
  • Search/Traversal: O(n).
  • Once we know where to insert, insertion is simple.
    • Just add a new leaf there: O(1).
  • However, discovering where to insert is a bit trickier.
    • Anywhere that a null child used to be will work.
    • We don’t want to upset the balance of the tree.
    • A good strategy is to traverse down the tree based on the value of each node. This creates a partitioning at each level.
Binary Tree Insertion

void insert(BinaryTree root, BinaryTree newtree) {
    //This can only happen if the user passes in an empty tree.
    //(In Java this assignment only changes the local reference, so callers
    //usually handle the empty-tree case themselves.)
    if (root == null)
        root = newtree;                     //Empty. Insert the root.
    else if (newtree.value < root.value) {  //Go left if <.
        if (root.left == null)              //Found a place to insert.
            root.left = newtree;
        else
            insert(root.left, newtree);     //Keep traversing.
    }
    else {                                  //Go right if >=.
        if (root.right == null)
            root.right = newtree;           //Found a place to insert.
        else
            insert(root.right, newtree);    //Keep traversing.
    }
}

Insertion Analysis
  • This is similar to a traversal, but guided by the value of the node.
  • We choose left or right based on whether the node is < or >=.
  • We split into one subproblem of size n/2 each time we traverse.
    • What recurrence would we have for this?
    • What would be the solution?
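One possible answer (assuming, as above, that each comparison sends us into one half of a balanced tree): the recurrence is T(n) = T(n/2) + c. Unrolling it k times gives T(n) = T(n/2^k) + ck, and we hit the base case when 2^k = n, i.e., k = log2 n. So the solution is T(n) = O(log n), just as for binary search.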
CRUD: Balanced Binary Trees.
  • Insertion: O(log n).
  • Access: O(log n).
  • Updating an element: O(1).
  • Deleting an element: ?
  • Search/Traversal: O(n).
  • If we’re already at the element we need to update, we can just change the value, thus O(1).
    • Note that we can say the same for insertion, but finding a place to put the node is usually considered part of it.
  • Deletion is quite complex, on the other hand.
    • If there are no children, just remove the node – O(1).
    • If there is one child, just replace the node with its child – O(1).
    • If there are two… well, that’s the tricky case.
Deletion
  • If we need to delete a node with two children, we need to find a suitable node to replace it with.
  • One good choice is the inorder successor of the node: the leftmost node in the right subtree of the node we’re deleting (it has no left child, though it may have a right child).
    • Inorder successor meaning the next node in an inorder traversal.
  • So our course is clear: inorder traverse, stop at the next node, swap.
Deletion

void deleteWithTwoSubtrees(BinaryTree targetnode) {
    if (targetnode == null)                 //Deleting a null is a no-op.
        return;

    //Find the inorder successor and its parent.
    BinaryTree inorder_succ;
    BinaryTree inorder_parent = targetnode;
    for (inorder_succ = targetnode.right; inorder_succ.left != null; inorder_succ = inorder_succ.left)
        inorder_parent = inorder_succ;      //Keep track of the parent.

    //Copy the inorder successor’s value into the node we’re “deleting”…
    targetnode.value = inorder_succ.value;

    //…then splice the successor out (here’s why we needed the parent).
    //The successor has no left child, but it may have a right child to preserve.
    if (inorder_parent == targetnode)
        inorder_parent.right = inorder_succ.right;  //Successor was the immediate right child.
    else
        inorder_parent.left = inorder_succ.right;
}

CRUD: Balanced Binary Trees.
  • Insertion: O(log n).
  • Access: O(log n).
  • Updating an element: O(1).
  • Deleting an element: O(log n).
  • Search/Traversal: O(n).
  • Finding the inorder successor requires time proportional to the height of the tree. If the tree is balanced, this is O(log n).
CRUD: Unbalanced Binary Trees.
  • Insertion: O(n).
  • Access: O(n).
  • Updating an element: O(1).
  • Deleting an element: O(1).
  • Search/Traversal: O(n).
  • The worst sort of unbalanced tree is just a linked list.
  • The deletion algorithm would always hit the second case (only one child), so we’d never experience O(log n) behavior…
  • But the insertion algorithm is not as efficient as that of a linked list.
    • Unless we check for this condition explicitly, in which case we get O(1).
Binary Search Trees
  • Binary Search Trees (BSTs) capture the notion of “splitting into two”.
  • Or, to use the Quicksort term, partitioning.
    • The value of a node is the pivot.
    • The left tree contains elements < the pivot.
    • The right tree contains elements >= the pivot.
  • They are simply binary trees that are kept sorted in the manner stated above.
BSTs: What do they entail?
  • Like priority queues and sorted arrays, binary search trees are inherently sorted containers.
  • This means inserting a sequence of elements and then reading them back will get them out in sorted order.
    • Ah, but this time we have three ways to read them back out, and they can’t all give us the same order.
    • The elements come out sorted in an inorder traversal of a binary search tree (see the sketch after this list).
  • It also means that we’ll have to do some extra work to ensure that this guarantee is true.
    • But will this work influence the asymptotic performance?
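A quick sketch of that guarantee, reusing the BinaryTree constructor and printInorder sketches from earlier along with the insert() routine from the insertion slide (this builds the BST shown on the next slide; assume the statements live inside some main method):

BinaryTree root = new BinaryTree(4);        //4 becomes the root.
int[] rest = {2, 6, 1, 3, 5, 7};
for (int v : rest)
    insert(root, new BinaryTree(v));        //BST insertion: < goes left, >= goes right.

printInorder(root);                         //Prints 1 2 3 4 5 6 7: sorted order.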
A Binary Search Tree

[Diagram: a binary search tree with Root pointing to 4; 4’s children are 2 and 6, 2’s children are 1 and 3, and 6’s children are 5 and 7.]

Inorder traversal: [1 2 3 4 5 6 7]

< on the left, >= on the right.

CRUD: Binary Search Trees.
  • Insertion: O(log n).
  • Access: O(log n).
  • Updating an element: ?
  • Deleting an element: O(log n).
  • Search: O(log n).
  • Traversal: O(n).
  • Search and traversal are no longer the same operation!
    • Traversal is analogous to linear search: look at every element, one at a time, and try to find the target.
    • Search on a BST is analogous to binary search: the data is sorted around the value of the node we’re at, so it guides us to eliminate half of the remaining elements at each step.
    • Just like other unsorted containers, we have to traverse to search a standard binary tree. And like other sorted containers, a BST lets us do a binary search.
  • Remember, BSTs are sorted in an inorder traversal.
    • Therefore, the deletion algorithm we previously specified will preserve the ordering.
Access on a BST
  • Use the same strategy we used in binary search:
    • Compare the node.
    • If the target value is less than the node’s value, go left (eliminates the right subtree).
    • If the target value is greater than the node’s value, go right (eliminates the left subtree).
    • If it’s equal, we’ve found the target.
    • If we hit a NULL, the target isn’t in the tree.
  • This exhibits the same performance: O(log n).
    • If the tree is balanced. In the degenerate case, we are binary searching a linked list, which is O(n).
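A sketch of that access strategy (the method name find is mine; the comparisons mirror binary search):

BinaryTree find(BinaryTree node, int targetvalue) {
    if (node == null)
        return null;                            //Hit a null: the target isn’t in the tree.
    if (targetvalue < node.value)
        return find(node.left, targetvalue);    //Go left; the right subtree is eliminated.
    else if (targetvalue > node.value)
        return find(node.right, targetvalue);   //Go right; the left subtree is eliminated.
    else
        return node;                            //Equal: found the target.
}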
Insertion on a BST
  • The algorithm I gave you for insertion was actually the BST insertion algorithm as well.
    • That was one of the reasons I chose that strategy; it also tends to produce a fairly balanced tree if the data distribution is uniform.
  • In order to keep elements partitioned around the pivot, we need to traverse left when the new element has a value < the pivot and right when it’s >=.
  • It was O(log n) before, and it still is.
Deletion on a BST
  • I also gave you the BST deletion algorithm.
  • As the inorder traversal is in sorted order, the inorder successor is the next element after the one we’re deleting in sorted order.
  • If we replace the element we’re deleting with the next element in the sequence, the sequence is still sorted.
    • e.g., [ 1 3 5 8 13 ] after deleting 3 -> [ 1 5 8 13].
  • It was O(log n) before, and it still is.
Updating a BST
  • Ah, here’s something different.
  • Updating unsorted containers is usually a constant-time operation, while updating sorted containers usually takes longer.
  • When we change the value of a node in a BST, we may be required to change the node’s position in the tree to preserve the ordering.
    • This is why updating sorted containers is usually a slow operation.
  • No one seems to want to deal with updating these, so most sources (including your textbook) just define it as “delete and reinsert”.
    • Which fully works and is very simple to do.
    • Don’t be afraid to do “quick and dirty” things if they don’t harm your performance.
  • So does this harm performance?
    • Insertion is O(log n).
    • Deletion is O(log n).
    • Since we can’t update a BST in O(1) anyway, no: delete-and-reinsert is O(log n), which is as good as it gets (see the sketch below).
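A sketch of delete-and-reinsert (this assumes a full delete(root, value) helper covering all three deletion cases, which the lecture only sketched for two children, plus the insert() routine from earlier):

void update(BinaryTree root, int oldvalue, int newvalue) {
    delete(root, oldvalue);                     //O(log n) on a balanced BST.
    insert(root, new BinaryTree(newvalue));     //O(log n) on a balanced BST.
}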
CRUD: Binary Search Trees.
  • Insertion: O(log n).
  • Access: O(log n).
  • Updating an element: O(log n).
  • Deleting an element: O(log n).
  • Search: O(log n).
  • Traversal: O(n).
  • This is the ultimate compromise data structure.
    • Arrays, Lists, Stacks, and Queues all did some things in constant time and other things in linear time.
    • This does everything (except traversal, which is inherently a linear operation) in logarithmic time.
    • But remember, logarithmic time isn’t much worse than constant.
    • So these are pretty good data structures.
  • As usual, there’s a catch…
The Importance of Balance
  • Every operation on a tree begins to degenerate when balance is lost.
    • And in the worst case, you end up with a less efficient linked list.
  • Keeping the tree balanced is thus important.
    • There is one who is prophesied to bring balance to the Force, but I don’t think that includes your trees.
    • So the burden falls on you, my young padawan.
  • Since BSTs are the structural analogue of Quicksort, you may have an idea of what insertion sequence will produce the worst case.
    • Yep, sorted or inverse-sorted, just as in Quicksort.
  • Most data is not arranged like this already, and on average, BSTs stay fairly well balanced.
  • But this is enough of a problem that various self-balancing structures have been invented. We will discuss these next week.
A General Note
  • Although I put numbers in most of my examples, any sort of data can go in these.
    • Strings, Objects, Employees.
  • Caveat: When using Java’s sorted containers, make sure your class implements Comparable.
    • Java doesn’t give you a BinaryTree class outright, but it does give you TreeSet and TreeMap.
    • TreeMap in particular is very neat; check it out (a small example follows below).
    • We’ll do some things with these in Thursday’s lab.
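For instance, a small TreeMap sketch (the data is made up; TreeMap keeps its keys in sorted order, and String already implements Comparable):

import java.util.TreeMap;

//Inside some main method:
TreeMap<String, Integer> ages = new TreeMap<String, Integer>();
ages.put("Carol", 31);                      //Insertion order doesn’t matter…
ages.put("Alice", 27);
ages.put("Bob", 45);

System.out.println(ages.firstKey());        //"Alice": keys come back in sorted order.
System.out.println(ages);                   //{Alice=27, Bob=45, Carol=31}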
“In all my life I’ll never see / a thing so beautiful as a tree.”
  • The study of trees goes very deep.
    • We’ve just scratched the surface.
    • We’ll come back to self-balancing trees, heaps, and perhaps splay trees.
  • The lesson:
    • Ideas are universal. They can come from your study. They can come from outside of your study. They can come from nature. They can come from anywhere.
  • Next class: Linear-time sorting, B+ trees, lab.