CS503: Eighth Lecture, Fall 2008 Binary Trees

Michael Barnathan


Project Ideas

  • General idea: something that won’t take more than a couple of weeks but that you can use in a portfolio. These are just my ideas; feel free to come up with your own.

  • Web crawler.

    • Sockets, Trees, Recursion.

    • Caveats: dead links, status codes, redirects.

  • Fast file indexer based on frequent words.

    • File I/O, Trees, Analysis of Algorithms.

    • Compression (storing the index).

      • Works well because of Zipf/power law distributions.

    • Caveats: binary files, errors opening files. Make sure you open read-only.

  • Trend analysis tool using regression.

    • Sorting, Recursion, Analysis.

    • Predictors: I can help you with these.

  • AI opponent for a simple game (such as checkers or Reversi)

    • Trees, Storage, Recursion, Heuristic Search, Analysis of Algorithms.

  • Distributed command server (to order around a bunch of machines).

    • Sockets, File I/O, Priority Queues (especially for synchronization), Analysis of Algorithms.


Grading

  • Not having exams is going to require adjusting the percentages a bit.

  • Assignments 40%, Project 30%, Labs 20%, Participation 10%?


Here’s what we’ll be learning:

  • Data structures:

    • Binary Trees.

    • Binary Search Trees.

  • Theory:

    • Tree traversals.

      • Preorder.

      • Inorder.

      • Postorder.

    • Balanced and complete trees.

    • Recursion on Binary Trees.


Linear Structures

  • Arrays, Linked Lists, Stacks, and Queues are linear data structures.

    • Even circularly linked lists.

  • One element follows another: there is always just one “next” element.

  • As we mentioned, these usually yield recurrences of the form T(n) = T(n-1) + f(n).

  • What if we violate this assumption? What if a structure had two next elements?

    • In a way, we started with the most restrictive structure (arrays), which we are progressively relaxing.


Frankenstein’s Data Structures

It hardly even makes sense to talk about a node having more than one successor in an array. What would data[2] be here? It’s not clear.

[Figure: nodes 1 through 7, with nodes linked to more than one successor.]

Linked lists make more sense, but what is 1->next now? We need more information.


Binary Trees

  • There are now two next nodes, not one.

  • Clearly, we need two pointers to model them.

  • So let’s rotate that list…

  • Let’s call the pointers “left” and “right”.

  • This structure is known as a binary tree.

    • Binary: branches into two nodes.

      • Higher-order trees exist too; we’ll talk about these later.

    • Why we call it a tree should be obvious.

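[Figure: binary tree rooted at 1. The left pointer leads to 2, whose children are 4 and 5; the right pointer leads to 3, whose children are 6 and 7.]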


Nomenclature

  • Some from botany, some from genealogy.

  • Root: The “highest” node in the tree; that is, the one without a parent.

    • Almost all tree algorithms start at the root.

  • Child/Subtree: A node one level below the current node. Traversing the left or right pointers will bring you to a node’s “left” or “right” child.

  • Leaf: A node at the “bottom” of the tree; i.e. one without children (or really with two null children).

  • Parent: The node one level above.

  • Siblings: Nodes with the same parent.

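  • “Complete” tree: a tree where every node has either 0 or 2 children and all leaves are at the same level. (Basically, it’s fully “filled in”.) Many texts call this a “perfect” binary tree and reserve “complete” for a tree whose last level may be only partly filled, from the left.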


Recursive Definition

  • Binary trees have a nice recursive definition:

  • A binary tree is a value, a left binary tree, and a right binary tree.

  • Thus, each individual node is itself a tree.

  • Base case: the empty tree.

    • Leaves’ left and right children are both empty.

    • We usually represent this with nulls.
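The recursive definition maps almost directly onto a class. The following is a minimal sketch, not the course's own code: the slides never show the BinaryTree class itself, so the field names (value, left, right) are assumptions taken from the code on later slides, and isLeaf and size are hypothetical helpers added for illustration.

```java
// A minimal sketch of the recursive definition: a value, a left tree, a right tree.
// Field names are assumed from the later slides; null stands for the empty tree.
class BinaryTree {
    int value;          // The value stored at this node.
    BinaryTree left;    // Left subtree, or null if empty.
    BinaryTree right;   // Right subtree, or null if empty.

    BinaryTree(int value, BinaryTree left, BinaryTree right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }

    // Convenience constructor for a leaf: both children are empty.
    BinaryTree(int value) {
        this(value, null, null);
    }

    // A leaf is a node whose two subtrees are both empty.
    boolean isLeaf() {
        return left == null && right == null;
    }

    // Counting nodes follows the definition: base case first, then recurse.
    static int size(BinaryTree tree) {
        if (tree == null) return 0;                     // Base case: the empty tree.
        return 1 + size(tree.left) + size(tree.right);  // This node plus both subtrees.
    }
}
```

Because every node is itself a tree, size works unchanged on the root, on any subtree, and on the empty tree.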


Node Access

  • All you must store is a reference to the root.

  • You can get to the rest of the nodes by traversing the tree.

  • Example: Accessing 5.

  • There’s a problem.

    • Anyone see it?

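[Figure: the root reference points to node 1. Node 1 has children 2 and 3; node 2 has children 4 and 5; node 3 has children 6 and 7.]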


Traversal

  • We have no way of knowing where 5 is.

  • In order to find it, we need to check every node in the tree.

  • So what’s the complexity of access?

    • “Check every” should ring alarm bells by now.

  • However, this is a nonlinear data structure, so there is more than one way to traverse it.

  • There are three common tree traversals:

    • Preorder, inorder, and postorder.

  • There are some more exotic ones, too: traversals based on pointer inversion, threaded traversal, Robson traversal…

  • Since binary trees are recursive structures, tree algorithms are usually recursive. Traversals are no exception.

  • Remember how we reversed the output of printTo(n) by moving the output above or below the recursive call?

    • It turns out you can change the order of the traversal in the same way.


Traversals and “Visiting”

  • You can do anything to a node inside of a tree traversal algorithm!

  • You certainly can search for a value.

  • But you can also output its value, modify its value, insert a node there, etc.

  • This generic action is simply called “visiting” the node when discussing traversals.


Preorder Traversal

  • If I gave you a binary tree and asked you to search for an element, how would you go about it?

    • You would check the value.

    • You would search the left subtree.

    • You would search the right subtree.

  • This is how preorder traversal works.

    • We check/output/do something with the node value.

    • We recurse on the left subtree.

    • We recurse on the right subtree.

  • We stop once we run out of nodes.

  • This is a depth-first traversal (as are inorder and postorder), because it first focuses on traversing down specific nodes before broadly visiting others.

    • Like taking a lot of CS courses first before satisfying a core curriculum.

    • Preorder traversal is a special case of depth-first search on data structures called graphs, which we will discuss soon.


Preorder Traversal

BinaryTree preorder(BinaryTree node, int targetvalue) {
    if (node == null) // Base case: we ran out of nodes.
        return null;
    else if (node.value == targetvalue) // Found it.
        return node;

    BinaryTree lchild = preorder(node.left, targetvalue); // Traverse left.
    if (lchild != null)
        return lchild; // Found on the left.

    return preorder(node.right, targetvalue); // Traverse right.
}


Preorder Traversal: Illustration

[Figure: tree rooted at 1. Node 1 has children 2 and 3; node 2 has children 4 and 5; node 3 has children 6 and 7.]

Order: [1 2 4 5 3 6 7]

The root is always the first node to be visited.


Inorder Traversal

  • We have two recursive calls in the preorder traversal: left and right.

  • In preorder, we checked the node before calling either of them.

  • In an inorder traversal, we check in-between the two calls.

  • We dive down all the way on the left before outputting, then we visit the right.

    • To use recursive stack language, we output after popping on the left but before pushing on the right.

  • There is no inherent advantage to choosing one traversal over another on a regular binary tree unless you deliberately want a certain ordering.

  • However, inorder traversal is important on a variation of the binary tree. More on that in just a moment.


Inorder Traversal

BinaryTree inorder(BinaryTree node, int targetvalue) {
    if (node == null) // Base case.
        return null;

    // All we did was move the check below the left traversal.
    BinaryTree lchild = inorder(node.left, targetvalue); // Traverse left.
    if (lchild != null)
        return lchild; // Found on the left.

    if (node.value == targetvalue) // Found it.
        return node;

    return inorder(node.right, targetvalue); // Traverse right.
}


Inorder Traversal: Illustration

[Figure: tree rooted at 1. Node 1 has children 2 and 3; node 2 has children 4 and 5; node 3 has children 6 and 7.]

Order: [4 2 5 1 6 3 7]

The root is visited after its entire left subtree and before its right subtree. In this full tree, that puts it exactly in the middle.


Postorder Traversal

  • The obvious next step: output after both recursive calls.

  • This causes the algorithm to dive down to the bottom of the tree and output/visit the node when going back up.

    • Similar to what we did in printTo(), actually.

    • We are outputting on the pop.


Postorder Traversal

BinaryTree postorder(BinaryTree node, int targetvalue) {
    if (node == null) // Base case.
        return null;

    BinaryTree lchild = postorder(node.left, targetvalue); // Traverse left.
    if (lchild != null)
        return lchild; // Found on the left.

    BinaryTree rchild = postorder(node.right, targetvalue); // Traverse right.
    if (rchild != null)
        return rchild; // Found on the right.

    if (node.value == targetvalue) // Check this node last: found it.
        return node;

    return null; // Not in this subtree at all.
}


Postorder Traversal: Illustration

[Figure: tree rooted at 1. Node 1 has children 2 and 3; node 2 has children 4 and 5; node 3 has children 6 and 7.]

Order: [4 5 2 6 7 3 1]

The root is always the last node to be visited.
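The three orders differ only in where the “visit” sits relative to the two recursive calls. The sketch below makes that concrete by collecting visited values into a list instead of searching. The class name TraversalSketch, the nested Node type, and sampleTree are all hypothetical names introduced for this example; the tree is the seven-node tree from the illustrations.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: "visiting" a node here means appending its value to a list.
class TraversalSketch {
    static class Node {
        int value;
        Node left, right;
        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    static void preorder(Node n, List<Integer> out) {
        if (n == null) return;
        out.add(n.value);          // Visit before both recursive calls.
        preorder(n.left, out);
        preorder(n.right, out);
    }

    static void inorder(Node n, List<Integer> out) {
        if (n == null) return;
        inorder(n.left, out);
        out.add(n.value);          // Visit between the two calls.
        inorder(n.right, out);
    }

    static void postorder(Node n, List<Integer> out) {
        if (n == null) return;
        postorder(n.left, out);
        postorder(n.right, out);
        out.add(n.value);          // Visit after both calls (on the "pop").
    }

    // The tree from the illustrations: 1 over (2 over 4, 5) and (3 over 6, 7).
    static Node sampleTree() {
        return new Node(1,
            new Node(2, new Node(4, null, null), new Node(5, null, null)),
            new Node(3, new Node(6, null, null), new Node(7, null, null)));
    }

    public static void main(String[] args) {
        List<Integer> pre = new ArrayList<>();
        List<Integer> in = new ArrayList<>();
        List<Integer> post = new ArrayList<>();
        preorder(sampleTree(), pre);
        inorder(sampleTree(), in);
        postorder(sampleTree(), post);
        System.out.println("Preorder:  " + pre);
        System.out.println("Inorder:   " + in);
        System.out.println("Postorder: " + post);
    }
}
```

Each method touches every node exactly once, so all three run in time linear in the number of nodes.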


CRUD: Binary Trees.

  • Insertion: ?

  • Access: ?

  • Updating an element: ?

  • Deleting an element: ?

  • Search/Traversal: O(n).

  • All three traversals are linear: They visit every node in sequence.

    • They each just follow different sequences.

    • You can search by traversing, so search is also O(n).

  • How long would it take to access a node, though?

    • If I knew I wanted the left child’s left child, how many pointers would I need to follow to get to it?


Tree Height.

  • To analyze worst-case access, we need to talk about tree height.

  • The height of a tree is the number of vertical levels it contains, not including the root level.

  • Or you can think of it as the number of times you’d have to traverse down the tree to get from the root to the lowest leaf node.

  • Nodes in the tree are said to have a depth, based on how many vertical levels they are down from the root.

    • The root itself has a depth of 0.

    • The root’s children have a depth of 1.

    • Their children have a depth of 2…

    • Etc.

  • The height is thus also the depth of the lowest node.


Height

[Figure: the seven-node tree rooted at 1, annotated by level.]

Depth = 0: 1 (the root)

Depth = 1: 2, 3

Depth = 2: 4, 5, 6, 7

Height = 2

Remember, don’t count the root level.
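Height has the same recursive shape as the traversals. A minimal sketch follows, under the convention used above that the root level is not counted; returning -1 for the empty tree is a choice made here so that a single node comes out with height 0. HeightSketch and its Node type are hypothetical names.

```java
// Hypothetical demo class for computing tree height recursively.
class HeightSketch {
    static class Node {
        int value;
        Node left, right;
        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    // Height = depth of the lowest node. The empty tree gets -1 so that
    // a lone root has height 0, matching "don't count the root level".
    static int height(Node n) {
        if (n == null) return -1;
        return 1 + Math.max(height(n.left), height(n.right));
    }

    // The seven-node tree from the illustration.
    static Node sampleTree() {
        return new Node(1,
            new Node(2, new Node(4, null, null), new Node(5, null, null)),
            new Node(3, new Node(6, null, null), new Node(7, null, null)));
    }
}
```

On the seven-node tree this gives 2, in agreement with the figure.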


Height Balance

  • A tree is considered balanced (or height-balanced) if the depth of the highest and lowest leaves differs by no more than 1.

  • This turns out to be an important property because it places an upper bound on the access time of the tree and lets us find the height.

  • Question: If we have n nodes in a balanced binary tree, what is the height of the tree?

    • floor(log2 n)

    • Note that we had 7 nodes in the previous tree, but a height of 2. The tree was full; adding an 8th node would take the height to 3.

  • The time to access a node depends on the height, thus we know it is O(log n) on a balanced tree.
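The floor(log2 n) figure can be checked with a short count. This sketch assumes every level above the last is completely filled, which is slightly stronger than the balance definition above; h is the height and n the number of nodes.

```latex
\[
\underbrace{(2^{h} - 1)}_{\text{levels } 0,\dots,h-1 \text{ full}} + 1
\;\le\; n \;\le\;
\sum_{d=0}^{h} 2^{d} = 2^{h+1} - 1
\quad\Longrightarrow\quad
2^{h} \le n < 2^{h+1}
\quad\Longrightarrow\quad
h = \lfloor \log_2 n \rfloor .
\]
```

With n = 7 this gives h = 2, and with n = 8 it gives h = 3, matching the note about adding an 8th node.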


Degeneracy

  • Trees with only left or right pointers degenerate into linked lists.

    • Which gives you another perspective on why Quicksort became quadratic with everything on one side of the pivot.

    • Access on linked lists is O(n).

  • Performance gets worse even as we approach this condition, so we want to keep trees balanced.

[Figure: two degenerate trees on the nodes 1, 2, 3, each a single chain.]


CRUD: Balanced Binary Trees.

  • Insertion: ?

  • Access: O(log n).

  • Updating an element: ?

  • Deleting an element: ?

  • Search/Traversal: O(n).

  • Once we know where to insert, insertion is simple.

    • Just add a new leaf there: O(1).

  • However, discovering where to insert is a bit trickier.

    • Anywhere that a null child used to be will work.

    • We don’t want to upset the balance of the tree.

    • A good strategy is to traverse down the tree based on the value of each node. This creates a partitioning at each level.


Binary Tree Insertion

BinaryTree insert(BinaryTree root, BinaryTree newtree) {
    // This can only happen now if the user passes in an empty tree.
    // Java passes references by value, so assigning to root here would be
    // invisible to the caller; return the new root instead.
    if (root == null)
        return newtree; // Empty. The new node becomes the root.
    else if (newtree.value < root.value) { // Go left if <.
        if (root.left == null) // Found a place to insert.
            root.left = newtree;
        else
            insert(root.left, newtree); // Keep traversing.
    }
    else { // Go right if >=.
        if (root.right == null)
            root.right = newtree; // Found a place to insert.
        else
            insert(root.right, newtree); // Keep traversing.
    }
    return root; // A non-empty tree keeps its root.
}


Insertion Analysis

  • This is similar to a traversal, but guided by the value of the node.

  • We choose left or right based on whether the node is < or >=.

  • We split into one subproblem of size n/2 each time we traverse.

    • What recurrence would we have for this?

    • What would be the solution?
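One way to work the two questions above, assuming the tree is balanced (so each step really does halve the remaining nodes), constant work c per node, and n a power of two for simplicity:

```latex
\begin{align*}
T(n) &= T(n/2) + c, \qquad T(1) = c \\
     &= T(n/4) + 2c \\
     &= T(n/2^{k}) + kc \\
     &= T(1) + c \log_2 n \qquad (\text{taking } k = \log_2 n) \\
     &= O(\log n).
\end{align*}
```

This is the same recurrence as binary search. On a degenerate tree the subproblem shrinks by only one node per step, giving T(n) = T(n-1) + c, which is O(n).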


CRUD: Balanced Binary Trees.

  • Insertion: O(log n).

  • Access: O(log n).

  • Updating an element: O(1).

  • Deleting an element: ?

  • Search/Traversal: O(n).

  • If we’re already at the element we need to update, we can just change the value, thus O(1).

    • Note that we can say the same for insertion, but finding a place to put the node is usually considered part of it.

  • Deletion is quite complex, on the other hand.

    • If there are no children, just remove the node – O(1).

    • If there is one child, just replace the node with its child – O(1).

    • If there are two… well, that’s the tricky case.


Deletion

  • If we need to delete a node with two children, we need to find a suitable node to replace it with.

  • One good choice is the inorder successor of the node, which will be the leftmost node in the right subtree of the node we’re deleting. (It has no left child, but it may still have a right child.)

    • Inorder successor meaning the next node in an inorder traversal.

  • So our course is clear: inorder traverse, stop at the next node, swap.


Deletion

void deleteWithTwoSubtrees(BinaryTree targetnode) {
    if (targetnode == null) // Deleting a null is a no-op.
        return;

    // Find the inorder successor and its parent.
    BinaryTree inorder_succ;
    BinaryTree inorder_parent = targetnode;
    for (inorder_succ = targetnode.right; inorder_succ.left != null; inorder_succ = inorder_succ.left)
        inorder_parent = inorder_succ; // Keep track of the parent.

    // Set the value of the target node to that of the inorder successor...
    targetnode.value = inorder_succ.value;

    // Unlink the inorder successor (here's why we needed the parent).
    // It has no left child, but its right subtree, if any, must be kept.
    if (inorder_parent == targetnode)
        inorder_parent.right = inorder_succ.right; // The successor was the right child itself.
    else
        inorder_parent.left = inorder_succ.right;
}
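Putting the three deletion cases together, one possible formulation deletes by value and returns the (possibly new) root of each subtree. This is a sketch under assumptions, not the slides' method: it presumes the tree is ordered with smaller values on the left and larger or equal values on the right, and DeleteSketch, Node, insert, delete, and inorder are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: deletion by value on an ordered binary tree.
class DeleteSketch {
    static class Node {
        int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    // Insert by value: smaller goes left, larger or equal goes right.
    static Node insert(Node root, int value) {
        if (root == null) return new Node(value);
        if (value < root.value) root.left = insert(root.left, value);
        else root.right = insert(root.right, value);
        return root;
    }

    // Returns the root of the subtree after removing one node holding target.
    static Node delete(Node root, int target) {
        if (root == null) return null;                     // Not found: no-op.
        if (target < root.value) {
            root.left = delete(root.left, target);
        } else if (target > root.value) {
            root.right = delete(root.right, target);
        } else {
            if (root.left == null) return root.right;      // No children, or right child only.
            if (root.right == null) return root.left;      // Left child only.
            Node succ = root.right;                        // Two children: find the
            while (succ.left != null) succ = succ.left;    // inorder successor.
            root.value = succ.value;                       // Copy its value up...
            root.right = delete(root.right, succ.value);   // ...then remove it below.
        }
        return root;
    }

    static void inorder(Node n, List<Integer> out) {
        if (n == null) return;
        inorder(n.left, out);
        out.add(n.value);
        inorder(n.right, out);
    }
}
```

The successor has no left child, so the second call to delete always lands in one of the two easy cases.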


CRUD: Balanced Binary Trees.

  • Insertion: O(log n).

  • Access: O(log n).

  • Updating an element: O(1).

  • Deleting an element: O(log n).

  • Search/Traversal: O(n).

  • Finding the inorder successor requires time proportional to the height of the tree. If the tree is balanced, this is O(log n).


CRUD: Unbalanced Binary Trees.

  • Insertion: O(n).

  • Access: O(n).

  • Updating an element: O(1).

  • Deleting an element: O(1).

  • Search/Traversal: O(n).

  • The worst sort of unbalanced tree is just a linked list.

  • The deletion algorithm would always hit the second case (only one child), so we’d never experience O(log n) behavior…

  • But the insertion algorithm is not as efficient as that of a linked list.

    • Unless we check for this condition explicitly, in which case we get O(1).


Binary Search Trees

  • Binary Search Trees (BSTs) capture the notion of “splitting into two”.

  • Or, to use the Quicksort term, partitioning.

    • The value of a node is the pivot.

    • The left tree contains elements < the pivot.

    • The right tree contains elements >= the pivot.

  • They are simply binary trees that are kept sorted in the manner stated above.


BSTs: What do they entail?

  • Like priority queues and sorted arrays, binary search trees are inherently sorted containers.

  • This means inserting a sequence of elements and then reading them back will get them out in sorted order.

    • Ah, but this time we have three ways to read them back out. All three can’t give us the same order.

    • The elements of an inorder traversal are sorted in binary search trees.

  • It also means that we’ll have to do some extra work to ensure that this guarantee is true.

    • But will this work influence the asymptotic performance?


A Binary Search Tree

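[Figure: binary search tree rooted at 4. The left child 2 has children 1 and 3; the right child 6 has children 5 and 7.]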

Inorder traversal: [1 2 3 4 5 6 7]

< on the left, >= on the right.


CRUD: Binary Search Trees.

  • Insertion: O(log n).

  • Access: O(log n).

  • Updating an element: ?

  • Deleting an element: O(log n).

  • Search: O(log n).

  • Traversal: O(n).

  • Search and traversal are no longer the same operation!

    • Traversal is analogous to linear search: look at every element, one at a time, and try to find the target.

    • Search on a BST is analogous to binary search: the data is sorted around the value of the node we’re at, so it guides us to eliminate half of the remaining elements at each step.

    • Just like other unsorted containers, we have to traverse to search a standard binary tree. And like other sorted containers, a BST lets us do a binary search.

  • Remember, BSTs are sorted in an inorder traversal.

    • Therefore, the deletion algorithm we previously specified will preserve the ordering.


Access on a BST

  • Use the same strategy we used in binary search:

    • Compare the node.

    • If the target value is less than the node’s value, go left (eliminates the right subtree).

    • If the target value is greater than the node’s value, go right (eliminates the left subtree).

    • If it’s equal, we’ve found the target.

    • If we hit a NULL, the target isn’t in the tree.

  • This exhibits the same performance: O(log n).

    • If the tree is balanced. In the degenerate case, we are binary searching a linked list, which is O(n).
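The steps above translate into a short loop. The following is a sketch rather than the course's code: BstSearchSketch, Node, search, and sampleBst are hypothetical names, and the sample tree is the seven-node BST shown earlier (4 at the root, 2 and 6 below it).

```java
// Hypothetical demo class: value-guided search on a binary search tree.
class BstSearchSketch {
    static class Node {
        int value;
        Node left, right;
        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    // Returns the node holding target, or null if the target isn't in the tree.
    static Node search(Node root, int target) {
        Node current = root;
        while (current != null) {              // Hitting null means "not found".
            if (target < current.value)
                current = current.left;        // Eliminates the right subtree.
            else if (target > current.value)
                current = current.right;       // Eliminates the left subtree.
            else
                return current;                // Equal: found the target.
        }
        return null;
    }

    // The BST from the earlier illustration.
    static Node sampleBst() {
        return new Node(4,
            new Node(2, new Node(1, null, null), new Node(3, null, null)),
            new Node(6, new Node(5, null, null), new Node(7, null, null)));
    }
}
```

The loop follows one root-to-leaf path at most, so its cost is bounded by the height of the tree: O(log n) when balanced, O(n) when degenerate.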


Insertion on a BST

  • The algorithm I gave you for insertion was actually the BST insertion algorithm as well.

    • That was one of the reasons why I chose that strategy, although it does result in a fairly balanced tree if the data distribution is uniform.

  • In order to keep elements partitioned around the pivot, we need to traverse left when the new element has a value < the pivot and right when it’s >=.

  • It was O(log n) before, and it still is.


Deletion on a BST

  • I also gave you the BST deletion algorithm.

  • As the inorder traversal is in sorted order, the inorder successor is the next element after the one we’re deleting in sorted order.

  • If we replace the element we’re deleting with the next element in the sequence, the sequence is still sorted.

    • e.g., [ 1 3 5 8 13 ] after deleting 3 -> [ 1 5 8 13].

  • It was O(log n) before, and it still is.


Updating a BST

  • Ah, here’s something different.

  • Updating unsorted containers is usually a constant-time operation, while updating sorted containers usually takes longer.

  • When we change the value of a node in a BST, we may be required to change the node’s position in the tree to preserve the ordering.

    • This is why updating sorted containers is usually a slow operation.

  • No one seems to want to deal with updating these, so most sources (including your textbook) just define it as “delete and reinsert”.

    • Which fully works and is very simple to do.

    • Don’t be afraid to do “quick and dirty” things if they don’t harm your performance.

  • So does this harm performance?

    • Insertion is O(log n).

    • Deletion is O(log n).

    • Unless we can update in O(1) on a BST (we can’t), then no.
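“Delete and reinsert” is easy to see with a ready-made sorted container. The sketch below uses java.util.TreeSet as a stand-in for a hand-written BST (it is a self-balancing tree internally, which is an aside here, not something the slides rely on); UpdateSketch and update are hypothetical names.

```java
import java.util.TreeSet;

// Hypothetical demo class: updating a value in a sorted container
// by removing the old value and inserting the new one.
class UpdateSketch {
    // Returns false and leaves the set unchanged if oldValue is absent.
    // Note that a set holds no duplicates, so if newValue is already
    // present the add is a no-op and the set simply shrinks by one.
    static boolean update(TreeSet<Integer> set, int oldValue, int newValue) {
        if (!set.remove(oldValue)) return false;  // Delete: O(log n).
        set.add(newValue);                        // Reinsert: O(log n).
        return true;
    }
}
```

Starting from [1, 3, 5, 8, 13], updating 3 to 10 leaves the set iterating as 1, 5, 8, 10, 13: the new value lands in its sorted position without any explicit repositioning code.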


CRUD: Binary Search Trees.

  • Insertion: O(log n).

  • Access: O(log n).

  • Updating an element: O(log n).

  • Deleting an element: O(log n).

  • Search: O(log n).

  • Traversal: O(n).

  • This is the ultimate compromise data structure.

    • Arrays, Lists, Stacks, and Queues all did some things in constant time and other things in linear time.

    • This does everything (except traversal, which is inherently a linear operation) in logarithmic time.

    • But remember, logarithmic time isn’t much worse than constant.

    • So these are pretty good data structures.

  • As usual, there’s a catch…


The Importance of Balance

  • Every operation on a tree begins to degenerate when balance is lost.

    • And in the worst case, you end up with a less efficient linked list.

  • Keeping the tree balanced is thus important.

    • There is one who is prophesied to bring balance to the Force, but I don’t think that includes your trees.

    • So the burden falls on you, my young padawan.

  • Since BSTs are the structural analogue of Quicksort, you may have an idea of what insertion sequence will produce the worst case.

    • Yep, sorted or inverse-sorted, just as in Quicksort.

  • Most data is not arranged like this already, and on average, BSTs stay fairly well balanced.

  • But this is enough of a problem that various self-balancing structures have been invented. We will discuss these next week.
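The effect of insertion order on height is easy to measure. This sketch inserts the same seven values in two different orders using plain, non-balancing BST insertion; DegenerateSketch, insert, height, and heightAfterInserting are hypothetical names.

```java
// Hypothetical demo class: the same values, two insertion orders, two heights.
class DegenerateSketch {
    static class Node {
        int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    // Plain BST insertion with no rebalancing: < goes left, >= goes right.
    static Node insert(Node root, int value) {
        if (root == null) return new Node(value);
        if (value < root.value) root.left = insert(root.left, value);
        else root.right = insert(root.right, value);
        return root;
    }

    // Height without counting the root level; the empty tree is -1.
    static int height(Node n) {
        if (n == null) return -1;
        return 1 + Math.max(height(n.left), height(n.right));
    }

    static int heightAfterInserting(int... values) {
        Node root = null;
        for (int v : values) root = insert(root, v);
        return height(root);
    }

    public static void main(String[] args) {
        // Sorted input: every node goes right, so the tree is a 7-node chain.
        System.out.println(heightAfterInserting(1, 2, 3, 4, 5, 6, 7)); // prints 6
        // Median-first input: the tree fills level by level.
        System.out.println(heightAfterInserting(4, 2, 6, 1, 3, 5, 7)); // prints 2
    }
}
```

The sorted sequence yields height n - 1, the linked-list worst case; the second order yields the floor(log2 n) height of a full tree.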


A General Note

  • Although I put numbers in most of my examples, any sort of data can go in these.

    • Strings, Objects, Employees.

  • Caveat: When using Java’s sorted containers, make sure your class implements Comparable (or pass the container a Comparator).

    • Java doesn’t give you a BinaryTree class outright, but it does give you TreeSet and TreeMap.

    • TreeMap in particular is very neat; check it out.

    • We’ll do some things with these in Thursday’s lab.
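As a taste of what these containers look like in use, here is a small sketch. The Employee class, its fields, and the sample names are invented for illustration; TreeSet, TreeMap, Comparable, and the methods called on them are standard java.util and java.lang APIs.

```java
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical demo class showing TreeSet with a Comparable type and TreeMap with String keys.
class SortedContainerSketch {
    // Invented example type: employees are ordered by id.
    static class Employee implements Comparable<Employee> {
        final String name;
        final int id;
        Employee(String name, int id) {
            this.name = name;
            this.id = id;
        }
        @Override
        public int compareTo(Employee other) {
            return Integer.compare(id, other.id);
        }
    }

    // A TreeSet keeps its elements in compareTo order regardless of insertion order.
    static TreeSet<Employee> staff() {
        TreeSet<Employee> set = new TreeSet<>();
        set.add(new Employee("Grace", 30));
        set.add(new Employee("Ada", 10));
        set.add(new Employee("Edsger", 20));
        return set;
    }

    // A TreeMap keeps its keys sorted; String already implements Comparable.
    static TreeMap<String, Integer> wordCounts(String... words) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String word : words)
            counts.merge(word, 1, Integer::sum);  // Add 1 to the count, starting from 1.
        return counts;
    }
}
```

Calling staff().first() returns the employee with the smallest id, and iterating wordCounts(...) visits the words in alphabetical order, which is the inorder-traversal guarantee at work.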


“In all my life I’ll never see / a thing so beautiful as a tree.”

  • The study of trees goes very deep.

    • We’ve just scratched the surface.

    • We’ll come back to self-balancing trees, heaps, and perhaps splay trees.

  • The lesson:

    • Ideas are universal. They can come from your study. They can come from outside of your study. They can come from nature. They can come from anywhere.

  • Next class: Linear-time sorting, B+ trees, lab.

