
# Binary Trees - PowerPoint PPT Presentation


### Binary Trees

CS 1037

Fundamentals of Computer Science II


What is a “Tree”?
• A tree is a graph with no cycles
• A rooted tree is a tree with one node r designated the root
• Choice of root defines ancestry relationships

[Figure: two trees (a "forest"), a tree, and a graph that is not a tree; the root of each rooted tree is labelled r]

Tree Properties
• Any two nodes are connected by a path
• A tree with n nodes has n−1 edges
• If any edge is removed, the tree becomes a forest
• If any edge is added, the graph has a cycle and is no longer a tree
Rooted Tree Terminology
• Node y is an ancestor of x if it appears on the path r→x
• Node x is a descendant of y if y is an ancestor of x
• The subtree rooted at y is the tree of y and its descendants
• Node y is the parent of x if it is the immediate ancestor of x
• Node x is a child of y if the parent of x is y

[Figure: a rooted tree with root r, a node y, and a descendant x of y; the subtree rooted at y]

Rooted Tree Terminology
• A node with descendants is internal
• A node without descendants is a leaf
• The depth of node x is the number of edges on the path r→x
• The height of a tree is the largest depth of any node

[Figure: a rooted tree with root r and a node x at depth(x)=2]

Binary Tree Terminology
• A tree is k-ary if all its nodes have ≤ k children
• A tree is binary if it is 2-ary
• A child node is called either the left child or the right child
• A k-ary tree is full if all internal nodes have k children
• A tree is complete if it is full and all leaves have the same depth

[Figure: a 3-ary tree, and binary trees with each child labelled left or right]

Why Study Trees?
• Abstract representation of hierarchy
• Hierarchies natural in many applications
• networks, taxonomies, decision making, graphics
• Even when not natural, hierarchies crucial for performance (binary search trees)

[Figure: example hierarchies: a file system (C:, users, programs, music, delong, skype, gaga), a domain-name hierarchy (com, ca, edu, uwo, eng, csd, cs1037), and a "shader tree" for 3D rendering]

Binary Tree Data Structures
• Each node has up to two successors: its left child and right child

```cpp
struct node {
    node* left;   // pointer to root of left subtree
    node* right;  // pointer to root of right subtree
    ...           // application-specific stuff
};
```

[Figure: memory diagram: a root pointer to the top node; each node holds a left pointer, a right pointer, and data]

Binary Search Trees (BSTs)
• A BST is a binary tree whose nodes contain an item and are ordered in a special way
• Main goal: data structure that supports fast insert, erase, and search
• Array-based binary search does not support fast insert/erase!

[Figure: a BST whose height is ≈ lg n*]

* assuming uniformly distributed items (completely random)

The Binary Search Tree Property
• A tree is a BST if it satisfies the binary search tree property:
• Let x be a node in a BST.
• If y is a node in the left subtree of x, then y.item <= x.item.
• If y is a node in the right subtree of x, then y.item >= x.item.

[Figure: three binary trees storing the same items 2, 3, 4, 7, 9 in different shapes]

BST Search (iterative)

```cpp
struct node {
    node* left;   // left subtree has values <= item
    node* right;  // right subtree has values >= item
    int item;     // item for this node
};

node* root = ...;

node* search(int x) {
    node* n = root;
    while (n) {
        if (x < n->item)
            n = n->left;   // look in left subtree
        else if (x > n->item)
            n = n->right;  // look in right subtree
        else
            break;         // found match!
    }
    return n;
}
```

BST Search (recursive)
• Follows one path down the tree
• At most 2(h+1) tests, where h is height of the BST

```cpp
node* search(node* n, int x) {
    if (!n)
        return 0;                    // fell off bottom of tree; no match
    if (x < n->item)
        return search(n->left, x);   // search left subtree
    if (n->item < x)
        return search(n->right, x);  // search right subtree
    return n;
}
```

[Figure: a BST with root 3, children 2 and 7, and 7's children 4 and 9]

```cpp
node* result = search(root, 4);
cout << result->item;  // prints "4"
```

Running Time of BST Search
• If the tree is well-balanced, search takes at most c·lg n time (for some constant c)
• If items are added in random order, the tree is well-balanced on average
• If the tree is highly skewed, search takes up to c·n time
• If items are added in sorted order, the tree will be completely skewed
BST Insert (recursive)
• At most h+1 tests, where h is height of the BST
• May increase height of tree!

```cpp
void insert(node*& n, int item) {
    if (!n) {
        n = new node;             // we hit bottom,
        n->item = item;           // so insert here
        n->left = n->right = 0;   // (no children yet)
    } else if (item < n->item)
        insert(n->left, item);    // item belongs to left
    else
        insert(n->right, item);   // item belongs to right
}

insert(root, 5);
```

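[Figure: the BST with root 3 (children 2 and 7; 7's children 4 and 9) after insert(root,5): 5 becomes the right child of 4]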

BST Erase Examples

erase must maintain the binary search tree property!

erase(root,5)

erase(root,4)

erase(root,3)

case 1: node is a leaf (trivial: delete 5)

case 2: has only one child (easy: unlink, then delete 4)

case 3: has two children (hard: can’t just unlink 3)

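[Figure: the BST with root 3 (children 2 and 7; 7's children 4 and 9; 5 the right child of 4), shown before and after each of erase(root,5), erase(root,4) and erase(root,3)]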

BST Erase Examples (Memory Diagram)

[Figure: memory diagrams (root pointer, left and right links) for case 2, erasing 4, and case 3, erasing 3, after which the top node holds 4]

(you don’t want to see the “efficient” version!)

BST Erase (simple version)

```cpp
void erase(node*& n, int item) {
    if (!n)
        return;                            // no match, ignore erase
    if (item < n->item)
        erase(n->left, item);              // match must be on left
    else if (n->item < item)
        erase(n->right, item);             // match must be on right
    else if (!n->right) {
        node* temp = n;                    // case 1 or 2:
        n = n->left;                       // bypass n to left subtree (possibly NULL)
        delete temp;
    } else if (!n->left) {
        node* temp = n;                    // case 2:
        n = n->right;                      // bypass n to right subtree
        delete temp;
    } else {
        node* successor = n->right;        // case 3: get smallest
        while (successor->left)            // value in right subtree
            successor = successor->left;   // by descending leftward;
        n->item = successor->item;         // copy its value to n and
        erase(n->right, successor->item);  // delete the easy node instead
    }
}
```

### EXERCISE IN VISUAL STUDIO

See Snippet #1

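Binary Tree Exercise
• 39. [6 marks] Write a function to print the items of a binary tree in level-order (all items at depth 0, then all items at depth 1, then all items at depth 2…). Hint: use a queue!

[Figure: an example tree with root G whose level-order output is GDYAWZ]

```cpp
void print_levelorder(node* root) {
    queue<node*> q;                  // std::queue: push adds at the back, pop removes the front
    if (root)
        q.push(root);                // start with the only node at depth 0
    while (!q.empty()) {
        node* n = q.front(); q.pop();  // visit nodes in first-in, first-out order
        cout << n->item;
        if (n->left)
            q.push(n->left);         // children wait behind every node already queued,
        if (n->right)                // so each level is printed before the next
            q.push(n->right);
    }
}
```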

### Huffman Trees: Binary Trees for Compression

A Totally Different Application/Interpretation of Binary Trees

[Figure: a binary tree with edges labelled 0 and 1 and leaves a, b, c]

Compression Problem

Q: Given a list of symbols {a,b,c,...} of size n, what is the shortest string of {0,1} bits that uniquely identifies the string baabac?

Use a fixed-length binary code of ⌈lg n⌉ bits. For {a,b,c}, n=3, so each symbol gets ⌈lg 3⌉ = 2 bits:

binary code: a:00, b:01, c:10

baabac → 01·00·00·01·00·10 (12 bits)

Compression Problem

Frequent symbols should have shorter binary codes than infrequent symbols. Need an estimate of symbol frequencies!

In baabac: a: 3 times, b: 2 times, c: 1 time

good prefix code a:0, b:10, c:11: baabac → 10·0·0·10·0·11 (9 bits)

prefix code a:11, b:10, c:0: baabac → 10·11·11·10·11·0 (11 bits)

Optimal Binary Code Problem
• Given symbols S={a,b,c,...} and expected frequencies f(x), which binary code achieves best expected compression?
• Answer discovered in 1951 by MIT student David A. Huffman
• Build a special binary tree:

[Figure: frequencies f(a)=3, f(b)=2, f(c)=1 → Huffman tree (edges labelled 0 and 1; leaves a, b, c) → optimal code a:0, b:10, c:11. Photo caption: David Huffman, 1991]

Huffman Codes
• Observation: 1-to-1 correspondence of possibly optimal codes & full binary trees
• Huffman’s algorithm uses f(x) to build an optimal binary tree (a Huffman tree), and thereby optimal binary code!

[Figure: three codes for {a,b,c,d} and their binary trees (edges labelled 0 and 1): a:0, b:10, c:110, d:111; a:00, b:01, c:10, d:11; a:00, b:010, c:011, d:1]

Optimal Binary Code Problem (Formal)
• Input: set of symbols S={a,b,c,...}, and frequencies f(x) for each x ∈ S
• Output: binary codes c(x) such that, for a string s[0..n-1], its compressed size

$$\text{size}(s) = \sum_{i=0}^{n-1} |c(s[i])|$$

is minimized, where |y| means the length of code y = c(s[i]) for string character s[i].

e.g. recall min size(baabac) = 9 bits
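When f(x) counts how many times x occurs in s, the sum over string positions can be regrouped by symbol. A short worked check for baabac with the code a:0, b:10, c:11 (f(a)=3, f(b)=2, f(c)=1):

$$\sum_{i=0}^{n-1} |c(s[i])| = \sum_{x \in S} f(x)\,|c(x)| = 3\cdot|0| + 2\cdot|10| + 1\cdot|11| = 3\cdot 1 + 2\cdot 2 + 1\cdot 2 = 9 \text{ bits}$$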

Huffman’s Algorithm
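1. Make each symbol x a single-node tree (a root) with f = f(x)
2. Take roots i and j with smallest f and make a new root with f = f_i + f_j, whose children are i and j
3. While not a single tree, repeat step 2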

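[Figure: worked example with x:f(x) = a:3, b:2, c:1, d:7. Merge c:1 and b:2 into a root with f=3; merge a:3 with that root into f=6; merge 6 with d:7 into the final root f=13. Resulting code: a:00, b:010, c:011, d:1]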

Huffman Tree in C++

```cpp
struct node {
    node* left;        // ptr to root of left subtree
    node* right;       // ptr to root of right subtree
    char symbol;       // symbol represented by this node
    double frequency;  // total frequency of symbols in subtree rooted at this node
};
```

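[Figure: Huffman tree for a:0.3, b:0.4, c:0.3 with edges labelled 0 and 1. The root (frequency 1.0) has two children: an internal node with frequency 0.6 and the leaf b:0.4; the 0.6 node has leaves a:0.3 and c:0.3. Internal nodes have no symbol (stored as -1); the memory diagram shows each node's left, right, sym and freq fields.]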

Huffman Tree Operations
• build(map<char,double> f)
  • build an optimal tree from each symbol c and its frequency estimate f[c]
• string encode(string s)
  • build a string of binary codes from the symbols s[i]
• string decode(string b)
  • build a string of symbols from the binary string b

std::map is an STL data structure

encode("baabac") → 10·0·0·10·0·11

decode(10·0·0·10·0·11) → "baabac"

Huffman Code Summary
• Optimal symbol-by-symbol code when each symbol is independently sampled from distribution f
• however, most real data is not independent!
• in English, is any particular letter likely to be 'u'? what if you knew the preceding letter was 'q'? ...
• Used everywhere in compression:
• file and image compression (ZIP, PNG, JPEG), networking, text compression (English compressed to ~40% of size)
• Totally different from BST, yet still binary tree!
• http://en.wikipedia.org/wiki/Huffman_coding