Binary Trees

Binary Trees CS 1037 Fundamentals of Computer Science II

What is a “Tree”?
• A tree is a connected graph with no cycles
• A rooted tree is a tree with one node r designated the root
• Choice of root defines ancestry relationships
[Figure: example graphs labelled "a tree", "not a tree", and "two trees (a “forest”)"; rooted trees with root r]

Tree Properties
• Any two nodes are connected by exactly one path
• A tree with n nodes has n−1 edges
• If any edge is removed, the graph becomes disconnected (a forest)
• If any edge is added, the graph has a cycle and is no longer a tree
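
The properties above suggest a simple test for whether an undirected graph is a tree: it must have exactly n−1 edges and no edge may close a cycle. The sketch below is one possible check, not part of the lecture; the names is_tree and find_root and the union-find bookkeeping are assumptions introduced for illustration.

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper: representative of the component containing v (union-find).
int find_root(std::vector<int>& parent, int v) {
    while (parent[v] != v) {
        parent[v] = parent[parent[v]];  // path halving keeps chains short
        v = parent[v];
    }
    return v;
}

// Returns true if the undirected graph on nodes 0..n-1 with the given edges is a tree.
bool is_tree(int n, const std::vector<std::pair<int, int>>& edges) {
    assert(n >= 1);
    if ((int)edges.size() != n - 1) return false;  // a tree has exactly n-1 edges
    std::vector<int> parent(n);
    std::iota(parent.begin(), parent.end(), 0);    // every node starts in its own component
    for (const auto& e : edges) {
        assert(e.first >= 0 && e.first < n && e.second >= 0 && e.second < n);
        int a = find_root(parent, e.first);
        int b = find_root(parent, e.second);
        if (a == b) return false;                  // this edge would close a cycle
        parent[a] = b;                             // merge the two components
    }
    return true;  // n-1 edges and no cycle: the graph is connected, hence a tree
}
```

For example, is_tree(4, {{0,1},{0,2},{2,3}}) accepts a 4-node tree, while a triangle plus an isolated node (also 3 edges on 4 nodes) is rejected because the third edge closes a cycle.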

Rooted Tree Terminology
• Node y is an ancestor of x if it appears on the path r→x
• Node x is a descendant of y if y is an ancestor of x
• The subtree rooted at y is the tree of y and its descendants
• Node y is the parent of x if it is the immediate ancestor of x
• Node x is a child of y if the parent of x is y
[Figure: rooted trees with root r, showing nodes x and y and the subtree rooted at y]

Rooted Tree Terminology
• A node with descendants is internal
• A node without descendants is a leaf
• The depth of node x is the number of edges on the path r→x
• The height of a tree is the largest depth of any node
[Figure: rooted tree with root r and a node x with depth(x)=2]

Binary Tree Terminology
• A tree is k-ary if all its nodes have ≤ k children
• A tree is binary if it is 2-ary
• A child node is called either the left child or the right child
• A k-ary tree is full if all internal nodes have k children
• A tree is complete if it is full and all leaves have the same depth
[Figure: a 3-ary tree, and binary trees with children labelled left and right]

Why Study Trees?
• Abstract representation of hierarchy
• Hierarchies are natural in many applications
  • networks, taxonomies, decision making, graphics
• Even when not natural, hierarchies are crucial for performance (binary search trees)
[Figure: example hierarchies, including a directory tree under C:, a domain-name tree (com, ca, edu), and a “shader tree” for 3D rendering]

Binary Tree Data Structures
• A linked data structure (like a linked list)
• Each node has up to two successors: its left child and right child

    struct node {
        node* left;   // pointer to root of left subtree
        node* right;  // pointer to root of right subtree
        ...           // application-specific stuff
    };

[Figure: memory diagram of a root pointer and nodes, each with left and right pointers and data]
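
To connect the node structure to the earlier terminology, here is a minimal sketch of how height, fullness, and completeness could be computed on such a linked tree. The int item field stands in for the application-specific data, and the function names and the height −1 convention for an empty tree are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>

struct node {
    node* left;   // root of left subtree (or nullptr)
    node* right;  // root of right subtree (or nullptr)
    int item;     // stand-in for the application-specific data
};

// Number of nodes in the subtree rooted at n.
int count_nodes(const node* n) {
    return n ? 1 + count_nodes(n->left) + count_nodes(n->right) : 0;
}

// Height in edges; an empty tree is given height -1 (a convention assumed here).
int height(const node* n) {
    return n ? 1 + std::max(height(n->left), height(n->right)) : -1;
}

// Full binary tree: every internal node has exactly two children.
bool is_full(const node* n) {
    if (!n) return true;
    if (!n->left != !n->right) return false;  // exactly one child present
    return is_full(n->left) && is_full(n->right);
}

// Complete in the sense used above (full, all leaves at the same depth):
// such a tree of height h has exactly 2^(h+1) - 1 nodes.
bool is_complete(const node* n) {
    int h = height(n);
    assert(h < 30);  // keeps the shift below within int range
    return count_nodes(n) == (1 << (h + 1)) - 1;
}
```

On the five-node tree with root 3, children 2 and 7, and 7's children 4 and 9, height returns 2, is_full returns true, and is_complete returns false because the leaf 2 sits at a shallower depth than 4 and 9.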

Binary Search Trees (BSTs)
• A BST is a binary tree where nodes contain an item and are ordered in a special way
• Main goal: a data structure that supports fast insert, erase, and search
• Array-based binary search does not support fast insert/erase!
[Figure note: height ≈ lg n, assuming uniformly distributed (completely random) items]

The Binary Search Tree Property
• A tree is a BST if it satisfies the binary search tree property:
  • Let x be a node in a BST.
  • If y is a node in the left subtree of x, then y.item <= x.item.
  • If y is a node in the right subtree of x, then y.item >= x.item.
[Figure: three example trees built from the items 2, 3, 4, 7, 9]
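
The property constrains whole subtrees, not just a node and its children, so a checker has to carry bounds down the tree. The following is a sketch under that reading; is_bst, its lo/hi parameters, and check_bst_examples are names introduced here, not from the lecture.

```cpp
#include <cassert>

struct node {
    node* left;
    node* right;
    int item;
};

// True if every item in the subtree at n lies in [*lo, *hi] (a null bound means
// "unbounded") and the binary search tree property holds at every node below n.
bool is_bst(const node* n, const int* lo = nullptr, const int* hi = nullptr) {
    if (!n) return true;
    if (lo && n->item < *lo) return false;  // violates y.item >= x.item for an ancestor x
    if (hi && n->item > *hi) return false;  // violates y.item <= x.item for an ancestor x
    return is_bst(n->left, lo, &n->item) && is_bst(n->right, &n->item, hi);
}

// Usage sketch: one valid tree, and one that only looks valid locally.
void check_bst_examples() {
    node a{nullptr, nullptr, 2}, b{nullptr, nullptr, 4}, c{nullptr, nullptr, 9};
    node d{&b, &c, 7};
    node good{&a, &d, 3};   // 3 -> (2, 7 -> (4, 9))
    assert(is_bst(&good));

    node e{nullptr, nullptr, 1}, f{nullptr, nullptr, 9};
    node g{&e, &f, 7};
    node bad{&a, &g, 3};    // 1 < 7 is fine locally, but 1 sits in the right subtree of 3
    assert(!is_bst(&bad));
}
```

Comparing each node only with its own children would wrongly accept the second tree, which is why the bounds are passed along.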

BST Search (iterative)

    struct node {
        node* left;   // left subtree has values <= item
        node* right;  // right subtree has values >= item
        int item;     // item for this node
    };

    node* root = ...;

    node* search(int x) {
        node* n = root;
        while (n) {
            if (x < n->item)
                n = n->left;       // look in left subtree
            else if (x > n->item)
                n = n->right;      // look in right subtree
            else
                break;             // found match!
        }
        return n;                  // matching node, or null if no match
    }

BST Search (recursive)
• Follows one path down the tree
• At most 2(h+1) tests, where h is height of the BST

    node* search(node* n, int x) {
        if (!n) return 0;                                 // fell off bottom of tree; no match
        if (x < n->item) return search(n->left,x);        // search left subtree
        if (n->item < x) return search(n->right,x);       // search right subtree
        return n;                                         // neither smaller nor larger: match
    }

    node* result = search(root,4);
    if (result)                  // result is null when there is no match
        cout << result->item;    // prints "4"

[Figure: example tree with items 3, 2, 7, 4, 9]

Running Time of BST Search (c is a constant, n the number of items)
• If the tree is well-balanced, at most c·lg n time
• If items are added in random order, the tree is well-balanced on average
• If the tree is highly skewed, up to c·n time
• If items are added in sorted order, the tree will be completely skewed
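
The effect of insertion order on height can be seen directly with a small experiment. This sketch uses a recursive insert in the style of this lecture; height_after_inserting and compare_insertion_orders are hypothetical helper names, and height is counted in edges.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct node {
    node* left;
    node* right;
    int item;
};

// Recursive BST insert: descend left for smaller items, right otherwise.
void insert(node*& n, int item) {
    if (!n) n = new node{nullptr, nullptr, item};
    else if (item < n->item) insert(n->left, item);
    else insert(n->right, item);
}

// Height in edges; an empty tree has height -1 (convention assumed here).
int height(const node* n) {
    return n ? 1 + std::max(height(n->left), height(n->right)) : -1;
}

// Free every node of the tree.
void destroy(node* n) {
    if (!n) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}

// Height of the BST obtained by inserting the items in the given order.
int height_after_inserting(const std::vector<int>& items) {
    node* root = nullptr;
    for (int x : items) insert(root, x);
    int h = height(root);
    destroy(root);
    return h;
}

// Usage sketch: the same 7 items in two insertion orders.
void compare_insertion_orders() {
    assert(height_after_inserting({1, 2, 3, 4, 5, 6, 7}) == 6);  // sorted: a chain of height n-1
    assert(height_after_inserting({4, 2, 6, 1, 3, 5, 7}) == 2);  // middle-first: height about lg n
}
```

Since search follows one path from the root, these heights translate into roughly 7 versus 3 nodes visited in the worst case for the same data.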

BST Insert (recursive)
• At most h+1 tests, where h is height of the BST
• May increase height of tree!

    void insert(node*& n, int item) {
        if (!n) {
            n = new node;             // we hit bottom,
            n->item = item;           // so insert here
            n->left = n->right = 0;   // (no children yet)
        } else if (item < n->item)
            insert(n->left,item);     // item belongs to left
        else
            insert(n->right,item);    // item belongs to right
    }

    insert(root,5);

[Figure: the example tree (items 3, 2, 7, 4, 9) after insert(root,5) adds node 5]
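
The same insertion can be written without recursion by walking a pointer to the link that should receive the new node. This is an alternative sketch, not the lecture's version; insert_iterative, inorder, and check_insert_example are names made up here, and the in-order walk is included only to confirm that the items come out in sorted order.

```cpp
#include <cassert>
#include <vector>

struct node {
    node* left;
    node* right;
    int item;
};

// Iterative BST insert: 'link' always points at the pointer (root, some left,
// or some right) that would have to change if the new node were attached here.
void insert_iterative(node*& root, int item) {
    node** link = &root;
    while (*link)
        link = (item < (*link)->item) ? &(*link)->left : &(*link)->right;
    *link = new node{nullptr, nullptr, item};
}

// Append the items of the subtree at n to out: left subtree, node, right subtree.
void inorder(const node* n, std::vector<int>& out) {
    if (!n) return;
    inorder(n->left, out);
    out.push_back(n->item);
    inorder(n->right, out);
}

// Free every node of the tree.
void destroy(node* n) {
    if (!n) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}

// Usage sketch: build the example tree, then add 5.
void check_insert_example() {
    node* root = nullptr;
    for (int x : {3, 2, 7, 4, 9, 5}) insert_iterative(root, x);
    std::vector<int> items;
    inorder(root, items);
    assert((items == std::vector<int>{2, 3, 4, 5, 7, 9}));  // in-order visit is sorted
    assert(root->right->left->right->item == 5);             // 5 ends up below 4
    destroy(root);
}
```

The pointer-to-pointer plays the role that the reference parameter node*& plays in the recursive version: both let the function overwrite the null link it finally reaches.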

BST Erase Examples
erase must maintain the binary search tree property!
• erase(root,5), case 1: node is a leaf (trivial: delete 5)
• erase(root,4), case 2: node has only one child (easy: unlink, then delete 4)
• erase(root,3), case 3: node has two children (hard: can’t just unlink 3)
[Figure: before-and-after diagrams of the example tree (items 3, 2, 7, 4, 9, 5) for each case]

BST Erase Examples (Memory Diagram)
[Figure: memory diagrams (root, left, and right pointers) of the example tree before and after erase for case 2 and case 3]

BST Erase (simple version)
(you don’t want to see the “efficient” version!)

    void erase(node*& n, int item) {
        if (!n) return;                          // no match, ignore erase
        if (item < n->item)
            erase(n->left,item);                 // match must be on left
        else if (n->item < item)
            erase(n->right,item);                // match must be on right
        else if (!n->right) {
            node* temp = n;                      // case 1 or 2:
            n = n->left;                         // bypass n to left subtree (possibly NULL)
            delete temp;
        } else if (!n->left) {
            node* temp = n;                      // case 2:
            n = n->right;                        // bypass n to right subtree
            delete temp;
        } else {
            node* successor = n->right;          // case 3: get smallest
            while (successor->left)              // value in right subtree
                successor = successor->left;     // by descending leftward;
            n->item = successor->item;           // copy its value to n and
            erase(n->right,successor->item);     // delete the easy node instead
        }
    }

EXERCISE IN VISUAL STUDIO See Snippet #1

Binary Tree Exercise
39. [6 marks] Write a function to print the items of a binary tree in level-order (all items at depth 0, then all items at depth 1, then all items at depth 2…). Hint: use a queue!

    void print_levelorder(node* root) {
        queue<node*> q;   // a queue offering push_back/pop_front (std::queue names these push/pop)
        if (root)
            q.push_back(root);
        while (!q.empty()) {
            node* n = q.front();
            q.pop_front();
            cout << n->item;
            if (n->left) q.push_back(n->left);
            if (n->right) q.push_back(n->right);
        }
    }

[Figure: example tree with root G; printing it in level-order gives GDYAWZ]
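
For readers using the standard library directly, the same traversal can be sketched with std::queue, whose operations are push, front, and pop. This version returns the items as a string instead of printing them so the result is easy to check; the char item type, the function names, and the exact shape of the example tree are assumptions chosen to reproduce the output GDYAWZ.

```cpp
#include <cassert>
#include <queue>
#include <string>

struct node {
    node* left;
    node* right;
    char item;
};

// Visit nodes in level-order: the queue holds nodes whose items have not yet
// been emitted, in the order they were discovered (shallowest first).
std::string levelorder(const node* root) {
    std::string out;
    std::queue<const node*> q;
    if (root) q.push(root);
    while (!q.empty()) {
        const node* n = q.front();
        q.pop();
        out += n->item;
        if (n->left) q.push(n->left);    // children join the back of the queue,
        if (n->right) q.push(n->right);  // so they are visited after the current level
    }
    return out;
}

// Usage sketch on one tree whose level-order is GDYAWZ (shape assumed).
void check_levelorder_example() {
    node a{nullptr, nullptr, 'A'}, w{nullptr, nullptr, 'W'}, z{nullptr, nullptr, 'Z'};
    node d{&a, &w, 'D'};
    node y{&z, nullptr, 'Y'};
    node g{&d, &y, 'G'};
    assert(levelorder(&g) == "GDYAWZ");
    assert(levelorder(nullptr).empty());
}
```

Swapping the queue for a stack would turn this into a depth-first traversal, which is why the hint points at a queue specifically.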

Huffman Trees: Binary Trees for Compression
A Totally Different Application/Interpretation of Binary Trees
[Figure: small binary tree with edges labelled 0 and 1 and leaves a, b, c]

Compression Problem
Q: Given a list of symbols {a,b,c,...} of size n, what is the shortest string of {0,1} bits that uniquely identifies the string baabac?
Easy Answer: Use a fixed-length binary code of ⌈lg n⌉ bits per symbol
Example: {a,b,c}, n=3, binary code a:00 b:01 c:10
    baabac → 01·00·00·01·00·10 (12 bits)
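
The fixed-length scheme is easy to sketch in code: number the symbols, then write each number with ⌈lg n⌉ bits. The function names below, and the choice to give the alphabet as a string whose positions define the symbol numbers, are assumptions for illustration.

```cpp
#include <cassert>
#include <string>

// Smallest number of bits b with 2^b >= n, i.e. ceil(lg n).
int bits_per_symbol(int n) {
    assert(n >= 1);
    int bits = 0;
    while ((1 << bits) < n) ++bits;
    return bits;
}

// Encode s with a fixed-length code: the symbol at position i of 'alphabet'
// is written as the binary number i, padded to bits_per_symbol(n) digits.
std::string encode_fixed(const std::string& s, const std::string& alphabet) {
    int width = bits_per_symbol((int)alphabet.size());
    std::string out;
    for (char ch : s) {
        std::size_t index = alphabet.find(ch);
        assert(index != std::string::npos);  // every character must be in the alphabet
        for (int b = width - 1; b >= 0; --b)
            out += ((index >> b) & 1) ? '1' : '0';
    }
    return out;
}
```

With alphabet "abc", encode_fixed("baabac", "abc") yields 010000010010, the 12 bits of the example above.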

Compression Problem
Smart Answer: Use variable-length codes...
• Frequent symbols should have shorter binary codes than infrequent symbols
• Need an estimate of symbol frequencies! For baabac: a: 3 times, b: 2 times, c: 1 time
• Good prefix code (a:0 b:10 c:11): baabac → 10·0·0·10·0·11 (9 bits)
• Bad prefix code (a:11 b:10 c:0): baabac → 10·11·11·10·11·0 (11 bits)
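
The two bit counts can be reproduced by encoding with a lookup table from symbol to codeword. This is a small sketch; encode_with and the use of std::map for the code table are choices made here for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Concatenate the codeword of each character of s, using the given code table.
std::string encode_with(const std::map<char, std::string>& code, const std::string& s) {
    std::string bits;
    for (char ch : s) {
        auto it = code.find(ch);
        assert(it != code.end());  // every character needs a codeword
        bits += it->second;
    }
    return bits;
}
```

Encoding baabac with the table a:0 b:10 c:11 gives a 9-character bit string, while the table a:11 b:10 c:0 gives 11 characters, because the most frequent symbol a now costs two bits each time it appears.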

Optimal Binary Code Problem
• Given symbols S={a,b,c,...} and expected frequencies f(x), which binary code achieves the best expected compression?
• Answer discovered in 1951 by MIT student David A. Huffman
• Build a special binary tree:
    frequencies f(a)=3 f(b)=2 f(c)=1  ⇒  Huffman tree  ⇒  optimal code a:0 b:10 c:11
[Figure: the Huffman tree with edges labelled 0 and 1 and leaves a, b, c; photo of David Huffman, 1991]

Huffman Codes
• Observation: 1-to-1 correspondence between possibly optimal codes and full binary trees
• Huffman’s algorithm uses f(x) to build an optimal binary tree (a Huffman tree), and thereby an optimal binary code!
[Figure: three full binary trees over leaves a, b, c, d with edges labelled 0 and 1, and their codes:
    a:0 b:10 c:110 d:111
    a:00 b:01 c:10 d:11
    a:00 b:010 c:011 d:1]
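
Because every symbol sits at a leaf of the tree, no codeword can be the beginning of another codeword, which is what makes decoding unambiguous. A minimal sketch of a check for this prefix-free property follows; is_prefix_free and check_prefix_examples are names introduced here, and the sort-then-compare-neighbours approach is one possible method rather than anything from the lecture.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// True if no codeword is a prefix of (or equal to) another codeword.
// After sorting, a codeword that is a prefix of any other codeword is also
// a prefix of its immediate successor, so checking neighbours is enough.
bool is_prefix_free(std::vector<std::string> codes) {
    std::sort(codes.begin(), codes.end());
    for (std::size_t i = 0; i + 1 < codes.size(); ++i)
        if (codes[i + 1].compare(0, codes[i].size(), codes[i]) == 0)
            return false;  // codes[i] is a prefix of codes[i + 1]
    return true;
}

// Usage sketch with the three codes for a, b, c, d shown above.
void check_prefix_examples() {
    assert(is_prefix_free({"0", "10", "110", "111"}));
    assert(is_prefix_free({"00", "01", "10", "11"}));
    assert(is_prefix_free({"00", "010", "011", "1"}));
    assert(!is_prefix_free({"0", "01", "11"}));  // "0" is a prefix of "01"
}
```

All three codes listed for a, b, c, d pass the check; a code such as a:0 b:01 c:11 fails it, since after reading 0 a decoder cannot tell whether a is complete or b has begun.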

Optimal Binary Code Problem (Formal)
• Input: set of symbols S={a,b,c,...}, and frequencies f(x) for each x ∈ S
• Output: binary codes c(x) such that, for a string s[0..n-1], its compressed size $\sum_{i=0}^{n-1} |c(s[i])|$ is minimized, where |y| means the length of code y=c(s[i]) for string character s[i]
• e.g. recall min size(baabac) = 9 bits
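
As a worked instance of the compressed size, take s = baabac with the code a:0 b:10 c:11 from the earlier slides. Grouping the characters of s by symbol, with f(x) the number of times x occurs in s, gives:

```latex
\[
\sum_{i=0}^{n-1} |c(s[i])|
  \;=\; \sum_{x \in S} f(x)\,|c(x)|
  \;=\; \underbrace{3 \cdot 1}_{a} + \underbrace{2 \cdot 2}_{b} + \underbrace{1 \cdot 2}_{c}
  \;=\; 9 \text{ bits}
\]
```

The middle form shows why only the frequencies matter: the size depends on how often each symbol occurs, not on where it occurs in s.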

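Huffman’s Algorithm
1. Start with a list of single-node trees
2. Take roots i and j with smallest f and make a new root with f = f_i + f_j
3. While not a single tree, repeat step 2
Example: s=bbaaddddcaddd, with x:f(x) = a:3 b:2 c:1 d:7
• merge b:2 and c:1 under a new root with f=3
• merge a:3 and that root under a new root with f=6
• merge that root and d:7 under the final root with f=13
Resulting code: a:00 b:010 c:011 d:1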
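
The three steps can be sketched with a priority queue that always yields the two roots of smallest frequency. This is a minimal illustration, not the course's implementation: the comparator by_frequency, the helper code_lengths, the use of symbol −1 for internal nodes, and the tie-breaking (left to std::priority_queue) are all assumptions, so the exact 0/1 labels may differ from the example even though the code lengths agree.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <vector>

struct node {
    node* left;
    node* right;
    char symbol;       // meaningful for leaves only
    double frequency;  // total frequency of the symbols in this subtree
};

// Orders the priority queue so that the root with the smallest frequency is on top.
struct by_frequency {
    bool operator()(const node* a, const node* b) const { return a->frequency > b->frequency; }
};

// Build a Huffman tree from symbol frequencies (f must not be empty).
node* build(const std::map<char, double>& f) {
    assert(!f.empty());
    std::priority_queue<node*, std::vector<node*>, by_frequency> roots;
    for (const auto& entry : f)                       // step 1: single-node trees
        roots.push(new node{nullptr, nullptr, entry.first, entry.second});
    while (roots.size() > 1) {                        // step 3: until one tree remains
        node* i = roots.top(); roots.pop();           // step 2: two smallest roots...
        node* j = roots.top(); roots.pop();
        roots.push(new node{i, j, char(-1), i->frequency + j->frequency});  // ...get a new parent
    }
    return roots.top();
}

// The code length of a symbol is the depth of its leaf.
void code_lengths(const node* n, int depth, std::map<char, int>& lengths) {
    if (!n->left && !n->right) { lengths[n->symbol] = depth; return; }
    code_lengths(n->left, depth + 1, lengths);
    code_lengths(n->right, depth + 1, lengths);
}

// Free every node of the tree.
void destroy(node* n) {
    if (!n) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}
```

For the frequencies a:3 b:2 c:1 d:7 the leaf depths come out as d:1, a:2, b:3, c:3, matching the code lengths of a:00 b:010 c:011 d:1.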

Huffman Tree in C++

    struct node {
        node* left;        // ptr to root of left subtree
        node* right;       // ptr to root of right subtree
        char symbol;       // symbol represented by this node
        double frequency;  // total frequency of symbols in
    };                     // subtree rooted at this node

[Figure: Huffman tree for b:0.4, a:0.3, c:0.3 and its memory diagram (left, right, sym, freq): leaf nodes 'b' 0.4, 'a' 0.3, 'c' 0.3; internal nodes (no symbol, sym = -1) with freq 0.6 and root freq 1.0]

Huffman Tree Operations
• build(map<char,double> f)
  • build optimal tree with each symbol c and its frequency estimate f[c]
  • (std::map is an STL data structure)
• string encode(string s)
  • build string of binary codes from symbols s[i], e.g. "baabac" → 10·0·0·10·0·11
• string decode(string b)
  • build string of symbols from binary string b, e.g. 10·0·0·10·0·11 → "baabac"

Huffman Code Summary
• Optimal among codes that assign one codeword per symbol, when each symbol is independently sampled from distribution f
  • however, most real data is not independent!
  • in English, is any particular letter likely to be 'u'? what if you knew the preceding letter was 'q'? ...
• Used everywhere in compression:
  • file and image compression (JPEG/PNG/ZIP), networking, text compression (English compressed to ~40% of size)
• Totally different from BST, yet still a binary tree!
• http://en.wikipedia.org/wiki/Huffman_coding
