Balanced Search Trees


# Balanced Search Trees

Balanced Search Trees. 15-211 Fundamental Data Structures and Algorithms. Margaret Reid-Miller 3 February 2005. Plan. Today 2-3-4 trees Red-Black trees Reading: For today: Chapters 13.3-4 Reminder: HW1 due tonight!!! HW2 will be available soon.





### Balanced Search Trees

15-211 Fundamental Data Structures and Algorithms

Margaret Reid-Miller

3 February 2005

Plan
• Today
• 2-3-4 trees
• Red-Black trees
• Reading
• For today: Chapters 13.3-4
• Reminder: HW1 due tonight!!!
• HW2 will be available soon.


AVL-Trees

What is the key restriction on a binary search tree that keeps an AVL tree balanced?

[Figure: two example trees, one labeled "OK" and one labeled "not OK"]

AVL-Trees
• Height balanced:
• For each node, the heights of the left and right subtrees differ by at most 1. This is a representation invariant.
• What is the mechanism to rebalance an AVL tree that an insert has put out of balance?

[Figure: tree with subtrees labeled X, Y, Z]

The single rotation
• Rotate around the deepest out-of-balance node. This “pulls” the child up one level.

[Figure: single rotation, with subtrees labeled Z, X, Y]
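The single rotation can be sketched in Java as below. This is a minimal illustration, not the course's code: the `AvlNode` class, its `height` field, and the method names are assumptions made for the example.

```java
// Hypothetical minimal AVL node; names are illustrative, not from the slides.
class AvlNode {
    int key;
    int height = 1;      // number of levels in the subtree rooted here
    AvlNode left, right;

    AvlNode(int key) { this.key = key; }
}

class SingleRotation {
    static int height(AvlNode n) { return n == null ? 0 : n.height; }

    static void updateHeight(AvlNode n) {
        n.height = 1 + Math.max(height(n.left), height(n.right));
    }

    // Rotate right around `node`: its left child is pulled up one level.
    static AvlNode rotateRight(AvlNode node) {
        AvlNode child = node.left;
        node.left = child.right;   // the middle subtree changes parent
        child.right = node;
        updateHeight(node);        // node is now the lower of the two, so fix it first
        updateHeight(child);
        return child;              // new root of this subtree
    }

    // Mirror image: the right child is pulled up one level.
    static AvlNode rotateLeft(AvlNode node) {
        AvlNode child = node.right;
        node.right = child.left;
        child.left = node;
        updateHeight(node);
        updateHeight(child);
        return child;
    }
}
```

The caller is expected to store the returned node in whatever link used to point at `node`.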

The double rotation
• First rotate around child node, then around the parent node.

[Figure: double rotation, first step, with subtrees labeled Z, X, Y1, Y2]

Double rotation cont’d
• Result is to “pull” the grandchild node up two levels.

[Figure: double rotation, second step, with subtrees labeled Z, X, Y1, Y2]
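A double rotation is two single rotations in sequence. The sketch below shows that composition; the class and method names are assumptions, and height bookkeeping is left out to keep the example short.

```java
class DoubleRotation {
    // Hypothetical bare-bones node; an AVL node would also store a height.
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static Node rotateRight(Node n) {
        Node c = n.left;
        n.left = c.right;
        c.right = n;
        return c;
    }

    static Node rotateLeft(Node n) {
        Node c = n.right;
        n.right = c.left;
        c.left = n;
        return c;
    }

    // Left-right case: the left child's right subtree is too deep.
    static Node rotateLeftRight(Node parent) {
        parent.left = rotateLeft(parent.left);  // first rotate around the child
        return rotateRight(parent);             // then rotate around the parent
    }

    // Right-left case: the mirror image.
    static Node rotateRightLeft(Node parent) {
        parent.right = rotateRight(parent.right);
        return rotateLeft(parent);
    }
}
```

In both cases the node returned is the former grandchild, which now sits two levels higher than before.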

AVL Tree Summary
• Each node maintains a lazy deletion flag and the height of its subtree.
• The height of an AVL tree is at most about 45% greater than the minimum possible height (roughly 1.44 log2 N).
• Requires at most one single or double rotation to regain balance after an insert.
• Thus, guarantees O(log N) time for search and insert.
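The height-balance condition itself can be checked with a short recursive routine. The sketch below is only an illustration: it recomputes heights from scratch rather than reading a stored height field, and the names are assumptions.

```java
class AvlBalanceCheck {
    // Hypothetical bare-bones node for the example.
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Returns the height of the subtree, or -1 if some node in it
    // has left and right subtrees whose heights differ by more than 1.
    static int checkedHeight(Node n) {
        if (n == null) return 0;
        int lh = checkedHeight(n.left);
        if (lh < 0) return -1;
        int rh = checkedHeight(n.right);
        if (rh < 0) return -1;
        if (Math.abs(lh - rh) > 1) return -1;
        return 1 + Math.max(lh, rh);
    }

    static boolean isHeightBalanced(Node root) {
        return checkedHeight(root) >= 0;
    }
}
```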
Balanced 2-3-4 Trees
• Maintain height balance in all subtrees. Depth property.
• But allow nodes in the tree to expand to accommodate inserts.
• In particular, nodes can have 2, 3 or 4 children. Node-size property.
• E.g., a 4-node has 3 keys that split the key range into 4 intervals.
2-3-4 tree search
• Search is similar to a binary search.
• E.g., search for B

[Figure: 2-3-4 tree with root G M Q and leaves A C, H, R, S W]
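Search in a 2-3-4 tree can be sketched as follows: at each node, scan the keys to find either a match or the interval the key falls into, then follow that link. The node layout (a sorted `keys` array plus a `children` array that is `null` in a leaf) is an assumption for this example.

```java
class Tree234Search {
    // Hypothetical node layout: 1 to 3 sorted keys, and keys.length + 1
    // child links (or null for a leaf).
    static class Node {
        char[] keys;
        Node[] children;
        Node(char[] keys, Node[] children) {
            this.keys = keys;
            this.children = children;
        }
    }

    static Node leaf(char... keys) { return new Node(keys, null); }

    static boolean contains(Node node, char key) {
        while (node != null) {
            int i = 0;
            // Skip the keys that are smaller than the one we are looking for.
            while (i < node.keys.length && key > node.keys[i]) i++;
            if (i < node.keys.length && node.keys[i] == key) return true;
            if (node.children == null) return false;  // reached a leaf
            node = node.children[i];                  // descend into interval i
        }
        return false;
    }
}
```

For a tree with root G M Q and leaves A C, H, O, S W (an assumed example in the spirit of the slide's figure), a search for B scans the root, follows the first link because B < G, and fails at the leaf A C.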

[Figure: 2-3-4 tree with root G M Q and leaves A C, H, O, S U W]

2-3-4 Tree Insert
• To insert, first search for a leaf node in which to put the key.
• E.g., insert U

[Figure: U is inserted into the leaf S W, giving S U W; the tree has root G M Q and other leaves A C, H, R]
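Once the leaf is found, adding the key is an insertion into a small sorted array. The sketch below assumes the leaf has room (it is a 2-node or 3-node) and that the key is not already present; the method name is made up for the example.

```java
class LeafInsert {
    // Returns a new sorted key array with `key` added.
    // Assumes the leaf holds at most 2 keys, so the result is at most a 4-node.
    static char[] insertIntoLeaf(char[] keys, char key) {
        if (keys.length >= 3) {
            throw new IllegalStateException("a 4-node leaf must be split first");
        }
        char[] result = new char[keys.length + 1];
        int i = 0;
        // Copy the keys smaller than the new one.
        while (i < keys.length && keys[i] < key) {
            result[i] = keys[i];
            i++;
        }
        result[i] = key;
        // Copy the remaining keys one slot to the right.
        for (int j = i; j < keys.length; j++) {
            result[j + 1] = keys[j];
        }
        return result;
    }
}
```

With the slide's example, inserting U into the leaf S W yields S U W.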

2-3-4 Tree Insert
• May need to split a node
• E.g., insert T

[Figure: inserting T; nodes shown are G Q, T, A C, G Q U, S T, W]

2-3-4 Tree Insert

/* Either returns an empty node or a new root */
public Node BUinsert(int key) {
    if (isEmptyNode()) return new Node(key);

    /* Search for leaf to put key into */
    Node subtree = findChild(key);        // down which link?
    Node upNode = subtree.BUinsert(key);  // recurse into that subtree

    /* upNode is empty, the key at a leaf node, or
     * the result of a 4-node split that needs to be
     * propagated up. */
    if (upNode.isEmptyNode()) return upNode;
    else ...  // the rest of the else branch is cut off in the transcript
}

• When inserting a key into a 4-node, the 4-node splits and a key moves up to the parent node.
• This new key may in turn cause the parent to split, moving a key up to the grandparent, and so on up to the root.
• When would this happen?
• Is there a way to avoid these cascading splits?
Bottom-up 2-3-4 trees
• This BUinsert is called a bottom-up version of insert, since splits occur as we go back up the tree after the recursive calls.
• Work occurs before and after the recursive calls.
Preemptive Split
• Every time we find a 4-node while traveling down a search path, we split the 4-node.
• Note: Two 2-nodes have the same number of children as one 4-node.
• Changes are local to the split node (no cascading).
• Guaranteed to find a 2-node or 3-node at the leaf.
• Splitting a root node creates a new root.
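A preemptive split can be sketched as below. The fixed-size node layout and the names `splitChild` and `splitRoot` are assumptions for illustration. The key point is that the parent is never a 4-node when its child is split (it would have been split on the way down), so the middle key always fits.

```java
class PreemptiveSplit {
    // Hypothetical node layout: n keys in use, up to 3 keys and 4 links.
    static class Node {
        int n;                       // number of keys in use
        int[] keys = new int[3];
        Node[] child = new Node[4];  // all null in a leaf
        boolean isFourNode() { return n == 3; }
    }

    // Splits the 4-node parent.child[i]. The parent must have room,
    // i.e. hold at most 2 keys.
    static void splitChild(Node parent, int i) {
        Node full = parent.child[i];
        Node right = new Node();
        right.n = 1;
        right.keys[0] = full.keys[2];      // largest key goes to the new node
        right.child[0] = full.child[2];
        right.child[1] = full.child[3];
        int middle = full.keys[1];
        full.n = 1;                        // left half keeps keys[0] and two links
        full.child[2] = null;
        full.child[3] = null;
        // Shift the parent's keys and links right to open slot i.
        for (int j = parent.n; j > i; j--) {
            parent.keys[j] = parent.keys[j - 1];
            parent.child[j + 1] = parent.child[j];
        }
        parent.keys[i] = middle;           // middle key moves up
        parent.child[i + 1] = right;
        parent.n++;
    }

    // Splitting a 4-node root creates a new root holding the middle key.
    static Node splitRoot(Node root) {
        Node newRoot = new Node();
        newRoot.child[0] = root;
        splitChild(newRoot, 0);
        return newRoot;
    }
}
```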
2-3-4 Tree Height
• What is the height of the tree?

At most log2 N + 1

• Why?

The tree is tallest when every node is a 2-node. Since every leaf has the same depth, the tree is then a complete binary tree, which has at most log2 N + 1 levels.
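The bound can be worked out in one step. Here h is the number of levels and N the number of keys (symbols chosen for this derivation); a tree with h levels holds the fewest keys when every node is a 2-node:

```latex
\begin{align*}
N &\ge 2^{h} - 1 \\
h &\le \log_2 (N + 1) \le \log_2 (2N) = \log_2 N + 1 \qquad (N \ge 1)
\end{align*}
```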

Number of splits
• How many splits does an insertion require?

At most log2 N + 1 splits.

• Seems to require fewer than one split on average when the tree is built from a random permutation. Trees tend to have few 4-nodes.
Top-down 2-3-4 trees
• The second method is called top-down as splits occur on the way down the tree.
• All the work occurs before the recursive calls and no work occurs after the recursive calls.
• This is called tail recursion. It can be rewritten as a loop, which is much more efficient.
• Can AVL trees be made tail recursive?
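To make the tail-recursion point concrete, the sketch below shows a descent written both ways on a plain binary search tree. It uses search rather than insert to keep the example small, and the names are assumptions. Note that Java itself does not guarantee that tail calls are turned into loops, so the loop is written by hand.

```java
class TailRecursionDemo {
    // Hypothetical bare-bones binary search tree node.
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Tail-recursive: nothing remains to be done after the recursive call returns.
    static boolean containsRecursive(Node node, int key) {
        if (node == null) return false;
        if (key == node.key) return true;
        return containsRecursive(key < node.key ? node.left : node.right, key);
    }

    // The same descent as a loop: the recursive call becomes an assignment.
    static boolean containsLoop(Node node, int key) {
        while (node != null) {
            if (key == node.key) return true;
            node = key < node.key ? node.left : node.right;
        }
        return false;
    }
}
```

A bottom-up insert cannot be rewritten this way directly, because it still has work to do (splits or rotations) after each recursive call returns.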
2-3-4 trees
• Guaranteed O(log N) time for search and insert.
• Issues:
• Awkward to maintain three types of nodes.
• Need to modify the standard search on binary trees.
• Splits need to move links between nodes.
• Code has many cases to handle.


Red-Black trees
• A red-black tree is a binary tree representation of a 2-3-4 tree using red and black nodes.

[Figure: two alternative red-black representations, separated by "OR"; node labels include B, D, F, H, I]

Red-black tree properties

A Red-Black tree is a binary search tree where

• Every node is colored either red or black.
• Note: Every 2-3-4 node corresponds to one black node.
• The root node is black.
• Red nodes always have black parents and black children.
• Every path from the root to a leaf has the same number of black nodes.
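The color properties above can be checked mechanically. The sketch below is an illustration with assumed names; it treats a path as ending at an empty (null) link and does not check the search-tree ordering of the keys.

```java
class RedBlackCheck {
    // Hypothetical node with a color flag.
    static class Node {
        int key;
        boolean red;
        Node left, right;
        Node(int key, boolean red) {
            this.key = key;
            this.red = red;
        }
    }

    static boolean isRed(Node n) { return n != null && n.red; }

    // Returns the number of black nodes on every path from n down to an
    // empty link, or -1 if a color property is violated below n.
    static int blackHeight(Node n) {
        if (n == null) return 0;
        if (n.red && (isRed(n.left) || isRed(n.right))) return -1;  // red node with a red child
        int lb = blackHeight(n.left);
        int rb = blackHeight(n.right);
        if (lb < 0 || rb < 0 || lb != rb) return -1;                // unequal black counts
        return lb + (n.red ? 0 : 1);
    }

    static boolean isValid(Node root) {
        return !isRed(root) && blackHeight(root) >= 0;              // root must be black
    }
}
```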


Red-black tree height


• What is the height of a red-black tree?
• It is at most 2 log2 N + 2, since it can be at most twice as high as its corresponding 2-3-4 tree, which has height at most log2 N + 1.
Red-black Tree Search
• Search is the same as for binary search trees.
• Color is irrelevant.
• Search guaranteed to take O(log N) time.
• Search typically occurs more frequently than insert.
Red-black Tree Insert
• Simple 4-node test (2 red children?)
• Few splits as most 4-nodes tend to be near the leaves.
• Some 4-node splits require only changing the color of three nodes.
• Rotations needed only when a 4-node has a 3-node parent.
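The 4-node test and the recoloring split can be sketched as follows. The names are assumptions; the rotation cases (a 4-node under a 3-node parent) and re-blackening the root are deliberately left out.

```java
class ColorFlip {
    // Hypothetical node with a color flag.
    static class Node {
        int key;
        boolean red;
        Node left, right;
        Node(int key, boolean red) {
            this.key = key;
            this.red = red;
        }
    }

    static boolean isRed(Node n) { return n != null && n.red; }

    // A 4-node shows up as a black node with two red children.
    static boolean isFourNode(Node n) {
        return n != null && !n.red && isRed(n.left) && isRed(n.right);
    }

    // Split by changing three colors: the children become black (two 2-nodes)
    // and the node becomes red (its key joins the parent's 2-3-4 node).
    static void flipColors(Node n) {
        n.red = true;
        n.left.red = false;
        n.right.red = false;
    }
}
```

If the flipped node is the root, it would be colored black again afterwards; if its parent is red, a rotation is needed to restore the red-black properties.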
Red-black Tree Summary
• Guaranteed O(log N) time for search and insert.
• Trees are nearly optimal.
• Top-down implementation can be made tail-recursive, so very efficient.
B-trees
• A generalization of 2-3-4 trees.
• Used for very large dictionaries where the data are maintained on disks.
• Since disk lookups are very SLOW, want to read as few disk pages as possible.

Want really shallow depth trees!

B-trees Key Idea
• Make the nodes in the trees have a huge number of links, k-way.
• Typically choose k so that a node fills a disk page.
• As with 2-3-4 trees, not all the nodes have k links. Some may have as few as k/2 links.
• When a node overflows, split the node.
B-trees
• Takes O(log_{k/2} N) probes for search and insert.
• Typically about 2-3 probes (disk accesses).
• E.g., for N < 125 million and k = 1000, the height of the tree is less than 3, since every node has at least 500 links and 500^3 = 125 million.
• As all searches go through the root node, usually keep the root node in memory.
• Many variants
• Common in many large database systems.
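The probe estimate can be reproduced with a few lines of arithmetic. The sketch below is a rough worst-case estimate under the assumption that every node has only the minimum k/2 links; the method name is made up for the example.

```java
class BTreeDepth {
    // Smallest number of levels h such that (k/2)^h >= n, i.e. how deep the
    // tree gets if every node has only the minimum k/2 links.
    static int levelsNeeded(long n, int k) {
        long minLinks = k / 2;
        if (minLinks < 2) {
            throw new IllegalArgumentException("k must be at least 4");
        }
        int levels = 0;
        long reach = 1;  // number of items reachable with `levels` levels
        while (reach < n) {
            if (reach > Long.MAX_VALUE / minLinks) {
                return levels + 1;  // one more level exceeds any long value
            }
            reach *= minLinks;
            levels++;
        }
        return levels;
    }
}
```

For example, `levelsNeeded(125_000_000L, 1000)` evaluates to 3, which matches the slide's figure of about 2-3 probes.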
Conclusion
• AVL trees have the disadvantage that insert is not tail recursive.
• 2-3-4 trees are not practical, but are a good way to think about other approaches.
• Red-black trees are very efficient and have guaranteed O(log N) insert and search.
• B-trees have very shallow depth to minimize the number of disk reads needed for huge databases.