Insertion into B + -trees

Insertion into B+-trees • When we want to insert a new key/value pair into a B+-tree we insert the key/value pair into the file on disk that is being used to store the records. This is a straightforward operation. We now consider how the B+-tree changes as a result. We assume that we are inserting into a B+-tree of order m. • by searching for the key value in the tree, we arrive at the appropriate leaf node in which it should be inserted. If the leaf node is not full, we insert the key and a pointer to the corresponding value on disk and we’re done. • if there is no room in the leaf node, we split the leaf in two so that each node is at least half full (see notes that follow) • when we split the nodes at one level, a new key value needs to be inserted into the parent level. We recursively apply our insertion strategy at the parent and continue up the tree.

The root is a special case. If the root node fills, we split it into two nodes and create a new root at the next highest level that has the two nodes resulting from the split as its children. Note: When inserting into a leaf node that is already full, we end up with m key/pointer pairs (ie. 1 too many for the node). We therefore create a new node on the right and distribute the m key/pointer pairs in such a way that the first m / 2 pairs remain with the original node and the others move to the new node. In other words: at least half of them remain with the original node. Note: When inserting a new child into an interior node that is already full, we end up with m + 1 children (ie. 1 too many for the node). We again create a new node on the right and distribute the m + 1 children in such a way that (m + 1) / 2 of them remain with the original node and the other move to the new node. In other words: at least half of them remain with the original node.

12 60 27 27 43 60 43 68 18 29 50 68 71 62 64 20 Example: insert the key 31 into the following B+-tree.

60 12 27 27 68 43 60 43 68 18 29 62 71 50 64 20 31 Example: insert the key 33 into the following B+-tree. need to insert into this node but it is full so split the node…

12 31 27 43 60 68 27 60 43 68 71 33 18 50 29 62 20 64 …the new node becomes a sibling of the node that was split so a new key value is inserted at the parent level…

12 31 27 43 60 68 27 60 43 68 71 33 18 50 29 62 20 64 Given that there was enough room in the parent node for the additional key value, there was no need to split the parent node and so the process is complete.

12 74 43 27 43 60 60 68 27 18 29 50 77 68 62 71 74 20 64 Exercise: insert the key 66 into the following B+-tree. need to insert into this node, but it is full so split…

43 43 74 68 50 77 71 For convenience we draw only the right half of the tree as the left half will not be changed… to left half of tree

Deletion from B+-trees • When we want to delete a key/value pair from a B+-tree we first search for the key in the tree to determine the leaf node that contains the key, we then delete the value from the disk (using the pointer that is found in the leaf node) and finally we remove the key/pointer pair from the leaf node. • if the leaf node from which the deletion occurred is still at least half full then we are done • if deletion from the leaf node causes the node to become less than half full then there are two cases to consider…

case 1: if one of the adjacent siblings has more than the minimum number of key/pointer pairs, then one pair can be taken from a sibling so that the node from which the deletion occurred remains at least half full. Appropriate adjustments will need to be made to keys at the parent level. Note the emphasis on the word sibling – a sibling of a node is another node that has the same parent. The algorithm becomes more complicated if you borrow a key/pointer pair from an adjacent node that has a different parent because changes in key values then propagate up a larger segment of the tree.

43 60 68 60 60 43 60 50 62 71 68 50 62 64 Example: delete key 68 from the following B+-tree. Note that only part of the tree is represented.

case 2: neither adjacent sibling has enough key/value pairs that one can be removed. However, in this case the node N from which the deletion occurred can be combined with one of the siblings M. Why? Note that node N is now less than half full. Node M is such that if we remove a key it will also be less than half full. Hence the keys in both N and M can be stored in a single node. When we combine the nodes (effectively deleting one of them) a key and pointer will have to be removed from the parent node. If the parent node still has the required minimum number of children, we are done. Otherwise we recursively apply the deletion algorithm at the parent level and continue up the tree.

60 12 27 68 60 74 27 43 43 50 18 29 77 68 71 62 54 64 74 Example: delete key 27 from the following B+-tree. Note that only part of the tree is represented. Delete from here Not a sibling node!

12 60 74 68 43 18 29 62 77 71 50 64 54

Efficiency of B+-trees B+-trees allow for searching, insertion and deletion of key/value pairs while needing to access the disk very few times for each operation. If the order of the tree is very large, it is rare for a node to have to be split or for two nodes to be merged. When this happens, it is extremely rare that this requires a split or merge operation at the parent level. Hence the number of disk accesses required to reorganize the B+-tree as a result of an insertion or deletion is negligible compared to the disk access required for the actual insert or delete operation. Keep in mind that even databases with millions of records can have corresponding B+-trees that have only three levels because the order of such trees can be in the hundreds (ie. each node can have hundreds of child nodes). By minimizing the number of levels, we minimize the number of disk accesses required for a search, insert or delete operation.

Insertion into B + -trees

Insertion into B + -trees

Presentation Transcript

Phylogenetic Trees Lecture 1

Chapter 5 Trees

Binomial heaps, Fibonacci heaps, and applications

Abstract Syntax

Chapter 6

Classification with Multiple Decision Trees

Advanced Data Structures

B-trees

From Gene Trees to Species Trees

Trees

Trees

Data Structures

Chapter 3 Graphs, Trees, and Tours

Discrete Mathematics

Search Trees: BSTs and B-Trees

Chapter 26 Supplement AVL Trees, Splay Trees, 2-4 Trees, Red - Black Trees

What Do People See When They Look At Trees ?

CHAPTER 9: Trees

“…as the days of a tree…”