Efficient Disk-Based B+ Trees for Maximum Performance and Disk Utilization

Ch 4d: B+ trees
Mark McKenney

Lots of trees, but what happens when memory fills up?
• Performance tanks!
• All the trees we have seen so far assume that they fit in memory
• When memory fills, disk paging comes into play
• To traverse a tree, we need to access nodes that are stored non-sequentially in memory
• How big is a node?
  • A couple of ints and pointers: around 4+4+8+8 = 24 bytes
• What is the minimum amount of data that can be read from memory?
  • Usually a word
• What is the minimum amount of data that can be read from disk?
  • Usually a page: 4 KB
• So, if each node is stored on its own page, we waste 4096-24 = 4072 bytes per read
• 257 reads require about 1 MB of data transfer for about 6 KB of actual data
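The arithmetic on this slide can be checked with a short sketch. The 24-byte node and 4096-byte page are the slide's own figures; the variable names are only for illustration.

```python
# Rough check of the slide's numbers: one 24-byte node stored per 4 KB page.
NODE_BYTES = 4 + 4 + 8 + 8      # two ints and two pointers, as on the slide
PAGE_BYTES = 4096               # disk page size used on the slide

wasted_per_read = PAGE_BYTES - NODE_BYTES

reads = 257
transferred = reads * PAGE_BYTES   # bytes actually moved from disk
useful = reads * NODE_BYTES        # bytes of node data we wanted

print(wasted_per_read)             # 4072
print(transferred / 2**20)         # a little over 1 MB
print(useful / 2**10)              # about 6 KB
```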

So, let's generalize a binary tree to disk: a B tree
• Actually, a B+ tree
  • B trees came out first, and are harder and more complicated
• Approach:
  • Make a node the size of a disk page (fixed!)
  • Make sure that no node is too empty
  • Make sure that the tree is balanced
• What if the actual data is too big to fit in a disk page?
  • Use a key to index the actual data, and store the data on disk in a separate file
• Advantages
  • Maximum disk performance
  • Persistence!
  • Buffering in terms of disk pages
• This is all very database oriented

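B+Tree Example (n=3)
[Figure: example tree]
• Root: (100)
• Non-leaf nodes: (30) and (120 150 180)
• Leaves, in sequence: (3 5 11), (30 35), (100 101 110), (120 130), (150 156 179), (180 200)
• The keys in the tree are stored in their own file; the data is in a separate file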

Sample non-leaf node (keys 57, 81, 95; four pointers, left to right)
• to keys k < 57
• to keys 57 ≤ k < 81
• to keys 81 ≤ k < 95
• to keys k ≥ 95
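Choosing which pointer to follow in a non-leaf node can be sketched with Python's standard `bisect` module. The function name `child_index` is made up for this example; pointer 0 covers keys below 57, and the last pointer covers keys of 95 and above.

```python
from bisect import bisect_right

keys = [57, 81, 95]   # keys of the sample non-leaf node

def child_index(keys, k):
    """Index of the pointer to follow for search key k.

    Pointer 0 covers k < keys[0]; pointer i covers keys[i-1] <= k < keys[i];
    the last pointer covers k >= keys[-1].
    """
    return bisect_right(keys, k)

for k in (10, 57, 80, 81, 95, 200):
    print(k, "-> pointer", child_index(keys, k))
```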

Sample leaf node (keys 57, 81, 95), reached from a non-leaf node
• pointer to record with key 57
• pointer to record with key 81
• pointer to record with key 95
• sequence pointer to next leaf in sequence
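A minimal sketch of a leaf and its sequence pointer, assuming an in-memory stand-in for disk pages. The `Leaf` class, the `scan` helper, the record strings, and the second leaf's keys (100, 101) are all hypothetical, chosen only to show how following the sequence pointers visits keys in order.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Leaf:
    keys: List[int]
    records: List[str]                 # stand-ins for pointers into the data file
    next: Optional["Leaf"] = None      # sequence pointer to the next leaf

def scan(leaf):
    """Yield (key, record) pairs in key order by following sequence pointers."""
    while leaf is not None:
        yield from zip(leaf.keys, leaf.records)
        leaf = leaf.next

second = Leaf([100, 101], ["rec100", "rec101"])
first = Leaf([57, 81, 95], ["rec57", "rec81", "rec95"], next=second)
print([k for k, _ in scan(first)])   # [57, 81, 95, 100, 101]
```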

Size of nodes (fixed): n keys and n+1 pointers
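To get a feel for how large n can be when a node fills one disk page, here is a rough sketch. It assumes 4-byte keys, 8-byte pointers, and no per-page header; real systems will differ on all three.

```python
PAGE_BYTES = 4096   # disk page size from the earlier slide
KEY_BYTES = 4       # assumed: 32-bit integer keys
PTR_BYTES = 8       # assumed: 64-bit page/record pointers

def max_keys_per_page(page=PAGE_BYTES, key=KEY_BYTES, ptr=PTR_BYTES):
    """Largest n such that n keys and n+1 pointers fit in one page."""
    # n*key + (n+1)*ptr <= page  =>  n <= (page - ptr) / (key + ptr)
    return (page - ptr) // (key + ptr)

n = max_keys_per_page()
print(n)   # 340
```

Under these assumptions a single page holds a few hundred keys, which is why the trees stay so shallow.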

Don’t want nodes to be too empty
• Use at least
  • Non-leaf: ⌈(n+1)/2⌉ pointers
  • Leaf: ⌊(n+1)/2⌋ pointers to data

n=3
              Full node      Min. node
Non-leaf      120 150 180    30
Leaf          3 5 11         30 35
(Figure note: counts even if null)

B+tree rules (tree of order n)
(1) All leaves are at the same lowest level (balanced tree)
(2) Pointers in leaves point to records, except for the “sequence pointer”

(3) Number of pointers/keys for a B+tree

                      Max ptrs   Max keys   Min ptrs→data   Min keys
Non-leaf (non-root)   n+1        n          ⌈(n+1)/2⌉       ⌈(n+1)/2⌉ - 1
Leaf (non-root)       n+1        n          ⌊(n+1)/2⌋       ⌊(n+1)/2⌋
Root                  n+1        n          1               1
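The occupancy limits can be written out as a small helper. This is a sketch that assumes the usual convention that the non-leaf minimum rounds up and the leaf minimum rounds down; the function name `limits` and the dictionary layout are made up for illustration.

```python
def limits(n):
    """Pointer/key limits for a B+ tree of order n (n keys per node at most)."""
    ceil_half = (n + 2) // 2     # ceiling of (n+1)/2, using integer arithmetic
    floor_half = (n + 1) // 2    # floor of (n+1)/2
    return {
        "non-leaf": {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": ceil_half, "min_keys": ceil_half - 1},
        "leaf":     {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": floor_half, "min_keys": floor_half},
        "root":     {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": 1, "min_keys": 1},
    }

for kind, row in limits(3).items():
    print(kind, row)
```

For n=3 this gives a minimum of one key in a non-leaf node and two keys in a leaf, matching the minimum nodes (30) and (30 35) in the earlier figure.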

Insert into B+tree
(a) simple case: space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root

(a) Insert key = 32 (n=3)
• Before: root (100); non-leaf node (30); leaves (3 5 11) and (30 31)
• After: the leaf (30 31) has space, so it becomes (30 31 32); nothing else changes

(b) Insert key = 7 (n=3)
• Before: root (100); non-leaf node (30); leaves (3 5 11) and (30 31)
• After: the leaf (3 5 11) is full, so it splits into (3 5) and (7 11)
• The key 7 is added to the parent, which becomes (7 30)
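A leaf split can be sketched as follows, working on plain sorted lists of keys and ignoring record pointers and disk pages. The function name and return shape are assumptions for this example, and the split point (left half keeps the extra key, if any) is one common choice rather than the only one.

```python
from bisect import insort

def insert_into_leaf(keys, key, n):
    """Insert key into a sorted leaf; split if it then holds more than n keys.

    Returns (left_keys, right_keys, separator); right_keys and separator
    are None when no split was needed.
    """
    keys = list(keys)
    insort(keys, key)
    if len(keys) <= n:
        return keys, None, None
    mid = (len(keys) + 1) // 2        # left half keeps the extra key, if any
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]      # separator is copied up and stays in the leaf

print(insert_into_leaf([30, 31], 32, n=3))    # simple case, no split
print(insert_into_leaf([3, 5, 11], 7, n=3))   # leaf overflow
```

The separator returned on a split is the key the parent must add, together with a pointer to the new right-hand leaf.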

(c) Insert key = 160 (n=3)
• Before: root (100); non-leaf node (120 150 180); leaves include (150 156 179) and (180 200)
• After: the leaf (150 156 179) splits into (150 156) and (160 179)
• The parent (120 150 180) is full and cannot take the key 160, so it splits into (120 150) and (180)
• The key 160 moves up into the root, which becomes (100 160)
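A non-leaf split differs from a leaf split in that the middle key moves up instead of being copied. The sketch below assumes keys and children are plain lists; the function name and the child labels "A" to "D" and "C2" are hypothetical placeholders for page pointers.

```python
from bisect import bisect_right

def insert_into_nonleaf(keys, children, sep, new_child, n):
    """Add separator `sep` and the child to its right; split on overflow.

    Returns (left, right, pushed_up), where left and right are
    (keys, children) pairs; right and pushed_up are None without a split.
    """
    keys, children = list(keys), list(children)
    i = bisect_right(keys, sep)
    keys.insert(i, sep)
    children.insert(i + 1, new_child)
    if len(keys) <= n:
        return (keys, children), None, None
    mid = len(keys) // 2
    pushed_up = keys[mid]             # moves up; it is not kept in either half
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, right, pushed_up

# Case (c): the node (120 150 180) receives separator 160 after a leaf split.
# "C" stands for the leaf (150 156) and "C2" for the new leaf (160 179).
result = insert_into_nonleaf([120, 150, 180], ["A", "B", "C", "D"], 160, "C2", n=3)
print(result)
```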

(d) New root, insert 45 (n=3)
• Before: root (10 20 30); leaves (1 2 3), (10 12), (20 25), (30 32 40)
• After: the leaf (30 32 40) splits into (30 32) and (40 45)
• The root (10 20 30) is full and cannot take the key 40, so it splits into (10 20) and (40)
• The key 30 moves up into a new root (30)

Deletion from B+tree
(a) Simple case (no example)
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf

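(b) Coalesce with sibling: delete 50 (n=4)
• Before: parent (10 40 100); leaves (10 20 30) and (40 50)
• After: deleting 50 leaves (40) with too few keys, so it merges with its sibling to give (10 20 30 40)
• The key 40 is removed from the parent, which becomes (10 100)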

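(c) Redistribute keys: delete 50 (n=4)
• Before: parent (10 40 100); leaves (10 20 30 35) and (40 50)
• After: deleting 50 leaves (40) with too few keys, but the sibling (10 20 30 35) is full, so the two cannot merge
• Instead, 35 moves over, giving leaves (10 20 30) and (35 40)
• The parent key 40 becomes 35, so the parent is (10 35 100)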
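The choice between coalescing and redistributing two adjacent leaves can be sketched as below, using plain lists of keys. The function name and return shape are assumptions for illustration; the leaf minimum is taken as the floor of (n+1)/2.

```python
def fix_leaf_underflow(left, right, n):
    """Repair two adjacent leaves after a deletion left one below the minimum.

    Returns (left, right, new_separator). If the leaves were coalesced,
    right and new_separator are None and the parent must drop one key/pointer.
    """
    min_keys = (n + 1) // 2
    if len(left) + len(right) <= n:
        return left + right, None, None          # (b) coalesce into one leaf
    # (c) redistribute: shift keys from the fuller leaf to the emptier one
    left, right = list(left), list(right)
    while len(right) < min_keys:
        right.insert(0, left.pop())
    while len(left) < min_keys:
        left.append(right.pop(0))
    return left, right, right[0]                 # parent separator becomes right[0]

print(fix_leaf_underflow([10, 20, 30], [40], n=4))       # coalesce
print(fix_leaf_underflow([10, 20, 30, 35], [40], n=4))   # redistribute
```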

(d) Non-leaf coalesce: delete 37 (n=4)
• Before: root (25); non-leaf nodes (10 20) and (30 40); leaves (1 3), (10 14), (20 22), (25 26), (30 37), (40 45)
• After: deleting 37 leaves (30) with too few keys, so 30 joins (25 26) to give (25 26 30)
• The non-leaf node (30 40) is left with the single key 40 and too few pointers
• It coalesces with (10 20), pulling the key 25 down from the old root
• The result, (10 20 25 40), is the new root
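When two non-leaf nodes coalesce, the separator key from the parent is pulled down between them. A minimal sketch, with hypothetical leaf labels "L1" to "L5" standing in for child pointers:

```python
def coalesce_nonleaf(left_keys, left_children, sep, right_keys, right_children, n):
    """Merge two sibling non-leaf nodes, pulling down the parent's separator."""
    keys = left_keys + [sep] + right_keys
    if len(keys) > n:
        raise ValueError("siblings are too full to coalesce; redistribute instead")
    return keys, left_children + right_children

# Case (d): (10 20) and (40) merge around the root key 25.
keys, children = coalesce_nonleaf(
    [10, 20], ["L1", "L2", "L3"], 25, [40], ["L4", "L5"], n=4
)
print(keys)       # [10, 20, 25, 40]
print(children)   # five child pointers for four keys
```

If the parent was the root and held only that one separator, the merged node takes its place as the new root and the tree gets one level shorter.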

B+tree deletions in practice
• Often, coalescing is not implemented
  • Too hard and not worth it!

Characteristics
• B+ trees are typically short and bushy
  • We want searches to touch few nodes, since the nodes are on disk
• For 100 elements in a node:
  • A tree of height 1 can index 100 items
  • A tree of height 2 can index 100 * 100 = 10,000 items
  • A tree of height 3 can index 100 * 100 * 100 = 1,000,000 items
• So, we can find an item in that tree by looking at 3 nodes, despite the huge number of items
  • Equates to 3 disk reads. Very IO efficient
• Databases make heavy use of B trees (usually B+ trees)
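The height arithmetic generalizes easily. This sketch uses the slide's simplification that every node holds exactly `fanout` elements; both function names are made up for the example.

```python
def capacity(fanout, height):
    """Items indexable by a tree of the given height, using the slide's model."""
    return fanout ** height

def height_needed(items, fanout):
    """Smallest height whose capacity covers `items` (integer arithmetic only)."""
    height, cap = 1, fanout
    while cap < items:
        cap *= fanout
        height += 1
    return height

for h in (1, 2, 3):
    print(h, capacity(100, h))
print(height_needed(1_000_000, 100))   # 3
```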

A final note
• How to locate an element in a node?
  • They are sorted… use a binary search!
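A hand-written binary search over a node's sorted keys, as a sketch. The probe counter is not part of the algorithm; it is included only to show how few comparisons a 100-key node needs.

```python
def binary_search(keys, target):
    """Return (index, probes); index is None if target is not in keys."""
    lo, hi = 0, len(keys) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if keys[mid] == target:
            return mid, probes
        if keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes

node = list(range(0, 200, 2))            # a hypothetical node with 100 sorted keys
worst = max(binary_search(node, k)[1] for k in node)
print(worst)                             # at most 7 probes for 100 keys
```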

So.. Complexity?
• We now have a new type of complexity: IO complexity
  • IOs are disk (secondary storage) IOs, the slowest IOs in a computer system
• So we need an IO complexity as well as a computational complexity, but IO complexity reigns
• So, for a B+ tree storing n items, with a minimum of a and a maximum of b children per node, and a block size (disk page size) of B:
  • Number of leaf blocks is O(n/B)
  • IO complexity for all operations is O(log_B n)
  • Height of tree is Ω(log_b n) and O(log_a n): shortest when every node is full, tallest when every node is at its minimum
  • Time complexity to find is between Ω( f(a) log_b n ) and O( f(b) log_a n )
  • Where f(b) is the time to find an element in a node with b children
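The two logarithms can be compared numerically. A tree is shortest when every node is full (fan-out b) and tallest when every node is at its minimum (fan-out a), so log_b n is the smaller of the two. The sketch below ignores rounding and the root's special case, and the values a=50 and b=100 are arbitrary illustrations.

```python
import math

def height_bounds(n, a, b):
    """Approximate (shortest, tallest) height for n items with fan-out in [a, b]."""
    shortest = math.log(n) / math.log(b)   # every node full
    tallest = math.log(n) / math.log(a)    # every node at its minimum
    return shortest, tallest

lo, hi = height_bounds(1_000_000, a=50, b=100)
print(round(lo, 2), round(hi, 2))          # roughly 3.0 and 3.53
```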

Always remember your bandwidth
http://hothardware.com/News/Homing-Pigeon-Faster-Than-Internet-in-Data-Transfer/
• Time to transfer 4 GB at 2.04 Mbits per second: 4 hours, 39 minutes, and 37 seconds
• Time to transfer 2.57 PB (2,570,000 GB) at 2.04 Mbits per second: 130,821 days, 12 hours, 32 minutes, 13.54 seconds, or about 358 years!
• Size of a hard drive: 0.01 cubic feet
• Cargo capacity of a Toyota Yaris: 25.7 cubic feet
• Number of hard drives I can transport: 2570
• If these are 1 TB hard drives, that’s 2.57 PB, or roughly 20.56 petabits
• Time to drive to Chicago: 5 hours = 18,000 seconds
• Which gives a bandwidth of 1.14 Tbits/second, or about 142 GB/second
And so the saying is: “Never underestimate the bandwidth of a station wagon loaded with hard drives hurtling down the highway at 70mph”
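The car-full-of-drives estimate can be reproduced in a few lines. All inputs are the slide's own figures, with 1 TB taken as 10^12 bytes; the constant names are just for illustration.

```python
DRIVE_VOLUME_FT3 = 0.01      # size of one hard drive
CARGO_FT3 = 25.7             # cargo capacity of the car
DRIVE_BYTES = 1e12           # 1 TB per drive (decimal terabyte assumed)
TRIP_SECONDS = 5 * 3600      # five-hour drive

drives = round(CARGO_FT3 / DRIVE_VOLUME_FT3)
total_bits = drives * DRIVE_BYTES * 8
bits_per_second = total_bits / TRIP_SECONDS

print(drives)                      # 2570
print(total_bits / 1e15)           # about 20.56 petabits
print(bits_per_second / 1e12)      # about 1.14 Tbit/s
print(bits_per_second / 8 / 1e9)   # a bit over 142 GB/s
```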
