
Data Structures



Presentation Transcript


  1. Data Structures, Lecture 5: B-Trees. Eran Halperin and Hanoch Levy, March 2014

  2. How does a binary tree compare with a k-ary tree? • Binary worse: higher height • Cost is log k • Binary better: lower width • Cost is k • OVERALL: BINARY BETTER! • SO WHY BOTHER WITH K-ary?

  3. Idealized computation model: CPU ↔ RAM. Each instruction takes one unit of time; each memory access takes one unit of time.

  4. A more realistic model: a memory hierarchy CPU → Cache → RAM → Disk. Each level is much larger but much slower than the one above it. Information is moved between levels in blocks.

  5. A simplified I/O model: CPU ↔ RAM ↔ Disk. Each block is of size m. Count both ordinary operations and I/O operations.

  6. Data structures in the I/O model: linked lists and search trees behave poorly in the I/O model, since each pointer followed may cause a disk access. We need an alternative to binary search trees that is better suited to the I/O model: B-Trees!

  7. A 4-node: keys 10, 25, 42, with subtrees for key < 10, 10 < key < 25, 25 < key < 42, and 42 < key. 3 keys, 4-way branch.

  8. An r-node: keys k0, k1, …, kr−2 above children c0, c1, …, cr−1. r−1 keys, r-way branch.

  9. B-Trees (with minimum degree d) • Each node holds between d−1 and 2d−1 keys • Each non-leaf node has between d and 2d children • The root is special: it has between 1 and 2d−1 keys, and between 2 and 2d children (if not a leaf) • All leaves are at the same depth
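These invariants can be checked mechanically. A minimal sketch in Python, assuming a node object with list-backed `key` and `child` fields (the `Node` class and all names here are hypothetical, for illustration only):

```python
def check(x, d, is_root=True, depth=0, leaf_depths=None):
    """Verify the B-tree invariants from the slide for the subtree at x."""
    if leaf_depths is None:
        leaf_depths = set()
    lo, hi = (1 if is_root else d - 1), 2 * d - 1
    assert lo <= len(x.key) <= hi, "key count out of range"
    assert x.key == sorted(x.key), "keys must be sorted"
    if x.child:                          # non-leaf: r-way branch, r-1 keys
        assert len(x.child) == len(x.key) + 1
        for c in x.child:
            check(c, d, False, depth + 1, leaf_depths)
    else:                                # leaf: record its depth
        leaf_depths.add(depth)
    assert len(leaf_depths) <= 1, "all leaves must be at the same depth"
    return True

class Node:
    """Bare-bones node for the check above (hypothetical layout)."""
    def __init__(self, keys, children=()):
        self.key, self.child = keys, list(children)

# The 2-4 tree (minimum degree d = 2) from slide 10
left = Node([4, 6, 10], [Node([1, 3]), Node([5]), Node([7]), Node([11])])
right = Node([15, 28], [Node([14]), Node([16, 17]), Node([30, 40, 50])])
assert check(Node([13], [left, right]), d=2)
```

Note that the key-count bounds together with `len(child) == len(key) + 1` already imply the slide's bound of d to 2d children per non-leaf node.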

  10. A 2-4 tree: a B-Tree with minimum degree d = 2. Example: root [13]; internal nodes [4 6 10] and [15 28]; leaves [1 3], [5], [7], [11], [14], [16 17], [30 40 50].

  11. Node structure • r – the degree • key[0], …, key[r−2] – the keys • item[0], …, item[r−2] – the associated items • child[0], …, child[r−1] – the children • leaf – is the node a leaf? • Possibly a different representation for leaves
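The fields on this slide can be written down as a small record type. A minimal sketch in Python, assuming list-backed fields; the field names follow the slide, everything else is illustrative:

```python
class BTreeNode:
    """An r-node as on the slide: r-1 keys and an r-way branch.

    Field names (key, item, child, leaf) follow the slide; the
    list-backed layout is an assumption for illustration.
    """
    def __init__(self, leaf=True):
        self.key = []     # key[0], ..., key[r-2], kept in sorted order
        self.item = []    # item[i] is the value associated with key[i]
        self.child = []   # child[0], ..., child[r-1]; child[i] holds keys < key[i]
        self.leaf = leaf  # leaves could use a different representation

    def degree(self):
        """r, the branching degree: one more than the number of keys."""
        return len(self.key) + 1

# The 4-node from slide 7: keys 10, 25, 42
node = BTreeNode()
node.key = [10, 25, 42]
assert node.degree() == 4   # 3 keys, 4-way branch
```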

  12. The height of B-Trees • At depth 1 we have at least 2 nodes • At depth 2 we have at least 2d nodes • At depth 3 we have at least 2d^2 nodes • … • At depth h we have at least 2d^(h−1) nodes
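Summing the per-level counts from the slide gives the familiar logarithmic height bound. A quick numeric check of the argument (a sketch; the helper names are hypothetical):

```python
import math

def min_keys(d, h):
    """Minimum number of keys in a B-tree of minimum degree d and height h:
    the root holds >= 1 key, and each of the >= 2*d**(i-1) nodes at
    depth i (for i = 1..h) holds >= d-1 keys."""
    return 1 + sum(2 * d**(i - 1) * (d - 1) for i in range(1, h + 1))

# The geometric sum telescopes to the closed form 1 + 2*(d**h - 1) ...
assert min_keys(1000, 3) == 1 + 2 * (1000**3 - 1)

# ... so n >= 2*d**h - 1, i.e. h <= log_d((n+1)/2) = O(log_d n).
n, d = 10**9, 1000
assert min_keys(d, 3) >= n              # a billion keys fit in height 3
assert math.log((n + 1) / 2, d) < 3
```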

  13. Red-Black Trees vs. B-Trees: n = 2^30 ≈ 10^9. 30 ≤ height of a Red-Black tree ≤ 60, so up to 60 pages are read from disk. The height of a B-Tree with d = 1000 is only 3; each B-Tree node resides in a block/page, so only 3 (or 4) pages are read from disk. Disk access ≈ 1 millisecond (10^-3 sec); memory access ≈ 100 nanoseconds (10^-7 sec).

  14. Search: look for k in node x, then look for k in the subtree rooted at node x • Number of I/Os – O(log_d n) • Number of operations – O(d · log_d n) • Number of operations with binary search within each node – O(log_2 d · log_d n) = O(log_2 n)
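This search can be sketched in a few lines of Python, assuming the list-backed node layout from the node-structure slide (the `Node` class and its field names are hypothetical); `bisect` provides the O(log_2 d) binary search within a node:

```python
import bisect

class Node:
    """Minimal B-tree node (hypothetical field names from slide 11)."""
    def __init__(self, keys, children=None):
        self.key = keys                 # sorted keys
        self.child = children or []     # child[i] holds keys below key[i]
        self.leaf = not self.child

def btree_search(x, k):
    """Return the node containing k in the subtree rooted at x, or None."""
    i = bisect.bisect_left(x.key, k)    # binary search: O(log2 d) operations
    if i < len(x.key) and x.key[i] == k:
        return x
    if x.leaf:
        return None
    return btree_search(x.child[i], k)  # one I/O per level: O(log_d n) I/Os

# Example: the 2-4 tree from slide 10 (root 13)
leaves = [Node([1, 3]), Node([5]), Node([7]), Node([11])]
left = Node([4, 6, 10], leaves)
right = Node([15, 28], [Node([14]), Node([16, 17]), Node([30, 40, 50])])
root = Node([13], [left, right])
assert btree_search(root, 16).key == [16, 17]
assert btree_search(root, 12) is None
```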

  15. B-Trees – what are they good for? • Large-degree B-trees are used to represent very large disk dictionaries; the minimum degree d is chosen according to the size of a disk block. • Smaller-degree B-trees are used for internal-memory dictionaries to overcome cache-miss penalties. • B-trees with d = 2, i.e., 2-4 trees, are very similar to Red-Black trees.

  16. Rotate right / Rotate left: a key moves up from one child into the parent, and the parent's separating key moves down into the sibling, together with one subtree. Rotate left is the mirror image of rotate right.

  17. Split (a full node): the middle key B of the full node moves up into the parent, and the remaining keys are divided into two nodes of d−1 keys each. The inverse operation is Join.

  18.–19. Insert(T,2): search down from the root [13] to the leaf [1 3] and add 2, giving [1 2 3].

  20.–21. Insert(T,4): add 4 to the leaf [1 2 3], giving [1 2 3 4]; with more than 2d−1 = 3 keys, the leaf overflows.

  22.–24. Split: the overflowing leaf [1 2 3 4] is split; 2 moves up into the parent, which becomes [2 5 10], and the leaf splits into [1] and [3 4].

  25. Splitting an overflowing node (2d keys): the middle key B moves up into the parent; the remaining keys form two nodes, one with d−1 keys and one with d keys.

  26.–27. Another insert, Insert(T,7): add 7 to the leaf [6], giving [6 7].

  28.–29. And another insert, Insert(T,8): add 8 to the leaf [6 7], giving [6 7 8].

  30. And the last for today, Insert(T,9): add 9 to the leaf [6 7 8], giving [6 7 8 9], which overflows.

  31.–32. Split: 7 moves up into the parent, which becomes [2 5 7 10], and the leaf splits into [6] and [8 9]; now the parent itself overflows.

  33.–34. Split: 5 moves up into the root, which becomes [5 13], and the overflowing node splits into [2] and [7 10].

  35. Insert – bottom up • Find the insertion point by a downward search • Insert the key in the appropriate place • If the current node is overflowing, split it • If its parent is now overflowing, split it, etc. • Disadvantages: • Need both a downward scan and an upward scan • Nodes are temporarily overflowing • Need to keep the parents on a stack

  36. Split-Root(T): a new root is created; the middle key of the old (full) root moves up into it, and the old root splits into two nodes of d−1 keys each. Number of I/Os – O(1). Number of operations – O(d).

  37. Split-Child(x,i): the full child x.child[i] is split; its middle key moves up into x as key[i], and its remaining keys form two children of d−1 keys each. Number of I/Os – O(1). Number of operations – O(d).
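A sketch of Split-Child in Python, under the same assumptions as before (a hypothetical `Node` class with list-backed `key` and `child` fields, keys only):

```python
class Node:
    """Minimal B-tree node (hypothetical field names)."""
    def __init__(self, keys, children=None):
        self.key = keys
        self.child = children if children is not None else []
        self.leaf = not self.child

def split_child(x, i, d):
    """Split the full child y = x.child[i], which holds 2d-1 keys.

    The median key y.key[d-1] moves up into x at position i; y keeps
    the first d-1 keys and a new right sibling z gets the last d-1 keys.
    """
    y = x.child[i]
    z = Node(y.key[d:], y.child[d:])    # new right sibling
    z.leaf = y.leaf
    x.key.insert(i, y.key[d - 1])       # median moves up into x
    x.child.insert(i + 1, z)
    y.key = y.key[:d - 1]
    y.child = y.child[:d]

# Example with d = 2: split the full leaf [1, 2, 3] under the root [10]
root = Node([10], [Node([1, 2, 3]), Node([20, 30])])
split_child(root, 0, 2)
assert root.key == [2, 10]
assert root.child[0].key == [1] and root.child[1].key == [3]
```

Only x, the full child, and the new sibling are touched, which is why the slide counts O(1) I/Os and O(d) operations.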

  38. Insert – top down • While conducting the search, split full children on the search path before descending to them! • Number of I/Os – O(log_d n) • Number of operations – O(d · log_d n) • Amortized number of splits – O(1)

  39. Insert – top down • Number of I/Os – O(log_d n) • Number of operations – O(d · log_d n) • Amortized number of splits – O(1) • Argument: • Each split increases the number of nodes by 1 • # nodes ≤ # keys = # inserts • Therefore # splits ≤ # inserts
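The top-down scheme can be sketched end to end. A minimal sketch in Python (keys only, distinct keys, with the minimum degree d passed explicitly; the `Node` class and helper names are hypothetical):

```python
class Node:
    """Minimal B-tree node (hypothetical field names)."""
    def __init__(self, keys=None, children=None):
        self.key = keys or []
        self.child = children or []
        self.leaf = not self.child

def split_child(x, i, d):
    """Split the full child x.child[i] (2d-1 keys); median moves up into x."""
    y = x.child[i]
    z = Node(y.key[d:], y.child[d:])
    z.leaf = y.leaf
    x.key.insert(i, y.key[d - 1])
    x.child.insert(i + 1, z)
    y.key, y.child = y.key[:d - 1], y.child[:d]

def insert(tree_root, k, d):
    """Top-down insert: split every full node met on the way down, so no
    node ever overflows and no upward pass is needed. Returns the root."""
    root = tree_root
    if len(root.key) == 2 * d - 1:              # Split-Root
        root = Node([], [tree_root])
        split_child(root, 0, d)
    x = root
    while not x.leaf:
        i = sum(1 for key in x.key if key < k)  # branch index, O(d) ops
        if len(x.child[i].key) == 2 * d - 1:    # split BEFORE descending
            split_child(x, i, d)
            if k > x.key[i]:
                i += 1
        x = x.child[i]
    x.key.insert(sum(1 for key in x.key if key < k), k)
    return root

def keys_inorder(x):
    """In-order key sequence, for checking the result."""
    if x.leaf:
        return list(x.key)
    out = []
    for i, c in enumerate(x.child):
        out.extend(keys_inorder(c))
        if i < len(x.key):
            out.append(x.key[i])
    return out

root = Node()
for k in range(1, 10):          # insert 1..9 into an empty tree, d = 2
    root = insert(root, k, 2)
assert keys_inorder(root) == list(range(1, 10))
```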

  40. Bottom-Up Deletions from B-Trees • As always: similar to, but slightly more complicated than, insertions • To delete an item in an internal node, replace it by its successor (or predecessor) and delete the successor (or predecessor) • To delete from a leaf, delete the relevant key, and if the leaf now has too few keys, fix the tree using rotations and joins

  41. Split (a full node): the middle key B moves up into the parent, and the remaining keys form two nodes of d−1 keys each. The inverse operation, used in deletions, is Join.

  42. Rotate right / Rotate left: a key moves up from one child into the parent, and the parent's separating key moves down into the sibling, together with one subtree.

  43.–44. Delete, delete(T,26): 26 is in the leaf [24 26]; remove it, leaving [24].

  45.–46. Delete, delete(T,13): 13 is in the internal node [10 13]; replace it with its predecessor 12 (the largest key in its left subtree), giving [10 12], and then delete 12 from the leaf [11 12].

  47. Deleting 12 from the leaf [11 12] leaves [11].

  48.–49. Delete, delete(T,24): remove 24 from the leaf [24]; the leaf is left with too few keys.

  50. Delete (steal from sibling): fix the underflow by a rotation: the separating key 28 moves down from the parent into the empty leaf, and 30 moves up from the sibling [30 40 50]; the parent becomes [22 30] and the leaves become [20], [28], [40 50].
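The "steal from sibling" fix is a rotation and can be sketched directly. A minimal sketch in Python (hypothetical `Node` class and field names, keys only):

```python
class Node:
    """Minimal B-tree node (hypothetical field names)."""
    def __init__(self, keys=None, children=None):
        self.key = keys or []
        self.child = children or []
        self.leaf = not self.child

def steal_from_right(x, i):
    """Fix an underflowing child x.child[i] by a left rotation:
    the separating key x.key[i] moves down into the child, and the
    smallest key of the right sibling moves up to replace it."""
    child, sib = x.child[i], x.child[i + 1]
    child.key.append(x.key[i])            # separator comes down
    x.key[i] = sib.key.pop(0)             # sibling's minimum goes up
    if not sib.leaf:                      # sibling's first subtree moves over
        child.child.append(sib.child.pop(0))

# The deck's example: parent [22 28] with children [20], [] (underflow),
# and [30 40 50]
parent = Node([22, 28], [Node([20]), Node([]), Node([30, 40, 50])])
steal_from_right(parent, 1)
assert parent.key == [22, 30]
assert parent.child[1].key == [28]
```

The mirror operation (stealing from the left sibling) is symmetric: the separator comes down and the left sibling's maximum goes up.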
