CSE 326: Killer Bee-Trees

David Kaplan, Dept of Computer Science & Engineering, Autumn 2001

Presentation Transcript


  1. CSE 326: Killer Bee-Trees. Where was that root? David Kaplan, Dept of Computer Science & Engineering, Autumn 2001

  2. Beyond Binary Trees
     • One of the most important applications for search trees is databases
     • If the DB is small enough to fit into RAM, almost any scheme for balanced trees (e.g. AVL) is okay
     • 1980: RAM 1 MB; DB 100 MB
     • 2000 (WalMart): RAM 1,000 MB (gigabyte); DB 1,000,000 MB (terabyte)
     The gap between disk and main memory is growing!
     CSE 326 Autumn 2001

  3. Time Gap
     • For many corporate and scientific databases, the search tree must mostly be on disk
     • Accessing disk is 200,000x slower than RAM!
     • Visiting a node means accessing disk
     • Even perfectly balanced binary trees are a disaster! $\log_2(10{,}000{,}000) \approx 24$ disk accesses
     • What matters here? The relative speeds of memory and disk operations
     Goal: Decrease Height of Tree

  4. Food Pyramid for Silicon-Based Lifeforms
     [Figure: pyramid with speed at the top and space at the bottom; "seams" separate the layers]
     • cache: on-chip silicon
     • memory: off-chip silicon
     • disk: ferro-magnetic

  5. Interpreting the Food Pyramid
     Speed becomes plentiful as we move up the pyramid. Space gets (dirt) cheap as we move down the pyramid.
     Physical seams
     • Disk access is about $10^5\times$ slower than memory access
     • Memory access is about $10^2\times$ slower than cache access
     Computational seam
     • Serialization/de-serialization can impose a ~10x penalty
     Seams argue for data structures like B-Trees, because seams make the constants matter!

  6. M-ary Search Tree
     • Maximum branching factor of M
     • Complete tree has depth $= \log_M N$
     • Each internal node in a complete tree has M - 1 keys
     runtime:

  7. B-Trees
     • B-Trees are specialized M-ary search trees
     • Each node has many keys
     • the subtree between two keys x and y contains values v such that $x \le v < y$
     • binary search within a node to find the correct subtree
     • Each node takes one full {page, block, line} of memory
     [Figure: a node with keys 3, 7, 12, 21 and five subtrees covering $x < 3$, $3 \le x < 7$, $7 \le x < 12$, $12 \le x < 21$, $21 \le x$]
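A minimal sketch of the "binary search within a node" step, in Python. It assumes a node's search keys sit in a sorted list and that subtree i holds the values v with keys[i-1] ≤ v < keys[i], as on the slide; the name `child_index` is made up for illustration.

```python
from bisect import bisect_right

def child_index(keys, value):
    """Return the index of the subtree that may contain `value`.

    Subtree 0 holds values below keys[0]; subtree i holds values v with
    keys[i-1] <= v < keys[i]; the last subtree holds values >= keys[-1].
    """
    # bisect_right counts how many keys are <= value, which is exactly
    # the index of the subtree to descend into.
    return bisect_right(keys, value)

keys = [3, 7, 12, 21]          # the node drawn on the slide
for v in (1, 3, 6, 7, 20, 21, 99):
    print(v, "-> subtree", child_index(keys, v))
```

Because the search inside a node runs in memory, its cost is small next to the one disk access needed to fetch the node.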

  8. B-Tree Properties‡ (‡ Technically, B+-Trees)
     Properties
     • maximum branching factor of M
     • the root has between 2 and M children, or at most L keys (when the root is itself a leaf)
     • other internal nodes have between $\lceil M/2 \rceil$ and M children
     • internal nodes contain only search keys (no data)
     • smallest datum between search keys x and y equals x
     • each (non-root) leaf contains between $\lceil L/2 \rceil$ and L keys
     • all leaves are at the same depth
     Result
     • tree is $\Theta(\log n)$ deep
     • all operations run in $\Theta(\log n)$ time
     • operations pull in about M or L items at a time

  12. When constants matter, or … When Big-O is Not Enough
     B-Tree is about $\log_{M/2}\bigl(n/(L/2)\bigr)$ deep
     $= \log_{M/2} n - \log_{M/2}(L/2) = O(\log_{M/2} n) = O(\log n)$ steps per operation (same as BST!)
     Where's the beef?!
     • $\log_2(10{,}000{,}000) \approx 24$ disk accesses
     • $\log_{200/2}(10{,}000{,}000) < 4$ disk accesses
     Recall: disk access is ~200,000x slower than RAM access, so 4 vs. 24 disk accesses is huge!
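The arithmetic on this slide can be checked with a short Python sketch. The helper names are made up for illustration, and the B-Tree figure uses the slide's rough bound (log base M/2 of n), not an exact height.

```python
import math

def binary_tree_accesses(n):
    """Levels in a perfectly balanced binary tree over n keys, rounded up."""
    return math.ceil(math.log2(n))

def btree_accesses(n, branching):
    """The slide's rough bound: log base (branching / 2) of n, rounded up."""
    return math.ceil(math.log(n, branching / 2))

n = 10_000_000
print(binary_tree_accesses(n))   # 24, since log2(n) is about 23.25
print(btree_accesses(n, 200))    # 4, since log base 100 of n is 3.5
```

Both trees are O(log n); only the base of the logarithm differs, and that constant decides whether a lookup costs 24 disk accesses or 4.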

  13. B-Tree Node Structure
     Internal node (slots 1 … M - 1: k1, k2, …, ki, then unused slots)
     • i search keys; i+1 subtrees; M - i - 1 inactive entries
     Leaf (slots 1 … L: k1, k2, …, kj, then unused slots)
     • j data keys; L - j inactive entries

  14. Example B-Tree
     B-Tree with M = 4 and L = 4
     [Figure: example tree diagram]

  15. Making a B-Tree (M = 3, L = 2)
     • The empty B-Tree
     • Insert(3): the leaf holds 3
     • Insert(14): the leaf holds 3 14
     • Now, Insert(1)?

  16. Splitting the Root
     • Insert(1): the leaf holds 1 3 14. Too many keys in a leaf!
     • So, split the leaf (1 3 and 14).
     • And create a new root (14).

  17. Insertions and Split Ends
     • Insert(26)
     • Insert(59): Too many keys in a leaf!
     • So, split the leaf.
     • And add a new child.
     [Figure: tree diagrams for each step]

  18. Propagating Splits
     • Insert(5)
     • Add new child (5)
     • Too many keys in an internal node!
     • So, split the node.
     • Create a new root
     [Figure: tree diagrams for each step]

  19. B-Tree Insertion
     • Insert the key in its leaf
     • If the leaf ends up with L+1 items, overflow!
       • Split the leaf into two nodes: original with $\lceil (L+1)/2 \rceil$ items, new one with $\lfloor (L+1)/2 \rfloor$ items
       • Add the new child to the parent
       • If the parent ends up with M+1 items, overflow!
     • If an internal node ends up with M+1 items, overflow!
       • Split the node into two nodes: original with $\lceil (M+1)/2 \rceil$ items, new one with $\lfloor (M+1)/2 \rfloor$ items
       • Add the new child to the parent
       • If the parent ends up with M+1 items, overflow!
     • If the root overflows, split it in two and hang the new nodes under a new root (the only way tree height can increase)
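The insertion steps above can be sketched in Python as follows. This is an in-memory illustration under stated assumptions, not the course's implementation: the class names (`Leaf`, `Internal`, `BPlusTree`) and the helper `leaves` are hypothetical, keys are assumed distinct, "items" in an internal node is read as children, and no disk paging is modelled.

```python
from bisect import bisect_right, insort

class Leaf:
    def __init__(self, items=None):
        self.items = items or []            # sorted data keys, at most L

class Internal:
    def __init__(self, keys, children):
        self.keys = keys                    # len(children) - 1 search keys
        self.children = children            # at most M subtrees

class BPlusTree:
    def __init__(self, M=3, L=2):
        self.M, self.L = M, L
        self.root = Leaf()

    def insert(self, key):
        split = self._insert(self.root, key)
        if split:                           # root overflowed: height grows here only
            sep, right = split
            self.root = Internal([sep], [self.root, right])

    def _insert(self, node, key):
        """Insert key below node; return (separator, new_right_node) on a split."""
        if isinstance(node, Leaf):
            insort(node.items, key)
            if len(node.items) <= self.L:
                return None
            mid = (len(node.items) + 1) // 2        # original keeps ceil((L+1)/2)
            right = Leaf(node.items[mid:])
            node.items = node.items[:mid]
            return right.items[0], right            # smallest datum on the right
        i = bisect_right(node.keys, key)
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        sep, right = split                          # add the new child to this parent
        node.keys.insert(i, sep)
        node.children.insert(i + 1, right)
        if len(node.children) <= self.M:
            return None
        mid = (len(node.children) + 1) // 2         # original keeps ceil((M+1)/2) children
        up = node.keys[mid - 1]                     # this key moves up to the parent
        new_right = Internal(node.keys[mid:], node.children[mid:])
        node.keys, node.children = node.keys[:mid - 1], node.children[:mid]
        return up, new_right

def leaves(node):
    """Return the leaf contents from left to right."""
    if isinstance(node, Leaf):
        return [node.items]
    return [leaf for child in node.children for leaf in leaves(child)]

tree = BPlusTree(M=3, L=2)
for k in (3, 14, 1, 26, 59, 5, 89, 79):     # the insert sequence used in the slides
    tree.insert(k)
print(tree.root.keys, leaves(tree.root))
```

Returning the split from the recursive call is one way to propagate an overflow upward without parent pointers; an implementation that keeps parent pointers would walk up the tree instead.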

  20. After More Routine Inserts
     • Insert(89)
     • Insert(79)
     [Figure: tree diagrams before and after the inserts]

  21. Deletion
     • Delete(59)
     [Figure: tree diagrams before and after the deletion]

  22. Deletion and Adoption
     • Delete(5)
     • A leaf has too few keys!
     • So, borrow from a neighbor
     [Figure: tree diagrams for each step]

  23. Deletion with Propagation
     • Delete(3)
     • A leaf has too few keys! And no neighbor with surplus!
     • So, delete the leaf
     • But now a node has too few subtrees!
     [Figure: tree diagrams for each step]

  24. Finishing the Propagation (More Adoption)
     • Adopt a neighbor
     [Figure: tree diagrams before and after the adoption]

  25. A Bit More Adoption
     • Delete(1) (adopt a neighbor)
     [Figure: tree diagrams before and after the deletion]

  26. Pulling out the Root
     • Delete(26)
     • A leaf has too few keys! And no neighbor with surplus!
     • So, delete the leaf
     • A node has too few subtrees and no neighbor with surplus!
     • Delete the leaf
     • But now the root has just one subtree!
     [Figure: tree diagrams for each step]

  27. Pulling out the Root (cont.)
     • The root has just one subtree! But that's silly!
     • Just make the one child the new root!
     [Figure: tree diagrams before and after]

  28. B-Tree Deletion
     • Remove the key from its leaf
     • If the leaf ends up with fewer than $\lceil L/2 \rceil$ items, underflow!
       • Adopt data from a neighbor; update the parent
       • If borrowing won't work, delete the node and divide its keys between neighbors
       • If the parent ends up with fewer than $\lceil M/2 \rceil$ items, underflow!
     Why will dumping keys always work if borrowing doesn't?
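A small Python sketch of the leaf-level repair, under simplifying assumptions: leaves are plain sorted lists, only one adjacent neighbor is considered, and "dividing keys between neighbors" is reduced to merging into that one neighbor. The name `rebalance_leaves` is made up for illustration, and the caller is assumed to update the parent's search keys afterwards.

```python
import math

def rebalance_leaves(left, right, L):
    """Repair two adjacent sorted leaves after one of them underflowed.

    Returns the leaves that remain: two after an adoption, one after a merge.
    """
    min_items = math.ceil(L / 2)
    if len(left) < min_items and len(right) > min_items:
        left.append(right.pop(0))       # adopt the smallest key of the right leaf
    elif len(right) < min_items and len(left) > min_items:
        right.insert(0, left.pop())     # adopt the largest key of the left leaf
    elif len(left) < min_items or len(right) < min_items:
        # No surplus anywhere: the neighbor holds exactly ceil(L/2) keys and
        # the underflowing leaf holds ceil(L/2) - 1, so together they hold
        # 2 * ceil(L/2) - 1 <= L keys and always fit in one leaf.
        merged = left + right
        assert len(merged) <= L
        return [merged]
    return [left, right]

print(rebalance_leaves([1, 3], [], L=2))   # as in Delete(5): borrow from a neighbor
print(rebalance_leaves([1], [], L=2))      # as in Delete(3): no surplus, so merge
```

The comment in the merge branch is one answer to the slide's question: a neighbor that cannot lend a key is at its minimum, so the combined keys never exceed L.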

  29. B-Tree Deletion, part Bee
     • If a node ends up with fewer than $\lceil M/2 \rceil$ items, underflow!
       • Adopt subtrees from a neighbor; update the parent
       • If borrowing won't work, delete the node and divide its subtrees between neighbors
       • If the parent ends up with fewer than $\lceil M/2 \rceil$ items, underflow!
     • If the root ends up with only one child, make the child the new root of the tree (the only way tree height can decrease)

  30. Thinking about B-Trees
     • B-Tree insertion can cause (expensive) splitting and propagation
     • Deletion can cause (cheap) borrowing or (expensive) propagation
     • Repeated insertions and deletions can cause thrashing
     • Propagation is rare if M and L are large (Why?)
     • Disk reads/writes occur on:
       • Find, but finds will often get filled from virtual memory
       • Insert or Delete, but usually affects only the leaf node (unless we over/underflow)
       • Overflow or underflow, but we size M and L to make these rare
     Here's the beef!
     • M = L = 128, h = 4 $\Rightarrow 128^5$ = 30+ billion items!
     • M = L = 128, h = 5 $\Rightarrow 128^6$ = 4+ trillion items!
     … and, disk accesses are bounded by height!
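The capacity figures can be reproduced with a couple of lines of Python. The sketch assumes h counts the levels of internal nodes above the leaves and that every node is completely full, which is how the slide's $128^5$ for h = 4 works out; `max_items` is a made-up helper name.

```python
def max_items(M, L, h):
    """Most data items a tree can hold: M**h full leaves of L items each."""
    return M ** h * L

for h in (4, 5):
    print(h, f"{max_items(128, 128, h):,}")
# 4 34,359,738,368        (30+ billion)
# 5 4,398,046,511,104     (4+ trillion)
```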

  31. Flavors of B-
     • B-Trees with M = 3, L = x are called 2-3 trees
     • B-Trees with M = 4, L = x are called 2-3-4 trees (internal nodes have 2, 3 or 4 children)
     • Useful for achieving the $\log_M(n)$ bound, prototyping
     • Disk-based systems typically require much larger M, L
     • Base M, L on {page, block, line} and key sizes
     • Branching factors from 50 to 2000 are common
     "Where was that root?" "Paged out to disk, I think …"

  32. Dictionary Alternatives
     • BST: fast finds, inserts, and deletes, O(log n) on average (if data is random!)
     • AVL trees: guaranteed O(log n) operations
     • Splay trees: guaranteed amortized O(log n) operations
     • B-Trees: guaranteed $O(\log_M n)$
       • Shallower depth better for disk-based databases
       • Hairy¹ in practice: many cases, intricate code
     • What would be even better?
       • How about O(1) finds and inserts?
     ¹ hairy - Slang. Fraught with difficulties; hazardous: a hairy escape; hairy problems.
