1 / 36

CS4432: Database Systems II

CS4432: Database Systems II. Lecture #10. Professor Elke A. Rundensteiner. 39. 32. 38. 31. 33. 35. 34. 36. overflow area (not sequential). 10. 20. 30. Example Sequential Index continuous free space. 40. 50. 60. 70. 80. 90. Problems … Problems … Problems ….

langer
Download Presentation

CS4432: Database Systems II

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS4432: Database Systems II Lecture #10 Professor Elke A. Rundensteiner lecture #10 - b+ tree indexing

  2. 39 32 38 31 33 35 34 36 overflow area (not sequential) 10 20 30 Example Sequential Index continuous free space 40 50 60 70 80 90 lecture #10 - b+ tree indexing

  3. Problems … Problems … Problems … • Without re-organization we get unpredictable performance • Too much/often re-organization brings too much overhead • DBA does not know when to reorganize • DBA does not know how full to load pages of new index lecture #10 - b+ tree indexing

  4. Another type of index • Give up “sequentiality” of index • Predictable performance under updates • Achieve always balance of “tree” • Automate restructuring under updates lecture #10 - b+ tree indexing

  5. B+Tree Example n=3 100 Root 120 150 180 30 3 5 11 120 130 180 200 100 101 110 150 156 179 30 35 lecture #10 - b+ tree indexing

  6. Sample non-leaf 57 81 95 to keys to keys to keys to keys < 57 57 k<81 81k<95 95 lecture #10 - b+ tree indexing

  7. B+ Trees in Practice • Typical order: 100. Typical fill-factor: 67%. • average fanout = 133 • Typical capacities: • Height 4: 1334 = 312,900,700 records • Height 3: 1333 = 2,352,637 records • Can often hold top levels in buffer pool: • Level 1 = 1 page = 8 Kbytes • Level 2 = 133 pages = 1 Mbyte • Level 3 = 17,689 pages = 133 MBytes lecture #10 - b+ tree indexing

  8. Sample non-leaf 57 81 95 to keys to keys to keys to keys < 57 57 k<81 81k<95 95 lecture #10 - b+ tree indexing

  9. Sample leaf node: From non-leaf node to next leaf in sequence 57 81 95 To record with key 57 To record with key 81 To record with key 85 lecture #10 - b+ tree indexing

  10. In textbook’s notation n=3 Leaf: Non-leaf: 30 35 30 35 30 30 lecture #10 - b+ tree indexing

  11. Size of nodes: n+1 pointers n keys (fixed) lecture #10 - b+ tree indexing

  12. Non-leaf: (n+1)/2 pointers • Leaf: (n+1)/2 pointers to data n=3 Full node min. node Non-leaf Leaf 120 150 180 30 3 5 11 30 35 counts even if null lecture #10 - b+ tree indexing

  13. B+tree rules tree of order n (1) All leaves at same lowest level (balanced tree) (2) Pointers in leaves point to records except for “sequence pointer” lecture #10 - b+ tree indexing

  14. B+Tree Example : Searches 100 Root 120 150 180 30 3 5 11 120 130 180 200 100 101 110 150 156 179 30 35 lecture #10 - b+ tree indexing

  15. Insert into B+tree (a) simple case • space available in leaf (b) leaf overflow (c) non-leaf overflow (d) new root lecture #10 - b+ tree indexing

  16. 32 n=3 100 (a) Insert key = 32 30 3 5 11 30 31 lecture #10 - b+ tree indexing

  17. 7 3 5 7 n=3 100 (a) Insert key = 7 30 3 5 11 30 31 lecture #10 - b+ tree indexing

  18. 160 180 160 179 n=3 100 (c) Insert key = 160 120 150 180 180 200 150 156 179 lecture #10 - b+ tree indexing

  19. 30 new root 40 40 45 n=3 (d) New root, insert 45 10 20 30 1 2 3 10 12 20 25 30 32 40 lecture #10 - b+ tree indexing

  20. Recap: Insert Data into B+ Tree • Find correct leaf L. • Put data entry onto L. • If L has enough space, done! • Else, must splitL (into L and a new node L2) • Redistribute entries evenly, copy upmiddle key. • Insert index entry pointing to L2 into parent of L. • This can happen recursively • To split index node, redistribute entries evenly, but push upmiddle key. (Contrast with leaf splits.) • Splits “grow” tree; root split increases height. • Tree growth: gets wider or one level taller at top. lecture #10 - b+ tree indexing

  21. Deletion from B+tree (a) Simple case (b) Coalesce with neighbor (sibling) (c) Re-distribute keys (d) Cases (b) or (c) at non-leaf lecture #10 - b+ tree indexing

  22. n=3 100 (a) Delete key = 11 30 3 5 11 30 31 lecture #10 - b+ tree indexing

  23. n=4 (b) Coalesce with sibling • Delete 50 10 40 100 10 20 30 40 50 40 lecture #10 - b+ tree indexing

  24. n=4 (c) Redistribute keys • Delete 50 10 40 100 35 10 20 30 35 40 50 35 lecture #10 - b+ tree indexing

  25. new root 40 25 30 • (d) Coalese and Non-leaf coalese • Delete 37 n=4 25 10 20 30 40 25 26 30 37 1 3 10 14 20 22 40 45 lecture #10 - b+ tree indexing

  26. Delete Data from B+ Tree • Start at root, find leaf L where entry belongs. • Remove the entry. • If L is at least half-full, done! • If L has only d-1 entries, • Try to re-distribute, borrowing from sibling (adjacent node with same parent as L). • If re-distribution fails, mergeL and sibling. • If merge occurred, must delete entry (pointing to L or sibling) from parent of L. • Merge could propagate to root, decreasing height. lecture #10 - b+ tree indexing

  27. Discussion of B-trees (vs. static indexed sequential files) • Concurrency control harder in B-Trees • B-tree consumes more space • B-tree automatically decides : • when to reorganize • how full to load pages of new index • Buffering • B-tree: has fixed buffer requirements • Static index: must read several overflow blocks to be efficient (variable size buffers) lecture #10 - b+ tree indexing

  28. Less space, so lookup faster Inserts managed by overflow area Requires temporary restructuring Unpredictable performance ComparisonB-tree vs. indexed seq. file • Consumes more space, so lookup slower • Each insert/delete potentially restructures • Build-in restructuring • Predictable performance lecture #10 - b+ tree indexing

  29. Speaking of buffering… Is LRU a good policy for B+tree buffers? • Of course not! •  Should try to keep root in memory at all times • (and perhaps some nodes from second level) lecture #10 - b+ tree indexing

  30. Interesting problem: For B+tree, how large should n be? … n is number of keys / node lecture #10 - b+ tree indexing

  31. assumptions:n children per node and N records in database • Time to read B-Tree node from disk is (tseek + tread*n) msec. • Once in main memory, use binary search to locate key, (a + b log_2 n) msec • Need to search (read) log_n (N) tree nodes • t-search = (tseek + tread*n + (a + b*log_2(n)) * log n (N) lecture #10 - b+ tree indexing

  32. Can get:f(n) = time to find a record f(n) noptn  FIND nopt by f’(n) = 0 • What happens to nopt as: • Disk gets faster? CPU get faster? … lecture #10 - b+ tree indexing

  33. Bulk Loading of B+ Tree • For large collection of records, create B+ tree. • Method 1: Repeatedly insert records  slow. • Method 2: Bulk Loading more efficient. lecture #10 - b+ tree indexing

  34. 3* 6* 9* 10* 11* 12* 13* 23* 31* 36* 38* 41* 44* 4* 20* 22* 35* Bulk Loading of B+ Tree • Initialization: • Sort all data entries • Insert pointer to first (leaf) page in new (root) page. Root Sorted pages of data entries; not yet in B+ tree lecture #10 - b+ tree indexing

  35. Bulk Loading (Contd.) Root 10 20 • Index entries for leaf pages always entered into right-most index page • When this fills up, it splits. Split may go up right-most path to root. Data entry pages 6 12 23 35 not yet in B+ tree 3* 6* 9* 10* 11* 12* 13* 23* 31* 36* 38* 41* 44* 4* 20* 22* 35* Root 20 10 Data entry pages 35 not yet in B+ tree 6 23 12 38 lecture #10 - b+ tree indexing 3* 6* 9* 10* 11* 12* 13* 23* 31* 36* 38* 41* 44* 4* 20* 22* 35*

  36. Summary of Bulk Loading • Method 1: multiple inserts. • Slow. • Does not give sequential storage of leaves. • Method 2:Bulk Loading • Has advantages for concurrency control. • Fewer I/Os during build. • Leaves will be stored sequentially (and linked) • Can control “fill factor” on pages. lecture #10 - b+ tree indexing

More Related