cs4432 database systems ii n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
CS4432: Database Systems II PowerPoint Presentation
Download Presentation
CS4432: Database Systems II

Loading in 2 Seconds...

play fullscreen
1 / 38
kameko-best

CS4432: Database Systems II - PowerPoint PPT Presentation

94 Views
Download Presentation
CS4432: Database Systems II
An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. CS4432: Database Systems II Lecture #10 Professor Elke A. Rundensteiner lecture #10 - b+ tree indexing

  2. 100 30 20 80 90 50 70 40 10 60 50 10 10 20 50 60 70 30 90 40 ... ... high Level (always sparse) Hierarchy of index structures Sequence field 1 2 3 4 5 first level (dense, if non- sequential) lecture #8 - indexing

  3. Conventional indexes : pros/cons ? Advantage: - Simple - Index is sequential file good for scans - Search efficient for static data Disadvantage: - Inserts expensive, and/or - Lose sequentiality & balance - Then search time unpredictable lecture #8 - indexing

  4. 39 32 38 31 33 35 34 36 overflow area (not sequential) 10 20 30 Example Sequential Index continuous free space 40 50 60 70 80 90 lecture #10 - b+ tree indexing

  5. Problems … Problems … Problems … • Without re-organization we get unpredictable performance • Too much/often re-organization brings too much overhead • DBA does not know when to reorganize • DBA does not know how full to load pages of new index lecture #10 - b+ tree indexing

  6. So Let’s Try Another Index . . . • Give up “sequentiality” of index • Predictable performance under updates • Achieve always balance of “tree” • Automate restructuring under updates lecture #10 - b+ tree indexing

  7. B+Tree Example n=3 100 Root 120 150 180 30 3 5 11 120 130 180 200 100 101 110 150 156 179 30 35 lecture #10 - b+ tree indexing

  8. B+ Trees in Practice • Typical order: 100. Typical fill-factor: 67%. • average fanout = 133 • Typical capacities: • Height 4: 1334 = 312,900,700 records • Height 3: 1333 = 2,352,637 records • Can often hold top levels in buffer pool: • Level 1 = 1 page = 8 Kbytes • Level 2 = 133 pages = 1 Mbyte • Level 3 = 17,689 pages = 133 Mbytes lecture #10 - b+ tree indexing

  9. Sample non-leaf 57 81 95 to keys to keys to keys to keys < 57 57 k<81 81k<95 95 lecture #10 - b+ tree indexing

  10. Sample leaf node: From non-leaf node to next leaf in sequence 57 81 95 To record with key 57 To record with key 81 To record with key 95 lecture #10 - b+ tree indexing

  11. In textbook’s notation n=3 Leaf: Non-leaf: 30 35 30 35 30 30 lecture #10 - b+ tree indexing

  12. Size of node n: n+1 pointers n keys (fixed) lecture #10 - b+ tree indexing

  13. Non-leaf: (n+1)/2 pointers • Leaf: (n+1)/2 pointers to data n=3 Full node min. node Non-leaf Leaf 120 150 180 30 3 5 11 30 35 counts even if null lecture #10 - b+ tree indexing

  14. B+tree rules tree of order n (1) All leaves at same lowest level (balanced tree) (2) Pointers in leaves point to records; except for the “sequence pointer” lecture #10 - b+ tree indexing

  15. B+Tree Example : Searches 100 Root 120 150 180 30 3 5 11 120 130 180 200 100 101 110 150 156 179 30 35 lecture #10 - b+ tree indexing

  16. Insert into B+tree (a) simple case • space available in leaf (b) leaf overflow (c) non-leaf overflow (d) new root lecture #10 - b+ tree indexing

  17. 32 n=3 100 (a) Insert key = 32 30 3 5 11 30 31 lecture #10 - b+ tree indexing

  18. 7 3 5 7 n=3 100 (a) Insert key = 7 30 3 5 11 30 31 lecture #10 - b+ tree indexing

  19. 160 180 160 179 n=3 100 (c) Insert key = 160 120 150 180 180 200 150 156 179 lecture #10 - b+ tree indexing

  20. 30 new root 40 40 45 n=3 (d) New root, insert 45 10 20 30 1 2 3 10 12 20 25 30 32 40 lecture #10 - b+ tree indexing

  21. Recap: Insert Data into B+ Tree • Find correct leaf L. • Put data entry onto L. • If L has enough space, done! • Else, must splitL (into L and a new node L2) • Redistribute entries evenly, copy upmiddle key. • Insert index entry pointing to L2 into parent of L. • This can happen recursively • To split index node, redistribute entries evenly, but push upmiddle key. (Contrast with leaf splits.) • Splits “grow” tree; root split increases height. • Tree growth: gets wider or one level taller at top. lecture #10 - b+ tree indexing

  22. Deletion from B+tree (a) Simple case (b) Leaf-node: Coalesce with neighbor (sibling) (c) Leaf-node: Re-distribute keys (d) Cases (b) or (c) at non-leaf lecture #10 - b+ tree indexing

  23. n=3 100 (a) Delete key = 11 30 3 5 11 30 31 lecture #10 - b+ tree indexing

  24. n=4 (b) Coalesce with sibling • Delete 50 10 40 100 10 20 30 40 50 40 lecture #10 - b+ tree indexing

  25. n=4 (c) Redistribute keys • Delete 50 10 40 100 35 10 20 30 35 40 50 35 lecture #10 - b+ tree indexing

  26. new root 40 25 30 • (d) Coalese and Non-leaf coalese • Delete 37 n=4 25 10 20 30 40 25 26 30 37 1 3 10 14 20 22 40 45 lecture #10 - b+ tree indexing

  27. Delete Data from B+ Tree • Start at root, find leaf L where entry belongs. • Remove the entry. • If L is at least half-full, done! • If L has only d-1 entries, • Try to re-distribute, borrowing from sibling (adjacent node with same parent as L). • If re-distribution fails, mergeL and sibling. • If merge occurred, must delete entry (pointing to L or sibling) from parent of L. • Merge could propagate to root, decreasing height. lecture #10 - b+ tree indexing

  28. Discussion of B-trees (vs. static indexed sequential files) • Concurrency control harder in B-Trees • B-tree consumes more space • B-tree automatically decides : • when to reorganize • how full to load pages of new index lecture #10 - b+ tree indexing

  29. Less space, so lookup faster Inserts managed by overflow area Requires temporary restructuring Unpredictable performance ComparisonB-tree vs. indexed seq. file • Consumes more space, so lookup slower • Each insert/delete potentially restructures • Build-in restructuring • Predictable performance lecture #10 - b+ tree indexing

  30. Speaking of buffering… Is LRU a good policy for B+tree buffers? • Of course not! • Should try to keep root in memory at all times • (and perhaps some nodes from second level) • Should keep the “path” when going down to leaves • (just in case of restructuring) lecture #10 - b+ tree indexing

  31. Interesting problem: For B+tree, how large should n be? … n is number of keys / node lecture #10 - b+ tree indexing

  32. assumptions:n children per node and N records in database • Time to read B-Tree node from disk is (tseek + tread*n) msec. • Once in main memory, use binary search to locate key, (a + b log_2 n) msec • Need to search (read) log_n (N) tree nodes • t-search = (tseek + tread*n + (a + b*log_2(n)) * log n (N) lecture #10 - b+ tree indexing

  33. Can get:f(n) = time to find a record f(n) noptn  FIND nopt by f’(n) = 0 • What happens to nopt as: • Disk gets faster? CPU get faster? … lecture #10 - b+ tree indexing

  34. Bulk Loading of B+ Tree • For large collection of records, create B+ tree. • Method 1: Repeatedly insert records  slow. • Method 2: Bulk Loading more efficient. lecture #10 - b+ tree indexing

  35. 3* 6* 9* 10* 11* 12* 13* 23* 31* 36* 38* 41* 44* 4* 20* 22* 35* Bulk Loading of B+ Tree • Initialization: • Sort all data entries • Insert pointer to first (leaf) page in new (root) page. Root Sorted pages of data entries; not yet in B+ tree lecture #10 - b+ tree indexing

  36. Bulk Loading (Contd.) Root 10 20 • Index entries for leaf pages always entered into right-most index page • When this fills up, it splits. Split may go up right-most path to root. Data entry pages 6 12 23 35 not yet in B+ tree 3* 6* 9* 10* 11* 12* 13* 23* 31* 36* 38* 41* 44* 4* 20* 22* 35* Root 20 10 Data entry pages 35 not yet in B+ tree 6 23 12 38 lecture #10 - b+ tree indexing 3* 6* 9* 10* 11* 12* 13* 23* 31* 36* 38* 41* 44* 4* 20* 22* 35*

  37. Summary of Bulk Loading • Method 1: multiple inserts. • Slow. • Does not give sequential storage of leaves. • Method 2:Bulk Loading • Has advantages for concurrency control. • Fewer I/Os during build. • Leaves will be stored sequentially (and linked) • Can control “fill factor” on pages. lecture #10 - b+ tree indexing

  38. Summary B+ tree idea: self-balancing index structure that supports both search and insert/delete in log_n time. B+ tree is versatile : handles equality and range searches B+ tree and its variants: common index structure in industrial DBMSs lecture #10 - b+ tree indexing