Download
carnegie mellon univ dept of computer science 15 415 database applications n.
Skip this Video
Loading SlideShow in 5 Seconds..
Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications PowerPoint Presentation
Download Presentation
Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications

Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications

143 Views Download Presentation
Download Presentation

Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. Carnegie Mellon Univ.Dept. of Computer Science15-415 - Database Applications Lecture#9: Indexing (R&G ch. 10)

  2. Outline • Motivation • ISAM • B-trees (not in book) • B+ trees • duplicates • B+ trees in practice CMU SCS 15-415

  3. Introduction • How to support range searches? • equality searches? CMU SCS 15-415

  4. Range Searches • ``Find all students with gpa > 3.0’’ • may be slow, even on sorted file • What to do? Data File Page N Page 3 Page 1 Page 2 CMU SCS 15-415

  5. Range Searches • ``Find all students with gpa > 3.0’’ • may be slow, even on sorted file • Solution: Create an `index’ file. Index File kN k2 k1 Data File Page N Page 3 Page 1 Page 2 CMU SCS 15-415

  6. Range Searches • More details: • if index file is small, do binary search there • Otherwise?? Index File kN k2 k1 Data File Page N Page 3 Page 1 Page 2 CMU SCS 15-415

  7. ISAM • Repeat recursively! Non-leaf Pages Leaf Pages CMU SCS 15-415

  8. ISAM • OK - what if there are insertions and overflows? Non-leaf Pages Leaf Pages CMU SCS 15-415

  9. Overflow page ISAM • Overflow pages, linked to the primary page Non-leaf Pages Leaf Pages Primary pages CMU SCS 15-415

  10. Root 40 51 63 20 33 46* 55* 40* 51* 97* 10* 15* 20* 27* 33* 37* 63* Example ISAM Tree • 2 entries per page CMU SCS 15-415

  11. ISAM Details • format of an index page? • how full would a newly created ISAM be? CMU SCS 15-415

  12. P K P P K P K m 2 0 1 m 1 2 ISAM Details • format of an index page? • how full would a newly created ISAM be? • ~80-90% (not 100%) CMU SCS 15-415

  13. ISAM is a STATIC Structure • that is, index pages don’t change • File creation: Leaf (data) pages allocated sequentially, sorted by search key; then index pages allocated, then overflow pgs. CMU SCS 15-415

  14. ISAM is a STATIC Structure • Search: Start at root; use key comparisons to go to leaf. • Cost = log F N ; • F = # entries/pg (i.e., fanout), • N = # leaf pgs CMU SCS 15-415

  15. ISAM is a STATIC Structure Insert: Find leaf that data entry belongs to, and put it there. Overflow page if necessary. Delete: Find and remove from leaf; if empty page, de-allocate. CMU SCS 15-415

  16. 48* 23* 42* Example: Insert 23*, 48*, 41*, 42* Root 40 Index Pages 51 63 20 33 Primary Leaf 46* 55* 40* 51* 97* 10* 15* 20* 27* 33* 37* 63* Pages 41* Overflow Pages CMU SCS 15-415

  17. Root 40 Index Pages 51 63 20 33 Primary Leaf 46* 55* 40* 51* 97* 10* 15* 20* 27* 33* 37* 63* Pages 41* 48* 23* Overflow Pages 42* ... then delete 42*, 51*, 97* • Note that 51* appears in index levels, but not in leaf! CMU SCS 15-415

  18. ISAM ---- Issues? • Pros • ???? • Cons • ???? CMU SCS 15-415

  19. Outline • Motivation • ISAM • B-trees (not in book) • B+ trees • duplicates • B+ trees in practice CMU SCS 15-415

  20. B-trees • the most successful family of index schemes (B-trees, B+-trees, B*-trees) • Can be used for primary/secondary, clustering/non-clustering index. • balanced “n-way” search trees CMU SCS 15-415

  21. B-trees [Rudolf Bayer and McCreight, E. M. Organization and Maintenance of Large Ordered Indexes. Acta Informatica 1, 173-189, 1972.] CMU SCS 15-415

  22. 6 9 <6 >9 <9 >6 3 1 7 13 B-trees Eg., B-tree of order d=1: CMU SCS 15-415

  23. pn p1 … vn-1 v1 v2 B - tree properties: • each node, in a B-tree of order d: • Key order • at most n=2d keys • at least d keys (except root, which may have just 1 key) • all leaves at the same level • if number of pointers is k, then node has exactly k-1 keys • (leaves are empty) CMU SCS 15-415

  24. Properties • “block aware” nodes: each node -> disk page • O(log (N)) for everything! (ins/del/search) • typically, if d = 50 - 100, then 2 - 3 levels • utilization >= 50%, guaranteed; on average 69% CMU SCS 15-415

  25. 6 9 <6 >9 <9 >6 3 1 7 13 Queries • Algo for exact match query? (eg., ssn=8?) CMU SCS 15-415

  26. JAVA animation! http://slady.net/java/bt/ strongly recommended! CMU SCS 15-415

  27. Queries • Algo for exact match query? (eg., ssn=8?) 6 9 <6 >9 <9 >6 3 1 7 13 CMU SCS 15-415

  28. Queries • Algo for exact match query? (eg., ssn=8?) 6 9 <6 >9 <9 >6 3 1 7 13 CMU SCS 15-415

  29. Queries • Algo for exact match query? (eg., ssn=8?) 6 9 <6 >9 <9 >6 3 1 7 13 CMU SCS 15-415

  30. Queries • Algo for exact match query? (eg., ssn=8?) 6 9 <6 H steps (= disk accesses) >9 <9 >6 3 1 7 13 CMU SCS 15-415

  31. Queries • what about range queries? (eg., 5<salary<8) • Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) CMU SCS 15-415

  32. 6 9 <6 >9 <9 >6 3 1 7 13 Queries • what about range queries? (eg., 5<salary<8) • Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) CMU SCS 15-415

  33. 6 9 <6 >9 <9 >6 3 1 7 13 Queries • what about range queries? (eg., 5<salary<8) • Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) CMU SCS 15-415

  34. 6 9 <6 >9 <9 >6 3 1 7 13 Queries • what about range queries? (eg., 5<salary<8) • Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) CMU SCS 15-415

  35. 6 9 <6 >9 <9 >6 3 1 7 13 Queries • what about range queries? (eg., 5<salary<8) • Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) CMU SCS 15-415

  36. B-trees: Insertion • Insert in leaf; on overflow, push middle up (recursively) • split: preserves B - tree properties CMU SCS 15-415

  37. 6 9 <6 >9 <9 >6 3 1 7 13 B-trees Easy case: Tree T0; insert ‘8’ CMU SCS 15-415

  38. 6 9 <6 >9 <9 >6 3 1 7 13 B-trees Tree T0; insert ‘8’ 8 CMU SCS 15-415

  39. 6 9 <6 >9 <9 >6 3 1 7 13 B-trees Hardest case: Tree T0; insert ‘2’ 2 CMU SCS 15-415

  40. 13 B-trees Hardest case: Tree T0; insert ‘2’ 6 9 2 1 3 7 push middle up CMU SCS 15-415

  41. 2 13 1 3 B-trees Hardest case: Tree T0; insert ‘2’ Ovf; push middle 2 6 9 7 CMU SCS 15-415

  42. 13 1 3 B-trees Hardest case: Tree T0; insert ‘2’ 6 Final state 9 2 7 CMU SCS 15-415

  43. B-trees: Insertion • Insert in leaf; on overflow, push middle up (recursively – ‘propagate split’) • split: preserves all B - tree properties (!!) • notice how it grows: height increases when root overflows & splits • Automatic, incremental re-organization (contrast with ISAM!) CMU SCS 15-415

  44. Pseudo-code INSERTION OF KEY ’K’ find the correct leaf node ’L’; if ( ’L’ overflows ){ split ’L’, and push middle key to parent node ’P’; if (’P’ overflows){ repeat the split recursively; } else{ add the key ’K’ in node ’L’; /* maintaining the key order in ’L’ */ } CMU SCS 15-415

  45. Overview • ... • B – trees • Dfn, Search, insertion, deletion • ... CMU SCS 15-415

  46. Deletion Rough outline of algo: • Delete key; • on underflow, may need to merge In practice, some implementors just allow underflows to happen… CMU SCS 15-415

  47. 6 9 <6 >9 <9 >6 3 1 7 13 B-trees – Deletion Easiest case: Tree T0; delete ‘3’ CMU SCS 15-415

  48. 6 9 <6 >9 <9 >6 1 7 13 B-trees – Deletion Easiest case: Tree T0; delete ‘3’ CMU SCS 15-415

  49. B-trees – Deletion • Case1: delete a key at a leaf – no underflow • Case2: delete non-leaf key – no underflow • Case3: delete leaf-key; underflow, and ‘rich sibling’ • Case4: delete leaf-key; underflow, and ‘poor sibling’ CMU SCS 15-415

  50. 6 9 <6 >9 <9 >6 3 1 7 13 B-trees – Deletion • Case1: delete a key at a leaf – no underflow (delete 3 from T0) CMU SCS 15-415