CS4432: Database Systems II

CS4432: Database Systems II Lecture #10 Professor Elke A. Rundensteiner lecture #10 - b+ tree indexing

100 30 20 80 90 50 70 40 10 60 50 10 10 20 50 60 70 30 90 40 ... ... high Level (always sparse) Hierarchy of index structures Sequence field 1 2 3 4 5 first level (dense, if non- sequential) lecture #8 - indexing

Conventional indexes : pros/cons ? Advantage: - Simple - Index is sequential file good for scans - Search efficient for static data Disadvantage: - Inserts expensive, and/or - Lose sequentiality & balance - Then search time unpredictable lecture #8 - indexing

39 32 38 31 33 35 34 36 overflow area (not sequential) 10 20 30 Example Sequential Index continuous free space 40 50 60 70 80 90 lecture #10 - b+ tree indexing

Problems … Problems … Problems … • Without re-organization we get unpredictable performance • Too much/often re-organization brings too much overhead • DBA does not know when to reorganize • DBA does not know how full to load pages of new index lecture #10 - b+ tree indexing

So Let’s Try Another Index . . . • Give up “sequentiality” of index • Predictable performance under updates • Achieve always balance of “tree” • Automate restructuring under updates lecture #10 - b+ tree indexing

B+Tree Example n=3 100 Root 120 150 180 30 3 5 11 120 130 180 200 100 101 110 150 156 179 30 35 lecture #10 - b+ tree indexing

B+ Trees in Practice • Typical order: 100. Typical fill-factor: 67%. • average fanout = 133 • Typical capacities: • Height 4: 1334 = 312,900,700 records • Height 3: 1333 = 2,352,637 records • Can often hold top levels in buffer pool: • Level 1 = 1 page = 8 Kbytes • Level 2 = 133 pages = 1 Mbyte • Level 3 = 17,689 pages = 133 Mbytes lecture #10 - b+ tree indexing

Sample non-leaf 57 81 95 to keys to keys to keys to keys < 57 57 k<81 81k<95 95 lecture #10 - b+ tree indexing

Sample leaf node: From non-leaf node to next leaf in sequence 57 81 95 To record with key 57 To record with key 81 To record with key 95 lecture #10 - b+ tree indexing

In textbook’s notation n=3 Leaf: Non-leaf: 30 35 30 35 30 30 lecture #10 - b+ tree indexing

Size of node n: n+1 pointers n keys (fixed) lecture #10 - b+ tree indexing

Non-leaf: (n+1)/2 pointers • Leaf: (n+1)/2 pointers to data n=3 Full node min. node Non-leaf Leaf 120 150 180 30 3 5 11 30 35 counts even if null lecture #10 - b+ tree indexing

B+tree rules tree of order n (1) All leaves at same lowest level (balanced tree) (2) Pointers in leaves point to records; except for the “sequence pointer” lecture #10 - b+ tree indexing

B+Tree Example : Searches 100 Root 120 150 180 30 3 5 11 120 130 180 200 100 101 110 150 156 179 30 35 lecture #10 - b+ tree indexing

Insert into B+tree (a) simple case • space available in leaf (b) leaf overflow (c) non-leaf overflow (d) new root lecture #10 - b+ tree indexing

32 n=3 100 (a) Insert key = 32 30 3 5 11 30 31 lecture #10 - b+ tree indexing

7 3 5 7 n=3 100 (a) Insert key = 7 30 3 5 11 30 31 lecture #10 - b+ tree indexing

160 180 160 179 n=3 100 (c) Insert key = 160 120 150 180 180 200 150 156 179 lecture #10 - b+ tree indexing

30 new root 40 40 45 n=3 (d) New root, insert 45 10 20 30 1 2 3 10 12 20 25 30 32 40 lecture #10 - b+ tree indexing

Recap: Insert Data into B+ Tree • Find correct leaf L. • Put data entry onto L. • If L has enough space, done! • Else, must splitL (into L and a new node L2) • Redistribute entries evenly, copy upmiddle key. • Insert index entry pointing to L2 into parent of L. • This can happen recursively • To split index node, redistribute entries evenly, but push upmiddle key. (Contrast with leaf splits.) • Splits “grow” tree; root split increases height. • Tree growth: gets wider or one level taller at top. lecture #10 - b+ tree indexing

Deletion from B+tree (a) Simple case (b) Leaf-node: Coalesce with neighbor (sibling) (c) Leaf-node: Re-distribute keys (d) Cases (b) or (c) at non-leaf lecture #10 - b+ tree indexing

n=3 100 (a) Delete key = 11 30 3 5 11 30 31 lecture #10 - b+ tree indexing

n=4 (b) Coalesce with sibling • Delete 50 10 40 100 10 20 30 40 50 40 lecture #10 - b+ tree indexing

n=4 (c) Redistribute keys • Delete 50 10 40 100 35 10 20 30 35 40 50 35 lecture #10 - b+ tree indexing

new root 40 25 30 • (d) Coalese and Non-leaf coalese • Delete 37 n=4 25 10 20 30 40 25 26 30 37 1 3 10 14 20 22 40 45 lecture #10 - b+ tree indexing

Delete Data from B+ Tree • Start at root, find leaf L where entry belongs. • Remove the entry. • If L is at least half-full, done! • If L has only d-1 entries, • Try to re-distribute, borrowing from sibling (adjacent node with same parent as L). • If re-distribution fails, mergeL and sibling. • If merge occurred, must delete entry (pointing to L or sibling) from parent of L. • Merge could propagate to root, decreasing height. lecture #10 - b+ tree indexing

Discussion of B-trees (vs. static indexed sequential files) • Concurrency control harder in B-Trees • B-tree consumes more space • B-tree automatically decides : • when to reorganize • how full to load pages of new index lecture #10 - b+ tree indexing

Less space, so lookup faster Inserts managed by overflow area Requires temporary restructuring Unpredictable performance ComparisonB-tree vs. indexed seq. file • Consumes more space, so lookup slower • Each insert/delete potentially restructures • Build-in restructuring • Predictable performance lecture #10 - b+ tree indexing

Speaking of buffering… Is LRU a good policy for B+tree buffers? • Of course not! • Should try to keep root in memory at all times • (and perhaps some nodes from second level) • Should keep the “path” when going down to leaves • (just in case of restructuring) lecture #10 - b+ tree indexing

Interesting problem: For B+tree, how large should n be? … n is number of keys / node lecture #10 - b+ tree indexing

assumptions:n children per node and N records in database • Time to read B-Tree node from disk is (tseek + tread*n) msec. • Once in main memory, use binary search to locate key, (a + b log_2 n) msec • Need to search (read) log_n (N) tree nodes • t-search = (tseek + tread*n + (a + b*log_2(n)) * log n (N) lecture #10 - b+ tree indexing

Can get:f(n) = time to find a record f(n) noptn  FIND nopt by f’(n) = 0 • What happens to nopt as: • Disk gets faster? CPU get faster? … lecture #10 - b+ tree indexing

Bulk Loading of B+ Tree • For large collection of records, create B+ tree. • Method 1: Repeatedly insert records  slow. • Method 2: Bulk Loading more efficient. lecture #10 - b+ tree indexing

3* 6* 9* 10* 11* 12* 13* 23* 31* 36* 38* 41* 44* 4* 20* 22* 35* Bulk Loading of B+ Tree • Initialization: • Sort all data entries • Insert pointer to first (leaf) page in new (root) page. Root Sorted pages of data entries; not yet in B+ tree lecture #10 - b+ tree indexing

Bulk Loading (Contd.) Root 10 20 • Index entries for leaf pages always entered into right-most index page • When this fills up, it splits. Split may go up right-most path to root. Data entry pages 6 12 23 35 not yet in B+ tree 3* 6* 9* 10* 11* 12* 13* 23* 31* 36* 38* 41* 44* 4* 20* 22* 35* Root 20 10 Data entry pages 35 not yet in B+ tree 6 23 12 38 lecture #10 - b+ tree indexing 3* 6* 9* 10* 11* 12* 13* 23* 31* 36* 38* 41* 44* 4* 20* 22* 35*

Summary of Bulk Loading • Method 1: multiple inserts. • Slow. • Does not give sequential storage of leaves. • Method 2:Bulk Loading • Has advantages for concurrency control. • Fewer I/Os during build. • Leaves will be stored sequentially (and linked) • Can control “fill factor” on pages. lecture #10 - b+ tree indexing

Summary B+ tree idea: self-balancing index structure that supports both search and insert/delete in log_n time. B+ tree is versatile : handles equality and range searches B+ tree and its variants: common index structure in industrial DBMSs lecture #10 - b+ tree indexing

CS4432: Database Systems II

CS4432: Database Systems II

Presentation Transcript

Mobile Database Systems

Chapter 4 The Database Management System Concept

Real-Time Database Systems and Data Services: Issues and Challenges

70420403- Database Systems 数据库管理系统

Principles of Database Management Systems 8: Concurrency Control

Using Management Information Systems

CS 328 Database Systems Chapter 8: SQL

CS411 Database Systems

CSC321: Database Management Systems

CS411 Database Systems

ICOM 5016 – Introduction to Database Systems

COP 4710: Database Systems Fall 2013 Chapter 19 – Normalization

Database Management Systems

Chapter 5 Relational Database Management Systems and SQL

数据库管理系统 Database Management Systems

Storage Systems

CS 253: Topics in Database Systems: C2

Database Systems The Relational Data Model

Temple University – CIS Dept. CIS616– Principles of Database Systems

Introduction to Database Systems Queries in SQL