
OS Support / Indexing


Presentation Transcript


  1. OS Support / Indexing Lecture 3

  2. Query Execution. [Architecture diagram] A query or update from the user/application goes to the query compiler, which produces a query execution plan for the execution engine; the execution engine issues record and index requests to the index/record manager, which issues page commands to the buffer manager, which reads and writes pages through the storage manager to disk. Logging/recovery sits alongside this stack.

  3. OS Support • Can't rely on the OS • not sure the work will actually be done • often inefficient • Lesson: if you want to make sure something gets done, check it yourself • Is the same thing going on with storage systems? • In general: • how much can we assume others will do for us? • how much will we have to do ourselves?

  4. Indexes

  5. Indexes • An index on a file speeds up selections on the search key field(s) • Search key = any subset of the fields of a relation • A search key is not the same as a key (a minimal set of fields that uniquely identify a record in a relation) • Entries in an index: (k, r), where: • k = the key • r = the record OR record id OR record ids

  6. Index Classification • Clustered/unclustered • Clustered = records sorted in key order • Unclustered = records not sorted in key order • Dense/sparse • Dense = each record has an entry in the index • Sparse = only some records have an entry • Primary/secondary • Primary = on the primary key • Secondary = on any key • Some books interpret these terms differently • B+ tree / Hash table / …

  7. Clustered vs. Unclustered Index. [Diagram: data entries in the index file point to data records in the data file; in the CLUSTERED case the data records are stored in the same order as the data entries, in the UNCLUSTERED case they are not]

  8. B+ Trees • Search trees • Idea in B Trees: • make 1 node = 1 block • Idea in B+ Trees: • Make leaves into a linked list (range queries are easier)

  9. B+ Trees Basics • Parameter d = the degree • Each node has >= d and <= 2d keys (except the root) • Each leaf has >= d and <= 2d keys • [Diagram: an internal node with keys 30, 120, 240 routing to subtrees with keys k < 30, 30 <= k < 120, 120 <= k < 240, and 240 <= k; a leaf holding 40, 50, 60 with a pointer to the next leaf]
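
As a rough sketch (not part of the original slides), a degree-d B+ tree node could be represented as follows; the class and field names are purely illustrative:

```python
# Hypothetical B+ tree node layout; field names are illustrative only.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LeafNode:
    keys: List[int] = field(default_factory=list)        # between d and 2d keys (root excepted)
    records: List[object] = field(default_factory=list)  # one record (or record id) per key
    next: Optional["LeafNode"] = None                     # leaves form a linked list for range scans


@dataclass
class InternalNode:
    keys: List[int] = field(default_factory=list)          # k separator keys ...
    children: List[object] = field(default_factory=list)   # ... and k + 1 child pointers
```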

  10. B+ Tree Example. d = 2. [Diagram: B+ tree over the keys 10 15 18 20 30 40 50 60 65 80 85 90]

  11. B+ Tree Design • How large should d be? • Example: • Key size = 4 bytes • Pointer size = 8 bytes • Block size = 4096 bytes • 2d x 4 + (2d+1) x 8 <= 4096 • d = 170
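
The value of d can be checked mechanically. A small sketch, assuming the sizes stated on the slide (4-byte keys, 8-byte pointers, 4096-byte blocks):

```python
# Find the largest d such that 2d keys plus 2d+1 pointers fit in one block.
key_size, pointer_size, block_size = 4, 8, 4096

d = 0
while 2 * (d + 1) * key_size + (2 * (d + 1) + 1) * pointer_size <= block_size:
    d += 1

print(d)  # 170, since 2*170*4 + (2*170+1)*8 = 1360 + 2728 = 4088 <= 4096
```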

  12. Searching a B+ Tree • Exact key values (e.g., Select name From people Where age = 25): • Start at the root • Proceed down to the leaf • Range queries (e.g., Select name From people Where 20 <= age and age <= 30): • As above • Then sequential traversal along the linked leaves
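
A minimal sketch of both kinds of lookup, assuming the hypothetical node layout from the earlier sketch (written here with plain dictionaries so the snippet is self-contained):

```python
# Sketch of B+ tree lookups; the node layout (dicts with "keys", "children",
# "records", "next", "is_leaf") is an assumption, not a fixed API.
from bisect import bisect_right

def find_leaf(node, key):
    """Descend from the root to the leaf whose key range covers `key`."""
    while not node["is_leaf"]:
        i = bisect_right(node["keys"], key)   # child i holds keys[i-1] <= k < keys[i]
        node = node["children"][i]
    return node

def exact_search(root, key):
    leaf = find_leaf(root, key)
    return [r for k, r in zip(leaf["keys"], leaf["records"]) if k == key]

def range_search(root, lo, hi):
    """Find the leaf for `lo`, then walk the leaf-level linked list."""
    leaf, out = find_leaf(root, lo), []
    while leaf is not None:
        for k, r in zip(leaf["keys"], leaf["records"]):
            if k > hi:
                return out
            if k >= lo:
                out.append(r)
        leaf = leaf["next"]
    return out
```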

  13. B+ Trees in Practice • Typical order: 100. Typical fill-factor: 67%. • average fanout = 133 • Typical capacities: • Height 4: 133^4 = 312,900,721 records • Height 3: 133^3 = 2,352,637 records • Can often hold the top levels in the buffer pool: • Level 1 = 1 page = 8 Kbytes • Level 2 = 133 pages = 1 Mbyte • Level 3 = 17,689 pages = 133 MBytes
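
A quick arithmetic check of these capacity figures (a sketch only):

```python
# Capacity and buffer-pool figures implied by a fanout of 133.
fanout = 133

print(fanout ** 3)   # 2,352,637 records reachable with height 3
print(fanout ** 4)   # 312,900,721 records reachable with height 4

for level in (1, 2, 3):
    print(level, fanout ** (level - 1), "pages")   # 1, 133, and 17,689 pages
```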

  14. Insertion in a B+ Tree. Insert (K, P) • Find the leaf where K belongs and insert there • If no overflow (2d keys or less), halt • If overflow (2d+1 keys), split the node and insert the middle key K3 (with a pointer to the new right node) into the parent • If the split node is a leaf, K3 is also kept in the right node • When the root splits, the new root has only 1 key
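
A minimal sketch of the leaf-split step, under the same hypothetical dictionary layout as above (for a leaf split the middle key is copied up; an internal-node split would move it up instead):

```python
# Sketch of splitting a leaf that has overflowed to 2d + 1 keys.
def split_leaf(leaf, d):
    """Return (new_right_sibling, separator_key_to_insert_in_parent)."""
    mid = d   # keep d keys on the left, d + 1 on the right
    right = {
        "is_leaf": True,
        "keys": leaf["keys"][mid:],
        "records": leaf["records"][mid:],
        "next": leaf["next"],
    }
    leaf["keys"], leaf["records"] = leaf["keys"][:mid], leaf["records"][:mid]
    leaf["next"] = right
    # The separator (K3 on the slide) stays in the right leaf *and* is copied
    # into the parent; the caller inserts it there and splits recursively
    # if the parent overflows in turn.
    return right, right["keys"][0]
```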

  15. Insertion in a B+ Tree. Insert K=19. [Diagram: the example tree with keys 10 15 18 20 30 40 50 60 65 80 85 90]

  16. Insertion in a B+ Tree. After insertion. [Diagram: keys are now 10 15 18 19 20 30 40 50 60 65 80 85 90]

  17. Insertion in a B+ Tree. Now insert 25. [Diagram: keys 10 15 18 19 20 30 40 50 60 65 80 85 90]

  18. Insertion in a B+ Tree. After insertion. [Diagram: keys are now 10 15 18 19 20 25 30 40 50 60 65 80 85 90]

  19. Insertion in a B+ Tree. But now we have to split, since one leaf holds 2d+1 keys! [Diagram: keys 10 15 18 19 20 25 30 40 50 60 65 80 85 90]

  20. Insertion in a B+ Tree. After the split. [Diagram: the same keys redistributed over an additional leaf]

  21. Deletion from a B+ Tree. Delete 30. [Diagram: keys 10 15 18 19 20 25 30 40 50 60 65 80 85 90]

  22. Deletion from a B+ Tree. After deleting 30; the separator key in the parent may be changed to 40, or not. [Diagram: keys 10 15 18 19 20 25 40 50 60 65 80 85 90]

  23. Deletion from a B+ Tree. Now delete 25. [Diagram: keys 10 15 18 19 20 25 40 50 60 65 80 85 90]

  24. Deletion from a B+ Tree. After deleting 25 the leaf underflows; we need to rebalance by rotating a key from a sibling. [Diagram: keys 10 15 18 19 20 40 50 60 65 80 85 90]

  25. Deletion from a B+ Tree. Now delete 40. [Diagram: keys 10 15 18 19 20 40 50 60 65 80 85 90]

  26. Deletion from a B+ Tree. After deleting 40 rotation is not possible; we need to merge nodes. [Diagram: keys 10 15 18 19 20 50 60 65 80 85 90]

  27. Deletion from a B+ Tree. Final tree. [Diagram: keys 10 15 18 19 20 50 60 65 80 85 90]

  28. B+ Tree Summary • B+ trees and other indices are ideal for range searches and good for equality searches • Inserts/deletes leave the tree height-balanced; log_F N cost • High fanout (F) means depth is rarely more than 3 or 4 • Almost always better than maintaining a sorted file • Typically, 67% occupancy on average • Note: the order (d) concept is replaced by a physical space criterion in practice (“at least half-full”) • Records may be variable sized • Index pages typically hold more entries than leaves

  29. Indexing One-Dimensional Data • Trees vs. hash tables • Trees • great for equality & range queries • Hash tables • faster for equality queries • cannot support range queries • Many commercial systems support only trees • though for certain contexts (e.g., index nested loop joins), hash-based index may be much better

  30. Multi-Dimensional Data • Geographic information systems • partial match queries, range, nearest-neighbor, where-am-I queries • Data cube • One-dimensional indexes offer a limited view and often do not work well here • This led to the development of multi-dimensional indexes

  31. Multi-Dimensional Indexes • Hash-table-like • grid files, partitioned hash functions • limit search to a subset of the buckets • Tree-like • kd trees, quad trees (for set of points), R-trees (for regions)

  32. R-Trees • R trees • key = region, each node = a set of regions (actually a smallish region subsuming all such regions) • B+ trees • key = point, each node = a set of points

  33. The R-Tree • The R-tree is a tree-structured index that remains balanced on inserts and deletes. • Each key stored in a leaf entry is intuitively a box, or collection of intervals, with one interval per dimension. • Example in 2-D: [Diagram: the root of an R-tree and its leaf level drawn over the X-Y plane]

  34. R-Tree Properties • Leaf entry = < n-dimensional box, rid > • Box is the tightest bounding box for a data object. • Non-leaf entry = < n-dim box, ptr to child node > • Box covers all boxes in child node (in fact, subtree) • All leaves at same distance from root (height balanced) • Nodes can be kept 50% full (except root) • Can choose a parameter m that is <= 50%, and ensure that every node is at least m% full
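
As an illustration only (not from the slides), the two kinds of entries might be laid out as follows; boxes are assumed to be (xlo, ylo, xhi, yhi) tuples:

```python
# Hypothetical 2-D R-tree node layout; names and tuple format are assumptions.
leaf_node = {
    "is_leaf": True,
    # Leaf entry = <n-dimensional box, rid>; the box tightly bounds one object.
    "entries": [((0.0, 0.0, 2.0, 1.0), "rid-17"),
                ((3.0, 1.0, 5.0, 4.0), "rid-42")],
}

inner_node = {
    "is_leaf": False,
    # Non-leaf entry = <n-dim box, pointer to child>; the box covers the subtree.
    "entries": [((0.0, 0.0, 5.0, 4.0), leaf_node)],
}
```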

  35. Example of an R-Tree. [Diagram: nested bounding boxes R1 through R19; leaf entries are the tight boxes around the spatial objects, index entries are the larger boxes that contain them]

  36. Example R-Tree (Contd.) [Diagram: the same example drawn as a tree of nodes over R1 through R19]

  37. Search for Objects Overlapping Box Q Start at the root. 1. If the current node is a non-leaf, for each entry <E, ptr>, if box E overlaps Q, search the subtree identified by ptr 2. If the current node is a leaf, for each entry <E, rid>, if E overlaps Q, rid identifies an object that might overlap Q Note: may have to search several subtrees at each node! (In contrast, a B-tree equality search goes to just one leaf.)
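
A minimal sketch of this search, reusing the hypothetical node layout above (results may contain false positives because bounding boxes only approximate the objects):

```python
# Sketch of R-tree search for all objects whose bounding box overlaps query box q.
def overlaps(a, b):
    """Axis-aligned boxes (xlo, ylo, xhi, yhi); True if they intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def search(node, q, results):
    for box, item in node["entries"]:
        if not overlaps(box, q):
            continue
        if node["is_leaf"]:
            results.append(item)          # candidate rid; the object itself may still miss q
        else:
            search(item, q, results)      # unlike a B-tree, several children may qualify
    return results
```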

  38. Insert Entry <B, ptr> • Start at root and go down to “best-fit” leaf L • Go to child whose box needs least enlargement to cover B; resolve ties by going to smallest area child • This corresponds to the “penalty” in GiST • If best-fit leaf L has space, insert entry and stop • Otherwise, split L into L1 and L2 • Adjust entry for L in its parent so that the box now covers (only) L1 • Add an entry (in the parent node of L) for L2 • (This could cause the parent node to recursively split)
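
A sketch of the "least enlargement, then smallest area" choice used while descending to a best-fit leaf (the box representation follows the earlier sketch; function names are made up):

```python
# Sketch of choosing the child whose box needs the least enlargement to cover box b.
def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])

def enlarge(box, b):
    """Smallest box covering both `box` and `b`."""
    return (min(box[0], b[0]), min(box[1], b[1]),
            max(box[2], b[2]), max(box[3], b[3]))

def choose_child(node, b):
    """Least enlargement wins; ties are broken by smallest current area."""
    def cost(entry):
        box, _child = entry
        return (area(enlarge(box, b)) - area(box), area(box))
    return min(node["entries"], key=cost)
```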

  39. Splitting a Node During Insertion • The entries in node L plus the newly inserted entry must be distributed between L1 and L2 • Goal is to reduce the likelihood of both L1 and L2 being searched on subsequent queries • Redistribute to minimize the area of L1 plus the area of L2 • The exhaustive algorithm is too slow; quadratic and linear heuristics are used in practice. [Diagram: a good split producing small, barely overlapping boxes vs. a bad split producing large, overlapping ones]

  40. R-Tree Variants • The R* tree uses forced reinserts to reduce overlap in tree nodes • When a node overflows, instead of splitting: • Remove some (say, 30% of the) entries and reinsert them into the tree • Could result in all reinserted entries fitting on existing pages, avoiding a split • R* trees also use a different heuristic, minimizing box perimeters rather than box areas during insertion • Another variant, the R+ tree, avoids overlap by inserting an object into multiple leaves if necessary • Searches now take a single path to a leaf, at the cost of redundancy

  41. R Trees in Practice • One of many (dozens of!) multidimensional index structures • Perhaps the most popular, though not as ubiquitous as the B+ Tree • Implemented in a few DBMSs, such as Oracle, Illustra (Stonebraker startup, acquired by Informix, then IBM) and MySQL

  42. Generalizing B+ Trees and R Trees: GiST • Fundamental goal: can we create an index creation toolkit that can be customized for any kind of index we’d like? • Doesn’t quite get there – but a nice framework for understanding the concepts • Clearly, there seem to be some aspects of the B+ Tree and R Tree that can be drawn out: • Height balanced – requires re-organization • Data on leaves; intermediate nodes help focus on a child • High fan-out • Used in PostgreSQL, SHORE

  43. Key Ideas of GiST • What *really* is a search tree? • a hierarchy of partitions of a data set • each partition is characterized by, say, a "categorization" that is true for all data in the partition • e.g., elected-official(i) AND asian-american(i) • to search on a predicate q (e.g., a range) • the user provides a method to tell, for each "categorization", whether it is consistent with q • the user also controls partition creation with a node-splitting method

  44. Similarities and Differences • Key differences between B+ Trees and R Trees: • B+ Tree: • one-dimensional • full ordering among data • R Tree: • many-dimensional • no complete ordering among the data • means that intermediate nodes’ children may fit in more than one place • also, intermediate node’s bounding box may have lots of space that’s not occupied by child nodes (relates to “curse of dimensionality”) • needs to rely on heuristics about where to insert a node • may need to search many possible paths along the tree

  45. Basic Operations that Need Customizing for Search • Contains(node, predicate) • Returns FALSE only if the node or its children can’t contain the predicate • Can return false positives • What does this correspond to in a B+ Tree? An R Tree? • What additional work needs to be done if contains() returns true?

  46. Insertion – Simple Case • Union(entrySet) returns predicate • Returns a predicate (e.g., bounding box) that holds over a set (e.g., of children) – basically a disjunction • Compress(entry) returns entry’ • Simplifies the predicate of the entry, potentially increasing the chance of false positives • Decompress(entry) returns entry’ • The inverse of the above – may have false positives • Penalty(entry1, entry2) • Gives the cost of inserting entry2 into entry1
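
As one possible illustration (assumed, not taken from the slides), here is what these methods might look like when the "predicate" is an axis-aligned bounding box, which essentially recovers R-tree behaviour inside the GiST framework:

```python
# Hypothetical GiST extension methods for bounding-box predicates.
import math

def union(boxes):
    """Disjunction of a set of children: the tightest box covering them all."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

def compress(box, grid=1.0):
    """Round outward to a coarse grid: cheaper to store, may add false positives."""
    return (math.floor(box[0] / grid) * grid, math.floor(box[1] / grid) * grid,
            math.ceil(box[2] / grid) * grid, math.ceil(box[3] / grid) * grid)

def decompress(box):
    """Inverse of compress; the rounding was lossy, so false positives remain."""
    return box

def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])

def penalty(entry1, entry2):
    """Cost of inserting entry2 under entry1: how much entry1's box must grow."""
    return area(union([entry1, entry2])) - area(entry1)
```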

  47. Insertion – Splitting • PickSplit(entrySet) returns <entrySet1, entrySet2> • Splits the set of children in an intermediate node into two sets • Typically split according to a “badness metric”

  48. Basic Routines • How do we search for a node set satisfying a predicate? • Search(node, predicate) • For every nonleaf, if Consistent, call recursively on children; return union of result sets from children • For leaf, if Consistent, return singleton set, else empty set • Ordered domains: FindMin(root, predicate), Next(root, predicate, current) • Used to do a linear scan of the data at the leaves, if ordered
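
A minimal sketch of the generic search, parameterised by a user-supplied consistent(predicate, query) method (node layout and names are assumptions):

```python
# Sketch of GiST search: Consistent() prunes subtrees; false positives are allowed.
def gist_search(node, query, consistent):
    if node["is_leaf"]:
        return [item for pred, item in node["entries"] if consistent(pred, query)]
    results = []
    for pred, child in node["entries"]:
        if consistent(pred, query):        # may hold for several children at once
            results.extend(gist_search(child, query, consistent))
    return results
```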

  49. Basic Routines II • How do we insert a node? • Insert(node, new, level) • L = ChooseSubtree(node, new, level) • If room in L, insert new as a child, else invoke Split(node, L, new) • AdjustKeys(node, L) • ChooseSubtree(node, new, level) returns node at level • Recursively descend tree, minimizing Penalty • If at desired level, return node • Among child entries in node, find one with minimal Penalty, return ChooseSubtree on that child

  50. Helper Functions • Split(root, node, new) • Invoke PickSplit on the union of the children of node and new • Put one partition into node and create a new node' with the remainder • Insert an entry for node' into Parent(node); if there is insufficient space, split the parent as well • Modify the predicate describing node • AdjustKeys(root, node) • If node = root, or the predicate referring to node is already correct, return • Otherwise: • modify the predicate for node to contain the union of all keys of node • recursively call AdjustKeys(root, parent(node))
