1 / 11

Indexing

Indexing. Lecture 15. HW#3 & Project. See course page for new instructions: submit source code and output of program on the given pairs of actors Can use unix machines (Makefile provided) Filenames/locations have changed Look at oracle info on course web page

beau
Download Presentation

Indexing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Indexing Lecture 15

  2. HW#3 & Project • See course page for new instructions: submit source code and output of program on the given pairs of actors • Can use unix machines (Makefile provided) • Filenames/locations have changed • Look at oracle info on course web page • I also have CD of oracle client • PROJECT: please form teams of 4 students max. You may choose your partners, else random teams will be formed. • Email TA your group members • Deadline: Wednesday 20th March

  3. Data Organization & Indexing • Constraints imposed by disk architecture • Seek time, rotational latency, transfer time • Basic unit of data transfer: a page or block • Comparison: Heap versus sorted files • Heap files: simply append at end • Sorted files: keep records sorted on some attribute • Need overflow pages when a page is full • Also use fill-factor to control initial level of filling • Indexing to speedup equality or range searches on “search key” • Integrated storage or separate index file and data file • Clustered or unclustered index • Dense or sparse index

  4. Multiple Attribute Search Key • CREATE INDEX Inx ON Tbl (att1, att2) • Search key is a sequence of attributes; index entries are lexically ordered • Supports finer granularity equality search: • Find row with value (A1, A2) • Supports range search: • Find rows with values between (A1, A2) and (A1’, A2’) • Supports partial key searches: • Find rows with values of att1 between A1 and A1’ • But not Find rows with values of att2 between A2 and A2’

  5. Two-Level Index - Separator level is a sparse index over pages of index entries - Leaf level contains index entries - Index entries might contain rows (integrated storage structure) or pointers to a separate storage structure (data file) in which case the index might be unclustered - Cost of searching the separator level << cost of searching leaf level since separator level is sparse - Cost (Q leaf pages, 100 entries/index page) = log2(Q/100)+1

  6. Multilevel Index - Search cost = number of levels in tree - If  is the fanout of a separator page, cost is logQ + 1 - Example: if  = 100 and Q = 10,000, cost = 3 (reduced to 2 if root is kept in main memory)

  7. ISAM: Indexed Sequential Access Method • ISAM is a static multilevel index • Built once at creation time and not modified thereafter • Not suitable for dynamic insertions and deletions • Inserts handled by overflow pages

  8. B+ Tree • Supports equality and range searches, multiple attribute keys and partial key searches • Either a separate index or the basis for a storage structure • Responds to dynamic changes in the table

  9. B+ Tree Structure - Leaf level is a (sorted) linked list of index entries - Sibling pointers support range searches in spite of allocation and deallocation of leaf pages

  10. Insertion and Deletion in B+ Tree • Structure of tree changes to handle row insertion and deletion - no overflow chains • Tree remains balanced: all paths from root to index entries have same length • Algorithm guarantees that the number of separator entries in an index page is between /2 and  ( is called order) • Hence the maximum search cost is log/2Q + 1 (with ISAM search cost depends on length of overflow chain)

  11. Handling Deletions • Deletion can cause page to have fewer than/2 entries • Entries can be redistributed over adjacent pages to maintain minimum occupancy requirement • Ultimately, adjacent pages must be merged, and if merge propagates up the tree, height might be reduced • In practice, tables generally grow, and merge algorithm is often not implemented • Reconstruct tree to compact it

More Related