1 / 39

Indexes on Sequential Files

Indexes on Sequential Files. Source: our textbook, slides by Hector Garcia-Molina. How to Represent a Relation. Suppose we scatter its records arbitrarily among the blocks of the disk How to answer SELECT * FROM R? Scan every block: ridiculously slow

cforsythe
Download Presentation

Indexes on Sequential Files

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Indexes on Sequential Files Source: our textbook, slides by Hector Garcia-Molina

  2. How to Represent a Relation • Suppose we scatter its records arbitrarily among the blocks of the disk • How to answer SELECT * FROM R? • Scan every block: • ridiculously slow • would require lots of overhead info in each block and each record header

  3. How to Represent a Relation • Reserve some blocks for the relation • No need to scan entire disk • How to answer SELECT * FROM R WHERE cond ? • Scan all the records in the reserved blocks • Still ridiculously slow

  4. Indexes • Use indexes -- special data structures -- that allow us to find all the records that satisfy a condition "efficiently" • Possible data structures: • simple indexes on sorted files • secondary indexes on unsorted files • B-trees • hash tables

  5. Sorted Files • Sorted file: records (tuples) of the file (relation) are in sorted order of the field (attribute) of interest. • This field might or might not be a key of the relation. • This field is called the search key. • A sorted file is also called a sequential file.

  6. Index on Sequential File • An index is another file containing key-pointer pairs of the form (K,a) • K is a search key • a is an address (pointer) • The record at address a has search key K • Particularly useful when the search key is the primary key of the relation

  7. Dense Indexes • An index with one entry for every key in the data file • What's the point? • Index is much smaller than data file when record contains much more than just the search key • If index is small enough to fit in main memory, record with a certain search key can be found quickly: binary search in memory, followed by only one disk I/O

  8. Sequential File Dense Index 90 70 50 30 10 110 120 40 20 80 100 60 70 50 30 10 90 20 100 80 60 40 Example of a Dense Index

  9. Some Numbers • relation with 1,000,000 tuples • block size is 4096 bytes • 10 records per block • thus 100,000 blocks, > 400 Mbytes • key field is 30 bytes • pointer is 8 bytes • thus at least 100 key-pointer pairs per block • thus dense index size is 10,000 blocks, about 40 Mbytes • since log(10,000) = 13, takes at most 14 disk I/O's for a search

  10. Sparse Index • Uses less space than a dense index • Requires more time to find a record with a given key • In a sparse index, there is just one (key,pointer) pair per data block. • The key is for the first record in the block.

  11. Sequential File Sparse Index 170 130 90 50 10 210 230 70 30 150 190 110 70 50 30 10 90 20 100 80 60 40 Sparse Index Example

  12. Using a Sparse Index • To find the record with key K, search the index for the largest key ≤ K • Use binary search to do this • Retrieve the indicated data block • Search the block for the record with key K

  13. Comparing Sparse and Dense Indexes • Sparse index uses much less space • In the previous numeric example, sparse index size is now only 1000 index blocks, about 4 Mbytes • Dense index, unlike sparse, lets us answer "is there a record with key K?" without having to retrieve a data block

  14. Multiple Levels of Index • Make an index for the index • Can continue this idea for more levels, but usually only two levels in practice • Second and higher level indexes must be sparse, otherwise no savings

  15. Sequential File Sparse 2nd level 330 170 10 90 130 10 50 490 210 170 70 110 30 570 230 410 150 250 90 190 90 70 50 30 10 100 80 60 40 20 Two-Level Index Example

  16. Numeric Example Again • Suppose we put a second-level index on the first-level sparse index • Since first-level index uses 1000 blocks and 100 key-pointer pairs fit per block, we need 10 blocks for second-level index • Very likely to keep the second-level index in memory • Thus search requires at most two disk I/O's (one for block of first-level index, one for data block)

  17. Duplicate Search Keys • What if more than one record has a given search key value? (Then the search key is not a key of the relation.) • Solution 1: Use a dense index and allow duplicate search keys in it. • To find all data records with search key K, follow all the pointers in the index with search key K

  18. 10 10 10 20 20 30 40 40 10 30 10 30 30 45 20 30 10 20 30 45 10 10 10 10 10 10 20 20 20 20 30 30 30 30 30 30 Solution 1 Example

  19. Duplicate Search Keys with Dense Index • Solution 2: only keep record in index for first data record with each search key value (saves some space in the index) • To find all data records with search key K, follow the one pointer in the index and then move forward in the data file

  20. 10 10 20 30 40 10 20 30 30 45 10 20 30 40 Solution 2 Example

  21. Duplicate Search Keys with Sparse Index • Recall that index has an entry for just the first data record in each block • To find all data records with key K: • find last entry (E1) in index with key ≤ K • move toward front of index until reaching entry (E2) with key < K • Check data blocks pointed to by entries from E2 to E1 for records with search key K

  22. 10 10 20 30 40 10 20 30 30 45 10 10 20 careful if looking for 20 or 30! 30 Dupl. Keys w/ Sparse Index

  23. Variation on Previous Scheme • Index entry for a data block holds smallest search key that is new (did not appear in a previous block) • If there is no new search key in that block, then index entry holds the lone search key in the block • To find all data record with key K: • search index for first entry whose key is either K, or < K but next key is > K • if a record with key K is in that block then scan forward from there

  24. 10 10 20 30 40 10 20 30 30 45 should this be 40? 10 20 30 30 Variation Example

  25. Inserting and Deleting Data Recall three main techniques: • create/delete overflow blocks • overflow blocks do not have entries in a sparse index • may be able to insert new blocks in sequential order • new block needs an entry in a sparse index • changing an index can create same problems • make room in a full block by sliding some data to an adjacent block; combine adjacent blocks if they get too empty

  26. General Strategy • When data file changes, index must adapt • Details depend on whether index is sparse or dense and how data file modifications are implemented • Index file is itself sequential, so same strategies as for modifying data files can be applied to index files

  27. Effects of Actions on Index

  28. Explanations for Actions • create/destroy empty overflow block has no effect on • dense index since it refers to records • sparse index since it refers to main records • create/destroy empty main block: • no effect on dense index as above • insert/delete entry in sparse index • insert/delete/slide record: • insert/delete/update entry in dense index • only change sparse index if affects first record in block

  29. 70 10 30 50 20 40 60 80 Deletion from sparse index 10 30 50 70 90 110 130 150

  30. 70 10 30 50 20 40 60 80 Deletion from sparse index • delete record 40 10 30 50 70 90 110 130 150

  31. 10 30 50 70 20 40 60 80 40 40 Deletion from sparse index • delete record 30 10 30 50 70 90 110 130 150

  32. 10 30 50 70 20 40 60 80 50 70 Deletion from sparse index • delete records 30 & 40 10 30 50 70 90 110 130 150

  33. 70 10 30 50 20 40 60 80 Deletion from dense index 10 20 30 40 50 60 70 80

  34. 10 70 50 20 60 80 40 40 Deletion from dense index • delete record 30 10 20 30 30 40 40 50 60 70 80

  35. 10 30 40 60 20 50 Insertion, sparse index case 10 30 40 60

  36. 10 30 40 60 20 50 34 • our lucky day! • we have free space • where we need it! Insertion, sparse index case • insert record 34 10 30 40 60

  37. 10 30 40 60 20 50 15 20 20 30 Insertion, sparse index case • insert record 15 10 30 40 60 • Illustrated: Immediate reorganization • Variation: • insert new block (chained file) • update index

  38. 10 30 40 60 25 20 50 overflow blocks (reorganize later...) Insertion, sparse index case • insert record 25 10 30 40 60

  39. Insertion, dense index case • Similar • Often more expensive . . .

More Related