1 / 37

CSE 480: Database Systems

CSE 480: Database Systems. Lecture 20: Indexing Structures. Index. Mechanism to efficiently locate row(s) of a table without having to scan the entire table Analogous to a book index. Index entry. Indexing Field. An index is built based on an indexing field

jhemphill
Download Presentation

CSE 480: Database Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CSE 480: Database Systems • Lecture 20: Indexing Structures

  2. Index • Mechanism to efficiently locate row(s) of a table without having to scan the entire table • Analogous to a book index Index entry

  3. Indexing Field • An index is built based on an indexing field • Records having a particular value for their indexing field can be quickly located • The index can be built on one or more fields • Example: index on (CrsCode, Semester) • An index contains entries <field value, pointer to record>, ordered by field value • The index file occupies considerably less disk blocks than the data file because its entries are much smaller than the records themselves • A binary search on the index yields a pointer to the file record

  4. Example of an Index

  5. Types of Indexes • Dense vs sparse • Primary vs clustering vs secondary • Single-level vs multi-level • Static vs dynamic

  6. Types of Indexes • Dense index • has an index entry for each record in the data file. • Sparse (or nondense) index • has index entries for only a subset of the records in the data file

  7. Types of Indexes • Primary vs Clustering vs Secondary Index • Depends on the indexing and ordering fields • Indexing field is the field upon which an index is built • Ordering field is the field upon which the data file is sorted • Ordering key field is an ordering field that also corresponds to the key of the table Indexing field is ProfName Ordering field is ProfID ProfID is also the ordering key field (because it is a key)

  8. Types of Indexes • Primary Index • Specified on the ordering key fieldof an ordered file of records • Nondense (sparse) • One index entry for each block in the data file • The index entry has the key field value for the first record in the block, which is called the block anchor

  9. Primary Index Example

  10. Types of Indexes Clustering Index Specified on the ordering field that is not a key field Nondense (sparse) One index entry for each distinct value of the field; The index entry points to the first data block that contains records with that field value

  11. Clustering Index Example

  12. Types of Indexes • Secondary Index • Index defined on some non-ordering fieldof the data file • Dense • Includes one entry for each record in the data file • The index entry points to either a block or a record

  13. Secondary Index Example

  14. Example • EMPLOYEE(NAME, SSN, ADDRESS, JOB, SALARY, ... ) • Record size R=150 bytes • Block size B=512 bytes • No of records r = 30000 records • Blocking factor • Bfr = B  R = 512  150 = 3 records/block • Number of blocks needed to store the records • b = r/Bfr = 30000/3 = 10000 blocks

  15. Example • Suppose we need to perform the following query: SELECT * FROM EMPLOYEE WHERE SSN = ‘1234567890’; • Heap file (unsorted): • Average cost of linear search (assuming SSN exists): • (b/2) = 10000/2 = 5000 block accesses • Sequential file (sorted on SSN) • Average cost of binary search: • log2 b = log2 10000 = 14 block accesses

  16. Example • Suppose there is a (sparse) primaryindex on the SSN field • Field size for SSN, V = 10 bytes, • Record pointer size PR = 4 bytes • Index entry size • Ri = (V + PR) = (10 + 4) = 14 bytes • Index blocking factor • Bfri = B  Ri  = 512  14 = 36 entries/block • Number of index blocks • bi = b/Bfri = 10000/36 = 278 blocks • Binary search on index takes log2 bi = log2 278 = 9 block accesses Total cost = 10 block accesses (1 more to access the record itself)

  17. Example • Suppose there is a (dense) secondary index on the SSN field • Field size for SSN, V = 10 bytes, • Record pointer size PR = 4 bytes • Index entry size • Ri = (V + PR) = (10 + 4) = 14 bytes • Index blocking factor • Bfri = B  Ri  = 512  14 = 36 entries/block • Number of index blocks • bi = r/Bfri = 30000/36 = 834 blocks • Binary search needs log2 bi = log2 834 = 10 block accesses Total cost = 11 block accesses (1 more to access the record itself)

  18. Multi-Level Indexes • Because a single-level index is an ordered file, we can create an index to the index itself • In this case, the original index file is called the first-level index and the index to the index is called the second-level index • We can repeat the process, creating a third, fourth, ..., top level until all entries of the top level fit in one disk block • A multi-level index can be created for any type of first-level index (primary, secondary, clustering) as long as the first-level index has more than one disk block

  19. Example: Two-level Primary Index

  20. Example • Suppose there is a 2-level index on the SSN field (assume index at the first level is sparse, i.e., a primary index) • Field size for SSN, V = 10 bytes, • Record pointer size PR = 4 bytes • Index entry size • Ri = (V + PR) = (10 + 4) = 14 bytes • Index blocking factor • Bfri = B / Ri  = 512  14 = 36 entries/block • Number of index blocks at 1st level • b1 = b/Bfri = 10000/36 = 278 blocks • Number of index blocks at 2nd level • b2 = b1/Bfri = 278/36 = 8 blocks • Binary search needs log2 bi = log2 8 = 3 block accesses Total cost = 3 + 1 + 1 block accesses = 5 block accesses (2nd level) + (1st level) + (data block)

  21. Example • Suppose there is a 3-level index on the SSN field (assume index at first level is sparse) • Field size for SSN, V = 10 bytes, • Record pointer size PR = 4 bytes • Index entry size • Ri = (V + PR) = (10 + 4) = 14 bytes • Index blocking factor • Bfri = B  Ri  = 512  14 = 36 entries/block • Number of index blocks at 1st level • b1 = b/Bfri = 10000/36 = 278 blocks • Number of index blocks at 2nd level • b2 =  b1/Bfri = 278/36 = 8 blocks • Number of index blocks at 3rd level • b3 =  b2/Bfri = 8/32 = 1 block • Total cost = 1 + 1 + 1 + 1 block accesses = 4 block accesses (3rd level) + (2nd level) + (1st level) + (data block)

  22. Dynamic Multilevel Indexes • Insertion and deletion from multilevel indexes can be quite a severe problem • One possibility is to use overflow blocks and then do file reorganization periodically • A better strategy is to use dynamic multi-level indexes • Examples: B-tree and B+-tree (they are called search trees) • In B-Tree and B+-Tree, each node corresponds to a disk block • Each node is always kept at least half-full

  23. A Node in a Search Tree • Each node is stored in a disk block • Within each node: K1 < K2 < … < Kq-1 • For all values X in subtree pointed by Pi: • Ki-1 < X < Ki (B-tree) or Ki-1 X < Ki (B+-tree) A node in the search tree (B-tree or B+-tree)

  24. Search tree • A search tree of order p • each node contains at most p – 1 search values

  25. Nodes of a B+-tree • Difference between an internal node and a leaf node

  26. Example of a B+-tree Tree node pointer Data/record pointer Sibling pointer

  27. Example: Search for 9

  28. Largest value among those stored in the left subtree Example of Insertion in a B+-tree Insertion sequence: 8 5 1 7 3 12 9 6 ?

  29. Example of Insertion in a B+-tree Insertion sequence: 8 5 1 7 3 12 9 6 ?

  30. ? ? ? Example of Insertion in a B+-tree Insertion sequence: 8 5 1 7 3 12 9 6

  31. Example of Insertion in a B+-tree Insertion sequence: 8 5 1 7 312 9 6

  32. Example of Insertion in a B+-tree Insertion sequence: 8 5 1 7 312 9 6

  33. Example of Deletion in a B+-tree

  34. Example of Deletion in a B+-tree Deletion sequence: 5 12 9

  35. Underflow in a B+-tree • Underflow • when the number of entries in a node is below the minimum required • Redistribute entries with the left sibling or • This changes the search field values at higher levels of the tree • Redistribute entries with the right sibling or • Merge the 3 nodes into 2 nodes and redistribute the entries

  36. Example of Deletion in a B+-tree Deletion sequence: 5 12 9

  37. Example of Deletion in a B+-tree Final tree (after deletion)

More Related