1 / 44

Chapter 7 Indexing

File Structures by Folk, Zoellick, and Ricarrdi. Chapter 7 Indexing. Chapter Objectives. Index files. Operations Required to Maintain an Index File. Primary keys. Secondary keys. 7.1 What is an Index. Index. a tool for finding records in a file consists of: key field

msargent
Download Presentation

Chapter 7 Indexing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. File Structures by Folk, Zoellick, and Ricarrdi Chapter 7Indexing

  2. Chapter Objectives • Index files. • Operations Required to Maintain an Index File. • Primary keys. • Secondary keys.

  3. 7.1 What is an Index Index • a tool for finding records in a file • consists of: • key field • field on which the index is searched • reference (address or RRN) field • tells where to find the data file record associated with a particular key.

  4. Examples of an Index • book index • usually at the end of the book • arranged alphabetically by topic • The index in a library (an on-line catalog) allows you to locate items by an author, by a title, or by a call number. • photo thumbnails • usually represents a link to the actual photo • muchsmaller file, can be loaded quickly • actual photo takes much longer to load • if index was actual photos, would take long to load

  5. Book Index

  6. Example: Index in Databases • University uses an index file to keep track of its courses. • The data file consists of the following fields in each record: • Department • Title • Professor • Student List • Room & Time

  7. 7.2 A Simple Index Example: Primary key • Department • not specific enough • Course Number • not unique • Professor • not unique • Room & Time • possible • classes aren’t identified this way • Department + Course Number -> Obvious?

  8. Index file • It is used to provide rapid access to individual records in the data file via the keys • Example index file consists of the following fields: • key (e.g. CIS402) • reference (address) =address of the corresponding record in the data file

  9. 7.1 What Is an Index? Index File k1 k2 k4 k5 k7 k9 k1 k2 k4 k5 k7 k9 AAA ZZZ CCC XXX EEE FFF Data File Primary Index

  10. 7.2 A Simple Index for E-S Files Index Class Interface Class TextIndex{ public: TextIndex(int maxKeys = 100, int unique = 1); int Insert(const char*ckey, int recAddr); //add to index int Remove(const char* key); //remove key from index int Search(const char* key) const; //search for key, return recAddr void Print (ostream &) const; protected: int MaxKeys; // maximum num of entries int NumKeys;// actual num of entries char **Keys; // array of key values int* RecAddrs; // array of record references int Find (const chat* key) const; int Init (int maxKeys, int unique); int Unique;// if true --> each key must be unique }

  11. Operations on an Indexed File • Create (when data file is created) • Load into memory (whole file, if possible and prudent) • Write updated file to permanent storage • Record(s) added to data file • Record(s) deleted from data file • Update record(s) in data file • Searches

  12. Creating Files of Data Load Index • Create files • index file • data record file • via • buffer I/O • an array. Writing Back Index File • Can be part of the close operation for the index file • close function in index object can write the buffer/array to the disk before closing file

  13. Record addition • Adding a new data record to the data file requires adding a new record to the index file • If the index file is sorted: • adding a new record may require rearranging the records in this file. • depends upon index file representation in memory • if sort necessary, easily done if the indices are in main memory

  14. Record deletion • Deletion of a data record requires deletion of the corresponding index record. • Can space in data file be reclaimed? • Difficult, as with index file organization all data records are pinned • a pinned data record is one that has a reference to its address in an index file • Other consequences • Resorting difficulty • Solution: Sort the file via the indices

  15. Record Updating • Two categories of updates: • modification of key value • re-ordering of the index file might be required • two possible situations • modifying key reorders file • see below • modification of non-key value • might still require reordering of records in the data file. (WHY?) • size of data record might increase, requiring moving it to space that can hold it • must reset index for that record

  16. 7.5 Indexes That Are Too Large to Hold in Memory Indexes too large for Memory • kept on the secondary storage • disadvantages • timeconsumption • searching the index file • requires disk accesses instead of just memory accesses • rearranging indexes • requires disk accesses

  17. Solutions to Index Files in 2ndary storage • If the index file is too large to be kept in main memory than the following alternative organizations should be considered: • a hashed organization (if access speed is very important) • a tree structured organization, or a multilevel index such as a B-tree

  18. Pros of a simple index file • allows for use of binary search • sorting and maintaining an index is much easier than for a data file • true if index entries are much smaller than data records, • if data records are pinned, can rearrange keys without moving data records • apply them to multiple simple indexes...

  19. 7.6 Indexing to Provide Access by Multiple Keys Indexing with Multiple Key Access • unique primary key often used as a search keyword. • Example primary key • CS215 • What if you’d like to include the prof in the search? • Two keys: Course & Prof • Could also be: • Course & Time • Location & Time (?)

  20. Secondary key • A secondary key is a key for which multiple records may exist in the data file. • Example: • Sorting an Excel sheet using two fields (e.g. name & section) • A professor teaches more than one class

  21. Secondary Index File • create for the possible secondary indexes. • secondary keys can be shared • primary keys were unique • Example: • Professor El-Ramly secondary keys: • Primary keys containing this prof: • CS352 • CS215 • Can access those courses via the secondary key • What if course has multiple sections?

  22. Record Addition • Adding a record to the data file likely requires adding a record to the secondary index file. • Costs are similar to the cost of adding a record in the primary index file. • records might have to be shifted • indexes may have to be rearranged

  23. Record Deletion • must remove all references to that record in the file system. • search for primary key in primary index file • remove index • search in secondary index file • for the primary key of the record to be deleted • remove index from the secondary index file. • what if secondary keys are maintained? • secondary key refers to primary key • primary key will have been deleted, and will not exist • if we consider this possibility, don’t have to delete secondary key • pitfalls?

  24. Record Updating • There are three possible situations: • secondary key altered • may have to rearrange the secondary key index so it stays in sorted order • primary key altered • big impact on the primary key index • in the secondary key index only need to update the affected primary key field • confined to non-key fields • all updates that do not affect either the primary or secondary key fields do not affect the secondary key index, even if the update is substantial. • recall, can affect primary index, since that refers to location in data file

  25. Retrieving Data with Multiple Secondary Keys • Example: • All courses taught by Spiegel or Gordon • Requires two searches • searches produce a list of courses by providing primary keys. • Spiegel: CIS136, CIS235, CIS402 • Gordon: CIS425, CIS520, CIS243

  26. Boolean AND in searches • Example: • Search for courses: • taught by Spiegel • located in Lytle Hall • Courses found are in intersection of • courses taught by Spiegel • courses offered in Lytle Hall

  27. Boolean OR searches • Example: • Search for courses: • taught by Spiegel • located in Lytle Hall • Courses found are in union of • courses taught by Spiegel • courses offered in Lytle Hall

  28. 7.8 Improving the Secondary Index Structure Cons of the Current Secondary Index Structure • index file has to be rearranged every time a newrecord is added to the file. • for duplicate secondary keys, secondary key field is repeated for each entry.

  29. 7.8 Improving the Secondary Index Structure Cons of the Current Secondary Index Structure • Solution A: by an array of references • Solution B: by linking the list of references

  30. Improvements to the secondary index key structure • Solution 1 • Allow for multiple primary keys to be associated with a single secondary key by allocating a primary key list (STL vector is best; why?) for each secondary key entry. • Solves the problem of sorting each time when an new entry is added. • According to text: Suffers from internal fragmentation due to fixed nature of list, and the number of allocated entries in the array may prove too small. • STL (or Java) vector fixes this: How?

  31. Revised composer index Secondary key Set of primary key references BEETHOVEN ANG3795 DG139201 DG18807 RCA2626 COREA WAR23699 DVORAK COL31809 PROKOFIEV LON2312 RIMSKY-KORSAKOV MER75016 SPRINGSTEEN COL38358 SWEET HONEY IN THE R FF245 A. Array of References • * no need to rearrange • * limited reference array • * internal fragmentation

  32. A. Array of References • Allow for multiple primary keys to be associated with a single secondary key by allocating a primary key array (STL vector is best; why?) for each secondary key entry. • Solves the problem of sorting each time when an new entry is added. • According to text: Suffers from internal fragmentation due to fixed nature of list, and the number of allocated entries in the array may prove too small. • STL (or Java) vector fixes this: How?

  33. B. Inverted List • Solution B • Create an inverted listof indexes. Have each secondary key point to a list of primary key references associated with it. • This method eliminates most of the problems associated with maintaining a secondary index file. • Which solution is better?

  34. PROKOFIEV ANG36193 LON2312 Inverted Lists • Guidelines for better solution • no reorganization when adding • no limitation for duplicate key • no internal fragmentation • Solution B: by Linking the list of references • A list of primary key references • secondary key field, relative record number of the first corresponding primary key reference

  35. Improved revision of the composer index Secondary Index file Label ID List file BEETHOVEN 3 LON2312 -1 0 0 1 2 -1 COREA RCA2626 1 7 2 DVORAK WAR23699 -1 2 PROKOFIEV 3 ANG23699 10 8 3 4 4 RIMSKY-KORSAKOV COL38358 6 -1 5 SPINGSTEEN DG18807 4 1 5 6 SWEET HONEY IN THE R MER75016 9 -1 6 COL31809 -1 7 5 DG139201 8 FF245 -1 9 10 ANG36193 0 Linking List of References (1)

  36. Linking List of References (2) • The primary key references in a separate, entry-sequenced file • Advantages • rearranges only when secondary key changes • rearrangement is quick • less penalty associated with keeping the secondary index file on secondary storage (less need for sorting) • Label ID List file not need to be sorted • reusing the space of deleted record is easy

  37. Linking List of References (3) • Disadvantage • same secondary key references may not be physically grouped • lack of locality • could involve a large amount of seeking • solution: reside in memory • same Label ID list can hold the lists of a number of secondary index files • if too large in memory, can load only a part of it

  38. 7.9 Selective Indexes Selective Indexes • Selective Index: Index on a subset of records • Selective index contains only some part of entire index • provide a selective view • useful when contents of a file fall into several categories • e.g. 20 < Age < 30 and $1000 < Salary • e.g. Courses offered after 12 noon

More Related