Chap 7 . Indexing

Presentation Transcript


  1. Chap 7. Indexing

  2. Chapter Objectives(1) • Introduce concepts of indexing that have broad applications in the design of file systems • Introduce the use of a simple linear index to provide rapid access to records in an entry-sequenced, variable-length record file • Investigate the implications of the use of indexes for file maintenance • Introduce the template features of C++ for object I/O • Describe the object-oriented approach to indexed sequential files

  3. Chapter Objectives(2) • Describe the use of indexes to provide access to records by more than one key • Introduce the idea of an inverted list, illustrating Boolean operations on lists • Discuss when to bind an index key to an address in the data file • Introduce and investigate the implications of self-indexing files

  4. Contents(1) 7.1 What is an Index? 7.2 A Simple Index for Entry-Sequenced Files 7.3 Using Template Classes in C++ for Object I/O 7.4 Object-Oriented Support for Indexed, Entry-Sequenced Files of Data Objects 7.5 Indexes That Are Too Large to Hold in Memory

  5. Contents(2) 7.6 Indexing to Provide Access by Multiple Keys 7.7 Retrieval Using Combinations of Secondary Keys 7.8 Improving the Secondary Index Structure: Inverted Lists 7.9 Selective Indexes 7.10 Binding

  6. Overview: Index(1) • Index: a data structure which associates given key values with corresponding record numbers • It is usually physically separate from the file (unlike indexed sequential files, where the index is tightly bound to the data) • Linear indexes (like indexes found at the back of books) • Index records are ordered by key value as in an ordered relative file • Best algorithm for finding a record with a specific key value is binary search • Addition requires reorganization

  7. Overview: Index(2) [Figure: an index file holding the keys k1, k2, k4, k5, k7, k9, with pointers into a data file whose records carry the same keys and the contents AAA, ZZZ, CCC, XXX, EEE, FFF]

  8. Overview: Index(3) • Tree Indexes (like those of indexed sequential files) • Hierarchical in that each level, beginning with the root level, points to the next level • Leaves point only to the data file • Indexed Sequential File • Binary Tree Index • AVL Tree Index • B+ tree Index

  9. Roles of Index? • Index: keys and reference fields • Fast Random Accesses • Uniform Access Speed • Allow users to impose order on a file without actually rearranging the file • Provide multiple access paths to a file • Give user keyed access to variable-length record files

  10. A Simple Index(1) • Datafile • entry-sequenced, variable-length record • primary key : unique for each entry in a file • Search a file with key (popular need) • cannot use binary search in a variable-length record file (cannot know where the middle record starts) • construct an index object for the file • index object : key field + byte-offset field
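
The idea on this slide can be sketched in a few lines: scan the variable-length records once, note the byte offset at which each one starts, and keep (key, offset) pairs sorted by key so that binary search works on the index even though it cannot work on the data file. This is only an illustration under stated assumptions: the names `IndexEntry`, `buildIndex` and `lookup` are hypothetical, records are assumed to be newline-terminated with `|`-delimited fields, and the key is assumed to be the first two fields joined together.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One index entry: a key plus the byte offset of its record in the data file.
struct IndexEntry {
    std::string key;
    long offset;
};

static bool keyLess(const IndexEntry& a, const IndexEntry& b) {
    return a.key < b.key;
}

// Scan newline-terminated, '|'-delimited records and build a key-sorted index.
// Assumption: the key is the first two fields joined, so "LON|2312|..." gives "LON2312".
std::vector<IndexEntry> buildIndex(std::istream& data) {
    std::vector<IndexEntry> index;
    std::string line;
    long offset = 0;  // byte offset of the record about to be read
    while (std::getline(data, line)) {
        std::size_t first = line.find('|');
        std::size_t second =
            (first == std::string::npos) ? first : line.find('|', first + 1);
        if (second != std::string::npos) {
            std::string key =
                line.substr(0, first) + line.substr(first + 1, second - first - 1);
            index.push_back({key, offset});
        }
        offset += static_cast<long>(line.size()) + 1;  // +1 for the newline
    }
    std::sort(index.begin(), index.end(), keyLess);
    return index;
}

// Binary search on the index; returns the byte offset, or -1 if the key is absent.
long lookup(const std::vector<IndexEntry>& index, const std::string& key) {
    assert(std::is_sorted(index.begin(), index.end(), keyLess));
    auto it = std::lower_bound(
        index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    if (it == index.end() || it->key != key) return -1;
    return it->offset;
}
```

The data file itself is never reordered here; only the small fixed-shape index entries are sorted, which is what makes keyed access to variable-length records practical.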

  11. A Simple Index (2)
      Indexfile
      Key         Reference field
      ANG3795     167
      COL31809    353
      COL38358    211
      DG139201    396
      DG18807     256
      FF245       442
      LON2312     32
      MER75016    300
      RCA2626     77
      WAR23699    132
      Datafile
      Address of record   Actual data record
      32                  LON|2312|Romeo and Juliet|Prokofiev . . .
      77                  RCA|2626|Quartet in C Sharp Minor . . .
      132                 WAR|23699|Touchstone|Corea . . .
      167                 ANG|3795|Symphony No. 9|Beethoven . . .
      211                 COL|38358|Nebraska|Springsteen . . .
      256                 DG|18807|Symphony No. 9|Beethoven . . .
      300                 MER|75016|Coq d'or Suite|Rimsky . . .
      353                 COL|31809|Symphony No. 9|Dvorak . . .
      396                 DG|139201|Violin Concerto|Beethoven . . .
      442                 FF|245|Good News|Sweet Honey In The . . .

  12. A Simple Index (3) • Index file: fixed-size records (key + reference field), sorted by key • Datafile: not sorted because it is entry sequenced • Record addition is quick (faster than in a sorted file) • Can keep the index in memory • finds records more quickly with the index file than with a sorted data file • Class TextIndex encapsulates the index data and index operations

  13. Let’s See Figure 7.4
      class TextIndex {
      public:
          TextIndex(int maxKeys = 100, int unique = 1);
          ~TextIndex();
          int Insert(const char* key, int recAddr);  // add to index
          int Remove(const char* key);               // remove key from index
          int Search(const char* key) const;         // search for key, return recAddr
          void Print(ostream&) const;
      protected:
          int MaxKeys;     // maximum number of entries
          int NumKeys;     // actual number of entries
          char** Keys;     // array of key values
          int* RecAddrs;   // array of record references
          int Find(const char* key) const;
          int Init(int maxKeys, int unique);
          int Unique;      // if true, each key must be unique
      };

  14. TextIndex::TextIndex
      TextIndex::TextIndex(int maxKeys, int unique)
          : NumKeys(0), Keys(0), RecAddrs(0)
      { Init(maxKeys, unique); }
      TextIndex::~TextIndex()
      { delete[] Keys; delete[] RecAddrs; }  // arrays allocated with new[] need delete[]

  15. TextIndex::Init
      int TextIndex::Init(int maxKeys, int unique)
      {
          Unique = unique != 0;
          if (maxKeys <= 0) {
              MaxKeys = 0;
              return 0;
          }
          MaxKeys = maxKeys;
          Keys = new char*[maxKeys];
          RecAddrs = new int[maxKeys];
          return 1;
      }

  16. TextIndex::Insert
      int TextIndex::Insert(const char* key, int recAddr)
      {
          int i;
          int index = Find(key);
          if (Unique && index >= 0) return 0;   // key already in
          if (NumKeys == MaxKeys) return 0;     // no room for another key
          for (i = NumKeys - 1; i >= 0; i--) {
              if (strcmp(key, Keys[i]) > 0) break;  // insert into location i+1
              Keys[i+1] = Keys[i];                  // shift larger keys up by one
              RecAddrs[i+1] = RecAddrs[i];
          }
          Keys[i+1] = strdup(key);
          RecAddrs[i+1] = recAddr;
          NumKeys++;
          return 1;
      }

  17. TextIndex::Remove
      int TextIndex::Remove(const char* key)
      {
          int index = Find(key);
          if (index < 0) return 0;   // key not in index
          // shift later entries down by one; stop at NumKeys-1 so that
          // Keys[i+1] never reads past the last valid entry
          for (int i = index; i < NumKeys - 1; i++) {
              Keys[i] = Keys[i+1];
              RecAddrs[i] = RecAddrs[i+1];
          }
          NumKeys--;
          return 1;
      }

  18. TextIndex::Search
      int TextIndex::Search(const char* key) const
      {
          int index = Find(key);
          if (index < 0) return index;
          return RecAddrs[index];
      }

  19. TextIndex::Find
      int TextIndex::Find(const char* key) const
      {
          for (int i = 0; i < NumKeys; i++)
              if (strcmp(Keys[i], key) == 0) return i;       // key found
              else if (strcmp(Keys[i], key) > 0) return -1;  // passed where key would be: not found
          return -1;  // not found
      }
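
Because the keys are kept in sorted order, the sequential scan in Find can be replaced by a binary search, which a later slide names as the most obvious optimization. The following is a minimal sketch of that idea, written as a free function over a sorted array of C strings rather than as a member of TextIndex; the name `findBinary` is hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Binary search over a sorted array of C-string keys, mirroring the Keys
// array of TextIndex. Returns the position of key, or -1 if it is absent.
int findBinary(const char* const* keys, int numKeys, const char* key) {
    assert(numKeys >= 0);
    int low = 0;
    int high = numKeys - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;  // avoids overflow of low + high
        int cmp = std::strcmp(keys[mid], key);
        if (cmp == 0) return mid;          // key found
        if (cmp < 0) low = mid + 1;        // key lies in the upper half
        else high = mid - 1;               // key lies in the lower half
    }
    return -1;                             // not found
}
```

With the ten keys of the sample index, this needs at most four comparisons per lookup, where the sequential version may need ten.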

  20. Index Implementation • Page 706~709 • G.1 Recording.h • G.2 Recording.cpp • G.3 Makerec.cpp • Page 710~712 • G.4 Textind.h • G.5 Textind.cpp

  21. IndexRecordingFile
      int IndexRecordingFile(char* myfile, TextIndex& RecordingIndex)
      {
          Recording rec;
          int recaddr, result;
          DelimFieldBuffer Buffer;   // create a buffer
          BufferFile RecordingFile(Buffer);
          result = RecordingFile.Open(myfile, ios::in);
          if (!result) {
              cout << "Unable to open file " << myfile << endl;
              return 0;
          }
          while (1) {   // loop until the read fails
              recaddr = RecordingFile.Read();   // read next record
              if (recaddr < 0) break;
              rec.Unpack(Buffer);
              RecordingIndex.Insert(rec.Key(), recaddr);
              cout << recaddr << '\t' << rec << endl;
          }
          RecordingIndex.Print(cout);
          result = RetrieveRecording(rec, "LON2312", RecordingIndex, RecordingFile);
          cout << "Found record: " << rec;
          return result;   // the original slide omits the return value
      }

  22. RetrieveRecording
      int RetrieveRecording(Recording& recording, char* key,
              TextIndex& RecordingIndex, BufferFile& RecordingFile)
      // read and unpack the recording, return TRUE if succeeds
      {
          int result;
          cout << "Retrieve " << key << " at recaddr " << RecordingIndex.Search(key) << endl;
          result = RecordingFile.Read(RecordingIndex.Search(key));
          cout << "read result: " << result << endl;
          if (result == -1) return FALSE;
          result = recording.Unpack(RecordingFile.GetBuffer());
          return result;
      }

  23. Template Class for I/O Object(1) • Template Class RecordFile • we want to make the following code possible • Person p; RecordFile pFile; pFile.Read(p); • Recording r; RecordFile rFile; rFile.Read(r); • difficult to support files for different record types without having to modify the class • Template class which is derived from BufferFile • the actual declarations and calls • RecordFile <Person> pFile; pFile.Read(p); • RecordFile <Recording> rFile; rFile.Read(r);

  24. Template Class for I/O Object(2) • Template Class RecordFile
      template <class RecType>
      class RecordFile : public BufferFile {
      public:
          int Read(RecType& record, int recaddr = -1);
          int Write(const RecType& record, int recaddr = -1);
          int Append(const RecType& record);
          RecordFile(IOBuffer& buffer) : BufferFile(buffer) {}
      };
      // The template parameter RecType must have the following methods
      //   int Pack(IOBuffer&);    pack record into buffer
      //   int Unpack(IOBuffer&);  unpack record from buffer
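
The requirement that RecType supply Pack and Unpack can be tried out without the BufferFile and IOBuffer classes. The sketch below is a self-contained illustration, not the book's classes: `TextBuffer`, `MemoryRecordFile` and this `Person` are hypothetical stand-ins, the "file" is just a vector of packed strings, and record addresses are slot numbers rather than byte offsets.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for IOBuffer: holds one packed record as text.
struct TextBuffer {
    std::string bytes;
};

// Hypothetical in-memory stand-in for RecordFile<RecType>. Any RecType works
// as long as it provides Pack(TextBuffer&) const and Unpack(const TextBuffer&).
template <class RecType>
class MemoryRecordFile {
public:
    // Pack the record and append it; returns its record address (slot number).
    int Append(const RecType& record) {
        TextBuffer buffer;
        if (!record.Pack(buffer)) return -1;
        records_.push_back(buffer.bytes);
        return static_cast<int>(records_.size()) - 1;
    }
    // Unpack the record stored at recaddr; returns recaddr, or -1 on failure.
    int Read(RecType& record, int recaddr) const {
        if (recaddr < 0 || recaddr >= static_cast<int>(records_.size())) return -1;
        TextBuffer buffer{records_[recaddr]};
        return record.Unpack(buffer) ? recaddr : -1;
    }
private:
    std::vector<std::string> records_;
};

// Example record type with '|'-delimited fields.
struct Person {
    std::string lastName;
    std::string firstName;
    int Pack(TextBuffer& buffer) const {
        assert(lastName.find('|') == std::string::npos);  // delimiter must not occur in a field
        buffer.bytes = lastName + '|' + firstName;
        return 1;
    }
    int Unpack(const TextBuffer& buffer) {
        std::size_t bar = buffer.bytes.find('|');
        if (bar == std::string::npos) return 0;
        lastName = buffer.bytes.substr(0, bar);
        firstName = buffer.bytes.substr(bar + 1);
        return 1;
    }
};
```

A declaration such as `MemoryRecordFile<Person> file;` then supports `file.Append(p)` and `file.Read(p, addr)` with no change to the template, which is the point the slide makes about RecordFile.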

  25. Template Class for I/O Object(3) • Adding I/O to an existing class (here, class Recording) • add methods Pack and Unpack to class Recording • create a buffer object to use in the I/O • DelimFieldBuffer Buffer; • declare an object of type RecordFile<Recording> • RecordFile<Recording> rFile (Buffer); • Declaration and Calls • Recording r1, r2; • rFile.Open("myfile"); • rFile.Read(r1); • rFile.Write(r2); • Result: directly open a file and read and write objects of class Recording

  26. Object-Oriented Approach to I/O • Class IndexedFile • adds indexed access to the sequential access provided by class RecordFile • extends RecordFile with Update, Append and Read methods • Update & Append : maintain a primary key index of the data file • Read : supports access to an object by key • TextIndex, RecordFile ==> IndexedFile • Issues of IndexedFile • how to make a persistent index of a file • how to guarantee that the index is an accurate reflection of the contents of the data file

  27. Basic Operations of IndexedFile(1) • Create the original empty index and data files • Load the index file into memory • Rewrite the index file from memory • Add records to the data file and index • Delete records from the data file • Update records in the data file • Update the index to reflect changes in the data file • Retrieve records

  28. Basic Operations of TextIndexedFile (1) • Creating the files • the index file and the data file are both created as empty files with header records • implementation ( makeind.cpp in Appendix G ): Create method in class BufferFile • Loading the index into memory • loading/storing objects is supported in the IOBuffer classes • need to choose a particular buffer class to use for an index file ( tindbuff.cpp in Appendix G ) • define class TextIndexBuffer as a derived class of FixedFieldBuffer to support reading and writing of index objects

  29. Basic Operations of TextIndexedFile(2) • Rewriting the index file from memory • part of the Close operation on an IndexedFile • write back index object to the index file • should protect the index against failure • write changes only when the stored index is out of date (use a status flag) • Implementation • Rewind and Write operations of class BufferFile • Record Addition • add a new record to the data file using RecordFile<Recording>::Write • add an entry to the index using TextIndex.Insert • requires rearrangement of the index; if the index is in memory, no file access is needed

  30. Basic Operations of TextIndexedFile(3) • Record Deletion • data file: the records need not be moved • index: either actually delete the entry or just mark it as deleted • using TextIndex::Remove • Record Updating (2 categories) • the update changes the value of the key field • delete/add approach • may reorder both the index and the data file • the update does not affect the key field • no rearrangement of the index file • may need to reconstruct the data file
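
The two update categories can be sketched against an in-memory index. This is a hedged illustration only: `PrimaryIndex` and `updateIndex` are hypothetical names, and `std::map` stands in for the sorted arrays of TextIndex. When the key is unchanged, only the reference field may need adjusting; when the key changes, the entry is deleted and re-added under the new key.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the in-memory primary index: key -> record address.
using PrimaryIndex = std::map<std::string, int>;

// Reflect an update in which the record formerly stored under oldKey now has
// key newKey and lives at address newAddr.
// Returns false if oldKey is missing or newKey would duplicate another key.
bool updateIndex(PrimaryIndex& index, const std::string& oldKey,
                 const std::string& newKey, int newAddr) {
    auto old = index.find(oldKey);
    if (old == index.end()) return false;
    if (newKey == oldKey) {
        old->second = newAddr;   // key untouched: no rearrangement, fix the reference only
        return true;
    }
    if (index.count(newKey) != 0) return false;  // primary keys must stay unique
    index.erase(old);                            // delete/add approach
    index.emplace(newKey, newAddr);
    assert(index.count(oldKey) == 0);
    return true;
}
```

In the array-based TextIndex the same effect would come from a Remove followed by an Insert, which keeps the keys in sorted order.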

  31. Class TextIndexedFile(1) • Members • methods • Create, Open, Close, Read (sequential & indexed), Append, and Update operations • protected members • ensure the correlation between the index in memory (Index),the index file (IndexFile), and the data file (DataFile) • char* key() • the template parameter RecType must have the key method • used to extract the key value from the record

  32. Class TextIndexedFile(2)
      template <class RecType>
      class TextIndexedFile {
      public:
          int Read(RecType& record);             // read next record
          int Read(char* key, RecType& record);  // read by key
          int Append(const RecType& record);
          int Update(char* oldKey, const RecType& record);
          int Create(char* name, int mode = ios::in|ios::out);
          int Open(char* name, int mode = ios::in|ios::out);
          int Close();
          TextIndexedFile(IOBuffer& buffer, int keySize, int maxKeys = 100);
          ~TextIndexedFile();                    // close and delete
      protected:
          TextIndex Index;
          BufferFile IndexFile;
          TextIndexBuffer IndexBuffer;
          RecordFile<RecType> DataFile;
          char* FileName;                        // base file name for file
          int SetFileName(char* fName, char*& dFileName, char*& IdxFName);
      };

  33. TextIndexedFile Constructor/Destructor
      template <class RecType>
      TextIndexedFile<RecType>::TextIndexedFile(IOBuffer& buffer, int keySize, int maxKeys)
          : DataFile(buffer), Index(maxKeys), IndexBuffer(keySize, maxKeys), IndexFile(IndexBuffer)
      { FileName = 0; }
      template <class RecType>
      TextIndexedFile<RecType>::~TextIndexedFile()
      { Close(); }

  34. TextIndexedFile::Create
      template <class RecType>
      int TextIndexedFile<RecType>::Create(char* fileName, int mode)
      // use fileName.dat and fileName.ind
      {
          int result;
          char* dataFileName;
          char* indexFileName;
          result = SetFileName(fileName, dataFileName, indexFileName);
          cout << "file names " << dataFileName << " " << indexFileName << endl;
          if (result == -1) return 0;
          result = DataFile.Create(dataFileName, mode);
          if (!result) {
              FileName = 0;       // remove connection
              return 0;
          }
          result = IndexFile.Create(indexFileName, ios::out|ios::in);
          if (!result) {
              DataFile.Close();   // close the data file
              FileName = 0;       // remove connection
              return 0;
          }
          return 1;
      }

  35. TextIndexedFile::Open
      template <class RecType>
      int TextIndexedFile<RecType>::Open(char* fileName, int mode)
      // open data and index file and read index file
      {
          int result;
          char* dataFileName;
          char* indexFileName;
          result = SetFileName(fileName, dataFileName, indexFileName);
          if (!result) return 0;
          // open files
          result = DataFile.Open(dataFileName, mode);
          if (!result) { FileName = 0; return 0; }
          result = IndexFile.Open(indexFileName, ios::out);
          if (!result) { DataFile.Close(); FileName = 0; return 0; }
          // read index into memory
          result = IndexFile.Read();
          if (result != -1) {
              result = IndexBuffer.Unpack(Index);
              if (result != -1) return 1;
          }
          DataFile.Close();
          IndexFile.Close();
          FileName = 0;
          return 0;
      }

  36. TextIndexedFile::Read
      template <class RecType>
      int TextIndexedFile<RecType>::Read(RecType& record)
      { return DataFile.Read(record, -1); }   // sequential read of the next record
      template <class RecType>
      int TextIndexedFile<RecType>::Read(char* key, RecType& record)
      {
          int ref = Index.Search(key);        // look up the record address by key
          if (ref < 0) return -1;
          int result = DataFile.Read(record, ref);
          return result;
      }

  37. TextIndexedFile::Append
      template <class RecType>
      int TextIndexedFile<RecType>::Append(const RecType& record)
      {
          char* key = record.Key();
          int ref = Index.Search(key);
          if (ref != -1)   // key already in file
              return -1;
          ref = DataFile.Append(record);
          int result = Index.Insert(key, ref);
          return ref;
      }

  38. TextIndexedFile::Close
      template <class RecType>
      int TextIndexedFile<RecType>::Close()
      {
          int result;
          if (!FileName) return 0;   // already closed!
          DataFile.Close();
          IndexFile.Rewind();
          IndexBuffer.Pack(Index);
          result = IndexFile.Write();
          cout << "result of index write: " << result << endl;
          IndexFile.Close();
          FileName = 0;
          return 1;
      }

  39. TextIndexBuffer
      class TextIndexBuffer : public FixedFieldBuffer {
      public:
          TextIndexBuffer(int keySize, int maxKeys = 100,
                  int extraFields = 0, int extraSize = 0);
          // extraSize is included to allow derived classes to extend
          // the buffer with extra fields.
          // Required because the buffer size is exact.
          int Pack(const TextIndex&);
          int Unpack(TextIndex&);
          void Print(ostream&) const;
      protected:
          int MaxKeys;
          int KeySize;
          char* Dummy;   // space for dummy in pack and unpack
      };

  40. TextIndexBuffer::TextIndexBuffer
      TextIndexBuffer::TextIndexBuffer(int keySize, int maxKeys, int extraFields, int extraSpace)
          : FixedFieldBuffer(1 + 2*maxKeys + extraFields,
                sizeof(int) + maxKeys*keySize + maxKeys*sizeof(int) + extraSpace)
      // buffer fields consist of
      //   numKeys, actual number of keys
      //   Keys[maxKeys], key fields, size = maxKeys * keySize
      //   RecAddrs[maxKeys], record address fields, size = maxKeys * sizeof(int)
      {
          MaxKeys = maxKeys;
          KeySize = keySize;
          AddField(sizeof(int));
          for (int i = 0; i < maxKeys; i++) {
              AddField(KeySize);
              AddField(sizeof(int));
          }
          Dummy = new char[keySize+1];
      }

  41. TextIndexBuffer::Pack
      int TextIndexBuffer::Pack(const TextIndex& index)
      {
          int result;
          Clear();
          result = FixedFieldBuffer::Pack(&index.NumKeys);
          for (int i = 0; i < index.NumKeys; i++) {
              // note only pack the actual keys and recaddrs
              result = result && FixedFieldBuffer::Pack(index.Keys[i]);
              result = result && FixedFieldBuffer::Pack(&index.RecAddrs[i]);
          }
          for (int j = 0; j < index.MaxKeys - index.NumKeys; j++) {
              // pack dummy values for other fields
              result = result && FixedFieldBuffer::Pack(Dummy);
              result = result && FixedFieldBuffer::Pack(Dummy);
          }
          return result;
      }

  42. TextIndexBuffer::Unpack
      int TextIndexBuffer::Unpack(TextIndex& index)
      {
          int result;
          result = FixedFieldBuffer::Unpack(&index.NumKeys);
          for (int i = 0; i < index.NumKeys; i++) {
              // note only unpack the actual keys and recaddrs
              index.Keys[i] = new char[KeySize];   // just to be safe
              result = result && FixedFieldBuffer::Unpack(index.Keys[i]);
              result = result && FixedFieldBuffer::Unpack(&index.RecAddrs[i]);
          }
          for (int j = 0; j < index.MaxKeys - index.NumKeys; j++) {
              // unpack dummy values for other fields
              result = result && FixedFieldBuffer::Unpack(Dummy);
              result = result && FixedFieldBuffer::Unpack(Dummy);
          }
          return result;
      }

  43. IndexRecordingFile
      int IndexRecordingFile(char* myfile, TextIndexedFile<Recording>& indexFile)
      {
          Recording rec;
          int recaddr, result;
          DelimFieldBuffer Buffer;   // create a buffer
          BufferFile RecFile(Buffer);
          result = RecFile.Open(myfile, ios::in);
          if (!result) {
              cout << "Unable to open file " << myfile << endl;
              return 0;
          }
          while (1) {   // loop until the read fails
              recaddr = RecFile.Read();   // read next record
              if (recaddr < 0) break;
              rec.Unpack(Buffer);
              indexFile.Append(rec);
          }
          Recording rec1;
          result = indexFile.Read("LON2312", rec1);
          cout << "Found record: " << rec1;   // print the record just read by key
          return result;
      }

  44. Enhancements to TextIndexedFile(1) • Support other types of keys • Restriction: the key type is restricted to string (char *) • Relaxation: support a template class SimpleIndex with parameter for key type • Support data object class hierarchies • Restriction: every object must be of the same type in RecordFile • Relaxation: the type hierarchy supports virtual pack methods

  45. Enhancements to TextIndexedFile(2) • Support multirecord index files • Restriction: the entire index must fit in a single record • Relaxation: add protected methods Insert, Delete, and Search to manipulate the arrays of index objects • Active optimization of operations • Obvious: the most obvious optimization is to use binary search in the Find method • Active: add a flag to the index object to avoid writing the index record back to the index file when it has not been changed
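
The change flag mentioned on this slide can be sketched as follows. This is an assumption-laden illustration rather than the book's code: `FlaggedIndex` is a hypothetical class, `std::map` replaces the key arrays, and the index is written as plain text lines instead of through TextIndexBuffer.

```cpp
#include <cassert>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical in-memory index that remembers whether it has unsaved changes.
class FlaggedIndex {
public:
    // Add a key; returns false (and leaves the flag alone) if the key exists.
    bool insert(const std::string& key, int recAddr) {
        assert(!key.empty());  // an empty key cannot identify a record
        bool added = entries_.emplace(key, recAddr).second;
        if (added) dirty_ = true;
        return added;
    }
    // Remove a key; returns false if it was not present.
    bool remove(const std::string& key) {
        bool removed = entries_.erase(key) > 0;
        if (removed) dirty_ = true;
        return removed;
    }
    // Render the index as "key address" lines in key order.
    std::string toText() const {
        std::ostringstream text;
        for (const auto& entry : entries_)
            text << entry.first << ' ' << entry.second << '\n';
        return text.str();
    }
    // Write the index only when it changed; returns true if a write happened.
    bool flush(std::ostream& out) {
        if (!dirty_) return false;
        out << toText();
        dirty_ = false;
        return true;
    }
    bool dirty() const { return dirty_; }
private:
    std::map<std::string, int> entries_;
    bool dirty_ = false;
};
```

A Close operation built this way would call `flush` unconditionally and let the flag decide whether the index file is touched at all.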

  46. Where are we going? • Plain Stream File • Persistency ==> Buffer support ==> BufferFile <incremental approach> Deriving BufferFile using various other classes • Random Access ==> Index support => IndexedFile <incremental approach> : Deriving TextIndexedFile using RecordFile and TextIndex

  47. Too Large Index(1) • On secondary storage (large linear index) • Disadvantages • binary searching of the index requires several seeks (slower than a sorted file) • index rearrangement requires shifting or sorting records on secondary storage • Alternatives (to be considered later) • hashed organization • tree-structured index (e.g. B-tree)

  48. Too Large Index (2) • Advantages over the use of a data file sorted by key, even if the index is on secondary storage • can use a binary search • sorting and maintaining the index is less expensive than sorting and maintaining the data file • can rearrange the keys without moving the data records if there are pinned records

  49. Index by Multiple Keys(1) • DB-Schema = ( ID-No, Title, Composer, Artist, Label) • Find the record with ID-No “COL38358” (primary key - ID-No) • Find all the recordings of “Beethoven” (secondary key - composer) • Find all the recordings titled “Violin Concerto” (secondary key - title)

  50. Index by Multiple Keys(2) • Most people don’t want to search only by primary key • Secondary Key • can be duplicated • Figure: a secondary index entry with secondary key BEETHOVEN referring to primary key DG18807 • Secondary Key Index • secondary key --> primary key, so a lookup must consult one additional index (the primary key index)
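
The two-step lookup described here (secondary key to primary key, then primary key to record address) can be sketched with standard containers. The names `PrimaryIndex`, `SecondaryIndex`, `addRecording` and `findBySecondaryKey` are hypothetical, and `std::map` and `std::multimap` stand in for the index classes used elsewhere in these slides.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Primary index: unique ID-No -> record address in the data file.
using PrimaryIndex = std::map<std::string, int>;
// Secondary index: composer -> ID-No; the same composer may occur many times.
using SecondaryIndex = std::multimap<std::string, std::string>;

// Register one recording in both indexes. Returns false on a duplicate ID-No.
bool addRecording(PrimaryIndex& primary, SecondaryIndex& secondary,
                  const std::string& id, const std::string& composer, int addr) {
    assert(!id.empty() && !composer.empty());
    if (!primary.emplace(id, addr).second) return false;  // primary keys are unique
    secondary.emplace(composer, id);                      // secondary keys may repeat
    return true;
}

// Resolve a secondary key to record addresses by way of the primary index.
std::vector<int> findBySecondaryKey(const SecondaryIndex& secondary,
                                    const PrimaryIndex& primary,
                                    const std::string& value) {
    std::vector<int> addresses;
    auto range = secondary.equal_range(value);
    for (auto it = range.first; it != range.second; ++it) {
        auto hit = primary.find(it->second);   // the one additional index lookup
        if (hit != primary.end()) addresses.push_back(hit->second);
    }
    return addresses;
}
```

Because the secondary index stores primary keys rather than addresses, moving a record in the data file only requires changing the primary index.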
