File Structures by Folk, Zoellick, and Riccardi

Chap 7. Indexing

Seoul National University, Department of Computer Engineering

Object-Oriented Systems Laboratory

SNU-OOPSLA-LAB

Prof. Hyoung-Joo Kim

SNU-OOPSLA Lab.


Chapter Objectives(1)

  • Introduce concepts of indexing that have broad applications in the design of file systems

  • Introduce the use of a simple linear index to provide rapid access to records in an entry-sequenced, variable-length record file

  • Investigate how indexes are implemented and used in file maintenance

  • Introduce the template features of C++ for object I/O

  • Describe the object-oriented approach to indexed sequential files


Chapter Objectives(2)

  • Describe the use of indexes to provide access to records by more than one key

  • Introduce the idea of an inverted list, illustrating Boolean operations on lists

  • Discuss when to bind an index key to an address in the data file

  • Introduce and investigate the implications of self-indexing files


Contents(1)

7.1 What is an Index?

7.2 A Simple Index for Entry-Sequenced Files

7.3 Using Template Classes in C++ for Object I/O

7.4 Object-Oriented Support for Indexed, Entry-Sequenced Files of Data Objects

7.5 Indexes That Are Too Large to Hold in Memory


Contents(2)

7.6 Indexing to Provide Access by Multiple Keys

7.7 Retrieval Using Combinations of Secondary Keys

7.8 Improving the Secondary Index Structure: Inverted Lists

7.9 Selective Indexes

7.10 Binding


7.1 What Is an Index?

Overview: Index(1)

  • Index: a data structure that associates key values with the addresses (record numbers or byte offsets) of the corresponding records

  • It is usually physically separate from the data file (unlike an indexed sequential file, where index and data are tightly bound)

  • Linear indexes (like the index at the back of a book)

    • Index records are ordered by key value, as in an ordered relative file

    • The best algorithm for finding a record with a specific key value is binary search

    • Addition requires reorganization: later entries must be shifted to keep the index sorted
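A minimal in-memory sketch of a linear index, assuming string keys and byte-offset addresses (LinearIndex and IndexEntry are hypothetical names). Lookup is a binary search over the sorted entries; insertion shows why addition is costly, since every later entry shifts one slot.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// One index entry: a key and the address (byte offset) of its record.
struct IndexEntry {
    std::string key;
    long addr;
};

// Hypothetical linear index kept sorted by key.
class LinearIndex {
public:
    // Addition: find the insertion point, then vector::insert shifts every
    // later entry one slot to the right (the reorganization cost).
    void insert(const std::string& key, long addr) {
        entries_.insert(lowerBound(key), IndexEntry{key, addr});
    }

    // Binary search for key; returns the record address, or -1 if absent.
    long search(const std::string& key) const {
        auto pos = lowerBound(key);
        if (pos == entries_.end() || pos->key != key) return -1;
        return pos->addr;
    }

    const std::vector<IndexEntry>& entries() const { return entries_; }

private:
    // std::lower_bound performs the binary search over the sorted entries.
    std::vector<IndexEntry>::const_iterator lowerBound(const std::string& key) const {
        return std::lower_bound(entries_.begin(), entries_.end(), key,
            [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    }

    std::vector<IndexEntry> entries_;
};
```

Keeping the entries in a contiguous sorted array is what makes binary search possible; a tree index trades that simplicity for cheaper insertion.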


7.1 What Is an Index?

Overview: Index(2)

[Figure: an Index File holding the sorted keys k1, k2, k4, k5, k7, k9, each pointing to its record in the Data File (records AAA, ZZZ, CCC, XXX, EEE, FFF)]


7.1 What Is an Index?

Overview: Index(3)

  • Tree indexes (like those of indexed sequential files)

    • Hierarchical: each level, beginning with the root, points to nodes at the next level down

    • Only the leaves point into the data file

  • Examples

    • Indexed sequential file

    • Binary tree index

    • AVL tree index

    • B+ tree index


7.1 What Is an Index?

Roles of an Index

  • An index consists of keys and reference fields

  • Fast random access

  • Uniform access speed

  • Lets users impose an order on a file without actually rearranging the file

  • Provides multiple access paths to a file

  • Gives users keyed access to variable-length record files


7.2 A Simple Index for E-S Files

A Simple Index(1)

  • Data file

    • entry-sequenced, variable-length records

    • primary key: unique for each entry in the file

  • Searching a file by key is a common need

    • binary search cannot be used directly on a variable-length record file (there is no way to locate the middle record)

    • instead, construct an index object for the file

      • index object: key field + byte-offset field
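The index object can be built in one sequential pass over the data file, recording the byte offset at which each record starts. This is a minimal sketch under hypothetical assumptions: the data file is an in-memory string of newline-terminated records, the first two '|'-delimited fields form the primary key (LON|2312 gives LON2312), buildPrimaryIndex is an illustrative name, and std::map stands in for a sorted index array.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Assumption: records are newline-terminated and start with "LABEL|ID|".
std::map<std::string, std::size_t> buildPrimaryIndex(const std::string& data) {
    std::map<std::string, std::size_t> index;  // key -> byte offset, sorted by key
    std::size_t offset = 0;
    while (offset < data.size()) {
        std::size_t end = data.find('\n', offset);
        if (end == std::string::npos) end = data.size();
        std::string record = data.substr(offset, end - offset);
        std::size_t bar1 = record.find('|');
        std::size_t bar2 = record.find('|', bar1 + 1);
        if (bar1 != std::string::npos && bar2 != std::string::npos) {
            std::string key = record.substr(0, bar1) +
                              record.substr(bar1 + 1, bar2 - bar1 - 1);
            index[key] = offset;  // byte offset where this record starts
        }
        offset = end + 1;  // skip the terminator; records differ in length
    }
    return index;
}
```

For example, for a data string beginning "LON|2312|Romeo and Juliet|Prokofiev\n", the key LON2312 maps to offset 0 and the next record starts at offset 36.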


7.2 A Simple Index for E-S Files

A Simple Index (2)

Index file (sorted by key):

  Key         Reference field
  ANG3795     167
  COL31809    353
  COL38358    211
  DG139201    396
  DG18807     256
  FF245       442
  LON2312     32
  MER75016    300
  RCA2626     77
  WAR23699    132

Data file (entry-sequenced):

  Address of record   Actual data record
  32                  LON|2312|Romeo and Juliet|Prokofiev . . .
  77                  RCA|2626|Quartet in C Sharp Minor . . .
  132                 WAR|23699|Touchstone|Corea . . .
  167                 ANG|3795|Symphony No. 9|Beethoven . . .
  211                 COL|38358|Nebraska|Springsteen . . .
  256                 DG|18807|Symphony No. 9|Beethoven . . .
  300                 MER|75016|Coq d'or Suite|Rimsky . . .
  353                 COL|31809|Symphony No. 9|Dvorak . . .
  396                 DG|139201|Violin Concerto|Beethoven . . .
  442                 FF|245|Good News|Sweet Honey In The . . .


7.2 A Simple Index for E-S Files

A Simple Index (3)

  • Index file: fixed-size records, sorted by key

  • Data file: not sorted, because it is entry-sequenced

  • Record addition is quick (faster than adding to a sorted data file)

  • The index can be kept in memory

    • a record can be found faster through the in-memory index than by searching a sorted data file

  • Class TextIndex encapsulates the index data and the index operations


7.2 A Simple Index for E-S Files

Let’s See Figure 7.4

#include <iostream>

class TextIndex {
public:
    TextIndex(int maxKeys = 100, int unique = 1);
    int Insert(const char* key, int recAddr); // add to index
    int Remove(const char* key);              // remove key from index
    int Search(const char* key) const;        // search for key, return recAddr
    void Print(std::ostream &) const;
protected:
    int MaxKeys;     // maximum number of entries
    int NumKeys;     // actual number of entries
    char **Keys;     // array of key values
    int* RecAddrs;   // array of record references
    int Find(const char* key) const;
    int Init(int maxKeys, int unique);
    int Unique;      // if true, each key must be unique
};
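A simplified sketch of how the TextIndex operations might be implemented, using std::vector in place of the raw Keys and RecAddrs arrays. SimpleTextIndex is a hypothetical stand-in, not the book's textind.cpp: Insert keeps the keys sorted by shifting larger keys one slot right, and Find uses binary search.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for TextIndex: parallel key/address arrays sorted by key.
class SimpleTextIndex {
public:
    explicit SimpleTextIndex(int maxKeys = 100, bool unique = true)
        : maxKeys_(maxKeys), unique_(unique) {}

    // Returns 1 on success, 0 if full or (when unique) the key already exists.
    int Insert(const std::string& key, int recAddr) {
        if (static_cast<int>(keys_.size()) >= maxKeys_) return 0;
        if (unique_ && Find(key) >= 0) return 0;
        std::size_t i = keys_.size();
        keys_.push_back(key);
        recAddrs_.push_back(recAddr);
        // Shift larger keys one slot right (an insertion-sort step).
        while (i > 0 && keys_[i - 1] > key) {
            keys_[i] = keys_[i - 1];
            recAddrs_[i] = recAddrs_[i - 1];
            --i;
        }
        keys_[i] = key;
        recAddrs_[i] = recAddr;
        return 1;
    }

    // Returns 1 if the key was removed, 0 if it was not found.
    int Remove(const std::string& key) {
        int i = Find(key);
        if (i < 0) return 0;
        keys_.erase(keys_.begin() + i);
        recAddrs_.erase(recAddrs_.begin() + i);
        return 1;
    }

    // Returns the record address for key, or -1 if not found.
    int Search(const std::string& key) const {
        int i = Find(key);
        return i < 0 ? -1 : recAddrs_[i];
    }

    void Print(std::ostream& os) const {
        for (std::size_t i = 0; i < keys_.size(); ++i)
            os << keys_[i] << ' ' << recAddrs_[i] << '\n';
    }

protected:
    // Binary search over the sorted key array; returns the slot or -1.
    int Find(const std::string& key) const {
        int lo = 0, hi = static_cast<int>(keys_.size()) - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (keys_[mid] == key) return mid;
            if (keys_[mid] < key) lo = mid + 1; else hi = mid - 1;
        }
        return -1;
    }

private:
    int maxKeys_;
    bool unique_;
    std::vector<std::string> keys_;
    std::vector<int> recAddrs_;
};
```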


Index Implementation

  • Page 638, 639, 640

    • G.1 Recording.h

    • G.2 Recording.cpp

    • G.3 Makere.cpp

  • Page 641, 642

    • G.4 Textind.h

    • G.5 Textind.cpp


RetrieveRecording with the Index

  • RetrieveRecording procedure: retrieves a single record by key from the data file, combining the index search, the file read, and the buffer unpack operations into a single function

    int RetrieveRecording (Recording & recording, char * key,
        TextIndex & RecordingIndex, BufferFile & RecordingFile)
    // read and unpack the recording, return TRUE if it succeeds
    {
        int recAddr = RecordingIndex.Search(key);
        if (recAddr == -1) return FALSE;   // key is not in the index
        int result = RecordingFile.Read(recAddr);
        if (result == -1) return FALSE;    // read failed
        result = recording.Unpack(RecordingFile.GetBuffer());
        return result;
    }
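The same three steps (index search, file read, unpack) can be traced in a self-contained sketch. It assumes, hypothetically, that the data file is an in-memory string of newline-terminated records with '|'-delimited fields Label|ID|Title|Composer, and that the index maps keys to byte offsets; retrieveRecording and this Recording struct are illustrative, not the book's classes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical record type with fields Label|ID|Title|Composer.
struct Recording {
    std::string label, id, title, composer;

    // Unpack one '|'-delimited record; returns false if a field is missing.
    bool Unpack(const std::string& text) {
        std::istringstream in(text);
        return std::getline(in, label, '|') && std::getline(in, id, '|') &&
               std::getline(in, title, '|') && std::getline(in, composer, '|');
    }
};

// Index search, file read, and unpack in one function. A missing key is
// rejected before any read is attempted.
bool retrieveRecording(Recording& rec, const std::string& key,
                       const std::map<std::string, std::size_t>& index,
                       const std::string& dataFile) {
    auto it = index.find(key);                              // 1. index search
    if (it == index.end()) return false;
    std::size_t start = it->second;
    std::size_t end = dataFile.find('\n', start);           // 2. read one record
    if (end == std::string::npos) end = dataFile.size();
    return rec.Unpack(dataFile.substr(start, end - start)); // 3. unpack
}
```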


7.3 Using Template Classes in C++ for Object I/O

Template Class for I/O Object(1)

  • Template Class RecordFile

    • we want to make the following code possible

      • Person p; RecordFile pFile; pFile.Read(p);

      • Recording r; RecordFile rFile; rFile.Read(r);

    • it is difficult to support files of different record types with one class without modifying that class

    • solution: a template class derived from BufferFile

      • the actual declarations and calls

        • RecordFile<Person> pFile; pFile.Read(p);

        • RecordFile<Recording> rFile; rFile.Read(r);


7.3 Using Template Classes in C++ for Object I/O

Template Class for I/O Object(2)

template <class RecType>
class RecordFile : public BufferFile {
public:
    int Read(RecType& record, int recaddr = -1);
    int Write(const RecType& record, int recaddr = -1);
    int Append(const RecType& record);
    RecordFile(IOBuffer& buffer) : BufferFile(buffer) {}
};

// The template parameter RecType must have the following methods:
//   int Pack(IOBuffer &);    pack record into buffer
//   int Unpack(IOBuffer &);  unpack record from buffer

  • Template Class RecordFile
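A self-contained sketch of the template idea under simplifying assumptions: a hypothetical TextBuffer replaces IOBuffer, and the "file" is an in-memory string of newline-terminated records rather than a BufferFile. Any RecType that provides Pack and Unpack works unchanged, so the same template serves both Person and Recording.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for IOBuffer: one packed record as text.
struct TextBuffer {
    std::string data;
};

// Sketch of RecordFile<RecType>: works for any RecType that provides
// void Pack(TextBuffer&) const and bool Unpack(const TextBuffer&).
template <class RecType>
class RecordFile {
public:
    // Pack the record and append it; returns its byte address.
    long Append(const RecType& record) {
        TextBuffer buf;
        record.Pack(buf);
        long addr = static_cast<long>(file_.size());
        file_ += buf.data + '\n';
        return addr;
    }

    // Read the record at recaddr, or the next record if recaddr == -1.
    bool Read(RecType& record, long recaddr = -1) {
        if (recaddr != -1) pos_ = static_cast<std::size_t>(recaddr);
        std::size_t end = file_.find('\n', pos_);
        if (pos_ >= file_.size() || end == std::string::npos) return false;
        TextBuffer buf{file_.substr(pos_, end - pos_)};
        pos_ = end + 1;
        return record.Unpack(buf);
    }

private:
    std::string file_;
    std::size_t pos_ = 0;
};

// Split "a|b" into two fields; false if there is no delimiter.
inline bool splitTwo(const std::string& s, std::string& a, std::string& b) {
    std::size_t bar = s.find('|');
    if (bar == std::string::npos) return false;
    a = s.substr(0, bar);
    b = s.substr(bar + 1);
    return true;
}

// Two unrelated record types; neither requires changes to RecordFile.
struct Person {
    std::string last, first;
    void Pack(TextBuffer& buf) const { buf.data = last + "|" + first; }
    bool Unpack(const TextBuffer& buf) { return splitTwo(buf.data, last, first); }
};

struct Recording {
    std::string id, title;
    void Pack(TextBuffer& buf) const { buf.data = id + "|" + title; }
    bool Unpack(const TextBuffer& buf) { return splitTwo(buf.data, id, title); }
};
```

The compiler checks the Pack/Unpack requirement at instantiation time: RecordFile<int> would fail to compile, which is the template's way of enforcing the comment in the declaration.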


7.3 Using Template Classes in C++ for Object I/O

Template Class for I/O Object(3)

  • Adding I/O to an existing class (Recording) with RecordFile

    • add methods Pack and Unpack to class Recording

    • create a buffer object to use in the I/O

      • DelimFieldBuffer Buffer;

    • declare an object of type RecordFile<Recording>

      • RecordFile<Recording> rFile (Buffer);

  • Declarations and calls: directly open a file, then read and write objects of class Recording

    Recording r1, r2;
    rFile.Open("myfile");
    rFile.Read(r1);
    rFile.Write(r2);


7.4 OO Support for Indexed, E-S Files of Data Objects

Object-Oriented Approach to I/O

  • Class IndexedFile

    • add indexed access to the sequential access provided by class RecordFile

    • extends RecordFile with Update, Append, and Read methods

      • Update & Append : maintain a primary key index of data file

      • Read : supports access to object by key

  • TextIndex, RecordFile ==> IndexedFile

  • Issues of IndexedFile

    • how to make a persistent index of a file

    • how to guarantee that the index is an accurate reflection of the contents of the data file


7.4 OO Support for Indexed, E-S Files of Data Objects

Basic Operations of IndexedFile(1)

  • Create the original empty index and data files

  • Load the index file into memory

  • Rewrite the index file from memory

  • Add records to the data file and index

  • Delete records from the data file

  • Update records in the data file

  • Update the index to reflect changes in the data file

  • Retrieve records


7.4 OO Support for Indexed, E-S Files of Data Objects

Basic Operations of TextIndexedFile (1)

  • Creating the files

    • the index file and the data file are initially created as empty files with header records

    • implementation: the Create method of class BufferFile (makeind.cpp in Appendix G)

  • Loading the index into memory

    • loading and storing objects are supported by the IOBuffer classes

    • need to choose a particular buffer class to use for the index file (tindbuff.cpp in Appendix G)

      • define class TextIndexBuffer as a derived class of FixedFieldBuffer to support reading and writing of index objects


7.4 OO Support for Indexed, E-S Files of Data Objects

Basic Operations of TextIndexedFile(2)

  • Rewriting the index file from memory

    • part of the Close operation on an IndexedFile

    • write the index object back to the index file

    • must protect the index against failure (e.g. a crash before the index is written back)

    • write the index only when it is out of date (use a status flag)

    • Implementation

      • Rewind and Write operations of class BufferFile

  • Record Addition

    • add a new record to the data file using RecordFile<Recording>::Write

    • add an entry to the index using TextIndex::Insert

      • requires rearrangement of the index

      • if the index is in memory, no file access is needed
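A compact sketch of record addition and the Close-time rewrite. It assumes hypothetical in-memory strings for the data and index files and a std::map for the in-memory index; TinyIndexedFile is an illustrative name, not the book's TextIndexedFile. Append writes the record and then inserts its key; Close rewrites the index file only when the status flag says the index is out of date.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

class TinyIndexedFile {
public:
    // Record addition: write to the data file, then add the index entry.
    bool Append(const std::string& key, const std::string& record) {
        if (index_.count(key)) return false;   // primary keys are unique
        index_[key] = dataFile_.size();        // byte offset of the new record
        dataFile_ += record + '\n';
        indexChanged_ = true;                  // in-memory index is now newer
        return true;
    }

    // Keyed read through the in-memory index.
    bool Read(const std::string& key, std::string& record) const {
        auto it = index_.find(key);
        if (it == index_.end()) return false;
        std::size_t end = dataFile_.find('\n', it->second);
        record = dataFile_.substr(it->second, end - it->second);
        return true;
    }

    // Rewrite the index file only when the in-memory index is out of date.
    int Close() {
        if (!indexChanged_) return 0;
        indexFile_.clear();
        for (const auto& e : index_)
            indexFile_ += e.first + ' ' + std::to_string(e.second) + '\n';
        indexChanged_ = false;
        ++indexWrites_;
        return 1;
    }

    const std::string& indexFile() const { return indexFile_; }
    int indexWrites() const { return indexWrites_; }

private:
    std::map<std::string, std::size_t> index_;
    std::string dataFile_, indexFile_;
    bool indexChanged_ = false;
    int indexWrites_ = 0;
};
```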


7.4 OO Support for Indexed, E-S Files of Data Objects

Basic Operations of TextIndexedFile(3)

  • Record Deletion

    • data file: the records need not be moved

    • index: actually delete the entry, or just mark it as deleted

      • using TextIndex::Remove

  • Record Updating (2 categories)

    • the update changes the value of the key field

      • delete/add approach

      • may require reordering both the index and the data file

    • the update does not affect the key field

      • no rearrangement of the index file

      • may need to reorganize the data file (a record that grows must be moved, and its index reference updated)
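Deletion and both kinds of update can be sketched against an entry-sequenced data string and an in-memory std::map index. Assumptions (hypothetical, for illustration): IndexedStore is not a book class, and an updated record is always appended rather than rewritten in place, which is one way to handle a variable-length record that may no longer fit in its old slot.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

struct IndexedStore {
    std::string data;                          // records are never moved
    std::map<std::string, std::size_t> index;  // key -> byte offset

    // Append a record to the data file and return its byte offset.
    std::size_t append(const std::string& rec) {
        std::size_t addr = data.size();
        data += rec + '\n';
        return addr;
    }

    // Deletion: drop the index entry; the data record stays in place.
    bool remove(const std::string& key) { return index.erase(key) == 1; }

    // Update. Key changed: delete the old entry and add a new one.
    // Key unchanged: the entry keeps its place in the index; only its
    // reference field changes, because the new version is appended.
    bool update(const std::string& oldKey, const std::string& newKey,
                const std::string& rec) {
        auto it = index.find(oldKey);
        if (it == index.end()) return false;
        if (newKey != oldKey) {
            if (index.count(newKey)) return false;  // keep primary keys unique
            index.erase(it);
            index[newKey] = append(rec);
        } else {
            it->second = append(rec);
        }
        return true;
    }
};
```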


7.4 OO Support for Indexed, E-S Files of Data Objects

Class TextIndexedFile(1)

  • Members

    • methods

      • Create, Open, Close, Read (sequential & indexed), Append, and Update operations

    • protected members

      • keep the in-memory index (Index), the index file (IndexFile), and the data file (DataFile) consistent

    • char* key()

      • the template parameter RecType must have the key method

      • used to extract the key value from the record


7.4 OO Support for Indexed, E-S Files of Data Objects

Class TextIndexedFile(2)

template <class RecType>
class TextIndexedFile
{
public:
    int Read(RecType& record);              // read next record
    int Read(char* key, RecType& record);   // read by key
    int Append(const RecType& record);
    int Update(char* oldKey, const RecType& record);
    int Create(char* name, int mode = std::ios::in | std::ios::out);
    int Open(char* name, int mode = std::ios::in | std::ios::out);
    int Close();
    TextIndexedFile(IOBuffer& buffer, int keySize, int maxKeys = 100);
    ~TextIndexedFile(); // close and delete
protected:
    TextIndex Index;
    BufferFile IndexFile;
    TextIndexBuffer IndexBuffer;
    RecordFile<RecType> DataFile;
    char* FileName; // base file name for file
    int SetFileName(char* fName, char*& dFileName, char*& IdxFName);
};


7.4 OO Support for Indexed, E-S Files of Data Objects

Enhancements to TextIndexedFile(1)

  • Support other types of keys

    • Restriction: the key type is restricted to string (char *)

    • Relaxation: use a template class SimpleIndex with a parameter for the key type

  • Support data object class hierarchies

    • Restriction: every object in a RecordFile must be of the same type

    • Relaxation: use a type hierarchy with virtual Pack methods
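A sketch of the key-type relaxation, assuming only that the key type supports operator<. This SimpleIndex is a hypothetical, simplified version (a sorted std::vector of key/address pairs), not the book's class; the same template serves integer and string keys.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

template <class KeyType>
class SimpleIndex {
public:
    // Insert in sorted position; rejects duplicate keys.
    bool Insert(const KeyType& key, long recAddr) {
        auto pos = lower(key);
        if (pos != entries_.end() && !(key < pos->first)) return false;
        entries_.insert(pos, std::make_pair(key, recAddr));
        return true;
    }

    // Returns the record address for key, or -1 if absent.
    long Search(const KeyType& key) const {
        auto pos = lower(key);
        if (pos == entries_.end() || key < pos->first) return -1;
        return pos->second;
    }

    std::size_t size() const { return entries_.size(); }

private:
    using Entry = std::pair<KeyType, long>;

    // Binary search using only KeyType's operator<.
    typename std::vector<Entry>::const_iterator lower(const KeyType& key) const {
        return std::lower_bound(entries_.begin(), entries_.end(), key,
            [](const Entry& e, const KeyType& k) { return e.first < k; });
    }

    std::vector<Entry> entries_;
};
```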


7.4 OO Support for Indexed, E-S Files of Data Objects

Enhancements to TextIndexedFile(2)

  • Support multirecord index files

    • Restriction: the entire index must fit in a single record

    • Relaxation: add protected methods Insert, Delete, and Search to manipulate the arrays of index objects

  • Active optimization of operations

    • Obvious: the most obvious optimization is to use binary search in the Find method

    • Active: add a flag to the index object to avoid writing the index record back to the index file when it has not changed


Where are we going?

  • Plain Stream File

  • Persistency ==> Buffer support ==> BufferFile

    <incremental approach>: deriving BufferFile using various other classes

  • Random Access ==> Index support ==> IndexedFile

    <incremental approach>: deriving TextIndexedFile using RecordFile and TextIndex


7.5 Indexes That Are Too Large to Hold in Memory

Too Large Index(1)

  • On secondary storage (large linear index)

  • Disadvantages

    • binary searching the index requires several seeks (not substantially faster than binary searching a sorted data file)

    • rearranging the index requires shifting or sorting records on secondary storage

  • Alternatives (to be considered later)

    • hashed organization

    • tree-structured index (e.g. B-tree)


7.5 Indexes That Are Too Large to Hold in Memory

Too Large Index (2)

  • Advantages over a data file sorted by key, even if the index is on secondary storage

    • allows binary search on a variable-length record file

    • sorting and maintaining the index is less expensive than sorting and maintaining the data file

    • if the data file contains pinned records, the keys can be rearranged without moving the data records


7.6 Indexing to Provide Access by Multiple Keys

Index by Multiple Keys(1)

  • DB-Schema = (ID-No, Title, Composer, Artist, Label)

  • Find the record with ID-No “COL38358” (primary key: ID-No)

  • Find all the recordings of “Beethoven” (secondary key: Composer)

  • Find all the recordings titled “Violin Concerto” (secondary key: Title)


[Figure: a secondary key index entry, BEETHOVEN → DG18807]

7.6 Indexing to Provide Access by Multiple Keys

Index by Multiple Keys(2)

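  • Users often want to search by fields other than the primary key

  • Secondary Key

    • may have duplicates

    • each secondary key refers to a primary key (see figure)

  • Secondary Key Index

    • a secondary key search yields primary keys, so one additional index (the primary key index) must be consulted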


7.6 Indexing to Provide Access by Multiple Keys

Secondary Index:Basic Operations(1)

  • Record Addition

    • similar to adding to the primary index

    • secondary keys are stored in canonical form (e.g. all upper case)

      • fixed length (so long keys may be truncated)

      • the original name can be obtained from the data file

    • can contain duplicate keys

    • entries with the same secondary key are ordered by primary key
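A sketch of secondary-index addition under stated assumptions: the canonical form is upper case padded or truncated to 20 characters (a hypothetical width, chosen because it reproduces the truncated key SWEET HONEY IN THE R), and duplicates stay in primary-key order because entries are sorted on the whole (secondary key, primary key) pair. canonical, addSecondary, and lookup are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Canonical form (assumed): upper case, padded or truncated to a fixed width.
std::string canonical(const std::string& raw, std::size_t width = 20) {
    std::string k;
    for (unsigned char c : raw) k += static_cast<char>(std::toupper(c));
    k.resize(width, ' ');  // truncates long keys, pads short ones with blanks
    return k;
}

// Secondary index: (secondary key, primary key) pairs. Sorting on the whole
// pair orders duplicates of one secondary key by their primary keys.
using SecondaryIndex = std::vector<std::pair<std::string, std::string>>;

void addSecondary(SecondaryIndex& idx, const std::string& secondaryKey,
                  const std::string& primaryKey) {
    auto entry = std::make_pair(canonical(secondaryKey), primaryKey);
    idx.insert(std::upper_bound(idx.begin(), idx.end(), entry), entry);
}

// All primary keys for one secondary key (duplicates are allowed).
std::vector<std::string> lookup(const SecondaryIndex& idx, const std::string& secondaryKey) {
    std::string k = canonical(secondaryKey);
    std::vector<std::string> result;
    auto it = std::lower_bound(idx.begin(), idx.end(), std::make_pair(k, std::string()));
    for (; it != idx.end() && it->first == k; ++it) result.push_back(it->second);
    return result;
}
```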


7.6 Indexing to Provide Access by Multiple Keys

Secondary Index:Basic Operations (2)

  • Record Deletion (2 cases)

    • Secondary index references the record directly (by address)

      • delete the entry from both the primary index and the secondary index

      • rearrange both indexes

    • Secondary index references the primary key

      • delete the entry from the primary index only

      • leave the secondary index's reference to the deleted record intact

      • advantage: fast

      • disadvantage: stale references to deleted records take up space


7.6 Indexing to Provide Access by Multiple Keys

Secondary Index: Basic Operations (3)

  • Record Updating

    • primary key index serves as a kind of protective buffer

    • Secondary index references the record directly

      • update every index that contains the record’s location

    • Secondary index references the primary key (1)

      • the secondary index is affected only when either the primary key or the secondary key changes

Continued.


7.6 Indexing to Provide Access by Multiple Keys

Secondary Index: Basic Operations (4)

  • Secondary index references primary key(2)

    • when the secondary key changes

      • rearrange the secondary key index

    • when the primary key changes

      • update all affected reference fields

      • may require reordering the secondary index (duplicates are ordered by primary key)

    • when changes are confined to other fields

      • the secondary key index is not affected
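When the secondary index references primary keys, retrieval is a two-step lookup, and deletion can touch the primary index alone. A sketch with two in-memory maps (TwoLevelIndex is a hypothetical name; offsets follow the simple index example): a stale secondary reference is detected and skipped when it fails to resolve through the primary index.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct TwoLevelIndex {
    std::map<std::string, long> primary;                // ID -> byte offset
    std::multimap<std::string, std::string> secondary;  // composer -> ID

    // Deletion removes only the primary entry; the secondary index is untouched.
    void remove(const std::string& id) { primary.erase(id); }

    // Secondary lookup, then a second lookup through the primary index.
    std::vector<long> findByComposer(const std::string& composer) const {
        std::vector<long> offsets;
        auto range = secondary.equal_range(composer);
        for (auto it = range.first; it != range.second; ++it) {
            auto p = primary.find(it->second);
            if (p != primary.end()) offsets.push_back(p->second);  // skip stale refs
        }
        return offsets;
    }
};
```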


7.7 Retrieval Using Combinations of Secondary Keys

Retrieval of Records

  • Types

    • primary key access

    • secondary key access

    • combination of above

  • Combination of keys

    • easy with secondary key indexes

    • Boolean operations (AND, OR) on the lists of primary keys returned for each secondary key
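Because a secondary index returns its primary keys in sorted order, AND and OR reduce to a merge-style intersection and union of sorted lists. A sketch using the standard std::set_intersection and std::set_union algorithms; andLists and orLists are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Both inputs must be sorted lists of primary keys.
// AND: keys present in both lists.
std::vector<std::string> andLists(const std::vector<std::string>& a,
                                  const std::vector<std::string>& b) {
    std::vector<std::string> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}

// OR: keys present in either list, each listed once.
std::vector<std::string> orLists(const std::vector<std::string>& a,
                                 const std::vector<std::string>& b) {
    std::vector<std::string> out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(out));
    return out;
}
```

For example, with composer BEETHOVEN = {ANG3795, DG139201, DG18807, RCA2626} and title SYMPHONY NO. 9 = {ANG3795, COL31809, DG18807}, andLists yields {ANG3795, DG18807}.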


7.8 Improving the Secondary Index Structure

Inverted Lists(1)

  • Inverted List

    • a secondary key leads to a set of one or more primary keys

  • Disadvantages of the secondary index structure

    • the index must be rearranged on every addition

    • the secondary key is repeated for every duplicate

  • Solution A: an array of references

  • Solution B: a linked list of references


Revised composer index

  Secondary key            Set of primary key references
  BEETHOVEN                ANG3795  DG139201  DG18807  RCA2626
  COREA                    WAR23699
  DVORAK                   COL31809
  PROKOFIEV                LON2312
  RIMSKY-KORSAKOV          MER75016
  SPRINGSTEEN              COL38358
  SWEET HONEY IN THE R     FF245

7.8 Improving the Secondary Index Structure

Array of References

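  • advantage: no need to rearrange the index when a reference is added to an existing key

  • disadvantage: the reference array has a fixed size, limiting the number of duplicates

  • disadvantage: internal fragmentation (unused reference slots)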


[Figure: linked list of references for PROKOFIEV: ANG36193 → LON2312]

7.8 Improving the Secondary Index Structure

Inverted Lists (2)

  • Goals for a better solution

    • no reorganization when adding

    • no limit on the number of duplicates per key

    • no internal fragmentation

  • Solution B: a linked list of references

  • Each secondary key leads to a linked list of primary key references

  • Secondary index record: secondary key field + relative record number (RRN) of the first corresponding primary key reference


7.8 Improving the Secondary Index Structure

Linking List of References (1)

Improved revision of the composer index

Secondary Index file:

  Secondary key            RRN of first reference
  BEETHOVEN                3
  COREA                    2
  DVORAK                   7
  PROKOFIEV                10
  RIMSKY-KORSAKOV          6
  SPRINGSTEEN              4
  SWEET HONEY IN THE R     9

Label ID List file:

  RRN   Primary key   Next RRN
  0     LON2312       -1
  1     RCA2626       -1
  2     WAR23699      -1
  3     ANG3795       8
  4     COL38358      -1
  5     DG18807       1
  6     MER75016      -1
  7     COL31809      -1
  8     DG139201      5
  9     FF245         -1
  10    ANG36193      0


7.8 Improving the Secondary Index Structure

Linking List of References (2)

  • The primary key references are kept in a separate, entry-sequenced file (the Label ID List file)

  • Advantages

    • the secondary index is rearranged only when a secondary key is added, changed, or deleted

    • rearrangement is quicker, since the secondary index has fewer records

    • less penalty for keeping the secondary index file on secondary storage (less need for sorting)

    • the Label ID List file does not need to be sorted

    • reusing the space of deleted records is easy
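A sketch of the two-file structure, holding both "files" in vectors (LinkedInvertedIndex, LabelIdEntry, and SecondaryEntry are hypothetical names). A new reference is always appended to the Label ID list and then linked into its chain in primary-key order, so only link fields are rewritten; a linear scan stands in for the binary search a sorted secondary index file would allow. Adding the recordings in data-file order reproduces the RRNs of the composer index figure, e.g. BEETHOVEN starts at RRN 3: ANG3795 → DG139201 → DG18807 → RCA2626.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One Label ID List record: a primary key and the RRN of the next
// reference for the same secondary key (-1 ends the list).
struct LabelIdEntry { std::string primaryKey; int next; };

// One Secondary Index record: the key and the RRN of its first reference.
struct SecondaryEntry { std::string secondaryKey; int head; };

struct LinkedInvertedIndex {
    std::vector<SecondaryEntry> secondary;  // linear scan here; a real file would be sorted
    std::vector<LabelIdEntry> labelIds;     // entry-sequenced, never sorted

    void add(const std::string& secKey, const std::string& primaryKey) {
        int rrn = static_cast<int>(labelIds.size());
        labelIds.push_back({primaryKey, -1});  // append; nothing moves
        for (auto& s : secondary) {
            if (s.secondaryKey != secKey) continue;
            // Walk the chain so each list stays in primary-key order.
            int* link = &s.head;
            while (*link != -1 && labelIds[*link].primaryKey < primaryKey)
                link = &labelIds[*link].next;
            labelIds[rrn].next = *link;
            *link = rrn;
            return;
        }
        secondary.push_back({secKey, rrn});    // first reference for a new key
    }

    // Follow the chain for one secondary key.
    std::vector<std::string> list(const std::string& secKey) const {
        std::vector<std::string> out;
        for (const auto& s : secondary)
            if (s.secondaryKey == secKey)
                for (int rrn = s.head; rrn != -1; rrn = labelIds[rrn].next)
                    out.push_back(labelIds[rrn].primaryKey);
        return out;
    }
};
```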


7.8 Improving the Secondary Index Structure

Linking List of References (3)

  • Disadvantage

    • references for the same secondary key may not be physically grouped

      • lack of locality

      • could involve a large amount of seeking

      • solution: keep the Label ID List file in memory

        • one Label ID List file can hold the lists for several secondary index files

        • if it is too large for memory, load only part of it


7.9 Selective Indexes

Selective Indexes

  • Selective Index: Index on a subset of records

  • A selective index contains entries for only part of the file

    • provides a selective view of the file

    • useful when the contents of a file fall into several categories

      • e.g. 20 < Age < 30 and $1000 < Salary
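A selective index is built like any other index, except that records failing the selection condition get no entry. A sketch using the condition from the example (20 < Age < 30 and Salary > 1000); the Employee record and buildSelectiveIndex are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical record type for the example condition.
struct Employee { std::string id; int age; double salary; };

// Index only the records that satisfy `keep`; key = ID, value = RRN.
std::map<std::string, int> buildSelectiveIndex(
        const std::vector<Employee>& file,
        const std::function<bool(const Employee&)>& keep) {
    std::map<std::string, int> idx;
    for (int rrn = 0; rrn < static_cast<int>(file.size()); ++rrn)
        if (keep(file[rrn])) idx[file[rrn].id] = rrn;
    return idx;
}
```

For example, among employees (E1, 25, 1500), (E2, 35, 2000), (E3, 22, 800), and (E4, 28, 1200), only E1 and E4 receive index entries.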


7.10 Binding

Index Binding(1)

  • When should an index key be bound to the physical address of its associated record?

  • File-construction-time binding

    (tight, in-the-data binding)

    • tight binding and faster access

    • typical for the primary key

    • when secondary keys are also bound at that time

      • retrieval is simpler and faster

      • reorganizing the data file requires modifying all bound index files


7.10 Binding

Index Binding (2)

  • Postpone binding until a record is actually retrieved (Retrieval-time binding)

    • minimal reorganization & safe approach

    • mostly used for secondary keys

  • Tight, in-the-data binding is good when

    • the file is static, with little or no change

    • rapid retrieval performance is essential

    • e.g. mass-produced, read-only optical discs
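The trade-off shows up when a data file is reorganized under both schemes. In this hypothetical sketch (IndexSet and reorganize are illustrative names; one record per composer for brevity), a tightly bound secondary index stores byte offsets, while a retrieval-time bound one stores primary keys. After a reorganization that rewrites only the primary index, the late-bound lookup is still correct and the tightly bound address is stale.

```cpp
#include <cassert>
#include <map>
#include <string>

struct IndexSet {
    std::map<std::string, long> primary;               // ID -> byte offset
    std::map<std::string, long> secondaryTight;        // composer -> offset (bound at construction)
    std::map<std::string, std::string> secondaryLate;  // composer -> ID (bound at retrieval)

    // Tight binding: the address is read straight from the secondary index.
    long findTight(const std::string& composer) const { return secondaryTight.at(composer); }

    // Retrieval-time binding: the address is resolved through the primary index.
    long findLate(const std::string& composer) const {
        return primary.at(secondaryLate.at(composer));
    }
};

// Simulate reorganizing the data file: every record moves by `shift` bytes
// and the primary index is rewritten. A tightly bound secondary index would
// also have to be rewritten; this sketch deliberately leaves it alone.
void reorganize(IndexSet& ix, long shift) {
    for (auto& entry : ix.primary) entry.second += shift;
}
```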


Let’s Review (1)

7.1 What is an Index?

7.2 A Simple Index for Entry-Sequenced Files

7.3 Using Template Classes in C++ for Object I/O

7.4 Object-Oriented Support for Indexed, Entry-Sequenced Files of Data Objects

7.5 Indexes That Are Too Large to Hold in Memory


Let’s Review(2)

7.6 Indexing to Provide Access by Multiple Keys

7.7 Retrieval Using Combinations of Secondary Keys

7.8 Improving the Secondary Index Structure: Inverted Lists

7.9 Selective Indexes

7.10 Binding
