File Structures by Folk, Zoellick, and Riccardi

Chap 7. Indexing

Seoul National University, Department of Computer Engineering

Object-Oriented Systems Laboratory

SNU-OOPSLA-LAB

Prof. Hyoung-Joo Kim (김형주)

SNU-OOPSLA Lab.

Chapter Objectives (1)
  • Introduce concepts of indexing that have broad applications in the design of file systems
  • Introduce the use of a simple linear index to provide rapid access to records in an entry-sequenced, variable-length record file
  • Investigate the implications of the use of indexes for file maintenance
  • Introduce the template features of C++ for object I/O
  • Describe the object-oriented approach to indexed sequential files

Chapter Objectives (2)
  • Describe the use of indexes to provide access to records by more than one key
  • Introduce the idea of an inverted list, illustrating Boolean operations on lists
  • Discuss the issue of when to bind an index key to an address in the data file
  • Introduce and investigate the implications of self-indexing files

Contents (1)

7.1 What is an Index?

7.2 A Simple Index for Entry-Sequenced Files

7.3 Using Template Classes in C++ for Object I/O

7.4 Object-Oriented Support for Indexed, Entry-Sequenced Files of Data Objects

7.5 Indexes That Are Too Large to Hold in Memory

Contents (2)

7.6 Indexing to Provide Access by Multiple Keys

7.7 Retrieval Using Combinations of Secondary Keys

7.8 Improving the Secondary Index Structure: Inverted Lists

7.9 Selective Indexes

7.10 Binding

7.1 What Is an Index?

Overview: Index (1)
  • Index: a data structure which associates given key values with corresponding record numbers
  • It is usually physically separate from the data file (unlike indexed sequential files, which use tight binding).
  • Linear indexes (like indexes found at the back of books)
    • Index records are ordered by key value as in an ordered relative file
    • Best algorithm for finding a record with a specific key value is binary search
    • Addition requires reorganization
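
The lookup in a linear index can be sketched as follows. This is a minimal illustration rather than code from the text: `IndexEntry` and `findRecord` are hypothetical names, and the index is assumed to be an in-memory array already sorted by key.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// One entry of a linear index: a key and the number of its record.
// (Hypothetical type, for illustration only.)
struct IndexEntry {
    std::string key;
    int recordNumber;
};

// Binary search over index entries sorted by key.
// Returns the record number, or -1 if the key is absent.
int findRecord(const std::vector<IndexEntry>& index, const std::string& key) {
    // Debug-only check of the precondition: entries are ordered by key.
    assert(std::is_sorted(index.begin(), index.end(),
                          [](const IndexEntry& a, const IndexEntry& b) {
                              return a.key < b.key;
                          }));
    int low = 0;
    int high = static_cast<int>(index.size()) - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;
        if (index[mid].key == key) return index[mid].recordNumber;
        if (index[mid].key < key) low = mid + 1;
        else high = mid - 1;
    }
    return -1;
}
```

Adding an entry to such an array means shifting the later entries to keep the order, which is the reorganization cost noted above.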

Overview: Index (2)

[Figure: an Index File holding the keys k1, k2, k4, k5, k7, k9, each entry pointing to the record with the same key in the Data File (record contents AAA, ZZZ, CCC, XXX, EEE, FFF).]

Overview: Index (3)
  • Tree Indexes (like those of indexed sequential files)
    • Hierarchical in that each level, beginning with the root level, points to the next level
    • Only the leaves point to the data file
  • Indexed Sequential File
  • Binary Tree Index
  • AVL Tree Index
  • B+ tree Index

Roles of an Index
  • Index: keys and reference fields
  • Fast Random Accesses
  • Uniform Access Speed
  • Allow users to impose order on a file without actually rearranging the file
  • Provide multiple access paths to a file
  • Give user keyed access to variable-length record files

7.2 A Simple Index for E-S Files

A Simple Index (1)
  • Datafile
    • entry-sequenced, variable-length record
    • primary key : unique for each entry in a file
  • Search a file with key (popular need)
    • cannot use binary search in a variable-length record file (there is no way to know where the middle record begins)
    • construct an index object for the file
      • index object : key field + byte-offset field
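
Building such an index amounts to one sequential pass over the data file that records where each record starts. The sketch below assumes, purely for illustration, one record per line, '|'-separated fields, and a key formed by joining the first two fields (label and ID number); `buildIndex` and `readRecordAt` are hypothetical names, not functions from the text.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Scan a stream of variable-length records and build a key -> byte-offset index.
// Assumed layout: one record per line, fields separated by '|'.
std::map<std::string, long> buildIndex(std::istream& data) {
    std::map<std::string, long> index;   // std::map keeps the keys sorted
    std::string line;
    long offset = 0;                      // byte offset of the current record
    while (std::getline(data, line)) {
        std::size_t first = line.find('|');
        std::size_t second =
            (first == std::string::npos) ? first : line.find('|', first + 1);
        if (second != std::string::npos) {
            // Key = first field + second field, e.g. "LON" + "2312".
            std::string key = line.substr(0, first) +
                              line.substr(first + 1, second - first - 1);
            index[key] = offset;
        }
        offset += static_cast<long>(line.size()) + 1;  // +1 for the '\n'
    }
    return index;
}

// Fetch the record stored at a byte offset obtained from the index.
std::string readRecordAt(std::istream& data, long offset) {
    assert(offset >= 0);
    data.clear();          // reset the EOF state left by a previous scan
    data.seekg(offset);
    std::string line;
    std::getline(data, line);
    return line;
}
```

With the index in hand, a lookup is a search of the index followed by a single seek into the data file.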

A Simple Index (2)

Indexfile

Key         Reference field
ANG3795     167
COL31809    353
COL38358    211
DG139201    396
DG18807     256
FF245       442
LON2312     32
MER75016    300
RCA2626     77
WAR23699    132

Datafile

Address of record   Actual data record
32                  LON|2312|Romeo and Juliet|Prokofiev . . .
77                  RCA|2626|Quartet in C Sharp Minor . . .
132                 WAR|23699|Touchstone|Corea . . .
167                 ANG|3795|Symphony No. 9|Beethoven . . .
211                 COL|38358|Nebraska|Springsteen . . .
256                 DG|18807|Symphony No. 9|Beethoven . . .
300                 MER|75016|Coq d'or Suite|Rimsky . . .
353                 COL|31809|Symphony No. 9|Dvorak . . .
396                 DG|139201|Violin Concerto|Beethoven . . .
442                 FF|245|Good News|Sweet Honey In The . . .

A Simple Index (3)
  • Index file: fixed-size record, sorted
  • Datafile: not sorted because it is entry sequenced
  • Record addition is quick (faster than a sorted file)
  • Can keep the index in memory
    • a record is found more quickly through the in-memory index than by searching a sorted data file
  • Class TextIndex encapsulates the index data and index operations

Let's see Figure 7.4: class TextIndex

class TextIndex {
public:
    TextIndex(int maxKeys = 100, int unique = 1);
    int Insert(const char* key, int recAddr);  // add to index
    int Remove(const char* key);               // remove key from index
    int Search(const char* key) const;         // search for key, return recAddr
    void Print(ostream&) const;
protected:
    int MaxKeys;     // maximum num of entries
    int NumKeys;     // actual num of entries
    char** Keys;     // array of key values
    int* RecAddrs;   // array of record references
    int Find(const char* key) const;
    int Init(int maxKeys, int unique);
    int Unique;      // if true --> each key must be unique
};
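
To make the behavior behind this interface concrete, here is a simplified stand-in, not the Appendix G implementation. `SimpleTextIndex` is a hypothetical name; it assumes unique keys and uses `std::vector` and `std::string` in place of the raw `char**` and `int*` arrays.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Parallel arrays of keys and record addresses, kept sorted by key.
class SimpleTextIndex {
public:
    explicit SimpleTextIndex(std::size_t maxKeys = 100) : maxKeys_(maxKeys) {}

    // Insert key at its sorted position. Returns 1 on success, 0 if the
    // index is full or the key is already present (keys are unique here).
    int Insert(const std::string& key, int recAddr) {
        assert(keys_.size() == recAddrs_.size());
        if (keys_.size() >= maxKeys_) return 0;
        auto it = std::lower_bound(keys_.begin(), keys_.end(), key);
        if (it != keys_.end() && *it == key) return 0;
        std::ptrdiff_t pos = it - keys_.begin();
        keys_.insert(it, key);                          // shifts later entries
        recAddrs_.insert(recAddrs_.begin() + pos, recAddr);
        return 1;
    }

    // Remove key and its address. Returns 0 if the key is not present.
    int Remove(const std::string& key) {
        int pos = Find(key);
        if (pos < 0) return 0;
        keys_.erase(keys_.begin() + pos);
        recAddrs_.erase(recAddrs_.begin() + pos);
        return 1;
    }

    // Return the record address for key, or -1 if not found.
    int Search(const std::string& key) const {
        int pos = Find(key);
        return pos < 0 ? -1 : recAddrs_[pos];
    }

private:
    // Binary search (via std::lower_bound); returns the position or -1.
    int Find(const std::string& key) const {
        auto it = std::lower_bound(keys_.begin(), keys_.end(), key);
        if (it == keys_.end() || *it != key) return -1;
        return static_cast<int>(it - keys_.begin());
    }

    std::size_t maxKeys_;
    std::vector<std::string> keys_;
    std::vector<int> recAddrs_;
};
```

Keeping the arrays sorted makes `Search` logarithmic, while `Insert` and `Remove` pay for shifting entries, which is cheap as long as the index stays in memory.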

Index Implementation
  • Page 638, 639, 640
    • G.1 Recording.h
    • G.2 Recording.cpp
    • G.3 Makerec.cpp
  • Page 641, 642
    • G.4 Textind.h
    • G.5 Textind.cpp

RetrieveRecording with the Index
  • RetrieveRecording(KEY...) procedure: retrieves a single record by key from the datafile, combining the index search, file read, and buffer unpack operations into a single function

int RetrieveRecording (Recording & recording, char * key,
        TextIndex & RecordingIndex, BufferFile & RecordingFile)
// read and unpack the recording, return TRUE if succeeds
{
    int recAddr = RecordingIndex.Search(key);
    if (recAddr == -1) return FALSE;   // key is not in the index
    int result = RecordingFile.Read(recAddr);
    if (result == -1) return FALSE;    // read failed
    result = recording.Unpack(RecordingFile.GetBuffer());
    return result;
}

7.3 Using Template Classes in C++ for Object I/O

Template Class for I/O Object (1)
  • Template Class RecordFile
    • we want to make the following code possible
      • Person p; RecordFile pFile; pFile.Read(p);
      • Recording r; RecordFile rFile; rFile.Read(r);
    • difficult to support files for different record types without having to modify the class
    • Template class which is derived from BufferFile
      • the actual declarations and calls
        • RecordFile<Person> pFile; pFile.Read(p);
        • RecordFile<Recording> rFile; rFile.Read(r);

Template Class for I/O Object (2)
  • Template Class RecordFile

template <class RecType>
class RecordFile : public BufferFile {
public:
    int Read(RecType& record, int recaddr = -1);
    int Write(const RecType& record, int recaddr = -1);
    int Append(const RecType& record);
    RecordFile(IOBuffer& buffer) : BufferFile(buffer) {}
};
// The template parameter RecType must have the following methods:
// int Pack(IOBuffer&);    pack record into buffer
// int Unpack(IOBuffer&);  unpack record from buffer
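
The idea that any type with Pack and Unpack can be stored through one template can be tried out in isolation. The following is a sketch under simplifying assumptions, not the RecordFile class itself: `MemoryRecordFile` is a hypothetical name, records are packed into strings and held in a vector rather than written through a BufferFile, and Pack and Unpack take or return a `std::string` instead of an `IOBuffer&`.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Simplified stand-in for a template record file.
// RecType must provide:
//   std::string Pack() const;
//   bool Unpack(const std::string&);
template <class RecType>
class MemoryRecordFile {
public:
    // Append a packed record; returns its address (position in the file).
    int Append(const RecType& record) {
        buffers_.push_back(record.Pack());
        return static_cast<int>(buffers_.size()) - 1;
    }
    // Read the record at recaddr; false on a bad address or bad data.
    bool Read(RecType& record, int recaddr) const {
        if (recaddr < 0 || recaddr >= static_cast<int>(buffers_.size()))
            return false;
        return record.Unpack(buffers_[recaddr]);
    }
private:
    std::vector<std::string> buffers_;
};

// Example record type (hypothetical), packed as '|'-delimited fields.
struct Person {
    std::string lastName, firstName;
    std::string Pack() const { return lastName + "|" + firstName + "|"; }
    bool Unpack(const std::string& buffer) {
        std::istringstream in(buffer);
        return std::getline(in, lastName, '|') &&
               std::getline(in, firstName, '|');
    }
};

// Usage: the same template serves any type that has Pack and Unpack.
inline void recordFileExample() {
    MemoryRecordFile<Person> pFile;
    int addr = pFile.Append(Person{"Ames", "Mary"});
    Person p;
    bool ok = pFile.Read(p, addr);
    assert(ok && p.firstName == "Mary");
    (void)ok;  // silence unused-variable warnings when asserts are disabled
}
```

The template never names a concrete record class, so supporting a new record type requires no change to the file class, only the two packing methods on the new type.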

Template Class for I/O Object (3)
  • Adding I/O to an existing class using RecordFile
    • add methods Pack and Unpack to class Recording
    • create a buffer object to use in the I/O
      • DelimFieldBuffer Buffer;
    • declare an object of type RecordFile<Recording>
      • RecordFile<Recording> rFile (Buffer);
  • Declaration and Calls
    • Recording r1, r2;
    • rFile.Open("myfile");
    • rFile.Read(r1);
    • rFile.Write(r2);
  • Result: directly open a file and read and write objects of class Recording

7.4 OO Support for Indexed, E-S Files of Data Objects

Object-Oriented Approach to I/O
  • Class IndexedFile
    • add indexed access to the sequential access provided by class RecordFile
    • extends RecordFile with Update, Append, and Read methods
      • Update & Append: maintain a primary key index of the data file
      • Read: supports access to an object by key
  • TextIndex, RecordFile ==> IndexedFile
  • Issues of IndexedFile
    • how to make a persistent index of a file
    • how to guarantee that the index is an accurate reflection of the contents of the data file

Basic Operations of IndexedFile
  • Create the original empty index and data files
  • Load the index file into memory
  • Rewrite the index file from memory
  • Add records to the data file and index
  • Delete records from the data file
  • Update records in the data file
  • Update the index to reflect changes in the data file
  • Retrieve records

Basic Operations of TextIndexedFile (1)
  • Creating the files
    • the index file and the data file are initially created as empty files with header records
    • implementation (makeind.cpp in Appendix G): the Create method of class BufferFile
  • Loading the index into memory
    • loading and storing of objects is supported by the IOBuffer classes
    • need to choose a particular buffer class to use for an index file (tindbuff.cpp in Appendix G)
      • define class TextIndexBuffer as a derived class of FixedFieldBuffer to support reading and writing of index objects

Basic Operations of TextIndexedFile (2)
  • Rewriting the index file from memory
    • part of the Close operation on an IndexedFile
    • write back the index object to the index file
    • should protect the index against failure
    • write changes only when the index file is out of date (use a status flag)
    • Implementation
      • Rewind and Write operations of class BufferFile
  • Record Addition
    • add a new record to the data file, using RecordFile::Write
    • add an entry to the index, using TextIndex::Insert
      • requires rearrangement of the index
      • if the index is in memory, no file access is needed

Basic Operations of TextIndexedFile (3)
  • Record Deletion
    • data file: the records need not be moved
    • index: delete entry really or just mark it
      • using TextIndex::Remove
  • Record Updating (2 categories)
    • the update changes the value of the key field
      • delete/add approach
      • reorder both the index and the data file
    • the update does not affect the key field
      • no rearrangement of the index file
      • may need to reconstruct the data file
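
The addition, deletion, and update rules above can be sketched with in-memory stand-ins. `TinyIndexedFile` is a hypothetical name; a `std::map` plays the role of the sorted index and a vector of strings plays the role of the entry-sequenced data file, so the sketch ignores buffering and on-disk record sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

class TinyIndexedFile {
public:
    // Addition: append the record to the data file, then add its key to the index.
    bool Append(const std::string& key, const std::string& record) {
        if (index_.count(key)) return false;      // primary keys are unique
        data_.push_back(record);                  // entry-sequenced: always at the end
        index_[key] = static_cast<int>(data_.size()) - 1;
        return true;
    }

    // Deletion: remove the index entry; the data record is not moved.
    bool Delete(const std::string& key) { return index_.erase(key) == 1; }

    // Update: a key change is handled as delete + add; otherwise the record
    // is overwritten and the index is left untouched. (In a real
    // variable-length file, a record that grows may have to be relocated.)
    bool Update(const std::string& oldKey, const std::string& newKey,
                const std::string& record) {
        auto it = index_.find(oldKey);
        if (it == index_.end()) return false;
        if (newKey != oldKey) {
            if (index_.count(newKey)) return false;
            Delete(oldKey);
            return Append(newKey, record);
        }
        data_[it->second] = record;
        return true;
    }

    // Retrieval by key; false if the key is not in the index.
    bool Read(const std::string& key, std::string& record) const {
        auto it = index_.find(key);
        if (it == index_.end()) return false;
        assert(it->second >= 0 && it->second < static_cast<int>(data_.size()));
        record = data_[it->second];
        return true;
    }

    std::size_t DataRecords() const { return data_.size(); }

private:
    std::map<std::string, int> index_;   // key -> record address
    std::vector<std::string> data_;      // stand-in for the data file
};
```

Note that `DataRecords()` keeps growing even after deletions and key changes: only the index shrinks, which is why space reclamation in the data file is a separate concern.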

Class TextIndexedFile (1)
  • Members
    • methods
      • Create, Open, Close, Read (sequential & indexed), Append, and Update operations
    • protected members
      • ensure the correlation between the index in memory (Index),the index file (IndexFile), and the data file (DataFile)
    • char* key()
      • the template parameter RecType must have the key method
      • used to extract the key value from the record

Class TextIndexedFile (2)

template <class RecType>
class TextIndexedFile {
public:
    int Read(RecType& record);             // read next record
    int Read(char* key, RecType& record);  // read by key
    int Append(const RecType& record);
    int Update(char* oldKey, const RecType& record);
    int Create(char* name, int mode = ios::in|ios::out);
    int Open(char* name, int mode = ios::in|ios::out);
    int Close();
    TextIndexedFile(IOBuffer& buffer, int keySize, int maxKeys = 100);
    ~TextIndexedFile();                    // close and delete
protected:
    TextIndex Index;
    BufferFile IndexFile;
    TextIndexBuffer IndexBuffer;
    RecordFile<RecType> DataFile;
    char* FileName;                        // base file name for file
    int SetFileName(char* fName, char*& dFileName, char*& IdxFName);
};

Enhancements to TextIndexedFile (1)
  • Support other types of keys
    • Restriction: the key type is restricted to string (char *)
    • Relaxation: support a template class SimpleIndex with parameter for key type
  • Support data object class hierarchies
    • Restriction: every object must be of the same type in RecordFile
    • Relaxation: the type hierarchy supports virtual pack methods

Enhancements to TextIndexedFile (2)
  • Support multirecord index files
    • Restriction: the entire index must fit in a single record
    • Relaxation: add protected methods Insert, Delete, and Search to manipulate the arrays of index objects
  • Optimization of operations
    • Obvious: use binary search in the Find method
    • Active: add a flag to the index object to avoid writing the index record back to the index file when it has not been changed
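
The flag-based optimization can be sketched as follows. `FlaggedIndex` is a hypothetical name, and a counter stands in for the actual rewrite of the index file so that the effect of the flag is observable.

```cpp
#include <cassert>
#include <map>
#include <string>

// The index is written back on Close only if it was modified since it
// was last written.
class FlaggedIndex {
public:
    void Insert(const std::string& key, int recAddr) {
        assert(recAddr >= 0);
        entries_[key] = recAddr;
        changed_ = true;              // the memory copy is now newer than the file
    }

    // Read-only operation: the flag is left untouched.
    int Search(const std::string& key) const {
        auto it = entries_.find(key);
        return it == entries_.end() ? -1 : it->second;
    }

    void Close() {
        if (!changed_) return;        // nothing to rewrite
        ++writes_;                    // stand-in for rewriting the index file
        changed_ = false;
    }

    int Writes() const { return writes_; }

private:
    std::map<std::string, int> entries_;
    bool changed_ = false;
    int writes_ = 0;
};
```

A session that only searches never sets the flag, so closing the file costs no index write at all.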

Where are we going?
  • Plain Stream File
  • Persistency ==> Buffer support ==> BufferFile
    • deriving BufferFile using various other classes
  • Random Access ==> Index support ==> IndexedFile
    • deriving TextIndexedFile using RecordFile and TextIndex

7.5 Indexes That Are Too Large to Hold in Memory

Too Large Index (1)
  • On secondary storage (large linear index)
  • Disadvantages
    • binary searching of the index requires several seeks (slower than a sorted file)
    • index rearrangement requires shifting or sorting records on secondary storage
  • Alternatives (to be considered later)
    • hashed organization
    • tree-structured index (e.g. B-tree)

Too Large Index (2)
  • Advantages over the use of a data file sorted by key, even if the index is on secondary storage
    • can use a binary search
    • sorting and maintaining the index is less expensive than sorting and maintaining the data file
    • can rearrange the keys without moving the data records if there are pinned records

7.6 Indexing to Provide Access by Multiple Keys

Index by Multiple Keys (1)
  • DB-Schema = (ID-No, Title, Composer, Artist, Label)
  • Find the record with ID-No “COL38358” (primary key: ID-No)
  • Find all the recordings of “Beethoven” (secondary key: Composer)
  • Find all the recordings titled “Violin Concerto” (secondary key: Title)

Index by Multiple Keys (2)
  • Most people don’t want to search only by primary key
  • Secondary Key
    • can be duplicated
    • e.g. secondary key BEETHOVEN --> primary key DG18807
  • Secondary Key Index
    • maps a secondary key to a primary key, so retrieval consults one additional index (the primary key index)

Secondary Index: Basic Operations (1)
  • Record Addition
    • similar to the case of adding to primary index
    • secondary index is stored in canonical form
      • fixed length (so it can be truncated)
      • original name can be obtained from the data file
    • can contain duplicate keys
    • local ordering in the same key group
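
A secondary index that stores canonical keys and refers to primary keys can be sketched as below. The names `canonical` and `findByComposer`, the key length of 20, and the use of `std::multimap` are assumptions made for this illustration; within a group of equal keys, `std::multimap` preserves insertion order rather than sorting by primary key.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Canonical form of a secondary key: uppercase, truncated to a fixed length.
std::string canonical(const std::string& name, std::size_t length = 20) {
    assert(length > 0);
    std::string key = name.substr(0, length);
    std::transform(key.begin(), key.end(), key.begin(), [](unsigned char c) {
        return static_cast<char>(std::toupper(c));
    });
    return key;
}

// Secondary index: canonical composer name -> primary key (duplicates allowed).
using SecondaryIndex = std::multimap<std::string, std::string>;
// Primary index: primary key -> byte offset in the data file.
using PrimaryIndex = std::map<std::string, int>;

// Two-step retrieval: secondary key -> primary keys -> record addresses.
// Primary keys that are no longer in the primary index are skipped.
std::vector<int> findByComposer(const SecondaryIndex& composers,
                                const PrimaryIndex& primary,
                                const std::string& composer) {
    std::vector<int> addresses;
    auto range = composers.equal_range(canonical(composer));
    for (auto it = range.first; it != range.second; ++it) {
        auto p = primary.find(it->second);
        if (p != primary.end()) addresses.push_back(p->second);
    }
    return addresses;
}
```

Because the secondary index holds primary keys rather than byte offsets, moving a record in the data file only requires updating the primary index.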

Secondary Index: Basic Operations (2)
  • Record Deletion (2 cases)
    • Secondary index references the record directly
      • delete the entry from both the primary index and the secondary index
      • rearrange both indexes
    • Secondary index references the primary key
      • delete only the primary index entry
      • leave intact the secondary index reference to the deleted record
      • advantage: fast
      • disadvantage: references to deleted records take up space

Secondary Index: Basic Operations (3)
  • Record Updating
    • primary key index serves as a kind of protective buffer
    • Secondary index references the record directly
      • update all files containing the record’s location
    • Secondary index references the primary key (1)
      • the secondary index is affected only when either the primary or the secondary key is changed

Secondary Index: Basic Operations (4)
  • Secondary index references the primary key (2)
    • when the update changes the secondary key
      • rearrange the secondary key index
    • when the update changes the primary key
      • update all affected reference fields
      • may require reordering the secondary index
    • when the update is confined to other fields
      • does not affect the secondary key index

7.7 Retrieval Using Combinations of Secondary Keys

Retrieval of Records
  • Types
    • primary key access
    • secondary key access
    • combination of above
  • Combination of keys
    • easy when secondary key indexes are available
    • combine the lists of primary keys with Boolean operations (AND, OR)
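
Combining keys comes down to set operations on the lists of primary keys that each secondary index returns. The sketch below assumes each list is already sorted, as it would be when read from a sorted index; `matchAnd` and `matchOr` are hypothetical helper names built on the standard library's `std::set_intersection` and `std::set_union`.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

using KeyList = std::vector<std::string>;   // primary keys, sorted ascending

// AND: primary keys present in both lists.
KeyList matchAnd(const KeyList& a, const KeyList& b) {
    assert(std::is_sorted(a.begin(), a.end()) &&
           std::is_sorted(b.begin(), b.end()));
    KeyList out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}

// OR: primary keys present in either list, without duplicates.
KeyList matchOr(const KeyList& a, const KeyList& b) {
    assert(std::is_sorted(a.begin(), a.end()) &&
           std::is_sorted(b.begin(), b.end()));
    KeyList out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(out));
    return out;
}
```

For example, intersecting the composer list for Beethoven with the title list for Symphony No. 9 yields only the recordings that satisfy both conditions, and only those records then need to be read from the data file.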

7.8 Improving the Secondary Index Structure: Inverted Lists

Inverted Lists (1)
  • Inverted List
    • a secondary key leads to a set of one or more primary keys
  • Disadvantages of the secondary index structure
    • the index must be rearranged whenever a record is added
    • the secondary key is repeated in a separate entry for each duplicate
  • Solution A: by an array of references
  • Solution B: by linking the list of references

Array of References

Revised composer index

Secondary key           Set of primary key references
BEETHOVEN               ANG3795  DG139201  DG18807  RCA2626
COREA                   WAR23699
DVORAK                  COL31809
PROKOFIEV               LON2312
RIMSKY-KORSAKOV         MER75016
SPRINGSTEEN             COL38358
SWEET HONEY IN THE R    FF245

  • no need to rearrange
  • limited reference array
  • internal fragmentation

Inverted Lists (2)
  • Guidelines for better solution
    • no reorganization when adding
    • no limitation for duplicate key
    • no internal fragmentation
  • Solution B: by linking the list of references
    • a list of primary key references (e.g. PROKOFIEV --> ANG36193 --> LON2312)
    • each secondary index record holds the secondary key field and the relative record number of the first corresponding primary key reference

Linking List of References (1)

Improved revision of the composer index

Secondary Index file

Rec#   Secondary key           First reference
0      BEETHOVEN               3
1      COREA                   2
2      DVORAK                  7
3      PROKOFIEV               10
4      RIMSKY-KORSAKOV         6
5      SPRINGSTEEN             4
6      SWEET HONEY IN THE R    9

Label ID List file

Rec#   Primary key   Next reference
0      LON2312       -1
1      RCA2626       -1
2      WAR23699      -1
3      ANG3795       8
4      COL38358      -1
5      DG18807       1
6      MER75016      -1
7      COL31809      -1
8      DG139201      5
9      FF245         -1
10     ANG36193      0

Linking List of References (2)
  • The primary key references are kept in a separate, entry-sequenced file
  • Advantages
    • the secondary index is rearranged only when a secondary key is added or changed
    • rearrangement is quick
    • less penalty associated with keeping the secondary index file on secondary storage (less need for sorting)
    • the Label ID List file never needs to be sorted
    • reusing the space of deleted records is easy
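
Following and extending a linked list of references can be sketched as below. `ListEntry` and `InvertedIndex` are hypothetical names, the two files are modeled as in-memory containers, and new references are linked at the head of the chain, which is a simplification: it does not keep a chain ordered by primary key.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One record of the Label ID List file: a primary key and the relative
// record number (RRN) of the next reference for the same secondary key.
struct ListEntry {
    std::string primaryKey;
    int next;                          // -1 marks the end of the chain
};

struct InvertedIndex {
    std::map<std::string, int> heads;  // secondary key -> RRN of first reference
    std::vector<ListEntry> list;       // entry-sequenced list file

    // Follow the chain for one secondary key.
    std::vector<std::string> references(const std::string& secondaryKey) const {
        std::vector<std::string> keys;
        auto it = heads.find(secondaryKey);
        int rrn = (it == heads.end()) ? -1 : it->second;
        while (rrn != -1) {
            assert(rrn >= 0 && rrn < static_cast<int>(list.size()));
            keys.push_back(list[rrn].primaryKey);
            rrn = list[rrn].next;
        }
        return keys;
    }

    // Add a reference: append to the list file and link it at the head.
    // Existing list records are never moved.
    void add(const std::string& secondaryKey, const std::string& primaryKey) {
        auto it = heads.find(secondaryKey);
        int oldHead = (it == heads.end()) ? -1 : it->second;
        list.push_back(ListEntry{primaryKey, oldHead});
        heads[secondaryKey] = static_cast<int>(list.size()) - 1;
    }
};
```

Each hop in `references` is a jump to an arbitrary record number, which is harmless in memory but would be one seek per reference if the list file stayed on disk.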

Linking List of References (3)
  • Disadvantage
    • references for the same secondary key may not be physically grouped
      • lack of locality
      • could involve a large amount of seeking
      • solution: keep the Label ID List file in memory
        • the same Label ID List file can hold the lists of a number of secondary index files
        • if it is too large for memory, load only a part of it

7.9 Selective Indexes
  • Selective Index: Index on a subset of records
  • A selective index contains keys for only a portion of the records in the file
    • provide a selective view
    • useful when contents of a file fall into several categories
      • e.g. 20 < Age < 30 and $1000 < Salary
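
A selective index can be built by filtering records with a predicate during the indexing pass. In this sketch the `Employee` record, `buildSelectiveIndex`, and `youngAndWellPaid` are hypothetical names chosen to mirror the age and salary example; the data file is modeled as a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical record type used only for this sketch.
struct Employee {
    std::string id;
    int age;
    double salary;
};

// Build an index over only those records that satisfy the predicate.
// Maps key -> record number in the entry-sequenced data file.
std::map<std::string, int> buildSelectiveIndex(
        const std::vector<Employee>& file,
        const std::function<bool(const Employee&)>& selected) {
    assert(static_cast<bool>(selected));   // a predicate must be supplied
    std::map<std::string, int> index;
    for (std::size_t rrn = 0; rrn < file.size(); ++rrn) {
        if (selected(file[rrn])) index[file[rrn].id] = static_cast<int>(rrn);
    }
    return index;
}

// The selection 20 < Age < 30 and $1000 < Salary.
bool youngAndWellPaid(const Employee& e) {
    return 20 < e.age && e.age < 30 && 1000.0 < e.salary;
}
```

Records that fail the predicate stay in the data file but are invisible through this index, which is what gives each category its own view of the file.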

7.10 Binding

Index Binding (1)
  • When should a key in an index be bound to the physical address of its associated record?
  • File construction time binding (tight, in-the-data binding)
    • tight binding & faster access
    • the case of the primary key
    • when a secondary key is also bound at that time
      • simpler and faster retrieval
      • reorganization of the data file results in modifications of all bound index files

Index Binding (2)
  • Postpone binding until a record is actually retrieved (retrieval-time binding)
    • minimal reorganization & safe approach
    • mostly for secondary keys
  • Tight, in-the-data binding is good when
    • the data file is static, with little or no change
    • rapid performance during retrieval is required
    • e.g. a mass-produced, read-only optical disk

Let’s Review (1)

7.1 What is an Index?

7.2 A Simple Index for Entry-Sequenced Files

7.3 Using Template Classes in C++ for Object I/O

7.4 Object-Oriented Support for Indexed, Entry-Sequenced Files of Data Objects

7.5 Indexes That Are Too Large to Hold in Memory

Let’s Review (2)

7.6 Indexing to Provide Access by Multiple Keys

7.7 Retrieval Using Combinations of Secondary Keys

7.8 Improving the Secondary Index Structure: Inverted Lists

7.9 Selective Indexes

7.10 Binding
