Learning Objectives

Presentation Transcript


  1. Learning Objectives • Concept of key - primary and secondary keys. • Sequential versus direct access. • RRN • Use of templates for I/O operations • Abstract Data Models • Tags • Extensibility • Portability CPSC 231 Managing Files of Records (D.H.)

  2. Record Access - Record Keys • File structures concentrate on records. • Records are retrieved, written, modified, deleted, etc. • To perform an operation on a record we first need to identify it; for that we need a record key.

  3. Key • A record key is an expression derived from one or more of the fields within a record that can be used to identify that record. • The fields used to build the key are sometimes called the key fields. • Key-based access provides a way of performing content-based retrieval of records, rather than retrieval based merely on a record's position.

  4. Desired properties of primary keys • Canonical (conforming to specific rules) • e.g. the key has to consist of uppercase letters and no blanks • Unique (each record has a distinct key) • e.g. student I.D. number + StudentName • Unique canonical keys are called primary keys. • Primary keys should be dataless (carrying no information that might later change) • this makes uniqueness easier to ensure • the key should be unchanging
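
The canonical-form rule above (uppercase letters, no blanks) can be sketched as a small helper. This is only an illustration: the function name `MakeCanonicalKey` is an assumption and does not come from the slides.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: builds a canonical key by dropping whitespace and
// converting letters to uppercase, so that "Alan Smith" and " alan  smith "
// both map to the same key.
std::string MakeCanonicalKey(const std::string& raw) {
    std::string key;
    for (unsigned char ch : raw) {
        if (std::isspace(ch)) continue;  // canonical keys contain no blanks
        key.push_back(static_cast<char>(std::toupper(ch)));
    }
    return key;
}
```

Applying the same function both when a record is stored and when a search key is typed in is what makes the two comparable.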

  5. Secondary keys • Secondary keys do not have to be unique and can contain data. • e.g. the city field in a record holding a name and an address.

  6. Sequential Search • A sequential search for a desired record retrieves the file's records serially until the record that matches the desired key is found. • It works by reading each record from the file and comparing its key with the key of the record we are looking for.
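
A minimal sketch of sequential search, assuming an in-memory vector stands in for the file. The `Record` struct and the `reads` counter are hypothetical names added for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a file of records.
struct Record {
    std::string key;
    std::string data;
};

// Examines records one after another and returns the position of the first
// record whose key matches, or -1 if there is none.
// `reads` reports how many records had to be examined.
int SequentialSearch(const std::vector<Record>& file,
                     const std::string& wanted, int& reads) {
    reads = 0;
    for (std::size_t i = 0; i < file.size(); ++i) {
        ++reads;  // one "read" per record examined
        if (file[i].key == wanted) return static_cast<int>(i);
    }
    return -1;
}
```

An unsuccessful search examines every record, which is the worst case for this method.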

  7. Sequential search performance • If each read operation requires one disk access, then sequential search can be very inefficient. • e.g. Suppose that we have a file with one thousand records and we are looking for the record whose key is Alan Smith. An average search needs about 500 read operations. • In general: about n/2 read calls are needed on average if the file has n records.
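
The n/2 figure can be derived under one assumption that the slide does not state: the wanted record is present and equally likely to be at any of the n positions.

```latex
\[
E[\text{reads}] \;=\; \frac{1}{n}\sum_{i=1}^{n} i \;=\; \frac{n+1}{2} \;\approx\; \frac{n}{2},
\qquad n = 1000 \;\Rightarrow\; E[\text{reads}] = 500.5 .
\]
```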

  8. Blocking of records • To improve performance, records are blocked together on the disk and read together to avoid additional seeks. • Blocking can improve the sequential search time considerably because it reduces the number of seeks. • Note that the average search time is still proportional to the number of records in the file (O(n)).
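
The effect of blocking can be made concrete with a little arithmetic. The sketch below assumes every block holds the same number of records; the function name is hypothetical.

```cpp
// Number of block reads needed to reach the record at 0-based position `rrn`
// when each disk access brings in `recordsPerBlock` records.
long BlockReadsToReach(long rrn, long recordsPerBlock) {
    return rrn / recordsPerBlock + 1;
}
```

With 10 records per block, reaching position 499 takes 50 block reads instead of 500 single-record reads. The count still grows linearly with the position, which is why the search remains O(n).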

  9. When sequential search is good • Text files in which you are searching for some pattern. • Files with few records. • Files that are hardly ever searched (archive files). • Files that are searched on a secondary key when many matches are expected.

  10. Unix Tools for Sequential Processing • cat - prints a text file sequentially to standard output (the console by default). • e.g. % cat myfile • wc - reads a text file sequentially and counts the lines, words, and bytes in it. • e.g. % wc myfile • grep - searches sequentially through a file for a pattern. • e.g. % grep text myfile

  11. Direct Access • A radical alternative to sequential search is direct access. • Direct access is a file access mode that involves jumping to the exact location of a record in the file. • The search time required to perform a read via direct access is constant and it does not depend on the number of records in the file (O(1)).

  12. Direct Access C++ Example

      int IOBuffer::DRead(istream & stream, int recref)
      // read specified record
      // recref is record reference (or address, or offset)
      {
          stream.seekg(recref, ios::beg);
          if (stream.tellg() != recref) return -1;  // seek failed
          return Read(stream);
      }

  13. RRN • RRN = Relative Record Number • If a file is a sequence of records, then the RRN of a record gives its position relative to the beginning of the file. • e.g. RRN = 0 for the first record, RRN = 1 for the second record, etc.

  14. RRN usage • What was an RRN in our first assignment? • We can support direct access with RRNs if the file structure uses fixed size records. How?

  15. Record Structure and Length • In designing a fixed size record structure one may choose: • fixed length fields or • variable size fields. • The fixed length fields approach is simple but it tends to waste more disk space. • The variable size fields approach is more complicated but its usage of disk space is better.

  16. Record size • One may ensure that a record never spans multiple sectors by selecting a record size that is a power of two (no larger than the sector size, which is itself normally a power of two). This way an integral number of records fits in one sector.
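
The record size rule can be checked mechanically. The sketch below is illustrative only; the function names are hypothetical, and the 512-byte sector used in the note after it is an assumed value, not one given in the slides.

```cpp
// True if n is a power of two (1, 2, 4, 8, ...).
bool IsPowerOfTwo(unsigned long n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// True if records of recordSize bytes fill a sector exactly, so that no
// record straddles a sector boundary.
bool FitsSectorEvenly(unsigned long recordSize, unsigned long sectorSize) {
    return recordSize != 0 && recordSize <= sectorSize &&
           sectorSize % recordSize == 0;
}
```

For an assumed 512-byte sector, 64-byte records fit evenly (8 per sector), while 100-byte records do not.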

  17. Header records • A header record is a record placed at the beginning of a file that stores information about the file contents and the file organization. • e.g. Header record contents can be three two-byte values: • the size of the header • the number of records • the size of each record

  18. Header records - cont. • Additionally, the following information can be kept in header records: • the date and the time of last update • the date and the time of last access • protection information • Header records usually have a different structure than data records.

  19. C++ Templates use for file I/O • A template class supports direct read and write of records. • The template parameter RecType must support the following methods: • int Pack (BufferType &); // pack record • int Unpack (BufferType &); // unpack record

  20. Example of file I/O using templates

      template <class RecType>
      class RecordFile : public BufferFile
      {
      public:
          int Read(RecType & record, int recaddr);
          int Write(const RecType & record, int recaddr);
          RecordFile(IOBuffer & buffer) : BufferFile(buffer) {}
      };

  21. Template method: Read

      template <class RecType>
      int RecordFile<RecType>::Read(RecType & record, int recaddr)
      {
          int readAddr, result;
          readAddr = BufferFile::Read(recaddr);  // read the raw record into the buffer
          if (readAddr == -1) return -1;         // assumes -1 signals failure, as in DRead; address 0 is valid
          result = record.Unpack(buffer);        // unpack the buffer into the record object
          if (!result) return -1;
          return readAddr;
      }

  22. RecordFile template • The RecordFile template is a pattern that can be used for different classes of records. • When a template class is supplied with values for its parameters, it becomes a real class. • e.g. RecordFile<Person> PersonFile (Buffer);
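
The template idea can be tried out without the course's BufferFile and IOBuffer classes. The sketch below is a simplified stand-in, not the RecordFile class from the slides: `SimpleRecordFile`, its string-based Pack/Unpack signatures, the in-memory storage, and the `Person` struct are all assumptions.

```cpp
#include <cstddef>
#include <string>

// Hypothetical fixed size record file kept in memory. RecType must provide
//   bool Pack(std::string& buf) const;    // record -> bytes
//   bool Unpack(const std::string& buf);  // bytes  -> record
template <class RecType>
class SimpleRecordFile {
public:
    explicit SimpleRecordFile(std::size_t recordSize)
        : recordSize_(recordSize) {}

    // Stores `record` at the given RRN; returns its byte address or -1.
    long Write(const RecType& record, long rrn) {
        std::string buf;
        if (rrn < 0 || !record.Pack(buf) || buf.size() > recordSize_) return -1;
        buf.resize(recordSize_, '\0');  // pad to the fixed record size
        const std::size_t addr = static_cast<std::size_t>(rrn) * recordSize_;
        if (bytes_.size() < addr + recordSize_)
            bytes_.resize(addr + recordSize_, '\0');
        bytes_.replace(addr, recordSize_, buf);
        return static_cast<long>(addr);
    }

    // Loads the record at the given RRN; returns its byte address or -1.
    long Read(RecType& record, long rrn) const {
        if (rrn < 0) return -1;
        const std::size_t addr = static_cast<std::size_t>(rrn) * recordSize_;
        if (addr + recordSize_ > bytes_.size()) return -1;
        if (!record.Unpack(bytes_.substr(addr, recordSize_))) return -1;
        return static_cast<long>(addr);
    }

private:
    std::size_t recordSize_;
    std::string bytes_;  // in-memory stand-in for the disk file
};

// Hypothetical record type satisfying the Pack/Unpack requirement.
struct Person {
    std::string name;
    bool Pack(std::string& buf) const { buf = name; return true; }
    bool Unpack(const std::string& buf) {
        name = buf.substr(0, buf.find('\0'));  // drop the padding
        return true;
    }
};
```

Declaring `SimpleRecordFile<Person> file(16);` instantiates the template for Person records of 16 bytes each. Any other type with matching Pack and Unpack methods can be used the same way without changing the template.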

  23. File Organization vs File Access • File organization refers to: • record and field organization • e.g. variable or fixed size records, etc. • File access refers to: • sequential access or direct access • Both file organization and access need to be considered when designing an efficient file structure for a given application.

  24. Abstract Data Models • Abstract data models refer to an application-oriented view of data (as opposed to a medium-oriented view). • Abstract data models make it possible to deal with information that cannot easily be represented as a sequence of records. • e.g. images, sounds

  25. Metadata • Metadata is data in the file that is not the primary data but describes the primary data in the file. • A common place to store metadata is in a file header. • Typically, a community of users of a particular kind of data agrees on a standard format for holding metadata.

  26. Metadata Example • FITS (Flexible Image Transport System) is a standard for holding metadata in images, developed by the International Astronomical Union. • A FITS header record is 2880 bytes long and holds information about images generated by telescopes, such as: the date, time, and place the picture was taken, galactic longitude and latitude, telescope type, the number of pixels per row and the number of rows, etc.

  27. Tagged File Format • Tags are keywords used in connection with file structures to identify various data objects. • Tagged files are used to store objects with different data types. • Index tables and tags are used to hold information about data objects and to distinguish different types of objects. • See Fig. 5.9, p. 181 of the text, for a tagged file.

  28. Examples of tags • header / header record • text /text data • image /image data • exec /executable program • video /video data • sound /sound data

  29. Extensibility • The tag approach allows for easy extensibility of file systems. • Once the software is built to manipulate different types of objects, it is easy to add new object types to it. For each new object type one needs to define a tag, an index entry, and the methods for reading and writing it.
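
A tagged file with an index table might look roughly like the sketch below. It keeps everything in memory, and the class and field names are hypothetical; it is not the layout shown in the textbook figure.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One index table entry: which kind of object, and where its bytes are.
struct IndexEntry {
    std::string tag;     // e.g. "text", "image", "sound"
    std::size_t offset;  // where the object's data starts
    std::size_t length;  // how many bytes it occupies
};

// Hypothetical in-memory tagged file.
class TaggedFile {
public:
    // Appends an object and records its tag, offset, and length in the index.
    void Add(const std::string& tag, const std::string& data) {
        index_.push_back({tag, bytes_.size(), data.size()});
        bytes_ += data;
    }

    // Returns the data of every object carrying `tag`. Objects with other
    // tags are skipped without being interpreted.
    std::vector<std::string> Find(const std::string& tag) const {
        std::vector<std::string> found;
        for (const IndexEntry& e : index_) {
            if (e.tag == tag) found.push_back(bytes_.substr(e.offset, e.length));
        }
        return found;
    }

private:
    std::vector<IndexEntry> index_;
    std::string bytes_;
};
```

Because a reader locates objects through the index and ignores tags it does not know, a new object type can be added later without breaking existing readers.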

  30. Portability • Portability of a file system refers to its ability to be used on different hardware platforms, under various operating systems, and by different applications.

  31. Portability Issues • Differences among operating systems • e.g. the EOF character in MS-DOS is CTRL/Z but in Unix it is CTRL/D • Differences among languages • e.g. Pascal supports only fixed size records for non-text files, but C++ supports both fixed size and variable size records • Differences in hardware architectures • e.g. a PC stores the low order byte followed by the high order byte, but a Sun does it the other way around.
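
The byte order difference can be observed and neutralized in a few lines. The sketch below is illustrative; the function names are hypothetical, and it covers only 16-bit values.

```cpp
#include <cstdint>
#include <cstring>

// True if this machine stores the low order byte first (little-endian).
bool HostIsLittleEndian() {
    const std::uint16_t probe = 0x0102;
    unsigned char bytes[2];
    std::memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 0x02;
}

// Exchanges the two bytes of a 16-bit value.
std::uint16_t Swap16(std::uint16_t v) {
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}

// Returns v arranged so that its in-memory bytes are low order byte first,
// whatever the host's native order is.
std::uint16_t ToLittleEndian(std::uint16_t v) {
    return HostIsLittleEndian() ? v : Swap16(v);
}
```

A file format that fixes one byte order and converts on every read and write can be exchanged between the two kinds of machine.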

  32. Achieving Portability • Standard formats for data storage and encoding are used to achieve portability. • A standard physical record format represents a data format that is independent of the hardware, the language, and the operating system. • e.g. FITS is a good example of a standard physical format.

  33. Standard Data Encoding • Standards for text and number encoding • ASCII or EBCDIC for text • IEEE Standard formats and XDR formats for binary representation of numbers • IEEE Standard formats specify formats for 32-bit, 64-bit, and 128-bit floating point numbers and for 8-bit, 16-bit, and 32-bit integers • XDR specifies an encoding for files and provides routines on each machine for converting data to that encoding when writing a file and back again when reading it.
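
XDR stores a 32-bit integer as four bytes, most significant byte first. The sketch below imitates that encoding by hand to show the idea; it does not use the actual XDR library routines, and the function names are hypothetical.

```cpp
#include <cstdint>
#include <string>

// Encodes a 32-bit unsigned integer as 4 bytes, most significant byte first.
std::string EncodeU32BigEndian(std::uint32_t v) {
    std::string out(4, '\0');
    out[0] = static_cast<char>((v >> 24) & 0xFF);
    out[1] = static_cast<char>((v >> 16) & 0xFF);
    out[2] = static_cast<char>((v >> 8) & 0xFF);
    out[3] = static_cast<char>(v & 0xFF);
    return out;
}

// Decodes the first 4 bytes of `in`; the caller must supply at least 4 bytes.
std::uint32_t DecodeU32BigEndian(const std::string& in) {
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(in[0])) << 24) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(in[1])) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(in[2])) << 8) |
            static_cast<std::uint32_t>(static_cast<unsigned char>(in[3]));
}
```

Since the bytes are produced with shifts rather than by copying memory, the encoded form is the same on every machine, regardless of its native byte order.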
