
CIS 402: File Management Techniques Chapter 5



Presentation Transcript


  1. CIS 402: File Management Techniques Chapter 5 Managing Files of Records

  2. Chapter Objectives • Extend the file structure concepts of Chapter 4: • Search keys and canonical forms • Sequential search and direct access • File access and file organization • Examine other kinds of file structures in terms of • Abstract data models • Metadata • Object-oriented file access • Extensibility • Examine issues of portability and standardization.

  3. Record Access • Record Key • Canonical form: a standard form of a key • e.g. Ames, ames, and AMES must all be converted to one form • Distinct keys: uniquely identify a single record • Primary keys, Secondary keys, Candidate keys • Primary keys should be dataless (not updatable) • Primary keys should be unchanging • Social Security number: good primary key • but, 999-99-9999 for all non-registered aliens • Measurement of work: • Comparisons: occur in main memory • Disk accesses: main bottleneck
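
The canonical-form idea can be sketched in a few lines of C++. This is only an illustration: the helper names `CanonicalKey` and `SameKey` are hypothetical, and choosing upper case as the canonical form is an assumption (the slide only says that a conversion is needed).

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: map every spelling of a key to one canonical form.
// Upper case is an assumed convention; any single consistent form would do.
std::string CanonicalKey(const std::string& key) {
    std::string canonical;
    canonical.reserve(key.size());
    for (char c : key) {
        // Cast through unsigned char: toupper is undefined for negative values.
        canonical.push_back(static_cast<char>(
            std::toupper(static_cast<unsigned char>(c))));
    }
    return canonical;
}

// Two keys refer to the same record if their canonical forms are equal.
bool SameKey(const std::string& a, const std::string& b) {
    return CanonicalKey(a) == CanonicalKey(b);
}
```

With this sketch, a search for "ames" and a stored key "AMES" compare equal because both are converted before the comparison.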

  4. Sequential Search • Sequential search is the least efficient search method; the main pursuit for the rest of the term is to present improved search methods • O(n), n: the number of records • Use record blocking to reduce work • A block of several records • fields < records < blocks • Still O(n), but blocking decreases the number of seeks • search is sequential within each block • e.g. 4000 records, 512 bytes each, sector size 512 bytes • Unblocked (sector-sized buffers): 512 bytes (½K buffer) => average 2000 READ() calls • Blocked (16 recs / block): 8K buffer, 250 blocks => average 125 READ() calls • Can further improve upon performance by using block key containing last record key to avoid searching within blocks where data can’t be
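
The arithmetic behind the 2000 and 125 figures can be checked with a small sketch. The function name `AverageReads` is hypothetical, and it assumes a successful search that on average reads half of the blocks (rounded down).

```cpp
#include <cassert>

// Hypothetical helper: average number of READ() calls for a successful
// sequential search, assuming half of the blocks are read on average.
long AverageReads(long recordCount, long recordsPerBlock) {
    assert(recordCount >= 0 && recordsPerBlock > 0);
    // Number of blocks needed to hold all records (ceiling division).
    long blocks = (recordCount + recordsPerBlock - 1) / recordsPerBlock;
    return blocks / 2;
}
```

For the slide's example, `AverageReads(4000, 1)` gives 2000 and `AverageReads(4000, 16)` gives 125, since 4000 / 16 = 250 blocks.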

  5. Sequential Search: Best Uses • UNIX sequential processing commands • cat, wc, grep • When is Sequential Search Superior? • Repetitive hits • Searching for patterns in ASCII files • Searching records with a certain secondary key value • Small Search Set • Processing files with few records • Devices/media most hospitable to sequential access • tape • binary file on disk

  6. Direct Access • Access a record without searching • O(1) operation • RRN (Relative Record Number) • Gives relative position of the record • O(n) process with variable-length records • Easy with fixed-length records: RRN * sizeof(record) • View file as collection of records, not bytes; all byte info is internal • Byte offset = n × r • r: record size • n: RRN value • Class IOBuffer includes • direct read (DRead) • direct write (DWrite) • take byte offset as argument, along with stream • use polymorphism to pick correct Read/Write functions.

  7. Choosing Record Length and Structure • Record length is related to the size of the fields • Access vs. fragmentation vs. implementation • Fixed-length record • fixed-length fields • e.g. OHIO 10847115 7264.9 4133035 3 1180317COLUMBUS • variable-length fields • e.g. delimited: OHIO|10847115|7|264.9|41330|35|3|1|1803|17|COLUMBUS\0....\0 • Unused space portion is filled with null characters in C

  8. Header Records • File as a Self-Describing Object • General information about file • date and time of recent update, • number of records • size of record, fields (fixed-length record & field) • delimiter (variable-length field) • Often placed at the beginning of the file • Pascal did not naturally support header records (File is a repeated collection of the same type) • Use variant records (depending on context) • In C: union • polymorphic structure

  9. IO Buffer Class Definition • Abstract base class for file buffers
       class IOBuffer
       {
       public:
           virtual int Read(istream &) = 0;         // read a buffer from the stream
           virtual int Write(ostream &) const = 0;  // write a buffer to the stream
           // these are the direct access read and write operations
           virtual int DRead(istream &, int recref);         // read specified record
           virtual int DWrite(ostream &, int recref) const;  // write specified record
           // these header operations return the size of the header
           virtual int ReadHeader(istream &);
           virtual int WriteHeader(ostream &) const;
       protected:
           int Initialized;  // TRUE if buffer is initialized
           char *Buffer;     // character array to hold field values
       };

  10. IO Buffer Class Definition: full definition of buffer class hierarchy • WriteHeader method: • writes the header string at the beginning of the file. Possible strings: • “Variable” • “Fixed” • Returns size of header written • ReadHeader method: • reads the header id string. Must be the expected record type, variable or fixed length • If the string matches that subclass’ header string, returns size of header • any other string causes return of -1 (header doesn’t match buffer) • DWrite/DRead methods: • operate using the byte address of the record as the record reference. Methods begin by seeking to the requested spot.

  11. Encapsulating Record I/O Operations in a Single Class • Good design for making objects persistent • provide operation to read and write objects directly • Write operation until now : • two operations : • pack into a buffer • write the buffer to a file • Class ‘RecordFile’ • supports a write operation that takes an object and writes it to a file. • use of buffers is encapsulated within the class • must be generalized, as it is built with a generic type

  12. Encapsulating Record I/O Operations in a Single Class • Class ‘RecordFile’ • uses C++ template features to become generic • definition of the template class RecordFile:
       template <class RecType>
       class RecordFile : public BufferFile
       {
       public:
           int Read(RecType& record, int recaddr = -1);
           int Write(const RecType& record, int recaddr = -1);
           RecordFile(IOBuffer& buffer) : BufferFile(buffer) { }
       };

  13. Template method bodies
       template <class RecType>
       int RecordFile<RecType>::Read(RecType &record, int recaddr)
       {
           int readAddr, result;
           readAddr = BufferFile::Read(recaddr);  // fill the buffer from the file
           if (!readAddr) return -1;
           result = record.Unpack(Buffer);        // unpack the buffer into the object
           if (!result) return -1;
           return readAddr;
       }

       template <class RecType>
       int RecordFile<RecType>::Write(const RecType &record, int recaddr)
       {
           int result;
           result = record.Pack(Buffer);          // pack the object into the buffer
           if (!result) return -1;
           return BufferFile::Write(recaddr);     // write the buffer to the file
       }

  14. File Access and File Organization • There is a difference between file access and file organization • File organization: variable-length records • Sequential access is suitable • File organization: fixed-length records • Direct access and sequential access are both possible • Note: Book references to Pascal are completely obsolete. It is unusual in present-day programming languages to be unable to move freely within a file

  15. Abstract Data Model • Data objects such as documents, images, sound • Abstract Data Model does not view data as it appears on a particular medium. • application-oriented view • application shielded from details of storage on medium • How to specify a file’s content? • Headers and self-describing files • e.g. images: jpg: ÿØÿà JFIF gif: GIF89a • e.g. sounds: mp3: ÿûD EQ¹à wav: RIFF$P WAVEfmt

  16. Metadata • Data that describe the primary data in a file • e.g. <meta> in HTML • Stored in the header record • Standard format • As shown on previous slide

  17. Mixing Object Types in a File • Each field is identified using “keyword = value” • Index table with tags
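
To make the “keyword = value” idea concrete, here is a small parsing sketch. The function name `ParseTaggedRecord`, the `|` field delimiter, and the sample keywords are all assumptions for illustration; the slide does not fix a concrete syntax.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: split a record of "keyword=value" fields into a map.
// The '|' delimiter is an assumed convention, not taken from the slides.
std::map<std::string, std::string> ParseTaggedRecord(const std::string& record,
                                                     char delimiter = '|') {
    std::map<std::string, std::string> fields;
    std::istringstream in(record);
    std::string field;
    while (std::getline(in, field, delimiter)) {
        std::string::size_type eq = field.find('=');
        if (eq == std::string::npos) continue;  // skip fields without a keyword
        fields[field.substr(0, eq)] = field.substr(eq + 1);
    }
    return fields;
}
```

Because each field carries its own keyword, fields can appear in any order and a reader can ignore keywords it does not know.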

  18. Object-oriented file access • Separate translation to and from the physical format from the application (representation-independent file access) • provide a function to handle access (OO style) • encapsulate details • read_image() is independent of the image file type; the method determines the file type • Pseudocode: program find_star ... read_image(“star1”, image); process image ... end find_star • [Figure: image in RAM; star1 and star2 on disk]
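
One way a function like read_image() could determine the file type is by inspecting the leading signature bytes of a self-describing file. The sketch below is an assumption about how that might be done, not the course's implementation; `ImageType` and `DetectImageType` are hypothetical names, and only the JPEG (bytes FF D8 FF) and GIF ("GIF87a" / "GIF89a") signatures are checked.

```cpp
#include <string>

enum class ImageType { Jpeg, Gif, Unknown };

// Hypothetical helper: classify an image by the first bytes of the file.
// head holds the leading bytes as read from disk (binary mode).
ImageType DetectImageType(const std::string& head) {
    if (head.size() >= 3 &&
        static_cast<unsigned char>(head[0]) == 0xFF &&
        static_cast<unsigned char>(head[1]) == 0xD8 &&
        static_cast<unsigned char>(head[2]) == 0xFF) {
        return ImageType::Jpeg;
    }
    if (head.compare(0, 6, "GIF89a") == 0 || head.compare(0, 6, "GIF87a") == 0) {
        return ImageType::Gif;
    }
    return ImageType::Unknown;
}
```

The application calls one function and never sees the signature check, which is the encapsulation the slide describes.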

  19. Extensibility • Advantage of using tags • Identify objects within files • do not require a priori knowledge of the types of objects • New type of object • implement methods for reading and writing in the appropriate module (separate concerns) • call the method.
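
A tag-to-handler table is one possible way to get this kind of extensibility. The class `ObjectReaderRegistry` below is hypothetical, and representing both the payload and the result as strings is a simplification chosen to keep the sketch short.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry: maps a tag to the function that reads that object
// type. Supporting a new type means registering one more reader; the
// dispatch code below does not change.
class ObjectReaderRegistry {
public:
    using Reader = std::function<std::string(const std::string& payload)>;

    void Register(const std::string& tag, Reader reader) {
        readers_[tag] = std::move(reader);
    }

    // Returns false if no reader is registered for the tag.
    bool Read(const std::string& tag, const std::string& payload,
              std::string& result) const {
        auto it = readers_.find(tag);
        if (it == readers_.end()) return false;
        result = it->second(payload);
        return true;
    }

private:
    std::map<std::string, Reader> readers_;
};
```

An unknown tag is reported rather than crashing the reader, so older programs can skip object types added after they were written.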

  20. Factors Affecting Portability • Differences among operating systems • e.g. CR/LF line endings in DOS • Differences among languages • physical layout of files may be constrained by language limitations • Differences in machine architectures • byte order: e.g. Unix: htonl/htons, ntohl/ntohs • Differences among platforms • e.g. EBCDIC vs. ASCII
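
The byte-order problem can be avoided by always writing integers in one fixed order, whatever the host machine uses. The sketch below uses shifts instead of the Unix htonl/ntohl calls so that it needs no platform headers; `EncodeBigEndian` and `DecodeBigEndian` are hypothetical names.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: store a 32-bit value most significant byte first
// (big-endian, also called network byte order), independent of the host.
std::array<unsigned char, 4> EncodeBigEndian(std::uint32_t value) {
    return {{static_cast<unsigned char>((value >> 24) & 0xFF),
             static_cast<unsigned char>((value >> 16) & 0xFF),
             static_cast<unsigned char>((value >> 8) & 0xFF),
             static_cast<unsigned char>(value & 0xFF)}};
}

// Inverse of EncodeBigEndian: rebuild the value from the four stored bytes.
std::uint32_t DecodeBigEndian(const std::array<unsigned char, 4>& bytes) {
    return (static_cast<std::uint32_t>(bytes[0]) << 24) |
           (static_cast<std::uint32_t>(bytes[1]) << 16) |
           (static_cast<std::uint32_t>(bytes[2]) << 8) |
           static_cast<std::uint32_t>(bytes[3]);
}
```

A file written this way reads back to the same values on both little-endian and big-endian machines.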

  21. Achieving Portability • Standardization • Standard physical record format • extensible, simple • Standard binary encoding for data elements • IEEE, XDR • File structure conversion • Number and text conversion • Established, well-known methods of conversion

  22. Achieving Portability (continued) • File system differences • Block size is 512 bytes on UNIX systems • Block size is 2880 bytes on many non-UNIX systems • UNIX and Portability • UNIX supports portability by being commonly available on a large number of platforms • UNIX provides a utility called dd • dd: facilitates data conversion
