
File structure & Data processing.


jcarpenter

Presentation Transcript


  1. File structure & Data processing. Syllabus
     UNIT I. Introduction: File structure design. File processing operations: open, close, read, write, seek. Unix directory structure. Secondary storage devices: disks, tapes, CD-ROM. Buffer management. I/O in Unix.
     UNIT II. File Structure Concepts: Field & record organization. Using classes to manipulate buffers. Record access. Record structures. File access & file organization. Abstract data models for file access. Metadata. Extensibility, portability & standardization.
     UNIT III. Data compression. Reclaiming space in files. Introduction to internal sorting and binary searching. Keysorting. Indexing concepts. Object I/O. Multiple-key indexing. Inverted lists. Selective indexes. Binding.
     Collected By CJS

  4. UNIT IV. Cosequential processing: Object-oriented model, its application. Internal sorting: a second look. File merging: sorting of large files on disks. Sorting files on tapes. Sort-merge packages. Sorting and cosequential processing in Unix.
     UNIT V. Multilevel indexing: Indexing using binary search trees. OOP-based B-trees. B-tree methods: search, insert and others. Deletion, merging & redistribution. B* trees. Virtual B-trees. Variable-length records & keys. Indexed sequential file access and prefix B+ trees.
     UNIT VI. Hashing: Introduction, a simple hashing algorithm. Hashing functions and record distributions. Collision resolution. Buckets. Making deletions. Pattern of record access. External hashing. Implementation. Deletion. Performance. Alternative approaches.

  5. Textbook: Michael J. Folk, Bill Zoellick, Greg Riccardi: File Structures: An Object-Oriented Approach with C++ (Addison-Wesley) (LPE)
     References:
     1. M. Loomis: Data Management & File Processing (PHI)
     2. O. Hanson: Design of Computer Data Files (McGraw-Hill) (IE)

  6. File Structure Design
     • A file structure is a combination of representations for data in files and of operations for accessing the data. It allows applications to read, write and modify data.
     • Good file structure design gives us access to the full capacity of the disk without spending a lot of time waiting for the disk.

  7. Goals of research and development in file structures
     • To get the information we need with one access to the disk.
     • If it is impossible to get the information in one access, we want structures that find the target information with as few accesses as possible. Ex. a binary search finds a record among 55,000 sorted records with at most 16 comparisons.
     • File structures that group information, so that we get everything we need in one trip to the disk. Ex. a client's name, address, phone number and account balance.
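
The binary search figure on this slide can be checked with a small sketch. The function name `binarySearch` and the `probes` counter are illustrative choices, not part of the slides; the records are assumed to be plain integer keys held sorted in memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Binary search over sorted keys that also reports how many probes
// (key comparisons steps) it made.
// Returns the index of `target`, or -1 if the key is absent.
long binarySearch(const std::vector<int>& keys, int target, int& probes) {
    assert(std::is_sorted(keys.begin(), keys.end()));  // precondition
    probes = 0;
    std::size_t low = 0, high = keys.size();  // search the half-open range [low, high)
    while (low < high) {
        std::size_t mid = low + (high - low) / 2;
        ++probes;                              // one probe per loop pass
        if (keys[mid] == target) return static_cast<long>(mid);
        if (keys[mid] < target) low = mid + 1; // discard the lower half
        else high = mid;                       // discard the upper half
    }
    return -1;
}
```

Each probe at least halves the remaining range, so 55,000 keys are exhausted after 16 halvings. Searching for a key smaller than every stored key takes exactly that worst case.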

  8. Making file structures usable in applications using C++
     class Person {
     public:
         // data members
         char lastname[11], firstname[11];
         char address[16], city[16], state[10];
         // method
         Person();
     };
     // The constructor sets every field to the empty string.
     Person::Person() {
         lastname[0] = 0; firstname[0] = 0; address[0] = 0;
         city[0] = 0; state[0] = 0;
     }

  9. Physical file & logical file
     Physical file:
     1 A file that actually exists on secondary storage.
     2 It is the file as known by the computer's operating system.
     Logical file:
     1 The file as seen by the program.
     2 The use of logical files allows a program to describe the operations to be performed on a file without knowing what physical file will be used.

  10. Ex: association between a logical file called inp-file and a physical file called myfile.doc.
      • Select inp-file assign to myfile.doc
      This statement asks the operating system to find the physical file myfile.doc and then to make the hookup by assigning a logical file (the phone line) to it.

  11. Opening Files
      • Once we have a logical file identifier hooked up to a physical file or device, we declare what we want to do with the file:
      1. Open an existing file.
      2. Create a new file.

  12. Opening Files
      • Open an existing file
      • Create a new file
      Ex. fd = open(filename, flags [, pmode]);
      fd: int, the file descriptor. If there is an error, this value is negative.
      filename: a character string naming the physical file.

  13. Flags: the flags argument is built by bitwise OR (|) of the following values.
      • O_APPEND: append every write operation to the end of the file.
      • O_CREAT: create and open a file for writing. This has no effect if the file already exists.
      • O_RDONLY: open a file for reading only.
      • O_RDWR: open a file for reading and writing.
      • O_WRONLY: open a file for writing only.

  14. pmode: int. The protection mode for the file; required if O_CREAT is specified. In Unix, pmode is three octal digits (written with a leading 0 in C), one digit each for owner, group and world.
      Ex. pmode = 0751:
               r w e   r w e   r w e
               1 1 1   1 0 1   0 0 1
               owner   group   world
      r = read, w = write, e = execute
      Ex. fd = open(filename, O_RDWR | O_CREAT, 0751);
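
To make the bit layout of a protection mode concrete, here is a minimal sketch that decodes the nine permission bits into text. The function name `permissionString` is hypothetical, and it prints `x` for execute (as the `ls -l` command does) where the slide writes `e`.

```cpp
#include <cassert>
#include <string>

// Render the nine permission bits of a Unix protection mode as text:
// three characters each for owner, group and world, '-' for a cleared bit.
std::string permissionString(unsigned mode) {
    assert(mode <= 0777);             // only the nine permission bits are handled
    const char symbols[] = "rwxrwxrwx";
    std::string text(9, '-');
    for (int i = 0; i < 9; ++i) {
        unsigned bit = 1u << (8 - i); // bit 8 is owner-read, bit 0 is world-execute
        if (mode & bit) text[i] = symbols[i];
    }
    return text;
}
```

With the slide's example, `permissionString(0751)` returns `"rwxr-x--x"`: the owner may read, write and execute, the group may read and execute, and everyone else may only execute.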

  15. Closing files
      • In terms of the telephone line analogy, closing a file is like hanging up the phone.
      • When we hang up the phone, the phone line is available for taking or placing another call.
      • When you close a file, the logical file name or file descriptor is available for use with another file.

  16. Closing files
      • Files are closed automatically by the operating system when a program terminates normally.
      • Execution of a close statement within a program is needed only to protect it against data loss.
      • Closing a file ensures that the buffer for that file has been flushed of data.
      • Everything written has then been sent to the file.

  17. Reading and writing
      • Input/output operations.
      • The read and write statements used in different languages vary.
      • Ex. Read(Source_file, Destination_addr, Size)
      Source_file: where the data is to be read from. The Read call must know where it is to read from. We must already have opened the file, so that the connection between a logical file and a physical file (or device) exists.

  18. Reading and writing
      Destination_addr: where to place the information, given as the first address of the memory block where we want to store the data.
      Size: how much information to bring in from the file. Here the argument is supplied as a byte count.

  19. Write Functions
      • Write(Destination_file, Source_addr, Size)
      Destination_file: the logical file name used for sending the data.
      Source_addr: where Write finds the information it will send.
      Size: the number of bytes to be written.
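
The generic Read and Write calls map closely onto the Unix system calls of the same names. The following is a sketch only, assuming a POSIX system; the helper name `writeThenRead`, the mode 0644 and the extra `O_TRUNC` flag (which empties an existing file) are choices made for this example, not taken from the slides.

```cpp
#include <cassert>
#include <fcntl.h>    // open and the O_* flags (POSIX)
#include <string>
#include <unistd.h>   // read, write, lseek, close (POSIX)

// Write `text` to the file at `path`, then read it back through the same
// file descriptor. Returns the bytes read, or "" if any call fails.
std::string writeThenRead(const char* path, const std::string& text) {
    assert(path != nullptr);
    // Flags are combined with bitwise OR; pmode is needed because of O_CREAT.
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return "";

    // write(destination file, source address, size in bytes)
    ssize_t written = write(fd, text.data(), text.size());
    if (written != static_cast<ssize_t>(text.size())) { close(fd); return ""; }

    // Move the read/write pointer back to the start before reading.
    if (lseek(fd, 0, SEEK_SET) < 0) { close(fd); return ""; }

    // read(source file, destination address, size in bytes)
    std::string buffer(text.size(), '\0');
    ssize_t got = read(fd, &buffer[0], buffer.size());
    close(fd);   // the descriptor becomes available for another file
    if (got < 0) return "";
    buffer.resize(static_cast<std::size_t>(got));
    return buffer;
}
```

Both calls take the three arguments described above: the open file, the address of a memory block, and a byte count.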

  20. Files with C & C++
      • Two different ways: 1. stdio.h  2. iostream.h & fstream.h (named iostream and fstream in standard C++)
      • file = fopen(filename, type);
      file: FILE *, a pointer to the file descriptor.
      filename: char *, the file name.
      type: char *, controls the operation:
      "r": open an existing file for input.
      "w": create a new file.
      "a": create a new file, or append to an existing one.

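  21. Program to display the contents of a file
      • Program steps
      1 Display a prompt for the name of the input file.
      2 Read the user's response from the keyboard.
      3 Open the file for input.
      4 While there are still characters to be read from the input file, read a character and write it to the screen.
      5 Close the input file.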

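  22. #include <iostream>
      #include <fstream>
      #include <string>
      using namespace std;
      int main() {
          char ch;
          fstream file;
          string filename;                       // holds a name of any length
          cout << "Enter file name: ";           // prompt for the input file
          cin >> filename;                       // read the user's response
          file.open(filename.c_str(), ios::in);  // open the file for input
          file.unsetf(ios::skipws);              // keep white space in the output
          while (1) {
              file >> ch;                        // read one character
              if (file.fail()) break;            // end of file or read error
              cout << ch;                        // write it to the screen
          }
          file.close();
          return 0;
      }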

  23. Detecting end of file
      • if (file.fail()) break;
      The function fail returns true (1) if the previous operation on the stream failed, which is what happens when a read reaches the end of the file.

  24. Seeking
      • So far we read through the file sequentially, one byte after another, until we reach the end of the file. Every time a byte is read, the operating system moves the read/write pointer ahead.
      • Sometimes the data we need is, say, ten thousand bytes away, and we want to jump there directly.
      • The action of moving directly to a certain position in a file is often called seeking.

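  25. Seeking
      • Seek(Source_file, Offset)
      Source_file: the logical file name in which the seek will occur.
      Offset: the number of positions in the file the pointer is to be moved.
      Seeking with C streams: status = fseek(file, byte_offset, origin)

  26. • status: an int; zero if the seek succeeded, nonzero otherwise. Call ftell(file) afterwards to get the new position as a long integer.
      • file: the FILE * for the stream in which the seek will occur.
      • byte_offset: the number of bytes to move from the origin.
      • origin: 0 = from the beginning of the file, 1 = from the current position, 2 = from the end of the file. Portable code uses the names SEEK_SET, SEEK_CUR and SEEK_END from stdio.h.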

  27. Seeking with C++ stream classes
      • file.seekg(byte_offset, origin)
      origin is one of ios::beg, ios::cur or ios::end.
      Ex: file.seekg(373, ios::beg)
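
As a small illustration of seeking with the C++ stream classes, the sketch below jumps straight to a byte offset and reads one byte there. The helper name `byteAt` and its -1 error convention are assumptions made for this example.

```cpp
#include <cassert>
#include <fstream>
#include <ios>

// Return the byte stored `byteOffset` bytes from the start of the file,
// or -1 if the file cannot be opened or the offset lies past its end.
int byteAt(const char* filename, long byteOffset) {
    assert(byteOffset >= 0);
    std::ifstream file(filename, std::ios::in | std::ios::binary);
    if (!file) return -1;
    file.seekg(byteOffset, std::ios::beg);  // jump directly; nothing in between is read
    char ch;
    if (!file.get(ch)) return -1;           // the read fails at end of file
    return static_cast<unsigned char>(ch);
}
```

A call such as `byteAt("myfile.doc", 373)` corresponds to the slide's `file.seekg(373, ios::beg)` followed by a one-byte read.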

  28. Unix file system commands
      • cat filenames: print the contents of the named text files.
      • tail filename: print the last ten lines of the text file.
      • cp file1 file2: copy file1 to file2.
      • mv file1 file2: move (rename) file1 to file2.
      • chmod mode filename: change the protection mode on the named files.

  29. • ls: list the contents of the directory.
      • mkdir name: create a directory with the given name.
      • rmdir name: remove the named directory.

  30. Disks
      • Disk drives belong to a class of devices known as direct access storage devices (DASDs), because they make it possible to access data directly.
      • Magnetic tapes permit only serial access.
      • Hard disks are the most common disks used in everyday file processing.
      • Floppy disks are inexpensive, but they are slow and hold very little data.
      • Floppy disks are good for backing up single files.

  31. Organisation of disks
      • The information stored on a disk is stored on the surface of one or more platters.
      • The information is stored in successive tracks on the surface of the disk.
      • Each track is divided into a number of sectors.
      • The operating system finds the correct surface, track and sector, reads the entire sector into a buffer, and then finds the requested byte within that buffer.

  32. Organisation of disks
      • Disk drives typically have a number of platters.
      • Tracks that are directly above and below one another form a cylinder.

  34. Estimating capacities and space needs
      • Track capacity = number of sectors per track × bytes per sector
      • Cylinder capacity = number of tracks per cylinder × track capacity
      • Drive capacity = number of cylinders × cylinder capacity
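
The three capacity formulas chain together, as the following sketch shows. The `DiskGeometry` struct and the geometry in the usage note are hypothetical values chosen only to make the arithmetic easy to follow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a drive; all values come from the caller.
struct DiskGeometry {
    std::uint64_t bytesPerSector;
    std::uint64_t sectorsPerTrack;
    std::uint64_t tracksPerCylinder;
    std::uint64_t cylinders;
};

// Track capacity = number of sectors per track x bytes per sector
std::uint64_t trackCapacity(const DiskGeometry& d) {
    assert(d.bytesPerSector > 0 && d.sectorsPerTrack > 0);  // a track must hold data
    return d.sectorsPerTrack * d.bytesPerSector;
}

// Cylinder capacity = number of tracks per cylinder x track capacity
std::uint64_t cylinderCapacity(const DiskGeometry& d) {
    return d.tracksPerCylinder * trackCapacity(d);
}

// Drive capacity = number of cylinders x cylinder capacity
std::uint64_t driveCapacity(const DiskGeometry& d) {
    return d.cylinders * cylinderCapacity(d);
}
```

For an assumed drive with 512-byte sectors, 63 sectors per track, 16 tracks per cylinder and 1000 cylinders, this gives 32,256 bytes per track, 516,096 bytes per cylinder and 516,096,000 bytes for the drive.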

  35. Organizing track by sector
      • There are two basic ways to organize data on a disk:
      • 1. by sector
      • 2. by user-defined block

  36. Organizing track by sector
      • The physical placement of sectors.
      • The most practical logical organization of sectors on a track is that sectors are adjacent, fixed-sized segments of a track that happen to hold a file.
      • Physically, however, this arrangement is not optimal: after reading a sector, the disk controller takes some time to process the received information before it is ready to accept more.
      • Consequently, with physically adjacent sectors we would be able to read only one sector per revolution.
      • Traditional solution: interleave the sectors.

  37. Interleave the sectors
      • Disk designers leave an interval of several physical sectors between logically adjacent sectors.
      • Suppose our disk has an interleaving factor of 5 and 32 sectors per track.
      • It takes five revolutions to read all thirty-two sectors of a track.
      • That is a big improvement over 32 revolutions.
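
The five-revolution figure can be reproduced with a simple model. This is a sketch under stated assumptions, not a description of any real controller: logical sector i is placed in physical slot (i × factor) mod sectors-per-track, the head starts over logical sector 0, and the controller is ready again by the time the next logical sector arrives. Both function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// layout[p] is the logical sector stored in physical slot p of the track.
// Logically adjacent sectors end up `factor` physical slots apart.
std::vector<int> interleavedLayout(int sectorsPerTrack, int factor) {
    assert(sectorsPerTrack > 0 && factor > 0);
    std::vector<int> layout(sectorsPerTrack, -1);  // -1 marks an empty slot
    for (int logical = 0; logical < sectorsPerTrack; ++logical) {
        int slot = (logical * factor) % sectorsPerTrack;
        // Holds when the factor and the sector count share no common divisor.
        assert(layout[slot] == -1);
        layout[slot] = logical;
    }
    return layout;
}

// Revolutions needed to read every sector in logical order: the head passes
// `factor` slots for each sector after the first, plus one slot for the first.
int revolutionsToReadTrack(int sectorsPerTrack, int factor) {
    assert(sectorsPerTrack > 0 && factor > 0);
    int slotsPassed = (sectorsPerTrack - 1) * factor + 1;
    return (slotsPassed + sectorsPerTrack - 1) / sectorsPerTrack;  // round up
}
```

With 32 sectors and a factor of 5 the head passes 156 slots, which is just under five full revolutions of 32 slots each, so `revolutionsToReadTrack(32, 5)` returns 5.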

  38. Organizing track by sector
      • Clusters
      The file can also be viewed as a series of clusters of sectors, where a cluster is a fixed number of contiguous sectors.
      Once a cluster has been found on a disk, all sectors in that cluster can be accessed without requiring an additional seek.
      The File Allocation Table (FAT) ties logical sectors to the physical clusters they belong to.

  39. Extents
      • If there is a lot of free room on the disk, it may be possible to make a file consist entirely of contiguous clusters. We then say the file consists of one extent.
      • If there is not enough contiguous space available to contain the entire file, the file is divided into parts, and each part is an extent.
      • The important thing about extents is that as the number of extents in a file increases, the file becomes more spread out on the disk and the amount of seeking increases.

  40. Fragmentation
      • Generally, all sectors have the same number of bytes.
      • Ex. the sector size is 512 bytes and every record in the file is 300 bytes long.
      • There are two ways to store the records: 1) store only one record per sector; 2) allow records to span sectors, so that a record can begin in one sector and end in another.

  41. Fragmentation
      • The first option has the advantage that any record can be retrieved by retrieving just one sector.
      • But it has the disadvantage that it might leave an enormous amount of unused space within each sector.

  42. Advantages & disadvantages
      First option:
      Advantage: any record can be retrieved by reading just one sector.
      Disadvantage: it leaves an enormous amount of unused space within each sector. This loss of space within a sector is called internal fragmentation.
      Second option:
      Advantage: no loss of space.
      Disadvantage: some records can be retrieved only by accessing more than one sector.
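
The trade-off between the two options can be put into numbers. The sketch below uses the slides' 512-byte sectors and 300-byte records; the struct and function names, and the file of 1000 records in the usage note, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

struct SpaceUse {
    std::uint64_t sectors;      // sectors occupied by the file
    std::uint64_t wastedBytes;  // bytes allocated but holding no record data
};

// Option 1: one record per sector; the rest of each sector stays unused.
SpaceUse oneRecordPerSector(std::uint64_t records, std::uint64_t recordSize,
                            std::uint64_t sectorSize) {
    assert(recordSize > 0 && recordSize <= sectorSize);
    return { records, records * (sectorSize - recordSize) };
}

// Option 2: records are packed end to end and may span two sectors.
SpaceUse recordsSpanSectors(std::uint64_t records, std::uint64_t recordSize,
                            std::uint64_t sectorSize) {
    assert(sectorSize > 0);
    std::uint64_t bytes = records * recordSize;
    std::uint64_t sectors = (bytes + sectorSize - 1) / sectorSize;  // round up
    return { sectors, sectors * sectorSize - bytes };
}
```

For 1000 records, the first option occupies 1000 sectors and loses 212 bytes in each one (212,000 bytes in total) to internal fragmentation. The second option needs only 586 sectors and leaves 32 unused bytes at the end of the last sector, at the price of records that straddle a sector boundary.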

  43. Use of clusters: when the number of bytes in a file is not an exact multiple of the cluster size, there will be internal fragmentation.
      • Large clusters for large files
      • Small clusters for small files

  44. Organisation of tracks by block
      • Sometimes disk tracks are divided into an integral number of user-defined blocks.
      • The amount of data transferred in a single I/O operation can vary depending on the needs of the software designer.
      • Blocks can normally be either fixed or variable in length, depending on the requirements of the file designer.
      • Block organisation does not present the fragmentation problems of sectors, because blocks can vary in size.

  45. Organisation of tracks by block
      • The blocking factor indicates the number of records that are stored in each block in a file.
      • Suppose a file has 300-byte records. With this method we define blocks whose size is a multiple of 300 bytes.
      • No space is lost to internal fragmentation.
      • Each block comes with one or more subblocks containing extra information about the data block.
      • Count subblock: the number of bytes in the data block.
      • Key subblock: the key for the last record in the data block.

  46. Non-data overhead
      • Both blocks and sectors require that a certain amount of space be taken up on the disk in the form of non-data overhead.
      • 1. On sector-addressable disks
      • 2. On block-organized disks

  47. • On sector-addressable disks: preformatting involves storing, at the beginning of each sector, information such as the sector address, the track address and the sector's condition (whether the sector is usable or defective).
      • On block-organized disks: non-data overhead in the form of subblocks and interblock gaps has to be provided with every block. Generally, more non-data information is provided with blocks than with sectors.

  48. The cost of disk access
      • Three factors contribute to the total amount of time needed to access a disk:
      • 1) Seek time
      • 2) Rotational delay
      • 3) Transfer time

  49. Seek time
      • Seek time is the time required to move the access arm to the correct cylinder.
      • It depends on how far the arm has to move.
      • Seeking is more costly in a multiuser environment than in a single-user environment, where disk usage is dedicated to one process.
      • It is usually impossible to know exactly how many tracks will be traversed in every seek.
      • So we work with the average seek time.
      • Today's hard disks have an average seek time of less than 10 milliseconds.

  50. Rotational delay
      • Rotational delay is the time it takes for the disk to rotate so that the sector we want is under the read/write head.
      • Hard disks rotate at about 5000 to 7200 rpm.
      • Floppy disks rotate at 360 rpm.
      • Suppose that you have a file that requires two or more tracks, that there are plenty of available tracks on one cylinder, and that you write the file to disk sequentially with one write call.
      • When the first track is filled, the disk can immediately begin writing to the second track without any rotational delay.
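
The rotation speeds on this slide translate into delays as follows. This is a minimal sketch that assumes the usual estimate that, on average, the wanted sector is half a revolution away from the head; the slide itself does not state an average, and the function names are hypothetical.

```cpp
#include <cassert>

// Time for one full revolution, in milliseconds.
double revolutionTimeMs(double rpm) {
    assert(rpm > 0.0);
    return 60000.0 / rpm;  // 60,000 ms per minute / revolutions per minute
}

// Average rotational delay, assuming the wanted sector is on average
// half a revolution away when the request arrives.
double averageRotationalDelayMs(double rpm) {
    return revolutionTimeMs(rpm) / 2.0;
}
```

At 5000 rpm one revolution takes 12 ms, so the average delay under this assumption is 6 ms. At 7200 rpm it is a little over 4 ms, and for a 360 rpm floppy disk it is a little over 83 ms.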
