Comp 335 File Structures

Fundamental File Structure Concepts

File Organization • File organization is how the data is arranged in the file. • How data is written to a file must be considered carefully, because it dictates how the data can be read back in.

Example of Data saved to File • Assume a programmer writes all data to file by using strings. Data to be saved on file Towns and Populations Searcy 15000 Bald Knob 3500 Romance 950

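Example of Data saved to File When saved on file: Searcy15000Bald Knob3500Romance950 • Nothing in the file marks where one field ends and the next begins, so the original values cannot be reliably recovered.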

Considerations when Writing Data to File • Must keep the “integrity” of the individual units of data (fields) which we wrote. • Group logical units of data together in records. • Within each record, organize the data on file in a way that will maintain “field separation”. In other words, write it in a way where the data can be recaptured.

Common Field Structures • Force fields to have a predictable length • Begin each field with a length indicator • Place a delimiter at the end of each field to separate it from the next • Use a “keyword = value” expression to identify each field and its contents.

Fields with a predictable length Assume that: Towns (char [12]) and Population (char [7]) Data to be saved on file Towns and Populations Searcy 15000 Bald Knob 3500 Romance 950 When written to file (each town padded to 12 bytes and each population to 7 bytes, shown here with blanks): "Searcy      15000  Bald Knob   3500   Romance     950    "

Fields with a predictable length • A good method if all of the data to be stored is fixed in length. • What if the data to be stored is variable in length? • Every value shorter than the field width must be padded, so a lot of space is wasted.
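The padding idea can be sketched in C++ as follows. This is only an illustration under assumptions: the names padField, trimField, kTownWidth and kPopWidth are hypothetical, and blanks are assumed as the padding character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical field widths, taken from the char[12] / char[7] example.
constexpr std::size_t kTownWidth = 12;
constexpr std::size_t kPopWidth = 7;

// Pad a value with trailing blanks so it occupies exactly `width` bytes.
std::string padField(const std::string& value, std::size_t width) {
    assert(value.size() <= width);  // a longer value would not fit in the field
    return value + std::string(width - value.size(), ' ');
}

// Remove the trailing blanks that padField added.
std::string trimField(const std::string& field) {
    std::size_t end = field.find_last_not_of(' ');
    return end == std::string::npos ? "" : field.substr(0, end + 1);
}
```

Writing padField("Bald Knob", kTownWidth) followed by padField("3500", kPopWidth) always produces 19 bytes, so a reader can cut the fields apart by position alone. The cost is that real trailing blanks in the data cannot be told apart from padding.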

Fields with a length indicator Assume that: Towns (char [12]) and Population (char [7]) Data to be saved on file Towns and Populations Searcy 15000 Bald Knob 3500 Romance 950 When written to file: 6Searcy5150009Bald Knob435007Romance3950

Fields with a length indicator • The length indicator tells how many bytes to read. • How many bytes should you use for the length indicator? • 1 byte (field size max = 255) • 2 bytes (field size max = 65535) • This method should save space if the data is quite variable in length. • If the length is stored as a binary number, the file mixes binary data with text.
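A minimal sketch of length-prefixed fields, assuming a 1-byte binary length indicator (the slide example instead shows the length as a text digit). The function names appendField and readField are hypothetical, and a std::string stands in for the file contents.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Append one field preceded by a 1-byte binary length (so at most 255 bytes).
void appendField(std::string& out, const std::string& value) {
    assert(value.size() <= 255);
    out.push_back(static_cast<char>(value.size()));
    out += value;
}

// Read the field that starts at `pos` and advance `pos` past it.
std::string readField(const std::string& in, std::size_t& pos) {
    assert(pos < in.size());
    std::size_t len = static_cast<unsigned char>(in[pos]);  // length byte
    assert(pos + 1 + len <= in.size());
    std::string value = in.substr(pos + 1, len);
    pos += 1 + len;
    return value;
}
```

With this layout "Searcy" and "15000" occupy 13 bytes in total: one length byte plus the data for each field.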

Fields separated by delimiters Assume that: Towns (char [12]) and Population (char [7]) Data to be saved on file Towns and Populations Searcy 15000 Bald Knob 3500 Romance 950 When written to file: Searcy|15000|Bald Knob|3500|Romance|950

Fields separated by delimiters • Could possibly save more space • The delimiter must be a character that never appears in valid data • The language must provide a way to read data up to a sentinel value • In C++, getline has an overload that takes a delimiter character, which handles this.
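As a hedged illustration, the three-argument form of std::getline can split a delimited buffer into fields. The helper names joinFields and splitFields are hypothetical, and a string stream stands in for the file.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Join fields with the delimiter; a field must never contain the delimiter itself.
std::string joinFields(const std::vector<std::string>& fields, char delim = '|') {
    std::string out;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        assert(fields[i].find(delim) == std::string::npos);
        if (i > 0) out.push_back(delim);
        out += fields[i];
    }
    return out;
}

// Split a buffer into fields; std::getline stops at each delimiter.
std::vector<std::string> splitFields(const std::string& data, char delim = '|') {
    std::vector<std::string> fields;
    std::istringstream in(data);
    std::string field;
    while (std::getline(in, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}
```

The assertion in joinFields reflects the rule that the delimiter cannot be part of valid data; a production format would need an escaping scheme or a different delimiter instead.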

Fields separated by “keyword = value” Assume that: Towns (char [12]) and Population (char [7]) Data to be saved on file Towns and Populations Searcy 15000 Bald Knob 3500 Romance 950 When written to file: TOWN=Searcy|POP=15000|TOWN=Bald Knob|POP=3500|TOWN=Romance|POP=950

Fields separated by “keyword = value” • This can waste a lot of space in the file, because the keyword is repeated with every value. • It is a good technique if some fields are not used at times within records. • It is also good if you just want to save a lot of information on file without organizing the data within records.
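A small sketch of reading one “keyword = value” record, assuming '|' separates the pairs and '=' separates keyword from value. The function name parseRecord is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Parse "KEY=value|KEY=value" into a map; fields that were never
// written are simply absent from the result.
std::map<std::string, std::string> parseRecord(const std::string& record,
                                               char delim = '|') {
    std::map<std::string, std::string> fields;
    std::istringstream in(record);
    std::string pair;
    while (std::getline(in, pair, delim)) {
        std::size_t eq = pair.find('=');
        assert(eq != std::string::npos);  // every entry must look like KEY=value
        fields[pair.substr(0, eq)] = pair.substr(eq + 1);
    }
    return fields;
}
```

Because each value carries its own keyword, a record such as "TOWN=Searcy" with no POP entry still parses, which is what makes the format suitable for optional fields.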

Record Organization • Fields can be combined to form a record. • An entire record can be read into a buffer at one time, and then the fields can be parsed out. • This is common because most of the time we want to read and write whole records, not individual fields.

Fixed-Length Records • A frequently used method for file organization. • This can mean that each field is fixed length. • Alternatively, the record can be just a fixed-size “container” that stores a variable number of variable-length fields.

Fixed-Length Records Assume that: Towns (char [12]) and Population (char [7]) These fields are combined in a 19-byte record. Data to be saved on file Towns and Populations Searcy 15000 Bald Knob 3500 Romance 950 When written to file (each record is exactly 19 bytes, with fields padded, shown here with blanks and one quoted string per record): "Searcy      15000  " "Bald Knob   3500   " "Romance     950    "

Fixed-Length Records • Makes direct access to records feasible, which helps reduce seeks: counting from 0, record n starts at byte n × (record size). • Space could be wasted if the fields within the record are highly variable.
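Direct access to a fixed-length record amounts to one multiplication and one seek. The sketch below assumes the 19-byte record from the example; readRecord and kRecordSize are hypothetical names, and any std::istream (a file stream, or a string stream for experimenting) can be passed in.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

constexpr std::size_t kRecordSize = 19;  // 12-byte town + 7-byte population

// Read record number `n` (counting from 0) with a single seek.
std::string readRecord(std::istream& file, std::size_t n) {
    std::string record(kRecordSize, '\0');
    file.seekg(static_cast<std::streamoff>(n * kRecordSize), std::ios::beg);
    file.read(&record[0], static_cast<std::streamsize>(kRecordSize));
    // The requested record must exist in full.
    assert(file.gcount() == static_cast<std::streamsize>(kRecordSize));
    return record;
}
```

No earlier record has to be read to reach record n, which is the property that variable-length records give up.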

Variable Length Records • Store just the data within the records, so no space is wasted on padding. • Records must be read sequentially, because the position of a record depends on the lengths of all the records before it. • Typically a length indicator is given at the beginning of the record. • It can be combined with the “field integrity” techniques.

Variable Length Records Assume that: Towns (char [12]) and Population (char [7]) Instead of padding every record to 19 bytes, each record is written at its actual size, with a length indicator in front and a delimiter after each field. Data to be saved on file Towns and Populations Searcy 15000 Bald Knob 3500 Romance 950 When written to file: 13Searcy|15000|15Bald Knob|3500|12Romance|950|
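The layout above can be sketched as follows, assuming the record length is always written as exactly two decimal digits so that the prefix is unambiguous (this limits a record to 99 bytes and is an assumption, not something the slides specify). appendRecord and readAllRecords are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Append a record preceded by its length as two decimal digits.
void appendRecord(std::string& file, const std::string& record) {
    assert(record.size() <= 99);
    file.push_back(static_cast<char>('0' + record.size() / 10));
    file.push_back(static_cast<char>('0' + record.size() % 10));
    file += record;
}

// Walk the buffer sequentially and return every record.
std::vector<std::string> readAllRecords(const std::string& file) {
    std::vector<std::string> records;
    std::size_t pos = 0;
    while (pos + 2 <= file.size()) {
        std::size_t len = static_cast<std::size_t>(
            (file[pos] - '0') * 10 + (file[pos + 1] - '0'));
        assert(pos + 2 + len <= file.size());
        records.push_back(file.substr(pos + 2, len));
        pos += 2 + len;  // skip the prefix and the record body
    }
    return records;
}
```

Reaching the third record requires reading the length of the first two, which is why access is sequential unless an index is kept.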

Variable Length Records • To improve access to records (which will minimize seeks), an index can be used which can store the offsets of each variable length record in the file.

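Variable Length Records Data to be saved on file Towns and Populations Searcy 15000 Bald Knob 3500 Romance 950 When written to file: Searcy|15000|Bald Knob|3500|Romance|950| Index of Offsets 0 13 28 40 (the three records start at bytes 0, 13 and 28; 40 is the end of the file, where the next record would begin)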

Variable Length Records • To obtain direct access to variable-length records, each offset can be associated with a key which uniquely identifies its record. • The index is searched for the key, the offset is found, and the record is then accessed directly at that offset.
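A keyed index of offsets might look like the sketch below. It assumes each record is two '|'-terminated fields with the town name as the unique key, and it holds the file contents in a std::string; buildIndex and populationOf are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Build a key -> byte offset index by scanning the file once.
std::map<std::string, std::size_t> buildIndex(const std::string& file) {
    std::map<std::string, std::size_t> index;
    std::size_t pos = 0;
    while (pos < file.size()) {
        std::size_t townEnd = file.find('|', pos);
        assert(townEnd != std::string::npos);
        std::size_t popEnd = file.find('|', townEnd + 1);
        assert(popEnd != std::string::npos);
        index[file.substr(pos, townEnd - pos)] = pos;  // town name is the key
        pos = popEnd + 1;                              // start of next record
    }
    return index;
}

// Look up the key, jump to the stored offset, and parse only that record.
std::string populationOf(const std::string& file,
                         const std::map<std::string, std::size_t>& index,
                         const std::string& town) {
    std::size_t start = index.at(town);  // throws std::out_of_range if absent
    std::size_t townEnd = file.find('|', start);
    std::size_t popEnd = file.find('|', townEnd + 1);
    return file.substr(townEnd + 1, popEnd - townEnd - 1);
}
```

For the example file the index maps Searcy to 0, Bald Knob to 13 and Romance to 28. With a real file the stored offset would be passed to a seek instead of being used as a string position.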
