File Organization IB Computer Science SL Evan Stoner | April 2009

7.1.1 Definition of the Key Field • Used to identify a record • Primary key is the unique identifier of a record, e.g. alpha code • Can be created by combining data fields, e.g. surname + birthday + phone number • Secondary key is used to classify a record and need not be unique, e.g. gender
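
A combined primary key of the kind described above could be built as in the following minimal sketch. The class name `KeyDemo`, the method `primaryKey`, and the `|` separator are assumptions made for illustration only.

```java
// Hypothetical sketch: building a primary key by combining several fields.
final class KeyDemo {
    // Joins the fields with a separator so that ("ab", "c") and ("a", "bc")
    // give different keys, as long as no field contains the separator itself.
    static String primaryKey(String surname, String birthday, String phone) {
        return surname + "|" + birthday + "|" + phone;
    }
}
```

The combined key is only unique if the chosen fields, taken together, never repeat across records.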

7.1.2-3 Sequential File Organization • Same as a serial file, except that the records are stored in order, e.g. numerically by key • Designed to operate on magnetic tape, so any record must be reached by reading from the beginning of the file • Cannot be updated in place: the records must be read, stored in arrays, modified, sorted, and written back

7.1.2-3 Sequential File Organization (II) • Standard algorithm:
    Open file for reading
    Found = false
    While (not end of file and not found) Do
        Read record
        Set fields to data variables: data1 = field1
        If record matches required record then found = true
    If (found) then perform desired processing on data variables
    Close file
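
The standard algorithm above can be sketched in Java. This is an illustration under stated assumptions, not the textbook's code: each record is assumed to be one line of text in the form `key,rest-of-record`, and the names `SequentialSearch` and `find` are hypothetical.

```java
import java.util.Scanner;

// Hypothetical sketch of a sequential search: read records one at a time
// from the start until the key matches or the data runs out.
final class SequentialSearch {
    // Assumes each record is one line of the form "key,rest-of-record".
    static String find(Scanner in, String searchKey) {
        String match = null;
        boolean found = false;
        while (in.hasNextLine() && !found) {
            String record = in.nextLine();
            String key = record.split(",", 2)[0]; // first field is the key
            if (key.equals(searchKey)) {
                found = true;
                match = record;
            }
        }
        return match; // null when no record has the key
    }
}
```

For a real file, the `Scanner` would be constructed from a `java.io.File` instead of a `String`, and closed after the search.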

7.1.4 Partially Indexed Sequential File Org. • The index of a file works in the same way as that of a book: it is used to look up a location • A partially indexed file is searched by first looking up a category (A, B, ...) in the index, then scanning that group for the record (aardvark, able, ace, ...); the records are usually kept in order

7.1.4 Partially Indexed Sequential File Org. (II) • Diagram:

7.1.4 Partially Indexed Sequential File Org. (III) • Basic algorithm:
    Open index
    Open file
    Get searchKey
    Found = false
    Locate access address in index using searchKey
    Direct access record group
    While not at end of group and not found Do
        Get record
        if match then found = true; return record
    if found then process record as required
    close file and index
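
A minimal in-memory sketch of the basic algorithm above, assuming the records are non-empty strings already sorted alphabetically and that the index groups them by first letter. The names `PartialIndex`, `build`, and `find` are hypothetical, and a list stands in for the file.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a partially indexed search over sorted records.
final class PartialIndex {
    // Builds the index: first letter -> position of the first record in that group.
    // Assumes the records are already sorted and none is empty.
    static Map<Character, Integer> build(List<String> sorted) {
        Map<Character, Integer> index = new TreeMap<>();
        for (int i = 0; i < sorted.size(); i++) {
            index.putIfAbsent(sorted.get(i).charAt(0), i);
        }
        return index;
    }

    // Jumps to the start of the group, then scans only that group.
    // Returns the position of searchKey, or -1 if it is absent.
    static int find(List<String> sorted, Map<Character, Integer> index, String searchKey) {
        char group = searchKey.charAt(0);
        Integer start = index.get(group);
        if (start == null) return -1; // no group for this letter
        for (int i = start; i < sorted.size() && sorted.get(i).charAt(0) == group; i++) {
            if (sorted.get(i).equals(searchKey)) return i;
        }
        return -1;
    }
}
```

Only the records in one group are examined, which is where the saving over a plain sequential search comes from.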

7.1.5 Fully Indexed File • The index holds an address for every record in the file • The index is searched first to find the address • The address is then used to access the desired record • Records don’t have to be in order since each record has its own index entry
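
The two-step lookup above (search the index, then use the address) might look like this in Java. It is a sketch only: a list stands in for the file, a list position stands in for the address, field 0 of each record is assumed to be the key, and the names `FullIndex`, `build`, and `find` are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a fully indexed lookup over unordered records.
final class FullIndex {
    // One index entry per record: key -> position of that record.
    static Map<String, Integer> build(List<String[]> records) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < records.size(); i++) {
            index.put(records.get(i)[0], i); // field 0 is assumed to be the key
        }
        return index;
    }

    // Searches the index first, then goes straight to the record.
    // Returns null when the key is not in the index.
    static String[] find(List<String[]> records, Map<String, Integer> index, String key) {
        Integer position = index.get(key);
        return position == null ? null : records.get(position);
    }
}
```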

7.1.5 Fully Indexed File (II) • Diagram: • (Relational database, anyone?)

7.1.6 Direct File Organization • A file is a sequence of bytes • If every record has the same known length in bytes, a record can be accessed directly using an offset formula: (record number - 1) x record length • Consider a file containing records 50 bytes long • To access the third record, the program would start reading at the 101st byte: (3 - 1) x 50 + 1 = 101

7.1.6 Direct File Organization (II) • Table: • Standard algorithm (records numbered from 1, byte positions from 0):
    set lengthOfRecord = length of record in bytes
    open file
    get recordNumber
    moveto byte position (recordNumber - 1) * lengthOfRecord
    read record
    edit record
    moveto the same byte position again
    write record
    close file
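
The offset arithmetic can be checked with a short Java sketch. It assumes records are numbered from 1 and are 50 bytes long, as in the earlier example, and it uses a byte array in place of a real file. The names `DirectAccess`, `offsetOf`, and `read` are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of direct access by record number.
final class DirectAccess {
    static final int RECORD_LENGTH = 50; // bytes per record (assumed)

    // Zero-based byte offset of a record; records are numbered from 1.
    static long offsetOf(int recordNumber) {
        return (long) (recordNumber - 1) * RECORD_LENGTH;
    }

    // Reads one record from the file contents without reading earlier records.
    static String read(byte[] file, int recordNumber) {
        int start = (int) offsetOf(recordNumber);
        byte[] record = Arrays.copyOfRange(file, start, start + RECORD_LENGTH);
        return new String(record, StandardCharsets.US_ASCII);
    }
}
```

`offsetOf(3)` is 100 counting from zero, which is the 101st byte of the file. With a real file, the same value could be passed to `java.io.RandomAccessFile.seek` before reading or writing the record.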

7.1.7 Fixed and Variable Length Records • Records can be different lengths, e.g. my last name is not the same length as yours • For direct access to work, every record must be the same fixed length, so Strings are padded with spaces (usually at the end) up to the field length
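
Padding a String to a fixed field length could be done as in the sketch below. This version pads at the end and truncates values that are too long; both choices are assumptions, and padding at the beginning would work the same way. The names `FixedLength`, `toFixed`, and `fromFixed` are hypothetical.

```java
// Hypothetical sketch: making every field exactly the same length.
final class FixedLength {
    // Pads with trailing spaces, or truncates, so the result has exactly `length` characters.
    static String toFixed(String value, int length) {
        if (value.length() >= length) return value.substring(0, length);
        StringBuilder sb = new StringBuilder(value);
        while (sb.length() < length) sb.append(' ');
        return sb.toString();
    }

    // Strips the padding when the field is read back.
    // Note: trim() also removes any genuine leading or trailing spaces.
    static String fromFixed(String field) {
        return field.trim();
    }
}
```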

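7.1.8 Using Hashing to Facilitate File Access • Hashing is a way of calculating a storage position for a record by processing its key • To find the record again, the same hash function is applied to the key and the record is read from the resulting position; a hash cannot be "unhashed" to recover the contents • Two different keys can produce the same hash code (a collision), so the code is not guaranteed to be unique • The hash code can address a position in a file directly, or a position in an array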
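
In file handling, a hash function is commonly used to turn a key into a storage position. The sketch below is one simple possibility, not the textbook's algorithm: it adds up the character codes of the key, takes the remainder on division by the number of slots, and resolves collisions by moving to the next free slot (linear probing). An array stands in for the file, and the names `HashFile`, `hash`, `store`, and `find` are hypothetical.

```java
// Hypothetical sketch of hashed storage with linear probing.
final class HashFile {
    // Maps a key to a slot number in the range 0..slots-1.
    static int hash(String key, int slots) {
        int sum = 0;
        for (int i = 0; i < key.length(); i++) sum += key.charAt(i);
        return sum % slots;
    }

    // Stores the key in the first free slot at or after its hash position.
    // Returns the slot used, or -1 if the table is full.
    static int store(String[] table, String key) {
        int start = hash(key, table.length);
        for (int i = 0; i < table.length; i++) {
            int slot = (start + i) % table.length;
            if (table[slot] == null) {
                table[slot] = key;
                return slot;
            }
        }
        return -1;
    }

    // Finds the key by probing from its hash position; returns -1 if absent.
    static int find(String[] table, String key) {
        int start = hash(key, table.length);
        for (int i = 0; i < table.length; i++) {
            int slot = (start + i) % table.length;
            if (table[slot] == null) return -1; // an empty slot ends the probe
            if (table[slot].equals(key)) return slot;
        }
        return -1;
    }
}
```

The keys "ab" and "ba" have the same character sum, so they collide; the second one stored ends up in the next slot along.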

7.1.9 Comparing Efficiency of File Types • Sequential and direct access files require the same amount of physical storage • Indexed files require extra storage because the index must be stored as well • Indexes typically need to be held in memory to be used, which can take up a lot of memory if there are many records • Speed of access depends on the type of media

7.1.9 Comparing Efficiency of File Types (II) • Sequential access is slow because reaching a record means reading every record before it from disk • An index requires fewer disk reads and is thus faster

7.1.10 Logical and Physical Organization • Physical structure of data in RAM or secondary storage is a linear sequence of bits • Logical structure is the abstract way the program is able to access data and is under the control of the designer • Logical structure in RAM can be an array, a linked list, or a binary tree

7.1.11 External Sorts • Sequential files must be kept in a certain order • When data is inserted, that order has to be restored • The whole file can be read, sorted, and rewritten; or the new records can be sorted and merged into the existing file
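
The merge option can be sketched as follows, assuming the existing records and the new records are each already sorted by an integer key. Lists stand in for the files, and the names `MergeInsert` and `merge` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: merging sorted new records into a sorted file.
final class MergeInsert {
    // Merges two already-sorted lists into one sorted list in a single pass.
    static List<Integer> merge(List<Integer> master, List<Integer> additions) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        // Repeatedly copy the smaller of the two front records.
        while (i < master.size() && j < additions.size()) {
            if (master.get(i) <= additions.get(j)) out.add(master.get(i++));
            else out.add(additions.get(j++));
        }
        // Copy whatever remains in either input.
        while (i < master.size()) out.add(master.get(i++));
        while (j < additions.size()) out.add(additions.get(j++));
        return out;
    }
}
```

Each input is read once from start to finish, which suits sequential media because no record has to be revisited.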

7.1.12 Examples of Direct Access File Handling • See 7.1.12 for algorithms in Java for • a general three field record file (pg. 366) and • a simple application of a hashing function (pg. 369)

The End • Alternatively, you could use SQL.
