
File Organization IB Computer Science SL Evan Stoner | April 2009


Presentation Transcript


  1. File Organization IB Computer Science SL Evan Stoner | April 2009

  2. 7.1.1 Definition of the Key Field • A key field is used to identify a record • The primary key is the unique identifier of a record, e.g. alpha code • A primary key can be created by combining data fields (a composite key), e.g. surname + birthday + phone number • A secondary key is used to classify records and need not be unique, e.g. gender
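
The two kinds of key on slide 2 can be sketched in Python. This is only an illustration: the function names, the `|` separator, and the upper-casing of the surname are assumptions, not part of the original slides.

```python
def make_composite_key(surname, birthday, phone):
    """Build a primary key by joining several fields (hypothetical format)."""
    parts = (surname.strip().upper(), birthday.strip(), phone.strip())
    return "|".join(parts)


def group_by_secondary_key(records, field):
    """Classify records (dicts) by a non-unique secondary key such as gender."""
    groups = {}
    for record in records:
        groups.setdefault(record[field], []).append(record)
    return groups
```

The composite key is only as unique as the combination of its fields, which is why several fields are joined rather than relying on the surname alone.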

  3. 7.1.2-3 Sequential File Organization • Same as a serial file, except that the records are stored in key order, e.g. numerically • Designed to operate on magnetic tape, so any record must be reached by reading from the beginning of the file • Cannot be updated in place: records must be read, stored in arrays, modified, sorted, and written back

  4. 7.1.2-3 Sequential File Organization (II) • Standard algorithm:
       Open file for reading
       Found = false
       While (not end of file and not found) Do
           Read record
           Set fields to data variables: data1 = field1
           If record matches required record then found = true
       If (found) then perform desired processing on data variables
       Close file
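
The standard algorithm above might look like this in Python. It is a sketch under assumptions: records are comma-separated text lines with the key in the first field, and the function name is made up for illustration. The early exit relies on the file being in key order and is an addition to the slide's pseudocode.

```python
import io


def sequential_search(lines, wanted_key):
    """Return the fields of the matching record, or None if it is absent."""
    for line in lines:                       # read records in stored order
        fields = line.rstrip("\n").split(",")
        if fields[0] == wanted_key:          # record matches required record
            return fields
        if fields[0] > wanted_key:           # sorted file: key cannot appear later
            return None
    return None                              # reached end of file


sample = io.StringIO("A001,Adams\nB002,Baker\nC003,Clark\n")
print(sequential_search(sample, "B002"))     # ['B002', 'Baker']
```

Any iterable of lines works here, so an open text file can be passed in place of the `io.StringIO` stand-in.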

  5. 7.1.4 Partially Indexed Sequential File Org. • The index of a file works like the index of a book: it is used to look up a location • In a partially indexed file, the index gives the start of a category (A, B, ...); the records within that category (aardvark, able, ace, ...), which are usually ordered, are then searched sequentially

  6. 7.1.4 Partially Indexed Sequential File Org. (II) • Diagram:

  7. 7.1.4 Partially Indexed Sequential File Org. (III) • Basic algorithm:
       Open index
       Open file
       Get searchKey
       Found = false
       Locate access address in index using searchKey
       Direct access record group
       While not at end of group and not found Do
           Get record
           if match then
               found = true
               return record
       if found then process record as required
       close file and index
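
A small Python sketch of the basic algorithm above, using an in-memory list as a stand-in for the file. It assumes the records are sorted words and that the category is the first letter, as in the aardvark example; the function names are hypothetical.

```python
def build_partial_index(records):
    """Map each first letter to the position of the first record in its group."""
    index = {}
    for position, word in enumerate(records):
        index.setdefault(word[0], position)  # keep only the first position seen
    return index


def indexed_search(records, index, search_key):
    """Jump to the group via the index, then scan that group sequentially."""
    start = index.get(search_key[0])
    if start is None:                        # no group for this category
        return None
    position = start
    while position < len(records) and records[position][0] == search_key[0]:
        if records[position] == search_key:
            return position
        position += 1
    return None                              # reached end of group


words = ["aardvark", "able", "ace", "bat", "bee", "cat"]
index = build_partial_index(words)
print(indexed_search(words, index, "bee"))   # 4
```

Only one index entry per category is stored, so the index stays small, at the cost of a short sequential scan inside each group.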

  8. 7.1.5 Fully Indexed File • The index holds an address for every record in the file • The index is searched first to find the address • The address is then used to access the desired record directly • Records don't have to be in order, since each record has its own index entry

  9. 7.1.5 Fully Indexed File (II) • Diagram: • (Relational database, anyone?)
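
As a rough Python sketch of a fully indexed file: the index below maps every key to the byte offset of its record, so the records themselves can stay unsorted. The comma-separated line format and the function names are assumptions made for the example, and `io.BytesIO` stands in for a file opened in binary mode.

```python
import io


def build_full_index(data_file):
    """Scan the file once and record the byte offset of every record."""
    index = {}
    data_file.seek(0)
    while True:
        offset = data_file.tell()            # address of the next record
        line = data_file.readline()
        if not line:                         # end of file
            break
        key = line.split(b",", 1)[0].decode()
        index[key] = offset
    return index


def fetch(data_file, index, key):
    """Look the key up in the index, then jump straight to the record."""
    offset = index.get(key)
    if offset is None:
        return None
    data_file.seek(offset)
    return data_file.readline().decode().rstrip("\n")


data = io.BytesIO(b"C003,Clark\nA001,Adams\nB002,Baker\n")
index = build_full_index(data)
print(fetch(data, index, "A001"))            # A001,Adams
```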

  10. 7.1.6 Direct File Organization • A file is a sequence of bytes • If every record has the same known length in bytes, a record can be accessed directly using an offset formula: (record number - 1) x record length • Consider a file containing records 50 bytes long • To access the third record, the program would start reading at the 101st byte: (3 - 1) x 50 + 1 = 101

  11. 7.1.6 Direct File Organization (II) • Table: • Standard algorithm (record numbers count from 1, byte offsets from 0):
       set lengthOfRecord = length of record in bytes
       open file
       get recordNumber
       moveto byte offset (recordNumber - 1) * lengthOfRecord
       read record
       edit record
       write record
       close file
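
The direct access steps can be tried out with `seek` in Python. This sketch assumes the 50-byte records from the slide's example, record numbers that start at 1, and byte offsets that start at 0, so the third record begins at offset 100 (the 101st byte). The function names are illustrative only.

```python
import io

RECORD_LENGTH = 50  # bytes per record, as in the slide's example


def record_offset(record_number, record_length=RECORD_LENGTH):
    """Zero-based byte offset of a record whose number counts from 1."""
    return (record_number - 1) * record_length


def read_record(f, record_number):
    """Jump directly to the record and read exactly one record's bytes."""
    f.seek(record_offset(record_number))
    return f.read(RECORD_LENGTH)


def write_record(f, record_number, data):
    """Overwrite one record in place; the new data must keep the fixed length."""
    if len(data) != RECORD_LENGTH:
        raise ValueError("record must be exactly RECORD_LENGTH bytes")
    f.seek(record_offset(record_number))
    f.write(data)


# io.BytesIO stands in for a file opened with mode "r+b".
f = io.BytesIO(b"1" * 50 + b"2" * 50 + b"3" * 50)
print(record_offset(3))                      # 100
```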

  12. 7.1.7 Fixed and Variable Length Records • Records can have different lengths, e.g. my last name is not the same length as yours • For direct access to work, every record must have the same fixed length, so each String is padded with spaces to a fixed width
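
One way to get fixed-length records in Python is the standard `struct` module. The field layout below (20-byte surname, 10-byte birthday, 4-byte integer id) is an assumption chosen for illustration. This sketch pads on the right with spaces, which is a common convention; `rjust` would pad at the beginning instead.

```python
import struct

# "<" = no alignment padding; 20s + 10s + I = 20 + 10 + 4 bytes
RECORD_FORMAT = "<20s10sI"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)


def pack_record(surname, birthday, record_id):
    """Pad the text fields with spaces so every record has the same size."""
    return struct.pack(
        RECORD_FORMAT,
        surname.encode("ascii").ljust(20),   # values longer than 20 bytes are truncated
        birthday.encode("ascii").ljust(10),
        record_id,
    )


def unpack_record(raw):
    """Reverse pack_record, stripping the padding spaces again."""
    surname, birthday, record_id = struct.unpack(RECORD_FORMAT, raw)
    return surname.decode("ascii").rstrip(), birthday.decode("ascii").rstrip(), record_id
```

Because every packed record is `RECORD_SIZE` bytes, the offset formula for direct access applies no matter how long each surname is.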

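  13. 7.1.8 Using Hashing to Facilitate File Access • Hashing computes a storage address for a record by applying a function to its key • Different keys can hash to the same address (a collision), so the hash is not guaranteed to be unique and collisions must be handled • A hash cannot be "unhashed": a record is found by hashing the search key again and reading from the resulting address • The address can refer to a position in a direct access file or to an index in an array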
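
A minimal Python sketch of hashing used for record placement. The hash function (sum of character codes modulo the table size) and the use of linear probing for collisions are assumptions picked for simplicity, not the method from the course text; a list stands in for the slots of a direct access file.

```python
TABLE_SIZE = 7


def hash_key(key, table_size=TABLE_SIZE):
    """Toy hash: sum of character codes, reduced to a slot number."""
    return sum(ord(ch) for ch in key) % table_size


def insert(table, key, value):
    """Store the record at its hashed slot, probing forward on a collision."""
    slot = hash_key(key, len(table))
    for step in range(len(table)):
        probe = (slot + step) % len(table)
        if table[probe] is None or table[probe][0] == key:
            table[probe] = (key, value)
            return probe
    raise RuntimeError("table is full")


def lookup(table, key):
    """Re-hash the key and follow the same probe sequence to find the record."""
    slot = hash_key(key, len(table))
    for step in range(len(table)):
        probe = (slot + step) % len(table)
        if table[probe] is None:             # empty slot: key was never stored
            return None
        if table[probe][0] == key:
            return table[probe][1]
    return None


table = [None] * TABLE_SIZE
insert(table, "ab", 1)                       # "ab" and "ba" hash to the same slot
insert(table, "ba", 2)                       # so "ba" is moved to the next free slot
```

Note that lookup never reverses the hash; it recomputes it from the search key and compares the stored keys.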

  14. 7.1.9 Comparing Efficiency of File Types • Sequential and direct access files require the same amount of physical storage • Indexed files require extra storage for the index itself • Indexes typically need to be held in memory to be used, which can consume a lot of memory if there are many records • Speed of access depends on the type of media

  15. 7.1.9 Comparing Efficiency of File Types (II) • Sequential access is slow because finding a record can require reading every record before it from disk • An indexed file requires fewer disk reads per lookup and is thus faster

  16. 7.1.10 Logical and Physical Organization • Physical structure of data in RAM or secondary storage is a linear sequence of binary bits • Logical structure is the abstract way the program is able to access data and is under the control of the designer • Logical structure in RAM can be an array, a linked list, or a binary tree

  17. 7.1.11 External Sorts • Sequential files must be kept in a certain order • When data is inserted, the file has to be put back in order • The data can be read, sorted, and rewritten; or the new records can be sorted separately and merged into the file, as in a merge sort
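
The merge option can be sketched with `heapq.merge` from the Python standard library, which reads its sorted inputs lazily, so neither file has to fit in memory. The sketch assumes both inputs are already sorted text lines that compare correctly as strings; the function name and sample data are made up for the example.

```python
import heapq
import io


def merge_files(master, updates, output):
    """Merge two sorted line streams into one sorted output stream."""
    for line in heapq.merge(master, updates):
        output.write(line)


# io.StringIO objects stand in for the master file, the sorted batch of
# new records, and the rewritten master file.
master = io.StringIO("A001,Adams\nC003,Clark\n")
updates = io.StringIO("B002,Baker\nD004,Davis\n")
merged = io.StringIO()
merge_files(master, updates, merged)
print(merged.getvalue(), end="")
```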

  18. 7.1.12 Examples of Direct Access File Handling • See 7.1.12 for algorithms in Java for • a general three field record file (pg. 366) and • a simple application of a hashing function (pg. 369)

  19. The End • Alternatively, you could use SQL.
