Organizing files for performance Chapter 6

6.1 Data compression
• Advantages of reduced file size
• Redundancy reduction: state code example
• Repeating sequences: run-length encoding
• Variable-length codes
  • static (Morse code)
  • dynamic (Huffman code)
• Irreversible compression (e.g., JPEG)
• Unix routines (append .z to compressed files)
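Run-length encoding can be sketched in a few lines. The version below is a simplified illustration, not necessarily the format used in the course text: it assumes every run is stored as a (count, value) byte pair with runs capped at 255, and the names `rle_encode` and `rle_decode` are hypothetical.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode each run as a (count, value) byte pair; runs are capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        # Extend the run while the next byte repeats and the count fits in one byte.
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        out += bytes([run, value])
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Expand (count, value) pairs back into the original byte sequence."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        count, value = encoded[k], encoded[k + 1]
        out += bytes([value]) * count
    return bytes(out)
```

Note that this pair-per-run layout doubles the size of data with no repeats, which is why practical schemes usually mark runs with a special indicator byte and copy everything else unchanged.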

6.2 Reclaiming space
• “Holes” arise when
  • variable-length records are updated
  • fixed- or variable-length records are deleted
• Compaction (for deleted records)
  • mark deleted records
  • allows undelete to be implemented
  • periodically run compaction program
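The mark-then-compact idea can be illustrated with an in-memory list standing in for the file. This is a minimal sketch under stated assumptions: each record is assumed to begin with a one-byte status flag (a space for live, `*` for deleted), and the function names are hypothetical.

```python
LIVE, DELETED = b" ", b"*"  # assumed one-byte status flag at the start of each record


def delete(records: list, rrn: int) -> None:
    """Mark a record as deleted without moving any data."""
    records[rrn] = DELETED + records[rrn][1:]


def undelete(records: list, rrn: int) -> None:
    """Restore a marked record; only possible before compaction runs."""
    records[rrn] = LIVE + records[rrn][1:]


def compact(records: list) -> list:
    """Build the new file by copying every record that is not marked deleted."""
    return [rec for rec in records if rec[:1] != DELETED]
```

Because deletion only flips a flag, the record's data stays in place until the compaction pass, which is what makes undelete possible.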

6.2.2 Dynamic reclamation
• Simple approach: search sequentially until a free slot is found for the new record; drawback: very slow
• Alternative: keep the deleted slots on a stack implemented as a linked list, giving immediate access to an empty slot if one is available; the links are stored in the deleted record slots themselves, with the RRN of the top slot in the header record
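The linked-list stack of free slots might look like the following sketch. It models the file as a Python list of slots; the class name `FixedLengthFile`, the `("*", next_rrn)` representation of a deleted slot, and the use of -1 for an empty stack are all assumptions made for illustration.

```python
class FixedLengthFile:
    """In-memory stand-in for a file of fixed-length records with an avail stack."""

    def __init__(self):
        self.first_avail = -1  # header field: RRN of the top free slot, -1 if none
        self.slots = []

    def delete(self, rrn: int) -> None:
        # Push: the freed slot stores a link to the previous top of the stack.
        self.slots[rrn] = ("*", self.first_avail)
        self.first_avail = rrn

    def add(self, record: str) -> int:
        if self.first_avail == -1:
            # No free slot: append at the end of the file.
            self.slots.append(record)
            return len(self.slots) - 1
        # Pop: reuse the top free slot and advance the header to the next link.
        rrn = self.first_avail
        _, next_avail = self.slots[rrn]
        self.first_avail = next_avail
        self.slots[rrn] = record
        return rrn
```

Both operations touch only the header and one slot, so no sequential search is needed.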

6.2.3 Variable-length records
• The same scheme (linked-list stack) may be used, except that the links must be byte offsets rather than RRNs
• Deleted records go on top of the stack, but the stack must be searched when adding a record to find a slot big enough to hold it
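For variable-length records the stack entries need a size as well as a location. The sketch below assumes the avail list is kept in memory as a Python list of (byte offset, size) pairs with the top of the stack at index 0; in an actual file the links would live inside the deleted slots. The function names are hypothetical.

```python
def delete_record(avail: list, offset: int, size: int) -> None:
    """Push the freed slot, identified by byte offset and size, on top of the stack."""
    avail.insert(0, (offset, size))


def allocate(avail: list, needed: int):
    """Search from the top for a slot of at least `needed` bytes and unlink it.

    Returns the byte offset of the chosen slot, or None if no slot is big
    enough (the caller would then append the record at the end of the file).
    """
    for i, (offset, size) in enumerate(avail):
        if size >= needed:
            del avail[i]
            return offset
    return None
```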

6.2.4 Fragmentation
• Internal
  • fixed-length records
  • “unsophisticated” variable-length scheme
• External: variable-length records
  • smaller record is placed in a larger slot
  • leftover space is added to available list
• Coalescing holes (good test question)
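Coalescing can be sketched as merging holes that are physically adjacent in the file. The example assumes the holes are available as (byte offset, size) pairs, which is an illustrative representation rather than anything the slides prescribe; sorting by offset brings neighbours next to each other so they can be merged in one pass.

```python
def coalesce(holes: list) -> list:
    """Merge holes that touch each other into single larger holes."""
    merged = []
    for offset, size in sorted(holes):
        if merged and merged[-1][0] + merged[-1][1] == offset:
            # This hole starts exactly where the previous one ends: combine them.
            prev_offset, prev_size = merged[-1]
            merged[-1] = (prev_offset, prev_size + size)
        else:
            merged.append((offset, size))
    return merged
```

The difficulty in practice is that an avail list is not ordered by offset, so finding adjacent holes requires either a sort like this one or a search of the whole list.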

6.2.5 Placement strategies
• First fit: use the first slot that is big enough
• Best fit: sort slots in ascending order by size, then use first fit
• Worst fit: sort slots in descending order by size
  • no need to search: just use the first slot if it is big enough
  • leftover space may be enough for another record
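The three strategies differ only in which hole they pick. The following sketch assumes holes are (byte offset, size) pairs in avail-list order; the function names are hypothetical. Best fit is written as the slide describes it (sort ascending, then first fit), while worst fit simply inspects the largest hole instead of maintaining a descending list.

```python
def first_fit(holes: list, needed: int):
    """Return the first hole in list order that is big enough, or None."""
    for hole in holes:
        if hole[1] >= needed:
            return hole
    return None


def best_fit(holes: list, needed: int):
    """Sort by ascending size, then apply first fit: the tightest hole wins."""
    return first_fit(sorted(holes, key=lambda h: h[1]), needed)


def worst_fit(holes: list, needed: int):
    """Only the largest hole is a candidate, so no search is needed."""
    if not holes:
        return None
    largest = max(holes, key=lambda h: h[1])
    return largest if largest[1] >= needed else None
```

With holes of 72, 40, 68, and 47 bytes and a 45-byte record, first fit and worst fit both choose the 72-byte hole (leaving 27 bytes), while best fit chooses the 47-byte hole (leaving 2 bytes that are unlikely to be reusable).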

6.3.2 Binary search
• relational ops for search key
• retrieval by RRN
• object-oriented presentation of algorithm
• implementation with templates
• compilation with class definitions

6.3.3-4 Search performance
• complexity of binary search is O(log₂ n), compared to O(n) for sequential search
• records must be sorted on the search key
• sorting on disk is prohibitively expensive
• an “internal sort” allows direct accesses in memory
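The difference in cost is easy to see by counting record reads. The sketch below assumes a hypothetical `read_record(rrn)` callable that returns the key of the record at that RRN (standing in for a seek and read on a sorted file of fixed-length records), and it returns the number of reads alongside the result.

```python
def binary_search(read_record, n_records: int, key):
    """Search a file sorted on `key`; return (rrn, reads), with rrn == -1 if absent."""
    low, high = 0, n_records - 1
    reads = 0
    while low <= high:
        mid = (low + high) // 2
        record_key = read_record(mid)  # one direct access by RRN
        reads += 1
        if record_key == key:
            return mid, reads
        if record_key < key:
            low = mid + 1
        else:
            high = mid - 1
    return -1, reads
```

For a file of 1,000 records this takes at most 10 reads, against up to 1,000 for a sequential search.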

6.3.5 Limitations
• the number of disk accesses for binary search is still significant for large files
• keeping a file sorted can be less efficient than using sequential search; a merge technique addresses this problem
• internal sort is limited to small files that fit entirely in memory

6.4 Keysort
• only keys are kept in memory
• each key is kept with its RRN (keynode)
• keynode array is sorted in memory
• data file can be sorted by reading records in the order of the sorted keynodes and writing them to a new file
• keynodes can be written as an index file
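The keysort steps can be sketched as follows. The list `records` stands in for the data file, and `key_of` is a hypothetical helper that extracts the key from a record; in a real implementation only the keys and RRNs would be held in memory while the records stay on disk.

```python
def build_keynodes(records: list, key_of) -> list:
    """Pass 1: collect a (key, RRN) keynode per record, then sort the array in memory."""
    keynodes = [(key_of(rec), rrn) for rrn, rec in enumerate(records)]
    keynodes.sort()
    return keynodes


def write_sorted(records: list, keynodes: list) -> list:
    """Pass 2: fetch each record by RRN in keynode order to produce the sorted file."""
    return [records[rrn] for _, rrn in keynodes]
```

The second pass reads records in key order rather than physical order, so it seeks for every record; writing out the sorted keynodes as an index file avoids that pass entirely.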

6.4.4 Pinned records
• available list (of deleted record slots)
• records whose physical locations are referenced in other records are pinned
