PowerPoint Slideshow about 'Lecture 5: Record Storage and Primary File Organizations' - ryanadan

Lecture 5: Record Storage and Primary File Organizations

  • Storage Devices

  • Storage of Databases

  • Operations on Files

  • Primary vs. Secondary File Organizations

  • Heap Files

  • Sorted Files

  • Hashing

Storage Devices

  • Computer Storage Medium (Hierarchy)

    • Factors: cost, capacity, speed

    • Primary Storage – data processed directly by the CPU; main memory, cache memory

    • Secondary (on-line) Storage - data must first be copied into primary storage for processing; magnetic disks

    • Secondary (off-line) Storage - optical disks (direct access), magnetic tapes (sequential)

Storage of Databases

  • Main Memory Databases

    • entire databases are kept in main memory

    • main memory is volatile storage, so a backup copy is required (on magnetic disk)

  • Most Databases

    • are stored permanently on magnetic disk

    • are too large to fit entirely in main memory

    • magnetic disk is less expensive per byte than main memory

File Records on Disk

  • Records

    • file as a sequence of records (fig5.7)

    • record type = field names + data types

  • Fixed-Length Records

    • records with the same size in a file

  • Variable-Length Records (with separators)

    • records of different sizes

    • caused by multi-valued fields, optional fields, or variable-length fields
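
As a rough illustration of variable-length records with separators, the sketch below joins field values with one separator character and ends each record with another. The two control characters and the function names are assumptions made for this example; the slides do not prescribe a particular encoding.

```python
FIELD_SEP = "\x1f"   # assumed field separator (ASCII unit separator)
RECORD_END = "\x1e"  # assumed record terminator (ASCII record separator)


def encode_record(fields):
    """Join field values with a separator and append a record terminator."""
    return FIELD_SEP.join(fields) + RECORD_END


def decode_file(data):
    """Split a string of concatenated records back into lists of fields."""
    records = data.split(RECORD_END)[:-1]  # drop the empty piece after the last terminator
    return [record.split(FIELD_SEP) for record in records]


# Two records of different sizes: the second has an empty optional field
# and a multi-valued field stored as two extra values.
data = encode_record(["Smith", "Research"]) + encode_record(["Wong", "", "Houston", "Dallas"])
print(decode_file(data))
```

Because every record carries its own terminator, records of different lengths can be stored back to back without padding.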

File Blocks on Disk

  • Disk Block (fig5.8)

    • unit of data transfer between disk & memory

    • records of a file are allocated to disk blocks

    • usually 512 to 4K bytes (K=1024)

  • Blocking Factor (bfr)

    • number of (fixed-length) records in a block

    • bfr = ⌊B/R⌋ (the floor of B/R)

    • B = block size, R = record size (in bytes)
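
The blocking factor arithmetic can be checked with a few lines of Python. The values B = 512, R = 100, and r = 1000 records are made-up numbers for illustration, not figures from the lecture.

```python
import math


def blocking_factor(block_size, record_size):
    """bfr = floor(B / R): whole fixed-length records that fit in one block."""
    return block_size // record_size


def blocks_needed(num_records, block_size, record_size):
    """Blocks required when no record is split across blocks."""
    return math.ceil(num_records / blocking_factor(block_size, record_size))


B, R, r = 512, 100, 1000             # assumed example values
bfr = blocking_factor(B, R)          # 5 records per block
unused = B - bfr * R                 # 12 bytes left over in every block
blocks = blocks_needed(r, B, R)      # 200 blocks
print(bfr, unused, blocks)
```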

File Blocks on Disk

  • Spanned vs. Unspanned File Org. (fig5.8)

    • Unspanned: leaves the remaining space in each block unused

    • Spanned: utilizes the unused space

  • Contiguous vs. Linked Allocation

    • Contiguous: file blocks are allocated to consecutive disk blocks

    • Linked: each file block contains the pointer to the next block
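
To see what the spanned organization saves, the sketch below compares block counts for the two organizations. It assumes records may be split at any byte and, by default, ignores the pointer a spanned block needs to reach the rest of a record (the hypothetical `pointer_size` parameter lets you account for it).

```python
import math


def unspanned_blocks(num_records, block_size, record_size):
    """Each block holds floor(B/R) whole records; the remainder is wasted."""
    bfr = block_size // record_size
    return math.ceil(num_records / bfr)


def spanned_blocks(num_records, block_size, record_size, pointer_size=0):
    """Records may cross block boundaries, so only total bytes matter."""
    usable = block_size - pointer_size
    return math.ceil(num_records * record_size / usable)


# Assumed example values: 1000 records of 100 bytes, 512-byte blocks.
print(unspanned_blocks(1000, 512, 100))  # 200
print(spanned_blocks(1000, 512, 100))    # 196
```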

Operations on Files

  • Types of Operations

    • Retrieval: does not change data in the file (open/close a file, find/read records)

    • Update: changes the file by insertion, deletion, or modification of records

    • Record-at-a-time: operations are applied to a single record

    • Set-at-a-time: operations are applied to a set of records or to the whole file

Operations on Files

  • File Open/Close Operations

    • Open: readies the file for access, allocates buffers to hold file blocks, sets the file pointer to the beginning of the file

    • Close: terminates access to the file

  • Record-at-a-time Operations

    • Find: searches for the first file record that satisfies a certain condition (selection condition), and makes it the current file record

Operations on Files

  • FindNext: searches for the next file record (after the current record) that satisfies the condition and makes it the current file record

  • Read: reads the current file record

  • Insert: inserts a new record into the file and makes it the current file record

  • Delete: removes the current file record from the file by marking the record to indicate that it is no longer valid

Operations on Files

  • Modify: changes the values of some fields of the current file record

  • Set-at-a-time Operations

    • FindAll: locates all the records satisfying a search condition

    • FindOrdered: retrieves all the records in a specific order

    • Reorganize: reorganizes the records after update operations

    Operations on Files

    • Operation Factors

      • Access Type: search by equality on an attribute value (=) or by a range condition (e.g., >)

      • Access Time: to find a particular record(s)

      • Insertion Time: to insert a new record (find the place to insert + index structure update)

      • Deletion Time: to delete a record (find the record(s) to delete + index structure update)

      • Space Overhead: additional space occupied by an index structure

    Primary vs. Secondary File Organizations

    • Primary File Organizations

      • Heap Files

      • Sorted Files

      • Hashing

    • Secondary File Organizations (Index)

      • Single-level or Multi-level Indexes

      • B-trees

      • B+-trees

    Heap Files

    • Files of Unordered Records

      • the simplest, most basic file organization

      • new records are inserted at the end of the file

      • Access: linear search scans the file block by block (N/2 of the N file blocks on average if the record exists, all N blocks if it does not); very inefficient, O(N) time

      • Insertion: very efficient, since records are kept in no particular order and a new record is simply appended

      • Deletion: inefficient, since the record's block must first be found
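
A heap file can be modeled as a list of blocks to make the cost of a linear search visible. The sketch below counts how many blocks are "read" before a record is found; the function names and the tiny blocking factor of 3 are assumptions for the example.

```python
def insert(blocks, record, bfr):
    """Append the record to the last block, starting a new block when it is full."""
    if not blocks or len(blocks[-1]) >= bfr:
        blocks.append([])
    blocks[-1].append(record)


def linear_search(blocks, key):
    """Scan blocks in file order; return (record or None, number of blocks read)."""
    for blocks_read, block in enumerate(blocks, start=1):
        for record in block:
            if record == key:
                return record, blocks_read
    return None, len(blocks)  # a failed search reads every block


blocks = []
for key in [42, 7, 19, 3, 88, 61, 25]:
    insert(blocks, key, bfr=3)

print(blocks)                      # [[42, 7, 19], [3, 88, 61], [25]]
print(linear_search(blocks, 88))   # (88, 2)
print(linear_search(blocks, 100))  # (None, 3)
```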

    Heap Files

    • Direct File

      • allows direct access by the position of a record in a file

      • applies only to fixed-length records, contiguous allocation, and unspanned blocks

      • file records are numbered 0, 1, … , r-1 (e.g., r = 120)

      • records in each block are numbered 0, 1, … , bfr-1 (e.g., bfr = 15)

      • ith record of a file (e.g., i = 43): block position = ⌊i/bfr⌋ = ⌊43/15⌋ = 2, record position in the block = i mod bfr = 43 mod 15 = 13
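
The position calculation for a direct file is a single integer division with remainder. The sketch below uses i = 43 and bfr = 15; the byte-offset helper additionally assumes 512-byte blocks, 34-byte records, and a file that starts at byte 0, none of which come from the slides.

```python
def locate(i, bfr):
    """Return (block position, record position within the block) for record i."""
    return divmod(i, bfr)


def byte_offset(i, block_size, record_size):
    """Byte address of record i for unspanned blocks allocated contiguously from byte 0."""
    bfr = block_size // record_size
    block, slot = divmod(i, bfr)
    return block * block_size + slot * record_size


print(locate(43, 15))             # (2, 13)
print(byte_offset(43, 512, 34))   # 1466  (2 * 512 + 13 * 34)
```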

    Sorted Files

    • Files of Ordered Records

      • file records are kept sorted by the values of an ordering field (sequential file): fig5.9

      • Access: binary search on the ordering field reads about log2(N) of the N file blocks on average (O(log N) time), a large improvement over linear search

      • Insertion: very inefficient, since records must be inserted in the correct order and existing records shifted to make room
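
Binary search on a sorted file works on blocks rather than individual records: each probe reads one block and compares the key with the block's first and last records. The following is a minimal sketch that assumes every block is a non-empty sorted list; the function name is hypothetical.

```python
def binary_search_blocks(blocks, key):
    """Return (found, blocks_read) for a file whose blocks are sorted lists."""
    lo, hi = 0, len(blocks) - 1
    blocks_read = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]        # simulates reading one block from disk
        blocks_read += 1
        if key < block[0]:
            hi = mid - 1           # key can only be in an earlier block
        elif key > block[-1]:
            lo = mid + 1           # key can only be in a later block
        else:
            return key in block, blocks_read  # search inside the block in memory
    return False, blocks_read


# 16 blocks of 4 records each, holding the keys 0..63 in order.
blocks = [[4 * b + j for j in range(4)] for b in range(16)]
print(binary_search_blocks(blocks, 37))   # (True, 3)
```

A linear search for the same key would read 10 of the 16 blocks.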

    Sorted Files

    • Files of Ordered Records (cont'd)

      • Deletion: inefficient, but less expensive when deletion markers are combined with periodic reorganization

      • FindOrdered: reading the records in order of the ordering key values is extremely efficient

      • Overflow file: new records go into a temporary unordered file to make insertion cheaper; it is periodically merged with the main ordered file
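
The overflow-file idea can be sketched with two Python lists: a sorted main file and an unordered overflow file. Searching checks the main file with binary search and then scans the overflow file; reorganization sorts the overflow records and merges them in. The function names are assumptions for this illustration.

```python
import bisect
import heapq


def find(main_sorted, overflow, key):
    """Binary search in the main file, then a linear scan of the overflow file."""
    i = bisect.bisect_left(main_sorted, key)
    if i < len(main_sorted) and main_sorted[i] == key:
        return True
    return key in overflow


def reorganize(main_sorted, overflow):
    """Merge the sorted overflow records into the main file; return the new main file."""
    return list(heapq.merge(main_sorted, sorted(overflow)))


main_file = [3, 8, 15, 27]
overflow_file = []
overflow_file.append(20)   # cheap insertion: no records are shifted
overflow_file.append(5)

print(find(main_file, overflow_file, 20))      # True
print(reorganize(main_file, overflow_file))    # [3, 5, 8, 15, 20, 27]
```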

    Hashing

    • Hash Functions

      • records in the file are unordered

      • determine the address (B) of a record based on the value of the hash field (K) in the record

      • h(K) -> B

      • e.g., h(K) = K mod M, which yields addresses 0, 1, … , M-1

      • allows direct access to the target disk block

      • the search for the record within the block is done in main memory
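
As a small illustration of h(K) = K mod M, the sketch below places a handful of integer keys into M = 5 blocks and looks one up by reading only the block its hash value names. The key values are made up, and block capacity is ignored for simplicity.

```python
M = 5  # assumed number of blocks


def h(key):
    """Hash function from the slide: addresses fall in 0 .. M-1."""
    return key % M


blocks = [[] for _ in range(M)]
for key in [12, 7, 25, 31, 18]:
    blocks[h(key)].append(key)


def lookup(key):
    block = blocks[h(key)]   # the only block that has to be read
    return key in block      # search within the block happens in memory


print(blocks)       # [[25], [31], [12, 7], [18], []]
print(lookup(7))    # True
```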

    Internal Hashing

    • Internal Hashing

      • hashing for an internal file

      • hash table as an array of records (fig5.10)

      • noninteger hash field values such as names can be transformed into integers (e.g., from the ASCII codes of their characters)

    • Collision (of hash addresses)

      • occurs when two hash field values are mapped into the same hash address
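
One way to turn a name into an integer is to combine the character codes of its letters. The sketch below simply sums them, which is an assumption chosen for brevity rather than the method from the lecture; it also makes collisions easy to demonstrate, because anagrams always produce the same sum.

```python
def key_to_int(name):
    """Assumed transformation: sum of the character codes of the name."""
    return sum(ord(ch) for ch in name)


def h(name, M):
    """Hash address in 0 .. M-1 for a string-valued hash field."""
    return key_to_int(name) % M


M = 11  # assumed table size
print(h("Smith", M))
# "listen" and "silent" contain the same letters, so they collide.
print(h("listen", M) == h("silent", M))   # True
```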

    Collision Resolution

    • Open Addressing

      • checks the subsequent positions in order until an empty position is found

    • Chaining

      • extend the array with a number of overflow positions

      • use a linked list of overflow records for each hash address

      • overflow pointer refers to the position of the next record (fig5.10(b))
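
Open addressing, as described above, can be sketched with a fixed-size list and linear probing. The hash function key mod M, the table size of 7, and the function names are assumptions made for this example; chaining is not shown here.

```python
EMPTY = None


def insert_open(table, key):
    """Place key at its hash address or at the next empty position after it."""
    M = len(table)
    start = key % M
    for step in range(M):
        pos = (start + step) % M   # wrap around at the end of the table
        if table[pos] is EMPTY:
            table[pos] = key
            return pos
    raise RuntimeError("hash table is full")


def search_open(table, key):
    """Follow the same probe sequence; an empty position ends the search."""
    M = len(table)
    start = key % M
    for step in range(M):
        pos = (start + step) % M
        if table[pos] is EMPTY:
            return None
        if table[pos] == key:
            return pos
    return None


table = [EMPTY] * 7
for key in [10, 17, 24, 5]:
    insert_open(table, key)

print(table)                   # [None, None, None, 10, 17, 24, 5]
print(search_open(table, 24))  # 5
```

Keys 10, 17, and 24 all hash to address 3, so the second and third are pushed to positions 4 and 5, which in turn displaces key 5 to position 6.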

    Collision Resolution

    • Multiple Hashing

      • applies a second hash function if the first hash function results in a collision

      • uses open addressing or applies a third hash function if another collision results

    • Good Hashing Function

      • uniform and random distribution of records

      • keep the hash table 70-90% full to limit collisions without leaving too many unused locations
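
Multiple hashing can be sketched by trying a second hash function before falling back to open addressing. Both hash functions below (`h1`, `h2`) are assumptions picked only to keep the example short; real systems choose them more carefully.

```python
def h1(key, M):
    return key % M


def h2(key, M):
    return (key // M) % M   # assumed second hash function


def insert_multi(table, key):
    """Try h1, then h2, then open addressing starting from the h2 address."""
    M = len(table)
    for hash_fn in (h1, h2):
        pos = hash_fn(key, M)
        if table[pos] is None:
            table[pos] = key
            return pos
    for step in range(1, M):
        probe = (pos + step) % M
        if table[probe] is None:
            table[probe] = key
            return probe
    raise RuntimeError("hash table is full")


def load_factor(table):
    """Fraction of table positions that are occupied."""
    return sum(slot is not None for slot in table) / len(table)


table = [None] * 7
print(insert_multi(table, 10))   # 3  (h1 succeeds)
print(insert_multi(table, 17))   # 2  (h1 collides, h2 succeeds)
print(insert_multi(table, 24))   # 4  (both collide, open addressing)
```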

    External Hashing

    • Hashing Function

      • target address space is made of buckets (one disk block or a cluster of contiguous blocks)

      • maps a hash field value into a bucket number

      • bucket number is then converted to the corresponding disk block address (fig5.11)

      • collisions are less severe with buckets because as many records as will fit in a bucket can hash to the same bucket without causing problems

    External Hashing

    • Bucket Overflow

      • when a bucket is filled to capacity

      • can be solved by chaining method: fig5.12

      • a pointer is maintained in each bucket to a linked list of overflow records for the bucket

      • record pointers include both a block address and a relative record position within the block
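
Bucket overflow with chaining can be modeled as follows. Each bucket holds a fixed number of records, and a Python list stands in for the linked list of overflow records; the class names, the hash function key mod M, and the capacity of 2 are assumptions for illustration.

```python
class Bucket:
    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []    # records stored in the bucket's own block(s)
        self.overflow = []   # stands in for the linked list of overflow records


class HashFile:
    def __init__(self, num_buckets, capacity):
        self.buckets = [Bucket(capacity) for _ in range(num_buckets)]

    def _bucket(self, key):
        return self.buckets[key % len(self.buckets)]

    def insert(self, key):
        bucket = self._bucket(key)
        if len(bucket.records) < bucket.capacity:
            bucket.records.append(key)
        else:
            bucket.overflow.append(key)   # bucket is full: chain the record

    def search(self, key):
        bucket = self._bucket(key)
        return key in bucket.records or key in bucket.overflow


f = HashFile(num_buckets=3, capacity=2)
for key in [3, 6, 9, 4]:
    f.insert(key)

print(f.buckets[0].records, f.buckets[0].overflow)   # [3, 6] [9]
print(f.search(9))                                   # True
```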

    External Hashing

    • Static Hashing

      • very fast access to records by the hash field

      • a fixed number of buckets M is allocated

      • not suitable for dynamic files (files that grow and shrink over time)

      • difficult to determine the number of buckets in advance

      • such files require a dynamic hashing technique

    Dynamic Hashing

    • Extendible Hashing (fig5.13)

      • maintains a directory of 2^d bucket addresses

      • uses first d bits of a hash value to determine a directory entry and then a bucket address

      • d = global depth, d’ = local depth of a bucket

      • directory expands and shrinks dynamically

      • bucket split (which may double the directory) vs. bucket merge (which may halve it)

      • update directory and local depth appropriately
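
The following is a minimal sketch of extendible hashing insertion, written under several assumptions: hash values are 8 bits wide (`HASH_BITS`), the hash function is key mod 256, buckets hold two keys, and only splitting and directory doubling are implemented (merging and halving are left out). All class and method names are hypothetical.

```python
HASH_BITS = 8  # assumed width of a hash value


def h(key):
    return key % (1 << HASH_BITS)


class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []


class ExtendibleHash:
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.global_depth = 0
        self.directory = [Bucket(0)]   # 2**0 = 1 directory entry

    def _index(self, key):
        """Directory entry chosen by the first global_depth bits of h(key)."""
        return h(key) >> (HASH_BITS - self.global_depth)

    def search(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        while len(bucket.keys) >= self.capacity:
            self._split(bucket)
            bucket = self.directory[self._index(key)]
        bucket.keys.append(key)

    def _split(self, bucket):
        if bucket.local_depth == HASH_BITS:
            raise RuntimeError("cannot split further: hash bits exhausted")
        if bucket.local_depth == self.global_depth:
            # Double the directory: old entry i becomes entries 2i and 2i+1.
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        # Entries whose newly significant bit is 1 now point to the sibling.
        shift = self.global_depth - bucket.local_depth
        for i, b in enumerate(self.directory):
            if b is bucket and (i >> shift) & 1:
                self.directory[i] = sibling
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:               # redistribute between the two buckets
            self.directory[self._index(k)].keys.append(k)


eh = ExtendibleHash(capacity=2)
for key in [1, 128, 192, 64, 224]:
    eh.insert(key)

print(eh.global_depth, len(eh.directory))   # 2 4
```

In this run the bucket for hash values starting with bit 0 still has local depth 1, so directory entries 00 and 01 share it, while the entries 10 and 11 each have their own bucket of local depth 2.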