Representing data elements
1 / 27

Representing Data Elements - PowerPoint PPT Presentation

  • Updated On :

Representing Data Elements. Fields, Records, Blocks Variable-length Data Modifying Records. Source: our textbook. Overview. Attributes are represented by sequences of bytes, called fields Tuples are represented by collections of fields, called records

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'Representing Data Elements' - shadi

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Representing data elements l.jpg

Representing Data Elements

Fields, Records, Blocks

Variable-length Data

Modifying Records

Source: our textbook

Overview l.jpg

  • Attributes are represented by sequences of bytes, called fields

  • Tuples are represented by collections of fields, called records

  • Relations are represented by collections of records, called files

  • Files are stored in blocks, using specialized data structures to support efficient modification and querying

Representing sql data types l.jpg
Representing SQL Data Types

  • integers and reals: built-in

  • CHAR(n): array of n bytes

  • VARCHAR(n): array of n+1 bytes (extra byte is either string length or null char)

  • dates and times: fixed length strings

  • etc.

Representing tuples l.jpg








256 bytes



10 bytes



30 bytes



1 byte

Representing Tuples

  • For now, assume all attributes (fields) are fixed length.

  • Concatenate the fields

  • Store the offset of each field in schema

More on tuples l.jpg








256 bytes



10 bytes

+ 2



30 bytes

+ 2



1 byte

+ 3

More on Tuples

  • Due to hardware considerations, certain types of data need to start at addresses that are multiples of 4 or 8

  • Previous example becomes:

Record headers l.jpg
Record Headers

  • Often it is convenient to keep some "header" information in each record:

    • a pointer to schema information (attributes/fields, types, their order in the tuple, constraints)

    • length of the record/tuple

    • timestamp of last modification

Packing records into blocks l.jpg
Packing Records into Blocks

  • Start with block header:

    • timestamp of last modification/access

    • offset of each record in the block, etc.

  • Follow with sequence of records

  • May end with some unused space


block 1

block 2

block n-1

block n

Representing addresses l.jpg
Representing Addresses

  • Often addresses (pointers) are part of records:

    • the application data in object-oriented databases

    • as part of indexes and other data structures supporting the DBMS

  • Every data item (block, record, etc.) has two addresses:

    • database address: address on the disk

      (typically 8-16 bytes)

    • memory address, if the item is in virtual memory (typically 4 bytes)

Translation table l.jpg
Translation Table

  • Provides mapping from database addresses to memory addresses for all blocks currently in memory

  • Later we'll discuss how to implement it

Pointer swizzling l.jpg
Pointer Swizzling

  • When a block is moved from disk into main memory, change all the disk addresses that point to items in this block into main memory addresses.

  • Need a bit for each address to indicate if it is a disk address or a memory address.

  • Why? Faster to follow memory pointers (only uses a single machine instruction).

Example of swizzling l.jpg
Example of Swizzling


Main Memory

read into

main memory

Block 1

Block 2

Swizzling policies l.jpg
Swizzling Policies

  • Automatic swizzling: as soon as block is brought into memory, swizzle all relevant pointers

  • Swizzling on demand: only swizzle a pointer if and when it is actually followed

  • No swizzling

  • Programmer control

Automatic swizzling l.jpg
Automatic Swizzling

  • Locating all pointers within a block:

    • refer to the schema, which will indicate where addresses are in the records

    • for index structures, pointers are at known locations

  • Update translation table with memory addresses of items in the block

  • Update pointers in the block (in memory) with memory addresses, when possible, as obtained from translation table

Unswizzling l.jpg

  • When a block is moved from memory back to disk, all pointers must go back to database (disk) addresses

  • Use translation table again

  • Important to have an efficient data structure for the translation table

Pinned records and blocks l.jpg
Pinned Records and Blocks

  • A block in memory is pinned if it cannot be safely written back to disk

  • Indicate with a bit in the block header

  • Reasons for pinning:

    • related to failure recovery (more later)

    • because of pointer swizzling

  • If block B1 has swizzled pointer to an item in block B2, then B2 is pinned.

Unpinning a block l.jpg
Unpinning a Block

  • Consider each item in the block to be unpinned

  • Keep in the translation table the places in memory holding swizzled pointers to that item (e.g., with a linked list)

  • Unswizzle those pointers (i.e., use translation table to replace the memory addresses with database (disk) addresses

Variable length data l.jpg
Variable Length Data

  • Data items with varying size (e.g., if maximum size of a field is large but most of the time the values are small)

  • Variable-format records (e.g., NULLs method for representing a hierarchy of entity sets as relations)

  • Records that do not fit in a block (e.g., an MPEG of a movie)

Variable length fields l.jpg
Variable-Length Fields

  • Store the fixed-length fields before the variable-length fields in each record

  • Keep in the record header

    • record length

    • pointers to the beginnings of all the variable-length fields

  • Book discusses variations on this idea

Variable format records l.jpg
Variable-Format Records

  • Represent by a sequence of tagged fields

  • Each tagged field contains

    • name

    • type

    • length, if not deducible from the type

    • value

Splitting records across blocks l.jpg
Splitting Records Across Blocks

  • Called spanned records

  • Useful when

    • record size exceeds block size

    • putting an integral number of records in a block wastes a lot of the block (e.g., record size is 51% of block size)

  • Each record or fragment header contains

    • bit indicating if it is a fragment

    • if fragment then pointers to previous and next fragments of the record (i.e., a linked list)

Record modification l.jpg
Record Modification

  • Modifications to records:

    • insert

    • delete

    • update

  • issues even with fixed-length records and fields

  • even more involved with variable-length data

Inserting new records l.jpg
Inserting New Records

  • If records need not be any particular order, then just find a block with enough empty space

  • Later we'll see how to keep track of all the tuples of a given relation

  • But what if blocks should be kept in a certain order, such as sorted on primary key?

Insertion in order l.jpg
Insertion in Order

If there is space in the block, then add the record

(going right to left), add a pointer to it (going left

to right) and rearrange the pointers as needed.

What if block is full l.jpg
What if Block is Full?

  • Records are stored in several blocks, in sorted order

  • One approach: keep a linked list of "overflow" blocks for each block in the main sequence

  • Another approach is described in the book

Deleting records l.jpg
Deleting Records

  • Try to reclaim space made available after a record is deleted

  • If using an offset table, then rearrange the records to fill in any hole that is left behind and adjust the pointers

  • Additional mechanisms are based on keeping a linked list of available space and compacting when possible

Tombstones l.jpg

  • What about pointers to deleted records?

  • We place a tombstone in place of each deleted record

  • Tombstone is permanent

  • Issue of where to place the tombstone

  • Keep a tombstone bit in each record header: if this is a tombstone, then no need to store additional data

Updating records l.jpg
Updating Records

  • For fixed-length records, there is no effect on the storage system

  • For variable-length records:

    • if length increases, like insertion

    • if length decreases, like deletion except tombstones are not necessary