representing data elements
Representing Data Elements - PowerPoint PPT Presentation


PowerPoint Slideshow about 'Representing Data Elements' - shadi


representing data elements

Representing Data Elements

Fields, Records, Blocks

Variable-length Data

Modifying Records

Source: our textbook

overview
Overview
  • Attributes are represented by sequences of bytes, called fields
  • Tuples are represented by collections of fields, called records
  • Relations are represented by collections of records, called files
  • Files are stored in blocks, using specialized data structures to support efficient modification and querying
representing sql data types
Representing SQL Data Types
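  • integers and reals: stored in the hardware's built-in formats (e.g., 2-, 4-, or 8-byte integers; 4- or 8-byte floating-point numbers)
  • CHAR(n): array of n bytes
  • VARCHAR(n): array of n+1 bytes (extra byte is either string length or null char)
  • dates and times: fixed-length strings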
  • etc.
representing tuples

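Figure: fixed-length tuple layout (byte offsets)
  offset 0     name        CHAR(30)       30 bytes
  offset 30    address     VARCHAR(255)   256 bytes
  offset 286   gender      CHAR(1)        1 byte
  offset 287   birthdate   DATE           10 bytes
  (record ends at offset 297)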

Representing Tuples
  • For now, assume all attributes (fields) are fixed length.
  • Concatenate the fields
  • Store the offset of each field in the schema
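A minimal Python sketch of the idea, assuming the four-field schema from the figure: VARCHAR(255) is stored as one length byte plus 255 data bytes, and DATE as a 10-character string such as "1990-05-17". The names `SCHEMA`, `compute_offsets`, `pack_record`, and `read_field` are hypothetical. The offsets are computed once from the schema, records are plain concatenations of fields, and any field can be read directly at its offset.

```python
# Hypothetical schema taken from the figure: (field name, SQL type, size in bytes).
SCHEMA = [
    ("name", "CHAR", 30),
    ("address", "VARCHAR", 256),  # 1 length byte + up to 255 characters
    ("gender", "CHAR", 1),
    ("birthdate", "DATE", 10),    # e.g. "1990-05-17" stored as a 10-byte string
]


def compute_offsets(schema):
    """Return ({field: offset}, record length) for fixed-length fields."""
    offsets, pos = {}, 0
    for name, _sql_type, size in schema:
        offsets[name] = pos
        pos += size
    return offsets, pos


OFFSETS, RECORD_LEN = compute_offsets(SCHEMA)


def encode_field(sql_type, size, value):
    data = value.encode("ascii")
    if sql_type == "VARCHAR":
        # First byte holds the string length; the rest is zero-padded.
        if len(data) > size - 1:
            raise ValueError("value too long")
        return bytes([len(data)]) + data.ljust(size - 1, b"\x00")
    if len(data) > size:
        raise ValueError("value too long")
    # CHAR(n) and DATE: pad with blanks to exactly `size` bytes.
    return data.ljust(size, b" ")


def pack_record(values):
    """Concatenate the encoded fields in schema order."""
    return b"".join(encode_field(t, s, values[n]) for n, t, s in SCHEMA)


def read_field(record, field):
    """Read one field directly at the offset stored with the schema."""
    for name, sql_type, size in SCHEMA:
        if name == field:
            raw = record[OFFSETS[name]:OFFSETS[name] + size]
            if sql_type == "VARCHAR":
                return raw[1:1 + raw[0]].decode("ascii")
            return raw.decode("ascii").rstrip(" ")
    raise KeyError(field)
```

With this schema, `OFFSETS["birthdate"]` is 287 and every packed record is 297 bytes long, matching the figure.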
more on tuples

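Figure: the same tuple with every field starting at a multiple of 4 (byte offsets)
  offset 0     name        CHAR(30)       30 bytes + 2 bytes padding
  offset 32    address     VARCHAR(255)   256 bytes
  offset 288   gender      CHAR(1)        1 byte + 3 bytes padding
  offset 292   birthdate   DATE           10 bytes + 2 bytes padding
  (record ends at offset 304)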

More on Tuples
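  • Due to hardware considerations, certain types of data need to start at addresses that are multiples of 4 or 8, so fields are padded up to such a boundary
  • With 4-byte alignment, the previous example becomes the padded layout in the figure for this slide (304 bytes instead of 297)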
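The padding rule can be sketched in a few lines of Python. This assumes every field must start at a multiple of `alignment` and that the record's own length is rounded up as well, so the next record in a block is also aligned; `aligned_offsets` is a hypothetical name.

```python
def aligned_offsets(sizes, alignment=4):
    """Place each field at the next multiple of `alignment`.

    Returns (offsets, padding, total_length), where padding[i] is the number
    of unused bytes inserted after field i.
    """
    offsets, padding, pos = [], [], 0
    for size in sizes:
        offsets.append(pos)
        end = pos + size
        # Round the end of this field up to the next multiple of `alignment`.
        next_start = (end + alignment - 1) // alignment * alignment
        padding.append(next_start - end)
        pos = next_start
    return offsets, padding, pos
```

For the field sizes 30, 256, 1, and 10, `aligned_offsets([30, 256, 1, 10])` gives offsets 0, 32, 288, 292, padding 2, 0, 3, 2, and a total of 304 bytes, as in the figure.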
record headers
Record Headers
  • Often it is convenient to keep some "header" information in each record:
    • a pointer to schema information (attributes/fields, types, their order in the tuple, constraints)
    • length of the record/tuple
    • timestamp of last modification
packing records into blocks
Packing Records into Blocks
  • Start with block header:
    • timestamp of last modification/access
    • offset of each record in the block, etc.
  • Follow with sequence of records
  • May end with some unused space

Figure: header | block 1 | block 2 | ... | block n-1 | block n
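A small Python sketch of one block built this way. The header layout (8-byte timestamp, 2-byte record count, 2-byte end-of-data offset, then one 2-byte offset per record) and the 4096-byte `BLOCK_SIZE` are assumptions for illustration, as are the names `pack_block` and `read_record`.

```python
import struct
import time

BLOCK_SIZE = 4096  # hypothetical block size in bytes
HEADER_FIXED = struct.calcsize("<QHH")  # timestamp, record count, end of data


def pack_block(records, timestamp=None):
    """Build one block: header, then the records, then unused space.

    Assumed header layout: 8-byte timestamp, 2-byte record count, 2-byte
    offset where the record data ends, then a 2-byte offset per record.
    """
    if timestamp is None:
        timestamp = int(time.time())
    pos = HEADER_FIXED + 2 * len(records)  # records start right after the header
    offsets = []
    for rec in records:
        offsets.append(pos)
        pos += len(rec)
    if pos > BLOCK_SIZE:
        raise ValueError("records do not fit in one block")
    header = struct.pack("<QHH", timestamp, len(records), pos)
    header += b"".join(struct.pack("<H", off) for off in offsets)
    # Whatever remains at the end of the block is unused space.
    return header + b"".join(records) + bytes(BLOCK_SIZE - pos)


def read_record(block, i):
    """Locate record i through the offset table in the block header."""
    _timestamp, count, data_end = struct.unpack_from("<QHH", block, 0)
    offsets = [struct.unpack_from("<H", block, HEADER_FIXED + 2 * j)[0]
               for j in range(count)]
    end = offsets[i + 1] if i + 1 < count else data_end
    return bytes(block[offsets[i]:end])
```

The end-of-data offset is stored because the last record would otherwise be indistinguishable from the zero-filled unused space after it.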

representing addresses
Representing Addresses
  • Often addresses (pointers) are part of records:
    • the application data in object-oriented databases
    • as part of indexes and other data structures supporting the DBMS
  • Every data item (block, record, etc.) has two addresses:
    • database address: address on the disk (typically 8-16 bytes)
    • memory address, if the item is in virtual memory (typically 4 bytes on a 32-bit machine, 8 bytes on a 64-bit machine)
translation table
Translation Table
  • Provides mapping from database addresses to memory addresses for all blocks currently in memory
  • Later we'll discuss how to implement it
pointer swizzling
Pointer Swizzling
  • When a block is moved from disk into main memory, change all the disk addresses that point to items in this block into main memory addresses.
  • Need a bit for each address to indicate if it is a disk address or a memory address.
  • Why? Following a memory pointer is faster: it takes a single machine instruction, whereas a database address must first be looked up in the translation table.
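A Python sketch of a pointer field carrying the disk/memory bit. Python has no raw pointers, so an object reference stands in for a memory address and a dict stands in for the translation table; the `Pointer` class, the `follow` and `swizzle` functions, and the tuple-shaped database addresses are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Pointer:
    """A pointer field inside an in-memory record.

    swizzled == False: `value` is a database (disk) address.
    swizzled == True:  `value` is a direct reference to the in-memory item.
    """
    swizzled: bool
    value: Any


def follow(ptr: Pointer, translation_table: Dict[Any, Any]) -> Any:
    if ptr.swizzled:
        # One step: the pointer already refers to the item in memory.
        return ptr.value
    # Unswizzled: translate the database address first (a table lookup).
    return translation_table[ptr.value]


def swizzle(ptr: Pointer, translation_table: Dict[Any, Any]) -> None:
    """Replace a database address by a memory reference if the target is in memory."""
    if not ptr.swizzled and ptr.value in translation_table:
        ptr.value = translation_table[ptr.value]
        ptr.swizzled = True
```

After `swizzle`, `follow` no longer consults the translation table at all, which is the saving the slide refers to.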
example of swizzling
Example of Swizzling

Figure: Block 1 and Block 2 are read from disk into main memory

swizzling policies
Swizzling Policies
  • Automatic swizzling: as soon as a block is brought into memory, swizzle all relevant pointers
  • Swizzling on demand: only swizzle a pointer if and when it is actually followed
  • No swizzling
  • Programmer control
automatic swizzling
Automatic Swizzling
  • Locating all pointers within a block:
    • refer to the schema, which will indicate where addresses are in the records
    • for index structures, pointers are at known locations
  • Update translation table with memory addresses of items in the block
  • Update pointers in the block (in memory) with memory addresses, when possible, as obtained from translation table
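The three steps above can be sketched in Python as follows. Object references stand in for memory addresses and a dict keyed by a hypothetical `(block number, slot)` database address stands in for the translation table; `POINTER_FIELDS` is an assumed piece of schema information naming the fields that hold pointers.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

DbAddr = Tuple[int, int]  # hypothetical database address: (block number, slot)


@dataclass
class Pointer:
    swizzled: bool
    value: Any  # a DbAddr when unswizzled, the in-memory Record when swizzled


@dataclass
class Record:
    fields: Dict[str, Any]


@dataclass
class Block:
    number: int
    records: List[Record] = field(default_factory=list)


# Hypothetical schema information: the fields that hold pointers.
POINTER_FIELDS = {"next", "owner"}


def load_block(block: Block, table: Dict[DbAddr, Record]) -> None:
    """Automatic swizzling, run as soon as `block` is brought into memory."""
    # Update the translation table with the memory address of every item.
    for slot, rec in enumerate(block.records):
        table[(block.number, slot)] = rec
    # Use the schema to locate the pointers, then swizzle each one whose
    # target is in memory (i.e., has an entry in the translation table).
    for rec in block.records:
        for name, value in rec.fields.items():
            if name in POINTER_FIELDS and not value.swizzled and value.value in table:
                value.value = table[value.value]
                value.swizzled = True
```

Pointers whose targets are not yet in memory stay as database addresses ("when possible" on the slide). This sketch only handles pointers inside the loaded block; pointers in other blocks that refer into it would need a similar pass.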
unswizzling
Unswizzling
  • When a block is moved from memory back to disk, all pointers must go back to database (disk) addresses
  • Use translation table again
  • Important to have an efficient data structure for the translation table
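A Python sketch of unswizzling a block before it is written back. It assumes the translation table can also be searched by memory address, modeled here as a dict keyed by `id()` of the in-memory item; that reverse index, the `Pointer` class, and `unswizzle_block` are hypothetical. Needing lookups in both directions is one reason an efficient translation-table structure matters.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

DbAddr = Tuple[int, int]  # hypothetical (block number, slot) database address


@dataclass
class Pointer:
    swizzled: bool
    value: Any  # a DbAddr if unswizzled, else the in-memory item


def unswizzle_block(records: List[Dict[str, Any]],
                    memory_to_db: Dict[int, DbAddr]) -> None:
    """Before writing a block to disk, turn every swizzled pointer in it
    back into a database (disk) address."""
    for rec in records:
        for value in rec.values():
            if isinstance(value, Pointer) and value.swizzled:
                # Translate the memory reference (keyed by id()) back to
                # the database address of the item it points to.
                value.value = memory_to_db[id(value.value)]
                value.swizzled = False
```

Pointers that were never swizzled already hold database addresses and are left alone.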
pinned records and blocks
Pinned Records and Blocks
  • A block in memory is pinned if it cannot be safely written back to disk
  • Indicate with a bit in the block header
  • Reasons for pinning:
    • related to failure recovery (more later)
    • because of pointer swizzling
  • If block B1 has a swizzled pointer to an item in block B2, then B2 is pinned.
unpinning a block
Unpinning a Block
  • A block is unpinned by unpinning each item in it
  • For each item, keep in the translation table the places in memory that hold swizzled pointers to that item (e.g., in a linked list)
  • Unswizzle those pointers (i.e., use the translation table to replace the memory addresses with database (disk) addresses)
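A Python sketch of this bookkeeping. Each translation-table entry records where the swizzled pointers to its item live; a Python list stands in for the linked list on the slide. The classes and functions (`TableEntry`, `swizzle_to`, `unpin_item`, `unpin_block`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

DbAddr = Tuple[int, int]  # hypothetical (block number, slot) database address


@dataclass
class Pointer:
    swizzled: bool
    value: Any  # a DbAddr if unswizzled, else the in-memory item


@dataclass
class TableEntry:
    """One translation-table entry for an item currently in memory."""
    db_addr: DbAddr
    item: Any
    # Places in memory holding swizzled pointers to this item.
    swizzled_refs: List[Pointer] = field(default_factory=list)


def swizzle_to(ptr: Pointer, entry: TableEntry) -> None:
    """Swizzle `ptr` to the entry's item and remember where it lives."""
    ptr.value, ptr.swizzled = entry.item, True
    entry.swizzled_refs.append(ptr)


def unpin_item(entry: TableEntry) -> None:
    """Unswizzle every pointer that refers to this item in memory."""
    for ptr in entry.swizzled_refs:
        ptr.value, ptr.swizzled = entry.db_addr, False
    entry.swizzled_refs.clear()


def unpin_block(entries: List[TableEntry]) -> None:
    """A block is unpinned by unpinning each of its items."""
    for entry in entries:
        unpin_item(entry)
```

Once no swizzled pointer refers into the block, it can safely be written back to disk (assuming nothing else, such as recovery, still pins it).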
variable length data
Variable Length Data
  • Data items with varying size (e.g., if maximum size of a field is large but most of the time the values are small)
  • Variable-format records (e.g., NULLs method for representing a hierarchy of entity sets as relations)
  • Records that do not fit in a block (e.g., an MPEG of a movie)
variable length fields
Variable-Length Fields
  • Store the fixed-length fields before the variable-length fields in each record
  • Keep in the record header
    • record length
    • pointers to the beginnings of all the variable-length fields
  • Book discusses variations on this idea
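A Python sketch of one simple variant. It assumes a header of a 2-byte record length followed by a 2-byte offset to the start of each variable-length field; the fixed part in the example is gender plus birthdate, and name and address are treated as variable-length. `pack_varlen` and `unpack_varlen` are hypothetical names.

```python
import struct
from typing import List, Tuple


def pack_varlen(fixed: bytes, variable: List[bytes]) -> bytes:
    """Record = header | fixed-length fields | variable-length fields.

    Assumed header: 2-byte record length, then a 2-byte offset to the start
    of each variable-length field.
    """
    header_len = 2 + 2 * len(variable)
    pos = header_len + len(fixed)  # first variable-length field starts here
    offsets = []
    for v in variable:
        offsets.append(pos)
        pos += len(v)
    header = struct.pack("<H", pos) + b"".join(struct.pack("<H", o) for o in offsets)
    return header + fixed + b"".join(variable)


def unpack_varlen(rec: bytes, fixed_len: int, n_var: int) -> Tuple[bytes, List[bytes]]:
    """Split a record back into its fixed part and its variable-length fields."""
    length = struct.unpack_from("<H", rec, 0)[0]
    offsets = [struct.unpack_from("<H", rec, 2 + 2 * j)[0] for j in range(n_var)]
    header_len = 2 + 2 * n_var
    fixed = rec[header_len:header_len + fixed_len]
    # Each variable field ends where the next begins; the last ends at the record length.
    ends = offsets[1:] + [length]
    return fixed, [rec[s:e] for s, e in zip(offsets, ends)]
```

Because the fixed-length fields come first, their offsets stay constant and need no header entries; only the variable-length fields need pointers.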
variable format records
Variable-Format Records
  • Represent by a sequence of tagged fields
  • Each tagged field contains
    • name
    • type
    • length, if not deducible from the type
    • value
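A Python sketch of tagged fields. The encoding is an assumption for illustration: the field name is stored as a length-prefixed string (real systems often use a short code instead), followed by a one-byte type code. INT values are always 4 bytes, so their length is deducible from the type; STR values carry an explicit 2-byte length. A field absent from a record (e.g., a NULL) is simply not stored.

```python
import struct
from typing import Any, Dict

# Hypothetical one-byte type codes.
INT, STR = b"I", b"S"


def encode_tagged(record: Dict[str, Any]) -> bytes:
    out = bytearray()
    for name, value in record.items():
        n = name.encode("utf-8")
        out += struct.pack("<B", len(n)) + n            # field name
        if isinstance(value, int):
            out += INT + struct.pack("<i", value)       # type + value (no length)
        else:
            v = str(value).encode("utf-8")
            out += STR + struct.pack("<H", len(v)) + v  # type + length + value
    return bytes(out)


def decode_tagged(data: bytes) -> Dict[str, Any]:
    record: Dict[str, Any] = {}
    pos = 0
    while pos < len(data):
        nlen = data[pos]
        name = data[pos + 1:pos + 1 + nlen].decode("utf-8")
        pos += 1 + nlen
        ftype = data[pos:pos + 1]
        pos += 1
        if ftype == INT:
            record[name] = struct.unpack_from("<i", data, pos)[0]
            pos += 4
        else:
            vlen = struct.unpack_from("<H", data, pos)[0]
            record[name] = data[pos + 2:pos + 2 + vlen].decode("utf-8")
            pos += 2 + vlen
    return record
```

Two records of the same relation can then carry different sets of fields, which is what variable-format records need.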
splitting records across blocks
Splitting Records Across Blocks
  • Called spanned records
  • Useful when
    • record size exceeds block size
    • putting an integral number of records in a block wastes a lot of the block (e.g., record size is 51% of block size)
  • Each record or fragment header contains
    • bit indicating if it is a fragment
    • if fragment then pointers to previous and next fragments of the record (i.e., a linked list)
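A Python sketch of splitting one record into fragments. As a simplification, block addresses are just list indices: fragment i goes into block i, and `free_space` gives the bytes available in each block. The `Fragment` class and the `split_record` and `reassemble` functions are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fragment:
    is_fragment: bool    # header bit: is this only a piece of a record?
    prev: Optional[int]  # block of the previous fragment (None if first)
    next: Optional[int]  # block of the next fragment (None if last)
    data: bytes


def split_record(record: bytes, free_space: List[int]) -> List[Fragment]:
    """Spread `record` over blocks with the given free space (in bytes).

    The fragments form a doubly linked list through prev/next.
    """
    pieces, pos = [], 0
    for space in free_space:
        if pos >= len(record):
            break
        pieces.append(record[pos:pos + space])
        pos += space
    if pos < len(record):
        raise ValueError("not enough free space for the whole record")
    spanned = len(pieces) > 1
    return [
        Fragment(spanned,
                 i - 1 if spanned and i > 0 else None,
                 i + 1 if spanned and i < len(pieces) - 1 else None,
                 piece)
        for i, piece in enumerate(pieces)
    ]


def reassemble(fragments: List[Fragment], start: int = 0) -> bytes:
    """Rebuild the record by following the next pointers."""
    out, i = b"", start
    while i is not None:
        out += fragments[i].data
        i = fragments[i].next
    return out
```

A record that fits in the first block is stored whole, with the fragment bit off and no links.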
record modification
Record Modification
  • Modifications to records:
    • insert
    • delete
    • update
  • Issues arise even with fixed-length records and fields
  • Even more involved with variable-length data
inserting new records
Inserting New Records
  • If records need not be any particular order, then just find a block with enough empty space
  • Later we'll see how to keep track of all the tuples of a given relation
  • But what if blocks should be kept in a certain order, such as sorted on primary key?
insertion in order
Insertion in Order

  • If there is space in the block, then add the record (going right to left from the end of the block), add a pointer to it in the offset table (going left to right from the header), and rearrange the pointers as needed to keep them in sorted order.
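A Python sketch of in-order insertion into one block. As assumptions for illustration, the header is 8 bytes, each offset-table entry takes 2 bytes, and the offset table is kept as a Python list of `(key, offset, length)` entries rather than encoded in the block bytes. Records are never moved; only the pointers are kept in key order. `SortedBlock` is a hypothetical name.

```python
import bisect
from typing import List, Tuple


class SortedBlock:
    """Offset table grows left to right; records are placed right to left."""

    HEADER = 8  # assumed fixed header size in bytes
    SLOT = 2    # assumed size of one offset-table entry in bytes

    def __init__(self, size: int = 512):
        self.data = bytearray(size)
        self.free_end = size  # records occupy data[free_end:size]
        self.slots: List[Tuple[bytes, int, int]] = []  # (key, offset, length)

    def free_space(self) -> int:
        table_end = self.HEADER + self.SLOT * len(self.slots)
        return self.free_end - table_end

    def insert(self, key: bytes, record: bytes) -> bool:
        # Room is needed for the record and for one more offset-table entry.
        if len(record) + self.SLOT > self.free_space():
            return False  # block full; the caller must handle it
        self.free_end -= len(record)  # place the record right to left
        self.data[self.free_end:self.free_end + len(record)] = record
        # Add a pointer to the record, keeping the offset table sorted by key.
        bisect.insort(self.slots, (key, self.free_end, len(record)))
        return True

    def records_in_order(self) -> List[bytes]:
        return [bytes(self.data[off:off + n]) for _key, off, n in self.slots]
```

Because only the small pointers are rearranged, inserting out of key order costs no record movement.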

what if block is full
What if Block is Full?
  • Records are stored in several blocks, in sorted order
  • One approach: keep a linked list of "overflow" blocks for each block in the main sequence
  • Another approach is described in the book
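A Python sketch of the overflow-chain approach. It assumes the caller has already chosen the main-sequence block whose key range covers the new record, and it measures capacity in records rather than bytes; `Block`, `insert_with_overflow`, and `scan_sorted` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    capacity: int  # max records per block (a simplification)
    keys: List[int] = field(default_factory=list)
    overflow: Optional["Block"] = None  # next block in this block's overflow chain


def insert_with_overflow(block: Block, key: int) -> None:
    """Insert into a main-sequence block, spilling into its overflow chain when full."""
    while len(block.keys) >= block.capacity:
        if block.overflow is None:
            block.overflow = Block(block.capacity)
        block = block.overflow
    block.keys.append(key)
    block.keys.sort()  # keep each block's own records sorted


def scan_sorted(block: Optional[Block]) -> List[int]:
    """Records of one main block plus its overflow chain, in key order."""
    keys: List[int] = []
    while block is not None:
        keys.extend(block.keys)
        block = block.overflow
    return sorted(keys)
```

The main sequence stays in order without shifting records into neighboring blocks; the price is that a scan must also visit each overflow chain.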
deleting records
Deleting Records
  • Try to reclaim space made available after a record is deleted
  • If using an offset table, then rearrange the records to fill in any hole that is left behind and adjust the pointers
  • Additional mechanisms are based on keeping a linked list of available space and compacting when possible
tombstones
Tombstones
  • What about pointers to deleted records?
  • We place a tombstone in place of each deleted record
  • Tombstone is permanent
  • Issue of where to place the tombstone
  • Keep a tombstone bit in each record header: if this is a tombstone, then no need to store additional data
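A Python sketch of the tombstone-bit approach. It assumes a one-byte record header whose lowest bit marks a tombstone and models pointers as permanent integer record ids; `RecordStore` and `TOMBSTONE` are hypothetical names.

```python
from typing import Dict, Optional

TOMBSTONE = 0x01  # hypothetical flag bit in the one-byte record header


class RecordStore:
    """Records addressed by permanent ids; deleting one leaves a tombstone."""

    def __init__(self) -> None:
        self._slots: Dict[int, bytes] = {}

    def insert(self, rid: int, payload: bytes) -> None:
        if rid in self._slots:
            # Ids are never reused, not even after a delete, because old
            # pointers to a tombstone must keep seeing the tombstone.
            raise ValueError(f"record id {rid} already used")
        self._slots[rid] = bytes([0]) + payload  # header byte + data

    def delete(self, rid: int) -> None:
        # Keep only a header with the tombstone bit set; the record's data
        # space is freed and nothing else needs to be stored.
        self._slots[rid] = bytes([TOMBSTONE])

    def follow(self, rid: int) -> Optional[bytes]:
        """Dereference a pointer; None means it leads to a deleted record."""
        record = self._slots[rid]
        if record[0] & TOMBSTONE:
            return None
        return record[1:]
```

A dangling pointer now finds the tombstone instead of whatever record might otherwise have reused the space.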
updating records
Updating Records
  • For fixed-length records, there is no effect on the storage system
  • For variable-length records:
    • if length increases, like insertion
    • if length decreases, like deletion except tombstones are not necessary