principles of database management systems n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Principles of Database Management Systems PowerPoint Presentation
Download Presentation
Principles of Database Management Systems

Loading in 2 Seconds...

play fullscreen
1 / 72

Principles of Database Management Systems - PowerPoint PPT Presentation


  • 237 Views
  • Uploaded on

Principles of Database Management Systems. Pekka Kilpeläinen (after Stanford CS245 slide originals by Hector Garcia-Molina, Jeff Ullman and Jennifer Widom). Topic for today. How to represent data on disk and in memory. "Executive Summary":.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Principles of Database Management Systems' - ashtyn


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
principles of database management systems

Principles of Database Management Systems

Pekka Kilpeläinen

(after Stanford CS245 slide originals by Hector Garcia-Molina, Jeff Ullman and Jennifer Widom)

Notes 3: Representing Data

slide2

Topic for today

  • How to represent data on disk and in memory

"Executive Summary":

  • Principles are rather simple, but there are lots of variations in the details

Notes 3: Representing Data

principles of data layout
Principles of Data Layout
  • Attributes of relational tuples (or objects) represented by sequences of bytes called fields
  • Fields grouped together into records
    • representation of tuples or objects
  • Records stored in blocks
  • File: collection of blocks that forms a relation (or the extent of an object class)

Notes 3: Representing Data

overview

here

Overview

Data Items

Records

Blocks

Files

Memory

Notes 3: Representing Data

what are the data items we want to store

What we have available: Bytes

8

bits

What are the data items we want to store?
  • a salary, a name
  • a date, a picture

Notes 3: Representing Data

to represent
To represent:
  • Integer (short): 2 bytes ( -32000…+32000)

e.g., 35 is

00000000

00100011

  • Integer (long): 4 bytes ( -2x109…+ 2x109)
  • Real, floating point (SQL FLOAT) 4 or 8 bytes
  • arithmetic interpretation by hardware

Notes 3: Representing Data

to represent1
To represent:
  • Characters

various coding schemes suggested,

most popular is ASCII

Example (8 bit ASCII):

A: 01000001

a: 01100001

5: 00110101

LF: 00001010

Notes 3: Representing Data

to represent2

Can we use less than 1 byte/code?

Yes, but only if desperate...

To represent:

1111 1111

  • Boolean

e.g., TRUE FALSE

00000000

  • Application specific
  • e.g., RED  1 GREEN  2
  • BLUE  3 YELLOW  4 …

Notes 3: Representing Data

to represent3
To represent:
  • Dates, e.g.:
    • Integer: # days since Jan 1, 1900
    • 8 chars: YYYYMMDD
    • 7 chars: YYYYDDD
    • 10 chars: YYYY-MM-DD (SQL2)

(not YYMMDD! Why?)

  • Time, e.g.
    • Integer: seconds since midnight
    • chars: HH:MM:SS[.FF…] (SQL2)

Notes 3: Representing Data

to represent4

c

c

a

a

t

t

To represent:
  • String of characters
    • Null terminated

e.g.,

    • Length given

e.g.,

- Fixed length

3

Notes 3: Representing Data

to represent5
To represent:
  • Bag of bits

Length

Bits

Notes 3: Representing Data

slide12

Key Point

  • Fixed length items
  • Variable length items
  • - usually length given at beginning

Notes 3: Representing Data

slide13

Also

  • Type of an item: Tells us how to

interpret

(plus size if fixed)

    • normally indicated in the record schema (see in a moment)

Notes 3: Representing Data

record collection of related fields
Record - Collection of related fields

E.g.: Employee record:

name field,

salary field,

date-of-hire field, ...

Notes 3: Representing Data

types of records
Types of records:
  • Main choices:
    • FIXED vs VARIABLE FORMAT
    • FIXED vs VARIABLE LENGTH

Notes 3: Representing Data

slide16

Fixed format

A SCHEMA (not record) contains

following information

- number of fields

- type of each field

- order in record

- meaning of each field

Notes 3: Representing Data

example fixed format and length
Example: fixed format and length

Employee record

(1) E#, 2 byte integer

(2) E.name, 10 char. Schema

(3) Dept, 2 byte code

46

F o r d

02

Records

83

J o n e s

01

Notes 3: Representing Data

slide18

Variable format

  • Record itself contains format;

“Self Describing”

Notes 3: Representing Data

example variable format and length

2

5

I

46

4

S

4

F

o

r

d

Example: variable format and length

# Fields

Code identifying

field as E#

Integer type

Code for Ename

String type

Length of str.

Field name codes could also be strings, i.e. tags ( XML as a data interchange format)

Notes 3: Representing Data

variable format useful for
Variable format useful for:
  • “sparse” records
    • e.g, patient records with thousands of possible tests
  • repeating fields
  • information integration from heterogeneous sources

Notes 3: Representing Data

slide21
EXAMPLE: var format record with

repeating fields

Employee  one or more children

3

E_name: Fred

Child: Sally

Child: Tom

Notes 3: Representing Data

slide22
Note: Repeating fields does not imply

- variable format, nor

- variable size

Fred

Sally

Tom

--

  • Key is to allocate maximum number of

repeating fields (if not used  NULL)

Notes 3: Representing Data

many variants between fixed and variable format
Many variants between fixed and variable format:

Ex. #1: Include record type in record

record type record length

tells me what

to expect

(i.e. points to schema)

5

27

. . . .

Notes 3: Representing Data

record header data at beginning that describes record
Record header - data at beginning that describes record

May contain:

- indication of record type

- record length

- time stamp

- other stuff ...

Notes 3: Representing Data

ex 2 of variant between fixed var format

E.g.: All employees have E#, name, dept;

Other fields vary.

25

Smith

Toy

2

Hobby:chess

retired

# of var

fields

Ex #2of variant between FIXED/VAR format
  • Hybrid format
    • one part is fixed, other variable

Notes 3: Representing Data

also many variations in internal organization of record
Also, many variations in internal organization of record

Just to show one: length of field

* * *

3

10

F1

5

F2

12

F3

total size

3

32

5

15

20

F1

F2

F3

0 1 2 3 4 5 15 20 32

offsets

Notes 3: Representing Data

next placing records into blocks

assume fixed

length blocks

assume a single relation (for now)

Next: placing records into blocks

blocks ...

a file

Notes 3: Representing Data

slide28

Issues in storing records in blocks:

(1) separating records

(2) spanned vs. unspanned

(3) mixed record types – clustering

(4) split records

(5) sequencing

(6) addressing records

Notes 3: Representing Data

1 separating records
(1) Separating records

R1

R2

R3

Block

(a) fixed size recs. -> no need to separate

(b) special marker

(c) give record lengths (or offsets)

- within each record

- in block header (see later)

Notes 3: Representing Data

2 spanned vs unspanned

R1

R2

R3

R4

R5

R1

R2

R3

(a)

R3

(b)

R4

R5

R6

R7

(a)

(2) Spanned vs. Unspanned
  • Unspanned: records are within one block

block 1 block 2

...

  • Spanned: records span block boundaries

block 1 block 2 ...

Notes 3: Representing Data

slide31

R1

R2

R3

(a)

R3

(b)

R4

R5

R6

R7

(a)

With spanned records:

need indication need indication

of partial record of continuation

“pointer” to rest (+ from where?)

Notes 3: Representing Data

slide32

Spanned vs. unspanned:

  • Unspanned is much simpler, but may waste space…
  • Spanned necessary if

record size > block size

(e.g., fields containing large "BLOB"s

for, say, MPEG video clips)

Notes 3: Representing Data

example of unspanned records

block 1 block 2

2050 bytes wasted 2046 2050 bytes wasted 2046

R1

R2

Example (of unspanned records)

106 records

each of size 2,050 bytes (fixed)

block size = 4096 bytes

  • Space used about 4 x 109 B, about half wasted

Notes 3: Representing Data

3 mixed record types
(3) Mixed record types
  • Mixed - records of different types

(e.g. DEPT, EMPLOYEE)

allowed in same block

e.g., a block:

Dep

d1

Emp

e1

Emp

e2

Notes 3: Representing Data

why do we want to mix answer clustering
Why do we want to mix? Answer: CLUSTERING

Records that are frequently

accessed together should be

in the same block

Notes 3: Representing Data

example
Example

Q1: select DEPT.Name, EMP.Name, …

from DEPT, EMP

where DEPT. Name =

EMP.DeptName

a block

DEPT,Name=Toy, ...

EMP,DeptName=Toy, ...

EMP,DeptName=Toy, ...

Notes 3: Representing Data

slide37
If Q1 frequent, clustering is good
  • But consider Q2:

SELECT *

FROM DEPT

If Q2 is frequent, clustering is counter- productive

Notes 3: Representing Data

4 split records
(4) Split records

Fixed part in

one block

Typically for

hybrid format

Variable part in

another block

Notes 3: Representing Data

slide39

R2 (a)

R2 (b)

R2 (c)

Blocks with variable recs.

Block with fixed recs.

R1 (a)

R1 (b)

Notes 3: Representing Data

5 sequencing
(5) Sequencing
  • Ordering records in file (and block) by some key value

Sequential file (  sequenced)

Notes 3: Representing Data

why sequencing
Why sequencing?

Typically to make it possible to efficiently read records in order

(e.g., to do a merge-join — discussed later)

Notes 3: Representing Data

sequencing options
Sequencing Options

(a) Next record physically contiguous

...

(b) Linked

R1

Next (R1)

R1

Next (R1)

Notes 3: Representing Data

sequencing options1

header

R2.1

R1.3

R4.7

Sequencing Options

(c) Overflow area

Records

in sequence

R1

R2

R3

R4

R5

Notes 3: Representing Data

6 addressing records

Many options:

Physical Indirect

(6) Addressing records
  • How does one refer to records?

Rx

Notes 3: Representing Data

purely physical
Purely Physical

Device ID

E.g., Record Cylinder #

Address = Track #

or ID Block #

Offset in block

Block ID

Notes 3: Representing Data

fully indirect
Fully Indirect

E.g., Record ID is arbitrary byte string (say, an object ID)

map table

rec ID

r address

a

Rec ID

Physical

addr.

Notes 3: Representing Data

tradeoff
Tradeoff

Flexibility Cost

to move records of indirection

(for deletions, insertions)

Notes 3: Representing Data

slide48
Physical Indirect

Many options

in between …

Notes 3: Representing Data

ex 1 indirection in block
Ex #1 Indirection in block

Header

A block: Free space

R4

R3

R2 R1

Notes 3: Representing Data

block header data at beginning that describes block
Block header - data at beginning that describes block

May contain:

- File ID (or RELATION or DB ID); the ID of this block - Record directory; Pointer to free space

- Type of block (e.g. contains records of type 4;

is overflow, …)

- Pointer to other blocks “like it” (say, if part of an index structure)

- Timestamp ...

Notes 3: Representing Data

slide51

Issues in storing records in blocks

(1) Separating records

(2) Spanned vs. Unspanned

(3) Mixed record types - Clustering

(4) Split records

(5) Sequencing

(6) Addressing records

here

Notes 3: Representing Data

slide52

Other Topics

(1) Insertion/Deletion

(2) Pointer Swizzling

(3) Comparison of Schemes

Notes 3: Representing Data

slide53

(1) Deletion

Block

Rx

Notes 3: Representing Data

options
Options:

(a) Immediately reclaim space

(b) Mark deleted

    • May need chain of deleted records

(for re-use)

  • If there are pointers to the record, either need to set them to NULL or indicate the record space reclaimed

Notes 3: Representing Data

as usual many tradeoffs
As usual, many tradeoffs...
  • How expensive is to move valid record to free space for immediate reclaim?
  • How much space is wasted?
    • e.g., deleted records, delete fields, free space chains,...

Notes 3: Representing Data

slide56

Concern with deletions

Dangling pointers

R1

?

Notes 3: Representing Data

a solution tombstones

with physical IDs:

A block

This space This space can

never re-used be re-used

A solution: Tombstones

(a) Leave “MARK” in old location

Notes 3: Representing Data

with logical ids

A Solution : Tombstones

with logical Ids:

(b) Leave “MARK” in map table

Map table

ID

LOC

Never reuse

ID 7788 nor

space in map...

7788

Notes 3: Representing Data

slide59

(1) Insert

Easy case: records not in sequence

 Insert new record at end of file, or in any block with space for it

Notes 3: Representing Data

slide60

Insert

Hard case: records in sequence

 If free space “close by”, not too bad…

- can "slide" records to preceeding/suceeding blocks

- pointers to moved records require forwarding addresses

 Or use additional blocks as an overflow area

Notes 3: Representing Data

interesting problems
Interesting problems:
  • How much free space to leave in each block, track, cylinder?
  • How often to reorganize file + overflow?

Notes 3: Representing Data

2 pointer swizzling
(2) Pointer Swizzling
  • Two kinds of record addresses:
  • DB address
    • physical location on secondary storage
  • Memory address
    • record location when loaded into (main or virtual) memory
  • Which one to use, and when?

Notes 3: Representing Data

pointer swizzling

block 2

RecA

Pointer Swizzling

Memory Disk

block 1

block 1

block 2

RecA

Notes 3: Representing Data

slide64

One Option:

Translation DB Addr Mem Addr

Table Rec-A Rec-A-inMem

  • For all records copied to memory

Notes 3: Representing Data

in memory pointers need type bit
In memory pointers - need “type” bit

Another Option:

to disk

to memory

M

Notes 3: Representing Data

swizzling
Swizzling
  • Automatic ("eager")
  • On-demand ("lazy")
  • No swizzling / program control

Notes 3: Representing Data

slide67

Comparison

  • There are about 10,000,000 ways to organize my data on disk…

Which one is right for me?

Notes 3: Representing Data

issues
Issues:

Flexibility Space Utilization

Complexity Performance

Notes 3: Representing Data

slide69
To evaluate a given strategy, estimate following parameters:

-> space used for expected data

-> expected time to

- fetch record given its key

- fetch record with next key

- insert/delete/update record

- scan all records in the file

- reorganize file

Notes 3: Representing Data

example1
Example

How would you design Megatron 3000 storage system? (for a relational DB, low end)

  • Variable length records?
  • Spanned?
  • What data types?
  • Fixed format?
  • Record IDs?
  • Sequencing?
  • How to handle deletions?

Notes 3: Representing Data

slide71

Summary

  • How to lay out data on disk

Data Items

Records

Blocks

Files

Memory

DBMS

Notes 3: Representing Data

slide72

Next

How to find a record quickly,

given a key

Notes 3: Representing Data