physical database design n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Physical Database Design PowerPoint Presentation
Download Presentation
Physical Database Design

Loading in 2 Seconds...

play fullscreen
1 / 36

Physical Database Design - PowerPoint PPT Presentation


  • 143 Views
  • Uploaded on

Physical Database Design. University of California, Berkeley School of Information Management and Systems SIMS 257: Database Management. Review . Normalization Denormalization. Normalization. Unnormalized Relations. First normal form.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Physical Database Design' - nemesio


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
physical database design

Physical Database Design

University of California, Berkeley

School of Information Management and Systems

SIMS 257: Database Management

SIMS 257: Database Management

review
Review
  • Normalization
  • Denormalization

SIMS 257: Database Management

normalization
Normalization

Unnormalized Relations

First normal form

Functional dependencyof nonkey attributes on the primary key - Atomic values only

Second normal form

No transitive dependency between nonkey attributes

Third normal form

Boyce-

Codd and

Higher

Full Functional dependencyof nonkey attributes on the primary key

All determinants are candidate keys - Single multivalued dependency

SIMS 257: Database Management

normalization1
Normalization
  • Normalization is performed to reduce or eliminate Insertion, Deletion or Update anomalies.
  • However, a completely normalized database may not be the most efficient or effective implementation.
  • “Denormalization” is sometimes used to improve efficiency.

SIMS 257: Database Management

today
Physical Database Design

Access Methods

Indexes

Today

Based on McFadden Modern Database Management

and Atre Database:Structured Techniques for Design,

Performance and Management

SIMS 257: Database Management

database design process
Database Design Process

Application 1

Application 2

Application 3

Application 4

External

Model

External

Model

External

Model

External

Model

Application 1

Conceptual

requirements

Application 2

Conceptual

requirements

Conceptual

Model

Logical

Model

Internal Model

Application 3

Conceptual

requirements

Application 4

Physical

Design

Conceptual

requirements

SIMS 257: Database Management

physical database design1
Physical Database Design
  • Many physical database design decisions are implicit in the technology adopted
    • Also, organizations may have standards or an “information architecture” that specifies operating systems, DBMS, and data access languages -- thus constraining the range of possible physical implementations.
  • We will be concerned with some of the possible physical implementation issues

SIMS 257: Database Management

physical database design2
Physical Database Design
  • The primary goal of physical database design is data processing efficiency
  • We will concentrate on choices often available to optimize performance of database services
  • Physical Database Design requires information gathered during earlier stages of the design process

SIMS 257: Database Management

physical design information
Physical Design Information
  • Information needed for physical file and database design includes:
    • Normalized relations plus size estimates for them
    • Definitions of each attribute
    • Descriptions of where and when data are used
      • entered, retrieved, deleted, updated, and how often
    • Expectations and requirements for response time, and data security, backup, recovery, retention and integrity
    • Descriptions of the technologies used to implement the database

SIMS 257: Database Management

physical design decisions
Physical Design Decisions
  • There are several critical decisions that will affect the integrity and performance of the system.
    • Storage Format
    • Physical record composition
    • Data arrangement
    • Indexes
    • Query optimization and performance tuning

SIMS 257: Database Management

storage format
Storage Format
  • Choosing the storage format of each field (attribute). The DBMS provides some set of data types that can be used for the physical storage of fields in the database
  • Data Type (format) is chosen to minimize storage space and maximize data integrity

SIMS 257: Database Management

objectives of data type selection
Objectives of data type selection
  • Minimize storage space
  • Represent all possible values
  • Improve data integrity
  • Support all data manipulations
  • The correct data type should, in minimal space, represent every possible value (but eliminated illegal values) for the associated attribute and can support the required data manipulations (e.g. numerical or string operations)

SIMS 257: Database Management

access data types
Access Data Types
  • Numeric (1, 2, 4, 8 bytes, fixed or float)
  • Text (255 max)
  • Memo (64000 max)
  • Date/Time (8 bytes)
  • Currency (8 bytes, 15 digits + 4 digits decimal)
  • Autonumber (4 bytes)
  • Yes/No (1 bit)
  • OLE (limited only by disk space)
  • Hyperlinks (up to 64000 chars)

SIMS 257: Database Management

access numeric types
Access Numeric types
  • Byte
        • Stores numbers from 0 to 255 (no fractions). 1 byte
  • Integer
        • Stores numbers from –32,768 to 32,767 (no fractions) 2 bytes
  • Long Integer(Default)
        • Stores numbers from –2,147,483,648 to 2,147,483,647 (no fractions). 4 bytes
  • Single
        • Stores numbers from -3.402823E38 to –1.401298E–45 for negative values and from 1.401298E–45 to 3.402823E38 for positive values. 4 bytes
  • Double
        • Stores numbers from –1.79769313486231E308 to –4.94065645841247E–324 for negative values and from 1.79769313486231E308 to 4.94065645841247E–324 for positive values. 15 8 bytes
  • Replication ID
        • Globally unique identifier (GUID) N/A 16 bytes

SIMS 257: Database Management

controlling data integrity
Controlling Data Integrity
  • Default values
  • Range control
  • Null value control
  • Referential integrity
  • Handling missing data

SIMS 257: Database Management

designing physical records
Designing Physical Records
  • A physical record is a group of fields stored in adjacent memory locations and retrieved together as a unit
  • Fixed Length and variable fields
  • Denormalization

SIMS 257: Database Management

denormalization
Denormalization
  • Usually driven by the need to improve query speed
  • Query speed is improved at the expense of more complex or problematic DML (Data manipulation language) for updates, deletions and insertions.

SIMS 257: Database Management

downward denormalization

Customer

ID

Address

Name

Telephone

Order

Order No

Date Taken

Date Dispatched

Date Invoiced

Cust ID

Order

Order No

Date Taken

Date Dispatched

Date Invoiced

Cust ID

Cust Name

Customer

ID

Address

Name

Telephone

Downward Denormalization

Before:

After:

SIMS 257: Database Management

upward denormalization

Order

Order No

Date Taken

Date Dispatched

Date Invoiced

Cust ID

Cust Name

Order

Order No

Date Taken

Date Dispatched

Date Invoiced

Cust ID

Cust Name

Order Price

Order Item

Order No

Item No

Item Price

Num Ordered

Order Item

Order No

Item No

Item Price

Num Ordered

Upward Denormalization

SIMS 257: Database Management

designing physical files
Designing Physical Files
  • Overview
  • terminology
  • Access methods

SIMS 257: Database Management

physical design

DBMS

External Model

Internal Model

Access Methods

Physical Design
  • Internal Model/Physical Model

User request

Interface 1

Interface 2

Operating

System

Access Methods

Interface 3

Data

Base

SIMS 257: Database Management

physical design1
Physical Design
  • Interface 1: User request to the DBMS. The user presents a query, the DBMS determines which physical DBs are needed to resolve the query
  • Interface 2: The DBMS uses an internal model access method to access the data stored in a logical database.
  • Interface 3: The internal model access methods and OS access methods access the physical records of the database.

SIMS 257: Database Management

physical file design
Physical File Design
  • A Physical file is a portion of secondary storage (disk space) allocated for the purpose of storing physical records
  • Pointers - a field of data that can be used to locate a related field or record of data
  • Access Methods - An operating system algorithm for storing and locating data in secondary storage
  • Pages - The amount of data read or written in one disk input or output operation

SIMS 257: Database Management

internal model access methods
Internal Model Access Methods
  • Many types of access methods:
    • Physical Sequential
    • Indexed Sequential
    • Indexed Random
    • Inverted
    • Direct
    • Hashed
  • Differences in
    • Access Efficiency
    • Storage Efficiency

SIMS 257: Database Management

physical sequential
Physical Sequential
  • Key values of the physical records are in logical sequence
  • Main use is for “dump” and “restore”
  • Access method may be used for storage as well as retrieval
  • Storage Efficiency is near 100%
  • Access Efficiency is poor

SIMS 257: Database Management

indexed sequential
Indexed Sequential
  • Key values of the physical records are in logical sequence
  • Access method may be used for storage and retrieval
  • Index of key values is maintained with entries for the highest key values per block(s)
  • Access Efficiency depends on the levels of index, storage allocated for index, number of database records, and amount of overflow
  • Storage Efficiency depends on size of index and volatility of database

SIMS 257: Database Management

index sequential

Adams

Becker

Dumpling

Actual

Value

Address

Block

Number

Dumpling

Harty

Texaci

...

1

2

3

Getta

Harty

Mobile

Sunoci

Texaci

Index Sequential

Data File

Block 1

Block 2

Block 3

SIMS 257: Database Management

indexed sequential two levels

Key

Value

Address

Key

Value

Key

Value

Key

Value

Address

Address

Address

785

805

536

678

150

385

385

678

805

3

4

5

6

7

8

9

1

2

Indexed Sequential: Two Levels

001

003

.

.

150

251

.

.

385

455

480

.

.

536

605

610

.

.

678

705

710

.

.

785

791

.

.

805

SIMS 257: Database Management

indexed random
Indexed Random
  • Key values of the physical records are not necessarily in logical sequence
  • Index may be stored and accessed with Indexed Sequential Access Method
  • Index has an entry for every data base record. These are in ascending order. The index keys are in logical sequence. Database records are not necessarily in ascending sequence.
  • Access method may be used for storage and retrieval

SIMS 257: Database Management

indexed random1
Indexed Random

Becker

Harty

Actual

Value

Address

Block

Number

Adams

Becker

Dumpling

Getta

Harty

2

1

3

2

1

Adams

Getta

Dumpling

SIMS 257: Database Management

btree
Btree

F | | P | | Z |

B | | D | | F |

H | | L | | P |

R | | S | | Z |

Devils

Hawkeyes

Hoosiers

Minors

Panthers

Seminoles

Aces

Boilers

Cars

Flyers

SIMS 257: Database Management

inverted
Inverted
  • Key values of the physical records are not necessarily in logical sequence
  • Access Method is better used for retrieval
  • An index for every field to be inverted may be built
  • Access efficiency depends on number of database records, levels of index, and storage allocated for index

SIMS 257: Database Management

inverted1

Student

name

Course

Number

CH 145

101, 103,104

Adams

Becker

Dumpling

Getta

Harty

Mobile

Actual

Value

Address

Block

Number

CH 145

CS 201

CS 623

PH 345

1

2

3

CS 201

102

CS 623

105, 106

Inverted

CH145

cs201

ch145

ch145

cs623

cs623

SIMS 257: Database Management

direct
Direct
  • Key values of the physical records are not necessarily in logical sequence
  • There is a one-to-one correspondence between a record key and the physical address of the record
  • May be used for storage and retrieval
  • Access efficiency always 1
  • Storage efficiency depends on density of keys
  • No duplicate keys permitted

SIMS 257: Database Management

hashing
Hashing
  • Key values of the physical records are not necessarily in logical sequence
  • Many key values may share the same physical address (block)
  • May be used for storage and retrieval
  • Access efficiency depends on distribution of keys, algorithm for key transformation and space allocated
  • Storage efficiency depends on distibution of keys and algorithm used for key transformation

SIMS 257: Database Management

comparative access methods
Comparative Access Methods

Factor

Storage space

Sequential

retrieval on

primary key

Random Retr.

Multiple Key

Retr.

Deleting records

Adding records

Updating records

Sequential

No wasted space

Very fast

Impractical

Possible but needs

a full scan

can create wasted

space

requires rewriting

file

usually requires

rewriting file

Indexed

No wasted

space for data

but extra space for index

Moderately Fast

Moderately Fast

Very fast with

multiple indexes

OK if dynamic

OK if dynamic

Easy but requires

Maintenance of

indexes

Hashed

more space needed for

addition and deletion of

records after initial load

Impractical

Very fast

Not possible

very easy

very easy

very easy

SIMS 257: Database Management