Physical database design
This presentation is the property of its rightful owner.
Sponsored Links
1 / 36

Physical Database Design PowerPoint PPT Presentation


  • 80 Views
  • Uploaded on
  • Presentation posted in: General

Physical Database Design. University of California, Berkeley School of Information Management and Systems SIMS 257: Database Management. Review. Normalization Denormalization. Normalization. Unnormalized Relations. First normal form.

Download Presentation

Physical Database Design

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Physical database design

Physical Database Design

University of California, Berkeley

School of Information Management and Systems

SIMS 257: Database Management

SIMS 257: Database Management


Review

Review

  • Normalization

  • Denormalization

SIMS 257: Database Management


Normalization

Normalization

Unnormalized Relations

First normal form

Functional dependencyof nonkey attributes on the primary key - Atomic values only

Second normal form

No transitive dependency between nonkey attributes

Third normal form

Boyce-

Codd and

Higher

Full Functional dependencyof nonkey attributes on the primary key

All determinants are candidate keys - Single multivalued dependency

SIMS 257: Database Management


Normalization1

Normalization

  • Normalization is performed to reduce or eliminate Insertion, Deletion or Update anomalies.

  • However, a completely normalized database may not be the most efficient or effective implementation.

  • “Denormalization” is sometimes used to improve efficiency.

SIMS 257: Database Management


Today

Physical Database Design

Access Methods

Indexes

Today

Based on McFadden Modern Database Management

and Atre Database:Structured Techniques for Design,

Performance and Management

SIMS 257: Database Management


Database design process

Database Design Process

Application 1

Application 2

Application 3

Application 4

External

Model

External

Model

External

Model

External

Model

Application 1

Conceptual

requirements

Application 2

Conceptual

requirements

Conceptual

Model

Logical

Model

Internal Model

Application 3

Conceptual

requirements

Application 4

Physical

Design

Conceptual

requirements

SIMS 257: Database Management


Physical database design1

Physical Database Design

  • Many physical database design decisions are implicit in the technology adopted

    • Also, organizations may have standards or an “information architecture” that specifies operating systems, DBMS, and data access languages -- thus constraining the range of possible physical implementations.

  • We will be concerned with some of the possible physical implementation issues

SIMS 257: Database Management


Physical database design2

Physical Database Design

  • The primary goal of physical database design is data processing efficiency

  • We will concentrate on choices often available to optimize performance of database services

  • Physical Database Design requires information gathered during earlier stages of the design process

SIMS 257: Database Management


Physical design information

Physical Design Information

  • Information needed for physical file and database design includes:

    • Normalized relations plus size estimates for them

    • Definitions of each attribute

    • Descriptions of where and when data are used

      • entered, retrieved, deleted, updated, and how often

    • Expectations and requirements for response time, and data security, backup, recovery, retention and integrity

    • Descriptions of the technologies used to implement the database

SIMS 257: Database Management


Physical design decisions

Physical Design Decisions

  • There are several critical decisions that will affect the integrity and performance of the system.

    • Storage Format

    • Physical record composition

    • Data arrangement

    • Indexes

    • Query optimization and performance tuning

SIMS 257: Database Management


Storage format

Storage Format

  • Choosing the storage format of each field (attribute). The DBMS provides some set of data types that can be used for the physical storage of fields in the database

  • Data Type (format) is chosen to minimize storage space and maximize data integrity

SIMS 257: Database Management


Objectives of data type selection

Objectives of data type selection

  • Minimize storage space

  • Represent all possible values

  • Improve data integrity

  • Support all data manipulations

  • The correct data type should, in minimal space, represent every possible value (but eliminated illegal values) for the associated attribute and can support the required data manipulations (e.g. numerical or string operations)

SIMS 257: Database Management


Access data types

Access Data Types

  • Numeric (1, 2, 4, 8 bytes, fixed or float)

  • Text (255 max)

  • Memo (64000 max)

  • Date/Time (8 bytes)

  • Currency (8 bytes, 15 digits + 4 digits decimal)

  • Autonumber (4 bytes)

  • Yes/No (1 bit)

  • OLE (limited only by disk space)

  • Hyperlinks (up to 64000 chars)

SIMS 257: Database Management


Access numeric types

Access Numeric types

  • Byte

    • Stores numbers from 0 to 255 (no fractions). 1 byte

  • Integer

    • Stores numbers from –32,768 to 32,767 (no fractions) 2 bytes

  • Long Integer(Default)

    • Stores numbers from –2,147,483,648 to 2,147,483,647 (no fractions). 4 bytes

  • Single

    • Stores numbers from -3.402823E38 to –1.401298E–45 for negative values and from 1.401298E–45 to 3.402823E38 for positive values.4 bytes

  • Double

    • Stores numbers from –1.79769313486231E308 to –4.94065645841247E–324 for negative values and from 1.79769313486231E308 to 4.94065645841247E–324 for positive values.158 bytes

  • Replication ID

    • Globally unique identifier (GUID)N/A16 bytes

  • SIMS 257: Database Management


    Controlling data integrity

    Controlling Data Integrity

    • Default values

    • Range control

    • Null value control

    • Referential integrity

    • Handling missing data

    SIMS 257: Database Management


    Designing physical records

    Designing Physical Records

    • A physical record is a group of fields stored in adjacent memory locations and retrieved together as a unit

    • Fixed Length and variable fields

    • Denormalization

    SIMS 257: Database Management


    Denormalization

    Denormalization

    • Usually driven by the need to improve query speed

    • Query speed is improved at the expense of more complex or problematic DML (Data manipulation language) for updates, deletions and insertions.

    SIMS 257: Database Management


    Downward denormalization

    Customer

    ID

    Address

    Name

    Telephone

    Order

    Order No

    Date Taken

    Date Dispatched

    Date Invoiced

    Cust ID

    Order

    Order No

    Date Taken

    Date Dispatched

    Date Invoiced

    Cust ID

    Cust Name

    Customer

    ID

    Address

    Name

    Telephone

    Downward Denormalization

    Before:

    After:

    SIMS 257: Database Management


    Upward denormalization

    Order

    Order No

    Date Taken

    Date Dispatched

    Date Invoiced

    Cust ID

    Cust Name

    Order

    Order No

    Date Taken

    Date Dispatched

    Date Invoiced

    Cust ID

    Cust Name

    Order Price

    Order Item

    Order No

    Item No

    Item Price

    Num Ordered

    Order Item

    Order No

    Item No

    Item Price

    Num Ordered

    Upward Denormalization

    SIMS 257: Database Management


    Designing physical files

    Designing Physical Files

    • Overview

    • terminology

    • Access methods

    SIMS 257: Database Management


    Physical design

    DBMS

    External Model

    Internal Model

    Access Methods

    Physical Design

    • Internal Model/Physical Model

    User request

    Interface 1

    Interface 2

    Operating

    System

    Access Methods

    Interface 3

    Data

    Base

    SIMS 257: Database Management


    Physical design1

    Physical Design

    • Interface 1: User request to the DBMS. The user presents a query, the DBMS determines which physical DBs are needed to resolve the query

    • Interface 2: The DBMS uses an internal model access method to access the data stored in a logical database.

    • Interface 3: The internal model access methods and OS access methods access the physical records of the database.

    SIMS 257: Database Management


    Physical file design

    Physical File Design

    • A Physical file is a portion of secondary storage (disk space) allocated for the purpose of storing physical records

    • Pointers - a field of data that can be used to locate a related field or record of data

    • Access Methods - An operating system algorithm for storing and locating data in secondary storage

    • Pages - The amount of data read or written in one disk input or output operation

    SIMS 257: Database Management


    Internal model access methods

    Internal Model Access Methods

    • Many types of access methods:

      • Physical Sequential

      • Indexed Sequential

      • Indexed Random

      • Inverted

      • Direct

      • Hashed

    • Differences in

      • Access Efficiency

      • Storage Efficiency

    SIMS 257: Database Management


    Physical sequential

    Physical Sequential

    • Key values of the physical records are in logical sequence

    • Main use is for “dump” and “restore”

    • Access method may be used for storage as well as retrieval

    • Storage Efficiency is near 100%

    • Access Efficiency is poor

    SIMS 257: Database Management


    Indexed sequential

    Indexed Sequential

    • Key values of the physical records are in logical sequence

    • Access method may be used for storage and retrieval

    • Index of key values is maintained with entries for the highest key values per block(s)

    • Access Efficiency depends on the levels of index, storage allocated for index, number of database records, and amount of overflow

    • Storage Efficiency depends on size of index and volatility of database

    SIMS 257: Database Management


    Index sequential

    Adams

    Becker

    Dumpling

    Actual

    Value

    Address

    Block

    Number

    Dumpling

    Harty

    Texaci

    ...

    1

    2

    3

    Getta

    Harty

    Mobile

    Sunoci

    Texaci

    Index Sequential

    Data File

    Block 1

    Block 2

    Block 3

    SIMS 257: Database Management


    Indexed sequential two levels

    Key

    Value

    Address

    Key

    Value

    Key

    Value

    Key

    Value

    Address

    Address

    Address

    785

    805

    536

    678

    150

    385

    385

    678

    805

    3

    4

    5

    6

    7

    8

    9

    1

    2

    Indexed Sequential: Two Levels

    001

    003

    .

    .

    150

    251

    .

    .

    385

    455

    480

    .

    .

    536

    605

    610

    .

    .

    678

    705

    710

    .

    .

    785

    791

    .

    .

    805

    SIMS 257: Database Management


    Indexed random

    Indexed Random

    • Key values of the physical records are not necessarily in logical sequence

    • Index may be stored and accessed with Indexed Sequential Access Method

    • Index has an entry for every data base record. These are in ascending order. The index keys are in logical sequence. Database records are not necessarily in ascending sequence.

    • Access method may be used for storage and retrieval

    SIMS 257: Database Management


    Indexed random1

    Indexed Random

    Becker

    Harty

    Actual

    Value

    Address

    Block

    Number

    Adams

    Becker

    Dumpling

    Getta

    Harty

    2

    1

    3

    2

    1

    Adams

    Getta

    Dumpling

    SIMS 257: Database Management


    Btree

    Btree

    F | | P | | Z |

    B | | D | | F |

    H | | L | | P |

    R | | S | | Z |

    Devils

    Hawkeyes

    Hoosiers

    Minors

    Panthers

    Seminoles

    Aces

    Boilers

    Cars

    Flyers

    SIMS 257: Database Management


    Inverted

    Inverted

    • Key values of the physical records are not necessarily in logical sequence

    • Access Method is better used for retrieval

    • An index for every field to be inverted may be built

    • Access efficiency depends on number of database records, levels of index, and storage allocated for index

    SIMS 257: Database Management


    Inverted1

    Student

    name

    Course

    Number

    CH 145

    101, 103,104

    Adams

    Becker

    Dumpling

    Getta

    Harty

    Mobile

    Actual

    Value

    Address

    Block

    Number

    CH 145

    CS 201

    CS 623

    PH 345

    1

    2

    3

    CS 201

    102

    CS 623

    105, 106

    Inverted

    CH145

    cs201

    ch145

    ch145

    cs623

    cs623

    SIMS 257: Database Management


    Direct

    Direct

    • Key values of the physical records are not necessarily in logical sequence

    • There is a one-to-one correspondence between a record key and the physical address of the record

    • May be used for storage and retrieval

    • Access efficiency always 1

    • Storage efficiency depends on density of keys

    • No duplicate keys permitted

    SIMS 257: Database Management


    Hashing

    Hashing

    • Key values of the physical records are not necessarily in logical sequence

    • Many key values may share the same physical address (block)

    • May be used for storage and retrieval

    • Access efficiency depends on distribution of keys, algorithm for key transformation and space allocated

    • Storage efficiency depends on distibution of keys and algorithm used for key transformation

    SIMS 257: Database Management


    Comparative access methods

    Comparative Access Methods

    Factor

    Storage space

    Sequential

    retrieval on

    primary key

    Random Retr.

    Multiple Key

    Retr.

    Deleting records

    Adding records

    Updating records

    Sequential

    No wasted space

    Very fast

    Impractical

    Possible but needs

    a full scan

    can create wasted

    space

    requires rewriting

    file

    usually requires

    rewriting file

    Indexed

    No wasted

    space for data

    but extra space for index

    Moderately Fast

    Moderately Fast

    Very fast with

    multiple indexes

    OK if dynamic

    OK if dynamic

    Easy but requires

    Maintenance of

    indexes

    Hashed

    more space needed for

    addition and deletion of

    records after initial load

    Impractical

    Very fast

    Not possible

    very easy

    very easy

    very easy

    SIMS 257: Database Management


  • Login