Dr alwyn barry dr joanna bryson
This presentation is the property of its rightful owner.
Sponsored Links
1 / 33

CM20145 Further DB Design – Normalization PowerPoint PPT Presentation


  • 56 Views
  • Uploaded on
  • Presentation posted in: General

Dr Alwyn Barry Dr Joanna Bryson. CM20145 Further DB Design – Normalization. Last Time. Database design is an ongoing, iterative process. Requirements come from data, user demands, design issues. Change occurs: Corporations & technologies grow. Programmers & users learn. Views / security.

Download Presentation

CM20145 Further DB Design – Normalization

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Dr alwyn barry dr joanna bryson

Dr Alwyn Barry

Dr Joanna Bryson

CM20145Further DB Design –Normalization


Last time

Last Time

  • Database design is an ongoing, iterative process.

    • Requirements come from data, user demands, design issues.

    • Change occurs:

      • Corporations & technologies grow.

      • Programmers & users learn.

  • Views / security.

  • Lossless-join decomposition

    Now: Science for improving design.


Design process normalization

Design Process & Normalization

  • We assume a schema R is given.

    • R could have been generated when converting E-R diagram to a set of tables.

    • R could have been a single relation containing all attributes that are of interest (called universal relation).

  • Normalization breaks R into smaller relations.

    R could be the result of any ad hoc design of relations, which we then test & convert to normal form.


Overview

Overview

  • First Normal Form.

  • Functional Dependencies.

  • Second Normal Form.

  • Third Normal Form.

  • Boyce-Codd Normal Form.

  • Fourth Normal Form.

  • Fifth Normal Form.

  • Domain Key / Normal Form.

  • Design Process & Problems.


First normal form 1nf

First Normal Form – 1NF

  • You aren’t supposed to have more than one value per attribute of a tuple.

  • All tuples have the same number of attributes.

  • Necessary for a relational database.

BAD


Getting caught out with 1nf

Getting Caught Out With 1NF

  • A domain is atomic if its elements are considered to be indivisible units.

    • Examples of non-atomic domains:

      • Set-valued attributes, composite attributes.

      • Identifiers like CS101 that can be broken up into parts.

  • A relational schema R is in first normal form if the domains of all attributes of R are atomic.

  • Non-atomic values:

    • complicate storage,

    • encourage redundancy,

    • Depend on interpretation built into application programs.


Are you atomic

Are You Atomic?

  • Atomicity is not an intrinsic property of the elements of the domain.

  • Atomicity is a property of how the elements of the domain are used.

    • E.g. strings containing a possible delimiter (here: a space)

      • cities = “Melbourne Sydney” (non-atomic: space separated list)

      • surname = “Fortescue Smythe” (atomic: compound surname)

    • E.g. strings encoding two separate fields

      • bucs_login = cssjjb

      • If the first two characters are extracted to find the department, the domain bucs_login is not atomic.

      • Leads to encoding of information in application program rather than in the database.


Second normal form 2nf

Second Normal Form (2NF)

  • Violated when a nonkey column is a fact about part of the primary key.

  • A column is not fully functionally dependent on the full primary key.

    • CUSTOMER-CREDIT in this case:

From Watson

BAD

FIX

ITEM

*itemid

ORDER

quantity

CUSTOMER

*customerid

customer-credit


Def functional dependency

Def: Functional Dependency

  • Let R be a relation schema

      R and   R

  • The functional dependency (FD)  holds on R (“ is FD on ”) iff for any legal relations r(R):

    • whenever any two tuples t1and t2 of r agree on the attributes 

    • they also agree on the attributes .

    • i.e. (t1) = (t2)   (t1) =  (t2)

  • Example: Consider r(A,B) with the following instance of r:

  • AB does NOT hold, but BA does hold


Functional dependencies uses

Functional Dependencies: Uses

  • Way to encode “business rules”.

  • Specify constraints on the set of legal relations.

    • We say that Fholds onR if all legal relations on R satisfy the set of FDs F.

  • Test relations to see if they are legal under a given set of FDs.

    • If a relation r is legal under a set F of FDs, we say that rsatisfies F.


Functional dependencies

Functional Dependencies

  • An FD is an assertion about a schema, not an instance.

  • If we only consider an instance or a few instances, we can’t tell if an FD holds.

    • Inspecting only a few bird species (e.g. crows, cardinals and canaries) we might conclude colour  species.

    • However, this would be a bad FD as we would find out if we found some ravens.

  • Thus, identifying FDs is part of the data modelling process.


Trivial functional dependencies

Trivial Functional Dependencies

  • An FD is trivial if it is satisfied by all instances of a relation

    • E.g.

      • customer-name, loan-number customer-name

      • customer-name customer-name

    • In general,   is trivial if   

  • Permitting such FDs makes certain definitions and algorithms easier to state.


Functional dependency vs key

Functional Dependency vs Key

  • FDs can express the same constraints we could express using keys:

  • Superkeys:

    • K is a superkey for relation schema R if and only if K R

  • Candidate keys:

    • K is a candidate key for R if and only if

      • K R, and

      • there is no K’  K such that K’ R

  • Of course, which candidate key becomes the primary key is arbitrary.


Fds keys

FDs <> Keys

  • FDs can represent more information than keys can on their own.

  • Consider the following Loan-info-schema:

    Loan-info-schema = (customer-name, loan-number,branch-name, amount).

    We expect these FDs to hold:

    loan-numberamountloan-number  branch-name

    We could try to express this by making loan-number the key,however the following FD does not hold:

    loan-number customer-name

  • Incidentally, this isn’t a very good table! (¬2NF)


Fd closure

FD Closure

  • Given a set F of FDs, other FDs are logically implied.

    • E.g. If A  B and B  C, we can infer that A  C

  • The set of all FDs implied by F is the closure of F, written F+ .

  • Find F+by applying Armstrong’s Axioms:

    • if   , then   (reflexivity)

    • if  , then    (augmentation)

    • if  , and   , then   (transitivity)

  • Additional rules (derivable from Armstrong’s Axioms):

    • If   and  holds, then    holds (union)

    • If    holds, then   holds and  holds (decomposition)

    • If   holds and   holds, then   holds (pseudotransitivity)


Bad decomposition example from last time

Bad Decomposition Example(From Last Time)

  • A Non Lossless-Join Decomposition

    R = (A, B) R1 = (A), R2 = (B)

A

B

A

B

A

B

1

2

1

2

1

2

1

1

2

B(r)

A(r)

r

A (r) ⋈ B (r)

  • Thus, r is different to A (r) ⋈ B (r)

  • So A,B is not a lossless-join decomposition of R.


Fds lossless decomposition

FDs & Lossless Decomposition

  • All attributes of an original schema (R) must appear in the decomposition (R1, R2):

    R = R1  R2

  • Lossless-join decomposition.For all possible relations r on schema R

    r = R1 (r) ⋈ R2 (r)

  • A decomposition of R into R1 and R2 is lossless-join if and only if at least one of the following dependencies is in F+:

    • R1 R2R1

    • R1 R2R2


Second normal form 2nf1

Second Normal Form (2NF)

  • Violated when a nonkey column is a fact about part of the primary key.

  • A column is not fully functionally dependent on the full primary key.

    • CUSTOMER-CREDIT in this case:

From Watson

BAD

FIX

ITEM

*itemid

ORDER

quantity

CUSTOMER

*customerid

customer-credit


Third normal form 3nf

Third Normal Form (3NF)

  • Violated when a nonkey column is a fact about another nonkey column.

  • A column is not fully functionally dependent on the primary key.

  • R is 3NF iff R is 2NF and has no transitive dependencies.

    • EXCHANGE RATE violates this.

FIX

BAD


Boyce codd bcnf

Boyce-Codd (BCNF)

  • Arises when a table:

    • has multiple candidate keys,

    • the candidate keys are composite,

    • the candidate keys overlap.

  • R is BCNF iff every determinant is a cand. key.

    • E.g. Assume one consultant per problem per client, and one problem per consultant.

    • If client-problem is the primary key, how do you add a new consultant?

  • Like 3NF but now worry about all fields.

BAD

FIX


Design goals their discontents

Design Goals & their discontents

  • Goals for a relational database design:

    • eliminate redundancies by decomposing relations,

    • must be able to recover original data using lossless joins,

    • prefer not to loose dependencies.

  • BCNF:

    • no redundancies,

    • no guarantee of dependency preservation.

  • 3NF:

    • dependency preservation,

    • but possible redundancies.


Fourth normal form 4nf

Fourth normal form (4NF)

  • A row should not contain two or more independent multivalued facts.

  • 4NF iff BCNF & no non-trivial multi-valued dependencies.

  • Multivalued dependency means the value of one attributed determines a set of values for another.

BAD

FIX


Fifth normal form 5nf

Fifth normal form (5NF)

  • 5NF iff a relation has no join dependency.

  • The schemas R1, R2,.., Rn have a join dependency over R if they define a lossless-join decomposition over R.

  • This is way too complicated, don’t worry about it.


Domain key normal form

Domain Key Normal Form

  • Every constraint on the relation must be a logical consequence of the domain constraints and the key constraints that apply to the relation.

    • Key: unique identifier.

    • Constraint: rule governing attribute values.

    • Domain: set of values of the same data type.

  • No known algorithm gives DK/NF.


E r model and normalization

E-R Model and Normalization

  • When an E-R diagram is carefully designed, identifying all entities correctly, the tables generated should not need further normalization.

  • However, in a real (imperfect) design there can be FDs from non-key attributes of an entity to other attributes of the entity.

  • The keys identified in E-R diagrams might not be minimal - FDs can help us to identify minimal keys.

  • FDs from non-key attributes of a relationship set are possible, but rare.


Denormalization performance

Denormalization & Performance

  • May want to use non-normalized schema for performance.

    • E.g. displaying customer-name along with account-number and balance requires join of account with depositor.

  • Alternative 1: Use denormalized relation containing attributes of account as well as depositor.

    • faster lookup.

    • extra space and extra execution time for updates.

    • extra coding work for programmer and possibility of error in extra code.

  • Alternative 2: use a materialized view defined as account ⋈ depositor

    • as above, except less extra coding, errors.


Limits of normalization

Limits of Normalization

  • Examples of bad database design, not caught by normalization.

  • Good:

    • earnings(company-id, year, amount)

  • Bad:

    • earnings-2000, earnings-2001, earnings-2002, etc., on (company-id, earnings)

      • all are BCNF, but querying across years difficult

      • needs a new table each year

    • company-year(company-id, earnings-2000,earnings-2001, earnings-2002)

      • in BCNF, but querying across years difficult

      • requires new attribute each year


Summary 1 rules to watch

Summary 1 – Rules to Watch

  • 1NF: attributes not atomic.

  • 2NF: non-key attribute FD on part of key.

  • 3NF: one non-key attribute FD on another.

  • Boyce-Codd NF: overlapping but otherwise independent candidate keys.

  • 4NF: multiple, independent multi-valued attributes.

  • 5NF: join dependency.

  • Domain Key / NF: all constraints either domain or key


Summary 2 concepts

Summary 2 – Concepts

  • Functional Dependencies:

    • Axioms & Closure.

  • Lossless-join decomposition.

  • Design Process.

  • Normalization Problems.

    Next: Interfaces and Architectures


Reading exercises

Reading & Exercises

  • Reading

    • Connolly & Begg Chapter (13, 14)

    • Silberschatz Chapters 7.

    • Any other book, the design/normalization chapter.

  • Exercises:

    • Silberschatz

      • 7.1, 7.2, 7.16, 7.23, 7.24, 7.27-29


Next week

Next Week

  • Architectures and Implementations

  • Integrity and Security


Cm20145 further db design normalization

Slides after and including this one you are not responsible for, but I am saving in case I decide to use them in the future.


Goal formalize good design

Goal: Formalize “Good Design”

  • Process:

    • Decide whether a particular relation R is in “good” form.

    • In the case that a relation R is not in “good” form, decompose it into a set of relations {R1, R2, ..., Rn} such that:

      • each relation is in good form,

      • the decomposition is a lossless-join decomposition.

  • Theory:

    • Constraints on the set of legal relations.

    • Require that the value for a certain set of attributes determines uniquely the value for another set of attributes – functional dependencies.


  • Login