Dr alwyn barry dr joanna bryson
1 / 33

- PowerPoint PPT Presentation

  • Updated On :

Dr Alwyn Barry Dr Joanna Bryson. CM20145 Further DB Design – Normalization. Last Time. Database design is an ongoing, iterative process. Requirements come from data, user demands, design issues. Change occurs: Corporations & technologies grow. Programmers & users learn. Views / security.

Related searches for

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about '' - nieve

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Dr alwyn barry dr joanna bryson

Dr Alwyn Barry

Dr Joanna Bryson

CM20145Further DB Design –Normalization

Last time
Last Time

  • Database design is an ongoing, iterative process.

    • Requirements come from data, user demands, design issues.

    • Change occurs:

      • Corporations & technologies grow.

      • Programmers & users learn.

  • Views / security.

  • Lossless-join decomposition

    Now: Science for improving design.

Design process normalization
Design Process & Normalization

  • We assume a schema R is given.

    • R could have been generated when converting E-R diagram to a set of tables.

    • R could have been a single relation containing all attributes that are of interest (called universal relation).

  • Normalization breaks R into smaller relations.

    R could be the result of any ad hoc design of relations, which we then test & convert to normal form.


  • First Normal Form.

  • Functional Dependencies.

  • Second Normal Form.

  • Third Normal Form.

  • Boyce-Codd Normal Form.

  • Fourth Normal Form.

  • Fifth Normal Form.

  • Domain Key / Normal Form.

  • Design Process & Problems.

First normal form 1nf
First Normal Form – 1NF

  • You aren’t supposed to have more than one value per attribute of a tuple.

  • All tuples have the same number of attributes.

  • Necessary for a relational database.


Getting caught out with 1nf
Getting Caught Out With 1NF

  • A domain is atomic if its elements are considered to be indivisible units.

    • Examples of non-atomic domains:

      • Set-valued attributes, composite attributes.

      • Identifiers like CS101 that can be broken up into parts.

  • A relational schema R is in first normal form if the domains of all attributes of R are atomic.

  • Non-atomic values:

    • complicate storage,

    • encourage redundancy,

    • Depend on interpretation built into application programs.

Are you atomic
Are You Atomic?

  • Atomicity is not an intrinsic property of the elements of the domain.

  • Atomicity is a property of how the elements of the domain are used.

    • E.g. strings containing a possible delimiter (here: a space)

      • cities = “Melbourne Sydney” (non-atomic: space separated list)

      • surname = “Fortescue Smythe” (atomic: compound surname)

    • E.g. strings encoding two separate fields

      • bucs_login = cssjjb

      • If the first two characters are extracted to find the department, the domain bucs_login is not atomic.

      • Leads to encoding of information in application program rather than in the database.

Second normal form 2nf
Second Normal Form (2NF)

  • Violated when a nonkey column is a fact about part of the primary key.

  • A column is not fully functionally dependent on the full primary key.

    • CUSTOMER-CREDIT in this case:

From Watson










Def functional dependency
Def: Functional Dependency

  • Let R be a relation schema

      R and   R

  • The functional dependency (FD)  holds on R (“ is FD on ”) iff for any legal relations r(R):

    • whenever any two tuples t1and t2 of r agree on the attributes 

    • they also agree on the attributes .

    • i.e. (t1) = (t2)   (t1) =  (t2)

  • Example: Consider r(A,B) with the following instance of r:

  • AB does NOT hold, but BA does hold

Functional dependencies uses
Functional Dependencies: Uses

  • Way to encode “business rules”.

  • Specify constraints on the set of legal relations.

    • We say that Fholds onR if all legal relations on R satisfy the set of FDs F.

  • Test relations to see if they are legal under a given set of FDs.

    • If a relation r is legal under a set F of FDs, we say that rsatisfies F.

Functional dependencies
Functional Dependencies

  • An FD is an assertion about a schema, not an instance.

  • If we only consider an instance or a few instances, we can’t tell if an FD holds.

    • Inspecting only a few bird species (e.g. crows, cardinals and canaries) we might conclude colour  species.

    • However, this would be a bad FD as we would find out if we found some ravens.

  • Thus, identifying FDs is part of the data modelling process.

Trivial functional dependencies
Trivial Functional Dependencies

  • An FD is trivial if it is satisfied by all instances of a relation

    • E.g.

      • customer-name, loan-number customer-name

      • customer-name customer-name

    • In general,   is trivial if   

  • Permitting such FDs makes certain definitions and algorithms easier to state.

Functional dependency vs key
Functional Dependency vs Key

  • FDs can express the same constraints we could express using keys:

  • Superkeys:

    • K is a superkey for relation schema R if and only if K R

  • Candidate keys:

    • K is a candidate key for R if and only if

      • K R, and

      • there is no K’  K such that K’ R

  • Of course, which candidate key becomes the primary key is arbitrary.

Fds keys
FDs <> Keys

  • FDs can represent more information than keys can on their own.

  • Consider the following Loan-info-schema:

    Loan-info-schema = (customer-name, loan-number,branch-name, amount).

    We expect these FDs to hold:

    loan-numberamount loan-number  branch-name

    We could try to express this by making loan-number the key,however the following FD does not hold:

    loan-number customer-name

  • Incidentally, this isn’t a very good table! (¬2NF)

Fd closure
FD Closure

  • Given a set F of FDs, other FDs are logically implied.

    • E.g. If A  B and B  C, we can infer that A  C

  • The set of all FDs implied by F is the closure of F, written F+ .

  • Find F+by applying Armstrong’s Axioms:

    • if   , then   (reflexivity)

    • if  , then    (augmentation)

    • if  , and   , then   (transitivity)

  • Additional rules (derivable from Armstrong’s Axioms):

    • If   and  holds, then    holds (union)

    • If    holds, then   holds and  holds (decomposition)

    • If   holds and   holds, then   holds (pseudotransitivity)

Bad decomposition example from last time
Bad Decomposition Example(From Last Time)

  • A Non Lossless-Join Decomposition

    R = (A, B) R1 = (A), R2 = (B)



















A (r) ⋈ B (r)

  • Thus, r is different to A (r) ⋈ B (r)

  • So A,B is not a lossless-join decomposition of R.

Fds lossless decomposition
FDs & Lossless Decomposition

  • All attributes of an original schema (R) must appear in the decomposition (R1, R2):

    R = R1  R2

  • Lossless-join decomposition.For all possible relations r on schema R

    r = R1 (r) ⋈ R2 (r)

  • A decomposition of R into R1 and R2 is lossless-join if and only if at least one of the following dependencies is in F+:

    • R1 R2R1

    • R1 R2R2

Second normal form 2nf1
Second Normal Form (2NF)

  • Violated when a nonkey column is a fact about part of the primary key.

  • A column is not fully functionally dependent on the full primary key.

    • CUSTOMER-CREDIT in this case:

From Watson










Third normal form 3nf
Third Normal Form (3NF)

  • Violated when a nonkey column is a fact about another nonkey column.

  • A column is not fully functionally dependent on the primary key.

  • R is 3NF iff R is 2NF and has no transitive dependencies.

    • EXCHANGE RATE violates this.



Boyce codd bcnf
Boyce-Codd (BCNF)

  • Arises when a table:

    • has multiple candidate keys,

    • the candidate keys are composite,

    • the candidate keys overlap.

  • R is BCNF iff every determinant is a cand. key.

    • E.g. Assume one consultant per problem per client, and one problem per consultant.

    • If client-problem is the primary key, how do you add a new consultant?

  • Like 3NF but now worry about all fields.



Design goals their discontents
Design Goals & their discontents

  • Goals for a relational database design:

    • eliminate redundancies by decomposing relations,

    • must be able to recover original data using lossless joins,

    • prefer not to loose dependencies.

  • BCNF:

    • no redundancies,

    • no guarantee of dependency preservation.

  • 3NF:

    • dependency preservation,

    • but possible redundancies.

Fourth normal form 4nf
Fourth normal form (4NF)

  • A row should not contain two or more independent multivalued facts.

  • 4NF iff BCNF & no non-trivial multi-valued dependencies.

  • Multivalued dependency means the value of one attributed determines a set of values for another.



Fifth normal form 5nf
Fifth normal form (5NF)

  • 5NF iff a relation has no join dependency.

  • The schemas R1, R2,.., Rn have a join dependency over R if they define a lossless-join decomposition over R.

  • This is way too complicated, don’t worry about it.

Domain key normal form
Domain Key Normal Form

  • Every constraint on the relation must be a logical consequence of the domain constraints and the key constraints that apply to the relation.

    • Key: unique identifier.

    • Constraint: rule governing attribute values.

    • Domain: set of values of the same data type.

  • No known algorithm gives DK/NF.

E r model and normalization
E-R Model and Normalization

  • When an E-R diagram is carefully designed, identifying all entities correctly, the tables generated should not need further normalization.

  • However, in a real (imperfect) design there can be FDs from non-key attributes of an entity to other attributes of the entity.

  • The keys identified in E-R diagrams might not be minimal - FDs can help us to identify minimal keys.

  • FDs from non-key attributes of a relationship set are possible, but rare.

Denormalization performance
Denormalization & Performance

  • May want to use non-normalized schema for performance.

    • E.g. displaying customer-name along with account-number and balance requires join of account with depositor.

  • Alternative 1: Use denormalized relation containing attributes of account as well as depositor.

    • faster lookup.

    • extra space and extra execution time for updates.

    • extra coding work for programmer and possibility of error in extra code.

  • Alternative 2: use a materialized view defined as account ⋈ depositor

    • as above, except less extra coding, errors.

Limits of normalization
Limits of Normalization

  • Examples of bad database design, not caught by normalization.

  • Good:

    • earnings(company-id, year, amount)

  • Bad:

    • earnings-2000, earnings-2001, earnings-2002, etc., on (company-id, earnings)

      • all are BCNF, but querying across years difficult

      • needs a new table each year

    • company-year(company-id, earnings-2000,earnings-2001, earnings-2002)

      • in BCNF, but querying across years difficult

      • requires new attribute each year

Summary 1 rules to watch
Summary 1 – Rules to Watch

  • 1NF: attributes not atomic.

  • 2NF: non-key attribute FD on part of key.

  • 3NF: one non-key attribute FD on another.

  • Boyce-Codd NF: overlapping but otherwise independent candidate keys.

  • 4NF: multiple, independent multi-valued attributes.

  • 5NF: join dependency.

  • Domain Key / NF: all constraints either domain or key

Summary 2 concepts
Summary 2 – Concepts

  • Functional Dependencies:

    • Axioms & Closure.

  • Lossless-join decomposition.

  • Design Process.

  • Normalization Problems.

    Next: Interfaces and Architectures

Reading exercises
Reading & Exercises

  • Reading

    • Connolly & Begg Chapter (13, 14)

    • Silberschatz Chapters 7.

    • Any other book, the design/normalization chapter.

  • Exercises:

    • Silberschatz

      • 7.1, 7.2, 7.16, 7.23, 7.24, 7.27-29

Next week
Next Week

  • Architectures and Implementations

  • Integrity and Security

Cm20145 further db design normalization

Slides after and including this one you are not responsible for, but I am saving in case I decide to use them in the future.

Goal formalize good design
Goal: Formalize “Good Design” for, but I am saving in case I decide to use them in the future.

  • Process:

    • Decide whether a particular relation R is in “good” form.

    • In the case that a relation R is not in “good” form, decompose it into a set of relations {R1, R2, ..., Rn} such that:

      • each relation is in good form,

      • the decomposition is a lossless-join decomposition.

  • Theory:

    • Constraints on the set of legal relations.

    • Require that the value for a certain set of attributes determines uniquely the value for another set of attributes – functional dependencies.