Schema refinement and normal forms
This presentation is the property of its rightful owner.
Sponsored Links
1 / 134

Schema Refinement and Normal Forms PowerPoint PPT Presentation


  • 73 Views
  • Uploaded on
  • Presentation posted in: General

Schema Refinement and Normal Forms. Review: Database Design. Requirements Analysis user needs; what must database do? Conceptual Design high level description (often done w/ER model) Logical Design translate ER into DBMS data model Schema Refinement consistency, normalization

Download Presentation

Schema Refinement and Normal Forms

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Schema refinement and normal forms

Schema Refinement and Normal Forms


Review database design

Review: Database Design

  • Requirements Analysis

    • user needs; what must database do?

  • Conceptual Design

    • high level description (often done w/ER model)

  • Logical Design

    • translate ER into DBMS data model

  • Schema Refinement

    • consistency, normalization

  • Physical Design - indexes, disk layout

  • Security Design - who accesses what


Design steps

Design Steps

  • Step (3) to step (4) is based on a “design theory” for relations and is called “normalization”. It is important for two reasons:

    • Automatic mappings from ER to relations may not produce the best relational design possible.

    • Database designers may go directly from (1) to (3), in which case, the relational design can be really bad.


Informal guidelines

Informal guidelines

  • Semantics of the attributes

    • easy to explain relation

    • doesn’t mix concepts

  • Reducing the redundant values in tuples

  • Choosing attribute domains that are atomic

  • Reducing the null values in tuples

  • Disallowing spurious tuples


1 semantics of attributes

1. Semantics of Attributes

  • Semantics of attributes specify how to interpret the attributes values stored in a tuple of the relation.

  • In other words, how the attributes’ values in a tuple are related to one another.

    Guideline 1:

  • Design a relation schema so that it is easy to explain its meaning.

  • Do not combine attributes from multiple entity types and relationship types into a single relation.


2 null values

2. Null Values

  • If many of the attributes do not apply to all tuples in the relation, we end up with many null values (no value) in those tuples.

  • This leads to wasted space and misunderstandings.

    Guideline 2:

  • As much as possible, avoid placing attributes in a relation whose values may frequently be null.

  • If nulls are unavoidable, make sure that they apply in exceptional cases, and do not apply to the majority of tuples in the relation.


3 spurious tuples

3. Spurious Tuples

  • Additional tuples that were not in the original relation are called spurious tuples because they represent spurious or wrong information that is not valid.

  • This is called the lossless join property.

    Guideline 3:

  • Design relation schemas so that they can beJOINed (with equality condition on attributes that are either primary keys or foreign keys) in a way that guarantees that no spurious tuples are generated.


3 spurious tuples cont

CustomerID

Title

Date

Price

Kind

0001

True Lies

04-19-2002

3.25

D

0002

True Lies

04-21-2002

3.25

D

0001

The Lion King

04-19-2002

4.00

C

0003

The Lion King

04-19-2002

4.00

C

0001

Henry V

04-18-2002

1.75

D

3. Spurious Tuples cont.


3 spurious tuples cont1

Title

Price

Kind

True Lies

3.25

D

The Lion King

3.25

C

Henry V

1.75

D

CustomerID

Title

Price

Kind

Date

0001

True Lies

3.25

D

04-19-2002

0002

True Lies

3.25

D

04-21-2002

0001

The Lion King

3.25

C

04-19-2002

0003

The Lion King

3.25

C

04-19-2002

0001

Henry V

1.75

D

04-18-2002

0002

The Lion King

3.25

D

04-21-2002

0003

True Lies

3.25

C

04-19-2002

3. Spurious Tuples cont.

Bad Relational Schema:

CustomerID

Price

Date

0001

3.25

04-19-2002

0002

3.25

04-21-2002

0003

3.25

04-19-2002

0001

1.75

04-18-2002

The

Join

Of the

Above

2 Relations


3 spurious tuples cont2

Title

Price

Kind

True Lies

3.25

D

The Lion King

3.25

C

Henry V

1.75

D

CustomerID

Title

Date

0001

True Lies

04-19-2002

0002

True Lies

04-21-2002

CustomerID

Title

Price

Kind

Date

0001

The Lion King

04-19-2002

0001

True Lies

3.25

D

04-19-2002

0003

The Lion King

04-19-2002

0002

True Lies

3.25

D

04-21-2002

0001

Henry V

04-18-2002

0001

The Lion King

3.25

C

04-19-2002

0003

The Lion King

3.25

C

04-19-2002

0001

Henry V

1.75

D

04-18-2002

3. Spurious Tuples cont.

Good Relational Schema:

The

Join

Of the

Above

2 Relations


4 reducing redundancies cont

CustomerID

Title

Date

Price

Kind

0001

True Lies

04-19-2002

3.25

D

0002

True Lies

04-21-2002

3.25

D

0001

The Lion King

04-19-2002

4.00

C

0003

The Lion King

04-19-2002

4.00

C

0001

Henry V

04-18-2002

1.75

D

4. Reducing Redundancies cont.

Modification Anomalies:

  • Insertion Anomaly:

    • Cannot insert information about a film if it has not been rented yet.

  • Update Anomaly:

    • Updating the rental price for “True Lies” to $4, requires changing it in several typles (if not, it will cause inconsistencies).

  • Deletion Anomaly:

    • Deleting the rental information will cause the film information to disappear.


4 reducing redundancies

4. Reducing Redundancies

  • Redundancies in a relation schema result in:

    • Waste of space

    • Potential for inconsistent data (loss of data integrity)

    • Potential for modification anomalies (unusual behavior):

      • Insertion anomalies

      • Update anomalies

      • Deletion anomalies

        Guideline 4:

  • Design the relation schemas so that no insertion, modification, or modification anomalies occur.


  • Refinements

    Refinements


    Schema refinement and normal forms

    • Integrity constraints, in particularfunctional dependencies, can be used to identify schemas with such problems and to suggest refinements.

    • Decomposition should be used judiciously:

      • Is there reason to decompose a relation?

      • What problems (if any) does the decomposition cause?


    Schema refinement and normal forms

    Q1)answered by applying various Normal forms

    Q2)answered by

    properties of decomposition that interests us are

    lossless-join ( enables us to recover any instance of the decomposed relation from corresponding instances of the smaller relations)

    dependency-preservation ( enables us to enforce any constraint on the original relation by simply enforcing some constraints on each of the smaller relations. We do not have to perform join of smaller relation to check if a constraint on original relation is violated.


    Schema refinement and normal forms

    • From Performance point of view

      If queries over the original relation are common then decomposing is not acceptable

      In some cases the decomposition is improves performance when queries and updates examine only decomposed relations.


    A bad relational schema

    A BAD Relational Schema

    An Improved Schema


    What s a good design

    What’s a Good Design?

    • Three properties:

      • No anomalies.

      • Can reconstruct all original information.

      • Ability to check all FDs within a single relation.

    • Role of FDs in detecting redundancy:

      • Consider a relation R with 3 attributes, ABC.

        • No FDs hold: There is no redundancy here.

        • Given A B: Several tuples could have the same A value, and if so, they’ll all have the same B value!


    Decomposition of a relation scheme

    Decomposition of a Relation Scheme

    • Suppose that relation R contains attributes A1 ... An. A decompositionof R consists of replacing R by two or more relations such that:

      • Each new relation scheme contains a subset of the attributes of R (and no attributes that do not appear in R), and

      • Every attribute of R appears as an attribute of one of the new relations.

    • Intuitively, decomposing R means we will store instances of the relation schemes produced by the decomposition, instead of instances of R.

    • E.g., Can decompose SNLRWH into SNLRH and RW.


    Functional dependency

    Functional Dependency

    • A functional dependency (FD) is a constraint between two sets of attributes in relation R.

    • It is denoted by: X  Y

    • Reads:

      • Y is functionally dependent on X

      • X (functionally) determines Y

    • Means:

      • If two tuples in R agree on their X-value, they must necessarily agree on their Y-value.


    Functional dependencies fds

    Functional Dependencies (FDs)

    • A functional dependencyX  Y holds over relation R if, for every allowable instance r of R:

      • i.e., given two tuples in r, if the X values agree, then the Y values must also agree. (X and Y are sets of attributes.)

    • K is a key for relation R if:

      1. K determines all attributes of R.

      2. For no proper subset of K is (1) true.

      • If K satisfies only (1), then K is a superkey.

  • K is a candidate key for R means that K  R

    • However, K  R does not require K to be minimal!


  • Functional dependencies fds1

    Functional Dependencies (FDs)

    • A functional dependencyX  Y holds over relation schema R if, for every allowable instance r of R:

      t1  r, t2  r, pX(t1) = pX(t2)

      implies pY(t1) = pY(t2)

      (where t1 and t2 are tuples;X and Y are sets of attributes)

    • In other words: X  Y means

      Given any two tuples in r, if the X values are the same, then the Y values must also be the same. (but not vice versa)

    • Read “” as “determines”


    Fd examples cont

    EMP_PROJ(SSN, PNUMBER, HOURS, ENAME , PNAME, PLOCATION)

    FD Diagram

    FD Examples cont.

    Can Assert the FDs

    SSN  ENAME

    PNUMBER  { PNAME, PLOCATION }

    { SSN, PNUMBER }  HOURS


    Example

    Example

    • Consider relation Hourly_Emps:

      • Hourly_Emps (ssn, name, lot, rating, hrly_wages, hrs_worked)

    • FD is a key:

      • ssn is the key

        • S  SNLRWH

    • FDs give more detail than the mere assertion of a key.

      • rating determines hrly_wages

        • R  W


    Fd s continued

    FD’s Continued

    • An FD is a statement about all allowable relations.

      • Must be identified based on semantics of application.

      • Given some instance r1 of R, we can check if r1 violates some FD f, but we cannot determine if f holds over R.

    • FDs are a generalization of keys.


    Try this one

    FD

    T/F

    AB  C

    is equivalent to

    {A, B}  C

    Try This One

    Assuming that all the FDs in the relation are apparent in the following instance of the relation:

    A  B

    F

    A  C

    T

    B  A

    F

    B  C

    F

    C  A

    T

    C  B

    F

    AB  C

    T

    AC  B

    F

    BC  A

    T


    And this one

    R (SID, CourseID, TotalCreditHours, Grade , SName, Status)

    Can Assert the FDs

    … and This One

    SID  { SName, TotalCreditHours, Status }

    { SID, CourseID }  Grade

    TotalCreditHours  Status


    Notes about fds

    Notes about FDs

    • Functional dependencies are constraints that hold on the whole relation R, not on any particular instance of the relation.

    • An FD X  Y is trivial if YX (subset)

      Examples:

      • StuID  StuID

      • { StuID, CourseID }  CourseID

  • X  Y does not mean Y  X (an FD is not reversible)


  • Notes about fds cont

    Notes about FDs cont.

    • The left-hand-side (LHS) of any FD

      X  Y (Xin this case) is called a determinant.

    • Even though we can write X  YZ (standard form), you should always remember that this is TWOFDs in one: X  Y and X  Z (canonical form).

    • We can write the above formally as:

      X  YZ|={ X  Y , X  Z }

      ( |=denotes “logical implication”)


    Notes about fds cont1

    R (SID, CourseID, TotalCreditHours, Grade , SName, Status)

    F = {SID  { SName, TotalCreditHours, Status },

    { SID, CourseID }  Grade

    TotalCreditHours  Status}

    Notes about FDs cont.

    • We denote by Fthe set of functional dependencies that are specified on a relation schema R.


    Notes about fds cont2

    R (SID, CourseID, TotalCreditHours, Grade , SName, Status)

    SID  SName

    {SID, CourseID}  Grade

    {SID, CourseID}  SName

    {SID, CourseID}  Status

    Notes about FDs cont.

    • If X  Y is an FD that holds in R, we say that Y is fully FD on X if removal of any attribute from X means that the FD does not hold any more; otherwise, we say Y is partially FD on X.

    • Notice that if X is a single attribute, then for sure Y is fully FD on X.

    SName is fully FD onSID

    Grade is fully FD on {SID, CourseID}

    SName is NOT fully FD on {SID, CourseID}

    Status is NOT fully FD on {SID, CourseID}


    Reasoning about fds

    Reasoning About FDs

    • Given some FDs, we can usually infer additional FDs:

      • ssn  did, did  lot implies ssn  lot

    • An FD f is implied bya set of FDs F if f holds whenever all FDs in F hold.

      • F+ = closure of F is the set of all FDs that are implied by F.

    • Armstrong’s Axioms (X, Y, Z are sets of attributes):

      • Reflexivity: If Y X, then X  Y

      • Augmentation: If X  Y, then XZ  YZ for any Z

      • Transitivity: If X  Y and Y  Z, then X  Z

    • These are sound and completeinference rules for FDs!


    Fd inference rules

    FD Inference Rules

    IR1 (Reflexivity): X  Y  |=  X  Y

    • X is superset of Y

    • The trivial dependency rule (e.g.AB  B); useful for derivations.

      IR2 (Augmentation): X  Y  |=  XZ  YZ

    • If a dependency holds, then we can freely expand its left hand side.

      IR3 (Transitivity):X Y, Y  Z  |= X  Z

    • The most powerful inference rule; useful in multi-step derivations.


    Fd inference rules cont

    FD Inference Rules cont.

    Armstrong inference rules (also called Armstrong’s Axioms) are:

    • Sound:

      meaning that given a set of FDs F specified on a relation schema R, any FD that we can infer from F by using IR1 through IR3 holds on every relation state (instance) of R that satisfies the dependencies in F.

    • Complete:

      meaning that using IR1 through IR3 repeatedly to infer FDs, until no more FDs can be inferred, results in the complete set of all possible FDs that can be inferred from F (closure of F, denoted as F+).


    Ir proofs

    IR Proofs

    Prove or Disprove:

    X YZ|= X  Y and X  Z

    (decomposition or projective rule IR4)

    X  YZ(given)

    YZ  Y(using IR1 and knowing that YZ Y)

    X  Y(using IR3 on 1 and 2)


    Reasoning about fds cont

    Reasoning About FDs (Cont.)

    • Couple of additional rules (that follow from AA):

      • Union IR5: If X  Y and X  Z, then X  YZ

    • Proof of Union:

      • X  Y (given)

      • X  XY (augmentation using X)

      • X  Z (given)

      • XY  YZ (augmentation)

      • X  YZ (transitivity)


    Ir proofs1

    IR Proofs

    Prove or Disprove:

    X  Y, WY  Z  |= WX  Z

    (pseudotransitive rule IR6)

    X  Y(given)

    WY  Z(given)

    WX  WY(using IR2 on 1 by augmenting with W)

    WX  Z(using IR3 (transitivity) on 3 and 2)


    Reasoning about fds1

    Reasoning About FDs

    • Computing the closure of a set of FDs can be expensive. (Size of closure is exponential in # attrs!)

    • Typically, we just want to check if a given FD X  Y is in the closure of a set of FDs F. An efficient check:

      • Compute attribute closureof X (denoted X+) wrt F:

        • Set of all attributes A such that X  A can be inferred using the Armstrong Axioms

        • There is a linear time algorithm to compute this.

      • Check if Y is in X+

    • Does F = {A  B, B  C, C D  E } imply A  E?

      • i.e, is A  E in the closure F+ ? Equivalently, is E in A+ ?


    Finding all implied fds

    Finding All Implied FDs

    • Motivation: Suppose we have a relation ABCD with some FDs F. If we decide to decompose ABCD into ABC and AD, what are the FDs for ABC, AD?

    • Example: F = AB  C, C  D, D  A. It looks like just

      AB  C holds in ABC, but in fact C  A follows from F and applies to relation ABC.

    • Problem is exponential in worst case.

    • Algorithm to find F+:

      • For each set of attributes X of R, compute X+.


    Closure of attributes

    X+ := X

    repeat

    oldX+ := X+

    for each FD Y  Z in F do

    if Y  X+ then X+ := X+ Z

    until oldX+ = X+

    Closure of Attributes

    • Given a set of FDs F in relation R, the set of all the attributes that can be determined (directly or indirectly) from a given attribute (or set of attributes) X is called the closure of X, denoted by X+

    • X+ can be determined using the simple algorithm:


    Example1

    Example

    • A  B, BC  D

      • A+ = AB

      • C+ = C

      • (AC)+ = ABCD

    • Thus, AC is a key.


    Example2

    Example

    • F = AB  C, C  D, D  A. What FDs follow?

      • A+ = A; B+ = B (nothing)

      • C+ = ACD (add C  A)

      • D+ = AD (nothing new)

      • (AB)+ = ABCD (add AB  D; skip all supersets of AB).

      • (BC)+ = ABCD (nothing new; skip all supersets of BC).

      • (BD)+ = ABCD (add BD  C; skip all supersets of BD).

      • (AC)+ = ACD; (AD)+ = AD; (CD)+ = ACD (nothing new).

      • (ACD)+ = ACD (nothing new).

      • All other sets contain AB, BC, or BD, so skip.

      • Thus, the only interesting FDs that follow from F are:

        • C  A, AB  D, BD  C.


    Ir proofs cont

    It is clear from those two records that the FD

    X  Y

    does not hold

    X

    Y

    Z

    x1

    ?

    z1

    x2

    y2

    x1

    ?

    z1

    x3

    y3

    ?

    y1

    z2

    ?

    y1

    z2

    IR Proofs cont.

    Prove or Disprove:

    X  Z, Y  Z  |= X  Y

    We will disprove this by presenting a relation instance that satisfies the FDs in the LHS of the rule, but does not satisfy the FDs in the RHS:


    Try this

    Try This

    Prove or Disprove:

    X  Y, X  W, WY  Z|= X  Z


    Schema refinement and normal forms

    X  Y(given)

    X  W(given)

    WY  Z(given)

    X  XY(using IR2 on 1 (augmenting with X; XX=X))

    XY  WY(using IR2 on 2 (augmenting with Y))

    X  WY(using IR3 on 4 and 5)

    X  Z(using IR3 on 6 and 3)


    And this

    Notice the choices for values

    X

    Y

    Z

    W

    x1

    y1

    z1

    ?

    x2

    y2

    It is clear from those two records that the FD

    X  W

    does not hold

    w2

    x1

    y1

    z1

    ?

    y3

    x1

    w2

    ?

    ?

    z2

    w1

    ?

    ?

    z2

    w1

    … and This

    Prove or Disprove:

    XY Z, Z  W  |= X  W

    We will disprove this by presenting a relation instance that satisfies the FDs in the LHS of the rule, but does not satisfy the FDs in the RHS:


    Closure example

    C

    A

    B

    You Have

    Visit

    You Get

    E

    A

    B

    C

    You Have

    Visit

    You Get

    F

    A

    B

    C

    E

    You Have

    Visit

    You Get

    A

    B

    C

    E

    F

    You Have

    Closure Example

    A  BC

    E  CF

    • R(A, B, C, D, E, F) with FDs:

    • Compute {A, B}+

      (can also be written as AB+)

    B  E

    CD  EF

    End of Shopping

    AB+ = ABCEF (i.e. {A,B}+ = {A,B,C,E,F})


    Try this1

    A+

    C+

    BC+

    AD+

    ABCD+

    Try This

    A  BC

    E  CF

    • R(A, B, C, D, E, F) with FDs:

    • Compute:

    B  E

    CD  EF

    ABCEF

    C

    BCEF

    ABCDEF

    ABCDEF


    Attribute closure example

    Attribute Closure (example)

    • R = {A, B, C, D, E}

    • F = { B CD, D  E, B  A, E  C, AD B }

    • Is B  E in F+ ?

      B+ = B

      B+ = BCD

      B+ = BCDA

      B+ = BCDAE … Yes!

      and B is a key for R too!

    • Is D a key for R?

      D+ = D

      D+ = DE

      D+ = DEC

      … Nope!

    • Is AD a key for R? AD+ = AD

    • AD+ = ABD and B is a key, so Yes!

    • Is AD a candidate key for R?

      • A+ = A, D+ = DEC

      • … A,D not keys, so Yes!

    • Is ADE a candidate key for R?

    • … No! AD is a key, so ADE is a superkey, but not a cand. key


    Schema refinement and normal forms

    Given P QR

    Q S

    Using augmentation,

    QR  RS

    PQR, QR RS hence PRS

    RT (given)

    RS  ST (augmentation)

    By transitivity

    PRS, RS ST hence PST


    Equivalence of sets of functional dependencies

    Equivalence of sets of functional dependencies

    • A set of FD F is said to cover another set of FD E if every FD in E is also in F+. i.e if every dependency in E can be inferred from F, alternately we can say that E is covered by F

    • Two sets of FD E and F are equivalent if

      E+ = F+

      .i.e E covers F and F covers E


    A note about closures

    A Note About Closures

    • We have just seen that given a set of FDs that hold in R, we can calculate the closure of a set of attributes

    • The reverse process is also possible. In other words, given a closure of a set of attributes we can infer the non-trivial FDs that hold in R


    Note about closure cont

    A  BC

    E  CF

    B  E

    CD  EF

    • As a matter of fact, it can even be proven, using inference rules, if we knew it only came from the relation in the previous example

    Note About Closure cont.

    • Given R(A, B, C, D, E, F), and:

    B+

    B C E F

    • It can be easily inferred that the following FD holds on R:

    B  C E F


    Minimal cover for an fd set

    Minimal Cover for an FD Set

    • One can imagine the number of FDs that can be inferred from a given set of FDs (using inference rules).

    • Thus, we need minimal set of FDs that maintains the relationships between all the attributes and with no redundancies


    Minimal cover cont

    Minimal Cover cont.

    • A set of FDsFis minimal iff:

    • Every FD inF has a single attribute for its RHS

    • We cannot replace any FD AB in F with an FD CB, where Cis a proper subset of A, and still have a set of FDs that is equivalent to F

    • We cannot remove any FD in F, and still have a set of FDs that is equivalent to F


    Algorithm to find minimal cover

    Algorithm to Find Minimal Cover

    Step 1:Make the RHS of every FD singular. In other words, replace every FD of the type A  BC with A  B and A  C

    Step 2:For every FD A  B where A is not singular. Remove any attribute (or set of attributes) X from A, and check if A+ = {A-X}+. If the answer is yes, then replace A  B with {A-X}  B

    Step 3:For every FD A  B, cover it with your finger (i.e. imagine it not there), and ask if A+ includes B without it. If the answer is yes, eliminate A  B permanently

    Note:For each step, You can use the resulting FDs from the previous step.


    Minimal cover example

    Step 2

    remove C

    is D+=CD+?

    Results

    remove D

    is C+=CD+?

    Step 1

    remove A

    is D+=AD+?

    A  B

    A  B

    remove D

    is A+=AD+?

    A  C

    A  C

    A  D

    A  D

    CD B

    CD B

    Step 3

    AD  C

    A  C

    AD  C

    A  B

    cover it:

    is B in A+?

    A  C

    cover it:

    is C in A+?

    A  D

    cover it:

    is D in A+?

    CD B

    cover it:

    is B in CD+?

    Minimal Cover Example

    A  BCD

    CD B

    AD  C

    • R(A, B, C, D) with FDs:

    CD B

    no

    no

    AD  C

    no

    yes

    A  C

    A  D

    CD B

    yes

    no

    no

    no


    Try this one1

    Step 2

    remove B

    is A+=AB+?

    Results

    remove A

    is B+=AB+?

    Step 1

    D  A

    D  A

    D  C

    D  C

    D  E

    D  E

    A B

    A B

    Step 3

    AB  C

    A  C

    AB  C

    D  A

    cover it:

    is A in D+?

    D  C

    cover it:

    is C in D+?

    D  E

    cover it:

    is E in D+?

    A B

    A  C

    cover it:

    cover it:

    is C in A+?

    is B in A+?

    Try This One

    D  ACE

    A B

    • R(A, B, C, D, E) with FDs:

    AB  C

    AB C

    no

    D  A

    yes

    D  E

    A B

    no

    A  C

    yes

    no

    no

    no


    Note on minimal cover algorithm

    Results

    Step 2

    ….

    remove AB

    is C+=ABC+?

    remove AC

    is B+=ABC+?

    ….

    remove BC

    is A+=ABC+?

    B  D

    A  D

    BC  D

    ABC  D

    remove A

    is BC+=ABC+?

    remove B

    is C+=BC+?

    remove C

    is B+=BC+?

    Note on Minimal Cover Algorithm

    In Step 2:For every FD A  B where A is not singular. Remove any attribute (or set of attributes) X from A, and check if A+ = {A-X}+. If the answer is yes, then replace A  B with {A-X}  B.

    ABC D

    no

    no

    yes

    no

    ABC  D

    no

    yes

    BC D

    no

    yes


    Schema refinement and normal forms

    Keys

    • If X+ (the closure of the attribute (or set of attributes) X) includes all the attributes in a relation R, then X is a superkey (SK) for the relation R

    • If X is a superkey for R, and the removal of any attribute from X will cause X not to be a superkey anymore, then X is called the key for R

    • The difference between a key and a superkey is that a key has to be minimal


    Keys cont

    Keys cont.

    • If R has more than one key, then each is called a candidate key (CK) for R

    • The terms key and candidate key are used interchangeably

    • One of the candidate keys for R is (arbitrarily) designated to be the primary key (PK) for R, and the others are called secondary keys

    • Secondary keys are used for indexing purposes to improve performance


    Keys example

    A  CDE

    B  CE

    AD  E

    A

    CD  F

    AD

    BD  A

    BD

    CDE  ABD

    CD

    B+

    CBE

    CDE

    D+

    D

    C+

    C

    B+

    A+

    ABCDEF

    CBE

    SK

    D+

    D

    CK

    AD+

    ABCDEF

    E+

    E

    BD+

    ABCDEF

    CE+

    CE

    CD+

    CDF

    ……..

    CDE+

    ABCDEF

    Keys Example

    • R(A, B, C, D, E, F) with FDs:


    Finding cks systematically

    Finding CKs Systematically

    To systematically find all the candidate keys for any relation is a brute force algorithm that requires trying all the possible combinations of the attributes:

    Single attribute:take the closure of every single attribute. If the closure gives you all the attributes in the relation, then the attribute is a CK.

    Two attributes:Take the closure of the 2-attributes combinations not including the ones you found in the previous step above.

    Three attributes:Take the closure of the 3-attributes combinations not including the ones you found in the previous step above.

    Four attributes:……… Continue as above


    Finding cks example1

    Answer:

    Single attribute:A+ = ABCDEFA is a CK

    Then we tryB+, C+, D+, E+, F+(but they will not work)

    Two attributes: (all combinations not including A because no future CK can contain A which is itself a CK):

    BD+ = ABCDEFBD is a CK

    Then we tryBE+, BF+, CD+, CE+………… etc.

    Three attributes:(all combinations not including A or BD)

    CDE+ = ABCDEFCDE is a CK

    Then we tryBCE+,BCF+,BEF+………… etc.

    Four attributes:(all combinations not including A, BD or CDE)

    BCEF+ = ….

    Final Results: The only CKs are: A, BD & CDE.

    Finding CKs Example1

    Find the CKs for the relation R (A, B, C, D, E, F) with FDs:

    FD1:B  CEFD2:AD  E FD3: CD  F FD4: BD  A

    FD5:A  CDE FD6:CDE  ABD


    Hints on finding cks

    Hints on Finding CKs

    • Look at the RHS of the FDs. If an attribute does not appear on the RHS of all the FDs, then such attribute must be part of the CK. Why?

    • It makes your life easier if you start with the minimal cover instead of the given FDs.

    • Do not be fooled by trying ONLY the attributes (or combination of attributes) that appear on the LHS of the FDs. (An example will show you why?)


    Finding cks example2

    Answer:

    Because any candidate key must have B and E in its closure, but B and E do not appear at the RHS of any FDs, thus B and E can only be determined by themselves.

    Therefore B and E must be part of the candidate key (1)

    Single attribute: not applicable because of (1)

    Two attributes: only BE is possible because of (1)

    BE+ = ABCDEBE is a CK

    Three attributes: because BE was fund to be a CK, thus no future candidate key can contain BE, but because of (1) any CK must contain BE, this is a contradiction

    Final Results: The only CK is BE.

    Finding CKs Example2

    Find the CKs for the relation R (A, B, C, D, E) with FDs:

    FD1:B  DFD2:E  C FD3: AC  D

    FD4: CD  AFD5:BE  A


    Finding cks example3

    Answer:

    Single attribute:A+ = A, B=B, E +=AE+

    C+ = ABCDEC is a CK

    D+ = ABCDED is a CK

    Two attributes:(all combinations not including C or D)

    AE+ = AE

    AB+ = ABCDEAB is a CK

    BE+ = ABCDEBE is a CK

    Three attributes:(all combos not including C, D, AB or BE)None can be found

    Finding CKs Example3

    Find the CKs for the relation R (A, B, C, D, E) with FDs:

    FD1:E  AFD2:D  BE FD3: C  DFD4: AB  C

    Notice:If you would have tried only the LHS of the FDs, then BE would not have been found as a CK.


    Prime or nonprime

    Prime or Nonprime

    • An attribute that is part (member) of any candidate key is called a prime attribute

    • An attribute is called nonprime if it not a prime attribute


    Database normalization

    Database Normalization

    • Database normalization is the process of removing redundant data from your tables in order to improve storage efficiency, data integrity, and scalability.

    • In the relational model, methods exist for quantifying how efficient a database is. These classifications are called normal forms (or NF), and there are algorithms for converting a given database between them.

    • Normalization generally involves splitting existing tables into multiple ones, which must be re-joined or linked each time a query is issued.


    History

    History

    • Edgar F. Codd first proposed the process of normalization and what came to be known as the 1st normal form in his paper A Relational Model of Data for Large Shared Data Banks Codd stated:

      “There is, in fact, a very simple elimination procedure which we shall call normalization. Through decomposition nonsimple domains are replaced by ‘domains whose elements are atomic (nondecomposable) values.’”


    Normal form

    Normal Form

    • Edgar F. Codd originally established three normal forms: 1NF, 2NF and 3NF. There are now others that are generally accepted, but 3NF is widely considered to be sufficient for most applications. Most tables when reaching 3NF are also in BCNF (Boyce-Codd Normal Form).


    Normal forms

    Normal forms

    Universe of relations

    1 NF

    2NF

    3NF

    BCNF

    4NF

    5NF


    First normal form

    First Normal Form

    • A relation is in 1NF iff (if and only if) every attribute is single valued for each tuple (i.e. no multivalued attributes).

    • In other words, a relation is in 1NF if it has the characteristics of a table:

      • Has the rows and columns format

      • Each data value is atomic

      • Each column has a domain

      • Has at least one key (i.e. no duplicate records)

      • Order of rows is not important

      • Order of columns is not important


    First normal form cont

    First Normal Form cont.

    To put a relation in 1NF:

    • Make the multivalued attribute as part of the key, and duplicate all the other information.


    First normal form cont1

    First Normal Form cont.

    For all our purposes:

    • All relations are in 1NF (i.e. 1NF is of no value other than historical).


    Second normal form

    • Remember:

    • A prime is any attribute that is part of any CK.

    • For X  Y, Y is fully FD on X if Y is not FD on any part of X.

    Second Normal Form

    • A relation R is in 2NF iff every nonprime attribute is fully FD on every candidate key in R

    • In other words, a relation R is in 2NF iff every nonprime attribute is not partially FD on any candidate key in R

    • In other words, R is not in 2NF if it contain an FD of the form:prime  nonprime


    2nf example1

    • Answer:

    • The key is {EmpName, ChildName}

    • In FD1, SpouseName is a nonprime that is FD on a prime EmpName (i.e. SpouseName is not fully FD on the key)

    • That is an explicit violation to 2NF

    • R is not in 2NF.

    2NF Example1

    R (EmpName, SpouseName, ChildName, SpouseAge)with FDs:

    FD1:EmpName  SpouseName

    FD2: SpouseName  SpouseAge

    FD3: {EmpName, ChildName}  {SpouseName, SpouseAge}

    Is R in 2NF?


    2nf example2

    • Answer:

    • The key is AE

    • In FD2, B is a nonprime that is FD a prime A (i.e. B is not fully FD on the key)

    • That is an explicit violation to 2NF

    • R is not in 2NF.

    2NF Example2

    R (A, B, C, D, E, F, G, H)with FDs:

    FD1:AE  GHFD2:A  BC

    FD3:C  DFD4:E  F

    Is R in 2NF?


    Third normal form

    Third Normal Form

    • Arelation R is in 3NF iff the set of FDs for R does not contain any transitive FD.

    • An FD is transitive if it has a nonprime on both of its sides.

    • In other words, R is not in 3NF if it contain an FD of the form:

      nonprime  nonprime


    3nf example1

    • Answer:

    • The CK is D

    • In FD2, a nonprime C is FD on another nonprime AB

    • That is an explicit violation to 3NF

    • R is not in 3NF.

    • Question: Is R in 2NF?

    3NF Example1

    R (A, B, C, D, E)with FDs:

    FD1:A  BFD2:AB  C

    FD3: D  ACE

    Is R in 3NF?


    3nf example2

    • Answer:

    • The key is GE

    • There is no explicit violation to 3NF

    • In FD2, a nonprime BF is FD on a prime G

    • This is an explicit violation to 2NF, and an implicit violation to 3NF

    • R is not in 3NF.

    3NF Example2

    R (A, B, C, D, E, F, G)with FDs:

    FD1:GE  ADFD2:G  BF

    FD3: E  C

    Is R in 3NF?


    Boyce codd normal form

    • Remember:

    • A determinant is the LHS of any FD.

    • A superkey for a relation R is an attribute (or set of attributes) whose closure gives all the attributes of the relation R.

    Boyce-Codd Normal Form

    • Arelation R is in BCNF iff every determinant is a superkey.

    • To find a violation to BCNF, just find a determinant that is not a superkey.


    Bcnf example

    • Answer:

    • A+ = AB

    • A is a determinant in FD1, but A is not a superkey

    • This is an explicit violation to BCNF

    • R is not in BCNF.

    • Question: Is R in 2NF?

    • Question: Is R in 3NF?

    BCNF Example

    R (A, B, C, D)with FDs:

    FD1:A  BFD2:BC  D

    FD3: D  BCFD4: C  A

    Is R in BCNF?


    Notes on bcnf

    Notes on BCNF:

    • Every relation that has only two attributes is in BCNF.

    • Every relation that has only 1 candidate key, if it is in 3NF, then it is in BCNF; except in the very rare case:

      • if an FD X  A exists in R with X not a super key, and A is a prime attribute, then R will be in 3NF but not in BCNF.

    • If the relation has only one candidate key, if you manage to put the relation in 3NF then you will achieve BCNF automatically except in the above rare case.


    Normalizing a relation

    Normalizing a Relation

    • To normalize a relation, means to rid the relation of the modifications anomalies.

    • This usually means breaking up the relation into two (or more) relations. This is called decomposition.

    • Two normalization algorithms are available:

      • 3NF Algorithm:

        • Guarantees every resulting relation is in 3NF

        • Guarantees Lossless Join property

        • Guarantees FD perseverance

      • BCNF Algorithm:

        • Guarantees every resulting relation is in BCNF

        • Guarantees Lossless Join property


    3nf algorithm

    3NF Algorithm

    Step 1: Get the minimal cover for the FDs, and work with it instead of the original FDs.

    Step 2:Combine the attributes on RHSs of any FDs that have the same LHS (i.e A  B and A  C will be combined into A  BC (standard form))

    Step 3: Find the candidate key(s).

    Step 4:Output each of the FDs as a relation by itself (i.e. the FD AB  C will be outputted as the relation (A, B, C)).

    Step 5: If none of the relations (from step 4) contains a key (i.e. we need at least one key), then create one more relation that contains the attributes that form a key.


    Notes on 3nf algorithm

    Notes on 3NF Algorithm:

    • If two relations have the same set of attributes, then eliminate one of them.

    • Always underline a key in each of the new decomposed relations.

    • The key for a new decomposed relation is the determinant of the FD that the relation has resulted from (i.e. AB  C will result in the relation (A, B, C)).


    3nf algorithm example1

    Answer:

    Step 1: As above (no change).

    Step 2: As above (no change).

    Step 3: The key is BE

    Step 4:FD1 R1(B, D)FD2 R2(E, C)

    FD3 R3(A, C, D)FD4 R4(A, C, D)duplicate

    Step 5: None of the relations contains the key, we need

    another relation: R4(B, E)

    Final results:R1(B, D), R2(C, E), R3(A, C, D), and R4(B, E)

    3NF Algorithm Example1

    Normalize the relationR (A, B, C, D, E) with FDs:

    FD1:B  DFD2:E  C FD3: AC  D FD4: CD  A


    Old fd diagram

    Old FD Diagram

    A

    B

    C

    D

    E

    R


    New fd diagram

    New FD Diagram

    B

    D

    A

    C

    D

    R1

    R3

    E

    C

    R2

    B

    E

    R4


    3nf algorithm example2

    Answer:

    Step 1: As above (no change).

    Step 2: As above (no change).

    Step 3: The key is {EmpName, ChildName}

    Step 4:FD1 R1(EmpName, SpouseName)

    FD2 R2(SpouseName, SpouseAge)

    Step 5: None of the relations contains the key, we need

    another relation: R3(EmpName, ChildName)

    Final results:R1, R2 and R3 as above.

    3NF Algorithm Example2

    Normalize the relationR (EmpName, SpouseName, ChildName, SpouseAge) with FDs: FD1:EmpName  SpouseName

    FD2: SpouseName  SpouseAge


    Old fd diagram1

    Old FD Diagram

    EmpName

    SpouseName

    SpouseAge

    ChildName

    R


    New fd diagram1

    New FD Diagram

    EmpName

    ChildName

    R1

    EmpName

    SpouseName

    R2

    SpouseName

    SpouseAge

    R3


    3nf algorithm example3

    Answer:

    Step 1:FD1:A  B, FD2:BC  D ,FD3: D  A ,

    FD4: D  C , FD4: D  E

    Step 2:FD1:A  B, FD2:BC  D ,FD3: D  ACE

    Step 3: The CK is D

    Step 4:FD1 R1(A, B)FD2 R2(B, C, D)

    FD3 R3(A, C, D, E)

    Step 5: The CK is in relation R3 above

    Final results:R1(A, B), R2(B, C, D) and R3(A, C, D, E)

    3NF Algorithm Example3

    Normalize the relationR (A, B, C, D, E) with FDs:

    FD1:A  BFD2:BC  D FD3: D ACE


    Bcnf algorithm

    BCNF Algorithm

    • Step 1: Find an FD that violates BCNF (explicit and implicit)

    • Step 2: Split R into:

      • R1 which contains the attributes of the violating FD.

      • R2 which contains all the attributes in the original relation except the attributes on the RHS of the violating FD.

    • Step 3: Check each of the new relations and repeat steps 1 and 2 on each one of them when the violation is found.


    Notes on bcnf algorithm

    Notes on BCNF Algorithm:

    • This is a universal algorithm that can work to put any relation into 2NF, 3NF or BCNF.

      (i.e. if you want to put a relation into 2NF, then in step 1 of the algorithm, find an FD that violates 2NF).

    • It is better (but not a must) to start with the minimal cover as it gives less FDs to work with.

    • It is better (but not a must) to apply the algorithm to higher violations first (i.e. start with FDs that explicitly violate BCNF, then FDs that explicitly violate 3NF … etc.).


    Bcnf algorithm notes cont

    BCNF Algorithm Notes cont.

    • If you want to preserve the FDs of a relation, then be satisfied with 3NF

    • You can try to apply 3NF (which guarantees that every resulting relation is in 3NF), then check each resulting relation if it is in BCNF. If so, then GREAT; otherwise, you can try applying the BCNF algorithm and risk loosing some of the FDs


    Bcnf algorithm example

    Answer: Remember: The key is {EmpName, ChildName}.

    SinceEmpNname  SpouseName is an explicit violation to 2NF

    and SpouseName  SpouseAge is a explicit violation to 3NF

    We start with the highest violation first:

    We will split SpouseName and SpouseAge into a separate relation:

    R3(SpouseName, SpouseAge)

    Now: what remains in R is what was there originally except what is on the right hand side of the violating FDs (i.e. we exclude SpouseAge):R(EmpName, ChildName, SpouseName)

    BCNF Algorithm Example

    Normalize the relationR (EmpName, SpouseName, ChildName, SpouseAge)with FDs:FD1:EmpName  SpouseName

    FD2: SpouseName  SpouseAge


    Bcnf algorithm example cont

    Answer cont.: So far we have:R3(SpouseName, SpouseAge)

    and R(emp-name, child-name).

    Now we consider the other violation:

    We will split EmpName and SpouseName into a separate relation:

    R2(EmpName, SpouseName)

    Now: what remains in R is what was there originally except what is on the right hand side of the violating FDs (i.e. we exclude SpouseName): R(EmpName, ChildName)

    BCNF Algorithm Example cont.

    Normalize the relationR (EmpName, SpouseName, ChildName, SpouseAge)with FDs:FD1:EmpName  SpouseName

    FD2: SpouseName  SpouseAge


    Schema refinement and normal forms

    1. BCNF decomposition

    Normalize the relation R(A,B,C,D,E)

    with FDs:D->B, CE->A

    DCE is a minimal key.

    R violates BCNF since D (in itself) is not a key.

    D+=DB and we split by D->B. The result is DB, DCEA

    DCEA violates BCNF since CE is not a key. CE+=CEA and we split by CE->A. The result is CEA, CED

    CEA, CED do not violate BCNF.

    R1(D,B), R2(C,E,A) , R3(C,E,D)


    Schema refinement and normal forms

    Normalize the relation S(A,B,C,D,E)

    With FDs: A->E, BC->A, DE->B

    DCE, DCB, DCA are minimal keys

        S violates BCNF since A is not a key

        A+=AE and we split by A->E. The result is AE, ABCD

        ABCD violates BCNF since BC is not a key

        BC+=BCA and we split by BC->A. The result is BCA, BCD.

    BCA, BCD do not violate BCNF

    Answer: AE, BCA, BCD


    Schema refinement and normal forms

    2. 3NF decomposition

      R(A,B,C,D,E); D->B, CE->A

    DCE is a minimal key

    The first condition is already violated (as above for BCNF)

      R violates 3NF since B is not a prime (part of a key)

      D+=DB and we split by D->B. The result is DB, DCEA

    DCEA violates 3NF since A is not a prime.

    CE+=CEA and we split by CE->A. The result is CEA, CED

    CEA, CED do not violate 3NF.

    Answer: DB, CEA, CED


    Schema refinement and normal forms

    S(A,B,C,D,E); A->E, BC->A, DE->B

    DCE, DCB, DCA are minimal keys

    S does not violate 3NF since each right side of given FDs is a prime (part of a key)

    Answer: ABCDE


    Successive normalization example

    Successive Normalization Example

    Normalize the relationR (A, B, C, D, E, F) with FDs:

    FD1:AB  CDEFFD2:B  C FD3: D  F

    Notice: The Key is AB

    R1(B, C)

    R (A, B, D, E, F)

    Remove 2NF

    Violation: B  C

    R (A, B, C, D, E, F)

    R1(B, C)

    R1(D, F)

    R (A, B, D, E)

    Remove 3NF

    Violation: D  F

    R1(B, C) with FD:B  C is in BCNF.Why?

    R (A, B, D, E, F)

    with FDs:AB  DEF &D  F is in 2NF.Why?

    R (A, B, D, E)

    with FD:AB  DE is in BCNF.Why?

    R1(D, F) with FD:D  F is in BCNF.Why?


    Try this important

    Try this …… IMPORTANT

    Assume that AC is the key for the relation R (A,B,C,D,E), and in addition the following FDs hold on the relation R:

    • FD1: A  B

    • FD2: D  E

    • What is the best normal form for the relation R?

    • Normalize the relation R.


    Final word

    R (A, B, C) with FDs: FD1:AB  CFD2:C  B

    Discussion:

    • The CKs are: AB and AC

    • C is a determinant in FD2, but C is not a superkey

    • This is an explicit violation to BCNF, so R is not in BCNF.

    • Problem: Any decomposition of R will fail to preserve the FD AB  C(FD1).

    Final Word …

    • It is not always possible to get a BCNF decomposition that preserves the functional dependencies of a relation, as this example will show.


  • Login