- 57 Views
- Uploaded on
- Presentation posted in: General

Functional Dependency Armstrong’s Axioms Closure of FDs Closure of attribute(X) – Find Keys

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Chapter 15Functional Dependencies and Normalization for Relational DatabasesChapter 16Relational Database Design Algorithms and Further Dependencies

- Functional Dependency
- Armstrong’s Axioms
- Closure of FDs
- Closure of attribute(X) – Find Keys
- Minimal set of FDs
- Normalization
- Lossless Join Decomposition

- 1 Informal Design Guidelines for Relational Databases
- 1.1Semantics of the Relation Attributes
- 1.2 Redundant Information in Tuples and Update Anomalies
- 1.3 Null Values in Tuples
- 1.4 Spurious Tuples

- 2 Functional Dependencies (FDs)
- 2.1 Definition of FD
- 2.2 Inference Rules for FDs
- 2.3 Equivalence of Sets of FDs
- 2.4 Minimal Sets of FDs

3 Normal Forms Based on Primary Keys

3.1 Normalization of Relations

3.2 Practical Use of Normal Forms

3.3 Definitions of Keys and Attributes Participating in Keys

3.4 First Normal Form

3.5 Second Normal Form

3.6 Third Normal Form

4 General Normal Form Definitions (For Multiple Keys)

5 BCNF (Boyce-Codd Normal Form)

- What is relational database design?
- The grouping of attributes to form "good" relation schemas

- Two levels of relation schemas
- The logical "user view" level
- The storage "base relation" level

- Design is concerned mainly with base relations
- What are the criteria for "good" base relations?

A simplified COMPANY relational database schema

- Information is stored redundantly
- Wastes storage
- Causes problems with update anomalies
- Insertion anomalies
- Deletion anomalies
- Modification anomalies

Attribute B is functionally dependent on attribute A if at each point in time, each value of A has only one value of B associated with it.

- SSN ENAME
- PNUMBER{PNAME, PLOCATION}
- {SSN, PNUMBER}HOURS

A functional dependency (X Y)

between two sets of attributes X and Y that are

subsets of R specifies a constraint on the

possible tuples that can form a relation state r

of R. The constraint is that, for any two tuples

t1 and t2 in r that have t1[X] = t2[X], we must

also have t1[Y] = t2[Y].

--X is a candidate keyof R—this implies that XYfor any subset of attributes Y of R

--If X Yin R, this does not say whether ornot Y Xin R.

F = {SSN {ENAME, BDATE, ADDRESS, DNUMBER},

DNUMBER {DNAME,DMGRSSN}}

Can infer additional FDs such as:

SSN {DNAME, DMGRSSN},

SSN SSN,

DNUMBER DNAME

- Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold
- Armstrong's inference rules:
- IR1. (Reflexive) If Y subset-of X, then X -> Y
- IR2. (Augmentation) If X -> Y, then XZ -> YZ
- (Notation: XZ stands for X U Z)

- IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z

- IR1, IR2, IR3 form a sound and complete set of inference rules
- These are rules hold and all other rules that hold can be deduced from these

IR4 (decomposition, or projective rule):

{X YZ} |= X Y.

IR5 (union, or additive rule):

{X Y, X Z} |= X YZ.

IR6 (pseudotransitive rule)

{X Y, WY Z } |= WX Z.

- IR1 – Reflexive rule
X Y, then X Y

ssn, name ssn

- IR2 – Augmentation rule
X Y |= XZ YZ

ssn name |= ssn, dob name,dob

- IR3 – Transitive rule
X Y, Y Z |= X Z

Ssn phone#

Phone# zip

Ssn zip

IR4 (decomposition, or projective, rule):

{X YZ} |= X Y.

- X YZ (given).
- YZ Y (using IR1 and knowing that YZ Y).
- X Y (using IR3 on 1 and 2).

IR5 (union, or additive, rule)

{X Y, X Z} |= X YZ.

XY (given).

XZ (given).

X XY (using IR2 on 1 by augmenting with X; notice that XX = X).

XY YZ (using IR2 on 2 by augmenting with Y).

X YZ (using IR3 on 3 and 4).

IR6 (pseudotransitive rule)

{X Y, WY Z } |= WX Z.

X Y (given).

WY Z (given).

WX WY (using IR2 on 1 by augmenting with W).

WX Z (using IR3 on 3 and 2).

a) {w y, x z} |= {wx y}

{w y, x z} |= {wx y}

w y Given

wx xyAug(X)

xy y Reflexive

wx yTransitive

STUID DOB

C# C_TITLE

STUID, C# DOB

Set of FDs that could be derived from FDs using Armstrong’s Axioms

ssn name

ssn dob

=> Ssn name, dob

F =

{ SSNENAME,

NUMBER{NAME,PLOCATION},

{SSN, PNUMBER}HOURS

}

{ SSN }+ = { SSN, ENAME }

{ PNUMBER }+ =

{ PNUMBER, PNAME, PLOCATION }

{ SSN, PNUMBER }+ =

{ SSN, PNUMBER, ENAME, PNAME, PLOCATION, HOURS }

- Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F
- Closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X
- X+ can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F

- Closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X
Algorithm 16.1Determining X+, the closure of X

under F

X+ := X

Repeat

old X+ := X+;

for each functional dependency Y Z in F do

if X+ Y then X+ := X+ U Z;

Until ( X+ = old X+ );

p 310

FDs:

A B

B CD

C M

D FG

A+ = A

= AB (A B)

= ABCD (B CD)

= ABCDM (C M)

= ABCDFGM (DFG)

A IS A KEY OF THIS RELATION

STUID F,M,L,STREET, CITY,STATE,ZIP

C# CNAME

ZIP CITY, STATE

STUID,C# GRADE

(STUID)+ = STUID

= STUID, F,M,L,STREET, CITY,STATE,ZIP

(C#)+ = C#

= C#, CNAME

(STUID,C#)+ = STUID,C#

=STUID,C#, F,M,L,STREET, CITY,STATE,ZIP

= STUID,C#, F,M,L,STREET, CITY,STATE,ZIP ,CNAME

= STUID,C#, F,M,L,STREET, CITY,STATE,ZIP ,CNAME,GRADE

Input: A universal relation R and a set of functional dependencies F on the attributes of R.

1.Set K := R;

2.For each attribute A in K {

Compute (K - A)+ with respect to F;

If (K - A)+ contains all the attributes in R,

then set K := K - {A};

}

FDs:

AC B

B CD

C MN

D FG

Is (AC) a candidate key?

(AC)+ = ABCDFGMN

A+ = A

C+ = CMN

(AC)+ = AC

= ABC (AC B)

= ABCD (B CD)

= ABCDMN (C MN)

= ABCDMNFG (D FG)

FDs:

AC B

B CD

C MN

D FG

A C

Is (AC) a candidate key?

(AC)+ = ABCDFGMN

A+ = AC

= ABC (AC B)

= ABCD (B CD)

= ABCDMN (C MN)

= ABCDMNFG (D FG)

FDs:

A B

B CD

C MN

D FG

ABCDFGMN

ACDFGMN A B

AFGMN A B, B CD

AFG A B, B C, C MN

A A B, B D, D FG

FDs:

AC B

B CD

C MN

D FG

A C

Is (AC) a candidate key?

(AC)+ = ABCDFGMN

A+ =

C+ =

Patient(Patient#, Pat_Name, Pat_Adr, Med_Exam_Description, Doctor_Number,

Med_Exam#, Date_Test, Test_Result, Doctor_Name)

- Key:

Hotel_Bill(Guest_Id, Room#, Room_Rate, Date_Checkin, Total_Amount, Guest_Fname,

Guest_Lname, InvoiceNumber, Guest_Address)

Key:

F is minimal if:

- Every dependency in F has a single attribute for its right-hand side.
- We cannot replace any dependency X A in F with a dependency Y A, where Y is a proper subset of X, and still have a set of dependencies that is equivalent to F.
- We cannot remove any dependency from F and still have a set of dependencies that is equivalent to F.

Input: a set of functional dependencies E.

1. Set F := E.

2. Replace each functional dependency X {A1, A2,…, An} in F by the n functional dependencies XA1,

XA2, . . ., XAn .

3. For each functional dependency XA in F

for each attribute B that is an element of X

if ((F - {XA}) D {(X - {B}) A}) is equivalent to F,

then replace XA with (X - {B}) A in F.

4. For each remaining functional dependency XA in F

if (F - {XA}) is equivalent to F,

then remove XA from F.

1) The set of functional dependencies F of relation R(ABCDEFGHIJKLM) is

A A,B,C,D,E

B C,D,E,F

C E,H

D,E F

D F,G

K L

L K

a) Find a minimal cover of the set F.

b) Find a candidate key for the relation R.

Normalization: The process of decomposing complex data structures into simple relations according to a set of dependency rules.

– reduce anomalies (update, insert, delete)

first proposed by Codd (1972)

Assumption: all non-key fields will be updated frequently

- tend to penalize retrieval

- Insertion anomalies
- Deletion anomalies
- Modification anomalies

- Consider the relation:
- EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)

- Update Anomaly:
- Changing the name of project number P1 from “Billing” to “Customer-Accounting” may cause this update to be made for all 100 employees working on project P1.

- Consider the relation:
- EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)

- Insert Anomaly:
- Cannot insert a project unless an employee is assigned to it.

- Conversely
- Cannot insert an employee unless a he/she is assigned to a project.

- Consider the relation:
- EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)

- Delete Anomaly:
- When a project is deleted, it will result in deleting all the employees who work on that project.
- Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project.

- Normalization:
- The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations

- Normal form:
- Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form

- 2NF, 3NF, BCNF
- based on keys and FDs of a relation schema

- 4NF
- based on keys, multi-valued dependencies : MVDs; 5NF based on keys, join dependencies : JDs (Chapter 11)

- Additional properties may be needed to ensure a good relational design (lossless join, dependency preservation; Chapter 11)

- If a relation schema has more than one key, each is called a candidate key.
- One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

- A Prime attribute must be a member of some candidate key
- A Nonprime attribute is not a prime attribute—that is, it is not a member of any candidate key.

Decomposition by analysis

Decomposition by synthesis

Unnormalized & 1NF relation

A relation is in first normal form if and only if every attribute is single-valued for each tuple.

-- 1NF disallows having a set of values, a tuple of values, or a combination of both as an attribute value for a single tuple. In other words, 1NF disallows "relations within relations" or "relations as attributes of tuples." The only attribute values permitted by 1NF are single atomic (or indivisible) values.

- Disallows
- composite attributes
- multivalued attributes
- nested relations; attributes whose values for an individual tuple are non-atomic

- Uses the concepts of FDs, primary key
- Definitions
- Prime attribute: An attribute that is member of the primary key K
- Full functional dependency: a FD Y -> Z where removal of any attribute from Y means the FD does not hold any more

- Examples:
- {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS hold
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency ) since SSN -> ENAME also holds

- A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
- R can be decomposed into 2NF relations via the process of 2NF normalization

A relation is in second normal form if and only if it is in first normal form and all the nonkey attributes are fully functionally dependent on the key.

Grade_report(SID, C#, stu_name, course_title)

not in 2NF (there are partial dependencies)

A functional dependency X Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more; that is, for any attribute AX, (X - {A}) does not functionally determine Y.

- Second normal form (2NF) is based on the concept of full functional dependency. A functional dependency X Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more; that is, for any attribute A X, (X - {A}) does not functionally determine Y. A functional dependency X Y is a partial dependency if some attribute A X can be removed from X and the dependency still holds; that is, for some A X, (X - {A}) Y.

- Second normal form
- is violated when a non-key field is a fact about a subset of a key.
- It is only relevant when the key is composite, i.e., consists of several fields

Function Dependencies (F):

Warehouse Warehouse_address

Part, Warehouse quantity

William Kent (CACM1982)

Normalizing into 2NF and 3NF

- Definition:
- Transitive functional dependency: a FD X -> Z that can be derived from two FDs X -> Y and Y -> Z

- Examples:
- SSN -> DMGRSSN is a transitive FD
- Since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold

- SSN -> ENAME is non-transitive
- Since there is no set of attributes X where SSN -> X and X -> ENAME

- SSN -> DMGRSSN is a transitive FD

- A relation schema R is in third normal form (3NF)if it is in 2NFandno non-prime attribute A in R is transitively dependent on the primary key
- R can be decomposed into 3NF relations via the process of 3NF normalization
- NOTE:
- In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key.
- When Y is a candidate key, there is no problem with the transitive dependency .
- E.g., Consider EMP (SSN, Emp#, Salary ).
- Here, SSN -> Emp# -> Salary and Emp# is a candidate key.

Successive Normalization of LOTS into 2NF and 3NF

- 1st normal form
- All attributes depend on the key

- 2nd normal form
- All attributes depend on the whole key

- 3rd normal form
- All attributes depend on nothing but the key

SUMMARY OF NORMAL FORMS based on Primary Keys

- The above definitions consider the primary key only
- The following more general definitions take into account relations with multiple candidate keys
- A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on every key of R

- Definition:
- Superkey of relation schema R - a set of attributes S of R that contains a key of R
- A relation schema R is in third normal form (3NF) if whenever a FD X -> A holds in R, then either:
- (a) X is a superkey of R, or
- (b) A is a prime attribute of R

- NOTE: Boyce-Codd normal form disallows condition (b) above

Third Normal Form(3NF):

A relation is in third normal form if it is in second normal form and no nonkey attribute is transitively dependent on the key.

Course_info(C#, course_title, Instructure_name, Inst_phone)

Not in 3NF – there is a transitive dependency

C# course_title, Instructure_name

Instructure_name Inst_phone

- Third normal form (3NF) is based on the concept of transitive dependency. A functional dependency X Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R, and both X Z and ZY hold.
- Codd’s original definition, a relation schema R is in 3NF if it satisfies 2NFand no nonprime attribute of R is transitively dependent on the primary key

Function Dependencies (F):

Employee Department

Department Location

William Kent ( CACM 1982)

Unnormalized & 1NF relation

University Database

(Third Normal Form relations)

- Philip Bernstein, “Synthesizing Third Normal Form Relations from Functional Dependencies,” ACM Transactions on Database Systems, 1976, pp 277-298. (University of Toronto)
To find 3NF relations

- Find Minimal Cover of FDs

- Each FD— convert to 3NF relation

- If key of the Universal DB is not in any relations then create a relation from the key

A) INVOICE(INVOICE#, CUST_ID, CUST_NAME, BILL_DATE, TOT_AMOUNT, SALES_REP#, SALES_REP_NAME)

Key: Normal Form:

3NF relations:

B) PRJ_EMP(PROJECT#, MANAGER_ID, PROJECT_TITLE, EMP_ID, EMP_LNAME, EMP_FNAME, MANAGER_NAME, EXPECT_END_PROJECT_DATE, EMP_HOURLY_RATE,HOURS_WORK_PER_PRJ)

Key: Normal Form:

3NF relations:

A relation is in Boyce-Codd normal form if and only if every determinant is a candidate key (1974).

(every relation in BCNF is also in 3NF)

Determinant– attribute(s) on the LHS of FD

A BC (A is determinant)

- A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever an FD X -> A holds in R, then X is a superkey of R
- Each normal form is strictly stronger than the previous one
- Every 2NF relation is in 1NF
- Every 3NF relation is in 2NF
- Every BCNF relation is in 3NF

- There exist relations that are in 3NF but not in BCNF
- The goal is to have each relation in BCNF (or 3NF)

- Two FDs exist in the relation TEACH:
- fd1: { student, course} -> instructor
- fd2: instructor -> course

- {student, course} is a candidate key for this relation and that the dependencies shown follow the pattern in Figure 10.12 (b).
- So this relation is in 3NF but not in BCNF

- A relation NOT in BCNF should be decomposed so as to meet this property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
- (See Algorithm 11.3)

Let R be a relation having attributes or sets of attributes A,B,C. There is a multivalued dependence of attribute B on attribute A if and only if the set of B values associated with a given A value is independent of the C values.

A > B,C

B -/-> C or C -/-> B

A relation is in fourth normal form if and only if it is in Boyce-Codd normal form and there are no nontrivial multivalued dependencies.

Employee > Skill, Language

Language --/ skill

William Kent (1982)

A relation is in fifth normal form if no remaining nonloss projections are possible, except the trivial one in which the key appears in each projection.

William Kent (CACM 1982)

A relation is in domain-key normal form if every constraint is a logical consequence of domain constraints or key constraints.

Classify each of the following relations as 1NF, 2NF, 3NF, BCNF, 4NF, or 5NF. (State any assumptions you make)

a) COURSE(COURSE#,C_TITLE,INS_NAME,TEXT_BOOK,PUBLISHER)

b) ENROLLMENT(STUID,COURSE#,STUNAME,COURSE_TITLE,

GRADE)

c) PART(PART#,PART_NAME,DEPARTMENT,PRICE)

d) EMPLOYEE(EMPID,EMPNAME,DATE_HIRED,JOB_TITLE,

JOB_LEVEL,PROJ#,PROJ_TITLE,DUEDATE)

e) BOOK(ISBN,TITLE,AUTHOR,AUTHOR_INSTITUTE,PRICE,

PUBLISHER_NAME,PUBLISHER_ADR)

f) CUSTOMER(CUSTID,CUSTNAME, PHONE#,

STREET,CITY,STATE,ZIP)

(R1, R2) is a decomposition of R

F is a set of functional dependencies

(R1, R2) is lossless join with to F if and only if

R1 R2 R1 – R2

or R1 R2 R2 – R1

And R1 U R2 = R

- Dependency preservation: Let Fibe the set of dependencies F+ that include only attributes in Ri.
- dependency preserving
(F1 F2 … Fn)+ = F+

- Otherwise, checking updates for violation of functional dependencies may require computing joins, which is expensive.

- dependency preserving

Example -- dependency preservation:

R = (A, B, C)F = {A B, B C)

Two ways to decompose this relation:

1) R1 = (A, B), R2 = (B, C)

- Lossless-join decomposition:
R1 R2 = {B} and R2 -R1 = C

- Dependency preserving
2) R1 = (A, B), R2 = (A, C)

- Lossless-join decomposition:
R1 R2 = {A} and R1 -R2 = B

Not dependency preserving (cannot check B C without computing R1 x R2)

3NF – always dependency preservation

BCNF – not always

1) The set of functional dependencies F of relation R(ABCDEFGHIJKLM) is

A A,B,C,D,E

B C,D,E,F

C E,H

D,E F

D F,G

K L

L K

a) Find a minimal cover of the set F.

b) Find a candidate key for the relation R.

c) Find a lossless join decomposition of R into 3NF (also give a proof of lossless join).

2) The set of functional dependencies F of relation R(ABCDEFGHIJKL) is

A ---> B,C,D,E,K

BC ---> D,E,F

C ---> K,H

D,B---> G

D ---> F,G

K ---> L

a) Find a minimal cover of the set F.

b) Find a candidate key for the relation R.

c) Find a lossless join decomposition of R into 3NF (also give a proof of lossless join).

- REFERENCES
- E.F. Codd, "A Relational Model of Data for Large Shared Data Banks", Comm. ACM 13 (6), June 1970, pp. 377-387. The original paper introducing the relational data model.
- E.F. Codd, "Normalized Data Base Structure: A Brief Tutorial", ACM SIGFIDET Workshop on Data Description, Access, and Control, Nov. 11-12, 1971, San Diego, California, E.F. Codd and A.L. Dean (eds.). An early tutorial on the relational model and normalization.
- E.F. Codd, "Further Normalization of the Data Base Relational Model", R. Rustin (ed.), Data Base Systems (Courant Computer Science Symposia 6), Prentice-Hall, 1972. Also IBM Research Report RJ909. The first formal treatment of second and third normal forms.
- C.J. Date, An Introduction to Database Systems (third edition), Addison-Wesley, 1981. An excellent introduction to database systems, with emphasis on the relational.
- R. Fagin, "Multivalued Dependencies and a New Normal Form for Relational Databases", ACM Transactions on Database Systems 2 (3), Sept. 1977. Also IBM Research Report RJ1812. The introduction of fourth normal form.
- R. Fagin, "Normal Forms and Relational Database Operators", ACM SIGMOD International Conference on Management of Data, May 31-June 1, 1979, Boston, Mass. Also IBM Research Report RJ2471, Feb. 1979. The introduction of fifth normal form.
- W. Kent, "A Primer of Normal Forms", IBM Technical Report TR02.600, Dec. 1973. An early, formal tutorial on first, second, and third normal forms.
- T.-W. Ling, F.W. Tompa, and T. Kameda, "An Improved Third Normal Form for Relational Databases", ACM Transactions on Database Systems, 6(2), June 1981, 329-346. One of the first treatments of inter-relational dependencies.

a) {w y, x z} |= {wx y}

{w y, x z} |= {wx y}

w y Given

wx xyAug(X)

xy y Reflexive

wx yTransitive

STUID DOB

C# C_TITLE

STUID, C# DOB