Functional Dependency Armstrong’s Axioms Closure of FDs Closure of attribute(X) – Find Keys

Chapter 15Functional Dependencies and Normalization for Relational DatabasesChapter 16Relational Database Design Algorithms and Further Dependencies

Functional Dependency • Armstrong’s Axioms • Closure of FDs • Closure of attribute(X) – Find Keys • Minimal set of FDs • Normalization • Lossless Join Decomposition

Outline • 1 Informal Design Guidelines for Relational Databases • 1.1Semantics of the Relation Attributes • 1.2 Redundant Information in Tuples and Update Anomalies • 1.3 Null Values in Tuples • 1.4 Spurious Tuples • 2 Functional Dependencies (FDs) • 2.1 Definition of FD • 2.2 Inference Rules for FDs • 2.3 Equivalence of Sets of FDs • 2.4 Minimal Sets of FDs

Outline 3 Normal Forms Based on Primary Keys 3.1 Normalization of Relations 3.2 Practical Use of Normal Forms 3.3 Definitions of Keys and Attributes Participating in Keys 3.4 First Normal Form 3.5 Second Normal Form 3.6 Third Normal Form 4 General Normal Form Definitions (For Multiple Keys) 5 BCNF (Boyce-Codd Normal Form)

What is relational database design? • The grouping of attributes to form "good" relation schemas • Two levels of relation schemas • The logical "user view" level • The storage "base relation" level • Design is concerned mainly with base relations • What are the criteria for "good" base relations?

A simplified COMPANY relational database schema

1.2 Redundant Information in Tuples and Update Anomalies • Information is stored redundantly • Wastes storage • Causes problems with update anomalies • Insertion anomalies • Deletion anomalies • Modification anomalies

Functional Dependency: Attribute B is functionally dependent on attribute A if at each point in time, each value of A has only one value of B associated with it.

Examples: • SSN ENAME • PNUMBER{PNAME, PLOCATION} • {SSN, PNUMBER}HOURS

functional dependency A functional dependency (X Y) between two sets of attributes X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation state r of R. The constraint is that, for any two tuples t1 and t2 in r that have t1[X] = t2[X], we must also have t1[Y] = t2[Y].

--X is a candidate keyof R—this implies that XYfor any subset of attributes Y of R --If X  Yin R, this does not say whether ornot Y Xin R.

Inference Rules: F = {SSN {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER {DNAME,DMGRSSN}} Can infer additional FDs such as: SSN {DNAME, DMGRSSN}, SSN SSN, DNUMBER DNAME

Inference rules (Armstrong’s Axioms) • Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold • Armstrong's inference rules: • IR1. (Reflexive) If Y subset-of X, then X -> Y • IR2. (Augmentation) If X -> Y, then XZ -> YZ • (Notation: XZ stands for X U Z) • IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z • IR1, IR2, IR3 form a sound and complete set of inference rules • These are rules hold and all other rules that hold can be deduced from these

Inference rules (Armstrong’s Axioms) -continued IR4 (decomposition, or projective rule): {X  YZ} |= X Y. IR5 (union, or additive rule): {X  Y, X  Z} |= X  YZ. IR6 (pseudotransitive rule) {X  Y, WY  Z } |= WX  Z.

Armstrong’s Axioms • IR1 – Reflexive rule X  Y, then X  Y ssn, name  ssn • IR2 – Augmentation rule X  Y |= XZ  YZ ssn  name |= ssn, dob  name,dob

IR3 – Transitive rule X  Y, Y  Z |= X  Z Ssn  phone# Phone#  zip Ssn  zip

PROOF OF IR4 (USING IR1 THROUGH IR3) IR4 (decomposition, or projective, rule): {X YZ} |= X Y. • X YZ (given). • YZ Y (using IR1 and knowing that YZ Y). • X Y (using IR3 on 1 and 2).

PROOF OF IR5 (USING IR1 THROUGH IR3) IR5 (union, or additive, rule) {X  Y, X  Z} |= X  YZ. XY (given). XZ (given). X  XY (using IR2 on 1 by augmenting with X; notice that XX = X). XY YZ (using IR2 on 2 by augmenting with Y). X YZ (using IR3 on 3 and 4).

PROOF OF IR6 (USING IR1 THROUGH IR3) IR6 (pseudotransitive rule) {X Y, WY Z } |= WX Z. X Y (given). WY Z (given). WX WY (using IR2 on 1 by augmenting with W). WX Z (using IR3 on 3 and 2).

Exercise a) {w  y, x  z} |= {wx  y} {w y, x  z} |= {wx y} w  y Given wx  xy Aug(X) xy  y Reflexive wx  y Transitive STUID  DOB C#  C_TITLE STUID, C#  DOB

Closure of FDs (F+) Set of FDs that could be derived from FDs using Armstrong’s Axioms ssn  name ssn  dob => Ssn  name, dob

closure sets with respect to F (Example) F = { SSNENAME, NUMBER{NAME,PLOCATION}, {SSN, PNUMBER}HOURS } { SSN }+ = { SSN, ENAME } { PNUMBER }+ = { PNUMBER, PNAME, PLOCATION } { SSN, PNUMBER }+ = { SSN, PNUMBER, ENAME, PNAME, PLOCATION, HOURS }

Inference Rules for FDs (3) • Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F • Closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X • X+ can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F

Closure of attribute X (X+) • Closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X Algorithm 16.1Determining X+, the closure of X under F X+ := X Repeat old X+ := X+; for each functional dependency Y  Z in F do if X+ Y then X+ := X+ U Z; Until ( X+ = old X+ ); p 310

ExampleR(A,B,C,D,M,F,G) FDs: A  B B  CD C M D  FG A+ = A = AB (A  B) = ABCD (B  CD) = ABCDM (C  M) = ABCDFGM (DFG) A IS A KEY OF THIS RELATION

R(STUID,F,M,L,STREET,CITY,STATE,ZIP,C#,CNAME,GRADE) STUID  F,M,L,STREET, CITY,STATE,ZIP C#  CNAME ZIP  CITY, STATE STUID,C#  GRADE (STUID)+ = STUID = STUID, F,M,L,STREET, CITY,STATE,ZIP (C#)+ = C# = C#, CNAME (STUID,C#)+ = STUID,C# =STUID,C#, F,M,L,STREET, CITY,STATE,ZIP = STUID,C#, F,M,L,STREET, CITY,STATE,ZIP ,CNAME = STUID,C#, F,M,L,STREET, CITY,STATE,ZIP ,CNAME,GRADE

Algorithm 16.2 (a): Finding a Key K for R Given a set F of Functional Dependencies Input: A universal relation R and a set of functional dependencies F on the attributes of R. 1.Set K := R; 2.For each attribute A in K { Compute (K - A)+ with respect to F; If (K - A)+ contains all the attributes in R, then set K := K - {A}; }

Find key of R(ABCDFGMN) FDs: AC  B B  CD C MN D  FG Is (AC) a candidate key? (AC)+ = ABCDFGMN A+ = A C+ = CMN (AC)+ = AC = ABC (AC B) = ABCD (B  CD) = ABCDMN (C MN) = ABCDMNFG (D  FG)

Find key of R(ABCDFGMN) FDs: AC  B B  CD C MN D  FG A  C Is (AC) a candidate key? (AC)+ = ABCDFGMN A+ = AC = ABC (AC B) = ABCD (B  CD) = ABCDMN (C MN) = ABCDMNFG (D  FG)

Find key of R(ABCDFGMN) FDs: A  B B  CD C  MN D  FG ABCDFGMN ACDFGMN A  B AFGMN A  B, B  CD AFG A  B, B  C, C MN A A  B, B  D, D  FG

Find key of R(ABCDFGMN) FDs: AC  B B  CD C MN D  FG A  C Is (AC) a candidate key? (AC)+ = ABCDFGMN A+ = C+ =

Example: Patient(Patient#, Pat_Name, Pat_Adr, Med_Exam_Description, Doctor_Number, Med_Exam#, Date_Test, Test_Result, Doctor_Name) • Key:

Example: Hotel_Bill(Guest_Id, Room#, Room_Rate, Date_Checkin, Total_Amount, Guest_Fname, Guest_Lname, InvoiceNumber, Guest_Address) Key:

16.1.3 Minimal Sets of Functional Dependencies F is minimal if: • Every dependency in F has a single attribute for its right-hand side. • We cannot replace any dependency X A in F with a dependency Y A, where Y is a proper subset of X, and still have a set of dependencies that is equivalent to F. • We cannot remove any dependency from F and still have a set of dependencies that is equivalent to F.

Algorithm 16.2 Finding a minimal cover F for a set of functional dependencies E (p550) Input: a set of functional dependencies E. 1. Set F := E. 2. Replace each functional dependency X {A1, A2,…, An} in F by the n functional dependencies XA1, XA2, . . ., XAn . 3. For each functional dependency XA in F for each attribute B that is an element of X if ((F - {XA}) D {(X - {B}) A}) is equivalent to F, then replace XA with (X - {B}) A in F. 4. For each remaining functional dependency XA in F if (F - {XA}) is equivalent to F, then remove XA from F.

Example 1) The set of functional dependencies F of relation R(ABCDEFGHIJKLM) is A  A,B,C,D,E B  C,D,E,F C  E,H D,E  F D  F,G K  L L  K a) Find a minimal cover of the set F. b) Find a candidate key for the relation R.

Normalization: The process of decomposing complex data structures into simple relations according to a set of dependency rules. – reduce anomalies (update, insert, delete) first proposed by Codd (1972) Assumption: all non-key fields will be updated frequently - tend to penalize retrieval

Problems with update anomalies • Insertion anomalies • Deletion anomalies • Modification anomalies

EXAMPLE OF AN UPDATE ANOMALY • Consider the relation: • EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours) • Update Anomaly: • Changing the name of project number P1 from “Billing” to “Customer-Accounting” may cause this update to be made for all 100 employees working on project P1.

EXAMPLE OF AN INSERT ANOMALY • Consider the relation: • EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours) • Insert Anomaly: • Cannot insert a project unless an employee is assigned to it. • Conversely • Cannot insert an employee unless a he/she is assigned to a project.

EXAMPLE OF AN DELETE ANOMALY • Consider the relation: • EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours) • Delete Anomaly: • When a project is deleted, it will result in deleting all the employees who work on that project. • Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project.

15.3 Normalization of Relations (1) • Normalization: • The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations • Normal form: • Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form

Normalization of Relations (2) • 2NF, 3NF, BCNF • based on keys and FDs of a relation schema • 4NF • based on keys, multi-valued dependencies : MVDs; 5NF based on keys, join dependencies : JDs (Chapter 11) • Additional properties may be needed to ensure a good relational design (lossless join, dependency preservation; Chapter 11)

Definitions of Keys and Attributes Participating in Keys (2) • If a relation schema has more than one key, each is called a candidate key. • One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys. • A Prime attribute must be a member of some candidate key • A Nonprime attribute is not a prime attribute—that is, it is not a member of any candidate key.

Problems:

Decomposition by Analysis and Synthesis techniques Decomposition by analysis Decomposition by synthesis

Functional Dependency Armstrong’s Axioms Closure of FDs Closure of attribute(X) – Find Keys