Fundamentals of Database Systems Fourth Edition El Masri & Navathe

103 Views

Download Presentation
## Fundamentals of Database Systems Fourth Edition El Masri & Navathe

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -

**University Of Palestine**Fundamentals ofDatabase SystemsFourth EditionEl Masri & Navathe Chapter 10 Functional Dependencies and Normalization for Relational Databases**Database Management CH 10**University Of Palestine • Functional Dependencies • Functional dependencies (FDs) are used to specify formal measures of the "goodness" of relational designs. • FDs and keys are used to define normal forms for relations • FDs are constraints that are derived from the meaning and interrelationships of the data attributes.**Database Management CH 10**University Of Palestine • Functional Dependencies • A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y • XY • X Y holds if whenever two tuples have the same value for X, they must have the same value for Y. • X Y in R specifies a constraint on all relation instances r(R).**Database Management CH 10**University Of Palestine • Examples of FD constraints • social security number determines employee name • SSN ENAME • project number determines project name and location • PNUMBER {PNAME, PLOCATION} • City name determines ZIP code • City ZIP • Employee’s ssn and project number determines the hours per week that the employee works on the project • {Level, Experience_Years} Salary**Database Management CH 10**University Of Palestine • Examples of FD constraints • If K is a key of R, then K functionally determines all attributes in R • (since we never have two distinct tuples with t1[K]=t2[K]) • So: • Keys are Special Cases of FDs**Database Management CH 10**University Of Palestine • Inference Rules for FDs • Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold. • Armstrong's inference rules: • IR1.(Reflexive) • If Y ⊇X, then X -> Y • IR2.(Augmentation) • If X Y, then XZ YZ • (Notation: XZ stands for X U Z) • IR3.(Transitive) • If X Y and Y Z, then X -> Z**Database Management CH 10**University Of Palestine • Inference Rules for FDs (Cont.) • IR1, IR2, IR3 form a sound and complete set of inference rules • Some additional inference rules that are useful: • (Decomposition) • If X YZ, then X Y and X Z • (Union) • If X Y and X Z, then X YZ • (Pseudo transitivity) • If X Y and WY Z, then WX Z • The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property)**Database Management CH 10**University Of Palestine • F Closure • Closure of a set F of FDs is the set F+of all FDs that can be inferred from F. • F+ =F ∪{all FDs that can be logically imply from F using the inference rules} • So F ≡ F+**Database Management CH 10**University Of Palestine • F Closure • Example: • R(A,B,C,D,E,G,H) • F= { BCD, AC, EGA, CEH } • Find F+ ? • F+ = { BCD, AC, EGA, CEH , • BC, BD, • EG, EA, • CE, CH, • BE, • BH, • …. • …. }**Database Management CH 10**University Of Palestine • Attributes Closure • Closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X • Given a relation schema R and a set of FDs F that hold on R, • Let X ⊇R ( X is a set of attributes that found in R) • X+ = X ∪ { all attributes that can be derived from X using F}**Database Management CH 10**University Of Palestine • Attributes Closure • Example: • R(A,B,C,D,E,G,H) • F= { BCD, AC, EGA, CEH } • A+ = ACEHG • (CE)+ = CEHGA • B+ = BCDEHGA • Note: Here B Give me all other attributes, So B is a key for R. • So We can Use X+ to find Keys for any relation**Database Management CH 10**University Of Palestine • Minimal Cover of F • Also called Canonical Cover of F and is denoted by Fc • Every set of FDs has an equivalent Fc. • There can be several equivalent minimal sets.**Database Management CH 10**University Of Palestine • Minimal Cover of F • Compute Fc from F • Let Fc F. • Repeat the following steps until no more reduction can be made on Fc • Remove all redundant attributes from X in XY • Ex. AB, ACB (C is redundant) • Remove all redundant attributes from Y in XY • Ex. ABC, BC (C is redundant) • Replace each pair of FDs of the form • XY1, XY2 • By XY1Y2 • Ex. AB, AC • Then ABC**Database Management CH 10**University Of Palestine • Minimal Cover of F • Example • F= { AC, • ACD, • EAD, • EH } • Remove C from ACD • Then Convert AC, AD to ACD • Then Convert EAD, EH to EADH • Remove D from EADH • Then Fc is { ADC, EAH} Find Fc?**Database Management CH 10**University Of Palestine • Problems caused by Bad Database Design • Wastes storage • Causes problems with update anomalies • Insertion anomalies • Deletion anomalies • Modification anomalies**Database Management CH 10**University Of Palestine • EXAMPLE OF AN UPDATE ANOMALY • Consider the relation: • EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours) • Update Anomaly: • Changing the name of project number P1 from “Billing” to “Customer-Accounting” may cause this update to be made for all 100 employees working on project P1.**Database Management CH 10**University Of Palestine • EXAMPLE OF AN INSERT ANOMALY • Consider the relation: • EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours) • Insert Anomaly: • Cannot insert a project unless an employee is assigned to it. • Conversely • Cannot insert an employee unless a he/she is assigned to a project.**Database Management CH 10**University Of Palestine • EXAMPLE OF AN DELETE ANOMALY • Consider the relation: • EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours) • Delete Anomaly: • When a project is deleted, it will result in deleting all the employees who work on that project. • Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project.**Database Management CH 10**University Of Palestine • Normalization • Dependency Preserving:- • Suppose that the relation R is decomposed into R1,R2, and the set of FDs F that holds on R ar distributed on R1,R2 then: • We can say that the decomposition is dependency preserving iff (F1UF2) ≡F • Otherwise the decomposition is not Dependency Preserving.**Database Management CH 10**University Of Palestine • Normalization (Cont.) • Lossless-join decomposition:- • Suppose that the relation R is decomposed into R1,R2, and the set of FDs F that holds on R ar distributed on R1,R2, If the common attribute between R1 and R2 as a key in either R1 or R2 then: • We can say that the decomposition is Lossless-Join • Otherwise the decomposition is Lossy Join**Database Management CH 10**University Of Palestine • Normalization (Cont.) • Example • Suppose that we have the following relation std_info: • and the following set of FDs that hold on sd_info • F={Sname Major,Phone, • Cname Instructor, • Sname, Cname Score} • Then The Decomposition : • F1={ Sname Major, Department} F2={Cname instructor • Sname, Cname Score} • This Decomposition is Dependency Preserving and Lossless Join**Database Management CH 10**University Of Palestine • Normalization (Cont.) • Normalization: • The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations • Normal form: • Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form**Database Management CH 10**University Of Palestine • Practical Use of Normal Forms • Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties • The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect • The database designers need not normalize to the highest possible normal form • (usually up to 3NF, BCNF or 4NF)**Database Management CH 10**University Of Palestine • Definitions of Keys and Attributes Participating in Keys (1) • A superkeyof a relation schema R = {A1, A2, ...., An} is a set of attributes S subset-of R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S] • A key K is a superkeywith the additional property that removal of any attribute from K will cause K not to be a superkey any more.**Database Management CH 10**University Of Palestine • Definitions of Keys and Attributes Participating in Keys (1) • If a relation schema has more than one key, each is called a candidate key. • One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys. • A Prime attribute must be a member of some candidate key • A Nonprime attribute is not a prime attribute—that is, it is not a member of any candidate key.**Database Management CH 10**University Of Palestine • First Normal Form • If a relation schema has more than one key, each is called a candidate key. • One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys. • A Prime attribute must be a member of some candidate key • A Nonprime attribute is not a prime attribute—that is, it is not a member of any candidate key.**Database Management CH 10**University Of Palestine • First Normal Form • Disallows • composite attributes • multivalued attributes • nested relations; attributes whose values for an individual tuple are non-atomic • Considered to be part of the definition of relation**Database Management CH 10**University Of Palestine**Database Management CH 10**University Of Palestine • BCNF (Boyce-Codd Normal Form) • The strongest Normal Form for the relation • Given arelation R and aset of FDs F hold on R, R is in BCNF (i.e. R is a good relation) iff for each FD in the form α -> β in F : α must be a super key. • Each normal form is strictly stronger than the previous one: • Every 2NF relation is in 1NF • Every 3NF relation is in 2NF • Every BCNF relation is in 3NF • There exist relations that are in 3NF but not in BCNF • The goal is to have each relation in BCNF (or 3NF)**Database Management CH 10**University Of Palestine BCNF (Boyce-Codd Normal Form) stdinfo F={ sname major,phone cname instructor sname,cname score } • Each FD on R must satisfy the condition to be in BCNF, otherwise we must divide the relation R into tow relation R1,R2 as follows: • R1(α, β) , R2(R- β) that satisfying the tow conditions • The decomposition is dependency preserving • The decomposition is loosless-join**Database Management CH 10**University Of Palestine • BCNF (Boyce-Codd Normal Form) • Stdinforelation is not in BCNF so we must devide the relation as follows: • R1(sname, major, phone), R2(cname, sname,instructor,score) • The second step is ditrbuting the FDs to R1,R2 • F1={snamemajor,phone}, F2={cnameinstructor • cname,snamescore} • The third step we check the decomposition for dependency preserving and lossless-join conditions. R2 R1 F1 F2**Database Management CH 10**University Of Palestine • BCNF (Boyce-Codd Normal Form) • This decomposition is dependency preserving because (F1UF2)=FD and loosless-join because the common attribute which is Sname is a key for R1. • The forth step is repeating checking for BCNF for the resulting relation R1,R2. • For F1 On R1, α which is • Sname is akey for R1, So • R1 Is in BCNF R1 F1**Database Management CH 10**University Of Palestine BCNF (Boyce-Codd Normal Form) • For R2: • Sname, cname score • α is akeyfor R2 • Cname instructor • α is not akeyfor R2 so,, • We must devide R2 into tow relations R2.1,R2.2 R2 F2 F2={ Sname,Cname->score, Cname->instructor} R2.2 R2.1 F2.1 F2.1 R2.1,R2.2 are both in BCNF Form so finally we have 3 relations which are : R1, R2.1, R2.2 which ar all in BCNF**Database Management CH 10**University Of Palestine Summary of BCNF Checking Check Each FD for the condition α is a key. If any FD fails then we divide the relation into tow relation R1(α , β) , R2(R- β(. Distribute the original FD on R1, R2. Check the decomposition for dependency preserving and loosless-join. Repeat the checking for every resulting relation till having all the relations in BCNF Form.**Database Management CH 10**University Of Palestine BCNF: another example R FD FD={Bname,SnameBnakerName , BankerNameBname,Office#} Now we must check each FD for the BCNF condition: Bname,CnameBankerName : √ ( Bname,Sname is a key) BankerNameBname,Office# : X ( BankerName is not A key) So we must divide R into tow relations R1,R2 R1 R2 F2={ Ø } F1={ BankerName->Bname,Office# }**Database Management CH 10**University Of Palestine • BCNF: another example (Cont.) • Hence BankerName is akey For R1 so the last decomposition is loosless-join, But it is Not dependency Preserving because F1UF2 ≠ FD. • So here we stop and don’t continue with BCNF, we try by 3NF OR 2NF. • Notes: • BCNF has avery restricted condition so not always we have a good decomposition . • BCNF sometimes looses FDs.**Database Management CH 10**University Of Palestine • Third Normal Form 3NF • Given arelation R and aset of FDs F hold on R, R is in 3NF iff for each FD in the form α -> β in F at least one of the following conditions satisfies: • α must be a key for R OR • Each attribute in β is Prime attribute. • FD={Bname,SnameBnakerName, • BankerNameBname,Office#} R FD**Database Management CH 10**University Of Palestine • 3NF (Cont.) • The last relation is not in 3NF Because for the second FD: α is not a key and Office# is not a prime attribute. • So we must divide R into two relations as follows: • R1(α , all non prime attributes in β ) • R2(R- non prime attributes in β ) R1 R2 F1 F2**Database Management CH 10**University Of Palestine • 3NF (Cont.) • F1UF2 = FD So the decomposition is dependency preserving. • BankerName is a key for R1 so The decomposition is loosless join • R1,R2 are in 3NF so finally we have two relations R1,R2 • Important Notes: • 3NF Never ever loose FD. • If R is in BCNF then surely it will be in 3NF • Butt R may be in 3NF but Not in BCNF. R1 R2 F1 F2**Database Management CH 10**University Of Palestine • Second Normal Form 2NF • Given arelation R and aset of FDs F hold on R, R is in 3NF iff for each FD in the form α -> β in F at least one of the following conditions satisfies: • α must be a key for R OR • Each attribute in β is Prime attribute. • α is not a subset of any key**Database Management CH 10**University Of Palestine 2NF: Example FD={ BC->ADE E->G } For F1 BC is a key For F2 E is not a subset of any key So R is in 2NF R FD**Database Management CH 10**University Of Palestine 2NF: Other Example R FD FD={ BC->ADE D->B E->G } F1: BC is akey F2: E is not a subset of any key F3: B is prime attribute So R is in 2NF The decomposition when the test fails is similar to 3NF