1 / 40

Chapter 14 Database Design Theory: Introduction to Normalization Using Functional Dependencies

Chapter 14 Database Design Theory: Introduction to Normalization Using Functional Dependencies. Functional Dependencies. R : Relation schema. r(R) : Relation instance (state) A1, A2, …, An : Attributes of R. X, Y : Sets of attributes. X  { A1, A2, …, An}. Functional Dependencies.

inge
Download Presentation

Chapter 14 Database Design Theory: Introduction to Normalization Using Functional Dependencies

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 14Database Design Theory:Introduction to Normalization Using Functional Dependencies

  2. Functional Dependencies • R: Relation schema. • r(R): Relation instance (state) • A1, A2, …, An: Attributes of R. • X, Y: Sets of attributes. • X  {A1, A2, …, An}

  3. Functional Dependencies • A functional dependency (X  Y), between X and Y • specifies a constraint on the possible tuples that can form a relation instance r of R. • The constraint states that for any two tuples t1 and t2 in r(R) • such that t1[X] = t2[X], then t1[Y] = t2[Y]. • This means that the values of Ydepends on (or determined by) the values of the X. • the values of the Xuniquely or functionally determines the values of Y.

  4. Functional Dependencies • Note that: • if X  Y, this does not say whether Y X is correct or not. • Student(SSN, STNO, Name, Major) • SSN  Name • t1(91910, 980090012, Ahmed, …..) • t2(91910, 980090012, Ahmed, …..) • STNO  Major • t10((980090012,…, Math) • t12(980090012, …, Math) • SSN  {SSN, Name, STNO, Major}

  5. Functional Dependencies 1 Manage 1 Dept Manager • Manager(SSN, Name, …. Dept) • SSN  {Name, Dept} (correct) • SSN uniquely determines the Name and Dept • Dept  SSN (correct) • Dept uniquely determines the SSN • Name  SSN (incorrect)

  6. Inference Rules for Functional Dependencies • Let • F: set of functional dependencies defined on R • F+ (Closure of F): is the set of all functional dependencies that can be defined on R • The closure of F is denoted by F+ • F+ = { X  Y | F ╞X  Y} • We usually write F ╞ X  Y to denote that • the FD X Y is inferred from F • X, Y are subsets of attributes

  7. Inference Rules for Functional Dependencies • Emp-Dept(SSN, Ename, Bdate, Address, DNo, Dname, MGR-SSN) F = { SSN  {Ename,Bdate,Address,DNo}, DNo  {Dname, MGR-SSN} } • We can infer the following FDs: • SSN  SSN (Reflexive) • SSN  Ename (Decomposition) • SSN {Dname, MGR-SSN} (Transitive) • A BIG F+ may be derived from a small F • For R(A, B, C) and F = {A  B, B  C} • F+ = {A  B, B  C, A  C, A  A, B  B, C  C, AB  AB, AB  A, AB  B, ... }

  8. Functional Dependency • Question: if all X-values are different in all possible r(R), does X  Y in R? • Theorem 1: • If X is a superkey of R and Y is any subset of R, then X  Y in R • If X is a candidate key then X  Y is correct for any subset of attributes of R. • Note that X  Y in R is a property that must be true for all possible legal r(R), not just for the present r(R) • Note that we cannot infer (نشتق) FDs from r(R)

  9. Functional Dependency Example: which is true? A B C D A  B a1 b1 c1 d1 A  C a1 b2 c1 d2 C  A a2 b2 c2 d2 A  D a2 b3 c2 d3 B  D a3 b3 c2 d4 AB  D AB  C AB  CD

  10. Identify Functional Dependency • FD created by assertions. • Employees(SSN, Name, YearOfEmp, Salary, Bonus) • Assertion: • Employees hired the same year have the same salary • This assertion implies: • YearsOfEmp  Salary

  11. Inference Rules • IR1: Reflexive Rule ( منعكس) • Y  X, then X Y • e.g. • SSN  SSN • {P#, S#, Qty}  Qty • IR2: Augmentation Rule زيادة)) • { X  Y} ╞ XZ  YZ • e.g.: F = {SSN  Address} • F ╞ {SSN, Name}  {Address, Name}

  12. Inference Rules • IR3: Transitive rule: (متعدّي) • {X  Y, Y  Z} ╞ X  Z • e.g. • F = {SSN  DNo, DNo  Dname} • F ╞ SSN  Dname • IR4: Decomposition Rule: (التّحلّل) • X  {Y, Z} ╞ X  Y and ╞ X  Z • e.g. • F = {SSN  {Ename, BDate, Address, DNo}} • F ╞ SSN  Ename • ╞ SSN  Bdate • ╞ SSN  Address • ╞ SSN  DNo

  13. Inference Rules • IR5: Union (Additive) Rule: • {X  Y, X  Z} ╞ X  {Y, Z} • e.g. • F = {SSN  Ename, SSN  Bdate} • F ╞ SSN  {Ename, Bdate} • IR6: Pseudo-transitive Rule: • {X  Y, WY  Z} ╞ WX  Z • e.g. • F = { SSN  STNO, {Major, STNO}  Name} • F ╞ {Major, SSN}  Name

  14. Closure of Attributes • How to determine if F ╞ X  Y is true? • Compute X+ : the closure of X under F • X+ denotes the set of attributes that are functionally determined by X under F. • X+ = { Y | X  Y  F+ } • Theorem: • X Y  F+ if and only if Y  X+

  15. Algorithm for Computing X+ • Input: • a set of FDs F, a set of attributes X in R • Output: • X+ • Begin • X+ = X; • Repeat • oldX+ = X+ • for each FD Y Z in F do • if Y  X+ then X+ = X+Z; • until (X+ = oldX+ ) • end

  16. Algorithm for Computing X+ • Example: • R(A, B, C, G, H, I) = ABCGHI • X = AG • F = {A  B, CG  HI, B  H, A  C } • Compute X+ = (AG)+ • Initialization: • X+ = AG;

  17. Algorithm for Computing X+ • 1st iteration: X+ = AG • consider A  B: • since A is a subset of X+, X+ = X+ {B} = ABG; • consider CG  HI: • since CG is not a subset of X+, X+ = ABG; • consider B  H: • since B is a subset of X+, X+=X+ {H} = ABGH; • consider A C: • since A is a subset of X+, X+=X+{C}=ABCGH; • X+ is changed from AG to ABCGH

  18. Algorithm for Computing X+ • 2nd iteration: X+ = ABCGH • consider CG  HI: • since CG is a subset of X+, X+=X+{HI}=ABCGHI; • X+ is changed from ABCGH to ABCGHI

  19. Algorithm for Computing X+ • 3rd iteration: • consider each FD in F again, but there is no change to X+, exit • Result: • (AG)+ = ABCGHI.

  20. Algorithm for Computing X+ • Theorem: • Given R(A1, ..., An) and a set of FDs F in R, K  R is a • superkey if K+ = {A1, ..., An}; • candidate key if K is a superkey and for any proper subset X of K, X+{A1, ..., An}.

  21. Algorithm for Computing X+ • Continue the above example: • AG is a superkey of R since • (AG)+ = ABCGHI. • Since A+ = ABCH, G+ = G, neither A nor G is a superkey. • Hence, AG is a candidate key

  22. Normal Forms Based on Primary Keys • Normalization of data: • is a process during which unsatisfactory relation schemas are decomposed by breaking up their attributes into smaller relation schemas that posses desirable properties • We normalize data for several reasons one of them is to avoid update anomalies • We have • First Normal Form (1NF) • Second Normal Form (2NF) • Third Normal Form (3NF) • Boyce-Codd Normal Form (BCNF) (a stronger definition of 3NF) • All the above normal forms are based on functional dependencies.

  23. Basic Definitions • R = {A1, …, An) • Superkey: is set of attributes S  R and t1[S]  t2[S] i =1, ..., n. S may contain redundant attributes. • Key (K): is a superkey with no redundant attributes, i.e. removal of any attribute from K will no longer make it a superkey. • Candidate Key: if a relation has more than one key, each is called a candidate key. One of these keys is arbitrarily chosen as a primary key. • Prime Attribute: is an attribute which is a member of any key (primary or candidate); other attributes are called nonprime attributes.

  24. Basic Definitions • Student(SSN, STNO, Name, Address, Salary) • Superkeys • SSN • {SSN, Name} • {SSN, STNO, Name, Address, Salary} • Candidate keys • SSN • STNO • Primary Key • SSN • Prime Attribute: • SSN • STNO • Nonprime Attributes: • Name • Address • Salary

  25. 1NF (First Normal Form) • A relation schema R is in 1NF if every attribute of R takes only single and simple values. • In other words, multivalued and composite attributes are disallowed.

  26. 1NF (First Normal Form) • Student(STNO, StName, {Course(CNO, Ctitle)} • The set braces {} identify the attribute Course as multivalued • The set braces () identify the attribute Course as a composite attribute • Because of the last attribute (Course), the Student relation schema is not in 1NF.

  27. 1NF (First Normal Form) • Student(STNO, StName, {Course(CNO, Ctitle)} • To normalize it to 1NF: • Student(STNO, StName, CNO, Ctitle)

  28. Second Normal Form (2NF) • Prime attribute: an attribute in any candidate key. • Y is fully functionally dependent on X if • X  Y and • no proper subset of X functionally determines Y • FD X  Y is a fully functional dependency if removal of any attribute A from X means that the dependency does not hold any more. • In other words (X - {A}) does not determine Y • A FD X  Y is a partial FD if exist some attribute A which belongs to X and (X - {A})  Y still holds

  29. Second Normal Form (2NF) • General Definition of 2NF • A relation schema R is in 2NF: • if everynonprime attribute A in R is not partially dependent on any key of R • Or • if every nonprime attribute is fully functionally dependent on every key of R

  30. Second Normal Form (2NF) • This relation Emp-Proj is not in 2NF • Emp-Proj(SSN, Pnumber, Hours, Ename, Pname, Plocation) • FD1: {SSN, Pnumber}  Hours (FD) • FD2: SSN Ename (PD) • FD3: Pnumber {Pname, Plocation} (PD) • Because of FD2 and FD3 • Convert to 2NF Normalization • Emp(SSN, Ename) • Proj(Pnumber, Pname, Plocation) • Work(SSN, Pnumber, Hours)

  31. Second Normal Form (2NF) • Consider • Bank-Loans (Bank_name, Assets, Headquarter, Loan_no, Customer_name, Amount), • FD1: Bank_name {Assets, Headquarter} • FD2: {Bank_name, Loan_no}  {Customer_name, Amount} • Because of FD1, Bank-Loans is not in 2NF. • 2NF Normalization • Banks(Bank_name, Assets, Headquarter) • Loans(Bank_name, Loan_no, Customer_name, Amount)

  32. Second Normal Form (2NF) • 2NF is not good enough: • A relation schema in 2NF can still have serious redundancy problem as well as insertion and deletion anomalies. • Consider Parts(Part_no, Name, Location, Unit_price, Manu_id, Manu_name, Manu_Address) • It is obvious that Parts is in 2NF • Redundancy and various anomalies are introduced by • Manu_id  {Manu_name, Manu_Address}

  33. Second Normal Form (2NF) • Consider • EMP_DEPT(SSN, EName, BDate, Address, DNo, DName, DMGRSSN) • It is obvious that EMP_DEPT is in 2NF • Redundancy and various anomalies are introduced by • DNo  {DName, DMGRSNN}

  34. Third Normal Form (3NF) • A relation schema R is in 3NF if • for every FD X  A, where A is a single attribute, at least one of the following is true: • (a) A  X; • (b) A is a prime; • (c) X is a superkey • R is not in 3NF if a non-prime non-trivially depends on a non-superkey. • If R in 3NF, it should not have a nonkey attribute functionally determined by another nonkey attribute (or by a set of nonkey attributes)

  35. Third Normal Form (3NF) • Emp-Dept(SSN, Ename, Bdate, Address, Dnumber, Dname, DMGR-SSN) • FD1: SSN  {Ename, Bdate, Address, Dnumber, Dname, DMGR-SSN} • FD2: Dnumber  {Dname, DMGR-SSN} • Emp-Dept is in 1NF, 2NF, but because of • SSN  Dnumber and • Dnumber  Dname and • Dname is a nonprime attribute, and • Dnumber is not a superkey, • Emp-Dept is not in 3NF. • To transform Emp-Dept into 3NF: • Emp(Enam, SSN, Bdate, Address, Dnumber) • Dept(Dnumber, Dname, DMG-SSN)

  36. Third Normal Form (3NF) • Employees (SSN, Name, Age, Salary, Dept_name, Dept_manager_SSN). • Employees is in 2NF • since SSN is the only candidate key and every attribute is fully dependent on it. • Employees is not in 3NF because • Dept_name  Dept_manager_SSN

  37. Third Normal Form (3NF) • LOTS(Property-ID#, County-Name, Lot#, Area, Price, Tax-Rate) • FD1: Property-ID#  {County-Name,Lot#,Area, Price, Tax-Rate} • FD2: {County-Name,Lot#}{Property-ID#,Area,Price,Tax-Rate} • FD3: County-Name  Tax-Rate • FD4: Area  Price • 2 candidate keys: {Property-ID#},{County-Name, Lot#} • LOTS is not in 2NF, because of County-Name  Tax-Rate • Tax-Rate is partially dependent on the candidate key {County-Name, Lot#}. • 2NF: • LOTS1(Property-ID#, County-Name, LOT#, Area, Price) • LOTS2(County-Name, Tax-Rate)

  38. Third Normal Form (3NF) • The relation LOTS1 is not in 3NF, because of Area  Price • Area is not a superkey and Price is not prime attribute • 3NF: • LOTS1A(Property-ID#, County-Name, Lot#, Area) • LOTS1B(Area, Price) • LOT2(County-Name, Tax-Rate) • The above relation schemas are in 3NF

  39. Boyce-Codd Normal Form (BCNF) • Assume that we have thousands of lots, but two-counties: • Marion county and Liberty county. • Lot areas in Marion county are • .5, .6, .7, .8, .9 and 1 acres • Lot areas in Liberty county are • 1.1,1.2, …, 1.9, 2.0 acres • In this case we have : AREA  County-Name • LOTS1A(Property-ID#, County-Name, Lot#, Area) • FD5: Area  County-Name • Still in 3NF , since County-Name is a prime attribute

  40. Boyce-Codd Normal Form (BCNF) • A relation schema R is in BCNF if whenever a FD X  A holds in R, then X is a superkey of R. • R is in BCNF if for every non-trivial FD, the left side is a superkey. • LOTS1A-X(Property-ID#, Area, Lot#) • LOTS1A-Y(Area, County-Name) • To describe a relation schema R as “good” it should be at least in 3NF (general) or BCNF • If R is in BCNF, then R is also in 3NF • However, R in 3NF R in BCNF

More Related