Create Presentation
Download Presentation

CS 222 Database Management System Spring 2010-11 Lecture 5 Database Design (Decomposition)

CS 222 Database Management System Spring 2010-11 Lecture 5 Database Design (Decomposition)

198 Views

Download Presentation
Download Presentation
## CS 222 Database Management System Spring 2010-11 Lecture 5 Database Design (Decomposition)

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -

**CS 222 Database Management SystemSpring 2010-11 Lecture 5**Database Design (Decomposition) KorraSathyaBabu Department of Computer Science NIT Rourkela**Recap**Design of DB is needed to reduce redundancy and anomalies The theory of Functional Dependency is completely studied Better Design requires schema refinement A solution for schema refinement is Synthesis of relations Database Design**Relation Decomposition**R1 R-X+ X X+-X R2 R Database Design**Relation Decomposition**Reason for Decomposition A solution for reducing redundancy and Anomalies Rules for synthesis Lossless Join (Information Preservation) Dependency Preservation (a special case of information preservation) Decomposition (synthesis) types By functional dependency By multi-valued dependency By Join dependency Database Design**Lossless Join**Definition A decomposition D = {R1, R2,..., Rm} of R has the lossless join property with respect to the set of dependencies F on R if, for every relation r of R that satisfies F, the following holds, (R1(r), ..., Rm(r)) = r where is the natural join of all the relations in D The word loss in lossless refers to loss of information, not to loss of tuples. Database Design**Test for Lossless Join**Lossless Join Test Algorithm: Step 1: Create an initial matrix S with one row i for each relation Ri in D, and one column j for each attribute Aj in R. Step 2: Set S(i, j) := bijfor all matrix entries Step 3: For each row i representing relation schema Ri Do {for each column j representing Aj do {if relation Ri includes attribute Aj then set S(i, j) := aj;} Step 4: Repeat the following loop until a complete loop execution results in no changes to S. Input: A relation R, a decomposition D = {R1, R2,..., Rm} of R, and a set F of Functional Dependencies Database Design**Test for Lossless Join**Lossless Join Test Algorithm: continues… Step 4: Repeat the following loop until a complete loop execution results in no changes to S. If {for each function dependency X Y in F do for all rows in S which have the same symbols in the columns corresponding to attributes in X do {make the symbols in each column that correspond to an attribute in Y be the same in all these rows as follows: if any of the rows has an “a” symbol for the column, set the other rows to the same “a” symbol in the column. If no “a” symbol exists for the attribute in any of the rows, choose one of the “b” symbols that appear in one of the rows for the attribute and set the other rows to that same “b” symbol in the column;}} Step 5:If a row is made up entirely of “a” symbols, then the decomposition has the lossless join property; otherwise it does not. Database Design**SSN**ENAME SSN PNUM hours Example 1 Emp_PROJ SSN PNUM hours ENAME PNAME PLOCATION F = {SSN ENAME, PNUM {PNAME, PLOCATION}, {SSN, PNUM} hours} R1 R2 PNUM PNAME PLOCATION R3 Database Design**R1**b15 R1 b13 b16 b11 b14 b12 b24 R2 R2 b21 b22 b25 b23 b26 b36 R3 b35 R3 b33 b32 b31 b34 a1 a2 b13 b14 b15 b16 b21 b22 a3 a4 a5 b26 a1 b32 a3 b34 b35 a6 Example 1 A1 SSN A2 ENAME A3 PNUM A4 PNAME A5 PLOCATION A6 hours Database Design**R1**R1 R2 R2 R3 R3 Example 1 SSN ENAME SSN ENAME a1 a2 b13 b14 b15 b16 b21 b22 a3 a4 a5 b26 a1 a2 a3 b34 b35 a6 PNUM {PNAME, PLOCATION} PNUM PNAME PLOCATION a1 a2 b13 b14 b15 b16 b21 b22 a3 a4 a5 b26 a1 a2 a3 a4 a5 a6 Database Design**Example 2**Emp_PROJ SSN PNUM hours ENAME PNAME PLOCATION F = {SSN ENAME, PNUM {PNAME, PLOCATION}, {SSN, PNUM} hours} R1 R2 SSN PNUM hours PNAME PLOCATION ENAME PLOCATION Database Design**R1**b11 b12 b13 b14 b15 b16 R2 b21 b22 b23 b24 b25 b26 Example 2 A1 SSN A2 ENAME A3 PNUM A4 PNAME A5 PLOCATION A6 hours R1 b11 a2 b13 b14 a5 b16 R2 a1 b22 a3 a4 a5 a6 SSN ENAME PNUM {PNAME, PLOCATION} {SSN, PNUM} hours Database Design**Problems**Check whether the following decompositions are lossy or lossless Let R=ABCDE, R1=AD, R2=AB, R3=BE, R4=CDE, R5=AE. Let F={AC, BC, CD, DEC, CEA} R(XYZWQ), FD={XZ, YZ, ZW, WQZ, ZQX}. R1(XW), R2(XY), R3(YQ), R4(ZWQ), R5(XQ) R(XYZ), F={XY, ZY}. R1(XY), R2(YZ) R(XYWZPQ), D={R1(ZPQ), R2(XYZPQ)} F={XYW, XWP, PQZ, XYQ} Database Design**Dependency Preservation**R was decomposed (normalisation) into R1, …, Rn S - the set of FDs for R S1, …, Sn - the set of FDs for R1, …, Rn (each Si refers to only the attributes of Ri) S’ = S1 … Sn (usually, S’ S) the decomposition is dependency preserving if S’+ = S+ Database Design**Test for Dependency Preservation**Input: decomposition D={D1,…,Dk} and a set of FDs F Dependency Preservation Test: Step 1: For each XY Є F initialize a set T of attributes with the attributes of X (the determinant of the FD under consideration). ie set T=X and continue with step 2 Step 2: Repeat step 3 until the set T no longer changes. When T no longer changes continue with step 4 Step 3: For each relation Ri (1≤ i ≤ k) of the input decomposition apply the corresponding Ri operation (on a set of attributes T with respect to set of dependencies F). i.e T=T ∩ ((T ∪ Ri)+ ∩ Ri) and repeat step 3 Step 4: Test to see if Y(the right hand side of the FD under consideration) is such that Y ⊂ T. There are two outcomes to this test. If the answer is negative. i.e. if Y not a subset of T then stop the execution of the algorithm and report that the decomposition does not preserve the FD. If the answer is affirmative, i.e. if Y ⊂ T then XY Є G+. If there are other FDs in F that need to be considered repeat step 1 with a FD that has not been considered before. If no more FDs in F then continue with step 4 Database Design**Problems**Given R(XYZ) and the set F = {ZX , XYZ}. Check if the decomposition R1(XY) and R2(XZ) preserve the set F. Given R(ABCD) and the set F = {AB , CD}. Check if the decomposition R1(AB) and R2(CD) preserve the set F. Determine if the decomposition D={R1(XY), R2(YZ), R3(ZW)} of the relation R(WXYZ) preserves the dependencies of the set F={XY, YZ, ZW, WX}. Given R(ABCDEF) and the set F = {AB , CDF, ACE, DF}. Check if the decomposition R1(ACE), R2(CD), R3(DF) and R4(AB) preserve the set F. Database Design**Normalization**Normalization is the process of successive reduction of a given set of relations to a better form (reduced redundancy and anomalies) The normalization that one needs to sustain depends on the work flow (tradeoff between fast access, maintenance of integrity) Assumes that all possible functional dependencies are known First construct a minimal set of FDs Then apply algorithms that construct a required Normal Form Additional criteria may be needed to ensure that the set of relations in a relational database are atisfactory Database Design**1 NF**A relation is in first normal form (1NF) if it does not contain any repeating columns or repeating groups of columns It is the process of converting complex data structures into more simple, stable data structures A relvar is in 1NF if and only if in every legal value of that relvar, every tuple contains exactly one value for each attribute First Normal From (1NF) Unique rows All attributes are atomic Database Design**2 NF**EMPLOYEE2(Emp_ID, Name, Dept, Salary, Course, Date_Completed) Functional dependencies: 1. Emp_ID Name, Dept, Salary 2. Emp_ID, Course Date_Completed partial key dependency A table is in the second normal form (2NF) if it is in the first normal form and if all non-key columns in the table depend on the entire primary key The following relation is in 1NF but not 2NF Decompose into 2NF EMPLOYEE1(Emp_ID, Name, Dept, Salary) Functional dependencies: Emp_ID Name, Dept, Salary EMPCOURSE(Emp_ID, Course,Date_Completed) Functional dependency: Emp_ID, Course Date_Completed Database Design**3 NF**SALES(Customer_ID, Customer_Name, SalesPerson, Region) Functional dependencies: 1. Customer_ID Customer_Name, SalesPerson, Region 2. SalesPerson Region Transitive Dependency A table is in the third normal form (3NF) if it is in the second normal form and if all non-key columns in the table depend non-transitively on the entire primary key Decompose into 3NF SALES1(Customer_ID, Customer_Name, SalesPerson) Functional dependencies: Customer_IDCustomer_Name, SalesPerson SPERSON(SalesPerson, Region) Functional dependency: SalesPerson Region Database Design**BCNF**SCHOOL(Student, Subject, Teacher) Functional dependencies: 1. Student, Subject Teacher 2. Student, Teacher Subject 3. Teacher Subject A table is in Boyce-Codd normal form (BCNF) if every column, on which some other column is fully functionally dependent, is also a candidate for the primary key of the table A table is in BCNF if the only determinants in the table are the candidate keys Decompose into BCNF SCHOOL1(Student, Subject) SCHOOL2(Subject, Teacher) All Functional Dependencies vanished except TeacherSubject Database Design**Comparison between 3NF and BCNF**It is always possible to decompose a relation into relations in 3NF such that: the decomposition is lossless the dependencies are preserved It is always possible to decompose a relation into relations in BCNF such that: the decomposition is lossless but it may not be possible to preserve dependencies But may eliminate more redundancy Database Design**Multivalued Dependency**Let R be a relation schema and let R and R. The multivalued dependency holds on R if in any legal relation r(R), for all pairs for tuples t1 and t2 in r such that t1[] = t2 [], there exist tuples t3 and t4 in r such that: t1[] = t2 [] = t3 [] = t4[] t3[] = t1 [] t3[R – ] = t2[R – ] t4 ] = t2[] t4[R – ] = t1[R – ] MVD is a tuple generating Dependency Database Design**4 NF**A table is in the fourth normal form (4 NF) if it is in BCNF and does not have any independent multi-valued parts of the primary key If there are two attributes A and B and for a given value of A if there exists multiple values of B, then we say that an MVD exists between A and B The normal forms after BCNF are theoretical interests Database Design**4 NF**Student Subject Student Language Student Table Database Design**4 NF**Split the independent multi-valued components of the primary key into two tables The primary key is (student subject language) Student_Subject Table Student_Language Table Here we take care of the update anomaly Database Design**Surprise: Loss less Decomposition**There exists relations that cannot be nonloss-decomposed into two projects, but can be decomposed into three or more Database Design**Join Dependency**Definition: A relation R satisfies the join Dependency (JD) *(X,Y,…,Z) iff R is equal to the join of its projects on X,Y,..,Z, where X,Y,..,Z are subsets of the set of attributes of R. Consider the following Suppliers(S), Parts(P) and Location they Supply (L) table SPL Table ACTUAL DECOMPOSTION Database Design**Join Dependency**ACTUAL DECOMPOSTION Join Spurious Tuple Database Design**Join Dependency**DECOMPOSTION Join Database Design**5 NF**A table is in fifth normal form (5NF) if it is in the fourth normal form and every join dependency in the table is implied by the candidate key Its also called as the Project Join Normal Form (PJNF) Database Design**Normalization**Un-normalized Relation Arrange every atomic value in the cell (intersection of row and column) of a table First Normal Form (1NF) Eliminate Partial Dependencies Second Normal Form (2NF) Eliminate Transitive Dependencies Third Normal Form (3NF) Make every determinant as a key Boyce-Codd Normal Form Eliminate Multi-valued Dependencies that are not Functional Dependencies Fourth Normal Form (4NF) Eliminate Join Dependencies that are not implied by Candidate keys Fifth Normal Form (5NF) Database Design**Denormalization**Denormalization if a process in which we retain or introduce some amount of redundancy for faster data access Where there arise tradeoffs Database Design**Summary**Normalization helps to reduce redundancy and few anomalies The first 3 (1, 2 and 3) normal forms are practical but BCNF, 4NF and 5 NF are more of theoretical interests Denormalization is done for fast access Database Design