1 / 37

The objective of normalization is sometimes stated

Normalization. The objective of normalization is sometimes stated “to create relations where every dependency is on the primary key, the whole primary key, and nothing but the primary key”

keisha
Download Presentation

The objective of normalization is sometimes stated

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Normalization • The objective of normalization is sometimes stated • “to create relations where every dependency is on • the primary key, • the whole primary key, and • nothing but the primary key” • Database designers are always looking for (OLTP) databases that are as simple as possible - ones that are easiest to keep consistent, ones where the semantics are clear. 91.2914 R McFadyen

  2. Normalization Certain relation schemas have update anomalies - they may be difficult to understand and maintain Normalization theory recognizes this and gives us some principles to guide our designs Normal Forms: 1NF, 2NF, 3NF, BCNF, 4NF, … are each an improvement on the previous ones in the list 91.2914 R McFadyen

  3. Normalization Normalization is a process (involves decomposition) that generates tables of higher normal forms. Denormalization moves from higher to lower forms and is done for performance reasons. 91.2914 R McFadyen

  4. Anomalies Suppose we have EmployeeProject emp_num proj_num ep_hours emp_name proj_loc • EmployeeProject holds information about employees and the projects they work on • emp_num: employee’s social insurance number • proj_num: a project number • ep_hours: the hours the employee has worked on the project • emp_name: the employee’s name • proj_loc: the location of the project • PK is {emp_num, proj_num} 91.2914 R McFadyen

  5. Anomalies An instance of the table: EmployeeProject emp_num proj_num ep_hours emp_name proj_loc 11 23 6 Jones Edmt 11 36 2 Jones Wpg 11 99 22 Jones Brandon 17 23 5 Smith Edmt 17 36 2 Smith Wpg 17 101 13 Smith Wpg 91.2914 R McFadyen

  6. Anomalies EmployeeProject emp_num proj_num ep_hours emp_name proj_loc • we will have redundant information in the database … • if more than one employee works on the same project, then the project location is repeated • if some employee works on more than one project, then the employee’s name is repeated 91.2914 R McFadyen

  7. Anomalies • Redundant data leads to • additional space requirements • update anomalies 91.2914 R McFadyen

  8. Anomalies • Suppose EmployeeProject is the only relation where the Project Location is recorded • insert anomalies • adding a new project is complicated, unless there is also an employee for that project • deletion anomalies • if we delete all employees for some project, then what should happen to the project information? • modification anomalies • if we update the location of a project, then we must change it in all rows referring to that project 91.2914 R McFadyen

  9. Anomalies • If we design a database with a relation such as EmployeeProject then we will have complex update rules to enforce. • difficult to code correctly • will not be as efficient as possible • Such designs mix concepts. • E.g EmployeeProject mixes the Employee and Project concepts • Such designs are (generally) not good for OLTP 91.2914 R McFadyen

  10. Functional dependencies Suppose we have a relation R comprising attributes X,Y, … We say a functional dependency exists between the attributes X and Y, if, whenever a tuple exists with the value x for X, it will always have the same value for y for Y. emp_num emp_name emp_gender emp_phone X Y Employee emp_num emp_gender emp_name emp_phone Given a specific employee number, there is only one value for name, one value for gender, and only one value for phone Emp_number is a determinant for emp_name, emp_gender, emp_phone 91.2914 R McFadyen

  11. Normal Forms • a series of normal forms are known that have, successively, better update characteristics • we’ll consider 1NF, 2NF, 3NF, and BCNF • a technique used to improve a relation is decomposition, where one relation is replaced by two or more relations. When we do so, we want to eliminate update anomalies without losing any information. • Our target is BCNF 91.2914 R McFadyen

  12. The Dependency Diagram Employee emp_num emp_gender emp_name emp_phone • illustrates a single-attribute PK (simple PK) • all employee attributes are dependent on the PK 91.2914 R McFadyen

  13. Partial Dependency PROJ_CODE EMP_NUM PROJ_NAME HRS_WORKED Partial dependency • Multi-attribute PK (composite PK) • HRS_WORKED is dependent on the PK • But PROJ_CODE, which is only a part of the PK, determines PROJ_NAME (equivalently, PROJ_NAME is dependent on PROJ_CODE) 91.2914 R McFadyen

  14. Transitive Dependency STU_NUM STU_LNAME DEPT_CODE DEPT_NAME Transitive dependency • All student attributes are dependent on the PK • But DEPT_CODE determines DEPT_NAME (or DEPT_NAME is dependent on DEPT_CODE, a non-key attribute 91.2914 R McFadyen

  15. BCNF • all determinants are candidate keys Example, consider: INV_NUM PROD_CODE PROD_TITLE CUS_NUM LINE_NUM LINE_UNITS Transitive dependency Partial dependency 91.2914 R McFadyen

  16. BCNF - all determinants are candidate keys Three determinants, but only one is a candidate key. Therefore, not in BCNF • Determinants: • prod_code prod_title • inv_num cus_num • inv_num, prod_code line_units INV_NUM PROD_CODE PROD_TITLE CUS_NUM LINE_NUM LINE_UNITS 91.2914 R McFadyen

  17. BCNF - all determinants are candidate keys • Three determinants: • prod_code • inv_num • {prod_code, line_units} • but only one is a candidate key. • Therefore, not in BCNF A table with these attributes will have a lot of redundancy. INV_NUM PROD_CODE PROD_TITLE CUS_NUM LINE_NUM LINE_UNITS 91.2914 R McFadyen

  18. Decomposition • We can decompose the single table into three tables where there will be • no unnecessary redundancy • no loss of information - we can join the three to have what we had before • no loss of dependencies INV_NUM CUS_NUM PROD_CODE PROD_TITLE INV_NUM PROD_CODE LINE_NUM LINE_UNITS 91.2914 R McFadyen

  19. Decomposition Each of these tables is in BCNF INV_NUM CUS_NUM PROD_CODE PROD_TITLE INV_NUM PROD_CODE LINE_NUM LINE_UNITS 91.2914 R McFadyen

  20. Consider: INV_NUM PROD_CODE PROD_TITLE CUS_NUM LINE_NUM LINE_UNITS Partial dependency This table is not in 2NF because it has a partial dependency: CUS_NUM is dependent on INV_NUM 91.2914 R McFadyen

  21. First Normal Form (1NF) Each row/column intersection contains one and only one value, rather than a set of values All attributes are dependent on the primary key 91.2914 R McFadyen

  22. First Normal Form (1NF) Consider EMP_NUM EMP_LNAME EMP_FNAME EMP_DOB This table is in 1NF. Question to answer later: is it in 2NF? 3NF? BCNF? 91.2914 R McFadyen

  23. First Normal Form (1NF) Consider INV_NUM PROD_CODE PROD_TITLE CUS_NUM LINE_NUM LINE_UNITS This table is in 1NF. Question to answer later: is it in 2NF? 3NF? BCNF? 91.2914 R McFadyen

  24. The Second Normal Form (2NF) • Meets 1NF requirements • Does not contain partial dependencies • But may contain transitive dependencies 91.2914 R McFadyen

  25. Decomposition • A table can be decomposed into two or more tables. Ideally, this involves: • no loss of information • all information previously available can be obtained by joining the new tables - a lossless decomposition • no loss of dependencies • a dependency-preserving decomposition 91.2914 R McFadyen

  26. Consider: INV_NUM PROD_CODE PROD_TITLE CUS_NUM LINE_NUM LINE_UNITS Partial dependency This table is not in 2NF because it has a partial dependency: CUS_NUM is dependent on INV_NUM 91.2914 R McFadyen

  27. Consider: INV_NUM PROD_CODE PROD_TITLE CUS_NUM LINE_NUM LINE_UNITS Partial dependency We need to decompose the table - move the partial dependency to a new table. I.e. Invoice information belongs elsewhere. 91.2914 R McFadyen

  28. Decomposition • create a new table • don’t lose any information (can still derive) INV_NUM PROD_CODE PROD_TITLE LINE_NUM LINE_UNITS No partial dependencies present! INV_NUM CUS_NUM 91.2914 R McFadyen

  29. The Third Normal Form (3NF) • Meets 2NF requirements • Does not contain transitive dependencies 91.2914 R McFadyen

  30. Consider: INV_NUM PROD_CODE PROD_TITLE LINE_NUM LINE_UNITS Transitive dependency Because of the transitive dependency, this table is not in 3NF We need to decompose the table - move the transitive dependency to a new table. I.e. Product information belongs elsewhere. 91.2914 R McFadyen

  31. Decomposition • create a new table • don’t lose any information (can still derive) INV_NUM PROD_CODE LINE_NUM LINE_UNITS There is no transitive dependency here! PROD_CODE PROD_TITLE 91.2914 R McFadyen

  32. The Boyce-Codd Normal Form (BCNF) • Meets 3NF requirements • Every determinant in the table is a candidate key • Focus is on determinants 91.2914 R McFadyen

  33. Example consider the following functional dependencies and table structure city, streetname postalcode postalcode city Address city streetname postalcode Note that this table does have redundant data, and from a theoretical perspective, would have anomalies associated with it. 91.2914 R McFadyen

  34. Example city, streetname postalcode postalcode city Address city streetname postalcode • Let us consider the more formal/complete definitions in the 91.3902 text, and then ask: • What are the non-key attributes? • What is the primary key? • What is the normal form of Address? 91.2914 R McFadyen

  35. Address city streetname postalcode • From the 91.3902 text: A relation R is in 2NF if every nonprime attribute of A is not partially dependent on any key R. • nonprime: an attribute is nonprime if it is not a member of a candidate key. • prime: an attribute is prime if it is a member of a candidate key. In our example, there are no nonprime attributes Hence the table is in 2NF 91.2914 R McFadyen

  36. Address city streetname postalcode • From the 91.3902 text: A relation R is in 3NF if, whenever a functional dependency x --> A holds in R, either • a) X is a superkey of R, or • b) A is a prime attribute of R. One of the things that this says is: if a nonprime attribute A is dependent on some attribute, that determinant must include the key. In our example, there are no nonprime attributes Hence the table is in 3NF 91.2914 R McFadyen

  37. Address city streetname postalcode • Is every determinant a candidate key? • NO Hence the table is not BCNF What decomposition would preserve dependencies and have BCNF tables? Is this a practical example? 91.2914 R McFadyen

More Related