1 / 73

Instructor Dr. Martha Malaty Text & Original Presentations

Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts. Instructor Dr. Martha Malaty Text & Original Presentations Database Systems: Design, Implementation, and Management, Sixth Edition, Rob and Coronel, 2004. Chapter 5.

claireb
Download Presentation

Instructor Dr. Martha Malaty Text & Original Presentations

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Orange Coast CollegeBusiness DivisionCS/CIS DepartmentFall 2004 CIS 182Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original Presentations Database Systems: Design, Implementation, and Management, Sixth Edition, Rob and Coronel, 2004

  2. Chapter 5 Normalization of Database Tables Database Systems: Design, Implementation, and Management, Sixth Edition, Rob and Coronel

  3. In this chapter, you will learn: • What normalization is and what role it plays in the database design process • About the normal forms 1NF, 2NF, 3NF, BCNF, and 4NF • How normal forms can be transformed from lower normal forms to higher normal forms • That normalization and ER modeling are used concurrently to produce a good database design • That some situations require denormalization to generate information efficiently Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  4. Database Tables & Normalization • Good table structure is essential for good DB design • Normalization is the way to reach good table structure • Normalization is the process for assigning attributes to entities through evaluating and correcting table structures • Reduces data redundancies • Helps eliminate data anomalies • Produces controlled redundancies to link tables • Works through sequence of stages (normal forms) Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  5. Database Tables & Normalization • Normalizationworks through a series of stages called normal forms: • First normal form (1NF) • Second normal form (2NF) • Third normal form (3NF) • Fourth normal form (4NF) • . . . • 2NF is better than 1NF; 3NF is better than 2NF; … • For most business database design purposes, 3NF is highest we need to go in the normalization process • Highest level of normalization is not always most desirable Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  6. Tutorials Websites On Normalization • What do you need to know about • Webopedia • Rules of data normalization • 3NF Tutorial • ServerWatch • VBMySQL.com Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  7. Unnormalized vs. Denormalized • Unnormalized data: • No ER model is developed when creating the database • Denormalized data • Starting with normalized data, adding redundancy for better performance • We cannot denormalize unnormalized data Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  8. The Need for Normalization • Example: A company manages building projects • Charges its clients by billing hours spent on each contract • Hourly billing rate is dependent on employee’s position • Periodically, a report is generated that contains information displayed in Table 5.1 • See RPT_FORMAT table in ConstructCo.mdb Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  9. A Sample Report Layout Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  10. A Table in the Report Format Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  11. Example: Construction company • Three major problems: • PROJ_NUM contains repeating groups • Table entries invite data inconsistencies • Table displays data anomalies Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  12. Construction Company Problems • PROJ_NUM contains “repeating groups” • Several data entries with NULL entries indicating that all entries belong to the same PROJ_NUM • PRO_NUM is intended to be primary key • Primary key can’t be NULL • Relational table can’t contain repeating groups Repeating groups Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  13. Construction Company Problems Possible inconsistency • Table entries invite data inconsistencies • Elect. Engineer could mistakenly be entered in another abbreviated form (e.g. Elect.Eng, EE, …) Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  14. Construction Company Problems • Table displays data anomalies • Update anomalies • Modifying JOB_CLASS needs several alterations • Insertion anomalies • New employee must be assigned project • Deletion anomalies • If employee is deleted, other vital data might be lost Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  15. Construction Company Problems • Structure of data set in Fig. 5.1 (ConstructCo.mdb) does not handle data very well • The table structure appears to work; report is generated with ease • Unfortunately, the report may yield different results, depending on what data anomaly has occurred Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  16. Levels of Normalizations Un-normalized) 1NF 2NF 3NF BCNF 4NF 5NF Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  17. Conversion to First Normal Form • Repeating group • Derives its name from the fact that a group of multiple (related) entries can exist for any single key attribute occurrence • Relational table must not contain repeating groups • Normalizing the table structure will reduce these data redundancies Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  18. Conversion to First Normal Form • Normalization is three-step procedure • Eliminate Repeating Groups • Identify the Primary Key • Identify all Dependencies Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  19. Step 1: Eliminate Repeating Groups • Present data in a tabular format, where each cell has a single value and there are no repeating groups • Eliminate repeating groups by eliminating nulls, making sure that each repeating group attribute contains an appropriate data value Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  20. Data Organization: 1NF • See DATA_ORG_INF Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  21. Step 2: Identify the Primary Key • Primary key must uniquely identify attribute values • New key must be composed • What would be the primary key for the DATA_ORG_INF table? Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  22. Step 3: Identify all Dependencies • Dependencies can be depicted with the help of a “Dependency Diagram” • Depicts all dependencies found within a given table structure • Helpful in getting bird’s-eye view of all relationships among a table’s attributes • Makes it much less likely that an important dependency will be overlooked Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  23. Dependency Diagram: 1NF Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  24. First Normal Form • Tabular format in which: • All key attributes are defined • There are no repeating groups in the table • All attributes are dependent on primary key • All relational tables satisfy 1NF requirements • Some tables contain “Partial dependencies” • Dependencies based on only part of the primary key • Sometimes used for performance reasons, but should be used with caution • Still subject to data redundancies Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  25. Conversion to Second Normal Form • Relational database design can be improved by converting the database into second normal form (2NF) • Two steps • Identify All key components • Identify the dependent attributes Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  26. Step 1: Identify All Key Components • Write each key component on separate line, and then write the original (composite) key on the last line • Each component will become the key in a new table • Example: • Proj_Num • Emp_Num • Proj_Num, Emp_Num Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  27. Step 2: Identify Dependent Attributes • Determine which attributes are dependent on which other attributes • At this point, most anomalies have been eliminated • Example: • Proj_Num  Proj_name • Emp_Num  Emp_Name, Job_Class, Chg_Hour • Proj_Num, Emp_Num  Hours Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  28. Second Normal Form (2NF) Conversion Results Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  29. Second Normal Form • Table is in second normal form (2NF) if: • It is in 1NF and • It includes no partial dependencies: • No attribute is dependent on only a portion of the primary key Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  30. Conversion to Third Normal Form • Three steps • Identify each new determinant • Identify the dependent attributes • Remove the dependent attributes from transitive dependencies Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  31. Step 1: Identify New Determinants • Determinant • Any attribute whose value determines other values within a row • For every transitive dependency, write its determinant as a PK for a new table • Example • Job_Class Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  32. Step 2: Identify Dependent Attributes • Identify the attributes dependent on each determinant identified in Step 1 and identify the dependency • Name the table to reflect its contents and function • Example • Job_Class  Chg_Hour Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  33. Step 3: Remove Dependent Attributes from Transitive Dependencies • Eliminate all dependent attributes in transitive relationship(s) from each table that has such a transitive relationship • Draw a new dependency diagram to show all tables defined in Steps 1–3 • Check new tables and modified tables from Step 3 to make sure that each has a determinant and does not contain inappropriate dependencies • Example: • Proj_Num  Proj_name • Emp_Num  Emp_Name, Job_Class • Proj_Num, Emp_Num  Hours • Job_Class  Chg_Hour Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  34. Third Normal Form (3NF) Conversion Results Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  35. Third Normal Form • A table is in third normal form (3NF)if: • It is in 2NF and • It contains no transitive dependencies Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  36. Improving the Design • Table structures are cleaned up to eliminate the troublesome initial partial and transitive dependencies • Normalization cannot, by itself, be relied on to make good designs • It is valuable because its use helps eliminate data redundancies Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  37. Improving the Design (continued) • The following changes were (or should be) made: • PK assignment • Naming conventions • Attribute atomicity • Adding attributes • Adding relationships • Refining PKs • Maintaining historical accuracy • Using derived attributes Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  38. PK assignment • Addition of JOB_CODE attribute greatly decreases the likelihood of referential integrity violation • Example: • One would write a job class as “DB Designer” and another as “Database Designer” although both refer to the same class • Problem: The new key produces transitive dependency • JOB_CODE -> JOB_CLASS, CHG_HOUR • JOB_CLASS -> CHG_HOUR • The effect of transitive dependency is less problematic than integrity violation Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  39. Naming conventions • Adhere to naming convention by having the table name as the first prefix of the attribute name • Example: • CHG_HOUR is changed to JOB_CHG_HOUR • JOB_CLASS should be changed to JOB_DESCRIPTION to better describe the entity • HOURS is changed to ASSIGNED_HOURS to associate it with the ASSIGN table Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  40. Attribute atomicity • Atomic attribute: • Attribute that cannot be further divided • Gains querying flexibility • Example: EMP_NAME is not atomic • Use EMP_LNAME, EMP_FNAME, EMP_INITIAL instead Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  41. Adding Attributes • Adding attributes that are needed in the real-world environment • Example: EMP_HIREDATE can be used to track employee’s “job longevity” to award bonuses to long-term employees & other morale enhancing measures Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  42. Adding Relationships • To avoid unnecessary data duplication • Example: • To be able to supply detailed information about each project’s manager, use EMP_NUM as a FK in PROJECT table Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  43. Refining PKs • If an employee makes two or more entries for the same project, replace EMP_NUM & PROJ_NUM as a combined key, by a system-assigned, surrogate key, ASIGN_NUM, in the ASSIGN table • Surrogate key: • At the implementation level, a system-defined attribute that is created and managed by the DBMS. • It is automatically incremented for each row. • In Oracle for example we use the "sequence" object for surrogate keys. • EMP_NUM & PROJ_NUM can still be used as FK’s Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  44. Maintaining historical accuracy • Writing the job charge per hour in ASSIGN table is crucial to maintain historical accuracy, in case that the job charge per hour changes over time Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  45. Using derived attributes • Derived attributes can be calculated when needed • Storing a derived attribute makes it easier to write application SW to produce the desired results and saves reporting time • ASSIGN_CHARGE is the result of multiplying ASSIGN_HOURS by ASSIGN_CHG_HOUR Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  46. The Completed Database • See Ch05_ConstructCo.mdb Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  47. The Completed Database (continued) • See Ch05_ConstructCo.mdb Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  48. The Completed Database (continued) • See Ch05_ConstructCo.mdb Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  49. Limitations on System-Assigned Keys • System-assigned (surrogate) primary key may not prevent confusing entries • Example: • JOB_CODE attribute was designed to be a PK of JOB table • This does not prevent duplicating existing records with different PK values • Data entries in Table 5.2 are inappropriate because they duplicate existing records • Yet there has been no violation of either entity integrity or referential integrity Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

  50. The Boyce-Codd Normal Form (BCNF) • Google search • About.com • A relation is in Boyce-Codd Normal Form (BCNF) if every determinant is a candidate key. • Candidate key • Has same characteristics as primary key, but for some reason, not chosen to be primary key • If a table contains only one candidate key, the 3NF and the BCNF are equivalent • BCNF can be violated only if the table contains more than one candidate key • Most designers consider the BCNF as a special case of 3NF Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

More Related