1 / 27

Chapter 5

Chapter 5. Normalization of Database Tables Database Systems: Design, Implementation, and Management, Rob and Coronel Special adaptation for INFS-3200. Database Tables and Normalization. Table is basic building block in database design

Download Presentation

Chapter 5

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 5 Normalization of Database Tables Database Systems: Design, Implementation, and Management, Rob and Coronel Special adaptation for INFS-3200

  2. Database Tables and Normalization • Table is basic building block in database design • Normalization is process for assigning attributes to entities. Why: • Reduces data redundancies • Helps eliminate data anomalies • Produces controlled redundancies to link tables and therefore, establish relationships. • GENERAL GUIDELINES: • Define business rules • Define level of detail (granularity) – details & aggregates • Each table must represent one and only one subject only • All attributes in the table must be fully dependent on the PK, the entire PK and nothing but the PK. Database Systems: Design, Implementation, & Management, Rob & Coronel

  3. Database Tables and Normalization • Normalization stages • 1NF - First normal form: • Put data in table format. • Eliminate repeating groups. • Select a suitable primary key. • 2NF - Second normal form • Eliminate partial dependencies. • 3NF - Third normal form • Eliminate transitive dependencies • BCNF - Boyce-Codd normal form • Every determinant in the table is a candidate key. • 4NF - Fourth normal form • Eliminate independent multi-valued set of facts. Database Systems: Design, Implementation, & Management, Rob & Coronel

  4. Database Tables and Normalization Original Report Data Database Systems: Design, Implementation, & Management, Rob & Coronel

  5. Sample Data for Project Report Report Data in Table Format(incomplete) • Observations • PROJ_NUM intended to be primary key • Table entries invite data inconsistencies • Table tend to create data anomalies: • Update • Modifying JOB_CLASS • Insertion • New employee must be assigned project • Deletion • If employee deleted, other vital data lost Database Systems: Design, Implementation, & Management, Rob & Coronel

  6. Conversion to 1NF • Repeating groups must be eliminated (duplicated column types/multi-valued columns) • Tabular format • Each cell has single value - No repeating groups • In our case: PROJ_NUM, EMP_NUM, etc. • Proper primary key developed • Uniquely determines (identifies) attribute values (in each row) • In our case: combination of PROJ_NUM and EMP_NUM • Identify Dependencies • Desirable dependencies based on primary key • Less desirable dependencies • Partial • based on part of composite primary key • Transitive • one nonprime attribute depends on another nonprime attribute Database Systems: Design, Implementation, & Management, Rob & Coronel

  7. Data Organization: 1NF Database Systems: Design, Implementation, & Management, Rob & Coronel

  8. Dependency Diagram (1NF) 1NF (PROJ_NUM, EMP_NUM, PROJ_NUM,EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS) Partial Dependencies: (PROJ_NUM -> PROJ_NAME) (EMP_NUM -> EMP_NAME, JOB_CLASS, CHG_HOUR) Transitive Dependencies: (JOB_CLASS -> CHG_HOUR) Database Systems: Design, Implementation, & Management, Rob & Coronel

  9. 1NF Summarized • Data in a tabular format • No repeating groups in table (repeated attributes or multi-value attributes ) • Primary key attribute(s) identified • All dependent attributes depend on the primary key • Identify all partial and transitive dependencies • Partial • Attributes that depend on part of PK • Can only exist if the table has a composite PK • Transitive • Non-PK attribute(s) determines other attributes • JOB_CLASS determines CHG_HOUR • JOB_CLASS is the determinant attribute • CHG_HOUR is the dependent attribute Database Systems: Design, Implementation, & Management, Rob & Coronel

  10. Conversion to 2NF • Start with 1NF format: • Write each key component on separate line • Write original key on last line • Each component is new table • Write dependent attributes after each key PROJECT (PROJ_NUM, PROJ_NAME) EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR) ASSIGN (PROJ_NUM, EMP_NUM, ASSIGN_HOURS) Transitive Dependencies: (JOB_CLASS -> CHG_HOUR) Database Systems: Design, Implementation, & Management, Rob & Coronel

  11. 2NF Conversion Results Table name: PROJECT PROJ_NUM PROJ_NAME Table name: EMPLOYEE EMP_NUM EMP_NAME JOB_CLASS CHG_HOUR Transitive dependency Table name: ASSIGN EMP_NUM PROJ_NUM ASSIGN_HOURS Database Systems: Design, Implementation, & Management, Rob & Coronel

  12. 2NF Summarized • In 1NF • Includes no partial dependencies • No attribute dependent on a portion of primary key • Still possible to exhibit transitive dependency • Attributes may be functionally dependent on non-key attributes Database Systems: Design, Implementation, & Management, Rob & Coronel

  13. Conversion to 3NF • Create separate table(s) to eliminate transitive functional dependencies • For every transitive dependency: • Write determinant as PK of new table • Write all dependent attributes for each determinant • Delete the dependent attribute(s) from the original table(s) • Leave the determinant attribute(s) in the original table(s) PROJECT (PROJ_NUM, PROJ_NAME) ASSIGN (PROJ_NUM, EMP_NUM, ASSIGN_HOURS) EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS) JOB (JOB_CLASS, CHG_HOUR) Database Systems: Design, Implementation, & Management, Rob & Coronel

  14. Conversion to 3NF Database Systems: Design, Implementation, & Management, Rob & Coronel

  15. 3NF Summarized • In 2NF • Contains no transitive dependencies Database Systems: Design, Implementation, & Management, Rob & Coronel

  16. Improving the Design • Evaluate PK Assignment (meet PK guidelines) • Evaluate Naming Conventions • Refine Attribute Atomicity (simple, single-valued) • Identify New Attributes • Identify New Relationships (decompose M:M) • Refine Primary Keys (as required for data granularity) • Maintain Historical Transactional Accuracy • Identify Use of Derived Attributes Database Systems: Design, Implementation, & Management, Rob & Coronel

  17. Evaluate PK Assignment Introduce a better suited PK free of semantic content (non-intelligent PK) Add JOB_CODE as surrogate PK. To reduce data entry errors Repeat for other tables (see # 6 – Assign)* Evaluate Naming Conventions JOB_CLASS is actually a description of the job, change to JOB_DESCRIPTION. CHG_HOURS should be JOB_CHG_HOUR Improving the Design(Job Table) Database Systems: Design, Implementation, & Management, Rob & Coronel

  18. Refine Attribute Atomicity Decompose composite attributes into simple attributes EMP_NAME should be EMP_LNAME EMP_FNAME EMP_INITIAL Identify New Attributes Add new attributes that describe real world entity characteristics EMP_HIREDATE Improving the Design(Employee Table) Database Systems: Design, Implementation, & Management, Rob & Coronel

  19. Improving the Design(Project Table) • Identify New Relationships • Add new relationships as required by business rules. • A project is managed by an employee, an employee can be the manager of only one project. • Add EMP_NUM as FK in PROJECT Database Systems: Design, Implementation, & Management, Rob & Coronel

  20. Improving the Design(Assign Table) • Identify new attributes • Add ASSIGN_DATE • Refine Primary Keys • Consider the “granulity” of the data being represented in order to determine the PK. • Can an employee have multiple hours worked entries for a given day in a given project? • If yes, add ASSIGN_NUM as surrogate PK. • Maintain Historical Transaction Accuracy • Add ASSIGN_CHG_HOUR • Identify Use of Derived Attributes • Add ASSIGN_CHARGE Database Systems: Design, Implementation, & Management, Rob & Coronel

  21. Limitations of System Assigned PK • Surrogate PK ensures that each row has an unique ID, not that the row’s dependent values are unique. • JOB_CODE system assigned PK • We still could have duplicate values: • 511 Programmer 35.75 • 512 Programmer 35.75 • Clearly, entries are duplicated! • To ensure unique values we must have create an unique index on all candidate keys. • Unique index on JOB_DESCRIPTION • This still will still not avoid data entry errors! • 513 Progranmer 35.75 Database Systems: Design, Implementation, & Management, Rob & Coronel

  22. Normalization and Database Design • First, business rules must be determined • Determine the granularity of the data in each entity. • Normalization should be part of the design process • E-R Diagram provides macro view (conceptual) • Normalization provides micro view of entities (logical) • Focuses on characteristics of specific entities • May yield additional entities/relationships • Difficult to separate normalization from E-R diagramming – complementary Database Systems: Design, Implementation, & Management, Rob & Coronel

  23. Initial ERD for Contracting Company • A company has many projects. • Each project requires the services of many employees. An employee may be assigned to several different projects. • Some employees are not assigned to a project. • Each employee has a single primary job classification. Many employees can have the same job classification. • The job classification determines the hourly billing rate. Database Systems: Design, Implementation, & Management, Rob & Coronel

  24. Modified ERD for Contracting Company Figure 4.11 Database Systems: Design, Implementation, & Management, Rob & Coronel

  25. Final ERD for Contracting Company Figure 4.12 Database Systems: Design, Implementation, & Management, Rob & Coronel

  26. Denormalization • Normalization is one of many database design goals • Normalization creates many small tables with PK/FK • Reporting requirements over normalized tables requiresmultiple table joins to get complete data: • Additional processing (join operations) • Additional I/Os operations • Design must find right balance among: • Design Integrity requirements • Information requirements • Performance requirements Database Systems: Design, Implementation, & Management, Rob & Coronel

  27. Unnormalized Table Defects • Data updates less efficient • Normalization ensures that data is updated (insert/update/delete)only once in one place. • Indexing more cumbersome • No simple strategies for creating views Database Systems: Design, Implementation, & Management, Rob & Coronel

More Related