PMIT-6102 Advanced Database Systems

PMIT-6102Advanced Database Systems By- JesminAkhter Assistant Professor, IIT, Jahangirnagar University

Lecture 03 Relational Database DesignNormalization-Part-1

Outline • Overview of Relational DBMS • Normalization(1st lecture)

Normalization • The aim of normalization is to eliminate various anomalies (or undesirable aspects) of a relation in order to obtain “better” relations. • The following four problems might exist in a relation scheme: • Repetition anomaly • Update anomaly • Insertion anomaly • Deletion anomaly

Repetition Anomaly • The NAME,TITLE, SAL attribute values are repeated for each project that the employee is involved in. • Waste of space • Complicates updates • Contrary to the spirit of databases EMP ENO ENAME TITLE SAL PNO RESP DUR P1 Manager 12 E1 J. Doe Elect. Eng. 40000 P1 Analyst 24 E2 M. Smith Analyst 34000 P2 Analyst 6 E2 M. Smith Analyst 34000 E3 A. Lee Mech. Eng. 27000 P3 Consultant 10 E3 A. Lee Mech. Eng. 27000 P4 Engineer 48 E4 J. Miller Programmer 24000 P2 Programmer 18 E5 B. Casey Syst. Anal. 34000 P2 Manager 24 P4 Manager 48 E6 L. Chu Elect. Eng. 40000 P3 Engineer 36 E7 R. Davis Mech. Eng. 27000 Syst. Anal. E8 J. Jones 34000 P3 Manager 40

Update Anomaly • If any attribute of project (say SAL of an employee) is updated, multiple tuples have to be updated to reflect the change. EMP ENO ENAME TITLE SAL PNO RESP DUR P1 Manager 12 E1 J. Doe Elect. Eng. 40000 P1 Analyst 24 E2 M. Smith Analyst 34000 P2 Analyst 6 E2 M. Smith Analyst 34000 E3 A. Lee Mech. Eng. 27000 P3 Consultant 10 E3 A. Lee Mech. Eng. 27000 P4 Engineer 48 E4 J. Miller Programmer 24000 P2 Programmer 18 E5 B. Casey Syst. Anal. 34000 P2 Manager 24 E6 L. Chu Elect. Eng. 40000 P4 Manager 48 P3 Engineer 36 E7 R. Davis Mech. Eng. 27000 Syst. Anal. E8 J. Jones 34000 P3 Manager 40

Insertion Anomaly • It may not be possible to store information about a new project until an employee is assigned to it. EMP ENO ENAME TITLE SAL PNO RESP DUR P1 Manager 12 E1 J. Doe Elect. Eng. 40000 P1 Analyst 24 E2 M. Smith Analyst 34000 P2 Analyst 6 E2 M. Smith Analyst 34000 E3 A. Lee Mech. Eng. 27000 P3 Consultant 10 E3 A. Lee Mech. Eng. 27000 P4 Engineer 48 E4 J. Miller Programmer 24000 P2 Programmer 18 E5 B. Casey Syst. Anal. 34000 P2 Manager 24 E6 L. Chu Elect. Eng. 40000 P4 Manager 48 P3 Engineer 36 E7 R. Davis Mech. Eng. 27000 Syst. Anal. E8 J. Jones 34000 P3 Manager 40

Deletion Anomaly • If an engineer, who is the only employee on a project, leaves the company, his personal information cannot be deleted, or the information about that project is lost. • May have to delete many tuples. EMP ENO ENAME TITLE SAL PNO RESP DUR P1 Manager 12 E1 J. Doe Elect. Eng. 40000 P1 Analyst 24 E2 M. Smith Analyst 34000 P2 Analyst 6 E2 M. Smith Analyst 34000 E3 A. Lee Mech. Eng. 27000 P3 Consultant 10 E3 A. Lee Mech. Eng. 27000 P4 Engineer 48 E4 J. Miller Programmer 24000 P2 Programmer 18 E5 B. Casey Syst. Anal. 34000 P2 Manager 24 E6 L. Chu Elect. Eng. 40000 P4 Manager 48 P3 Engineer 36 E7 R. Davis Mech. Eng. 27000 Syst. Anal. E8 J. Jones 34000 P3 Manager 40

What to do? • Take each relation individually and “improve” it in terms of the desired characteristics • Normal forms • Atomic values (1NF) • Can be defined according to keys and dependencies. • Functional Dependencies ( 2NF, 3NF, BCNF) • Multivalued dependencies (4NF) • Normalization • Normalization is a process of concept separationwhich applies a top-down methodology for producing a schema by subsequent refinements and decompositions. • Do not combine unrelated sets of facts in one table; each relation should contain an independent set of facts. • Universal relation assumption

Normalization Issues • How do we decompose a schema into a desirable normal form? • What criteria should the decomposed schemas follow in order to preserve the semantics of the original schema? • Reconstructability: recover the original relation  no spurious joins • Lossless decomposition: no information loss • Dependency preservation: the constraints (i.e., dependencies) that hold on the original relation should be enforceable by means of the constraints (i.e., dependencies) defined on the decomposed relations.

Consider combining relations sec_class(sec_id, building, room_number) and section(course_id, sec_id, semester, year) into one relation section(course_id, sec_id, semester, year, building, room_number) No repetition in this case A Combined Schema Without Repetition

Suppose we had started with inst_dept. How would we know to split up (decompose) it into instructor and department? Write a rule “if there were a schema (dept_name, building, budget), then dept_name would be a candidate key” Denote as a functional dependency: dept_namebuilding, budget In inst_dept, because dept_name is not a candidate key, the building and budget of a department may have to be repeated. This indicates the need to decompose inst_dept Not all decompositions are good. Suppose we decomposeemployee(ID, name, street, city, salary) into employee1 (ID, name) employee2 (name, street, city, salary) The next slide shows how we lose information -- we cannot reconstruct the original employee relation -- and so, this is a lossy decomposition. What About Smaller Schemas?

A Lossy Decomposition

Lossless join decomposition Decomposition of R = (A, B, C) R1 = (A, B) R2 = (B, C) Example of Lossless-Join Decomposition A B C A B B C   1 2 A B   1 2 1 2 A B r B,C(r) A,B(r) A B C A (r) B (r)   1 2 A B

Stages of Normalization Remove repeating groups First normal form (1NF) Remove partial dependencies Second normal form (2NF) Remove transitive dependencies Third normal form (3NF) Remove remaining functional dependency anomalies Boyce-Codd normal form (BCNF) Remove multivalued dependencies Fourth normal form (4NF) Remove remaining anomalies Fifth normal form (5NF) Unnormalized (UDF)

A repeating group is an attribute (or set of attributes) that can have more than one value for a primary key value. Repeating Groups Example We have the following relation that contains staff and department details and a list of telephone contact numbers for each member of staff. Repeating Groups are not allowed in a relational design, since all attributes have to be ‘atomic’ - i.e., there can only be one value per cell in a table!

Multivalued Attributes (or repeating groups): non-key attributes or groups of non-key attributes the values of which are not uniquely identified by (directly or indirectly) (not functionally dependent on) the value of the Primary Key (or its part). Repeating Groups STUDENT

Formal Definition: Attribute B is functionally dependant upon attribute A (or a collection of attributes) if a value of A determines a single value of attribute B at any one time. Example: Functional Dependencies staffNo job staffNo dept staffNo dname dept dname staffNojob dept dname SL10 Salesman 10 Sales SA51 Manager 20 Accounts DS40 Clerk 20 Accounts OS45 Clerk 30 Operations Functional Dependency Formal Notation: A  B This should be read as ‘AdeterminesB’ or ‘B is functionally dependant on A’. A is called the determinant and B is called the object of the determinant.

Full Functional Dependency: Only of relevance with composite determinants. This is the situation when it is necessary to use all the attributes of the composite determinant to identify its object uniquely. Example: Full Functional Dependencies order# line# qty price A001 001 10 200 A002 001 20 400 A002 002 20 800 A004 001 15 300 (Order#, line#)  qty (Order#, line#)  price Functional Dependency Compound Determinants: If more than one attribute is necessary to determine another attribute in an entity, then such a determinant is termed a composite determinant.

Partial Functional Dependency: This is the situation that exists if it is necessary to only use a subset of the attributes of the composite determinant to identify its object uniquely. Full Functional Dependencies (student#, unit#)  grade Partial Functional Dependencies unit#  room Repetition of data! Functional Dependency

Partial Dependency – when an non-key attribute is determined by a part, but not the whole, of a COMPOSITE primary key. Functional Dependency Partial Dependency

Definition: A transitive dependency exists when there is an intermediate functional dependency. Transitive Dependencies staffNo dept dept dname staffNodeptdname Repetition of data! Transitive Dependency Formal Notation: If A  B and B  C, then it can be stated that the following transitive dependency exists: A  B  C Example:

Transitive Dependency – when a non-key attribute determines another non-key attribute. Transitive Dependency Transitive Dependency

Unnormalized – There are multivalued attributes or repeating groups 1 NF – No multivalued attributes or repeating groups. 2 NF – 1 NF plus no partial dependencies 3 NF – 2 NF plus no transitive dependencies Normal Forms: Review

ISBN  Title ISBN  Publisher Publisher  Address Example 1: Determine NF All attributes are directly or indirectly determined by the primary key; therefore, the relation is at least in 1 NF

ISBN  Title ISBN  Publisher Publisher  Address Example 1: Determine NF The relation is at least in 1NF. There is no COMPOSITE primary key, therefore there can’t be partial dependencies. Therefore, the relation is at least in 2NF

ISBN  Title ISBN  Publisher Publisher  Address Example 1: Determine NF Publisher is a non-key attribute, and it determines Address, another non-key attribute. Therefore, there is a transitive dependency, which means that the relation is NOT in 3 NF.

ISBN  Title ISBN  Publisher Publisher  Address Example 1: Determine NF We know that the relation is at least in 2NF, and it is not in 3 NF. Therefore, we conclude that the relation is in 2NF.

ISBN  Title ISBN  Publisher Publisher  Address Example 1: Determine NF • In your solution you will write the following justification: • No M/V attributes, therefore at least 1NF • No partial dependencies, therefore at least 2NF • There is a transitive dependency (Publisher  Address), therefore, not 3NF • Conclusion: The relation is in 2NF

Product_ID  Description Example 2: Determine NF All attributes are directly or indirectly determined by the primary key; therefore, the relation is at least in 1 NF

Product_ID  Description Example 2: Determine NF The relation is at least in 1NF. There is a COMPOSITE Primary Key (PK) (Order_No, Product_ID), therefore there can be partial dependencies. Product_ID, which is a part of PK, determines Description; hence, there is a partial dependency. Therefore, the relation is not 2NF. No sense to check for transitive dependencies!

Product_ID  Description Example 2: Determine NF We know that the relation is at least in 1NF, and it is not in 2 NF. Therefore, we conclude that the relation is in 1 NF.

Product_ID  Description Example 2: Determine NF In your solution you will write the following justification: 1) No M/V attributes, therefore at least 1NF 2) There is a partial dependency (Product_ID  Description), therefore not in 2NF Conclusion: The relation is in 1NF

Thank You

PMIT-6102 Advanced Database Systems