Normalization

Normalization Database Management System

Normalization • Normalization is the process of decomposing relations with anomalies to produce smaller, well structured relations. • It is a technique for producing a set of relations with desirable properties, given the data requirements of an enterprise. • The process of normalization was first produced developed by E. F. Codd. It is basically a process of efficiently organizing data in a database. • Purpose is to produce an anomaly free design.

Goal of Normalization • There are two goals of the normalization process. • Eliminate redundant data (for example, storing same data in more than one table) and • Ensure data dependencies make sense (only storing related data in a table). • Both of these are worthy goals as they reduce the amount of space a database consumes and ensures that data is logically stored.

Normalization Details • Normalization is often performed as a series of tests on a relation to determine whether it satisfies or violates the requirements of a given normal form. When a requirement is not met, the relation violating the requirement must be decomposed into relations that individually meet the requirements of normalization. • The process of normalization is a formal method that identifies relations based on their primary key (or candidate keys in the case of BCNF) and the functional dependencies among their attributes. As normalization proceeds, the relations become progressively more restricted in format, and also less vulnerable to update anomalies.

Normalization Details • A strongly recommended step. • Normalized design makes the maintenance of database easier. • Normalization applied on each table of a DB design. • Performed after the logical database design. • Informally also performed during conceptual DB design.

Normal Forms • Different forms or levels of normalization called first, second, third and so on forms. • Each form has got certain conditions. If a table fulfils the condition(s) for a normal form then the table is in that normal form. • Three normal forms were initially proposed, called first (1NF), second (2NF) and third (3NF) normal forms. • Subsequently, a stronger definition of third normal form was introduced by R. Boyce and E. F. Codd, referred to as Boyce-Codd Normal Form (BCNF). • Higher normal forms that go beyond BCNF were introduced later. For example, there are fourth (4NF) and fifth (5NF) normal forms. However, these later normal forms deal with practical situations that are very rare.

Steps in Normalization

Anomalies • An inconsistent, incomplete or incorrect state of database • Four types of anomalies are of concern here; Redundancy, insertion, deletion and updation.

Un-normalized Form • A table (not a relation) that contains one or more repeating groups.

First Normal Form

Functional Dependency • Normalization is based on functional dependencies (FDs). • A type of relationship between attributes of a relation. • Definition: If A and B are attributes of a relation R, then B is functionally dependent on A if each value of A in R is associated with exactly one value of B; written as A B • It does not mean that A derives B, although it may be the case sometime. • Means that if we know value of A then we can precisely determine a unique value of B. • Attribute of set of attributes on the left side are called determinant and on the right are called dependents • Like R (a, b, c, d, e) a b, c, d d d, e

FD Diagrammatic Representation

FD Example STD(stId, stName, stAdr, prName, credits) stId stName, stAdr, prName, credits prName credits

First Normal Form (1NF) • A relation in which the intersection of each row and column contains one and only one value. • A relation is in 1NF if and only if each attribute is single valued for each tuple. This means that each attribute in each row, or each cell of the table, contains only one value. • No repeating fields or groups are allowed.

1NF Contd. • There are two common approaches to achieve 1NF. • 1st Approach – Flattening: In the first approach, we remove the repeating groups by entering appropriate data in the empty columns of rows containing the repeating data. With this approach, redundancy is introduced into the resulting relation, which is subsequently removed during the normalization process. • 2nd Approach – Decomposition: In the second approach, we remove the repeating group by placing the repeating data along with a copy of the original key attributes in a separate relation. A primary key is identified for the new relation.

Un-normalized Table

Relations after applying 1NF • Relation1 (stID, stName, stAdr, Major) • Relation2 (stID, CourseID, CourseTitle, InstName, InstLoc, Grade) Relation 1

Relations after applying 1NF Relation 2

Problems in 1NF • Update Problem: If we want to change courseTitle from ‘IT Basics’ to ‘Intro. To IT’, we must bring this change in every record in which this course title appears. Thus updating can be a lengthy process. • Insertion Problem: If we want to add a new course, we cannot do so until a student is enrolled in that course, because both stId and CourseId are primary key in relation2. • Deletion Problem: If we delete student with id S105, the information about course with ID CS503 will also be deleted.

Second Normal Form

Full Functional Dependency • 2NF is based on the concept of full functional dependency. • Definition: Full functional dependency indicates that if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A. • A functional dependency is full functional dependency if removal of any attribute from A results in the dependency not being sustained any more. • Partial Functional Dependency: If A and B are attributes of R, B is partially dependent on A if B is dependent on any proper subset of A. • If primary key is not composite key, then there is no chance of partial functional dependency.

Example: Full Functional Dependency • stID, CourseID CourseName • stID, CourseID InstName • stID, stName Major stID Major stName Major • stID, CourseID Grade

Second Normal Form (2NF) • 2NF applies to relations with composite keys, that is, relations with a primary key composed of two or more attributes. A relation with a single attribute primary key is automatically in at least 2NF. • A relation that is in first normal form and every non-primary key attribute is fully functionally dependent on the primary key. • The normalization from 1NF relations to 2NF involves the removal of partial dependencies. If a partial dependency exists, we remove the functionally dependent attributes from the relation by placing them in a new relation along with a copy of their determinant.

Example – 2NF • Relation1 is in 2NF because of non-composite primary key. • Relation2 has following functional dependencies: • stID, CourseID Grade (FFD) • CourseID CourseTitle (PD) • CourseID InstName (PD) • CourseID InstLoc (PD)

Result of applying 2NF • Relation1 (stID, stName, stAdr, Major) • Relation2 (stID, CourseID, Grade) • Relation3 (CourseID, CourseTitle, InstName, InstLoc) Relation 2

Result of applying 2NF Relation 3

Steps for converting 1NF to 2NF • The process for transforming a 1NF table to 2NF is: • Identify any determinants other than the composite key, and the columns they determine. • Create and name a new table for each determinant and the unique columns it determines. • Move the determined columns from the original table to the new table. The determinant becomes the primary key of the new table. • Delete the columns you just moved from the original table except for the determinant which will serve as a foreign key. • The original table may be renamed to maintain semantic meaning.

Analysis of 2NF • Update Problem: If we want to change courseTitle from ‘IT Basics’ to ‘Intro. To IT’, we can easily do so as each course title appears only once in Relation 3. • Insertion Problem: If we want to add a new course, we can easily do so in Relation 3, irrespective of whether a student is enrolled in it or not. • Deletion Problem: If we delete student with id S105, the information about course with ID CS503 wont be deleted now. But if we delete a row from Relation3, we may loose data regarding an instructor. So deletion anomaly remains in 2NF.

Third Normal Form

Transitive Dependency • Third normal form is based on transitive dependency. • Transitive Dependency is a condition where A, B and C are attributes of a relation such that if A  B and B  C, then C is transitively dependent on A through B. (Provided that A is not functionally dependent on B or C). • It is a type of functional dependency.

Example: Transitive Dependency • CourseID InstName • InstName InstLoc • So CourseID InstLoc

Third Normal Form • A relation that is in 1NF and 2NF and in which no non-primary-key attribute is transitively dependent on the primary key. • If a transitive dependency exists, we remove the transitively dependent attribute(s) from the relation by placing the attribute(s) in a new relation long with a copy of the determinant.

Example – 3NF • Relation1 and Relation2 are in 3NF. • Relation3 has transitive dependency shown on slide 31.

Result of applying 3NF • Relation1 (stID, stName, stAdr, Major) • Relation2 (stID, CourseID, Grade) • Relation3 (CourseID, CourseTitle, InstName) • Relation4 (InstName, InstLoc)

Result of applying 3NF Relation 3 Relation 4

Steps for converting 2NF to 3NF • The process of transforming a table into 3NF is: • Identify any determinants, other than primary key, and the columns they determine. • Create and name a new table for each determinant and the unique columns it determines. • Move the determined columns from the original table to the new table. The determinant becomes the primary key of the new table. • Delete the columns you just moved from the original table except for the determinant which will serve as a foreign key. • The original table may be renamed to maintain semantic meaning.

Analysis of 3NF • Deletion anomaly no more exists. If we delete a row from Relation3, it will not affect data bout instructors as there is now an individual table (Relation4) containing data about instructors.

Exercise • Normalize the following relations. • WORK (projName, projMgr, empId, hours, empName, budget, startDate, salary, empMgr, empDept) • PROPERTY(PropertyNo, Paddress, InspDate, InspTime, Staff_No, Sname, Comments) • CUSTOMER ORDER (CustName, OrderNo, ProdNo, ProdDesc, Qty, CustAddress, DateOrdered)Assumption: A customer can have multiple orders but an order can be for only 1 product. CustName and OrderNo preassigned as keys.

Boyce-Codd Normal Form

Boyce-Codd Normal Form • Boyce-Codd normal form (BCNF) is based on functional dependencies that take into account all candidate keys in a relation. • Definition: A relation is in BCNF, if and only if, every determinant is a candidate key.

Difference of 3NF and BCNF • The difference between 3NF and BCNF is that for a functional dependency A  B, 3NF allows this dependency in a relation if B is a primary key attribute and A is not a candidate key, whereas BCNF insists that for this dependency to remain in a relation, A must be a candidate key. Therefore, BCNF is a stronger form of 3NF, such that every relation in BCNF is also in 3NF. However, a relation in 3NF is not necessarily in BCNF. • BCNF differs from 3NF only when there are more than one candidate keys and the keys are composite and overlapping.

Violating BCNF • Violation of BCNF is quite rare, since it may only happen under specific conditions. The potential to violate BCNF may occur in a relation that: • Contains two (or more) composite candidate keys; • The candidate keys overlap, that is have at least one attribute in common.

Example- BCNF Relation with sample data Functional dependencies

Example- BCNF (Contd.) • The primary key for this relation is the composite key consisting of SID and Major. Thus the two attributes Advisor and Maj_GPA are functionally dependent on this key. This reflects the constraint that although a given student may have more than one major, for each major a student has exactly one advisor and one GPA. • There is a second functional dependency in this relation: Major is functionally dependent on Advisor. That is, each advisor advises in exactly one major. That means that a key attribute (Major) is functionally dependent on a non-key attribute (Advisor).

Example- BCNF (Anomalies) • The above relation is clearly in 3NF, since there are no partial functional dependencies and no transitive dependencies. • Because of the functional dependency between Major and Advisor, there are anomalies in this relation. • Update Problem: Suppose that in Physics the advisor Hammad is replaced by Adil. This change must be made in two (or more) rows in the table. • Insertion Problem: Suppose we want to insert a row with the information that Ali advises in IT. This cannot be done until at least one student majoring in IT is assigned Ali as an advisor. • Deletion Problem: If student number 789 withdraws from school, we loose the information that Asif advises in Maths.

Converting a relation to BCNF • A relation that is in 3NF can be converted to relations in BCNF using a simple two-step process. • In the first step, the relation is modified so that the determinant in the relation that is not a candidate key becomes a component of the primary key of the revised relation. The attribute that is functionally dependent on that determinant becomes a non-key attribute. • The second step is to decompose the relation to eliminate the partial functional dependency.

Result of applying BCNF

Result of applying BCNF (Contd.) Relation1 Relation2

Normalization

Normalization

Presentation Transcript

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

NORMALIZATION

Normalization

Normalization

Normalization

Normalization