200 likes | 685 Views
A Comprehensive Approach to Normalization and Denormalization. by Abdul Rashid Ahmad. Normalization overview. E.F. Codd proposed three normal forms: The first, second, and third normal forms 1NF, 2NF and 3NF are based on the functional dependencies among the attributes of a relation
E N D
A Comprehensive Approach to Normalization and Denormalization by Abdul Rashid Ahmad
Normalization overview • E.F. Codd proposed three normal forms: The first, second, and third normal forms • 1NF, 2NF and 3NF are based on the functional dependencies among the attributes of a relation • Boyce-Codd normal form was proposed by Boyce and Codd and is known as a stronger definition of 3NF
Continued… • In the normalization process, we start with a single universal relation schema (URS) which is a table with collection of attributes present in the set of functional dependencies • Then apply the normal form restrictions to develop the complete relational schema, (decompose the tables as they specify).
Normalization of data • based on set of functional dependencies (FD) and Prime Key(PK) you analyze the relation schemas to achieve two desirable properties: 1) Minimizing data redundancy 2) Minimizing insertion, deletion and update anomalies • Therefore the ideal goal of DB design will be developing a fully normalized relational schema, preferably in BCNF.
Using normal forms • Decomposed the relation when a NF test fails starting from 1NF. • Disclaimers to use of normal forms: 1) Do not guarantee good database design. 2) Sometimes highest normal form may not be the best design for performance reasons.
Normalization Approach • Normalizing the relation into 1NF • Consider this schema for departments: The FD of EMP_DEPT is dnumber -> {dname, dmgrssn} • Problems with this? • department location is multivalued. • Possible solution are: • Make composite primary key {dnumber, dlocation}
Continued… • Problem solved ? • Make attributes for the locations if you know the possible number of locations. • Better solution? • Make a new table with dname and dlocation • This is the preferred solution
Normalizing the relation into 2NF • Based on the concept of a full functional dependency. • A FD X -> Y is a full functional dependency if removal of any attribute from X means that the dependency does not hold any more. • A partial dependency can occur only if the determinate is composite. • Partial dependency, if there is some attribute A in X that can be removed from X and the dependency still holds.
Continued… • A relation is in 2NF if: every non-key attribute is fully functionally dependent on the prime attribute. • If a relation is not in 2NF, it can be normalized into a number of 2NF relations • This involves decomposing the table
EMP_PROJ • Normalize the EMP_PROJ table • Steps: 1) Figure out the dependencies, noting the determinates that is any attribute(s) on which some other attribute(s) are dependent) 2) Put each determinate in a table by itself. 3) Include in each table the attributes that are dependent on that determinate
Normalizing the relation into 3NF EMP_DEPT • Analyze the EMP_DEPT relation: 1) In 1NF ? 2) In 2NF ? 3) Still a problem ? 4)Identify the problem. • The problem with the EMP_DEPT table • DNAME depends on DNUM, not the prime key.(transitive dependency) • X-> Y is a transitive dependency if there is a set of attributes Z of the relation and both X -> Z and Z ->Y hold. • A relation is in 3NF if every non-key attribute is: • In 2NF). • Non transitively dependent on the prime key.
Continued… EMP_DEPT • We can normalize EMP_DEPT by decomposing it into two 3NF relations. • D {R1, R2} where: R1 :EMP (ename, SSN, bdate, address) R2: DEPT(dnumber, dname, dmgrssn) • Intuitively, the two results represent independent entity facts. • To recover the original relation use natural join
Generalizing the definition of 2NF and 3NF • To assure we do not have update anomalies we need to extend this definition to include all candidate key rather than just PK. • Disallowed partial dependencies on any key in 2NF. • Disallowed transitive dependencies on any key in 3NF • Simpler definition for 3NF: • Every nonprime attribute must be: Fully functionally dependent on every key Non transitively dependent on every key
Boyce-Codd Normal Form • Simpler, yet stricter form of 3NF. • A relation is in BCNF if and only if every determinate is a candidate key. • Decomposed into an equivalent set of BCNF if relation is not in BCNF. • If relations have a composite candidate key, one of whose members are determined by a non prime attribute are in 3NF but not in BCNF.
Continued… • Few cases where a database is in 3NF and not in BCNF. • Example: • Define another FD on the Lots DB • All the lots in the DB come from only two counties • In one county, all lots are >= one acre • In the other county, all lots are < one acre • The FD is area -> county_name
Summary • In general, it is best to have a relational schema in BCNF. • If that is not possible, 3NF will do. • 2NF and 1NF are not considered good relation schema designs. • They allow too much data redundancy which leads to update anomalies.
Denormalization • Jovial Remark, “Normalize until it hurts, and denormalize until it works”. • Easier to query but reintroduce data redundancies. • Always improves data retrieval performance. • Example: R: EMPLOYEE Employee of a larger corporation have access to a handful of mutual fund companies for their retirement investment
Continued… • The FD M_fund# -> {Fund_nm, Fund_mgr} causing a violation of 3NF in R. • Decompose R to eliminate the 3ND violation • D {R1, R2} where: R1: FUND (M_fund#, Fund_nm, Fund_mgr); R2: EMPLOYEE (Emp#, Name, Age, Salary, M_fund#) • Query to the normalized relation which is EMPLOYEE requiring retirement financial data lead to joining R1 and R2. • The (denormalized) R is a better option in such situation
Continued… • However, if there are thousands of employees and just a handful of these retirement mutual funds. • Query requiring only mutual fund data will execute rather inefficiently in denormalized design. • In contrary, denormalization might not improve data retrieval performance in certain situation.
References • http://www.cs.sjsu.edu/faculty/lee/cs157/cs157a.html • http://www.cs.montana.edu/~defrance/classes/spring-2003/cs435/lectures/FD.html