1 / 34

# Chapter 6: Functional Dependencies & Normalization - PowerPoint PPT Presentation

Chapter 6: Functional Dependencies & Normalization. Dr. Hassan Ismail Abdalla. Objectives. Normalisation is a technique for analyzing and modelling data within an organisation It aims to facilitate the use of shared information by reducing the amount of redundancy in stored data

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about ' Chapter 6: Functional Dependencies & Normalization' - lucian

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

### Chapter 6:Functional Dependencies & Normalization

Dr. Hassan Ismail Abdalla

• Normalisation is a technique for analyzing and modelling data within an organisation

• It aims to facilitate the use of shared information by reducing the amount of redundancy in stored data

• Data normalisation aims to derive record structures which avoid anomalies in

• Insertion (Occurs when it is impossible to store a fact until another fact is known)

• Deletion (Occurs when the deletion of a fact causes other still relevant facts to be deleted)

• Modification(Occurs when a change in a fact causes multiple modifications to be necessary)

• Data normalisation ensures single valuedness of facts

• Usually three steps (in industry) giving rise to

• First Normal Form (1NF)

• Second Normal Form (2NF)

• Third Normal Form (3NF)

• Boyce -Codd Normal Form (BCNF)

• Fourth Normal Form (4NF)

• At each step we consider relationships between an entity's attributes

• These relationships are known asfunctional dependencies

UNORMALISED ENTITY

step1 ...

removerepeatinggroups

1st NORMALFORM

step2 ...

removepartialdependencies

2ndNORMALFORM

step3 ...

removeindirectdependencies

3rdNORMALFORM

step4 ..everydeterminatea key

step4 ...

removemulti-dependencies

4thNORMALFORM

BOYCE-CODDNF

• A number of rules are applied to the tables so that they can be manipulated and redundancy removed

1. The ordering of rows is not significant

2.The ordering of columns is not significant (column has a distinct name)

3.The intersection of each row/column can contain only one value, multiple values are not allowed

4. Each row in a table must be distinct

• The process of normalisation seeks to establish tables which conform to a further more rigorous set of rules.

Relational Rules

In the above table there are number of ways in

which the rules are broken:

• There are multiple values for description e.g. in row 1 Sales, Marketing.

• The ordering of rows is significant, the major cost code 0002 is intended to apply to rows 3 and 4

• Rows 5 and 6 are the same.

• It is important to distinguish between redundancy and duplicated data when considering normalisation

• Duplicated data exists when an attribute has two or more identical values in a table.

• Redundancy exist if data can be deleted without any information being lost.

• Redundancy may be viewed as unnecessary duplication

• If the part description on line 3 of the table is deleted, no information is lost since the description for part number p876 can still be determined from the table.

Supplier No Part No Part Description

s123 p876 fan belt

s125 p873 master cylinder

s125 p876 fan belt

• The redundancy can be eliminated by splitting the above table into two tables

SupplierNoPartNoPartNo Part Description

s123 p876 p876fan belt

s125 p873 p873 master cylinder

s125 p876

It should be noted that no information is lost by representing the original table in

the two separate tables

SUPPLIER

• An entity identifier uniquely determines an occurence on the entity

• A Superkey - a combination of attributes that uniquely identify a row

• When more than one identifier exists we have Candidate identifiers (Keys) - minimal superkey

• Primary Key - designated

Supplier#

• A functional dependency is a constraint between two sets of attributes from the database

• B is functionally dependent on A if a value of A uniquely determines a value of B

When a group of attributes has multiple values then we say there is a repeating group of attributes in the entity

• (BRANCH_NAME, BRANCH_ADDRESS) is a repeating group

• Consider the situation where a customer makes a number of orders to a company.

• This may be represented in a table as follows:

Customer No Customer Name Order No

C123 Aldridge O678

C123 Aldridge O789

C123 Aldridge O791

C131 Archer O649

C131 Archer O682

C151 Grundy 0655

• It can be seen that Order No may be repeated for a given customer. In thissituation Order No is said to be a repeating group.

• In the above exampleOrder No is the only attribute in the repeating group, this is not usually thecase.

• It can be seen that there is redundant duplication presentin the attribute Customer Name.

• The repeating group, and hence the redundancy, can be eliminated by splitting the table into two.

• In order topreserve the amount of information after the split, the two tables must shareat least one attribute.

• In the example below Customer No is present in both tables.

• Splitting results in the following:

CustomerNo CustomerNameCustomerNo OrderNo

C123 Aldridge C123 O678

C131 Archer C123 O789

C151 Grundy C123 O791

C131 O649

C131 O682

C151 0655

• 1. List Data in an Unnormalised Table

• In this stage data items are extracted from the source and listed in a simple tabular format.

• Note the unnormalised tables does not conform to the table rules above.

• For example the following which represents the training record for a company

EmpNo EmpName DeptNo DeptName CourseNo CourseName Rating

123 J Smith 21 Systems S2 SSADM Poor

S3 dBaseIV Average

S5 Data Anal Good

137 D Brown 23 Operations O1 JCL Good

O9 Cobol Good

154 J Patel 21 Systems S2 SSADM Average

• Select a key for the table. Where multiple attributes are necessary to uniquely identify a row, choose the compound key with the minimum number of attributes. EmpNo is selected in the above.

• 2.Remove Repeating Groups (First Normal Form)

• In this stage the attributes which are dependent on and repeat for another given attribute are separated into another table

• This is done by filling in the blank attribute values & then splitting the table

Emp Emp Dept Dept Course Course Rating

No Name No Name No Name

123 J Smith 21 Systems S2 SSADM Poor

123 J Smith 21 Systems S3 dBaseIV Average

123 J Smith 21 Systems S5 Data Anal Good

137 D Brown 23 Operations O1 JCL Good

137 D Brown 23 Operations O9 Cobol Good

154 J Patel 21 Systems S2 SSADM Average

On removing the repeating group the above Figure becomes:

EmpNo EmpName DeptNo DeptName EmpNo CourseNo CourseName Rating

123 Smith 21 Systems123 S2 SSADM Poor

137 D Brown 23 Operations123 S3 dBaseIV Average

154 J Patel 21 Systems123 S5 Data AnalGood

137 O1 JCL Good

137 O9 Cobol Good

(Fig 02)

• The key EmpNo has been incorporated into the table containingthe repeating group to preserve the overall information

• This step will have to be repeated for each repeating group in the table

• Any un-normalised entity type is transformed to 1NF

• Remove all repeating attribute groups

• Repeating attribute groups become new entity types in their own right

• The identifier of the original entity type must be an attribute (but not necessarily an identifier) of the derived entity type.

• Any 'hidden' entities are identified

• A relation is in 2NF if it is in 1NF and each non identifying attribute depends upon the whole identifier

• Remove Part Key Dependencies

• Attributes which are dependent on part of a compound key are put into aseparate table along with that part of the compound key.

• In Fig 02 EmpNo and Course No together may be considered to be a compound key since both are required to identify a row in the table.

• Separating the attributes which are only concerned with Course No gives:

EmpNo CourseNo Rating CourseNoCourseName

123 S3 Average S3 dBaseIV

123 S5 GoodS5 Data Anal

137 O1 Good O1 JCL

137 O9 Good O9 Cobol

154 S2 Average

(Fig 03) (Fig 04)

• A relation is in 3NF if it is in 2NF and all non identifying attributes are independent

• A relation in 2NF is transformed in 3NF

• Determine functional dependencies between non identifying attributes

• Decompose relation into new relations

• Remove Transitive Dependency and Inter-Key Dependency

• Separating attributes which are dependent on another attribute other than the primary key within the table

• Dependency between non-key attributes is known as ‘transitive dependency’

• In Fig (01), it can be seen that Dept Name is dependent on Dept No. Splitting the table in (Fig 01) gives:

EmpNo EmpName DeptNo DeptNo DeptName

123 J Smith 21 21 Systems

137 D Brown 2323 Operations

154 J Patel 21

(Fig 05)(Fig 06)

• Note that Dept No is retained in the table in (Fig 05), to preserve the information content.

• Dept No is an example of a foreign key, since it is of one table and also a key of another table.

• Figures 03, 04, 05 and 06 together represent third normal form.

The example above would give rise to the followingTable Type

Notation:

Unnormalised Form

Emp No, Emp Name, Dept No, Dept Name, (Course No, Course Name, Rating)

First Normal Form

Emp No, Emp Name, Dept No, Dept Name

Emp No, Course No, Course Name, Rating

Second Normal Form

Emp No, Emp Name, Dept No, Dept Name(unchanged from1NF)

Course No, Course Name

Emp No, Course No, Rating

Third Normal Form

Emp No, Emp Name, Dept No*

Dept No, Dept Name

Course No, Course Name (unchanged from 2NF)

Emp No,Course No, Rating (unchanged from 2NF)

Note the conventions employed; parenthesis for a

repeating group,underlining for a key or compound

key, and an asterisk for a foreign key.

• A relation is in BCNF if every determinant is a key

• For a relation with only one candidate key, 3NF and BCNF are equivalent

• Violation of BCNF is rare & may occur in a relations that Contains two (or more) candidate keys

• 3NF is concerned with FDs between primary key and the

nonkey attributes and with transitive dependencies.

• A relation may still have redundancy problems with 3NF as it ignores relationships between or within candidate keys.

• The rule for producing tables in BCNF is that each determinant must be a candidate identifier.

• To achieve this, where a table contains a determinant which is not an identifier, the table is split into two.

• The non-identifying determinant is put into the new table along with those attributes which are dependent on it.

• For example considering a relation, Directory:

(Employee_no, Emp_name, Dept_name, Room_no,Tel_no)

• Where:

• No employee works for more than one department

• Many employees may occupy one room

• Employee numbers are unique, names may not be

• No room is shared by between departments

• Where the following FDs hold:

• Employee_no -> Emp_name, Dept_name, Room_no,Tel_no

• Room_no -> Dept_name

• Here all attributes are are dependent on Employee_no - the primary key

• Room_no is also a determinant but not a candidate key.

• This violates the definition of BCNF and therefore Directory table must be decomposed into two relations

• EMP (Employee_no, Emp_Name, Room_no,Tel_no)

• ALLOC (Room_no, Dept_name)

The amount of unnecessary redundant data is reduced

Data integrity is easily maintained within the database

The database & application design processes are much more flexible

Security is easier to manage

• Produces lots of tables with a relatively small number of columns

• Probably requires joins in order to put the information back together in the way it needs to be used – effectively reversing the normalization

• Impacts computer performance (CPU, I/O, memory)

• Data Normalisation is a bottom-up technique that ensures the basic properties of the relational model

• no duplicate tuples

• no nested relations

• Data normalisation is often used as the only technique for database design - implementation view

• A more appropriate approach is to complement conceptual modeling with data normaliztion