Pmit 6102 advanced database systems
This presentation is the property of its rightful owner.
Sponsored Links
1 / 34

PMIT-6102 Advanced Database Systems PowerPoint PPT Presentation


  • 142 Views
  • Uploaded on
  • Presentation posted in: General

PMIT-6102 Advanced Database Systems. By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University. Lecture 03 Relational Database Design Normalization-Part-1. Outline. Overview of Relational DBMS Normalization(1 st lecture). Normalization.

Download Presentation

PMIT-6102 Advanced Database Systems

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Pmit 6102 advanced database systems

PMIT-6102Advanced Database Systems

By-

JesminAkhter

Assistant Professor, IIT, Jahangirnagar University


Lecture 03 relational database design normalization part 1

Lecture 03 Relational Database DesignNormalization-Part-1


Outline

Outline

  • Overview of Relational DBMS

    • Normalization(1st lecture)


Pmit 6102 advanced database systems

Normalization

  • The aim of normalization is to eliminate various anomalies (or undesirable aspects) of a relation in order to obtain “better” relations.

  • The following four problems might exist in a relation scheme:

    • Repetition anomaly

    • Update anomaly

    • Insertion anomaly

    • Deletion anomaly


Repetition anomaly

Repetition Anomaly

  • The NAME,TITLE, SAL attribute values are repeated for each project that the employee is involved in.

    • Waste of space

    • Complicates updates

    • Contrary to the spirit of databases

EMP

ENO

ENAME

TITLE

SAL

PNO

RESP

DUR

P1

Manager

12

E1

J. Doe

Elect. Eng.

40000

P1

Analyst

24

E2

M. Smith

Analyst

34000

P2

Analyst

6

E2

M. Smith

Analyst

34000

E3

A. Lee

Mech. Eng.

27000

P3

Consultant

10

E3

A. Lee

Mech. Eng.

27000

P4

Engineer

48

E4

J. Miller

Programmer

24000

P2

Programmer

18

E5

B. Casey

Syst. Anal.

34000

P2

Manager

24

P4

Manager

48

E6

L. Chu

Elect. Eng.

40000

P3

Engineer

36

E7

R. Davis

Mech. Eng.

27000

Syst. Anal.

E8

J. Jones

34000

P3

Manager

40


Update anomaly

Update Anomaly

  • If any attribute of project (say SAL of an employee) is updated, multiple tuples have to be updated to reflect the change.

EMP

ENO

ENAME

TITLE

SAL

PNO

RESP

DUR

P1

Manager

12

E1

J. Doe

Elect. Eng.

40000

P1

Analyst

24

E2

M. Smith

Analyst

34000

P2

Analyst

6

E2

M. Smith

Analyst

34000

E3

A. Lee

Mech. Eng.

27000

P3

Consultant

10

E3

A. Lee

Mech. Eng.

27000

P4

Engineer

48

E4

J. Miller

Programmer

24000

P2

Programmer

18

E5

B. Casey

Syst. Anal.

34000

P2

Manager

24

E6

L. Chu

Elect. Eng.

40000

P4

Manager

48

P3

Engineer

36

E7

R. Davis

Mech. Eng.

27000

Syst. Anal.

E8

J. Jones

34000

P3

Manager

40


Insertion anomaly

Insertion Anomaly

  • It may not be possible to store information about a new project until an employee is assigned to it.

EMP

ENO

ENAME

TITLE

SAL

PNO

RESP

DUR

P1

Manager

12

E1

J. Doe

Elect. Eng.

40000

P1

Analyst

24

E2

M. Smith

Analyst

34000

P2

Analyst

6

E2

M. Smith

Analyst

34000

E3

A. Lee

Mech. Eng.

27000

P3

Consultant

10

E3

A. Lee

Mech. Eng.

27000

P4

Engineer

48

E4

J. Miller

Programmer

24000

P2

Programmer

18

E5

B. Casey

Syst. Anal.

34000

P2

Manager

24

E6

L. Chu

Elect. Eng.

40000

P4

Manager

48

P3

Engineer

36

E7

R. Davis

Mech. Eng.

27000

Syst. Anal.

E8

J. Jones

34000

P3

Manager

40


Deletion anomaly

Deletion Anomaly

  • If an engineer, who is the only employee on a project, leaves the company, his personal information cannot be deleted, or the information about that project is lost.

  • May have to delete many tuples.

EMP

ENO

ENAME

TITLE

SAL

PNO

RESP

DUR

P1

Manager

12

E1

J. Doe

Elect. Eng.

40000

P1

Analyst

24

E2

M. Smith

Analyst

34000

P2

Analyst

6

E2

M. Smith

Analyst

34000

E3

A. Lee

Mech. Eng.

27000

P3

Consultant

10

E3

A. Lee

Mech. Eng.

27000

P4

Engineer

48

E4

J. Miller

Programmer

24000

P2

Programmer

18

E5

B. Casey

Syst. Anal.

34000

P2

Manager

24

E6

L. Chu

Elect. Eng.

40000

P4

Manager

48

P3

Engineer

36

E7

R. Davis

Mech. Eng.

27000

Syst. Anal.

E8

J. Jones

34000

P3

Manager

40


What to do

What to do?

  • Take each relation individually and “improve” it in terms of the desired characteristics

    • Normal forms

      • Atomic values (1NF)

      • Can be defined according to keys and dependencies.

      • Functional Dependencies ( 2NF, 3NF, BCNF)

      • Multivalued dependencies (4NF)

    • Normalization

      • Normalization is a process of concept separationwhich applies a top-down methodology for producing a schema by subsequent refinements and decompositions.

      • Do not combine unrelated sets of facts in one table; each relation should contain an independent set of facts.

      • Universal relation assumption


Normalization issues

Normalization Issues

  • How do we decompose a schema into a desirable normal form?

  • What criteria should the decomposed schemas follow in order to preserve the semantics of the original schema?

    • Reconstructability: recover the original relation  no spurious joins

    • Lossless decomposition: no information loss

    • Dependency preservation: the constraints (i.e., dependencies) that hold on the original relation should be enforceable by means of the constraints (i.e., dependencies) defined on the decomposed relations.


A combined schema without repetition

Consider combining relations

sec_class(sec_id, building, room_number) and

section(course_id, sec_id, semester, year)

into one relation

section(course_id, sec_id, semester, year, building, room_number)

No repetition in this case

A Combined Schema Without Repetition


What about smaller schemas

Suppose we had started with inst_dept. How would we know to split up (decompose) it into instructor and department?

Write a rule “if there were a schema (dept_name, building, budget), then dept_name would be a candidate key”

Denote as a functional dependency:

dept_namebuilding, budget

In inst_dept, because dept_name is not a candidate key, the building and budget of a department may have to be repeated.

This indicates the need to decompose inst_dept

Not all decompositions are good. Suppose we decomposeemployee(ID, name, street, city, salary) into

employee1 (ID, name)

employee2 (name, street, city, salary)

The next slide shows how we lose information -- we cannot reconstruct the original employee relation -- and so, this is a lossy decomposition.

What About Smaller Schemas?


A lossy decomposition

A Lossy Decomposition


Example of lossless join decomposition

Lossless join decomposition

Decomposition of R = (A, B, C)R1 = (A, B)R2 = (B, C)

Example of Lossless-Join Decomposition

A

B

C

A

B

B

C

1

2

A

B

1

2

1

2

A

B

r

B,C(r)

A,B(r)

A

B

C

A (r) B (r)

1

2

A

B


Pmit 6102 advanced database systems

Stages of Normalization

Remove repeating groups

First normal form

(1NF)

Remove partial dependencies

Second normal form

(2NF)

Remove transitive dependencies

Third normal form

(3NF)

Remove remaining functional

dependency anomalies

Boyce-Codd normal

form (BCNF)

Remove multivalued dependencies

Fourth normal form

(4NF)

Remove remaining anomalies

Fifth normal form

(5NF)

Unnormalized

(UDF)


Repeating groups

A repeating group is an attribute (or set of attributes) that can have more than one value for a primary key value.

Repeating Groups

Example We have the following relation that contains staff and department details and a list of telephone contact numbers for each member of staff.

Repeating Groups are not allowed in a relational design, since all attributes have to be ‘atomic’ - i.e., there can only be one value per cell in a table!


Repeating groups1

Multivalued Attributes (or repeating groups): non-key attributes or groups of non-key attributes the values of which are not uniquely identified by (directly or indirectly) (not functionally dependent on) the value of the Primary Key (or its part).

Repeating Groups

STUDENT


Functional dependency

Formal Definition: Attribute B is functionally dependant upon attribute A (or a collection of attributes) if a value of A determines a single value of attribute B at any one time.

Example:

Functional Dependencies

staffNo job

staffNo dept

staffNo dname

dept dname

staffNojob dept dname

SL10 Salesman 10 Sales

SA51 Manager 20 Accounts

DS40 Clerk 20 Accounts

OS45 Clerk 30 Operations

Functional Dependency

Formal Notation: A  B This should be read as ‘AdeterminesB’ or ‘B is functionally dependant on A’. A is called the determinant and B is called the object of the determinant.


Functional dependency1

Full Functional Dependency: Only of relevance with composite determinants. This is the situation when it is necessary to use all the attributes of the composite determinant to identify its object uniquely.

Example:

Full Functional Dependencies

order# line# qty price

A001 001 10 200

A002 001 20 400

A002 002 20 800

A004 001 15 300

(Order#, line#)  qty

(Order#, line#)  price

Functional Dependency

Compound Determinants: If more than one attribute is necessary to determine another attribute in an entity, then such a determinant is termed a composite determinant.


Functional dependency2

Partial Functional Dependency: This is the situation that exists if it is necessary to only use a subset of the attributes of the composite determinant to identify its object uniquely.

Full Functional Dependencies

(student#, unit#)  grade

Partial Functional Dependencies

unit#  room

Repetition of data!

Functional Dependency


Functional dependency3

Partial Dependency – when an non-key attribute is determined by a part, but not the whole, of a COMPOSITE primary key.

Functional Dependency

Partial Dependency


Transitive dependency

Definition: A transitive dependency exists when there is an intermediate functional dependency.

Transitive Dependencies

staffNo dept

dept dname

staffNodeptdname

Repetition of data!

Transitive Dependency

Formal Notation: If A  B and B  C, then it can be stated that the following transitive dependency exists: A  B  C

Example:


Transitive dependency1

Transitive Dependency – when a non-key attribute determines another non-key attribute.

Transitive Dependency

Transitive Dependency


Normal forms review

Unnormalized – There are multivalued attributes or repeating groups

1 NF – No multivalued attributes or repeating groups.

2 NF – 1 NF plus no partial dependencies

3 NF – 2 NF plus no transitive dependencies

Normal Forms: Review


Example 1 determine nf

ISBN  Title

ISBN  Publisher

Publisher  Address

Example 1: Determine NF

All attributes are directly or indirectly determined by the primary key; therefore, the relation is at least in 1 NF


Example 1 determine nf1

ISBN  Title

ISBN  Publisher

Publisher  Address

Example 1: Determine NF

The relation is at least in 1NF. There is no COMPOSITE primary key, therefore there can’t be partial dependencies. Therefore, the relation is at least in 2NF


Example 1 determine nf2

ISBN  Title

ISBN  Publisher

Publisher  Address

Example 1: Determine NF

Publisher is a non-key attribute, and it determines Address, another non-key attribute. Therefore, there is a transitive dependency, which means that the relation is NOT in 3 NF.


Example 1 determine nf3

ISBN  Title

ISBN  Publisher

Publisher  Address

Example 1: Determine NF

We know that the relation is at least in 2NF, and it is not in 3 NF. Therefore, we conclude that the relation is in 2NF.


Example 1 determine nf4

ISBN  Title

ISBN  Publisher

Publisher  Address

Example 1: Determine NF

  • In your solution you will write the following justification:

  • No M/V attributes, therefore at least 1NF

  • No partial dependencies, therefore at least 2NF

  • There is a transitive dependency (Publisher  Address), therefore, not 3NF

  • Conclusion: The relation is in 2NF


Example 2 determine nf

Product_ID  Description

Example 2: Determine NF

All attributes are directly or indirectly determined by the primary key; therefore, the relation is at least in 1 NF


Example 2 determine nf1

Product_ID  Description

Example 2: Determine NF

The relation is at least in 1NF.

There is a COMPOSITE Primary Key (PK) (Order_No, Product_ID), therefore there can be partial dependencies. Product_ID, which is a part of PK, determines Description; hence, there is a partial dependency. Therefore, the relation is not 2NF. No sense to check for transitive dependencies!


Example 2 determine nf2

Product_ID  Description

Example 2: Determine NF

We know that the relation is at least in 1NF, and it is not in 2 NF. Therefore, we conclude that the relation is in 1 NF.


Example 2 determine nf3

Product_ID  Description

Example 2: Determine NF

In your solution you will write the following justification:

1) No M/V attributes, therefore at least 1NF

2) There is a partial dependency (Product_ID  Description), therefore not in 2NF

Conclusion: The relation is in 1NF


Thank you

Thank You


  • Login