Loading in 5 sec....

Normalization Hussam Eldin Najib CS Department An-Najah National UniversityPowerPoint Presentation

Normalization Hussam Eldin Najib CS Department An-Najah National University

- By
**wayde** - Follow User

- 124 Views
- Uploaded on

Download Presentation
## PowerPoint Slideshow about 'Normalization Hussam Eldin Najib CS Department An-Najah National University' - wayde

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

### NormalizationHussam Eldin NajibCS Department An-Najah National University

HUSSAM NAJIB- CS DEPT- AN-NAJAH NATIONAL UN.

Introduction

- The purpose of normalization is to design “good” database schemas; normalized schemas avoid certain types of anomaly
- If we use methodologies based on developing a definition of the schema in terms of the ER model and then mapping such schema in a relational schema, most of these anomalies do not arise
- Very often, however, the normalization process results in a large number of relations with few attributes each; notice as this may result in performance inefficiency, due to the increased need of join operations

job

budget

name

salary

project

Red 2 DB 300 technician

Green 3 WEB 500 designer

Green 3 CAD 1500 designer

Black 7 CAD 1500 director

Black 7 WEB 500 consultant

Black 7 DB 300 consultant

Mori 6 DB 300 director

Mori 6 CAD 1500 designer

White 6 CAD 1500 designer

White 6 DB 300 designer

Redundancy and anomalies

1) redundancies:

- The information concerning the salary of each employee is repeated several times
- The information concerning the budget of each project is repeated several times
- The values of columns ‘name’ and ‘project’ are repeated and therefore each one of these two columns cannot be taken as key
- The key is thus (name, project): we do not have repetitions of information related to the relationship project-employee

Redundancy and anomalies

2) updates:

- Because the information about the salary of each employee is repeated multiple times, if the salary of an employee is updated, such update must be propagated to all the tuples concerning such employee
- The same applies to updates to the budget of projects

Redundancy and anomalies

3) deletions:

- Suppose that an employee resigns from the company or does not participate on any of the company project, then data about some projects can be lost if this employee was the only one working on these projects
- The same happens for employee data if a project is deleted
- If the key is (project, name) in both cases of deletions the key would have null values, which is a situation not admitted

Redundancy and anomalies

4) insertions:

- Because the key is (project, name) it is not possible to insert the data of an employee if such employee has not been assigned to at least a project.
- The same holds for projects. It is not possible to insert the data about a project, if this project does not have at least an employee working on it
- Accepting the insertion of a project (without an employee) or an employee (without a project) results in introducing null values (not compatible with the key)

Redundancy and anomalies

- Such anomalies do not in general arise if one follows a good database design methodology: design of an ER schema and mapping of this schema on the relational model
- It can anyhow happen that deficiencies in the specification or errors in the conceptual shema may result in relations with anomalies
- The most frequent cases are when dealing with database that are not well documented or when trying to understand the nature of the data from documents that describe the information based on paper forms

Redundancy and anomalies

- ER correct
- Mapping onto the relational model
employees (name,salary)

projects (project,budget)

Works-in (name,project,job)

jobs

name

project

(0,n)

(0,n)

projects

employees

budget

salary

job

Problem

- Given a relational schema:
- How can we determine if there the schema has anomalies by only analyzing the schema?
- In the case in which the schema has anomalies, how can we transform this schema in a schema which is equivalent and free of anomalies?

Approach

- Formalization of some knowledge about the domain through some special integrity constraints
- Functional dependencies

- Characterization of the properties of the schemas that do not have anomalies
- Normal forms

- Algorithms to translate schemas into normalized schemas
- There is one such algorithm for each normal form

Functional dependencies

- A functional dependency is an integrity constraint for the relational model
- In the employees-projects example, each employee has a unique salary
- In the table, given an employee, each time such employee appears in a tuple, the value of the salary column is the same

- We can say that the value of the column ‘name’ determines the value of the column ‘salary’, that is:
- There exists a function that associates with each value of the domain of the column ‘name’ one and only one value from the domain of the column ‘salary’

Functional dependencies

- let R(A1,A2,....An) be a relational schema, and let X and Y be subsets of {A1,A2,....An}; we say that X functionally determines Y, denoted by X Y, if for each state r (or extension) of R, it is not possible that two tuples exist in r having equal values for all attributes in X, and different values for the attributes in Y
(in other words, if t1, t2 r and t1.X=t2.X, then t1.Y=t2.Y)

- In our example:
name salary project budget

name project job

Functional dependencies - Example

- Consider a database with the following schema:
Members (Name, Address, Balance)

Orders (Order_no, Name, Item, Quantity)

Suppliers(Sname, Saddress, Item, Price)

- Suppose that each customer has a unique address and balance; suppose that there are no two customers with the same name. Such assumptions can be formalized by the following functional dependency
Name Address Balance

Functional dependencies - Example

- In the Orders relation we can assume that the order_no determines all other attributes, in that different orders have different order numbers; thus
Order_no Name Item Quantity

- In the Suppliers relation, if we assume that a supplier has a unique address and that a supplier supplies a given item at a unique price, we can observe the following dependencies:
Sname Saddress

Sname Item Price

Functional dependencies - Example

- A question is whether dependencies such as:
Saddress Sname or Address Name

are valid dependencies

- The reply depends from the data semantics:
if it is possible that two different customers have the same address, then the dependency

Address Name

is not valid

(given an address, we can find two different customers with different names associated with this address)

Functional dependencies - Example

- In addition some trivial dependencies hold, such as:
1) Name Name

2) Sname Item Item

- such dependencies state that:
1) if two tuples have the same value for Name, they have the same value for Name

2) two tuples having the same values for Item and Sname have the same value for Item

Functional dependencies - Example

- Other non trivial dependencies can be derived from the dependencies we have already specified
- Sname Item Saddress Price
- This dependency states that two tuples, having the same values for Sname and Item, must have the same values for both attributes Saddress and Price
- This dependency is obtained by observing that Sname by itself functionally determines Saddress, whereas Sname and Item together functionally determine Price
- Thus if we fix a value of Sname and a value of Item, we find a single value for Saddress and for Price

Functional dependencies - Closure

- Let F be a set of functional dependencies defined over a relational schema R, and let X Y be a functional dependency, we say that F logically implies X Y, denoted as F |= X Y, if each relation r of schema R that verifies the functional dependencies in F, also verifies X Y
- Example: {A B , B C} |= A C
- The closure of a set F of functional dependencies, denoted as F+, is defined as the set of functional dependencies logically implied by F
F+ = {X Y | F |= X Y}

Functional dependencies - Closure

Example

- Notation: in what follows we will use the compact notation A B C to denote {A B, A C}
- R(A,B,C) and F = {AB, BC}
- F+ consists of all the functional dependencies XY that verify one of the following three conditions
- X contains A,
- X contains B but not A and Y does not contain A,
- X Y is the dependency C C

- F+ contains for example ABC BC, AB BC, A C, BC B

Functional dependencies - Closure

- A set of inference rules has been defined that allows the computation of all the dependencies in F+
- Such rules are sound and complete, that is, they generate all and only the dependencies in F+
- Such set of rules is known as Armstrong’s axioms (from the name of the researcher that defined it in 1974)

Armstrong axioms

- Let Dbe a set of attributes and F be a set of functional dependencies defined over the attributes in D, the inference rules are as follows:
- A1: (reflexivity) If Y X D, then X Y is logically implied by F
such rule generates trivial dependencies, that is dependencies in which the set of attributes in the right hand side of the dependency is a subset of the set of attributes on the left hand side of the dependency; the use of such rule does not depend from F

- A2: (additivity ) If X Y, and Z D, then XZ YZ is logically implied by F
- A3: (transitivity ) If X Y, and Y Z, then
X Z is logically implied by F

- A1: (reflexivity) If Y X D, then X Y is logically implied by F

Rules derived from Armstrong axioms

- It is possible to derive additional rules from the Armstrong axioms
a)Union rule

{X Y, X Z} |= X YZ

b) Pseudotransitivity rule

{X Y, WY Z} |= XWZ

c) Decomposition rule

If Z Y, XY |= X Z

Keys

- The notion of key, that we have already seen, can be formalized by using the functional dependencies
- Let R be a relational schema with attributes A1 A2.....An and F be a set of functional dependencies. Let X be a subset of R. X is a key of R if:
- X A1 A2.....An is in F+
that is, the functional dependency of each attribute of R from X is either given or logically follows from the given functionally dependencies

- There is no Y, proper subset of X, such that
Y A1 A2.....An is in F+

(minimality condition)

- X A1 A2.....An is in F+
- We use the term superkey to denote a superset of the key

Keys - Example

- Person(SSN, FName, LName, BirthDate)
- F={SSN FName LName BirthDate}
(such dependency specifies that the SSN is unique for each person, that is, no two or more tuples exist that have the same value for SSN and different values for the other attributes)

- It is easy to see that SSN is a key for Person, where as (SSN, LastName) is not, in that a proper subset exists (SSN) that functionally determines all the attributes

Keys - Example

- R(A,B,C) and F = {AB, BC}
- The key is A
because: AA, AB, AC (for the transitivity rule) or equivalently AABC

- In addition no proper subset X (not including A) of the attributes of R exists such that
XABC

Closure of a set of attributes

- Let F be a set of functional dependencies defined on a set D of attributes; let X be a subset of D. The closure of X with respect to F is denoted by X+ and is defined as the set of attributes
{A / F |= XA} =

{A / XA follows from F by application of the Armstrong axioms}

- XYF+ if and only if Y X+ with respect to F

Closure of a set of attributes

- The computation of F+ is very expensive
- In the worst case it is exponential in the number of the attributes of the schema

- Computing the transitive closure with respect to a set X of attributes is less expensive
- Very often we are not interested in computing F+ but only in verifying if F+ includes a certain dependency

Closure of a set of attributes - algorithm

- Input: A finite set D of attributes, a set F of functional dependencies, and a set X D
- Output: X+, closure of X with respect to F
- Approach: A sequence of attribute sets X(0), X(1),....... is computed by using the following rules
- X(0) is X
- X(i+1) = X(i) È A , where A is a set of attributes such that:
- A dependency YZ is in F
- A is in Z
- Y X(i)

- because X=X(0) ... X(i) … D and D is finite, at the end we obtain an index i such that X(i)=X(i+1)
- If follows that X(i)=X(i+1)=X(i+2)=....... The algorithm then terminates when X(i)=X(i+1)
- X+ is X(i), where i is such that X(i)=X(i+1)

Closure of a set of attributes - example

- Let F be the following set and suppose we would like to compute the (BD)+
ABC CA BCD ACDB

DEG BEC CGBD CEAG

- X(0)=BD
- X(1): the dependency DEG is used
X(1)=BDEG

- X(2): the dependency BECis used
X(2)=BCDEG

- X(3): we consider CA, BCD, CGBD e CEAG
X(3)=ABCDEG

- (BD)+=X(3)

Equivalence of sets of functional dependencies

- Let F and G be setof functional dependencies, we say that F and G are equivalent if F+=G+
- To determine the equivalence we proceed as follows:
- For each dependency YZ in F, we test if YZ is in G+ by using the attribute closure algorithm to compute Y+ and then verifying if Z Y+
- If some dependency YZ in F is not in G+, then F+G+
- We then check that each functional dependency in G is in F+

Minimal sets

- A set F of functional dependencies is minimal if:
- The right hand side of each dependency in Fis a single attribute
- For no dependency XA in F the set F \ {XA} is equivalent to F
- For no dependency XA in F and no subset Z of X, the set
F \ {XA} U {ZA} is equivalent to F

- Intuitively:
- condition (2) assures that F does not contain redundant dependencies
- condition (3) assures that the left hand side of each dependency does not have redundant attributes
- Because on the right hand side of each rule, there is a single attribute (because of condition (1)), certainly there is no rule having redundancies in the right hand side

- Each set F of functional dependencies is equivalent to a minimal set F'

Minimal sets

- To determine the minimal set we proceed as follows:
- We generate F' from F by replacing each dependency XY where Y has the form A1....An with a set of dependencies of the form XA1, ....., XAn
- We consider each dependency in F' having more that one attribute on the left hand side; if it is possible to eliminate an attribute from the left hand side and still have an equivalent set of functional dependencies, such attribute is eliminated (approach: B in X is redundant with respect to the dependency XA if (X\{B})+ computed with respect to F' contains A)
- Let F" be the set generated by the previous step. We consider each dependency XA in F" according to some order and if F"\ {XA} is equivalent to F", we remove XA from F" (approach: computer X+ with respect to F"\{XA} and if X+ includes A, then XA is redundant)

Minimal sets

- Note that at step (3) the order according to which the functional dependencies are analyzed may result in different minimal sets:
AB AC BA CA BC

it is possible to eliminate BA and AC, or BC but not all three

- Note that at step(3) the order according to which the attributes are analyzed influences the obtained result
ABC AB BA

we can eliminate either A or B from ABC but not both

Minimal sets

- Let F be the following set
ABC CA BCD ACDB

DEG BEC CGBD CEAG

- By simplifying the right hand sides we obtain
ABC CA BCD ACDB DE DG

BEC CGB CGD CEA CEG

- Notice that:
- CEA is redundant in that it is implied by CA
- CGB è is redundant in that CG+ = CGADBE
CGD, CA, and ACDB imply CGB as it can be verified by computing (CG)+

- In addition ACDB can be replaced by CDB because CA

- result:
ABC CA BCD CDB DE DG

BEC CGD CEG

Minimal sets

- It is possible to obtain an other minimal set by eliminating:
CEA, CGD and ACDB

- The resulting minimal set is:
ABC CA BCD DE

DG BEC CGB CEG

Boyce-Codd normal form

- A relational schema R with a set F of functional dependencies is in Boyce-Codd normal form (BCNF) if:
- For each X A in F, such that A is not in X, then X is a superkey of R (that is X is a key or includes the key)

- In other words: all non trivial dependencies for R are dependencies in which a key functionally determines all the other attributes

Boyce-Codd normal form

In our example, the key is (name, project)

- name, project function verifies the BCNF,
- name salary and
project budget

are partial dependencies that cause anomalies

- The redundancies and anomalies arise because of dependencies X Y that introduce repetitions in the relation

Boyce-Codd normal form

The schema CSZ with dependencies F={CS Z, Z C} is not in BCNF

Key CS: CS+ = CSZ

Key ZS: ZS+=ZSC

The dependency Z C violates the BCNF:

Z+ = {Z,C} and thus because Z+ does not include S, Z is not a key

Boyce-Codd normal form

The schemas

Members (Name, Address, Balance)

Orders (Order_no, Name, Item, Quantity)

with the functional dependencies

Name Address Balance

Order_no Name Item Quantity

are in BCNF

Third normal form

- there exists a normal form weaker than BCNF; such normal form is known as third normal form (3NF)
- an attribute A of a relational schema is called
prime if it is an element of some key of R (note: R can have several keys)

- Example: CSZ with F = {CS Z, Z C}
all attributes are prime in that in both CS and SZ are keys

- Example: ABCD with F = {AB C, B D, BC A}
the keys are AB and BC, thus A,B,C are prime, whereas D is not prime

Third normal form

- A relation schema R with dependencies F is in third normal form (3NF) if:
- For each X A in F, such that A is not in X, then:
(i) X is a superkey of R (that is X is a key or includes a key)

or

(ii) A is prime

- For each X A in F, such that A is not in X, then:
- The main difference is that the 3NF admits functional dependencies that do not have a key on the left hand side, as long as the right hand side only has prime attributes

Example

- CSZ with F = {CS Z, Z C}
- is in 3NF in that all attributes are prime

- SAIP with F ={ SI P, S A}
- The only key for this relation is SI
- The attributes prime are {S,I}
- The dependency SI P does not violate the 3NF in that it verifies condition (i)
- The dependency S A violate the 3NF in that S is not a superkey and A is not prime

Example

- Schema R = (S, I, D, M)
S = Store I=Item D=Department M=Manager

- functional dependencies:
- SI D each item is sold by at most one department per store
- SD M each department has only one manager

- the key is SI: it is determined by computing (SI)+
- the dependency SD M violates the 3NF in that SD is not a superkey for R and M is not prime

Decomposition of relational schemas

- The decomposition of a relational schema
R={A1, A2,...,An} is the substitution of R with a collection C={R1, R2,...,Rk} of subsets of the set of attributes of R such that R = R1 U R2 U... U Rk

- The various schemas Ri, i=1,...,k, are not typically disjoint
- If we have a schema which is not normalized, it is possible to decompose it into normalized subschemas

Decomposition of relational schemas - examples

- A decomposition (in BCNF) of the example schema is
employees (name,salary)

projects (project,budget)

worksin (name,project,function)

- a decomposition (in BCNF) of
supplier (sname, saddress, item, price)

with functional dependencies

sname saddress and sname item price

is

SA (sname, saddress) and SIP(sname , item, price)

Decomposition of relational schemas

- It is important however that decompositions have certain properties
- wrong decompositions may generate relations that, combined with the join, produce relations with additional data
- We have thus information loss
- Consider the following example:
- Branches (name, project, branch)
- with dependencies:
- name branch and
- project branch

Decomposition of relational schemas

The key of Branches is:

name, project

project

branch

name

Red DB Milan

Green WEB Turin

Green SE Turin

White CAD Milan

Black SE Turin

If we decompose according to the two dependencies:

Decomposition of relational schemas

employees

projects

name

branch

project

branch

Red Milan

DB Milan

Green Turin

WEB Turin

White Milan

SE Turin

Black Turin

CAD Milan

The join on the common attribute:

SELECT E.name, P.project, P.branch

FROM employees E, projects P

WHERE E.branch = P. branch

Decomposition of relational schemas

project

branch

name

Red DB Milan

It creates

tuples that

did not exist!

Red CAD Milan

Green SE Turin

Green WEB Turin

White DB Milan

White CAD Milan

Black SE Turin

Black WEB Turin

Lossless join decompositions

- Let R be a relational schema, R1,R2,.....Rk be a decomposition of R, and D a set of functional dependencies, we say that the decomposition is lossless join(with respect to D) if for each relation r of schema R that satisfies D we have:
that is r is obtained by the natural join of its projections on the schemas Ri (i=1,..,k)

- If a relation is decomposed, it is important that it can be re-obtained by executing the natural join of the relations into which the initial relation has been decomposed

Dependency preserving decompositions

- Another important property of a decomposition of a schema R in r=(R1,...,Rk) is that the set F of functional dependencies defined for R be preserved by the dependencies defined on the schemas R1,...,Rk
- The reason because dependencies are to be preserved is that they represent some integrity constraints
- R(ABC) AC C-->B
- Decomposition R1(AC) R2(AB)
- R1(AC) AC
- CB ???

Dependency preserving decompositions

- Let F be a set of functional dependencies defined over an attribute set D and let Z D, the projection of F on Z, denoted by PZ(F ), is the set of dependencies XY in F+ such that XY Z
Note that XY must not be necessarily be in F, it is sufficient that be in F+

- A decomposition rpreserves a set F of functional dependencies, if the union all dependencies PRi(F ) (for i=1,....,k) logically implies all the dependencies in F

Dependency preserving decompositions

- Schema R=(CSZ)
with functional dependencies {CSZ, ZC}

and decomposition (lossless join) in R1=SZ and R2=CZ

- the projection of {CSZ, ZC} on SZ contains only the trivial dependencies following from the reflexivity rule (that is, SS and ZZ)
- the projection on CZ contains the trivial dependencies plus the dependency ZC
- the dependency CSZ is not preserved in that the trivial dependencies and the ZC dependency do not imply the CSZ dependency

Dependency preserving decompositions

- Consider the decomposition R1=SZ and R2=CZ
R1 S Z R2 C Z

s1 z1 c1 z1

s1 z2 c1 z2

- Execute the join of R1 and R2
S Z C

s1 z1 c1

s1 z2 c1

- This join violates the CSZ dependency in that the obtained relation has two tuples that have the same value of C and S, and different values for Z

Dependency preserving and lossless join decompositions

- There are algorithms (that we will not see) to verify whether a decomposition is lossless join
- An approach to verify whether a decomposition
r =(R1,......, Rk) preserves a set of dependencies F is the following (deriving from the definition):

- compute F+ and project it on all Ri (i=1,...,k);
- compute the union of the dependencies obtained as projection and check if this set implies all dependencies in F

Dependency preserving and lossless join decompositions

- Such an approach is expensive because the number of dependencies in F+ is exponential in the dimension of F
- There is an algorithm (that we do not see) to verify whether a decomposition preserves the functional dependencies with a cost which polynomial in the dimension of F

Decomposition in BCNF and 3NF

- Each relational schema has a BCNF decomposition that has the lossless join property
- Each relational schema has a decomposition 3NF decomposition that is lossless join and preserves the dependencies
- On the other hand it is possible that, given a relational schema, no lossless join BCNF decomposition exists for this schema which preserves the dependencies
- Example: R=(CSZ) F = {CSZ, ZC}
the decomposition in R1=SZ and R2=CZ does not preserve the functional dependency CSZ

- If no decomposition BCNF exists preserving the dependencies, it is preferable to use the 3NF

An algorithm for lossless join BCNF decomposition

- Input: A relational schema R and a set F of functional dependencies
- Output: A decomposition of R which is lossless join, such that each schema obtained from the decomposition is in BCNF with respect to the projections of F on such schemas
- Method: A decomposition r for R is iteratively built

Algorithm for lossless join decomposition in BCNF

- At each step r is lossless join with respect to F
- Initially, r contains only R
- (*) If S is a schema in r and S is not in BCNF,
- let XA be a dependency that holds for S, such that X is not a superkey of S and A is not in X
- S is replaced in r with two schemas S1 and S2 such that
- S1 consists of the attributes in X and A
- S2 consists of the attributes of S minus A

- (the projections of F+ on S1 and on S2 are computed)

- The step (*) is repeated until each schema in r is in BCNF

Example

- R=CTHRSG
C=course, T=teacher , H=hour, S=student G=grade R=room

- The following dependencies hold:
- CT each course has only one teacher
- HRC only one course can be given in a given classroom and hour
- HTR a teacher cannot be in two classrooms at the same hour
- CSG each student receives a single grade for each exam
- HSR a student cannot be in two different classrooms at the same time

- Such schema has only one key, that is, HS

Example

- Consider the dependency CSG that violates the BCNF
- By applying the algorithm, we decompose R in two schemas
S1=(CSG) and S2=(CTHRS)

note: given a dependency XA, S1=X U A S2=R-A

- The projection of F+ on CSG and CTHRS are computed
- PCSG(F ) = {CSG} key CS
- PCTHRS(F ) = {CT, HRC, THR, HSR} key HS

- CSG is in BCNF

Example

- CTHRS is not in BCNF
- Consider the dependency CT that violates the BCNF and let’s decompose CTHRS in CT and CHRS
- PCT(F ) = {CT} key C
- PCHRS(F ) = {CHR, HSR, HRC} key HS
notice that CHR is required in the projection on CHRS but non in CTHRS in that in CTHRS the dependency is implied by CT and THR

- CT is in BCNF
- CHRS is not in BCNF
- Let’s consider the dependency CHR and let’s decompose CHRS in
- CHR with {CHR, HRC}, with keys {CH, HR}
- CHS with {HSC}, with keys SH

- The obtained decomposition is in BCNF

Decomposition tree

CTHRSG

Key = HS

C T

CS G

HR C

HS R

TH R

CTHRS

Key = HS

C T

HR C

HS R

TH R

CSG

Key = CS

CS G

CHRS

Key = HS

HR C

HS R

CH R

CT

Key = C

C T

CHR

Key = CH, HR

HR C

CH R

CHS

Key = HS

HS C

Example

- The final decomposition is:
- CSG: contains the grade for each student for each course
- CT: contains the teacher for each course
- CHR: contains for each course the hour and classroom
- CHS: contains for each student the courses and the hours

- Notice that different decompositions can be obtained depending on the order according to which the dependencies are considered
- Example: consider the last step
- CHRS has been decomposed in CHR and CHS based on the use of the CHR dependency
- If the HRC dependency had been used, one would have obtained CHR and HRS

Example

- The THR dependency is not preserved by the obtained decomposition
- The union of the projections ofF on CSG, CT, CHR, and CHS is (in minimal form) {CSG, HRC, CT, CHR,HSC} and does notimply THR
- For example the relation CTHRSG
C T H R S G

c1 t h r1 s1 g1

c2 t h r2 s2 g2

does not verify THR, even though its projections on CSG, CT, CHR, and CHS verify all the dependencies obtained by the projection

Algorithm for lossless join decomposition in 3NF

- Input: A relational schema R and a minimal set F of functional dependencies
- Output: A lossless join decomposition of R such that each schema is in 3NF with respect to F and each dependency is preserved
- Method: i=0; S=Æ
- for each functional dependency XA in F do begin
- i:=i+1;
- Ri := XA;
- S := S È {Ri};
end

- Optionally: the relations of schemas (XA1),(XA2),...,(XAn) obtained from the dependencies of the form XA1, XA2, ..., XAn can be replaced from the relation of schema (X A1 A2 ... An)

- for each functional dependency XA in F do begin

Algorithm for lossless join decomposition in 3NF

- if no schema Rk (1 £ k £ i) contains a candidate key then begin
- i:=i+1;
- Ri:= any candidate key of R
- S := S È {Ri};
end

- for each Ri (RiÎ S)
- if exists RkÎ S, i k, such that schema(Rk) É schema(Ri), then S:=S-{Ri};

- return (S)

Example: decomposition in 3NF

- R=CTHRSG
- F = {CT, HRC, HTR,CSG, HSR}
key HS

- First part of the algorithm:
R1=CT R2=CHR R3=HRT

R4=CGS R5=HRS

- Second part: we check that at least a schema exists that includes the key
- R5 contains the key; therefore we do not need to create the additional relation

Example

S = (J,K,L) with F = {JKL, LK} keys JK, JL

- Decomposition in BCNF
- S is not in BCNF
- Consider the dependency LK that violates the BCNF
- We obtain R1=(LK), R2=(JL)
- Such schema does not preserve the dependency JKL

- Decomposition in 3NF
- the schema S is already in 3NF in that
- JKL has on the left hand side a key
- LK has on the right hand side a prime attribute

- the schema S is already in 3NF in that

Example

Schema:

(branch-name,assets,branch-city,loan-number,customer-name,amount)

- branch-name assets
branch-name branch-city

loan-number amount

loan-number branch-name

- The key is (loan-number, customer-name)

Example

- The schema R is not in BCNF
1) consider a dependency that violates the BCNF

branch-name assets

- R is replaced by
- R1 = (branch-name, assets)
- R2 =(branch-name, branch-city, loan-number, customer-name, amount)

- R1 is in BCNF with key branch-name
- R2 is not in BCNF

- R is replaced by

Example

2) consider the dependency

branch-name branch-city

- R2 is replaced by
- R3 = (branch-name, branch-city)
- R4 = (branch-name, loan-number, customer-name, amount)

- R3 is in BCNF with key branch-name
- R4 is not in BCNF

Example Final schema: R1, R3, R5, R7, R8

3) consider the dependency

loan-number amount

- R4 is replaced by
- R5 = (loan-number, amount)
- R6 = (branch-name, loan-number, customer-name)

- R5 is in BCNF with key loan-number
- R6 is not in BCNF
4) consider the dependency

loan-number branch-name

- R6 is replaced by
- R7 = (loan-number, branch-name)
- R8 = (loan-number, customer-name)

Download Presentation

Connecting to Server..