Normalisation

1 / 36

# Normalisation - PowerPoint PPT Presentation

Normalisation. 5. Outline. Boyce-Codd Normal Form (BCNF) normalisation non-loss decomposition Heath’s theorem normalisation process semantic assumptions and FDs CKs decomposition normalisation vs dependency preservation a decomposition may yield to a better solution than another one

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about 'Normalisation' - Patman

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

### Normalisation

5

Outline
• Boyce-Codd Normal Form (BCNF)
• normalisation
• non-loss decomposition
• Heath’s theorem
• normalisation process
• semantic assumptions and FDs
• CKs
• decomposition
• normalisation vs dependency preservation
• a decomposition may yield to a better solution than another one
• either-or situations: normalise or preserve FDs
2NF and 3NF
• optional
• 2NF
• a relation is in 2NF if and only if it is in 1NF and all non-key attributes are irreducibly dependent on the candidate keys
• 3NF (Zaniolo)
• R is a relation; X is any set of attributes of R; A is any single attribute of R; consider the following conditions:
• X contains A
• X contains a candidate key of R
• A is contained in a candidate key of R
• if either of the three is true for every FD X  A then R is in 3NF
BCNF
• a relation is in Boyce/Codd normal form (BCNF) if and only if every non-trivial irreducible FD has a candidate key as its determinant
• informally
• the determinant of each relevant FD is a CK
Example
• devise examples in class
• relations in BCNF
• relations not in BCNF
BCNF
• any relation can be non-loss decomposed into an equivalent set of BCNF relations
• BCNF  3NF  2NF  1NF
• BCNF is still not guaranteed to be free of any update anomalies
Normalisation
• the process of transforming a relation with redundancies into an “equivalent” set of relations that have less redundancies
• “transformation”  projection
• input :: one relation, say R
• output :: many relations, say R1, …, Rn
• “equivalent”  non-loss decomposition
• R1 join R2 … join Rn = R
• R1, …, Rn should have normal forms higher than or equal to that of R
Non-loss decomposition
• (Patient, Symptom, Doctor, Office, Diagnosis)
• semantic assumptions
• exercise
Lossy Decomposition
• (Patient, Symptom, Doctor, Office, Diagnosis)
• semantic assumptions
• exercise
Heath’s theorem
• can be used as the basis for normalisation
• theorem
• suppose
• R = (A, B, C), where A, B and C are disjoint sets of attributes
• AB
• then
• R = (A, B) join (A, C)
• state “in English”
Normalisation – rules of thumb
• take as basis for normalisation/Heath’s theorem a “problem” FD
• maximise B when applying Heath’s theorem, on the basis of AB
• try to maintain a one-to-one correspondence with real life entities
Normalisation
• steps
• semantic assumptions
• FDs
• CKs
• decomposition
Simple example
• (M_id, M_name, Type, Value)
• M_id  M_name
• M_id  Type
• M_id  Value
• Type  Value
• not BCNF
• Heath’s theorem for Type  Value results
• (Type, Value)
• (M_id, M_name, Type)
• both relations are now in BCNF
Example (R)

(project, task, max-budget, duration, payment-rate, contractor, contr-time)

FDs:

(project, task, max-budget, duration, payment-rate, contractor, contr-time)

Example – decomposition for R
• Heath’s theorem for R (the initial relation) based on
• task, max_budget, duration  payment_rate
• R1 (task, max_budget, duration, payment_rate)
• R2 (project, task, max_budget, duration, contractor, contr_time)
• R1 is in BCNF
• R2 is not in BCNF, due to
• project, task  max_budget, duration
Example – decomposition for R2
• Heath’s theorem for R2, based on
• project, task  max_budget, duration
• R21 (project, task, max_budget, duration)
• R22 (project, task, contractor, contracted_time)
• R21 is in BCNF
• R22 is in BCNF
Example – solution
Decomposition – 2 or more solutions
• in the normalisation process, it may be possible that a certain (non-loss) decomposition yields to a better solution than another one
Decomposition – 2 solutions – example
• Modules(M_id, M_name, Type, Value)
• solution #1
• Modules_Descr(M_id, M_name, Type)
• Type_Val(Type, Val)
• solution #2
• Modules_Descr(M_id, M_name, Type)
• Module_Val(M_id, Val)
• are they both non-loss? (apply Heath’s theorem)
• is there one better than the other?
Solution #1 vs Solution #2
• u1: insert the fact that a 3 semester module is worth 1.5cu
• u2: modify 1 semester modules; they are not worth 0.5cu any longer, they are 0.75cu
• u3: change the type of a module but forget to change its value
• solution #2
• u1 and u2 are impossible or very difficult to perform
• u3 is allowed
• solution #1
• u1 and u2 are straightforward
• u3 is not allowed
Solution #1 vs Solution #2
• solution #1
• more expressive
• certain facts cannot be expressed in solution #2; e.g. the value of a new type
• updates can be independently performed on the two component relations (i.e. all constraints are properly expressed)
• in solution #2: Type  Value is lost, so this constraint must be enforced by the user by procedural code
• independent projections
• updates can be performed independently on each projection, without the danger of ending with inconsistent data

M-id

Type

M_name

M-id

Type

M_name

Value

M_id

Value

Type

Independent projections

M_name

M-id

Type

Value

Solution #1

Solution #2

one transitive : intra

one direct : lost

all direct : intra

all transitive : inter

Independent projections - Risanen
• R1 and R2 are two projections of R; R1 and R2 are independent if and only if
• every FD in R is a logical consequence of the FDs in R1 and R2
• the common attributes of R1 and R2 for a candidate key for at least one of R1 or R2
• atomic relation
• cannot be decomposed into independent projections
Dependency preservation
• R was decomposed (normalisation) into R1, …, Rn
• S - the set of FDs for R
• S1, …, Sn - the set of FDs for R1, …, Rn (each Si refers to only the attributes of Ri)
• S’ = S1 …  Sn (usually, S’  S)
• the decomposition is dependency preserving if S’+ = S+
Normalisation vs dependency preservation
• there are cases when there is an either-or situation regarding the normalisation and the preserving of functional dependencies:
• either the relation is normalised and some FDs are lost
• or, some FDs are not lost (they are expressed in the original relation), but the relation is not in its higher normal form possible
• in this case, no solution is better than the other
• other criteria will have to be considered to judge better

Patient

Doctor

Disease

Normalisation vs dependency preservation: Example
• a patient is treated by a single doctor for a certain disease
• each doctor only treats one kind of disease
• a doctor can treat more than one patient
• is this relation BCNF?
• can you identify update anomalies?
• consider also (Patient, Disease, Doctor, Treatment)
• with Patient, Disease  Treatment
Possible decompositions

non-loss? (choose PKs)

non-loss? (choose PKs)

Heath’s theorem (choose PKs)

BCNF vs dependency preservation

and

do not enforce a FD existing in the original specification, namely:

e.g. a patient can be given two doctors that treat the same disease (the system

will not disallow this); the constraint would have to be maintained by procedural code

BCNF vs dependency preservation
• not every FD is expressible through normalisation
• when the relation was in its original form (3NF)
• (Patient, Disease)  Doctor was expressed
• a doctor could not be assigned to more than one patient-disease
• Doctor  Disease was not expressed
• generated update anomalies
• in BCNF (decomposed)
• Doctor  Disease was expressed
• (Patient, Disease)  Doctor was not expressed
• generated update anomalies (refer to previous slide)
• this latter FD would not have been expressed even if the decomposition in all three 2-attribute relations had been considered
Conclusions
• normal forms : formalisation of common sense
• art  engineering
• possibility for automation; difficult, because of non-determinism (more than one choices at one step)
• BCNF
• always achievable
• notalways free of update anomalies, because it cannot always express all the FDs existing in the problem
• there are higher normal forms (4NF, 5NF)
• defined on the basis of other concepts (not FDs)