# Normalisation - PowerPoint PPT Presentation  Download Presentation Normalisation

Normalisation
Download Presentation ## Normalisation

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
##### Presentation Transcript

1. Outline • Boyce-Codd Normal Form (BCNF) • normalisation • non-loss decomposition • Heath’s theorem • normalisation process • semantic assumptions and FDs • CKs • decomposition • normalisation vs dependency preservation • a decomposition may yield to a better solution than another one • either-or situations: normalise or preserve FDs

2. 1

3. 2NF and 3NF • optional • 2NF • a relation is in 2NF if and only if it is in 1NF and all non-key attributes are irreducibly dependent on the candidate keys • 3NF (Zaniolo) • R is a relation; X is any set of attributes of R; A is any single attribute of R; consider the following conditions: • X contains A • X contains a candidate key of R • A is contained in a candidate key of R • if either of the three is true for every FD X  A then R is in 3NF

4. BCNF • a relation is in Boyce/Codd normal form (BCNF) if and only if every non-trivial irreducible FD has a candidate key as its determinant • informally • the determinant of each relevant FD is a CK

5. Example • devise examples in class • relations in BCNF • relations not in BCNF

6. BCNF • any relation can be non-loss decomposed into an equivalent set of BCNF relations • BCNF  3NF  2NF  1NF • BCNF is still not guaranteed to be free of any update anomalies

7. 2

8. Normalisation • the process of transforming a relation with redundancies into an “equivalent” set of relations that have less redundancies • “transformation”  projection • input :: one relation, say R • output :: many relations, say R1, …, Rn • “equivalent”  non-loss decomposition • R1 join R2 … join Rn = R • R1, …, Rn should have normal forms higher than or equal to that of R

9. Non-loss decomposition • (Patient, Symptom, Doctor, Office, Diagnosis) • semantic assumptions • exercise

10. Lossy Decomposition • (Patient, Symptom, Doctor, Office, Diagnosis) • semantic assumptions • exercise

11. Heath’s theorem • can be used as the basis for normalisation • theorem • suppose • R = (A, B, C), where A, B and C are disjoint sets of attributes • AB • then • R = (A, B) join (A, C) • state “in English”

12. Normalisation – rules of thumb • take as basis for normalisation/Heath’s theorem a “problem” FD • maximise B when applying Heath’s theorem, on the basis of AB • try to maintain a one-to-one correspondence with real life entities

13. Normalisation • steps • semantic assumptions • FDs • CKs • decomposition

14. Simple example • (M_id, M_name, Type, Value) • M_id  M_name • M_id  Type • M_id  Value • Type  Value • not BCNF • Heath’s theorem for Type  Value results • (Type, Value) • (M_id, M_name, Type) • both relations are now in BCNF

15. 3

16. Example (R) (project, task, max-budget, duration, payment-rate, contractor, contr-time) FDs: (project, task)  max_budget, duration (task, max_budget, duration)  payment_rate (project, task, contractor)  contr_time (project, task, max-budget, duration, payment-rate, contractor, contr-time)

17. Example – decomposition for R • Heath’s theorem for R (the initial relation) based on • task, max_budget, duration  payment_rate • leads to: • R1 (task, max_budget, duration, payment_rate) • R2 (project, task, max_budget, duration, contractor, contr_time) • R1 is in BCNF • R2 is not in BCNF, due to • project, task  max_budget, duration

18. Example – decomposition for R2 • Heath’s theorem for R2, based on • project, task  max_budget, duration • leads to • R21 (project, task, max_budget, duration) • R22 (project, task, contractor, contracted_time) • R21 is in BCNF • R22 is in BCNF

19. Example – solution • (task, max_budget, duration, payment_rate) • (project, task, max_budget, duration) • (project, task, contractor, contracted_time)

20. 4

21. Decomposition – 2 or more solutions • in the normalisation process, it may be possible that a certain (non-loss) decomposition yields to a better solution than another one

22. Decomposition – 2 solutions – example • Modules(M_id, M_name, Type, Value) • solution #1 • Modules_Descr(M_id, M_name, Type) • Type_Val(Type, Val) • solution #2 • Modules_Descr(M_id, M_name, Type) • Module_Val(M_id, Val) • are they both non-loss? (apply Heath’s theorem) • is there one better than the other?

23. Solution #1 vs Solution #2 • updates • u1: insert the fact that a 3 semester module is worth 1.5cu • u2: modify 1 semester modules; they are not worth 0.5cu any longer, they are 0.75cu • u3: change the type of a module but forget to change its value • solution #2 • u1 and u2 are impossible or very difficult to perform • u3 is allowed • solution #1 • u1 and u2 are straightforward • u3 is not allowed

24. Solution #1 vs Solution #2 • solution #1 • more expressive • certain facts cannot be expressed in solution #2; e.g. the value of a new type • updates can be independently performed on the two component relations (i.e. all constraints are properly expressed) • in solution #2: Type  Value is lost, so this constraint must be enforced by the user by procedural code • independent projections • updates can be performed independently on each projection, without the danger of ending with inconsistent data

25. M-id Type M_name M-id Type M_name Value M_id Value Type Independent projections M_name M-id Type Value Solution #1 Solution #2 one transitive : intra one direct : lost all direct : intra all transitive : inter

26. Independent projections - Risanen • R1 and R2 are two projections of R; R1 and R2 are independent if and only if • every FD in R is a logical consequence of the FDs in R1 and R2 • the common attributes of R1 and R2 for a candidate key for at least one of R1 or R2 • atomic relation • cannot be decomposed into independent projections

27. Dependency preservation • R was decomposed (normalisation) into R1, …, Rn • S - the set of FDs for R • S1, …, Sn - the set of FDs for R1, …, Rn (each Si refers to only the attributes of Ri) • S’ = S1 …  Sn (usually, S’  S) • the decomposition is dependency preserving if S’+ = S+

28. 5

29. Normalisation vs dependency preservation • there are cases when there is an either-or situation regarding the normalisation and the preserving of functional dependencies: • either the relation is normalised and some FDs are lost • or, some FDs are not lost (they are expressed in the original relation), but the relation is not in its higher normal form possible • in this case, no solution is better than the other • other criteria will have to be considered to judge better

30. Patient Doctor Disease Normalisation vs dependency preservation: Example • a patient is treated by a single doctor for a certain disease • each doctor only treats one kind of disease • a doctor can treat more than one patient • is this relation BCNF? • can you identify update anomalies? • consider also (Patient, Disease, Doctor, Treatment) • with Patient, Disease  Treatment

31. Possible decompositions non-loss? (choose PKs) non-loss? (choose PKs) Heath’s theorem (choose PKs)

32. BCNF vs dependency preservation and do not enforce a FD existing in the original specification, namely: e.g. a patient can be given two doctors that treat the same disease (the system will not disallow this); the constraint would have to be maintained by procedural code

33. BCNF vs dependency preservation • not every FD is expressible through normalisation • when the relation was in its original form (3NF) • (Patient, Disease)  Doctor was expressed • a doctor could not be assigned to more than one patient-disease • Doctor  Disease was not expressed • generated update anomalies • in BCNF (decomposed) • Doctor  Disease was expressed • (Patient, Disease)  Doctor was not expressed • generated update anomalies (refer to previous slide) • this latter FD would not have been expressed even if the decomposition in all three 2-attribute relations had been considered

34. Conclusions • normal forms : formalisation of common sense • art  engineering • possibility for automation; difficult, because of non-determinism (more than one choices at one step) • BCNF • always achievable • notalways free of update anomalies, because it cannot always express all the FDs existing in the problem • there are higher normal forms (4NF, 5NF) • defined on the basis of other concepts (not FDs)