normalisation l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Normalisation PowerPoint Presentation
Download Presentation
Normalisation

Loading in 2 Seconds...

play fullscreen
1 / 36

Normalisation - PowerPoint PPT Presentation


  • 217 Views
  • Uploaded on

Normalisation. 5. Outline. Boyce-Codd Normal Form (BCNF) normalisation non-loss decomposition Heath’s theorem normalisation process semantic assumptions and FDs CKs decomposition normalisation vs dependency preservation a decomposition may yield to a better solution than another one

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Normalisation' - Patman


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
outline
Outline
  • Boyce-Codd Normal Form (BCNF)
  • normalisation
    • non-loss decomposition
    • Heath’s theorem
    • normalisation process
      • semantic assumptions and FDs
      • CKs
      • decomposition
  • normalisation vs dependency preservation
      • a decomposition may yield to a better solution than another one
      • either-or situations: normalise or preserve FDs
2nf and 3nf
2NF and 3NF
  • optional
    • 2NF
      • a relation is in 2NF if and only if it is in 1NF and all non-key attributes are irreducibly dependent on the candidate keys
    • 3NF (Zaniolo)
      • R is a relation; X is any set of attributes of R; A is any single attribute of R; consider the following conditions:
        • X contains A
        • X contains a candidate key of R
        • A is contained in a candidate key of R
      • if either of the three is true for every FD X  A then R is in 3NF
slide5
BCNF
  • a relation is in Boyce/Codd normal form (BCNF) if and only if every non-trivial irreducible FD has a candidate key as its determinant
  • informally
    • the determinant of each relevant FD is a CK
example
Example
  • devise examples in class
    • relations in BCNF
    • relations not in BCNF
slide7
BCNF
  • any relation can be non-loss decomposed into an equivalent set of BCNF relations
  • BCNF  3NF  2NF  1NF
  • BCNF is still not guaranteed to be free of any update anomalies
normalisation9
Normalisation
  • the process of transforming a relation with redundancies into an “equivalent” set of relations that have less redundancies
  • “transformation”  projection
    • input :: one relation, say R
    • output :: many relations, say R1, …, Rn
  • “equivalent”  non-loss decomposition
    • R1 join R2 … join Rn = R
  • R1, …, Rn should have normal forms higher than or equal to that of R
non loss decomposition
Non-loss decomposition
  • (Patient, Symptom, Doctor, Office, Diagnosis)
      • semantic assumptions
  • exercise
lossy decomposition
Lossy Decomposition
  • (Patient, Symptom, Doctor, Office, Diagnosis)
      • semantic assumptions
  • exercise
heath s theorem
Heath’s theorem
  • can be used as the basis for normalisation
  • theorem
    • suppose
      • R = (A, B, C), where A, B and C are disjoint sets of attributes
      • AB
    • then
      • R = (A, B) join (A, C)
  • state “in English”
normalisation rules of thumb
Normalisation – rules of thumb
  • take as basis for normalisation/Heath’s theorem a “problem” FD
  • maximise B when applying Heath’s theorem, on the basis of AB
  • try to maintain a one-to-one correspondence with real life entities
normalisation14
Normalisation
  • steps
    • semantic assumptions
    • FDs
    • CKs
    • decomposition
simple example
Simple example
  • (M_id, M_name, Type, Value)
      • M_id  M_name
      • M_id  Type
      • M_id  Value
      • Type  Value
      • not BCNF
    • Heath’s theorem for Type  Value results
  • (Type, Value)
  • (M_id, M_name, Type)
    • both relations are now in BCNF
example r
Example (R)

(project, task, max-budget, duration, payment-rate, contractor, contr-time)

FDs:

(project, task)  max_budget, duration

(task, max_budget, duration)  payment_rate

(project, task, contractor)  contr_time

(project, task, max-budget, duration, payment-rate, contractor, contr-time)

example decomposition for r
Example – decomposition for R
  • Heath’s theorem for R (the initial relation) based on
    • task, max_budget, duration  payment_rate
  • leads to:
  • R1 (task, max_budget, duration, payment_rate)
  • R2 (project, task, max_budget, duration, contractor, contr_time)
  • R1 is in BCNF
  • R2 is not in BCNF, due to
    • project, task  max_budget, duration
example decomposition for r 2
Example – decomposition for R2
  • Heath’s theorem for R2, based on
    • project, task  max_budget, duration
  • leads to
  • R21 (project, task, max_budget, duration)
  • R22 (project, task, contractor, contracted_time)
  • R21 is in BCNF
  • R22 is in BCNF
example solution
Example – solution
  • (task, max_budget, duration, payment_rate)
  • (project, task, max_budget, duration)
  • (project, task, contractor, contracted_time)
decomposition 2 or more solutions
Decomposition – 2 or more solutions
  • in the normalisation process, it may be possible that a certain (non-loss) decomposition yields to a better solution than another one
decomposition 2 solutions example
Decomposition – 2 solutions – example
  • Modules(M_id, M_name, Type, Value)
    • solution #1
      • Modules_Descr(M_id, M_name, Type)
      • Type_Val(Type, Val)
    • solution #2
      • Modules_Descr(M_id, M_name, Type)
      • Module_Val(M_id, Val)
    • are they both non-loss? (apply Heath’s theorem)
    • is there one better than the other?
solution 1 vs solution 2
Solution #1 vs Solution #2
  • updates
      • u1: insert the fact that a 3 semester module is worth 1.5cu
      • u2: modify 1 semester modules; they are not worth 0.5cu any longer, they are 0.75cu
      • u3: change the type of a module but forget to change its value
  • solution #2
      • u1 and u2 are impossible or very difficult to perform
      • u3 is allowed
  • solution #1
      • u1 and u2 are straightforward
      • u3 is not allowed
solution 1 vs solution 225
Solution #1 vs Solution #2
  • solution #1
    • more expressive
      • certain facts cannot be expressed in solution #2; e.g. the value of a new type
      • updates can be independently performed on the two component relations (i.e. all constraints are properly expressed)
      • in solution #2: Type  Value is lost, so this constraint must be enforced by the user by procedural code
  • independent projections
      • updates can be performed independently on each projection, without the danger of ending with inconsistent data
independent projections

M-id

Type

M_name

M-id

Type

M_name

Value

M_id

Value

Type

Independent projections

M_name

M-id

Type

Value

Solution #1

Solution #2

one transitive : intra

one direct : lost

all direct : intra

all transitive : inter

independent projections risanen
Independent projections - Risanen
  • R1 and R2 are two projections of R; R1 and R2 are independent if and only if
      • every FD in R is a logical consequence of the FDs in R1 and R2
      • the common attributes of R1 and R2 for a candidate key for at least one of R1 or R2
  • atomic relation
      • cannot be decomposed into independent projections
dependency preservation
Dependency preservation
  • R was decomposed (normalisation) into R1, …, Rn
    • S - the set of FDs for R
    • S1, …, Sn - the set of FDs for R1, …, Rn (each Si refers to only the attributes of Ri)
    • S’ = S1 …  Sn (usually, S’  S)
    • the decomposition is dependency preserving if S’+ = S+
normalisation vs dependency preservation
Normalisation vs dependency preservation
  • there are cases when there is an either-or situation regarding the normalisation and the preserving of functional dependencies:
    • either the relation is normalised and some FDs are lost
    • or, some FDs are not lost (they are expressed in the original relation), but the relation is not in its higher normal form possible
  • in this case, no solution is better than the other
    • other criteria will have to be considered to judge better
normalisation vs dependency preservation example

Patient

Doctor

Disease

Normalisation vs dependency preservation: Example
  • a patient is treated by a single doctor for a certain disease
  • each doctor only treats one kind of disease
  • a doctor can treat more than one patient
  • is this relation BCNF?
  • can you identify update anomalies?
  • consider also (Patient, Disease, Doctor, Treatment)
    • with Patient, Disease  Treatment
possible decompositions
Possible decompositions

non-loss? (choose PKs)

non-loss? (choose PKs)

Heath’s theorem (choose PKs)

bcnf vs dependency preservation
BCNF vs dependency preservation

and

do not enforce a FD existing in the original specification, namely:

e.g. a patient can be given two doctors that treat the same disease (the system

will not disallow this); the constraint would have to be maintained by procedural code

bcnf vs dependency preservation34
BCNF vs dependency preservation
  • not every FD is expressible through normalisation
  • when the relation was in its original form (3NF)
    • (Patient, Disease)  Doctor was expressed
      • a doctor could not be assigned to more than one patient-disease
    • Doctor  Disease was not expressed
      • generated update anomalies
  • in BCNF (decomposed)
    • Doctor  Disease was expressed
    • (Patient, Disease)  Doctor was not expressed
      • generated update anomalies (refer to previous slide)
      • this latter FD would not have been expressed even if the decomposition in all three 2-attribute relations had been considered
conclusions
Conclusions
  • normal forms : formalisation of common sense
    • art  engineering
    • possibility for automation; difficult, because of non-determinism (more than one choices at one step)
  • BCNF
    • always achievable
    • notalways free of update anomalies, because it cannot always express all the FDs existing in the problem
  • there are higher normal forms (4NF, 5NF)
    • defined on the basis of other concepts (not FDs)