400 likes | 610 Views
Normal Forms. First Normal Form: all table cells must contain atomic values no sets, arrays, lists, or other collection types no structured objects all relational databases satisfy first normal form by definition
E N D
First Normal Form: • all table cells must contain atomic values • no sets, arrays, lists, or other collection types • no structured objects • all relational databases satisfy first normal form by definition • the definition restricts attribute values to singletons from specified domains • indeed, relations that do not satisfy first normal form are specifically called unnormalized relations
Definitions: Goal of functional dependency analysis is to decompose the universal relation into components which cannot become inconsistent with respect to any functional dependency constraint
Some requirements: + indicates union of attributes lossless join
Lossy join: A B C 1 2 3 5 2 4 A B 1 2 5 2 B C 2 3 2 4 A B C 1 2 3 1 2 4 5 2 3 5 2 4 spurious tuples
Trick: compute minimal cover directly from application FDs without calculating closure Whence functional dependencies:
Theorem: Armstrong's Axioms are both sound and complete. That is, if you apply them repeatedly to a given set of FDs, then (1) every new FD generated belongs to the closure (sound) (2) every FD in the closure will eventually be generated (complete)
Definition: Because inferences via Armstrong's Axioms yields precisely the FD set closure, this definition is the same as:
Generating the closure is computationally expensive because of its exponential size But, checking a given FD for inclusion in the closure is easy:
Algorithm uses only the application-driven FDs, not the closure
But, still have a violator here no longer BCNF violators A is a superkey of first relation FDs no longer apply to the second Example: BCNF decomposition process
BCNF: sno sname sfood fno fname fcolor fweight sno tno tno tname tvolume tvolume tcolor eno edate enote fno Aquarium database --- minimal cover • tvolume → tcolor • eno → fno • fno → tno • fno → sno • sno → sname • sno → sfood • tno → tname • tno → tvolume • fno → fname • fno → fcolor • fno → fweight • eno → edate • eno → enote Only superkeys: supersets of eno ==> all FDs are BCNF violators except those with eno on left side
Note: it suffices to look for BCNF violators in minimal cover of original application-driven FDs In applying Armstrong's axioms to generate FD closure, only augmentation and pseudotransitivity generate new left hand sides; none generate any new right-side attributes
Example: BCNF decomposition is lossless, but not always dependency-preserving
In general: • if decomposition does NOT split any FD from the minimal cover, then decomposition IS dependency-preserving • if decomposition does split an FD from the minimal cover, then it may or may not be dependency-preserving (text presents an algorithm for deciding)
Example: • sno → sname • sno → sfood • tno → tname • tno → tvolume • fno → fname • fno → fcolor • fno → fweight • eno → edate • eno → enote • tvolume → tcolor • eno → fno • fno → tno • fno → sno sno sname sfood tno tname tcolor tvolume • only key is (sno, tno), but (sno → sname) and sname is non-prime • have several 2NF violators • signals mixing of application entities in a single table • decompose into: sno sname sfood tno tname tcolor tvolume Note: (1) tables with single-attribute keys cannot violate 2NF (2) tvolume --> tcolor is not a 2NF violator because tvolume is not part of a key
insertion forces nulls into inappropriate attributes, such as part of the key (sno, tno) Storage anomalies (irregularities): • classify negative consequences of 2NF violators • insertion anomaly • relation with a 2NF violator mixes two or more application entities • can't insert an instance of just one of them • e.g., in previous example, want to insert a dolphin species that is as yet unassociated with any tank ... sno sname sfood tno tname tcolor tvolume 12 shark everything 25 lagoon blue 5000 12 shark everything 27 deep dive green 50000 ...... ...... 17 dolphin herring ---- ------ ----- ------
Storage anomalies (irregularities): • deletion anomaly • relation with a 2NF violator mixes two or more application entities • deleting last tank associated with shark species removes all information about the species • update anomaly • change sfood attribute of a shark => update several tuples • invites inconsistency sno sname sfood tno tname tcolor tvolume 12 shark everything 25 lagoon blue 5000 12 shark everything 27 deep dive green 50000 ...... ...... • 2NF decomposition removes most of anomalies: sno sname sfood 12 shark everything 17 dolphin herring tno tname tcolor tvolume 25 lagoon blue 5000 27 deep dive green 50000
predictable entry Storage anomalies (irregularities): • sno → sname • sno → sfood • tno → tname • tno → tvolume • fno → fname • fno → fcolor • fno → fweight • eno → edate • eno → enote • tvolume → tcolor • eno → fno • fno → tno • fno → sno sno sname sfood 12 shark everything 17 dolphin herring tno tname tcolor tvolume 25 lagoon blue 5000 27 deep dive green 50000 84 puddle blue 5000 • there remains the negative effect of tvolume --> tcolor • if change all 5000 volume tanks to purple, must update several tuples • still invites inconsistency • in general, anomalies arise when the contents of some cells can predict the content of others
BCNF => 3NF => 2NF • 3NF decomposition forces further decomposition in previous example: • sno → sname • sno → sfood • tno → tname • tno → tvolume • fno → fname • fno → fcolor • fno → fweight • eno → edate • eno → enote • tvolume → tcolor • eno → fno • fno → tno • fno → sno sno sname sfood 12 shark everything 17 dolphin herring tno tname tcolor tvolume 25 lagoon blue 5000 27 deep dive green 50000 84 puddle blue 5000 tno tname tvolume 25 lagoon 5000 27 deep dive 50000 84 puddle 5000 tcolor tvolume blue 5000 green 50000
In practice, 3NF is usually BCNF • only exception occurs when a non-superkey determines a prime attribute • recall earlier example:
There always exists a lossless, dependency-preserving decomposition into 3NF • Recall that the BCNF decomposition may not be dependency-preserving • For proof of algorithm's validity, see text page 788.
e.g., Some limitations of functional dependency analysis: • Cannot always achieve dependency-preserving BCNF; may have to settle for 3NF and some redundancy • excessive decomposition is possible, e.g., decompose species into two tables: (sno, sname) and (sno, sfood) • general idea is that each table represent one distinct application entity plus additional decomposition necessary to accommodate non-relationship constraints, such as tvolume --> tcolor • there remain redundancies that persist through BCNF, e.g., fish weight in a given tank must sum to 1000 pounds • can predict weight of last fish in the tank ==> redundancy • constraint is not an FD ==> redundancy is not removed by decomposition • FD analysis misses multivalued dependencies and join dependencies