E N D
1. CS411Database Systems Kazuhiro Minami
2. Given a relation constructed from an ER diagram, what is its key? 2
3. Given a relation constructed from an ER diagram, what is its key? 3
4. Given a relation constructed from an ER diagram, what is its key? 4
5. 5 The textbook has more rules on finding keys:
Many-one, one-many, one-one relationships
Multi-way relationships
Weak entity sets
(but there is no substitute for common sense)
6. Reminder: redundancy causes trouble
7. Non-solution: multiple values in one field
8. Your common sense will tell you how to fix this schema
9. What if you dont have common sense? There is a theory to tell you what to do!
10. Functional Dependencies
11. Functional dependencies generalize the idea of a key
12. EmpID ? Name, Phone, Office
Office ? Phone Phone ? Office
Name ? EmpID isnt likely to hold in all instances of this schema, though it holds in this instance
More generally, an instance can tell you many FDs that dont hold, but not all those that do.
13. Use your common sense to find the FDs in the world around you
14. We can define keys in terms of FDs Key of a relation R is a set of attributes that
functionally determines all attributes of R
none of its proper subsets have this property.
Superkey = set of attributes that contains a key.
15. Reasoning with FDs 1) Closure of a set of FDs2) Closure of a set of attributes Slide numberSlide number
16. The closure S+ of a set S of FDs is the set of all FDs logically implied by S. R = {A, B, C, G, H, I}
S = {A ? B, A ? C, CG ? H, CG ? I, B ? H}
Does A ? H hold?
You can prove whether it does!
17. Compute the closure S+ of S using Armstrongs Axioms 1. Reflexivity
A1 ... An ? every subset of A1 ... An
2. Augmentation
If A1 ... An ? B1 ... Bm,
then A1 ... An C1 ... Ck ? B1 ... Bm C1 ... Ck
3. Transitivity
If A1 ... An ? B1 ... Bm and B1 ... Bm ? C1 ... Ck,
then A1 ... An ? C1 ... Ck
18. How to compute S+ using Armstrong's Axioms S+ = S;
loop {
For each f in S,
apply the reflexivity and augmentation rules and add the new FDs to S+.
For each pair of FDs in S,
apply the transitivity rule and add the new FDs to S+
} until S+ does not change any more.
19. You can infer additional rules from Armstrongs Axioms Union
If X ? Y and X ? Z, then X ? YZ
(X, Y, Z are sets of attributes)
Decomposition
X ? YZ, then X ? Y and X ? Z
Pseudo-transitivity
X ? Y and YZ ? U, then XZ ? U - How to infer:
e.g., the union rule:
1. X? Y; XX ? XY or X ? XY --- augmentation
2. X? Z; XY ? YZ augmentation
3. X ? XY ? YZ transitivity
-- e.g., Decomposition
1. YZ ? Y, YZ ? Z, reflexivity
2. X ? YZ ? Y, X ? YZ ? Z transitivity
- How to infer:
e.g., the union rule:
1. X? Y; XX ? XY or X ? XY --- augmentation
2. X? Z; XY ? YZ augmentation
3. X ? XY ? YZ transitivity
-- e.g., Decomposition
1. YZ ? Y, YZ ? Z, reflexivity
2. X ? YZ ? Y, X ? YZ ? Z transitivity
20. The closure of a set of attributes contains everything they functionally determine
21. It is easy to compute the closure of a set of attributes
23. What is the attribute closure good for? Test if X is a superkey
compute X+, and check if X+ contains all attrs of R
Check if X ? Y holds
by checking if Y is contained in X+
Another (not so clever) way to compute closure S+ of FDs
for each subset of attributes X in relation R, compute X+ with respect to S
for each subset of attributes Y in X+, output the FD X ? Y
24. Reminder: intended goals of schema refinement Minimize redundancy
Avoid information loss
Easy to check dependencies
Ensure good query performance
25. Normal Forms First Normal Form = all attributes are atomic
Second Normal Form (2NF) = obsolete
Boyce Codd Normal Form (BCNF)
Third Normal Form (3NF)
Fourth Normal Form (4NF)
Others...
26. Boyce-Codd Normal Form
28. What if we are in a situation where Phone Number ? SSN?
29. What about that alternative schema we recommended earlier---are they in BCNF?
30. What about that alternative schema we recommended earlier---are they in BCNF?
31. What about that alternative schema we recommended earlier---are they in BCNF?
32. Announcements Course project
Stage 1 is due today
Stage 2 (E/R modeling) is due on Sep. 16
Stage 3 (Relational schema design & implementation plan) is due on Oct. 2
Homework 1 will be posted today
Due is at 3:15pm on Sep. 23
33. Tips for Working Successfully in a Group (by Randy Pausch) Meet people properly
Find things in common
Try for optimal meeting conditions
Let everyone talk
Check egos at the door
Praise each other
Phrase alternatives as questions
34. Boyce-Codd Normal Form
36. What about that alternative schema we recommended earlier---are they in BCNF?
37. Name ? Price, CategoryWhat are the keys for this one?Is it in BCNF?
38. Name ? Price, CategoryWhat are the keys for this one?Is it in BCNF?
39. If relation R is not in BCNF, you can pull out the violating part(s) until it is.
40. 2. Break R into R1 and R2 as follows.
41. 3. Repeat until all relations are in BCNF.
42. Can you turn this one into BCNF?
43. One more split needed to reach BCNF
44. An Official BCNF Decomposition Algorithm Input: relation R, set S of FDs over R.
Output: a set of relations in BCNF.
1. Compute S+.
2. Compute keys for R (from ER or from S+).
3. Use S+ and keys to check if R is in BCNF. If not:
a. Pick a violation FD A ? B.
b. Expand B as much as possible, by computing A+.
c. Create R1 = A+, and R2 = A ? (R ? A+).
d. Find the FDs over R1, using S+. Repeat for R2.
e. Recurse on R1 & its set of FDs. Repeat for R2.
4. Else R is already in BCNF; add R to the output.
45. Any good schema decomposition should be lossless. 45
46. Natural Join R= S=
R S =
47. A lossy decomposition gives you too many tuples! 47
48. BCNF decompositions are always lossless.
49. Why dont we get garbage? 49
50. Why dont we get garbage?
51. BCNF doesnt always have a dependency-preserving decomposition. Third normal form may be preferable to having to take a join to check dependencies after an update.
52. Normal Forms First Normal Form = all attributes are atomic
Second Normal Form (2NF) = old and obsolete
Boyce Codd Normal Form (BCNF)
Third Normal Form (3NF)
Fourth Normal Form (4NF)
Others...
53. A schema doesnt preserve dependencies if you have to do a join to check an FD
54. A schema doesnt preserve dependencies if you have to do a join to check an FD
55. Synthesis Algorithm for 3NF Schemas 1. Find a minimal basis G of the set of FDs for relation R
2. For each FD X?A in G, add a relation with attributes XA
3. If none of the relation schemas from Step 2 is a superkey for R, add a relation whose schema is a key for R
56. If a BCNF decomposition doesnt preserve dependencies, use 3rd Normal Form instead.
57. Youll rarely need to use 3NF. Normally, youll be able to get your schema into BCNF while preserving dependencies.
If you cant, then use 3NF unless you have a good reason not to.
58. Minimal Basis A set of FDs F is a minimal basis of a set
of dependencies E if
E = F+
Every dependency in F has a single attribute for its right-hand side
Cannot remove any dependency from F or remove attributes from the left side of any FD in F (minimality)
Example:
E = {A?B, A?C, B?A, B?C, C?A, C?B}
F = {A?B, B?C, C?A}
59. Normal Forms First Normal Form = all attributes are atomic
Second Normal Form (2NF) = old and obsolete
Boyce Codd Normal Form (BCNF)
Third Normal Form (3NF)
Fourth Normal Form (4NF)
Others...
60. BCNF doesnt catch every kind of redundancy (much less every bad schema)
61. Definition of Multi-valued Dependency 61
62. You can tear apart a relation R with an MVD. If A1 An ? B1 Bm holds in R,
then the decomposition
R1(A1, , An, B1,, Bm)
R2(A1, , An, C1 ,, Ck)
is lossless.
Note: an MVD A1 An ? B1 Bm
implicitly talks about the other attributes C1, , Ck.
63. The inference rules for MVDs are not the same as the ones for FDs. The most basic one:
If A1 An ? B1 Bm,
then A1 An ? B1 Bm.
Other rules in the book.
64. 4th Normal Form (4NF) R is in 4NF
if for every nontrivial MVD
A1,,An ? B1,, Bm,
{A1,,An} is a superkey.
65. MVD Summary: Parent ? Child X ? Y means that given X, there is a unique set of possible Y values (which do not depend on other attributes of the relation)
MVD problems arise if there are two independent 1:N relationships in a relation.
An FD is also a MVD.
Theres lots more MVD theory, but we wont go there.
66. Confused by Normal Forms ?
67. What people actually do Object DBs and object-relational DBs
Generally permit non-atomic attributes
1st normal form not required
Data warehouses
Huge append-only historical databases
Joins expensive or impractical
Argues against normalization
Everyday relational DBs
Aim for BCNF
Use your understanding of the application to identify other problematic redundancies