Key Features of Good Relational Database Design and Functional Dependencies

Ch 4: Relational Database Design

4.1 Features of Good Relational designs • Four Informal measures • Semantics of the relation attributes • Reducing the redundant values in tuples. • Reducing the null values • Disallowing the possibility of generating spurious(wrong) tuples

1. Semantics of the relation attributes • Design a realm schema so that it is easy to explain its meaning .Do not combine attributes from multiple entity types & relationship into single relation. • In general ,the easier it is to explain the semantics of relation, the better the relation schema design will be.

2. Redundant Information in Tuples & Update Anomalies • Goal of dbase is to reduce storage space used by relationship. • Grouping attributes into relation schemas has a significant effect on storage space. • For e.g. if we combine Employee with department & project , works_on will result into EMP_DEPT & EMP_PRJ. • Resultant relation shows repetition of several values leading to higher storage. • Other serious problems is of Update anomalies which is classified as insert, delete & modification anomalies.

Insertion Anomalies • Insert a new emp , we must include value for dept or need to place NULL (if emp doesnot work for dept yet.) & need to enter correctly so consistency problem donot occur. • It is difficult to enter new dept that has no employee as we cannot insert Null in ENO as its is primary key

Deletion Anomalies • If we delete from EMP_DEPT an employee that happens to represent the last employee in that dept , the info abt dept is also lost from dbase

Modification Anomalies • If we change the value of one of attributes say that of manager of dept 5 , we need to change in each tuple where dept no is 5, else it will lead to inconsistent. • Design Dbase so that no insertion, deletion & modification anomalies are present .

3. Null values in tuples • NULL have multiple interpretations such as: • Attributes that do not apply to this tuple. • Attribute value for this tuple is unknown. • Value is known but absent, i.e. it has not been recorded yet. • Problem may occur in JOIN , and aggregrate operations.

4. Generation of Spurious Tuples • Design relation schemas so that they can be joined with equality condition on attributes that are primary or foreign key.

4.2 Functional Dependencies (FD) • A functional dependency, denoted by X  Y (Read X functionally determines Y), between two sets of attributes X and Y that are subsets of R specifies a constraint on possible tuples that can be form a relation state r of R. • The constraint is that for for all pairs of tuples t1 and t2 in r such that t1 [X] = t2 [X] , they must also have, t1 [Y] = t2 [Y]. In other words Whenever two tuples of r agree on their X value, they also agree on their Y value.

Functional Dependency • Main concept associated with normalization. • Functional Dependency • Describes relationship between attributes in a relation. • If A and B are attributes of relation R, B is functionally dependent on A (denoted A  B), if each value of A in R is associated with exactly one value of B in R.

Functional Dependency • Diagrammatic representation: • Determinant of a functional dependency refers to attribute or group of attributes on left-hand side of the arrow.

Example - Functional Dependency

Functional Dependencies (Cont.) • K is a superkey for relation schema R if and only if K R • K is a candidate key for R if and only if • K R, and • for no   K,  R • Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the schema: bor_loan = (customer_id, loan_number, amount ). We expect this functional dependency to hold: loan_numberamount but would not expect the following to hold: amount customer_name

Functional Dependencies (Cont.) • Main use of FD is to describe further relation schema R by specifying constraints on its attributes that must hold all times. • Certain FD can be specified without referring to specific relation. • {state, driving_licence} ENO • {pincode) area • {telephone code}city

Functional Dependencies (Cont.) ENOEname Pnumber{pname,plocation} {eno,pnumber}hours • Eno uniquely determines emp name • Pnumber uniquely determines project name & location • Combination of eno , pnumber uniquely determines numbers of hours that employee had worked on that project. • FD plays a key role in differentiating good DB design from bad DB design.

EXAMPLE :: TEACH • Possible FD’s • TEXTCOURSE hold • But, • TEACHERCOURSE is ruled out.

Use of Functional Dependency • To test relations to see whether they are legal under a given set of functional dependencies. If a relation r is legal under a set F of FD’s, we say that r satisfies F. • To specify constraints on the set of legal relations. If we wish to constrain ourselves to relations on schema R that satisfy a set F of functional dependencies, we say that F holds on K.

FD – A Few More Examples • Suppose one is designing a system to track vehicles and the capacity of their engines. Each vehicle has a unique vehicle identification number(VIN). One would write VIN → EngineCapacity because it would be inappropriate for a vehicle's engine to have more than one capacity. (Assuming, in this case, that vehicles only have one engine.) • However, EngineCapacity → VIN, is incorrect because there could be many vehicles with the same engine capacity.

Trivial and Non Trivial FD • Trivial FD : A FD X->Y is trivial if Y, the right hand side of the functional dependency is a subset of X. • Eg. : A FD • {EmpID, EmpAddress}->{EmpAddress} is trivial, as {EmpAddress} is a subset of {EmpID,EmpAddress}. • Non Trivial FD : A FD is called Nontrivial if Y is not a subset of X. • Eg. : A FD • {EmpID,EmpAddress}->{EmpPhone} is non trivial, as {EmpPhone} is not subset of {EmpID,EmpAddress}

Closure • The set of all FDs that include F as well as all dependencies that are implied by a given set F of FDs is called the closure of F, denoted by F+. • F= ENO{ ENAME,DOB,ADDRESS,DNUM} DNUMBER{DNAME,MGRNO} SOME ADDITIONAL FD’S ARE ENO DNAME,MGRNO DUMBERDNAME

Inference rules for FD Or AXIOMS • Reflexive: if B is a subset of A, then A  B. • Augmentation: if A  B then AC  BC • Transitivity: it A  B and B  C then A  C. • Self – determination: A  A. • Decomposition: If A  BC, then AB, AC. • Union: it A  B and A  C, then A  BC • Composition: if A  B, C  D then AC  BD. • Pseudo transitive : if A  B and rBC then Ar C.

ARMSTRONG AXIOMS • First three axioms • Reflexive: if B is a subset of A, then A  B. • Augmentation: if A  B then AC  BC • Transitivity: it A  B and B  C then A  C. are sound & complete By sound, we mean that given a set of FD on relation R, any dependency that can infer from F holds in every reln satisfies the dependencies. They do not generate incorrect FD. By complete, we mean that using 3 FC repeatedly to a complete set of all possible dependencies that can be inferred from F.

Example: find closure of F • R = (A, B, C, G, H, I)F = { A BA CCG HCG IB H} • some extra members of F+ • A H • by transitivity from A B and B H • AG I • by augmentingA C with G, to get AG CG and then transitivity with CG I • CG HI • by unionCG I & CG H • OR • by augmentingCG I to infer CG  CGI, and augmenting of CG H to inferCGI HI, and then transitivity

Closure of Attribute Sets • Given a set of attributes a, define the closureof aunderF (denoted by a+) as the set of attributes that are functionally determined by a under F • Algorithm to compute a+, the closure of a under F result := a; while (changes to result) do for each  in F do begin if  result then result := result  end

Example of Attribute Set Closure • R = (A, B, C, G, H, I) • F = {A BA C CG HCG IB H} • (AG)+ 1. result = AG 2. result = ABCG (A C and A  B) 3. result = ABCGH (CG H and CG  AGBC) 4. result = ABCGHI (CG I and CG  AGBCH)

Canonical Cover • Sets of functional dependencies may have redundant dependencies that can be inferred from the others • For example: A  C is redundant in: {AB, BC} • Parts of a functional dependency may be redundant • E.g.: on RHS: {AB, BC, ACD} can be simplified to {A B, BC, AD} • E.g.: on LHS: {A B, BC, ACD} can be simplified to {A B, BC, AD} • Intuitively, a canonical cover of F is a “minimal” set of functional dependencies equivalent to F, having no redundant dependencies or redundant parts of dependencies

Extraneous Attributes • Consider a set F of functional dependencies and the functional dependency   in F. • Attribute A is extraneous in  if A   and F logically implies (F – {})  {( – A) }. • Attribute A is extraneous in  if A  and the set of functional dependencies (F – {})  {(– A)} logically implies F. • Note: implication in the opposite direction is trivial in each of the cases above, since a “stronger” functional dependency always implies a weaker one • Example: Given F = {AC, ABC } • B is extraneous in AB C because {AC, AB C} logically implies AC (I.e. the result of dropping B from AB C). • Example: Given F = {AC, ABCD} • C is extraneous in ABCD since AB C can be inferred even after deleting C

Testing if an Attribute is Extraneous • Consider a set F of functional dependencies and the functional dependency   in F. • To test if attribute A   is extraneousin • compute ({} – A)+ using the dependencies in F • check that ({} – A)+ contains ; if it does, A is extraneous in • To test if attribute A  is extraneous in  • compute + using only the dependencies in F’ = (F – {})  {(– A)}, • check that + contains A; if it does, A is extraneous in 

Canonical Cover • A canonical coverfor F is a set of dependencies Fc such that • F logically implies all dependencies in Fc, and • Fclogically implies all dependencies in F, and • No functional dependency in Fccontains an extraneous attribute, and • Each left side of functional dependency in Fcis unique. • To compute a canonical cover for F:repeat Use the union rule to replace any dependencies in F11 and 12 with 112 Find a functional dependency  with an extraneous attribute either in  or in  If an extraneous attribute is found, delete it from until F does not change • Note: Union rule may become applicable after some extraneous attributes have been deleted, so it has to be re-applied

Computing a Canonical Cover • R = (A, B, C)F = {A BC B C A BABC} • Combine A BC and A B into A BC • Set is now {A BC, B C, ABC} • A is extraneous in ABC • Check if the result of deleting A from ABC is implied by the other dependencies • Yes: in fact, BC is already present! • Set is now {A BC, B C} • C is extraneous in ABC • Check if A C is logically implied by A B and the other dependencies • Yes: using transitivity on A B and B  C. • Can use attribute closure of A in more complex cases • The canonical cover is: A B B C

e.g. • Now we define a set of FD to be irreducible as minimal; if and only if it satisfies the following two properties. (1) The right hand side of every FD in S involve just one attribute (i.e., it is a singleton set) (2) The left hand side of every FD in S is irreducible in turn meaning that no attribute can be discarded from the determinant without changing the CLOSURE S+.

Example • A  BC, • B  C • A  B • AB  C • AC  D Compute an irreducible set of FD that is equivalent to this given set. Bring answer in reducible form

Solution (1) The step is to rewrite the FD such that each has a singleton right hand side. • A  B • A  C • B  C • A  B • AB  C • AC  D We observe that the FD A  B occurs twice. So one occurrence will be eliminated. A  BC, B  C A  B AB  C AC  D

Solution • Next, attributed C can be eliminated from the left hand side of the FD AC  D • Because we have A  C, • By augmentation AA AC • A AC (Augmentation: if X  Y then XZ  YZ) • And we are given AC  D, • A AC AC D • So A  D by transitivity; Thus C on the left hand side is redundant. A  C B  C A  B AB  C AC  D

Solution 3. Next, we observe that the FD AB  C can be eliminated, because again we have A  C By augmentation AB  CB By decomposition AB  C AB  B 4. Finally, the FD A  C is implied by the FD A  B and B  C, by transitivity so it can be eliminated. Now we have A  B B  C A  D This set is irreducible. A  C B  C A  B AB  C A  D

Key Features of Good Relational Database Design and Functional Dependencies