1 / 66

Database

Database. Functional Dependencies. Designing Good Schemas. We know how to create schemas, but ... how do we create good schemas? what does good mean? Schema quality measurements: semantics of the attributes minimal redundancy minimal frequency of null values. Functional Dependences.

troup
Download Presentation

Database

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Database Functional Dependencies

  2. Designing Good Schemas • We know how to create schemas, but ... • how do we create good schemas? • what does good mean? • Schema quality measurements: • semantics of the attributes • minimal redundancy • minimal frequency of null values

  3. Functional Dependences • A column Y of relational table R is functionally dependent up on column X of relational table R if and only if: Each value of X in R associated with each value of Y at any given time

  4. Functional Dependencies Definition: A1, ..., Am  B1, ..., Bn holds in R if: t, t’  R, (t.A1=t’.A1  ...  t.Am=t’.Am  t.B1=t’.B1  ...  t.Bm=t’.Bm ) R t if t, t’ agree here then t, t’ agree here t’

  5. Examples • EmpID Name, Phone, Position • Position Phone • but Phone Position EmpID Name Phone Position E0045 Smith 1234 Clerk E1847 John 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer

  6. Because name -> favBeer Because name -> addr Because beersLiked -> manf Example Data name addr beersLiked manf favBeer Janeway Voyager Bud A.B. WickedAle Janeway Voyager WickedAle Pete’s WickedAle Spock Enterprise Bud A.B. Bud

  7. Example Drinkers(name, addr, beersLiked, manf, favoriteBeer) • Reasonable FD's to assert: 1. name  addr 2. name  favoriteBeer 3. beersLiked  manf

  8. Functional dependences • Y is functional dependent up on X same as values of X identify values of Y • If X  Y then XZYZ • IF XY and Y  Z then XZ • X Y means that Y depend on X or X identify Y

  9. Examples • S#  Ename • {S#, P#}  Hours • If for each value of S#, there are exactly one corresponding value for sname, state, city then: S# Sname Sate City

  10. Example • If {S#, p#}  Qty S# P# QTY

  11. Redundancy Example • Where’s the redundancy?

  12. Redundancy Example

  13. Example FDs Proper FDs Transitive FDs Partial Key FD Partial Key FDs

  14. Example • R = (A, B, C, G, H, I)F = { A BA CCG HCG IB H} • some members of F+ • A H • by transitivity from A B and B H • AG I • by augmenting A C with G, to get AG CG and then transitivity with CG I • CG HI • by augmenting CG I to infer CG  CGI, and augmenting of CG H to inferCGI HI, and then transitivity

  15. Formal definition of a key • A key is a set of attributes A1, ..., An s.t. for any other attribute B, A1, ..., An B • A minimal key is a set of attributes which is a key and for which no subset is a key • Note: book calls them superkey and key

  16. Where Do Keys Come From? • We could simply assert a key K. Then the only FD’s are K -> A for all atributes A, and K turns out to be the only key obtainable from the FD’s. • We could assert FD’s and deduce the keys by systematic exploration. • E/R gives us FD’s from entity-set keys and many-one relationships.

  17. Examples of Keys • Product(name, price, category, color) name, category  price category  color Keys are: {name, category} and all supersets • Enrollment(student, address, course, room, time) student  address room, time  course student, course  room, time Keys are: [in class]

  18. Example 2 • Keys are {Lastname, Firstname} and {StudentID} Lastname Firstname Student ID Major Key Key (2 attributes) Superkey Note: There are alternate keys

  19. Person name ssn address Finding the Keys of a Relation Given a relation constructed from an E/R diagram, what is its key? Rules: 1. If the relation comes from an entity set, the key of the relation is the set of attributes which is the key of the entity set. Person(address, name, ssn)

  20. Finding the Keys Rules: 2. If the relation comes from a many-many relationship, the key of the relation is the set of all attribute keys in the relations corresponding to the entity sets name buys Person Product price name ssn date buys(name, ssn, date)

  21. Product Purchase Store CreditCard Person Finding the Keys Except: if there is an arrow from the relationship to E, then we don’t need the key of E as part of the relation key. sname name card-no ssn Purchase(name , sname, ssn, card-no)

  22. Expressing Dependencies Say: “the CreditCard determines the Person” Product sname Purchase name Store Incomplete(what doesit say ?) card-no CreditCard Person ssn Purchase(name , sname, ssn, card-no) card-no  name

  23. Enrollment(student, major, course, room, time) student  major major, course  room course  time What else can we infer ?

  24. Relational Schema Design(or Logical Design) Main idea: • Start with some relational schema • Find out its FD’s • Important also to look at inferred FD’s. • Use them to design a better relational schema

  25. Relational Schema Design Recall set attributes (persons with several phones): SSN Name, City, but not SSN  PhoneNumber Anomalies: • Redundancy = repeat data • Update anomalies = Fred moves to “Bellvue” • Deletion anomalies = Fred drops all phone numbers: what is his city ?

  26. Relation Decomposition Break the relation into two:

  27. name buys Person Product price name ssn Relational Schema Design Conceptual Model: Relational Model: plus FD’s Normalization: Eliminates anomalies

  28. Decompositions in General R(A1, ..., An) Create two relations R1(B1, ..., Bm) and R2(C1, ..., Cp) such that: B1, ..., Bm C1, ..., Cp= A1, ..., An and: R1 = projection of R on B1, ..., Bm R2 = projection of R on C1, ..., Cp

  29. Incorrect Decomposition • Sometimes it is incorrect: Decompose on : Name, Category and Price, Category

  30. Incorrect Decomposition When we put it back: Cannot recover information

  31. Normal Forms • Each normal form is a set of conditions on a schema that guarantees certain properties (relating to redundancy and update anomalies) • The two commonly used normal forms are third normal form (3NF) and Boyce-Codd normal form (BCNF)

  32. remove multi-valued attributes remove partial dependencies remove transitive dependencies 1NF 2NF 3NF remove remaining FD anomal dependencies remove multivalue dependencies remove remaining anomalies BCNF 4NF 5NF Normalization 0NF

  33. Goals of Normalization • Let R be a relation scheme with a set F of functional dependencies. • Decide whether a relation scheme R is in “good” form. • In the case that a relation scheme R is not in “good” form, decompose it into a set of relation scheme {R1, R2, ..., Rn} such that • each relation scheme is in good form • the decomposition is a lossless-join decomposition • Preferably, the decomposition should be dependency preserving.

  34. 1 NF • First normal form is • NO multi-valued attributes • No composite attribute • No nested relation We create new table or new field (telephone, visiting)

  35. 1NF Normalization Proper translation from ER multi-value attributes will achieve 1NF. Still not a good solution,since we have redundancy in Dnumber and Dmgr_ssn.(This will be handled by 2NF.)

  36. 2 NF form • Second normal form that if primary key is multiple attribute and non-key attribute depend on part of primary key Hours Cname Loc S# P# pname

  37. 2NF Normalization Move the partial key and dependent attributes to a new relation.

  38. Transitive Dependencies • X → Y is a transitive dependency (PD) if there exists Z ⊈ any key such that X → Z → Y • TDs can cause redundancy if there are multiple values of X that determine the same value of Z • the value of Y for that value of Z is stored multiple times • 3NF normalization: move (Z,Y) to new relation in which Z is the primary key

  39. 3 NF • The relation in 3NF if it is 2 NF and every non-key attribute is non-transitively dependent on primary key

  40. 3NF Normalization • Create new relation to hold the attributes in the transitive FD. • LHS of transitive FD becomes PK of new relation.

  41. DEPT COURSE SECTION ROOM INSTR I_OFFICE DEPT COURSE SECTION ROOM INSTR I_OFFICE COMP 51 1 WPC122 DOHERTY CSB109 COMP 51 2 WPC219 CLIBURN CSB107 COMP 163 1 WPC122 DOHERTY CSB109 COMP 53 1 WPC130 CSB108 BOWRING COMP 53 2 WPC130 CSB104 CARMAN Transitive Dependency Example I_OFFICE (instructor's office) is determined by the non-PK attribute INSTR

  42. DEPT COURSE SECTION ROOM INSTR I_OFFICE DEPT COURSE SECTION ROOM INSTR INSTR I_OFFICE NF Decomposition: Foreign Keys Decomposition:

  43. 3NF Example • Relation dept_advisor: • dept_advisor (s_ID, i_ID, dept_name)F = {s_ID, dept_name  i_ID, i_ID  dept_name} • Two candidate keys: s_ID, dept_name, and i_ID, s_ID • R is in 3NF • s_ID, dept_name  i_IDs_ID • dept_name is a superkey • i_ID  dept_name • dept_name is contained in a candidate key

  44. Redundancy in 3NF • There is some redundancy in this schema • Example of problems due to redundancy in 3NF • R = (J, K, L)F = {JK L, L K } J L K j1 j2 j3 null l1 l1 l1 l2 k1 k1 k1 k2 • repetition of information (e.g., the relationship l1, k1) • (i_ID, dept_name) • need to use null values (e.g., to represent the relationshipl2, k2 where there is no corresponding value for J). • (i_ID, dept_nameI) if there is no separate relation mapping instructors to departments

  45. 3NF Decomposition: An Example • Relation schema: cust_banker_branch = (customer_id, employee_id, branch_name, type ) • The functional dependencies for this relation schema are: • customer_id, employee_id branch_name, type • employee_id  branch_name • customer_id, branch_name employee_id • We first compute a canonical cover • branch_name is extraneous in the r.h.s. of the 1st dependency • No other attribute is extraneous, so we get FC = customer_id, employee_id  type employee_id  branch_name customer_id, branch_name employee_id

  46. Normalization Goal = BCNF = Boyce-Codd Normal Form = all FD’s follow from the fact “key  everything.” • Formally, R is in BCNF if for every nontrivial FD for R, say XA, then X is a superkey. • “Nontrivial” = right-side attribute not in left side. Why? 1. Guarantees no redundancy due to FD’s. 2. Guarantees no update anomalies = one occurrence of a fact is updated, not all. 3. Guarantees no deletion anomalies = valid fact is lost when tuple is deleted.

  47. Boyce-Codd Normal Form A relation schema R is in BCNF with respect to a set F of functional dependencies if for all functional dependencies in F+ of the form  where  R and  R,at least one of the following holds: •  is trivial (i.e.,  ) •  is a superkey for R Example schema not in BCNF: instr_dept (ID, name, salary, dept_name, building, budget ) because dept_namebuilding, budget holds on instr_dept, but dept_name is not a superkey

  48. Third Normal Form • A relation schema R is in third normal form (3NF) if for all:  in F+at least one of the following holds: • is trivial (i.e.,  ) •  is a superkey for R • Each attribute A in  –  is contained in a candidate key for R. (NOTE: each attribute may be in a different candidate key) • If a relation is in BCNF it is in 3NF (since in BCNF one of the first two conditions above must hold). • Third condition is a minimal relaxation of BCNF to ensure dependency preservation (will see why later).

  49. Boyce-Codd Normal Form • Sample data for Course Section table • Because Prefix  Department, we know that (Prefix, Num, SecNum) could also be a primary key for this table.

  50. Example Students(name, addr, phones, CarLiked) • A student’s phones are independent of the cars they like. • Thus, each of a student’s phones appears with each of the cars they like in all combinations. • This repetition is unlike redundancy due to FD’s, of which name->addr is the only one.

More Related