1 / 55

Functional Dependencies and Normalization Chapter 15 

Functional Dependencies and Normalization Chapter 15 .  Relation Schema Goodness. Logical level - relations and views Storage level - relations as files  Placing one set of attributes in a table is better than placing them in other tables. Why?. Schema design.

Download Presentation

Functional Dependencies and Normalization Chapter 15 

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Functional Dependencies and Normalization Chapter 15 

  2.  Relation Schema Goodness • Logical level - relations and views • Storage level - relations as files •  Placing one set of attributes in a table is better than placing them in other tables. Why?

  3. Schema design • Design the schema so it is easy to explain the semantics • semantics: the meaning associated with the attributes • Want to minimize: • storage space • redundant information

  4.     Semantics • Do not combine attributes from > 1 entity/relationship type Fig 15.3 • Reduce the redundant values • Design schema so no anomalies occur • Update anomalies: insert, delete, update

  5. Update Anomalies • Insertion • if insert new employee into EMP_DEPT and no department yet? Fig 15.3 • If create a new department if no employee? • If add employee in department? • Deletion • If delete last employee of a department? • Modification • If change the values of a particular department?

  6. Performance • Design schemas so no anomalies occur but what about performance? • Must always do join between employee and department • In general it is best if specify joins as views so anomaly free • If really large tables, may have to rethink this … • Think about why NoSQL DBs do not have a join

  7. Functional Dependencies What is the most importance concept in relational schema design?     Functional Dependencies • Formal concepts and theory to define goodness of relational schemas • Functional dependency FD between 2 sets of attributes as: X → Y • Constraint on the possible tuples that can form a relation instance

  8.  Functional Dependencies X → Y means: • X functionally determines Y • Y depends on X • Values of Y component depend on, determined by values of X component

  9. Functional Dependencies Given t1 and t2: • if t1[X] = t2[X] then t1[Y] = t2[Y] (1) • In other words if the values of X are equal, then Y values are equal • Values of X component uniquely (functionally) determine values of Y component iff (1)

  10. Example for example: city, address → zipcode • ssn → name • if X is a candidate key implies X → Y • if X → Y, does this imply Y → X? • don’t know - FD is a property of semantics • dependency is a constraint • if satisfy FD, instances are legal relation instances (extension)

  11. FDs - set F • describes a relation instance • constraints must hold at all times • property of relation schema not a particular extension • therefore, it cannot be automatically deduced, it must be defined explicitly by designer

  12. Normalization • Normalization of data - method for analyzing schemas based on FDs • Objectives of normalization • good relation schemas disallowing update anomalies • Unsatisfactory schemas decomposed into smaller ones with desirable properties – This means tables are divided up into smaller tables

  13. Formal framework • database normalized to any degree (1, 2, 3, 4, 5, etc.) • normalization is not done in isolation • need: • dependency preservation • additional normal forms meet other desirable criteria • lossless join – will discuss later

  14. Normal Forms • 1st, 2nd, 3rd consider only FD and key constraints • constraints must not be hard to understand or detect • need not normalize to highest form (e.g. for performance reasons)

  15. 1NF - 1st normal form • part of the formal definition of a relation • disallow multivalued attributes, composite attributes and their combination • In 1NF single (atomic, indivisible) values

  16. Example: • There are 2 ways to look at: DEPARTMENT (dnumber, dlocations) Fig. 15.9. • dlocations is a set of values • dnumber → dlocations, but dlocations is not in 1NF • dlocations atomic values • dnumber does not functionally determine dlocations • Two different tuples with dnumber=5 can have different values for dlocation= Bellaire or Sugarland or Houston

  17. How to resolve this? These are the choices: • Nested relation - multivalued composite attributes Fig. 15.9 • research attempts to allow and formalize nested relations • Oracle allows it • Normalize it to 1NF

  18. How to normalize nested relations into 1NF?

  19. Normalize into 1NF • How to normalize nested relations into 1NF? • Remove nested relation (or set-valued) attributes into new relation • propagate PK • combine PK and partial PK (nested attribute) • recursively unnest - multilevel nesting • useful in converting hierarchical schemes into 1NF

  20.  Difficulties with 1NF • insert, delete, update •  Determine if describe entity identified by PK? • If not, called non-full FDs • We need full FDs for good inserts, deletes, updates

  21.  Second Normal Form - 2NF • Uses the concepts of FDs, PKs and this definition: • An FD is a Full functional dependency if: given Y → Z Removal of any attribute from Y means the FD does not hold any more

  22. 2NF – Partial Dependency • Examples: Fig. 15.11 {ssn, pnumber} → hours is a full FD since neither • ssn → hours nor pnumber → hours holds • Partial Dependency • {ssn, pnumber} → ename is not a full FD  it is a partial dependency since • ssn → ename also holds

  23. 2NF • A relation schema R is in 2NF if: • Relation is in 1NF • Every non-prime attribute A in R is not partially dependent on any key Definition: Prime attribute - attribute that is a member of the primary key K • In other words – No partial dependencies

  24. Remove partial dependencies: How?

  25. Solution • R can be decomposed into 2NF relations via the process of 2NF normalization • Remove partial dependencies by: How? • From original table, remove attribute(s) that is partially dependent • Place all attributes participating in partial dependency in separate table this includes part of the primary key • Result is 2 new relations where partials are now full

  26. 2NF – Formal definition • The above definition considers the primary key only (which is > 1 column) • The following more general definition takes into account relations with multiple candidate keys • A relation schema R is in 2NF if every non-prime attribute A in R is not partially dependent on any key (including candidate keys of R) Fig. 15.12 • County_name and lot# are candidate keys

  27. 2NF problems: • Even if no partial dependencies problems with insert, delete, modify • Why? • Transitive dependencies • Given a set of attributes Z, where Z is not a subset of any key and • X is a key • Both X → Z and Z → Y • then we have a transitive dependency

  28. Examples of Transitive FDs • Examples: Fig 15.11 ssn → dmgrssn is a transitive FD since ssn → dnumber and dnumber → dmgrssn Also, ssn → dnumber and dnumber → dname ssn → ename is non-transitive since there is no set of attributes X where ssn → x and x → ename

  29.  3rd Normal Form (3NF) • No non-prime attribute is transitively dependent on a primary key and the table is in 2NF • intuitively, this means we need independent entity facts steps for normalization • disallow partial and transitive dependency on primary keys

  30. 3NF  • A relation schema R is in 3NF if: • it is in 2NF • no non-prime attribute A in R is transitively dependent on the primary key • In other words – no transitive dependencies • R can be decomposed into 3NF relations via the process of 3NF normalization • Which is?

  31. Alternative notation RecruiterID,City, State → NoOfRecruits RecruiterID → RecruiterName RecruiterID → StatusID RecruiterID → Status StatusID → Status City, state → CityPopulation State → StatePopulation

  32. 3NF • Formal Definition: • a superkey of relation schema R - a set of attributes S of R that contains a key of R • A relation schema R is in 3NF if whenever X -> A  holds in R • then either a) X is a superkey of R or b) A is a prime attribute of R a) means every non-prime attribute is fully functionally dependent on every key b) means no transitive dependencies on any key  Fig.15.12

  33. Normal forms: • Each normal form is strictly stronger than the previous one: • every 2NF relation is in 1NF • every 3NF relation is in 2NF

  34. Additional normal forms: • 4NF - based on multi-valued dependencies • No table may contain 2 or more 1:N or N:M relationships that are not directly related ename - > pname, ename -> dep_name • 5NF - based on join dependencies as a way to decompose relations • If can’t decompose with lossless join into 2 tables, decompose into 3 tables Join dependency: Supplier, part, project Must join all of these to get a valid tuple

  35. Decomposition • Relational database schema design is synthesis and decomposition • synthesis - grouping attributes together • decomposition - avoiding transitive and partial dependencies • strict decomposition - start with a universal relation OR • ER model mapped to a set of relations using the rules • Maps to 3NF

  36. Additional Design Considerations - Reduce nulls • Avoid placing attributes in a base relation whose values may be null for a majority of tuples • If use null values can mean different things • "fat" tuples - if many attributes and lots of nulls wastes space • Aggregate functions are a problem with nulls

  37. Disallow spurious tuples • Spurious tuples represent incorrect information that is not valid • Result of joins with equality conditions on attributes that are not PKs or FKs • Design relations so there can be an equijoin with a PK and a FK or no spurious tuples  • Lossless join guarantees no spurious tuples Fig 15.5, 15.6join on plocation

More Related