Functional Dependencies and Normalization Chapter 15

Functional Dependencies and Normalization Chapter 15

Relation Schema Goodness • Logical level - relations and views • Storage level - relations as files • Placing one set of attributes in a table is better than placing them in other tables. Why?

Schema design • Design the schema so it is easy to explain the semantics • semantics: the meaning associated with the attributes • Want to minimize: • storage space • redundant information

Semantics • Do not combine attributes from > 1 entity/relationship type Fig 15.3 • Reduce the redundant values • Design schema so no anomalies occur • Update anomalies: insert, delete, update

Update Anomalies • Insertion • if insert new employee into EMP_DEPT and no department yet? Fig 15.3 • If create a new department if no employee? • If add employee in department? • Deletion • If delete last employee of a department? • Modification • If change the values of a particular department?

Performance • Design schemas so no anomalies occur but what about performance? • Must always do join between employee and department • In general it is best if specify joins as views so anomaly free • If really large tables, may have to rethink this … • Think about why NoSQL DBs do not have a join

Functional Dependencies What is the most importance concept in relational schema design? Functional Dependencies • Formal concepts and theory to define goodness of relational schemas • Functional dependency FD between 2 sets of attributes as: X → Y • Constraint on the possible tuples that can form a relation instance

Functional Dependencies X → Y means: • X functionally determines Y • Y depends on X • Values of Y component depend on, determined by values of X component

Functional Dependencies Given t1 and t2: • if t1[X] = t2[X] then t1[Y] = t2[Y] (1) • In other words if the values of X are equal, then Y values are equal • Values of X component uniquely (functionally) determine values of Y component iff (1)

Example for example: city, address → zipcode • ssn → name • if X is a candidate key implies X → Y • if X → Y, does this imply Y → X? • don’t know - FD is a property of semantics • dependency is a constraint • if satisfy FD, instances are legal relation instances (extension)

FDs - set F • describes a relation instance • constraints must hold at all times • property of relation schema not a particular extension • therefore, it cannot be automatically deduced, it must be defined explicitly by designer

Normalization • Normalization of data - method for analyzing schemas based on FDs • Objectives of normalization • good relation schemas disallowing update anomalies • Unsatisfactory schemas decomposed into smaller ones with desirable properties – This means tables are divided up into smaller tables

Formal framework • database normalized to any degree (1, 2, 3, 4, 5, etc.) • normalization is not done in isolation • need: • dependency preservation • additional normal forms meet other desirable criteria • lossless join – will discuss later

Normal Forms • 1st, 2nd, 3rd consider only FD and key constraints • constraints must not be hard to understand or detect • need not normalize to highest form (e.g. for performance reasons)

1NF - 1st normal form • part of the formal definition of a relation • disallow multivalued attributes, composite attributes and their combination • In 1NF single (atomic, indivisible) values

Example: • There are 2 ways to look at: DEPARTMENT (dnumber, dlocations) Fig. 15.9. • dlocations is a set of values • dnumber → dlocations, but dlocations is not in 1NF • dlocations atomic values • dnumber does not functionally determine dlocations • Two different tuples with dnumber=5 can have different values for dlocation= Bellaire or Sugarland or Houston

How to resolve this? These are the choices: • Nested relation - multivalued composite attributes Fig. 15.9 • research attempts to allow and formalize nested relations • Oracle allows it • Normalize it to 1NF

How to normalize nested relations into 1NF?

Normalize into 1NF • How to normalize nested relations into 1NF? • Remove nested relation (or set-valued) attributes into new relation • propagate PK • combine PK and partial PK (nested attribute) • recursively unnest - multilevel nesting • useful in converting hierarchical schemes into 1NF

Difficulties with 1NF • insert, delete, update • Determine if describe entity identified by PK? • If not, called non-full FDs • We need full FDs for good inserts, deletes, updates

Second Normal Form - 2NF • Uses the concepts of FDs, PKs and this definition: • An FD is a Full functional dependency if: given Y → Z Removal of any attribute from Y means the FD does not hold any more

2NF – Partial Dependency • Examples: Fig. 15.11 {ssn, pnumber} → hours is a full FD since neither • ssn → hours nor pnumber → hours holds • Partial Dependency • {ssn, pnumber} → ename is not a full FD it is a partial dependency since • ssn → ename also holds

2NF • A relation schema R is in 2NF if: • Relation is in 1NF • Every non-prime attribute A in R is not partially dependent on any key Definition: Prime attribute - attribute that is a member of the primary key K • In other words – No partial dependencies

Remove partial dependencies: How?

Solution • R can be decomposed into 2NF relations via the process of 2NF normalization • Remove partial dependencies by: How? • From original table, remove attribute(s) that is partially dependent • Place all attributes participating in partial dependency in separate table this includes part of the primary key • Result is 2 new relations where partials are now full

2NF – Formal definition • The above definition considers the primary key only (which is > 1 column) • The following more general definition takes into account relations with multiple candidate keys • A relation schema R is in 2NF if every non-prime attribute A in R is not partially dependent on any key (including candidate keys of R) Fig. 15.12 • County_name and lot# are candidate keys

2NF problems: • Even if no partial dependencies problems with insert, delete, modify • Why? • Transitive dependencies • Given a set of attributes Z, where Z is not a subset of any key and • X is a key • Both X → Z and Z → Y • then we have a transitive dependency

Examples of Transitive FDs • Examples: Fig 15.11 ssn → dmgrssn is a transitive FD since ssn → dnumber and dnumber → dmgrssn Also, ssn → dnumber and dnumber → dname ssn → ename is non-transitive since there is no set of attributes X where ssn → x and x → ename

3rd Normal Form (3NF) • No non-prime attribute is transitively dependent on a primary key and the table is in 2NF • intuitively, this means we need independent entity facts steps for normalization • disallow partial and transitive dependency on primary keys

3NF • A relation schema R is in 3NF if: • it is in 2NF • no non-prime attribute A in R is transitively dependent on the primary key • In other words – no transitive dependencies • R can be decomposed into 3NF relations via the process of 3NF normalization • Which is?

Alternative notation RecruiterID,City, State → NoOfRecruits RecruiterID → RecruiterName RecruiterID → StatusID RecruiterID → Status StatusID → Status City, state → CityPopulation State → StatePopulation

3NF • Formal Definition: • a superkey of relation schema R - a set of attributes S of R that contains a key of R • A relation schema R is in 3NF if whenever X -> A holds in R • then either a) X is a superkey of R or b) A is a prime attribute of R a) means every non-prime attribute is fully functionally dependent on every key b) means no transitive dependencies on any key Fig.15.12

Normal forms: • Each normal form is strictly stronger than the previous one: • every 2NF relation is in 1NF • every 3NF relation is in 2NF

Additional normal forms: • 4NF - based on multi-valued dependencies • No table may contain 2 or more 1:N or N:M relationships that are not directly related ename - > pname, ename -> dep_name • 5NF - based on join dependencies as a way to decompose relations • If can’t decompose with lossless join into 2 tables, decompose into 3 tables Join dependency: Supplier, part, project Must join all of these to get a valid tuple

Decomposition • Relational database schema design is synthesis and decomposition • synthesis - grouping attributes together • decomposition - avoiding transitive and partial dependencies • strict decomposition - start with a universal relation OR • ER model mapped to a set of relations using the rules • Maps to 3NF

Additional Design Considerations - Reduce nulls • Avoid placing attributes in a base relation whose values may be null for a majority of tuples • If use null values can mean different things • "fat" tuples - if many attributes and lots of nulls wastes space • Aggregate functions are a problem with nulls

Disallow spurious tuples • Spurious tuples represent incorrect information that is not valid • Result of joins with equality conditions on attributes that are not PKs or FKs • Design relations so there can be an equijoin with a PK and a FK or no spurious tuples • Lossless join guarantees no spurious tuples Fig 15.5, 15.6join on plocation

Functional Dependencies and Normalization Chapter 15

Functional Dependencies and Normalization Chapter 15

Presentation Transcript

SE 4367 Functional Testing

Protein and its functional properties in food products

Craniometry and Functional Craniology

Chapter 7

Building Competitive Advantage through Functional Level Strategy

Functional Tests

Chapter 4: Functional Anatomy of Procaryotic and Eucaryotic Cells

Functional Skills Level 1 Maths revision

3NF and Boyce-Codd Normal Form

Functional Properties of Carbohydrate

Functional Behavior Assessments and Positive Behavior Support Plans

SQL Unit 15 Normalization

Chapter 48

Introduction to functional neuroimaging Didem Gökçay

Chapter 8

Database Management Systems

Functional Groups

CHAPTER Major Network Functional Architectures

ACE Personal Trainer Manual, 4 th edition Chapter 9:

Chapter 7 :: Topics

Functional and Activity-Based Budgeting

Functional Programming You Already Know