Lecture 22
Download
1 / 29

- PowerPoint PPT Presentation


  • 268 Views
  • Uploaded on

Lecture 22. Functional Dependencies (FDs) and Normalization. Schedule Change. http://www.users.csbsju.edu/~irahal/. The Database Design Process. Conceptual design : Use a data model language to come up with an accurate, high-level description of the system requirements

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about '' - flora


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Lecture 22 l.jpg

Lecture 22

Functional Dependencies (FDs) and

Normalization


Schedule change l.jpg
Schedule Change

  • http://www.users.csbsju.edu/~irahal/


The database design process l.jpg
The Database Design Process

  • Conceptual design:

    • Use a data model language to come up with an accurate, high-level description of the system requirements

    • Words (unstructured)  Diagrams

  • Logical design:

    • Map the resulting EERD into a set of relations

    • Diagram  Relations

  • Physical design:

    • Use DDL on some DBMS to create tables corresponding to your relations


The database design process4 l.jpg
The Database Design Process

  • Limitations of E-R Designs

    • The EER model provides a set of guidelines

      • Does not result in a unique database schema

    • Does not provide a “formal” way of evaluating alternatives

    • Relies largely on the common sense of the designer

  • Here we try to answer

    • What are the criteria for "good" base relations?

    • Meaningful grouping of attributes

  • When designing a relation schema, how to decide which attributes to include?

    • So far, attributes are grouped to form the relation schema by using the common sense of the database designer


The database design process5 l.jpg
The Database Design Process

  • First discuss informal guidelines for good relational design

  • Then we discuss formal concepts of functional dependencies and normal forms

    - 1NF (First Normal Form)

    - 2NF (Second Normal Form)

    - 3NF (Third Normal Form)

    - BCNF (Boyce-Codd Normal Form)

  • Additional types of dependencies, further normal forms, relational design algorithms by synthesis are discussed in Chapter 11


Relation schema informal measures l.jpg
Relation Schema Informal Measures

  • We have some informal measures:

    • Semantics of the attributes

    • Reducing the redundancy values in tuples

    • Disallowing the possibility of generating spurious tuples

    • Reducing null values

  • Not always independent of one another


Semantics of the relation attributes 1 l.jpg
Semantics of the Relation Attributes (1)

  • Any grouping of attributes to form a relation schema must portray a certain real-world meaning

  • Each tuple in a relation should represent one entity or relationship instance

  • Guideline 1: Design a relation schema so that it is easy to explain its meaning

    • Semantics of attributes should be easy to interpret

    • Attributes of different entities should not be mixed

    • Only foreign keys should be used to refer to other entities




Redundant information 2 l.jpg
Redundant Information (2) from different relations

  • One goal of schema design is to reduce redundancy

    • Information is stored redundantly wasting storage

    • Problems with update anomalies

      • Modification anomalies

      • Insertion anomalies

      • Deletion anomalies

  • Mixing attributes of multiple entities may cause the above problems


Slide11 l.jpg

  • Update anomalies from different relations

    • Modification Anomalies

      • Update PNAME from ‘ProductY’ to ‘Customer-Accounting’

    • Insert Anomalies

      • Insert a new employee not assigned to known project

      • Insert a new project with no working employees

    • Delete Project

      • Delete PNUMBER=2

      • Delete the sole employee of a project


Modification anomalies l.jpg
Modification Anomalies from different relations

  • Consider the relation:

    • EMP_PROJ ( Emp#, Proj#, No_hours, Ename, Pname, Plocation)

  • Modification Anomaly:

    • Changing the name of project number P2 from “Project Y” to “Customer-Accounting”

    • May cause this update to be made for all employees working on project P2 otherwise the DB will become inconsistent


Modification anomalies13 l.jpg
Modification Anomalies from different relations

  • Consider the relation:

    • EMP_PROJ( Emp#, Proj#, No_hours, Ename, Pname, Plocation)

  • Insert Anomaly: Cannot insert a project unless an employee is assigned to

    • Inversely - Cannot insert an employee unless he/she is assigned to a project.

  • Delete Anomaly: When a project is deleted  delete all the employees who work on the project

    • Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project


Guidelines to redundant information in tuples and update anomalies l.jpg
Guidelines to Redundant Information in Tuples and Update Anomalies

  • Guideline 2: Design a schema that does not suffer from the insertion, deletion and update anomalies

  • If there are any present, then note them so that applications can be made to take them into account

    • Might need to break the guidelines to improve performance for certain queries

    • Assume that we always access employee information only with department information

      • The design EMP_PROJ (Emp#, Proj#, No_hours, Ename, Pname, Plocation)might be could for such cases


Example l.jpg

SSN Name Address Hobby Anomalies

1111 Joe 123 Main biking

1111 Joe 123 Main hiking

…………….

Redundancy

Example

ER Model

SSN Name Address Hobby

1111 Joe 123 Main {biking, hiking}

Relational Model (SSN, Hobby, Name, Address)


Example16 l.jpg
Example Anomalies

  • Redundancy leads to anomalies:

    • A change in Address must be made in several places

    • Suppose a person gives up all hobbies. Do we:

      • Set Hobby attribute to null? No, since Hobby is part of key

      • Delete the entire row? No, since we lose other information in the row

        • No hobby information?

    • Hobby value must be supplied for any inserted row since Hobby is part of key


Decomposition l.jpg
Decomposition Anomalies

  • Solution: use two relations to store Person information

    • Person1 (SSN, Name, Address)

    • Hobbies (SSN, Hobby)

  • People with/without hobbies can now be described

  • No update anomalies:

    • Name and address stored once

    • A hobby can be separately supplied or deleted


Spurious tuples 3 l.jpg
Spurious Tuples Anomalies (3)

  • Bad designs for a relational database (or bad decompositions) may result in erroneous results for certain JOIN operations

  • Any decomposition MUST have the "lossless join" property

    • Nospurious tuplesshould be generated by doing a natural-join of any decomposed relations

      • Person1 (SSN, Name, Address)

      • Hobbies (SSN, Name)

    • Here, “loss” relates to loss of information


Spurious tuples 319 l.jpg
Spurious Tuples (3) Anomalies

  • Suppose we replace

    • EMP_PROJ ( SSN, PNUMBER, ENAME, PNAME, PLOCATION, HOURS) by

    • EMP_PROJ1(SSN, PNUMBER, PNAME, PLOCATION, HOURS) ANDEMP_LOCS (ENAME, PLOCATION)

  • Guideline 3: The relations should be designed to satisfy the lossless join condition

    • Avoid relations that contain matching attributes that are not (foreign key, primary key) combinations


Null values in tuples 4 l.jpg
Null Values in Tuples Anomalies (4)

  • Guideline 4: Relations should be designed such that their tuples will have as few NULL values as possible

    • Make sure only NULLs are exceptional cases

  • If many attributes do not apply to all tuples in the relation, we end up with many nulls

    • Waste space

    • Ambiguity in meaning

      • Attribute not applicable or invalid

      • Value known to exist, but unavailable

      • Attribute value unknown (may or may not exist)

    • Difficulty specifying JOIN operations (inner or outer joins)

  • Attributes that are NULL frequently could be placed in separate relations (with the primary key)


Functional dependencies l.jpg
Functional Dependencies Anomalies

  • Functional dependencies (FDs) are used to specify formal measures of the "goodness" of a relational database design

  • FDs are constraints that are derived from the meaning and interrelationships of the data attributes

  • An FD is a constraint between two sets of attributes X and Y

    • X  Y holds if whenever two tuples have the same value for X, they must have the same value for Y

    • For any two tuples t1 and t2 in any relation instance r(R):If t1[X]=t2[X], then t1[Y]=t2[Y]


Functional dependencies22 l.jpg
Functional Dependencies Anomalies

  • X Y: A set of attributes X functionally determines a set of attributes Y (or Y is functionally determined by X) if the value of X determines a unique value for Y

  • X Y in R specifies a constraint on all relation instances r(R)

    • FDs are derived from the real-world constraints on the attributes

    • Property of the intentionof the database

  • An FD is a property of the attributes in the schema R

    • The constraint must hold on every relation instance r(R)

    • Can NEVER be deduced from an extension

      • E.g. if in some case, all people having the same first name are registered for the same course, can we deduce that name  course?


Examples of fd constraints l.jpg
Examples of FD Constraints Anomalies

  • EMP_PROJ (SSN, PNUMBER, ENAME, PNAME, PLOCATION, HOURS)

  • social security number determines employee name

    • SSN ENAME

  • project number determines project name and location

    • PNUMBER {PNAME, PLOCATION}

  • employee SSN and project number determines the hours per week that the employee works on the project

    • {SSN, PNUMBER} HOURS


More on fd constraints l.jpg
More on FD Constraints Anomalies

  • Definition of a relation KEY (If K is a key of R)

    • Kfunctionally determines all attributes in R

  • If XY is true, does that make YX true?

  • Some FDs are always true regardless of the relation in which they occur

    • {State, Driver_License_Number}  SSN

    • Zip  {City, State}


Inference rules for fds l.jpg
Inference Rules for FDs Anomalies

  • Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold

  • Armstrong's inference rules:

    • IR1. (Reflexive) If Y X, then X Y

      • (Generates trivial FDs)

      • E.g. SSN, ENAME  ENAME

    • IR2. (Augmentation) If X Y, then XZ YZ (Note that XZ stands for X U Z)

      • E.g. SSN ENAME, then {SSN, PNUMBER}{ENAME, PNUMBER}

    • IR3. (Transitive) If X Y and Y Z, then X Z

      • E.g. SSN DOB and DOB  horoscope sign then SSN  horoscope sign


Inference rules for fds26 l.jpg
Inference Rules for FDs Anomalies

  • IR1, IR2, IR3 form a sound and complete set of inference rules

    • Sound Any rule inferred using IR1, IR2 or IR3 a valid FD

    • Complete  All possible FDs can be generated using them

  • Some additional inference rules that are useful:

    • IR4. (Decomposition) If X YZ, then X Y and X Z

      • SSN  ENAME, DOB then SSN  DOB & SSN  ENAME

    • IR5. (Union) If X Y and X Z, then X YZ

      • SSN  DOB & SSN  ENAME then SSN  ENAME, DOB

    • IR6. (Pseudo-transitivity) If X Y and WY Z, then WX Z

      • Can be deduced from IR1, IR2, and IR3 (completeness property)

      • OfficeLocation Department & Department, Ename  Salary-level then

        OfficeLocation,Ename  Salary-level


Proofs l.jpg
Proofs Anomalies

  • IR1. (Reflexive) If Y X, then X Y

    • For any two tuples t1 and t2 with t1[X] = t2[X] then t1[Y] = t2[Y] because Y X

  • IR2. (Augmentation) If X Y, then XZ YZ

    • Proof by contradiction

      • If for two tuples t1 and t2 we have

        • (1) t1[X] = t2[X]

        • (2) t1[Y] = t2[Y]

        • (3) t1[XZ] = t2[XZ]

        • (4) t1[YZ] ≠ t2[YZ]

      • Can’t be true since from (1) and (3) we have (5) t1[Z] = t2[Z] and from (2) and (5) we have t1[YZ] = t2[YZ] which contradicts (4)


Proofs28 l.jpg
Proofs Anomalies

  • IR3. (Transitive) If X Y and Y Z, then X Z

    • For any two tuples t1 and t2 with t1[X] = t2[X] then t1[Y] = t2[Y] which implies that t1[Z] = t2[Z] hence X Z holds

  • IR4. (Decomposition) If X YZ, then X Y and X Z

    • X YZ

    • YZ  Y (Using IR1)

    • X  Y (Using IR3)

      • Similarly for X  Z


Proofs29 l.jpg
Proofs Anomalies

  • IR5. (Union) If X Y and X Z, then X YZ

    • X  Y

    • X  Z

    • X  XY (Using IR1)

    • XY  YZ (Using IR2)

    • X  YZ (Using IR3)

  • IR6. (Psuedotransitivity) If X Y and WY Z, then WX Z

    • X Y

    • WY  Z

    • WX  WY (Using IR2)

    • WX  Z (Using IR3)


ad