lecture 22 l.
Skip this Video
Loading SlideShow in 5 Seconds..
Lecture 22 PowerPoint Presentation
Download Presentation
Lecture 22

Loading in 2 Seconds...

play fullscreen
1 / 29

Lecture 22 - PowerPoint PPT Presentation

  • Uploaded on

Lecture 22. Functional Dependencies (FDs) and Normalization. Schedule Change. http://www.users.csbsju.edu/~irahal/. The Database Design Process. Conceptual design : Use a data model language to come up with an accurate, high-level description of the system requirements

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'Lecture 22' - flora

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
lecture 22

Lecture 22

Functional Dependencies (FDs) and


schedule change
Schedule Change
  • http://www.users.csbsju.edu/~irahal/
the database design process
The Database Design Process
  • Conceptual design:
    • Use a data model language to come up with an accurate, high-level description of the system requirements
    • Words (unstructured)  Diagrams
  • Logical design:
    • Map the resulting EERD into a set of relations
    • Diagram  Relations
  • Physical design:
    • Use DDL on some DBMS to create tables corresponding to your relations
the database design process4
The Database Design Process
  • Limitations of E-R Designs
    • The EER model provides a set of guidelines
      • Does not result in a unique database schema
    • Does not provide a “formal” way of evaluating alternatives
    • Relies largely on the common sense of the designer
  • Here we try to answer
    • What are the criteria for "good" base relations?
    • Meaningful grouping of attributes
  • When designing a relation schema, how to decide which attributes to include?
    • So far, attributes are grouped to form the relation schema by using the common sense of the database designer
the database design process5
The Database Design Process
  • First discuss informal guidelines for good relational design
  • Then we discuss formal concepts of functional dependencies and normal forms

- 1NF (First Normal Form)

- 2NF (Second Normal Form)

- 3NF (Third Normal Form)

- BCNF (Boyce-Codd Normal Form)

  • Additional types of dependencies, further normal forms, relational design algorithms by synthesis are discussed in Chapter 11
relation schema informal measures
Relation Schema Informal Measures
  • We have some informal measures:
    • Semantics of the attributes
    • Reducing the redundancy values in tuples
    • Disallowing the possibility of generating spurious tuples
    • Reducing null values
  • Not always independent of one another
semantics of the relation attributes 1
Semantics of the Relation Attributes (1)
  • Any grouping of attributes to form a relation schema must portray a certain real-world meaning
  • Each tuple in a relation should represent one entity or relationship instance
  • Guideline 1: Design a relation schema so that it is easy to explain its meaning
    • Semantics of attributes should be easy to interpret
    • Attributes of different entities should not be mixed
    • Only foreign keys should be used to refer to other entities
redundant information 2
Redundant Information (2)
  • One goal of schema design is to reduce redundancy
    • Information is stored redundantly wasting storage
    • Problems with update anomalies
      • Modification anomalies
      • Insertion anomalies
      • Deletion anomalies
  • Mixing attributes of multiple entities may cause the above problems

Update anomalies

    • Modification Anomalies
      • Update PNAME from ‘ProductY’ to ‘Customer-Accounting’
    • Insert Anomalies
      • Insert a new employee not assigned to known project
      • Insert a new project with no working employees
    • Delete Project
      • Delete PNUMBER=2
      • Delete the sole employee of a project
modification anomalies
Modification Anomalies
  • Consider the relation:
    • EMP_PROJ ( Emp#, Proj#, No_hours, Ename, Pname, Plocation)
  • Modification Anomaly:
    • Changing the name of project number P2 from “Project Y” to “Customer-Accounting”
    • May cause this update to be made for all employees working on project P2 otherwise the DB will become inconsistent
modification anomalies13
Modification Anomalies
  • Consider the relation:
    • EMP_PROJ( Emp#, Proj#, No_hours, Ename, Pname, Plocation)
  • Insert Anomaly: Cannot insert a project unless an employee is assigned to
    • Inversely - Cannot insert an employee unless he/she is assigned to a project.
  • Delete Anomaly: When a project is deleted  delete all the employees who work on the project
    • Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project
guidelines to redundant information in tuples and update anomalies
Guidelines to Redundant Information in Tuples and Update Anomalies
  • Guideline 2: Design a schema that does not suffer from the insertion, deletion and update anomalies
  • If there are any present, then note them so that applications can be made to take them into account
    • Might need to break the guidelines to improve performance for certain queries
    • Assume that we always access employee information only with department information
      • The design EMP_PROJ (Emp#, Proj#, No_hours, Ename, Pname, Plocation)might be could for such cases

SSN Name Address Hobby

1111 Joe 123 Main biking

1111 Joe 123 Main hiking




ER Model

SSN Name Address Hobby

1111 Joe 123 Main {biking, hiking}

Relational Model (SSN, Hobby, Name, Address)

  • Redundancy leads to anomalies:
    • A change in Address must be made in several places
    • Suppose a person gives up all hobbies. Do we:
      • Set Hobby attribute to null? No, since Hobby is part of key
      • Delete the entire row? No, since we lose other information in the row
        • No hobby information?
    • Hobby value must be supplied for any inserted row since Hobby is part of key
  • Solution: use two relations to store Person information
    • Person1 (SSN, Name, Address)
    • Hobbies (SSN, Hobby)
  • People with/without hobbies can now be described
  • No update anomalies:
    • Name and address stored once
    • A hobby can be separately supplied or deleted
spurious tuples 3
Spurious Tuples (3)
  • Bad designs for a relational database (or bad decompositions) may result in erroneous results for certain JOIN operations
  • Any decomposition MUST have the "lossless join" property
    • Nospurious tuplesshould be generated by doing a natural-join of any decomposed relations
      • Person1 (SSN, Name, Address)
      • Hobbies (SSN, Name)
    • Here, “loss” relates to loss of information
spurious tuples 319
Spurious Tuples (3)
  • Suppose we replace
  • Guideline 3: The relations should be designed to satisfy the lossless join condition
    • Avoid relations that contain matching attributes that are not (foreign key, primary key) combinations
null values in tuples 4
Null Values in Tuples (4)
  • Guideline 4: Relations should be designed such that their tuples will have as few NULL values as possible
    • Make sure only NULLs are exceptional cases
  • If many attributes do not apply to all tuples in the relation, we end up with many nulls
    • Waste space
    • Ambiguity in meaning
      • Attribute not applicable or invalid
      • Value known to exist, but unavailable
      • Attribute value unknown (may or may not exist)
    • Difficulty specifying JOIN operations (inner or outer joins)
  • Attributes that are NULL frequently could be placed in separate relations (with the primary key)
functional dependencies
Functional Dependencies
  • Functional dependencies (FDs) are used to specify formal measures of the "goodness" of a relational database design
  • FDs are constraints that are derived from the meaning and interrelationships of the data attributes
  • An FD is a constraint between two sets of attributes X and Y
    • X  Y holds if whenever two tuples have the same value for X, they must have the same value for Y
    • For any two tuples t1 and t2 in any relation instance r(R):If t1[X]=t2[X], then t1[Y]=t2[Y]
functional dependencies22
Functional Dependencies
  • X Y: A set of attributes X functionally determines a set of attributes Y (or Y is functionally determined by X) if the value of X determines a unique value for Y
  • X Y in R specifies a constraint on all relation instances r(R)
    • FDs are derived from the real-world constraints on the attributes
    • Property of the intentionof the database
  • An FD is a property of the attributes in the schema R
    • The constraint must hold on every relation instance r(R)
    • Can NEVER be deduced from an extension
      • E.g. if in some case, all people having the same first name are registered for the same course, can we deduce that name  course?
examples of fd constraints
Examples of FD Constraints
  • social security number determines employee name
    • SSN ENAME
  • project number determines project name and location
  • employee SSN and project number determines the hours per week that the employee works on the project
more on fd constraints
More on FD Constraints
  • Definition of a relation KEY (If K is a key of R)
    • Kfunctionally determines all attributes in R
  • If XY is true, does that make YX true?
  • Some FDs are always true regardless of the relation in which they occur
    • {State, Driver_License_Number}  SSN
    • Zip  {City, State}
inference rules for fds
Inference Rules for FDs
  • Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold
  • Armstrong's inference rules:
    • IR1. (Reflexive) If Y X, then X Y
      • (Generates trivial FDs)
      • E.g. SSN, ENAME  ENAME
    • IR2. (Augmentation) If X Y, then XZ YZ (Note that XZ stands for X U Z)
    • IR3. (Transitive) If X Y and Y Z, then X Z
      • E.g. SSN DOB and DOB  horoscope sign then SSN  horoscope sign
inference rules for fds26
Inference Rules for FDs
  • IR1, IR2, IR3 form a sound and complete set of inference rules
    • Sound Any rule inferred using IR1, IR2 or IR3 a valid FD
    • Complete  All possible FDs can be generated using them
  • Some additional inference rules that are useful:
    • IR4. (Decomposition) If X YZ, then X Y and X Z
      • SSN  ENAME, DOB then SSN  DOB & SSN  ENAME
    • IR5. (Union) If X Y and X Z, then X YZ
      • SSN  DOB & SSN  ENAME then SSN  ENAME, DOB
    • IR6. (Pseudo-transitivity) If X Y and WY Z, then WX Z
      • Can be deduced from IR1, IR2, and IR3 (completeness property)
      • OfficeLocation Department & Department, Ename  Salary-level then

OfficeLocation,Ename  Salary-level

  • IR1. (Reflexive) If Y X, then X Y
    • For any two tuples t1 and t2 with t1[X] = t2[X] then t1[Y] = t2[Y] because Y X
  • IR2. (Augmentation) If X Y, then XZ YZ
    • Proof by contradiction
      • If for two tuples t1 and t2 we have
        • (1) t1[X] = t2[X]
        • (2) t1[Y] = t2[Y]
        • (3) t1[XZ] = t2[XZ]
        • (4) t1[YZ] ≠ t2[YZ]
      • Can’t be true since from (1) and (3) we have (5) t1[Z] = t2[Z] and from (2) and (5) we have t1[YZ] = t2[YZ] which contradicts (4)
  • IR3. (Transitive) If X Y and Y Z, then X Z
    • For any two tuples t1 and t2 with t1[X] = t2[X] then t1[Y] = t2[Y] which implies that t1[Z] = t2[Z] hence X Z holds
  • IR4. (Decomposition) If X YZ, then X Y and X Z
    • X YZ
    • YZ  Y (Using IR1)
    • X  Y (Using IR3)
      • Similarly for X  Z
  • IR5. (Union) If X Y and X Z, then X YZ
    • X  Y
    • X  Z
    • X  XY (Using IR1)
    • XY  YZ (Using IR2)
    • X  YZ (Using IR3)
  • IR6. (Psuedotransitivity) If X Y and WY Z, then WX Z
    • X Y
    • WY  Z
    • WX  WY (Using IR2)
    • WX  Z (Using IR3)