lecture 22 l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Lecture 22 PowerPoint Presentation
Download Presentation
Lecture 22

Loading in 2 Seconds...

play fullscreen
1 / 29

Lecture 22 - PowerPoint PPT Presentation


  • 276 Views
  • Uploaded on

Lecture 22. Functional Dependencies (FDs) and Normalization. Schedule Change. http://www.users.csbsju.edu/~irahal/. The Database Design Process. Conceptual design : Use a data model language to come up with an accurate, high-level description of the system requirements

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Lecture 22' - flora


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
lecture 22

Lecture 22

Functional Dependencies (FDs) and

Normalization

schedule change
Schedule Change
  • http://www.users.csbsju.edu/~irahal/
the database design process
The Database Design Process
  • Conceptual design:
    • Use a data model language to come up with an accurate, high-level description of the system requirements
    • Words (unstructured)  Diagrams
  • Logical design:
    • Map the resulting EERD into a set of relations
    • Diagram  Relations
  • Physical design:
    • Use DDL on some DBMS to create tables corresponding to your relations
the database design process4
The Database Design Process
  • Limitations of E-R Designs
    • The EER model provides a set of guidelines
      • Does not result in a unique database schema
    • Does not provide a “formal” way of evaluating alternatives
    • Relies largely on the common sense of the designer
  • Here we try to answer
    • What are the criteria for "good" base relations?
    • Meaningful grouping of attributes
  • When designing a relation schema, how to decide which attributes to include?
    • So far, attributes are grouped to form the relation schema by using the common sense of the database designer
the database design process5
The Database Design Process
  • First discuss informal guidelines for good relational design
  • Then we discuss formal concepts of functional dependencies and normal forms

- 1NF (First Normal Form)

- 2NF (Second Normal Form)

- 3NF (Third Normal Form)

- BCNF (Boyce-Codd Normal Form)

  • Additional types of dependencies, further normal forms, relational design algorithms by synthesis are discussed in Chapter 11
relation schema informal measures
Relation Schema Informal Measures
  • We have some informal measures:
    • Semantics of the attributes
    • Reducing the redundancy values in tuples
    • Disallowing the possibility of generating spurious tuples
    • Reducing null values
  • Not always independent of one another
semantics of the relation attributes 1
Semantics of the Relation Attributes (1)
  • Any grouping of attributes to form a relation schema must portray a certain real-world meaning
  • Each tuple in a relation should represent one entity or relationship instance
  • Guideline 1: Design a relation schema so that it is easy to explain its meaning
    • Semantics of attributes should be easy to interpret
    • Attributes of different entities should not be mixed
    • Only foreign keys should be used to refer to other entities
redundant information 2
Redundant Information (2)
  • One goal of schema design is to reduce redundancy
    • Information is stored redundantly wasting storage
    • Problems with update anomalies
      • Modification anomalies
      • Insertion anomalies
      • Deletion anomalies
  • Mixing attributes of multiple entities may cause the above problems
slide11

Update anomalies

    • Modification Anomalies
      • Update PNAME from ‘ProductY’ to ‘Customer-Accounting’
    • Insert Anomalies
      • Insert a new employee not assigned to known project
      • Insert a new project with no working employees
    • Delete Project
      • Delete PNUMBER=2
      • Delete the sole employee of a project
modification anomalies
Modification Anomalies
  • Consider the relation:
    • EMP_PROJ ( Emp#, Proj#, No_hours, Ename, Pname, Plocation)
  • Modification Anomaly:
    • Changing the name of project number P2 from “Project Y” to “Customer-Accounting”
    • May cause this update to be made for all employees working on project P2 otherwise the DB will become inconsistent
modification anomalies13
Modification Anomalies
  • Consider the relation:
    • EMP_PROJ( Emp#, Proj#, No_hours, Ename, Pname, Plocation)
  • Insert Anomaly: Cannot insert a project unless an employee is assigned to
    • Inversely - Cannot insert an employee unless he/she is assigned to a project.
  • Delete Anomaly: When a project is deleted  delete all the employees who work on the project
    • Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project
guidelines to redundant information in tuples and update anomalies
Guidelines to Redundant Information in Tuples and Update Anomalies
  • Guideline 2: Design a schema that does not suffer from the insertion, deletion and update anomalies
  • If there are any present, then note them so that applications can be made to take them into account
    • Might need to break the guidelines to improve performance for certain queries
    • Assume that we always access employee information only with department information
      • The design EMP_PROJ (Emp#, Proj#, No_hours, Ename, Pname, Plocation)might be could for such cases
example

SSN Name Address Hobby

1111 Joe 123 Main biking

1111 Joe 123 Main hiking

…………….

Redundancy

Example

ER Model

SSN Name Address Hobby

1111 Joe 123 Main {biking, hiking}

Relational Model (SSN, Hobby, Name, Address)

example16
Example
  • Redundancy leads to anomalies:
    • A change in Address must be made in several places
    • Suppose a person gives up all hobbies. Do we:
      • Set Hobby attribute to null? No, since Hobby is part of key
      • Delete the entire row? No, since we lose other information in the row
        • No hobby information?
    • Hobby value must be supplied for any inserted row since Hobby is part of key
decomposition
Decomposition
  • Solution: use two relations to store Person information
    • Person1 (SSN, Name, Address)
    • Hobbies (SSN, Hobby)
  • People with/without hobbies can now be described
  • No update anomalies:
    • Name and address stored once
    • A hobby can be separately supplied or deleted
spurious tuples 3
Spurious Tuples (3)
  • Bad designs for a relational database (or bad decompositions) may result in erroneous results for certain JOIN operations
  • Any decomposition MUST have the "lossless join" property
    • Nospurious tuplesshould be generated by doing a natural-join of any decomposed relations
      • Person1 (SSN, Name, Address)
      • Hobbies (SSN, Name)
    • Here, “loss” relates to loss of information
spurious tuples 319
Spurious Tuples (3)
  • Suppose we replace
    • EMP_PROJ ( SSN, PNUMBER, ENAME, PNAME, PLOCATION, HOURS) by
    • EMP_PROJ1(SSN, PNUMBER, PNAME, PLOCATION, HOURS) ANDEMP_LOCS (ENAME, PLOCATION)
  • Guideline 3: The relations should be designed to satisfy the lossless join condition
    • Avoid relations that contain matching attributes that are not (foreign key, primary key) combinations
null values in tuples 4
Null Values in Tuples (4)
  • Guideline 4: Relations should be designed such that their tuples will have as few NULL values as possible
    • Make sure only NULLs are exceptional cases
  • If many attributes do not apply to all tuples in the relation, we end up with many nulls
    • Waste space
    • Ambiguity in meaning
      • Attribute not applicable or invalid
      • Value known to exist, but unavailable
      • Attribute value unknown (may or may not exist)
    • Difficulty specifying JOIN operations (inner or outer joins)
  • Attributes that are NULL frequently could be placed in separate relations (with the primary key)
functional dependencies
Functional Dependencies
  • Functional dependencies (FDs) are used to specify formal measures of the "goodness" of a relational database design
  • FDs are constraints that are derived from the meaning and interrelationships of the data attributes
  • An FD is a constraint between two sets of attributes X and Y
    • X  Y holds if whenever two tuples have the same value for X, they must have the same value for Y
    • For any two tuples t1 and t2 in any relation instance r(R):If t1[X]=t2[X], then t1[Y]=t2[Y]
functional dependencies22
Functional Dependencies
  • X Y: A set of attributes X functionally determines a set of attributes Y (or Y is functionally determined by X) if the value of X determines a unique value for Y
  • X Y in R specifies a constraint on all relation instances r(R)
    • FDs are derived from the real-world constraints on the attributes
    • Property of the intentionof the database
  • An FD is a property of the attributes in the schema R
    • The constraint must hold on every relation instance r(R)
    • Can NEVER be deduced from an extension
      • E.g. if in some case, all people having the same first name are registered for the same course, can we deduce that name  course?
examples of fd constraints
Examples of FD Constraints
  • EMP_PROJ (SSN, PNUMBER, ENAME, PNAME, PLOCATION, HOURS)
  • social security number determines employee name
    • SSN ENAME
  • project number determines project name and location
    • PNUMBER {PNAME, PLOCATION}
  • employee SSN and project number determines the hours per week that the employee works on the project
    • {SSN, PNUMBER} HOURS
more on fd constraints
More on FD Constraints
  • Definition of a relation KEY (If K is a key of R)
    • Kfunctionally determines all attributes in R
  • If XY is true, does that make YX true?
  • Some FDs are always true regardless of the relation in which they occur
    • {State, Driver_License_Number}  SSN
    • Zip  {City, State}
inference rules for fds
Inference Rules for FDs
  • Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold
  • Armstrong's inference rules:
    • IR1. (Reflexive) If Y X, then X Y
      • (Generates trivial FDs)
      • E.g. SSN, ENAME  ENAME
    • IR2. (Augmentation) If X Y, then XZ YZ (Note that XZ stands for X U Z)
      • E.g. SSN ENAME, then {SSN, PNUMBER}{ENAME, PNUMBER}
    • IR3. (Transitive) If X Y and Y Z, then X Z
      • E.g. SSN DOB and DOB  horoscope sign then SSN  horoscope sign
inference rules for fds26
Inference Rules for FDs
  • IR1, IR2, IR3 form a sound and complete set of inference rules
    • Sound Any rule inferred using IR1, IR2 or IR3 a valid FD
    • Complete  All possible FDs can be generated using them
  • Some additional inference rules that are useful:
    • IR4. (Decomposition) If X YZ, then X Y and X Z
      • SSN  ENAME, DOB then SSN  DOB & SSN  ENAME
    • IR5. (Union) If X Y and X Z, then X YZ
      • SSN  DOB & SSN  ENAME then SSN  ENAME, DOB
    • IR6. (Pseudo-transitivity) If X Y and WY Z, then WX Z
      • Can be deduced from IR1, IR2, and IR3 (completeness property)
      • OfficeLocation Department & Department, Ename  Salary-level then

OfficeLocation,Ename  Salary-level

proofs
Proofs
  • IR1. (Reflexive) If Y X, then X Y
    • For any two tuples t1 and t2 with t1[X] = t2[X] then t1[Y] = t2[Y] because Y X
  • IR2. (Augmentation) If X Y, then XZ YZ
    • Proof by contradiction
      • If for two tuples t1 and t2 we have
        • (1) t1[X] = t2[X]
        • (2) t1[Y] = t2[Y]
        • (3) t1[XZ] = t2[XZ]
        • (4) t1[YZ] ≠ t2[YZ]
      • Can’t be true since from (1) and (3) we have (5) t1[Z] = t2[Z] and from (2) and (5) we have t1[YZ] = t2[YZ] which contradicts (4)
proofs28
Proofs
  • IR3. (Transitive) If X Y and Y Z, then X Z
    • For any two tuples t1 and t2 with t1[X] = t2[X] then t1[Y] = t2[Y] which implies that t1[Z] = t2[Z] hence X Z holds
  • IR4. (Decomposition) If X YZ, then X Y and X Z
    • X YZ
    • YZ  Y (Using IR1)
    • X  Y (Using IR3)
      • Similarly for X  Z
proofs29
Proofs
  • IR5. (Union) If X Y and X Z, then X YZ
    • X  Y
    • X  Z
    • X  XY (Using IR1)
    • XY  YZ (Using IR2)
    • X  YZ (Using IR3)
  • IR6. (Psuedotransitivity) If X Y and WY Z, then WX Z
    • X Y
    • WY  Z
    • WX  WY (Using IR2)
    • WX  Z (Using IR3)