- By
**flora** - Follow User

- 277 Views
- Uploaded on

Download Presentation
## PowerPoint Slideshow about 'Lecture 22' - flora

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

Schedule Change

- http://www.users.csbsju.edu/~irahal/

The Database Design Process

- Conceptual design:
- Use a data model language to come up with an accurate, high-level description of the system requirements
- Words (unstructured) Diagrams
- Logical design:
- Map the resulting EERD into a set of relations
- Diagram Relations
- Physical design:
- Use DDL on some DBMS to create tables corresponding to your relations

The Database Design Process

- Limitations of E-R Designs
- The EER model provides a set of guidelines
- Does not result in a unique database schema
- Does not provide a “formal” way of evaluating alternatives
- Relies largely on the common sense of the designer
- Here we try to answer
- What are the criteria for "good" base relations?
- Meaningful grouping of attributes
- When designing a relation schema, how to decide which attributes to include?
- So far, attributes are grouped to form the relation schema by using the common sense of the database designer

The Database Design Process

- First discuss informal guidelines for good relational design
- Then we discuss formal concepts of functional dependencies and normal forms

- 1NF (First Normal Form)

- 2NF (Second Normal Form)

- 3NF (Third Normal Form)

- BCNF (Boyce-Codd Normal Form)

- Additional types of dependencies, further normal forms, relational design algorithms by synthesis are discussed in Chapter 11

Relation Schema Informal Measures

- We have some informal measures:
- Semantics of the attributes
- Reducing the redundancy values in tuples
- Disallowing the possibility of generating spurious tuples
- Reducing null values
- Not always independent of one another

Semantics of the Relation Attributes (1)

- Any grouping of attributes to form a relation schema must portray a certain real-world meaning
- Each tuple in a relation should represent one entity or relationship instance
- Guideline 1: Design a relation schema so that it is easy to explain its meaning
- Semantics of attributes should be easy to interpret
- Attributes of different entities should not be mixed
- Only foreign keys should be used to refer to other entities

Relations that violate Guideline 1 by intermixing attributes from different relations

Redundant Information (2)

- One goal of schema design is to reduce redundancy
- Information is stored redundantly wasting storage
- Problems with update anomalies
- Modification anomalies
- Insertion anomalies
- Deletion anomalies
- Mixing attributes of multiple entities may cause the above problems

- Modification Anomalies
- Update PNAME from ‘ProductY’ to ‘Customer-Accounting’
- Insert Anomalies
- Insert a new employee not assigned to known project
- Insert a new project with no working employees
- Delete Project
- Delete PNUMBER=2
- Delete the sole employee of a project

Modification Anomalies

- Consider the relation:
- EMP_PROJ ( Emp#, Proj#, No_hours, Ename, Pname, Plocation)
- Modification Anomaly:
- Changing the name of project number P2 from “Project Y” to “Customer-Accounting”
- May cause this update to be made for all employees working on project P2 otherwise the DB will become inconsistent

Modification Anomalies

- Consider the relation:
- EMP_PROJ( Emp#, Proj#, No_hours, Ename, Pname, Plocation)
- Insert Anomaly: Cannot insert a project unless an employee is assigned to
- Inversely - Cannot insert an employee unless he/she is assigned to a project.
- Delete Anomaly: When a project is deleted delete all the employees who work on the project
- Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project

Guidelines to Redundant Information in Tuples and Update Anomalies

- Guideline 2: Design a schema that does not suffer from the insertion, deletion and update anomalies
- If there are any present, then note them so that applications can be made to take them into account
- Might need to break the guidelines to improve performance for certain queries
- Assume that we always access employee information only with department information
- The design EMP_PROJ (Emp#, Proj#, No_hours, Ename, Pname, Plocation)might be could for such cases

1111 Joe 123 Main biking

1111 Joe 123 Main hiking

…………….

Redundancy

ExampleER Model

SSN Name Address Hobby

1111 Joe 123 Main {biking, hiking}

Relational Model (SSN, Hobby, Name, Address)

Example

- Redundancy leads to anomalies:
- A change in Address must be made in several places
- Suppose a person gives up all hobbies. Do we:
- Set Hobby attribute to null? No, since Hobby is part of key
- Delete the entire row? No, since we lose other information in the row
- No hobby information?
- Hobby value must be supplied for any inserted row since Hobby is part of key

Decomposition

- Solution: use two relations to store Person information
- Person1 (SSN, Name, Address)
- Hobbies (SSN, Hobby)
- People with/without hobbies can now be described
- No update anomalies:
- Name and address stored once
- A hobby can be separately supplied or deleted

Spurious Tuples (3)

- Bad designs for a relational database (or bad decompositions) may result in erroneous results for certain JOIN operations
- Any decomposition MUST have the "lossless join" property
- Nospurious tuplesshould be generated by doing a natural-join of any decomposed relations
- Person1 (SSN, Name, Address)
- Hobbies (SSN, Name)
- Here, “loss” relates to loss of information

Spurious Tuples (3)

- Suppose we replace
- EMP_PROJ ( SSN, PNUMBER, ENAME, PNAME, PLOCATION, HOURS) by
- EMP_PROJ1(SSN, PNUMBER, PNAME, PLOCATION, HOURS) ANDEMP_LOCS (ENAME, PLOCATION)
- Guideline 3: The relations should be designed to satisfy the lossless join condition
- Avoid relations that contain matching attributes that are not (foreign key, primary key) combinations

Null Values in Tuples (4)

- Guideline 4: Relations should be designed such that their tuples will have as few NULL values as possible
- Make sure only NULLs are exceptional cases
- If many attributes do not apply to all tuples in the relation, we end up with many nulls
- Waste space
- Ambiguity in meaning
- Attribute not applicable or invalid
- Value known to exist, but unavailable
- Attribute value unknown (may or may not exist)
- Difficulty specifying JOIN operations (inner or outer joins)
- Attributes that are NULL frequently could be placed in separate relations (with the primary key)

Functional Dependencies

- Functional dependencies (FDs) are used to specify formal measures of the "goodness" of a relational database design
- FDs are constraints that are derived from the meaning and interrelationships of the data attributes
- An FD is a constraint between two sets of attributes X and Y
- X Y holds if whenever two tuples have the same value for X, they must have the same value for Y
- For any two tuples t1 and t2 in any relation instance r(R):If t1[X]=t2[X], then t1[Y]=t2[Y]

Functional Dependencies

- X Y: A set of attributes X functionally determines a set of attributes Y (or Y is functionally determined by X) if the value of X determines a unique value for Y
- X Y in R specifies a constraint on all relation instances r(R)
- FDs are derived from the real-world constraints on the attributes
- Property of the intentionof the database
- An FD is a property of the attributes in the schema R
- The constraint must hold on every relation instance r(R)
- Can NEVER be deduced from an extension
- E.g. if in some case, all people having the same first name are registered for the same course, can we deduce that name course?

Examples of FD Constraints

- EMP_PROJ (SSN, PNUMBER, ENAME, PNAME, PLOCATION, HOURS)
- social security number determines employee name
- SSN ENAME
- project number determines project name and location
- PNUMBER {PNAME, PLOCATION}
- employee SSN and project number determines the hours per week that the employee works on the project
- {SSN, PNUMBER} HOURS

More on FD Constraints

- Definition of a relation KEY (If K is a key of R)
- Kfunctionally determines all attributes in R
- If XY is true, does that make YX true?
- Some FDs are always true regardless of the relation in which they occur
- {State, Driver_License_Number} SSN
- Zip {City, State}

Inference Rules for FDs

- Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold
- Armstrong's inference rules:
- IR1. (Reflexive) If Y X, then X Y
- (Generates trivial FDs)
- E.g. SSN, ENAME ENAME
- IR2. (Augmentation) If X Y, then XZ YZ (Note that XZ stands for X U Z)
- E.g. SSN ENAME, then {SSN, PNUMBER}{ENAME, PNUMBER}
- IR3. (Transitive) If X Y and Y Z, then X Z
- E.g. SSN DOB and DOB horoscope sign then SSN horoscope sign

Inference Rules for FDs

- IR1, IR2, IR3 form a sound and complete set of inference rules
- Sound Any rule inferred using IR1, IR2 or IR3 a valid FD
- Complete All possible FDs can be generated using them
- Some additional inference rules that are useful:
- IR4. (Decomposition) If X YZ, then X Y and X Z
- SSN ENAME, DOB then SSN DOB & SSN ENAME
- IR5. (Union) If X Y and X Z, then X YZ
- SSN DOB & SSN ENAME then SSN ENAME, DOB
- IR6. (Pseudo-transitivity) If X Y and WY Z, then WX Z
- Can be deduced from IR1, IR2, and IR3 (completeness property)
- OfficeLocation Department & Department, Ename Salary-level then

OfficeLocation,Ename Salary-level

Proofs

- IR1. (Reflexive) If Y X, then X Y
- For any two tuples t1 and t2 with t1[X] = t2[X] then t1[Y] = t2[Y] because Y X
- IR2. (Augmentation) If X Y, then XZ YZ
- Proof by contradiction
- If for two tuples t1 and t2 we have
- (1) t1[X] = t2[X]
- (2) t1[Y] = t2[Y]
- (3) t1[XZ] = t2[XZ]
- (4) t1[YZ] ≠ t2[YZ]
- Can’t be true since from (1) and (3) we have (5) t1[Z] = t2[Z] and from (2) and (5) we have t1[YZ] = t2[YZ] which contradicts (4)

Proofs

- IR3. (Transitive) If X Y and Y Z, then X Z
- For any two tuples t1 and t2 with t1[X] = t2[X] then t1[Y] = t2[Y] which implies that t1[Z] = t2[Z] hence X Z holds
- IR4. (Decomposition) If X YZ, then X Y and X Z
- X YZ
- YZ Y (Using IR1)
- X Y (Using IR3)
- Similarly for X Z

Proofs

- IR5. (Union) If X Y and X Z, then X YZ
- X Y
- X Z
- X XY (Using IR1)
- XY YZ (Using IR2)
- X YZ (Using IR3)
- IR6. (Psuedotransitivity) If X Y and WY Z, then WX Z
- X Y
- WY Z
- WX WY (Using IR2)
- WX Z (Using IR3)

Download Presentation

Connecting to Server..