- By
**flora** - Follow User

- 268 Views
- Uploaded on

Download Presentation
## PowerPoint Slideshow about '' - flora

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

Schedule Change

- http://www.users.csbsju.edu/~irahal/

The Database Design Process

- Conceptual design:
- Use a data model language to come up with an accurate, high-level description of the system requirements
- Words (unstructured) Diagrams

- Logical design:
- Map the resulting EERD into a set of relations
- Diagram Relations

- Physical design:
- Use DDL on some DBMS to create tables corresponding to your relations

The Database Design Process

- Limitations of E-R Designs
- The EER model provides a set of guidelines
- Does not result in a unique database schema

- Does not provide a “formal” way of evaluating alternatives
- Relies largely on the common sense of the designer

- The EER model provides a set of guidelines
- Here we try to answer
- What are the criteria for "good" base relations?
- Meaningful grouping of attributes

- When designing a relation schema, how to decide which attributes to include?
- So far, attributes are grouped to form the relation schema by using the common sense of the database designer

The Database Design Process

- First discuss informal guidelines for good relational design
- Then we discuss formal concepts of functional dependencies and normal forms
- 1NF (First Normal Form)

- 2NF (Second Normal Form)

- 3NF (Third Normal Form)

- BCNF (Boyce-Codd Normal Form)

- Additional types of dependencies, further normal forms, relational design algorithms by synthesis are discussed in Chapter 11

Relation Schema Informal Measures

- We have some informal measures:
- Semantics of the attributes
- Reducing the redundancy values in tuples
- Disallowing the possibility of generating spurious tuples
- Reducing null values

- Not always independent of one another

Semantics of the Relation Attributes (1)

- Any grouping of attributes to form a relation schema must portray a certain real-world meaning
- Each tuple in a relation should represent one entity or relationship instance
- Guideline 1: Design a relation schema so that it is easy to explain its meaning
- Semantics of attributes should be easy to interpret
- Attributes of different entities should not be mixed
- Only foreign keys should be used to refer to other entities

Relations that violate Guideline 1 by intermixing attributes from different relations

Relation schemas that abide by Guideline 1 from different relations

Redundant Information (2) from different relations

- One goal of schema design is to reduce redundancy
- Information is stored redundantly wasting storage
- Problems with update anomalies
- Modification anomalies
- Insertion anomalies
- Deletion anomalies

- Mixing attributes of multiple entities may cause the above problems

- Update anomalies from different relations
- Modification Anomalies
- Update PNAME from ‘ProductY’ to ‘Customer-Accounting’

- Insert Anomalies
- Insert a new employee not assigned to known project
- Insert a new project with no working employees

- Delete Project
- Delete PNUMBER=2
- Delete the sole employee of a project

- Modification Anomalies

Modification Anomalies from different relations

- Consider the relation:
- EMP_PROJ ( Emp#, Proj#, No_hours, Ename, Pname, Plocation)

- Modification Anomaly:
- Changing the name of project number P2 from “Project Y” to “Customer-Accounting”
- May cause this update to be made for all employees working on project P2 otherwise the DB will become inconsistent

Modification Anomalies from different relations

- Consider the relation:
- EMP_PROJ( Emp#, Proj#, No_hours, Ename, Pname, Plocation)

- Insert Anomaly: Cannot insert a project unless an employee is assigned to
- Inversely - Cannot insert an employee unless he/she is assigned to a project.

- Delete Anomaly: When a project is deleted delete all the employees who work on the project
- Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project

Guidelines to Redundant Information in Tuples and Update Anomalies

- Guideline 2: Design a schema that does not suffer from the insertion, deletion and update anomalies
- If there are any present, then note them so that applications can be made to take them into account
- Might need to break the guidelines to improve performance for certain queries
- Assume that we always access employee information only with department information
- The design EMP_PROJ (Emp#, Proj#, No_hours, Ename, Pname, Plocation)might be could for such cases

SSN Name Address Hobby Anomalies

1111 Joe 123 Main biking

1111 Joe 123 Main hiking

…………….

Redundancy

ExampleER Model

SSN Name Address Hobby

1111 Joe 123 Main {biking, hiking}

Relational Model (SSN, Hobby, Name, Address)

Example Anomalies

- Redundancy leads to anomalies:
- A change in Address must be made in several places
- Suppose a person gives up all hobbies. Do we:
- Set Hobby attribute to null? No, since Hobby is part of key
- Delete the entire row? No, since we lose other information in the row
- No hobby information?

- Hobby value must be supplied for any inserted row since Hobby is part of key

Decomposition Anomalies

- Solution: use two relations to store Person information
- Person1 (SSN, Name, Address)
- Hobbies (SSN, Hobby)

- People with/without hobbies can now be described
- No update anomalies:
- Name and address stored once
- A hobby can be separately supplied or deleted

Spurious Tuples Anomalies (3)

- Bad designs for a relational database (or bad decompositions) may result in erroneous results for certain JOIN operations
- Any decomposition MUST have the "lossless join" property
- Nospurious tuplesshould be generated by doing a natural-join of any decomposed relations
- Person1 (SSN, Name, Address)
- Hobbies (SSN, Name)

- Here, “loss” relates to loss of information

- Nospurious tuplesshould be generated by doing a natural-join of any decomposed relations

Spurious Tuples (3) Anomalies

- Suppose we replace
- EMP_PROJ ( SSN, PNUMBER, ENAME, PNAME, PLOCATION, HOURS) by
- EMP_PROJ1(SSN, PNUMBER, PNAME, PLOCATION, HOURS) ANDEMP_LOCS (ENAME, PLOCATION)

- Guideline 3: The relations should be designed to satisfy the lossless join condition
- Avoid relations that contain matching attributes that are not (foreign key, primary key) combinations

Null Values in Tuples Anomalies (4)

- Guideline 4: Relations should be designed such that their tuples will have as few NULL values as possible
- Make sure only NULLs are exceptional cases

- If many attributes do not apply to all tuples in the relation, we end up with many nulls
- Waste space
- Ambiguity in meaning
- Attribute not applicable or invalid
- Value known to exist, but unavailable
- Attribute value unknown (may or may not exist)

- Difficulty specifying JOIN operations (inner or outer joins)

- Attributes that are NULL frequently could be placed in separate relations (with the primary key)

Functional Dependencies Anomalies

- Functional dependencies (FDs) are used to specify formal measures of the "goodness" of a relational database design
- FDs are constraints that are derived from the meaning and interrelationships of the data attributes
- An FD is a constraint between two sets of attributes X and Y
- X Y holds if whenever two tuples have the same value for X, they must have the same value for Y
- For any two tuples t1 and t2 in any relation instance r(R):If t1[X]=t2[X], then t1[Y]=t2[Y]

Functional Dependencies Anomalies

- X Y: A set of attributes X functionally determines a set of attributes Y (or Y is functionally determined by X) if the value of X determines a unique value for Y
- X Y in R specifies a constraint on all relation instances r(R)
- FDs are derived from the real-world constraints on the attributes
- Property of the intentionof the database

- An FD is a property of the attributes in the schema R
- The constraint must hold on every relation instance r(R)
- Can NEVER be deduced from an extension
- E.g. if in some case, all people having the same first name are registered for the same course, can we deduce that name course?

Examples of FD Constraints Anomalies

- EMP_PROJ (SSN, PNUMBER, ENAME, PNAME, PLOCATION, HOURS)
- social security number determines employee name
- SSN ENAME

- project number determines project name and location
- PNUMBER {PNAME, PLOCATION}

- employee SSN and project number determines the hours per week that the employee works on the project
- {SSN, PNUMBER} HOURS

More on FD Constraints Anomalies

- Definition of a relation KEY (If K is a key of R)
- Kfunctionally determines all attributes in R

- If XY is true, does that make YX true?
- Some FDs are always true regardless of the relation in which they occur
- {State, Driver_License_Number} SSN
- Zip {City, State}

Inference Rules for FDs Anomalies

- Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold
- Armstrong's inference rules:
- IR1. (Reflexive) If Y X, then X Y
- (Generates trivial FDs)
- E.g. SSN, ENAME ENAME

- IR2. (Augmentation) If X Y, then XZ YZ (Note that XZ stands for X U Z)
- E.g. SSN ENAME, then {SSN, PNUMBER}{ENAME, PNUMBER}

- IR3. (Transitive) If X Y and Y Z, then X Z
- E.g. SSN DOB and DOB horoscope sign then SSN horoscope sign

- IR1. (Reflexive) If Y X, then X Y

Inference Rules for FDs Anomalies

- IR1, IR2, IR3 form a sound and complete set of inference rules
- Sound Any rule inferred using IR1, IR2 or IR3 a valid FD
- Complete All possible FDs can be generated using them

- Some additional inference rules that are useful:
- IR4. (Decomposition) If X YZ, then X Y and X Z
- SSN ENAME, DOB then SSN DOB & SSN ENAME

- IR5. (Union) If X Y and X Z, then X YZ
- SSN DOB & SSN ENAME then SSN ENAME, DOB

- IR6. (Pseudo-transitivity) If X Y and WY Z, then WX Z
- Can be deduced from IR1, IR2, and IR3 (completeness property)
- OfficeLocation Department & Department, Ename Salary-level then
OfficeLocation,Ename Salary-level

- IR4. (Decomposition) If X YZ, then X Y and X Z

Proofs Anomalies

- IR1. (Reflexive) If Y X, then X Y
- For any two tuples t1 and t2 with t1[X] = t2[X] then t1[Y] = t2[Y] because Y X

- IR2. (Augmentation) If X Y, then XZ YZ
- Proof by contradiction
- If for two tuples t1 and t2 we have
- (1) t1[X] = t2[X]
- (2) t1[Y] = t2[Y]
- (3) t1[XZ] = t2[XZ]
- (4) t1[YZ] ≠ t2[YZ]

- Can’t be true since from (1) and (3) we have (5) t1[Z] = t2[Z] and from (2) and (5) we have t1[YZ] = t2[YZ] which contradicts (4)

- If for two tuples t1 and t2 we have

- Proof by contradiction

Proofs Anomalies

- IR3. (Transitive) If X Y and Y Z, then X Z
- For any two tuples t1 and t2 with t1[X] = t2[X] then t1[Y] = t2[Y] which implies that t1[Z] = t2[Z] hence X Z holds

- IR4. (Decomposition) If X YZ, then X Y and X Z
- X YZ
- YZ Y (Using IR1)
- X Y (Using IR3)
- Similarly for X Z

Proofs Anomalies

- IR5. (Union) If X Y and X Z, then X YZ
- X Y
- X Z
- X XY (Using IR1)
- XY YZ (Using IR2)
- X YZ (Using IR3)

- IR6. (Psuedotransitivity) If X Y and WY Z, then WX Z
- X Y
- WY Z
- WX WY (Using IR2)
- WX Z (Using IR3)

Download Presentation

Connecting to Server..