Loading in 5 sec....

C20.0046: Database Management Systems Lecture #5PowerPoint Presentation

C20.0046: Database Management Systems Lecture #5

- 101 Views
- Uploaded on
- Presentation posted in: General

C20.0046: Database Management Systems Lecture #5

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

C20.0046: Database Management SystemsLecture #5

Matthew P. Johnson

Stern School of Business, NYU

Spring, 2004

M.P. Johnson, DBMS, Stern/NYU, Sp2004

- Last time: relational model
- Homework 1 is up, due next Tuesday
- This time:
- Functional dependencies
- Keys and superkeys in terms of FDs
- Finding keys for relations

- Rules for combining FDs
- Next time: anomalies & normalization

M.P. Johnson, DBMS, Stern/NYU, Sp2004

title

year

length

stars

Movies

isa

isa

Weapon

Voices

Lion King

Murder-Mysteries

Cartoons

Component

M.P. Johnson, DBMS, Stern/NYU, Sp2004

- OO-style:
- Nulls-style:

1. Plain movies

3. Murder mysteries

2. Cartoons

4. Cartoon murder-mysteries

M.P. Johnson, DBMS, Stern/NYU, Sp2004

Root entity set: Movies(title, year, length)

Title

Year

length

Star Wars

1980

120

Roger Rabbit

1990

115

Scream

1988

110

Lion King

1993

130

Subclass: MurderMysteries(title, year, murderWeapon)

Title

Year

murderWeapon

Subclass: Cartoons(title, year)

Scream

1988

Knife

Title

Year

R. Rabbit

1990

Knife

Roger Rabbit

1990

Lion King

1993

- E/R-style

M.P. Johnson, DBMS, Stern/NYU, Sp2004

- Name and year of Roger Rabbit were listed in three different rows (in different tables)
- Suppose title changes (“Roger” “Roget”)
- must change all three places

- Q: Is this redundancy?
- A: No!
- name and year are independent
- multiple movies may have same name

- Real redundancy reqs. depenency
- two rows agree on SSN must agree on rest
- conflicting hair colors in these rows is an error

- two rows agree on movie title may still disagree
- conflicting years may be correct – or may not be

- Better: introduce “movie-id” key att

M.P. Johnson, DBMS, Stern/NYU, Sp2004

- Exercise 3.3.1
- Convert from E/R to R, by E/R, OO and nulls

chair

name

number

room

givenBy

Depts

courses

isa

Lab-courses

Computer-allocation

M.P. Johnson, DBMS, Stern/NYU, Sp2004

- FDs are constraints
- part of the schema
- can’t tell from particular relation instances
- FD may hold for some instances “accidentally”

- Finding all FDs is part of DB design
- Used in normalization

M.P. Johnson, DBMS, Stern/NYU, Sp2004

then they must also agree on the attributes

B1, B2, …, Bm

- Definition:
- Notation:
- Read: Ai functionally determines Bj

If two tuples agree on the attributes

A1, A2, …, An

A1, A2, …, An B1, B2, …, Bm

M.P. Johnson, DBMS, Stern/NYU, Sp2004

- Product
- name price, manufacturer

- Person
- ssn name, age
- father’s/husband’s-name last-name
- zipcode state
- phone state (notwithstanding inter-state area codes)

- Company
- name stockprice, president
- symbol name
- name symbol

M.P. Johnson, DBMS, Stern/NYU, Sp2004

A1

...

Am

B1

...

Bm

t1

t2

- To check A B, erase all other columns; for each rows t1, t2
- i.e., check if remaining relation is many-one
- no “divergences”
- i.e., if AB is a well-defined function
- thus, functional dependency

if t1, t2 agree here

then t1, t2 agree here

M.P. Johnson, DBMS, Stern/NYU, Sp2004

Product(name, category, color, department, price)

Consider these FDs:

namecolor

categorydepartment

color, categoryprice

What do they say ?

M.P. Johnson, DBMS, Stern/NYU, Sp2004

- FDs are constraints:
- On some instances they hold
- On others they don’t

namecolor

categorydepartment

color, categoryprice

Does this instance satisfy all the FDs ?

M.P. Johnson, DBMS, Stern/NYU, Sp2004

namecolor

categorydepartment

color, categoryprice

What about this one ?

M.P. Johnson, DBMS, Stern/NYU, Sp2004

- Q: Is PositionPhone an FD here?
- A: It is for this instance, but no, presumably not in general
- Others FDs?
- EmpID Name, Phone, Position
- but Phone Position

EmpID

Name

Phone

Position

E0045

Smith

1234

Clerk

E1847

John

9876

Salesrep

E1111

Smith

9876

Salesrep

E9999

Mary

1234

Lawyer

M.P. Johnson, DBMS, Stern/NYU, Sp2004

- {A1A2A3…An} is a key for relation R if
- A1A2A3…Anfunctionally determine all other attributes
- Usual notation: A1A2A3…An B1B2…Bk
- rels = sets distinct rows can’t agree on all Ai

- A1A2A3…An is minimal
- No proper subset of A1A2A3…An functionally determines all other attributes of R

- A1A2A3…Anfunctionally determine all other attributes
- Primary key: chosen if there are several possible keys

M.P. Johnson, DBMS, Stern/NYU, Sp2004

- Relation: Student(Name, Address, DoB, Email, Credits)
- Which (/why) of the following are keys?
- Name, Address
- Name, SSN
- Email, SSN
- SSN

- NB: minimal != smallest

M.P. Johnson, DBMS, Stern/NYU, Sp2004

- A set of attributes that contains a key
- Satisfies first condition:
- functionally determines every other attribute in the relation

- Might not satisfy the second condition: minimality
- may be possible to peel away some attributes from the superkey
- keys are superkeys
- key are special case of superkey

- superkey set is superset of key set

- name;ssn is a superkey but not a key

M.P. Johnson, DBMS, Stern/NYU, Sp2004

- Relation entity set
- Key of relation = (minimized) key of entity set

- Relation binary relationship
- Many-many: union of keys of both entity sets
- Many(M)-one(O): only key of M (Why?)
- One-one: key of either entity set (but not both!)

M.P. Johnson, DBMS, Stern/NYU, Sp2004

- Key of entity set = (minimized) key of relation
- Student(Name, Address, DoB, SSN, Email, Credits)

SSN

Name

Student

Address

Credits

DoB

M.P. Johnson, DBMS, Stern/NYU, Sp2004

- Many-many key: union of both ES keys

Student

Enrolls

Course

SSN

Credits

Name

CourseID

Enrolls(SSN,CourseID)

M.P. Johnson, DBMS, Stern/NYU, Sp2004

- Key of many ES but not of one ES
- keys from both would be non-minimal

Course

MeetsIn

Room

CourseID

Name

Capacity

RoomNo

MeetsIn(CourseID,RoomNo)

M.P. Johnson, DBMS, Stern/NYU, Sp2004

- Keys of both ESs included in relation
- Key is key of either ES (but not both!)

Husbands

Married

Wives

SSN

Name

Name

SSN

Married(HSSN, WSSN) or

Married(HSSN, WSSN)

M.P. Johnson, DBMS, Stern/NYU, Sp2004

Course

Enrolls

Student

CourseID

Name

SSN

Section

Name

RoomNo

Capacity

- Multiway relationships:
- Multiple ways – may not be obvious
- R:F,G,HE is many-one E’s key is included
- but not part of key
- Recall that relship atts are implicitly many-one

Enrolls(CourseID,SSN,RoomNo)

M.P. Johnson, DBMS, Stern/NYU, Sp2004

- Exercise 3.4.2
- Relation relating particles in a box to locations and velocities
InPosition(id,x,y,z,vx,vy,vz)

- Q: What FDs hold?
- Q: What are the keys?

M.P. Johnson, DBMS, Stern/NYU, Sp2004

If some FDs are satisfied, thenothers are satisfied too

namecolor

categorydepartment

color, categoryprice

If all these FDs are true:

name, categoryprice

Then this FD also holds:

Why?

M.P. Johnson, DBMS, Stern/NYU, Sp2004

- Reasoning about FDs: given a set of FDs, infer other FDs – useful
- E.g. A B, B C A C
- Definitions: for FD-sets S and T
- T follows from S if all relation-instances satisfying S also satisfy T.
- S and T are equivalent if the sets of relation-instances satisfying S and T are the same.
- I.e., S and T are equivalent if S follows from T, and T follows from S.

M.P. Johnson, DBMS, Stern/NYU, Sp2004

A1

...

Am

B1

...

Bm

t1

t2

Splitting rule:

A1, A2, …, An B1

A1, A2, …, An B2

. . . . .

A1, A2, …, An Bm

A1A2…An B1B2…Bm

Note: doesn’t apply to the left side

Combining rule:

Q: Does it apply to the left side?

M.P. Johnson, DBMS, Stern/NYU, Sp2004

A1, A2, …, An Ai

with i in 1..n is a trivial FD

- FD A1A2…An B1B2…Bk may be
- Trivial: Bs are a subset of As
- Nontrivial: >=1 of the Bs is not among the As
- Completely nontrivial: none of the Bs is among the As

- Trivial elimination rule:
- Eliminate common attributes from Bs, to get an equivalent completely nontrivial FD

t

t’

M.P. Johnson, DBMS, Stern/NYU, Sp2004

If

A1, A2, …, An B1, B2, …, Bm

and

B1, B2, …, Bm C1, C2, …, Cp

A1, A2, …, An C1, C2, …, Cp

then

t

t’

M.P. Johnson, DBMS, Stern/NYU, Sp2004

A1, A2, …, An B

If

then

A1, A2, …, An , C B, C, for any C

t

t’

M.P. Johnson, DBMS, Stern/NYU, Sp2004

- A B AC B (by definition)
- Separation/Combination
- Reflexive
- Augmentation
- Transitivity
- Last 3 called Armstrong’s Axioms
- Complete: entire closure follows from these
- Sound: no other FDs follow from these

M.P. Johnson, DBMS, Stern/NYU, Sp2004

1. namecolor

2. categorydepartment

3. color, categoryprice

Start from the following FDs:

Infer the following FDs:

Reflexive rule

Transitivity(4,1)

Reflexive rule

combine(5,6) or Aug(1)

Transitivity(3,7)

M.P. Johnson, DBMS, Stern/NYU, Sp2004

Given a set of FDs, infer all possible FDs

How to proceed?

- Try all possible FDs, apply all rules
- E.g. R(A, B, C, D): how many FDs are possible?

- Drop trivial FDs, drop augmented FDs
- Still way too many

- Better: use the Closure Algorithm (next)

M.P. Johnson, DBMS, Stern/NYU, Sp2004

Given a set of attributes A1, …, An

The closure, {A1, …, An}+ = {B in Atts: A1, …, An B}

Example:

namecolor

categorydepartment

color, categoryprice

Closures: {name}+ = {name, color} {name, category}+ = {name, category, color, department, price} {color}+ = {color}

M.P. Johnson, DBMS, Stern/NYU, Sp2004

Start with X={A1, …, An}.

Repeat:

if B1, …, Bn C is a FD and

B1, …, Bn are all in X

then add C to X.

until X didn’t change

Example:

namecolor

categorydepartment

color, categoryprice

{name, category}+ = {name, category, color, department, price}

M.P. Johnson, DBMS, Stern/NYU, Sp2004

In class:

A, B C

A, D E

B D

A, F B

R(A,B,C,D,E,F)

Compute {A,B}+ X = {A, B, }

Compute {A, F}+ X = {A, F, }

M.P. Johnson, DBMS, Stern/NYU, Sp2004

- Given FDs versus Derived FDs
- Given FDs: stated initially for the relation
- Derived FDs: those that are inferred using the rules or through closure

- Basis: A set of FDs from which we can infer all other FDs for the relation
- Minimal basis: a minimal set of FDs (Can’t discard any of them)
- No proper subset of the minimal basis can derive the complete set of FDs

M.P. Johnson, DBMS, Stern/NYU, Sp2004

- Given a database instance
- Find all FD’s satisfied by that instance
- Useful if we don’t get enough information from our users: need to reverse engineer a data instance
- Q: How long does this take?
- A: Some time for each subset of atts
- Q: How many subsets?
- powerset

- exponential time in worst-case
- But can often be smarter…

M.P. Johnson, DBMS, Stern/NYU, Sp2004

Example:

A, B CA, D B

B D

Step 1: Compute X+, for every X:

A+ = A, B+ = BD, C+ = C, D+ = D

AB+ = ABCD, AC+ = AC, AD+ = ABCD

ABC+ = ABD+ = ACD+ = ABCD (no need to compute– why?)

BCD+ = BCD, ABCD+ = ABCD

Step 2: Enumerate all FD’s X Y, s.t. Y X+ and XY = :

AB CD, ADBC, ABC D, ABD C, ACD B

M.P. Johnson, DBMS, Stern/NYU, Sp2004

- R(A,B,C)
- Each of three determines other two
- Q: What are the FDs?
- Closure of singleton sets
- Closure of doubletons

- Q: What are the keys?
- Q: What are the minimal bases?

M.P. Johnson, DBMS, Stern/NYU, Sp2004

- Rephrasing of definition of key:
- X is a key if
- X+ = all attributes
- X is minimal
Note: there can be many minimal keys !

- Example: R(A,B,C), ABC, BCAMinimal keys: AB and BC

M.P. Johnson, DBMS, Stern/NYU, Sp2004

- Product(name, price, category, color)
name, category price

category color

Keys are: {name, category}

- Enrollment(student, address, course, room, time)
student address

room, time course

student, course room, time

Keys are: [in class]

M.P. Johnson, DBMS, Stern/NYU, Sp2004