Lecture 11: Functional Dependencies

1 / 44

# Lecture 11: Functional Dependencies - PowerPoint PPT Presentation

Lecture 11: Functional Dependencies. January 31 st , 2003. Outline. Relational decomposition Normal forms Begin relational algebra . Relational Schema Design. Recall set attributes (persons with several phones):. SSN  Name, City , but not SSN  PhoneNumber. Anomalies:

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about 'Lecture 11: Functional Dependencies' - novia

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

### Lecture 11:Functional Dependencies

January 31st, 2003

Outline
• Relational decomposition
• Normal forms
• Begin relational algebra
Relational Schema Design

Recall set attributes (persons with several phones):

SSN Name, City, but not SSN  PhoneNumber

• Anomalies:
• Redundancy = repeat data
• Update anomalies = Fred moves to “Bellvue”
• Deletion anomalies = Fred drops all phone numbers: what is his city ?
Relation Decomposition

Break the relation into two:

name

Person

Product

price

name

ssn

Relational Schema Design

Conceptual Model:

Relational Model:

plus FD’s

Normalization:

Eliminates anomalies

Decompositions in General

R(A1, ..., An)

Create two relations R1(B1, ..., Bm) and R2(C1, ..., Cp)

such that: B1, ..., Bm C1, ..., Cp= A1, ..., An

and:

R1 = projection of R on B1, ..., Bm

R2 = projection of R on C1, ..., Cp

Incorrect Decomposition
• Sometimes it is incorrect:

Decompose on : Name, Category and Price, Category

Incorrect Decomposition

When we put it back:

Cannot recover information

Normal Forms

First Normal Form = all attributes are atomic

Second Normal Form (2NF) = old and obsolete

Third Normal Form (3NF) = this lecture

Boyce Codd Normal Form (BCNF) = this lecture

Others...

Boyce-Codd Normal Form

A simple condition for removing anomalies from relations:

A relation R is in BCNF if:

Whenever there is a nontrivial dependency A1, ..., An B

in R , {A1, ..., An} is a key for R

In English (though a bit vague):

Whenever a set of attributes of R is determining another attribute,

should determine all the attributes of R.

Example

What are the dependencies?

SSN  Name, City

What are the keys?

{SSN, PhoneNumber}

Is it in BCNF?

Decompose it into BCNF

SSN  Name, City

Summary of BCNF Decomposition

Find a dependency that violates the BCNF condition:

A , A , … A

B , B , … B

1

2

m

1

2

n

Heuristics: choose B , B , … B “as large as possible”

1

2

m

Decompose:

Continue until

there are no

BCNF violations

left.

Others

A’s

B’s

Is there a

2-attribute

relation that is

not in BCNF ?

R1

R2

Example Decomposition

Person(name, SSN, age, hairColor, phoneNumber)

SSN  name, age

age  hairColor

Decompose in BCNF (in class):

Step 1: find all keys

Step 2: now decompose

Other Example
• R(A,B,C,D) A B, B C
• Keys:
• Violations of BCNF:
Correct Decompositions

A decomposition is lossless if we can recover:

R(A,B,C)

R1(A,B) R2(A,C)

R’(A,B,C) should be the same as R(A,B,C)

Decompose

Recover

R’ is in general larger than R. Must ensure R’ = R

Correct Decompositions
• Given R(A,B,C) s.t. AB, the decomposition into R1(A,B), R2(A,C) is lossless
3NF: A Problem with BCNF

Unit Company Product

FD’s: Unit  Company; Company, Product  Unit

So, there is a BCNF violation, and we decompose.

Unit Company

Unit  Company

Unit Product

No FDs

So What’s the Problem?

Unit Company

Unit Product

Galaga99 UW Galaga99 databases

Bingo UW Bingo databases

No problem so far. All local FD’s are satisfied.

Let’s put all the data back into a single table again:

Unit Company Product

Galaga99 UW databases

Bingo UW databases

Violates the dependency: company, product -> unit!

Solution: 3rd Normal Form (3NF)

A simple condition for removing anomalies from relations:

A relation R is in 3rd normal form if :

Whenever there is a nontrivial dependency A1, A2, ..., An Bfor R , then {A1, A2, ..., An } a super-key for R,

or B is part of a key.

Relational Algebra
• Formalism for creating new relations from existing ones
• Its place in the big picture:

Declartivequerylanguage

Algebra

Implementation

Relational algebraRelational bag algebra

SQL,relational calculus

Relational Algebra
• Five operators:
• Union: 
• Difference: -
• Selection: s
• Projection: P
• Cartesian Product: 
• Derived or auxiliary operators:
• Intersection, complement
• Joins (natural,equi-join, theta join, semi-join)
• Renaming: r
1. Union and 2. Difference
• R1  R2
• Example:
• ActiveEmployees  RetiredEmployees
• R1 – R2
• Example:
• AllEmployees -- RetiredEmployees
• It is a derived operator
• R1  R2 = R1 – (R1 – R2)
• Also expressed as a join (will see later)
• Example
• UnionizedEmployees  RetiredEmployees
3. Selection
• Returns all tuples which satisfy a condition
• Notation: sc(R)
• Examples
• sSalary > 40000(Employee)
• sname = “Smithh”(Employee)
• The condition c can be =, <, , >,, <>
4. Projection
• Eliminates columns, then removes duplicates
• Notation: P A1,…,An(R)
• Example: project social-security number and names:
• PSSN, Name (Employee)
5. Cartesian Product
• Each tuple in R1 with each tuple in R2
• Notation: R1  R2
• Example:
• Employee  Dependents
• Very rare in practice; mainly used to express joins
Relational Algebra
• Five operators:
• Union: 
• Difference: -
• Selection: s
• Projection: P
• Cartesian Product: 
• Derived or auxiliary operators:
• Intersection, complement
• Joins (natural,equi-join, theta join, semi-join)
• Renaming: r
Renaming
• Changes the schema, not the instance
• Notation: rB1,…,Bn (R)
• Example:
• rLastName, SocSocNo (Employee)
Renaming Example

Employee

Name

SSN

John

999999999

Tony

777777777

• LastName, SocSocNo (Employee)

LastName

SocSocNo

John

999999999

Tony

777777777

Natural Join
• Notation: R1 ⋈ R2
• Meaning: R1 ⋈ R2 = PA(sC(R1  R2))
• Where:
• The selection sC checks equality of all common attributes
• The projection eliminates the duplicate common attributes

Employee Dependents =

PName, SSN, Dname(sSSN=SSN2(Employee x rSSN2, Dname(Dependents))

Natural Join Example

Employee

Name

SSN

John

999999999

Tony

777777777

Dependents

SSN

Dname

999999999

Emily

777777777

Joe

Name

SSN

Dname

John

999999999

Emily

Tony

777777777

Joe

Natural Join
• R= S=
• R ⋈ S=
Natural Join
• Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S ?
• Given R(A, B, C), S(D, E), what is R ⋈ S ?
• Given R(A, B), S(A, B), what is R ⋈ S ?
Theta Join
• A join that involves a predicate
• R1 ⋈q R2 = sq (R1  R2)
• Here q can be any condition
Eq-join
• A theta join where q is an equality
• R1 ⋈A=B R2 = s A=B (R1  R2)
• Example:
• Employee ⋈SSN=SSN Dependents
• Most useful join in practice
Semijoin
• R ⋉ S = PA1,…,An (R ⋈ S)
• Where A1, …, An are the attributes in R
• Example:
• Employee ⋉ Dependents
Semijoins in Distributed Databases
• Semijoins are used in distributed databases

Dependents

Employee

network

Employee ⋈ssn=ssn (s age>71 (Dependents))

T = PSSNs age>71 (Dependents)

R = Employee ⋉ T

seller-ssn=ssn

pid=pid

Complex RA Expressions

Pname

Person Purchase Person Product

Pssn

Ppid

sname=fred

sname=gizmo

Operations on Bags

A bag = a set with repeated elements

All operations need to be defined carefully on bags

• {a,b,b,c}{a,b,b,b,e,f,f}={a,a,b,b,b,b,b,c,e,f,f}
• {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b,d}
• sC(R): preserve the number of occurrences
• PA(R): no duplicate elimination
• Cartesian product, join: no duplicate elimination

Important ! Relational Engines work on bags, not sets !