Lecture 13: Relational Decomposition and Relational Algebra

1 / 36

# Lecture 13: Relational Decomposition and Relational Algebra - PowerPoint PPT Presentation

Lecture 13: Relational Decomposition and Relational Algebra. February 5 th , 2003. Summary of Previous Discussion. FD’s are given as part of the schema. You derived keys from them. You can also specify keys directly (they’re just another FD)

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## Lecture 13: Relational Decomposition and Relational Algebra

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

### Lecture 13:Relational Decomposition and Relational Algebra

February 5th, 2003

Summary of Previous Discussion
• FD’s are given as part of the schema. You derived keys from them. You can also specify keys directly (they’re just another FD)
• First, compute the keys and the FD’s that hold on the relation you’re considering.
• Then, look for violations of BCNF. There may be a few. Choose one.
Summary (continued)
• You may need to decompose the decomposed relations:
• To decide that, you need to project the FD’s on the decomposed relations.
• The end result may differ, depending on which violation you chose to decompose by. They’re all correct.
• Relations with 2 attributes are in BCNF.
Summary (continued –2)
• If you have only a single FD representing a key, then the relation is in BCNF.
• But, you can still have another FD that forces you to decompose.
• This is all really pretty simple if you think about it long enough.
Boyce-Codd Normal Form

A simple condition for removing anomalies from relations:

A relation R is in BCNF if:

Whenever there is a nontrivial dependency A1, ..., An B

in R , {A1, ..., An} is a key for R

In English (though a bit vague):

Whenever a set of attributes of R is determining another attribute,

should determine all the attributes of R.

Example Decomposition

Person(name, SSN, age, hairColor, phoneNumber)

SSN  name, age, hairColor

age  hairColor

Decompose in BCNF (in class):

Step 1: find all keys ssn, phoneNumber

SSN, name, age, hairColor

SSN, phoneNumber

Step 2: now decompose

Other Example
• R(A,B,C,D) A B, B C
• Keys:
• Violations of BCNF:
Correct Decompositions

A decomposition is lossless if we can recover:

R(A,B,C)

R1(A,B) R2(A,C)

R’(A,B,C) should be the same as R(A,B,C)

Decompose

Recover

R’ is in general larger than R. Must ensure R’ = R

Correct Decompositions
• Given R(A,B,C) s.t. AB, the decomposition into R1(A,B), R2(A,C) is lossless
3NF: A Problem with BCNF

Unit Company Product

FD’s: Unit  Company; Company, Product  Unit

So, there is a BCNF violation, and we decompose.

Unit Company

Unit  Company

Unit Product

No FDs

So What’s the Problem?

Unit Company

Unit Product

Galaga99 UW Galaga99 databases

Bingo UW Bingo databases

No problem so far. All local FD’s are satisfied.

Let’s put all the data back into a single table again:

Unit Company Product

Galaga99 UW databases

Bingo UW databases

Violates the dependency: company, product -> unit!

Solution: 3rd Normal Form (3NF)

A simple condition for removing anomalies from relations:

A relation R is in 3rd normal form if :

Whenever there is a nontrivial dependency A1, A2, ..., An Bfor R , then {A1, A2, ..., An } a super-key for R,

or B is part of a key.

Relational Algebra
• Formalism for creating new relations from existing ones
• Its place in the big picture:

Declartivequerylanguage

Algebra

Implementation

Relational algebraRelational bag algebra

SQL,relational calculus

Relational Algebra
• Five operators:
• Union: 
• Difference: -
• Selection: s
• Projection: P
• Cartesian Product: 
• Derived or auxiliary operators:
• Intersection, complement
• Joins (natural,equi-join, theta join, semi-join)
• Renaming: r
1. Union and 2. Difference
• R1  R2
• Example:
• ActiveEmployees  RetiredEmployees
• R1 – R2
• Example:
• AllEmployees -- RetiredEmployees
• It is a derived operator
• R1  R2 = R1 – (R1 – R2)
• Also expressed as a join (will see later)
• Example
• UnionizedEmployees  RetiredEmployees
3. Selection
• Returns all tuples which satisfy a condition
• Notation: sc(R)
• Examples
• sSalary > 40000(Employee)
• sname = “Smithh”(Employee)
• The condition c can be =, <, , >,, <>
4. Projection
• Eliminates columns, then removes duplicates
• Notation: P A1,…,An(R)
• Example: project social-security number and names:
• PSSN, Name (Employee)
5. Cartesian Product
• Each tuple in R1 with each tuple in R2
• Notation: R1  R2
• Example:
• Employee  Dependents
• Very rare in practice; mainly used to express joins
Relational Algebra
• Five operators:
• Union: 
• Difference: -
• Selection: s
• Projection: P
• Cartesian Product: 
• Derived or auxiliary operators:
• Intersection, complement
• Joins (natural,equi-join, theta join, semi-join)
• Renaming: r
Renaming
• Changes the schema, not the instance
• Notation: rB1,…,Bn (R)
• Example:
• rLastName, SocSocNo (Employee)
Renaming Example

Employee

Name

SSN

John

999999999

Tony

777777777

• LastName, SocSocNo (Employee)

LastName

SocSocNo

John

999999999

Tony

777777777

Natural Join
• Notation: R1 ⋈ R2
• Meaning: R1 ⋈ R2 = PA(sC(R1  R2))
• Where:
• The selection sC checks equality of all common attributes
• The projection eliminates the duplicate common attributes

Employee Dependents =

PName, SSN, Dname(sSSN=SSN2(Employee x rSSN2, Dname(Dependents))

Natural Join Example

Employee

Name

SSN

John

999999999

Tony

777777777

Dependents

SSN

Dname

999999999

Emily

777777777

Joe

Name

SSN

Dname

John

999999999

Emily

Tony

777777777

Joe

Natural Join
• R= S=
• R ⋈ S=
Natural Join
• Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S ?
• Given R(A, B, C), S(D, E), what is R ⋈ S ?
• Given R(A, B), S(A, B), what is R ⋈ S ?
Theta Join
• A join that involves a predicate
• R1 ⋈q R2 = sq (R1  R2)
• Here q can be any condition
Eq-join
• A theta join where q is an equality
• R1 ⋈A=B R2 = s A=B (R1  R2)
• Example:
• Employee ⋈SSN=SSN Dependents
• Most useful join in practice
Semijoin
• R ⋉ S = PA1,…,An (R ⋈ S)
• Where A1, …, An are the attributes in R
• Example:
• Employee ⋉ Dependents
Semijoins in Distributed Databases
• Semijoins are used in distributed databases

Dependents

Employee

network

Employee ⋈ssn=ssn (s age>71 (Dependents))

T = PSSNs age>71 (Dependents)

R = Employee ⋉ T

seller-ssn=ssn

pid=pid

Complex RA Expressions

Pname

Person Purchase Person Product

Pssn

Ppid

sname=fred

sname=gizmo

Operations on Bags

A bag = a set with repeated elements

All operations need to be defined carefully on bags

• {a,b,b,c}{a,b,b,b,e,f,f}={a,a,b,b,b,b,b,c,e,f,f}
• {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b,d}
• sC(R): preserve the number of occurrences
• PA(R): no duplicate elimination
• Cartesian product, join: no duplicate elimination

Important ! Relational Engines work on bags, not sets !