Lecture 4: Relational algebra

1 / 34

Lecture 4: Relational algebra - PowerPoint PPT Presentation

Lecture 4: Relational algebra. www.cl.cam.ac.uk/Teaching/current/Databases/. Today’s lecture. What’s the (core) relational algebra? How can we write queries using the relational algebra? How powerful is the relational algebra?. Relational query languages.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

PowerPoint Slideshow about 'Lecture 4: Relational algebra' - montrell-seamus

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

Lecture 4:Relational algebra

www.cl.cam.ac.uk/Teaching/current/Databases/

Today’s lecture
• What’s the (core) relational algebra?
• How can we write queries using the relational algebra?
• How powerful is the relational algebra?
Relational query languages
• Query languages allow the manipulation and retrieval of data from a database
• The relational model supports simple, powerful query languages
• Strong formal foundation
• Allows for much (provably correct) optimisation
• NOTE: Query languages are not (necessarily) programming languages
Formal relational query languages
• Two formal query languages
• Relational Algebra
• Simple ‘operational’ model, useful for expressing execution plans
• RelationalCalculus
• Logical model (‘declarative’), useful for theoretical results
• Both languages were introduced by Codd in a series of papers
• They have equivalent expressive power

They are the key to understanding SQL query processing!

Preliminaries
• A query is applied to relation instances, and the result of a query is also a relation instance
• Schema of relations are fixed (cf. types)
• The query will then execute over any valid instance
• The schema of the result can also be determined
Example relation instances
• A database of boats, sailors, and reservations

S2

R1

S1

B1

Core relational algebra
• Five basic operator classes:
• Selection
• Selects a subset of rows
• Projection
• Picking certain columns
• Renaming
• Renaming attributes
• Set theoretic operations
• The familiar operations: union, intersection, difference, …
• Products and joins
• Combining relations in useful ways
Selection
• Selects rows that satisfy a condition, written

R1 = c(R2)

• where c is a condition involving the attributes of R2, e.g.

rating>8(S2)

returns the relation instance

Selection cont.
• Note:
• The schema of the result is exactly the same as the schema of the input relation instance
• There are no duplicates in the resulting relation instance (why?)
• The resulting relation instance can be used as the input for another relational algebra operator, e.g.

sname=“Julia”(rating>8(S2))

Projection

Deletes fields that are not in the

projection list

R1=A(R2)

where A is a list of attributes from the

schema of R2, e.g.

sname,rating(S2)

returns the relation instance

Projection cont.
• Note:
• Projection operator has to eliminate duplicates (why?)
• Aside: Real systems don’t normally perform duplicate elimination unless the user explicitly asks for it (why not?)
Renaming

R1= A:=B(R2)

• Returns a relation instance identical to R2 except that field A is renamed B
• For example, sname:=nom(S1)
Familiar set operations
• We have the familiar set-theoretic operators, e.g. , , -
• There is a restriction on their input relation instances: they must be union compatible
• Same number of fields
• Same field names and domains
• E.g. S1S2 is valid, but S1R1 is not
Cartesian products

AB

• Concatenate every row of A with every row of B
• What do we do if A and B have some field names in common?
• Several choices, but we’ll simply assume that the resulting duplicate field names will have the suffix 1 and 2
Theta join
• Theoretically, it is a derived operator

R1Vc R2@c(R1R2)

• E.g., S1 Vsid.1<=sid.2R1
Theta join cont.
• The result schema is the same as for a cross-product
• Sometimes this operator is called a conditional join
• Most commonly the condition is an equality on field names, e.g. S1 Vsid.1=sid.2R1
Equi- and natural join
• Equi-join is a special case of theta join where the condition is equality of field names, e.g. S1 Vsid R1
• Natural join is an equi-join on all common fields where the duplicate fields are removed. It is written simply A V B
Natural join cont.
• Note that the common fields appear only once in the resulting relation instance
• This operator appears very frequently in real-life queries
• It is always implemented directly by the query engine (why?)
Division
• Not a primitive operator, but useful to express queries such as

Find sailors who have reserved all the boats

• Consider the simple case, where relation A has fields x and y, and relation B has field y
• A/B is the set of xs (sailors) such that for every y (boat) in B, there is a row (x,y) in A
Division cont.
• Can you code this up in the relational algebra?
Division cont.
• Can you code this up in the relational algebra?

x’s that are disqualified:

x((x(A)  B) – A)

Thus:

x(A)-x((x(A)  B) – A)

Example 1

Find names of sailors who’ve reserved boat 103

Solution 1:sname(bid=103(Reserves) V Sailors)

Solution 2: sname(bid=103(Reserves V Sailors))

Which is more efficient?

Query

optimisation

Example 2

Find names of sailors who’ve reserved a red boat

Example 2

Find names of sailors who’ve reserved a red boat

sname(colour=“red”(Boats) V Reserves V Sailors)

Better:

sname(sid(bid(colour=“red”(Boats)) V Reserves) V Sailors)

Example 3

Find sailors who’ve reserved a red or a green boat

Example 3

Find sailors who’ve reserved a red or a green boat

let T = colour=“red”colour=“green”(Boats)

in

sname(T V Reserves V Sailors)

Example 4

Find sailors who’ve reserved a red and a green boat

Example 4

Find sailors who’ve reserved a red and a green boat

NOTE: Can’t just trivially modify last solution!

let T1 = sid (colour=“red”(Boats) V Reserves)

T2 = sid (colour=“green”(Boats) V Reserves)

in

sname((T1  T2) V Sailors)

Example 5

Find the names of sailors who’ve reserved at least two boats

let T = sid.1:=sid (sid.1,sname,bid (Sailors V Reserves))

in

sname.1 (sid.1=sid.2bid.1bid.2(T  T))

Example 6

Find the names of sailors who’ve reserved all boats

let T = sid,bid (Reserves) / bid (Boats)

in

sname(T V Sailors)

Computational limitations
• Suppose we have a relation SequelOf of movies and their immediate sequels
• We want to compute the relation ‘isFollowedBy’ …
Computational limitations
• We could compute

fst,thd(movie:=fst,sequel:=snd(SequelOf)

V movie:=snd,sequel:=thd(SequelOf))

• This provides us with sequels-of-sequels
• We could write three joins to get sequels-of- sequels-of-sequels and union the results
• What about Friday the 13th (9 sequels)? 
• In general we need to be able to write an arbitrarily large union…
• The relational algebra needs to be extended to handle these sorts of queries
Summary

You should now understand:

• The core relational algebra
• Operations and semantics
• Union compatibility
• Computational limitations of the relational algebra

Next lecture: Relational calculus