# Database Systems Chapter 6 The Relational Algebra - PowerPoint PPT Presentation

1 / 61

Database Systems Chapter 6 The Relational Algebra. Relation Schema. A relation schema is used to describe a relation Denoted by R(A 1 , A 2 , A 3 , …, A n ), where R: Relation schema name A 1 , …, A n : attributes of R

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

Database Systems Chapter 6 The Relational Algebra

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

## Database SystemsChapter 6The Relational Algebra

### Relation Schema

• A relation schema is used to describe a relation

• Denoted by R(A1, A2, A3, …, An), where

• R: Relation schema name

• A1, …, An: attributes of R

• The degree of a relation is the number of attributes in a relation schema

• STUDENT(Name, SSN, HomePhone, Address, OfficePhone, Age, GPA)

• Degree(STUDENT) = 7

### Relation Instance (1)

• A relation (or relation instance) r of the relation schema R(A1, A2, …, An), denoted by r(R), is a set of n-tuples

• r = {t1, t2, …, tm)

• each n-tuple ti is an ordered list of n values,

• ti = <v1, v2, …, vn) where

• each value vi, 1  i  n, is an element of dom(Ai) or a special null value

### Basic Relational Algebra Operations

• Is a collection of operators that are used to manipulate entire relations. The result of each operation is a new relation which can be further manipulated.

### SELECT Operation

•  --- Select (sigma)

• Format: selection-condition(R)

• Semantics:

• returns all tuples of relation R that satisfy the selection-condition

• Select operation is unary. It applies to a single relation

• is used to select a subset of the tuples in a relation that satisfy a selection condition.

### Formats of Selection Conditions

• (a) A op v: A is an attribute, op is an operator (=, , <, , >, ), and v is a constant.

• Age  20, Name = `Bill'

• (b) A op B: A and B are two attributes in R.

• Persons(SSN, Name, Birthplace, Residence)

• Birthplace = Residence

• (c) Combinations of (a) and (b) connected by and, or or not.

• Age  20 and Birthplace = Residence

### An Example of SELECT (1)

• Example: Find all students who are 20 years old or younger, and whose birthplace is the same as his/her residence.

• Age  20 and Birthplace = Residence(Students)

### An Example of Select (2)

If the current Students is:

SSN Name Age GPA Birthplace Residence

123456789 John 20 3.2 Vestal Vestal

234567891 Mary 18 2.9 Binghamton Vestal

345678912 Bill 19 2.7 Endwell Endwell

456789123 Nancy 24 3.6 Binghamton NYC

then the result is a new relation:

SSN Name Age GPA Birthplace Residence

123456789 John 20 3.2 Vestal Vestal

345678912 Bill 19 2.7 Endwell Endwell

### SELECT Operation

• Commutativity of select:

• condition-1(condition-2(R))

= condition-2(condition-1(R))

= condition-1 and condition-2(R)

• city = “Irbid” AND GPA > 65 (STUDENT) or

• city = “Irbid” (GPA > 65 (STUDENT)

### PROJECT Operation

•  --- project (pi)

• Format:

• attribute-list(R),

• where attribute-list is a subset of all attributes in R

• Semantics:

• Returns all tuples of relation R but for each tuple, only values under attribute-list are returned

• Project removes duplicate tuples automatically

### Project (2)

• Selects certain columns from the table and discards other columns

•  <attribute-list> (<relation-name>)

• degree of resulting relation is equal to the number of attributes in the <attribute-list>

• The number of tuples of the result of project is less than or equal to the number of tuples in the original relation. (It removes the duplicates)

### Project (3)

Example: Find the name and GPA of all students.

Name,GPA(Students)

Students

SSNNameAgeGPAName GPA

123456789 John 20 3.2 John 3.2

234567891 Mary 18 2.9 Mary 2.9

345678912 John 19 3.2

Input Relation Output Relation

### Project (4)

• If attribute-list-1attribute-list-2,

then

• attribute-list-1(attribute-list-2(R))

= attribute-list-1(R)

• The Project operation is notcommutative

• Retrieve all student numbers and names who live in Amman.

• STNO, ST-Name(City = “Amman”(STUDENT))

### Project (5)

• In complex queries, it becomes necessary to store intermediate results, therefore we should know how to give names to relations and attributes

• Amman-students =city = “Amman” (STUDENT)

• Result =  STNO, ST-Name (Amman-Students)

• or

• Result(Number,Name) =  STNO, ST-Name (Amman-Students)

Renaming of attributes

### Select and Project

• Example:

• Find the name and GPA of all students who are 20 years oldor younger and whose birthplace and residence are the same

• Name, GPA(Age20 and Birthplace=Residence(Students))

### RENAME Operation

• --- rename(rho)

• Format: S(R)

• Semantics:

• Make a copy of relation R and name the copy as S

• S(R):

• Rename R only

• S(B1, B2, …, Bn)(R):

• Rename R and its attributes

• (B1, B2, …, Bn)(R):

• Rename attributes only

### Set Theoretic Operations

• UNION, INTERSECTION, DIFFERENCE.

• binary (applied to two relations at a time)

• To apply any of these operators to relations, relations should be union-compatible.

• Two relations R(A1, A2, …, An) and S(B1, B2, …, Bm) are said to be union-compatible if:

• they have the same degree (n = m) and

• dom(Ai) = dom(Bi) for 1  i n.

• Both R and S have the same number of attributes and the corresponding attributes have the same domain

### Union (1)

•  --- union

• Format:

• R1 R2

• Semantics:

• Returns all tuples that belong to either R1 or R2.

• Formally:

• R1 R2 = { t | t R1 or t  R2 }

• Condition of union:

• R1 and R2 must be union compatible.

• The union operator removes duplicate tuples automatically.

### Union (2)

Example:

R1 R2 R1  R2

A B C A B C A B C

a1 b1 c1 a0 b0 c0 a1 b1 c1

a2 b2 c2a1 b1 c1 a2 b2 c2

a3 b3 c3a2 b2 c2 a3 b3 c3

a4 b4 c4 a0 b0 c0

a4 b4 c4

### Set Difference (1)

• - set difference

• Format:

• R1 - R2

• Semantics:

• Returns all tuples that belong to R1 but not R2.

• Formally:

• R1 - R2 = { t | t  R1 and t R2 }

• Set difference also requires union compatibility between R1 and R2

### Set Difference (2)

Example:

R1 R2 R1 - R2

A B C A B C A B C

a1 b1 c1 a0 b0 c0 a3 b3 c3

a2 b2 c2a1 b1 c1

a3 b3 c3a2 b2 c2

a4 b4 c4

### INTERSECTION

• --- set intersection

• Format:

• R1  R2

• Semantics:

• Returns all tuples that belong to both R1 and R2.

• Formally:

• R1 R2 = { t | t R1 and t R2 }

• Derivation from existing operators:

• R1 R2 = R1 - (R1 - R2) = R2 - (R2 - R1)

### INTERSECTION

• Union and Intersection are commutative operations

• R  S = S  R and

• R  S = S  R

• Union and Intersection can be applied to any number of relations and both are associative:

• R  (S  Q) = (S  R)  Q

• R  (S  Q) = (S  R)  Q

• Difference operator is not commutative:

• R - S  S - R in general.

### Cartesian Product (1)

•  --- Cartesian product

• Format:

• R1  R2

• Semantics:

• Returns every tuple that can be formed by concatenating a tuple in R1 with a tuple in R2

• Binary operation, but the relations on which it is applied do not have to be union compatible

### Cartesian Product (2)

Example: R1 A B C R2 B D E

a1 b1 c1 b1 d1 e1

a2 b2 c2 b2 d2 e2

a3 b3 c3

R1 R2 AR1.BC R2.BD E

a1 b1 c1 b1 d1 e1

a1 b1 c1 b2 d2 e2

a2 b2 c2 b1 d1 e1

a2 b2 c2 b2 d2 e2

a3 b3 c3 b1 d1 e1

a3 b3 c3 b2 d2 e2

### Cartesian Product (3)

• If R1 and R2 have common attributes, then the full names of these attributes must be used

• Example:

• Use R.A instead of A

• To prevent identical attribute names from occurring in the same relation schema, R R is not allowed. However, R S(R) is allowed

• Commutativity: R1  R2 = R2 R1

### Cartesian Product (4)

• Given R(A1, A2, …, An) and S(B1, B2, …, Bm)

• RS = Q(A1, A2, …, An, B1, B2, …, Bm)

• degree of Q = n + m

• If R1 has N tuples and R2 has M tuples, then

• R1 R2 has N*M tuples

• Cartesian product is extremely expensive

• If R1 and R2 are both large, then each relation may need to be scanned many times to perform the Cartesian product.

• Writing out the result can be very expensive due to the large size of the result

### Example

Retrieve for each female employee a list of names of her dependents

FEMALE-EMPS=  Sex = “F” (EMPLOYEE)

EMP-NAMES = Fname,Lname,SSN(FEMALE-EMPS)

EMP-DEPENDENTS = EMP-NAMES  DEPENDENT

ACTUAL-DEP= SSN = ESSN (EMP-DEPENDENTS)

RESULT = Fname,Lname,Dependent-name(ACTUAL-DEP)

### Relational Algebra Example (1)

Example: Find the names of each employee and his/her manager

Employees: SSNNameAgeDept-Name

123456789 John 34 Sales

234567891 Mary 42 Service

345678912 Bill 39 null

Departments: Name Location Manager

Sales XYZ Bill

Inventory YZX Charles

Service ZXY Maria

### Relational Algebra Example (2)

• A relational algebra expression is:

Employees.Name, Departments.Manager

(Employees.Dept_Name = Departments.Name

(Employees  Departments))

• A simplified version (don't use full name when you don't have to):

Employees.Name, Manager

(Dept_Name =Departments.Name

(Employees Departments))

### Relational Algebra Example (3)

• Use assignment operator ( = ) to save the intermediate result into a temporary relation

• Example: The following expression

Employees.Name, Manager(Dept_Name = Departments.Name

(Employees  Departments) )

• is equivalent to the following series of expressions:

TEMP1= Employees Departments

TEMP2= Dept_Name=Departments.Name(TEMP1)

RESULT= Employees.Name, Manager (TEMP2)

### Relational Algebra Example (4)

Example: Find the names of all students who have the highest GPA

STUDENTS

SSN Name GPA

123456789 John 3.8

234567891 Maria 3.2

345678912 Mike 3.0

### Relational Algebra Example (5)

• Step 1: Find the GPAs that are not the highest

• TEMP1 = Students.GPA(Students.GPA <S2.GPA (Students S2(Students)))

Students S2(Students)

SSN Name GPA S2.SSN S2.Name S2.GPA

123456789 John 3.8 123456789 John 3.8

123456789 John 3.8 234567891 Maria 3.2

123456789 John 3.8 345678912 Mike 3.0

234567891 Maria 3.2 123456789 John 3.8

234567891 Maria 3.2 234567891 Maria 3.2

234567891 Maria 3.2 345678912 Mike 3.0

345678912 Mike 3.0 123456789 John 3.8

345678912 Mike 3.0 234567891 Maria 3.2

345678912 Mike 3.0 345678912 Mike 3.0

### Relational Algebra Example (6)

• Step 2: Find the highest GPA

• TEMP2 = GPA(Students) - TEMP1

• Step 3: Find the names of students who have the highest GPA

• RESULT = Name(Students.GPA = EMP2.GPA(StudentsTEMP2))

### Join (1)

• --- join

• Format:

• R1 join-condition R2

• Semantics:

• Returns all tuples in R1 R2 which satisfy the join condition

• Derivation from existing operators:

• R1 join-condition R2 = join-condition(R1 R2)

• Format of join condition:

• R1.A op R2.B

• R1.A1 op R2.B1and R1.A2 op R2.B2 . . .

• Tuples whose join attributes are NULL do not appear in the result

### Join (2)

Example: Find the names of all employees and theirdepartment locations

Employees: SSNNameAgeDept-Name

123456789 John 34 Sales

234567891 Mary 42 Service

345678912 Bill 39 null

Departments: NameLocationManager

Sales Binghamton Bill

Inventory Endicott Charles

Service Vestal Maria

### Join (3)

Employees.Name, Location(Employees

Dept-Name = Departments.Name Departments)

Result Name Location

John Binghamton

Mary Vestal

### Join (4)

Example: Find the names of all employees who earn more than his/her manager

Employees: SSNNameSalary Manager-SSN

123456789 John 34k 234567891

234567891 Bill 40k null

345678912 Mary 38k null

456789123 Mike 41k 345678912

Employees.Name (Employees Employees.Manager-SSN = EMP.SSN and Employees.Salary > EMP.Salary

EMP(Employees))

### Equijoin

• Definition: A join is called an equijoin if only equality operator is used in all join conditions.

R1 R2 R1 R1.B = R2.BR2

A B B C A R1.B R2.B C

a b b c a b b c

d b c d d b b c

b c a d b c c d

• Most joins in practice are equijoins.

### Natural Join (1)

• Definition:

• A join between R1 and R2 is a natural join if

• There is an equality comparison between every pair of identically named attributes from the two relations

• Among each pair of identically named attributes from the two relations, only one remains in the result

• Natural join is denoted by  with no join conditions explicitly specified

### Natural Join (2)

• Example:

• R1(A, B, C) R2(A, C, D)

• has attributes (A, B, C, D) in the result

• Questions:

• How to express natural join in terms of equijoin and other relational operator?

• R1  R2 = R1.A, B, R2.C, D

(R1R1.A=R2.Aand R1.C= R2.C R2)

### A Complete Set of Relational Algebra Operations

• The relational algebra is a set of expressions as defined below:

• A relation is an expression.

• If E1 and E2 are expressions, so are

P(E1), A(E1), S(E1), E1E2, E1-E2, E1E2

• That is, any expression that can be formed from base relations and the six relational operators is a relational algebra expression

### Aggregate Functions & Grouping

• Well know Aggregate Functions:

• SUM, AVERAGE, MAXIMUM, MINIMUM, COUNT

• Format:  (Script F)

• <grouping attributes>  <function list> (R)

• <grouping attributes> is a list of attributes in R

• <function list> is a list of (<function> <attribute>) pairs

• <function> is one of aggregate functions

• <attribute> is an attribute in R

### Examples

• Retrieve each department number, the number of employees in the department, and their average salary:

• DNOCOUNT( SSN), AVERAGE(SALARY) (EMPLOYEE)

DNOCOUNT_SSNAVERAGE_SALARY

1155000

4331000

5433250

### Examples

• Retrieve for each department number, the number of employees in the department, and their average salary:

R(DNO, NO_OF_EMP, AVERAGE_SAL)(DNO  COUNT( SSN),

AVERAGE(SALARY) (EMPLOYEE))

DNO NO_OF_EMP AVERAGE_SAL

1155000

4331000

5433250

### Examples

• If no grouping attributes are specified, the functions are applied to attribute values of all the tuples in the relation

• Retrieve the number of employees, and their average salary:

• COUNT( SSN), AVERAGE(SALARY) (EMPLOYEE))

COUNT_SSN AVERAGE_SALARY

8 35125

### Outerjoin (1)

R1 R2 R1  R2

A B C C D E A B C D E

a1 b1 c1c1 d1 e1 a1 b1 c1 d1 e1

a4 b3 c2 c6 d3 e2

• The second tuples of R1 and R2 are not present in the result (called dangling tuples)

• Applications exist that require to retain dangling tuples

### Outerjoin (2)

O --- outer join

Format: R1 O R2

Semantics: like join except

• it retains dangling tuples from both R1 and R2

• it uses null to fill out missing entries

R1 O R2

A B C D E

a1 b1 c1 d1 e1

a4 b3 c2 null null

null null c6 d3 e2

### Left Outerjoin and Right Outerjoin

• LO --- left outer join

• Format: R1 LO R2

• Semantics: like outerjoin but retains only dangling tuples of the relation on the left

• RO --- right outer join

• Format: R1 RO R2

• Semantics: like outerjoin but retains only dangling tuples of the relation on the right

### Left Outerjoin and Right Outerjoin (2)

R1 R2 R1  R2

A B C C D E A B C D E

a1 b1 c1c1 d1 e1 a1 b1 c1 d1 e1

a4 b3 c2 c6 d3 e2

### Left Outerjoin and Right Outerjoin (3)

R1 LO R2

A B C D E

a1 b1 c1 d1 e1

a4 b3 c2 null null

R1 RO R2

A B C D E

a1 b1 c1 d1 e1

null null c6 d3 e2

### Examples of Queries in Relational Algebra (1)

• Many relational algebra queries can be expressed using selection, projection and join operators by following steps:

• (1) Determine necessary relations to answer the query.

• If R1, ..., Rn are all the relations needed,

• P are all conditions and

• T are all (target) attributes to be output,

• then form the initial query:

• T(P(R1...  Rn))

### Relational Algebra Example (2)

• (2) If P contains a condition, say Ci, that involves only attributes in Ri

• replace Ri by Ci(Ri) and remove Ci from P

• (3) If P contains a condition, say C, that involves attributes from both Ri and Rj

• replace Ri Rj by RiC Rj (or a natural join) and remove C from P

### Relational Algebra Example (3)

• Consider the following database schema:

• Students(SSN, Name, GPA, Age, Dept-Name)

• Courses(Course#, Title, Dept-Name)

• Departments(Name, Location, Phone)

### Relational Algebra Example (5)

• Query: Find the SSNs and names of all students who are CS major and who take CIS328

• (1) Relations Students and Enrollment are needed

• T = {Students.SSN, Students.Name}

• P = {Students.Dept-Name = ‘CS’, Enrollment.Course# = ‘CIS328’, Students.SSN = Enrollment.SSN}

• The initial relational algebra query is:

• Students.SSN, Name

(Dept-Name=`CS’ and Course#=`CIS328’ and

Students.SSN= Enrollment.SSN

(Students  Enrollment))

### Relational Algebra Example (6)

• (2) Replace

• Students by

Dept-Name = `CS’(Students) and

• Enrollment by

Course#=`CIS328’(Enrollment)

• Remove the two conditions from the initial expression

### Relational Algebra Example (7)

• (3) Replace

• Students Enrollment by

StudentsEnrollment and

• remove Students.SSN = Enrollment.SSN from the initial expression.

• The final expression:

• Students.SSN,Name(Dept-Name = `CS’(Students) 

Course# = `CIS328’ (Enrollment))

### Relational Algebra Example (8)

• Query: Find the SSN and name of each student who is CS major together with the titles of the courses taken by the student

• Students.SSN, Name, Title

(Students.Dept-Name = `CS’ and Students.SSN = Enrollment.SSN and Enrollment.Course# = Courses.Course#

(StudentsEnrollment Courses))

### Relational Algebra Example (9)

• = Students.SSN, Name, Title

(((Students.Dept-Name = `CS’(Students))  Enrollment)  Courses)

• = Students.SSN, Name, Title ((Students.Dept-Name = `CS’ (Students))  (Enrollment  Courses))

### Relational Algebra Summary (1)

Relational algebra operators:

• Fundamental operators:

C(R), A(R), S(R), R1  R2,

R1 - R2, R1 R2

R1  R2, R1 C R2, R1 R2

### Relational Algebra Summary (2)

Some identities:

• C1(C2(R)) = C2(C1(R)) = C1 andC2(R)

• L1(L2(R)) = L1(R) , if L1  L2

• R1 R2 = R2  R1

• R1  (R2  R3) = (R1 R2)  R3

• R1  R2 = R2  R1

• R1  (R2  R3) = (R1  R2)  R3