- 44 Views
- Uploaded on
- Presentation posted in: General

Database Systems Chapter 6 The Relational Algebra

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Database SystemsChapter 6The Relational Algebra

- A relation schema is used to describe a relation
- Denoted by R(A1, A2, A3, …, An), where
- R: Relation schema name
- A1, …, An: attributes of R

- The degree of a relation is the number of attributes in a relation schema
- STUDENT(Name, SSN, HomePhone, Address, OfficePhone, Age, GPA)
- Degree(STUDENT) = 7

- A relation (or relation instance) r of the relation schema R(A1, A2, …, An), denoted by r(R), is a set of n-tuples
- r = {t1, t2, …, tm)

- each n-tuple ti is an ordered list of n values,
- ti = <v1, v2, …, vn) where
- each value vi, 1 i n, is an element of dom(Ai) or a special null value

- Is a collection of operators that are used to manipulate entire relations. The result of each operation is a new relation which can be further manipulated.

- --- Select (sigma)
- Format: selection-condition(R)
- Semantics:
- returns all tuples of relation R that satisfy the selection-condition

- Select operation is unary. It applies to a single relation
- is used to select a subset of the tuples in a relation that satisfy a selection condition.

- (a) A op v: A is an attribute, op is an operator (=, , <, , >, ), and v is a constant.
- Age 20, Name = `Bill'

- (b) A op B: A and B are two attributes in R.
- Persons(SSN, Name, Birthplace, Residence)
- Birthplace = Residence

- (c) Combinations of (a) and (b) connected by and, or or not.
- Age 20 and Birthplace = Residence

- Example: Find all students who are 20 years old or younger, and whose birthplace is the same as his/her residence.
- Age 20 and Birthplace = Residence(Students)

If the current Students is:

SSN Name Age GPA Birthplace Residence

123456789 John 20 3.2 Vestal Vestal

234567891 Mary 18 2.9 Binghamton Vestal

345678912 Bill 19 2.7 Endwell Endwell

456789123 Nancy 24 3.6 Binghamton NYC

then the result is a new relation:

SSN Name Age GPA Birthplace Residence

123456789 John 20 3.2 Vestal Vestal

345678912 Bill 19 2.7 Endwell Endwell

- Commutativity of select:
- condition-1(condition-2(R))
= condition-2(condition-1(R))

= condition-1 and condition-2(R)

- condition-1(condition-2(R))
- city = “Irbid” AND GPA > 65 (STUDENT) or
- city = “Irbid” (GPA > 65 (STUDENT)

- --- project (pi)
- Format:
- attribute-list(R),
- where attribute-list is a subset of all attributes in R

- Semantics:
- Returns all tuples of relation R but for each tuple, only values under attribute-list are returned

- Format:
- Project removes duplicate tuples automatically

- Selects certain columns from the table and discards other columns
- <attribute-list> (<relation-name>)

- degree of resulting relation is equal to the number of attributes in the <attribute-list>
- The number of tuples of the result of project is less than or equal to the number of tuples in the original relation. (It removes the duplicates)

Example: Find the name and GPA of all students.

Name,GPA(Students)

Students

SSNNameAgeGPAName GPA

123456789 John 20 3.2 John 3.2

234567891 Mary 18 2.9 Mary 2.9

345678912 John 19 3.2

Input Relation Output Relation

- If attribute-list-1attribute-list-2,
then

- attribute-list-1(attribute-list-2(R))
= attribute-list-1(R)

- attribute-list-1(attribute-list-2(R))
- The Project operation is notcommutative
- Retrieve all student numbers and names who live in Amman.
- STNO, ST-Name(City = “Amman”(STUDENT))

- In complex queries, it becomes necessary to store intermediate results, therefore we should know how to give names to relations and attributes
- Amman-students =city = “Amman” (STUDENT)
- Result = STNO, ST-Name (Amman-Students)
- or
- Result(Number,Name) = STNO, ST-Name (Amman-Students)

Renaming of attributes

- Example:
- Find the name and GPA of all students who are 20 years oldor younger and whose birthplace and residence are the same
- Name, GPA(Age20 and Birthplace=Residence(Students))

- --- rename(rho)
- Format: S(R)
- Semantics:
- Make a copy of relation R and name the copy as S

- S(R):
- Rename R only

- S(B1, B2, …, Bn)(R):
- Rename R and its attributes

- (B1, B2, …, Bn)(R):
- Rename attributes only

- UNION, INTERSECTION, DIFFERENCE.
- binary (applied to two relations at a time)

- To apply any of these operators to relations, relations should be union-compatible.
- Two relations R(A1, A2, …, An) and S(B1, B2, …, Bm) are said to be union-compatible if:
- they have the same degree (n = m) and
- dom(Ai) = dom(Bi) for 1 i n.

- Both R and S have the same number of attributes and the corresponding attributes have the same domain

- --- union
- Format:
- R1 R2

- Semantics:
- Returns all tuples that belong to either R1 or R2.

- Formally:
- R1 R2 = { t | t R1 or t R2 }

- Format:
- Condition of union:
- R1 and R2 must be union compatible.

- The union operator removes duplicate tuples automatically.

Example:

R1 R2 R1 R2

A B C A B C A B C

a1 b1 c1 a0 b0 c0 a1 b1 c1

a2 b2 c2a1 b1 c1 a2 b2 c2

a3 b3 c3a2 b2 c2 a3 b3 c3

a4 b4 c4 a0 b0 c0

a4 b4 c4

- - set difference
- Format:
- R1 - R2

- Semantics:
- Returns all tuples that belong to R1 but not R2.

- Formally:
- R1 - R2 = { t | t R1 and t R2 }

- Format:
- Set difference also requires union compatibility between R1 and R2

Example:

R1 R2 R1 - R2

A B C A B C A B C

a1 b1 c1 a0 b0 c0 a3 b3 c3

a2 b2 c2a1 b1 c1

a3 b3 c3a2 b2 c2

a4 b4 c4

- --- set intersection
- Format:
- R1 R2

- Semantics:
- Returns all tuples that belong to both R1 and R2.

- Formally:
- R1 R2 = { t | t R1 and t R2 }

- Format:
- Derivation from existing operators:
- R1 R2 = R1 - (R1 - R2) = R2 - (R2 - R1)

- Union and Intersection are commutative operations
- R S = S R and
- R S = S R

- Union and Intersection can be applied to any number of relations and both are associative:
- R (S Q) = (S R) Q
- R (S Q) = (S R) Q

- Difference operator is not commutative:
- R - S S - R in general.

- --- Cartesian product
- Format:
- R1 R2

- Semantics:
- Returns every tuple that can be formed by concatenating a tuple in R1 with a tuple in R2

- Format:
- Binary operation, but the relations on which it is applied do not have to be union compatible

Example: R1 A B C R2 B D E

a1 b1 c1 b1 d1 e1

a2 b2 c2 b2 d2 e2

a3 b3 c3

R1 R2 AR1.BC R2.BD E

a1 b1 c1 b1 d1 e1

a1 b1 c1 b2 d2 e2

a2 b2 c2 b1 d1 e1

a2 b2 c2 b2 d2 e2

a3 b3 c3 b1 d1 e1

a3 b3 c3 b2 d2 e2

- If R1 and R2 have common attributes, then the full names of these attributes must be used
- Example:
- Use R.A instead of A

- To prevent identical attribute names from occurring in the same relation schema, R R is not allowed. However, R S(R) is allowed
- Commutativity: R1 R2 = R2 R1

- Given R(A1, A2, …, An) and S(B1, B2, …, Bm)
- RS = Q(A1, A2, …, An, B1, B2, …, Bm)
- degree of Q = n + m

- If R1 has N tuples and R2 has M tuples, then
- R1 R2 has N*M tuples

- Cartesian product is extremely expensive
- If R1 and R2 are both large, then each relation may need to be scanned many times to perform the Cartesian product.
- Writing out the result can be very expensive due to the large size of the result

Retrieve for each female employee a list of names of her dependents

FEMALE-EMPS= Sex = “F” (EMPLOYEE)

EMP-NAMES = Fname,Lname,SSN(FEMALE-EMPS)

EMP-DEPENDENTS = EMP-NAMES DEPENDENT

ACTUAL-DEP= SSN = ESSN (EMP-DEPENDENTS)

RESULT = Fname,Lname,Dependent-name(ACTUAL-DEP)

Example: Find the names of each employee and his/her manager

Employees: SSNNameAgeDept-Name

123456789 John 34 Sales

234567891 Mary 42 Service

345678912 Bill 39 null

Departments: Name Location Manager

Sales XYZ Bill

Inventory YZX Charles

Service ZXY Maria

- A relational algebra expression is:
Employees.Name, Departments.Manager

(Employees.Dept_Name = Departments.Name

(Employees Departments))

- A simplified version (don't use full name when you don't have to):
Employees.Name, Manager

(Dept_Name =Departments.Name

(Employees Departments))

- Use assignment operator ( = ) to save the intermediate result into a temporary relation
- Example: The following expression
Employees.Name, Manager(Dept_Name = Departments.Name

(Employees Departments) )

- is equivalent to the following series of expressions:
TEMP1= Employees Departments

TEMP2= Dept_Name=Departments.Name(TEMP1)

RESULT= Employees.Name, Manager (TEMP2)

Example: Find the names of all students who have the highest GPA

STUDENTS

SSN Name GPA

123456789 John 3.8

234567891 Maria 3.2

345678912 Mike 3.0

- Step 1: Find the GPAs that are not the highest
- TEMP1 = Students.GPA(Students.GPA <S2.GPA (Students S2(Students)))
Students S2(Students)

SSN Name GPA S2.SSN S2.Name S2.GPA

123456789 John 3.8 123456789 John 3.8

123456789 John 3.8 234567891 Maria 3.2

123456789 John 3.8 345678912 Mike 3.0

234567891 Maria 3.2 123456789 John 3.8

234567891 Maria 3.2 234567891 Maria 3.2

234567891 Maria 3.2 345678912 Mike 3.0

345678912 Mike 3.0 123456789 John 3.8

345678912 Mike 3.0 234567891 Maria 3.2

345678912 Mike 3.0 345678912 Mike 3.0

- TEMP1 = Students.GPA(Students.GPA <S2.GPA (Students S2(Students)))

- Step 2: Find the highest GPA
- TEMP2 = GPA(Students) - TEMP1

- Step 3: Find the names of students who have the highest GPA
- RESULT = Name(Students.GPA = EMP2.GPA(StudentsTEMP2))

- --- join
- Format:
- R1 join-condition R2

- Semantics:
- Returns all tuples in R1 R2 which satisfy the join condition

- Format:
- Derivation from existing operators:
- R1 join-condition R2 = join-condition(R1 R2)

- Format of join condition:
- R1.A op R2.B
- R1.A1 op R2.B1and R1.A2 op R2.B2 . . .

- Tuples whose join attributes are NULL do not appear in the result

Example: Find the names of all employees and theirdepartment locations

Employees: SSNNameAgeDept-Name

123456789 John 34 Sales

234567891 Mary 42 Service

345678912 Bill 39 null

Departments: NameLocationManager

Sales Binghamton Bill

Inventory Endicott Charles

Service Vestal Maria

Employees.Name, Location(Employees

Dept-Name = Departments.Name Departments)

Result Name Location

John Binghamton

Mary Vestal

Example: Find the names of all employees who earn more than his/her manager

Employees: SSNNameSalary Manager-SSN

123456789 John 34k 234567891

234567891 Bill 40k null

345678912 Mary 38k null

456789123 Mike 41k 345678912

Employees.Name (Employees Employees.Manager-SSN = EMP.SSN and Employees.Salary > EMP.Salary

EMP(Employees))

- Definition: A join is called an equijoin if only equality operator is used in all join conditions.
R1 R2 R1 R1.B = R2.BR2

A B B C A R1.B R2.B C

a b b c a b b c

d b c d d b b c

b c a d b c c d

- Most joins in practice are equijoins.

- Definition:
- A join between R1 and R2 is a natural join if
- There is an equality comparison between every pair of identically named attributes from the two relations

- Among each pair of identically named attributes from the two relations, only one remains in the result
- Natural join is denoted by with no join conditions explicitly specified

- Example:
- R1(A, B, C) R2(A, C, D)
- has attributes (A, B, C, D) in the result

- Questions:
- How to express natural join in terms of equijoin and other relational operator?
- R1 R2 = R1.A, B, R2.C, D
(R1R1.A=R2.Aand R1.C= R2.C R2)

- R1 R2 = R1.A, B, R2.C, D

- The relational algebra is a set of expressions as defined below:
- A relation is an expression.
- If E1 and E2 are expressions, so are
P(E1), A(E1), S(E1), E1E2, E1-E2, E1E2

- That is, any expression that can be formed from base relations and the six relational operators is a relational algebra expression

- Well know Aggregate Functions:
- SUM, AVERAGE, MAXIMUM, MINIMUM, COUNT

- Format: (Script F)
- <grouping attributes> <function list> (R)
- <grouping attributes> is a list of attributes in R
- <function list> is a list of (<function> <attribute>) pairs
- <function> is one of aggregate functions
- <attribute> is an attribute in R

- Retrieve each department number, the number of employees in the department, and their average salary:
- DNOCOUNT( SSN), AVERAGE(SALARY) (EMPLOYEE)
DNOCOUNT_SSNAVERAGE_SALARY

1155000

4331000

5433250

- DNOCOUNT( SSN), AVERAGE(SALARY) (EMPLOYEE)

- Retrieve for each department number, the number of employees in the department, and their average salary:
R(DNO, NO_OF_EMP, AVERAGE_SAL)(DNO COUNT( SSN),

AVERAGE(SALARY) (EMPLOYEE))

DNO NO_OF_EMP AVERAGE_SAL

1155000

4331000

5433250

- If no grouping attributes are specified, the functions are applied to attribute values of all the tuples in the relation
- Retrieve the number of employees, and their average salary:
- COUNT( SSN), AVERAGE(SALARY) (EMPLOYEE))
COUNT_SSN AVERAGE_SALARY

8 35125

- COUNT( SSN), AVERAGE(SALARY) (EMPLOYEE))

R1 R2 R1 R2

A B C C D E A B C D E

a1 b1 c1c1 d1 e1 a1 b1 c1 d1 e1

a4 b3 c2 c6 d3 e2

- The second tuples of R1 and R2 are not present in the result (called dangling tuples)
- Applications exist that require to retain dangling tuples

O --- outer join

Format: R1 O R2

Semantics: like join except

- it retains dangling tuples from both R1 and R2
- it uses null to fill out missing entries
R1 O R2

A B C D E

a1 b1 c1 d1 e1

a4 b3 c2 null null

null null c6 d3 e2

- LO --- left outer join
- Format: R1 LO R2
- Semantics: like outerjoin but retains only dangling tuples of the relation on the left

- RO --- right outer join
- Format: R1 RO R2
- Semantics: like outerjoin but retains only dangling tuples of the relation on the right

R1 R2 R1 R2

A B C C D E A B C D E

a1 b1 c1c1 d1 e1 a1 b1 c1 d1 e1

a4 b3 c2 c6 d3 e2

R1 LO R2

A B C D E

a1 b1 c1 d1 e1

a4 b3 c2 null null

R1 RO R2

A B C D E

a1 b1 c1 d1 e1

null null c6 d3 e2

- Many relational algebra queries can be expressed using selection, projection and join operators by following steps:
- (1) Determine necessary relations to answer the query.
- If R1, ..., Rn are all the relations needed,
- P are all conditions and
- T are all (target) attributes to be output,
- then form the initial query:
- T(P(R1... Rn))

- (2) If P contains a condition, say Ci, that involves only attributes in Ri
- replace Ri by Ci(Ri) and remove Ci from P

- (3) If P contains a condition, say C, that involves attributes from both Ri and Rj
- replace Ri Rj by RiC Rj (or a natural join) and remove C from P

- Consider the following database schema:
- Students(SSN, Name, GPA, Age, Dept-Name)
- Enrollment(SSN, Course#, Grade)
- Courses(Course#, Title, Dept-Name)
- Departments(Name, Location, Phone)

- Query: Find the SSNs and names of all students who are CS major and who take CIS328
- (1) Relations Students and Enrollment are needed
- T = {Students.SSN, Students.Name}
- P = {Students.Dept-Name = ‘CS’, Enrollment.Course# = ‘CIS328’, Students.SSN = Enrollment.SSN}

- The initial relational algebra query is:
- Students.SSN, Name
(Dept-Name=`CS’ and Course#=`CIS328’ and

Students.SSN= Enrollment.SSN

(Students Enrollment))

- Students.SSN, Name

- (2) Replace
- Students by
Dept-Name = `CS’(Students) and

- Enrollment by
Course#=`CIS328’(Enrollment)

- Remove the two conditions from the initial expression

- Students by

- (3) Replace
- Students Enrollment by
StudentsEnrollment and

- remove Students.SSN = Enrollment.SSN from the initial expression.

- Students Enrollment by
- The final expression:
- Students.SSN,Name(Dept-Name = `CS’(Students)
Course# = `CIS328’ (Enrollment))

- Students.SSN,Name(Dept-Name = `CS’(Students)

- Query: Find the SSN and name of each student who is CS major together with the titles of the courses taken by the student
- Students.SSN, Name, Title
(Students.Dept-Name = `CS’ and Students.SSN = Enrollment.SSN and Enrollment.Course# = Courses.Course#

(StudentsEnrollment Courses))

- = Students.SSN, Name, Title
(((Students.Dept-Name = `CS’(Students)) Enrollment) Courses)

- = Students.SSN, Name, Title ((Students.Dept-Name = `CS’ (Students)) (Enrollment Courses))

Relational algebra operators:

- Fundamental operators:
C(R), A(R), S(R), R1 R2,

R1 - R2, R1 R2

- Other traditional operators:
R1 R2, R1 C R2, R1 R2

Some identities:

- C1(C2(R)) = C2(C1(R)) = C1 andC2(R)
- L1(L2(R)) = L1(R) , if L1 L2
- R1 R2 = R2 R1
- R1 (R2 R3) = (R1 R2) R3
- R1 R2 = R2 R1
- R1 (R2 R3) = (R1 R2) R3