- 81 Views
- Uploaded on
- Presentation posted in: General

Database Systems I Relational Algebra

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Database Systems I Relational Algebra

- Query languages (QL): Allow manipulation and retrieval of data from a database.
- Relational model supports simple, powerful query languages:
- Strong formal foundation based on logic.
- High level, abstract formulation of queries.
- Easy to program.
- Allows the DBS to do much optimization.

- DBS can choose, e.g., most efficient sorting algorithm or the order of basic operations.

- Query Languages != programming languages!
- QLs not expected to be “Turing complete”.
- Able to answer any computable problem given enough storage space and time

- QLs not intended to be used for complex calculations.
- Applications on top of database can do that

- QLs support easy, efficient access to large data sets.

- QLs not expected to be “Turing complete”.
- E.g., in a QL cannot
- determine whether the number of tuples of a table is even or odd,
- create a visualization of the results of a query,
- ask the user for additional input.

- Two mathematical query languages form the basis for “real” languages (e.g. SQL), and for implementation:
- Relational Algebra (RA)
- More procedural, very useful for representing execution plans, relatively close to SQL.
- Composed of a collection of operators
- A step-by-step procedure for computing the answer

- Relational Calculus (RC)
- Lets users describe what they want, rather than how to compute it. (Non-procedural, declarative.)
- Describes the answer, not the steps.

- Relational Algebra (RA)
- Understanding these formal query languages is important for understanding SQL and query processing.

- A query is applied to relation instances, and the result of a query is also a relation instance.
- Inputs and Outputs of Queries are relations
- Query evaluated on instances of input relations
- Different instance (DB?) as input = different answer

- Schemasof input relations for a query are fixed (but query will run regardless of instance!)
- The schema for the resultof a given query is also fixed! Determined by definition of input relations and query language constructs.

- Positional vs. named-attribute notation:
- Positional notation
- Ex: function(1,2,3)
- easier for formal definitions

- Named-attribute notation
- Ex: function(var1=1, var2=2, var3=3)
- more readable

- Positional notation
- Advantages/disadvantages of one over the other?

R1

S1

S2

- “Sailors” and “Reserves” relations for our examples.
- 2 instances of “Sailors” S1 & S2

- We’ll use positional or named attribute notation
- Assume that names of attributes in query results are `inherited’ from names of attributes in query input relations

Relational Algebra

- An algebra consists of operators and operands. Operands can be either variables or constants.
- In the algebra of arithmetic:
- atomic operands are variables such as x or y and constants such as 15.
- Operators are the usual arithmetic operators such as +, -, *.

- Expressions are formed by applying operators to atomic operands or other expressions.
- For example,
15x + 15

(x + 15) * y

- Algebraic expressions can be re-ordered according to commutativity or associativity laws without changing their resulting value.
- E.g., 15 + 20 = 20 + 15(x * y) * z = x * (y * z)
- Parentheses group operators and define precedence of operators, e.g.(x + 15) * y
x + (15 *y)

- Describes step-by-step procedure for computing the query
- Just like algebra in math.

- In relational algebra, operands are relations / tables, and an expression evaluates to a relation / set of tuples.
- The relational algebra operators are
- set operations,
- operations removing rows (selection) or columns (projection) from a relation,
- operations combining two relations into a new one (Cartesian product, join),
- a renaming operation, which changes the name of the relation or of its attributes.

- Every operator takes one or two relation instances
- A relational algebra expression is recursively defined to be a relation
- A combination of relations is a relation
- Result is also a relation
- Can apply operator to
- Relation from database
- Relation as a result of another operator

- Basic operations
- Selection ( ) Selects a subset of rows from relation.
- Projection ( ) Deletes unwanted columns from relation.
- Cartesian product( ) Combinetwo relations.
- Set-difference ( ) Tuples in relation 1, but not in relation 2.
- Union( ) Tuples in relation 1 or in relation 2.

- Additional operations:
- Intersection, join, division.
- Renaming of relations / attributes.
- Not essential, can be implemented using the five basic operations.
- But (very!) useful.

- Since each operation returns a relation, operationscan be composed, i.e. output of one operation can be input of the next operation.

Renames relations / attributes, without changing the relation instance.

relation R is renamed to S, attributes are renamed A1, . . ., An

Rename only some attributes

using the positional notation to reference attributes

No renaming of attributes, just the relation

- One input relation.
- Deletes attributes that are not in projection list.
- Schema of result contains exactly the attributes in the projection list, with the same names that they had in the (only) input relation.
- Projection operation has to eliminate duplicates, since relations are sets.
- Duplicate elimination is expensive.
- Commercial DBMS typically don’t do duplicate elimination unless the user explicitly asks for it.

S2

- Similar in concept to VIEWs

- Other fields are projected out

- One input relation.
- Selects all tuples that satisfy selection condition.
- No duplicates in result!
- Schema of result identical to schema of (only) input relation.
- Selection conditions:
- simple conditions comparing attribute values (variables) and / or constants or
- complex conditions that combine simple conditions using logical connectives AND and OR.

S2

S2

- But the result of an expression is another relation, we can substitute an expression wherever a relation is expected

- All of these set operations take two input relations, which must be union-compatible:
- Same sets of attributes.
- Corresponding attributes have same type.

- What is the schema of result?

S2

S1

- Concatenates S1 and S2
- Result contains ALL tuples that occur in either S1 or S2
- Schemas must be identical
- If they have the same number of fields
- Fields have same domains

- Schemas must be identical

SELECT * FROM S1

UNION

SELECT * FROM S2

S2

S1

- Result contains ALL tuples that occur in both S1 or S2
- Schemas must be identical

SELECT * FROM S1

INTERSECT

SELECT * FROM S2

S2

S1

- Result contains ALL tuples that occur in S1 but not in S2
- Schemas must be identical

SELECT * FROM S1

MINUS

SELECT * FROM S2

- Also referred to as cross-product or product.
- Two input relations.
- Each tuple of the one relation is paired with each tuple of the other relation.
- Result schema has one attribute per attribute of both input relations, with attribute names `inherited’ if possible.
- In the result, there may be two attributes with the same name, e.g. both S1 and R1 have an attribute called sid.
- What to do??

S2

S1

- Result contains all fields of S1 followed by fields of S2
- Contains one tuple <s1,s2> for each pair of tuples in S1 and S2.
- Schemas can be different
- If S1 and S2 contains fields of the same name, there is a naming conflict. Rename one or both fields.

SELECT * FROM S1, S2

- Field names in conflict become unnamed

R1

S1

- In the result, there may be two attributes with the same name, e.g. both S1 and R1 have an attribute called sid.
- Then, apply the renaming operation, e.g.
- The field names in the output relation are the same fields, but some are renamed
- Field 1 is renamed to sid1, field 5 is renamed to sid5
- For an expression to be well defined, no 2+ fields must have same name

- Similar to Cartesian product with same result schema.
- Cartesian Product has no conditions
- Join has conditions

- Join can be defined as a Cartesian Product, followed by selections and projections
- Join however is much more efficient
- Why?

S1

R1

- Condition Join
- Each tuple of the one relation is paired with each tuple of the other relation if the two tuples satisfy the join condition.
- Condition c refers to attributes of both R and S.

Equi-Join: A special case of Conditional Join where the condition c contains only equalities.

Result schema similar to Cartesian product, but only one copy of attributes for which equality is specified.

R1.sid is redundant

R1.sid is dropped by a applying a projection

Natural Join: Equi-join on all common attributes.

- Not supported as a primitive operation, but useful for expressing queries like: Find sailors who have reserved all boats.
- Let A have 2 attributes, x and y; B have only attribute y:
- A/B =
- i.e., A/B contains all x tuples (sailors) such that for every y tuple (boat) in B, there is an xy tuple (reservation) in A.

- In general, x and y can be any lists of attributes; y is the list of attributes in B, and x y is the list of attributes of A.

B1

B2

B3

A/B1

A/B2

A/B3

A

all disqualified x values

- Division is not an essential operation; can be implemented using the five basic operations.
- Also true of joins, but joins are so common that systems implement joins specially.

- Idea:For A/B, compute all x values in A that are not ‘disqualified’ by some y value in B.
- x value in A is disqualified if by attaching y value from B, we obtain an xy tuple that is not in A.

- Disqualified x values:

A/B:

- Division is not an essential operation; can be implemented using the five basic operations.
SELECT name

FROM Student

WHERE NOT EXISTS (

SELECT CID

FROM Course

WHERE CID NOT IN (

SELECT CID

FROM Register

WHERE Student.SID= Register.SID))

If such courses don’t exist (NOT EXISTS), then student must’ve taken all courses

returns those courses this student didn’t take

- Find names of sailors who’ve reservedboat #103.
- Solution 1:
- Solution 2:
- Solution 3:
- Which is most efficient? Why?

- Find names of sailors who’ve reserved a red boat.
- Information about boat color only available in Boats; so need an extra join:
- A more efficient solution:
- A query optimizer can find the second solution given the first one.

- Find sailors who’ve reserved a red or a green boat.
- Can identify all red or green boats, then find sailors who’ve reserved one of these boats:
- Can also define Tempboats using union! (How?)
- What happens if OR is replaced by AND in this query?

- Find sailors who’ve reserved a red and a green boat.
- Must identify sailors who’ve reserved red boats, sailors who’ve reserved green boats, then find the intersection (note that sid is a key for Sailors):

- Find the names of sailors who’ve reserved all boats.
- Uses division; schemas of the input relations must be carefully chosen:
- To find sailors who’ve reserved all ‘Interlake’ boats:
- Book has lots of examples.

- The relational model has formal query languages that are easy to use and allow efficient optimization by the DBS.
- Relational algebra (RA) is more procedural; used as internal representation for SQL query evaluation plans.
- Five basic RA operations: selection, projection, Cartesian product, union, set-difference.
- Additional operations as shorthand for important cases: intersection, join, division.
- These operations can be implemented using the basic operations.