Database systems i relational algebra
This presentation is the property of its rightful owner.
Sponsored Links
1 / 41

Database Systems I Relational Algebra PowerPoint PPT Presentation


  • 81 Views
  • Uploaded on
  • Presentation posted in: General

Database Systems I Relational Algebra. Relational Query Languages. Query languages (QL): Allow manipulation and retrieval of data from a database. Relational model supports simple, powerful query languages: Strong formal foundation based on logic.

Download Presentation

Database Systems I Relational Algebra

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Database systems i relational algebra

Database Systems I Relational Algebra


Relational query languages

Relational Query Languages

  • Query languages (QL): Allow manipulation and retrieval of data from a database.

  • Relational model supports simple, powerful query languages:

    • Strong formal foundation based on logic.

    • High level, abstract formulation of queries.

    • Easy to program.

    • Allows the DBS to do much optimization.

  • DBS can choose, e.g., most efficient sorting algorithm or the order of basic operations.


Relational query languages1

Relational Query Languages

  • Query Languages != programming languages!

    • QLs not expected to be “Turing complete”.

      • Able to answer any computable problem given enough storage space and time

    • QLs not intended to be used for complex calculations.

      • Applications on top of database can do that

    • QLs support easy, efficient access to large data sets.

  • E.g., in a QL cannot

    • determine whether the number of tuples of a table is even or odd,

    • create a visualization of the results of a query,

    • ask the user for additional input.


Formal query languages

Formal Query Languages

  • Two mathematical query languages form the basis for “real” languages (e.g. SQL), and for implementation:

    • Relational Algebra (RA)

      • More procedural, very useful for representing execution plans, relatively close to SQL.

      • Composed of a collection of operators

      • A step-by-step procedure for computing the answer

    • Relational Calculus (RC)

      • Lets users describe what they want, rather than how to compute it. (Non-procedural, declarative.)

      • Describes the answer, not the steps.

  • Understanding these formal query languages is important for understanding SQL and query processing.


Preliminaries

Preliminaries

  • A query is applied to relation instances, and the result of a query is also a relation instance.

    • Inputs and Outputs of Queries are relations

    • Query evaluated on instances of input relations

      • Different instance (DB?) as input = different answer

    • Schemasof input relations for a query are fixed (but query will run regardless of instance!)

    • The schema for the resultof a given query is also fixed! Determined by definition of input relations and query language constructs.


Preliminaries1

Preliminaries

  • Positional vs. named-attribute notation:

    • Positional notation

      • Ex: function(1,2,3)

      • easier for formal definitions

    • Named-attribute notation

      • Ex: function(var1=1, var2=2, var3=3)

      • more readable

  • Advantages/disadvantages of one over the other?


Example instances

R1

S1

S2

Example Instances

  • “Sailors” and “Reserves” relations for our examples.

    • 2 instances of “Sailors” S1 & S2

  • We’ll use positional or named attribute notation

  • Assume that names of attributes in query results are `inherited’ from names of attributes in query input relations


Relational algebra

Relational Algebra


Relational algebra1

Relational Algebra

  • An algebra consists of operators and operands. Operands can be either variables or constants.

  • In the algebra of arithmetic:

    • atomic operands are variables such as x or y and constants such as 15.

    • Operators are the usual arithmetic operators such as +, -, *.

  • Expressions are formed by applying operators to atomic operands or other expressions.

  • For example,

    15x + 15

    (x + 15) * y


Relational algebra2

Relational Algebra

  • Algebraic expressions can be re-ordered according to commutativity or associativity laws without changing their resulting value.

  • E.g., 15 + 20 = 20 + 15(x * y) * z = x * (y * z)

  • Parentheses group operators and define precedence of operators, e.g.(x + 15) * y

    x + (15 *y)

  • Describes step-by-step procedure for computing the query

    • Just like algebra in math.


Relational algebra3

Relational Algebra

  • In relational algebra, operands are relations / tables, and an expression evaluates to a relation / set of tuples.

  • The relational algebra operators are

    • set operations,

    • operations removing rows (selection) or columns (projection) from a relation,

    • operations combining two relations into a new one (Cartesian product, join),

    • a renaming operation, which changes the name of the relation or of its attributes.


Relational algebra4

Relational Algebra

  • Every operator takes one or two relation instances

  • A relational algebra expression is recursively defined to be a relation

    • A combination of relations is a relation

    • Result is also a relation

    • Can apply operator to

      • Relation from database

      • Relation as a result of another operator


Relational algebra operations

Relational Algebra Operations

  • Basic operations

    • Selection ( ) Selects a subset of rows from relation.

    • Projection ( ) Deletes unwanted columns from relation.

    • Cartesian product( ) Combinetwo relations.

    • Set-difference ( ) Tuples in relation 1, but not in relation 2.

    • Union( ) Tuples in relation 1 or in relation 2.


Relational algebra operations1

Relational Algebra Operations

  • Additional operations:

    • Intersection, join, division.

    • Renaming of relations / attributes.

      • Not essential, can be implemented using the five basic operations.

      • But (very!) useful.

  • Since each operation returns a relation, operationscan be composed, i.e. output of one operation can be input of the next operation.


Renaming

Renames relations / attributes, without changing the relation instance.

relation R is renamed to S, attributes are renamed A1, . . ., An

Rename only some attributes

using the positional notation to reference attributes

No renaming of attributes, just the relation

Renaming


Projection

Projection

  • One input relation.

  • Deletes attributes that are not in projection list.

  • Schema of result contains exactly the attributes in the projection list, with the same names that they had in the (only) input relation.

  • Projection operation has to eliminate duplicates, since relations are sets.

    • Duplicate elimination is expensive.

    • Commercial DBMS typically don’t do duplicate elimination unless the user explicitly asks for it.


Projection1

S2

Projection

  • Similar in concept to VIEWs

  • Other fields are projected out


Selection

Selection

  • One input relation.

  • Selects all tuples that satisfy selection condition.

  • No duplicates in result!

  • Schema of result identical to schema of (only) input relation.

  • Selection conditions:

    • simple conditions comparing attribute values (variables) and / or constants or

    • complex conditions that combine simple conditions using logical connectives AND and OR.


Selection1

S2

Selection


Selection2

S2

Selection

  • But the result of an expression is another relation, we can substitute an expression wherever a relation is expected


Union intersection set difference

Union, Intersection, Set-Difference

  • All of these set operations take two input relations, which must be union-compatible:

    • Same sets of attributes.

    • Corresponding attributes have same type.

  • What is the schema of result?


Union

Union

S2

S1

  • Concatenates S1 and S2

  • Result contains ALL tuples that occur in either S1 or S2

    • Schemas must be identical

      • If they have the same number of fields

      • Fields have same domains

SELECT * FROM S1

UNION

SELECT * FROM S2


Intersection

Intersection

S2

S1

  • Result contains ALL tuples that occur in both S1 or S2

    • Schemas must be identical

SELECT * FROM S1

INTERSECT

SELECT * FROM S2


Set difference

Set-Difference

S2

S1

  • Result contains ALL tuples that occur in S1 but not in S2

    • Schemas must be identical

SELECT * FROM S1

MINUS

SELECT * FROM S2


Cartesian product

Cartesian Product

  • Also referred to as cross-product or product.

  • Two input relations.

  • Each tuple of the one relation is paired with each tuple of the other relation.

  • Result schema has one attribute per attribute of both input relations, with attribute names `inherited’ if possible.

  • In the result, there may be two attributes with the same name, e.g. both S1 and R1 have an attribute called sid.

    • What to do??


Cartesian product1

Cartesian Product

S2

S1

  • Result contains all fields of S1 followed by fields of S2

  • Contains one tuple <s1,s2> for each pair of tuples in S1 and S2.

  • Schemas can be different

  • If S1 and S2 contains fields of the same name, there is a naming conflict. Rename one or both fields.

SELECT * FROM S1, S2


Cartesian product2

Cartesian Product

  • Field names in conflict become unnamed

R1

S1


Renaming1

Renaming

  • In the result, there may be two attributes with the same name, e.g. both S1 and R1 have an attribute called sid.

  • Then, apply the renaming operation, e.g.

  • The field names in the output relation are the same fields, but some are renamed

  • Field 1 is renamed to sid1, field 5 is renamed to sid5

  • For an expression to be well defined, no 2+ fields must have same name


Database systems i relational algebra

Join

  • Similar to Cartesian product with same result schema.

    • Cartesian Product has no conditions

    • Join has conditions

  • Join can be defined as a Cartesian Product, followed by selections and projections

    • Join however is much more efficient

    • Why?

S1

R1


Database systems i relational algebra

Join

  • Condition Join

    • Each tuple of the one relation is paired with each tuple of the other relation if the two tuples satisfy the join condition.

    • Condition c refers to attributes of both R and S.


Database systems i relational algebra

Equi-Join: A special case of Conditional Join where the condition c contains only equalities.

Result schema similar to Cartesian product, but only one copy of attributes for which equality is specified.

R1.sid is redundant

R1.sid is dropped by a applying a projection

Natural Join: Equi-join on all common attributes.

Join


Division

Division

  • Not supported as a primitive operation, but useful for expressing queries like: Find sailors who have reserved all boats.

  • Let A have 2 attributes, x and y; B have only attribute y:

    • A/B =

    • i.e., A/B contains all x tuples (sailors) such that for every y tuple (boat) in B, there is an xy tuple (reservation) in A.

  • In general, x and y can be any lists of attributes; y is the list of attributes in B, and x y is the list of attributes of A.


Division1

Division

B1

B2

B3

A/B1

A/B2

A/B3

A


Division2

all disqualified x values

Division

  • Division is not an essential operation; can be implemented using the five basic operations.

    • Also true of joins, but joins are so common that systems implement joins specially.

  • Idea:For A/B, compute all x values in A that are not ‘disqualified’ by some y value in B.

    • x value in A is disqualified if by attaching y value from B, we obtain an xy tuple that is not in A.

  • Disqualified x values:

A/B:


Division3

Division

  • Division is not an essential operation; can be implemented using the five basic operations.

    SELECT name

    FROM Student

    WHERE NOT EXISTS (

    SELECT CID

    FROM Course

    WHERE CID NOT IN (

    SELECT CID

    FROM Register

    WHERE Student.SID= Register.SID))

If such courses don’t exist (NOT EXISTS), then student must’ve taken all courses

returns those courses this student didn’t take


Example queries

Example Queries

  • Find names of sailors who’ve reservedboat #103.

  • Solution 1:

  • Solution 2:

  • Solution 3:

  • Which is most efficient? Why?


Example queries1

Example Queries

  • Find names of sailors who’ve reserved a red boat.

  • Information about boat color only available in Boats; so need an extra join:

  • A more efficient solution:

  • A query optimizer can find the second solution given the first one.


Example queries2

Example Queries

  • Find sailors who’ve reserved a red or a green boat.

  • Can identify all red or green boats, then find sailors who’ve reserved one of these boats:

  • Can also define Tempboats using union! (How?)

  • What happens if OR is replaced by AND in this query?


Example queries3

Example Queries

  • Find sailors who’ve reserved a red and a green boat.

  • Must identify sailors who’ve reserved red boats, sailors who’ve reserved green boats, then find the intersection (note that sid is a key for Sailors):


Example queries4

Example Queries

  • Find the names of sailors who’ve reserved all boats.

  • Uses division; schemas of the input relations must be carefully chosen:

  • To find sailors who’ve reserved all ‘Interlake’ boats:

  • Book has lots of examples.


Summary

Summary

  • The relational model has formal query languages that are easy to use and allow efficient optimization by the DBS.

  • Relational algebra (RA) is more procedural; used as internal representation for SQL query evaluation plans.

  • Five basic RA operations: selection, projection, Cartesian product, union, set-difference.

  • Additional operations as shorthand for important cases: intersection, join, division.

  • These operations can be implemented using the basic operations.


  • Login