**Relational AlgebraMore pointers from Tuesday**

**Relational Algebra - Division**

**Queries that give the same result as division are not** replacements for division

**Division Example** Want to perform Completed/DBProjectto find all students who completed all projects.

**Division Example** Step 1) Project Completed onto it’s unique attributes πStudent(Completed)

**Division Example** Step 2) Perform Cartesian product with DBProject πStudent(Completed) × DBProject Every student is combined with every task.

**Division Example** Step 3) Subtract Completed πStudent(Completed) × DBProject-Completed We have the possible combinations that "could have" been, but weren't.

**Division Example** Step 4) Project onto the unique attributes of Completed πStudent( πStudent(Completed) × DBProject-Completed) All students who have not completed all assignments.

**Division Example** Step 5) Subtract from Completed πStudent(Completed) - πStudent( πStudent(Completed) × DBProject-Completed) Completed/DBProject=

**Example Queries** • Find sailors who’ve reserved a red and a green boat. • Must identify sailors who’ve reserved red boats, sailors who’ve reserved green boats, then find the intersection (note that sid is a key for Sailors):

**Example Queries** • Find the names of sailors who’ve reserved all boats. • Uses division; schemas of the input relations must be carefully chosen: • To find sailors who’ve reserved all ‘Interlake’ boats: • Book has lots of examples.

**Database Systems I Query Optimization**

**Principles of Query Optimization** • Display the minimum number of fields in a query. • Use primary key or indexes wherever possible. • Use numeric rather than text primary keys. • Use non blank unique fields. • Avoid domain aggregate functions such as Dlookup(). • Use between and equal to , rather than > or <. It will speed up the queries. • Use count(*) rather than count(column). • Short table and field names run faster than long name. • Normalize the tables. • Avoid the use of distinct row queries.

**Query Optimization** • A user of a commercial DBMS formulates SQL queries • The query optimizer translates this query into an equivalent RA query, i.e. an RA query with the same result • In order to optimize the efficiency of query processing, the query optimizer can re-order the individual operations within the RA query • Re-ordering has to preserve the query semantics and is based on RA equivalences • Just like Math operations can be reordered, so can Relational Algebra operations

**Query Optimization** • Why can re-ordering improve the efficiency? • Different orders can imply different sizes of the intermediate results • The smaller the intermediate results, the more efficient • Example: much (!) more efficient than Why?

**Relational Algebra Equivalences** • The most important RA equivalences are commutative and associative laws. • A commutative law about some operation states that the order of (two) arguments does not matter. • An associative law about some (binary) operation states that (more than two) arguments can be grouped either from the left or from the right. • If an operation is both commutative and associative, then any number of arguments can be (re-)ordered in an arbitrary manner.

**Ç** > < Relational Algebra Equivalences • The following (binary) RA operations are commutative and associative: • For example, we have: • Proof method: show that each tuple produced by the expression on the left is also produced by the expression on the right and vice versa. (Commutative) (R S) (S R) (Associative) R (S T) (R S) T

**Relational Algebra Equivalences** • Selections are crucial from the point of view of query optimization, because they typically reduce the size of intermediate results by a significant factor. • Laws for selections only: (Splitting) (Commutative)

**Laws for the combination of selections and other operations:** if R has all attributes mentioned in c if S has all attributes mentioned in c The above laws can be applied to “push selections down” as much as possible in an expression, i.e. performing selections as early as possible. Relational Algebra Equivalences

**Relational Algebra Equivalences** • A projection commutes with a selection that only uses attributes retained by the projection. • Selection between attributes of the two arguments of a Cartesian product converts Cartesian product to a join. • Similarly, if a projection follows a join R S, we can ‘push’ it by retaining only attributes of R (and S) that are needed for the join or are kept by the projection.

**Summary** • Several ways of expressing a given query; a query optimizer chooses the most efficient version. • Query optimization exploits RA equivalencies to re-order the operations within an RA expression. • Optimization criterion is to minimize the size of intermediate relations.

**Database Systems I Relational Calculus**

**Relational Calculus** • Nonprocedural • Describes the set of answers without saying how they should be computed • Comes in two flavors: Tuple relational calculus (TRC) and Domain relational calculus(DRC). • Calculus has variables, constants, comparison ops, logical connectives and quantifiers.

**Tuple Relational Calculus** • Query has the form: {T | p(T)} • p(T) denotes a formula in which tuple variable T appears. • Answer is the set of all tuples T for which the formula p(T) evaluates to true. • Formula is recursively defined: • start with simple atomic formulas (get tuples from relations or make comparisons of values) • build bigger and better formulas using the logical connectives.

**Domain Relational Calculus** • Query has the form: • Answer includes all tuples that make the formula be true. • Formula is recursively defined, starting with • simple atomic formulas (getting tuples from relations or making comparisons of values), and building bigger and better formulas using the logical connectives.

**TRC Formulas** • An Atomic formula is one of the following: • R Rel (R is a tuple in relation Rel) • R.a op S.b (comparing two fields) • R.a op constant (comparing field to constant) • op is one of • A formula can be: • an atomic formula • where p and q are formulas • where variable R is a tuple variable • where variable R is a tuple variable

**Selection and Projection** {S |S Sailors S.rating > 7} • Find all sailors with rating above 7

**Joins** • Find sailors rated > 7 who’ve reserved boat #103 {S | SSailorsS.rating > 7 R(RReservesR.sid = S.sid R.bid = 103)} • Note the use of to find a tuple in Reserves that ‘joins with’ the Sailors tuple under consideration.

**Unsafe Queries, Expressive Power** • It is possible to write syntactically correct calculus queries that have an infinite number of answers! Such queries are called unsafe. • e.g.,

**Summary** • Relational calculus is non-operational, and users define queries in terms of what they want, not in terms of how to compute it. (Declarativeness.) • Algebra and safe calculus have same expressive power, leading to the notion of relational completeness.