Relational Algebra

Relational Algebra Relational Algebra Operations

Introduction • Mathematical set theory brings a group of operations that manipulate relations to a relational database. • Known as the relational algebra, these operations are used by a DBMS to process queries sub mitted by users. • Users typically • do not work directly with relational algebra, • but instead use query languages built on the relational calculus.

Five Basic Operations • To be considered relationally complete, a DBMS must support at least five operations from the relational algebra: • Project is used to take a subset of the columns of a relation. • Joinis used to paste two relations together by matching values against a condition. • Select is used to retrieve rows from a relation. • Union combines all rows from two relations, and • Difference produces all rows from one relation that are not contained in another.

Four More Operations • DBMSs also occasionally make use of four other relational operations: product, intersect, divide, and outer join. • Product (the Cartesian product of two sets) combines two relations by creating new rows for all possible combinations of rows. • Intersecting two relations produces a result consisting of all rows that appear in both relations. • Divide, which works on a relation with one column (a unary relation) and another with two columns (a binary relation), extracts values from the binary relation in a manner that will be discussed later in this chapter. • Outer join is a variation of the join operation that produces a result which is not a legal relation; it is, however, very useful for data retrieval.

General Format • There is no official syntax for expressing relational algebra operations. • Each relational algebra operation will be represented in the following general format: • OPERATIONparameters FROMsource_table_nameGIVINGresult_table_name

Project • A projection of a relation is a new relation created by copying one or more of the columns from the original relation into a new table. • As an example, consider Figure 6.1. The relation Trans is a projection of the relation Bank with the attributes Customer #, Account #, Transaction # and Amount. • In the relational algebra, the projection in Figure 6.1 is written: • ROJECT Customer #, Account #, Transaction #, Amount FROM Bank GIVING Trans

Project • Projection, like all relational algebra operations, removes duplicate rows from the result table which may result in fewer rows. • The middle table in Figure 6.2 contains the intermediate result from the following projection: • PROJECT Customer #, Account # FROM Trans GIVING Account Owners

Project • While it is theoretically possible to create projections from any combination of the columns in a relation, not all projections may be meaningful. • Consider the projection in Figure 6.3: • PROJECT Customer Address, Transaction # FROM Bank GIVING Place

Join • The join operation creates one result table from two source tables. • It is used primarily to implement retrieval operations in which data are stored in two or more relations. • The general form of the join operation is: • JOIN table_1 TO table_2 OVER attributes_used_for_matching GIVING result_table

Join • Remember that Small Bank`s customer and account data are stored as follows: • Customers (Customer #, Cust.Name, Cust. Address, Cust.Phone, SSN) • Account (Account #, Customer #, Branch Name, Account Type, Current Balance, Date Opened)

Join • If the DBMS receives a request to • print out the name, address, and account numbers for each of Small Bank`s customers, 1. It must it consult the Account table to discover which customer number is associated with which Account number, 2. It must have some way to associate names and addresses with the account numbers.

Join • Using the relational algebra, the operation is written: • JOIN Customer TO Account OVER Customer # GIVING List • A join includes all columns in both source relations. • In a natural join, the column or columns used to perform the join are duplicated in the result. • In an equi-join, the matching column or columns are not duplicated. • Like project, join also excludes duplicate rows

Join • The result of this join appears in Figure 6.4. • Notice that two customers, 10015 and 10055, have no accounts. • In other words, there are no rows for their customer numbers in Account. • Since no matching customer number could be found for these two people, neither appears in the result table.

Join • While it is theoretically possible to join any two relations, the result of a join is not always valid. • A meaningless projection is useless but harmless; however, an invalid join actually contains facts that aren't true. • As an example, consider what happens if someone were to look at the Small Bank database and decide that the Customer # column wasn`t needed in the Account table.

Join • Instead, to find out who owns which account, the Branch Patronage table could be joined to the Account table over Branch Name, the column that the two tables have in common: • JOIN Branch Patronage TO Account OVER Branch Name GIVING Branch Owners • The result of this join appears in Figure 6.5.

Join • In general, • joins between a primary key and a foreign key are valid; • joins between two primary keys are also valid (should your database have two tables with the same primary key). • However, other joins (such as the example you have just seen) are usually invalid

Join • The joins that you have seen to this point are based on matching values of an attribute held in common by the two relations being joined. • It is also possible to join relations based on other criteria, such as an inequality of values in the shared attribute.

Select • The select operation is often misunderstood, primarily because it is used as a retrieval operator by SQL. • As defined in the relational algebra, select creates a new table by copying rows from a relation that meet specified criteria. • Select copies all attributes in the relation; it has no way to specify which attributes should be included in the resulting table.

Select • Select identifies which rows are to be included in the result table with a logical criteria. • The operation therefore takes the general Form: • SELECT FROM source_table_name WHERE logical_selection_criteria GIVING result_table

Select • For example, suppose we want to retrieve the customer numbers of those tomers who have accounts at Small Bank`s Central branch. • The operation might be expressed as: • SELECT FROM Account WHERE Branch Name = “Central” GIVING One Branch • The result appears in Figure 6.6.

Union • Union combines all rows from two relations into a single, resultant relation. • For union to work, however, the two relations being combined must be union-compatible. That means that their structure must be identical; they must contain the same columns. • The columns do not necessarily have to have the same names, but they must have the same logical domains.

Union • For example, assume that Small Bank merges with Smaller Bank. • Smaller Bank is also using a database system. Some of the relations in its database are: • Clients (Client#, CNameCAdd, CPhone, CSSN) • Accounts 1Client#,' Acct#l • Transactions lTCode, Acct#, TDate, TArnt)

Union • While none of the relations have the same column names as Small Bank's Customers, Account and Transaction relations, some union-compatibility does exist. • Clientsis union-compatible with customers, since each attribute in Clients corresponds exactly to an attribute in Customers. • The same is not true for Small Bank's Account Table; it contains many more columns than Smaller Bank's Account table. • Small Bank's table also isn't union-compatible with Smaller bank's table.

Union • Because Clients and Customers are union-compatible, Small Bank can use the union operation to compile a single list of customers. • Figure 6.7 shows the result of a union operation on Small Bank`s Customers relation and Smaller Bank's Clients relation. • Union does remove duplicate rows, but only when the entire rows are the same. In this case, the rows are the same except for the customer number.

Difference • The difference operation produces a result that consists of all rows in the first relation that are not present in the second . • It subtracts the second relation in the command from the first. • Rows that appear in the second relation but not the first are ignored. • Difference can only be performed on union-compatible relations.

Difference • It is used primarily to answers questions involving a negative. • For example, if Small Bank removes the customer number columns from Customers and Clients, it can be used to identify Small Bank customers who are not Smaller Bank customers. That operation (summarized in Figure 6.8.) • Temp Customers MINUS Temp Clients GIVING Small Bank Only

Difference • Difference is the only relational algebra operation in which the order or the tables in the operation changes the result. • Consider the result of the operation • Temp Clients MINUS Temp Customers GIVING Smaller Bank Only • As you can see from Figure 6.9, reversing the order of the relations reverses the meaning of the operation.

“Show me all the people who are customers of Smaller Bank but not Small Bank". The result table contains all customers from Temp Clients who don’t appear in Temp Customers. In this case, only Samuel Smith has accounts at both banks and therefore doesn’t appear in the result.

lntersect • The intersect operation is the opposite of union. While • unionproduces a result containing all rows that appear in either relation, • intersectionproduces a result containing all rows that appear in both relations. • Intersection can therefore only be performed on two union-compatible relations.

Intersect • Intersection is useful for identifying entities that have some characteristic in common. • For example, if Small Bank wished to know which of its customers also had accounts at Smaller Bank, it could intersect Temp Customers and Temp Clients. • The result, seen in Figure 6.10, contains only rows for only those individuals who have matching rows in both Temp Customers and Temp Clients.

Intersect • Beware that there is a potential problem with an intersection. • For a row to be included in the result, it must exist in exactly the same way in the two relations being intersected. • In the example we have just seen, • if a customer's address is stored as "105 W. 99th” in Temp Customers but as “l05 W. 99th Pl ace" in Temp Clients, the intersection will not recognize the two rows as being equivalent.

Product • The product operation (the Cartesian product) creates new rows by concatenating every row from one relation onto every row in another. • It creates all possible pairs of rows. Example: • If one of the relations in the operation has 15 rows and the other 20 rows, the virtual table created by taking their product will contain 300 rows. • Columns that appear in both tables participating in the operation are duplicated in the result table.

Product • Product is generally not used directly to produce the result of a query. • Instead, a DBMS may perform a productfollowed by a select operation to implement a join. • To see how this works, consider a query that asks for the customer numbers of all people who have accounts at each of Small Bank’s branches.

Product • This query requires a join between the Branch and Branch Patronage tables over the Branch Name column. • However, rather than doing the join directly, a DBMS might perform a product on Branch and Branch Patronage: • Branch TIMES Branch Patronage GIVING Intermediate Product

Product • The result table, which appears in Figure 6.11 as Intermediate Product, contains 24 rows (the three rows in Branch multiplied by the eight rows in Branch Patronage). • The join, however, needs only those rows where the branch name in both tables is the same. • Therefore, the select that completes the implementation of the join is written: , • SELECT FROM Intermediate Product WHERE Branch Name (from Branch) = Branch Name (from Branch Patronage) GIVING Result of join

Divide • The divide operation is perhaps the least intuitive of all the operations in the relational algebra. • As most commonly performed, division uses two relations: • a unary relation (one with only a single column) and • a binary relation (one with two columns).

Divide • To see how it works, assume that we have a new relation for Small Bank, Branch Names (Branch Name). • In Figure 6.12, Branch Names contains only two rows, one for the Central branch and one for the Westlake branch. • If we divide Branch Patronage (the dividend) by Branch Names (the divisor), we get a unary relation (Two Branch Customers) as the result. Its single column is Customer #, the column that Branch Patronage and Branch Names do not have in common.

If you look at the data in Two Branch Customers, you will see that it contains the customer numbers of all people who have an account at both the Central branch and the Westlake branch.

Divide • Division retrieves values from the column of the dividend relation that it does not share with the divisor relation. • However, values are selected by looking for matches between the overlapping column (for example, Brunch Name). • In general, divide retrieves entries that have multiple rows in a table (in this example, customers who have accounts are two specific branches).

Divide • The result of a divide can also be obtained by • performing one select for each value in the unary relation and then • joining the results of the selects. (If there are n rows in the unary relation, you will need n-1 joins.) • As you can see in Figure 6.13, • each select retrieves rows for one of the values in the unary relation. • The join is then performed over customer number, placing a row for each customer who has rows in both of the selects result tables into the join result table.

Relational Algebra