SQL Unit 7Set Operations Kirk Scott
7.1 Introduction • 7.2 UNION Queries • 7.3 Queries with IN (Intersection) • 7.4 Queries with NOT IN (Set Subtraction) • 7.5 Unions, Joins, and Outer Joins
1. The technical term for a table is a relation. A relation is like a set. • The technical term for a row in a table is a tuple. • A tuple is like an element in a set.
The fundamental difference between an element of a set and a tuple in a relation is that the tuple may be a composite. • It may contain values for more than one different attribute.
The similarity between sets and relations explains some of the aspects of relations. • The order of elements in a set is immaterial; • likewise, the order of tuples in a relation is immaterial.
A set can't contain duplicate elements; • likewise, a relation can't contain duplicate tuples. • (Although a query result can.)
2. Recall the logical operator OR. • This allowed you to make conditions on the values of attributes. • There is also a set level operation, union, which is related in meaning. • Union applies not to attribute values, but to collections of tuples in relations.
Given two sets, A and B, you may recall this definition of union from math class: • = the union of A and B • = the set of elements in A or in B or in both A and B
A B • In this Venn diagram, both A and B and the area where they overlap are shaded, indicating that they all are included in the union.
Microsoft Access SQL has the keyword UNION, which implements the idea behind a logical union.
3. Here is a simple example illustrating the use of the union operator. • Suppose that table A and table B have each been defined with the same number of fields, of the same type, in the same order. • The names of the fields don't have to be the same.
Then consider this query: • SELECT * • FROM A • UNION • SELECT * • FROM B
The results will contain the set of rows that were in A or B or both • Because the query uses a set operator, and duplicates are not allowed in sets, any row that might have occurred in both A and B will only appear once in the results. • It’s not a major thing, but to emphasize, you can observe the following: • In general, query results may contain duplicate rows, but the use of set operators has a similar effect to the keyword DISTINCT.
4. The previous example specified that the tables in the two parts of the query had to have the same number of fields of the same type and in the same order. • It wouldn't do to have records in the same result table which varied in the number of fields they contained. • It also wouldn't do for numeric fields to hold non-numeric values, and vice-versa.
Having a correspondence between fields in the two parts of the query is known as union compatibility. • The specific requirements for this are: • A. The corresponding fields in the two parts of the query should mean the same thing. • This may be referred to as semantic equivalence.
B. If the corresponding fields are of exactly the same type and size, there is no problem at all. • The formal requirements are less stringent though: • i. All numeric fields are union compatible with each other. • ii. All text fields are union compatible with each other. • iii. All date fields are union compatible with each other.
In cases where the types of the fields are not the same, but they are union compatible, the "larger" of the two types will be used in the results. • Given two union compatible types, the "larger" kind of one type can always hold values of the other type.
A text field with a large width can hold the values of a text field with a smaller width. • A numeric type that can have decimal points can hold integer values. • Since the one that can hold the other is used in the results, no data will be lost when a union is done.
1. Here is a concrete example of a union query using tables and fields from the cardealership database: • SELECT * • FROM Car • WHERE make = 'Chevrolet' • UNION • SELECT * • FROM Car • WHERE make = 'Toyota'
This query illustrates the relationship between the UNION operator and the OR operator. • Because the two parts of the query are on the same table, the following query would accomplish the same thing: • SELECT * • FROM Car • WHERE make = 'Chevrolet' • OR make = 'Toyota'
In this query the Car table is the "universe", and the query finds the union of two disjoint subsets of the Car table, because no car could have two different makes at the same time. • This is the Venn diagram for the query:
Car make = 'Chevro-let' make = 'Toyota'
2. Here is another example of a union query. • The two parts of the query are based on two different tables: • SELECT name, addr, city, state • FROM Customer • UNION • SELECT name, addr, city, state • FROM Salesperson
Because two tables are involved, it would not be possible to accomplish this with the OR operator. • Notice that there is no problem with union compatibility because the corresponding fields in the two tables were defined in exactly the same way.
The Venn diagram for this query is more typical than the previous diagram. • The same person could be both a salesperson and a customer. • The results of the query would include the names, addresses, cities, and states of all customers, all salespeople, and anybody who fell into both categories.
Custom-ers Sales- people
3. As noted previously, the union operator eliminates duplicates from the results of a query. • If by chance you would like to do a union and not have the duplicates eliminated, you would use the keywords UNION ALL: • SELECT city • FROM Customer • UNION ALL • SELECT city • FROM Salesperson
There is a side effect related to eliminating or keeping duplicates in the results. • When plain UNION is used, the duplicates will be eliminated and the results will typically be sorted in some order. • The explanation is that the system uses the following approach to eliminate duplicates: • First it sorts the records. • After sorting, duplicates should be next to each other. • Then the system finds and eliminates them.
4. It is possible to do unions where one part of the query doesn't have fields corresponding to the fields in the other part. • Those fields that correspond have to be union compatible. • For those fields without corresponding fields, nulls have to be used.
Recall that the schemas for Customer and Salesperson look like this: • Customer(custnopk, name, addr, city, state, phone) • Salesperson(spnopk, name, addr, city, state, phone, bossno, commrate)
Here is an example of a union query where all of the fields of the Customer table are matched with the explicitly listed corresponding fields of the Salesperson table: • SELECT * • FROM Customer • UNION • SELECT spno, name, addr, city, state, phone • FROM Salesperson
If you would like to keep all of the fields from the Salesperson table while also including all of the records from the Customer table in the results, you could do this: • SELECT *, NULL, NULL • FROM Customer • UNION • SELECT * • FROM Salesperson
1. Among the concepts of set theory, along with union, there is intersection and there is the idea of set containment. • Given two sets, A and B, here is the definition of union again, along with the definitions of intersection and containment:
= the union of A and B • = the set of elements in A or in B or in both A and B
= the intersection of A and B • = the set of elements that A and B have in common
= A is contained in B; • as a proposition this is either true or false, either the elements of A are also in B, or they're not
A B • In this Venn diagram, the area where A and B overlap is crosshatched, indicating that this is the area in the intersection.
B A • This Venn diagram signifies that A is contained in B:
Microsoft Access SQL does not have keywords for intersection or containment, but it does have this operator: • IN • Using IN it is possible to write expressions that check whether or not a given set of tuples is included in another set. • This makes it possible to find the intersection of two sets.
2. It is possible to specify a set of values in SQL by enclosing the values in parentheses (not curly braces) and separating them with commas. • This first example of the use of the keyword IN involves such a set: • SELECT * • FROM Car • WHERE make IN ('Chevrolet', 'Toyota')
This query is equivalent in results to the following query already seen above: • SELECT * • FROM Car • WHERE make = 'Chevrolet' • OR make = 'Toyota' • The results of the query are the union of two sets.
3. The more general use of the keyword IN occurs when a set of values in a query is defined by a subquery rather than a set listed in parentheses. • An example is shown below. • Notice that its structure is similar to the foregoing examples.
The outer query selects from a table where some field value is in or is not in the set specified by the subquery: • SELECT name • FROM Salesperson • WHERE spno IN • (SELECT spno • FROM Carsale)
This query illustrates the ideas of intersection and containment. • You're selecting the names of salespeople whose spno's appear in the Carsale table. • Because of referential integrity, every spno in the Carsale table has to appear in the Salesperson table. • That means that the set of spno's from the Carsale table is a subset of the spno's in the Salesperson table.
Not every salesperson has to have sold a car, so not necessarily every spno in Salesperson appears in the Carsale table. • When you find the intersection between the two, it is simply the set of spno's from Carsale.