910 likes | 1.22k Views
Lecture 3: Relational Algebra and SQL. Outline. Relational Algebra: 4.2 (except 4.2.5) SQL: 5.2, 5.3, 5.4. Querying the Database. Goal: specify what we want from our database Find all the employees who earn more than $50,000 and pay taxes in New Jersey.
 
                
                E N D
Outline • Relational Algebra: 4.2 (except 4.2.5) • SQL: 5.2, 5.3, 5.4
Querying the Database • Goal: specify what we want from our database • Find all the employees who earn more than $50,000 and pay taxes in New Jersey. • Could write in C++/Java, but bad idea • Instead use high-level query languages: • Theoretical: Relational Algebra, Datalog • Practical: SQL • Relational algebra: a basic set of operations on relations that provide the basic principles.
Relational Algebra • Operators: relations as input, new relation as output • Five basic RA operators: • Set Operators • union, difference • Selection: s • Projection: p • Cartesian Product: X • Derived operators: • Intersection, complement • Joins (natural,equi-join, theta join, semi-join) • When our relations have attribute names: • Renaming: r
Set Operations: Union • Union: all tuples in R1 or R2 • Notation: R1 U R2 • R1, R2 must have the same schema • R1 U R2 has the same schema as R1, R2 • Example: • ActiveEmployees U RetiredEmployees
Union • R  S • Union of two relations R and S defines a relation that contains all the tuples of R, or S, or both R and S, duplicate tuples being eliminated. • R and S must be union-compatible. • If R and S have I and J tuples, respectively, union is obtained by concatenating them into one relation with a maximum of (I + J) tuples. Example: List all cities where there is either a branch office or a property for rent. • Pcity(Branch) union Pcity(PropertyForRent) or R = Branch[city] union PropertyForRent[city]
Intersection • R  S • Defines a relation consisting of the set of all tuples that are in both R and S. • R and S must be union-compatible. • Expressed using basic operations: R  S = R – (R – S) Example: List all cities where there is both a branch office and at least one property for rent. city(Branch) city(PropertyForRent)
Difference (Minus) • R – S • Defines a relation consisting of the tuples that are in relation R, but not in S. • R and S must be union-compatible. • List all cities where there is a branch office but no properties for rent. city(Branch) – city(PropertyForRent) Or R = Branch [city] - PropertyForRent [city]
Set Operations: Difference • Difference: all tuples in R1 and not in R2 • Notation: R1 – R2 • R1, R2 must have the same schema • R1 - R2 has the same schema as R1, R2 • Example • AllEmployees - RetiredEmployees
Set Operations: Selection • Returns all tuples which satisfy a condition • Notation: sc(R) • c is a condition: =, <, >, and, or, not • Output schema: same as input schema • Find all employees with salary > $40,000: • s Salary > 40000(Employee)
Find all employees with salary more than $40,000. s Salary > 40000(Employee)
Restriction (or Selection) • Works on a single relation R and defines a relation that contains only those tuples (rows) of R that satisfy the specified condition (predicate). Example: List all staff with a salary greater than US$10,000. salary > 10000 (Staff) or R = STAFF Where Salary > 10000
Projection • Unary operation: returns certain columns • Eliminates duplicate tuples ! • Notation: P A1,…,An(R) • Input schema R(B1,…,Bm) • Condition: {A1, …, An}  {B1, …, Bm} • Output schema S(A1,…,An) • Example: project social-security number and names: • PSSN, Name (Employee)
Projection • col1, . . . , coln(R) • Works on a single relation R and defines a relation that contains a vertical subset of R, extracting the values of specified attributes and eliminating duplicates. • Produce a list of salaries for all staff, showing only staffNo, fName, lName, and salary details. staffNo, fName, lName, salary(Staff) or Staff [staffNo, fName, lName, salary]
Cartesian Product • Each tuple in R1 with each tuple in R2 • Notation: R1 x R2 • Input schemas R1(A1,…,An), R2(B1,…,Bm) • Condition: {A1,…,An} ∩ {B1,…Bm} = F • Output schema is S(A1, …, An, B1, …, Bm) • Notation: R1 x R2 • Example: Employee x Dependents • Very rare in practice; but joins are very often
Cartesian Product (Multiplication) • R X S • Defines a relation that is the concatenation of every tuple of relation R with every tuple of relation S. • List the names and comments of all clients who have viewed a property for rent. (clientNo, fName, lName(Client)) X (clientNo, propertyNo, comment (Viewing)) or Client [clientNo, fName, lName] x Viewing [clientNo, propertyNo, comment ]
Example - Cartesian product and Selection • Use selection operation to extract those tuples where Client.clientNo = Viewing.clientNo. sClient.clientNo = Viewing.clientNo((ÕclientNo,fName,lName(Client))  (ÕclientNo,propertyNo,comment(Viewing))) • Cartesian product and Selection can be reduced to a single operation called a Join.
Renaming • Does not change the relational instance • Changes the relational schema only • Notation: rB1,…,Bn (R) • Input schema: R(A1, …, An) • Output schema: S(B1, …, Bn) • Example: • LastName, SocSocNo (Employee)
Renaming Example Employee Name SSN John 999999999 Tony 777777777 • LastName, SocSocNo (Employee) LastName SocSocNo John 999999999 Tony 777777777
Derived Operations • Intersection can be derived: • R1 ∩ R2 = R1 – (R1 – R2) • There is another way to express it (later) • Most importantly: joins, in many variants
Join • Join is a derivative of Cartesian product. • Equivalent to performing a Selection, using join predicate as selection formula, over Cartesian product of the two operand relations. • One of the most difficult operations to implement efficiently in an RDBMS and one reason why RDBMSs have intrinsic performance problems.
Join • Join is a derivative of Cartesian product. • Equivalent to performing a Selection, using join predicate as selection formula, over Cartesian product of the two operand relations. • One of the most difficult operations to implement efficiently in an RDBMS and one reason why RDBMSs have intrinsic performance problems. • Various forms of join operation • Natural join (defined by Codd) • Outer join • Theta join • Equijoin (a particular type of Theta join) • Semijoin
Theta join (-join) • R FS • Defines a relation that contains tuples satisfying the predicate F from the Cartesian product of R and S. • The predicate F is of the form R.ai S.bi where  may be one of the comparison operators (<, , >, , =, ).
Theta join (-join) • Can rewrite Theta join using basic Selection and Cartesian product operations. R FS = F(R  S) • Degree of a Theta join is sum of degrees of the operand relations R and S. If predicate F contains only equality (=), the term Equijoin is used.
Example - Equijoin • List the names and comments of all clients who have viewed a property for rent. (clientNo, fName, lName(Client)) Client.clientNo = Viewing.clientNo (clientNo, propertyNo, comment(Viewing))
EQUIJOIN • SELECT table.column, table.column FROM table1, table2 WHERE table1.column1 = table2.column2;
Natural Join • Notation: R1 R2 • Input Schema: R1(A1, …, An), R2(B1, …, Bm) • Output Schema: S(C1,…,Cp) • Where {C1, …, Cp} = {A1, …, An} U {B1, …, Bm} • Meaning: combine all pairs of tuples in R1 and R2 that agree on the attributes: • {A1,…,An} ∩ {B1,…, Bm} (called the join attributes) • Equivalent to a cross product followed by selection • Example Employee Dependents
Employee Dependents = PName, SSN, Dname(sSSN=SSN2(Employee x rSSN2, Dname(Dependents)) Natural Join Example Employee Name SSN John 999999999 Tony 777777777 Dependents SSN Dname 999999999 Emily 777777777 Joe Name SSN Dname John 999999999 Emily Tony 777777777 Joe
Natural Join • List the names and comments of all clients who have viewed a property for rent. (clientNo, fName, lName(Client)) Join (clientNo, propertyNo, comment(Viewing)) Or Client [clientNo, fName, lName] Join Viewing [clientNo, propertyNo, comment ]
Another Natural Join Example • R= S= • R S=
Natural Join • Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R S ? • Given R(A, B, C), S(D, E), what is R S ? • Given R(A, B), S(A, B), what is R S ?
Theta Join • A join that involves a predicate • Notation: R1 q R2 where q is a condition • Input schemas: R1(A1,…,An), R2(B1,…,Bm) • {A1,…An} ∩ {B1,…,Bm} = f • Output schema: S(A1,…,An,B1,…,Bm) • Derived operator: R1 q R2 = sq (R1 x R2)
Semijoin • R S = PA1,…,An (R S) • Where the schemas are: • Input: R(A1,…An), S(B1,…,Bm) • Output: T(A1,…,An)
Outer join • To display rows in the result that do not have matching values in the join column, use Outer join. • R S • (Left) outer join is join in which tuples from R that do not have matching values in common columns of S are also included in result relation.
Outer Join • To display rows in the result that do not have matching values in the join column, use Outer join. • R Left Outer Join S • (Left) outer join is join in which tuples from R that do not have matching values in common columns of S are also included in result relation. Example: • Produce a status report on property viewings. propertyNo, street, city(PropertyForRent) Left Outer Join Viewing
Types of Joins • Left Outer Join • keep all of the tuples from the “left” relation • join with the right relation • pad the non-matching tuples with nulls • Right Outer Join • same as the left, but keep tuples from the “right” relation • Full Outer Join • same as left, but keep all tuples from both relations
Left Outer Join name phone name email R= S= • If we do a left outer join on R and S, and we match on the first column, the result is: name phone email
Example - Left Outer join • Produce a status report on property viewings. propertyNo, street, city(PropertyForRent) Viewing
Right Outer Join name email name phone R= S= • If we do a right outer join on R and S, and we match on the first column, the result is: name phone email
Full Outer Join name email name phone R= S= • If we do a full outer join on R and S, and we match on the first column, the result is: name phone email
Division • Identify all clients who have viewed all properties with three rooms. (clientNo, propertyNo(Viewing))  (propertyNo(rooms = 3 (PropertyForRent)))
Natural join • R S • An Equijoin of the two relations R and S over all common attributes x. One occurrence of each common attribute is eliminated from the result.
Example - Natural join • List the names and comments of all clients who have viewed a property for rent. (clientNo, fName, lName(Client)) (clientNo, propertyNo, comment(Viewing))
Semijoin • R F S • Defines a relation that contains the tuples of R that participate in the join of R with S. • Can rewrite Semijoin using Projection and Join: • R F S = A(R F S)