Create Presentation
Download Presentation

Download Presentation
## Lecture 3: Relational Algebra and SQL

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -

**Lecture 3: Relational Algebra and SQL**Tuesday, March 25, 2008**Outline**• Relational Algebra: 4.2 (except 4.2.5) • SQL: 5.2, 5.3, 5.4**Querying the Database**• Goal: specify what we want from our database • Find all the employees who earn more than $50,000 and pay taxes in New Jersey. • Could write in C++/Java, but bad idea • Instead use high-level query languages: • Theoretical: Relational Algebra, Datalog • Practical: SQL • Relational algebra: a basic set of operations on relations that provide the basic principles.**Relational Algebra**• Operators: relations as input, new relation as output • Five basic RA operators: • Set Operators • union, difference • Selection: s • Projection: p • Cartesian Product: X • Derived operators: • Intersection, complement • Joins (natural,equi-join, theta join, semi-join) • When our relations have attribute names: • Renaming: r**Set Operations: Union**• Union: all tuples in R1 or R2 • Notation: R1 U R2 • R1, R2 must have the same schema • R1 U R2 has the same schema as R1, R2 • Example: • ActiveEmployees U RetiredEmployees**Union**• R S • Union of two relations R and S defines a relation that contains all the tuples of R, or S, or both R and S, duplicate tuples being eliminated. • R and S must be union-compatible. • If R and S have I and J tuples, respectively, union is obtained by concatenating them into one relation with a maximum of (I + J) tuples. Example: List all cities where there is either a branch office or a property for rent. • Pcity(Branch) union Pcity(PropertyForRent) or R = Branch[city] union PropertyForRent[city]**Intersection**• R S • Defines a relation consisting of the set of all tuples that are in both R and S. • R and S must be union-compatible. • Expressed using basic operations: R S = R – (R – S) Example: List all cities where there is both a branch office and at least one property for rent. city(Branch) city(PropertyForRent)**Difference (Minus)**• R – S • Defines a relation consisting of the tuples that are in relation R, but not in S. • R and S must be union-compatible. • List all cities where there is a branch office but no properties for rent. city(Branch) – city(PropertyForRent) Or R = Branch [city] - PropertyForRent [city]**Set Operations: Difference**• Difference: all tuples in R1 and not in R2 • Notation: R1 – R2 • R1, R2 must have the same schema • R1 - R2 has the same schema as R1, R2 • Example • AllEmployees - RetiredEmployees**Set Operations: Selection**• Returns all tuples which satisfy a condition • Notation: sc(R) • c is a condition: =, <, >, and, or, not • Output schema: same as input schema • Find all employees with salary > $40,000: • s Salary > 40000(Employee)**Find all employees with salary more than $40,000.**s Salary > 40000(Employee)**Restriction (or Selection)**• Works on a single relation R and defines a relation that contains only those tuples (rows) of R that satisfy the specified condition (predicate). Example: List all staff with a salary greater than US$10,000. salary > 10000 (Staff) or R = STAFF Where Salary > 10000**Projection**• Unary operation: returns certain columns • Eliminates duplicate tuples ! • Notation: P A1,…,An(R) • Input schema R(B1,…,Bm) • Condition: {A1, …, An} {B1, …, Bm} • Output schema S(A1,…,An) • Example: project social-security number and names: • PSSN, Name (Employee)**Projection**• col1, . . . , coln(R) • Works on a single relation R and defines a relation that contains a vertical subset of R, extracting the values of specified attributes and eliminating duplicates. • Produce a list of salaries for all staff, showing only staffNo, fName, lName, and salary details. staffNo, fName, lName, salary(Staff) or Staff [staffNo, fName, lName, salary]**Cartesian Product**• Each tuple in R1 with each tuple in R2 • Notation: R1 x R2 • Input schemas R1(A1,…,An), R2(B1,…,Bm) • Condition: {A1,…,An} ∩ {B1,…Bm} = F • Output schema is S(A1, …, An, B1, …, Bm) • Notation: R1 x R2 • Example: Employee x Dependents • Very rare in practice; but joins are very often**Cartesian Product (Multiplication)**• R X S • Defines a relation that is the concatenation of every tuple of relation R with every tuple of relation S. • List the names and comments of all clients who have viewed a property for rent. (clientNo, fName, lName(Client)) X (clientNo, propertyNo, comment (Viewing)) or Client [clientNo, fName, lName] x Viewing [clientNo, propertyNo, comment ]**Example - Cartesian product and Selection**• Use selection operation to extract those tuples where Client.clientNo = Viewing.clientNo. sClient.clientNo = Viewing.clientNo((ÕclientNo,fName,lName(Client)) (ÕclientNo,propertyNo,comment(Viewing))) • Cartesian product and Selection can be reduced to a single operation called a Join.**Renaming**• Does not change the relational instance • Changes the relational schema only • Notation: rB1,…,Bn (R) • Input schema: R(A1, …, An) • Output schema: S(B1, …, Bn) • Example: • LastName, SocSocNo (Employee)**Renaming Example**Employee Name SSN John 999999999 Tony 777777777 • LastName, SocSocNo (Employee) LastName SocSocNo John 999999999 Tony 777777777**Derived Operations**• Intersection can be derived: • R1 ∩ R2 = R1 – (R1 – R2) • There is another way to express it (later) • Most importantly: joins, in many variants**Join**• Join is a derivative of Cartesian product. • Equivalent to performing a Selection, using join predicate as selection formula, over Cartesian product of the two operand relations. • One of the most difficult operations to implement efficiently in an RDBMS and one reason why RDBMSs have intrinsic performance problems.**Join**• Join is a derivative of Cartesian product. • Equivalent to performing a Selection, using join predicate as selection formula, over Cartesian product of the two operand relations. • One of the most difficult operations to implement efficiently in an RDBMS and one reason why RDBMSs have intrinsic performance problems. • Various forms of join operation • Natural join (defined by Codd) • Outer join • Theta join • Equijoin (a particular type of Theta join) • Semijoin**Theta join (-join)**• R FS • Defines a relation that contains tuples satisfying the predicate F from the Cartesian product of R and S. • The predicate F is of the form R.ai S.bi where may be one of the comparison operators (<, , >, , =, ).**Theta join (-join)**• Can rewrite Theta join using basic Selection and Cartesian product operations. R FS = F(R S) • Degree of a Theta join is sum of degrees of the operand relations R and S. If predicate F contains only equality (=), the term Equijoin is used.**Example - Equijoin**• List the names and comments of all clients who have viewed a property for rent. (clientNo, fName, lName(Client)) Client.clientNo = Viewing.clientNo (clientNo, propertyNo, comment(Viewing))**EQUIJOIN**• SELECT table.column, table.column FROM table1, table2 WHERE table1.column1 = table2.column2;**Natural Join**• Notation: R1 R2 • Input Schema: R1(A1, …, An), R2(B1, …, Bm) • Output Schema: S(C1,…,Cp) • Where {C1, …, Cp} = {A1, …, An} U {B1, …, Bm} • Meaning: combine all pairs of tuples in R1 and R2 that agree on the attributes: • {A1,…,An} ∩ {B1,…, Bm} (called the join attributes) • Equivalent to a cross product followed by selection • Example Employee Dependents**Employee Dependents =**PName, SSN, Dname(sSSN=SSN2(Employee x rSSN2, Dname(Dependents)) Natural Join Example Employee Name SSN John 999999999 Tony 777777777 Dependents SSN Dname 999999999 Emily 777777777 Joe Name SSN Dname John 999999999 Emily Tony 777777777 Joe**Natural Join**• List the names and comments of all clients who have viewed a property for rent. (clientNo, fName, lName(Client)) Join (clientNo, propertyNo, comment(Viewing)) Or Client [clientNo, fName, lName] Join Viewing [clientNo, propertyNo, comment ]**Another Natural Join Example**• R= S= • R S=**Natural Join**• Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R S ? • Given R(A, B, C), S(D, E), what is R S ? • Given R(A, B), S(A, B), what is R S ?**Theta Join**• A join that involves a predicate • Notation: R1 q R2 where q is a condition • Input schemas: R1(A1,…,An), R2(B1,…,Bm) • {A1,…An} ∩ {B1,…,Bm} = f • Output schema: S(A1,…,An,B1,…,Bm) • Derived operator: R1 q R2 = sq (R1 x R2)**Semijoin**• R S = PA1,…,An (R S) • Where the schemas are: • Input: R(A1,…An), S(B1,…,Bm) • Output: T(A1,…,An)**Outer join**• To display rows in the result that do not have matching values in the join column, use Outer join. • R S • (Left) outer join is join in which tuples from R that do not have matching values in common columns of S are also included in result relation.**Outer Join**• To display rows in the result that do not have matching values in the join column, use Outer join. • R Left Outer Join S • (Left) outer join is join in which tuples from R that do not have matching values in common columns of S are also included in result relation. Example: • Produce a status report on property viewings. propertyNo, street, city(PropertyForRent) Left Outer Join Viewing**Types of Joins**• Left Outer Join • keep all of the tuples from the “left” relation • join with the right relation • pad the non-matching tuples with nulls • Right Outer Join • same as the left, but keep tuples from the “right” relation • Full Outer Join • same as left, but keep all tuples from both relations**Left Outer Join**name phone name email R= S= • If we do a left outer join on R and S, and we match on the first column, the result is: name phone email**Example - Left Outer join**• Produce a status report on property viewings. propertyNo, street, city(PropertyForRent) Viewing**Right Outer Join**name email name phone R= S= • If we do a right outer join on R and S, and we match on the first column, the result is: name phone email**Full Outer Join**name email name phone R= S= • If we do a full outer join on R and S, and we match on the first column, the result is: name phone email**Division**• Identify all clients who have viewed all properties with three rooms. (clientNo, propertyNo(Viewing)) (propertyNo(rooms = 3 (PropertyForRent)))**Natural join**• R S • An Equijoin of the two relations R and S over all common attributes x. One occurrence of each common attribute is eliminated from the result.**Example - Natural join**• List the names and comments of all clients who have viewed a property for rent. (clientNo, fName, lName(Client)) (clientNo, propertyNo, comment(Viewing))**Semijoin**• R F S • Defines a relation that contains the tuples of R that participate in the join of R with S. • Can rewrite Semijoin using Projection and Join: • R F S = A(R F S)