97 Views

Download Presentation
##### Lecture 07: Relational Algebra

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -

**Outline**• Relational Algebra (Section 6.1)**Relational Algebra**• Formalism for creating new relations from existing ones • Its place in the big picture: Declarativequerylanguage Algebra Implementation Relational algebra SQL,relational calculus**Relational Algebra**• Five operators: • Union: • Difference: - • Selection: s • Projection: P • Cartesian Product: • Derived or auxiliary operators: • Intersection, complement • Joins (natural,equi-join, theta join, semi-join) • Renaming: r**1. Union and 2. Difference**• R1 R2Example: • ActiveEmployees RetiredEmployees • R1 – R2Example: • AllEmployees − RetiredEmployees**What about Intersection ?**• It is a derived operatorR1 R2 = R1 – (R1 – R2) • Also expressed as a join (will see later)Example • UnionizedEmployees RetiredEmployees**3. Selection**• Returns all tuples which satisfy a condition • Notation: sc(R) • Examples • sSalary > 40000(Employee) • sname = “Smith”(Employee) • The condition c can be =, <, , >,, <> [in SQL: SELECT * FROM Employee WHERE Salary > 40000]**Find all employees with salary more than $40,000.**s Salary > 40000(Employee)**4. Projection**• Eliminates columns, then removes duplicates • Notation: P A1,…,An(R) • Example: project to social-security number and names: • PSSN, Name (Employee) • Output schema: Answer(SSN, Name) [In SQL: SELECT DISTINCT SSN, Name FROM Employee]**5. Cartesian Product**• Combine each tuple in R1 with each tuple in R2 • Notation: R1 R2 • Example: • Employee Dependents • Very rare in practice; mainly used to express joins [In SQL: SELECT * FROM R1, R2]**Relational Algebra**• Five operators: • Union: • Difference: - • Selection: s • Projection: P • Cartesian Product: • Derived or auxiliary operators: • Intersection, complement • Joins (natural,equi-join, theta join, semi-join) • Renaming: r**Renaming**• Changes the schema, not the instance • Schema: R(A1, …, An ) • Notation: rB1,…,Bn (R) • Example: • rLastName, SocSocNo (Employee) • Output schema: Answer(LastName, SocSocNo) [in SQL: SELECT Name AS LastName, SSN AS SocSocNo FROM Employee]**Renaming Example**Employee Name SSN John 999999999 Tony 777777777 • LastName, SocSocNo (Employee) LastName SocSocNo John 999999999 Tony 777777777**Natural Join**• Notation: R1 ⋈ R2 • Meaning: R1 ⋈ R2 = PA(sC(R1 R2)) • Where: • The selection sC checks equality of all common attributes • The projection eliminates the duplicate common attributes [in SQL: SELECT DISTINCT R1.A, R1. B, R2.C FROM R1, R2 WHERE R1.B = R2.B Schema: R1(A,B), R2(B,C)]**Employee Dependents =**PName, SSN, Dname(sSSN=SSN2(Employee x rSSN2, Dname(Dependents)) Natural Join Example Employee Name SSN John 999999999 Tony 777777777 Dependents SSN Dname 999999999 Emily 777777777 Joe Name SSN Dname John 999999999 Emily Tony 777777777 Joe**Natural Join**• R= S= • R ⋈ S=**Natural Join**• Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S ? • Given R(A, B, C), S(D, E), what is R ⋈ S ? • Given R(A, B), S(A, B), what is R ⋈ S ?**Theta Join**• A join that involves a predicate • R1 ⋈q R2 = sq (R1 R2) • Here q can be any condition**Eq-join**• A theta join where q is an equality R1 ⋈A=B R2 = s A=B (R1 R2) • Example: • Employee ⋈SSN=SSN Dependents • Most useful join in practice(difference to natural join?)**Semijoin**• R ⋉ S = PA1,…,An (R ⋈ S) • Where A1, …, An are the attributes in R • Example: • Employee ⋉ Dependents**Semijoins in Distributed Databases**• Semijoins are used in distributed databases Dependents Employee network Employee ⋈ssn=ssn (s age>71 (Dependents)) T = PSSNs age>71 (Dependents) R = Employee ⋉ T Answer = R ⋈ Dependents**seller-ssn=ssn**pid=pid buyer-ssn=ssn Complex RA Expressions Pname Person Purchase Person Product Pssn Ppid sname=fred sname=gizmo**sname**sid=sid (Scan; (Scan; write to write to rating > 5 bid=100 temp T2) temp T1) Reserves Sailors sname rating > 5 bid=100 sid=sid Sailors Reserves Application: Query Rewriting for Optimization The earlier we process selections, less tuples we need to manipulate higher up in the tree (predicate pushdown) Disadvantages?**Algebraic Laws (Examples)**• Commutative and Associative Laws • R ∩ S = S ∩ R, R ∩ (S ∩ T) = (R ∩ S) ∩ T • R S = S R, R (S T) = (R S) T • Laws involving selection • sC AND C’(R) = sC(sC’(R)) = sC(R) ∩ sC’(R) • sC (R S) = sC (R) S • When C involves only attributes of R • Laws involving projections • PM(PN(R)) = PM,N(R) ⋈ ⋈ ⋈ ⋈ ⋈ ⋈ ⋈ ⋈**Operations on Bags**A bag = a set with repeated elements All operations need to be defined carefully on bags • {a,b,b,c}{a,b,b,b,e,f,f}={a,a,b,b,b,b,b,c,e,f,f} • {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b} • sC(R): preserve the number of occurrences • PA(R): no duplicate elimination • Cartesian product, join: no duplicate elimination Important ! Relational Engines work on bags, not sets !**Finally: RA has Limitations !**• Cannot compute “transitive closure” • Find all direct and indirect relatives of Fred • Cannot express in RA !!! Need to write C program