1 / 106

Lecture 8: Query Execution

Lecture 8: Query Execution. Wednesday, November 17, 2010. Outline. Relational Algebra: Ch. 4.2 Overview of query evaluation: Ch. 12 Evaluating relational operators: Ch. 14. The WHAT and the HOW. In SQL we write WHAT we want to get form the data

raja
Download Presentation

Lecture 8: Query Execution

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 8:Query Execution Wednesday, November 17, 2010 Dan Suciu -- CSEP544 Fall 2010

  2. Outline • Relational Algebra: Ch. 4.2 • Overview of query evaluation: Ch. 12 • Evaluating relational operators: Ch. 14 Dan Suciu -- CSEP544 Fall 2010

  3. The WHAT and the HOW • In SQL we write WHAT we want to get form the data • The database system needs to figure out HOW to get the data we want • The passage from WHAT to HOW goes through the Relational Algebra Data Independence Dan Suciu -- CSEP544 Fall 2010

  4. SQL = WHAT Product(pid, name, price)Purchase(pid, cid, store) Customer(cid, name, city) SELECTDISTINCTx.name, z.name FROM Product x, Purchase y, Customer z WHEREx.pid = y.pid and y.cid = y.cid andx.price > 100 and z.city = ‘Seattle’ It’s clear WHAT we want, unclear HOW to get it Dan Suciu -- CSEP544 Fall 2010

  5. Relational Algebra = HOW Final answer Product(pid, name, price)Purchase(pid, cid, store) Customer(cid, name, city) δ T4(name,name) Π x.name,z.name T3(. . . ) T2( . . . .) σ price>100 and city=‘Seattle’ T1(pid,name,price,pid,cid,store) cid=cid Temporary tablesT1, T2, . . . pid=pid Customer Purchase Product

  6. Relational Algebra = HOW The order is now clearly specified: Iterate over PRODUCT… …join with PURCHASE… …join with CUSTOMER… …select tuples with Price>100 and City=‘Seattle’… …eliminate duplicates… …and that’s the final answer ! Dan Suciu -- CSEP544 Fall 2010

  7. Sets v.s. Bags • Sets: {a,b,c}, {a,d,e,f}, { }, . . . • Bags: {a, a, b, c}, {b, b, b, b, b}, . . . Relational Algebra has two semantics: • Set semantics • Bag semantics Dan Suciu -- CSEP544 Fall 2010

  8. Extended Algebra Operators • Union , intersection , difference - • Selection s • ProjectionΠ • Join ⨝ • Rename  • Duplicate elimination d • Grouping and aggregation g • Sorting t Dan Suciu -- CSEP544 Fall 2010

  9. Relational Algebra (1/3) The Basic Five operators: • Union:  • Difference: - • Selection:s • Projection: P • Join: ⨝ Dan Suciu -- CSEP544 Fall 2010

  10. Relational Algebra (2/3) Derived or auxiliary operators: • Renaming:ρ • Intersection, complement • Variations of joins • natural, equi-join, theta join, semi-join, cartesian product Dan Suciu -- CSEP544 Fall 2010

  11. Relational Algebra (3/3) Extensions for bags: • Duplicate elimination:δ • Group by: γ • Sorting:τ Dan Suciu -- CSEP544 Fall 2010

  12. Union and Difference R1  R2 R1 – R2 What do they mean over bags ? Dan Suciu -- CSEP544 Fall 2010

  13. What about Intersection ? • Derived operator using minus • Derived using join (will explain later) R1  R2 = R1 – (R1 – R2) R1  R2 = R1 ⨝ R2 Dan Suciu -- CSEP544 Fall 2010

  14. Selection • Returns all tuples which satisfy a condition • Examples • sSalary > 40000(Employee) • sname = “Smith”(Employee) • The condition c can be =, <, , >,, <> sc(R) Dan Suciu -- CSEP544 Fall 2010

  15. Employee sSalary > 40000(Employee) Dan Suciu -- CSEP544 Fall 2010

  16. Projection • Eliminates columns • Example: project social-security number and names: • PSSN, Name (Employee) • Answer(SSN, Name) P A1,…,An(R) Semantics differs over set or over bags Dan Suciu -- CSEP544 Fall 2010

  17. Employee PName,Salary (Employee) Set semantics Bag semantics Which is more efficient to implement ? Dan Suciu -- CSEP544 Fall 2010

  18. Cartesian Product • Each tuple in R1 with each tuple in R2 • Very rare in practice; mainly used to express joins R1  R2 Dan Suciu -- CSEP544 Fall 2010

  19. Employee Dependent Employee ✕ Dependent Dan Suciu -- CSEP544 Fall 2010

  20. Renaming • Changes the schema, not the instance • Example: • rN, S(Employee)  Answer(N, S) rB1,…,Bn (R) Dan Suciu -- CSEP544 Fall 2010

  21. Natural Join R1 ⨝R2 • Meaning: R1⨝R2 = PA(s(R1 × R2)) • Where: • The selection schecks equality of all common attributes • The projection eliminates the duplicate common attributes Dan Suciu -- CSEP544 Fall 2010

  22. Natural Join R S R ⨝ S= PABC(sR.B=S.B(R × S)) Dan Suciu -- CSEP544 Fall 2010

  23. Natural Join • Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⨝ S ? • Given R(A, B, C), S(D, E), what is R ⨝ S ? • Given R(A, B), S(A, B), what is R ⨝ S ? Dan Suciu -- CSEP544 Fall 2010

  24. Theta Join • A join that involves a predicate • Here q can be any condition R1 ⨝q R2 = sq (R1  R2) Dan Suciu -- CSEP544 Fall 2010

  25. Eq-join • A theta join where q is an equality • This is by far the most used variant of join in practice R1 ⨝A=B R2 = sA=B (R1  R2) Dan Suciu -- CSEP544 Fall 2010

  26. So Which Join Is It ? • When we write R ⨝ S we usually mean an eq-join, but we often omit the equality predicate when it is clear from the context Dan Suciu -- CSEP544 Fall 2010

  27. Semijoin • Where A1, …, An are the attributes in R R ⋉C S = PA1,…,An (R ⨝C S) • Formally, R ⋉CS means this: retain from R only thosetuples that have some matching tuple in S • Duplicates in R are preserved • Duplicates in S don’t matter Dan Suciu -- CSEP544 Fall 2010

  28. Semijoins in Distributed Databases Dependent Employee network Employee ⨝SSN=EmpSSN (sage>71 (Dependent)) Assumptions: Very few Employees have dependents. Very few dependents have age > 71. “Stuff” is big. Task: compute the query with minimum amount of data transfer

  29. Semijoins in Distributed Databases Dependent Employee network Employee ⨝SSN=EmpSSN (sage>71 (Dependent)) T(SSN) = PSSNsage>71 (Dependents)

  30. Semijoins in Distributed Databases Dependent Employee network Employee ⨝SSN=EmpSSN (sage>71 (Dependent)) T(SSN) = PSSNsage>71 (Dependents) R = Employee⨝SSN=EmpSSNT = Employee ⋉SSN=EmpSSN (sage>71 (Dependents))

  31. Semijoins in Distributed Databases Dependent Employee network Employee ⨝SSN=EmpSSN (sage>71 (Dependent)) T(SSN) = PSSNsage>71 (Dependents) R = Employee ⋉SSN=EmpSSNT Answer = R ⨝SSN=EmpSSNDependents

  32. Joins R US • The join operation in all its variants (eq-join, natural join, semi-join, outer-join) is at the heart of relational database systems • WHY ? Dan Suciu -- CSEP544 Fall 2010

  33. Operators on Bags • Duplicate elimination d d(R) = select distinct * from R • Grouping g gA,sum(B) (R) = select A,sum(B) from R group by A • Sorting t Dan Suciu -- CSEP544 Fall 2010

  34. y.seller-ssn=z.ssn y.pid=u.pid x.ssn=y.buyer-ssn Complex RA Expressions gu.name, count(*) Person x Purchase y Person z Product u Pssn Ppid sname=fred sname=gizmo Dan Suciu -- CSEP544 Fall 2010

  35. RA = Dataflow Program • Several operations, plus strictly specified order • In RDBMS the dataflow graph is always a tree • Novel applications (s.a. PIG), dataflow graph may be a DAG Dan Suciu -- CSEP544 Fall 2010

  36. Limitations of RA • Cannot compute “transitive closure” • Find all direct and indirect relatives of Fred • Cannot express in RA !!! Need to write Java program • Remember the Bacon number ? Needs TC too ! Dan Suciu -- CSEP544 Fall 2010

  37. Steps of the Query Processor SQL query Parse & Rewrite Query Logicalplan Select Logical Plan Queryoptimization Select Physical Plan Physicalplan Query Execution Disk

  38. Example Database Schema Supplier(sno,sname,scity,sstate) Part(pno,pname,psize,pcolor) Supply(sno,pno,price) View: Suppliers in Seattle CREATE VIEW NearbySupp AS SELECT sno, sname FROM Supplier WHERE scity='Seattle' AND sstate='WA' Dan Suciu -- CSEP544 Fall 2010

  39. Example Query Find the names of all suppliers in Seattle who supply part number 2 SELECT sname FROM NearbySupp WHERE sno IN ( SELECT sno FROM Supplies WHERE pno = 2 ) Dan Suciu -- CSEP544 Fall 2010

  40. Steps in Query Evaluation • Step 0: Admission control • User connects to the db with username, password • User sends query in text format • Step 1: Query parsing • Parses query into an internal format • Performs various checks using catalog • Correctness, authorization, integrity constraints • Step 2: Query rewrite • View rewriting, flattening, etc. Dan Suciu -- CSEP544 Fall 2010

  41. Rewritten Version of Our Query Original query: Rewritten query: SELECT sname FROM NearbySupp WHERE sno IN ( SELECT sno FROM Supplies WHERE pno = 2 ) SELECT S.sname FROM Supplier S, Supplies U WHERE S.scity='Seattle' AND S.sstate='WA’ AND S.sno = U.sno AND U.pno = 2; Dan Suciu -- CSEP544 Fall 2010

  42. Continue with Query Evaluation • Step 3: Query optimization • Find an efficient query plan for executing the query • A query plan is • Logical query plan: an extended relational algebra tree • Physical query plan: with additional annotations at each node • Access method to use for each relation • Implementation to use for each relational operator Dan Suciu -- CSEP544 Fall 2010

  43. Extended Algebra Operators • Union , intersection , difference - • Selection s • Projection  • Join ⨝ • Duplicate elimination d • Grouping and aggregation g • Sorting t • Rename  Dan Suciu -- CSEP544 Fall 2010

  44. Logical Query Plan Πsname sscity=‘Seattle’ sstate=‘WA’  pno=2 sno = sno Suppliers Supplies Dan Suciu -- CSEP544 Fall 2010

  45. Query Block • Most optimizers operate on individual query blocks • A query block is an SQL query with no nesting • Exactly one • SELECT clause • FROM clause • At most one • WHERE clause • GROUP BY clause • HAVING clause Dan Suciu -- CSEP544 Fall 2010

  46. Typical Plan for Block (1/2)  fields selection condition SELECT-PROJECT-JOIN Query join condition … join condition R S Dan Suciu -- CSEP544 Fall 2010

  47. Typical Plan For Block (2/2) σhaving-ondition fields, sum/count/min/max(fields) selection condition join condition … … Dan Suciu -- CSEP544 Fall 2010

  48. Supplier(sno,sname,scity,sstate) Part(pno,pname,psize,pcolor) Supply(sno,pno,price) How about Subqueries? SELECTQ.sno FROM Supplier Q WHEREQ.sstate = ‘WA’ and not exists SELECT * FROM Supply PWHEREP.sno = Q.sno and P.price > 100 Dan Suciu -- CSEP544 Fall 2010

  49. Supplier(sno,sname,scity,sstate) Part(pno,pname,psize,pcolor) Supply(sno,pno,price) How about Subqueries? SELECTQ.sno FROM Supplier Q WHEREQ.sstate = ‘WA’ and notexists SELECT * FROM Supply PWHEREP.sno = Q.sno and P.price > 100 Correlation ! Dan Suciu -- CSEP544 Fall 2010

  50. Supplier(sno,sname,scity,sstate) Part(pno,pname,psize,pcolor) Supply(sno,pno,price) How about Subqueries? SELECTQ.sno FROM Supplier Q WHEREQ.sstate = ‘WA’ and not exists SELECT * FROM Supply PWHEREP.sno = Q.sno and P.price > 100 De-Correlation SELECTQ.sno FROM Supplier Q WHEREQ.sstate = ‘WA’ andQ.snonotin SELECTP.sno FROM Supply PWHEREP.price > 100 Dan Suciu -- CSEP544 Fall 2010

More Related