ICS 421 Spring 2010 Query Evaluation ( i )

ICS 421 Spring 2010Query Evaluation (i) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa Lipyeow Lim -- University of Hawaii at Manoa

SELECT * FROM Reserves WHERE sid=101 Query Sid=101 Parse Query A B Reserves Enumerate Plans fetch SCAN (sid=101) Reserves IDXSCAN (sid=101) Reserves Estimate Cost 32.0 Index(sid) 25.0 Choose Best Plan Pick B Optimizer Evaluate Query Plan Evaluate Plan A Result Lipyeow Lim -- University of Hawaii at Manoa

Parse Query Query Parse Query • Input : SQL • Eg. SELECT-FROM-WHERE, CREATE TABLE, DROP TABLE statements • Output: Some data structure to represent the “query” • Relational algebra ? • Also checks syntax, resolves aliases, binds names in SQL to objects in the catalog • How ? Enumerate Plans Estimate Cost Choose Best Plan Evaluate Query Plan Result Lipyeow Lim -- University of Hawaii at Manoa

Enumerate Plans Query Parse Query • Input : a data structure representing the “query” • Output: a collection of equivalent query evaluation plans • Query Execution Plan (QEP): tree of database operators. • high-level: RA operators are used • low-level: RA operators with particular implementation algorithm. • Plan enumeration: find equivalent plans • Different QEPs that return the same results • Query rewriting : transformation of one QEP to another equivalent QEP. Enumerate Plans Estimate Cost Choose Best Plan Evaluate Query Plan Result Lipyeow Lim -- University of Hawaii at Manoa

Estimate Cost Query Parse Query • Input : a collection of equivalent query evaluation plans • Output: a cost estimate for each QEP in the collection • Cost estimation: a mapping of a QEP to a cost • Cost Model: a model of what counts in the cost estimate. Eg. Disk accesses, CPU cost … • Statistics about the data and the hardware are used. Enumerate Plans Estimate Cost Choose Best Plan Evaluate Query Plan Result Lipyeow Lim -- University of Hawaii at Manoa

Choose Best Plan Query Parse Query • Input : a collection of equivalent query evaluation plans and their cost estimate • Output: best QEP in the collection • The steps: enumerate plans, estimate cost, choose best plan collectively called the: • Query Optimizer: • Explores the space of equivalent plan for a query • Chooses the best plan according to a cost model Enumerate Plans Estimate Cost Choose Best Plan Evaluate Query Plan Result Lipyeow Lim -- University of Hawaii at Manoa

Evaluate Query Plan Query Parse Query • Input : a QEP (hopefully the best) • Output: Query results • Often includes a “code generation” step to generate a lower level QEP in executable “code”. • Query evaluation engine is a “virtual machine” that executes some code representing low level QEP. Enumerate Plans Estimate Cost Choose Best Plan Evaluate Query Plan Result Lipyeow Lim -- University of Hawaii at Manoa

Query Execution Plans (QEPs) • A tree of database operators: each operator is a RA operator with specific implementation • Selection : Index Scan or Table Scan • Projection π: • Without DISTINCT : Table Scan • With DISTINCT : requires sorting or index scan • Join : • Nested loop joins (naïve) • Index nested loop joins • Sort merge joins • Sort : • In-memory sort • External sort Lipyeow Lim -- University of Hawaii at Manoa

QEP Examples SELECTS.sname FROM Reserves R, Sailors S WHERE R.sid=S.sid AND R.bid=100 ANDS.rating>5 R.sid=S.sid R.sid=S.sid R.sid=S.sid R.sid=S.sid πS.sname πS.sname πS.sname πS.sname On the fly On the fly S.rating>5 S.rating>5 AND R.bid=100 S.rating>5 AND R.bid=100 Nested Loop Join On the fly R.bid=100 Nested Loop Join Temp T1 R.bid=100 S.rating>5 Reserves Sailors Reserves Sailors (SCAN) (SCAN) (SCAN) (SCAN) Reserves Reserves Sailors Sailors Lipyeow Lim -- University of Hawaii at Manoa

Access Paths R.sid=S.sid • An access path is a method of retrieving tuples. Eg. Given a query with a selection condition: • File or table scan • Index scan • Index matching problem: given a selection condition, which indexes can be used for the selection, i.e., matches the selection ? • Selection condition normalized to conjunctive normal form (CNF), where each term is a conjunct • Eg. (day<8/9/94 ANDrname=‘Paul’) OR bid=5 ORsid=3 • CNF: (day<8/9/94 OR bid=5 ORsid=3 ) AND (rname=‘Paul’ OR bid=5 ORsid=3) πS.sname On the fly Nested Loop Join Fetch Temp T1 R.bid=100 S.rating>5 R.bid=100 Reserves (SCAN) (SCAN) (IDXSCAN) Reserves Sailors Index(R.bid) Lipyeow Lim -- University of Hawaii at Manoa

Index Matching Q1: a=5 AND b=3 I1: Tree Index (a,b,c) Q2: a=5 AND b>6 • A tree index matches a selection condition if the selection condition is a prefix of the index search key. • A hash index matches a selection condition if the selection condition has a term attribute=value for every attribute in the index search key Q3: b=3 I2: Tree Index (b,c,d) Q4: a=5 AND b=3 AND c=5 I3: Hash Index (a,b,c) Q5: a>5 AND b=3 AND c=5 Lipyeow Lim -- University of Hawaii at Manoa

One Approach to Selections • Find the most selective access path, retrieve tuples using it • Apply remaining terms in selection not matched by the chosen access path • The selectivity of an access path is the size of the result set (in terms of tuples or pages). • Sometimes selectivity is also used to mean reduction factor: fraction of tuples in a table retrieved by the access path or selection condition. • Eg. Consider the selection: day<8/9/94 AND bid=5 ANDsid=3 • Tree Index(day) • Hash index (bid,sid) Lipyeow Lim -- University of Hawaii at Manoa

Join Algorithms • Cost model • Single DBMS server: I/Os in number of pages • Distributed DBMS: network I/Os + local disk I/Os • td : time to read/write one page to local disk • ts: time to ship one page over the network to another node • Single server: • Nested Loop Join • Index Nested Loop Join • Sort Merge Join • Hash Join • Distributed: • Semi-Join • Bloom Join Lipyeow Lim -- University of Hawaii at Manoa

Nested Loop Join S1 R1 For each data page PS1 of S1 For each tuple s in PS1 For each data page PR1 of R1 For each tuple r in PR1 if (s.sid==r.sid) then output s,r • Worst case number of local disk reads = Npages(S1) + |S1|*Npages(R1) Lipyeow Lim -- University of Hawaii at Manoa

Index Nested Loop Join S1 Index(R1.sid) R1 For each data page PS1 of S1 For each tuple s in PS1 if (s.sid  Index(R1.sid)) then fetch r & output <s,r> • Worst case number of local disk reads with tree index = Npages(S1) + |S1|*( 1 + logFNpages(R1)) • Worst case number of local disk reads with hash index = Npages(S1) + |S1|* 2 Lipyeow Lim -- University of Hawaii at Manoa

Sort Merge Join S1 R1 • Sort S1 on SID • Sort R1 on SID • Compute join on SID using Merging algorithm • If join attributes are relatively unique, the number of disk pages = Npages(S1) log Npages(S1) + Npages(R1) log Npages(R1) + Npages(S1) + Npages(R1) • If the number of duplicates in the join attributes is large, the number of disk pages approaches that of nested loop join. Lipyeow Lim -- University of Hawaii at Manoa

Distributed Joins • Consider: • Reserves join Sailors • Depends on: • Which node get the query • Whether tables are fragmented/partitioned or not • Node 1 gets query • Perform join at Node 3 (or 4) ship results to Node 1 ? • Ship tables to Node 1 ? • Node 3 gets query • Fetch sailors in loop ? • Cache sailors locally ? • Node 1 • Node 2 • Node 3 • Node 4 Network Boats1 Boats2 Reserves Sailors Lipyeow Lim -- University of Hawaii at Manoa

Distributed Joins over Fragments R join S = R.sid=S.sid (R  S) = R.sid=S.sid ((R1R2)  (S1 S2)) = R.sid=S.sid ((R1  S1)  (R1 S2)  (R2  S1)  (R2 S2)) = R.sid=S.sid (R1  S1)  R.sid=S.sid(R1 S2)  R.sid=S.sid (R2  S1)  R.sid=S.sid(R2 S2) = (R1 join S1)  (R1 join S2)  (R2 join S1)  (R2 join S2) • Node 1 • Node 2 • Node 3 • Node 4 Network This equivalence applies to splitting a relation into pages in a single server DBMS system too! Reserves1 Reserves2 Sailors1 Sailors2 Equivalent to a union of joins over each pair of fragments Lipyeow Lim -- University of Hawaii at Manoa

Distributed Nested Loop • Consider performing R1 join S2 on Node 1 • Page-oriented nested loop join: For each page r of R1 Fetch r from local disk For each page s of S2 Fetch s if scache Output r join s • Cost = Npages(R1)* td + Npages(R1)*Npages(S2)*(td + ts) • If cache can hold entire S2, cost is Npages(R1)* td+Npages(S2)* ts +Npages(R1)*Npages(S2)*td • Node 1 • Node 2 Network foreach R1 page r R1 S2 Fetch S2 page s r join s Lipyeow Lim -- University of Hawaii at Manoa

Semijoins • Consider performing R1 join S2 on Node 1 • S2 needs to be shipped to R1 • Does every tuple in S2 join with R1 ? • Semijoin: • Don’t ship all of S2 • Ship only those S2 rows that will join with R1 • Assumes that the join causes a reduction in S2! • Cost = Npages(R1)*td + Npages(πsidR1)*ts + Cost() + Npages(sidjsidS2)*ts + Cost(R1 join sidjsidS2) • Node 1 • Node 2 Network πsidR1 (jsid, πsidR1 πsidS2) R1 S2 sidjsidS2 R1 join sidjsidS2 Lipyeow Lim -- University of Hawaii at Manoa

Bloomjoins • Consider performing R1 join S2 on Node 1 • Can we do better than semijoin ? • Bloomjoin: • Don’t ship all of (πsidR1) • Node 1: Ship a “bloom filter” (like a signature) of (πsidR1) • Hash each sid • Set the bit for hash value in a bit vector • Send the bit vector v1 • Node 2: • Hash each (πsidS2) to bit vector v2 • Computer (v1  v2) • Send rows of S2 in the intersection • False positives • Node 1 • Node 2 Network v1=Bloom (πsidR1) v2=Bloom(πsidS2) R1 S2 jsid=v1v2 sidjsidS2 R1 join sidjsidS2 Lipyeow Lim -- University of Hawaii at Manoa

ICS 421 Spring 2010 Query Evaluation ( i )