80 likes | 154 Views
Explore query optimization strategies in relational algebra, analyzing implementation choices and order of operators to create efficient query plans. Learn about various join methods and their costs, such as Nested Loops, Hash Join, and Sort/Merge Join. Gain insights into cost estimation, reduction factors, and algorithm selection for operators.
E N D
CS186 Final Review Query Optimization
sname rating > 5 bid=100 sid=sid Sailors Reserves Overview • Query Relational Algebra Tree • Implementation choices at each operator • Order of operators • What is a query plan? • Tree of R.A. ops (and some others) with choice of algorithm for each op. ?
Join • Joins (Chapter 14.4) • (Page-oriented) Nested loops join • Blocked nested loops: • simple, exploits buffer pages • Indexed nested loops: • best if 1 relation small and one indexed • Sort/Merge Join • sort cost + merge cost (M+N M*N) • Hash Join • partition cost (2(M+N)) + match cost (M+N)
(On-the-fly) sname (On-the-fly) (Page-Oriented Nested loops) sname sid=sid (Scan & Write to temp T2) (On-the-fly) rating > 5 bid=100 bid=100 rating>5 (On-the-fly) (Page-Oriented Nested loops) Sailors Reserves sid=sid Reserves Sailors Query Tree • Rating from 1 to 10 • 100 boats • |S| = 500, |R|=1000 SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5
Query Tree • Left-deep plans • Fully pipelined plans: Intermediate results not written to temporary files. • All left-deep trees are fully pipelined? (SM join?) • Relational Algebra Equivalences • Chapter 15.3 • Cost estimation • Reduction Factors (also called “selectivity”) • values are uniformly distributed • histogram
(On-the-fly) sname (Sort-Merge Join) sid=sid (Scan; (Scan; write to write to rating > 5 bid=100 temp T2) temp T1) (On-the-fly) sname Reserves Sailors (Page-Oriented Nested loops) sid=sid (Scan & Write to temp T2) rating > 5 bid=100 (On-the-fly) Reserves Sailors Choice of Algorithm for Operator • Rating from 1 to 10, 100 boats • |S| = 500, |R|=1000 4250 IOs 3560 IOs
Choice of Algorithm for Operator • What affect choices? • # Buffer Pages • Statistic of the relation • Index? • Clustered • Unclustered: access each tuple 1 I/O
Conclusion • What you should know? • Given a SQL query, how to give the best (good) query plan? • Best plan for accessing each relation • Enumeration to get trees (left-deep join) • Select algorithm for each operator • Cost estimation • # of Buffer? • Index