GSLPI: a Cost-based Query Progress Indicator

GSLPI: a Cost-based Query Progress Indicator Jiexing Li#, RimmaNehme*, Jeff Naughton# #University of Wisconsin-Madison *Microsoft Jim Gray Systems Lab

Progress indicator (PI) • A PI provides feedback to users on how much of the task has been completedor when the task will finish. • It is useful for workload management, admission control, skew handling, etc.

Main results • Existing PIs generate inaccurate results, if a query has phases with different performance characteristics. • A new PI that provides more accurate predictions by using • optimizer’s cost estimates, and • deeper analysis of query pipelines.

Execution plan • Blocking operator: no outputs until the operator has consumed at least one of its inputs (e.g., sort and hash match) • Pipeline: a subtree of concurrently executing operators delimited by blocking operators Hash Match Hash Match Pipeline: P2 Filter Filter Table Scan [B] Table Scan [B] Pipeline: P1 Table Scan [A] Table Scan [A] An execution plan

Basic ideas of previous PIs • Pipelines are processed one after another. • Each of them processes a certain number of tuples. • Speed: how fast tuples can be processed. • Estimated cardinalities • Current speed (future speed is unknown) Time P1 Pn P2 … Cardinality N1 Cardinality Nn Cardinality N2 Estimated remaining time

Limitations of previous PIs • select count(distinct orderkey) from lineitem Compute Scalar P2 Hash Match (orderkey) Table Scan [lineitem] P1 P1 P2 Execution plan Test results for TPC-H 10GB data

Limitations of Previous PIs (cont.) • select count(distinct orderkey) from lineitem • Before P2 starts, only the processing speed for P1 is known. • Over estimates the execution time for P2 by using the current speed of P1 for future pipeline P2. P1 P2

GSLPI improvement • Different phases have different processing speeds. • How does GSLPI estimate the processing speeds for future pipelines? P2 P1 Cardinality N1 Cardinality N2 Speed S1 Speed S2 ?

GSLPI: a cost-based progress indicator P1 Pn P2 … Cardinality N1 Cardinality Nn Cardinality N2 Further division (uniform speed) P1 P2 Pn P’2 … N’2 N2 N1 Nn ? Cost: work to process a tuple Cost C1 Cost Cn Cost C’2 Cost C2 ? Speed: adjusts based on the cost Speed S1 Speed S2 Speed S’2 Speed Sn Estimates the remaining time Estimated T1 Estimated T2 Estimated T’2 Estimated Tn

GSLPI: Cost • Redistributes the cost for operators on the boundaries. • P1 scans a tuple and inserts it into the hash table. • P2 fetches a tuple from the hash table and counts. • Splits the cost for Hash Match operator. • Identifies bottleneck cost. • Due to concurrent execution. select count(distinct orderkey) from lineitem Estimated [CPU, IO] Compute Scalar [0, 0] P2 Hash Match (orderkey) [275, 0] Table Scan [lineitem] P1 [66, 767] Execution plan

GSLPI: Speed • select count(distinct orderkey) from lineitem • P2 should be able to process its tuples much faster than P1. • Adjusts the speeds for future pipelines based on how difficult it is to process a tuple. Compute Scalar P2 Hash Match (orderkey) Table Scan [lineitem] P1 Execution plan

Experimental evaluation • Database: 10GB TPC-H. • Setup: Intel Core 2 Duo CPU and 4GB RAM. • Tested on 22 TPC-H queries. • They usually contain 6 to 16 operators, which may be divided into 3 to 9 different pipelines. • Estimated the remaining time in every 10 seconds.

Example of progress estimation P5 P4 P6, P7 P1, P2, P3 Experiment results for TPC-H Q5

Error metric • Similar to Q5, 20 out of 22 TPC-H queries process tuples at different speeds during their executions. • Estimated percentage of completion: • Error: the distance between fi and the actual percentage of completion. • For each query, we calculated the average error and the maximum error. fi

GSLPI accuracy Due to cardinality estimation error!

Conclusions • Investigated the limitations with previous PIs. • Developed a new progress indicator: GSLPI. • Evaluated their performance. • More accurate than previous PIs. • Remaining progress estimation error for TPC-H queries is mostly due to cardinality estimation error!

Thank you!

GSLPI: a Cost-based Query Progress Indicator

GSLPI: a Cost-based Query Progress Indicator

Presentation Transcript

Query Processing

Cost-Based Query Transformation in Oracle

Adaptive Query Processing: Progress and Challenges

Multi-Query Optimization

Personalized Query Expansion for the Web

Query Suggestion

Query Optimization, part 2

Plan Space Analysis Detecting Plan Regressions in Cost-based Query Optimizers

CS411 Database Systems

Query Optimization

Cost-based Query Scrambling for Initial Delays

Understanding Temporal Intent of User Query based on Time-based Query Classification

Data -intensive Computing Systems Query Optimization (Cost-based optimization)

Lecture 8 : SQL-Based Database Queries

Query Processing

Lecture 22: Query Execution

Chapter 13: Query Processing

Progress on coatings and grooves M.Taborelli Progress on C based coatings

Completing the Physical-Query-Plan

Status of progress of TDI indicator

CS 245: Database System Principles Notes 7: Query Optimization

CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization)