1 / 17

GSLPI: a Cost-based Query Progress Indicator

GSLPI: a Cost-based Query Progress Indicator. Jiexing Li # , Rimma Nehme * , Jeff Naughton # # University of Wisconsin-Madison * Microsoft Jim Gray Systems Lab. Progress indicator (PI).

whitby
Download Presentation

GSLPI: a Cost-based Query Progress Indicator

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. GSLPI: a Cost-based Query Progress Indicator Jiexing Li#, RimmaNehme*, Jeff Naughton# #University of Wisconsin-Madison *Microsoft Jim Gray Systems Lab

  2. Progress indicator (PI) • A PI provides feedback to users on how much of the task has been completedor when the task will finish. • It is useful for workload management, admission control, skew handling, etc.

  3. Main results • Existing PIs generate inaccurate results, if a query has phases with different performance characteristics. • A new PI that provides more accurate predictions by using • optimizer’s cost estimates, and • deeper analysis of query pipelines.

  4. Execution plan • Blocking operator: no outputs until the operator has consumed at least one of its inputs (e.g., sort and hash match) • Pipeline: a subtree of concurrently executing operators delimited by blocking operators Hash Match Hash Match Pipeline: P2 Filter Filter Table Scan [B] Table Scan [B] Pipeline: P1 Table Scan [A] Table Scan [A] An execution plan

  5. Basic ideas of previous PIs • Pipelines are processed one after another. • Each of them processes a certain number of tuples. • Speed: how fast tuples can be processed. • Estimated cardinalities • Current speed (future speed is unknown) Time P1 Pn P2 … Cardinality N1 Cardinality Nn Cardinality N2 Estimated remaining time

  6. Limitations of previous PIs • select count(distinct orderkey) from lineitem Compute Scalar P2 Hash Match (orderkey) Table Scan [lineitem] P1 P1 P2 Execution plan Test results for TPC-H 10GB data

  7. Limitations of Previous PIs (cont.) • select count(distinct orderkey) from lineitem • Before P2 starts, only the processing speed for P1 is known. • Over estimates the execution time for P2 by using the current speed of P1 for future pipeline P2. P1 P2

  8. GSLPI improvement • Different phases have different processing speeds. • How does GSLPI estimate the processing speeds for future pipelines? P2 P1 Cardinality N1 Cardinality N2 Speed S1 Speed S2 ?

  9. GSLPI: a cost-based progress indicator P1 Pn P2 … Cardinality N1 Cardinality Nn Cardinality N2 Further division (uniform speed) P1 P2 Pn P’2 … N’2 N2 N1 Nn ? Cost: work to process a tuple Cost C1 Cost Cn Cost C’2 Cost C2 ? Speed: adjusts based on the cost Speed S1 Speed S2 Speed S’2 Speed Sn Estimates the remaining time Estimated T1 Estimated T2 Estimated T’2 Estimated Tn

  10. GSLPI: Cost • Redistributes the cost for operators on the boundaries. • P1 scans a tuple and inserts it into the hash table. • P2 fetches a tuple from the hash table and counts. • Splits the cost for Hash Match operator. • Identifies bottleneck cost. • Due to concurrent execution. select count(distinct orderkey) from lineitem Estimated [CPU, IO] Compute Scalar [0, 0] P2 Hash Match (orderkey) [275, 0] Table Scan [lineitem] P1 [66, 767] Execution plan

  11. GSLPI: Speed • select count(distinct orderkey) from lineitem • P2 should be able to process its tuples much faster than P1. • Adjusts the speeds for future pipelines based on how difficult it is to process a tuple. Compute Scalar P2 Hash Match (orderkey) Table Scan [lineitem] P1 Execution plan

  12. Experimental evaluation • Database: 10GB TPC-H. • Setup: Intel Core 2 Duo CPU and 4GB RAM. • Tested on 22 TPC-H queries. • They usually contain 6 to 16 operators, which may be divided into 3 to 9 different pipelines. • Estimated the remaining time in every 10 seconds.

  13. Example of progress estimation P5 P4 P6, P7 P1, P2, P3 Experiment results for TPC-H Q5

  14. Error metric • Similar to Q5, 20 out of 22 TPC-H queries process tuples at different speeds during their executions. • Estimated percentage of completion: • Error: the distance between fi and the actual percentage of completion. • For each query, we calculated the average error and the maximum error. fi

  15. GSLPI accuracy Due to cardinality estimation error!

  16. Conclusions • Investigated the limitations with previous PIs. • Developed a new progress indicator: GSLPI. • Evaluated their performance. • More accurate than previous PIs. • Remaining progress estimation error for TPC-H queries is mostly due to cardinality estimation error!

  17. Thank you!

More Related