Parallel Execution Plans. Joe Chang [email protected] www.qdpma.com. About Joe Chang. SQL Server Execution Plan Cost Model True cost structure by system architecture Decoding statblob (distribution statistics) SQL Clone – statistics-only database Tools
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
Yes, this can happen, how will you know
No I have not been smoking pot
How much in CPU do I pay for this?
Great management tool, what else?
Reference: Adam Machanic PASS
I/O and CPU Cost components
Estimated Execution Plan
Cost is duration in seconds on some reference platform
IO Cost for scan: 1 = 10,800KB/s, 810 implies 8,748,000KB
IO in Nested Loops Join: 1 = 320/s, multiple of 0.003125
Actual CPU Time (Data in memory)
LU 1919 1919
Scan 8736 8727
(926.67- 323655 * 0.0001581) / 0.003125 = 280160 (86.6%)
True cross-over approx 1,400,000 rows
1 row : page
1,093,729 pages/1350 = 810.17 (8,748MB)
Actual CPU Time
LU 2138 321
Scan 18622 658
(817- 280326 * 0.0001581) / 0.003125 = 247259 (88%)
8748000KB/8/1350 = 810
Note Actual Number
of Rows, Rebinds,
For Loop Join inner source and Key Lookup,
Actual Num Rows = Num of Exec × Num of Rows
Note: gold circle with double arrow, and parallelism operations
IO Cost same
CPU reduce by degree of parallelism, except no reduction for DOP 16
IO contributes most of cost!
CPU cost only reduces
On a weak storage system, a single thread can saturate the IO channel,
Additional threads will not increase IO (reduce IO duration).
A very powerful storage system can provide IO proportional to the number of threads. It might be nice if this was optimizer option?
The IO component can be a very large portion of the overall plan cost
Not reducing IO cost in parallel plan may inhibit generating favorable plan,
i.e., not sufficient to offset the contribution from the Parallelism operations.
A parallel execution plan is more likely on larger systems (-P to fake it?)
Search: Understanding Hash Joins
For In-memory, Grace, Recursive
CPU Cost is linear with number of rows, outer and inner source
See BOL on Hash Joins for In-Memory, Grace, Recursive
IO Cost is zero for small intermediate data size,
beyond set point proportional to server memory(?)
IO is proportional to excess data (beyond in-memory limit)
Parallel Plan: Memory allocation is per thread!
Summary: Hash Join plan cost depends on memory if IO component is not zero, in which case is disproportionately lower with parallel plans. Does not reflect real cost?
BOL: Optimizing Data Warehouse Query Performance Through Bitmap Filtering
A bitmap filter uses a compact representation of a set of values from a table in one part of the operator tree to filter rows from a second table in another part of the tree. Essentially, the filter performs a semi-join reduction; that is, only the rows in the second table that qualify for the join to the first table are processed.
SQL Server uses the Bitmap operator to implement bitmap filtering in parallel query plans. Bitmap filtering speeds up query execution by eliminating rows with key values that cannot produce any join records before passing rows through another operator such as the Parallelism operator. A bitmap filter uses a compact representation of a set of values from a table in one part of the operator tree to filter rows from a second table in another part of the tree. By removing unnecessary rows early in the query, subsequent operators have fewer rows to work with, and the overall performance of the query improves. The optimizer determines when a bitmap is selective enough to be useful and in which operators to apply the filter. For more information, see Optimizing Data Warehouse Query Performance Through Bitmap Filtering.
1) Split large chunk of work among threads,
2) Each thread works independently,
3) Small amount of coordination to consolidate threads
1) Split large chunk of work among threads,
2) Each thread works on first stage
3) Large coordination effort between threads
4) More work
No Repartition Streams
No Repartition Streams operations!
8-way Quad-Core Opteron
Windows Server 2008 R2
SQL Server 2008 SP1 + HF 27
CPU-sec to SUM 1 or 2 columns in Line Item
Speed up relative to DOP 1
8-way systems were configured for AD- not good!