1 / 40

Modern Performance - SQL Server

Modern Performance - SQL Server. Joe Chang www.qdpma.com & SolidQ. About Joe. SQL Server consultant since 1999 Query Optimizer execution plan cost formulas (2002) True cost structure of SQL plan operations (2003?) Database with distribution statistics only, no data 2004

kedem
Download Presentation

Modern Performance - SQL Server

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Modern Performance - SQL Server Joe Chang www.qdpma.com & SolidQ

  2. About Joe • SQL Server consultant since 1999 • Query Optimizer execution plan cost formulas (2002) • True cost structure of SQL plan operations (2003?) • Database with distribution statistics only, no data 2004 • Decoding statblob/stats_stream • writing your own statistics • Disk IO cost structure • Tools for system monitoring, execution plan analysis See ExecStats http://www.qdpma.com/ExecStats/SQLExecStats.html Download: http://www.qdpma.com/ExecStatsZip.html Blog: http://sqlblog.com/blogs/joe_chang/default.aspx

  3. Overview • Why performance is still important today? • Brute force? • Yes, but … • Special Topics • Automating data collections • SQL Server Engine • What developers/DBA need to know?

  4. CPU & Memory 2001 and 2012 P P L2 PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E FSB PCI-E PCI-E MCH QPI QPI QPI QPI PCI-E PCI-E PCI-E PCI-E P P C3 C3 C3 C3 LLC LLC LLC LLC C4 C4 C4 C4 QPI C2 C2 C2 C2 C5 C5 C5 C5 2001 – 4 sockets, 4 cores Pentium III Xeon, 900MHz 4-8GB memory? Xeon MP 2002-4 C1 C1 C1 C1 C6 C6 C6 C6 C0 C0 C0 C0 C7 C7 C7 C7 MI MI MI MI MI MI MI MI QPI QPI DMI 2 PCI-E PCI-E PCI-E PCI-E PCI-E QPI 2012 – 4 sockets, 8 cores each 4 x 8 = 32 cores total 768GB (48 x 16GB), Westmere-EX 1TB • 15 cores in next generation? Each core today is more than 10x over PIII PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E 16GB $191 32GB $794

  5. Storage 2001 versus 2012/13 QPI 192 GB 192 GB MCH SSD SSD SSD SSD QPI PCI PCI PCI PCI PCIe x4 PCIe x8 PCIe x8 PCIe x8 PCIe x8 PCIe x8 HDD HDD HDD HDD RAID RAID RAID RAID 10GbE IB RAID RAID RAID RAID HDD HDD HDD HDD HDD HDD HDD HDD 2001 100 x 10K HDD 125 IOPS each = 12.5K IOPS IO Bandwidth limited: 1.3GB/s (1/3 memory bandwidth) 2013 64 SSDs, >10K+ IOPS each, 1M IOPS possible IO Bandwidth 10GB/s easy SAN vendors – questionable BW http://www.qdpma.com/Storage/Storage2013.html http://www.qdpma.com/ppt/Storage_2013.pptx

  6. SAN Node 1 Node 2 Node 1 Node 2 768 GB 768 GB 768 GB 768 GB 8 Gbps FC or 10Gbps FCOE x8 x8 x8 x8 x8 x8 SSD SSD Switch Switch 8 Gb FC Switch Switch SP A SP B 24 GB 24 GB SP A SP B x4 SAS 2GB/s 24 GB 24 GB x4 SAS 2GB/s Main Volume Data 5 Data 16 Data 12 Data 15 Data 6 Data 7 Data 13 Data 9 Data 8 Data 11 Data 3 Data 4 Log 4 Log 3 Log 1 Log 2 Data 14 Data 1 Data 2 Data 10 Log volume Hot Spares SSD 10K 7.2K SSD 1 SSD 2 SSD 3 SSD 4 http://sqlblog.com/blogs/joe_chang/archive/2013/05/10/enterprise-storage-systems-emc-vmax.aspx http://sqlblog.com/blogs/joe_chang/archive/2013/02/25/emc-vnx2-and-vnx-future.aspx

  7. Performance Past, Present, Future • When will servers be so powerful that … • Been saying this for a long time • Today – 10 to 100X overkill • 32-cores, 60-cores later in 2013? • Enough memory that IO is only sporadic • Unlimited IOPS with SSD • What can go wrong? Today’s topic

  8. Factors to Consider SQL Tables Indexes Compile Parameters Statistics Query Optimizer Storage Engine Hardware

  9. Special Topics • Data type mistmatch • Multiple Optional Search Arguments (SARG) • Function on SARG • Parameter Sniffing versus Variables • Statistics related (big topic) • first OR, then AND/OR combinations • Complex Query with sub-expressions • Parallel Execution Not in order of priority http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx

  10. 1a. Data type mismatch DECLARE@namenvarchar(25) = N'Customer#000002760' SELECT * FROM CUSTOMER WHERE C_NAME = @name SELECT* FROM CUSTOMER WHERE C_NAME = CONVERT(varchar, @name) .NET auto-parameter discovery? Unable to use index seek

  11. 1b. Type Mismatch – Row Estimate SELECT* FROM CUSTOMER WHERE C_NAME LIKEN'Customer#00000276%' SELECT* FROM CUSTOMER WHERE C_NAME LIKE'Customer#00000276%' Row estimate error could have severe consequences in a complex query

  12. SELECT TOP plus Row Estimate Error SELECT TOP 1000[Document].[ArtifactID]FROM[Document](NOLOCK)WHERE[Document].[AccessControlListID_D]IN(1,1000064,1000269)AND EXISTS (SELECT[DocumentBatch].[BatchArtifactID]FROM[DocumentBatch](NOLOCK)INNER JOIN[Batch](NOLOCK)ON[Batch].ArtifactID=[DocumentBatch].[BatchArtifactID]WHERE [DocumentBatch].[DocumentArtifactID]=[Document].[ArtifactID]AND[Batch].[Name]LIKEN'%Value%')ORDER BY[Document].[ArtifactID] Data type mismatch – results in estimate rows high Top clause – easy to find first 1000 rows In fact, there are few rows that match SARG Wrong plan for evaluating large number of rows http://www.qdpma.com/CBO/Relativity.html

  13. 2. Multiple Optional SARG DECLARE@Orderkeyint, @Partkeyint = 1 SELECT* FROMLINEITEM WHERE (@Orderkey IS NULL OR L_ORDERKEY = @Orderkey) AND (@PartkeyIS NULL OR L_PARTKEY = @Partkey) AND (@Partkey IS NOT NULL OR @Orderkey IS NOT NULL)

  14. Dynamically Built Parameterized SQL DECLARE@Orderkeyint,@Partkeyint=1 ,@SQLnvarchar(500),@Paramnvarchar(100) SELECT@SQL= N‘/* Comment */ SELECT * FROM LINEITEM WHERE 1=1‘ ,@Param=N'@Orderkey int, @Partkey int' IF (@OrderkeyISNOTNULL) SELECT@SQL=@SQL+N' AND L_ORDERKEY = @Orderkey' IF (@PartkeyISNOTNULL) SELECT@SQL=@SQL+N' AND L_PARTKEY = @Partkey' PRINT@SQL execsp_executesql@SQL,@Param,@Orderkey,@Partkey IF block is easier for few options Dynamically built parameterized SQL better for many options Considering /*comment*/ to help identify this

  15. IF block DECLARE@Orderkeyint, @Partkeyint = 1 IF (@OrderkeyISNOTNULL) SELECT* FROMLINEITEM WHERE (L_ORDERKEY = @Orderkey) AND(@PartkeyIS NULL OR L_PARTKEY = @Partkey) ELSE IF (@PartkeyISNOTNULL) SELECT * FROM LINEITEM WHERE(L_PARTKEY = @Partkey) This is actually the stored procedure parameters Need to consider impact of Parameter Sniffing, Consider the OPTIMIZER FOR hint

  16. 2b. Function on column SARG SELECTCOUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM WHEREYEAR(L_SHIPDATE) = 1995 ANDMONTH(L_SHIPDATE) = 1 SELECTCOUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM WHEREL_SHIPDATE BETWEEN'1995-01-01'AND'1995-01-31' DECLARE@Startdate date, @Daysint = 1 SELECTCOUNT(*), SUM(L_EXTENDEDPRICE) FROMLINEITEM WHEREL_SHIPDATE BETWEEN@Startdate ANDDATEADD(dd,1,@Startdate)

  17. Estimated versus Actual Plan - rows Estimated Plan – 1 row??? Actual Plan – actual rows 77,356

  18. 3 Parameter Sniffing -- first call, procedure compiles with these parameters execp_Report @startdate = '2011-01-01', @enddate = '2011-12-31' -- subsequent calls, procedure executes with original plan execp_Report @startdate = '2012-01-01', @enddate = '2012-01-07' Assuming date data type Need different execution plans for narrow and wide range Options: 1) WITH RECOMPILE 2) main procedure calls 1 of 2 identical sub-procedures One sub-procedure is only called for narrow range Other called for wide range Skewed data distributions also important Example: Large & small customers

  19. 4 Statistics • Auto-recompute points • Sampling strategy • Percentage • Random pages versus random rows • Histogram Equal and Range Rows • Out of bounds, value does not exist • etc Statistics Used by the Query Optimizer in SQL Server 2008 Writer: Eric N. Hanson and YavorAngelov Contributor: LuborKollar http://msdn.microsoft.com/en-us/library/dd535534.aspx

  20. Statistics Structure • Stored (mostly) in binary field Scalar values Density Vector – limit 30, half in NC, half Cluster key Histogram Up to 200 steps Consider not blindly using IDENTITY on critical tables Example: Large customers get low ID values Small customers get high ID values http://sqlblog.com/blogs/joe_chang/archive/2012/05/05/decoding-stats-stream.aspx

  21. Statistics Auto/Re-Compute • Automatically generated on query compile • Recompute at 6 rows, 500, every 20%? Has this changed?

  22. Statistics Sampling • Sampling theory • True random sample • Sample error - square root N • Relative error 1/ N • SQL Server sampling • All rows in random pages

  23. Row Estimate Problems • Skewed data distribution • Out of bounds • Value does not exist

  24. Loop Join - Table Scan on Inner Source Estimated out from first 2 tabes (at right) is zero or 1 rows. Most efficient join to third table (without index on join column) is a loop join with scan. If row count is 2 or more, then a fullscan is performed for each row from outer source Default statistics rules may lead to serious ETL issues Consider custom strategy

  25. Compile Parameter Not Exists Main procedure has cursor around view_Servers First server in view_Servers is ’CAESIUM’ Cursor executes sub-procedure for each Server sql: SELECTMAX(ID) FROM TReplWS WHERE Hostname = @ServerName But CAESIUM does not exist in TReplWS!

  26. Good and Bad Plan?

  27. SqlPlan Compile Parameters

  28. SqlPlan Compile Parameters <?xmlversion="1.0"encoding="utf-8"?> <ShowPlanXMLxmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan"Version="1.1"Build="10.50.2500.0"> <BatchSequence> <Batch> <Statements> <StmtSimpleStatementText="@ServerName varchar(50) SELECT @maxid = ISNULL(MAX(id),0) FROM TReplWS WHERE Hostname = @ServerName" StatementId="1"StatementCompId="43"StatementType="SELECT"StatementSubTreeCost="0.0032843"StatementEstRows="1" StatementOptmLevel="FULL"QueryHash="0x671D2B3E17E538F1"QueryPlanHash="0xEB64FB22C47E1CF2" StatementOptmEarlyAbortReason="GoodEnoughPlanFound"> <StatementSetOptionsQUOTED_IDENTIFIER="true"ARITHABORT="false"CONCAT_NULL_YIELDS_NULL="true"ANSI_NULLS="true" ANSI_PADDING="true"ANSI_WARNINGS="true"NUMERIC_ROUNDABORT="false" /> <QueryPlanCachedPlanSize="16"CompileTime="1"CompileCPU="1"CompileMemory="168"> <RelOpNodeId="0"PhysicalOp="Compute Scalar"LogicalOp="Compute Scalar" EstimateRows="1"EstimateIO="0"EstimateCPU="1e-007“ AvgRowSize="15"EstimatedTotalSubtreeCost="0.0032843"Parallel="0"EstimateRebinds="0"EstimateRewinds="0"> </RelOp> <ParameterList> <ColumnReferenceColumn="@ServerName"ParameterCompiledValue="'CAESIUM'" /> </ParameterList> </QueryPlan> </StmtSimple> </Statements> </Batch> </BatchSequence> </ShowPlanXML> Compile parameter values at bottom of sqlplan file

  29. 5a Single Table OR -- Single table SELECT*FROMLINEITEM WHEREL_ORDERKEY= 1 ORL_PARTKEY=184826

  30. 5a Join 2 Tables, OR in SARG -- subsequent calls, procedure executes with original plan SELECTO_ORDERDATE,O_ORDERKEY,L_SHIPDATE,L_QUANTITY FROMLINEITEM INNERJOINORDERSONO_ORDERKEY=L_ORDERKEY WHEREL_PARTKEY=184826 ORO_CUSTKEY= 137099

  31. 5a UNION instead of OR SELECTO_ORDERDATE,O_ORDERKEY,L_SHIPDATE,L_QUANTITY,O_CUSTKEY,L_PARTKEY FROMLINEITEMINNERJOINORDERSONO_ORDERKEY=L_ORDERKEY WHEREL_PARTKEY= 184826 UNION (ALL) SELECTO_ORDERDATE,O_ORDERKEY,L_SHIPDATE,L_QUANTITY,O_CUSTKEY,L_PARTKEY FROMLINEITEMINNERJOINORDERSONO_ORDERKEY=L_ORDERKEY WHEREO_CUSTKEY= 137099 -- AND (L_PARTKEY<> 184826 OR L_PARTKEY IS NULL) -- Caution: select list should have keys to ensure correct rows UNION removes duplicates UNION ALL does not -- Hugo Kornelis trick --

  32. 5b AND/OR Combinations SELECT xx FROM A WHERE col1 IN (expr1) AND col2 NOT IN (expr2) SELECT xx FROM A WHERE (expr1) AND (expr2 OR expr3) • Hash Join is good method to process many rows • Requirement is equality join condition • In complex SQL with AND/OR or IN NOT IN combinations • Query optimizer may not be to determine that equality join condition exists • Execution plan will use loop join, • and attempt to force hash join will be rejected • Re-write using UNION in place of OR • And LEFT JOIN in place of NOT IN More on AND/OR combinations: http://www.qdpma.com/CBO/Relativity3.html

  33. Complex Query with Sub-expression • Query complexity – really high compile cost • Repeating sub-expressions (including CTE) • Must be evaluated multiple times • Main Problem - Row estimate error propagation • Solution/Strategy – Get a good execution plan • Temp table when estimate is high, actual is low. When Estimate is low, and actual rows is high, need to balance temp table insert overhead versus plan benefit. Would a join hint work? More on AND/OR combinations: http://www.qdpma.com/CBO/Relativity4.html http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx

  34. Temp Table and Table Variable • Forget what other people have said • Most is Cr@p • Temp Tables – subject to statistics auto/re-compile • Table variable – no statistics, assumes 1 row • Question: In each specific case: does the statistics and recompile help or not? • Yes: temp table • No: table variable

  35. Parallelism • Designed for 1998 era • Cost Threshold for Parallelism: default 5 • Max Degree of Parallelism – instance level • OPTION (MAXDOP n) – query level • Today – complex system – 32 cores • Plan cost 5 query might run in 10ms? • Some queries at DOP 4 • Others at DOP 16? Really need to rethink parallelism / NUMA strategies More on Parallelism: http://www.qdpma.com/CBO/ParallelismComments.html http://www.qdpma.com/CBO/ParallelismOnset.html

  36. Full-Text Search Loop Join with FT as inner Source Full Text search Potentially executed many times

  37. varchar(max) stored in lob pages • Disk IO to lob pages is synchronous? • Must access row to get 16 byte link? • Feature request: index pointer to lob

  38. Summary • Hardware today is really powerful • Storage may not be – SAN vendor disconnect • Standard performance practice • Top resource consumers, index usage • But also Look for serious blunders http://www.qdpma.com/CBO/SQLServerCostBasedOptimizer.html http://www.qdpma.com/CBO/Relativity.html http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx

  39. Special Topics • Data type mismatch • Multiple Optional Search Arguments (SARG) • Function on SARG • Parameter Sniffing versus Variables • Statistics related (big topic) • AND/OR • Complex Query with sub-expressions • Parallel Execution

More Related