Designing and Tuning High Speed Data Loading

  1. Designing and Tuning High Speed Data Loading Thomas Kejser Senior Program Manager tkejser@microsoft.com

  2. Agenda • Tuning Methodology • Bulk Load API Basics • Design Pattern and Techniques • Parallelism • Table Layout • Tuning the SQL Server Engine • Tuning the Network Stack • Tuning Integration Services

  3. Tuning ETL and ELT Tuning Methodology

  4. The Tuning Loop • Get a baseline • Make one small change at a time • Agree on targets for optimization • Actual runtime • CPU, Memory, I/O • The greedy tuner: • “Tune it till it breaks, then fix it, so you can break it again”

  5. Tools of the Trade - Windows • Perfmon • Logical Disk • Memory • Processor • Process (specifically the DTEXEC process) • Network Interface • Task Manager • WinDbg • KernRate

  6. Tools of the Trade – SQL Server • sys.dm_os_wait_stats • All my tuning starts here • Get familiar with the common wait types • sys.dm_os_latch_stats • Allows a deep dive into LATCH_<X> waits • sys.dm_os_spinlock_stats • When too much CPU seems to be spent • sys.dm_io_virtual_file_stats • Because I/O systems are rarely perfect
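As a starting point for the wait-stats methodology above, a minimal query like the following surfaces the top waits since the last restart (or stats clear). The list of excluded idle waits is illustrative, not exhaustive:

```sql
-- Top waits accumulated instance-wide, excluding common benign/idle waits.
SELECT TOP (10)
    wait_type,
    wait_time_ms,
    signal_wait_time_ms,   -- time spent waiting for a scheduler after signal
    waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP',
                        N'SQLTRACE_BUFFER_FLUSH', N'BROKER_TASK_STOP')
ORDER BY wait_time_ms DESC;
```

The same pattern applies to sys.dm_os_latch_stats and sys.dm_os_spinlock_stats when a LATCH_<X> or CPU-spin hypothesis needs a deeper look.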

  7. Designing and Tuning High Speed Data Loading Bulk load API Basics

  8. Four ways to Load Data to SQL Server • Integration Services • OLEDB Destination • SQL Server Destinations • BULK INSERT • CSV or fixed width files • BCP • Like BULK INSERT, but can be run remotely • INSERT ... SELECT

  9. Minimally logged and Bulk • Bulk Load • Feeds a continuous stream of data into a table • As opposed to running singleton INSERT statements • Minimally logged • Only allocations are logged, not individual rows/pages • Key Takeaway: An operation can be a bulk load operation without being minimally logged

  10. To TABLOCK or not to TABLOCK • General rule (batch style): • Heaps: Use TABLOCK • Clustered indexes: Do NOT use TABLOCK • Minimally logged: • INSERT Heap WITH (TABLOCK) SELECT ... • If TF610 is on: • INSERT ClusteredIndex SELECT ... • The same rules apply to the OLEDB and SQL Server Destinations in SSIS
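The two rules above can be sketched as follows. Table names are hypothetical, and minimal logging additionally assumes the simple or bulk-logged recovery model:

```sql
-- Heap target: TABLOCK takes a BU-lock, enabling minimal logging
-- (and allowing several bulk streams into the same heap).
INSERT INTO dbo.SalesHeap WITH (TABLOCK)
SELECT SalesKey, Amount
FROM dbo.SalesStaging;

-- Clustered index target: no TABLOCK. With trace flag 610 enabled,
-- inserts into newly allocated pages can still be minimally logged.
DBCC TRACEON (610);
INSERT INTO dbo.SalesClustered
SELECT SalesKey, Amount
FROM dbo.SalesStaging;
```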

  11. Designing and Tuning High Speed Data Loading Design Patterns

  12. Integration Services or T-SQL • Sometimes: Matter or preference • Integration Services is graphical • Some users like this • Hard to make modular • SQL Server uses T-SQL ”text language” • Modular programming • The right tool for the right job • Learn both…

  13. SQL Server – Which load method? • BULK INSERT / BCP • Pros: Can take a BU-lock; no need for Linked Servers or OPENROWSET • Con: Only CSV and fixed-width files for input • INSERT ... SELECT • Pros: Can perform transformations; any OLEDB-enabled input • Cons: Takes X-locks on the table; Linked Servers or OPENROWSET needed
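A minimal BULK INSERT of the kind compared above might look like this; the file path, table name, and batch size are illustrative:

```sql
-- Load a CSV file into a heap. TABLOCK takes the BU-lock so several
-- of these statements can target the same heap in parallel.
BULK INSERT dbo.Sales
FROM 'C:\load\sales.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    TABLOCK,
    BATCHSIZE = 100000   -- commit in chunks to bound log/lock footprint
);
```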

  14. Integration Services – Which Destination? • SQL Server Destination • Pros: Fastest option; easy to configure • Con: Must run on the same box as SQL Server (shared memory connections) • OLEDB Destination • Pros: Can be used over TCP/IP; ETL servers can be scaled out remotely • Con: Typically slower than the SQL Server Destination

  15. Design Pattern: Parallel Load • Create a (priority) queue for your packages • A SQL table is good for this purpose • Packages / T-SQL include a loop: • The loop takes one item from the queue • Until the queue is empty… [Diagram: multiple DTEXEC workers, each running Get Task → Do Work → Loop, pulling packages P1…Pn from a shared priority queue]
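One way to sketch the worker loop in T-SQL. The queue table dbo.LoadQueue and the work procedure are hypothetical; the key detail is UPDLOCK + READPAST, which lets concurrent workers skip rows another worker has already claimed:

```sql
-- Each worker (DTEXEC package or T-SQL session) runs this loop,
-- dequeueing one task at a time until the queue is empty.
DECLARE @TaskId int;
WHILE 1 = 1
BEGIN
    BEGIN TRANSACTION;

    SELECT TOP (1) @TaskId = TaskId
    FROM dbo.LoadQueue WITH (UPDLOCK, READPAST)  -- skip locked rows
    ORDER BY Priority;

    IF @@ROWCOUNT = 0
    BEGIN
        COMMIT;   -- queue empty: worker exits
        BREAK;
    END

    DELETE dbo.LoadQueue WHERE TaskId = @TaskId;
    -- EXEC dbo.LoadPartition @TaskId;   -- hypothetical work item
    COMMIT;
END
```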

  16. Design Pattern: Table Hash Partitioning • Create filegroups to hold the partitions • Balance them equally over the LUNs using an optimal layout • Use the CREATE PARTITION FUNCTION command • Partition the table into #cores partitions • Use the CREATE PARTITION SCHEME command • Bind the partition function to the filegroups • Add a hash column to the table (tinyint – just one byte per row) • Calculate a good hash distribution • For example, use HASHBYTES with modulo, or BINARY_CHECKSUM [Diagram: rows hashed into partitions 0…255]
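The steps above might look like this for a hypothetical 16-core box (all names, column types, and the single-filegroup mapping are illustrative; in practice each partition would map to its own filegroup/LUN):

```sql
-- 15 boundary values => 16 partitions, one per core.
CREATE PARTITION FUNCTION pf_hash16 (tinyint)
AS RANGE LEFT FOR VALUES (0,1,2,3,4,5,6,7,8,9,10,11,12,13,14);

-- Bind the function to filegroups (here: all to PRIMARY for brevity).
CREATE PARTITION SCHEME ps_hash16
AS PARTITION pf_hash16 ALL TO ([PRIMARY]);

-- The hash column is a persisted computed column: one tinyint per row,
-- derived with a simple modulo over the key.
CREATE TABLE dbo.FactSales
(
    SalesKey bigint NOT NULL,
    Amount   money  NOT NULL,
    HashCol AS CAST(SalesKey % 16 AS tinyint) PERSISTED NOT NULL
) ON ps_hash16 (HashCol);
```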

  17. Design Pattern: Large Updates [Diagram: SWITCH the affected partition from Sales out to Sales_Old; BULK INSERT the changes into Sales_Delta; apply the updated records to build Sales_New; SWITCH Sales_New back into Sales]
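A sketch of the update-by-switch pattern from the diagram; partition number, table, and column names are all hypothetical:

```sql
-- 1. Switch the affected partition out to a staging table (metadata-only).
ALTER TABLE dbo.Sales SWITCH PARTITION 3 TO dbo.Sales_Old;

-- 2. Merge the old rows with the bulk-inserted delta into a new table.
INSERT INTO dbo.Sales_New WITH (TABLOCK)
SELECT s.SalesKey,
       COALESCE(d.Amount, s.Amount) AS Amount   -- take updated value if present
FROM dbo.Sales_Old  AS s
LEFT JOIN dbo.Sales_Delta AS d
       ON d.SalesKey = s.SalesKey;

-- 3. Switch the rebuilt data back in (metadata-only again).
ALTER TABLE dbo.Sales_New SWITCH TO dbo.Sales PARTITION 3;
```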

  18. Design Pattern: Large Deletes [Diagram: SWITCH the 2001 partition from Sales out to Sales_Temp; BULK INSERT the filtered rows to keep (2001, filtered) back; SWITCH the filtered partition back in, leaving the deleted rows behind in Sales_Temp]
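When the whole partition is to be deleted, the pattern degenerates to a switch-and-truncate. The partition function and table names below are hypothetical:

```sql
-- Switch the 2001 partition out (metadata-only) ...
ALTER TABLE dbo.Sales
    SWITCH PARTITION $PARTITION.pf_sales('20010101')
    TO dbo.Sales_Temp;

-- ... then discard it without row-by-row, fully logged deletes.
TRUNCATE TABLE dbo.Sales_Temp;
```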

  19. Designing and Tuning High Speed Data Loading Tuning the SQL Server Engine

  20. ALLOC_FREESPACE_CACHE - Heap limits • Measure: sys.dm_os_latch_stats • Long waits for ALLOC_FREESPACE_CACHE • SQL Server® Books Online: • “Used to synchronize the access to a cache of pages with available space for heaps and binary large objects (BLOBs). Contention on latches of this class can occur when multiple connections try to insert rows into a heap or BLOB at the same time. You can reduce this contention by partitioning the object.” • Hypothesis: More heaps = more speed

  21. PAGELATCH_UP – PFS contention • Measure: • sys.dm_os_wait_stats • Hypothesis generation: • An I/O problem? • What can we predict? • Fix: Add more files to the filegroup!
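The fix is a one-liner; path, names, and sizes below are illustrative. Each data file gets its own PFS pages, so spreading allocations over more files reduces PAGELATCH_UP contention on any single PFS page:

```sql
-- Add another data file to the contended filegroup.
ALTER DATABASE StageDB
ADD FILE
(
    NAME       = stage_data2,
    FILENAME   = 'E:\data\stage_data2.ndf',
    SIZE       = 10GB,
    FILEGROWTH = 1GB
)
TO FILEGROUP [PRIMARY];
```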

  22. RESOURCE_SEMAPHORE - Query memory usage • DW load queries will often be very memory intensive • By default, a single query can use at most 25% of SQL Server’s allocated memory • Queries waiting to get a memory grant will wait on RESOURCE_SEMAPHORE • Resource Governor (RG) can be used to work around this
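One hedged sketch of the Resource Governor workaround: lowering the per-query grant cap for the load workload group lets more bulk queries run concurrently instead of queueing on RESOURCE_SEMAPHORE. The group name and percentage are hypothetical, and a classifier function (not shown) is still needed to route the load sessions into the group:

```sql
-- Cap each request in this group at 10% of workload-group memory.
CREATE WORKLOAD GROUP LoadGroup
WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT = 10);

ALTER RESOURCE GOVERNOR RECONFIGURE;
```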

  23. SOS_SCHEDULER_YIELD • Hypothesis: Caused by two bulk commands on the same scheduler • Predict: • We should see multiple bulk commands on the same scheduler • Observe: And we do… • scheduler_id in sys.dm_exec_requests

  24. Fixing SOS_SCHEDULER_YIELD • How can we fix this? • Two ways: • Terminate and reconnect • Soft-NUMA [Diagram: each BULK INSERT stream connects to its own TCP port (1433 + X), which maps to its own soft-NUMA node and core]

  25. I/O Related Waits for BULK INSERT • BULK INSERT uses a double-buffering scheme • Important to feed it fast enough • Also, the target SQL Server must be able to absorb the writes [Diagram: 64KB buffers parsed from the CSV feed the table; associated waits: IMPROVIO_WAIT, OLEDB, ASYNC_NETWORK_IO, PAGEIOLATCH_EX]

  26. CXPACKET – When it Matters • Statements of type INSERT ... SELECT • Measure: Sometimes throughput drops with higher DOP • Hypothesis: Backpressure in query execution

  27. Drinking From a Fire Hose Solution: OPTION (MAXDOP = X)
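Applied to the INSERT ... SELECT case above, the hint looks like this (table names and the DOP value are illustrative; the right X comes from measuring throughput at different DOPs):

```sql
-- Cap the degree of parallelism when higher DOP lowers insert throughput.
INSERT INTO dbo.FactSales WITH (TABLOCK)
SELECT SalesKey, Amount
FROM dbo.SalesStaging
OPTION (MAXDOP 4);
```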

  28. SQL Server waits - Summary

  29. Designing and Tuning High Speed Data Loading Tuning the Network Stack

  30. How to Affinitize NICs • Using the Interrupt-Affinity Policy Tool you can affinitize individual NICs to CPU cores • Affinitize each NIC to its own core • One NIC per hard NUMA node • Your mileage may vary – it depends on the box • Match soft-NUMA TCP/IP connections with the NICs • The NIC on a hardware NUMA node maps to the SQL bulk stream target on the same node

  31. Tune Network Parameters • Jumbo Frames = 9014 bytes, enabled • Adaptive Inter-Frame Spacing disabled • Flow control = Tx & Rx enabled • Client & server Interrupt Moderation = Medium • Coalesce buffers = 256 • Set server Rx buffers to 512 and server Tx buffers to 512 • Set client Rx buffers to 512 and client Tx buffers to 256 • Link speed 1000 Mbps Full Duplex

  32. Network Packet Size • Measure: • Perfmon shows a huge discrepancy between the number of reads and writes • Hypothesis: • This is caused by the small network packet size (default 4096) forcing the stream to be broken into smaller pieces • Test and prove: • Adjusting the network packet size to 32K • Increases throughput by 15%
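The server-side default can be raised with sp_configure (32767 is the maximum); clients can also request a larger packet size in their connection string:

```sql
-- Raise the instance-wide default network packet size.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'network packet size (B)', 32767;
RECONFIGURE;
```

Note that a larger packet size is negotiated per connection, so bulk clients must not override it back down.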

  33. Designing and Tuning High Speed Data Loading Tuning Integration Services

  34. Integration Services vs. SQL • Lab Test Setup • Transform fact data with surrogate key lookups • 5 dimension tables, 100K rows each • Partitioned fact table, total of 320M rows • Test the speed of the hash joins • Result: The Integration Services lookup join is comparable in speed with T-SQL!

  35. Baseline of Package • Sanity check: • How much memory does each package use? • How much CPU does each package stream use? • Need enough CPU and Memory to run them all • Performance counters: • Process – Private Bytes / Working Set (DTEXEC) • Processor – % Processor Time • Network interface • Network / Current Bandwidth • Network / Bytes Total/sec

  36. Scaling the Package - Method • Using the parallel load technique described earlier you can run multiple copies of the package • Using the baseline of the package, you can now calculate how many scale servers you will need

  37. Data Loading – Fast Enough? • Bulk load scales near-linearly with the number of bulk streams • Measured so far up to 96 cores • Possible to reach 100% CPU load on all cores • “Just” get rid of all the bottlenecks

  38. Tuning ETL and ELT Appendix

  39. Data Loading Links • The Data Loading Performance Guide • Top 10 SQL Server Integration Services Best Practices • Managing and Deploying SQL Server Integration Services • SQL Server 2005 Integration Services: A Strategy for Performance • Integration Services: Performance Tuning Techniques • High Impact Data Warehousing with SQL Server Integration Services
