Designing and Tuning High Speed Data Loading Thomas Kejser Senior Program Manager tkejser@microsoft.com
Agenda • Tuning Methodology • Bulk Load API Basics • Design Pattern and Techniques • Parallelism • Table Layout • Tuning the SQL Server Engine • Tuning the Network Stack • Tuning Integration Services
Tuning ETL and ELT Tuning Methodology
The Tuning Loop • Get a baseline • Make one small change at a time • Agree on targets for optimization • Actual runtime • CPU, Memory, I/O • The greedy tuner: • “Tune it till it breaks, then fix it, so you can break it again”
Tools of the Trade - Windows • Perfmon • Logical Disk • Memory • Processor • Process (specifically the DTEXEC process) • Network Interface • Task Manager • WinDbg • KernRate
Tools of the Trade – SQL Server • sys.dm_os_wait_stats • All my tuning starts here • Get familiar with common wait types • sys.dm_os_latch_stats • Allows a deep dive into LATCH_<X> waits • sys.dm_os_spinlock_stats • When too much CPU seems to be spent • sys.dm_io_virtual_file_stats • Because I/O systems are rarely perfect
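As a starting point, a sketch of the kind of wait-stats query this methodology begins with; the list of benign waits filtered out is illustrative, not exhaustive:

```sql
-- Top waits since the counters were last reset.
-- The NOT IN list of idle/background waits is a hypothetical starter set;
-- extend it for your environment.
SELECT TOP (10)
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    wait_time_ms - signal_wait_time_ms AS resource_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP',
                        N'SQLTRACE_BUFFER_FLUSH', N'XE_TIMER_EVENT')
ORDER BY wait_time_ms DESC;
```

The same shape works against sys.dm_os_latch_stats and sys.dm_os_spinlock_stats when the top wait points deeper into the engine.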
Designing and Tuning High Speed Data Loading Bulk load API Basics
Four Ways to Load Data into SQL Server • Integration Services • OLEDB Destination • SQL Server Destination • BULK INSERT • CSV or fixed-width files • BCP • Like BULK INSERT, but can be run remotely • INSERT ... SELECT
Minimally Logged and Bulk • Bulk load • Feeds a continuous stream of data into a table • As opposed to running singleton INSERT statements • Minimally logged • Only allocations are logged, not individual rows/pages • Key takeaway: An operation can be a bulk load operation without being minimally logged
To TABLOCK or not to TABLOCK • General rule (batch style): • Heaps: Use TABLOCK • Clustered indexes: Do NOT use TABLOCK • Minimally logged: • INSERT Heap WITH (TABLOCK) SELECT ... • If TF610 is on: • INSERT ClusterIndex SELECT ... • The same rules apply for the OLEDB and SQL Server Destinations in SSIS
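A sketch of the two minimally logged variants described above; the table names are hypothetical, and TF610 additionally requires the database not to be in FULL recovery:

```sql
-- Heap target: TABLOCK takes a BU-lock and enables minimal logging.
INSERT INTO dbo.FactSales_Heap WITH (TABLOCK)
SELECT SalesKey, Amount
FROM dbo.Staging;

-- Clustered-index target: no TABLOCK; trace flag 610 allows
-- minimal logging for inserts into indexed tables.
DBCC TRACEON (610, -1);

INSERT INTO dbo.FactSales_Clustered
SELECT SalesKey, Amount
FROM dbo.Staging;
```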
Designing and Tuning High Speed Data Loading Design Patterns
Integration Services or T-SQL • Sometimes a matter of preference • Integration Services is graphical • Some users like this • Hard to make modular • T-SQL is a text language • Modular programming • The right tool for the right job • Learn both…
SQL Server – Which Load Method? • BULK INSERT / BCP • Pros: Can take a BU-lock • No need for Linked Servers or OPENROWSET • Cons: Only CSV and fixed-width files for input • INSERT ... SELECT • Pros: Can perform transformations • Any OLEDB-enabled input • Cons: Takes X-locks on the table • Linked Servers or OPENROWSET needed
Integration Services – Which Destination? • SQL Server Destination • Pros: Fastest option • Easy to configure • Con: Must run on the same box as SQL Server (shared memory connections) • OLEDB Destination • Pros: Can be used over TCP/IP • ETL servers can be scaled out remotely • Con: Typically slower than the SQL Server Destination
Design Pattern: Parallel Load • Create a (priority) queue for your packages • A SQL table is good for this purpose • Packages / T-SQL include a loop: • The loop takes one item from the queue • Until the queue is empty… [Diagram: DTEXEC instances (Get Task → Do Work → Loop) pulling packages P1…Pn from the priority queue]
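A minimal sketch of such a queue table and the dequeue step each worker loop runs; all names are hypothetical. READPAST lets concurrent workers skip rows another stream has already locked:

```sql
-- Hypothetical queue: one row per package / work item.
CREATE TABLE dbo.LoadQueue
(
    QueueId     int IDENTITY PRIMARY KEY,
    Priority    int NOT NULL,
    PackageName sysname NOT NULL,
    Claimed     bit NOT NULL DEFAULT 0
);

-- Each DTEXEC / T-SQL loop claims exactly one unclaimed item.
-- UPDLOCK + READPAST makes concurrent dequeues non-blocking.
WITH NextItem AS
(
    SELECT TOP (1) *
    FROM dbo.LoadQueue WITH (UPDLOCK, READPAST, ROWLOCK)
    WHERE Claimed = 0
    ORDER BY Priority
)
UPDATE NextItem
SET Claimed = 1
OUTPUT inserted.PackageName;   -- the item to run; loop until no row returns
```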
Design Pattern: Table Hash Partitioning • Create filegroups to hold the partitions • Equally balance over LUNs using an optimal layout • Use the CREATE PARTITION FUNCTION command • Partition the table into #cores partitions • Use the CREATE PARTITION SCHEME command • Bind the partition function to the filegroups • Add a hash column to the table (tinyint: just one byte per row) • Calculate a good hash distribution • For example, use HASHBYTES with modulo, or BINARY_CHECKSUM [Diagram: rows hashed into buckets 0–255]
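A sketch of the pattern for an 8-core box, assuming filegroups FG0–FG7 already exist; all object names are hypothetical, and the modulo hash stands in for whatever distribution function you settle on:

```sql
-- 8 partitions (boundary values 0..6 plus the catch-all) for 8 cores.
CREATE PARTITION FUNCTION pfHash (tinyint)
AS RANGE LEFT FOR VALUES (0, 1, 2, 3, 4, 5, 6);

-- Bind the function to the filegroups, one partition per filegroup.
CREATE PARTITION SCHEME psHash
AS PARTITION pfHash TO (FG0, FG1, FG2, FG3, FG4, FG5, FG6, FG7);

-- The hash column costs one byte per row; a persisted computed
-- column lets it double as the partitioning key.
CREATE TABLE dbo.FactSales
(
    SalesKey bigint NOT NULL,
    Amount   money  NOT NULL,
    HashCol  AS CAST(SalesKey % 8 AS tinyint) PERSISTED NOT NULL
) ON psHash (HashCol);
```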
Design Pattern: Large Updates [Diagram: BULK INSERT changed rows into Sales_Delta → apply the updates to build Sales_New → SWITCH the current partition out of Sales into Sales_Old and SWITCH Sales_New in]
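A hedged sketch of the final swap in this pattern, assuming the delta has already been merged into a staging table Sales_New that matches the partition's constraints; table names and the partition number are illustrative:

```sql
-- Partition switches are metadata-only operations, so the swap
-- is effectively instant regardless of row counts.

-- 1. Move the stale partition out of the live table.
ALTER TABLE dbo.Sales
    SWITCH PARTITION 3 TO dbo.Sales_Old;

-- 2. Move the rebuilt data (old rows + updated rows) in.
ALTER TABLE dbo.Sales_New
    SWITCH TO dbo.Sales PARTITION 3;
```

Sales_New must live on the same filegroup as the target partition and carry a CHECK constraint matching the partition boundaries, or the switch will fail.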
Design Pattern: Large Deletes [Diagram: SWITCH the 2001 partition out of Sales into Sales_Temp → BULK INSERT the filtered rows to keep back → SWITCH the filtered partition in]
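A sketch of the same idea for deletes: instead of a logged DELETE, switch the partition out and re-load only the survivors. Partition function, table names, and the KeepFlag predicate are all hypothetical:

```sql
-- 1. Switch the year-2001 partition out; no rows are touched.
ALTER TABLE dbo.Sales
    SWITCH PARTITION $PARTITION.pfSalesDate('20010101')
    TO dbo.Sales_Temp;

-- 2. Bulk load only the rows to keep into a filtered staging table.
INSERT INTO dbo.Sales_Filtered WITH (TABLOCK)
SELECT *
FROM dbo.Sales_Temp
WHERE KeepFlag = 1;   -- whatever predicate defines "keep"

-- 3. Switch the filtered set back into the live table.
ALTER TABLE dbo.Sales_Filtered
    SWITCH TO dbo.Sales PARTITION $PARTITION.pfSalesDate('20010101');
```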
Designing and Tuning High Speed Data Loading Tuning the SQL Server Engine
ALLOC_FREESPACE_CACHE – Heap Limits • Measure: sys.dm_os_latch_stats • Long waits for ALLOC_FREESPACE_CACHE • SQL Server® Books Online: • “Used to synchronize the access to a cache of pages with available space for heaps and binary large objects (BLOBs). Contention on latches of this class can occur when multiple connections try to insert rows into a heap or BLOB at the same time. You can reduce this contention by partitioning the object.” • Hypothesis: More heaps = more speed
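A quick way to confirm the hypothesis before reaching for partitioning:

```sql
-- Is the heap free-space cache latch actually the bottleneck?
SELECT latch_class,
       waiting_requests_count,
       wait_time_ms
FROM sys.dm_os_latch_stats
WHERE latch_class = N'ALLOC_FREESPACE_CACHE';
```

If the wait time here is significant relative to total load time, splitting the target into more heaps (for example via the hash partitioning pattern above) is worth testing.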
PAGELATCH_UP – PFS Contention • Measure: • sys.dm_os_wait_stats • Hypothesis generation • An I/O problem? • What can we predict? • Fix: Add more files to the filegroup!
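A sketch of the fix: each data file brings its own PFS pages, spreading the allocation bookkeeping. Database name, file paths, and sizes are hypothetical:

```sql
-- More files in the filegroup = more PFS pages = less PAGELATCH_UP
-- contention during concurrent bulk allocation.
ALTER DATABASE StageDB
ADD FILE
(
    NAME = N'stage_data2',
    FILENAME = N'E:\Data\stage_data2.ndf',
    SIZE = 10GB,
    FILEGROWTH = 1GB
)
TO FILEGROUP [PRIMARY];
```

Keep the files equally sized so the proportional-fill algorithm spreads allocations evenly across them.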
RESOURCE_SEMAPHORE – Query Memory Usage • DW load queries will often be very memory intensive • By default, a single query can use at most 25% of SQL Server’s allocated memory • Queries waiting to get a memory grant will wait for: RESOURCE_SEMAPHORE • Can use Resource Governor (RG) to work around it
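A minimal sketch of the Resource Governor workaround; the group name is hypothetical, and in practice you also need a classifier function to route the load sessions into the group:

```sql
-- Raise the per-query memory grant cap from the 25% default
-- for the bulk-load workload only.
CREATE WORKLOAD GROUP LoadGroup
WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT = 50);

ALTER RESOURCE GOVERNOR RECONFIGURE;
```

Resource Governor is an Enterprise Edition feature, so on other editions the alternative is reducing per-query memory pressure (smaller batches, fewer concurrent memory-hungry operators).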
SOS_SCHEDULER_YIELD • Hypothesis: Caused by two bulk commands on the same scheduler • Predict: • We should see multiple bulk commands on the same scheduler • Observe: And we do… • scheduler_id in sys.dm_exec_requests
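A sketch of the observation step; the LIKE filter assumes bulk loads surface as a BULK command in the command column:

```sql
-- Are multiple bulk streams stacked on the same scheduler?
SELECT r.session_id,
       r.command,
       r.scheduler_id
FROM sys.dm_exec_requests AS r
WHERE r.command LIKE N'%BULK%'
ORDER BY r.scheduler_id;
```

Repeated scheduler_id values across sessions confirm the hypothesis: the streams are time-slicing one core while others sit idle.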
Fixing SOS_SCHEDULER_YIELD • How can we fix this? • Two ways: • Terminate and reconnect • Soft NUMA [Diagram: each BULK INSERT connects to its own soft-NUMA node via TCP port 1433 + X, binding each stream to its own core]
I/O Related Waits for BULK INSERT • BULK INSERT uses a double-buffering scheme • Important to feed it fast enough • Also, the target SQL Server must be able to absorb the writes [Diagram: CSV parsed into 64KB buffers (IMPPROV_IOWAIT) feeding OLEDB (ASYNC_NETWORK_IO) and page writes (PAGEIOLATCH_EX)]
CXPACKET – When it Matters • Statements of type • INSERT…SELECT • Measure: Sometimes throughput drops with higher DOP • Hypothesis: Backpressure in query execution
Drinking From a Fire Hose • Solution: OPTION (MAXDOP X)
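A sketch of the fix: cap the degree of parallelism on the INSERT…SELECT so the parallel readers cannot outrun the insert stream. Table names are hypothetical, and the value 4 is a placeholder; X is found by testing:

```sql
-- Lower DOP until CXPACKET backpressure disappears and
-- throughput stops dropping.
INSERT INTO dbo.FactSales WITH (TABLOCK)
SELECT s.SalesKey, s.Amount
FROM dbo.Staging AS s
OPTION (MAXDOP 4);
```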
Designing and Tuning High Speed Data Loading Tuning the Network Stack
How to Affinitize NICs • Using the Interrupt-Affinity Policy Tool you can affinitize individual NICs to CPU cores • Affinitize each NIC to its own core • One NIC per hard NUMA node • Your mileage may vary – depends on the box • Match soft-NUMA TCP/IP connections with NICs • The NIC on a hardware NUMA node maps to the SQL bulk stream target on the same node
Tune Network Parameters • Jumbo Frames = 9014 bytes, enabled • Adaptive Inter-Frame Spacing disabled • Flow control = Tx & Rx enabled • Client & server Interrupt Moderation = Medium • Coalesce buffers = 256 • Set server Rx buffers to 512 and server Tx buffers to 512 • Set client Rx buffers to 512 and client Tx buffers to 256 • Link speed 1000 Mbps Full Duplex
Network Packet Size • Measure: • Perfmon shows a huge discrepancy between the number of reads and writes • Hypothesis: • This is caused by the small network packet size (default: 4096 bytes) forcing the stream to be broken into smaller pieces • Test and prove: • Adjusting the network packet size to 32K • Increases throughput by 15%
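A sketch of raising the server-wide default; 32767 bytes is the maximum the setting accepts, and the value can also be overridden per connection in the client connection string:

```sql
-- Raise the default network packet size from 4096 bytes.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

EXEC sp_configure 'network packet size', 32767;
RECONFIGURE;
```

Larger packets help streaming workloads like bulk load; they are not automatically a win for chatty OLTP traffic, so prefer the per-connection override where the workload is mixed.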
Designing and Tuning High Speed Data Loading Tuning Integration Services
Integration Services vs. SQL • Lab Test Setup • Transform fact data with surrogate key lookups • 5 dimension tables, 100K rows each • Partitioned fact table, total of 320M rows • Test speed of hash joins Integration Services lookup join is comparable in speed with T-SQL!
Baseline of Package • Sanity check: • How much memory does each package use? • How much CPU does each package stream use? • Need enough CPU and Memory to run them all • Performance counters: • Process – Private Bytes / Working Set (DTEXEC) • Processor – % Processor Time • Network interface • Network / Current Bandwidth • Network / Bytes Total/sec
Scaling the Package - Method • Using the parallel load technique described earlier you can run multiple copies of the package • Using the baseline of the package, you can now calculate how many scale servers you will need
Data Loading – Fast Enough? • Bulk load scales near-linearly with the number of bulk streams • Measured so far up to 96 cores • Possible to reach 100% CPU load on all cores • “Just” get rid of all bottlenecks
Tuning ETL and ELT Appendix
Data Loading Links • The Data Loading Performance Guide • Top 10 SQL Server Integration Services Best Practices • Managing and Deploying SQL Server Integration Services • SQL Server 2005 Integration Services: A Strategy for Performance • Integration Services: Performance Tuning Techniques • High Impact Data Warehousing with SQL Server Integration Services