1 / 49

Implementing a Microsoft SQL Server Data Warehouse Fast Track

SESSION CODE: BIE402. Implementing a Microsoft SQL Server Data Warehouse Fast Track. Brian Knight Founder, Pragmatic Works bknight@pragmaticworks.com. About the Ugly Guy Speaking. SQL Server MVP Founder of Pragmatic Works Co-Founder of BIDN.com, SQLServerCentral.com and SQLShare.com

tannar
Download Presentation

Implementing a Microsoft SQL Server Data Warehouse Fast Track

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SESSION CODE: BIE402 Implementing a Microsoft SQL Server Data Warehouse Fast Track Brian Knight Founder, Pragmatic Works bknight@pragmaticworks.com

  2. About the Ugly Guy Speaking • SQL Server MVP • Founder of Pragmatic Works • Co-Founder of BIDN.com, SQLServerCentral.com and SQLShare.com • Written more than a dozen books on SQL Server

  3. Today’s Problems with Integration • Integration today • Increasing data volumes • Increasingly diverse sources • Requirements reached the Tipping Point • Low-impact source extraction • Efficient transformation • Bulk loading techniques

  4. Agenda • SQL Instance-level • Data load tuning • Fast Track maintenance

  5. SQL Server Fast Track Data Warehouse Solution to help customers and partners accelerate their data • A method for designing a cost-effective, balanced system for Data Warehouse workloads • Reference hardware configurations developed in conjunction with hardware partners using this method • Best practices for data layout, loading and management

  6. Fast Track Data Warehouse Components • Software: • SQL Server 2008 Enterprise • Windows Server 2008 • Configuration guidelines: • Physical table structures • Indexes • Compression • SQL Server settings • Windows Server settings • Loading • Hardware: • Tight specifications for servers, storage & networking • ‘Per core’ building block

  7. Fast Track Performance • HP ProLiant DL785 G6 • (8) AMD OpteronCPUs, 6 core, 2.6 GHz • 48 total CPU cores • 24 TB optimized storage (48 TB max) • 9600 MB/s throughput

  8. * Core-balanced compressed capacity based on 300GB 15k SAS not including hot spares and log drives. Assumes 25% (of raw disk space) allocated for Temp DB. ** Represents storage array fully populated with 300GB15k SAS and use of 2.5:1 compression ratio. This includes the addition of one storage expansion tray per enclosure.30% of this storage should be reserved for DBA operations Fast Track Data Warehouse Reference Configurations

  9. Potential Performance Bottlenecks A A B B FC SWITCH SERVER CPU CORES WINDOWS SQL SERVER CACHE FC HBA FC HBA STORAGE CONTROLLER CACHE A A B B A B CPU Feed Rate HBA Port Rate Switch Port Rate SP Port Rate LUN Read Rate Disk Feed Rate SQL Server Read Ahead Rate DISK DISK DISK DISK LUN LUN

  10. Fast Track SQL DW Architecture vs. Traditional DW Fast Track SQL DW Architecture Dedicated DW Infrastructure Architecture modeled after DW Appliances 1TB – 48TB Pre-Tested Traditional SQL DW Architecture Shared Infrastructure Benefits: More System Predictability Thus User Experience Pretested Configurations Lowers TCO Balanced CPU to I/O Channel Optimized for DW Modular Building Block Approach Scale Out or Up within limits of Server and San Dedicated Network Bandwidth Enterprise Shared SAN Storage Shared Network Bandwidth Dedicated Low Cost SAN Arrays 1 for every 4 CPU Cores EMC AX4 – HP MSA2312 SQL 2008 Data Warehouse 4 Processor 16 Core Server OLTP Applications

  11. Case: Insurance Claims – High-volume loads in a short load window • Example: Load and enrich 50 GB of incremental data in less than 1 hour • Only possible with a highly parallel load design • Use partitioned destination table • # partitions = # cores • Parallel loading to staging table first • Separate filegroups per-partition prevents interleaving during load

  12. Results Price per TB (8TB) – Cal : $22K / TB Price per TB (16TB) – Cal: $13K / TB

  13. Case Study • Replaced AS/400 DB2 with SQL Server • Replaced CICS with SSIS • Saved ~$50,000 a month • Took 12 hour process down to 50 minutes

  14. DW Products Positioning Incremental HW Expansion, Fast parallel loading by default,HA by default PDW with Hub-and-spoke Scale Complexity HA by default SW-HW integration 4 3 PDW SQL Server 2008 with Fast Track Reference Architecture 2 SQL Server 2008 1 Start here

  15. Fast Track Data Striping • Fast Track evenly spreads SQL data files across physical RAID-1 disk arrays FT Storage Enclosure Raid-1 Primary Data Log CREATE FILEGROUP DB1 DB1-7.ndf DB1-1.ndf DB1-5.ndf DB1-3.ndf DB1.ldf ARY05v09 ARY01D1v01 ARY02D1v03 ARY04D1v07 ARY03D1v05 Disk 1 & 2 ARY02D2v04 ARY01D2v02 ARY04D2v08 ARY03D2v06 DB1-8.ndf DB1-2.ndf DB1-6.ndf DB1-4.ndf

  16. Fast Track File Layout

  17. SQL Server File System • Three layers of storage configuration • SAN file system • Logical storage allocation • Primary Data (user databases) • (4) 2 disk RAID-1 arrays per enclosure • Log • (1) 2 disk RAID-1 array per enclosure • Database file creation • User databases • Tempdb • Transaction logs

  18. Writing Sequential Data • Sequential scan performance starts with database creation and extent allocation • Recall that the –E startup option is used • Allocate 64 extents at a time (4MB) • Pre-allocation of user databases is recommened • Autogrow should be avoided if possible • If used, always use 4MB increments

  19. Mounting the SAN File System • Creating LUNS • Mount points can be used to map LUN’s to the Windows Server OS • Fast Track RA recommends using a naming scheme to identity LUN to physical disk relationship. • LUN, RAID, and Physical Disk number are used as components of the windows volume name • Naming scheme enables targeted IO validation of disk (LUN), array, and storage processor using a tool such as SQLIO • Primary Data arrays: 2 LUN per Array • LOG array: 1 LUN

  20. SQL Server Configuration • SQL Server Startup • -E : Allocate 64 extents at a time (4MB) • This is not a guarantee of a logically contiguous extent allocation • -T1117: Autogrow in even increments • -T610 : Minimal logging during data loads • All databases should be sized to meet expected growth for next 12-18 months • Autogrow for ALL Databases should be set to 4 MB

  21. SQL Server Files • Transaction Log • Create a single transaction log file per database and place on a dedicated Log LUN • Enable auto-grow for log files • The transaction log size for each database should be at least twice the size of the largest DML operation

  22. SQL Server Files • User Databases • Create at least one Filegroup containing one data file per LUN • FT targets 1:1 LUN to CPU core affinity • Make all files the same size • Effectively stripes database files across data LUNs • Multiple file groups may be advantageous • Disable Auto-Grow for the database • Transaction Log is allocated to a Log LUN

  23. Data Load in a Fast Track

  24. Conventional data loads lead to fragmentation • Bulk Inserts into Clustered Index using a moderate ‘batchsize’ parameter • Each ‘batch’ is sorted independently • Overlapping batches lead to page splits 1:31 1:32 1:33 1:34 1:35 1:31 1:36 1:32 1:37 1:33 1:38 1:34 1:39 1:35 1:40 Key Order of Index

  25. Techniques to Maximize Scan Throughput • Minimize use of NonClustered indexes on Fact Tables • Load techniques to avoid fragmentation • Load in Clustered Index order (e.g. date) when possible • Index Creation always MAXDOP 1, SORT_IN_TEMPDB • Isolate volatile tables in separate filegroup • Isolate staging tables in separate filegroup or DB • Periodic maintenance

  26. Minimizing Extent Fragmentation • Extent fragmentation can be minimized through use of filegroups • Separate filegroups for volatile data • Separate filegroups for staging tables • Partition key tables across multiple filegroups • Useful if data volatility varies across partition ranges • Isolate data operations that generate significant fragmentation to dedicated filegroups or databases

  27. Loading Data • Primary method used to create sequential data layout • Goals • Maximize sequential data layout • Minimize fragmentation • Key considerations • Concurrent load operations to the same file will induce fragmentation • DML change operations (Update/Delete) may induce fragmentation

  28. Loading Data • Load recommendations for Fast Track are broken into two general scenarios • Migration: Very large one-time, or infrequent loads • Typically target empty tables • Less time sensitive relative to SLA’s • Incremental: Routine operational loads • Typically target populated tables • Time sensitive

  29. Migration Loads – Heap • Minimal Logging is recommended • Tablock and/or TF610 may be required • Partitioned/Non Partitioned • Load directly into target table • Set BATCHSIZE appropriately • Parallelize Bulk Inserts if necessary

  30. Migration Loads – CI • Minimal Logging is recommended • Tablock and/or TF610 may be required • Bulk Insert to a Staging table • Stage option 1: Heap with matching partition • Stage option 2: CI with matching partition • Concurrent Bulk Inserts • Stage option 1: Yes • Stage option 2: Limited by TempDB sort • Insert-Select from Stage • Maxdop 1, single query • Concurrent Inserts if: • Partition range restricted or • Multiple Filegroups targeted • Partition Switching can also be used

  31. Data Loading – Recommendations for Incremental Loads • Clustered Index Table Loads • Option 1 – Direct load into table • Sorts and commit size must fit into memory • Option 2 – Empty table • Load into empty clustered index table • Serial or parallelized • Non-parallelized INSERT SELECT statement to move to final table

  32. Incremental Loads – CI • Minimal Logging is recommended • Tablock and/or TF610 may be required • Bulk-Insert to identical CI staging table • Insert Select with maxdop1 • Direct Bulk-Insert to Target table • Ensure sorts fit in memory • Higher performance options • Partition Target table across multiple Filegroups • Concurrent Bulk inserts across Filegroups • Concurrent Bulk Insert to partitioned Heap Stage • Concurrent ranged restricted Insert-Select

  33. Fast Track Load Parallel Loads Demo

  34. Maintenance considerations • Use ALTER INDEX … REBUILD … … WITH (MAXDOP = 1, SORT_IN_TEMPDB) • Single threaded -- avoids creating new extent fragmentation • Can rebuild just the “current” partition • Avoid ALTER INDEX … REORGANIZE • Pages will become physically ordered, but significant extent fragmentation may occur

  35. Column Statistics • AUTO CREATE and AUTO UPDATE is recommended • Ideally statistics are gathered on all columns of a table • Minimum of columns used in WHERE or HAVING clauses • Performance considerations will determine a balance between ideal & minimum cases • FULLSCAN recommended for Dimensions • Composite statistics for joins that utilize composite keys • Composite statistics cannot be auto created

  36. Tenets of Parallel Design • Partition the problem • Preferable into equal sized pieces • Eliminate conflicts • Stateless design • Reduce the need for common resources • Schedule efficiently • Optimize the Gantt chart

  37. Schedule Efficiently • Create a priority queue of work • Start multiple copies of the package • Packages process work in a loop DTEXEC (1) Get Task Do Work Loop TaskQueue DTEXEC (2) … Get Task Do Work Loop Pn P5 P4 P3 P2 P1

  38. Demo Results

  39. Demo Results

  40. Demo Results

  41. Demo Results

  42. Using A Queue To Control Parallelism Demo

  43. Fast Track Data Warehouse Pricing

  44. Conclusion • Tenets of parallel design • Partition the problem • Eliminate conflicts • Schedule efficiently

  45. Required Slide Resources Learning • Sessions On-Demand & Community • Microsoft Certification & Training Resources www.microsoft.com/teched www.microsoft.com/learning • Resources for IT Professionals • Resources for Developers http://microsoft.com/technet http://microsoft.com/msdn

  46. Required Slide Complete an evaluation on CommNet and enter to win!

  47. Sign up for Tech·Ed 2011 and save $500 starting June 8 – June 31st http://northamerica.msteched.com/registration You can also register at the North America 2011 kiosk located at registrationJoin us in Atlanta next year

  48. © 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

  49. Required Slide

More Related