1 / 38

Winning with Column Store in SQL Server 2016 – A Customer Story

Winning with Column Store in SQL Server 2016 – A Customer Story. Rick Heiges, Data Platform MVP (2007-2017) Principal Solutions Architect, DB Best Technologies, LLC. Thank You to. Our Venue SQLPASS Chapter Volunteers for SQL Saturday Our Speakers. THANK YOU!!!.

madget
Download Presentation

Winning with Column Store in SQL Server 2016 – A Customer Story

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Winning with Column Store in SQL Server 2016 – A Customer Story Rick Heiges, Data Platform MVP (2007-2017) Principal Solutions Architect, DB Best Technologies, LLC

  2. Thank You to Our Venue SQLPASS Chapter Volunteers for SQL Saturday Our Speakers

  3. THANK YOU!!! • For Giving Up Your Saturday • For Not Giving Up on that One Problem • For Wanting to Continue to Learn • For Not Wanting to Stagnate in Your Career • For Going that Extra Mile to SQL Saturday • For NOT Going all those extra miles to • Denmark • Minnesota • Kansas City

  4. BIO Rick Heiges • Data Platform MVP • 2007-2017 • Principal Solutions Architect – DB Best Technologies, LLC • Rick@DBBest.com • PASS Involvement • Member since 2001 • Board of Directors 2003-2011 • Initiated Summit Ambassador Program • Initiated 24 Hours of PASS • Frequent Speaker at Summit and SQL Saturday Events • BLOG: http://sqlblog.com/blogs/rick_heiges @heigesr2 linkedin.com/in/rickheiges

  5. www.dbbest.com Follow us on Twitter @dbbest_tech Like us on Facebook facebook.com/DBBestTech

  6. SQL Server Migration Assistant (SSMA) • Tool Developed by DB Best • Microsoft acquired SSMA in 2005 • DB Best Maintains / Updates SSMA • SSMA for Oracle Extension Pack • Support Data Migration • Oracle Providers to enable Server to Server connectivity • Provides user-defined functions that Emulate Oracle system functions which helps to decrease manual effort for conversion which may be substantial

  7. Business Scenario • Manufacturing Plants Around the World • Hundreds of Measurements taken at various points on Assembly line for each unit • Data Collected and Stored in a Centralized Data Warehouse on a constant basis • Plant and Regional Managers Generate Reports on DW on hourly, daily, weekly, bi-weekly, monthly, quarterly, and adhoc basis • Often references data just collected

  8. Business Scenario Graphic Plant Plant DW Plant ETL Plant Reports Plant Plant Manager

  9. Initial Situation • Customer was running into Performance Issues in timely reporting and analysis on the DW • Customer started testing Oracle’s In-Memory Feature (Partition pinned in Memory) • Early testing was not impressive. Enabled, but not being used at this time • Customer decided to look at SQL Server 2016 for solution and started to look at Migrating DW

  10. Customer First Steps • Contacted Microsoft – Got CAT invovled • Created a small VM (4 vCPUs and 8GB RAM) for initial setup / testing • Customer Initially used SSMA to convert Schema • Created Data Loads for SSIS to get some functional testing done • Identified Queries that exhibited Poor Performance • Used “normal” indexes initially • 4 week Pilot to prove power of Column Store in SQL 2016

  11. System Environment Notes Initial Schema Conversion and Reality Check • Initial VM was 2 HT Cores (4 vCPU) with 8GB RAM • Windows Server 2012 R2 • SQL Server 2016 CTP 2.3 • Upgraded VM to 4HT Cores (8 vCPU) with 32GB RAM for Discovery / Initial Testing • Upgraded VM to 16 vCPU with 256GB RAM for Final Testing • Dev / Test Environment is on 2 nodes of a 4 node Oracle RAC with 256GB RAM per node and 79 cores allocated

  12. Migration Challenges • Customer used various Partitioning strategies • Referential • Range-List • Does not always map well to Range Partitioning in SQL Server • Data Type Changes – Parameter Data Types • SSMA Converted Code has goal of functionality (not performance)

  13. Dealing with Partitioning Differences • Keep ahead of the end of the window – Split Empty partitions in advance • Schedule Jobs to monitor to automate • Partitioning Key must be part of any Unique Index key • RANGE Partitioning only • Usually, Partitioning on Dates is best practice • With Column Store indexes in mind, Don’t partition to small

  14. Best Practices forMigrating to SQL Server • Audit Report • Indexes • Features

  15. Data Loading Audit Report • Row Count – simple to implement, but does not interrogate validity of data from Source to Target other than Row Count • Consider running a similar query against Source and Target systems on Business Critical tables that may have dollar amounts associated with them. • Sum of a column would be bare minimum. Consider also Min, Max, and Average for higher confidence

  16. CI: Clustered Index (Rowstore) • Not Required, but…. • Is a default and best practice • Is generally more performant than a Heap • Is generally a unique, primary key value • Is Required for Partitioning • Think of it as a Page Number in a book • …. Unless There is a Reason Not to Have One • Fill Factor • Page Splits

  17. NCI: Non-Clustered Index (Rowstore) • Often used to aid retrieval of data without knowing the PK • Often used for FK and common columns used in searches • Use Included Columns for Covering Queries with an NCI • Similar to an Index in the back of a book. Look in the index to find the page to turn to.

  18. Columnstore Refresher how is it different? Data stored as columns Data stored as rows … Benefits: • Improved compression: Data from same domain compress better • Reduced I/O: Fetch only columns needed • Improved Performance: More data fits in memory C1 C4 C5 C3 C2

  19. Key Terms / Concepts • Columnstore / Rowstore – Physical Organization • Row Group – Rows Compressed into Columnstore format usually in sets of the Max (1,048,576) • Column Segment – Column of Data within Row Group • Deltastore – Clustered Index that stores data in rows until a threshold(Min 102,400) is met to move to a columnstore • Tuple Mover – Background Process converts rows in the deltastore to the columnstore • Batch Execution – Query Processing Method that operates on multiple rows together (aka vector-based execution)

  20. Deltastore Basics Row Group • Bulk Load Data goes directly into Columnstore based on large volumes of data • Single and smaller loads of data first go to Deltastore • Tuple Mover runs in the background and moves rows from rowstore to columnstore

  21. NCCI: Non-Clustered Columnstore Index • First Introduced in SQL 2012 • Made Table Read-Only in 2012/2014 • Only 1 NCCI allowed in 2012/2014 • Use Partition “Switch” to add data • Now updateable in SQL 2016 • Multiple NCCI allowed • Complements “Row Store”

  22. CCI: Clustered Columstore Index • Introduced in SQL 2014 • Converts Rowstore into Columnstore • No other indexes allowed in 2014 • Incredible Compression • Often up to 90% • Domain of Data is similar repeatable data • Can have NCI on CCI in 2016 • Allows for better SEEK performance • Certain Restrictions apply • Unique, FK, Data Types

  23. Index Compatibility Index Matrix Summary Table is physically organized as a Heap, Cluster Index (Row) or CCI (Column) NCI and NCCI are secondary indexes that can dramatically affect performance

  24. Initial Changes to SQL 2016 Database / Server • Trace Flags • 1117/1118/2371 no longer needed • TempDB • Number of Files / Growth Rate / Location • Customer Database • Files / Filegroups / Growth • SQL Memory Settings • Examine Waitstats • Quiet Baseline

  25. Changes on SQL 2016 Side • Parameter Types changed in .NET/ App to match SQL data type changes • Clustered Indexes and PKs reviewed • Selected Tables were chosen for CCI • Selected Tables were chosen to have NCCI added • Some Additional NCI added to Selected Tables • Selected Tables were also Partitioned • Date is preferred • Arbitrary Boundary Point similar to a time period

  26. Traditional DW Design Dim Table A Dim Table C Dim Table B Fact Table Dim Table E Dim Table D • Almost all queries hit Fact Table • Approx 450 columns (400+ of Measurement Data • 100 GB Fact Table uncompressed • 100s of millions of rows

  27. Fact Table • Initial Load of 2015 Data (on 4vCPU/8GB VM) • Created Partitions – Date Based (2 weeks) • 100+ GB in size • Converted to CCI • Rebuild CCI – Reorganize – Compressed to 8GB • Early Investigative Querying from DBA Team indicated that we were optimistic about Testing • Loaded Additional Years of Data (2014, 2013) • 2014 and 2013 data volumes were smaller due to not as many measurements being captured

  28. Related Tables • Smaller Dim Tables • Clustered Index • Additional Non-Clustered Indexes • Large Dim Tables • Clustered Index / Partitioned • Several with Non-Clustered Index • Several with Non-Clustered Column Store Index • Based on query usage

  29. Testing • Used Consistent Dates - Returned same number of rows across both systems • Queries chosen were based on problematic queries currently used in Production • Queries were very complex with criteria focusing on specific date ranges, plants, and measurement types (some with multiple CASE clauses) • Simple Queries were not officially tested, but were anecdotally tested by the DBA team without issues

  30. The Contestants • Oracle 12c • 2 Nodes of a 4 Node RAC • 40cores of 79 allocated • 256GB per node (512GB) • Dedicated Attached Storage • No ETL • No clearing of cache • No Perf Data • Key Tables are the same volume except for Dim Table A: 3 Billion • SQL Server 2016 CTP 2.3 • VM: 16vCPU/256GB • 240GB allocated to SQL • Storage on shared SAN • ETL Job running loading data • Clear Caches before every set of queries • Basic Perfmon Data Collection • Key Tables are the same volume except for Dim Table A: 600 Million (Due to allocated space)

  31. SQL 2016 Test Results

  32. Perfmon Initial Notes – We could be even faster! • CPU at Approx. 20% • Serious IO Latency on both Read and Write • F: .010 / .041 • G: .010 / .037 • H: .010 / .038 • L: Read Spike .068 @ start of testing • D: write .020 • No Memory Pressure

  33. Future Considerations • Some Data Architecture Changes coming to the Fact Table • Further Index Tuning and Decisions • Further Code and Query Tuning

  34. SQL 2016 Architecture Option: Use AlwaysOn Availability Groups Application Listener Primary Sync Secondary Data Data Async Secondary Data • AlwaysOn AG Primary Replica • Sync to Replica in same DC to provide HA • Async to Replica in DR DC for DR • May also use Replicas for Read Only Queries • May also have additional Replicas closer to plants to reduce network latency

  35. Summary • SQL on a smaller VM beat Oracle on Big Iron • SQL Server 2016 with Columnstore Indexes performs better than the Oracle System • Efforts were made to simulate consistent ETL flow as in current real world conditions • Further Query Tuning and more Robust IO subsystem will solidify SQL Server Dominant Performance

  36. Thank You! • Rick Heiges • Data Platform MVP (2007-2017) • Follow Me on Twitter: @heigesr2 • Rick@DBBest.com

  37. Questions?

  38. Thank You for Attending

More Related