SQL Server: A Data Platform for Large-Scale Applications

SQL Server: A Data Platform for Large-Scale Applications José Blakeley, Software Architect Database Systems Group, Microsoft Corporation

SQL Server Design Philosophy • To solve 90+% of the problems automatically • Self Tuning • Dynamic Optimization • Self Configuration • Self Management • Self Healing • The remaining percentage takes higher touch • Requires careful design for scalability • Good knowledge of how the RDBMS platform works • This talk describes how we are codifying our solution for large-scale DW into SQL Server UCI ISG Seminar

SQL Data PlatformAny Place, Any Type, Any Scale Enterprise Data Platform Dynamic Development Beyond Relational Pervasive Insight UCI ISG Seminar

Outline • SQL Server Data Platform • Basic concepts – common ground • Assumptions • Workloads • SQL Server Parallel DW DBMS • Philosophy • System Architecture • Software Architecture • Summary UCI ISG Seminar

Workload Types • Online Transaction Processing (OLTP) • Balanced read-update ratio (60%-40%) • Fine-grained inserts and updates • High transaction throughput e.g., 10s K/s • Usually very short transactions e.g., 1-3 tables • Sometimes multi-step e.g., financial • Relatively small data sizes e.g., few TBs • Data Warehousing and Business Analysis (DW) • Read-mostly (90%-10%) • Few updates in place, high-volume bulk inserts • Concurrent query throughput e.g., 10s K / hr • Per query response time < 2 s • Snowflake, star schemas are common e.g., 5-10 tables • Complex queries (filter, join, group-by, aggregation) • Very large data sizes e.g., 10s TB - PB OLTP = Day-to-day business DW = Analysis over recorded data UCI ISG Seminar

436,892,631 rows Star Join Schema 41 rows 34 rows SELECTFE0.RegionName, FL0.FiscalYearName, SUM (A.ActualRevenueAmt) FROMTECSPURSL00 A JOINSalesDate L ONA.SalesDateID = L.SalesDateID JOINUpperGeography UG ONA.TRCreditedSubsidiaryId = UG.CreditedSubsidiaryID JOINRegion FE0 ONUG.CreditedRegionID = FE0.RegionID JOINFiscalYear FL0 ONL.FiscalYearID = FL0.FiscalYearID GROUP BY FE0.RegionName, FL0.FiscalYearName 118 rows 13,517 rows UCI ISG Seminar 1/8/2010

Why look at workloads? $B • DBMS architectures of 30 years ago designed for mixed workloads • OLTP and DW workloads have grown into big market segments • Workload-specific engines can be profitable • Innovations are driven by deeper understanding of workloads 8 6 OLTP Mixed DW 4 2 1995 2000 2005 2010 UCI ISG Seminar

Changing TPC-H Landscape 68x perf 12% cost • New DW startups disrupting TPC-H • Column-store, compression, and scale-out UCI ISG Seminar

Data WarehousingSQL Server 2008 • Provides out-of-the-box SMP scale for ~10 TB • Data compression – row and page • Star join, bitmap filters, partitioned table parallelism • Minimally Logged INSERT • MERGE • Resource governor – CPU and memory • Many successful apps for 10-100 TB running today via custom scale-out • E.g., Danske Bank, Clalit Health, US Dep. of Agriculture, Pan-STARRS • Building out-of-the-box scale-out for 10s-100s of TB • This is the focus of this talk UCI ISG Seminar

Outline • SQL Server Data Platform • Basic concepts – common ground • Assumptions • Workloads • SQL Server Parallel DW DBMS • Design Philosophy • Hardware Architecture • Software Architecture • Summary UCI ISG Seminar

Design Philosophy • Shared-nothing, distributed, parallel DBMS • Implicit data and function partitioning • Appliance concept • Software + hardware solution • Optimized for DW workloads • Scalable to Petabytes UCI ISG Seminar

Appliance Concept DBMS System Software Parallel DW Reference Hardware Platforms INDUSTRY STANDARD SERVERS INDUSTRY STANDARD NETWORKING INDUSTRY STANDARD STORAGE UCI ISG Seminar

Hardware Architecture Compute Nodes Storage Nodes SQL SQL SQL SQL SQL SQL SQL SQL SQL Control Nodes Active / Passive Client Drivers (ODBC, OLE-DB, ADO.NET) Management Servers Data Center Monitoring Dual Infiniband Dual Fiber Channel Landing Zone ETL Load Interface Backup Nodes Spare Compute Node Corporate Backup Solution 1 Rack Server Appliance UCI ISG Seminar

Node Types • Control node: • Where clients apps connect • Parallel engine runs here • Contains system metadata • Compute nodes: • Store user data • Perform query execution • Not accessible to outside world • Landing Zone: • Staging place for data loading • Accessible to outside world • Can be augmented with 3rd party HW and SW • Backup node: • Backup file storage • Accessible to outside world • Can be augmented with 3rd party HW and SW • Management node: • Windows domain controller (Active Directory) • SW upgrades staging place • Holds SW images in case a node needs reimaging

Software Architecture Internet Explorer Query Tool MS BI (AS, RS) DWSQL Other 3rd Party Tools IIS Data Access (OLEDB, ODBC, ADO.NET, JDBC) Admin Console Compute Node Compute Nodes Data Movement Service Compute Nodes MPP Engine Coordinator Data Movement Service SQL Server User Data Landing Zone Node SQL Server Data Movement Service DW Authentication DW Configuration DW Schema TempDB Control Node UCI ISG Seminar

Key Components • MPP Engine Coordinator • Provides single system image • SQL compilation • Global metadata and appliance configuration • Global query optimization and plan generation • Global query execution coordination • Global transaction coordination • Authentication and authorization • Supportability (HW and SW status info) • Data Movement Service • Data movement across the appliance • Distributed query execution operators • Parallel Loader • Run from the Landing Zone • SSIS or command line tool • Parallel Database Copy • High performance data export • Enables Hub-Spoke scenarios • Parallel Backup/Restore • Backup files stored on Backup Nodes • Backup files may be archived into external device/system UCI ISG Seminar

Query Processing • SQL statement compilation • Parsing, validation, optimization • Builds an MPP execution plan • A set of parallel QE steps • Executes the plan • Coordinates workflow among steps • Assembles the resultset • Returns resultset to client

Example – Schema TPCH -------------------------------------------------------------------------------- -- Customer Table -- distributiond on c_custkey -------------------------------------------------------------------------------- CREATE TABLE customer ( c_custkeybigint, c_namevarchar(25), c_addressvarchar(40), c_nationkey integer, c_phone char(15), c_acctbal decimal(15,2), c_mktsegment char(10), c_commentvarchar(117)) WITH (distribution=hash(c_custkey)) ; -------------------------------------------------------------------------------- -- Orders Table -------------------------------------------------------------------------------- CREATE TABLE orders ( o_orderkeybigint, o_custkeybigint, o_orderstatus char(1), o_totalprice decimal(15,2), o_orderdate date, o_orderpriority char(15), o_clerk char(15), o_shippriority integer, o_commentvarchar(79)) WITH (distribution=hash(o_orderkey)) ; -------------------------------------------------------------------------------- -- LineItem Table -- distributiond on l_orderkey -------------------------------------------------------------------------------- CREATE TABLE lineitem ( l_orderkeybigint, l_partkeybigint, l_suppkeybigint, l_linenumberbigint, l_quantity decimal(15,2), l_extendedprice decimal(15,2), l_discount decimal(15,2), l_tax decimal(15,2), l_returnflag char(1), l_linestatus char(1), l_shipdate date, l_commitdate date, l_receiptdate date, l_shipinstruct char(25), l_shipmode char(10), l_commentvarchar(44)) WITH (distribution=hash(l_orderkey)) ; UCI ISG Seminar

Example - Query SELECT TOP 10 L_ORDERKEY, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS REVENUE, O_ORDERDATE, O_SHIPPRIORITY FROM CUSTOMER, ORDERS, LINEITEM WHERE C_MKTSEGMENT = 'BUILDING' AND C_CUSTKEY = O_CUSTKEY AND L_ORDERKEY = O_ORDERKEY AND O_ORDERDATE < '1995-03-05' AND O_ORDERDATE >= '1994-09-15 00:00:00.000' GROUP BY L_ORDERKEY, O_ORDERDATE, O_SHIPPRIORITY ORDER BY REVENUE DESC, O_ORDERDATE ; UCI ISG Seminar

Example – Execution Plan ------------------------------ -- Step 1: create temp table at control node ------------------------------ CREATE TABLE [tempdb].[dbo].[Q_[TEMP_ID_664]] ( [l_orderkey] BIGINT, [REVENUE] DECIMAL(38, 4), [o_orderdate] DATE, [o_shippriority] INTEGER ); ------------------------------ -- Step 2: create temp tables at all compute nodes ------------------------------ CREATE TABLE [tempdb].[dbo].[Q_[TEMP_ID_665]_[PARTITION_ID]] ( [l_orderkey] BIGINT, [l_extendedprice] DECIMAL(15, 2), [l_discount] DECIMAL(15, 2), [o_orderdate] DATE, [o_shippriority] INTEGER, [o_custkey] BIGINT, [o_orderkey] BIGINT ) WITH ( DISTRIBUTION = HASH([o_custkey]) ); ------------------------------- -- Step 3: SHUFFLE_MOVE -------------------------------- SELECT [l_orderkey], [l_extendedprice], [l_discount], [o_orderdate], [o_shippriority], [o_custkey], [o_orderkey] FROM [dwsys].[dbo].[orders] JOIN [dwsys].[dbo].[lineitem] ON ([l_orderkey] = [o_orderkey]) WHERE ([o_orderdate] < '1995-03-05' AND [o_orderdate] >= '1994-09-15 00:00:00.000') INTO Q_[TEMP_ID_665]_[PARTITION_ID] SHUFFLE ON (o_custkey); ------------------------------ -- Step 4: PARTITION_MOVE ------------------------------ SELECT [l_orderkey], sum(([l_extendedprice] * (1 - [l_discount]))) AS REVENUE, [o_orderdate], [o_shippriority] FROM [dwsys].[dbo].[customer] JOINtempdb.Q_[TEMP_ID_665]_[PARTITION_ID] ON ([c_custkey] = [o_custkey]) WHERE [c_mktsegment] = 'BUILDING' GROUP BY [l_orderkey], [o_orderdate], [o_shippriority] INTO Q_[TEMP_ID_664]; ------------------------------ -- Step 5: Drop temp tables at all compute nodes ------------------------------ DROP TABLE tempdb.Q_[TEMP_ID_665]_[PARTITION_ID]; ------------------------------- -- Step 6: RETURN result to client -------------------------------- SELECT TOP 10 [l_orderkey], sum([REVENUE]) AS REVENUE, [o_orderdate], [o_shippriority] FROMtempdb.Q_[TEMP_ID_664] GROUP BY [l_orderkey], [o_orderdate], [o_shippriority] ORDER BY [REVENUE] DESC, [o_orderdate] ; ------------------------------- -- Step 7: Drop temp table at control node -------------------------------- DROP TABLE tempdb.Q_[TEMP_ID_664]; UCI ISG Seminar

Other Important Functionality • Fault tolerance • All HW components have redundancy: • CPUs, Disks, networks, power, storage processors • Control and Compute nodes use failover clustering • Management nodes have active & standby • Integration with SQL Server BI tools • SS Integration Services (ETL) has PDW as a destination • SS Analysis Services (OLAP) has PDW as a source • SS Reporting Services • Excel UCI ISG Seminar

Future Challenges • Richer DW analysis • Map-reduce-like functionality inside the cluster • Data mining, embedded analytics • Richer hub-and-spoke configurations • MPP to MPP • Distributed transactions • Isolation levels, deadlocks, logs • Distributed, parallel query processing • Execution strategies (DW) • Optimization techniques (DW) • Integration with other data services • Streaming • Increased HW architecture choices • Low-power clusters UCI ISG Seminar

Summary • SQL Server Data Platform overview • Workload types • SQL Server Parallel DW DBMS • Design Philosophy • System Architecture • Software Architecture • Query Example • Future challenges UCI ISG Seminar

Resources • Microsoft SQL Server 2008 case studies • http://www.microsoft.com/sqlserver/2008/en/us/case-studies.aspx • SQL Server best practices • http://technet.microsoft.com/en-us/sqlserver/bb671430.aspx • Dave Salch, Eric Kraemer, Umair Waheed, Paul Dyke, Fast Track Data Warehouse 2.0 Architecture, SQL Server Technical Article, MSDN, Nov. 2009. • SQL Server Customer Advisory Team • http://sqlcat.com/Default.aspx • Research on balanced HW architectures • Yogesh Simmhan, et. al. GrayWulf: Scalable Software Architecture for Data Intensive Computing, HICSS pp.1-10, 42nd Hawaii International Conference on System Sciences, 2009 • Alexander S. Szalay, et al., Low Power Amdahl-Balanced Blades for Data Intensive Computing, SC 2009. UCI ISG Seminar

THANKS! UCI ISG Seminar

SQL Server: A Data Platform for Large-Scale Applications

SQL Server: A Data Platform for Large-Scale Applications

Presentation Transcript

Large-Scale SQL Server Deployments for DBAs

Gather SQL Server Performance Data with PowerShell

4. SCALE-UP OF BIOREACTOR SYSTEMS

Big Data and Cloud Computing: New Wine or just New Bottles?

Planning Large-Scale Microsoft SQL Server 2008 R2 Enterprise Edition Deployments with Microsoft SharePoint Server 2010

Client-Server concept

Introducing SQL Server

Hacking Web Applications

Introduction to Large Scale Modeling Systems

Large-Scale Copy Detection

Thesis Defense Large -Scale Graph Computation on Just a PC

Data

GraphChi : Large-Scale Graph Computation on Just a PC

JSP – Java Server Page

From Semistructured Data to XML

From Semistructured Data to XML

Scalable Web Architectures

Week 4 The Large Scale Universe

Large Scale Studies of Dyslexia in Florida