1 / 22

Inside SQL Server Polybase

Inside SQL Server Polybase. Bob Ward, Principal Architect, Microsoft. https://aka.ms/bobsql https://aka.ms/bobwardms https://aka.ms/bobsqldemos. Session learning objectives. At the end of this session, you should be better able to… Understand what is Polybase and how to use it.

ikeb
Download Presentation

Inside SQL Server Polybase

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Inside SQL Server Polybase Bob Ward, Principal Architect, Microsoft https://aka.ms/bobsql https://aka.ms/bobwardms https://aka.ms/bobsqldemos

  2. Session learning objectives At the end of this session, you should be better able to… • Understand what is Polybase and how to use it. • Understand how Polybase works to know its capabilities for data virtualization • Learn how to use Polybase to build a data hub with SQL Server

  3. “It’s all about Data Virtualization” What is SQL Server Polybase? Distributed compute engine integrated with SQL Server Query data where it lives using T-SQL Distributed, scalable query performance Manual/deploy with SQL Server Auto deploy/optimize with Big Data Clusters Analytics Apps T-SQL SQL Server PolyBase external tables NoSQL ODBC Relational databases Big data Excel Cosmos DB HDFS Intelligence over all data

  4. The Journey to Data Virtualization in SQL Server 2019 Linux support is coming David Dewitt Rimma Nehme SQL Server 2019 is in Preview Some details subject to change

  5. Using Polybase in SQL Server: T-SQL EXTERNAL TABLE Only for HDFS Login and password Not simple without BDC WWI SQL And join to any other table or external table metadata Results streamed HDFS INSERT only for HDFS Data lives here Cosmos DB

  6. control and execution The SQL Server Polybase Architecture Data flow All editions EE only All editions “Compute” node “Head” node “Compute” node DW dbs DW dbs DW dbs SQL Engine SQL Engine SQL Engine Need more scale? Add compute nodes tempdb tempdb tempdb Polybase Engine Polybase Engine Polybase Engine ……... Polybase Data Movement Service Polybase Data Movement Service Polybase Data Movement Service Scan or pushdown Scale out with partitions shuffle mpdwsvc.exe Your data sources HDFS Cosmos DB

  7. DemoDive into Polybase

  8. SQL Server Polybase and Hadoop

  9. Polybase and Other Connectors \binn\Polybase\ODBC Drivers LOCATION “string” in EXTERNAL DATA SOURCE Built-in Connectors (use ODBC) sqlserver oracle teradata mongodb SQL Server, Azure SQL Database, Azure SQL Data Warehouse No client software install required Scale out with partitions MongoDB or CosmosDB (using MongoDB API) 64bit ODBC 3.0+ compliant ODBC Connector odbc You install the driver Ex. SAP HANA (HDBCODBC Driver)

  10. Polybase as a Semantic Layer

  11. External Tables vs Linked Servers * Insert into HDFS allowed

  12. SQL Server 2019: Data Virtualization Modern StockItems Legacy Suppliers Mobile App Orders WWI SQL SQL Server 2019 WideWorldimporters Accounts Receivable Customers from Acquisition Order Reviews

  13. DemoData Virtualization around the WideWorld

  14. Lessons Learned COLLATE required for character columns. Compat may be required ORACLE case sensitive for LOCATION = <instance>.<schema>.<table> LOCATION for SQL Server is <db>.<schema>.<table> EXTERNAL tables don’t support these types (they may be more) VARCHAR(MAX) GEOGRAPHY Computed Columns JSON MongoDB (CosmosDB) observations Be careful of types in your document LOCATION = <database>.<collection> Need to dive in EXTERNAL TABLE compatibility (Ex. Row Level Security)

  15. SQL Server 2019 Big Data Clusters and Polybase The Problem Customers want to integrate with Big Data and other data sources easily Polybase is not simple to setup, configure, maintain, and scale elastically Customers may not have a Hadoop cluster or want to build one Polybase “classic” (MapReduce) could be better The Solution Kubernetes and containers to deploy and scale elastically Everything pre-installed including HDFS cluster Build a control plane to help manage and monitor Enhance SQL Server to read from HDFS natively Provide a data mart for cached results Introduce Spark and Notebooks for Data scientists

  16. mpdwsvc.exe uses SQLPAL Polybase in SQL Server 2019 Big Data Clusters Control Plane • Controller Svc • Azure FSM Engine • Kibana • Grafana Customapps SQL BI Analytics • Elastic Search • InfluxDB • Configuration Store (SQL Server) Polybase head node In Linux container Cluster Polybase compute nodes In Linux containers SQL Server master instance Spark Spark Spark SQL Server SQL Server SQL Server Compute pool Compute pool Directly read from HDFS Compute pool • SQL Compute Node • SQL Compute Node • SQL Compute Node • SQL Compute Node • SQL Compute Node • … • HDFS Data Node • HDFS Data Node • HDFS Data Node “Built-in” Data Sources Storage pool Data mart External data sources • … MapReduce Not used • SQL Data Node • SQL Data Node • Storage • Storage Kubernetes pod IoT data Persistent storage Node Node Node Node Node Node Node

  17. Storage and Data Pools Data Sources CREATE EXTERNAL DATA SOURCE SqlDataPool WITH (LOCATION = 'sqldatapool://service-mssql-controller:8080/datapools/default’); CREATE EXTERNAL TABLE… WITH ( DATA_SOURCE = SqlDataPool, DISTRIBUTION = ROUND_ROBIN ); CREATE EXTERNAL DATA SOURCE SqlStoragePool WITH (LOCATION = 'sqlhdfs://service-mssql-controller:8080’); CREATE EXTERNAL TABLE … WITH ( DATA_SOURCE = SqlStoragePool, LOCATION = '/clickstream_data', FILE_FORMAT = csv_file ); Preinstalled in model of Master Instance mssql-controller REST endpoint

  18. Azure Data Studio and Data Virtualization Using PROSE for intelligent import and schema detection

  19. Session takeaways • Polybase = Data Virtualization = Reduced Need for ETL • Polybase provides distributed read scale performance • Big Data Clusters automate the deployment of Polybase • Download and try it yourself • Sign up for EAP for SQL Server 2019 Big Data Clusters

  20. Questions? aka.ms/SQLBits19

  21. https://aka.ms/bobsql https://aka.ms/bobwardms https://aka.ms/bobsqldemos Session resources SQL Server 2019 Polybase documentation SQL Server Big Data Clusters documentation Polybase demos on GitHub Azure SQL Database Elastic Query documentation Loading data into Azure SQL Data Warehouse with Polybase

More Related