1 / 45

Big Data Working with Terabytes in SQL Server

Big Data Working with Terabytes in SQL Server. Andrew Novick www.NovickSoftware.com. Agenda. Challenges Architecture Solutions. Introduction. Andrew Novick – Novick Software, Inc. Business Application Consulting SQL Server .Net www.NovickSoftware.com Books:

Mia_John
Download Presentation

Big Data Working with Terabytes in SQL Server

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Big DataWorking with Terabytes in SQL Server Andrew Novick www.NovickSoftware.com

  2. Agenda PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server • Challenges • Architecture • Solutions

  3. Introduction • Andrew Novick – Novick Software, Inc. • Business Application Consulting • SQL Server • .Net • www.NovickSoftware.com • Books: • Transact-SQL User-Defined Functions • SQL 2000 XML Distilled PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  4. What’s Big? • 100’s of gigabytes and up to 10’s of terabytes • 100,000,000 rows an up to 100’s of Billions of rows PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  5. Big Scenarios • Data Warehouse • Very Large OLTP databases (usually with reporting functions) PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  6. Big Hardware • Multi-core 8-64 • RAM 16 GB to 256 GB • SAN’s or direct attach RAID • 64 Bit SQL Server PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  7. Challenges

  8. Challenges PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server • Load Performance (ETL) • Query Performance • Data Management Performance

  9. How fast can you load rows into the database? PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  10. Load 1,000,000 rows into a 400,000,000 million row table that has 12 indexes? 12 Hours PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  11. Load 1,000,000 rows into an empty table and add 12 indexes? 5 Minutes PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  12. How do you speed up queries on1,000,000,000 row tables? PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  13. Challenge - Backup • Let’s say you have a 10 TB database. • Now back that up. PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  14. Backup Calculation • 10 TB = 10000 GB • Typical Backup speed - 1 to 20 GB / Min • At 10 GB/Minute Who as 16 hours to spare? Who’s got 1000 minutes? PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  15. Architecture What do we have to work with?

  16. Logical Disk System – Windows Drives Drive C: Drive D: Drive E: Physical IO - subsystem Disk Disk Disk Disk Disk Disk SQL Server Storage Architecture SQL Server Storage Table1 Table2 FileGroupA FileGroupB FileA1 FileB1 FileB2 PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  17. Solutions

  18. Solutions • Use Multiple FileGroups/Files • INSERT into empty unindexed tables • Partitioned Tables and/or Views • Use READ_ONLY FileGroups PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  19. At 3 PM on the 1st of the month:Where do you want your data to be? PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  20. Spread to as many disks as possible PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  21. I/O Performance • Little has changed in 50 years • Size for Performance Not for Space • My app needs 1500 reads/sec and 800 writes/sec I/O throughput is a function of the number of disk drives. PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  22. Solution: Load Performance PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server • Insert into empty tables • Index and add foreign keys after the insert • Add the Slices to • Partitioned Views • Partitioned Tables

  23. Partitioning PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  24. Partitioned Views CREATE VIEW Fact AS SELECT * FROM Fact_20080405 UNION ALL SELECT * FROM Fact_20080406 PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server Created like any view

  25. Partitioned Views: Check Constraints • Check constraints tell SQL Server which data is in which table ALTER TABLE Fact_20080405 ADD CONSTRAINT CK_FACT_20080405_Date CHECK (FactDate >= ‘2008-04-05’ and FactDate < ‘2008-04-06’) PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  26. Partitioned View - 2 • Looks to a query like any table or view SELECT FactDate, ….. FROM Fact WHERE CustID=334343 AND FactDate = ‘2008-04-05’ PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  27. Logical Disk System – Windows Drives Drive C: Drive D: Drive E: Physical IO - subsystem Disk Disk Disk Disk Disk Disk Partitioned View SQL Server Storage View Fact Table1 Table2 Fact_20080330 Fact_20080401 Fact_20080401 Fact_20080331 FGF1 FGF1 FGF2 FGF2 FGF3 FGF3 FGF4 FGF4 FileGroupA FileGroupB FileA1 FileB1 FileB2 F1 F1 F2 F2 F3 F3 F4 F4 PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  28. Partition Elimination • The query compiler can eliminate partitions from consideration in the plan • Partition elimination happens at query compile time. • It is often necessary to make partition values string constants. PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  29. Demo 1 – Partitioned Views PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  30. Partitioned Tables PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server SQL Server Enterprise/2005 Require a non-null partitioning column Check constraints tell SQL Server what data is in each parturition All tables are partitioned!

  31. Partitioned Function • Defines how to split data CREATE PARTITION FUNCTION Fact_PF(smalldatetime) RANGE RIGHT FOR VALUES (‘2001-07-01’, ‘2001-07-02’) PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  32. Partition Scheme • Defines where to store each range of data CREATE PARTITION SCHEME Fact_PS AS PARTITION Fact_pf TO (PRIMARY, FG_20010701, FG_20010702) PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  33. Creating a Partitioned Table • The table is created ON thePartitioned Scheme CREATE TABLE Fact (Fact_Date smalldatetime , all my other columns) ON Fact_PS (Fact_Date) PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  34. Logical Disk System – Windows Drives Drive C: Drive D: Drive E: Physical IO - subsystem Disk Disk Disk Disk Disk Disk Partitioned Table SQL Server Storage Table Fact Table1 Table2 Fact.$Partition=1 Fact.$Partition=3 Fact.$Partition=4 Fact.$Partition=2 FileGroupA FileGroupB FGF1 FGF2 FGF3 FGF4 FileA1 FileB1 FileB2 F1 F2 F3 F4 PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  35. Demo 2 – Partitioned Tables PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  36. Partitioning Goals PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server • Adequate Import Speed • Maximize Query Performance • Make use of all available resources • Data Management • Migrate data to cheaper resources • Delete old data easily

  37. Achieving Query Speed PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server • Eliminate partitions during query compile • All disk resources should be used • Spread Data to use all drives • Parallelize by querying multiple partitions • All available memory should be used • All available CPUs should be used

  38. Issues with Partitioning • No foreign keys can reference the Partitioned Table • Identity columns must be more closely managed. • UPDATES on partitioned tables with part of the table in READ_ONLY filegroups must have partition elimination that restricts the updates to READ_WRITE filegroups. • INSERTs into partitioned views require all columns and face additional restrictions. PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  39. Solution: Backup Performance • Backup less! • Maintain data in a READ_ONLY state • Compress Backups PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  40. Read_Only FileGroups • Requires only one Backup after becoming read_only • Don’t require page or row locks • Don’t require maintenance • The ALTER requires exclusive access to the database before SQL 2008 ALTER DATABASE <database> MODIFY FILEGROUP <filegroup> SET READ_ONLY PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  41. Partial Backup • Partial Base • Backs up read_write filegroups • Partial Differential • Differential backup of read_write filegroups BACKUP DATABASE <db name> READ_WRITE_FILEGROUPS ….. BACKUP DATABASE <db name> READ_WRITE_FILEGROUPS WITH DIFFERENTIAL …. PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  42. Maintenance Operations • Maintain only READ_WRITE data • DBCC CHECKFILEGROUP • ALTER INDEX • REBUILD PARTITION = • REORGANIZE PARTITION = • Avoid SHRINK PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  43. Sliding Window Always There Data Temporal Data 2008-01 Temporal Data 2008-02 Temporal Data 2008-03 Temporal Data 2008-04 Temporal Data 2008-05 PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  44. SQL Server 2008 – What’s New • Row, page, and backup compression • Filtered Indexes • Optimization for star joins • Lock Escalation to the Partition Level • Partitioned Indexed Views • Fewer operations require exclusive access to the database PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  45. Thanks for Coming Andrew Novick • anovick@NovickSoftware.com www.NovickSoftware.com PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

More Related