1 / 46

Big Data Working with Terabytes in SQL Server

Big Data Working with Terabytes in SQL Server. Andrew Novick www.NovickSoftware.com. Agenda. What’s Big? Concerns ETL/Load Performance Query Performance Backup/Restore Performance Architecture Solutions. Introduction. Andrew Novick – Novick Software, Inc.

lucien
Download Presentation

Big Data Working with Terabytes in SQL Server

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Big DataWorking with Terabytes in SQL Server Andrew Novick www.NovickSoftware.com

  2. Agenda • What’s Big? • Concerns • ETL/Load Performance • Query Performance • Backup/Restore Performance • Architecture • Solutions

  3. Introduction • Andrew Novick – Novick Software, Inc. • Business Application Consulting • SQL Server • .Net • www.NovickSoftware.com • Books: • Transact-SQL UDFs • SQL 2000 XML Distilled

  4. SQL Pass 2008 • November 18-21 – Seattle

  5. What’s big?

  6. What’s Big? • 100’s of gigabytes and up to 10’s of terabytes • 100,000,000 rows an up to 100’s of Billions of rows

  7. Big Scenarios • Data Warehouse • Very Large OLTP databases (usually with reporting functions)

  8. Big Hardware • Multi-core 8-64 • RAM 16 GB to 256 GB • SAN’s or direct attach RAID • 64 Bit SQL Server

  9. Concerns

  10. What me worry?

  11. Concerns • Load Speed (ETL) • Query Speed • Data Management • Backup / Restore • DBCC CHECKDB, remove Fragmentation

  12. Architecture What do we have to work with?

  13. Logical Disk System – Windows Drives Drive C: Drive D: Drive E: Physical IO - subsystem Disk Disk Disk Disk Disk Disk SQL Server Storage Architecture SQL Server Storage Table1 Table2 FileGroupA FileGroupB FileA1 FileB1 FileB2

  14. Solutions

  15. Solution to what? • Load Speed (ETL) • Query Speed • Data Management • Backup / Restore • DBCC CHECKDB, remove Fragmentation

  16. Solutions • Use Multiple FileGroups/Files • Spread Data to maximize resource use • Sliding Window if there is a time dimension • Partitioned Tables and/or Views • ETL – Insert into empty unindexed tables • Use READ_ONLY FileGroups to minimize maintenance needs.

  17. I/O Performance • Little has changed in 50 years • Watch out for bottlenecks in the I/O Path • Memory reduces the need for I/O Disks can only do so many I/O operations per second The more disk heads you have the higher the I/O throughput.

  18. At 3 PM on the 1st of the month:Where do you want your data to be?

  19. Spread to as many disk resources as possible.

  20. Sliding Window Always There Data Temporal Data 2008-01 Temporal Data 2008-02 Temporal Data 2008-03 Temporal Data 2008-04 Temporal Data 2008-05

  21. Read_Only FileGroups • Require only one Backup • Don’t require page or row locks • Don’t require maintenance • The ALTER requires exclusive access to the database before SQL 2008 ALTER DATABASE <database> MODIFY FILEGROUP <filegroup> SET READ_ONLY

  22. Concern - Load Performance (ETL) • 4 Hour maximum window for any load • Load into large indexed tables is unacceptably long. • Example: 2 million row insert into 400 million row table with 10 indexes took 12 hours.

  23. Concern – Query Performance • Users have little patience • Data warehouse Queries • Frequent small to medium to support UI • Less frequent large queries on fact tables may access 10’s of GB

  24. Partitioning

  25. Partitioned Views CREATE VIEW Fact AS SELECT * FROM Fact_20080405 UNION ALL SELECT * FROM Fact_20080406 ALTER TABLE Fact_20080405 ADD CONSTRAINT CK_FACT_20080405_Date CHECK (FactDate >= ‘2008-04-05’ and FactDate < ‘2008-04-06’ Available in SQL Server Standard Available in SQL Server 2000 Created like any view Check constraints tell SQL Server which data is in which table

  26. Partitioned View - 2 • Looks to a query like any table or view • Can take advantage of parallel execution. • Limited to 256 tables • Can cross servers (Performance Warning) SELECT FactDate, ….. FROM Fact WHERE CustID=334343 AND FactDate = ‘2008-04-05’

  27. Logical Disk System – Windows Drives Drive C: Drive D: Drive E: Physical IO - subsystem Disk Disk Disk Disk Disk Disk Partitioned View SQL Server Storage View Fact Table1 Table2 Fact_20080330 Fact_20080401 Fact_20080401 Fact_20080331 FGF1 FGF1 FGF2 FGF2 FGF3 FGF3 FGF4 FGF4 FileGroupA FileGroupB FileA1 FileB1 FileB2 F1 F1 F2 F2 F3 F3 F4 F4

  28. Partition Elimination • The query compiler can eliminate partitions from consideration in the plan • Partition elimination happens at query compile time. • Values matching the partitioning column must be constants to allow partition elimination.

  29. Demo 1 – Partitioned Views

  30. Partitioned Tables SQL Server Enterprise SQL Server 2005 and Above Require a non-null partitioning column Check constraints tell SQL Server what data is in each parturition All tables are partitioned!

  31. Partitioned Tables 2 • Partition Function • Defines how to split data • Partition Scheme • Defines where to store each range of data CREATE Partitioned View Fact_PF(smalldatetime) RANGE RIGHT FOR VALUES (‘2001-07-01’, ‘2001-07-02’) CREATE PARTITION SCHEME Fact_PF AS PARTITION Fact_pf TO(PRIMARY, FG_20010701, FG_20010702)

  32. Logical Disk System – Windows Drives Drive C: Drive D: Drive E: Physical IO - subsystem Disk Disk Disk Disk Disk Disk Partitioned Table SQL Server Storage Table Fact Table1 Table2 Fact.$Partition=1 Fact.$Partitoin=3 Fact.$Partition=4 Fact.$Partition=2 FileGroupA FileGroupB FGF1 FGF2 FGF3 FGF4 FileA1 FileB1 FileB2 F1 F2 F3 F4

  33. Demo 2 – Partitioned Tables

  34. Partitioning Goals • Adequate Import Speed • Maximize Query Performance • Make use of all available resources • Data Management • Migrate data to cheaper resources • Delete old data easily

  35. Achieving Load Speed • Insert into empty tables • Index and add foreign keys after the insert • Add the Slices to • Partitioned Views • Partitioned Tables

  36. Achieving Query Speed • Eliminate access to partitions during query compile • All disk resources should be used • Parallel access • All available memory should be used • All available CPUs should be used • Parallel query

  37. Solution • Partition at a sufficiently high grain • Spread dimension data to all useable disks • Separate Data and Index FileGroups • Multiple files per FileGroup • Spread Fact data by partition key to all useable disks • Rotate file locations to maximize dispersion

  38. Concern – Data Management (Backup) • Let’s say you have a 10 TB database. • Now back that up.

  39. Backup Calculation • 10 TB = 10000 GB • Typical Backup speed • Low end 1 GB per minute • High end 10 GB per minute • At 10 GB/Minute Who’s got 1000 minutes?

  40. Achieving Backup Performance • Backup less! • Maintain data in a READ_ONLY state • Compress Backups

  41. Partial Backup • Partial Base • Backs up read_write filegroups • Partial Differential • Differential backup of read_write filegroups BACKUP DATABASE <db name> READ_WRITE_FILEGROUPS ….. BACKUP DATABASE <db name> READ_WRITE_FILEGROUPS WITH DIFFERENTIAL ….

  42. Maintenance Operations • Maintain only READ_WRITE data • DBCC CHECKFILEGROUP • ALTER INDEX • REBUILD PARTITION = • REORGANIZE PARTITION = • Avoid SHRINK

  43. SQL Server 2008 – What’s New • Row, page, and backup compression • Filtered Indexes • Optimization for star joins • MERGE T-SQL DML • Resource Governor • Fewer operations require exclusive access to the database

  44. New England Visual Basic Pro • Focused on VB.Net development • Meetings @ MS Waltham – MPR C • 1st Thursday - 6:15 to 8:30 PM • Sept 4 – Jim O’Neil – ASP.Net Dynamic Data • Sept 25 – Chris Hammond – DotNetNuke • Oct 2 – Kathleen Dollard – XML Litterals in VB 9 • Nov 6 – Joe Stagner – Stupid Hacker Tricks and How 2 Defend • Feb 5 ’09 – Joe Hill – Novell – Mono/VB/etc…. • www.NEVB.com

  45. Thanks for Coming Andrew Novick • anovick@NovickSoftware.com www.NovickSoftware.com

More Related