Big Data Working with Terabytes in SQL Server - PowerPoint PPT Presentation

Big data working with terabytes in sql server l.jpg
Download
1 / 45

Big Data Working with Terabytes in SQL Server. Andrew Novick www.NovickSoftware.com. Agenda. Challenges Architecture Solutions. Introduction. Andrew Novick – Novick Software, Inc. Business Application Consulting SQL Server .Net www.NovickSoftware.com Books:

Related searches for Big Data Working with Terabytes in SQL Server

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

Download Presentation

Big Data Working with Terabytes in SQL Server

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Big data working with terabytes in sql server l.jpg

Big DataWorking with Terabytes in SQL Server

Andrew Novick

www.NovickSoftware.com


Agenda l.jpg

Agenda

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server

  • Challenges

  • Architecture

  • Solutions


Slide3 l.jpg

Introduction

  • Andrew Novick – Novick Software, Inc.

  • Business Application Consulting

    • SQL Server

    • .Net

  • www.NovickSoftware.com

  • Books:

    • Transact-SQL User-Defined Functions

    • SQL 2000 XML Distilled

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


What s big l.jpg

What’s Big?

  • 100’s of gigabytes and up

    to 10’s of terabytes

  • 100,000,000 rows an up

    to 100’s of Billions of rows

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Big scenarios l.jpg

Big Scenarios

  • Data Warehouse

  • Very Large OLTP databases (usually with reporting functions)

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Big hardware l.jpg

Big Hardware

  • Multi-core 8-64

  • RAM 16 GB to 256 GB

  • SAN’s or direct attach RAID

  • 64 Bit SQL Server

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Challenges l.jpg

Challenges


Challenges8 l.jpg

Challenges

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server

  • Load Performance (ETL)

  • Query Performance

  • Data Management Performance


Slide9 l.jpg

How fast can you load

rows into the database?

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Slide10 l.jpg

Load 1,000,000 rows into a 400,000,000 million row table that has 12 indexes?

12 Hours

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Slide11 l.jpg

Load 1,000,000 rows into an empty table and add 12 indexes?

5 Minutes

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Slide12 l.jpg

How do you speed up queries on1,000,000,000

row tables?

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Challenge backup l.jpg

Challenge - Backup

  • Let’s say you have a 10 TB database.

  • Now back that up.

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Backup calculation l.jpg

Backup Calculation

  • 10 TB = 10000 GB

  • Typical Backup speed - 1 to 20 GB / Min

  • At 10 GB/Minute

Who as 16 hours to spare?

Who’s got 1000 minutes?

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Architecture l.jpg

Architecture

What do we have to work with?


Sql server storage architecture l.jpg

Logical Disk System – Windows Drives

Drive C:

Drive D:

Drive E:

Physical IO - subsystem

Disk

Disk

Disk

Disk

Disk

Disk

SQL Server Storage Architecture

SQL Server Storage

Table1

Table2

FileGroupA

FileGroupB

FileA1

FileB1

FileB2

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Solutions l.jpg

Solutions


Solutions18 l.jpg

Solutions

  • Use Multiple FileGroups/Files

  • INSERT into empty unindexed tables

  • Partitioned Tables and/or Views

  • Use READ_ONLY FileGroups

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


At 3 pm on the 1 st of the month where do you want your data to be l.jpg

At 3 PM on the 1st of the month:Where do you want your data to be?

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Slide20 l.jpg

Spread to as many disks as possible

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


I o performance l.jpg

I/O Performance

  • Little has changed in 50 years

  • Size for Performance Not for Space

    • My app needs 1500 reads/sec and 800 writes/sec

I/O throughput is a function of the number of disk drives.

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Solution load performance l.jpg

Solution: Load Performance

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server

  • Insert into empty tables

  • Index and add foreign keys after the insert

  • Add the Slices to

    • Partitioned Views

    • Partitioned Tables


Partitioning l.jpg

Partitioning

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Partitioned views l.jpg

Partitioned Views

CREATE VIEW Fact AS

SELECT * FROM Fact_20080405

UNION ALL SELECT * FROM Fact_20080406

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server

Created like any view


Partitioned views check constraints l.jpg

Partitioned Views: Check Constraints

  • Check constraints tell SQL Server which data is in which table

ALTER TABLE Fact_20080405

ADD CONSTRAINT CK_FACT_20080405_Date CHECK (FactDate >= ‘2008-04-05’

and FactDate < ‘2008-04-06’)

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Partitioned view 2 l.jpg

Partitioned View - 2

  • Looks to a query like any table or view

SELECT FactDate, ….. FROM Fact WHERE CustID=334343

AND FactDate = ‘2008-04-05’

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Partitioned view l.jpg

Logical Disk System – Windows Drives

Drive C:

Drive D:

Drive E:

Physical IO - subsystem

Disk

Disk

Disk

Disk

Disk

Disk

Partitioned View

SQL Server Storage

View Fact

Table1

Table2

Fact_20080330

Fact_20080401

Fact_20080401

Fact_20080331

FGF1

FGF1

FGF2

FGF2

FGF3

FGF3

FGF4

FGF4

FileGroupA

FileGroupB

FileA1

FileB1

FileB2

F1

F1

F2

F2

F3

F3

F4

F4

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Partition elimination l.jpg

Partition Elimination

  • The query compiler can eliminate partitions from consideration in the plan

  • Partition elimination happens at query compile time.

  • It is often necessary to make partition values string constants.

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Demo 1 partitioned views l.jpg

Demo 1 – Partitioned Views

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Partitioned tables l.jpg

Partitioned Tables

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server

SQL Server Enterprise/2005

Require a non-null partitioning column

Check constraints tell SQL Server what data is in each parturition

All tables are partitioned!


Partitioned function l.jpg

Partitioned Function

  • Defines how to split data

CREATE PARTITION FUNCTION

Fact_PF(smalldatetime)

RANGE RIGHT FOR VALUES

(‘2001-07-01’, ‘2001-07-02’)

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Partition scheme l.jpg

Partition Scheme

  • Defines where to store each range of data

CREATE PARTITION SCHEME Fact_PS

AS PARTITION Fact_pf

TO (PRIMARY, FG_20010701, FG_20010702)

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Creating a partitioned table l.jpg

Creating a Partitioned Table

  • The table is created ON thePartitioned Scheme

CREATE TABLE Fact (Fact_Date smalldatetime

, all my other columns)

ON Fact_PS (Fact_Date)

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Partitioned table l.jpg

Logical Disk System – Windows Drives

Drive C:

Drive D:

Drive E:

Physical IO - subsystem

Disk

Disk

Disk

Disk

Disk

Disk

Partitioned Table

SQL Server Storage

Table Fact

Table1

Table2

Fact.$Partition=1

Fact.$Partition=3

Fact.$Partition=4

Fact.$Partition=2

FileGroupA

FileGroupB

FGF1

FGF2

FGF3

FGF4

FileA1

FileB1

FileB2

F1

F2

F3

F4

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Slide35 l.jpg

Demo 2 – Partitioned Tables

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Partitioning goals l.jpg

Partitioning Goals

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server

  • Adequate Import Speed

  • Maximize Query Performance

    • Make use of all available resources

  • Data Management

    • Migrate data to cheaper resources

    • Delete old data easily


Achieving query speed l.jpg

Achieving Query Speed

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server

  • Eliminate partitions during query compile

  • All disk resources should be used

    • Spread Data to use all drives

    • Parallelize by querying multiple partitions

  • All available memory should be used

  • All available CPUs should be used


Issues with partitioning l.jpg

Issues with Partitioning

  • No foreign keys can reference the Partitioned Table

  • Identity columns must be more closely managed.

  • UPDATES on partitioned tables with part of the table in READ_ONLY filegroups must have partition elimination that restricts the updates to READ_WRITE filegroups.

  • INSERTs into partitioned views require all columns and face additional restrictions.

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Solution backup performance l.jpg

Solution: Backup Performance

  • Backup less!

  • Maintain data in a READ_ONLY state

  • Compress Backups

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Read only filegroups l.jpg

Read_Only FileGroups

  • Requires only one Backup after becoming read_only

  • Don’t require page or row locks

  • Don’t require maintenance

  • The ALTER requires exclusive access to the database before SQL 2008

ALTER DATABASE <database> MODIFY FILEGROUP <filegroup> SET READ_ONLY

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Partial backup l.jpg

Partial Backup

  • Partial Base

    • Backs up read_write filegroups

  • Partial Differential

    • Differential backup of read_write filegroups

BACKUP DATABASE <db name>

READ_WRITE_FILEGROUPS …..

BACKUP DATABASE <db name>

READ_WRITE_FILEGROUPS

WITH DIFFERENTIAL ….

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Maintenance operations l.jpg

Maintenance Operations

  • Maintain only READ_WRITE data

    • DBCC CHECKFILEGROUP

    • ALTER INDEX

      • REBUILD PARTITION =

      • REORGANIZE PARTITION =

  • Avoid SHRINK

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Sliding window l.jpg

Sliding Window

Always

There

Data

Temporal

Data

2008-01

Temporal

Data

2008-02

Temporal

Data

2008-03

Temporal

Data

2008-04

Temporal

Data

2008-05

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Sql server 2008 what s new l.jpg

SQL Server 2008 – What’s New

  • Row, page, and backup compression

  • Filtered Indexes

  • Optimization for star joins

  • Lock Escalation to the Partition Level

  • Partitioned Indexed Views

  • Fewer operations require exclusive access to the database

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


Slide45 l.jpg

Thanks for Coming

Andrew Novick

  • anovick@NovickSoftware.com

    www.NovickSoftware.com

PASS Community Summit 2008AD-202Big Data: Working with Terabytes in SQL Server


  • Login