big data working with terabytes in sql server
Download
Skip this Video
Download Presentation
Big Data Working with Terabytes in SQL Server

Loading in 2 Seconds...

play fullscreen
1 / 45

Big Data Working with Terabytes in - PowerPoint PPT Presentation


  • 400 Views
  • Uploaded on

Big Data Working with Terabytes in SQL Server. Andrew Novick www.NovickSoftware.com. Agenda. Challenges Architecture Solutions. Introduction. Andrew Novick – Novick Software, Inc. Business Application Consulting SQL Server .Net www.NovickSoftware.com Books:

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Big Data Working with Terabytes in' - Mia_John


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
big data working with terabytes in sql server

Big DataWorking with Terabytes in SQL Server

Andrew Novick

www.NovickSoftware.com

agenda
Agenda

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  • Challenges
  • Architecture
  • Solutions
slide3

Introduction

  • Andrew Novick – Novick Software, Inc.
  • Business Application Consulting
    • SQL Server
    • .Net
  • www.NovickSoftware.com
  • Books:
    • Transact-SQL User-Defined Functions
    • SQL 2000 XML Distilled

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

what s big
What’s Big?
  • 100’s of gigabytes and up

to 10’s of terabytes

  • 100,000,000 rows an up

to 100’s of Billions of rows

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

big scenarios
Big Scenarios
  • Data Warehouse
  • Very Large OLTP databases (usually with reporting functions)

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

big hardware
Big Hardware
  • Multi-core 8-64
  • RAM 16 GB to 256 GB
  • SAN’s or direct attach RAID
  • 64 Bit SQL Server

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

challenges8
Challenges

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  • Load Performance (ETL)
  • Query Performance
  • Data Management Performance
slide9

How fast can you load

rows into the database?

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

slide10

Load 1,000,000 rows into a 400,000,000 million row table that has 12 indexes?

12 Hours

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

slide11

Load 1,000,000 rows into an empty table and add 12 indexes?

5 Minutes

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

slide12

How do you speed up queries on1,000,000,000

row tables?

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

challenge backup
Challenge - Backup
  • Let’s say you have a 10 TB database.
  • Now back that up.

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

backup calculation
Backup Calculation
  • 10 TB = 10000 GB
  • Typical Backup speed - 1 to 20 GB / Min
  • At 10 GB/Minute

Who as 16 hours to spare?

Who’s got 1000 minutes?

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

architecture

Architecture

What do we have to work with?

sql server storage architecture

Logical Disk System – Windows Drives

Drive C:

Drive D:

Drive E:

Physical IO - subsystem

Disk

Disk

Disk

Disk

Disk

Disk

SQL Server Storage Architecture

SQL Server Storage

Table1

Table2

FileGroupA

FileGroupB

FileA1

FileB1

FileB2

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

solutions18
Solutions
  • Use Multiple FileGroups/Files
  • INSERT into empty unindexed tables
  • Partitioned Tables and/or Views
  • Use READ_ONLY FileGroups

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

at 3 pm on the 1 st of the month where do you want your data to be
At 3 PM on the 1st of the month:Where do you want your data to be?

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

slide20

Spread to as many disks as possible

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

i o performance
I/O Performance
  • Little has changed in 50 years
  • Size for Performance Not for Space
    • My app needs 1500 reads/sec and 800 writes/sec

I/O throughput is a function of the number of disk drives.

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

solution load performance
Solution: Load Performance

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  • Insert into empty tables
  • Index and add foreign keys after the insert
  • Add the Slices to
    • Partitioned Views
    • Partitioned Tables
partitioning
Partitioning

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

partitioned views
Partitioned Views

CREATE VIEW Fact AS

SELECT * FROM Fact_20080405

UNION ALL SELECT * FROM Fact_20080406

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

Created like any view

partitioned views check constraints
Partitioned Views: Check Constraints
  • Check constraints tell SQL Server which data is in which table

ALTER TABLE Fact_20080405

ADD CONSTRAINT CK_FACT_20080405_Date CHECK (FactDate >= ‘2008-04-05’

and FactDate < ‘2008-04-06’)

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

partitioned view 2
Partitioned View - 2
  • Looks to a query like any table or view

SELECT FactDate, ….. FROM Fact WHERE CustID=334343

AND FactDate = ‘2008-04-05’

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

partitioned view

Logical Disk System – Windows Drives

Drive C:

Drive D:

Drive E:

Physical IO - subsystem

Disk

Disk

Disk

Disk

Disk

Disk

Partitioned View

SQL Server Storage

View Fact

Table1

Table2

Fact_20080330

Fact_20080401

Fact_20080401

Fact_20080331

FGF1

FGF1

FGF2

FGF2

FGF3

FGF3

FGF4

FGF4

FileGroupA

FileGroupB

FileA1

FileB1

FileB2

F1

F1

F2

F2

F3

F3

F4

F4

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

partition elimination
Partition Elimination
  • The query compiler can eliminate partitions from consideration in the plan
  • Partition elimination happens at query compile time.
  • It is often necessary to make partition values string constants.

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

demo 1 partitioned views
Demo 1 – Partitioned Views

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

partitioned tables
Partitioned Tables

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

SQL Server Enterprise/2005

Require a non-null partitioning column

Check constraints tell SQL Server what data is in each parturition

All tables are partitioned!

partitioned function
Partitioned Function
  • Defines how to split data

CREATE PARTITION FUNCTION

Fact_PF(smalldatetime)

RANGE RIGHT FOR VALUES

(‘2001-07-01’, ‘2001-07-02’)

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

partition scheme
Partition Scheme
  • Defines where to store each range of data

CREATE PARTITION SCHEME Fact_PS

AS PARTITION Fact_pf

TO (PRIMARY, FG_20010701, FG_20010702)

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

creating a partitioned table
Creating a Partitioned Table
  • The table is created ON thePartitioned Scheme

CREATE TABLE Fact (Fact_Date smalldatetime

, all my other columns)

ON Fact_PS (Fact_Date)

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

partitioned table

Logical Disk System – Windows Drives

Drive C:

Drive D:

Drive E:

Physical IO - subsystem

Disk

Disk

Disk

Disk

Disk

Disk

Partitioned Table

SQL Server Storage

Table Fact

Table1

Table2

Fact.$Partition=1

Fact.$Partition=3

Fact.$Partition=4

Fact.$Partition=2

FileGroupA

FileGroupB

FGF1

FGF2

FGF3

FGF4

FileA1

FileB1

FileB2

F1

F2

F3

F4

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

slide35

Demo 2 – Partitioned Tables

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

partitioning goals
Partitioning Goals

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  • Adequate Import Speed
  • Maximize Query Performance
    • Make use of all available resources
  • Data Management
    • Migrate data to cheaper resources
    • Delete old data easily
achieving query speed
Achieving Query Speed

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

  • Eliminate partitions during query compile
  • All disk resources should be used
    • Spread Data to use all drives
    • Parallelize by querying multiple partitions
  • All available memory should be used
  • All available CPUs should be used
issues with partitioning
Issues with Partitioning
  • No foreign keys can reference the Partitioned Table
  • Identity columns must be more closely managed.
  • UPDATES on partitioned tables with part of the table in READ_ONLY filegroups must have partition elimination that restricts the updates to READ_WRITE filegroups.
  • INSERTs into partitioned views require all columns and face additional restrictions.

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

solution backup performance
Solution: Backup Performance
  • Backup less!
  • Maintain data in a READ_ONLY state
  • Compress Backups

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

read only filegroups
Read_Only FileGroups
  • Requires only one Backup after becoming read_only
  • Don’t require page or row locks
  • Don’t require maintenance
  • The ALTER requires exclusive access to the database before SQL 2008

ALTER DATABASE <database> MODIFY FILEGROUP <filegroup> SET READ_ONLY

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

partial backup
Partial Backup
  • Partial Base
    • Backs up read_write filegroups
  • Partial Differential
    • Differential backup of read_write filegroups

BACKUP DATABASE <db name>

READ_WRITE_FILEGROUPS …..

BACKUP DATABASE <db name>

READ_WRITE_FILEGROUPS

WITH DIFFERENTIAL ….

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

maintenance operations
Maintenance Operations
  • Maintain only READ_WRITE data
    • DBCC CHECKFILEGROUP
    • ALTER INDEX
      • REBUILD PARTITION =
      • REORGANIZE PARTITION =
  • Avoid SHRINK

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

sliding window
Sliding Window

Always

There

Data

Temporal

Data

2008-01

Temporal

Data

2008-02

Temporal

Data

2008-03

Temporal

Data

2008-04

Temporal

Data

2008-05

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

sql server 2008 what s new
SQL Server 2008 – What’s New
  • Row, page, and backup compression
  • Filtered Indexes
  • Optimization for star joins
  • Lock Escalation to the Partition Level
  • Partitioned Indexed Views
  • Fewer operations require exclusive access to the database

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

slide45

Thanks for Coming

Andrew Novick

www.NovickSoftware.com

PASS Community Summit 2008 AD-202 Big Data: Working with Terabytes in SQL Server

ad