


Storage Performance on SQL Server

Joe Chang



Coverage

  • Emphasis is on Line of Business DB

    • Different priorities for less critical apps

  • Performance

    • Fault-tolerance covered elsewhere



Overview

  • IO Performance Objectives

  • The Complete Storage Environment

    • Direct-Attach and SAN

  • Storage Components

  • Disk Performance

  • SQL Server IO Characteristics

  • Configuration Examples

  • SSD



Old Rules

  • Meet transaction throughput

  • Disk Performance Criteria

    • Read from Data, Write to Logs

  • Separate Data and Log files?

  • Disk Queue Depth < 2 per disk

    • Prevalent use of SAN

      • LUNs with unknown number of disks

    • Latency (Avg Disk Sec/Read)



Storage Performance Criteria

  • SELECT (Read) Query

    • Data must be read into buffer cache if not already in cache – read from data

  • INSERT/UPDATE/DELETE (Write) Query

    • Data must be read into buffer cache

    • Transaction must be written to log

    • Buffer is marked as dirty, lazy writer handles

  • Large Query (as necessary)

    • Write and Read to tempdb



Flashback: 1994 - 2009

  • 1994: Pentium 100MHz

    • 64MB, 4 x 16MB SIMM ($700+ each?)

    • OS + DB executable ~ 16-24MB

    • Net: 40MB Buffer cache

      • Difficult to support transactions

      • Reports run on 1st of month

  • Today: 4 x Quad Core

    • 128GB, 32 x 4GB, $4800

    • 3000 X increase in buffer cache



Requirements Then and Now

  • Old: Support transactions

    • No longer really an issue for most environments (after proper SQL tuning!)

  • Today: Minimize disruptions to transactions

    • Large query or table scan while supporting transactions

    • Checkpoint – write dirty buffers to data

    • Transaction Log backup

    • Backup & Restore



Cost versus Value/Requirements

  • Money is no object:

    • With sufficient number of disks, IO channels, proper configuration

    • It is possible to avoid most disruptions

  • Otherwise – Manage IO disruptions

    • Establish tolerable disruptions: 5-30 seconds?

    • Large reports run off-hours

    • Configure sufficient performance to handle transient events


Complete Storage Environment: Direct Attach and SAN



Most Common Mistakes

  • Storage sized only to capacity requirements

  • Only 2 HBAs (or RAID controllers)

  • Too few big-capacity disk drives

Instead: fill the system's PCI-E slots with controllers, and use many small 15K drives (146GB 3.5in or 73GB 2.5in).


Direct Attach

[Diagram: server CPUs connect through IO hubs to multiple PCI-E slots, each holding an HBA with SAS ports out to disk enclosures.]

System IO capability is distributed across multiple PCI-E slots.

A single controller does not have sufficient IO.

A single (or even dual) SAS/FC port does not have sufficient IO.

Distribute IO over multiple:

  • PCI-E channels

  • Controllers (SAS or FC)

  • Dual-port SAS or FC

  • Disk Array Enclosures (DAE)

Do not daisy chain (shared SAS/FC) until all channels are filled!

SAN

[Diagram: a server's PCI-E slots hold FC HBAs; FC links run to the SAN's service processors SP A and SP B, with more FC ports fanning out to the storage back end.]

A SAN is really a computer system (or systems), typically connected by FC to host and storage.

It can be fault-tolerant in all components and paths: HBA, cables, switches, SP, disks.

There are no special performance enhancements; expect slight degradation (excessive layers).

Write cache is mirrored between SPs.

Really important: distribute load over all front-end and back-end FC ports.


Direct Attach & SAN

  • Direct Attach

    • RAID Controller in Server

    • Fault-tolerant disks,

      • sometimes controller/path, 2-node clusters

  • SAN

    • Host Bus Adapter, (switches)

    • Service Processor

    • Full component and path fault tolerance

      • Multi-node clusters



SAN Vendor View

[Diagram: one SAN behind a switch, serving DW-BI DB, OLTP DB, Email, Web, SharePoint, and QA DB servers.]

One immensely powerful SAN serving the storage needs of all servers.

Storage consolidation: centralize management and minimize unused space.

The problem is that the SAN is not immensely powerful. What happens if a LUN for another server fails, and a restore from backup is initiated during busy hours?


Proper View

[Diagram: separate storage systems or SANs for the DW/BI, OLTP, Email, SharePoint, and file servers.]

Nothing should disrupt the operation of a line-of-business server.

A Data Warehouse should not be mixed with a transaction-processing DB.

Consider multiple storage systems for very large IOPS loads instead of a single SAN.


Storage Systems

  Direct Attach:     HP MSA 60, Dell MD 1000
  DA High Density:   HP MSA 50, 70; Dell MD 1120
  SAN Entry:         HP MSA 2000, (Dell MD 3000)
  SAN Mid-range:     EMC CLARiiON, HP EVA, NetApp FAS3100
  SAN Enterprise:    EMC DMX, Hitachi, 3PAR, FAS6000



EMC CLARiiON


[Diagram: CX4 architecture: two CPU modules, each with multi-core processors and memory, linked by an x8 CMI; IO complexes holding Fibre Channel and iSCSI modules; power supplies, SPS, and LCC.]

  • Multi-core processors, increased memory, 64-bit FLARE, up to 960 drives = up to twice the performance and scale

  • Adaptive Cooling = energy efficiency

  • High-performance Flash drives, low-power SATA II drives, Virtual Provisioning, Spin Down = capacity optimization



EMC DMX



Cache

  • If system memory is 128GB:

    • What do you expect to find in a 16GB SAN cache

    • that is not already in the buffer cache?

  • Performance benchmarks

    • Most use direct attach storage

    • With SAN: cache disabled

    • Alternative: tiny read cache, almost all to write



Complete Environment Summary

  • Server System

    • Memory Bandwidth

    • IO bandwidth, port, PCI-E slots

  • Pipes/channels from Server to Storage

  • Storage System

    • RAID controller, etc

    • Pipes to disk drives

    • Disk drives

If system memory is 128GB, what do you expect to find in the 16GB SAN cache that is not already in the buffer cache?



Storage Components



Storage Components/Interfaces

  • System IO

  • HBA and RAID Controller

    • SAS (3Gbit/s going to 6), FC (4Gbit/s to 8)

  • Storage Enclosures (DAE)

  • Disk Drives

  • SAN – Systems

  • SAN – Switches



Server Systems: PCI-E Gen 1

PCI-E Gen 1: 2.5Gbit/s per lane, bi-directional

  • Dell PowerEdge 2950 – 2 x8, 1 x4

  • Dell PowerEdge R900 – 4 x8, 3 x4 (shared)

  • HP ProLiant DL385G5p – 2 x8, 2 x4

  • HP ProLiant DL585G5 – 3 x8, 4 x4

  • HP ProLiant DL785G5 – 3 x16, 3 x8, 5 x4

Most PCI-E slots have dedicated bandwidth, some may be shared bandwidth (with expander chip)



Server Systems: PCI-E Gen 2

PCI-E Gen 2: 5.0Gb/s per lane

x4: 2 GB/sec in each direction

  • Dell PowerEdge R710 – 2 x8, 2 x4

  • Dell PowerEdge R910(?)

  • HP ProLiant DL370G6 – 2 x16, 2 x8, 6 x4

Intel 5520 chipset: 36 PCI-E Gen 2 lanes, 1 ESI (x4)

ProLiant ML/DL 370G6 has 2 5520 IOH devices
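A sketch of where these per-slot figures come from; this assumes the 8b/10b line encoding used by both PCI-E Gen 1 and Gen 2 (10 line bits carry 8 data bits):

```python
# Usable PCI-E bandwidth per direction, in GB/s.
def pcie_gb_per_sec(lanes, gen):
    raw_gbit = {1: 2.5, 2: 5.0}[gen]  # line rate per lane, Gbit/s
    data_gbit = raw_gbit * 8 / 10     # remove 8b/10b encoding overhead
    return lanes * data_gbit / 8      # bits -> bytes

print(pcie_gb_per_sec(4, 2))  # x4 Gen 2 -> 2.0 GB/s, as stated above
print(pcie_gb_per_sec(8, 1))  # x8 Gen 1 -> also 2.0 GB/s
```

A Gen 2 x4 slot therefore matches a Gen 1 x8 slot, which is why Gen 2 systems can feed the same controllers from narrower slots.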



Disk Drives

  • Rotational Speed – 7200, 10K, 15K

    • Average Rotational latency 4, 3, 2 milli-sec

  • Average Seek Time

    • 8.5, 4.7, 3.4ms (7200, 10K, 15K RPM)

    • 2.5in 15K: 2.9ms avg seek

  • Average Random Access Time

    • Rotational + Seek + Transfer + Overhead

  • Native Command Queuing
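The rotational-latency figures above follow directly from spindle speed; a minimal sketch:

```python
# Average rotational latency is the time for half a revolution:
# a random request waits, on average, half a turn for its sector.
def avg_rotational_latency_ms(rpm):
    return 0.5 * 60_000 / rpm  # 60,000 ms per minute

for rpm in (7200, 10_000, 15_000):
    print(rpm, round(avg_rotational_latency_ms(rpm), 2))
# 7200 -> 4.17, 10000 -> 3.0, 15000 -> 2.0 (the 4/3/2 ms above)
```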



Disk Interfaces

  • SATA – mostly 7200RPM

    • SATA disk can be used in SAS system

    • SATA Adapter cannot connect to SAS disk

  • SAS – 15K

    • 3.5in LFF, 2.5in SFF

    • Currently 3 Gbits/sec, next gen: 6 Gb/s

  • FC – typically in SAN

    • 4 Gbit/s, next: 8 Gbit/s


Disk Drives (3.5in, LFF)

[Platter diameters: 65mm (15K), 84mm (10K), 95mm (7200RPM).]

  • 7200RPM, 1TB: Barracuda 12: 8.5ms, 125MB/s; Barracuda LP: 95MB/s (5900RPM)

  • 10,000RPM, 5ms: end of life?

  • 15,000RPM, 3.4ms: 146, 300, 450GB; 167MB/sec

Lower-RPM drives have higher bit density and larger platters, contributing to very low $/GB.

Desktop drives are rated for 2 years at 20% duty cycle; server drives for 5 years at 100%.


Seagate Drives

[Pictured: Savvio 15K.2, Savvio 10K.3, Barracuda ES, Cheetah 15K.7]

Cheetah 3.5in LFF drives:

  15K.4   36/73/146GB     3.5/4.0ms   95MB/s?
  15K.5   73/146/300GB    3.5/4.0ms   125-73MB/s
  15K.6   146/300/450GB   3.4/3.9ms   171-112MB/s
  15K.7   300/450/600GB

Savvio 2.5in SFF drives:

  15K.1   36/72GB     2.9/3.3ms   112-79MB/s
  15K.2   73/146GB    2.9/3.3ms   160-120MB/s



Dell PowerVault

  • Dell PowerVault MD 1000 – 15 3.5in

    • $7K for 15 x 146GB 15K drives

  • Dell PowerVault MD 1120 – 24 2.5in

    • $11K for 24 x 73GB 15K



HP MSA

  • MSA 60: 12 LFF drives

  • MSA 70: 25 SFF drives



Direct Attach Cluster Capable

  • Dell PowerVault MD 3000 – 15 3.5in

    • 2 internal dual-port RAID controllers

    • $11.5K for 15 x 146G 15K drives

Listed as Direct Attach, but essentially an entry SAN



PCI-E SAS RAID Controllers

  • First Generation

    • PCI-E host interface

    • PCI-X SAS controller

    • PCI-E to PCI-X bridge

      • 800MB/sec

  • Second Generation

    • Native PCI-E to SAS

      • 1.6GB/sec in x8 PCI-E, 2 x4 SAS ports



FC HBA

  • QLogic QLE2562

    • Dual-port 8Gb/s FC, x8 PCI-E Gen 2

  • QLogic QLE2462

    • Dual-port 4Gb/s FC, x4 PCI-E Gen 1

  • QLogic QLE2464

    • Quad-port FC, x8 PCI-E Gen 1

  • Emulex LPe12002

  • Emulex LPe11002/11004



Disk Performance


Random IO Theory: Queue Depth 1

  Drive     Rotational     Avg    8KB        Total       IOPS
            latency (ms)   seek   transfer   (milli-sec)
  7200      4.17           8.5    0.06       12.7         78.6
  10K       3.0            4.7    0.07        7.77       128.7
  15K       2.0            3.4    0.05        5.45       183.6
  15K SFF   2.0            2.9    0.05        4.95       202

IO rate based on data distributed over the entire disk, accessed at random, with one IO command issued at a time. Not accounting for other delays.
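The IOPS column is simply the reciprocal of the total per-IO access time; a minimal sketch:

```python
# Queue-depth-1 random IOPS: one IO completes before the next is
# issued, so the rate is 1000 ms divided by the per-IO access time.
def random_iops(rotational_ms, seek_ms, transfer_ms):
    return 1000 / (rotational_ms + seek_ms + transfer_ms)

print(round(random_iops(4.17, 8.5, 0.06), 1))  # 7200 RPM -> 78.6
print(round(random_iops(2.0, 2.9, 0.05)))      # 15K SFF  -> 202
```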



Other Factors

Short Stroke: data is distributed over only a fraction of the entire disk, so average seek time is lower (down to the track-to-track minimum).

Command Queuing: more than one IO is issued at a time; the disk can reorder individual IO accesses, lowering the access time per IO.



8K Random IOPS vs Utilization

[Chart: IOPS for a range of queue depths and space utilizations.]



Latency versus Queue Depth

[Chart: latency versus queue depth for a range of space utilizations.]



Disk Summary

  • Frequently cited rules for random IO

    • Applies to Queue Depth 1

    • Data spread across entire disk

  • Key Factor

    • Short-stroke

    • High-Queue Depth

  • SAN

    • A complex SAN may hide short-stroke and high-queue-depth behavior



SQL Server IO Patterns



SQL Server IO

  • Transactional queries

    • Read/Write

  • Reporting / DW queries

  • Checkpoints

  • T-Log backups

  • Differential/Full backups



Transactional Query

  • Few rows involved

    • SELECT xx FROM Table WHERE Col1 = yy

  • Execution Plan has bookmark lookup or loop joins

  • IO for data not in buffer cache

    • 8KB, random

    • issued 1 at a time, serially (5ms min latency)

      • (up to around 24-26 rows)

      • Even if LUN has many disks, IO depth is 1!
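Because these lookups are issued serially, total latency, not disk count, bounds the query; a rough model using the 5 ms minimum latency noted above:

```python
# Serial key lookups at queue depth 1: each cache miss waits the full
# disk latency before the next IO is issued, regardless of LUN width.
def serial_lookup_ms(rows, latency_ms=5.0):
    return rows * latency_ms

print(serial_lookup_ms(25))    # 125.0 ms for ~25 rows
print(serial_lookup_ms(1000))  # 5000.0 ms -- why plans switch strategy
```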



Large Query

  • Plan has bookmark lookup or loop join

    • Uses Scatter-Gather IO

      • More than (approximately) 30 rows

      • Depending on Standard or Enterprise Edition

      • Multiple IO issued with one call,

    • Generates high-queue depth

      • Query for 100 rows can run faster than 20!

High row count non-clustered index seek: are key lookups really random? Build indexes with care; put only highly selective SARGs in the key.



Tempdb

  • A Large Query may need to spool intermediate results to tempdb

  • Sequence of events is:

    • Read from data

    • Write to tempdb

    • Read from tempdb (sometimes)

    • Repeat

  • Disk load is not temporally uniform!

    • Data and tempdb should share a common pool of disks/LUNs



Checkpoint

  • Dirty data buffers written to disk

  • User does not wait on data write

  • SQL Server should throttle checkpoint writes

  • But high-queue depth of writes may result in high-latency reads



Log Backup

  • Disrupts sequential log writes



Update

  • Problem in SQL Server 2000

  • UPDATE uses non-clustered index

  • Plan does not factor in key lookups

  • Execution – fetch one row at a time

    • ~5-10ms per key lookup



Storage Configuration Examples



General Strategy – Distribute IO

  • Distribute IO across multiple PCI-E slots

  • Distribute IO across multiple HBA/Controllers

  • Distribute IO across many disk drives

    • Daisy-chain DAE only after all channels are filled

  • High transaction (write) volume

    • Dedicate HBA/controller, SAN SP, disk drives for logs?



LFF or SFF disks

  • LFF 12-15 disks per enclosure

  • SFF 24-25 disks per enclosure

  • 15 disks on x4 SAS,

    • Total bandwidth: 800MB/s,

    • 53MB/s per disk

  • 24 disks on x4 SAS, 33MB/s
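The per-disk numbers above are simply the x4 SAS port bandwidth divided among the enclosure's disks:

```python
# Sequential bandwidth available per disk when an enclosure shares
# one x4 SAS port (~800 MB/s usable, per the figures above).
def mb_per_disk(port_mb_s, disks):
    return port_mb_s / disks

print(round(mb_per_disk(800, 15)))  # LFF enclosure -> ~53 MB/s per disk
print(round(mb_per_disk(800, 24)))  # SFF enclosure -> ~33 MB/s per disk
```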


Minimum for Line-of-Business

Server ($11-13K):

  • 2 x Xeon 5500 or 5400 series

  • 64-72GB memory

  • 4 SAS RAID controllers, in x4 or x8 PCI-E slots

Storage ($28K):

  • 4 x 15-disk enclosures, 60 x 146GB 15K drives

  • 12-15 disks per x4 SAS port, 800-1000MB/sec bandwidth each

  • 6TB capacity (3+1 RAID 5), for a 600GB database

  • 3GB/sec sequential, 30K IOPS short-stroke peak

SQL Server Ent License: $50K

SAN option: 2 dual-port FC HBA, EMC CLARiiON CX4-240, 4 DAE


Intermediate

Server ($25K):

  • 4 x Xeon 7400 series

  • 128GB memory

  • 4 SAS RAID controllers, in a mix of x4 and x8 PCI-E slots

Storage ($42K):

  • 6 LFF (3.5in) disk enclosures, 90 x 73GB 15K drives

  • 9TB capacity (3+1 RAID 5), for a 900GB database

  • 3GB/sec+ sequential, 45K IOPS short-stroke peak

SQL Server Ent License: $100K

Controller/DAE layout:

  • 1 DAE per controller in x4 PCI-E slots

  • 2 DAE per controller in x8 PCI-E slots; use both SAS ports, 1 DAE per x4 SAS

  • Daisy-chain DAE only for very high disk counts

SAN example: CLARiiON CX4-480, 3 dual-port HBA, 6 DAE


SFF Disks for Heavy Random IO

Server ($25K):

  • 4 x Xeon 7400 series

  • 128GB memory

  • 4 SAS RAID controllers, in a mix of x4 and x8 PCI-E slots

Storage ($66K):

  • 6 SFF (2.5in) disk enclosures, 144 x 73GB 15K drives

  • 7TB capacity (3+1 RAID 5), for a 700GB database

  • 3GB/sec+ sequential, 70K IOPS short-stroke peak

SQL Server Ent License: $100K


Really Serious DW

Server ($80K):

  • 8 x Opteron 8400 series (or Unisys, NEC, IBM)

  • 246GB memory

  • 8 SAS RAID controllers

Storage ($154K):

  • 14 SFF (2.5in) disk enclosures, 336 x 73GB 15K drives

  • 16TB capacity (3+1 RAID 5)

  • 7-9GB/sec+ sequential

  • 1.6TB database: 160K IOPS peak; 3.2TB: 130K IOPS peak

SQL Server Ent License: $200K

This needs more IO bandwidth and slots than a 4-way Xeon 7400 system with the 7300 chipset can handle.



SAN – CLARiiON example

  • Minimum (disks)

    • CX4-240, 2 dual-port FC HBA, 4 DAE

  • Intermediate (120 disks)

    • CX4-480, 4 dual-port FC HBA, 8 DAE

  • High-bandwidth DW (240 disks)

    • CX4-960, 2 quad, 4 dual-port FC HBA, 16 DAE

  • Very high random IO (480 disks)

    • CX4-960, 2 quad, 4 dual-port HBA, 32 DAE



Storage Performance Verification



What To Test

  • Sequential

  • Random low queue, high queue

  • High row count Update with nonclustered index

  • Checkpoint writes

  • Full-stroke and Short-stroke



Cache Settings

  • Read

    • Read-Ahead, Adaptive Read-Ahead, None

  • Write

    • Write Back, Write Through

  • Read – none or very small (2MB/LUN)

  • Write – Write-Back



SAN - HBA Settings

  • NumberOfRequests

    • Default – 32? Prevents multiple hosts from overloading SAN

    • Match to number of disks to control queue depth?

  • MaxSGList



SSD



SSD Types

  • DRAM – fastest, most expensive

  • NVRAM

    • SLC – more expensive per GB, higher write performance

    • MLC - low cost per GB

  • Interfaces

    • SAS

    • PCI-E (Fusion-IO, 1GB/sec, 120K IOPS+)

    • Complete SAN (Texas Memory Systems)



SSD

  • Intel X-25E, 32 & 64GB

  • Sequential Read 250MB/s, Write 170MB/s

  • Random Read: 35,000 IOPS @ 4KB

  • Random Write: 3,300 IOPS @ 4KB

    • Good but not spectacular

  • Latency: 75 us Read, 85 us Write

    • Really helpful for serial Queue Depth 1 accesses
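The latency gap matters most for exactly the serial queue-depth-1 pattern described earlier; a rough comparison using the 75 µs SSD read latency and a ~5 ms disk access:

```python
# Total wall time for serial QD1 key lookups at a given per-IO latency.
def serial_lookup_seconds(rows, latency_us):
    return rows * latency_us / 1_000_000

print(serial_lookup_seconds(10_000, 5_000))  # 15K disk ~5 ms -> 50.0 s
print(serial_lookup_seconds(10_000, 75))     # X25-E read 75 us -> 0.75 s
```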



SQL Server IO Cost Structure

  • Key Lookup, Loop Join

    • 4-5 micro-sec in-memory

    • 15-25 us for 8K read from disk + eviction

    • 45 us for 64K read due to cold cache



SSD and RAID

  • Does an SSD need to be in RAID?

  • A disk drive is fundamentally a single device

    • Motor or media failure results in loss of the drive

  • An SSD is not required to be a single device

    • Composed of an SoC that interfaces SAS to NVRAM

    • Dual SoCs plus ECC with chip-kill could make an SSD fault-tolerant



Additional Slides



Partition Alignment

  • http://blogs.msdn.com/jimmymay/default.aspx

  • Misalignment theory

  • With a 64K stripe, warm cache, 8KB IO:

    • on average, every 8 random IO accesses generate 10 actual IOs (25% gain from aligning)

  • With a 64K stripe, cold cache, 64KB IO:

    • every disk access generates 2 IOs (100% gain)



RAID Theory

Theoretical performance per drive for N drives in a RAID group

RAID 5 write: 1 read data, 1 read parity, 1 write data, 1 write parity. Write penalty is reduced if entire stripe can be written
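The write penalty above can be folded into a per-RAID-group estimate; a sketch with hypothetical per-drive IOPS figures:

```python
# Effective random-write IOPS for a RAID group, from the IO counts
# above: RAID 5 costs 4 disk IOs per write (read data, read parity,
# write data, write parity); RAID 10 costs 2 (one per mirror side).
WRITE_PENALTY = {"raid10": 2, "raid5": 4}

def write_iops(drives, raw_iops_per_drive, level):
    return drives * raw_iops_per_drive / WRITE_PENALTY[level]

print(write_iops(8, 180, "raid5"))   # 360.0
print(write_iops(8, 180, "raid10"))  # 720.0
```

Reads are unaffected by the penalty, which is why RAID 5 is acceptable for read-heavy data volumes but costly for write-heavy ones.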



EMC CLARiiON



NetApp

  • Write Anywhere File Layout (WAFL)

  • Very different characteristics

  • Overrides many standard database strategies

  • No need to defragment

    • See NetApp specific documents

    • Index rebuild to clean up unused space may still be helpful



Enterprise SAN

  • Massive cross-bar

  • RAID groups

    • RAID 5 3+1 or 7+1, RAID 10 2+2 or 4+4

  • Hyper Volume: 16GB slices from RAID group

  • LUNs created from Hyper Volumes

Theory: a massive number of disks, say 1000, can do 150K IOPS. Each server averages 10K IOPS steady, with surges to 50K, so many servers can share a large SAN.
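The sharing theory in numbers (all figures hypothetical, per the slide):

```python
# Shared-SAN arithmetic: aggregate back-end IOPS vs per-server load.
disks, iops_per_disk = 1000, 150
total_iops = disks * iops_per_disk   # 150,000 IOPS aggregate
steady, surge = 10_000, 50_000

print(total_iops // steady)          # 15 servers supportable at steady load
print(f"{surge / total_iops:.0%}")   # one surge claims 33% of the SAN
```

The weakness is the surge case: one server restoring from backup can consume a third of the aggregate, which is exactly the disruption scenario raised earlier.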



Table Scan to Disk



Low Queue Writes

Read activity drops sharply during checkpoints (4 x 15K SCSI disks).



Updates – All data in memory

Checkpoint does not slow the SQL batch; no reads are required.


HP Test System 2

[Diagram: two HP rx8620 servers, each with 16 Itanium 2 1.5GHz processors, 6 SCSI disks, and 8 x 2Gb/s FC ports, connected to HSV110 storage controllers.]

