ECE 6160: Advanced Computer Networks
Introduction to Storage Devices

Instructor: Dr. Xubin (Ben) He

Email: Hexb@tntech.edu

Tel: 931-372-3462

Prev…
  • Gigabit Ethernet
    • MAC
    • PHY
  • VLAN

Anatomy: Key components for a typical computer

Motivation: Who Cares About I/O?
  • CPU Performance: 60% per year
  • I/O system performance is limited by mechanical delays (disk I/O): < 10% per year (I/Os per sec)

  • Amdahl's Law: system speed-up is limited by the slowest part! (a quick numeric check follows this list)

10% IO & 10x CPU => 5x Performance (lose 50%)

10% IO & 100x CPU => 10x Performance (lose 90%)

  • I/O bottleneck:

Diminishing fraction of time in CPU

Diminishing value of faster CPUs
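
Below is a minimal sketch (not from the slides) that checks these speedup figures, assuming the I/O portion of the workload gets no speedup at all:

```python
# Amdahl's Law check for the numbers above (sketch; assumes I/O is not sped up).
def overall_speedup(io_fraction: float, cpu_speedup: float) -> float:
    return 1.0 / (io_fraction + (1.0 - io_fraction) / cpu_speedup)

print(overall_speedup(0.10, 10))    # ~5.3x  -> roughly the "5x" on the slide
print(overall_speedup(0.10, 100))   # ~9.2x  -> roughly the "10x" on the slide
```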

I/O Systems

[Figure: the processor (with cache) connects over a memory-I/O bus to main memory and to I/O controllers for graphics, disks, and the network; devices signal the processor via interrupts.]

Storage Technology Drivers
  • Driven by the prevailing computing paradigm
    • 1950s: migration from batch to on-line processing
    • 1990s: migration to ubiquitous computing
      • computers in phones, books, cars, video cameras, …
      • nationwide fiber optical network with wireless tails
  • Effects on storage industry:
    • Embedded storage
      • smaller, cheaper, more reliable, lower power
    • Data utilities
      • high capacity, hierarchically managed storage

Outline
  • Disk Basics/Disk History
  • Current Disk options
  • Disk fallacies and performance
  • FLASH
  • Tapes
  • RAID


[Figure: disk anatomy showing inner and outer tracks, sectors, head, arm, platter, and actuator.]

Disk Device Terminology
  • Several platters, with information recorded magnetically on both surfaces (usually)
  • Bits recorded in tracks, which are in turn divided into sectors (e.g., 512 Bytes)
  • Actuator moves the head (one per surface, at the end of an arm) over the track (“seek”), the surface is selected, the drive waits for the sector to rotate under the head, then reads or writes
    • “Cylinder”: all tracks under the heads
Tracks, Sectors, and Cylinders

A cylinder is the set of tracks described by all the heads (on separate platters) at a single seek position


Photo of Disk Head, Arm, Actuator

[Photo: disk assembly with 12 platters; spindle, arm, head, and actuator labeled.]

source: www.seagate.com

Disk Device Performance
  • Disk Latency = Seek Time + Rotation Time + Transfer Time + Controller Overhead
  • Seek Time? Depends on how far the arm must move (number of tracks) and the seek speed of the disk
  • Rotation Time? Depends on how fast the disk rotates and how far the sector is from the head
  • Transfer Time? Depends on the data rate (bandwidth) of the disk (bit density) and the size of the request

[Figure: disk labeled with inner/outer tracks, sector, head, controller, arm, spindle, platter, and actuator.]

Disk Device Performance
  • Average rotation time:
    • Average distance sector from head?
    • 1/2 time of a rotation
      • 10000 Revolutions Per Minute → 166.67 Rev/sec
      • 1 revolution = 1/166.67 sec ≈ 6.00 milliseconds
      • 1/2 rotation (revolution) ≈ 3.00 ms
  • Average seek time?
    • Sum all possible seek distances from all possible tracks / # possible
      • Assumes average seek distance is random
    • Disk industry standard benchmark
    • About 10ms (ATA) and 5ms (SCSI)

Data Rate: Inner vs. Outer Tracks
  • To keep things simple, disks originally kept the same number of sectors per track
    • Since the outer track is longer, it had lower bits per inch
  • Competition → decided to keep bits per inch (BPI) the same for all tracks (“constant bit density”)

→ More capacity per disk

→ More sectors per track towards the edge

→ Since the disk spins at constant speed, outer tracks have a faster data rate

  • Bandwidth on the outer track is 1.7X that of the inner track!
    • Inner track has the highest density, outer track the lowest, so not really constant
    • 2.1X length of track outer / inner, but only 1.7X bits outer / inner

Disk Performance Model /Trends
  • Capacity

+ 100%/year (2X / 1.0 yrs)

  • Transfer rate (BW)

+ 40%/year (2X / 2.0 yrs)

  • Rotation + Seek time

– 8%/ year (1/2 in 10 yrs)

  • MB/$

> 100%/year (2X / 1.0 yrs)

60GB/$300 => $5/GB =>0.5c/MB

State of the Art: Barracuda 180

Latency = Queuing Time + Controller Time + Seek Time + Rotation Time + Size / Bandwidth
(queuing, controller, seek, and rotation are per-access costs; Size / Bandwidth is the per-byte cost)

  • 181.6 GB, 3.5 inch disk
  • 12 platters, 24 surfaces
  • 24,247 cylinders
  • 7,200 RPM;
  • 7.4/8.2 ms avg. seek (r/w)
  • 64 to 35 MB/s (internal)
  • 0.1 ms controller time
  • 10.3 watts (idle)

[Figure: disk labeled with track, sector, cylinder, track buffer, arm, platter, and head.]

Q: Calculate the time to read 64 KB (128 sectors) on the Barracuda 180, assuming the sector is on an outer track.

source: www.seagate.com

Disk Performance Example
  • Calculate time to read 64 KB (128 sectors) for Barracuda 180 X using advertised performance; sector is on outer track

Disk latency = average seek time + average rotational delay + transfer time + controller overhead

= 7.4 ms + 0.5 * 1/(7200 RPM) + 64 KB / (64 MB/s) + 0.1 ms

= 7.4 ms + 0.5 / (7200 RPM / (60,000 ms/min)) + 64 KB / (64 KB/ms) + 0.1 ms

= 7.4 + 4.2 + 1.0 + 0.1 ms = 12.7 ms
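
The same arithmetic as a small Python sketch (values from the Barracuda 180 example above; the helper name is my own, not from the slides):

```python
# Disk latency = seek + average rotational delay + transfer + controller overhead.
def disk_latency_ms(seek_ms, rpm, request_kb, bandwidth_mb_per_s, controller_ms):
    rotation_ms = 0.5 * 60_000.0 / rpm             # on average, half a revolution
    transfer_ms = request_kb / bandwidth_mb_per_s  # MB/s is numerically KB/ms
    return seek_ms + rotation_ms + transfer_ms + controller_ms

# 7.4 ms seek, 7200 RPM, 64 KB request, 64 MB/s outer-track rate, 0.1 ms controller
print(disk_latency_ms(7.4, 7200, 64, 64, 0.1))     # ~12.7 ms
```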

Areal Density: linear density x track density
  • Bits recorded along a track
    • Metric is Bits Per Inch (BPI) along a track
  • Number of tracks per surface
    • Metric is Tracks Per Inch (TPI)
  • Disk Designs Brag about bit density per unit area
    • Metric is Bits Per Square Inch
    • Called Areal Density
    • Areal Density = BPI x TPI

Areal Density
  • Areal Density = BPI x TPI
  • Growth-rate slope changed from ~30%/yr to ~60%/yr around 1991

Historical Perspective
  • 1956 IBM Ramac — early 1970s Winchester
    • Developed for mainframe computers, proprietary interfaces
    • Steady shrink in form factor: 27 in. to 14 in
  • Form factor and capacity drives market, more than performance
  • 1970s: Mainframes → 14-inch diameter disks
  • 1980s: Minicomputers, Servers → 8-inch, 5.25-inch diameter
  • Late 1980s/Early 1990s: PCs, workstations
    • Mass market disk drives become a reality
      • industry standards: SCSI, IPI, IDE
    • Pizzabox PCs → 3.5-inch diameter disks
    • Laptops, notebooks → 2.5-inch disks
    • Palmtops didn’t use disks, so 1.8-inch diameter disks didn’t make it
  • 2000s:
    • 1 inch for cameras, cell phones?

Disk History

Data density (Mbit/sq. in.) and capacity of the unit shown (MBytes):

  • 1973: 1.7 Mbit/sq. in., 140 MBytes
  • 1979: 7.7 Mbit/sq. in., 2,300 MBytes

source: New York Times, 2/23/98, page C3, “Makers of disk drives crowd even more data into even smaller spaces”

Disk History

  • 1989: 63 Mbit/sq. in., 60,000 MBytes
  • 1997: 1,450 Mbit/sq. in., 2,300 MBytes
  • 1997: 3,090 Mbit/sq. in., 8,100 MBytes

source: New York Times, 2/23/98, page C3, “Makers of disk drives crowd even more data into even smaller spaces”

1 inch disk drive!
  • 2003 IBM MicroDrive:
    • 1.7” x 1.4” x 0.2”
    • 1 GB, 3600 RPM, 13.3 MB/s, 15 ms seek
    • Digital camera, PalmPC?
  • 2003 USB mini flash drive
    • 75mmx28mmx12mm, 17g
    • 256MB
    • 0.5-1MB/s
  • 2006 MicroDrive?
    • 9 GB, 50 MB/s!
      • Assuming it finds a niche in a successful product
      • Assuming past trends continue

Disk Characteristics in 2003

| | Seagate Cheetah 3146807LW | Hitachi TravelStar 7K60 | IBM Microdrive (07N5574) |
| --- | --- | --- | --- |
| Dimension | 3.5”, 1.5 lbs | 2.5”, 0.2 lbs | 1.7”, 0.6 lbs |
| Interface | Ultra 320 SCSI | ATA-100 (Ultra) | CF+ |
| Capacity | 146.8 GB | 60 GB | 1 GB |
| Data Transfer Rate | 320 MB/s | 100 MB/s | 13.3 MB/s |
| Average seek time | 4.7 ms | 10 ms | 15 ms |
| Spindle Speed | 10000 RPM | 7200 RPM | 3600 RPM |
| Buffer | 8 MB | 8 MB | 128 KB |
| Price (warehouse.com) | $899.95 | $299.95 | $319.95 |

Disk Characteristics in 2007

| | Seagate Cheetah ST3146707LW | Hitachi TravelStar 7K60 | Seagate Barracuda ES ST3750640NS |
| --- | --- | --- | --- |
| Dimension | 3.5” | 2.5”, 0.2 lbs | 3.5” |
| Interface | Ultra 320 SCSI | ATA-100 (Ultra) | Serial ATA-300 |
| Capacity | 146.8 GB | 60 GB | 750 GB |
| Data Transfer Rate | 320 MB/s | 100 MB/s | |
| Average seek time | 4.7 ms | 10 ms | |
| Spindle Speed | 10000 RPM | 7200 RPM | 7200 RPM |
| Buffer | 8 MB | 8 MB | 16 MB |
| Price (warehouse.com) | $259 | $93.99 (80 GB) | $319.95 |

Today’s Hard Drive (09/29/2008) (warehouse.com)
  • Seagate Barracuda 7200.11: internal 3.5” hard drive, 750 GB, SATA-300, 7200 rpm, 32 MB buffer, $116
  • Seagate FreeAgent Pro: external 3.5” hard drive, 500 GB, USB 2.0 / eSATA-300 / FireWire interfaces, 7200 rpm, 32 MB buffer, $100

Fallacy: Use Data Sheet “Average Seek” Time
  • Manufacturers needed standard for fair comparison (“benchmark”)
    • Calculate all seeks from all tracks, divide by number of seeks => “average”
  • A real average would be based on how data is laid out on the disk and where real applications actually seek, and then measured
    • In practice, seeks tend to be to nearby tracks, not to a random track
  • Rule of Thumb: observed average seek time is typically about 1/4 to 1/3 of quoted seek time (i.e., 3X-4X faster)
    • Barracuda 180 X avg. seek: 7.4 ms  2.5 ms

Fallacy: Use Data Sheet Transfer Rate
  • Manufacturers quote the data rate off the surface of the disk (the internal media rate)
  • Sectors contain an error detection and correction field (can be 20% of sector size) plus sector number as well as data
  • There are gaps between sectors on track
  • Rule of Thumb: disks deliver about 3/4 of internal media rate (1.3X slower) for data
  • For example, Barracuda 180X quotes 64 to 35 MB/sec internal media rate

→ 47 to 26 MB/sec external data rate (74%)

Disk Performance Example: revision
  • Calculate the time to read 64 KB for the Barracuda 180X again, this time using 1/3 of the quoted seek time and 3/4 of the internal outer-track bandwidth (12.7 ms before)

Disk latency = average seek time + average rotational delay + transfer time + controller overhead

= (1/3 * 7.4 ms) + 0.5 * 1/(7200 RPM) + 64 KB / (0.75 * 64 MB/s) + 0.1 ms

= 2.5 ms + 0.5 / (7200 RPM / (60,000 ms/min)) + 64 KB / (~47 KB/ms) + 0.1 ms

= 2.5 + 4.2 + 1.4 + 0.1 ms = 8.2 ms (64% of 12.7 ms)
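
Reusing the disk_latency_ms() sketch from the earlier example with these rules of thumb:

```python
# Rules of thumb: observed seek ~ 1/3 of quoted seek, delivered rate ~ 3/4 of internal rate.
quoted_seek_ms, internal_mb_per_s = 7.4, 64
print(disk_latency_ms(quoted_seek_ms / 3, 7200, 64, 0.75 * internal_mb_per_s, 0.1))
# ~8.1 ms, close to the 8.2 ms computed above (rounding differs slightly)
```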

Future Disk Size and Performance
  • Continued advance in capacity (60%/yr) and bandwidth (40%/yr).
  • Slow improvement in seek, rotation (8%/yr).
  • Time to read whole disk.

| Year | Sequentially | Randomly (1 sector/seek) |
| --- | --- | --- |
| 1990 | 4 minutes | 6 hours |
| 2000 | 12 minutes | 1 week (!) |

  • Does the 3.5” form factor still make sense in 5 yrs?
    • What will capacity, bandwidth, seek time, and RPM be? (a projection sketch follows this list)
    • Assume today: 80 GB, 100 MB/sec, 6 ms, 10000 RPM
      • 838 GB, 537 MB/s, 4 ms, 15000 RPM
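
A sketch that compounds the trend rates quoted earlier (+60%/yr capacity, +40%/yr bandwidth, -8%/yr seek) to reproduce, within rounding, the 5-year projection above:

```python
# Compound the rule-of-thumb trend rates for 5 years (illustrative only).
years = 5
capacity_gb = 80  * 1.60 ** years   # +60%/yr  -> ~839 GB
bw_mb_per_s = 100 * 1.40 ** years   # +40%/yr  -> ~538 MB/s
seek_ms     = 6   * 0.92 ** years   # -8%/yr   -> ~4.0 ms
print(round(capacity_gb), round(bw_mb_per_s), round(seek_ms, 1))
```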

What about FLASH?
  • Compact Flash Cards
    • Intel StrataFlash
      • 16 Mb in 1 square cm (0.6 mm thick)
    • 100,000 write/erase cycles
    • Standby current = 100 uA, write = 45 mA
    • Compact Flash: 256 MB ≈ $120, 512 MB ≈ $542
    • Transfers @ 3.5 MB/s
  • IBM Microdrive: 1 GB ≈ $370
    • Standby current = 20 mA, write = 250 mA
    • Efficiency advertised in watts/MB
  • vs. Disks
    • Nearly instant standby wake-up time
    • Random access to stored data
    • Tolerant of shock and vibration (1000 G of operating shock)

Disk Operations
  • Seek: move head to track;
  • Rotation: wait for sector under head;
  • Transfer: move data to/from disks;
  • Overhead:
    • Controller delay
    • Queuing delay

Access time = seek time + rotational delay + transfer time + overhead

Improving disk performance.
  • Use large sectors to improve bandwidth
  • Use track caches and read ahead;
    • Read entire track into on-controller cache
    • Exploit locality (improves both latency and BW)
  • Design file systems to maximize locality
    • Allocate files sequentially on disks (exploit track cache)
    • Locate similar files in same cylinder (reduce seeks)
    • Locate similar files in nearby cylinders (reduce seek distance)
  • Pack bits closer together to improve transfer rate and density.
  • Use a collection of small disks to form a large, high performance one--->disk array

Striping data across multiple disks allows parallel I/O, thus improving performance (a minimal sketch of the idea follows).
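
Here is a minimal striping sketch (purely illustrative; names and layout are my own, not from the slides): logical blocks are interleaved round-robin across the disks of an array, so consecutive blocks can be read or written in parallel.

```python
# RAID-0-style striping sketch: map a logical block to (disk index, offset on that disk).
def stripe_location(logical_block: int, num_disks: int) -> tuple[int, int]:
    return logical_block % num_disks, logical_block // num_disks

# With 4 disks, logical blocks 0..3 land on disks 0..3 and can be accessed in parallel.
print([stripe_location(b, 4) for b in range(8)])
```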

Use Arrays of Small Disks?
  • Katz and Patterson asked in 1987:
    • Can smaller disks be used to close gap in performance between disks and CPUs?

Conventional: 4 disk designs (3.5”, 5.25”, 10”, 14”), spanning low end to high end

Disk Array: 1 disk design (3.5”)

Replace Small Number of Large Disks with Large Number of Small Disks! (1988 Disks)

| | IBM 3390K | IBM 3.5” 0061 | x70 (array of 70) |
| --- | --- | --- | --- |
| Capacity | 20 GBytes | 320 MBytes | 23 GBytes |
| Volume | 97 cu. ft. | 0.1 cu. ft. | 11 cu. ft. (9X) |
| Power | 3 KW | 11 W | 1 KW (3X) |
| Data Rate | 15 MB/s | 1.5 MB/s | 110 MB/s (8X) |
| I/O Rate | 600 I/Os/s | 55 I/Os/s | 3900 I/Os/s (6X) |
| MTTF | 250 KHrs | 50 KHrs | ??? Hrs |
| Cost | $250K | $2K | $150K |

Disk Arrays have potential for large data and I/O rates, high MB per cu. ft., high MB per KW, but what about reliability?

Array Reliability
  • MTTF (Mean Time To Failure): the average time a non-repairable component operates before failing
  • Reliability of N disks = Reliability of 1 disk ÷ N (a one-line check follows this list)
    • 50,000 hours ÷ 70 disks ≈ 700 hours
  • Disk system MTTF: Drops from 6 years to 1 month!
  • Arrays without redundancy too unreliable to be useful!
  • Solution: redundancy.
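
The reliability arithmetic above as a quick sketch (assuming independent disk failures):

```python
# With independent failures, MTTF of an N-disk array ~ MTTF of one disk / N.
disk_mttf_hours, n_disks = 50_000, 70
array_mttf_hours = disk_mttf_hours / n_disks
print(array_mttf_hours, array_mttf_hours / 24 / 30)   # ~714 hours, i.e. about one month
```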

Redundant Arrays of (Inexpensive) Disks
  • Replicate data over several disks so that no data will be lost if one disk fails.
  • Redundancy yields high data availability
    • Availability: service still provided to user, even if some components failed
  • Disks will still fail
  • Contents reconstructed from data redundantly stored in the array

→ Capacity penalty to store redundant info

→ Bandwidth penalty to update redundant info

Levels of RAID
  • Original RAID paper described five categories (RAID levels 1-5). (Patterson et al, “A case for redundant arrays of inexpensive disks (RAID)”, ACM SIGMOD, 1988)
  • Disk striping with no redundancy is now called RAID 0 or JBOD (just a bunch of disks).
  • Other kinds have been proposed in literature,

Level 6 (P+Q Redundancy), Level 10, RAID53, etc.

  • Except for RAID 0, all the RAID levels trade disk capacity for reliability, and the extra reliability makes parallelism a practical way to improve performance.


[Figure: file data blocks 0-3 striped one per disk across disks 0-3.]

RAID 0: Nonredundant (JBOD)
  • High I/O performance.
    • Data is not saved redundantly.
    • Single copy of data is striped across multiple disks.
  • Low cost.
    • Lack of redundancy.
  • Least reliable: single disk failure leads to data loss.

RAID 0: Striped Disk Array without Fault Tolerance

RAID Level 0 requires a minimum of 2 drives to implement

Redundant Arrays of Inexpensive Disks
RAID 1: Disk Mirroring/Shadowing

[Figure: each disk mirrored onto a shadow copy, forming a recovery group.]

  • Each disk is fully duplicated onto its “mirror”
    • Very high availability can be achieved
  • Bandwidth sacrifice on write: one logical write = two physical writes
  • Reads may be optimized: serve each read from the copy with the shorter queue and seek time
  • Most expensive solution: 100% capacity overhead

Targeted for high I/O rate, high availability environments


RAID 1: Mirroring and Duplexing

For highest performance, the controller must be able to perform two concurrent separate reads per mirrored pair, or two duplicate writes per mirrored pair.

RAID Level 1 requires a minimum of 2 drives to implement


[Figure: RAID 4 layout on five disks: four data disks plus a dedicated parity disk. Each stripe holds four data blocks and their parity, e.g. blocks 0-3 with P(0-3), blocks 4-7 with P(4-7), blocks 8-11 with P(8-11), and blocks 12-15 with P(12-15), with all parity blocks on the same disk.]

RAID 4: Block Interleaved Parity
  • Blocks: striping units
  • Allows parallel access by multiple I/O requests, giving high I/O rates
  • Multiple small reads are now faster than before (a small read request can be confined to a single disk)
  • Large writes (full stripe) compute the parity directly: P’ = d0’ + d1’ + d2’ + d3’
  • Small writes (e.g., a write to d0) update the parity from the old values (+ denotes bitwise XOR; a short sketch follows this list):
    P = d0 + d1 + d2 + d3
    P’ = d0’ + d1 + d2 + d3 = P + d0’ + d0
  • However, writes are still slow because the dedicated parity disk is the bottleneck.
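
A sketch of the parity arithmetic above, where the “+” in the formulas is bitwise XOR (function names are illustrative, not from the slides):

```python
# RAID 4/5 parity sketch: '+' in the slide's formulas is bitwise XOR.
from functools import reduce

def parity(blocks: list[bytes]) -> bytes:
    # Full-stripe (large write) parity: XOR of all data blocks.
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def small_write_parity(old_parity: bytes, old_block: bytes, new_block: bytes) -> bytes:
    # Small write: P' = P xor d_old xor d_new -- only the old block and old parity
    # have to be read, not the whole stripe.
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_block, new_block))

d = [bytes([0x11]), bytes([0x22]), bytes([0x33]), bytes([0x44])]
d0_new = bytes([0x55])
assert small_write_parity(parity(d), d[0], d0_new) == parity([d0_new, d[1], d[2], d[3]])
```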

RAID 4: Independent Data disks with shared Parity disk

Each entire block is written onto a data disk. Parity for same rank blocks is generated on Writes, recorded on the parity disk and checked on Reads.

RAID Level 4 requires a minimum of 3 drives to implement


[Figure: two stripes, D0 D1 D2 D3 P and D4 D5 D6 D7 P, with both parity blocks on the same disk, so writes to D0 and D5 both contend for the parity disk.]

Inspiration for RAID 5
  • RAID 4 works well for small reads
  • Small writes (write to one disk):
    • Option 1: read other data disks, create new sum and write to Parity Disk
    • Option 2: since P has old sum, compare old data to new data, add the difference to P
  • Small writes are limited by the parity disk: writes to D0 and D5 must both also write to the P disk, so the parity disk must be updated on every write!
Redundant Arrays of Inexpensive Disks
RAID 5: High I/O Rate Interleaved Parity

[Figure: RAID 5 layout across five disk columns, logical disk addresses increasing downward. Stripes: D0 D1 D2 D3 P; D4 D5 D6 P D7; D8 D9 P D10 D11; D12 P D13 D14 D15; P D16 D17 D18 D19; D20 D21 D22 D23 P; and so on. Because parity is interleaved across the disks, independent writes are possible; for example, a write to D0 and a write to D5 use disks 0, 1, 3, and 4.]

[Figure: RAID 5 block-interleaved distributed parity on five disks. Blocks 0-19 are striped across the disks, and each stripe’s parity block (P(0-3), P(4-7), P(8-11), P(12-15), P(16-19)) sits on a different disk, rotating through the array.]

RAID 5: Block Interleaved Distributed-Parity

Left Symmetric Distribution

  • Parity disk = (block number / 4) mod 5 (see the placement sketch after this list)
  • Eliminates the parity-disk bottleneck of RAID 4
  • Best small-read, large-read, and large-write performance
  • Can correct any single self-identifying failure
  • Small logical writes take two physical reads and two physical writes
  • Recovery requires reading all non-failed disks
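
Below is a placement sketch that follows the slide’s parity rule (parity disk = (block number / 4) mod 5). The data-block placement within a stripe is one simple rotation chosen for illustration; it is not necessarily the exact left-symmetric layout pictured above.

```python
# RAID 5 placement sketch: 5 disks, 4 data blocks per stripe, parity rotating
# per the rule above. Illustrative only.
DISKS, STRIPE = 5, 4

def parity_disk(logical_block: int) -> int:
    return (logical_block // STRIPE) % DISKS

def data_disk(logical_block: int) -> int:
    offset = logical_block % STRIPE
    # place data blocks on the disks that do not hold this stripe's parity
    return (offset if offset < parity_disk(logical_block) else offset + 1) % DISKS

for b in range(8):
    print(f"block {b}: data on disk {data_disk(b)}, parity on disk {parity_disk(b)}")
```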

RAID 5: Independent Data disks with distributed parity blocks

Each entire data block is written on a data disk; parity for blocks in the same rank is generated on Writes, recorded in a distributed location and checked on Reads.

RAID Level 5 requires a minimum of 3 drives to implement

Other RAIDs
  • HP AutoRAID
  • AFRAID
  • RAPID
  • SwiftRAID
  • TickerTAIP
  • SMDA
  • ……

Berkeley History: RAID-I
  • RAID-I (1989)
    • Consisted of a Sun 4/280 workstation with 128 MB of DRAM, four dual-string SCSI controllers, 28 5.25-inch SCSI disks and specialized disk striping software
  • Today RAID is a $19 billion industry; 80% of non-PC disks are sold in RAIDs
