
OMSE 510: Computing Foundations 2: Disks, Buses, DRAM



  1. OMSE 510: Computing Foundations 2: Disks, Buses, DRAM Portland State University/OMSE

  2. Outline of Comp. Architecture Outline of the rest of the computer architecture section: Start with a description of computer devices, work back towards the CPU.

  3. Computer Architecture Is … the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation. (Amdahl, Blaauw, and Brooks, 1964)

  4. Today • Begin Computer Architecture • Disk Drives • The Bus • Memory

  5. Computer System (Idealized) • CPU, Memory, and Disk (behind a Disk Controller), connected by the System Bus

  6. I/O Device Examples
      Device             Behavior           Partner    Data Rate (KB/sec)
      Keyboard           Input              Human      0.01
      Mouse              Input              Human      0.02
      Line Printer       Output             Human      1.00
      Floppy disk        Storage            Machine    50.00
      Laser Printer      Output             Human      100.00
      Optical Disk       Storage            Machine    500.00
      Magnetic Disk      Storage            Machine    5,000.00
      Network-LAN        Input or Output    Machine    20 – 1,000.00
      Graphics Display   Output             Human      30,000.00

  7. A Device: The Disk Disk Drives! - e.g., your hard disk drive - Where files are physically stored - Long-term non-volatile storage device

  8. Magnetic Drum

  9. Spiral Format for Compact Disk

  10. A Device: The Disk Magnetic Disks - Your hard disk drive - Where files are physically stored - Long-term non-volatile storage device

  11. A Magnetic Disk with Three Platters

  12. Organization of a Disk Platter with a 1:2 Interleave Factor

  13. Disk Physical Characteristics • Platters • 1 to 20 platters with diameters from 1.3 to 8 inches (recording on both sides) • Tracks • 2,500 to 5,000 tracks/inch • Cylinders • all tracks at the same position on every platter • Sectors • 128-256 sectors/track, with gaps and sector-related info between them (typical sector: 256-512 bytes)

  14. Disk Physical Characteristics • Trend as of 2005: • Constant bit density (~10^5 bits/inch) • i.e., more info (sectors) on outer tracks • Strangely enough, history reverses itself • Originally, disks used constant bit density (more efficient) • Then moved to a uniform #sectors/track (simpler, allowed easier optimization) • Now returning to constant bit density • Disk capacity follows Moore’s law: doubles every 18 months

  15. Example: Seagate Barracuda • A server disk • 10 platters, hence 20 surfaces • 7500 cylinders, hence 7500 * 20 = 150,000 total tracks • 237 sectors/track (average) • 512 bytes/sector • Total capacity: 150,000 * 237 * 512 = 18,201,600,000 bytes = 18 GB
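
A quick sketch of the capacity arithmetic above, using only the numbers given on the slide:

```python
# Seagate Barracuda capacity, recomputed from the slide's figures.
surfaces          = 10 * 2    # 10 platters, recorded on both sides
cylinders         = 7500
sectors_per_track = 237       # average; outer tracks hold more sectors
bytes_per_sector  = 512

total_tracks = cylinders * surfaces                                # 150,000 tracks
capacity     = total_tracks * sectors_per_track * bytes_per_sector
print(f"{capacity:,} bytes = {capacity / 1e9:.1f} GB")             # 18,201,600,000 bytes = 18.2 GB
```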

  16. Things to consider • Addressing modes: • Computers always refer to data in blocks (512 bytes is common) • How to address blocks? • Old school: CHS (Cylinder-Head-Sector) • The computer knows how the drive is structured • New school: LBA (Logical Block Addressing) • Linear! (see the conversion sketch below)
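
To make the two schemes concrete, here is the conventional CHS-to-LBA conversion as a sketch; the head and sector counts in the example call are made-up geometry, not values from the slide:

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Classic CHS -> LBA mapping; CHS sectors are numbered from 1, LBA blocks from 0."""
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)

# Hypothetical geometry: 16 heads, 63 sectors/track.
print(chs_to_lba(cylinder=2, head=5, sector=1,
                 heads_per_cylinder=16, sectors_per_track=63))   # -> 2331
```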

  17. Disk Performance • Steps to read from disk: • CPU tells drive controller “need data from this address” • Drive decodes instruction • Move read head over desired cylinder/track (seek) • Wait for desired sector to rotate under read head • Read the data as it goes under drive head

  18. Disk Performance • Components of disk performance: • Seek time (to move the arm to the right cylinder) • Rotation time (on average ½ rotation; time to find the right sector) • Transfer time (depends on rotation time) • Disk controller time: overhead to perform an access

  19. Disk Performance • So Disk Latency = Queuing Time + Controller time + Seek time + Rotation time + Transfer time
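
A minimal sketch of this latency sum as a function; the example call plugs in roughly the Barracuda numbers from the surrounding slides, and the 4 MB/s transfer rate is an assumption:

```python
def disk_latency_ms(queuing_ms, controller_ms, seek_ms, rpm, bytes_to_move, transfer_mb_per_s):
    """Sum the five latency components from the slide; result in milliseconds."""
    rotation_ms = 0.5 * 60_000 / rpm                        # on average, half a rotation
    transfer_ms = bytes_to_move / (transfer_mb_per_s * 1e6) * 1000
    return queuing_ms + controller_ms + seek_ms + rotation_ms + transfer_ms

# Idle disk, 1 ms controller overhead, 8 ms average seek, 7200 RPM,
# one 512-byte sector at an assumed 4 MB/s.
print(f"{disk_latency_ms(0, 1, 8, 7200, 512, 4):.2f} ms")   # ~13.29 ms
```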

  20. Seek Time • From 0 (if the arm is already positioned) to a maximum of 15-20 ms • Note: this is not a linear function of distance (speedup + coast + slowdown + settle) • Even when reading tracks on the same cylinder, there is a minimal seek time (due to the tight tolerances for head positioning) • Barracuda example: average seek time = 8 ms, track-to-track seek time = 1 ms, full-disk seek = 17 ms

  21. Rotation time • Rotation time: • Seagate Barracuda: 7200 RPM • (Disks these days are 3600, 4800, 5400, 7200 up to 10800 RPM) • 7200 RPM = 120 RPS = 8.33ms per rotation • Average rotational latency = ½ worst case rotational latency = 4.17ms

  22. Transfer time • Transfer time depends on rotation time, amount of data to transfer (minimum one sector), recording density, and the disk/memory connection • These days, transfer rates are around 2 MB/s to 16 MB/s

  23. Disk Controller Overhead • Disk controller contains a microprocessor + buffer memory + possibly a cache (for disk sectors) • Overhead to perform an access (on the order of 1 ms) • Receiving orders from the CPU and interpreting them • Managing the transfer between disk and memory (e.g., managing the DMA) • Transfer rate between disk and controller is smaller than between controller and memory, hence: • Need for a buffer in the controller • This buffer might take the form of a cache (mostly for read-ahead and write-behind)

  24. Disk Time Example • Disk parameters: • Transfer size is 8 KB • Advertised average seek is 12 ms • Disk spins at 7200 RPM • Transfer rate is 4 MB/s • Controller overhead is 2 ms • Assume the disk is idle, so no queuing delay • What is the average disk time for this 8 KB transfer? avg seek + avg rot delay + transfer time + controller overhead = ____ + ____ + ____ + ____

  25. Disk Time Example • Answer: ~20 ms • But! The advertised seek time assumes no locality; real seeks are typically ¼ to ⅓ of the advertised figure • 20 ms -> ~12 ms • Locality is an effect of smart placement of data by the operating system
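
The slide's arithmetic, worked out with the numbers given on slides 24-25:

```python
seek_ms       = 12.0                       # advertised average seek
rotation_ms   = 0.5 * 60_000 / 7200        # half a rotation at 7200 RPM ~= 4.17 ms
transfer_ms   = 8_000 / 4_000_000 * 1000   # 8 KB at 4 MB/s = 2 ms
controller_ms = 2.0

print(seek_ms + rotation_ms + transfer_ms + controller_ms)       # ~20.2 ms -> "20 ms"
# With locality, seeks are roughly 1/3 of the advertised figure:
print(seek_ms / 3 + rotation_ms + transfer_ms + controller_ms)   # ~12.2 ms -> "12 ms"
```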

  26. My Disk • Hitachi Travelstar 7K100 60GB ATA-6 2.5in 7200RPM Mobile Hard Drive w/8MB Buffer
      Interface: ATA-6
      Capacity: 60 GB
      Sector size: 512 bytes
      Data heads: 3; Disks: 2
      Data buffer: 8 MB
      Rotational speed: 7,200 RPM
      Average latency: 4.2 ms
      Media transfer rate: 561 Mbits/sec
      Max. interface transfer rate: 100 MB/sec (Ultra DMA mode-5), 16.6 MB/sec (PIO mode-4)
      Command overhead: 1 ms
      Seek time: average 10 ms read / 11 ms write; track to track 1 / 1.2 ms; full stroke 18 / 19 ms
      Sectors per track: 414-792
      Max. areal density: 66 Gbits/sq. inch
      Disk to buffer data transfer: 267-629 Mb/s
      Buffer to host data transfer: 100 MB/s

  27. Some other quotes • Hard drives: • Notebook: Toshiba MK8026GAX, 80 GB, 2.5", 9.5 mm, 5400 RPM, 12 ms seek, 100 MB/s • Desktop: Seagate 250 GB, 7200 RPM, SATA II, 9-11 ms seek, buffer to host 300 MB/s, buffer to disk 93 MB/s • Server: Seagate Raptor SATA, 10000 RPM, buffer to host 150 MB/s, buffer to disk 72 MB/s

  28. Next Topic • Disk Arrays • RAID!

  29. Technology Trends • Disk capacity now doubles every 18 months; before 1990, every 36 months • Today: processing power doubles every 18 months • Today: memory size doubles every 18 months (4X/3yr) • Today: disk capacity doubles every 18 months • Disk positioning rate (seek + rotate) doubles only every ten years: the I/O gap • Caches in memory and device controllers help close the gap

  30. Manufacturing Advantages of Disk Arrays • Conventional disk product families: 4 disk designs (14", 10", 5.25", 3.5") spanning low end to high end • Disk array: 1 disk design (3.5")

  31. Small # of Large Disks → Large # of Small Disks!
                       IBM 3390 (K)    IBM 3.5" 0061    3.5" x 70
      Data Capacity    20 GBytes       320 MBytes       23 GBytes
      Volume           97 cu. ft.      0.1 cu. ft.      11 cu. ft.
      Power            3 KW            11 W             1 KW
      Data Rate        15 MB/s         1.5 MB/s         120 MB/s
      I/O Rate         600 I/Os/s      55 I/Os/s        3900 I/Os/s
      MTTF             250 KHrs        50 KHrs          ??? Hrs
      Cost             $250K           $2K              $150K
      Disk arrays have the potential for large data and I/O rates, high MB per cu. ft., and high MB per KW. Reliability?

  32. Array Reliability • Reliability of N disks = Reliability of 1 disk ÷ N • 50,000 hours ÷ 70 disks = ~700 hours • Disk system MTTF drops from 6 years to 1 month! • Arrays (without redundancy) are too unreliable to be useful! • Hot spares support reconstruction in parallel with access: very high media availability can be achieved
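
The slide's failure arithmetic as a short sketch:

```python
single_disk_mttf_hours = 50_000
n_disks = 70

# Without redundancy, the array's MTTF divides by the number of disks.
array_mttf_hours = single_disk_mttf_hours / n_disks
print(f"{array_mttf_hours:.0f} hours ~= {array_mttf_hours / (24 * 30):.1f} months")   # ~714 hours, ~1 month
```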

  33. Media Bandwidth/Latency Demands • Bandwidth requirements • High quality video • Digital data = (30 frames/s) × (640 x 480 pixels) × (24-b color/pixel) = 221 Mb/s (27.625 MB/s) • High quality audio • Digital data = (44,100 audio samples/s) × (16-b audio samples) × (2 audio channels for stereo) = 1.4 Mb/s (0.175 MB/s) • Compression reduces the bandwidth requirements considerably • Latency issues • How sensitive is your eye (ear) to variations in video (audio) rates? • How can you ensure a constant rate of delivery? • How important is synchronizing the audio and video streams? • 15 to 20 ms early to 30 to 40 ms late tolerable
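
Both bandwidth figures above, recomputed from the slide's numbers (the MB/s values just divide the bit rates by 8):

```python
video_bits_per_s = 30 * 640 * 480 * 24      # frames/s x pixels/frame x bits/pixel
audio_bits_per_s = 44_100 * 16 * 2          # samples/s x bits/sample x stereo channels

print(f"video: {video_bits_per_s / 1e6:.0f} Mb/s = {video_bits_per_s / 8e6:.2f} MB/s")  # 221 Mb/s, ~27.6 MB/s
print(f"audio: {audio_bits_per_s / 1e6:.1f} Mb/s = {audio_bits_per_s / 8e6:.3f} MB/s")  # 1.4 Mb/s, ~0.176 MB/s
```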

  34. Dependability, Reliability, Availability • Reliability – a measure of continuous service accomplishment, measured by the mean time to failure (MTTF); service interruption is measured by the mean time to repair (MTTR) • Availability – a measure of service accomplishment: Availability = MTTF/(MTTF + MTTR) • To increase MTTF, either improve the quality of the components or design the system to continue operating in the presence of faulty components • Fault avoidance: preventing fault occurrence by construction • Fault tolerance: using redundancy to correct or bypass faulty components (hardware) • Fault detection versus fault correction • Permanent faults versus transient faults
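
A sketch of the availability formula; the MTTF and MTTR values below are illustrative assumptions, not figures from the slide:

```python
def availability(mttf_hours, mttr_hours):
    """Availability = MTTF / (MTTF + MTTR), the fraction of time the service is up."""
    return mttf_hours / (mttf_hours + mttr_hours)

# Assumed values: a component with 1,000,000-hour MTTF and a 24-hour repair time.
print(f"{availability(1_000_000, 24):.6f}")    # ~0.999976
```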

  35. RAIDs: Disk Arrays (Redundant Array of Inexpensive Disks) • Arrays of small and inexpensive disks • Increase potential throughput by having many disk drives • Data is spread over multiple disks • Multiple accesses are made to several disks at a time • Reliability is lower than that of a single disk • But availability can be improved by adding redundant disks (RAID) • Lost information can be reconstructed from redundant information • MTTR: mean time to repair is on the order of hours • MTTF: mean time to failure of disks is tens of years

  36. RAID: Level 0 (No Redundancy; Striping) • (figure: sector S0 striped across four disks as S0,b0 … S0,b3, labeled by sector number and bit number) • Multiple smaller disks as opposed to one big disk • Spreading the data over multiple disks – striping – forces accesses to several disks in parallel, increasing performance • Four times the throughput for a 4-disk system • Same cost as one big disk – assuming 4 small disks cost the same as one big disk • No redundancy, so what if one disk fails? • Failure of one or more disks is more likely as the number of disks in the system increases
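
A minimal striping sketch. It assumes four disks and block-level interleaving (the slide's figure is bit-level, but the mapping idea is the same): consecutive logical blocks land on different disks and can be accessed in parallel.

```python
def stripe_location(logical_block, n_disks=4):
    """Map a logical block to (disk index, block index on that disk) under striping."""
    return logical_block % n_disks, logical_block // n_disks

print([stripe_location(b) for b in range(8)])
# [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]
# -> 4 consecutive blocks sit on 4 different disks, so they can be read in parallel.
```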

  37. RAID: Level 1 (Redundancy via Mirroring) • (figure: S0,b0 … S0,b3 on the data disks, duplicated on a second set of disks as redundant (check) data) • Uses twice as many disks as RAID 0 (e.g., 8 smaller disks, with the second set of 4 duplicating the first set) so there are always two copies of the data • Still four times the throughput • # redundant disks = # of data disks, so twice the cost of one big disk • Writes have to be made to both sets of disks, so writes would be only ½ the performance of RAID 0 • What if one disk fails? • If a disk fails, the system just goes to the “mirror” for the data

  38. RAID: Level 2 (Redundancy via ECC) • (figure: data disks 3, 5, 6, 7 holding S0,b0 … S0,b3 plus ECC disks 1, 2, 4; ECC disks 4 and 2 point to either data disk 6 or 7, but ECC disk 1 says disk 7 is okay, so disk 6 must be in error) • ECC disks contain the parity of data on a set of distinct overlapping disks • Still four times the throughput • # redundant disks = log(total # of disks), so almost twice the cost of one big disk • Writes require computing parity to write to the ECC disks • Reads require reading the ECC disks and confirming parity • Can tolerate limited disk failure, since the data can be reconstructed

  39. RAID: Level 3 (Bit-Interleaved Parity) • (figure: data bits on disks S0,b0 … S0,b3 plus one parity disk; one disk fails and its bit is reconstructed from the others) • Cost of higher availability is reduced to 1/N, where N is the number of disks in a protection group • Still four times the throughput • # redundant disks = 1 × # of protection groups • Writes require writing the new data to the data disk as well as computing the parity, meaning reading the other disks, so that the parity disk can be updated • Can tolerate limited disk failure, since the data can be reconstructed • Reads require reading all the operational data disks as well as the parity disk to calculate the missing data that was stored on the failed disk

  40. RAID: Level 4 (Block-Interleaved Parity) • (figure: data disks plus one dedicated parity disk) • Cost of higher availability is still only 1/N, but the parity is stored as blocks associated with a set of data blocks • Still four times the throughput • # redundant disks = 1 × # of protection groups • Supports “small reads” and “small writes” (reads and writes that go to just one (or a few) data disk in a protection group) • By watching which bits change when writing new information, need only to change the corresponding bits on the parity disk • The parity disk must be updated on every write, so it is a bottleneck for back-to-back writes • Can tolerate limited disk failure, since the data can be reconstructed

  41. Block Writes • RAID 3 block write: new data D0-D3 plus parity P → 5 writes involving all the disks • RAID 4 small write → 2 reads and 2 writes involving just two disks (the data disk and the parity disk)

  42. RAID: Level 5 (Distributed Block-Interleaved Parity) • Cost of higher availability still only 1/N but the parity is spread throughout all the disks so there is no single bottleneck for writes • Still four times the throughput • # redundant disks = 1 × # of protection groups • Supports “small reads” and “small writes” (reads and writes that go to just one (or a few) data disk in a protection group) • Allows multiple simultaneous writes as long as the accompanying parity blocks are not located on the same disk • Can tolerate limited disk failure, since the data can be reconstructed

  43. Problems of Disk Arrays: Block Writes • RAID-5 small write algorithm: 1 logical write = 2 physical reads + 2 physical writes • (1) Read old data D0 and (2) read old parity P; compute new parity P' = P XOR D0 XOR D0' (XOR the old data out, XOR the new data in); then (3) write new data D0' and (4) write new parity P'
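
A sketch of the parity update behind the 2-reads + 2-writes scheme; the byte values in the example are made up for illustration:

```python
def new_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """P' = P XOR D_old XOR D_new: XOR the old data out of the parity, XOR the new data in."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))

d0_old, d0_new = bytes([0b1010]), bytes([0b0111])
p_old          = bytes([0b1100])               # assumed parity of the rest of the stripe
p_new          = new_parity(d0_old, d0_new, p_old)
print(f"{p_new[0]:04b}")                       # 0001
```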

  44. Distributing Parity Blocks
      RAID 4                 RAID 5
      0  1  2  3  P0         0  1  2  3  P0
      4  5  6  7  P1         4  5  6  P1 7
      8  9  10 11 P2         8  9  P2 10 11
      12 13 14 15 P3         12 P3 13 14 15
      • By distributing parity blocks to all disks, some small writes can be performed in parallel
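
A sketch of the rotation pattern in the RAID 5 layout above, assuming a 5-disk group with the parity block shifting one disk to the left each stripe (one common placement, chosen here to match the figure):

```python
def parity_disk(stripe, n_disks=5):
    """Disk index holding the parity block for a given stripe (rotates every stripe)."""
    return n_disks - 1 - (stripe % n_disks)

print([parity_disk(s) for s in range(4)])   # [4, 3, 2, 1] -- P0..P3 as in the layout above
```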

  45. Disks Summary • Four components of disk access time: • Seek time: advertised to be 3 to 14 ms, but lower in real systems • Rotational latency: 5.6 ms at 5400 RPM and 2.0 ms at 15000 RPM • Transfer rate: 10 to 80 MB/s • Controller time: typically less than 0.2 ms • RAIDs can be used to improve availability • RAID 0 and RAID 5 – widely used in servers; one estimate is that 80% of disks in servers are RAIDs • RAID 1 (mirroring) – EMC, Tandem, IBM • RAID 3 – Storage Concepts • RAID 4 – Network Appliance • RAIDs have enough redundancy to allow continuous operation

  46. Computer System (Idealized) • CPU, Memory, and Disk (behind a Disk Controller), connected by the System Bus

  47. Next Topic • Buses

  48. What is a bus? • (figure: the five classic components – processor (control + datapath), memory, input, output – connected by a bus) • A bus is: a shared communication link; a single set of wires used to connect multiple subsystems • A bus is also a fundamental tool for composing large, complex systems: a systematic means of abstraction

  49. Bridge-Based Bus Architecture • Bridging with dual Pentium II Xeon processors on Slot 2. • (Source: http://www.intel.com.)

  50. Buses
