
Storage



  1. The first magnetic disk storage, the IBM 305 RAMAC (two units shown), introduced in 1956. One platter is shown at top right. A RAMAC stored 5 million characters on fifty 24-inch-diameter platters. Two access arms moved up and down to select a platter, and in and out to select a track. Right: a variety of disk drives: 8”, 5.25”, 3.5”, 1.8” and 1”.

  2. Storage Anselmo Lastra

  3. Outline • Magnetic Disks • RAID • Advanced Dependability/Reliability/Availability • I/O Benchmarks, Performance and Dependability • Conclusion

  4. Disk Figure of Merit: Areal Density • Bits recorded along a track • Metric is Bits Per Inch (BPI) • Number of tracks per surface • Metric is Tracks Per Inch (TPI) • Disk metric is bit density per unit area • Metric is Bits Per Square Inch: Areal Density = BPI x TPI
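To make the formula concrete, here is a minimal sketch in Python; the BPI and TPI values are illustrative assumptions, not figures from the slides.

```python
# Areal density = linear bit density along a track (BPI) x track density (TPI).
def areal_density(bpi: float, tpi: float) -> float:
    """Return areal density in bits per square inch."""
    return bpi * tpi

# Illustrative (hypothetical) numbers: 1.5M bits/inch, 250K tracks/inch.
print(f"{areal_density(1_500_000, 250_000):.2e} bits per square inch")
```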

  5. Historical Perspective • 1956 IBM RAMAC to early 1970s Winchester • For mainframe computers, proprietary interfaces • Steady shrink in form factor: 27 in. to 14 in. • Form factor and capacity drive the market more than performance • 1970s developments • 5.25 inch floppy disk form factor • Emergence of industry-standard disk interfaces • Early 1980s: PCs and first-generation workstations • Mid 1980s: Client/server computing • Centralized storage on file server • Disk downsizing: 8 inch to 5.25 inch • Mass-market disk drives become a reality • Industry standards: SCSI, IPI, IDE • 5.25 inch to 3.5 inch drives for PCs; end of proprietary interfaces • 1990s: Laptops => 2.5 inch drives • 2000s: 1.8” used in media players (1” Microdrive didn’t do as well)

  6. Current Disks • Caches to hold recently accessed blocks • Microprocessor and command buffer to enable reordering of accesses

  7. Future Disk Size and Performance • Continued advance in capacity (60%/yr) and bandwidth (40%/yr) • Slow improvement in seek, rotation (8%/yr) • Time to read whole disk, sequentially vs. randomly (1 sector per seek):
     • 1990: 4 minutes sequentially, 6 hours randomly
     • 2000: 12 minutes sequentially, 1 week(!) randomly
     • 2006 (SCSI): 56 minutes sequentially, 3 weeks randomly
     • 2006 (SATA): 171 minutes sequentially, 7 weeks randomly
  • Cost has dropped by a factor of 100,000 since 1983
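The sequential vs. random times above follow from a simple model: capacity divided by transfer rate for a sequential scan, and number of sectors times average access time when every sector requires a seek. A rough sketch, using illustrative mid-2000s SATA-like parameters that are assumptions rather than the exact numbers behind the table:

```python
# Whole-disk read time, sequential vs. random (one sector per seek).
# All drive parameters below are illustrative assumptions.
capacity_bytes = 500e9      # 500 GB drive
bandwidth_bps  = 60e6       # 60 MB/s sustained transfer rate
avg_access_s   = 0.008      # ~8 ms average seek + rotational latency
sector_bytes   = 512

sequential_s = capacity_bytes / bandwidth_bps
random_s = (capacity_bytes / sector_bytes) * (avg_access_s + sector_bytes / bandwidth_bps)

print(f"sequential: {sequential_s / 60:.0f} minutes")
print(f"random:     {random_s / (7 * 86400):.0f} weeks")
```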

  8. Arrays of Small Disks? • Katz and Patterson asked in 1987: Can smaller disks be used to close the gap in performance between disks and CPUs? • Conventional: 4 disk designs (3.5”, 5.25”, 10”, 14”), spanning low end to high end • Disk array: 1 disk design (3.5”)

  9. Advantages of Small Form Factor Disk Drives • Low cost/MB • High MB/volume • High MB/watt • Low cost/actuator • Cost and environmental efficiencies

  10. Replace Small Number of Large Disks with Large Number of Small Disks! (1988 disks)
     • IBM 3390K: 20 GBytes, 97 cu. ft., 3 KW, 15 MB/s, 600 I/Os/s, 250 KHrs MTTF, $250K
     • IBM 3.5” 0061: 320 MBytes, 0.1 cu. ft., 11 W, 1.5 MB/s, 55 I/Os/s, 50 KHrs MTTF, $2K
     • Array of 70 small disks (x70): 23 GBytes, 11 cu. ft., 1 KW, 120 MB/s, 3900 I/Os/s, ??? Hrs MTTF, $150K
     • Versus the 3390K, the array offers 9X less volume, 3X less power, 8X the data rate, and 6X the I/O rate
  • Disk arrays have potential for large data and I/O rates, high MB per cu. ft., high MB per KW, but what about reliability?

  11. Array Reliability • Reliability of N disks = Reliability of 1 Disk ÷ N • 50,000 Hours ÷ 70 disks ≈ 700 hours • Disk system MTTF drops from 6 years to 1 month! • Arrays (without redundancy) are too unreliable to be useful! • Hot spares support reconstruction in parallel with access: very high media availability can be achieved
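A quick sketch of the arithmetic above, assuming disk failures are independent so the array's MTTF is simply the per-disk MTTF divided by the number of disks:

```python
# MTTF of an array with no redundancy: the array is down as soon as
# any one disk fails, so MTTF_array ~= MTTF_disk / N.
def array_mttf(disk_mttf_hours: float, n_disks: int) -> float:
    return disk_mttf_hours / n_disks

hours = array_mttf(50_000, 70)
print(f"{hours:.0f} hours, about {hours / (24 * 30):.1f} months")  # ~714 hours, roughly a month
```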

  12. Redundant Arrays of (Inexpensive) Disks • Files are "striped" across multiple disks • Redundancy yields high data availability • Availability: service still provided to user, even if some components failed • Disks will still fail • Contents reconstructed from data redundantly stored in the array • Capacity penalty to store redundant info • Bandwidth penalty to update redundant info

  13. RAID 0 • Performance only • No redundancy • Stripe data to get higher bandwidth • Latency not improved
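A minimal sketch of how striping spreads logical blocks round-robin across the disks; the mapping function and disk count are illustrative, not a specific product's layout.

```python
# RAID 0: logical block -> (disk index, block offset on that disk).
def raid0_map(logical_block: int, n_disks: int) -> tuple[int, int]:
    return logical_block % n_disks, logical_block // n_disks

# With 4 disks, blocks 0..7 land on disks 0,1,2,3,0,1,2,3, so a large
# sequential read streams from all four disks at once (higher bandwidth),
# while each individual access still pays the full per-disk latency.
for lb in range(8):
    disk, offset = raid0_map(lb, 4)
    print(f"logical block {lb} -> disk {disk}, offset {offset}")
```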

  14. Redundant Arrays of Inexpensive Disks RAID 1: Disk Mirroring/Shadowing • Each disk is fully duplicated onto its “mirror”, forming a recovery group • Very high availability can be achieved • Bandwidth sacrifice on write: logical write = two physical writes • Reads may be optimized (served from either copy) • Most expensive solution: 100% capacity overhead • (RAID 2 not interesting, so skip)
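A toy sketch of the mirroring idea, assuming a simple in-memory model: every logical write becomes two physical writes, while a read can be served by either copy.

```python
import random

# Toy RAID 1: two full copies of the same block space.
class Raid1:
    def __init__(self, n_blocks: int):
        self.copies = [[None] * n_blocks, [None] * n_blocks]

    def write(self, block: int, data: bytes) -> None:
        for copy in self.copies:            # 1 logical write = 2 physical writes
            copy[block] = data

    def read(self, block: int) -> bytes:
        return random.choice(self.copies)[block]   # either copy can serve the read

r = Raid1(8)
r.write(3, b"payload")
print(r.read(3))
```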

  15. Redundant Array of Inexpensive Disks RAID 3: Parity Disk • A logical record is striped across the data disks as physical records, with one parity disk P per group • P contains the sum of the other disks per stripe, mod 2 (“parity”) • If a disk fails, subtract P from the sum of the other disks to find the missing information • (Figure: a bit-level example of a logical record striped into physical records across the data disks, with the per-stripe parity on P)

  16. RAID 3 • Sum computed across the recovery group to protect against hard disk failures, stored in the P disk • Logically a single high-capacity, high-transfer-rate disk: good for large transfers • Wider arrays reduce capacity costs, but decrease availability • 33% capacity cost for parity with 3 data disks and 1 parity disk
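A minimal sketch of the parity mechanism: P is the XOR (sum mod 2) of the data blocks in a stripe, and a lost block is rebuilt by XOR-ing the parity with the surviving blocks. The one-byte "blocks" are just for illustration.

```python
from functools import reduce

# XOR a list of equal-length byte blocks together (sum mod 2, bit by bit).
def xor_blocks(blocks):
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"\x93", b"\xcd", b"\x93"]        # one tiny block per data disk
parity = xor_blocks(data)                 # contents of the parity disk P

# Disk 1 fails: reconstruct it from P and the surviving data disks.
rebuilt = xor_blocks([parity, data[0], data[2]])
assert rebuilt == data[1]
print(rebuilt)
```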

  17. Inspiration for RAID 4 • RAID 3 relies on parity disk to discover errors on Read • But every sector has an error detection field • To catch errors on read, rely on error detection field vs. the parity disk • Allows independent reads to different disks simultaneously

  18. Redundant Arrays of Inexpensive Disks RAID 4: High I/O Rate Parity • Insides of 5 disks: data blocks D0, D1, D2, ... are laid out across the disk columns in order of increasing logical disk address, one stripe per row, with each stripe's parity block P on a dedicated parity disk (D0 D1 D2 D3 P, then D4 D5 D6 D7 P, D8 D9 D10 D11 P, D12 D13 D14 D15 P, ...) • Example: small read of D0 & D5, large write of D12-D15

  19. Inspiration for RAID 5 • RAID 4 works well for small reads • Small writes (write to one disk): • Option 1: read the other data disks, create the new sum, and write it to the parity disk • Option 2: since P has the old sum, compare old data to new data and add the difference to P (2 reads, 2 writes) • Small writes are limited by the parity disk: writes to D0 and D5 must both also write to the P disk

  20. Redundant Arrays of Inexpensive Disks RAID 5: High I/O Rate Interleaved Parity • Parity blocks are interleaved (rotated) across the disk columns as logical disk addresses increase: D0 D1 D2 D3 P, then D4 D5 D6 P D7, D8 D9 P D10 D11, D12 P D13 D14 D15, P D16 D17 D18 D19, D20 D21 D22 D23 P, ... • Independent writes are possible because of the interleaved parity • Example: writes to D0 and D5 use disks 0, 1, 3, 4
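A small sketch of the rotated layout above, assuming 5 disks with parity moving one column to the left on each successive stripe; it reproduces the example that writes to D0 and D5 touch four distinct disks (0 and 4, 1 and 3), so they can proceed in parallel.

```python
N_DISKS = 5  # matches the 5-column layout in the slide

# Map data block Dk to (data disk, parity disk for its stripe),
# with parity rotating from the rightmost column leftwards.
def raid5_location(k: int) -> tuple[int, int]:
    stripe = k // (N_DISKS - 1)
    parity_disk = (N_DISKS - 1 - stripe) % N_DISKS
    data_disk = k % (N_DISKS - 1)
    if data_disk >= parity_disk:
        data_disk += 1                     # skip over the parity column
    return data_disk, parity_disk

for k in (0, 5):
    d, p = raid5_location(k)
    print(f"D{k}: data on disk {d}, parity on disk {p}")
# D0: data on disk 0, parity on disk 4
# D5: data on disk 1, parity on disk 3
```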

  21. Downside of Disk Arrays: Cost of Small Writes • RAID-5 small write algorithm: 1 logical write = 2 physical reads + 2 physical writes • To write new data D0': (1) read old data D0, (2) read old parity P, XOR the old data, new data, and old parity to form the new parity P', then (3) write new data D0' and (4) write new parity P' (D1, D2, D3 are untouched)
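A minimal sketch of the update rule behind the "2 reads + 2 writes": the new parity is the old parity XOR the old data XOR the new data, so the other data disks never need to be read.

```python
# RAID 5 small write: P' = P xor D_old xor D_new.
def new_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))

# Four one-byte data blocks plus their parity, purely for illustration.
blocks = [b"\x93", b"\xcd", b"\x0f", b"\x55"]
parity = bytes(a ^ b ^ c ^ d for a, b, c, d in zip(*blocks))

updated = b"\xaa"                                   # new contents for D0
parity = new_parity(parity, blocks[0], updated)     # read D0 + P, write D0' + P'
blocks[0] = updated

# The parity still equals the XOR of all the data blocks.
assert parity == bytes(a ^ b ^ c ^ d for a, b, c, d in zip(*blocks))
```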

  22. RAID 6: Recovering from 2 Failures • Like the standard RAID schemes, it uses redundant space based on a parity calculation per stripe • The idea is that an operator may make a mistake and swap out the wrong disk, or a second disk may fail while the first is being replaced • Since it is protecting against a double failure, it adds two check blocks per stripe of data • If there are p+1 disks total, p-1 disks hold data; assume p = 5

  23. Summary of RAID Techniques • Disk Mirroring/Shadowing (RAID 1): each disk is fully duplicated onto its "shadow"; logical write = two physical writes; 100% capacity overhead • Parity Data Bandwidth Array (RAID 3): parity computed horizontally; logically a single high data bandwidth disk • High I/O Rate Parity Array (RAID 5): interleaved parity blocks; independent reads and writes; logical write = 2 reads + 2 writes

  24. HW Failures in Real Systems: Tertiary Disks • A cluster of 20 PCs in seven 7-foot high, 19-inch wide racks with 368 8.4 GB, 7200 RPM, 3.5-inch IBM disks. The PCs are P6-200MHz with 96 MB of DRAM each.

  25. Does Hardware Fail Fast? • (Table: 4 disks that failed in Tertiary Disk) • The author says that almost all disk failures began as transient failures • The operator had to decide when to replace a disk

  26. Internet Archive • Good section in text about the Internet Archive • In 2006, over a petabyte of disk (10^15 bytes) • Growing at 20 terabytes (10^12 bytes) per month • Now says ~ 3 PB • Each PC was a 1 GHz VIA, 512 MB, dissipates 80 W • Each node had 4 500 GB drives • 40 nodes/rack • A petabyte takes 12 racks • PC cost $500, each disk $375, 40-port Ethernet switch $3000

  27. Capricorn PS now • AMD Athlon 64 (x2) • 4 SATA disks (1-4 TB) • 92 Watts/node • 40 nodes/rack • So 160 TB/rack • 24 kW/petabyte • So 576 kWh/day, ~17,000 kWh/mo • The average U.S. house used 920 kWh a month in 2006 • Best housed in KY?
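A back-of-the-envelope check of the rack and power figures above; the 4 TB-per-drive value (top of the 1-4 TB range) is my assumption, and the slide's 24 kW/petabyte and ~17,000 kWh/month are consistent with this arithmetic once rounded up.

```python
# Rack density and power per petabyte, from the node figures above.
watts_per_node = 92
nodes_per_rack = 40
tb_per_node    = 4 * 4                     # 4 SATA disks x 4 TB (assumed)

tb_per_rack   = nodes_per_rack * tb_per_node                    # 160 TB/rack
racks_per_pb  = 1000 / tb_per_rack                              # 6.25 racks
kw_per_pb     = racks_per_pb * nodes_per_rack * watts_per_node / 1000
kwh_per_day   = kw_per_pb * 24
kwh_per_month = kwh_per_day * 30

print(f"{tb_per_rack} TB/rack, {kw_per_pb:.0f} kW/PB, "
      f"{kwh_per_day:.0f} kWh/day, {kwh_per_month:.0f} kWh/month")
# 160 TB/rack, 23 kW/PB, 552 kWh/day, 16560 kWh/month
```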

  28. Drives Today • Just ordered a simple RAID enclosure (levels 0 and 1 only): $63 • Two 1 TB SATA drives: $85 each

  29. Summary • Disks: areal density now grows 30%/yr vs. 100%/yr in the 2000s • Components often fail slowly • Real systems: problems in maintenance and operation as well as in hardware and software
