Failure Correction Techniques for Large Disk Arrays

Garth A. Gibson, Lisa Hellerstein et al.
University of California at Berkeley

What is the problem?
• Disk arrays can increase I/O bandwidth and access parallelism
• The chance of data loss increases with the number of disks in the array
Figure 1. The mean time to data loss (MTTDL) in a single-erasure-correcting array.

Types of data failure
• Transient or noise-related errors: corrected by repeating the offending operation or by applying per-sector error-correction facilities
• Media defects: detected and masked at the factory
• Catastrophic failures: head crashes, or failures of the read/write or controller electronics

The goal of this paper
• Avoid loss of user data
• Recover from catastrophic disk failures
• Make disk arrays as reliable as an individual disk

Concept 1 -- erasure-correcting codes and error-correcting codes
• Erasure-correcting codes are designed to recover erased bits in a message word
• An unreadable bit is called an erasure
• The positions of the erased bits are known
• For a catastrophic disk failure, the bits on the failed disk can be designated as "unreadable"
• Error-correcting codes are designed to correct messages in which some of the bits may have been flipped, but the positions of those bits are unknown

Concept 2 -- Redundancy Metrics
• Disks as stacks of bits: the i-th bit of each disk forms the i-th codeword in the redundancy encoding
• Mean time to data loss (MTTDL): measure of reliability
• Check disk overhead: number of check disks / number of data disks
• Update penalty: number of check disks that must be updated when an information disk is updated
• Group size: the information and check disks that must be accessed during the reconstruction of a failed disk form a group

1d-Parity
• Single-erasure-correction scheme
• For G data disks, one check disk holds the parity of all G disks
• Overhead: 1/G
• Update penalty: 1
• Group size: G+1
(Figure: example with G = 4)
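The 1d-parity scheme above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the disk contents, the helper name `parity`, and the choice of which disk fails are all assumptions made for the example.

```python
from functools import reduce

def parity(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# G = 4 data disks, each modelled as a short byte string (illustrative data).
data = [b"\x0f\x10", b"\xf0\x01", b"\x33\x7e", b"\x55\xaa"]
check = parity(data)  # the single check disk

# Disk 2 fails. Its position is known, so this is an erasure.
failed = 2
survivors = [d for i, d in enumerate(data) if i != failed] + [check]

# XOR of the surviving G disks of the group restores the lost contents.
rebuilt = parity(survivors)
```

Because the failed position is known, a single XOR pass over the remaining members of the group is enough; no error location step is needed.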

2d-Parity
• Double-erasure-correction scheme
• G^2 data disks arranged in a 2-dimensional array
• For each row and each column, one check disk stores the parity for that row or column
• Check disk overhead: 2G/G^2 = 2/G
• Update penalty: 2
• Group size: G+1
(Figure: example with G = 4)
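A small sketch can make the 2d-parity update penalty concrete. It assumes one bit per disk and an arbitrary illustrative bit pattern; the function name `checks_2d` is hypothetical.

```python
def checks_2d(data):
    """Return (row_checks, col_checks) for a square grid of bits."""
    g = len(data)
    row_checks = [sum(data[r]) % 2 for r in range(g)]
    col_checks = [sum(data[r][c] for r in range(g)) % 2 for c in range(g)]
    return row_checks, col_checks

G = 4
# One bit from each of the G*G data disks (illustrative values only).
data = [[(3 * r + 5 * c) % 2 for c in range(G)] for r in range(G)]
rows_before, cols_before = checks_2d(data)

data[1][2] ^= 1  # update a single information disk
rows_after, cols_after = checks_2d(data)

# Count how many check disks had to change: one row check, one column check.
changed = sum(
    a != b
    for a, b in zip(rows_before + cols_before, rows_after + cols_after)
)
overhead = (2 * G) / (G * G)  # 2G check disks for G^2 data disks
```

Here `changed` comes out as 2, matching the update penalty, and `overhead` equals 2/G.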

N-dimensional parity (Nd-parity)
• N-erasure-correction scheme
• Check disk overhead: N*G^(N-1) / G^N = N/G
• Update penalty: N
• Group size: G+1
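The metrics for the three parity schemes follow one pattern, which the sketch below tabulates. The function name and dictionary keys are illustrative choices; the formulas are the ones listed on the slides.

```python
from fractions import Fraction

def nd_parity_metrics(n, g):
    """Redundancy metrics for Nd-parity with G disks along each dimension."""
    data_disks = g ** n
    check_disks = n * g ** (n - 1)
    return {
        "data_disks": data_disks,
        "check_disks": check_disks,
        "overhead": Fraction(check_disks, data_disks),  # simplifies to N/G
        "update_penalty": n,
        "group_size": g + 1,
    }

for n in (1, 2, 3):
    print(n, nd_parity_metrics(n, 4))
```

With G = 4, the overhead grows from 1/4 to 1/2 to 3/4 as N goes from 1 to 3, while the group size stays at G+1 = 5.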

Linear Codes
• Contain the original information unmodified within each codeword
• Compute the check bits of each codeword as the parity of subsets of the information bits
(Figure: Codeword = 1 1 1 1 | Parity)

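Parity Check Matrix
H = [P | I] (Fig. 4)
How to compute the check (parity) bits? Solve H*X = 0, with all arithmetic modulo 2.
First row of H = [100101 | 100] (left part from P, right part from I)
X = [111010 x1 x2 x3]
First row of H*X = 1+0+0+0+0+0+x1+0+0 = 0 (mod 2), so x1 = 1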
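The check-bit computation can be sketched as follows. Only the first row of P is taken from the slide; the second and third rows are hypothetical values chosen so that the example has three check bits, and the function names are illustrative.

```python
P = [
    [1, 0, 0, 1, 0, 1],  # first row of P, as given on the slide
    [0, 1, 0, 1, 1, 0],  # hypothetical row, for illustration only
    [0, 0, 1, 0, 1, 1],  # hypothetical row, for illustration only
]
info = [1, 1, 1, 0, 1, 0]  # information bits of X

def check_bits(P, info):
    """Each check bit is the parity of the information bits selected by a row of P."""
    return [sum(p * x for p, x in zip(row, info)) % 2 for row in P]

def syndrome(P, codeword):
    """Compute H*X mod 2 with H = [P | I]; all zeros for a valid codeword."""
    c = len(P)
    H = [row + [1 if i == j else 0 for j in range(c)] for i, row in enumerate(P)]
    return [sum(h * x for h, x in zip(row, codeword)) % 2 for row in H]

checks = check_bits(P, info)   # x1, x2, x3
codeword = info + checks
```

With these assumed rows, `checks[0]` is 1, which agrees with x1 = 1 on the slide, and `syndrome(P, codeword)` is all zeros.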

Parity Check Matrix for 1d-parity and 2d-parity Fig. 5

Properties of the parity check matrix
The following properties are equivalent; each is expressed in terms of a parameter, t, whose value is between 0 and c:
• H will allow any t erasures to be corrected
• H will allow any t errors to be detected
• The minimum number of bits in which any two codewords differ, known as the distance of the code, is at least t+1
• Any set of t columns selected from H will be linearly independent
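The last property can be checked by brute force on small matrices. The sketch below stores each column of H as an integer bitmask and uses the fact that, over GF(2), a set of columns is linearly dependent exactly when some non-empty subset of it XORs to zero. The 2d-parity layout with G = 2 (bits 0 and 1 for the row checks, bits 2 and 3 for the column checks) is an assumption made for the example.

```python
from itertools import combinations

def any_t_columns_independent(columns, t):
    """True if no non-empty subset of at most t columns XORs to zero."""
    for size in range(1, t + 1):
        for subset in combinations(columns, size):
            acc = 0
            for col in subset:
                acc ^= col
            if acc == 0:
                return False
    return True

# 1d-parity, G = 4: H = [1 1 1 1 | 1], so every column is the single bit 1.
h_1d = [1, 1, 1, 1, 1]

# 2d-parity, G = 2 (assumed layout): data disk (r, c) appears in row check r
# and column check c; each check disk appears only in its own equation.
data_cols = [(1 << r) | (1 << (2 + c)) for r in range(2) for c in range(2)]
check_cols = [1 << i for i in range(4)]
h_2d = data_cols + check_cols

print(any_t_columns_independent(h_1d, 1), any_t_columns_independent(h_1d, 2))
print(any_t_columns_independent(h_2d, 2), any_t_columns_independent(h_2d, 3))
```

Under these assumptions 1d-parity passes for t = 1 but not t = 2, and 2d-parity passes for t = 2 but not t = 3 (a data disk plus its row and column check disks form a dependent triple).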

Implementing Reconstruction
Fig. 6(a). When 4 disks fail in a 2d-parity array with 16 information disks, the controllers allow us to identify which disks need to be repaired and reconstructed.

Implementing Reconstruction cont.
Fig. 6(b). Apply "elementary row operations" (the essence of Gaussian elimination) to find a matrix M such that the product MB has the 4x4 identity matrix in its first four rows.

Elementary operations
Example:
x + y + z = 0 (1)
x - 2y + 2z = 4 (2)
x + 2y - z = 2 (3)
(3) - (1) to replace (3):
x + y + z = 0 (1)
x - 2y + 2z = 4 (2)
y - 2z = 2 (4)
(2) - (1) to replace (2):
x + y + z = 0 (1)
-3y + z = 4 (5)
y - 2z = 2 (4)
(5) + (4)*3 to replace (4):
x + y + z = 0 (1)
-3y + z = 4 (5)
-5z = 10 (6)
Result: x = 4, y = -2, z = -2
• If we interchange two equations, the new system is still equivalent to the old one.
• If we multiply an equation by a nonzero number, the new system is still equivalent to the old one.
• If we replace one equation with the sum of itself and a multiple of another equation, we obtain an equivalent system.

Gaussian Elimination
Example:
x + y + z = 0 (1)
x - 2y + 2z = 4 (2)
x + 2y - z = 2 (3)
Augmented matrix:
[ 1  1  1 |  0 ]
[ 1 -2  2 |  4 ]
[ 1  2 -1 |  2 ]
(3) - (1) to replace (3):
[ 1  1  1 |  0 ]
[ 1 -2  2 |  4 ]
[ 0  1 -2 |  2 ]
(2) - (1) to replace (2):
[ 1  1  1 |  0 ]
[ 0 -3  1 |  4 ]
[ 0  1 -2 |  2 ]
(5) + (4)*3 to replace (4):
[ 1  1  1 |  0 ]
[ 0 -3  1 |  4 ]
[ 0  0 -5 | 10 ]
Definition: apply elementary operations to the augmented matrix; at every step the new matrix is exactly the augmented matrix associated with the new, equivalent system. Once a triangular matrix is obtained, write out the associated linear system and solve it.
The resulting linear system:
x + y + z = 0
-3y + z = 4
-5z = 10
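The worked example can be reproduced with a short elimination routine. This is a minimal sketch over the rationals that assumes a square system with a unique solution; the function name `solve` is illustrative, and the row operations it picks may differ from the ones on the slide while giving the same result.

```python
from fractions import Fraction

def solve(augmented):
    """Solve a square linear system given as an augmented matrix."""
    m = [[Fraction(v) for v in row] for row in augmented]
    n = len(m)
    for col in range(n):
        # Interchange rows so that the pivot entry is nonzero.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Subtract a multiple of the pivot row from each row below it.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    # The matrix is now triangular: solve from the last equation upward.
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

solution = solve([
    [1, 1, 1, 0],
    [1, -2, 2, 4],
    [1, 2, -1, 2],
])
```

For this system `solution` holds x = 4, y = -2, z = -2, in that order.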

Implementing Reconstruction cont.
Fig. 6(c). The first 4 rows of MA describe the operations that must be performed to reconstruct our 4 disks.
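The reconstruction idea can be sketched as Gaussian elimination over GF(2), where addition is XOR. This is an illustration under stated assumptions rather than the procedure of Fig. 6: the disk numbering (information disks 0 to 15, row check disks 16 to 19, column check disks 20 to 23), the one-bit-per-disk model, the sample data, and the chosen failure pattern are all hypothetical.

```python
G = 4
N_INFO = G * G

def groups():
    """Each group lists disks whose bits XOR to zero; the check disk is last."""
    rows = [[r * G + c for c in range(G)] + [N_INFO + r] for r in range(G)]
    cols = [[r * G + c for r in range(G)] + [N_INFO + G + c] for c in range(G)]
    return rows + cols

def encode(info):
    """Append the 2G check bits of 2d-parity to the information bits."""
    disks = list(info) + [0] * (2 * G)
    for group in groups():
        disks[group[-1]] = sum(disks[d] for d in group[:-1]) % 2
    return disks

def reconstruct(disks, failed):
    """Recover the bits of the failed disks by elimination over GF(2)."""
    failed = list(failed)
    # One equation per group: coefficients of the unknowns, then the
    # right-hand side (XOR of the surviving disks in that group).
    system = []
    for group in groups():
        coeffs = [1 if f in group else 0 for f in failed]
        rhs = sum(disks[d] for d in group if d not in failed) % 2
        system.append(coeffs + [rhs])
    for col in range(len(failed)):
        pivot = next((r for r in range(col, len(system)) if system[r][col]), None)
        if pivot is None:
            raise ValueError("this failure pattern is not recoverable")
        system[col], system[pivot] = system[pivot], system[col]
        for r in range(len(system)):
            if r != col and system[r][col]:
                system[r] = [a ^ b for a, b in zip(system[r], system[col])]
    # The first rows now form an identity matrix over the unknowns.
    return {f: system[i][-1] for i, f in enumerate(failed)}

info = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
disks = encode(info)
failed = [0, 5, 16, 21]  # two information disks and two check disks
damaged = [None if i in failed else bit for i, bit in enumerate(disks)]
recovered = reconstruct(damaged, failed)
```

2d-parity guarantees recovery from any two failures only. The four-disk pattern used here happens to be recoverable, whereas four information disks at the corners of a rectangle (for example disks 0, 1, 4 and 5 in this numbering) would make `reconstruct` raise `ValueError`.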

Where to implement t-erasure-correcting codes
• Implemented in software
• Run in an I/O processor
• The software learns of failures directly from the disk controllers

Conclusion
• Implement redundancy codes for disk arrays
• Minimize the number of check disks that must be updated whenever an information disk is updated
• Improve the reliability of disk arrays

Questions
• What is a codeword in a redundant disk array?
• List three redundancy metrics.
• What are the 1d-parity and 2d-parity schemes?
• What mathematical operation is used to recover a failed disk?
