Erasure Correcting Codes for Highly Available Storage

Erasure Correcting Codes for Highly Available Storage. Thomas Schwarz, S.J.

Error Control Codes • Use redundancy to correct errors • Designed for • Ease of Encoding • Decoding (Calculation of syndrome / location of error) • Error Correction Power (Burst Errors / Low Redundancy)

Error Control Codes Block Codes: Information Symbols + Parity Symbols (i1 i2 i3 i4 i5 i6 i7 i8 p1 p2 p3)

Error Control Codes Typical Applications: Communication: Deep Space “A match made in heaven” Telephone Computer Networks Streaming Audio, Video (CD, DVD) Storage (Main Memory, Magnetic & Optical Devices)

Error Correcting Codes Most applications use hardware implemented encoding and decoding.

Erasure Correcting Codes Protect against erasure of data. Simplest Erasure Correcting Code: Parity i1 i2 i3 i4 i5 i6 i7 i8 p where p = i1 ⊕ i2 ⊕ i3 ⊕ i4 ⊕ i5 ⊕ i6 ⊕ i7 ⊕ i8
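A minimal sketch of the single-parity code in Python, assuming the symbols are small integers (bytes). The function names `parity` and `recover` are illustrative and do not come from the slides.

```python
from functools import reduce
from operator import xor

def parity(symbols):
    """Return the XOR of all information symbols."""
    return reduce(xor, symbols, 0)

def recover(survivors, p):
    """Rebuild the single erased symbol from the survivors and the parity p."""
    return reduce(xor, survivors, p)

data = [0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0]
p = parity(data)
# Erase the symbol at position 2 and rebuild it from the other seven plus p.
rebuilt = recover(data[:2] + data[3:], p)
```

Because XOR is its own inverse, the same operation serves for encoding and for recovering any one erased symbol.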

Erasure Correcting Codes Some applications implement encoding and decoding in hardware (e.g. RAIDs). Software implementation is much more feasible than for error correcting codes because the decoding problem is simpler: the locations of the erasures are known.

Erasure Correcting Codes Ideal Properties: • Systematic: Data is stored explicitly. Data updates do not change other data. • MDS (maximum distance separable): Only as much parity data is created as is necessary to recover from the maximum number of failures tolerated. • Simple encoding and decoding.

Parity Based Codes Only use parity of data (XOR operation) for ease of coding and decoding.

Parity Based Codes History: Protection for Multitrack Magnetic Recording. Prusinkiewicz & Budkowski 1976: [Figure: five tracks, from top to bottom Parity 1, Data 1, Data 2, Data 3, Parity 2] Horizontal and diagonal parity.

Parity Based Codes Extend the scheme by using lines of different slopes. Patel 1985: horizontal + 2 diagonals (slopes 0,1,-1) However, the code is optimal only if the data band is infinite. If not, there is (slightly) more parity than data.

Parity Based Array Codes Idea: Break up data into m symbols. Arrange the symbols in columns. Use horizontal and vertical lines to calculate parity. 1st column: horizontal parity, 2nd column: vertical parity

Parity Based Array Codes But it is not so simple! [Figure: example array] This array is a legitimate code word.

Parity Based Array Codes But indistinguishable from the zero code word after failure of columns 1 and 3.

Parity Based Array Codes Number of Data Columns needs to be prime.

EvenOdd • Better version of array codes for two parity columns • Code words are two-dimensional m-1 by m arrays with two additional parity columns

EvenOdd The EvenOdd code has as code words the m-1 by m+2 arrays of symbols a_{i,j} such that
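The defining equations are not preserved in this transcript. In the standard formulation of EvenOdd (Blaum, Brady, Bruck and Menon), which agrees with the value of S used in the encoding example, they can be written as below. The angle-bracket notation for reduction modulo m and the imaginary all-zero row m-1 are conventions of that formulation, not something recovered from the slide.

```latex
\begin{align*}
a_{i,m}   &= \bigoplus_{j=0}^{m-1} a_{i,j}, \\
a_{i,m+1} &= S \oplus \bigoplus_{j=0}^{m-1} a_{\langle i-j \rangle_m,\, j}, \\
S         &= \bigoplus_{j=1}^{m-1} a_{m-1-j,\, j},
\end{align*}
\text{for } 0 \le i \le m-2, \text{ where } \langle x \rangle_m = x \bmod m
\text{ and } a_{m-1,j} = 0 \text{ for all } j.
```

For m = 5 the last line gives S = a_{3,1} ⊕ a_{2,2} ⊕ a_{1,3} ⊕ a_{0,4}.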

EvenOdd Encoding Set m=5. Start with an arbitrary 4 by 5 data array.

EvenOdd Encoding Fill in the horizontal parity lines and calculate S = a_{3,1} + a_{2,2} + a_{1,3} + a_{0,4} = 0 + 1 + 0 + 0 = 1.

EvenOdd Encoding
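The encoding steps can be sketched in Python as follows. This is a sketch under assumptions: it follows the standard EvenOdd definition (horizontal parity in column m, diagonal parity adjusted by S in column m+1, imaginary zero row m-1), and the 4 by 5 data array is hypothetical, since the slide's array is not preserved. It was chosen only so that a_{3,1}, a_{2,2}, a_{1,3}, a_{0,4} are 0, 1, 0, 0 as in the example.

```python
def evenodd_s(data, m):
    """XOR of the symbols on the special diagonal a[m-1-j][j], j = 1..m-1."""
    s = 0
    for j in range(1, m):
        s ^= data[m - 1 - j][j]
    return s

def evenodd_encode(data, m):
    """Append horizontal and diagonal parity columns to an (m-1) x m bit array."""
    def a(i, j):
        # Row m-1 is an imaginary all-zero row.
        return 0 if i == m - 1 else data[i][j]

    s = evenodd_s(data, m)
    code = []
    for i in range(m - 1):
        horizontal, diagonal = 0, s
        for j in range(m):
            horizontal ^= a(i, j)
            diagonal ^= a((i - j) % m, j)
        code.append(list(data[i]) + [horizontal, diagonal])
    return code

# Hypothetical data array for m = 5 (not the array from the slide).
DATA = [
    [1, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [1, 1, 1, 0, 1],
    [0, 0, 0, 1, 1],
]
CODE = evenodd_encode(DATA, 5)
```

Each row of `CODE` holds the five data symbols followed by the horizontal and the diagonal parity symbol.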

EvenOdd Decoding Assume that the last two data columns have failed.

EvenOdd Decoding Use the parity columns to calculate S.

EvenOdd Decoding Use S=1 and the magenta diagonal to find the data symbol in the last column.

EvenOdd Decoding Then use the horizontal parity for one more symbol.

EvenOdd Decoding The blue diagonal now can be exploited.
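The alternation between diagonals and horizontal parity lines can be sketched as a "peeling" loop: repeatedly pick a parity line with exactly one unknown symbol and solve it. This is an illustrative sketch, not the closed-form procedure from the original paper. It handles only the case of two lost data columns, and the code word below is a hypothetical example (m = 5, S = 1), not the array from the slides.

```python
def evenodd_recover(code, m, lost):
    """Recover two erased data columns of an (m-1) x (m+2) EvenOdd code word."""
    rows = m - 1
    a = [list(row) for row in code]
    unknown = {(i, j) for i in range(rows) for j in lost}

    # S is the XOR of all symbols in the two parity columns.
    s = 0
    for i in range(rows):
        s ^= a[i][m] ^ a[i][m + 1]

    # Each equation is (cells, value): the XOR of the cells equals the value.
    equations = [([(i, j) for j in range(m)], a[i][m]) for i in range(rows)]
    for d in range(m):
        cells = [((d - j) % m, j) for j in range(m)]
        cells = [c for c in cells if c[0] != m - 1]  # skip imaginary zero row
        value = s if d == m - 1 else a[d][m + 1] ^ s
        equations.append((cells, value))

    progress = True
    while unknown and progress:
        progress = False
        for cells, value in equations:
            missing = [c for c in cells if c in unknown]
            if len(missing) != 1:
                continue
            for i, j in cells:
                if (i, j) not in unknown:
                    value ^= a[i][j]
            r, c = missing[0]
            a[r][c] = value
            unknown.discard((r, c))
            progress = True
    if unknown:
        raise ValueError("erasure pattern not recoverable by peeling")
    return a

# Hypothetical code word: 5 data columns, horizontal parity, diagonal parity.
CODE = [
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 0, 0, 0, 1],
    [1, 1, 1, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0, 0],
]
# Erase the last two data columns (3 and 4), then recover them.
damaged = [[None if j in (3, 4) else v for j, v in enumerate(row)] for row in CODE]
restored = evenodd_recover(damaged, 5, lost=(3, 4))
```

For columns 3 and 4 the first symbols solved come from the two diagonals that pass through the imaginary zero row in one of the lost columns. The horizontal lines then supply the partner symbols.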

EvenOdd EvenOdd requires that m be a prime. Hence, for a given number n of data lines, choose m to be the smallest prime ≥ n. Set the superfluous data columns to zero:

EvenOdd Encoding and decoding use only XOR operations. The given formulae suggest an iterative procedure, but the equations can easily be expanded to calculate the symbols in parallel.

Higher Array Codes There exist array codes using only XOR operations that can correct up to m erasures. The decoding process involves solving a system of linear equations.

Algebraic Block Codes Interpret symbols (larger than bits) as elements of a Galois Field. Calculate parity symbols as linear combinations of the data symbols.

Galois Fields Only GF(2^f) for simplicity’s sake. Elements: Bit strings of length f. Addition: XOR. Multiplication: Much more complicated.

Galois Field Multiplication For GF(2^8). Elements are bytes. Method 1: Identify a byte with a binary polynomial. E.g. (0100 1001) = x^6 + x^3 + 1. Multiply two polynomials as polynomials modulo a generator polynomial. E.g. modulo 1 0001 1101 = x^8 + x^4 + x^3 + x^2 + 1.

Galois Field Multiplication Combination of XORs and shifts!
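A minimal sketch of this shift-and-XOR multiplication in GF(2^8), using the generator polynomial 1 0001 1101 (0x11D) from the previous slide. The function name `gf256_mul` is illustrative.

```python
def gf256_mul(x, y, poly=0x11D):
    """Multiply two bytes as binary polynomials modulo the generator polynomial."""
    result = 0
    while y:
        if y & 1:          # current bit of y set: add (XOR) the shifted x
            result ^= x
        y >>= 1
        x <<= 1            # multiply x by the polynomial "x"
        if x & 0x100:      # degree 8 reached: reduce modulo the generator
            x ^= poly
    return result

# (x^6 + x^3 + 1) * x = x^7 + x^4 + x, with no reduction needed.
product = gf256_mul(0x49, 0x02)
# x * x^7 = x^8, which reduces to x^4 + x^3 + x^2 + 1.
reduced = gf256_mul(0x02, 0x80)
```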

Galois Field Multiplication This multiplication gives a field structure to GF(2^f). The multiplicative group is cyclic: there are elements α such that all nonzero elements can be written as α^i, i = 0, 1, …, 2^f - 2.

Galois Field Multiplication For each non-zero element x ∈ GF(2^f) define log(x) = i iff α^i = x. Define antilog(i) = α^i. Calculate x·y = antilog(log(x) + log(y)) if x ≠ 0 and y ≠ 0; x·y = 0 if x = 0 or y = 0.

Galois Field Multiplication Can be implemented with two tables, two zero comparisons, four additions and three memory accesses: 9 elementary operations in a processor with sufficient L1 cache to store 3*(2^f - 1) entries.
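The table method can be sketched as follows for GF(2^8). Assumptions made here: the generator polynomial is 0x11D as above, and α is the element 2 (the polynomial x), which is primitive for this polynomial. The antilog table is stored twice over, so that log(x) + log(y) never needs a reduction modulo 2^f - 1. That is one way to arrive at the 3*(2^f - 1) entries mentioned on the slide.

```python
def build_tables(poly=0x11D):
    """Build log and (doubled) antilog tables for GF(2^8) with alpha = 2."""
    log = [0] * 256            # log[0] is unused: zero has no logarithm
    antilog = [0] * 510        # 2 * 255 entries, so sums of logs index directly
    x = 1
    for i in range(255):
        antilog[i] = x
        antilog[i + 255] = x
        log[x] = i
        x <<= 1                # multiply by alpha
        if x & 0x100:
            x ^= poly
    return log, antilog

LOG, ANTILOG = build_tables()

def gf_mul(x, y):
    """Multiply via antilog(log(x) + log(y)), with zero handled separately."""
    if x == 0 or y == 0:
        return 0
    return ANTILOG[LOG[x] + LOG[y]]
```

With the tables in cache, a multiplication costs two zero tests, one addition and three table lookups, plus the address arithmetic.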

Linear Erasure Correcting Block Codes m data symbols u = (u_0, u_1, u_2, …, u_{m-1}). [Figure: Buckets 0 to 3 hold the symbol sequences u_0, u_0’, u_0’’, u_0’’’, … through u_3, u_3’, u_3’’, u_3’’’, …; a code word such as u’’ takes one symbol from each bucket.]

Linear Erasure Correcting Block Codes Add k = n - m parity symbols for code word a. [Figure: data buckets 0 to 3 holding u_0, u_0’, … through u_3, u_3’, …, plus parity buckets 0 to k-1 holding p_0, p_0’, p_0’’, p_0’’’, … through p_{k-1}, p_{k-1}’, p_{k-1}’’, p_{k-1}’’’, ….]

Linear Erasure Correcting Block Codes Calculate the parity symbols as a linear combination of the data symbols: a = u · G, with “Generator Matrix” G.

Properties of a Good Generator Matrix • Systematic: Left m by m matrix is identity matrix. • MDS: All matrices formed from m different columns of G are invertible. Thus: Any m coordinates of code word a suffice to calculate data word u.

Generation of Generator Matrices • Find the largest rectangular matrix with MDS property. • Multiply from left with the inverse of the matrix formed by the first m columns. Result is still MDS and now systematic.

Large MDS Matrices • There are known families of matrices with the MDS property: • Cauchy m+n = 2^f • Vandermonde n = 2^f - 1 • Twice extended Vandermonde n = 2^f + 1

Vandermonde Matrix
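The matrix shown on this slide is not preserved in the transcript. For orientation, a Vandermonde matrix in its usual form is given below. The orientation (m rows, n columns, one column per field element x_j) is an assumption chosen to match the m by n generator matrix convention used in the surrounding slides.

```latex
V =
\begin{pmatrix}
1 & 1 & \cdots & 1 \\
x_0 & x_1 & \cdots & x_{n-1} \\
x_0^{2} & x_1^{2} & \cdots & x_{n-1}^{2} \\
\vdots & \vdots & & \vdots \\
x_0^{m-1} & x_1^{m-1} & \cdots & x_{n-1}^{m-1}
\end{pmatrix},
\qquad x_0, \dots, x_{n-1} \in GF(2^f) \text{ pairwise distinct.}
```

Any m columns form a square Vandermonde matrix whose determinant is the product of the differences x_j - x_i (i < j) of the chosen elements, which is non-zero because they are distinct. This is the MDS property.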

Vandermonde Generator Matrix

Vandermonde Generator Matrix • Write column m as a linear combination of the first m columns. • Multiply column i (i = 0, 1, …, m - 1) with its coefficient in this combination (non-zero according to Cramer’s Rule). This preserves MDS. • Multiply with A^{-1}, where A is the matrix consisting of columns 0 to m - 1.

Vandermonde Generator Matrix

RS Erasure Correcting Codes • The generator matrix is that of a twice extended, generalized Reed-Solomon code. • Large number of parity symbols: If symbols are bytes, then code length is 257.

RS Erasure Correcting Codes Encoding: Generation of a parity symbol costs m multiplications with known coefficients and m-1 XOR operations, i.e. 7m-1 elementary operations.

RS Erasure Correcting Codes Change of one data symbol in a data word: Calculate the difference d = u_i^new - u_i^old. Send d to the site maintaining the parity symbol. Multiply with coefficient g_{i,l} of G. Add to existing parity. 7 elementary operations per parity site. 1 elementary operation at data site. 1 message.
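The update rule can be sketched as follows. Subtraction and addition in GF(2^8) are both XOR, so the difference d is the XOR of the old and new symbol. The coefficient values below are hypothetical placeholders for one parity column of G, and multiplication uses a plain shift-and-XOR routine rather than tables to keep the sketch short.

```python
def gf_mul(x, y, poly=0x11D):
    """Shift-and-XOR multiplication in GF(2^8)."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if x & 0x100:
            x ^= poly
    return r

def full_parity(u, coeffs):
    """Parity symbol as the linear combination sum_i coeffs[i] * u[i]."""
    p = 0
    for c, s in zip(coeffs, u):
        p ^= gf_mul(c, s)
    return p

def update_parity(p_old, coeff, u_old, u_new):
    """Apply a change of one data symbol to an existing parity symbol."""
    d = u_old ^ u_new              # computed at the data site, sent as one message
    return p_old ^ gf_mul(coeff, d)  # computed at the parity site

coeffs = [0x01, 0x8E, 0x47, 0xD2]    # hypothetical column of G
u = [0x10, 0x20, 0x30, 0x40]
p = full_parity(u, coeffs)
p_updated = update_parity(p, coeffs[2], u[2], 0x99)
```

The parity site never needs the other data symbols: linearity guarantees that the updated parity equals the parity recomputed from scratch.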

RS Erasure Correcting Codes Erasure Correction: Typical cases: • Parity site has failed. Regenerate parity from the data sites. • Data site has failed. Use column m to regenerate the data from the other data sites and the XOR stored at this first parity site.

RS Erasure Correcting Codes Erasure Correction General Case: • Collect m survivors among data and parity sites • Invert the matrix consisting of the corresponding columns of G • Each replacement site uses this matrix and G in order to calculate a decoding matrix H
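The whole chain (build a systematic generator matrix from a Vandermonde matrix, encode, recover from any m survivors) can be sketched in Python as below. This is a sketch under assumptions: it uses a plain Vandermonde matrix over GF(2^8) with x_j = j rather than the twice extended form, it folds the column-scaling step into an equivalent row scaling so that the first parity column becomes all ones, and all function names are illustrative.

```python
def gf_mul(x, y):
    """Shift-and-XOR multiplication in GF(2^8) modulo 0x11D."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if x & 0x100:
            x ^= 0x11D
    return r

def gf_inv(x):
    """Inverse as x^254, since x^255 = 1 for every non-zero x."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, x)
    return r

def mat_mul(A, B):
    C = [[0] * len(B[0]) for _ in A]
    for i, row in enumerate(A):
        for k, a in enumerate(row):
            for j, b in enumerate(B[k]):
                C[i][j] ^= gf_mul(a, b)
    return C

def mat_inv(A):
    """Gauss-Jordan elimination over GF(2^8); A must be invertible."""
    n = len(A)
    M = [list(A[i]) + [int(i == j) for j in range(n)] for i in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c])   # pivot row
        M[c], M[p] = M[p], M[c]
        s = gf_inv(M[c][c])
        M[c] = [gf_mul(s, v) for v in M[c]]
        for r in range(n):
            if r != c and M[r][c]:
                f = M[r][c]
                M[r] = [v ^ gf_mul(f, w) for v, w in zip(M[r], M[c])]
    return [row[n:] for row in M]

def systematic_generator(m, n):
    """m x n generator: identity on the left, first parity column all ones."""
    V = [[1] * n]
    for _ in range(m - 1):
        V.append([gf_mul(v, x) for x, v in enumerate(V[-1])])  # next power of x_j = j
    G = mat_mul(mat_inv([row[:m] for row in V]), V)
    for i in range(m):
        s = gf_inv(G[i][m])     # non-zero because of the MDS property
        G[i] = [int(i == j) if j < m else gf_mul(s, G[i][j]) for j in range(n)]
    return G

def encode(u, G):
    return mat_mul([u], G)[0]

def decode(survivors, G):
    """survivors maps column index -> symbol; any m entries suffice."""
    m = len(G)
    cols = sorted(survivors)[:m]
    sub = [[G[i][c] for c in cols] for i in range(m)]
    return mat_mul([[survivors[c] for c in cols]], mat_inv(sub))[0]

G = systematic_generator(4, 7)
u = [10, 20, 30, 40]
a = encode(u, G)
# Sites 0 and 2 are lost; recover u from data sites 1, 3 and parity sites 4, 5.
recovered = decode({1: a[1], 3: a[3], 4: a[4], 5: a[5]}, G)
```

Since the first parity column is all ones, the symbol at site m is the plain XOR of the data symbols, which is what makes the single-failure case cheap.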
