
Erasure Codes for Reading and Writing

Mario Vodisek

(joint work with AG Schindelhauer)


Agenda

  • Erasure (Resilient) Codes in storage networks

  • The Read-Write-Coding-System

    • A Lower Bound and Perfect Codes

    • Requirements and Techniques


Erasure (Resilient) Coding

  • n-symbol message x with symbols from an alphabet Σ

  • m-symbol encoding y with symbols from Σ (m > n)

  • erasure coding provides a mapping Σ^n → Σ^m such that

    • reading any r symbols of y, n ≤ r ≤ m, is sufficient for recovery

    • (mostly r = n, which is optimal for reading)

  • advantages:

    • up to m − r erasures can be tolerated

    • storage overhead is a factor of m/n

  • Generally, erasure codes are used to guarantee information recovery for data transmission over unreliable channels (RS-, Turbo-, LT-Codes, …)

  • Lots of research in code properties such as

    • scalability

    • encoding/decoding speed-up

    • rateless-ness

  • Attractive also for storage networks: downloads (P2P) and fault-tolerance

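As a toy illustration of these definitions (not from the talk): the sketch below encodes n = 2 symbols into m = 3 stored symbols with a single XOR parity symbol, so any r = 2 of the 3 symbols recover the message and m − r = 1 erasure is tolerated.

```python
# Minimal erasure-code sketch: n = 2 data symbols, m = 3 stored symbols,
# any r = 2 of them recover the data. The third symbol is the XOR parity.

def encode(x1, x2):
    return [x1, x2, x1 ^ x2]   # y3 = x1 XOR x2 (parity)

def decode(available):
    """available: dict {position: symbol} holding any 2 of the 3 positions."""
    if 0 in available and 1 in available:
        return available[0], available[1]
    if 0 in available:                                  # y2 was erased
        return available[0], available[0] ^ available[2]
    return available[1] ^ available[2], available[1]    # y1 was erased

y = encode(0x5A, 0x3C)
assert decode({0: y[0], 2: y[2]}) == (0x5A, 0x3C)       # one erasure tolerated
```

This is exactly the parity-RAID scheme revisited later in the talk, written out for the smallest non-trivial parameters.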


Erasure Codes for Storage (Area) Networks

  • SANs require

  • high system availability

    • disks can fail or be blocked (failure probability scales with size)

  • efficient modification handling

    • slow devices ⇒ expensive I/O operations

  • Properties:

  • a fixed set E of existing errors can be considered at encoding time

  • E can have changed to E′ at decoding time

  • Additional requirements to erasure codes:

  • tolerate a certain number of erasures

  • ensure modification of the codeword even if erasures occur

  • consider E at encoding time and E′ at decoding time



The Read-Write-Coding-System

  • An (n, r, w, m)_b-Read-Write-Coding System (RWC) is defined as follows:

  • The base b: a b-symbol alphabet Σ_b as the set of all used symbols

  • n ≥ 1 blocks of information x1, …, xn ∈ Σ_b

  • m ≥ n code blocks y1, …, ym ∈ Σ_b

  • any r code symbols, n ≤ r ≤ m, are sufficient to read the information

  • any w code symbols, n ≤ w ≤ m, are sufficient to change the information by δ1, …, δn

  • In the language of coding theory: given m, n, r, w, our RW-Codes provide:

  • a (linear) code of dimension n and block length m such that for n ≤ r, w ≤ m:

    • the minimum distance of the code is at least m − r + 1

    • any two codewords y1, y2 are within a distance of at most w from one another

    • distance(x, y) := |{1 ≤ i ≤ m : x_i ≠ y_i}|



A Lower Bound for RW-Codes

[Diagram: the m code symbols with write set W, read set R, and S = W ∩ R, |S| ∈ {n, n − 1}]
Theorem: For r + w < n + m and any base b there exists no (n, r, w, m)_b-RWC!

Proof:

  • We know: n ≤ r, w ≤ m

  • Assume: r = w = n and m = n + 1 (the smallest parameters violating the bound)

  • Consider a write and a subsequent read with

Index sets (W, R):

  • |W| = w

  • |R| = r

  • S := W ∩ R; since |W| = |R| = n and m = n + 1, we have |S| ∈ {n − 1, n}

Case |S| = n:

  • there are b^n possible change vectors to be encoded by `write` into S; this is the only basis for reading with r = n (notice: the code words in R ∖ S remain unchanged)

Case |S| = n − 1:

  • at most b^(n−1) possible change vectors can be encoded by `write` into S

  • ⇒ `read` will produce faulty output for some change vector


Codes at Lower Bound: Perfect Codes

  • In the best case, an (n, r, w, m)_b-RWC has parameters r + w = n + m (perfect codes)

  • Unfortunately, perfect RWC do not always exist!

    • E.g., there is no (1, 2, 2, 3)_2-RWC, but there exists a (1, 2, 2, 3)_3-RWC!

  • But: all perfect RW-Codes exist if the alphabet is sufficiently large!

  • Note on RAID:

  • The definition of parity RAID (RAID 4/5) corresponds to an (n, n, n+1, n+1)_2-RWC

  • From the lower bound it follows: there is no (n, n, n, n+1)_2-RWC

  • ⇒ there is no parity-RAID system with improved access properties!


The Model: Operations

  • Given:

  • X = x1, …, xn: the n-symbol information vector over a finite alphabet Σ

  • Y = y1, …, ym: the m-symbol code over Σ

  • b = |Σ|

  • P(M): the power set of M; P_k(M) := {S ∈ P(M) : |S| = k}

  • Define [m] := {1, …, m}

  • An (n, r, w, m)_b-RWC system consists of the following operations:

  • Initial state: X⁰ ∈ Σ^n, Y⁰ ∈ Σ^m

  • Read function: f : P_r([m]) × Σ^r → Σ^n

  • Write function: g : P_r([m]) × Σ^r × P_w([m]) × Σ^n → Σ^w

  • Differential write function: δ : P_w([m]) × Σ^n → Σ^w


Initialization: Compute the Encoding Y0

  • RW-Codes are closely related to Reed-Solomon-Codes!

Given (in general):

  • the information vector X = x1, …, xn ∈ Σ_b

  • the encoded vector Y = y1, …, ym ∈ Σ_b

  • internal variables V = v1, …, vk for k = m − w = r − n, carrying no particular information

  • a set of functions M = M1, …, Mm for encoding

  • Compute y_i from X and V by function M_i; define M_i as a linear combination of X and V:

  • y_i = M_i(x1, …, xn, v1, …, vk) = Σ_{j=1..n} x_j M_{i,j} + Σ_{l=1..k} v_l M_{i,n+l}

  • (Define M as an m × r matrix with the M_i as rows. It follows: M · (X | V)^T = Y)
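The linear encoding above can be sketched in a few lines. As an illustration assumption, this works over the prime field F[13] instead of the F[2^v] used later, with the small parameter set (n, r, w, m) = (2, 3, 3, 4):

```python
# Encoding sketch: y_i = sum_j x_j*M_{i,j} + sum_l v_l*M_{i,n+l}, i.e. Y = M(X|V)^T.
# Assumptions: prime field F[13] instead of F[2^v]; (n, r, w, m) = (2, 3, 3, 4).
p = 13
n, r, m = 2, 3, 4
# Vandermonde-style generator matrix (0-based indexing: M[i][j-1] = j^i):
M = [[pow(j, i, p) for j in range(1, r + 1)] for i in range(m)]

def encode(x, v):
    """Compute all m code symbols from the n info symbols and k slack symbols."""
    xv = list(x) + list(v)                    # the column vector (X | V)
    return [sum(M[i][j] * xv[j] for j in range(r)) % p for i in range(m)]

Y = encode([3, 7], [5])                       # X = (3, 7), slack V = (5), k = 1
```

Any specific values here (p = 13, X = (3, 7), V = (5)) are arbitrary illustration choices.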


The Matrix Approach: (n, r, w, m)_b-RWC

Consider:

  • the information vector X = x1, …, xn ∈ Σ_b

  • the encoded vector Y = y1, …, ym ∈ Σ_b

  • internal slack variables V = v1, …, vk for k = m − w = r − n

  • Further:

  • an m × r generator matrix M with M_{i,j} ∈ Σ_b

  • the submatrix (M_{i,j}), i ∈ [m], j ∈ {n+1, …, r}, is called the variable matrix

M · (X | V)^T = Y


Efficient Encoding: Σ_b = F[b] (Finite Fields)

  • RWC requires efficient arithmetic on the elements of Σ_b for encoding

    • ⇒ set Σ_b = F[b] (the finite field with b elements; formerly GF(b))

  • b = p^n for some prime number p and integer n ⇒ F[p^n] always exists

  • Computation on binary words of length v: b = 2^v, F[2^v] = {0, …, 2^v − 1}

  • Features:

  • F[b] is closed under addition and multiplication

    • ⇒ exact computation on field elements ⇒ no more than v bits for the representation of results

  • Addition and subtraction via XOR (avoids rounding, no carry-over)

  • Multiplication and division via mapping tables (analogous to logarithm tables for real numbers)

    • T: table mapping an integer to its logarithm in F[2^v]

    • IT: table mapping an integer to its inverse logarithm in F[2^v]

    • ⇒ multiplication and division by

      • adding/subtracting the logs

      • taking the inverse log
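A minimal sketch of this table-based arithmetic, assuming v = 4 and the primitive polynomial x^4 + x + 1 (both are illustration choices, not fixed by the slides):

```python
# Multiplication/division in F[2^v] via log tables, as described above.
# Assumptions: v = 4, generator alpha = x, primitive polynomial x^4 + x + 1.
V = 4
ORDER = (1 << V) - 1          # 15 = number of nonzero field elements

exp_t = [0] * (2 * ORDER)     # IT: inverse-log table (doubled to skip a mod)
log_t = [0] * (1 << V)        # T: log table
a = 1
for i in range(ORDER):
    exp_t[i] = exp_t[i + ORDER] = a
    log_t[a] = i
    a <<= 1                   # multiply by alpha = x
    if a & (1 << V):
        a ^= 0b10011          # reduce modulo x^4 + x + 1

def gf_mul(x, y):
    """Multiply by adding the logs and taking the inverse log."""
    if x == 0 or y == 0:
        return 0
    return exp_t[log_t[x] + log_t[y]]

def gf_div(x, y):
    """Divide (y != 0) by subtracting the logs."""
    if x == 0:
        return 0
    return exp_t[log_t[x] - log_t[y] + ORDER]

# Addition and subtraction are both plain XOR, no carry-over:
assert (3 ^ 5) == 6 and gf_mul(2, 8) == 3    # x * x^3 = x^4 = x + 1
```

Doubling the inverse-log table is a common trick that saves the modular reduction of the log sum; a `% ORDER` works just as well.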


The Vandermonde Matrix

  • Consider M as an m × r Vandermonde matrix, M_{i,j} = j^(i−1):

  • X, Y, V have entries in F[b]

  • M_{i,j} ∈ F[b] and all elements are different

  • The Vandermonde matrix is non-singular ⇒ invertible

  • Any k′ × k′ submatrix M′ is also invertible

M · (X | V)^T = Y

Consider: each device i in the SAN corresponds to a row of M and element y_i
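The invertibility claim can be checked by brute force for a tiny instance. This sketch assumes a prime field F[13] and (n, r, m) = (2, 3, 4) for readability:

```python
# Brute-force check: every choice of r rows of the Vandermonde generator
# matrix is non-singular, so any r code symbols determine (X | V).
# Assumptions: prime field F[13] instead of F[2^v]; (n, r, m) = (2, 3, 4).
from itertools import combinations

p = 13
r, m = 3, 4
M = [[pow(j, i, p) for j in range(1, r + 1)] for i in range(m)]

def det3(a):
    """Determinant of a 3x3 matrix, reduced mod p."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
          - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
          + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0])) % p

for rows in combinations(range(m), r):
    assert det3([M[i] for i in rows]) != 0    # all r-row submatrices invertible
```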


Reading (or Recovery)

  • Read: given any r code entries of Y, compute X

  • Rearrange the rows of M and Y such that the first r entries of Y are available

    • (any r rows of M are linearly independent in a Vandermonde matrix)

  • M → M′ and Y → Y′

  • The first r rows of M′ describe an invertible r × r matrix M′′

  • X is computed by: (X | V)^T = (M′′)⁻¹ Y′

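A sketch of this recovery step, again over an assumed prime field F[13] rather than F[2^v]: pick the available rows, solve the resulting r × r system, and read off X:

```python
# Reading sketch: choose any r rows of M plus the matching entries of Y,
# solve the r x r system over F[p], and recover (X | V).
# Assumptions: prime field F[13]; (n, r, m) = (2, 3, 4); arbitrary X, V values.
p = 13
n, r, m = 2, 3, 4
M = [[pow(j, i, p) for j in range(1, r + 1)] for i in range(m)]
X, V = [3, 7], [5]
Y = [sum(M[i][j] * (X + V)[j] for j in range(r)) % p for i in range(m)]

def solve_mod_p(A, b):
    """Gauss-Jordan elimination over F[p]: solve A z = b for square A."""
    A = [row[:] + [bi] for row, bi in zip(A, b)]       # augmented matrix
    size = len(A)
    for col in range(size):
        piv = next(i for i in range(col, size) if A[i][col])
        A[col], A[piv] = A[piv], A[col]
        inv = pow(A[col][col], p - 2, p)               # Fermat inverse mod p
        A[col] = [a * inv % p for a in A[col]]
        for i in range(size):
            if i != col and A[i][col]:
                f = A[i][col]
                A[i] = [(a - f * c) % p for a, c in zip(A[i], A[col])]
    return [row[-1] for row in A]

avail = [0, 2, 3]                                      # say y2 is erased
xv = solve_mod_p([M[i] for i in avail], [Y[i] for i in avail])
assert xv[:n] == X                                     # information recovered
```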


Differential Write

  • Given:

    • the change vector δ = δ1, …, δn and w code entries of Y

    • X′ = X + δ is the new information vector ⇒ change X without reading entries (XOR)

    • compute the difference Δ for the w code entries of Y

  • Further:

    • only choices w < r make sense

    • rearrange the m × r matrix M and Y such that the entries to be written come first: y1, …, yw (denote the results M′ and Y′)

    • k = r − n (slack vector V)


Differential Write (con‘t)

  • Define the following sub-matrices (quadrants of M′):

    • M↖ = (M′_{i,j}), i ∈ [w], j ∈ [n]

    • M↗ = (M′_{i,j}), i ∈ [w], j ∈ {n+1, …, r}

    • M↙ = (M′_{i,j}), i ∈ {w+1, …, m}, j ∈ [n]

    • M↘ = (M′_{i,j}), i ∈ {w+1, …, m}, j ∈ {n+1, …, r}

  • M↘ is a (m − w) × (r − n) = k × k matrix ⇒ M↘ is invertible

  • The vector Y can then be updated by a vector Δ = Δ1, …, Δw:

    • Δ = ((M↖) − (M↗)(M↘)⁻¹(M↙)) · δ
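This update formula can be exercised end to end on a small assumed instance over F[13]: compute the slack change from δ, apply Δ only to the first w entries of Y, and check the result against a full re-encode:

```python
# Differential-write sketch: Delta = (M_TL - M_TR * M_BR^-1 * M_BL) * delta,
# computed here as Delta = M_TL*delta + M_TR*rho with rho = -M_BR^-1*M_BL*delta.
# Assumptions: prime field F[13]; (n, r, w, m) = (2, 3, 3, 4), so k = 1 and
# the bottom blocks are single rows/entries; arbitrary X, V, delta values.
p = 13
n, r, w, m = 2, 3, 3, 4
k = r - n                                             # = m - w = 1
M = [[pow(j, i, p) for j in range(1, r + 1)] for i in range(m)]
X, V = [3, 7], [5]
Y = [sum(M[i][j] * (X + V)[j] for j in range(r)) % p for i in range(m)]

delta = [4, 9]                                        # change vector for X
# Quadrants of M (top-left, top-right, bottom-left, bottom-right):
M_TL = [row[:n] for row in M[:w]]                     # w x n
M_TR = [row[n:] for row in M[:w]]                     # w x k
M_BL = [row[:n] for row in M[w:]]                     # k x n
M_BR = [row[n:] for row in M[w:]]                     # k x k

inv_BR = pow(M_BR[0][0], p - 2, p)                    # invert the 1x1 block
rho = [(-inv_BR * sum(M_BL[0][j] * delta[j] for j in range(n))) % p]
Delta = [(sum(M_TL[i][j] * delta[j] for j in range(n))
        + M_TR[i][0] * rho[0]) % p for i in range(w)]

# Apply Delta to the first w entries only; rows w+1..m stay untouched:
Y_new = [(y + d) % p for y, d in zip(Y, Delta)] + Y[w:]
# Re-encoding X+delta with V+rho must reproduce Y_new:
X2 = [(x + d) % p for x, d in zip(X, delta)]
V2 = [(V[0] + rho[0]) % p]
assert Y_new == [sum(M[i][j] * (X2 + V2)[j] for j in range(r)) % p for i in range(m)]
```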


Differential Write: Proof

  • Use:

  • vector ρ = ρ1, …, ρk: the change of vector V

  • vector Δ = Δ1, …, Δw: the change of vector Y

  • X′ = X + δ

  • V′ = V + ρ

  • Y′ = Y + (Δ | 0)

Correctness follows by combining:

M · (X + δ | V + ρ)^T = M · (X | V)^T + M · (δ | ρ)^T = Y + (Δ | 0)^T

This equation is equivalent to:

(M↘) ρ + (M↙) δ = 0,

(M↖) δ + (M↗) ρ = Δ

Since δ is given, ρ is obtained as follows:

ρ = (M↘)⁻¹ (−M↙) · δ



Thank you for your attention!

Heinz Nixdorf Institute

& Computer Science Institute

University of Paderborn

Fürstenallee 11

33102 Paderborn, Germany

Tel.: +49 (0) 52 51/60 64 51

Fax: +49 (0) 52 51/62 64 82

E-Mail: [email protected]

http://www.upb.de/cs/ag-madh

