
Recovering Data in Presence of Malicious Errors


Presentation Transcript


  1. Recovering Data in Presence of Malicious Errors Atri Rudra University at Buffalo, SUNY

  2. The setup • Mapping C: an error-correcting code, or just "code" • Encoding: x → C(x); C(x) is a codeword • Transmission: y = C(x) + error • Decoding: y → x (or give up)

  3. Codes are useful! • Deep-space communication • Satellite broadcast • Internet • Cellphones • ECC memory • RAID • CDs/DVDs • Paper bar-codes

  4. Redundancy vs. error-correction • Repetition code: repeat every bit, say, 100 times • Good error-correcting properties • Too much redundancy • Parity code: add a parity bit (e.g., 1 1 1 0 → 1 1 1 0 1) • Minimum amount of redundancy • Bad error-correcting properties • Two errors go completely undetected • Neither of these codes is satisfactory
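To make the contrast concrete, here is a minimal Python sketch of both codes; the 3-fold (rather than 100-fold) repetition and the function names are illustrative choices, not from the talk:

```python
# Repetition code: high redundancy (rate 1/3 here), corrects any single error per block.
def rep_encode(bits, r=3):
    return [b for b in bits for _ in range(r)]

def rep_decode(word, r=3):
    # Majority vote within each block of r copies.
    return [int(sum(word[i:i + r]) > r // 2) for i in range(0, len(word), r)]

# Parity code: minimal redundancy (rate k/(k+1)), detects one error, corrects none.
def parity_encode(bits):
    return bits + [sum(bits) % 2]

def parity_check(word):
    return sum(word) % 2 == 0  # True: no error detected

msg = [1, 0, 1, 1]
noisy = rep_encode(msg)
noisy[1] ^= 1                        # flip one bit
assert rep_decode(noisy) == msg      # repetition corrects it
two_flips = parity_encode(msg)
two_flips[0] ^= 1; two_flips[1] ^= 1
assert parity_check(two_flips)       # two errors go completely undetected
```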

  5. Two main challenges in coding theory • Problem with the parity example: messages are mapped to codewords that do not differ in many places • Need to pick many codewords that all differ a lot from each other • Efficient decoding • Naive algorithm: compare the received word against every codeword (the number of codewords is exponential in k)

  6. The fundamental tradeoff • Correct as many errors as possible with as little redundancy as possible • Can one achieve the "optimal" tradeoff with efficient encoding and decoding? • This talk: the answer is yes

  7. Overview of the talk • Specify the setup • The model • What is the optimal tradeoff ? • Previous work • Construction of a “good” code • High level idea of why it works • Future Directions • Some recent progress

  8. Error-correcting codes • Mapping C : Σ^k → Σ^n • Message length k, code length n, n ≥ k • Rate R = k/n ≤ 1 • "Efficient" means time polynomial in n (both encoding and decoding complexity)

  9. Shannon’s world • Noise is probabilistic • Binary Symmetric Channel • Every bit is flipped w/ probability p • Benign noise model • For example, does not capture bursty errors Claude E. Shannon

  10. Hamming's world (we will consider this channel model) • Errors are worst case: arbitrary error locations, arbitrary symbol changes • Limit on the total number of errors • Much more powerful than Shannon's model • Captures bursty errors Richard W. Hamming

  11. A "low level" view • Think of each symbol in Σ as a packet • The setup: sender wants to send k packets; after encoding, sends n packets; some packets get corrupted; receiver needs to recover the original k packets • Packet size: ideally constant, but can grow with n

  12. Decoding • C(x) sent, y received • x ∈ Σ^k, y ∈ Σ^n • How much of y must be correct to recover x? • At least k packets must be correct, so at most a (n-k)/n = 1-R fraction of errors • 1-R is the information-theoretic limit • ρ: the fraction of errors the decoder can handle • The information-theoretic limit implies ρ ≤ 1-R, where R = k/n
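The bound follows from a counting argument; a minimal sketch, written out:

```latex
% If more than n-k of the n packets are corrupted, fewer than k correct
% packets survive, which cannot determine the k message packets. Hence
\rho \;\le\; \frac{n-k}{n} \;=\; 1 - \frac{k}{n} \;=\; 1 - R .
```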

  13. Can we get to the limit of 1-R? • Not if we always want to uniquely recover the original message • Limit for unique decoding: ρ < (1-R)/2 [figure: codewords c1 and c2 at fractional distance 1-R, with the received word y halfway between them]
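Why the halving: a minimal sketch of the midpoint argument that the slide's picture depicts, with all distances measured as fractions of n:

```latex
% Even for the best codes, some two codewords c_1, c_2 differ in only
% about a (1-R) fraction of positions. A received word y sitting halfway
% between them is within (1-R)/2 of both, so no decoder can uniquely
% identify the transmitted codeword once
\rho \;\ge\; \frac{1-R}{2} .
```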

  14. List decoding [Elias 57, Wozencraft 58] • Always insisting on a unique codeword is restrictive • The "pathological" cases are rare: all but an exponentially small (in n) fraction of the space in high dimension • A "typical" received word can be decoded beyond (1-R)/2 • Better error-recovery model: output a list of answers • List decoding • Example: spell checker

  15. Advantages of list decoding • Typical received words have a unique closest codeword • List decoding will return a list of size one for such received words • Still deals with worst-case errors • How to deal with a list of size greater than one? • Declare an error; or • Use some side information (e.g., a spell checker)

  16. The list decoding problem • Given a code and an error parameter ρ • For any received word y • Output all codewords c such that c and y disagree in at most a ρ fraction of places • Fundamental question: what is the best possible tradeoff between R and ρ? • With "small" lists • Can it approach the information-theoretic limit 1-R?
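The definition translates directly into a brute-force procedure, useful only to pin the definition down; a Python sketch, where `code` is assumed to be an explicit list of codewords (exponentially large in general):

```python
def list_decode(code, y, rho):
    """Return all codewords disagreeing with y in at most a rho fraction of places.

    code: iterable of equal-length codewords, assumed given explicitly.
    Brute force: time linear in |code|, i.e. exponential in the message length.
    """
    n = len(y)
    return [c for c in code
            if sum(ci != yi for ci, yi in zip(c, y)) <= rho * n]

# Toy example: the 3-fold repetition code on 1 bit, decoding radius 1/3.
code = [(0, 0, 0), (1, 1, 1)]
print(list_decode(code, (1, 0, 1), 1 / 3))  # -> [(1, 1, 1)]
```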

  17. Other applications of list decoding • Cryptography: cryptanalysis of certain block ciphers [Jakobsen 98]; efficient traitor tracing schemes [Silverberg, Staddon, Walker 03] • Complexity theory: hardcore predicates from one-way functions [Goldreich, Levin 89; Impagliazzo 97; Ta-Shma, Zuckerman 01]; worst-case vs. average-case hardness [Cai, Pavan, Sivakumar 99; Goldreich, Ron, Sudan 99; Sudan, Trevisan, Vadhan 99; Impagliazzo, Jaiswal, Kabanets 06] • Other algorithmic applications: IP traceback [Dean, Franklin, Stubblefield 01; Savage, Wetherall, Karlin, Anderson 00]; guessing secrets [Alon, Guruswami, Kaufman, Sudan 02; Chung, Graham, Leighton 01]

  18. Overview of the talk • Specify the setup • The model • The optimal tradeoff between rate and fraction of errors • Previous work • Construction of a “good” code • High level idea of why it works • Future Directions • Some recent progress

  19. Information-theoretic limit [plot: fraction of errors ρ vs. rate R, comparing unique decoding with the information-theoretic limit] • ρ < 1 - R • Can handle twice as many errors as unique decoding

  20. Achieving the information-theoretic limit • There exist codes that achieve the information-theoretic limit: ρ ≥ 1-R-o(1) • Random coding argument • Not a useful result: the codes are not explicit, and there are no efficient list decoding algorithms • Need explicit constructions of such codes • We also need polynomial-time (list) decodability • Requires the list size to be polynomial

  21. The challenge • Explicit construction of code(s) • Efficient list decoding algorithms up to the information-theoretic limit • For rate R, correct a 1-R fraction of errors • Shannon's work raised a similar challenge: explicit codes achieving the information-theoretic limit for stochastic models • That challenge has been met [Forney 66; Luby-Mitzenmacher-Shokrollahi-Spielman 01; Richardson-Urbanke 01] • Now for the stronger adversarial model

  22. The best until 1998 [plot: adds the Guruswami-Sudan bound between unique decoding and the information-theoretic limit] • ρ ≤ 1 - R^{1/2} • Reed-Solomon codes [Sudan 95, Guruswami-Sudan 98] • Better than unique decoding • At R = 0.8: unique 10%, GS 10.56%, information-theoretic limit 20% • Motivating question: close the gap between the GS bound and the information-theoretic limit with explicit, efficient codes
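A quick check of the slide's numbers at R = 0.8:

```python
R = 0.8
print(f"unique decoding (1-R)/2 : {(1 - R) / 2:.4f}")   # 0.1000 -> 10%
print(f"Guruswami-Sudan 1-R^0.5 : {1 - R ** 0.5:.4f}")  # 0.1056 -> 10.56%
print(f"info-theoretic   1-R    : {1 - R:.4f}")         # 0.2000 -> 20%
```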

  23. The best until 2005 [plot: adds the Parvaresh-Vardy bound, shown for s=2] • ρ ≤ 1 - (sR)^{s/(s+1)} for any s ≥ 1 [Parvaresh-Vardy 05] • Based on Reed-Solomon codes • Improves on GS for R < 1/16

  24. Our result [plot: our bound meets the information-theoretic limit, above Guruswami-Sudan and Parvaresh-Vardy] • Corrects a ρ = 1 - R - ε fraction of errors, for any ε > 0 • Folded RS codes • [Guruswami, R. 06]

  25. Overview of the talk • Specify the setup • The model • The optimal tradeoff between rate and fraction of errors • Previous work • Our Construction • High level idea of why it works • Future Directions • Recent progress

  26. The main result • Construction of an algebraic family of codes • For every rate R > 0 and ε > 0 • List decoding algorithm that can correct a 1 - R - ε fraction of errors • Based on Reed-Solomon codes

  27. Algebra terminology • F will denote a finite field • Think of it as the integers mod some prime • Polynomials: coefficients come from F • Poly of degree 3 over Z_7: f(X) = X^3 + 4X + 5 • Evaluate polynomials at points in F: f(2) = (8 + 8 + 5) mod 7 = 21 mod 7 = 0 • Irreducible polynomials: no non-trivial polynomial factors • X^2+1 is irreducible over Z_7, while X^2-1 is not
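Both computations on this slide can be checked mechanically; a small Python sketch over Z_7 (plain integers mod 7, no library needed):

```python
p = 7

def f(x):                 # f(X) = X^3 + 4X + 5 over Z_7
    return (x**3 + 4*x + 5) % p

print(f(2))               # (8 + 8 + 5) mod 7 = 21 mod 7 = 0

# X^2 + 1 has no root mod 7, hence no linear factor, hence irreducible
# (it has degree 2); X^2 - 1 factors as (X-1)(X+1).
print([x for x in range(p) if (x*x + 1) % p == 0])  # [] -> irreducible
print([x for x in range(p) if (x*x - 1) % p == 0])  # [1, 6] -> reducible
```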

  28. f(1) f(3) f(2) f(4) f(n) Reed-Solomon codes • Message: (m0,m1,…,mk-1) Fk • View as poly. f(X) = m0+m1X+…+mk-1Xk-1 • Encoding, RS(f) = ( f(1),f(2),…,f(n) ) • F={ 1,2,…,n} • [Guruswami-Sudan] Can correct up to 1-(k/n)1/2 errors in polynomial time

  29. f(4) f(1) f(2) f(3) f(n) g(3) g(4) g(1) g(2) g(n) Parvaresh Vardy codes (of order 2) g(X)=f(X)q mod E(X) f(X) g(X) • Extra information from g(X) helps in decoding • Rate, RPV = k/2n • [PV05] PV codes can correct 1 -(k/n)2/3 errors in polynomial time • 1 - (2RPV)2/3

  30. Towards our solution • Suppose g(X) = f(X)^q mod E(X) = f(-X) • Look again at the PV codeword: the position at a_1 carries ( f(a_1), f(-a_1) ) and the position at -a_1 carries ( f(-a_1), f(a_1) ) • The same two values are sent twice

  31. Folded Reed-Solomon codes • Suppose g(X) = f(X)^q mod E(X) = f(-X) • Don't send the redundant symbols: keep only one of each pair ( f(a_1), f(-a_1) ) • Reduces the length to n/2 • R = (k/2)/(n/2) = k/n • Using the PV result, fraction of errors: 1 - (k/n)^{2/3} = 1 - R^{2/3}

  32. Getting to 1-R-ε • Started with a PV code with s = 2 to get 1 - R^{2/3} • Start with a PV code with general s: 1 - R^{s/(s+1)} • Pick s "large" enough to approach 1-R-ε • Decoding complexity increases over Parvaresh-Vardy but is still polynomial

  33. What we actually do • We show that for any generator γ of the multiplicative group F \ {0}: g(X) = f(X)^q mod E(X) = f(γX) • Can achieve similar compression by grouping the evaluation points into orbits of γ: the folded codeword is an m × m′ matrix, m′ ≈ n/m, whose j-th column is ( f(γ^{jm}), f(γ^{jm+1}), …, f(γ^{jm+m-1}) ) • R ≈ (k/m)/(n/m) = k/n
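In code, folding is just a reshape of the RS codeword: m consecutive powers of γ become one column. A schematic Python sketch (the field Z_13, generator γ = 2, and m = 3 are illustrative choices):

```python
p, g, m = 13, 2, 3           # Z_13, generator g=2 of Z_13^*, folding parameter m=3

def fold_rs(f_evals, m=m):
    """Reshape (f(g^0), f(g^1), ..., f(g^{n-1})) into n/m columns of height m.

    Column j carries (f(g^{jm}), ..., f(g^{jm+m-1})) as ONE symbol of the
    folded code: same information, 1/m-th the length, alphabet size |F|^m.
    """
    return [tuple(f_evals[j:j + m]) for j in range(0, len(f_evals), m)]

def f(x):                                        # some message polynomial
    return (x * x + 5) % p

points = [pow(g, i, p) for i in range(p - 1)]    # g^0, g^1, ..., g^{11}
print(fold_rs([f(x) for x in points]))           # 4 folded symbols, each in F^3
```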

  34. Proving f(X)^q mod E(X) = f(γX) • First use the fact that f(X)^q = f(X^q) over F • Need to show f(X^q) mod E(X) = f(γX) • Proving X^q mod E(X) = γX suffices • Or, E(X) divides X^{q-1} - γ • E(X) = X^{q-1} - γ is irreducible
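The key identity can be sanity-checked numerically. A Python sketch for q = 7 and γ = 3 (a generator of Z_7^*, an illustrative choice): with E(X) = X^6 - 3, direct polynomial division confirms X^7 mod E(X) = 3X.

```python
q, gamma = 7, 3              # gamma = 3 generates Z_7^* (illustrative choice)

def poly_mod(a, e, p):
    """Remainder of polynomial a modulo a monic polynomial e, coefficients in Z_p.

    Polynomials are coefficient lists, lowest degree first."""
    a = a[:]
    while len(a) >= len(e):
        if a[-1] != 0:                       # cancel the leading term of a
            c, shift = a[-1], len(a) - len(e)
            for i, ei in enumerate(e):
                a[shift + i] = (a[shift + i] - c * ei) % p
        a.pop()                              # leading coefficient is now 0
    return a

E = [-gamma % q] + [0] * (q - 2) + [1]       # E(X) = X^{q-1} - gamma
Xq = [0] * q + [1]                           # X^q
print(poly_mod(Xq, E, q))                    # [0, 3, 0, 0, 0, 0], i.e. gamma*X
```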

  35. Our result (recap) [plot as before: our bound meets the information-theoretic limit, above Guruswami-Sudan and Parvaresh-Vardy] • Corrects a ρ = 1 - R - ε fraction of errors, for any ε > 0 • Folded RS codes • [Guruswami, R. 06]

  36. “Welcome” to the dark side…

  37. Limitations of our work • To get to 1 - R - ε, need s > 1/ε • Alphabet size = n^s > n^{1/ε} • Fortunately, this can be reduced to 2^{poly(1/ε)} via concatenation + expanders [Guruswami-Indyk 02] • The lower bound is 2^{1/ε} • List size (and running time) > n^{1/ε} • Open question: bring this down

  38. Time to wake up

  39. Overview of the talk • List Decoding primer • Previous work on list decoding • Codes over large alphabets • Construction of a “good” code • High level idea of why it works • Codes over small alphabets • The current best codes • Future Directions • Some (very) modest recent progress

  40. Optimal tradeoff for list decoding • Best possible ρ is H_q^{-1}(1-R), where H_q(ρ) = ρ log_q(q-1) - ρ log_q ρ - (1-ρ) log_q(1-ρ) • There exists a ( H_q^{-1}(1-R-ε), O(1/ε) ) list-decodable code • A random code of rate R has the property whp • ρ > H_q^{-1}(1-R+ε) implies super-polynomial list size, for any code • For large q, H_q^{-1}(1-R) → 1-R
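Since H is increasing on [0, 1/2] in the binary case, its inverse is computable by bisection; a sketch for q = 2:

```python
import math

def H(x):
    """Binary entropy H(x) = -x log2 x - (1-x) log2 (1-x), with H(0) = H(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def H_inv(y, lo=0.0, hi=0.5, iters=60):
    """Inverse of H on [0, 1/2] by bisection (H is increasing there)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if H(mid) < y else (lo, mid)
    return (lo + hi) / 2

R = 0.5
print(H_inv(1 - R))   # ~0.110: optimal error fraction for rate-1/2 binary codes
```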

  41. Our results (q=2) [plot: # errors vs. rate, showing the optimal tradeoff H^{-1}(1-R), the Zyablov bound, and the Blokh-Zyablov bound] • [Guruswami, R. 06] achieves the Zyablov bound • [Guruswami, R. 07] achieves the Blokh-Zyablov bound, improving the previous best

  42. How do we get binary codes? • Concatenation of codes [Forney 66] • C1: (GF(2^k))^K → (GF(2^k))^N ("outer" code) • C2: GF(2)^k → (GF(2))^n ("inner" code) • C1∘C2: (GF(2))^{kK} → (GF(2))^{nN} • Encode the message m = (m_1, …, m_K) with C1 to get (w_1, …, w_N), then encode each symbol w_i with C2 • Typically k = O(log N) • Brute-force decoding for the inner code
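The concatenation operation itself is mechanical; a schematic Python sketch with toy stand-in codes (a GF(4)-parity outer code and a 3-fold-repetition inner code, both illustrative placeholders rather than the codes used in the talk), including the brute-force inner decoding the slide mentions:

```python
# Toy concatenation: outer symbols are GF(4) elements, stored as 2-bit tuples;
# inner code = 3-fold repetition of each bit.

def outer_encode(msg):
    """Trivial outer code: append the XOR of all message symbols as a parity symbol."""
    par = (0, 0)
    for s in msg:
        par = (par[0] ^ s[0], par[1] ^ s[1])
    return msg + [par]

def inner_encode(sym):                 # GF(2)^2 -> GF(2)^6, 3-fold repetition
    return [b for b in sym for _ in range(3)]

def concat_encode(msg):
    return [bit for w in outer_encode(msg) for bit in inner_encode(w)]

def inner_decode(block):               # brute force over the 4 inner codewords
    return min([(0, 0), (0, 1), (1, 0), (1, 1)],
               key=lambda s: sum(a != b for a, b in zip(inner_encode(s), block)))

msg = [(1, 0), (1, 1)]
code = concat_encode(msg)              # 3 outer symbols * 6 bits = 18 bits
code[2] ^= 1                           # corrupt one bit
blocks = [code[i:i + 6] for i in range(0, 18, 6)]
print([inner_decode(b) for b in blocks])   # [(1, 0), (1, 1), (0, 1)]
```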

  43. List decoding a concatenated code • C1 = folded RS code, C2 = "suitably chosen" binary code • Natural decoding algorithm: divide the received word into blocks of length n; find the closest C2 codeword for each block; run the list decoding algorithm for C1 • Loses information!

  44. List decoding C2 • The received word is split into blocks y_1, y_2, …, y_N ∈ GF(2)^n • List decoding C2 on each block yields lists S_1, S_2, …, S_N ⊆ GF(2)^k • How do we "list decode" C1 from lists?

  45. The list recovery problem • Given a code and an error parameter ρ • For any set of lists S_1, …, S_N such that |S_i| ≤ s for every i • Output all codewords c such that c_i ∈ S_i for at least a 1-ρ fraction of the i's • List decoding is the special case with s = 1
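As with list decoding, the definition gives a brute-force procedure; a short sketch to pin down how list recovery generalizes list decoding (`code` again an assumed explicit list of codewords):

```python
def list_recover(code, lists, rho):
    """All codewords c with c_i in lists[i] for at least a (1-rho) fraction of i."""
    N = len(lists)
    return [c for c in code
            if sum(ci in Si for ci, Si in zip(c, lists)) >= (1 - rho) * N]

# List decoding is the special case where every list has size s = 1:
def list_decode(code, y, rho):
    return list_recover(code, [{yi} for yi in y], rho)
```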

  46. List decoding C1∘C2 • List decode C2 on each received block y_1, y_2, …, y_N to obtain lists S_1, S_2, …, S_N • Run the list recovery algorithm for C1 on these lists

  47. Putting it together [Guruswami, R. 06] • If C1 can be list recovered from a ρ_1 fraction of errors and C2 can be list decoded from a ρ_2 fraction of errors, then C1∘C2 can be list decoded from a ρ_1·ρ_2 fraction of errors • Folded RS of rate R is list recoverable from 1-R errors • There exist inner codes of rate r list decodable from H^{-1}(1-r) errors; can find one by "exhaustive" search • C1∘C2 is list decodable from (1-R)·H^{-1}(1-r) errors

  48. Multilevel concatenated codes • C1: (GF(2^k))^K → (GF(2^k))^N ("outer" code 1) • C2: (GF(2^k))^L → (GF(2^k))^N ("outer" code 2) • Cin: GF(2)^{2k} → (GF(2))^n ("inner" code) • Encode m = (m_1, …, m_K) with C1 and M = (M_1, …, M_L) with C2 to get C1(m) = (w_1, …, w_N) and C2(M) = (v_1, …, v_N); the i-th symbol of the final codeword is Cin(v_i, w_i) • C1 and C2 are folded RS codes

  49. Advantage over rate-rR concatenated codes • C1, C2, Cin have rates R_1, R_2 and r • Final rate r(R_1+R_2)/2; choose R_1 < R • Step 1: just recover m • List decode Cin up to H^{-1}(1-r) errors • List recover C1 up to 1-R_1 errors • Can handle a (1-R_1)·H^{-1}(1-r) > (1-R)·H^{-1}(1-r) fraction of errors

  50. Advantage over concatenated codes (contd.) • Step 2: recover M, given m • The subcode of Cin of rate r/2 that acts on M • List decode this subcode up to H^{-1}(1-r/2) errors • List recover C2 up to 1-R_2 errors • Can handle a (1-R_2)·H^{-1}(1-r/2) fraction of errors
