1 / 45

Self-repairing Homomorphic Codes for Distributed Storage Systems [1]

Self-repairing Homomorphic Codes for Distributed Storage Systems [1]. Tao He elfinhe@gmail.com Software Engineering Laboratory Department of Computer Science, Sun Yat-Sen University With thanks to Frederique Oggier , Anwitaman Datta Week 17 (Jun 15) of

denali
Download Presentation

Self-repairing Homomorphic Codes for Distributed Storage Systems [1]

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Self-repairing Homomorphic Codesfor Distributed Storage Systems[1] Tao He elfinhe@gmail.com Software Engineering Laboratory Department of Computer Science, Sun Yat-Sen University With thanks to Frederique Oggier, AnwitamanDatta Week 17 (Jun 15) of Advanced Topics on Computer Networking, Spring, 2011 Sun Yat-Sen University, Guangzhou, China [1] “Self-repairing Homomorphic Codes for Distributed Storage Systems, Frederique Oggier (Nanyang Technological University, Singapore); AnwitamanDatta (NTU, Singapore) IEEE INFOCOM 2011, Shanghai, China, April 10-15, 2010.

  2. Outline • Background • Related work • Motivation • Evaluation • Reed-Solomon Codes • Self-repairing • Evaluation Details • Discussion

  3. Background

  4. Networked storage systems • Gain prominence in recent years. • Decentralized peer-to-peer storage systems • Dedicated infrastructure based data-centers • Storage area networks

  5. Redundancy in networked storage systems • Why need redundancy? • Storage node failures • User attrition in a peer-to-peer system • Solutions • Replication • Erasure coding techniques

  6. What is erasure codes? • Erasure codes provide a storage efficient alternative to replication based redundancy in (networked) storage systems.

  7. What is erasure codes? (cont’)

  8. What is erasure codes? (cont’)

  9. What is erasure codes? (cont’)

  10. What is erasure codes? (cont’)

  11. What is erasure codes? (cont’)

  12. Related work

  13. Related work • Regenerating codes (RGC) [2] • First reconstruct the whole object • Use classical erasure codes as a black box • Hierarchical codes(HC) [3] • Encode two bits into three by XOR operation • Any two encoded bits can recover the third one [2] A. G. Dimakis, P. Brighten Godfrey, M. J.Wainwright, K. Ramchandran, “The Benefits of Network Coding for Peer-to-Peer Storage Systems”, Workshop on Network Coding, Theory, and Applications (Netcod), 2007. [3] A. G. Dimakis, P. Brighten Godfrey, Y. Wu, M. O. Wainwright, K. Ramchandran, “Network Coding for distributed Storage Systems”, available online at http://arxiv.org/abs/0803.0632.

  14. Weakness of related work • RGCs communicate a lot of nodes • RGCs need to communicate with at least k other nodes to recreate any fragment, and the minimal overhead is achieved if only one fragment is missing, and information is downloaded from all other n-1 nodes. • HCs do not have symmetric roles • Replenish a specific fragment in HCs depends on which specific fragments are missing, and not solely on how many.

  15. Motivation

  16. Motivation • Minimize the number of nodes necessary to reduce the reconstruction of a missing block • Translate into lower bandwidth consumption • Implement faster and parallel replenishment

  17. The authors’ approach • Self-repairing codes(SRC)-- AReed-Solomon Codes-like approach • Encoded fragments can be repaired directly from other subsets of encoded fragments without having to reconstruct first the original data. • A fragment is repaired from a fixed number of encoded fragments • Depending only on how many encoded blocks are missing • Independent of which specific blocks are missing

  18. Evaluation

  19. Evaluation • Static resilience of SRCs with respect to traditional erasure codes, and observe that SRCs incur marginally larger storage overhead. • Advantages • Low communication overheads for reconstruction • Lower latency by facilitating repairs in parallel

  20. Reed-Solomon Codes[4] [4] I. S. Reed and G. Solomon, “Polynomial Codes Over Certain Finite Fields”, Journal of the Society for Industrial and Appl. Mathematics, no 2, vol. 8, SIAM, 1960.

  21. Reed-Solomon Codes

  22. Reed-Solomon Codes (cont’) • Key idea: Utilize equations to keep redundancy Generation Encoding

  23. Reed-Solomon Codes – An example Division Generation Encoding

  24. Reed-Solomon Codes – An example Division Generation Encoding

  25. Reed-Solomon Codes – An example Division Generation Encoding

  26. Reed-Solomon Codes – An example Division Generation Encoding

  27. Reed-Solomon Codes – An example • To decode, based on the linear algebra, we can solve linearequations: • By reconstructing the values of a and b • We get

  28. Self-repairing

  29. Self-repairing • Now for n = 7, and say, 1,w1,w2,w4,w5,w8,w10, after encoding, we get: • (p(1), p(w), p(w2), p(w4), p(w5), p(w8), p(w10)) , storing in each node. (wis a root of an irreducible monic polynomial of degree m over F2)

  30. Suppose node 5 which stores p(w5) goes offline. A new comer can get p(w5) by asking for p(w2) and p(w), since Self-repairing (cont’)

  31. Self-repairing (cont’) • Table I shows other examples of missing fragments and which pairs can reconstruct them, depending on if 1, 2, or 3 fragments are missing at the same time.

  32. Evaluation Details

  33. Static Resilience Analysis • Static resilience of a distributed storage system is defined as the probability that an object, once stored in the system, will continue to stay available without any further maintenance, even when a certain fraction of individual member nodes of the distributed system become unavailable.

  34. A network matrix representation • Recall that using the above coding strategy, an object o of length M is decomposed into k fragments of length M/k: • which are further encoded into n fragments of same length: • We thus have n nodes each possessing a binary vector of length M/k, which can be represented as an n × M/k binary matrix:

  35. EvaluationMetrics • R(x, d, r) as the number of x × d sub-matrices with rank r, voluntarily including all the possible permutations of the rows in the counting. • ρ(x, d, r) be the fraction of sub-matrices of dimension x×d with rank r out of all possible sub-matrices of the same dimension. Then

  36. EvaluationMetrics(cont’) • Using an HSRC(n, k), the probability pobj of recovering the object is • If we use a (n, k) erasure code, then the probability that the object is recoverable is:

  37. Experiment results

  38. Experiment results(cont’)

  39. Experiment results(cont’)

  40. Discussion

  41. Bandwidth usage • Figure 2 shows the average amount of network traffic to transfer encoded fragments per lost fragment when the various lazy variants of repair are used, namely parallel and sequential repairs with SRC, and (by default, sequential) repair when using EC.

  42. Repairs of in parallel • A final advantage of SRC which we further showcase next is the possibility to carry out repairs of different fragments independently and in parallel (and hence, quickly) • SRC allows for fast reconstruction of missing blocks. Orchestration of such distributed reconstruction to fully utilize this potential in itself poses interesting algorithmic and systems research challenges

  43. Conclusion • The authors propose a new family of codes, called self-repairing codes • The characteristics of distributed networked storage systems • Low-bandwidth consumption for repairs • Parallel and independent replenishment of lost redundancy

  44. Q & A

  45. Thank you!Contact me via elfinhe@gmail.com

More Related