Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding. Yuchong Hu, Yinlong Xu, Xiaozhao Wang, Cheng Zhan and Pei Li. IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 28, NO. 2, FEBRUARY 2010.

Presentation Transcript


  1. Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding. Yuchong Hu, Yinlong Xu, Xiaozhao Wang, Cheng Zhan and Pei Li. IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 28, NO. 2, FEBRUARY 2010

  2. Outline • Introduction • Problem Statement • Mutually Cooperative Recovery (MCR) • MCR Transmission and Coding Schemes • Conclusion

  3. Introduction • The recovery from multiple node failures in distributed storage systems. • We design a mutually cooperative recovery (MCR) mechanism for multiple node failures.

  4. Introduction • Dimakis et al. [8][10] prove that the file reconstruction problem in distributed storage systems is equivalent to the multicasting problem. • Two symmetric mechanisms for maintaining redundancy: • Minimum-storage regenerating (MSR) codes • Minimum-bandwidth regenerating (MBR) codes

  5. Introduction • In MCR, all the new nodes repair the lost data cooperatively and simultaneously. • We will prove that there exists a random linear coding scheme that achieves the minimal maintenance bandwidth in MCR while keeping the Strong-MDS property.

  7. Problem Statement • All nodes have identical storage capability. • Communications between any two nodes in a distributed storage system are symmetric. • The original file is (n, k) MDS encoded. • The n encoded fragments are stored evenly at n nodes chosen from the system. • When r nodes become unavailable, the system chooses another r nodes to repair the lost fragments.

  8. Problem Statement • As time goes by, the source node may leave the storage network. • Define a virtual source (VS). • Initial set of n nodes, . . ., • Each node stores one encoded fragment. • The destination node D can connect to any k nodes to download k fragments for the file reconstruction. • When r storage nodes become unavailable: • , . . . , (named "old nodes") • , . . . , (named "new nodes")
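The (n, k) MDS requirement above (any k of the n fragments suffice to rebuild the file) can be illustrated with a small sketch. This is not the code used in the paper: it assumes a Reed-Solomon-style construction over the prime field GF(257), and the names `encode` and `reconstruct` are hypothetical.

```python
import itertools

P = 257  # assumed prime field size; every file symbol must be < P


def encode(data, n):
    """Treat the k symbols as polynomial coefficients and store one
    evaluation per node, at the points x = 1..n (an (n, k) MDS code)."""
    return [(x, sum(c * pow(x, e, P) for e, c in enumerate(data)) % P)
            for x in range(1, n + 1)]


def reconstruct(fragments):
    """Recover the k symbols from any k fragments by solving the
    Vandermonde system with Gauss-Jordan elimination modulo P."""
    k = len(fragments)
    rows = [[pow(x, e, P) for e in range(k)] + [y] for x, y in fragments]
    for col in range(k):
        # Distinct evaluation points make the matrix invertible,
        # so a non-zero pivot always exists.
        pivot = next(i for i in range(col, k) if rows[i][col])
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, P)
        rows[col] = [v * inv % P for v in rows[col]]
        for i in range(k):
            if i != col and rows[i][col]:
                f = rows[i][col]
                rows[i] = [(a - f * b) % P for a, b in zip(rows[i], rows[col])]
    return [row[k] for row in rows]


data = [72, 105, 33]            # k = 3 file symbols
fragments = encode(data, n=6)   # one fragment per storage node
ok = all(reconstruct(list(subset)) == data
         for subset in itertools.combinations(fragments, 3))
print("every 3-of-6 subset reconstructs the file:", ok)
```

Here the destination D corresponds to the caller of `reconstruct`, which may pick any k of the n stored fragments.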

  9. Outline • Introduction • Problem Statement • Mutually Cooperative Recovery (MCR) • Model based on MCR • Assumptions in MCR • Information flow graph G(n, k, r, β) • Lower bound of maintenance bandwidth • Comparisons of MCR, MSR and MBR • MCR Transmission and Coding Schemes • Conclusion

  10. Mutually Cooperative Recovery (MCR) • Our study is based on the assumption that all the new nodes can complete their fragment reconstructions in a mutually cooperative way. • DSN(n, k, r).

  12. Model based on MCR • The repair process of our mutually cooperative recovery in DSN(n, k, r) is specified as follows.

  13. Model based on MCR

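  15. Assumptions in MCR • Assume that all βi,j and β′j,j are equal to β. • The total bandwidth overhead for the recovery is then r(n − 1)β, since each of the r new nodes receives β bytes from each of the n − r old nodes and from each of the other r − 1 new nodes.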

  17. Information flow graph G(n, k, r, β) • An information flow graph G(n, k, r, β), similar to the idea in [8]. • Nodes in G(n, k, r, β) • Edges in G(n, k, r, β)

  18. Information flow graph G(n, k, r, β)

  20. Lower bound of maintenance bandwidth • Finding the lower bound of β by studying the capacity of the min-cut of G(n, k, r, β). • To keep the (n, k) MDS property in DSN(n, k, r) • each of the capacities of min-cuts in all the possible information flow graphs must be ≥ the original file size M bytes.

  21. Lower bound of maintenance bandwidth • Lemma 1 • To keep the (n, k) MDS property in DSN(n, k, r), the min-cuts separating VS from D in all possible information flow graphs G(n, k, r, β) must be no smaller than M bytes.

  22. Lower bound of maintenance bandwidth • Lemma 2 • Let (S, S̄) be a cut of G(n, k, r, β), where VS ∈ S and D ∈ S̄. • If β ≥ M/[k(n − k)], the capacity of every min-cut of every possible G(n, k, r, β) is no smaller than M bytes.

  23. Lower bound of maintenance bandwidth • Proof of Lemma 2 • Assume that D connects to t new nodes and k − t old nodes for the file reconstruction. • Case 1 : is in . () • Case 2 : , .() • Case 3 : , but are in S.()

  24. Lower bound of maintenance bandwidth

  25. Lower bound of maintenance bandwidth Part 1: The sum of capacities of edges from to for in S and is (M/k).

  26. Lower bound of maintenance bandwidth Part 2: The sum of capacities of edges from to for in S and is β(n−r−)

  27. Lower bound of maintenance bandwidth Part 3: The sum of capacities of edges between and for in S and is β(r−).

  28. Lower bound of maintenance bandwidth • Part 4: The sum of capacities of edges from to for in S and is (M/k).

  29. Lower bound of maintenance bandwidth We analyze c(S, S̄) for two cases of (M/k − β.

  30. Lower bound of maintenance bandwidth • Case 1 : M/k − β ≥ 0

  31. Lower bound of maintenance bandwidth • Case 1 : M/k − β ≥ 0 • This means that there are some cases where the capacity of the min-cut is equal to M for β = M/[k(n − k)].

  32. Lower bound of maintenance bandwidth • Case 2 : M/k − β ≤ 0 • n − r ≥ k implies c(S, S̄) ≥ M

  33. Lower bound of maintenance bandwidth • Case 1 and Case 2 together establish Lemma 2: • Let (S, S̄) be a cut of G(n, k, r, β), where VS ∈ S and D ∈ S̄. • If β ≥ M/[k(n − k)], the capacity of every min-cut of every possible G(n, k, r, β) is no smaller than M bytes.

  34. Lower bound of maintenance bandwidth • Lemma 3: • There exists a random linear network coding scheme guaranteeing that D can reconstruct the original file for any connection choice, • with a probability that can be driven arbitrarily close to 1 by increasing the size of the field F.
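To give a feel for why a larger field helps in Lemma 3, the sketch below computes a simplified proxy rather than the lemma's actual bound: the exact probability that a uniformly random m × m matrix over a field with q elements is invertible, which is the product of (1 − q^−i) for i = 1..m. The decoding matrices that arise in MCR are not uniformly random, so this is an assumption-laden illustration only, and `invertible_fraction` is a hypothetical helper name.

```python
from fractions import Fraction


def invertible_fraction(q, m):
    """Exact probability that a uniformly random m x m matrix over a
    field with q elements is invertible: prod_{i=1..m} (1 - q**-i)."""
    prob = Fraction(1)
    for i in range(1, m + 1):
        prob *= 1 - Fraction(1, q ** i)
    return prob


# m = 6 would correspond to k(n - k) packets for n = 5, k = 2.
for q in (2, 2 ** 8, 2 ** 16):
    print(f"q = {q:>5}: P(invertible) = {float(invertible_fraction(q, 6)):.6f}")
```

The printed probabilities increase toward 1 as q grows, which is the qualitative behaviour the lemma relies on.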

  35. Lower bound of maintenance bandwidth • Theorem 1: • The (n, k) MDS property is still kept after the recovery if β is no smaller than M/[(n − k)k]. • Proof: by Lemma 2 and Lemma 3.

  37. Comparisons of MCR, MSR and MBR • MCR • Each node stores α = M/k bytes. • r nodes become unavailable. • Each of the r new nodes downloads β bytes from each of any d available nodes. • Storage cost = (M/k)*n • Maintenance bandwidth = [(n − 1)/(n − k)]*(M/k)*r
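The MCR cost formulas on this slide can be evaluated exactly with a short sketch. The function name `mcr_costs` and the example parameters are assumptions made for illustration; the formulas themselves are the ones stated above, with β taken at the bound M/[k(n − k)] from Theorem 1.

```python
from fractions import Fraction


def mcr_costs(M, n, k, r):
    """Storage and maintenance-bandwidth costs of MCR for a file of M bytes,
    using beta = M / [k(n - k)] (the bound from Theorem 1)."""
    if not (0 < k < n and 0 < r <= n - k):
        raise ValueError("need 0 < k < n and 0 < r <= n - k")
    alpha = Fraction(M, k)                  # bytes stored per node
    beta = Fraction(M, k * (n - k))         # bytes sent per link during repair
    return {
        "alpha": alpha,
        "beta": beta,
        "storage": alpha * n,                               # (M/k) * n
        "bandwidth": Fraction(n - 1, n - k) * alpha * r,    # [(n-1)/(n-k)] * (M/k) * r
    }


costs = mcr_costs(M=1200, n=6, k=2, r=2)
for name, value in costs.items():
    print(name, "=", value)
# The slide's bandwidth formula equals r * (n - 1) * beta.
print(costs["bandwidth"] == 2 * (6 - 1) * costs["beta"])
```

Using `Fraction` keeps the results exact, which makes it easy to confirm that the maintenance bandwidth is r(n − 1) packets of size β.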

  38. Comparisons of MCR, MSR and MBR • = (n-r)/k, = n/k. • For example, with n = 6k and r = 4k, the multi-loss recovery is triggered when 4k of the 6k original nodes become unavailable. • Set d to its maximum, d = n − r, to compare MCR with the best tradeoff of MSR and MBR.

  39. Comparisons of MCR, MSR and MBR • Compared with MSR • maintenance bandwidth: 22%. • storage cost: same. • Compared with MBR • maintenance bandwidth: same. • storage cost: 23%.

  40. Comparisons of MCR, MSR and MBR • Compared with MSR • maintenance bandwidth: 23%. • storage cost: same. • Compared with MBR • maintenance bandwidth: 11%. • storage cost: 23%.

  41. Comparisons of MCR, MSR and MBR • We can conclude that MCR has a better performance in the storage cost and maintenance bandwidth in multi-loss recovery of distributed storage systems compared with other non-cooperative recovery mechanisms.

  42. Outline • Introduction • Problem Statement • Mutually Cooperative Recovery (MCR) • MCR Transmission and Coding Schemes • Transmission scheme in MCR • Coding scheme in MCR • Conclusion

  43. MCR Transmission And Coding Schemes • Theorem 1 gives a lower bound of maintenance bandwidth with β = M/[k(n − k)]. • Constructing a recovery transmission scheme in MCR based on β = M/[k(n − k)] and a linear coding scheme based on Strong-MDS code.

  45. Transmission scheme in MCR • To satisfy β = M/[k(n − k)], the original file of size M is represented as k(n − k) packets in MCR, each of size M/[k(n − k)]. • Assumptions: • The entries of , , are randomly selected from a finite field F in the following scheme. • The file can be viewed as k(n − k) packets, each of size z.

  46. Transmission scheme in MCR • (1) File distribution • 1.1) Choose a set of n nodes , . . . , from idle nodes for a file distribution. • 1.2) Encode the k(n − k) packets of the original file into n(n − k) packets. • 1.3) Each initial node stores n − k packets as a fragment.

  47. Transmission scheme in MCR • (2) Data recovery from r failed nodes • 2.1) Choose a set of r new nodes , . . . , from idle nodes for repairing. • 2.2) Each old node transmits one encoded packet to each new node. • 2.3) Each new node transmits one encoded packet to each of the other new nodes. • 2.4) Each new node encodes the n − 1 received packets into n − k linearly independent packets. • 2.5) Each new node stores the n − k packets as a fragment.
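Steps (1) and (2) can be simulated with a minimal sketch under stated assumptions. It tracks only each packet's coefficient vector over the prime field GF(2^31 − 1) (payloads are omitted), draws all coding coefficients at random, and assumes that the packet a new node sends to another new node in step 2.3 is a random combination of the packets it received from old nodes, which the slides do not spell out. All function names are hypothetical.

```python
import itertools
import random

P = 2_147_483_647  # assumed field: the prime 2**31 - 1


def combine(vectors, rng):
    """Random linear combination of coefficient vectors over GF(P)."""
    coeffs = [rng.randrange(1, P) for _ in vectors]
    return [sum(c * v[i] for c, v in zip(coeffs, vectors)) % P
            for i in range(len(vectors[0]))]


def rank_mod_p(rows):
    """Rank of a matrix over GF(P) by Gauss-Jordan elimination."""
    rows = [r[:] for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, P)
        rows[rank] = [x * inv % P for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                f = rows[i][col]
                rows[i] = [(a - f * b) % P for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def distribute(n, k, rng):
    """Step (1): each of n nodes stores n - k random coded packets of a
    file made of k(n - k) source packets."""
    dim = k * (n - k)
    return [[[rng.randrange(P) for _ in range(dim)] for _ in range(n - k)]
            for _ in range(n)]


def recover(nodes, failed, rng):
    """Step (2): the nodes listed in `failed` are replaced by new nodes."""
    n = len(nodes)
    old = [i for i in range(n) if i not in failed]
    # 2.2) each old node sends one coded packet to each new node
    from_old = {j: [combine(nodes[i], rng) for i in old] for j in failed}
    # 2.3) each new node sends one coded packet to every other new node
    received = {j: list(from_old[j]) for j in failed}
    for j in failed:
        for other in failed:
            if other != j:
                received[other].append(combine(from_old[j], rng))
    # 2.4 / 2.5) re-encode the n - 1 received packets into n - k stored ones
    repaired = [list(node) for node in nodes]
    for j in failed:
        assert len(received[j]) == n - 1
        repaired[j] = [combine(received[j], rng) for _ in range(len(nodes[0]))]
    return repaired


def is_mds(nodes, k):
    """True if the packets of every set of k nodes span the whole file."""
    dim = len(nodes[0][0])
    return all(rank_mod_p([v for i in subset for v in nodes[i]]) == dim
               for subset in itertools.combinations(range(len(nodes)), k))


rng = random.Random(0)
nodes = distribute(n=5, k=2, rng=rng)
repaired = recover(nodes, failed=[3, 4], rng=rng)
print("MDS before repair:", is_mds(nodes, 2))
print("MDS after repair: ", is_mds(repaired, 2))
```

With a field this large, a random choice of coefficients preserves the any-k-nodes property with overwhelming probability. The check is empirical for one parameter set (n = 5, k = 2, r = 2) and does not replace the proof.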

  49. Coding scheme in MCR • (n, k) Strong-MDS code: • An original file is divided into k(n − k) packets and encoded into n(n − k) packets. • The n(n − k) encoded packets are stored at n nodes, each node storing n − k encoded packets. • Each node can select encoded packets such that the original file can be reconstructed from the k(n − k) selected packets. 0 ≤ ≤ n − k,

  50. Coding scheme in MCR • The coding scheme in MCR via linear coding is based on the following Lemma 4. • Lemma 4 (Schwartz-Zippel Theorem) [11] • Theorem 2 shows that if a file in a distributed storage system is Strong-MDS encoded, it still satisfies the Strong-MDS property after multi-loss recovery in MCR. [11] R. Motwani and P. Raghavan, Randomized Algorithms. Cambridge University Press, 1995.
