
236601 - Coding and Algorithms for Memories Lecture 13



Presentation Transcript


  1. 236601 - Coding and Algorithms for Memories, Lecture 13

  2. Large Scale Storage Systems • Big Data Players: Facebook, Amazon, Google, Yahoo, … • [Figure: cluster of machines running Hadoop at Yahoo! (Source: Yahoo!)] • Failures are the norm

  3. Node failures at Facebook • [Figure: node failures at Facebook, plotted by date] • Source: "XORing Elephants: Novel Erasure Codes for Big Data," M. Sathiamoorthy, M. Asteris, D. Papailiopoulos, A. G. Dimakis, R. Vadali, S. Chen, and D. Borthakur, VLDB 2013

  4. Problem Setup • Disks are stored together in a group (rack) • Disk failures should be supported • Requirements: • Support as many disk failures as possible • And yet… • Optimal and fast recovery • Low complexity

  5. Reed Solomon Codes • A code with a parity check matrix of the form [matrix not preserved in the transcript], where α is a primitive element of some extension field and its order satisfies O(α) > n-1 • Claim: Every sub-matrix of size d×d has full rank
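The slide's matrix is not preserved in this transcript, so the following is only a sketch under assumptions: it takes the usual Vandermonde form with entries α^(i·j) for rows i = 0..d-1 and columns j = 0..n-1, and it uses the prime field GF(17) with α = 3 in place of an extension field to keep the arithmetic short. The values n = 8 and d = 3 and all function names are hypothetical. It checks the claim by brute force over every choice of d columns.

```python
from itertools import combinations

P = 17       # prime field GF(17); an assumption standing in for the extension field
ALPHA = 3    # 3 is a primitive element of GF(17), so its order is 16 > n - 1
N, D = 8, 3  # hypothetical code length and number of parity-check rows


def parity_check_matrix(n=N, d=D, alpha=ALPHA, p=P):
    """Assumed Vandermonde form: entry (i, j) is alpha^(i*j) mod p."""
    return [[pow(alpha, i * j, p) for j in range(n)] for i in range(d)]


def rank_mod_p(rows, p=P):
    """Rank of a matrix over GF(p), computed by Gauss-Jordan elimination."""
    m = [r[:] for r in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] % p), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col] % p:
                f = m[r][col]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def all_square_submatrices_full_rank(n=N, d=D):
    """True if every d x d sub-matrix (any d columns) has rank d."""
    h = parity_check_matrix(n, d)
    return all(
        rank_mod_p([[row[j] for j in cols] for row in h]) == d
        for cols in combinations(range(n), d)
    )
```

Each chosen set of d columns is itself a Vandermonde matrix on distinct powers of α, which is why the condition O(α) > n-1 matters: it keeps the n column generators distinct.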

  6. Reed Solomon Codes • Advantages: • Support the maximum number of disk failures • Are very common in practice and have relatively efficient encoding/decoding schemes • Disadvantages: • Require working over large fields. Solution: EvenOdd Codes • Need to read all the disks in order to recover even a single disk failure, so rebuild is not efficient. Solution: ZigZag Codes

  7. The Repair Problem • Facebook’s storage scheme, an RS code: • 10 data blocks • 4 parity blocks • Can tolerate any four disk failures • [Figure: blocks 1 2 3 4 5 6 7 8 9 10 P1 P2 P3 P4] • A disk is lost – Repair job starts • Access, read, and transmit data of disks! • Overuse of system resources during single repair • Goal: Reduce repair cost in a single disk repair

  8. ZigZag Codes • Designed by Itzhak Tamo, Zhiying Wang, and Jehoshua Bruck • The goal: construct codes correcting the max number of erasures and yet allow efficient reconstruction if only a single drive fails

  9. ZigZag Codes • Lower bound: The min amount of data required to be read to recover a single drive failure • (n,k) code: n drives, k information drives, and n-k redundancy drives • M: size of a single drive in bits • For an (n,n-2) code it is required to read at least 1/2 of each of the remaining drives, that is, at least (1/2)(n-1)M bits • The last example is optimal • In general, for an (n,n-r) code it is required to read at least 1/r of each of the remaining drives, that is, at least (1/r)(n-1)M bits
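To make the bound concrete, here is a small sketch that evaluates (1/r)(n-1)M and compares it with a conventional MDS repair, assumed here to read k whole surviving drives. The function names are hypothetical, and the (14,10) parameters are taken from the Facebook scheme on slide 7, with the drive size normalized to M = 1.

```python
from fractions import Fraction


def naive_mds_read(k, drive_size):
    """Data read when k whole surviving drives are used to decode (assumed baseline)."""
    return k * drive_size


def repair_lower_bound(n, k, drive_size):
    """Lower bound (1/r)(n-1)M on data read to rebuild one drive, with r = n - k."""
    r = n - k
    return Fraction(n - 1, r) * drive_size


if __name__ == "__main__":
    n, k, M = 14, 10, 1
    print("naive read :", naive_mds_read(k, M))         # 10 drive-sizes
    print("lower bound:", repair_lower_bound(n, k, M))  # 13/4 drive-sizes
```

Under these assumptions the bound is 13/4 of a drive against 10 drives for the baseline, which is the gap the ZigZag construction aims to close.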

  10. ZigZag Codes • Example

  11. Network Coding for Distributed Storage • Goal: show the following: in general, for an (n,n-r) code it is required to read at least 1/r of each of the remaining drives, (1/r)(n-1)M bits • "Network Coding for Distributed Storage," Dimakis, Godfrey, Wu, Wainwright, Ramchandran • File of size M is partitioned into k pieces of size M/k • The k pieces are encoded into n encoded pieces using an (n,k) MDS code
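The partition-then-encode step can be sketched with the simplest MDS code, a single XOR parity, which gives n = k+1. This is an illustrative stand-in, not the code used in the lecture; the function names and the zero-padding rule are assumptions.

```python
def split_into_pieces(data, k):
    """Partition a file into k pieces of equal size, zero-padding the tail."""
    piece_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(piece_len * k, b"\0")
    return [padded[i * piece_len:(i + 1) * piece_len] for i in range(k)]


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_single_parity(pieces):
    """(k+1, k) MDS code: the k pieces plus one XOR parity piece."""
    parity = pieces[0]
    for piece in pieces[1:]:
        parity = xor_bytes(parity, piece)
    return pieces + [parity]


def recover_erased(encoded, erased_index):
    """Rebuild one erased piece by XORing all the surviving pieces."""
    survivors = [p for i, p in enumerate(encoded) if i != erased_index]
    rebuilt = survivors[0]
    for piece in survivors[1:]:
        rebuilt = xor_bytes(rebuilt, piece)
    return rebuilt


if __name__ == "__main__":
    encoded = encode_single_parity(split_into_pieces(b"coding for memories!", 4))
    print(recover_erased(encoded, 2) == encoded[2])
```

With r = 1 the repair reads every surviving piece in full, which agrees with the (1/r)(n-1)M bound; the savings only appear for r ≥ 2.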

  12. Network Coding for Distributed Storage • File of size M is partitioned into k pieces of size M/k • The k pieces are encoded into n encoded pieces using an (n,k) MDS code • [Figure labels: x1, x2, x3, x4, y1, y2]

  13. Network Coding for Distributed Storage • File of size M is partitioned into k pieces of size M/k • The k pieces are encoded into n encoded pieces using an (n,k) MDS code • [Figure labels: x1, x2, x3, x4, x5, y1, y2; edges labeled β, with β=?]

  14. Network Coding for Distributed Storage • File of size M is partitioned into k pieces of size M/k • The k pieces are encoded into n encoded pieces using an (n,k) MDS code • [Figure: information flow graph. Labels: S; x1in, x1out, x2in, x2out, x3in, x3out, x4in, x4out, x5in, x5out; edges labeled α=1, β (with β=?), and ∞; DC]
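The question β=? in the figure can be explored numerically with the min-cut bound from the Dimakis et al. paper: a data collector that contacts k nodes sees a cut of at least the sum over i = 0..k-1 of min(α, (d-i)β), and this must be at least the file size M. The sketch below assumes d helper nodes per repair (the symbol d is not on the slide) and reads the figure as n = 4, k = 2, α = 1, so M = 2 and d = 3; these values and the function names are assumptions.

```python
from fractions import Fraction


def min_cut_bound(alpha, beta, k, d):
    """Worst-case min cut between source and data collector: sum of min(alpha, (d-i)*beta)."""
    return sum(min(alpha, (d - i) * beta) for i in range(k))


def can_store(file_size, alpha, beta, k, d):
    """True if the min-cut bound still allows a file of the given size."""
    return min_cut_bound(alpha, beta, k, d) >= file_size


def min_storage_beta(file_size, k, d):
    """Per-helper download at the minimum-storage point alpha = M/k: M / (k(d-k+1))."""
    return Fraction(file_size, k * (d - k + 1))


if __name__ == "__main__":
    M, k, d, alpha = 2, 2, 3, Fraction(1)
    beta = min_storage_beta(M, k, d)
    print("beta =", beta, "total repair download =", d * beta)
    print(can_store(M, alpha, beta, k, d))                # feasible at beta
    print(can_store(M, alpha, Fraction(49, 100), k, d))   # infeasible just below
```

Under these assumptions β = 1/2 and the total download is dβ = 3/2, which matches (1/r)(n-1) times the drive size for n = 4, r = 2, and drive size α = 1.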

  15. ZigZag Codes • Example
