Network coding for distributed storage systems

Network Coding for Distributed Storage Systems. IEEE TRANSACTIONS ON INFORMATION THEORY, SEPTEMBER 2010 Alexandros G. Dimakis Brighten Godfrey Yunnan Wu Martin J. Wainwright Kannan Ramchandran. Outline. Introduction Background Analysis Evaluation Conclusion. Introduction.


Network Coding for Distributed Storage Systems

IEEE TRANSACTIONS ON INFORMATION THEORY, SEPTEMBER 2010

Alexandros G. Dimakis

Brighten Godfrey

Yunnan Wu

Martin J. Wainwright

Kannan Ramchandran


Outline

  • Introduction

  • Background

  • Analysis

  • Evaluation

  • Conclusion


Introduction

  • Distributed storage systems provide reliable access to data through redundancy spread over individually unreliable nodes.

  • Storing data with erasure coding in distributed storage systems:

    • the encoded data are spread across nodes.

    • erasure coding requires less redundancy than replication.

    • stored data must be replaced periodically as nodes fail.


Introduction

  • Key issues in distributed storage systems:

    • repair bandwidth

    • storage space

  • How can encoded data be generated in a distributed way while transferring as little data as possible?


MDS Codes

  • Common practice for repairing a single node failure in an erasure-coded system:

    • a new node reconstructs the whole encoded data object.

    • it then generates just one encoded block.

  • Maximum Distance Separable (MDS) codes

    • (n, k)-MDS property:

    • the original file can be recovered from any k of the n encoded blocks.
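
The (n, k)-MDS property can be illustrated with a toy Reed-Solomon-style code. This is a minimal sketch, not the construction used in the paper: the prime field size `P = 257`, the two-symbol file, and the helper names `encode` and `decode` are assumptions chosen for illustration. The file symbols are treated as polynomial coefficients, each node stores one evaluation, and any k evaluations determine the polynomial.

```python
from itertools import combinations

P = 257  # small prime field size (an assumption for this sketch)

def encode(data, n):
    """Evaluate the polynomial with coefficients `data` at x = 1..n (mod P)."""
    return [(x, sum(c * pow(x, j, P) for j, c in enumerate(data)) % P)
            for x in range(1, n + 1)]

def decode(blocks, k):
    """Recover the k coefficients from k (x, y) blocks via Gauss-Jordan elimination mod P."""
    rows = [[pow(x, j, P) for j in range(k)] + [y] for x, y in blocks[:k]]
    for col in range(k):
        # Distinct x values give an invertible Vandermonde system, so a pivot exists.
        pivot = next(r for r in range(col, k) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, P)  # modular inverse (Python 3.8+)
        rows[col] = [v * inv % P for v in rows[col]]
        for r in range(k):
            if r != col and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(a - factor * b) % P for a, b in zip(rows[r], rows[col])]
    return [rows[j][k] for j in range(k)]

data = [42, 7]                      # k = 2 file symbols
blocks = encode(data, n=4)          # one block per node, n = 4
recovered = [decode(list(pair), k=2) for pair in combinations(blocks, 2)]
```

Every one of the six 2-node subsets in `recovered` yields the original `data`, which is what "any k of n" means for a (4, 2) code.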


MDS Codes

[Figure: a file of size M is divided into fragments of size M/k, MDS-encoded, and the encoded fragments of size M/k are stored at n nodes.]


Introduction

  • Redundancy must be continually refreshed as nodes fail in distributed storage systems.

    • this requires large data transfers across the network.


Introduction

  • Erasure codes can be repaired without communicating the whole data object.

  • (4, 2)-MSR example when a node fails:

    • the surviving nodes generate smaller parity packets of their data.

    • they forward them to the newcomer.

    • the newcomer mixes the packets to generate two new packets.

[Figure: (4, 2)-MSR repair example; every label in the figure reads 0.5.]


Introduction

  • This paper identifies an optimal tradeoff curve between storage and repair bandwidth.

    • smaller storage space => less redundancy => more repair bandwidth

  • The paper calls codes that lie on this optimal tradeoff curve regenerating codes.


Introduction

  • Minimum-Storage Regenerating (MSR) codes

    • can be efficiently repaired.

  • Minimum-Bandwidth Regenerating (MBR) codes

    • each storage node stores slightly more than M/k.

    • the repair bandwidth can be reduced.


Outline

  • Introduction

  • Background

  • Analysis

  • Evaluation

  • Conclusion


Erasure Codes

  • Classical coding theory focuses on the tradeoff between redundancy and error tolerance.

  • In terms of the redundancy-reliability tradeoff, Maximum Distance Separable (MDS) codes are optimal.

    • the best-known example is Reed-Solomon codes.


Network Coding

  • Network coding allows

    • the intermediate nodes to generate output data by encoding previously received input data.

    • information to be “mixed” at intermediate nodes.

  • This paper investigates the application of network coding for the repair problem in distributed storage.

    • tradeoff between storage and repair network bandwidth


Distributed Storage Systems

  • Erasure codes could reduce bandwidth use by an order of magnitude compared with replication.

  • Hybrid strategy:

    • one special storage node maintains one full replica.

    • multiple erasure-encoded fragments are also stored.

    • the replica node transfers only M/k bytes to create a new encoded fragment.

    • a problem arises when the replica is lost.


Outline

  • Introduction

  • Background

  • Analysis

  • Evaluation

  • Conclusion



Storage-Bandwidth Tradeoff

  • The normal redundancy we want to maintain requires n active storage nodes

    • each storing α bits

    • when a node fails, a newcomer downloads β bits each from any d surviving nodes

    • total repair bandwidth is γ = dβ

  • For each set of parameters (n, k, d, α, γ), there is a family of information flow graphs, each of which corresponds to a particular evolution of node failures / repairs.


Storage-Bandwidth Tradeoff

  • Denote this family of directed acyclic graphs by G(n, k, d, α, γ).

    • the point (4, 2, 3, 1 Mb, 1.5 Mb) is feasible.


Storage-Bandwidth Tradeoff

  • Theorem 1: For any α ≥ α*(n, k, d, γ), the points (n, k, d, α, γ) are feasible.



Theorem Proof (2/4)


Theorem Proof (3/4)


Theorem Proof (4/4)


Storage-Bandwidth Tradeoff

  • Code repair can be achieved if and only if the underlying information flow graph has sufficiently large min-cuts.
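
The min-cut condition can be checked numerically. The sketch below assumes the bound from the paper's analysis, namely that a point is feasible when the sum over i = 0..k−1 of min(α, (d − i)β) is at least the file size M; that bound does not appear in this transcript, so treat it as an assumption. The function name `is_feasible` and the file size M = 2 Mb used for the (4, 2, 3, 1 Mb, 1.5 Mb) point are also assumptions.

```python
from fractions import Fraction

def is_feasible(k, d, alpha, gamma, file_size):
    """Check the assumed min-cut bound: sum_{i<k} min(alpha, (d - i) * beta) >= M."""
    beta = Fraction(gamma) / d  # gamma = d * beta
    cut = sum(min(Fraction(alpha), (d - i) * beta) for i in range(k))
    return cut >= file_size

# (n, k, d, alpha, gamma) = (4, 2, 3, 1 Mb, 1.5 Mb), assuming a file of M = 2 Mb
ok = is_feasible(k=2, d=3, alpha=1, gamma=Fraction(3, 2), file_size=2)

# Same storage but less repair bandwidth (1.2 Mb): the bound is no longer met
too_little = is_feasible(k=2, d=3, alpha=1, gamma=Fraction(6, 5), file_size=2)
```

Under these assumptions `ok` is true and `too_little` is false: with α fixed at 1 Mb, cutting γ below 1.5 Mb leaves the min-cut short of the file size.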


Storage-Bandwidth Tradeoff

  • Optimal tradeoff curve between storage α and repair bandwidth γ

    • (γ = 1, α = 0.2), (γ = 1, α = 0.1)
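
Points on the tradeoff curve can be computed by solving for the smallest feasible α at a given γ. This is a sketch under an assumed min-cut bound (the sum over i = 0..k−1 of min(α, (d − i)β) must reach M), which is not stated in this transcript. The function name `alpha_star` is hypothetical, and the parameter sets (k = 5, d = 9) and (k = 10, d = 14) with M = 1 are assumptions chosen because they reproduce α = 0.2 and α = 0.1 at γ = 1.

```python
from fractions import Fraction

def alpha_star(k, d, gamma, file_size):
    """Smallest alpha with sum_{i<k} min(alpha, (d - i) * beta) >= M, or None if none exists."""
    beta = Fraction(gamma) / d
    caps = sorted((d - i) * beta for i in range(k))  # per-term caps, ascending
    used = Fraction(0)  # sum of the caps that are already saturated
    for j in range(k):
        # Suppose the j smallest terms are capped and the other k - j terms equal alpha.
        alpha = (file_size - used) / (k - j)
        lower = caps[j - 1] if j > 0 else Fraction(0)
        if lower <= alpha <= caps[j]:
            return alpha
        used += caps[j]
    return None  # even with unlimited storage the cut stays below M

a1 = alpha_star(k=5, d=9, gamma=1, file_size=1)
a2 = alpha_star(k=10, d=14, gamma=1, file_size=1)
```

Returning `None` marks the region where γ is too small for any amount of storage to help, which is where the curve ends on the low-bandwidth side.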


Special Cases (1/2)

  • Minimum-Storage Regenerating (MSR) Codes


Special Cases (2/2)

  • Minimum-Bandwidth Regenerating (MBR) Codes
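
The two extreme points of the tradeoff can be written in closed form. The expressions below are the ones recalled from the paper, not taken from this transcript, so treat them as assumptions: the MSR point is (α, γ) = (M/k, Md / (k(d − k + 1))) and the MBR point has α = γ = 2Md / (2kd − k² + k). The function names are hypothetical.

```python
from fractions import Fraction

def msr_point(k, d, file_size):
    """(alpha, gamma) at the minimum-storage end of the curve: alpha = M / k."""
    alpha = Fraction(file_size, k)
    gamma = Fraction(file_size * d, k * (d - k + 1))
    return alpha, gamma

def mbr_point(k, d, file_size):
    """(alpha, gamma) at the minimum-bandwidth end of the curve: alpha = gamma."""
    gamma = Fraction(2 * file_size * d, 2 * k * d - k * k + k)
    return gamma, gamma

msr = msr_point(k=2, d=3, file_size=2)   # assumed file size M = 2 Mb
mbr = mbr_point(k=2, d=3, file_size=2)
```

For k = 2, d = 3 and M = 2 Mb, the MSR point comes out as (1, 3/2), matching the (4, 2, 3, 1 Mb, 1.5 Mb) example, and the MBR point as (6/5, 6/5): slightly more storage per node in exchange for less repair traffic.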


Outline

  • Introduction

  • Background

  • Analysis

  • Evaluation

    • Node Dynamics and Objectives

    • Model

    • Quantitative Results

  • Conclusion


Node Dynamics and Objectives (1/2)

  • A permanent failure

    • the permanent departure of a node from the system

    • a disk failure resulting in loss of the data stored on the node

  • A transient failure

    • node reboot

    • temporary network disconnection


Node Dynamics and Objectives (2/2)

  • A file is available if

    • it can be reconstructed from the data stored on currently available nodes.

  • A file is durable if

    • after permanent node failures, it may be available at some point in the future.


Model (1/5)

  • The model has two key parameters, f and a.

    • a fraction f of the nodes storing file data fail permanently per unit time.

    • at any given time, a node storing data is available with some probability a.

  • For a file of M bytes, the expected availability and maintenance bandwidth of various redundancy schemes can be computed.


Model (2/5)

  • Replication

    • redundancy of R replicas

    • store total RM bytes

    • replace fRM bytes per unit time

    • the file is unavailable if no replica is available

      • probability (1 − a)^R, assuming nodes are available independently

  • Ideal Erasure Codes

    • n = kR, redundancy R = n / k

    • transfer just M / k bytes for each packet

    • replace fRM bytes per unit time

    • unavailability probability U(n, k): the probability that fewer than k of the n packets are available


Model (3/5)

  • Hybrid

    • n = k(R − 1)

    • store total RM bytes

    • transfer fRM bytes per unit time

    • the file is unavailable if the replica is unavailable and fewer than k erasure-coded packets are available

      • probability (1 − a) · U(n, k)
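
The unavailability events for replication, ideal erasure codes, and the hybrid scheme can be turned into numbers. This sketch assumes each node is available independently with probability a, which turns "fewer than k of n packets available" into a binomial tail. The function names are hypothetical.

```python
from math import comb

def unavailability_replication(a, R):
    """No replica is available: all R replicas are down at once."""
    return (1 - a) ** R

def unavailability_ideal(a, n, k):
    """Fewer than k of the n erasure-coded packets are available."""
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k))

def unavailability_hybrid(a, k, R):
    """The replica is down and fewer than k of n = k(R - 1) packets are available."""
    n = k * (R - 1)
    return (1 - a) * unavailability_ideal(a, n, k)

u_rep = unavailability_replication(a=0.5, R=2)
u_ideal = unavailability_ideal(a=0.5, n=4, k=2)
u_hybrid = unavailability_hybrid(a=0.5, k=2, R=3)
```

The availability value a = 0.5 is only a convenient test input, not a figure from the presentation.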


Model (4/5)

  • Minimum-Storage Regenerating Codes

    • store total RM bytes

    • redundancy R = n / k

    • replace fRM bytes per unit time

    • extra amount of information

    • unavailability


Model (5/5)

  • Minimum-Bandwidth Regenerating Codes

    • store total M n bytes

    • redundancy R = n / k

    • replace f M n bytes per unit time

    • extra amount of information

    • unavailability
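
The "extra amount of information" for regenerating codes can be sketched as an overhead factor relative to the fragment size M/k. The factors below are assumptions recalled from the paper for the case where a newcomer contacts d = n − 1 nodes; they do not appear in this transcript: δ_MSR = (n − 1) / (n − k) and δ_MBR = 2(n − 1) / (2n − k − 1). The bandwidth formulas multiply the number of failing nodes per unit time (f · n) by the data downloaded per repair; all function names are hypothetical.

```python
from fractions import Fraction

def delta_msr(n, k):
    """Assumed MSR repair overhead relative to M / k, with d = n - 1."""
    return Fraction(n - 1, n - k)

def delta_mbr(n, k):
    """Assumed MBR repair (and storage) overhead relative to M / k, with d = n - 1."""
    return Fraction(2 * (n - 1), 2 * n - k - 1)

def msr_bandwidth(f, M, n, k):
    """Bytes transferred per unit time: f * n repairs, each moving delta_msr * M / k."""
    return f * n * Fraction(M, k) * delta_msr(n, k)

def mbr_bandwidth(f, M, n, k):
    """Bytes transferred per unit time: f * n repairs, each moving delta_mbr * M / k."""
    return f * n * Fraction(M, k) * delta_mbr(n, k)

bw_msr = msr_bandwidth(Fraction(1, 10), 100, n=4, k=2)
bw_mbr = mbr_bandwidth(Fraction(1, 10), 100, n=4, k=2)
```

With n = 4 and k = 2 the factors are 3/2 and 6/5, so for the same failure rate the MBR code moves less repair traffic than the MSR code.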


Estimating f and a


Quantitative Results (1/2)


Quantitative Results (2/2)


Quantitative Comparison

  • Comparison with Hybrid

    • Disadvantage: asymmetric design

  • MBR codes

    • Disadvantage:

      • reconstructing the entire file requires communication with n − 1 nodes.

      • if the reading frequency of a file is sufficiently high and k is sufficiently small, this inefficiency could become unacceptable.


Outline

  • Introduction

  • Background

  • Analysis

  • Evaluation

  • Conclusion


Conclusion

  • This paper presented a general theoretical framework that

    • determines the information that must be communicated to repair failures in encoded systems.

    • identifies a tradeoff between storage and repair bandwidth.

  • One potential application area for the proposed regenerating codes is distributed archival storage or backup.

    • regenerating codes can potentially offer desirable tradeoffs in terms of redundancy, reliability, and repair bandwidth.

