Erasure Code Replication

Presenter: W.K. Lin (The Chinese University of Hong Kong)

Why do we need replication? • Storage devices can fail. • Replication increases data availability, e.g. RAID. • The basic idea of replication: • Store copies of the data in different places to increase the chance that at least one copy can be found. • P2P systems often provide replication.

Server-less VoD Architecture • There is no centralized video server to provide the video stream. • Each client in the system stores a subset of the video blocks. • The video blocks are encoded with an erasure code before they are stored. • It is not necessary to stream from all peers for complete video playback. • Clients stream the video from other clients.

Some Terminology • Peers are the computers or storage devices that store the data. • Peer availability μ is the fraction of time that a peer is up (online). • File availability A is the probability that the file can be recovered from the replicated data. • Storage overhead S is the ratio of the storage required after replication to the storage required before replication.

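Whole File Replication • Whole file replication replicates the complete file. • If the storage overhead is S, then there are S copies of the file in the system. • File availability A_w, assuming each copy is held by a peer that is up independently with probability μ (the file is available if at least one copy is up): $A_w = 1 - (1 - \mu)^S$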
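To make the whole-file case concrete, here is a minimal sketch that evaluates the chance that at least one of S copies is online, i.e. 1 - (1 - μ)^S. It assumes peers are up independently with the same availability μ; the function name `whole_file_availability` is only illustrative and does not come from the slides.

```python
def whole_file_availability(mu: float, copies: int) -> float:
    """Probability that at least one of `copies` independent replicas is online.

    Assumption: every peer is up independently with the same probability mu.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must be in [0, 1]")
    if copies < 1:
        raise ValueError("copies must be >= 1")
    # The file is unavailable only when all copies are offline at once.
    return 1.0 - (1.0 - mu) ** copies


if __name__ == "__main__":
    # Each extra copy helps less than the one before it.
    for s in (1, 2, 4, 8):
        print(s, round(whole_file_availability(0.5, s), 4))
```

The diminishing return per extra copy is what the slides refer to when they call whole file replication storage-inefficient.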

Whole File Replication • It is not storage-efficient. (Figure adapted from: Replication Strategies for Highly Available Peer to Peer Networks, Ranjita Bhagwan et al.)

Erasure Code Replication • Instead of replicating the whole file, replicate a portion of the file. • Principle: • A file is divided into b blocks. • Use an erasure code to add redundancy to these b blocks. We then have n blocks in total. • The n file blocks are made dependent on each other: each block carries partial information about the other blocks. • Any b out of the n blocks are enough to recover the original file.
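The "any b out of n" property can be illustrated with a toy code. The sketch below is not the code used in the presentation (the slides do not name one); it assumes a Reed-Solomon-style construction over the small prime field GF(257), where the b data symbols are the coefficients of a polynomial and each stored block is one evaluation of it. The names `P`, `encode` and `decode` are hypothetical.

```python
from itertools import combinations

P = 257  # prime modulus (assumption); every data symbol must be an integer in [0, P)


def encode(data, n):
    """Treat the b symbols in `data` as polynomial coefficients; evaluate at x = 1..n."""
    if not len(data) <= n < P:
        raise ValueError("need b <= n < P")
    blocks = []
    for x in range(1, n + 1):
        y = 0
        for coeff in reversed(data):  # Horner's rule, highest degree first
            y = (y * x + coeff) % P
        blocks.append((x, y))
    return blocks


def decode(blocks, b):
    """Recover the b data symbols from any b distinct (x, y) blocks (Lagrange interpolation)."""
    if len(blocks) < b:
        raise ValueError("not enough blocks to recover the file")
    pts = blocks[:b]
    coeffs = [0] * b
    for j, (xj, yj) in enumerate(pts):
        basis = [1]  # numerator polynomial, lowest degree first
        denom = 1
        for m, (xm, _) in enumerate(pts):
            if m == j:
                continue
            nxt = [0] * (len(basis) + 1)  # basis * (x - xm)
            for k, c in enumerate(basis):
                nxt[k] = (nxt[k] - xm * c) % P
                nxt[k + 1] = (nxt[k + 1] + c) % P
            basis = nxt
            denom = (denom * (xj - xm)) % P
        scale = yj * pow(denom, -1, P) % P  # modular inverse, Python 3.8+
        for k, c in enumerate(basis):
            coeffs[k] = (coeffs[k] + scale * c) % P
    return coeffs


if __name__ == "__main__":
    data = [72, 101, 108]            # b = 3 data symbols
    blocks = encode(data, 6)         # n = 6 blocks, so S = n / b = 2
    for subset in combinations(blocks, 3):
        assert decode(list(subset), 3) == data
    print("every 3-block subset recovers the data")
```

Production systems use optimised codes over GF(2^8) or similar; the prime field here only keeps the arithmetic readable.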

Erasure Code Replication • Storage overhead S = n/b, so n = S·b. • Since any b out of the S·b blocks are enough to recover the file, the file availability A_b is: $A_b = \sum_{i=b}^{Sb} \binom{Sb}{i} \mu^i (1-\mu)^{Sb-i}$ • Notice that whole file replication is the special case of erasure code replication with b = 1.
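A short sketch of the availability calculation may help here. It computes the probability that at least b of the n = S·b blocks sit on online peers, which is a binomial tail, under the assumptions that peers fail independently with the same μ and that S is an integer. The function name `erasure_availability` is illustrative only.

```python
from math import comb


def erasure_availability(mu: float, s: int, b: int) -> float:
    """P(at least b of n = s*b blocks are online), each block online with prob. mu.

    Assumptions: independent peers, identical availability, integer overhead s.
    """
    n = s * b
    return sum(comb(n, i) * mu**i * (1.0 - mu) ** (n - i) for i in range(b, n + 1))


if __name__ == "__main__":
    mu, s = 0.5, 2
    # b = 1 is whole file replication: the result equals 1 - (1 - mu)**s.
    print(erasure_availability(mu, s, 1), 1.0 - (1.0 - mu) ** s)
    for b in (1, 2, 4, 8, 16):
        print(b, round(erasure_availability(mu, s, b), 4))
```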

Erasure Code Replication • Erasure code replication is more storage-efficient. (Figure adapted from: Replication Strategies for Highly Available Peer to Peer Networks, Ranjita Bhagwan et al.)

Effectiveness of Erasure Code Replication • The effectiveness of erasure code replication is determined by two factors: • the combinatorial effect, i.e. $\binom{Sb}{b} \gg \binom{S}{1}$ • the peer availability factor $\mu^b (1-\mu)^{Sb-b}$ • Erasure code replication depends on S, b, and μ.
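The two factors pull in opposite directions, which a few numbers make visible. The sketch below prints both for growing b: the count C(Sb, b) of ways to pick exactly b online blocks, and the probability μ^b (1-μ)^(Sb-b) of one specific such pattern. The helper name `leading_term_factors` is an assumption made for illustration, and integer S is assumed.

```python
from math import comb


def leading_term_factors(mu: float, s: int, b: int):
    """Return (combinatorial factor, peer availability factor) for exactly b blocks online."""
    n = s * b
    combinatorial = comb(n, b)                   # number of b-block subsets
    availability = mu**b * (1.0 - mu) ** (n - b)  # probability of one specific subset
    return combinatorial, availability


if __name__ == "__main__":
    mu, s = 0.5, 2
    for b in (1, 2, 4, 8):
        c, p = leading_term_factors(mu, s, b)
        print(f"b={b:2d}  C(Sb,b)={c:6d}  factor={p:.3e}  product={c * p:.4f}")
```

Whether the growth of the first factor outweighs the decay of the second depends on S, b and μ together.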

Effectiveness of Erasure Code Replication

How Does Erasure Code Replication Perform? • File availability A (A_w or A_b) as μ and S vary:

A Related Problem • The Lee and Liew paper “Parallel Communications for ATM Network Control and Management” points out a similar problem: • An information string is divided into b parts, then encoded into n parts. • Any b out of the n parts are enough to recover the original information. • Very similar to our problem! • They prove that Sμ > 1 is a necessary bound for reliable communication.

Erasure Code Bound (Sμ > 1) • The area above the curve defines the region in which erasure code replication is preferred for large b.
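The role of the bound shows up when b grows on either side of Sμ = 1. The sketch below evaluates the binomial-tail availability in log space (to avoid overflow for large n) at S = 3 with μ = 0.25 (Sμ = 0.75) and μ = 0.45 (Sμ = 1.35). These parameter values and the function name `availability` are chosen for illustration; independent, identical peers and integer S are assumed.

```python
from math import exp, lgamma, log


def availability(mu: float, s: int, b: int) -> float:
    """P(at least b of n = s*b blocks online), summed in log space; needs 0 < mu < 1."""
    n = s * b
    log_mu, log_q = log(mu), log(1.0 - mu)
    total = 0.0
    for i in range(b, n + 1):
        # log of C(n, i) * mu^i * (1 - mu)^(n - i)
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log_mu + (n - i) * log_q)
        total += exp(log_term)
    return min(total, 1.0)  # guard against rounding slightly above 1


if __name__ == "__main__":
    s = 3
    for mu in (0.25, 0.45):
        row = [round(availability(mu, s, b), 4) for b in (1, 10, 100)]
        print(f"S*mu = {s * mu:.2f}: A for b = 1, 10, 100 -> {row}")
```

Above the bound, availability rises toward 1 as b grows; below it, availability falls toward 0, so large b only pays off when Sμ > 1.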

Erasure Code Replication Sensitivity Analysis • We need to use a large b in order to benefit from erasure code replication. • If the system operates near Sμ ≈ 1, a small fluctuation in a system parameter can push it below the bound and harm file availability.

Erasure Code Replication Sensitivity Analysis • The system is targeted to operate at S = 3, μ = 0.35. • Sμ = 1.05 > 1 • Suppose there is a 10% measurement error in μ: if the true μ is 10% lower (μ = 0.315), then Sμ = 0.945 < 1.
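The effect of the measurement error can be checked numerically. The sketch below compares the availability at the target point (S = 3, μ = 0.35) with the availability when the true μ is 10% lower (μ = 0.315), for several values of b. It assumes independent peers with identical availability; the helper names `tail` and `availability_under_error` are hypothetical.

```python
from math import comb


def tail(mu: float, s: int, b: int) -> float:
    """P(at least b of n = s*b blocks online) for independent peers."""
    n = s * b
    return sum(comb(n, i) * mu**i * (1.0 - mu) ** (n - i) for i in range(b, n + 1))


def availability_under_error(s: int, mu_target: float, rel_error: float, b: int):
    """Return (availability at mu_target, availability if mu is rel_error lower)."""
    mu_actual = mu_target * (1.0 - rel_error)
    return tail(mu_target, s, b), tail(mu_actual, s, b)


if __name__ == "__main__":
    for b in (1, 10, 100):
        planned, actual = availability_under_error(3, 0.35, 0.10, b)
        print(f"b={b:3d}  planned={planned:.3f}  actual={actual:.3f}  "
              f"loss={planned - actual:.3f}")
```

The loss is small for whole file replication (b = 1) and much larger for large b, which is the sensitivity the slide describes.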

Related Work I: • Markov chain model for a simple birth/death model. (Figure adapted from: Design and Analysis of a Fault-Tolerant Mechanism for a Server-Less Video-On-Demand System, Lee and Yeung)

Related Work I: • Mean time to failure of the model: • Result:

Related Work II: • Another Markov model: • c: connected state, mean time to stay = λ • u: disconnected state, mean time to stay = μ • d: dead state • α: the probability of going to disconnected state d. (Figure adapted from: Data Durability in Peer to Peer Storage Systems, Gil Utard and Antoine Vernois)

Related Work II: Storage overhead S=3

Conclusion • Traditionally, erasure code replication has been very successful, e.g. RAID. • The strict bound Sμ > 1 has to be satisfied for a system to gain from erasure code replication. • Erasure code replication is sensitive to errors in the measured system parameters. • This partly explains why erasure code replication is not seen in P2P systems.

Future Directions • Most analyses are based on the assumption that all peers have the same availability level. • In a real system, peers may have different failure and recovery rates. • Replica distribution and discovery are open research problems: • How should replicas be placed and located if the peers have different availabilities? • If the system fails, how can the lost replicas be recovered?

~ End of presentation ~

Appendix • Proof: Let X be a binomial random variable with mean $\mu' = Sb\mu$ and variance $\sigma^2 = Sb\mu(1-\mu)$.

Appendix • Similarly,
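The derivation itself did not survive on these slides. As a sketch only, one standard way to finish such an argument is Chebyshev's inequality applied to the binomial variable X, the number of online blocks out of Sb. The choice of Chebyshev is an assumption; the original proof may use a different bound. Here A_b = P(X ≥ b).

```latex
% Assumption: Chebyshev's inequality; X ~ Binomial(Sb, \mu),
% \mu' = Sb\mu, \sigma^2 = Sb\mu(1-\mu), A_b = P(X \ge b).
\begin{align*}
\text{If } S\mu > 1:\quad
1 - A_b = P(X < b)
  &\le P\bigl(|X - \mu'| \ge b(S\mu - 1)\bigr) \\
  &\le \frac{\sigma^2}{b^2 (S\mu - 1)^2}
   = \frac{S\mu(1-\mu)}{b\,(S\mu - 1)^2}
   \xrightarrow[b \to \infty]{} 0. \\[4pt]
\text{If } S\mu < 1:\quad
A_b = P(X \ge b)
  &\le P\bigl(|X - \mu'| \ge b(1 - S\mu)\bigr) \\
  &\le \frac{S\mu(1-\mu)}{b\,(1 - S\mu)^2}
   \xrightarrow[b \to \infty]{} 0.
\end{align*}
```

Under this sketch, availability tends to 1 for large b when Sμ > 1 and to 0 when Sμ < 1, in line with the bound stated earlier.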
