Rateless codes and random walks for P2P resource discovery in Grids

Rateless codes and random walks for P2P resource discovery in Grids • IEEE Transactions on Parallel and Distributed Systems, Nov. 2012 • Valerio Bioglio, Rossano Gaeta, Marco Grangetto, Matteo Sereno

Outline • Introduction • Related Work • Proposed System • Analysis • Simulation Results • Conclusion

Introduction • The system is a set of nodes connected to form a P2P network. • Each node holds a piece of information. • Nodes may join or leave dynamically. • The goal is for each peer to obtain a local view of the global information defined on all peers of an unstructured P2P network. • To do so, every node must communicate with all other participants to obtain their information.

Introduction • Many proposals that exploit unstructured P2P systems share a common characteristic: the interface peers • each belong to one administrative domain • connect to other interface peers • maintain the data of their local nodes • This paper assumes that • each peer holds a piece of information • any peer needs to access the data of all other peers at a rate of λ queries/sec.

Introduction • The goals are threefold: • Every node must be able to collect the complete global information. • The communication overhead must be limited. • The processing power of each node must be used parsimoniously.

Contribution • A continuous flow of control packets is exchanged among the nodes using the random walk principle. • The pieces of information combined by each node must belong to the same version. • The proposed solution is suitable for large data held by each node.

Related Work (1/2) • [6] analyzes flow control, i.e. the maximum rate at which a participant can submit updates without creating a backlog, and devises content reconciliation mechanisms to reduce message redundancy. • Algebraic Gossip, proposed in [11], is a gossip algorithm based on network coding; its spreading time is proved to be O(K). [6] “Efficient reconciliation and flow control for anti-entropy protocols,” in Proceedings of the 2nd Workshop on Large-Scale Distributed Systems and Middleware (LADIS ’08), ACM, 2008. [11] “Algebraic gossip: a network coding approach to optimal multiple rumor mongering,” IEEE Transactions on Information Theory, vol. 52, no. 6, pp. 2486–2507, Jun. 2006.

Related Work (2/2) • In [13], distributed fountain codes are proposed for networked storage. To create a new encoded packet, each storage node requests information from a randomly selected node of the network. • A similar algorithm is proposed in [14], but the coded packet formation mechanism is reversed. • In both schemes the nodes handle information gathering and encoding; in [16] this responsibility is assigned to the packets. [13] “Distributed fountain codes for networked storage,” in IEEE ICASSP, 2006. [14] “Data persistence in large-scale sensor networks with decentralized fountain codes,” in IEEE Infocom, 2007. [16] “Rateless packet approach for data gathering in wireless sensor networks,” IEEE Journal on Selected Areas in Communications, vol. 28, no. 9, pp. 1169–1179, Sep. 2010.

System Description (1/3) • This paper models the interface peers of a Grid system and the connections among them as a graph G(V, E). • V is the set of interface peers • E is the set of edges • v_l is the node ID • t_l is the time-stamp, i.e. the generation • each node holds a piece of information • each piece of information is m bits long

System Description (2/3) • The goal is a concurrent broadcast of all the information collected by all the nodes in the network. • All nodes should communicate with each other. • This paper proposes a fully distributed solution based on random walks. • Each node starts a limited number ω of packets. • Those packets are propagated through the network by random walk. • All nodes use the packets they receive to solve a system of linear equations.
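The propagation step can be pictured with a minimal sketch. It assumes a uniform choice among the current node's neighbors at every hop and a small ring topology; the function name `random_walk`, the topology, and the hop count are illustrative assumptions, not taken from the paper.

```python
import random

def random_walk(neighbors, start, hops, rng):
    """Return the nodes visited by one walker, starting node included."""
    path = [start]
    node = start
    for _ in range(hops):
        # Assumption: the next hop is drawn uniformly among the neighbors.
        node = rng.choice(neighbors[node])
        path.append(node)
    return path

# Hypothetical 5-node ring; node IDs are plain integers.
neighbors = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
rng = random.Random(0)
omega = 2  # number of walkers started by each node

# Every node starts omega walkers, each moving for 10 hops.
walks = [random_walk(neighbors, v, 10, rng)
         for v in neighbors for _ in range(omega)]
```

In the real system a walker would also pick up and drop equations at each visited node; this sketch only shows the movement.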

System Description (3/3) • Shortcomings of network coding • Added computational complexity • Solution • use simple XOR combinations • use rateless codes, known as LT codes • Impossibility of asynchronous updating • Solution • asynchronous updating (figure: Node A, Node B)
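A minimal sketch of an XOR combination follows. It assumes each piece of information is represented as an integer and that the degree is passed in explicitly; a real LT encoder would instead draw the degree from a degree distribution. The helper name `xor_combine` is hypothetical.

```python
import random
from functools import reduce

def xor_combine(pieces, degree, rng):
    """Pick `degree` distinct pieces and XOR them together.

    Returns the chosen indices and the combined value.
    """
    idx = sorted(rng.sample(range(len(pieces)), degree))
    combo = reduce(lambda a, b: a ^ b, (pieces[i] for i in idx))
    return idx, combo

# Hypothetical 4-bit pieces of information held by four nodes.
pieces = [5, 9, 12, 7]
idx, combo = xor_combine(pieces, 2, random.Random(1))

# XOR-ing the combination with one of its pieces recovers the other one.
other = combo ^ pieces[idx[0]]
```

Because XOR is its own inverse, a receiver that already knows all but one piece of a combination recovers the missing piece with a single XOR, which is what keeps the computational cost low.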

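Random Walk and LT Coding • [Figure: structure of a random walk packet. Labels: Header, d_F, equations eq1 and eq2, degree d_i, node IDs v1 to v4, time-stamps t1 to t4, combined message c.]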

Random Walk and LT Coding • When a packet approaches the maximum dimension DIM, the oldest equation it carries is deleted. • When the acknowledgement timer reaches 0, the receiving node notifies the originator that its random walker is still alive.
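The size limit can be sketched as a first-in, first-out buffer. The class name `WalkerPacket`, its fields, and the bit sizes below are assumptions made for illustration; the acknowledgement timer is left out.

```python
from collections import deque

class WalkerPacket:
    """Toy random-walk packet that drops its oldest equations when full."""

    def __init__(self, dim_bits, header_bits):
        # Space left for equations once the header is accounted for.
        self.capacity = dim_bits - header_bits
        self.equations = deque()  # (size_bits, equation), oldest on the left
        self.used = 0

    def add_equation(self, equation, size_bits):
        if size_bits > self.capacity:
            raise ValueError("equation larger than the packet payload")
        # Delete the oldest equations until the new one fits.
        while self.used + size_bits > self.capacity:
            old_size, _ = self.equations.popleft()
            self.used -= old_size
        self.equations.append((size_bits, equation))
        self.used += size_bits

# Hypothetical sizes: 120-bit packet, 20-bit header, 40-bit equations.
pkt = WalkerPacket(dim_bits=120, header_bits=20)
for name in ("eq1", "eq2", "eq3"):
    pkt.add_equation(name, 40)
```

With these sizes only two equations fit, so adding the third one evicts the first.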

Asynchronous Update and LT Coding (1/3) • The information spread by the random walkers can be recovered by any node as soon as a sufficient number of equations has been collected. • The decoding task can be formulated as the solution of the system of linear equations Gx = c. • G is an N×N binary matrix. • rows: N independent equations collected by the node • x is an N×1 column vector. • N unknown pieces of information • c is the vector of the corresponding buffered linear combinations.
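To make the system Gx = c concrete, here is a small solver over GF(2) based on plain Gauss-Jordan elimination. This is a stand-in chosen for brevity, not the decoder used in the paper. Rows of G are assumed to be stored as integer bitmasks and the entries of c as integers; `solve_gf2` is a hypothetical name.

```python
def solve_gf2(rows, payloads, n):
    """Solve G x = c over GF(2) by Gauss-Jordan elimination.

    rows[i]     : int bitmask, bit j is set if unknown j is in equation i
    payloads[i] : int, XOR of the unknowns combined in equation i
    Returns the n unknowns, or None if the equations have rank < n.
    """
    rows, payloads = list(rows), list(payloads)
    for col in range(n):
        # Look for a row at or below position `col` that contains unknown `col`.
        pivot = next((i for i in range(col, len(rows))
                      if (rows[i] >> col) & 1), None)
        if pivot is None:
            return None  # not enough independent equations yet
        rows[col], rows[pivot] = rows[pivot], rows[col]
        payloads[col], payloads[pivot] = payloads[pivot], payloads[col]
        # Remove unknown `col` from every other equation (row addition = XOR).
        for i in range(len(rows)):
            if i != col and (rows[i] >> col) & 1:
                rows[i] ^= rows[col]
                payloads[i] ^= payloads[col]
    return payloads[:n]

# Three unknowns x0=5, x1=9, x2=12 seen only through XOR combinations.
rows = [0b011, 0b110, 0b111]      # x0^x1, x1^x2, x0^x1^x2
payloads = [5 ^ 9, 9 ^ 12, 5 ^ 9 ^ 12]
solution = solve_gf2(rows, payloads, 3)
```

The solver returns `None` until the collected equations reach full rank, which mirrors the condition that decoding succeeds only once enough independent equations are available.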

Asynchronous Update and LT Coding (2/3) • Nodes are allowed to update their information only when a new generation is initiated. • The vector x is extended to the (ν+1)N×1 vector x̃. • G̃ becomes a (ν+1)N×(ν+1)N extended decoding matrix. • Information is collected with a sliding-window mechanism that covers the (ν+1) most recent generations.

Asynchronous Update and LT Coding (3/3) • The idea is to keep the decoding as up to date as possible, aiming at reconstructing the last N elements of x̃. • This paper proposes a strategy for managing the extended decoding matrix G̃ that makes the decoding process robust to asynchronous updates of the information.
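One way to picture the extended system is sketched below. It assumes that generations are stacked oldest first, so that the newest generation occupies the last N entries; the superscript notation for generations and the symbol for the extended right-hand side are illustrative choices, not taken from the slides.

```latex
\tilde{x} =
\begin{bmatrix}
  x^{(t-\nu)} \\ \vdots \\ x^{(t-1)} \\ x^{(t)}
\end{bmatrix},
\qquad
\tilde{x} \text{ has size } (\nu+1)N \times 1,
\qquad
\tilde{G}\,\tilde{x} = \tilde{c}
```

Under this ordering, recovering the last N elements of the extended vector amounts to recovering the block for the most recent generation.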

Asynchronous Update Algorithm • [21] V. Bioglio, M. Grangetto, R. Gaeta, and M. Sereno, “An optimal partial decoding algorithm for rateless codes,” in IEEE International Symposium on Information Theory (ISIT), Aug. 2011, pp. 2731–2735.

Recovery Time (1/6) • The recovery time is the time required to spread all the local information to all the participants in the network. • The recovery time is modeled as a function of • the size of the local information m • the number of random walkers generated per node ω • the number of nodes in the network N • the maximum size of the random walk packets DIM.

Recovery Time (2/6) • Let DIM be the size (in bits) of the transmission packet. • the header size is h • the remaining size is f = DIM − h • the size of a pair (v_l, t_l) is g • the size of the combined message c_i is m • The size of a single equation is therefore: • Coded: e_C = d_i g + m, with d_i = 2 ln N • Uncoded: e_U = g + m

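Recovery Time (3/6) • The maximum numbers of equations n_U and n_C that can be stored in an uncoded and a coded packet are: • $n_U = \lfloor f / e_U \rfloor = \lfloor (DIM - h) / (g + m) \rfloor$ • $n_C = \lfloor f / e_C \rfloor = \lfloor (DIM - h) / (d_i g + m) \rfloor$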
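A short worked example of the packet arithmetic follows. It assumes that the number of equations per packet is the available space f = DIM − h divided by the equation size, rounded down. All numeric values (N, g, m, DIM, h) and the function names are hypothetical, chosen only to show the calculation.

```python
import math

def equation_sizes(n_nodes, g_bits, m_bits):
    """Return (coded, uncoded) equation sizes in bits."""
    degree = 2 * math.log(n_nodes)       # d_i = 2 ln N, as on the slide
    e_coded = degree * g_bits + m_bits   # e_C = d_i * g + m
    e_uncoded = g_bits + m_bits          # e_U = g + m
    return e_coded, e_uncoded

def max_equations(dim_bits, header_bits, equation_bits):
    """Assumed capacity rule: floor((DIM - h) / equation size)."""
    return math.floor((dim_bits - header_bits) / equation_bits)

# Hypothetical parameters: N = 1000, g = 32 bits, m = 1024 bits,
# DIM = 12000 bits, h = 160 bits.
e_c, e_u = equation_sizes(1000, 32, 1024)
n_u = max_equations(12000, 160, e_u)
n_c = max_equations(12000, 160, e_c)
```

A coded equation carries about 2 ln N identifier pairs instead of one, so for the same packet size fewer coded equations fit than uncoded ones.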

Recovery Time (4/6) • It is possible to predict the number of hops TC required to distribute a certain number of equations RC using the coded approach.

Recovery Time (5/6) • N = 1000 nodes

Recovery Time (6/6) • N = 1000, N_neigh = 50, ω = 1 • 95% confidence interval

Simulation Results (1/4) • To simulate realistic P2P network conditions: • at each time slot, 30 random nodes shuffle their neighborhood by exchanging one random neighbor. • when a node joins, it connects to a random set of neighboring nodes. • when a node leaves, its neighbors replace it through the shuffling mechanism described above. • to keep the overall number of packets in the network constant, ideal signaling is assumed.

Simulation Results (2/4) • For each node v_l, the percentage of the overall information retrieved by that node is calculated as a function of time T.

Simulation Results (3/4) • The average value of the previous index is computed over the set A(T) of nodes that are active at time T. • All numerical results based on the previous definitions are averaged over 30 independent trials to obtain statistically meaningful values.
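The two indices can be sketched as follows. The sketch assumes that the per-node percentage is the number of recovered pieces divided by the total number of pieces, times 100; the exact formula is not preserved in this transcript, so this definition and all names are assumptions.

```python
def retrieved_percentage(recovered, n_total):
    """Assumed per-node index: share of all pieces recovered, in percent."""
    return 100.0 * recovered / n_total

def average_retrieved(recovered_by_node, active_nodes, n_total):
    """Average the per-node index over the currently active nodes A(T)."""
    if not active_nodes:
        return 0.0
    total = sum(retrieved_percentage(recovered_by_node.get(v, 0), n_total)
                for v in active_nodes)
    return total / len(active_nodes)

# Hypothetical snapshot at some time T: node "c" has left the network.
recovered_by_node = {"a": 10, "b": 5, "c": 0}
active = {"a", "b"}
avg = average_retrieved(recovered_by_node, active, n_total=10)
```

Restricting the average to active nodes keeps departed peers from dragging the index down.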

Simulation Results (4/4)

Conclusion • The design of a novel decoder for rateless codes that is robust to asynchronous updates of the information. • The development of a simple analytical model for the estimation of the time required to spread the information. • The encoded system scales better than the uncoded one when the number of nodes in the distributed system increases.
