Rateless codes and random walks for P2P resource discovery in Grids

Rateless codes and random walks for P2P resource discovery in Grids. IEEE Transactions on Parallel and Distributed Systems, Nov. 2012. Valerio Bioglio, Rossano Gaeta, Marco Grangetto, Matteo Sereno.

Presentation Transcript


  1. Rateless codes and random walks for P2P resource discovery in Grids IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, NOV. 2012. Valerio Bioglio Rossano Gaeta Marco Grangetto Matteo Sereno

  2. Outline • Introduction • Related Work • Proposed System • Analysis • Simulation Results • Conclusion

  3. Introduction • The system is presented as a set of nodes connected to form a P2P network. • Each node holds a piece of information. • Nodes may leave or join dynamically. • The goal is for a peer to obtain a local view of global information defined on all peers of an unstructured P2P network. • Every node must communicate with all the participants to obtain the information held by the other peers.

  4. Introduction • Many proposals exploiting unstructured P2P systems share a common characteristic: they rely on interface peers, which • are each associated with one administrative domain • connect to other interface peers • maintain the data of their local nodes • This paper assumes that • each peer holds a piece of information • each peer needs to access the data of all other peers at a rate of λ queries/s.

  5. Introduction • The goals are threefold: • Every node must be able to collect the complete global information. • The communication overhead must be limited. • The processing power of each node must be used parsimoniously.

  6. Contribution • A continuous flow of control packets is exchanged among the nodes according to the random walk principle. • The information combined by each node must belong to the same version. • The proposed solution is suitable for large amounts of data held by each node.

  8. Related Work (1/2) • [6] proposes a flow control mechanism that determines the maximum rate at which a participant can submit updates without creating a backlog, and devises content reconciliation mechanisms to reduce message redundancy. • Algebraic Gossip, proposed in [11], is a gossip algorithm based on network coding whose spreading time is proved to be O(K). [6] “Efficient reconciliation and flow control for anti-entropy protocols,” in Proceedings of the 2nd Workshop on Large-Scale Distributed Systems and Middleware, LADIS ’08. ACM, 2008. [11] “Algebraic gossip: a network coding approach to optimal multiple rumor mongering,” IEEE Transactions on Information Theory, vol. 52, no. 6, pp. 2486–2507, Jun. 2006.

  9. Related Work (2/2) • In [13] distributed fountain codes are proposed for networked storage: to create a new encoded packet, each storage node asks a randomly selected node of the network for information. • A similar algorithm is proposed in [14], but the coded packet formation mechanism is reversed. • In these schemes the nodes handle information gathering and encoding; in [16] this responsibility is assigned to the packets. [13] “Distributed fountain codes for networked storage,” in IEEE ICASSP, 2006. [14] “Data persistence in large-scale sensor networks with decentralized fountain codes,” in IEEE Infocom, 2007. [16] “Rateless packet approach for data gathering in wireless sensor networks,” IEEE Journal on Selected Areas in Communications, vol. 28, no. 9, pp. 1169–1179, Sep. 2010.

  11. System Description (1/3) • This paper models the interface peers of a Grid system and the connections among them as a graph G(V, E). • V is the set of interface peers • E is the set of edges • v_l is the node ID • t_l is the time-stamp, i.e., the generation • x_l is the information • each piece of information is m bits long

  12. System Description (2/3) • Goal: a concurrent broadcast of the information collected by all the nodes in the network. • All nodes should communicate with each other. • This paper proposes a fully distributed solution based on random walks: • each node starts a limited number ω of packets • these packets are propagated through the network by random walk • all nodes use the packets to solve a system of linear equations.
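
The propagation step can be sketched as follows, assuming an undirected overlay stored as an adjacency list and walkers that move to a uniformly chosen neighbor at each hop. The graph construction, the `Walker` type and the helpers `random_graph`, `launch_walkers` and `step` are hypothetical; what a node does with the equations it finds in a packet is not modelled.

```python
import random
from dataclasses import dataclass


@dataclass
class Walker:
    origin: int  # node that generated this packet
    at: int      # node currently holding the packet


def random_graph(n, degree, rng):
    """Hypothetical overlay: every node links to at least `degree` random peers."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        while len(adj[v]) < degree:
            u = rng.randrange(n)
            if u != v:
                adj[v].add(u)  # undirected: store the edge on both ends
                adj[u].add(v)
    return {v: sorted(nbrs) for v, nbrs in adj.items()}


def launch_walkers(n, omega):
    """Each node starts omega random walkers of its own."""
    return [Walker(origin=v, at=v) for v in range(n) for _ in range(omega)]


def step(adj, walkers, rng):
    """Move every walker one hop to a uniformly chosen neighbor.

    Returns (visited node, originator) pairs; at each visit the receiving
    node would read the packet's equations and add its own (not modelled).
    """
    visits = []
    for w in walkers:
        w.at = rng.choice(adj[w.at])
        visits.append((w.at, w.origin))
    return visits
```

With N nodes the network carries exactly N·ω packets at all times, which is the "limited number ω of packets" per node described above.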

  13. System Description (3/3) • Shortcomings of network coding: • Added computational complexity • Solution: use simple XOR combinations, i.e., rateless codes known as LT codes • Impossibility of asynchronous updates • Solution: a decoder that supports asynchronous updating
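
A minimal sketch of how one LT-coded equation can be formed using XOR only, assuming each piece of information is an m-bit integer and the degree is drawn from the robust soliton distribution (the standard LT degree distribution). The parameters `c` and `delta` are illustrative, not values from the paper, and `robust_soliton` and `lt_encode` are hypothetical helper names.

```python
import math
import random


def robust_soliton(k, c=0.1, delta=0.5):
    """Robust soliton distribution mu[d] for d = 0..k (mu[0] = 0)."""
    r = c * math.log(k / delta) * math.sqrt(k)
    pivot = max(1, min(k, int(round(k / r))))
    rho = [0.0] * (k + 1)  # ideal soliton part
    tau = [0.0] * (k + 1)  # robustness correction
    rho[1] = 1.0 / k
    for d in range(2, k + 1):
        rho[d] = 1.0 / (d * (d - 1))
    for d in range(1, pivot):
        tau[d] = r / (d * k)
    tau[pivot] = r * max(math.log(r / delta), 0.0) / k
    z = sum(rho) + sum(tau)  # normalization constant
    return [(rho[d] + tau[d]) / z for d in range(k + 1)]


def lt_encode(pieces, mu, rng):
    """One LT equation: indices of d distinct pieces and their XOR."""
    k = len(pieces)
    d = rng.choices(range(k + 1), weights=mu)[0]  # mu[0] = 0, so d >= 1
    idx = sorted(rng.sample(range(k), d))
    value = 0
    for i in idx:
        value ^= pieces[i]  # XOR is the only arithmetic the encoder needs
    return idx, value
```

The list `idx` plays the role of the (v_l, t_l) pairs an equation carries, and `value` is the combined message c_i.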

  14. Random Walk and LT Coding • [Figure: packet layout with a Header (dF), equations eq1 and eq2, degree d_i, pairs (v1, t1) … (v4, t4), and combined data c]

  15. Random Walk and LT Coding • When a packet approaches the maximum size DIM, the eldest equation it carries is deleted. • When the acknowledgement timer reaches 0, the receiving node notifies the originator that its random walker is still alive.
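
The two packet rules above can be sketched as a small class, assuming the payload budget is f = DIM − h bits, equations are kept in arrival order, and the acknowledgement timer is decremented once per hop and reset to a fixed period. `RandomWalkPacket`, `ack_period` and both methods are hypothetical names chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RandomWalkPacket:
    """Hypothetical packet model: FIFO equations within f = DIM - h bits."""
    origin: int
    dim: int          # maximum packet size DIM (bits)
    header: int       # header size h (bits)
    ack_period: int   # assumed number of hops between acknowledgements
    equations: deque = field(default_factory=deque)
    used: int = 0     # bits currently occupied by equations
    ack_timer: int = 0

    def __post_init__(self):
        self.ack_timer = self.ack_period

    def add_equation(self, eq, size):
        """Append an equation of `size` bits, deleting the eldest ones if needed."""
        if size > self.dim - self.header:
            raise ValueError("equation larger than the packet payload")
        while self.used + size > self.dim - self.header:
            _, old_size = self.equations.popleft()  # eldest equation first
            self.used -= old_size
        self.equations.append((eq, size))
        self.used += size

    def hop(self):
        """Called at each receiving node; True means: acknowledge the originator."""
        self.ack_timer -= 1
        if self.ack_timer == 0:
            self.ack_timer = self.ack_period
            return True
        return False
```

A deque gives O(1) removal of the eldest equation, which matches the "delete the eldest" policy.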

  16. Asynchronous Update and LT Coding (1/3) • The information spread by the random walkers can be recovered by any node as soon as enough independent equations have been collected. • Decoding can be formulated as solving the system of linear equations Gx = c over GF(2), where: • G is an N×N binary matrix whose rows are the N independent equations collected by the node • x is the N×1 column vector of the N unknown pieces of information • c is the vector of the corresponding buffered linear combinations.
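
The decoding step can be sketched with plain Gauss-Jordan elimination over GF(2), assuming each equation is stored as an integer bitmask over the N unknowns and each right-hand side as an m-bit integer. This is a generic solver, not the optimal partial decoding algorithm of [21]; `solve_gf2` is a hypothetical name.

```python
def solve_gf2(rows, rhs, n):
    """Solve G x = c over GF(2) by Gauss-Jordan elimination.

    rows[i]: int bitmask, bit j set <=> unknown x_j appears in equation i.
    rhs[i]:  XOR of the corresponding pieces (an m-bit int).
    Returns the list x, or None if G does not have rank n.
    """
    rows, rhs = list(rows), list(rhs)
    for col in range(n):
        # find an unused equation that contains unknown x_col
        sel = next((i for i in range(col, len(rows)) if rows[i] >> col & 1), None)
        if sel is None:
            return None  # not enough independent equations yet
        rows[col], rows[sel] = rows[sel], rows[col]
        rhs[col], rhs[sel] = rhs[sel], rhs[col]
        # remove x_col from every other equation (XOR is GF(2) addition)
        for i in range(len(rows)):
            if i != col and rows[i] >> col & 1:
                rows[i] ^= rows[col]
                rhs[i] ^= rhs[col]
    return rhs[:n]  # row col now reads x_col = rhs[col]
```

Because both the matrix rows and the data are combined with XOR only, the decoder never multiplies, which is the complexity advantage the slides attribute to LT codes.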

  17. Asynchronous Update and LT Coding (2/3) • Nodes are allowed to update their information only when a new generation is initiated. • The vector x is extended to the (ν+1)N×1 vector x̃. • G̃ becomes a (ν+1)N×(ν+1)N extended decoding matrix. • The information is collected in the network with a sliding window mechanism that includes the ν+1 most recent generations of the information.
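
One way to picture the sliding window is a mapping from a (v_l, t_l) pair to a column of x̃. The sketch below assumes each node's newest known generation is tracked separately and that columns are ordered by age, so the most recent generation occupies the last N positions of x̃; this ordering and the helpers `window_column` and `equation_row` are hypothetical, not the paper's data layout.

```python
def window_column(v, t, newest, nu, n):
    """Column of x~ for piece (v, t), or None if t is outside the window.

    newest[v] is the most recent generation known for node v (assumption).
    Age 0 (newest) maps to the last block of n columns, age nu to the first.
    """
    age = newest[v] - t
    if not 0 <= age <= nu:
        return None
    return (nu - age) * n + v


def equation_row(pairs, newest, nu, n):
    """Bitmask row of G~ for an equation listing (v_l, t_l) pairs; None if stale."""
    row = 0
    for v, t in pairs:
        col = window_column(v, t, newest, nu, n)
        if col is None:
            return None  # refers to a generation that left the window
        row ^= 1 << col  # GF(2): a repeated column cancels out
    return row
```

Rows built this way have (ν+1)N columns, matching the size of G̃, and can be fed to any GF(2) solver.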

  18. Asynchronous Update and LT Coding (3/3) • The idea is to keep the decoded information as up to date as possible, aiming at reconstructing the last N elements of x̃. • This paper proposes a strategy to manage the extended decoding matrix G̃ in order to make the decoding process robust to asynchronous updates of the information.

  19. Asynchronous Update Algorithm [21] V. Bioglio, M. Grangetto, R. Gaeta, and M. Sereno, “An optimal partial decoding algorithm for rateless codes,” in IEEE International Symposium on Information Theory (ISIT), Aug. 2011, pp. 2731–2735.

  21. Recovery Time (1/6) • The recovery time is defined as the time required to spread all the local information to all the participants in the network. • The recovery time is modeled as a function of: • the size m of the local information • the number ω of random walkers generated per node • the number N of nodes in the network • the maximum size DIM of the random walk packets.

  22. Recovery Time (2/6) • Let DIM be the size (in bits) of the transmission packet: • the header size is h • the remaining payload size is f = DIM − h • the size of a pair (v_l, t_l) is g • the size of a combined message c_i is m • Hence, the size of a single equation is: • Coded: e_C = d_i·g + m, with d_i = 2 ln N • Uncoded: e_U = g + m

  23. Recovery Time (3/6) • The maximum numbers of equations that can be stored in an uncoded and in a coded packet, n_U and n_C, are: • n_U = ⌊f / e_U⌋ • n_C = ⌊f / e_C⌋
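
The size bookkeeping of slides 22 and 23 can be checked numerically with a short sketch, assuming each count is ⌊f/e⌋ and d_i is the real value 2 ln N (not rounded). The function name `equations_per_packet` and the example sizes (DIM = 12000, h = 160, g = 48, m = 1000 bits) are hypothetical.

```python
import math


def equations_per_packet(dim, h, g, m, n_nodes):
    """Return (n_U, n_C), assuming n = floor(f / e) with f = DIM - h."""
    f = dim - h                   # payload left after the header
    d = 2 * math.log(n_nodes)     # coded degree d_i = 2 ln N
    e_u = g + m                   # uncoded: one (v, t) pair + one message
    e_c = d * g + m               # coded: d_i pairs + one XORed message
    return math.floor(f / e_u), math.floor(f / e_c)


# Hypothetical sizes, N = 1000 nodes
n_u, n_c = equations_per_packet(12000, 160, 48, 1000, 1000)
```

With these numbers a packet holds 11 uncoded or 7 coded equations, but each coded equation combines about 13.8 pieces of information. As m grows relative to g, e_C/e_U approaches 1 and so does the ratio of the two counts.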

  24. Recovery Time (4/6) • The model predicts the number of hops T_C required to distribute a given number R_C of equations with the coded approach.

  25. Recovery Time (5/6) • N = 1000 nodes

  26. Recovery Time (6/6) • N = 1000, N_neigh = 50, ω = 1 • 95% confidence intervals

  28. Simulation Results (1/4) • To simulate realistic P2P network conditions: • At each time slot, 30 random nodes shuffle their neighborhood by exchanging one random neighbor. • When a node joins, it connects to a random set of neighboring nodes; when a node leaves, its neighbors replace it through the same shuffling mechanism. • To keep the overall number of packets in the network constant, ideal signaling is assumed.
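
The per-slot shuffle can be sketched as a degree-preserving neighbor swap. The exact exchange rule is not given in the slides; here we assume each selected node a picks a neighbor b, then a hands one of its other neighbors x to b while b hands one of its neighbors y to a. `shuffle_step`, `_link` and `_unlink` are hypothetical names, and join/leave handling is omitted.

```python
import random


def _link(adj, u, v):
    adj[u].add(v)
    adj[v].add(u)


def _unlink(adj, u, v):
    adj[u].discard(v)
    adj[v].discard(u)


def shuffle_step(adj, n_shufflers, rng):
    """One time slot: n_shufflers random nodes each exchange one neighbor.

    adj maps node -> set of neighbors (undirected). Every node keeps its
    degree, so the overlay stays connected-in-expectation without growing.
    """
    for a in rng.sample(sorted(adj), n_shufflers):
        if not adj[a]:
            continue
        b = rng.choice(sorted(adj[a]))  # exchange partner
        # candidates that would not create duplicate edges or self-loops
        ca = [x for x in sorted(adj[a]) if x != b and x not in adj[b]]
        cb = [y for y in sorted(adj[b]) if y != a and y not in adj[a]]
        if not ca or not cb:
            continue
        x, y = rng.choice(ca), rng.choice(cb)
        _unlink(adj, a, x)  # a hands neighbor x over to b
        _link(adj, b, x)
        _unlink(adj, b, y)  # b hands neighbor y over to a
        _link(adj, a, y)
```

Applied once per time slot with n_shufflers = 30, this reproduces the churn rate described above under the stated assumption about the exchange rule.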

  29. Simulation Results (2/4) • For each node v_l, we compute the percentage of the overall information retrieved by that node as a function of time T.

  30. Simulation Results (3/4) • The average of this index is computed over the set A(T) of nodes that are active at time T. • All numerical results based on these definitions are averaged over 30 independent trials to obtain statistically meaningful values.
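
The two metrics can be sketched as below, assuming "percentage of overall information retrieved" means the share of the N pieces a node has decoded, with `decoded_by_node[v]` holding the IDs of the pieces node v has recovered at time T. Both function names and this data layout are hypothetical.

```python
from statistics import mean


def retrieval_percentage(decoded, n):
    """Percentage of the N pieces of information a node has decoded."""
    return 100.0 * len(decoded) / n


def average_over_active(decoded_by_node, active, n):
    """Average of the per-node percentage over the active set A(T)."""
    return mean(retrieval_percentage(decoded_by_node[v], n) for v in active)
```

Averaging over 30 independent trials would then apply `mean` once more to the per-trial values of `average_over_active`.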

  31. Simulation Results (4/4)

  33. Conclusion • The design of a novel decoder for rateless codes that is robust to asynchronous updates of the information. • The development of a simple analytical model for the estimation of the time required to spread the information. • The encoded system scales better than the uncoded one when the number of nodes in the distributed system increases.

More Related