Performance Comparison of Scheduling Algorithms for Peer-to-Peer Collaborative File Distribution

# Performance Comparison of Scheduling Algorithms for Peer-to-Peer Collaborative File Distribution

##### Presentation Transcript

1. Performance Comparison of Scheduling Algorithms for Peer-to-Peer Collaborative File Distribution • Presented by: Chan Siu Kei, Jonathan • Supervisors: Prof. VOK Li, Dr. KS Lui

2. Overview • Introduction • Communication Model • Analysis • Scheduling Algorithms - Rarest Piece First - Most Demanding Node First - Maximum-Flow Algorithms • Simulation Results • Future Work • Conclusion

3. Introduction • P2P file sharing applications such as BitTorrent, Gnutella, Kazaa and Napster are highly popular on the Internet • They are more scalable (faster) than the traditional client/server approach (e.g. FTP) • Previous research focuses on topics such as overlay topology formation, peer discovery, content search, fairness and incentive issues, but seldom looks into the data distribution scheduling problem • We present the first effort on this problem and propose a novel Maximum-Flow algorithm to solve it better

4. Communication Model • Synchronous Scheduling - same transmission time for every pair of nodes • Asymmetric Bandwidth - in each cycle, node i can send out up to pi pieces and receive up to qi pieces

5. Notations and Definitions • N = no. of peers, M = no. of file pieces • F = {F1, F2, …, FM} • P = N×M possession matrix, Pij = 1 iff node i possesses file piece Fj, otherwise Pij = 0 • Pt = possession matrix at time t • p = {p1,p2,…,pN} (upload limit vector), q = {q1,q2,…,qN} (download limit vector), e.g. p = {1,1,2,2,2}, q = {2,3,2,3,3}

6. Schedule (1) • Specifies which file pieces each peer has to send out and to whom • A possible schedule for P0 with p={1,1,2,2,2}, q={2,3,2,3,3} - Node 1: send piece 3 to node 2 - Node 2: send piece 4 to node 1 - Node 3: send piece 5 to node 1, send piece 5 to node 2 - Node 4: send piece 6 to node 2, send piece 6 to node 3 - Node 5: send piece 2 to node 4, send piece 7 to node 4 • Formally, we use an N×M matrix Sk to represent the schedule at cycle k. From Sk, we can derive the N×M transmission matrix Tk, e.g. Node 1 receives piece 4 from Node 2 and piece 5 from Node 3

7. Schedule (2) • Given Pk-1 and the schedule Sk-1 (and hence Tk-1), the possession matrix at the next cycle k is Pk = Pk-1 + Tk-1 (k > 0) • The distribution terminates after a certain number of cycles, say k0, when every peer possesses every piece, i.e. every entry of Pk0 is 1 • Our goal is to minimize k0, which is the time needed for complete distribution
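The update rule Pk = Pk-1 + Tk-1 can be sketched in Python. The encoding of a schedule as (sender, piece, receiver) triples with 0-based indices, and the helper names `apply_schedule` and `is_complete`, are assumptions made for illustration; the slides' own Sk encoding is not shown in the transcript.

```python
def apply_schedule(P, schedule):
    """Return the next possession matrix P_k = P_{k-1} + T_{k-1}.

    P is a list of N rows of M 0/1 entries. schedule is a list of
    (sender, piece, receiver) triples, 0-based (an assumed encoding).
    """
    N, M = len(P), len(P[0])
    T = [[0] * M for _ in range(N)]  # transmission matrix for this cycle
    for sender, piece, receiver in schedule:
        if P[sender][piece] != 1:
            raise ValueError("sender does not hold the piece")
        if P[receiver][piece] == 1 or T[receiver][piece] == 1:
            raise ValueError("receiver already has or is already receiving the piece")
        T[receiver][piece] = 1
    return [[P[i][j] + T[i][j] for j in range(M)] for i in range(N)]


def is_complete(P):
    """True when every peer possesses every piece (all entries are 1)."""
    return all(all(x == 1 for x in row) for row in P)
```

Repeatedly calling `apply_schedule` until `is_complete` returns true counts the number of cycles k0 for a given sequence of schedules.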

8. Analysis on Lower Bound (1) • Let p = {p1,p2,…,pN}, q = {q1,q2,…,qN} be the upload and download limit vectors • Let $r_i$ be the total no. of 0s across row i, i.e. $r_i = M - \sum_{j=1}^{M} P_{ij}$; the min. value of k0 is given by (1) • Let $c_j$ be the total no. of 1s along column j, i.e. $c_j = \sum_{i=1}^{N} P_{ij}$; we can find the minimum no. of 1s along all columns, $c_{\min} = \min_{j} c_j$; the min. value of k0 is given by (2) • Let z be the total no. of 0s in P, i.e. $z = NM - \sum_{i=1}^{N} \sum_{j=1}^{M} P_{ij}$; the min. value of k0 is given by (3)

9. Analysis on Lower Bound (2) • Combining (1), (2), (3), the lower bound on k0 is given by (4), the largest of the three individual bounds
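As a hedged illustration, the three bounds can be read as follows. These forms are assumptions, not taken from the slides: a per-node bound (the neediest node limited by its download rate), a rarest-piece bound (copies of a piece can grow by at most a factor of 1 + max pi per cycle), and a total-work bound (all missing entries limited by aggregate upload and download capacity). The function name `lower_bound` is hypothetical, and positive limits are assumed.

```python
def lower_bound(P, p, q):
    """One plausible lower bound on k0 (assumed forms, see lead-in)."""
    N, M = len(P), len(P[0])
    r = [M - sum(row) for row in P]                         # zeros per row
    c = [sum(P[i][j] for i in range(N)) for j in range(M)]  # ones per column
    z = sum(r)                                              # zeros in P
    if min(c) == 0:
        raise ValueError("some piece is held by no peer")

    # Per-node bound: node i needs r_i pieces at q_i per cycle (ceiling division).
    b1 = max(-(-r[i] // q[i]) for i in range(N))

    # Rarest-piece bound: copies multiply by at most (1 + p_max) each cycle.
    b2, copies, p_max = 0, min(c), max(p)
    while copies < N:
        copies *= 1 + p_max
        b2 += 1

    # Total-work bound: z transmissions, at most min(sum p, sum q) per cycle.
    b3 = -(-z // min(sum(p), sum(q)))

    return max(b1, b2, b3)
```

For example, with P = [[1,1,0,0],[0,0,1,0],[0,0,0,1]], p = [1,1,1] and q = [2,2,2], the three bounds are 2, 2 and 3, so the combined bound is 3 cycles.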

10. Rarest Piece First (RPF) • Borrowed from the Rarest Element First algorithm employed in BitTorrent • Rarity cj of piece j is the no. of peers who have piece j, i.e. $c_j = \sum_{i=1}^{N} P_{ij}$ • RPF – Node-Oriented: (p={1,1,2,2,2}, q={2,3,2,3,3}) … • RPF – Piece-Oriented: (p={1,1,2,2,2}, q={2,3,2,3,3}) …
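A minimal sketch of one greedy reading of rarest-piece-first scheduling for a single cycle follows. It does not attempt to reproduce the node-oriented or piece-oriented variants, whose details are not in the transcript; the function name `rpf_cycle`, the tie-breaking by index, and the (sender, piece, receiver) output format are assumptions.

```python
def rpf_cycle(P, p, q):
    """Greedily schedule one cycle, serving the rarest pieces first.

    Returns a list of (sender, piece, receiver) triples, 0-based.
    """
    N, M = len(P), len(P[0])
    up, down = list(p), list(q)  # remaining upload/download budget this cycle
    rarity = [sum(P[i][j] for i in range(N)) for j in range(M)]  # c_j
    schedule = []
    for j in sorted(range(M), key=lambda j: rarity[j]):  # rarest piece first
        needers = [i for i in range(N) if P[i][j] == 0]
        holders = [i for i in range(N) if P[i][j] == 1]
        for s in holders:
            while up[s] > 0:
                # First node that still needs piece j and can still download.
                r = next((i for i in needers if down[i] > 0), None)
                if r is None:
                    break
                schedule.append((s, j, r))
                needers.remove(r)
                up[s] -= 1
                down[r] -= 1
    return schedule
```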

11. Most Demanding Node First (MDNF) • Demand di of node i is the no. of un-received pieces for node i, i.e. $d_i = M - \sum_{j=1}^{M} P_{ij}$ • When choosing recipients, prefer sending to the node with the largest di • MDNF – Node-Oriented: (p={1,1,2,2,2}, q={2,3,2,3,3}) … • MDNF – Piece-Oriented: (p={1,1,2,2,2}, q={2,3,2,3,3}) …
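The recipient rule above can be sketched as a single-cycle greedy scheduler. This is an illustrative reading only: the function name `mdnf_cycle`, the sender order, the index-based tie-breaking, and the choice to decrease a node's demand as pieces are scheduled for it within the cycle are all assumptions.

```python
def mdnf_cycle(P, p, q):
    """Greedily schedule one cycle, preferring the most demanding receiver.

    Returns a list of (sender, piece, receiver) triples, 0-based.
    """
    N, M = len(P), len(P[0])
    down = list(q)                        # remaining download budget
    demand = [M - sum(row) for row in P]  # d_i, updated within the cycle (assumption)
    pending = set()                       # (receiver, piece) pairs already scheduled
    schedule = []
    for s in range(N):
        for _ in range(p[s]):             # one iteration per upload slot of sender s
            candidates = [
                (r, j)
                for r in range(N) if down[r] > 0
                for j in range(M)
                if P[s][j] == 1 and P[r][j] == 0 and (r, j) not in pending
            ]
            if not candidates:
                break
            # Pick the receiver with the largest remaining demand.
            r, j = max(candidates, key=lambda rj: demand[rj[0]])
            schedule.append((s, j, r))
            pending.add((r, j))
            down[r] -= 1
            demand[r] -= 1
    return schedule
```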

12. Problem with RPF and MDNF • The max. no. of transmissions for each cycle is not guaranteed to be achieved • E.g. using MDNF – Piece-Oriented with p={2,2,2,1}, q={2,1,2,2}, only 6 transmissions can be scheduled, but the maximum is 7

13. Maximum-Flow (MaxFlow) • Let G = (V,E) be the flow network graph • L = {L1, L2, …, LN} • R = {R1, R2, …, RN}

14. Maximum-Flow (MaxFlow) • Edmonds-Karp Algorithm: • Finds augmenting paths using BFS • Guaranteed to find the maximum no. of transmissions in each cycle • Complexity = $O(|V| \cdot |E|^2)$
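A minimal Edmonds-Karp sketch for counting the maximum number of transmissions in one cycle is given below. The graph layout is an assumption, since the slide's figure is not in the transcript: source to sender node Li with capacity pi, Li to a request node B(r, j) with capacity 1 when node i holds piece j and node r lacks it, B(r, j) to receiver node Rr with capacity 1, and Rr to sink with capacity qr. The function name `max_transmissions` is hypothetical.

```python
from collections import defaultdict, deque


def max_transmissions(P, p, q):
    """Maximum no. of transmissions in one cycle via Edmonds-Karp (assumed graph)."""
    N, M = len(P), len(P[0])
    cap = defaultdict(lambda: defaultdict(int))  # residual capacities
    src, sink = "src", "sink"
    for i in range(N):
        cap[src][("L", i)] = p[i]
        cap[("R", i)][sink] = q[i]
    for r in range(N):
        for j in range(M):
            if P[r][j] == 0:                     # node r still needs piece j
                cap[("B", r, j)][("R", r)] = 1
                for s in range(N):
                    if P[s][j] == 1:             # node s can supply piece j
                        cap[("L", s)][("B", r, j)] = 1

    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {src: None}
        queue = deque([src])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in list(cap[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Find the bottleneck capacity along the path.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        # Augment: reduce forward capacities, increase reverse capacities.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck
```

This sketch returns only the count; the actual schedule could be read off the sender-to-request edges that end up saturated.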

15. MaxFlow – Counter Example • Pure MaxFlow performance is unsatisfactory, as it does not consider whether more transmissions can be matched in subsequent cycles • Using MaxFlow, a total of 3 cycles is needed: (p={2,2,2,2,2}, q={3,3,3,3,3}) … • Using RPF – Node-Oriented, only 2 cycles are needed: (p={2,2,2,2,2}, q={3,3,3,3,3})

16. MaxFlow – Weighted • Put weights on both sides to give priority to some nodes during the search • Weight on Li = sum of the no. of 0s in other peers for those pieces that peer i has • Weight on Bij = δij = sum of the no. of 0s across row i and column j • E.g. δ42 = 7
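The two weight definitions can be computed directly from the possession matrix, as in the following sketch. The function name `maxflow_weights` and the return format are hypothetical, indices are 0-based (the slides use 1-based), and it is an assumption that δij is defined only where Pij = 0 and that this cell's zero is counted in both the row total and the column total.

```python
def maxflow_weights(P):
    """Compute the weights on L_i and the weights delta_ij on B_ij (assumed reading)."""
    N, M = len(P), len(P[0])
    row_zeros = [M - sum(row) for row in P]
    col_zeros = [N - sum(P[i][j] for i in range(N)) for j in range(M)]

    # Weight on L_i: for every piece peer i has, count the peers still missing it.
    left_weight = [
        sum(col_zeros[j] for j in range(M) if P[i][j] == 1) for i in range(N)
    ]

    # delta_ij: zeros across row i plus zeros along column j, for missing entries.
    delta = {
        (i, j): row_zeros[i] + col_zeros[j]
        for i in range(N) for j in range(M) if P[i][j] == 0
    }
    return left_weight, delta
```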

17. MaxFlow – Weighted Counter Example • For p={2,2,2,2,2}, q={3,3,3,3,3} • Using MaxFlow – Weighted, a total of 3 cycles is needed: … P3 = 1 • Using MDNF – Piece-Oriented, only 2 cycles are needed: P2 = 1

18. MaxFlow – Dynamically-Weighted • Allows the weights to be dynamically varied within each scheduling cycle • E.g. γ = {15,14,25,13,15,10,16,16} and δ43 = 9, which is the greatest value among all δij

19. Simulation Results (1) Fig. 1 Performance comparison of various scheduling algorithms (All) with varying peer sizes (file size = 100, pi = 2, qi = 3, equal probability for 1s and 0s)

20. Simulation Results (2) Fig. 2 Performance comparison of various scheduling algorithms (Representative) with varying peer sizes (file size = 100, pi = 2, qi = 3, equal probability for 1s and 0s)

21. Simulation Results (3) Fig. 3 Performance comparison of various scheduling algorithms (Representative) with varying file sizes (peer size = 10, pi = 2, qi = 3, equal probability for 1s and 0s)

22. Future Work • Study the case of asynchronous scheduling, where the transmission time is different for different pairs of nodes • Study the case when the network is dynamic in nature, where peers can come and go at any instant and they may shift to communicate with different sets of peers during the distribution process

23. Conclusion • The data distribution problem in P2P networks has not been well studied in previous research • We formally define the collaborative file distribution problem with the possession and transmission matrix formulations • We also deduce a theoretical lower bound on the minimum distribution time required • We develop several types of algorithms (RPF, MDNF, MaxFlow) for solving the problem • Our novel dynamically-weighted max-flow algorithm outperforms all other algorithms in simulations

24. Thank You! Q&A