Performance Comparison of Scheduling Algorithms for Peer-to-Peer Collaborative File Distribution

Presented by: Chan Siu Kei, Jonathan; Supervisors: Prof. VOK Li, Dr. KS Lui

Overview • Introduction • Communication Model • Analysis • Scheduling Algorithms - Rarest Piece First - Most Demanding Node First - Maximum-Flow Algorithms • Simulation Results • Future Work • Conclusion

Introduction • P2P file sharing applications are highly popular on the Internet, e.g. BitTorrent, Gnutella, Kazaa, Napster, etc. • They are more scalable (faster) than the traditional client/server approach (e.g. FTP) • Previous research has focused on topics such as overlay topology formation, peer discovery, content search, and fairness and incentive issues, but has seldom looked into the data distribution scheduling problem • We present the first effort on this problem and propose a novel Maximum-Flow algorithm that solves it better

Communication Model • Synchronous Scheduling - the transmission time is the same for every pair of nodes, so distribution proceeds in discrete cycles • Asymmetric Bandwidth - in each cycle, node i can send out at most pi pieces and receive at most qi pieces

Notations and Definitions • N = no. of peers, M = no. of file pieces • F = {F1, F2, …, FM} • P = N×M possession matrix: Pij = 1 iff node i possesses file piece Fj, otherwise Pij = 0 • Pt = possession matrix at time t • p = {p1, p2, …, pN} (upload limit vector), q = {q1, q2, …, qN} (download limit vector) • Running example (N = 5): p = {1,1,2,2,2}, q = {2,3,2,3,3}

Schedule (1) • Specifies which file pieces each peer has to send out and to whom • A possible schedule for P0 with p={1,1,2,2,2}, q={2,3,2,3,3}: - Node 1: send piece 3 to node 2 - Node 2: send piece 4 to node 1 - Node 3: send piece 5 to node 1; send piece 5 to node 2 - Node 4: send piece 6 to node 2; send piece 6 to node 3 - Node 5: send piece 2 to node 4; send piece 7 to node 4 • Formally, we use an N×M matrix Sk to represent the schedule at cycle k. From Sk, we can derive the N×M transmission matrix Tk, where (Tk)ij = 1 iff node i receives piece j in cycle k; e.g. node 1 receives piece 4 from node 2 and piece 5 from node 3, so (T0)14 = 1 and (T0)15 = 1
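A minimal Python sketch of how a schedule yields the transmission matrix and how the limits p and q constrain it. It assumes a schedule is written as a list of (sender, piece, receiver) triples with 1-based indices as on the slide; the helper names are hypothetical. Because P0 itself is not reproduced here, M = 7 is assumed only because piece 7 is the highest index the example mentions.

```python
from collections import Counter

def transmission_matrix(schedule, n_peers, n_pieces):
    """Build T as an N x M 0/1 matrix: T[i][j] = 1 iff node i+1 receives piece j+1.

    `schedule` is a list of (sender, piece, receiver) triples, 1-based.
    """
    T = [[0] * n_pieces for _ in range(n_peers)]
    for _sender, piece, receiver in schedule:
        if T[receiver - 1][piece - 1]:
            raise ValueError(f"node {receiver} is sent piece {piece} twice")
        T[receiver - 1][piece - 1] = 1
    return T

def respects_limits(schedule, p, q):
    """True iff node i sends at most p[i-1] and receives at most q[i-1] pieces."""
    sent = Counter(s for s, _, _ in schedule)
    received = Counter(r for _, _, r in schedule)
    return all(sent[i + 1] <= p[i] and received[i + 1] <= q[i] for i in range(len(p)))

# The example schedule for P0 from the slide.
schedule = [
    (1, 3, 2), (2, 4, 1), (3, 5, 1), (3, 5, 2),
    (4, 6, 2), (4, 6, 3), (5, 2, 4), (5, 7, 4),
]
p, q = [1, 1, 2, 2, 2], [2, 3, 2, 3, 3]
T0 = transmission_matrix(schedule, n_peers=5, n_pieces=7)  # M = 7 is an assumption
print(respects_limits(schedule, p, q))  # True
print(T0[0])  # row of node 1: [0, 0, 0, 1, 1, 0, 0]
```

The example schedule uses all 8 upload slots (the sum of p), and node 2 receives exactly its limit q2 = 3.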

Schedule (2) • Given Pk-1 and the schedule Sk-1 (with transmission matrix Tk-1), the possession matrix at the next cycle k is Pk = Pk-1 + Tk-1 (k > 0) • The distribution terminates after a certain number of cycles, say k0, once every entry of Pk0 is 1 (every peer holds every piece) • Our goal is to minimize k0, the time needed for complete distribution
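The update rule and termination condition can be sketched as a loop that calls a scheduler once per cycle. Indices here are 0-based, and `greedy_schedule` is a deliberately naive, hypothetical baseline (not one of the algorithms in this talk) used only to drive the loop.

```python
def apply_cycle(P, transfers):
    """P_k = P_{k-1} + T_{k-1}: mark each (receiver, piece) as possessed."""
    P_next = [row[:] for row in P]
    for _sender, piece, receiver in transfers:
        if P_next[receiver][piece]:
            raise ValueError("receiver already holds this piece")
        P_next[receiver][piece] = 1
    return P_next

def is_complete(P):
    """Distribution ends when every entry of P is 1."""
    return all(all(row) for row in P)

def greedy_schedule(P, p, q):
    """Hypothetical baseline returning 0-based (sender, piece, receiver) triples:
    senders and pieces in index order, each piece sent to the lowest-indexed
    peer that lacks it and still has download capacity."""
    n, m = len(P), len(P[0])
    up, down = list(p), list(q)
    scheduled, transfers = set(), []
    for s in range(n):
        for j in range(m):
            if not P[s][j]:
                continue
            for r in range(n):
                if up[s] == 0:
                    break
                if not P[r][j] and (r, j) not in scheduled and down[r] > 0:
                    transfers.append((s, j, r))
                    scheduled.add((r, j))
                    up[s] -= 1
                    down[r] -= 1
    return transfers

def distribution_time(P, p, q, scheduler, max_cycles=10_000):
    """Run cycles until P is all ones and return k0."""
    k = 0
    while not is_complete(P):
        transfers = scheduler(P, p, q)
        if not transfers or k >= max_cycles:
            raise RuntimeError("no progress (some piece may have no holder)")
        P = apply_cycle(P, transfers)
        k += 1
    return k

P0 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(distribution_time(P0, [1, 1, 1], [1, 1, 1], greedy_schedule))  # 3
```

On this toy matrix two cycles would suffice (each node passes its piece to the next node, twice), so the naive order wastes a cycle; that gap is what the scheduling algorithms in this talk aim to close.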

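Analysis on Lower Bound (1) • Let p = {p1,p2,…,pN}, q = {q1,q2,…,qN} be the upload and download limit vectors, with $p_{\max} = \max_i p_i$ and total upload and download capacities $\sum_i p_i$ and $\sum_i q_i$ • Let ri be the total no. of 0s across row i, i.e. $r_i = \sum_{j=1}^{M} (1 - P_{ij})$; since node i receives at most qi pieces per cycle, the min. value of k0 is given by $k_0 \ge \max_i \lceil r_i / q_i \rceil$ (1) • Let cj be the total no. of 1s along column j, i.e. $c_j = \sum_{i=1}^{N} P_{ij}$, and let $c_{\min} = \min_j c_j$ be the minimum no. of 1s along all columns; each holder of a piece sends it to at most $p_{\max}$ peers per cycle, so the no. of holders grows by at most a factor of $1 + p_{\max}$ per cycle, and the min. value of k0 is given by $k_0 \ge \lceil \log_{1+p_{\max}} (N / c_{\min}) \rceil$ (2) • Let z be the total no. of 0s in P, i.e. $z = \sum_{i=1}^{N} r_i$; at most $\min(\sum_i p_i, \sum_i q_i)$ transmissions fit in one cycle, so the min. value of k0 is given by $k_0 \ge \lceil z / \min(\sum_i p_i, \sum_i q_i) \rceil$ (3)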

Analysis on Lower Bound (2) • Combining (1), (2) and (3), the lower bound on k0 is the largest of the three bounds: $k_0 \ge \max\{\text{(1)}, \text{(2)}, \text{(3)}\}$ (4)
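The three bounds can be evaluated mechanically. This Python sketch assumes bound (1) is max_i ⌈ri/qi⌉, bound (2) is the smallest k with cmin(1 + pmax)^k ≥ N (each holder of a piece can pass it to at most pmax new peers per cycle), and bound (3) is ⌈z / min(Σpi, Σqi)⌉. These forms are our reading of the analysis, and the function name `lower_bound` is hypothetical.

```python
def lower_bound(P, p, q):
    """Lower bound (4) on k0 as the max of bounds (1)-(3).

    Assumed forms (our reading): (1) max_i ceil(r_i / q_i);
    (2) the smallest k with c_min * (1 + p_max)**k >= N;
    (3) ceil(z / min(sum(p), sum(q))). Assumes every p_i, q_i >= 1.
    """
    n, m = len(P), len(P[0])
    r = [m - sum(row) for row in P]                          # r_i: 0s in row i
    c = [sum(P[i][j] for i in range(n)) for j in range(m)]   # c_j: 1s in column j
    z = sum(r)                                               # z: 0s in P
    if z == 0:
        return 0                                             # already complete
    c_min, p_max = min(c), max(p)
    if c_min == 0:
        raise ValueError("some piece has no holder, so distribution cannot finish")

    b1 = max(-(-r_i // q_i) for r_i, q_i in zip(r, q))      # integer ceil, no floats
    b2, copies = 0, c_min
    while copies < n:                                        # holders grow by at most (1 + p_max)x
        copies *= 1 + p_max
        b2 += 1
    b3 = -(-z // min(sum(p), sum(q)))
    return max(b1, b2, b3)

P0 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(lower_bound(P0, [1, 1, 1], [1, 1, 1]))  # 2
```

Bound (2) is computed with an integer loop rather than `math.log`, so an exact power of (1 + pmax) cannot be pushed over the next integer by floating-point error.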

Rarest Piece First (RPF) • Borrowed from the Rarest Element First algorithm employed in BitTorrent • Rarity cj of piece j is the no. of peers that have piece j, i.e. $c_j = \sum_{i=1}^{N} P_{ij}$; the rarest pieces (smallest cj) are scheduled first • [Figure: RPF – Node-Oriented example, p={1,1,2,2,2}, q={2,3,2,3,3}] • [Figure: RPF – Piece-Oriented example, p={1,1,2,2,2}, q={2,3,2,3,3}]
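One possible reading of the piece-oriented variant, sketched in Python: visit pieces in increasing rarity cj and hand each one to every peer that lacks it, as long as some holder still has upload capacity and the receiver still has download capacity. The visiting order, the tie-breaking by index and the function name are assumptions; the slide does not spell them out.

```python
def rpf_piece_oriented(P, p, q):
    """One cycle of a rarest-piece-first sketch; returns 0-based
    (sender, piece, receiver) triples."""
    n, m = len(P), len(P[0])
    up, down = list(p), list(q)
    rarity = [sum(P[i][j] for i in range(n)) for j in range(m)]   # c_j
    transfers = []
    # Rarest pieces first; ties broken by piece index (an assumption).
    for j in sorted(range(m), key=lambda k: (rarity[k], k)):
        holders = [i for i in range(n) if P[i][j]]
        for r in (i for i in range(n) if not P[i][j]):
            if down[r] == 0:
                continue
            sender = next((s for s in holders if up[s] > 0), None)
            if sender is None:
                break  # every holder of piece j has used its upload budget
            transfers.append((sender, j, r))
            up[sender] -= 1
            down[r] -= 1
    return transfers

P = [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
print(rpf_piece_oriented(P, [1, 1, 1], [1, 1, 1]))  # [(0, 1, 1), (2, 2, 0), (1, 0, 2)]
```

Pieces 1 and 2 (one holder each) are placed before piece 0 (two holders), which is the intent of rarest-first.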

Most Demanding Node First (MDNF) • Demand di of node i is the no. of pieces node i has not yet received, i.e. $d_i = \sum_{j=1}^{M} (1 - P_{ij})$ • When choosing recipients, prefer sending to the node with the largest di • [Figure: MDNF – Node-Oriented example, p={1,1,2,2,2}, q={2,3,2,3,3}] • [Figure: MDNF – Piece-Oriented example, p={1,1,2,2,2}, q={2,3,2,3,3}]
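A Python sketch of one reading of the node-oriented variant: each sender, in index order, fills its upload slots one at a time by picking the eligible receiver with the largest demand di, and gives it the lowest-indexed useful piece. Using di computed once at the start of the cycle, the index tie-breaking and the function name are all assumptions.

```python
def mdnf_node_oriented(P, p, q):
    """One cycle of a most-demanding-node-first sketch; returns 0-based
    (sender, piece, receiver) triples."""
    n, m = len(P), len(P[0])
    demand = [m - sum(row) for row in P]          # d_i: 0s in row i
    down = list(q)
    scheduled, transfers = set(), []
    for s in range(n):
        for _ in range(p[s]):                     # one upload slot at a time
            best = None
            for r in range(n):
                if down[r] == 0:
                    continue
                piece = next((j for j in range(m)
                              if P[s][j] and not P[r][j] and (r, j) not in scheduled), None)
                # Strict '>' keeps the lowest index among equally demanding nodes.
                if piece is not None and (best is None or demand[r] > demand[best[0]]):
                    best = (r, piece)
            if best is None:
                break                             # sender has nothing useful left
            r, j = best
            transfers.append((s, j, r))
            scheduled.add((r, j))
            down[r] -= 1
    return transfers

P = [[1, 1, 1, 0], [0, 0, 0, 0], [0, 1, 0, 0]]
print(mdnf_node_oriented(P, [2, 1, 1], [2, 2, 2]))  # [(0, 0, 1), (0, 1, 1)]
```

Here sender 0 spends both uploads on the most demanding peer (d = 4), while the peer with d = 3 gets nothing this cycle, a small illustration of how greedy priority rules can leave capacity unused.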

Problem with RPF and MDNF • They cannot always achieve the max. no. of transmissions per cycle • E.g. using MDNF – Piece-Oriented with p={2,2,2,1}, q={2,1,2,2}, only 6 transmissions can be scheduled, although the max. is 7 • [Figure: MDNF schedule (only 6 transmissions) vs. a maximum schedule (7 transmissions)]

Maximum-Flow (MaxFlow) • Let G = (V, E) be the flow network graph, with vertex sets L = {L1, L2, …, LN} and R = {R1, R2, …, RN}, one vertex per peer on each side

Maximum-Flow (MaxFlow) • Edmonds-Karp Algorithm: - Find augmenting paths using BFS - Guaranteed to find the maximum no. of transmissions in each cycle - Complexity = $O(|V||E|^2)$
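A self-contained Python sketch of one MaxFlow scheduling cycle. The network construction is our assumption, since the figure defining G is not reproduced: source → Ls (capacity ps) → B(r,j) (capacity 1 when s has piece j and r lacks it) → Rr (capacity 1) → sink (capacity qr). The unit capacity into each B vertex ensures a peer receives a given piece at most once per cycle. Edmonds-Karp augments along BFS shortest paths; all names are hypothetical.

```python
from collections import defaultdict, deque

def edmonds_karp(cap, source, sink):
    """Max flow via BFS augmenting paths, O(|V||E|^2).
    cap maps u -> {v: capacity}. Returns (flow value, residual capacities)."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, edges in cap.items():
        for v, c in edges.items():
            residual[u][v] += c
            residual[v][u] += 0                   # make sure the reverse edge exists
    total = 0
    while True:
        parent, queue = {source: None}, deque([source])
        while queue and sink not in parent:       # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total, residual
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:                         # push flow, update reverse edges
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        total += bottleneck

def maxflow_schedule(P, p, q):
    """One cycle on the assumed network; returns 0-based (sender, piece, receiver)."""
    n, m = len(P), len(P[0])
    cap = defaultdict(dict)
    for s in range(n):
        cap["src"][("L", s)] = p[s]
        for r in range(n):
            for j in range(m):
                if P[s][j] and not P[r][j]:
                    cap[("L", s)][("B", r, j)] = 1
    for r in range(n):
        for j in range(m):
            if not P[r][j]:
                cap[("B", r, j)][("R", r)] = 1
        cap[("R", r)]["sink"] = q[r]
    _, residual = edmonds_karp(cap, "src", "sink")
    transfers = []
    for s in range(n):
        for (_, r, j), c in cap[("L", s)].items():
            if c - residual[("L", s)][("B", r, j)] > 0:   # this edge carries flow
                transfers.append((s, j, r))
    return transfers

P0 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(len(maxflow_schedule(P0, [1, 1, 1], [1, 1, 1])))  # 3
```

On this toy matrix a naive "first needy peer" greedy order schedules only 2 transmissions in the first cycle, whereas the max-flow schedule reaches the upper limit of 3 (every upload slot used).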

MaxFlow – Counter Example • Pure MaxFlow performs unsatisfactorily: it maximizes the transmissions in the current cycle without considering whether more can be matched in subsequent cycles • Using MaxFlow, a total of 3 cycles are needed (p={2,2,2,2,2}, q={3,3,3,3,3}) • Using RPF – Node-Oriented, only 2 cycles are needed (same p and q)

MaxFlow - Weighted • Put weights on both sides to give priority to some nodes during the search • Weight on Li = γi (sum of the no. of 0s in other peers for those pieces that peer i has) • Weight on Bij = δij (sum of the no. of 0s across row i and column j) • E.g. δ42 = 7
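The two weight definitions can be computed directly from the possession matrix. This Python sketch assumes γi sums, over the pieces peer i holds, the no. of 0s in that piece's column, and that δij is defined for cells with Pij = 0 as (0s in row i) + (0s in column j), so the (i, j) cell itself is counted in both terms; the slide does not say whether it is counted once or twice. Indices are 0-based (the slide's δ42 would be `delta[(3, 1)]`), and the function name is hypothetical. How the weights steer the augmenting-path search is not specified here.

```python
def maxflow_weights(P):
    """Return (gamma, delta) for possession matrix P.

    gamma[i]: sum over pieces j that peer i has of the 0s in column j.
    delta[(i, j)] for P[i][j] == 0: 0s in row i plus 0s in column j
    (the (i, j) cell is counted in both terms, which is our assumption).
    """
    n, m = len(P), len(P[0])
    row_zeros = [m - sum(row) for row in P]
    col_zeros = [n - sum(P[i][j] for i in range(n)) for j in range(m)]
    gamma = [sum(col_zeros[j] for j in range(m) if P[i][j]) for i in range(n)]
    delta = {(i, j): row_zeros[i] + col_zeros[j]
             for i in range(n) for j in range(m) if not P[i][j]}
    return gamma, delta

gamma, delta = maxflow_weights([[1, 0], [0, 0], [1, 1]])
print(gamma)  # [1, 0, 3]
print(delta)  # {(0, 1): 3, (1, 0): 3, (1, 1): 4}
```

A peer that holds many widely missing pieces gets a large γi, and a missing cell in a needy row and a rare column gets a large δij, which matches the stated goal of prioritizing nodes during the search.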

MaxFlow – Weighted: Counter Example • For p={2,2,2,2,2}, q={3,3,3,3,3}: • Using MaxFlow – Weighted, a total of 3 cycles are needed (P3 = 1, i.e. every entry is 1) • Using MDNF – Piece-Oriented, only 2 cycles are needed (P2 = 1)

MaxFlow – Dynamically-Weighted • Allows the weights to vary dynamically within each scheduling cycle • E.g. γ = {15,14,25,13,15,10,16,16}, and δ43 = 9 is the greatest value among all δij

Simulation Results (1) Fig. 1 Performance comparison of various scheduling algorithms (All) with varying numbers of peers (file size = 100, pi = 2, qi = 3, each initial Pij is 1 or 0 with equal probability)

Simulation Results (2) Fig. 2 Performance comparison of various scheduling algorithms (Representative) with varying numbers of peers (file size = 100, pi = 2, qi = 3, each initial Pij is 1 or 0 with equal probability)

Simulation Results (3) Fig. 3 Performance comparison of various scheduling algorithms (Representative) with varying file sizes (no. of peers = 10, pi = 2, qi = 3, each initial Pij is 1 or 0 with equal probability)

Future Work • Study the case of asynchronous scheduling, where the transmission time is different for different pairs of nodes • Study the case when the network is dynamic in nature, where peers can come and go at any instant and they may shift to communicate with different sets of peers during the distribution process

Conclusion • The data distribution problem in P2P networks has not been well studied in previous research • We formally define the collaborative file distribution problem with the possession and transmission matrix formulations • We also derive a theoretical lower bound on the minimum distribution time • We develop several types of algorithms (RPF, MDNF, MaxFlow) for solving the problem • Our novel dynamically-weighted max-flow algorithm outperforms all the other algorithms in simulations

Thank You! Q&A
