
Performance Comparison of Scheduling Algorithms for Peer-to-Peer Collaborative File Distribution




Presented by: Chan Siu Kei, Jonathan

Supervisors: Prof. VOK Li, Dr. KS Lui

- Introduction
- Communication Model
- Analysis
- Scheduling Algorithms: Rarest Piece First, Most Demanding Node First, Maximum-Flow Algorithms
- Simulation Results
- Future Work
- Conclusion

- P2P file sharing applications are highly popular on the Internet, e.g. BitTorrent, Gnutella, Kazaa, Napster, etc.
- More scalable (faster) than the traditional client/server approach (e.g. FTP)
- Previous research focuses on topics like overlay topology formation, peer discovery, content search, fairness and incentive issues, etc., but seldom looks into the data distribution scheduling problem
- We present the first effort on this problem and propose a novel Maximum-Flow algorithm to solve it better

- Synchronous Scheduling: same transmission time for every pair of nodes
- Asymmetric Bandwidth: node i sends up to pi pieces out and receives up to qi pieces in for each cycle

- N = no. of peers, M = no. of file pieces
- F = {F1, F2, …, FM}
- P = N×M possession matrix, with Pij = 1 iff node i possesses file piece Fj, otherwise Pij = 0
- Pt = possession matrix at time t
- p = {p1,p2,…,pN} (upload limit vector), q = {q1,q2,…,qN} (download limit vector)

p = {1,1,2,2,2}, q = {2,3,2,3,3}
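The notation above maps directly onto plain arrays. The sketch below is illustrative only: the slide's actual P0 is not preserved in this transcript, so the 5×7 matrix here is a made-up stand-in, and the helper name `has_piece` is hypothetical.

```python
# Hypothetical possession matrix: rows are nodes 1..5, columns are pieces F1..F7.
# The values are invented for illustration; they are not the slide's P0.
P0 = [
    [1, 1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1],
]
p = [1, 1, 2, 2, 2]  # upload limit vector from the slide
q = [2, 3, 2, 3, 3]  # download limit vector from the slide

N, M = len(P0), len(P0[0])  # no. of peers, no. of file pieces


def has_piece(P, i, j):
    """True iff node i possesses piece Fj (1-based indices, as on the slides)."""
    return P[i - 1][j - 1] == 1
```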

- A schedule specifies which file pieces each peer has to send out and to whom
- A possible schedule for P0 with p={1,1,2,2,2}, q={2,3,2,3,3}:
- Node 1: send piece 3 to node 2
- Node 2: send piece 4 to node 1
- Node 3: send piece 5 to node 1; send piece 5 to node 2
- Node 4: send piece 6 to node 2; send piece 6 to node 3
- Node 5: send piece 2 to node 4; send piece 7 to node 4
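A schedule like the one above can be checked mechanically against the upload and download limits. The sketch below is one way to do that; the possession matrix is a hypothetical stand-in chosen so that the slide's schedule is feasible (the real P0 is not preserved here), and `check_schedule` is an illustrative name, not something from the presentation.

```python
from collections import Counter

# Hypothetical P0 (rows = nodes 1..5, columns = pieces 1..7); invented values.
P0 = [
    [1, 1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1],
]
p = [1, 1, 2, 2, 2]
q = [2, 3, 2, 3, 3]

# The slide's schedule as (sender, piece, receiver) triples, 1-based.
schedule = [
    (1, 3, 2),
    (2, 4, 1),
    (3, 5, 1), (3, 5, 2),
    (4, 6, 2), (4, 6, 3),
    (5, 2, 4), (5, 7, 4),
]


def check_schedule(P, p, q, transfers):
    """Return a list of rule violations for one cycle (empty list = feasible)."""
    problems = []
    for s, piece, r in transfers:
        if P[s - 1][piece - 1] != 1:
            problems.append(f"node {s} does not hold piece {piece}")
        if P[r - 1][piece - 1] != 0:
            problems.append(f"node {r} already holds piece {piece}")
    for node, count in Counter(s for s, _, _ in transfers).items():
        if count > p[node - 1]:
            problems.append(f"node {node} exceeds its upload limit")
    for node, count in Counter(r for _, _, r in transfers).items():
        if count > q[node - 1]:
            problems.append(f"node {node} exceeds its download limit")
    for (piece, r), count in Counter((pc, r) for _, pc, r in transfers).items():
        if count > 1:
            problems.append(f"node {r} receives piece {piece} more than once")
    return problems
```

With this stand-in matrix, `check_schedule(P0, p, q, schedule)` returns an empty list, while adding one more upload from node 1 is reported as exceeding its limit of 1.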

- Formally, we use an N×M matrix Sk to represent the schedule at cycle k. From Sk, we can derive the N×M transmission matrix Tk

e.g. Node 1 receives piece 4 from Node 2 and piece 5 from Node 3, so the entries of Tk for (node 1, piece 4) and (node 1, piece 5) are 1

- Given Pk-1 and the schedule Sk-1 with its transmission matrix Tk-1, the possession matrix at the next cycle k is Pk = Pk-1 + Tk-1 (k > 0)
- The distribution terminates after a certain number of cycles, say k0, when every entry of Pk0 is 1 (every node possesses every piece)
- Our goal is to minimize k0, which is the time needed for complete distribution
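The update rule Pk = Pk-1 + Tk-1 and the termination test can be sketched in a few lines. This is a minimal illustration on a made-up 2-node, 2-piece instance; the function names and the (sender, piece, receiver) encoding of a cycle's transfers are assumptions, not the presentation's Sk format.

```python
def transmission_matrix(transfers, n, m):
    """Build T (n x m): T[i][j] = 1 iff node i+1 receives piece j+1 this cycle."""
    T = [[0] * m for _ in range(n)]
    for _sender, piece, receiver in transfers:  # 1-based triples
        T[receiver - 1][piece - 1] = 1
    return T


def next_possession(P, T):
    """Elementwise P + T, i.e. the possession matrix for the next cycle."""
    return [[a + b for a, b in zip(row_p, row_t)] for row_p, row_t in zip(P, T)]


def is_complete(P):
    """True when every node possesses every piece."""
    return all(x == 1 for row in P for x in row)


# Toy instance: node 1 holds piece 1, node 2 holds piece 2, and they swap.
P_prev = [[1, 0], [0, 1]]
T_prev = transmission_matrix([(1, 1, 2), (2, 2, 1)], n=2, m=2)
P_next = next_possession(P_prev, T_prev)
```

Here the distribution completes after a single cycle, so k0 = 1 for this toy instance.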

- Let p = {p1,p2,…,pN}, q = {q1,q2,…,qN} be the upload and download limit vectors
- Let ri be the total no. of 0s across row i, i.e. $r_i = M - \sum_{j=1}^{M} P_{ij}$; the min. value of k0 implied by the rows is bound (1)
- Let cj be the total no. of 1s along column j, i.e. $c_j = \sum_{i=1}^{N} P_{ij}$, and let $c_{\min} = \min_j c_j$ be the minimum no. of 1s along all columns; the min. value of k0 implied by the columns is bound (2)
- Let z be the total no. of 0s in P, i.e. $z = \sum_{i=1}^{N} r_i$; the min. value of k0 implied by z is bound (3)
- Combining (1),(2),(3), the lower bound on k0 is the largest of the three bounds (4)
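The slide's own expressions for bounds (1) to (4) are not preserved in this transcript. As a hedged illustration, the sketch below computes three bounds that follow directly from the definitions of ri, cj and z; they may differ from the authors' exact formulas. The assumptions are: node i needs at least ceil(ri/qi) cycles; the copies of the rarest piece can grow by at most a factor of (1 + max pi) per cycle; and at most min(Σpi, Σqi) transmissions fit in one cycle. The possession matrix is a made-up example.

```python
def lower_bounds(P, p, q):
    """Return (b1, b2, b3, max) for the three illustrative bounds on k0."""
    n, m = len(P), len(P[0])
    r = [m - sum(row) for row in P]        # zeros per row
    c = [sum(col) for col in zip(*P)]      # ones per column
    z = sum(r)                             # zeros in the whole matrix
    if min(c) == 0:
        raise ValueError("some piece is held by no peer")

    # Row bound: node i must download r_i pieces, at most q_i per cycle.
    b1 = max(-(-ri // qi) for ri, qi in zip(r, q))

    # Column bound: copies of the rarest piece grow by at most (1 + max p) per cycle.
    copies, b2 = min(c), 0
    while copies < n:
        copies *= 1 + max(p)
        b2 += 1

    # Total bound: z transmissions in all, at most min(sum p, sum q) per cycle.
    b3 = -(-z // min(sum(p), sum(q)))
    return b1, b2, b3, max(b1, b2, b3)


# Hypothetical 5x7 possession matrix (not the slide's P0).
P0 = [
    [1, 1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1],
]
bounds = lower_bounds(P0, [1, 1, 2, 2, 2], [2, 3, 2, 3, 3])
```

For this stand-in matrix the three bounds are 3, 2 and 4, so no schedule can finish in fewer than 4 cycles.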

- Borrowed from the Rarest Element First algorithm employed in BitTorrent
- Rarity cj of piece j is the no. of peers who have piece j, i.e. $c_j = \sum_{i=1}^{N} P_{ij}$

RPF – Node-Oriented: (p={1,1,2,2,2}, q={2,3,2,3,3})

…

RPF – Piece-Oriented: (p={1,1,2,2,2}, q={2,3,2,3,3})

…
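The worked RPF matrices are not preserved here, so the following is only a rough sketch of the rarest-first idea: compute the rarity of each piece and serve pieces in increasing order of rarity within one cycle. The tie-breaking (lowest index first) and the greedy pairing of holders with recipients are assumptions, and this sketch does not claim to reproduce the exact Node-Oriented or Piece-Oriented variants. Indices are 0-based.

```python
def rarity(P):
    """c_j = no. of peers holding piece j (column sums of P)."""
    return [sum(col) for col in zip(*P)]


def rpf_cycle(P, p, q):
    """Greedy one-cycle schedule that visits pieces rarest first.

    Returns (sender, piece, receiver) triples with 0-based indices.
    """
    n = len(P)
    c = rarity(P)
    up, down = list(p), list(q)   # remaining upload / download capacity
    served = set()                # (piece, receiver) pairs already scheduled
    transfers = []
    for j in sorted(range(len(c)), key=lambda j: (c[j], j)):
        for s in range(n):
            if P[s][j] != 1:
                continue          # node s does not hold piece j
            for r in range(n):
                if up[s] == 0:
                    break         # sender has no upload capacity left
                if P[r][j] == 0 and down[r] > 0 and (j, r) not in served:
                    transfers.append((s, j, r))
                    served.add((j, r))
                    up[s] -= 1
                    down[r] -= 1
    return transfers


# Toy instance: 3 nodes, 3 pieces, one upload and one download per cycle.
P = [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
plan = rpf_cycle(P, [1, 1, 1], [1, 1, 1])
```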

- Demand di of node i is the no. of un-received pieces for node i, i.e. $d_i = M - \sum_{j=1}^{M} P_{ij}$
- When choosing recipients, prefer sending to the node with largest di
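The recipient rule above can be sketched as follows. The possession matrix is a made-up example, the function names are hypothetical, and breaking ties in favour of the lowest node index is an assumption the slides do not state. Indices are 0-based.

```python
def demand(P):
    """d_i = no. of pieces node i has not yet received (zeros in row i)."""
    return [len(row) - sum(row) for row in P]


def pick_recipient(P, q_left, piece, d):
    """Among nodes lacking `piece` with download capacity left, pick the one
    with the largest demand; ties go to the lowest index (an assumption)."""
    candidates = [i for i in range(len(P)) if P[i][piece] == 0 and q_left[i] > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda i: (d[i], -i))


# Hypothetical 5x7 possession matrix (not the slide's P0).
P0 = [
    [1, 1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1],
]
d = demand(P0)
first_choice = pick_recipient(P0, [2, 3, 2, 3, 3], piece=4, d=d)
```

In this example the holder of the fifth piece (index 4) would serve the node at index 3 first, since it has the largest demand among the nodes that still lack the piece.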

MDNF – Node-Oriented: (p={1,1,2,2,2}, q={2,3,2,3,3})

MDNF – Piece-Oriented: (p={1,1,2,2,2}, q={2,3,2,3,3})

- With RPF and MDNF, the max. no. of transmissions for each cycle is not always achieved

Using MDNF – Piece-Oriented (p={2,2,2,1}, q={2,1,2,2}), only 6 transmissions can be scheduled, but the max. is 7

Let G = (V,E) be the flow network graph

L = {L1, L2, …, LN}

R = {R1, R2, …, RN}

- Edmonds-Karp Algorithm:
- Finds augmenting paths using BFS
- Guaranteed to find the maximum no. of transmissions in each cycle
- Complexity = O(|V||E|²)
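The slide's flow-network figure is not preserved, so the construction below is an assumed one that is consistent with the L, R and Bij names used in the deck: source → Li with capacity pi, Li → Bkj with capacity 1 whenever node i holds piece j and node k lacks it, Bkj → Rk with capacity 1 (so a node receives each piece at most once), and Rk → sink with capacity qk. The sketch runs a plain Edmonds-Karp search on that network for a single cycle; indices are 0-based and the function name is hypothetical.

```python
from collections import defaultdict, deque


def maxflow_cycle(P, p, q):
    """Return a maximum set of (sender, piece, receiver) transfers for one cycle."""
    n, m = len(P), len(P[0])
    cap = defaultdict(dict)  # residual capacities: cap[u][v]

    def add_edge(u, v, c):
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # reverse residual edge

    for i in range(n):
        add_edge("source", ("L", i), p[i])
        add_edge(("R", i), "sink", q[i])
    for k in range(n):
        for j in range(m):
            if P[k][j] == 0:
                add_edge(("B", k, j), ("R", k), 1)
                for i in range(n):
                    if P[i][j] == 1:
                        add_edge(("L", i), ("B", k, j), 1)

    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {"source": None}
        queue = deque(["source"])
        while queue and "sink" not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if "sink" not in parent:
            break
        # Find the bottleneck along the path, then push that much flow.
        bottleneck, v = float("inf"), "sink"
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = "sink"
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    # A positive residual on B -> L means one unit flows from L to B.
    transfers = []
    for node in list(cap):
        if isinstance(node, tuple) and node[0] == "B":
            _, k, j = node
            for u, c in cap[node].items():
                if isinstance(u, tuple) and u[0] == "L" and c > 0:
                    transfers.append((u[1], j, k))
    return sorted(transfers)
```

For example, with `P = [[1, 0], [0, 1]]` and all limits equal to 1, the two nodes swap their pieces and both transfers are scheduled in the same cycle.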

- Pure MaxFlow performance is unsatisfactory, as it does not consider whether we can match more in subsequent cycles

Using MaxFlow, total 3 cycles are needed: (p={2,2,2,2,2}, q={3,3,3,3,3})

…

Using RPF – Node-Oriented, only 2 cycles are needed: (p={2,2,2,2,2}, q={3,3,3,3,3})

- Put weights on both sides to give priorities to some nodes during searching
- Weights on Li = (sum of the no. of 0s in other peers for those pieces that peer i has)
- Weights on Bij = δij (sum of the no. of 0s across row i and column j)
- E.g. δ42 = 7
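The two weight definitions can be computed directly from the possession matrix. The sketch below is one reading of them: the weight on Li sums, over the pieces peer i holds, the zeros in those columns, and δij adds the zeros in row i to the zeros in column j for every missing entry. Whether the slide counts the zero at (i, j) once or twice is not recoverable from this transcript, so the double count here is an assumption, as are the function name and the toy matrix. Indices are 0-based.

```python
def weights(P):
    """Return (left, delta): weights on the L side and delta[(i, j)] for P[i][j] == 0."""
    n, m = len(P), len(P[0])
    row_zeros = [m - sum(row) for row in P]
    col_zeros = [n - sum(col) for col in zip(*P)]
    # Peer i holds piece j, so every zero in column j belongs to another peer.
    left = [sum(col_zeros[j] for j in range(m) if P[i][j] == 1) for i in range(n)]
    # Assumed reading: zeros in row i plus zeros in column j.
    delta = {
        (i, j): row_zeros[i] + col_zeros[j]
        for i in range(n)
        for j in range(m)
        if P[i][j] == 0
    }
    return left, delta


left, delta = weights([[1, 0], [0, 0]])
```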

For p={2,2,2,2,2}, q={3,3,3,3,3}

Using MaxFlow – Weighted, total 3 cycles are needed:

… P3 = 1

Using MDNF – Piece-Oriented, only 2 cycles are needed:

P2 = 1

- The dynamically-weighted MaxFlow algorithm allows the weights to be varied dynamically within each scheduling cycle

γ = {15,14,25,13,15,10,16,16} and δ43 = 9 which is the greatest value among all δij

Fig. 1 Performance comparison of various scheduling algorithms (All) with varying peer sizes (file size = 100, pi = 2, qi = 3, equal probability for 1s and 0s)

Fig. 2 Performance comparison of various scheduling algorithms (Representative) with varying peer sizes (file size = 100, pi = 2, qi = 3, equal probability for 1s and 0s)

Fig. 3 Performance comparison of various scheduling algorithms (Representative) with varying file sizes (peer size = 10, pi = 2, qi = 3, equal probability for 1s and 0s)
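The figure captions describe the simulation setup: random possession matrices with equal probability for 1s and 0s, pi = 2 and qi = 3. A test instance of that kind could be generated as sketched below. The function name and seed parameter are hypothetical, and forcing every piece to have at least one holder is an added assumption (without it some pieces could never be distributed); the presentation does not say how that case was handled.

```python
import random


def random_instance(n_peers, n_pieces, seed=None):
    """Random possession matrix with P(1) = P(0) = 0.5, plus limits p_i = 2, q_i = 3."""
    rng = random.Random(seed)
    P = [[rng.randint(0, 1) for _ in range(n_pieces)] for _ in range(n_peers)]
    # Assumption: make sure every piece is held by at least one peer.
    for j in range(n_pieces):
        if not any(P[i][j] for i in range(n_peers)):
            P[rng.randrange(n_peers)][j] = 1
    return P, [2] * n_peers, [3] * n_peers


P, p, q = random_instance(n_peers=10, n_pieces=100, seed=0)
```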

- Study the case of asynchronous scheduling, where the transmission time is different for different pairs of nodes
- Study the case when the network is dynamic in nature, where peers can come and go at any instant and they may shift to communicate with different sets of peers during the distribution process

- The data distribution problem in P2P networks is not well studied in previous research
- We formally define the collaborative file distribution problem with the possession and transmission matrix formulations
- We also deduce a theoretical bound for the minimum distribution time required
- We develop several types of algorithms (RPF, MDNF, MaxFlow) for solving the problem
- Our novel dynamically-weighted max-flow algorithm outperforms all other algorithms in simulations

Q&A