
Looking at the Server-side of P2P Systems

Yi Qiao, Dong Lu, Fabian E. Bustamante and Peter A. Dinda. Department of Computer Science, Northwestern University. www.cs.northwestern.edu


Presentation Transcript


  1. Looking at the Server-side of P2P Systems Yi Qiao, Dong Lu, Fabian E. Bustamante and Peter A. Dinda Department of Computer Science Northwestern University www.cs.northwestern.edu

  2. What is the Server-side? • No architectural distinction between “client” and “server” in a P2P system • Peers are heterogeneous • Some peers act more like servers – the server side • Some act more like clients – the client side • The server side is important for P2P performance • Little attention has been given to it

  3. Outline • Background and Motivation • Why schedule the server side? • Trace Collection and Study • Scheduling Methodology • Evaluation • Conclusions

  4. Background • Peers in a P2P data-sharing system • Example: Gnutella • Phase 1: query, query answer • Phase 2: download, upload • Role as a client • Send queries, download objects • Role as a server • Answer queries, upload objects • The server role has received little research attention

  5. Background (Cont.) Phase 1: Queries and query replies in the P2P file-sharing system [Figure: peers P1 to P4; queries for “Shark Tale” and “Taxi”; query replies “Peer 3 got it!” and “No idea!”]

  6. Background (Cont.) Phase 2: Download/upload of shared files [Figure: peers P1 to P4 and a job queue; download requests “Give me ‘Taxi’” and “Give me ‘Shark Tale’”] Little attention given to the server side so far…

  7. Motivation • The server side is a key performance bottleneck of P2P data-sharing systems • 80% of download requests get rejected due to saturation of server capacity [Saroiu 2002] • Capacity is limited by the user, in particular the number of server threads • 50% of all object downloads take more than one day [Gummadi 2003] • Our goal • Server load characterization and analysis • New scheduling policies to shorten the average response time of each download

  8. Challenge • Introducing SRPT into web server scheduling has been very successful, but it is trickier for the P2P server side… • Requests are often not for whole objects • P2P servers are conservative with resource consumption • Popular P2P servers often operate under overloaded conditions • Fetch-at-most-once behavior means object popularity does NOT follow a Zipf distribution [Gummadi 2003] • New scheduling policies based on P2P’s own characteristics are needed

  9. Outline • Background and Motivation • Why schedule the server side? • Trace Collection and Study • Scheduling Methodology • Evaluation • Conclusions

  10. Trace Collection and Study • Trace Collection Methodology • Build “honey pots” • Passive monitoring of query strings • Download hot content based on query popularity • Run “honey pots” • Make collected objects available to the community • Record incoming download requests • Arrival time, object name, requested size, downloaded size, service time, … • Findings reported here are based on Gnutella traces

  11. Traces in the Study • Traces differ in connection type, number of server threads, number of shared objects, and number of requests

  12. Server Workload • Distribution of job interarrival time? • Distribution of job size? • What is the performance bottleneck? • Why scheduling?

  13. Job Interarrivals • Job interarrivals can be well modeled by an exponential distribution • Coefficient of determination • Almost straight line in the semi-log plot
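The semi-log check on this slide can be sketched as follows. This is only an illustration on synthetic data, not the authors' analysis: the rate `0.5`, the sample count and the helper name `semilog_fit` are assumptions. For an exponential distribution, the logarithm of the complementary CDF is a straight line in time, so a least-squares line through that curve yields both a rate estimate and a coefficient of determination.

```python
import math
import random


def semilog_fit(interarrivals):
    """Fit ln(CCDF(t)) = a + b*t by least squares.

    Returns (estimated_rate, r_squared). For exponential data the
    slope b is approximately -rate and r_squared is close to 1.
    """
    xs = sorted(interarrivals)
    n = len(xs)
    # Empirical CCDF at the i-th smallest sample is the fraction of samples
    # strictly larger. The largest sample is skipped because its CCDF is 0.
    ts = xs[:-1]
    ys = [math.log((n - i - 1) / n) for i in range(n - 1)]
    mean_t = sum(ts) / len(ts)
    mean_y = sum(ys) / len(ys)
    sxx = sum((t - mean_t) ** 2 for t in ts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    ss_res = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(ts, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return -slope, 1.0 - ss_res / ss_tot


# Synthetic interarrival times (hypothetical rate, not taken from the traces).
rng = random.Random(0)
samples = [rng.expovariate(0.5) for _ in range(5000)]
rate, r2 = semilog_fit(samples)
print(f"estimated rate = {rate:.3f}, R^2 = {r2:.3f}")
```

On real traces, `samples` would be the differences between consecutive request arrival times.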

  14. Job Arrivals are Independent • Effectively nil • Job arrivals are independent of each other • Significant difference from web servers

  15. Job Sizes • Three different job sizes • Full object size • Requested data chunk size • Unique to P2P servers • A request is typically for only a small chunk • Served data chunk size • Unique to P2P servers • The client may abort the transfer or switch to another server • Known only after the job is done

  16. Job Sizes (Cont.) [Figure panels: Object Size, Requested Chunk Size, Served Chunk Size] • Three different job sizes • Differ by several orders of magnitude • Approximated by a Bounded Pareto distribution
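For readers who want to generate synthetic job sizes of this shape, a Bounded Pareto variate can be drawn by inverting its CDF, F(x) = (1 - (L/x)^α) / (1 - (L/H)^α) for L ≤ x ≤ H. The sketch below is illustrative only: the shape `alpha=1.1` and the bounds of 1 KB and 1 GB are assumptions, not parameters fitted to the traces.

```python
import random


def bounded_pareto(rng, alpha, low, high):
    """Draw one Bounded Pareto(alpha, low, high) sample by inverse transform."""
    u = rng.random()                      # uniform in [0, 1)
    ratio = (low / high) ** alpha         # (L/H)^alpha
    # Solve F(x) = u for x.
    return low * (1.0 - u * (1.0 - ratio)) ** (-1.0 / alpha)


# Hypothetical parameters: sizes between 1 KB and 1 GB, heavy tail.
rng = random.Random(1)
sizes = [bounded_pareto(rng, alpha=1.1, low=1e3, high=1e9) for _ in range(10000)]

median = sorted(sizes)[len(sizes) // 2]
mean = sum(sizes) / len(sizes)
print(f"median = {median:.0f} bytes, mean = {mean:.0f} bytes")
```

The mean sits far above the median because a few very large jobs dominate the total, which is the property that makes size-based scheduling attractive.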

  17. Server Resource Utilization • Resource utilization is conservative • Servers only run in the background on ordinary computers • Users set upper bounds for • Number of server threads • Aggregate bandwidth usage for upload • For our busiest honey pot • 1.2% to 20.0% CPU utilization • Up to 20 MB memory usage • Bottleneck resource • The set of server threads for uploading

  18. Our Scheduling Problem Given the total number of concurrent jobs that a server can take, how to schedule incoming jobs so that the mean response time is minimized?

  19. Outline • Background and Motivation • Why schedule the server side? • Trace Collection and Study • Scheduling Methodology • Evaluation • Conclusions

  20. Scheduling Policies • Shortest Remaining Processing Time (SRPT) • Always serve the job with the shortest remaining processing time • First-Come-First-Served (FCFS) • Serve incoming download requests in arrival order • Used by Gnutella for its job scheduling • Processor Sharing (PS) • Each job gets an equal amount of service time in turn
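The difference between FCFS and SRPT can be seen on a toy workload. The following is a minimal single-server sketch with exactly known job sizes and unit service rate; it is not the simulator used in the study, the function name `simulate` and the three-job workload are made up for illustration, and PS is left out for brevity.

```python
def simulate(jobs, policy):
    """Mean response time on one server with unit service rate.

    jobs: list of (arrival_time, size), sorted by arrival time.
    policy: "fcfs" (earliest arrival first) or "srpt" (preemptive,
    least remaining work first). Scheduling decisions are re-evaluated
    at every arrival and every completion.
    """
    pending = []          # entries are [remaining_work, arrival_time]
    t, i, done, total = 0.0, 0, 0, 0.0
    n = len(jobs)
    while done < n:
        if not pending:
            t = max(t, jobs[i][0])        # idle until the next arrival
        while i < n and jobs[i][0] <= t:  # admit everything that has arrived
            pending.append([jobs[i][1], jobs[i][0]])
            i += 1
        if policy == "srpt":
            job = min(pending, key=lambda j: j[0])
        else:
            job = min(pending, key=lambda j: j[1])
        next_arrival = jobs[i][0] if i < n else float("inf")
        if job[0] <= next_arrival - t:    # job finishes before next arrival
            t += job[0]
            pending.remove(job)
            total += t - job[1]
            done += 1
        else:                             # run until the arrival, then re-decide
            job[0] -= next_arrival - t
            t = next_arrival
    return total / n


# One large job followed by two small ones (hypothetical workload).
workload = [(0.0, 10.0), (1.0, 1.0), (2.0, 1.0)]
fcfs_mean = simulate(workload, "fcfs")
srpt_mean = simulate(workload, "srpt")
print(fcfs_mean, srpt_mean)
```

Under FCFS both small jobs wait behind the large one; under SRPT they preempt it, and the large job finishes at the same time as before.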

  21. SRPT • Studied since the 1960s [Schrage 1968] • Used for various applications • Packet network scheduling [Bux 1983] • Scheduling for web servers [Harchol-Balter 2001] • Optimal for mean response time of jobs for a general G/G/1 queuing system • Problem • In most cases, service time is unknown until the job is done

  22. SRPT for P2P Servers • Main challenge • It is not clear how to estimate the service time of a request: file size, requested chunk size, or served chunk size? • One practical approach • SRPT-CS – uses requested chunk size as the scheduling metric • Two idealized approaches (they need information known only after the job is done) • SRPT-SS – uses served chunk size as the scheduling metric • Ideal SRPT • How well can they do?
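The three variants differ only in which quantity ranks the waiting jobs. The sketch below shows that idea with a deliberately simplified, non-preemptive single-thread dispatcher (effectively shortest-job-first on the chosen metric, not the preemptive SRPT of the slides). The `Job` fields, the function `mean_response` and the three example jobs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    arrival: float
    requested_chunk: float  # bytes asked for; known when the request arrives
    served_chunk: float     # bytes actually sent; known only afterwards
    service_time: float     # seconds the upload took; known only afterwards


def mean_response(jobs, key):
    """Serve one job at a time, always picking the waiting job with the
    smallest key(job). Non-preemptive simplification for illustration."""
    order = sorted(jobs, key=lambda j: j.arrival)
    waiting, t, total, idx, served = [], 0.0, 0.0, 0, 0
    n = len(order)
    while served < n:
        if not waiting:
            t = max(t, order[idx].arrival)   # idle until the next arrival
        while idx < n and order[idx].arrival <= t:
            waiting.append(order[idx])
            idx += 1
        job = min(waiting, key=key)
        waiting.remove(job)
        t += job.service_time
        total += t - job.arrival
        served += 1
    return total / n


# Hypothetical requests; the third client aborts after a quarter of its chunk.
jobs = [
    Job(0.0, requested_chunk=100.0, served_chunk=100.0, service_time=10.0),
    Job(0.0, requested_chunk=10.0, served_chunk=10.0, service_time=1.0),
    Job(0.0, requested_chunk=20.0, served_chunk=5.0, service_time=0.5),
]
results = {
    "FCFS": mean_response(jobs, key=lambda j: j.arrival),
    "SRPT-CS": mean_response(jobs, key=lambda j: j.requested_chunk),
    "SRPT-SS": mean_response(jobs, key=lambda j: j.served_chunk),
    "ideal": mean_response(jobs, key=lambda j: j.service_time),
}
print(results)
```

Only the `requested_chunk` key is available to a real server at scheduling time; the other two keys act as oracles that bound how well an estimator could do.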

  23. Approximating ideal SRPT • Depends on the correlations between Requested Chunk Size, Served Chunk Size and Service time • But these correlations are weak • Why? • Client can exit anytime during transmission • Client can switch to other servers for a data chunk • Bandwidth bottlenecks exist somewhere else
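One way to quantify how weak these correlations are is the Pearson correlation coefficient between two columns of a trace. The helper below is a plain-Python sketch; the slides do not say which correlation measure was used, and the sample values are invented to show the effect of early aborts and slow links.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-request values: requested chunk (KB) and service time (s).
requested_kb = [100, 100, 100, 200, 200, 400]
service_s = [12.0, 1.5, 30.0, 4.0, 25.0, 3.0]
r = pearson(requested_kb, service_s)
print(f"correlation = {r:.2f}")
```

A coefficient near 1 would mean the requested chunk size predicts service time well; values near 0 mean SRPT-CS is scheduling on a poor estimate.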

  24. Outline • Background and Motivation • Why schedule the server side? • Trace Collection and Study • Scheduling Methodology • Evaluation • Conclusions

  25. Evaluation • Evaluation setup • Using a general-purpose queuing simulator • Various scheduling policies • Trace-driven simulations • Queue capacity of 500 • System load between 0.1 and 10 • Time slice of 0.01 seconds for PS scheduling • Metrics • Mean response time • Rejection rate • Mean slowdown
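The three metrics can be computed from per-job simulation records roughly as below. The record layout and the function name `summarize` are assumptions, and slowdown is taken here as response time divided by service time, which is the usual definition but is not spelled out on the slide.

```python
def summarize(records):
    """Compute evaluation metrics from job records.

    records: list of (arrival, finish, service_time) tuples, where
    finish is None for a job rejected because the queue was full.
    Assumes at least one job was accepted.
    """
    accepted = [r for r in records if r[1] is not None]
    responses = [finish - arrival for arrival, finish, _ in accepted]
    # Slowdown: response time normalized by the job's own service time.
    slowdowns = [(finish - arrival) / service
                 for arrival, finish, service in accepted]
    return {
        "mean_response_time": sum(responses) / len(accepted),
        "rejection_rate": 1.0 - len(accepted) / len(records),
        "mean_slowdown": sum(slowdowns) / len(accepted),
    }


# Four hypothetical jobs; the third is rejected.
records = [(0.0, 4.0, 2.0), (1.0, 2.0, 1.0), (2.0, None, 3.0), (3.0, 11.0, 4.0)]
stats = summarize(records)
print(stats)
```

Restricting `accepted` to the largest 10% of jobs before computing the slowdown would give a fairness measure for large jobs.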

  26. Improved Mean Response Time • Ideal SRPT is the best • SRPT-CS does much better than FCFS and PS [Figure: curves for FCFS, PS, SRPT-CS, SRPT-SS and SRPT]

  27. With Lowest Rejection Rate • SRPT-based scheduling policies actually reject fewer jobs than FCFS and PS [Figure: curves for FCFS, SRPT, SRPT-CS and SRPT-SS]

  28. Without Compromising Fairness • SRPT-based scheduling policies don’t starve large jobs [Figure: mean slowdown for the 10% largest jobs]

  29. Summary • Server-side of P2P is critical to overall system performance • Not much can be learned from web server scheduling • SRPT-based scheduling policies can help • Lowest mean response time • Lowest rejection rate • Without compromising fairness • Chunk size is a reasonable estimator for service time • SRPT-CS outperforms FCFS and PS

  30. Ongoing Work • Large performance gaps between SRPT-CS, SRPT-SS, and SRPT • Only SRPT-CS can be directly implemented • Possible solution – predicting served chunk size and service time using time series analysis • Representativeness of the traces • Performance in a real implementation • Cooperative downloading/uploading? • Better estimators

  31. For more information www.aqualab.cs.northwestern.edu • Please also see our related work • Dong Lu, Huanyuan Sheng, Peter Dinda. “Size-Based Scheduling Policies with Inaccurate Scheduling Information”. In Proc. of MASCOTS, 2004. • Dong Lu, Peter A. Dinda, Yi Qiao, Huanyuan Sheng and Fabián E. Bustamante. “Applications of SRPT Scheduling with Inaccurate Information”. In Proc. of MASCOTS, 2004.
