Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload

  1. Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload K. P. Gummadi, R. J. Dunn, et al., SOSP ’03 Presented by Lu-chuan Kung (kung@uiuc.edu)

  2. Outline • Trace methodology and analysis • User characteristics • Client activities • Object dynamics • Why the Kazaa workload is not Zipf • A model of P2P file-sharing workloads • A study of bandwidth-saving techniques • Conclusion

  3. Trace Methodology • Passively collect Kazaa traffic at the border between the campus network and the Internet • Query traffic was not captured because it is encrypted; file transfers are HTTP transfers with a Kazaa-specific header • Summary statistics of the trace appear in a table in the original slide

  4. Kazaa Users Are Patient • Transfer time: the difference between the start time and the end time of a request • Small objects: <10MB (mostly audio files) • Large objects: >100MB (typically video files)

  5. Users Slow Down As They Age • Do people become hungrier for content as they gain experience with Kazaa? • No: older clients requested fewer bytes, because of: • Attrition: the population declines as clients age • Slowing down: older clients ask for less

  6. Client Activity • It’s difficult to quantify the availability of clients in a P2P system • Client activity metrics: • Activity fraction: time spent in transfers / lifetime duration; a lower bound on availability • Average session length: the typical duration of a session (see the sketch below)
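
A minimal sketch of both metrics for a single client, with made-up session timestamps (not trace data):

```python
from datetime import datetime, timedelta

# Illustrative (start, end) pairs for one client's transfer sessions.
sessions = [
    (datetime(2002, 6, 1, 10, 0), datetime(2002, 6, 1, 10, 45)),
    (datetime(2002, 6, 3, 20, 0), datetime(2002, 6, 3, 21, 30)),
]
lifetime = sessions[-1][1] - sessions[0][0]     # first-seen to last-seen
busy = sum((end - start for start, end in sessions), timedelta())

activity_fraction = busy / lifetime             # lower bound on availability
avg_session_length = busy / len(sessions)
print(f"activity fraction: {activity_fraction:.4f}")
print(f"average session length: {avg_session_length}")
```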

  7. Object Characteristics • Kazaa is not one workload • Kazaa is a blend of workloads with different properties • 3 ranges of objects: small (<10 MB), medium (10–100 MB), and large (>100 MB) • The majority of requests are for small objects • Most bytes transferred are due to large objects

  8. Kazaa Object Dynamics • Multimedia objects are immutable, which shapes object dynamics • Kazaa clients fetch objects at most once: • A client requests a given object at most once 94% of the time • A client requests a given object at most twice 99% of the time • Most requests are for old (repeated) objects • An object is old if at least one month has passed since its first request • 72% of requests for large objects are old • 52% of requests for small objects are old

  9. Kazaa Object Dynamics • The popularity of Kazaa objects is often short-lived • On the Web, the most popular pages remain stable • In Kazaa, popularity is fleeting • Popular audio files lose popularity faster than popular video files • The most popular Kazaa objects tend to be recently born objects • Newly born objects: those that did not receive any requests during the first month of the trace

  10. Kazaa Is Not Zipf • Zipf’s law: the popularity of the ith-most popular object is proportional to i^(-α) (α: the Zipf coefficient) • Kazaa is not Zipf • The most popular objects are less popular than Zipf would predict
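
A quick numeric illustration of what Zipf predicts for the head of the popularity curve; α = 1.0 and the object count are assumptions for illustration, not trace values:

```python
import numpy as np

# Under Zipf(alpha), the rank-i object gets a request share ~ i^(-alpha).
N, alpha = 100_000, 1.0
ranks = np.arange(1, N + 1)
share = ranks ** -alpha
share /= share.sum()                    # normalize to a probability

for i in (1, 2, 10, 100, 1000):
    print(f"rank {i:>5}: {share[i - 1]:.4%} of requests")
# Pure Zipf gives a dominant head; the Kazaa trace's head is flatter.
```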

  11. Why Kazaa Is Not Zipf • Fetch-repeatedly vs. fetch-at-most-once • Simulate the two cases from the same underlying Zipf distribution (see the sketch below) • The fetch-at-most-once result resembles the observed Kazaa distribution • Non-Zipf workloads are also observed in web proxy caches and VoD servers
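
A simulation sketch of this comparison; the population sizes and per-client request counts below are illustrative assumptions, not the paper's parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
N, alpha = 5_000, 1.0
n_clients, reqs_per_client = 1_000, 100
weights = np.arange(1, N + 1, dtype=float) ** -alpha
cdf = np.cumsum(weights / weights.sum())

def zipf_draw(size):                     # inverse-CDF Zipf sampling
    return np.searchsorted(cdf, rng.random(size))

# Fetch-repeatedly: every request is an independent Zipf draw.
repeat = np.bincount(zipf_draw(n_clients * reqs_per_client), minlength=N)

# Fetch-at-most-once: a client skips objects it has already fetched.
once = np.zeros(N, dtype=int)
for _ in range(n_clients):
    seen, got = set(), []
    while len(got) < reqs_per_client:
        for obj in zipf_draw(50):        # batch draws, discard repeats
            if obj not in seen:
                seen.add(obj)
                got.append(obj)
                if len(got) == reqs_per_client:
                    break
    once[got] += 1

def top_share(counts, k=10):
    return np.sort(counts)[-k:].sum() / counts.sum()

print(f"top-10 share, fetch-repeatedly:   {top_share(repeat):.1%}")
print(f"top-10 share, fetch-at-most-once: {top_share(once):.1%}")
# The fetch-at-most-once head flattens, matching the trace's non-Zipf shape.
```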

  12. A Model of P2P File-Sharing Workloads • Hypothesis: the underlying popularity of objects in a fetch-at-most-once system is driven by Zipf’s law • Each client requests 2 objects per day; which object to fetch is drawn from Zipf(1) • Objects are born at rate λo; each new object’s popularity rank is drawn from Zipf(1) • The total object population cannot be observed from the trace, so use back-inference: given that 18,000 distinct objects are requested in the trace, what is the total number of objects? Answer: ~40,000 (sketched below)
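
A sketch of the back-inference step: pick a candidate total population T, simulate the trace's request volume against Zipf(1) over T objects, and keep the T whose simulation yields roughly 18,000 distinct requested objects. The request volume below is an assumed stand-in, not the paper's number:

```python
import numpy as np

rng = np.random.default_rng(1)

def distinct_objects(total, n_requests, alpha=1.0):
    """Count distinct objects hit by n_requests Zipf(alpha) draws over
    a population of `total` objects."""
    weights = np.arange(1, total + 1, dtype=float) ** -alpha
    cdf = np.cumsum(weights / weights.sum())
    draws = np.searchsorted(cdf, rng.random(n_requests))
    return np.unique(draws).size

n_requests = 65_000                      # assumption for illustration
for total in (20_000, 30_000, 40_000, 60_000):
    print(f"T = {total:>6}: {distinct_objects(total, n_requests):>6} distinct")
```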

  13. Model Structure and Notation • Parameter values are chosen to reflect the measured data from the trace

  14. File-Sharing Effectiveness • How should an organization expect bandwidth demand to change over time, given a shared proxy server? • The hit rate of the proxy cache decreases in the fetch-at-most-once case • Fetch-at-most-once clients consume the most popular objects early (simulated below)
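
A simulation sketch of this decay with an idealized infinite shared cache; the scale parameters are assumptions, not the paper's model values:

```python
import numpy as np

rng = np.random.default_rng(2)
N, n_clients, days, reqs_per_day = 20_000, 500, 200, 2
weights = np.arange(1, N + 1, dtype=float) ** -1.0
cdf = np.cumsum(weights / weights.sum())

cache = set()                            # infinite shared proxy cache
seen = [set() for _ in range(n_clients)]
hit_rate = []

for day in range(days):
    hits = total = 0
    for c in range(n_clients):
        got = 0
        while got < reqs_per_day:
            for obj in np.searchsorted(cdf, rng.random(8)):
                if got == reqs_per_day:
                    break
                if obj in seen[c]:       # fetch-at-most-once: skip repeats
                    continue
                seen[c].add(obj)
                hits += obj in cache     # hit if anyone fetched it before
                cache.add(obj)
                total += 1
                got += 1
    hit_rate.append(hits / total)

# Clients consume the cached, popular objects early; later requests fall
# deeper in the uncached tail, so the hit rate drifts downward.
print("hit rate, days 1-5:    ", np.round(hit_rate[:5], 2))
print("hit rate, days 196-200:", np.round(hit_rate[-5:], 2))
```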

  15. New Object Arrivals Improve Hit Rate • Object updates on the Web lower the hit rate • New object arrivals are beneficial in a P2P system • Arrivals of popular objects increase the hit rate • Without arrivals, clients are forced to choose from the remaining unpopular objects

  16. New Clients Cannot Stabilize Performance • The infusion of new clients at a constant rate cannot compensate for the growing population of old clients • Keeping the hit rate constant would require an exponentially increasing client arrival rate

  17. Model Validation • The underlying Zipf assumption cannot be validated directly • Use the proposed model to replicate the object popularity distribution in the trace • Estimate the various parameters • The arrival rate of new objects is chosen to fit the measured data: λo = 5,475 objects per year

  18. Exploring Locality-Aware Request Routing • A significant fraction of Internet bandwidth is consumed by Kazaa • How much bandwidth could exploiting locality save? • Different ways to exploit locality: • A centralized proxy cache placed at the organization border • Request redirection: favor organization-internal peers • Centralized request redirection • Decentralized request redirection

  19. An Ideal Proxy Cache • Assume an ideal proxy: infinite capacity and bandwidth • 86% of external bandwidth would be saved • However, organizations may be unwilling to store P2P file-sharing content on a proxy server due to legal issues

  20. Benefits of Locality-Awareness • Trace-based simulation (sketched below): • Infinite storage capacity • At most 12 concurrent downloads per peer • Upload bandwidth 500 Kb/s • External bandwidth 100 Kb/s • Clients are available only while transferring (a very conservative assumption) • Cold misses: the object cannot be found on any peer • Busy misses: the object is found, but every holding peer is unavailable due to concurrent transfers
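
A simplified sketch of the redirection logic; transfer completion and availability churn are omitted, and the names and structure are my own, not the paper's simulator:

```python
MAX_CONCURRENT = 12      # per-peer concurrent-upload limit from the slide

holders = {}             # object id -> set of internal peers with a copy
uploads = {}             # peer id -> concurrent uploads in progress

def route(peer, obj):
    """Classify one request as hit / cold_miss / busy_miss."""
    owners = holders.get(obj, set()) - {peer}
    free = [o for o in owners if uploads.get(o, 0) < MAX_CONCURRENT]
    if not owners:
        outcome = "cold_miss"    # fetch over the external 100 Kb/s link
    elif free:
        uploads[free[0]] = uploads.get(free[0], 0) + 1
        outcome = "hit"          # served internally at 500 Kb/s
    else:
        outcome = "busy_miss"    # copies exist but all holders saturated
    holders.setdefault(obj, set()).add(peer)   # requester keeps a copy
    return outcome

print(route("peerA", "song.mp3"))  # cold_miss: first request in the org
print(route("peerB", "song.mp3"))  # hit: peerA now holds a copy
```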

  21. Benefits of Locality-Awareness • Locality awareness achieved a 68% byte hit rate for large objects and a 37% byte hit rate for small objects • A substantial fraction of missed bytes (62% for large objects, 43% for small) is due to unavailable clients

  22. Benefits of Increased Availability • Most bytes served and consumed come from highly available peers • Adding availability to the most available hosts raises the hit rate more than adding it to the least available hosts

  23. Conclusion • P2P file-sharing workloads are different from Web workloads • Users are patient • Clients demand less as they age • Fetch-at-most-once behavior • The proposed model suggests that client births and object births are the fundamental forces driving P2P workloads • There is significant locality in the Kazaa workload • Locality-aware peers would save 63% of external transfers, even under conservative assumptions

  24. Comments • Some of the observed characteristics may be tied to the design of Kazaa and to the measurement methodology, and thus may not generalize • The lack of portal sites in P2P systems may be another reason the most popular objects are less popular than Zipf’s law would predict

  25. Assessing the Quality of Voice Communications Over Internet Backbones A. P. Markopoulou, F. A. Tobagi, M. J. Karam IEEE/ACM Transactions on Networking, vol. 11, no. 5, Oct. 2003 Presented by Lu-chuan Kung

  26. Outline • VoIP System • Playout schemes • Voice Impairment in Networks • Internet measurements • Numerical results • Discussion

  27. VoIP System

  28. VoIP System • Speech signal • Talkspurts have mean ≈ 352 ms • Silence periods have mean ≈ 650 ms • Encoding schemes • Packetizer: adds headers for the various protocols • Playout buffer: packets are held until a later playout time in order to smooth delay jitter • Decoder: reconstructs the speech signal

  29. Playout Schemes • Two types: fixed and adaptive • Fixed playout scheme: • The playout delay p is the same for all packets • A large p reduces packet loss due to late arrivals, but also reduces interactivity • Adaptive playout scheme: • Estimate p from the smoothed delay d_av and delay variation v: p = d_av + 4v • p can be re-estimated talkspurt by talkspurt, or packet by packet (see the sketch below)
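
A sketch of the adaptive estimator in the style the slide describes (EWMA estimates of delay and variation, p = d_av + 4v); the smoothing constant is a value commonly used in the playout literature, not a number from the paper:

```python
ALPHA = 0.998002        # assumed smoothing constant (common in the literature)

def make_playout_estimator():
    d_av, v = None, 0.0
    def update(network_delay_ms):
        nonlocal d_av, v
        if d_av is None:
            d_av = network_delay_ms           # first packet seeds the estimate
        else:
            d_av = ALPHA * d_av + (1 - ALPHA) * network_delay_ms
            v = ALPHA * v + (1 - ALPHA) * abs(network_delay_ms - d_av)
        return d_av + 4 * v                   # playout delay for next talkspurt
    return update

update = make_playout_estimator()
for d in (42.0, 45.0, 41.0, 120.0, 44.0):     # made-up per-packet delays (ms)
    print(f"delay {d:6.1f} ms -> p = {update(d):6.1f} ms")
```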

  30. Voice Impairment in Networks • Voice quality is affected by • Encoding • Packet loss • Network delay jitter • End-to-end delay • Echo • End-to-end delay consists of • Encoding delay • Packetization delay • Network delay • Playout buffering delay • Decoding delay (a worked budget follows)
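
An illustrative mouth-to-ear delay budget summing the components on the slide; the numbers are typical textbook values, not measurements from the paper:

```python
# Assumed component values (ms); the 150 ms target is ITU-T G.114's
# recommendation for high-quality interactive voice.
components_ms = {
    "encoding (frame + lookahead)": 25,
    "packetization":                20,
    "network (one-way)":            60,
    "playout buffer":               40,
    "decoding":                      5,
}
total = sum(components_ms.values())
print(f"mouth-to-ear delay ~ {total} ms (target: < 150 ms)")
```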

  31. Assessment of Voice Communication in Packet Networks • Mean Opinion Score (MOS): a subjective rating given by listeners on a scale of 1 to 5 • Intrinsic quality MOS_intr: the quality after compression alone

  32. Degradation Due to Loss • PLC: Packet Loss Concealment • The loss rate is converted to a MOS degradation

  33. Loss of Interactivity • Loss of interactivity due to large end-to-end delay • NTT study • 6 conversation modes (tasks): task 1 is the most demanding, task 6 the most relaxed

  34. Echo Impairment • Echo can cause major quality degradation • The effect of echo is a function of delay and echo losses

  35. E-model • Published by ITU-T; provides formulas to predict the MOS of voice quality • R = (R0 − Is) − Id − Ie + A • R0: basic signal-to-noise ratio • Is: signal impairments, e.g., sidetone and PCM quantization • Id: impairment due to delay (echo + interactivity) • Ie: impairment due to distortion (loss) • A: advantage factor (lenient users)
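
A sketch of the slide's formula together with the standard ITU-T G.107 mapping from the rating R to MOS; R0 = 93.2 is the G.107 default, and the impairment values in the example are placeholders, not numbers from the paper:

```python
def e_model_r(r0=93.2, i_s=0.0, i_d=0.0, i_e=0.0, a=0.0):
    """R = (R0 - Is) - Id - Ie + A, per the slide."""
    return r0 - i_s - i_d - i_e + a

def r_to_mos(r):
    """Standard G.107 mapping from R (0..100) to MOS (1..4.5)."""
    if r < 0:
        return 1.0
    if r > 100:
        return 4.5
    return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)

r = e_model_r(i_d=10.0, i_e=5.0)   # illustrative delay and loss impairments
print(f"R = {r:.1f} -> MOS ~ {r_to_mos(r):.2f}")
```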

  36. Internet Measurements • Probe measurements • 5 major U.S. cities • 43 paths in total • 7 providers: P1, P2, …, P7 • 50-byte probes sent every 10 ms (sketched below)
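
A bare-bones sketch of the sender side of such a prober; the destination is a placeholder, and a real prober would schedule against a clock rather than sleep (time.sleep accumulates drift):

```python
import socket
import struct
import time

def send_probes(dst=("probe-sink.example.net", 9000), count=1000):
    """Send 50-byte UDP probes every 10 ms, each carrying a sequence
    number and a send timestamp so the receiver can compute one-way
    delay and detect loss from sequence gaps."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for seq in range(count):
        payload = struct.pack("!Id", seq, time.time()).ljust(50, b"\x00")
        sock.sendto(payload, dst)
        time.sleep(0.010)                 # 10 ms inter-probe spacing
```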

  37. Observations on the Traces • Duration of the trace: 3 days • Network loss • 6 out of 7 providers experienced outages • Outages happened at least once per day • Delay characteristics • Delay spikes • Alternation between high- and low-delay states • Periodic clustered delay spikes

  38. Delay Characteristics

  39. Consistent Characteristics Per Provider

  40. One Example Call • Apply the E-model to the traces using different playout buffer schemes • Example of a 15-min call

  41. One Example Call • The fixed playout scheme incurs many losses in the last 5 minutes

  42. How to Choose p for the Fixed Scheme • Tradeoff between loss and delay • There is an optimal playout delay that maximizes the MOS (see the sweep below)
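
A sketch of the tradeoff search: sweep the fixed playout delay p over a delay trace, compute the late-loss rate and delay impairment for each p, and keep the p with the best E-model MOS. The delay term follows the common G.107 approximation; loss_to_ie() is an assumed illustrative curve standing in for the paper's codec/PLC-specific tables, and the delay trace is synthetic:

```python
import numpy as np

def r_to_mos(r):
    r = min(max(r, 0.0), 100.0)
    return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)

def loss_to_ie(loss_rate):
    return 30.0 * np.log1p(15 * loss_rate)    # assumed shape, not from paper

def delay_to_id(d_ms):
    # Common G.107 delay-impairment approximation (no echo term).
    return 0.024 * d_ms + 0.11 * max(d_ms - 177.3, 0.0)

def best_fixed_p(delays_ms):
    delays_ms = np.asarray(delays_ms)
    best_p, best_mos = None, -1.0
    for p in range(40, 400, 10):
        late_loss = np.mean(delays_ms > p)    # packets missing deadline p
        r = 93.2 - delay_to_id(p) - loss_to_ie(late_loss)
        mos = r_to_mos(r)
        if mos > best_mos:
            best_p, best_mos = p, mos
    return best_p, best_mos

delays = np.random.default_rng(3).gamma(4.0, 12.0, size=10_000)  # synthetic
p_opt, mos_opt = best_fixed_p(delays)
print(f"optimal fixed p ~ {p_opt} ms, MOS ~ {mos_opt:.2f}")
```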

  43. Example Path – Many Calls • Random calls uniformly spread over an hour • 150 short (3.5-min) and 50 long (10-min) calls • Plot the CDF of MOS for fixed vs. adaptive playout (two panels in the original slide)

  44. Discussion • Backbone networks exhibit a wide range of performance • Some are already able to support high-quality voice communications • Some are barely able to provide acceptable VoIP service (MOS > 3.6) • Reliability problems are a more serious concern than QoS mechanisms

  45. Comments • How representative are the chosen paths of typical paths on the Internet? • End-host-to-end-host paths include access links and therefore have larger delays than the measured backbone paths
