Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload

  1. Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload K. P. Gummadi, R. J. Dunn, et al., SOSP ’03 Presented by Lu-chuan Kung (kung@uiuc.edu)

  2. Outline • Trace methodology and analysis • User characteristics • Client activities • Object dynamics • Why the Kazaa workload is not Zipf • A model of P2P file-sharing workloads • A study of bandwidth-saving techniques • Conclusion

  3. Trace Methodology • Passively collect Kazaa traffic at the border between the campus network and the Internet • Query traffic was not captured because it is encrypted; file transfers are HTTP transfers with a Kazaa-specific header • Summary statistics of the trace appear in a table in the original slide

  4. Kazaa Users Are Patient • Transfer time: the difference between the start time and the end time of a request • Small objects: <10MB (mostly audio files) • Large objects: >100MB (typically video files)

  5. Users Slow Down As They Age • Do people become hungrier for content as they gain experience with Kazaa? • No: older clients requested fewer bytes, because of: • Attrition: the population declines as clients age • Slowing down: older clients ask for less

  6. Client Activity • It’s difficult to quantify the availability of clients in a P2P system • Client activity metrics: • Activity fraction: time spent in transfers / lifetime duration; a lower bound on availability • Average session length: the typical duration of a session (see the sketch below)
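
A minimal sketch of both metrics for a single client, with made-up session timestamps (not trace data):

```python
from datetime import datetime, timedelta

# Illustrative (start, end) pairs for one client's transfer sessions.
sessions = [
    (datetime(2002, 6, 1, 10, 0), datetime(2002, 6, 1, 10, 45)),
    (datetime(2002, 6, 3, 20, 0), datetime(2002, 6, 3, 21, 30)),
]
lifetime = sessions[-1][1] - sessions[0][0]     # first-seen to last-seen
busy = sum((end - start for start, end in sessions), timedelta())

activity_fraction = busy / lifetime             # lower bound on availability
avg_session_length = busy / len(sessions)
print(f"activity fraction: {activity_fraction:.4f}")
print(f"average session length: {avg_session_length}")
```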

  7. Object Characteristics • Kazaa is not one workload • Kazaa is a blend of workloads with different properties • 3 ranges of objects: small (<10 MB), medium (10–100 MB), and large (>100 MB) • The majority of requests are for small objects • Most bytes transferred are due to large objects

  8. Kazaa Object Dynamics • Multimedia objects are immutable, which shapes object dynamics • Kazaa clients fetch objects at most once: • A client requests a given object at most once 94% of the time • A client requests a given object at most twice 99% of the time • Most requests are for old (repeated) objects • An object is old if at least one month has passed since its first request • 72% of requests for large objects are old • 52% of requests for small objects are old

  9. Kazaa Object Dynamics • The popularity of Kazaa objects is often short-lived • On the Web, the most popular pages remain stable • In Kazaa, popularity is fleeting • Popular audio files lose popularity faster than popular video files • The most popular Kazaa objects tend to be recently born objects • Newly born objects: those that did not receive any requests during the first month of the trace

  10. Kazaa Is Not Zipf • Zipf’s law: the popularity of the ith-most popular object is proportional to i^(-α) (α: the Zipf coefficient) • Kazaa is not Zipf • The most popular objects are less popular than Zipf would predict
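
A quick numeric illustration of what Zipf predicts for the head of the popularity curve; α = 1.0 and the object count are assumptions for illustration, not trace values:

```python
import numpy as np

# Under Zipf(alpha), the rank-i object gets a request share ~ i^(-alpha).
N, alpha = 100_000, 1.0
ranks = np.arange(1, N + 1)
share = ranks ** -alpha
share /= share.sum()                    # normalize to a probability

for i in (1, 2, 10, 100, 1000):
    print(f"rank {i:>5}: {share[i - 1]:.4%} of requests")
# Pure Zipf gives a dominant head; the Kazaa trace's head is flatter.
```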

  11. Why Kazaa Is Not Zipf • Fetch-repeatedly vs. fetch-at-most-once • Simulate the two cases from the same underlying Zipf distribution (see the sketch below) • The fetch-at-most-once result resembles the observed Kazaa distribution • Non-Zipf workloads are also observed in web proxy caches and VoD servers
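
A simulation sketch of this comparison; the population sizes and per-client request counts below are illustrative assumptions, not the paper's parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
N, alpha = 5_000, 1.0
n_clients, reqs_per_client = 1_000, 100
weights = np.arange(1, N + 1, dtype=float) ** -alpha
cdf = np.cumsum(weights / weights.sum())

def zipf_draw(size):                     # inverse-CDF Zipf sampling
    return np.searchsorted(cdf, rng.random(size))

# Fetch-repeatedly: every request is an independent Zipf draw.
repeat = np.bincount(zipf_draw(n_clients * reqs_per_client), minlength=N)

# Fetch-at-most-once: a client skips objects it has already fetched.
once = np.zeros(N, dtype=int)
for _ in range(n_clients):
    seen, got = set(), []
    while len(got) < reqs_per_client:
        for obj in zipf_draw(50):        # batch draws, discard repeats
            if obj not in seen:
                seen.add(obj)
                got.append(obj)
                if len(got) == reqs_per_client:
                    break
    once[got] += 1

def top_share(counts, k=10):
    return np.sort(counts)[-k:].sum() / counts.sum()

print(f"top-10 share, fetch-repeatedly:   {top_share(repeat):.1%}")
print(f"top-10 share, fetch-at-most-once: {top_share(once):.1%}")
# The fetch-at-most-once head flattens, matching the trace's non-Zipf shape.
```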

  12. A Model of P2P File-Sharing Workloads • Hypothesis: the underlying popularity of objects in a fetch-at-most-once system is driven by Zipf’s law • Each client requests 2 objects per day; which object to fetch is drawn from Zipf(1) • Objects are born at rate λo; each new object’s popularity rank is drawn from Zipf(1) • The total object population cannot be observed from the trace, so use back-inference: given that 18,000 distinct objects are requested in the trace, what is the total number of objects? Answer: ~40,000 (sketched below)
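
A sketch of the back-inference step: pick a candidate total population T, simulate the trace's request volume against Zipf(1) over T objects, and keep the T whose simulation yields roughly 18,000 distinct requested objects. The request volume below is an assumed stand-in, not the paper's number:

```python
import numpy as np

rng = np.random.default_rng(1)

def distinct_objects(total, n_requests, alpha=1.0):
    """Count distinct objects hit by n_requests Zipf(alpha) draws over
    a population of `total` objects."""
    weights = np.arange(1, total + 1, dtype=float) ** -alpha
    cdf = np.cumsum(weights / weights.sum())
    draws = np.searchsorted(cdf, rng.random(n_requests))
    return np.unique(draws).size

n_requests = 65_000                      # assumption for illustration
for total in (20_000, 30_000, 40_000, 60_000):
    print(f"T = {total:>6}: {distinct_objects(total, n_requests):>6} distinct")
```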

  13. Model Structure and Notation • Parameter values are chosen to reflect the measured data from the trace

  14. File-Sharing Effectiveness • How should an organization expect bandwidth demand to change over time, given a shared proxy server? • The hit rate of the proxy cache decreases in the fetch-at-most-once case • Fetch-at-most-once clients consume the most popular objects early (simulated below)
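
A simulation sketch of this decay with an idealized infinite shared cache; the scale parameters are assumptions, not the paper's model values:

```python
import numpy as np

rng = np.random.default_rng(2)
N, n_clients, days, reqs_per_day = 20_000, 500, 200, 2
weights = np.arange(1, N + 1, dtype=float) ** -1.0
cdf = np.cumsum(weights / weights.sum())

cache = set()                            # infinite shared proxy cache
seen = [set() for _ in range(n_clients)]
hit_rate = []

for day in range(days):
    hits = total = 0
    for c in range(n_clients):
        got = 0
        while got < reqs_per_day:
            for obj in np.searchsorted(cdf, rng.random(8)):
                if got == reqs_per_day:
                    break
                if obj in seen[c]:       # fetch-at-most-once: skip repeats
                    continue
                seen[c].add(obj)
                hits += obj in cache     # hit if anyone fetched it before
                cache.add(obj)
                total += 1
                got += 1
    hit_rate.append(hits / total)

# Clients consume the cached, popular objects early; later requests fall
# deeper in the uncached tail, so the hit rate drifts downward.
print("hit rate, days 1-5:    ", np.round(hit_rate[:5], 2))
print("hit rate, days 196-200:", np.round(hit_rate[-5:], 2))
```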

  15. New Object Arrivals Improve Hit Rate • Object updates on the Web lower the hit rate • New object arrivals are beneficial in a P2P system • Arrivals of popular objects increase the hit rate • Without arrivals, clients are forced to choose from the remaining unpopular objects

  16. New Clients Cannot Stabilize Performance • The infusion of new clients at a constant rate cannot compensate for the growing population of old clients • Keeping the hit rate constant would require an exponentially increasing client arrival rate

  17. Model Validation • The underlying Zipf assumption cannot be validated directly • Use the proposed model to replicate the object popularity distribution in the trace • Estimate the various parameters • The arrival rate of new objects is chosen to fit the measured data: λo = 5,475 objects per year

  18. Exploring Locality-Aware Request Routing • A significant fraction of Internet bandwidth is consumed by Kazaa • How much bandwidth could exploiting locality save? • Different ways to exploit locality: • A centralized proxy cache placed at the organization border • Request redirection: favor organization-internal peers • Centralized request redirection • Decentralized request redirection

  19. An Ideal Proxy Cache • Assume an ideal proxy: infinite capacity and bandwidth • 86% of external bandwidth would be saved • However, organizations may be unwilling to store P2P file-sharing content on a proxy server due to legal issues

  20. Benefits of Locality-Awareness • Trace-based simulation (sketched below): • Infinite storage capacity • At most 12 concurrent downloads per peer • Upload bandwidth 500 Kb/s • External bandwidth 100 Kb/s • Clients are available only while transferring (a very conservative assumption) • Cold misses: the object cannot be found on any peer • Busy misses: the object is found, but every holding peer is unavailable due to concurrent transfers
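
A simplified sketch of the redirection logic; transfer completion and availability churn are omitted, and the names and structure are my own, not the paper's simulator:

```python
MAX_CONCURRENT = 12      # per-peer concurrent-upload limit from the slide

holders = {}             # object id -> set of internal peers with a copy
uploads = {}             # peer id -> concurrent uploads in progress

def route(peer, obj):
    """Classify one request as hit / cold_miss / busy_miss."""
    owners = holders.get(obj, set()) - {peer}
    free = [o for o in owners if uploads.get(o, 0) < MAX_CONCURRENT]
    if not owners:
        outcome = "cold_miss"    # fetch over the external 100 Kb/s link
    elif free:
        uploads[free[0]] = uploads.get(free[0], 0) + 1
        outcome = "hit"          # served internally at 500 Kb/s
    else:
        outcome = "busy_miss"    # copies exist but all holders saturated
    holders.setdefault(obj, set()).add(peer)   # requester keeps a copy
    return outcome

print(route("peerA", "song.mp3"))  # cold_miss: first request in the org
print(route("peerB", "song.mp3"))  # hit: peerA now holds a copy
```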

  21. Benefits of Locality-Awareness • Locality awareness achieved a 68% byte hit rate for large objects and a 37% byte hit rate for small objects • A substantial fraction of missed bytes (62% for large objects, 43% for small) is due to unavailable clients

  22. Benefits of Increased Availability • Most bytes served and consumed come from highly available peers • Adding availability to the most available hosts raises the hit rate more than adding it to the least available hosts

  23. Conclusion • P2P file-sharing workloads are different from Web workloads • Users are patient • Clients demand less as they age • Fetch-at-most-once behavior • The proposed model suggests that client births and object births are the fundamental forces driving P2P workloads • There is significant locality in the Kazaa workload • Locality-aware peers would save 63% of external transfers, even under conservative assumptions

  24. Comments • Some of the observed characteristics may be tied to the design of Kazaa and to the measurement methodology, and thus may not generalize • The lack of portal sites in P2P systems may be another reason the most popular objects are less popular than Zipf’s law would predict

  25. Assessing the Quality of Voice Communications Over Internet Backbones A. P. Markopoulou, F. A. Tobagi, M. J. Karam IEEE/ACM Transactions on Networking, vol. 11, no. 5, Oct. 2003 Presented by Lu-chuan Kung

  26. Outline • VoIP System • Playout schemes • Voice Impairment in Networks • Internet measurements • Numerical results • Discussion

  27. VoIP System

  28. VoIP System • Speech signal • Talkspurts have mean ≈ 352 ms • Silence periods have mean ≈ 650 ms • Encoding schemes • Packetizer: adds headers for the various protocols • Playout buffer: packets are held until a later playout time in order to smooth delay jitter • Decoder: reconstructs the speech signal

  29. Playout Schemes • Two types: fixed and adaptive • Fixed playout scheme: • The playout delay p is the same for all packets • A large p reduces packet loss due to late arrivals, but also reduces interactivity • Adaptive playout scheme: • Estimate p from the smoothed delay d_av and delay variation v: p = d_av + 4v • p can be re-estimated talkspurt by talkspurt, or packet by packet (see the sketch below)
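
A sketch of the adaptive estimator in the style the slide describes (EWMA estimates of delay and variation, p = d_av + 4v); the smoothing constant is a value commonly used in the playout literature, not a number from the paper:

```python
ALPHA = 0.998002        # assumed smoothing constant (common in the literature)

def make_playout_estimator():
    d_av, v = None, 0.0
    def update(network_delay_ms):
        nonlocal d_av, v
        if d_av is None:
            d_av = network_delay_ms           # first packet seeds the estimate
        else:
            d_av = ALPHA * d_av + (1 - ALPHA) * network_delay_ms
            v = ALPHA * v + (1 - ALPHA) * abs(network_delay_ms - d_av)
        return d_av + 4 * v                   # playout delay for next talkspurt
    return update

update = make_playout_estimator()
for d in (42.0, 45.0, 41.0, 120.0, 44.0):     # made-up per-packet delays (ms)
    print(f"delay {d:6.1f} ms -> p = {update(d):6.1f} ms")
```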

  30. Voice Impairment in Networks • Voice quality is affected by • Encoding • Packet loss • Network delay jitter • End-to-end delay • Echo • End-to-end delay consists of • Encoding delay • Packetization delay • Network delay • Playout buffering delay • Decoding delay (a worked budget follows)
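
An illustrative mouth-to-ear delay budget summing the components on the slide; the numbers are typical textbook values, not measurements from the paper:

```python
# Assumed component values (ms); the 150 ms target is ITU-T G.114's
# recommendation for high-quality interactive voice.
components_ms = {
    "encoding (frame + lookahead)": 25,
    "packetization":                20,
    "network (one-way)":            60,
    "playout buffer":               40,
    "decoding":                      5,
}
total = sum(components_ms.values())
print(f"mouth-to-ear delay ~ {total} ms (target: < 150 ms)")
```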

  31. Assessment of Voice Communication in Packet Networks • Mean Opinion Score (MOS): a subjective rating given by listeners on a scale of 1 to 5 • Intrinsic quality MOS_intr: the quality after compression alone

  32. Degradation Due to Loss • PLC: Packet Loss Concealment • The loss rate is converted to a MOS degradation

  33. Loss of Interactivity • Loss of interactivity due to large end-to-end delay • NTT study • 6 conversation modes (tasks): task 1 is the most demanding, task 6 the most relaxed

  34. Echo Impairment • Echo can cause major quality degradation • The effect of echo is a function of delay and echo losses

  35. E-model • Published by ITU-T; provides formulas to predict the MOS of voice quality • R = (R0 − Is) − Id − Ie + A • R0: basic signal-to-noise ratio • Is: signal impairments, e.g., sidetone and PCM quantization • Id: impairment due to delay (echo + interactivity) • Ie: impairment due to distortion (loss) • A: advantage factor (lenient users)
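
A sketch of the slide's formula together with the standard ITU-T G.107 mapping from the rating R to MOS; R0 = 93.2 is the G.107 default, and the impairment values in the example are placeholders, not numbers from the paper:

```python
def e_model_r(r0=93.2, i_s=0.0, i_d=0.0, i_e=0.0, a=0.0):
    """R = (R0 - Is) - Id - Ie + A, per the slide."""
    return r0 - i_s - i_d - i_e + a

def r_to_mos(r):
    """Standard G.107 mapping from R (0..100) to MOS (1..4.5)."""
    if r < 0:
        return 1.0
    if r > 100:
        return 4.5
    return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)

r = e_model_r(i_d=10.0, i_e=5.0)   # illustrative delay and loss impairments
print(f"R = {r:.1f} -> MOS ~ {r_to_mos(r):.2f}")
```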

  36. Internet Measurements • Probe measurements • 5 major U.S. cities • 43 paths in total • 7 providers: P1, P2, …, P7 • 50-byte probes sent every 10 ms (sketched below)
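
A bare-bones sketch of the sender side of such a prober; the destination is a placeholder, and a real prober would schedule against a clock rather than sleep (time.sleep accumulates drift):

```python
import socket
import struct
import time

def send_probes(dst=("probe-sink.example.net", 9000), count=1000):
    """Send 50-byte UDP probes every 10 ms, each carrying a sequence
    number and a send timestamp so the receiver can compute one-way
    delay and detect loss from sequence gaps."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for seq in range(count):
        payload = struct.pack("!Id", seq, time.time()).ljust(50, b"\x00")
        sock.sendto(payload, dst)
        time.sleep(0.010)                 # 10 ms inter-probe spacing
```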

  37. Observations on the Traces • Duration of the trace: 3 days • Network loss • 6 out of 7 providers experienced outages • Outages happened at least once per day • Delay characteristics • Delay spikes • Alternation between high- and low-delay states • Periodic clustered delay spikes

  38. Delay Characteristics

  39. Consistent Characteristics Per Provider

  40. One Example Call • Apply the E-model to the traces using different playout buffer schemes • Example of a 15-min call

  41. One Example Call • The fixed playout scheme incurs many losses in the last 5 minutes

  42. How to Choose p for the Fixed Scheme • Tradeoff between loss and delay • There is an optimal playout delay that maximizes the MOS (see the sweep below)
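
A sketch of the tradeoff search: sweep the fixed playout delay p over a delay trace, compute the late-loss rate and delay impairment for each p, and keep the p with the best E-model MOS. The delay term follows the common G.107 approximation; loss_to_ie() is an assumed illustrative curve standing in for the paper's codec/PLC-specific tables, and the delay trace is synthetic:

```python
import numpy as np

def r_to_mos(r):
    r = min(max(r, 0.0), 100.0)
    return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)

def loss_to_ie(loss_rate):
    return 30.0 * np.log1p(15 * loss_rate)    # assumed shape, not from paper

def delay_to_id(d_ms):
    # Common G.107 delay-impairment approximation (no echo term).
    return 0.024 * d_ms + 0.11 * max(d_ms - 177.3, 0.0)

def best_fixed_p(delays_ms):
    delays_ms = np.asarray(delays_ms)
    best_p, best_mos = None, -1.0
    for p in range(40, 400, 10):
        late_loss = np.mean(delays_ms > p)    # packets missing deadline p
        r = 93.2 - delay_to_id(p) - loss_to_ie(late_loss)
        mos = r_to_mos(r)
        if mos > best_mos:
            best_p, best_mos = p, mos
    return best_p, best_mos

delays = np.random.default_rng(3).gamma(4.0, 12.0, size=10_000)  # synthetic
p_opt, mos_opt = best_fixed_p(delays)
print(f"optimal fixed p ~ {p_opt} ms, MOS ~ {mos_opt:.2f}")
```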

  43. Example Path – Many Calls • Random calls uniformly spread over an hour • 150 short (3.5-min) and 50 long (10-min) calls • Plot the CDF of MOS for fixed vs. adaptive playout (two panels in the original slide)

  44. Discussion • Backbone networks exhibit a wide range of performance • Some are already able to support high-quality voice communications • Some are barely able to provide acceptable VoIP service (MOS > 3.6) • Reliability problems are a more serious concern than QoS mechanisms

  45. Comments • How representative are the chosen paths of typical paths on the Internet? • End-host-to-end-host paths include access links and therefore have larger delays than the measured backbone paths
