Analyzing Peer-to-Peer Traffic Across Large Networks Jia Wang Joint work with Subhabrata Sen AT&T Labs - Research
P2P applications • Distributed file sharing • Napster, Gnutella, FastTrack, EDonkey, DirectConnect… • Searching vs. data fetching phases • All communications occur over default ports • SuperNodes and Hubs • Why is this interesting? • Large and growing traffic volume Analyzing peer-to-peer traffic across large networks
Outline • Methodology • Data collection • Characterization metrics • Analysis results • Traffic volume and overlay topology • System dynamics • Traffic characterization • P2P vs Web
Methodology • Challenges • Decentralized system • Transient peer membership • Some popular protocols are closed and proprietary • Large-scale passive measurement • Flow-level data from routers across a large tier-1 ISP backbone • Analyze both signaling and data fetching traffic • 3 levels of granularity: IP, Prefix, AS • P2P protocols (default ports) • FastTrack: 1214 (including Morpheus) • Gnutella: 6346/6347 • DirectConnect: 411/412
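The port-based identification on this slide can be sketched as below. This is a minimal illustration, assuming each flow record exposes its source and destination ports; the `Flow` record and the `classify_flow` function are hypothetical names introduced here, not part of the study's tooling.

```python
from typing import NamedTuple, Optional

# Default ports listed on the slide.
P2P_PORTS = {
    1214: "FastTrack",
    6346: "Gnutella",
    6347: "Gnutella",
    411: "DirectConnect",
    412: "DirectConnect",
}


class Flow(NamedTuple):
    """Hypothetical flow record; real router flow exports carry more fields."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    n_bytes: int


def classify_flow(flow: Flow) -> Optional[str]:
    """Return the P2P system name if either port is a known default port."""
    for port in (flow.src_port, flow.dst_port):
        name = P2P_PORTS.get(port)
        if name is not None:
            return name
    return None  # not attributed to any of the three systems
```

Matching on either endpoint's port means both directions of a transfer are attributed to the same system, at the cost of missing peers that run on non-default ports.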
Methodology Discussion • Advantages • Requires minimal knowledge of P2P protocols: port number • Large scale non-intrusive measurement • More complete view of P2P traffic • Allows localized analysis • Limitations • Flow-level data: no application-level details • Incomplete traffic flows • Other issues • DHCP, NAT, proxy • Host IP • Asymmetric IP routing
Measurements • Characterization • Overlay network topology • Traffic distribution • Dynamic behavior • Metrics • Host distribution • Host connectivity • Traffic volume • Mean bandwidth usage • Traffic pattern over time • Connection duration and on-time
Data cleaning • Invalid IPs • 10.0.0.0-10.255.255.255 • 172.16.0.0-172.31.255.255 • 192.168.0.0-192.168.255.255 • No matched prefixes in routing tables • Invalid AS numbers • > 64512 • Removed 4% of flows
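The address and AS filters on this slide can be sketched with Python's standard `ipaddress` module. This is an illustration under stated assumptions: the function names are hypothetical, the AS threshold is taken directly from the slide, and the routing-table prefix check is omitted because it needs routing data not shown here.

```python
import ipaddress

# Private address blocks listed on the slide (the RFC 1918 ranges).
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),       # 10.0.0.0-10.255.255.255
    ipaddress.ip_network("172.16.0.0/12"),    # 172.16.0.0-172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),   # 192.168.0.0-192.168.255.255
]
MAX_VALID_AS = 64512  # the slide treats AS numbers above this as invalid


def is_valid_ip(ip: str) -> bool:
    """True if the address is outside the private blocks above."""
    addr = ipaddress.ip_address(ip)
    return not any(addr in block for block in PRIVATE_BLOCKS)


def is_valid_as(as_number: int) -> bool:
    """True if the AS number does not exceed the slide's threshold."""
    return as_number <= MAX_VALID_AS


def keep_flow(src_ip: str, dst_ip: str, src_as: int, dst_as: int) -> bool:
    """Keep a flow only if both endpoints pass the IP and AS checks."""
    return (is_valid_ip(src_ip) and is_valid_ip(dst_ip)
            and is_valid_as(src_as) and is_valid_as(dst_as))
```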
Overview of P2P traffic • 800 million flow records in total • FastTrack is the most popular system
Host distribution
Host connectivity FastTrack (9/14/2001) • Connectivity is very small for most hosts and very high for a few hosts • Distribution is less skewed at the prefix and AS levels
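One way to compute a connectivity figure like the one discussed here is to count the distinct peers each host exchanges flows with. The slide does not define the metric, so this definition is an assumption, and `connectivity` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def connectivity(flows: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Map each host to the number of distinct hosts it exchanged flows with.

    `flows` yields (source, destination) pairs. Passing prefixes or AS
    numbers instead of IP addresses gives the coarser granularities.
    """
    peers = defaultdict(set)
    for src, dst in flows:
        peers[src].add(dst)
        peers[dst].add(src)
    return {host: len(p) for host, p in peers.items()}
```

Using sets means repeated flows between the same pair count once, so the result reflects the number of neighbors rather than the number of flows.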
Traffic volume distribution FastTrack (9/14/2001) • Significant skews in traffic volume across granularities • Few entities source most of the traffic • Few entities receive most of the traffic
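The skew described here can be quantified as the share of total volume contributed by the top fraction of entities. The sketch below is illustrative only: `top_share` is a hypothetical helper, and the volumes are assumed to be per-entity byte counts at whichever granularity (IP, prefix, or AS) is being examined.

```python
import math
from typing import Sequence


def top_share(volumes: Sequence[float], fraction: float) -> float:
    """Fraction of total volume contributed by the top `fraction` of entities."""
    if not volumes:
        return 0.0
    ranked = sorted(volumes, reverse=True)
    # Always include at least one entity, even for very small fractions.
    k = max(1, math.ceil(fraction * len(ranked)))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0
```

For example, with made-up volumes `[90, 5, 3, 1, 1]`, the top 20% of entities (one entity) accounts for 90 of 100 units.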
Mean bandwidth usage FastTrack (9/14/2001) • Upstream usage < downstream usage • Possible causes: • Asymmetric available bandwidth, e.g., DSL, cable • Users/ISPs rate-limiting upstream data transfers
Time of day effect FastTrack (9/14/2001 GMT) • Traffic volume exhibits very strong time-of-day effect • Milder time-of-day variation for # hosts in the system
Host connection duration & on-time FastTrack (9/14/2001) thd=30min • Substantial transience: most hosts stay in the system for a short time • Distribution less skewed at the prefix and AS levels • Using per-cluster or per-AS indexing/caching nodes may help
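The slide does not spell out how on-time is computed. One plausible reading, assumed here, is that `thd=30min` is an idle-gap threshold: a host is treated as having left the system once it is silent for longer than the threshold, and on-time is the total length of its active sessions. The `on_time` function below is a hypothetical sketch of that reading, not the authors' definition.

```python
from typing import Iterable

THRESHOLD_SECONDS = 30 * 60  # "thd=30min", read here as an idle-gap threshold


def on_time(timestamps: Iterable[float],
            threshold: float = THRESHOLD_SECONDS) -> float:
    """Total seconds a host is considered active.

    Consecutive observations more than `threshold` seconds apart start a
    new session; on-time is the sum of the session lengths.
    """
    ts = sorted(timestamps)
    if not ts:
        return 0
    total = 0
    start = prev = ts[0]
    for t in ts[1:]:
        if t - prev > threshold:
            total += prev - start  # close the current session
            start = t              # open a new one
        prev = t
    total += prev - start          # close the final session
    return total
```

Under this reading a host seen only once has an on-time of zero, which is one reason a real analysis might instead use flow start and end times rather than single observation points.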
Traffic characterization • The power law • May not be a suitable model for P2P traffic • Relationship between metrics • Traffic volume • Number of IPs • On-time • Mean bandwidth usage
Traffic volume vs. on-time FastTrack (9/14/2001): top 1% hosts (73% volume) • Volume heavy hitters tend to have long on-times • Hosts with short on-times contribute small traffic volumes
Connectivity vs. on-time FastTrack (9/14/2001): top 1% hosts (73% volume) • Hosts with high connectivity have long on-times • Hosts with short on-times communicate with few other hosts
P2P vs Web • Observations • 97% of prefixes contributing P2P traffic also contribute Web traffic • Heavy hitter prefixes for P2P traffic tend to be heavy hitters for Web traffic • Prefix stability: the daily traffic volume (in %) from the prefix does not change over days • Experiments: the top 0.01%, 0.1%, 1%, and 10% heavy hitters contribute 10%, 30%, 50%, and 90% of the traffic volume, respectively
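The comparison of heavy hitter prefixes across P2P and Web traffic can be sketched as a set overlap. This is an illustrative sketch, assuming per-prefix volumes are available as dictionaries; `top_prefixes` and `heavy_hitter_overlap` are hypothetical names and the sample data is made up.

```python
import math
from typing import Dict, Set


def top_prefixes(volume_by_prefix: Dict[str, float], fraction: float) -> Set[str]:
    """Return the top `fraction` of prefixes ranked by traffic volume."""
    ranked = sorted(volume_by_prefix, key=volume_by_prefix.get, reverse=True)
    if not ranked:
        return set()
    k = max(1, math.ceil(fraction * len(ranked)))
    return set(ranked[:k])


def heavy_hitter_overlap(p2p: Dict[str, float], web: Dict[str, float],
                         fraction: float) -> float:
    """Share of P2P heavy hitter prefixes that are also Web heavy hitters."""
    top_p2p = top_prefixes(p2p, fraction)
    top_web = top_prefixes(web, fraction)
    if not top_p2p:
        return 0.0
    return len(top_p2p & top_web) / len(top_p2p)


# Made-up volumes for four hypothetical prefixes.
p2p_volumes = {"a": 100, "b": 50, "c": 1, "d": 1}
web_volumes = {"a": 80, "c": 70, "b": 2, "d": 1}
overlap = heavy_hitter_overlap(p2p_volumes, web_volumes, 0.5)
```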
Traffic stability (March 2002) • Figure panels: top 0.01% prefixes, top 1% prefixes • P2P traffic contributed by the top heavy hitter prefixes is more stable than either Web or total traffic
Summary • Measure and characterize P2P traffic across a large network • Three popular P2P systems • Significant increase in both number of users and traffic volume • Traffic distributions are highly skewed • High level system dynamics • P2P is a significant but stable component of Internet traffic
Acknowledgement • AT&T Labs • Matt Grossglauser, Carsten Lund, Jennifer Rexford, Matt Roughan, Fred True • External • Steve Gribble