Understanding KaZaA • Jian Liang, Rakesh Kumar, Keith Ross • Polytechnic University, Brooklyn, N.Y.
KaZaA/FastTrack Operation • Top file-sharing system • 3 million active nodes • Four clients: KaZaA, KaZaA-lite, Grokster, and iMesh • Good availability and scalability • Proprietary protocol with encrypted signaling traffic, in contrast with Gnutella and eMule
Purpose of Measurement Study • Understand a highly successful file-sharing system • Overlay topology and dynamics • Peer selection • Index management • Use KaZaA as a test bed for further research • Content pollution research (another paper)
Existing Tools and Projects • FastTrack encryption algorithm • available from a Web site: http://gift-fasttrack.berlios.de/ • KaZaA Media Desktop (KMD) software architecture • http://kazaasearch.narod.ru/
Big Picture of Overlay • Two-layer hierarchy • Ordinary Node (ON) • Super Node (SN)
Measurement Apparatus • KaZaA Sniffing Platform • KaZaA Probing Tool
KaZaA Sniffing Platform • Poly (Ethernet) • Home (cable modem)
KaZaA Probing Tool • Campus & home based probing • Node list • Workload
Signaling Protocol • ON-SN session initiation • SN-SN session initiation
TCP Connections Evolution • Poly campus: 4-6 hour measurement • Cable modem: 7-11 hour measurement
SN Workload • TCP connections evolution over 7-11 hours • Workload values evolution over 7-11 hours
Port Dynamics and NAT • 19,637 unique SN addresses collected • Only 707 SNs (3.6%) use the default port number, 1214 • 18,887 SNs (96.3%) use non-default port numbers • Of 64,834 unique peers (SN + ON), 21,269 peers (ON) use private IP addresses
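The port and NAT figures above come down to two simple classifications per observed peer: does it listen on the default port, and is its address private? The following is a minimal sketch of that bookkeeping, not the authors' tooling. It assumes "private IP" means the RFC 1918 ranges; the helper names `is_private` and `pct` are hypothetical.

```python
import ipaddress

DEFAULT_PORT = 1214  # default KaZaA port, as stated on the slide

# Assumption: "private IP" on the slide refers to the RFC 1918 ranges.
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_private(addr: str) -> bool:
    """Return True if addr falls inside one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)

def pct(count: int, total: int) -> float:
    """Share of count in total, as a percentage."""
    return 100.0 * count / total

# Figures taken from the slide.
default_port_share = pct(707, 19637)    # SNs on the default port
private_ip_share = pct(21269, 64834)    # peers reporting a private address
```

Rounded to one decimal place, `default_port_share` reproduces the 3.6% on the slide, and `private_ip_share` shows that roughly a third of all observed peers sit behind a NAT.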
Summary of Results • 20,000 to 40,000 active super nodes • Each SN connects to approximately 0.1% of the other SNs • Highly dynamic connections: over 35% of SN-SN connections last less than 30 seconds
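To make the 0.1% figure concrete, it can be converted into an approximate number of simultaneous SN-SN connections per super node. This is a back-of-the-envelope sketch using only the numbers on the slide; the function name `sn_degree` is hypothetical.

```python
def sn_degree(total_sns: int, fraction: float = 0.001) -> int:
    """Approximate SN-SN connections per SN, given the fraction of SNs it connects to."""
    return round(total_sns * fraction)

low = sn_degree(20_000)   # lower bound on the SN population
high = sn_degree(40_000)  # upper bound on the SN population
```

With 20,000 to 40,000 super nodes, each SN therefore holds on the order of 20 to 40 SN-SN connections at a time.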
Summary of Results • Peer selection uses IP prefix match, workload, RTT, and freshness • SNs do not exchange indexes; they forward queries instead • Skewed content distribution: 20% of peers provide 70% of the shared metadata
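The four peer-selection criteria can be illustrated with a small ranking function. The slides name the criteria but not how they are combined, so the ordering below (longest common IP prefix first, then lower workload, lower RTT, and most recent contact) is purely an assumption, as are the `Candidate` fields and the function names.

```python
import ipaddress
from dataclasses import dataclass

@dataclass
class Candidate:
    ip: str           # IPv4 address of the candidate SN
    workload: int     # reported workload value (scale assumed; lower is better)
    rtt_ms: float     # measured round-trip time in milliseconds
    last_seen: float  # timestamp of last contact (larger is fresher)

def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits shared by two IPv4 addresses (0 to 32)."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()

def rank_candidates(local_ip: str, candidates: list) -> list:
    """Sort candidates best-first. The priority order is an assumption, not KaZaA's rule."""
    return sorted(
        candidates,
        key=lambda c: (
            -common_prefix_len(local_ip, c.ip),  # prefer nearby prefixes
            c.workload,                          # then lightly loaded SNs
            c.rtt_ms,                            # then low latency
            -c.last_seen,                        # then fresh entries
        ),
    )

candidates = [
    Candidate("24.0.0.1", workload=10, rtt_ms=20.0, last_seen=1000.0),
    Candidate("128.238.9.9", workload=50, rtt_ms=5.0, last_seen=900.0),
]
best = rank_candidates("128.238.1.1", candidates)[0]
```

Here `best` is the `128.238.9.9` candidate, because it shares a 20-bit prefix with the local address even though its workload is higher. A real client might weight the criteria rather than order them strictly.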
Design Principles for Unstructured P2P Overlays • Distributed design • No infrastructure • Avoids legal attacks • Exploit heterogeneity • Hierarchy • Self-organization • Load balancing (workload balancing) • Explicit locality awareness • Shuffle connections in core overlay
Design Principles for Unstructured P2P Overlays • Properly designed gossip mechanisms • Peers have a fresh list of SNs • Firewall circumvention • Dynamic port numbers • Improves availability • NAT circumvention
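One way to read the gossip principle is that each peer merges the SN lists it receives into its own cache and keeps only the most recently seen entries. The sketch below illustrates that idea under stated assumptions: the cache layout (a dict keyed by `(ip, port)` with a last-seen timestamp), the `max_size` bound of 200, and the name `merge_sn_lists` are all hypothetical and do not describe KaZaA's actual wire format or cache policy.

```python
def merge_sn_lists(local: dict, received: dict, max_size: int = 200) -> dict:
    """Merge a received SN list into the local cache, keeping the freshest entries.

    Both dicts map (ip, port) -> last-seen timestamp. max_size is an assumed bound.
    """
    merged = dict(local)
    for node, seen in received.items():
        # Keep the newer timestamp when both sides know the same SN.
        if seen > merged.get(node, float("-inf")):
            merged[node] = seen
    # Retain only the max_size most recently seen SNs.
    freshest = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)[:max_size]
    return dict(freshest)

local = {("1.1.1.1", 1214): 100.0, ("2.2.2.2", 3000): 50.0}
received = {("2.2.2.2", 3000): 120.0, ("3.3.3.3", 4000): 10.0}
cache = merge_sn_lists(local, received, max_size=2)
```

After the merge, `cache` holds the two freshest super nodes and drops the stale `3.3.3.3` entry. Storing the port alongside the address matters here because most SNs use non-default, dynamic port numbers.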