Dynamics of Peer-to-Peer Networks or Who is Going to be The Next Pop Star?
Yuval Shavitt
School of Electrical Engineering
shavitt@eng.tau.ac.il
http://www.eng.tau.ac.il/~shavitt
Credits
Talk is based on the papers:
• Static and dynamic characterization of the Gnutella network [Shaked-Gish, Shavitt, Tankel, IPTPS 2007]
• How to predict the next pop star? [Koenigstein, Shavitt, Tankel, KDD 2008]
What are Peer-to-Peer Networks?
• The common computing paradigm is client-server
  • Server waits for requests (on a known port)
  • Client sends a request
  • Server serves the client
  • Examples: WWW, FTP, SMTP (e-mail), ...
• Peer-to-peer networks:
  • Each end-point is both client and server
[Figure: a server and its clients]
The Gnutella Network
• Gnutella: the most popular file-sharing network on the Internet
  • According to the Digital Music News Research Group: 40% market share in Q4 2007
• LimeWire: the most popular file-sharing client in the world; it dominates the Gnutella network
The Gnutella Protocol
• Originally: a flat peer-to-peer distributed protocol
  • Churn caused instability
• Today: a 2-level tiered system
  • Stable nodes are promoted to become ultrapeers
• Queries carry an out-of-band (OOB) address: the originator’s address or, in most cases where the client is firewalled, the ultrapeer’s address
Locating the Origin IP Address
IP resolution process:
• Detect the ultrapeer (UP) IP
• Discard queries with more than 2 hops
• Discard queries with 2 hops and the same IP
• Intercept queries with 2 hops and different IPs
Cancels the bias for rare queries
Introduces a bias against firewalled clients
[Figure: listener, ultrapeers (UP), and peers]
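The hop-count rules above can be sketched as a small filter. This is only an illustration under stated assumptions: the `Query` record and its `sender_ip` field are hypothetical names, "same IP" is read as "the OOB address equals the forwarding ultrapeer's IP", and queries with fewer than 2 hops (which the slide does not cover) are simply dropped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """Hypothetical record for one intercepted query (field names are assumptions)."""
    text: str
    hops: int        # hop count seen by the listener
    oob_ip: str      # out-of-band address carried in the query
    sender_ip: str   # IP of the ultrapeer that forwarded the query


def keep_query(q: Query) -> bool:
    """Return True if the OOB address is treated as the origin IP."""
    if q.hops > 2:
        # Rule: discard queries with more than 2 hops.
        return False
    if q.hops == 2 and q.oob_ip == q.sender_ip:
        # Rule: 2 hops and same IP. Under our reading, the OOB address is
        # the ultrapeer's own, so the true origin is hidden.
        return False
    if q.hops == 2:
        # Rule: 2 hops and different IPs, so the OOB address is kept.
        return True
    # Fewer than 2 hops is not covered by the slide; dropped here by assumption.
    return False
```

For example, `keep_query(Query("soulja boy", 2, "10.0.0.5", "10.0.0.9"))` keeps the query, while the same query with matching addresses is discarded.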
Data Sets
• First study:
  • Jul 2006 – Nov 2006
  • 665,000,000 world-wide geo-identified queries
• Second study:
  • Oct 2006 – Jul 2007, Sundays only
  • 310,000,000 USA geo-identified queries
• A network crawl of 24 hours
  • 1.2M users
  • 533,000 different songs
Largest studies ever performed in length and depth
How to Predict an Artist’s Success?
Noam Koenigstein, Y. Shavitt, and Tomer Tankel. Spotting Out Emerging Artists Using Geo-Aware Analysis of P2P Query Strings. The 2008 ACM SIGKDD Conference, August 2008, Las Vegas, NV, USA.
The Word of Mouth Effect
The divergence can be used to predict a new product’s probability of success [Garber et al., Marketing Science 2004]
The Divergence
• When measured against the uniform distribution, the maximum is achieved when P is a delta function (all probability mass on a single outcome)
• True for both the Kullback-Leibler and Jensen-Shannon divergences
• This is the case when emerging artists are considered
• Non-uniform distribution of potential adopters:
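A small numeric check of the claim about the uniform distribution, as a sketch only. It assumes P is a discrete distribution over a handful of locations (the 4-location example and the function names are ours, not from the talk) and uses base-2 logarithms.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Toy setting (assumption): 4 locations, compared against the uniform distribution.
uniform = [0.25, 0.25, 0.25, 0.25]
delta = [1.0, 0.0, 0.0, 0.0]    # all queries come from one location
spread = [0.7, 0.1, 0.1, 0.1]   # concentrated, but not a delta

kl_delta = kl_divergence(delta, uniform)    # equals log2(4) = 2 bits
kl_spread = kl_divergence(spread, uniform)
js_delta = js_divergence(delta, uniform)
js_spread = js_divergence(spread, uniform)
```

Here `kl_delta` is exactly log2(4) = 2 bits, the largest value possible against a uniform distribution over 4 outcomes, and both divergences are larger for `delta` than for `spread`.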
Party Like a Rockstar in 2007
• Week 6: The string “party like a rockstar” is detected by the algorithm
• Week 8: Atlanta’s popularity chart (Feb 18th)
• Week 15: Atlanta-based Shop Boyz sign a contract with Universal Recordings
• Week 18: The song first enters the Billboard Hot 100 (80th position)
• Week 23: Reaches 2nd position on the Billboard Hot 100
Ranked only 10,156 on the global chart
Party Like a Rockstar
[Figure: Shop Boyz related queries in February 2007]
[Figure: Shop Boyz popularity and divergence in 2007]
Soulja Boy
• Detected by our algorithm as early as 2006
• The string “soulja boy” entered the “Atlanta queries top 100” in October 2006
• Entered the Bubbling Under R&B/Hip-Hop Singles chart on June 23, 2007
• Later ranked first in the following Billboard charts: Hot 100, Hot Rap Tracks, Hot Videoclip, Hot RingMasters and Hot Ringtones
Yung Berg
• Active in LA
• Week 2: Entered LA top 100
• Week 15: First appeared on the Billboard charts
• Week 32: Reached 18 on the Billboard Top 100
The Detection Algorithm
• Input: A list of geo-identified P2P query strings
• Output: A list of locally popular query strings with a high probability of becoming globally popular
• Build local and global popularity charts
• Local popularity is detected using local and global popularity thresholds
• Look for local popularity growth trends from week to week
• Filtering: non-music-related content and already familiar artists, which are characterized by a uniform distribution
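The steps above might be sketched as follows. This is a minimal illustration, not the algorithm from the paper: the threshold rule (in the local top-k, outside the global top-k, and with a local count that grew since the previous week), the function names, and the toy data are all assumptions.

```python
from collections import Counter


def build_charts(queries):
    """Build per-week local and global charts from (week, city, query_string) tuples."""
    local = {}    # (week, city) -> Counter of query strings
    overall = {}  # week -> Counter of query strings
    for week, city, text in queries:
        local.setdefault((week, city), Counter())[text] += 1
        overall.setdefault(week, Counter())[text] += 1
    return local, overall


def detect(local, overall, week, city, local_top=100, global_top=100, known=frozenset()):
    """Return strings that are locally popular, not globally popular, and growing.

    Assumed rule: the string is in the city's top `local_top`, is outside the
    global top `global_top`, is not in the `known` filter set, and its local
    count is higher than in the previous week.
    """
    this_week = local.get((week, city), Counter())
    last_week = local.get((week - 1, city), Counter())
    globally_popular = {s for s, _ in overall.get(week, Counter()).most_common(global_top)}
    hits = []
    for text, count in this_week.most_common(local_top):
        if text in globally_popular or text in known:
            continue
        if count > last_week[text]:  # a missing key counts as 0
            hits.append(text)
    return hits


# Toy data (made up for illustration only).
toy = (
    [(1, "atlanta", "party like a rockstar"), (1, "atlanta", "madonna"), (1, "boston", "madonna")]
    + [(2, "atlanta", "party like a rockstar")] * 3
    + [(2, "atlanta", "madonna")] * 2
    + [(2, "boston", "madonna")] * 2
)
local_charts, global_charts = build_charts(toy)
atlanta_hits = detect(local_charts, global_charts, week=2, city="atlanta", local_top=2, global_top=1)
boston_hits = detect(local_charts, global_charts, week=2, city="boston", local_top=2, global_top=1)
```

In this toy run only the growing, Atlanta-specific string is reported for Atlanta; the globally popular string is skipped in both cities.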
Local Popularity
• Not all queries are “products”, thus divergence is not effective (e.g., rare typos)
• Detection is based on local popularity:
ATPL - All Times Popular List
• Initialization: All the strings that reached global popularity in 2006
• Weekly aggregation
• Filters non-volatile strings:
  • Adult related, e.g., “porn”
  • Well-established artists, e.g., “madonna”, “avril lavigne”
  • Movies, software, etc.
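One way to picture the ATPL is as a growing set of strings used as a filter. The sketch below is an assumption-heavy illustration: the class and method names are hypothetical, and "weekly aggregation" is read as adding each week's globally popular strings to the list.

```python
class AllTimesPopularList:
    """Hypothetical sketch of the ATPL as a case-insensitive set of strings."""

    def __init__(self, initial_strings):
        # Initialization: strings that reached global popularity in the seed period.
        self._strings = {s.lower() for s in initial_strings}

    def aggregate_week(self, globally_popular_this_week):
        # Weekly aggregation (our reading): add this week's globally popular strings.
        self._strings.update(s.lower() for s in globally_popular_this_week)

    def filter(self, candidates):
        # Keep only candidates that are not already on the list.
        return [c for c in candidates if c.lower() not in self._strings]


atpl = AllTimesPopularList(["madonna", "avril lavigne"])
before = atpl.filter(["Madonna", "soulja boy", "shop boyz"])
atpl.aggregate_week(["soulja boy"])
after = atpl.filter(["Madonna", "soulja boy", "shop boyz"])
```

Once a string has been aggregated into the list, it no longer passes the filter in later weeks.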
Correlation Measurements
• Modified time series correlation
• P2P correlation with the Billboard:
Prediction Results
• Example: When a song enters the Billboard, will it reach the “top 20”?
• Precision: 89%, Recall: 80%
• On average, songs pass the threshold 2.83 weeks before reaching their top Billboard rank
• More details: Koenigstein, Shavitt, and Zilberman, AdMIRe 2009
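For reference, precision and recall for a yes/no prediction such as "will this song reach the top 20?" can be computed as below. The labels are made-up toy values chosen only to exercise the formulas; they are not the study's data and do not reproduce its 89% and 80% figures.

```python
def precision_recall(predicted, actual):
    """Compute (precision, recall) from parallel lists of boolean labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)      # true positives
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)  # false positives
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels (illustrative only): predicted vs. actual "reached top 20".
predicted = [True, True, True, False, False]
actual = [True, True, False, True, False]
precision, recall = precision_recall(predicted, actual)
```

With these toy labels there are 2 true positives, 1 false positive and 1 false negative, so both precision and recall are 2/3.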
Summary
• Following activity on the Internet can help us detect trends before they are visible
  • P2P networks
  • Social networks
  • Blogs
  • Talk-backs
  • Searches
• More at http://www.eng.tau.ac.il/~shavitt