
Dynamics of Peer-to-Peer Networks or Who is Going to be The Next Pop Star?



Presentation Transcript


  1. Dynamics of Peer-to-Peer Networks or Who is Going to be The Next Pop Star? Yuval Shavitt School of Electrical Engineering shavitt@eng.tau.ac.il http://www.eng.tau.ac.il/~shavitt

  2. Credits Talk is based on the papers: • Static and dynamic characterization of the Gnutella network [Shaked-Gish, S, Tankel, IPTPS 2007] • How to predict the next pop star? [Koenigstein, S, Tankel, KDD 2008]

  3. What are Peer-to-Peer Networks? • The common computing paradigm is client-server • Server waits for requests (on a known port) • Client sends a request • Server serves the client • Examples: WWW, FTP, SMTP (e-mail), … • Peer-to-peer networks: • Each end-point is both client and server

  4. The Gnutella Network • Gnutella: the most popular file-sharing network on the Internet • According to the Digital Music News Research Group: 40% market share in Q4 2007 • LimeWire: the most popular file-sharing client in the world; dominates the Gnutella network

  5. The Gnutella Protocol • Originally: a flat peer-to-peer distributed protocol • Churn caused instability • Today: a 2-level tiered system • Stable nodes are promoted to become ultrapeers • Queries carry an out-of-band (OOB) reply address: the originator’s address or, in most cases when the client is firewalled, the ultrapeer’s address

  6. Locating the Origin IP Address • [Figure: a listener node connected to several ultrapeers (UP), each serving peers] • IP resolution process: • Detect the ultrapeer (UP) IP • Discard queries with more than 2 hops • Discard queries with 2 hops and the same IP as the UP • Intercept queries with 2 hops and different IPs • Cancels the bias for rare queries • Introduces a bias against firewalled clients
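The IP resolution rules can be sketched as a small filter. This is a minimal illustration, not the authors' code: it assumes the listener knows `ultrapeer_ip`, the IP of the ultrapeer that forwarded the query, and reads "same IP" as "OOB address equals that ultrapeer's IP". The `Query` type and `resolve_origin` are hypothetical names, and the slide does not say how queries with fewer than 2 hops are handled, so this sketch simply discards them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Query:
    hops: int          # hop count when the query reaches the listener
    oob_address: str   # out-of-band (OOB) reply address carried by the query
    text: str          # the search string


def resolve_origin(query: Query, ultrapeer_ip: str) -> Optional[str]:
    """Return the IP to attribute the query to, or None if it is discarded."""
    if query.hops > 2:
        return None  # rule: discard queries with more than 2 hops
    if query.hops < 2:
        return None  # assumption: not covered by the slide; discarded here
    if query.oob_address == ultrapeer_ip:
        return None  # 2 hops, OOB = ultrapeer's IP: firewalled client, origin hidden
    return query.oob_address  # 2 hops, different IP: OOB address is the client's own
```

Discarding the "same IP" case is what makes the sample miss firewalled clients, which matches the bias noted on the slide.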

  7. Data Sets • First study: • Jul 2006 – Nov 2006 • 665,000,000 worldwide geo-identified queries • Second study: • Oct 2006 – Jul 2007, Sundays only • 310,000,000 USA geo-identified queries • A 24-hour network crawl: • 1.2M users • 533,000 different songs • The largest studies ever performed in length and depth

  8. Query Classification in Gnutella (2nd study)

  9. Top Countries

  10. Queries Per Day

  11. Queries Per Hour Per User

  12. Top Queries (constant)

  13. Top Volatile Queries

  14. Temporal Ranking Drift

  15. How to Predict Artist’s Success? Noam Koenigstein, Y. Shavitt, and Tomer Tankel. Spotting Out Emerging Artists Using Geo-Aware Analysis of P2P Query Strings. The 2008 ACM SIGKDD Conference, August 2008, Las Vegas, NV, USA.

  16. The Word-of-Mouth Effect • The divergence of a product’s geographic adoption pattern can be used to predict the probability of its success [Garber et al., Marketing Science 2004]

  17. The Divergence • When measured against the uniform distribution, the maximum is achieved when P is a δ function (all mass in a single location) • True for both the Kullback-Leibler and Jensen-Shannon divergences • This is the case for emerging artists, whose early audience is concentrated in one area • Non-uniform distribution of potential adopters:
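To make the "maximum at a δ function" claim concrete: for n locations and the uniform distribution U, D_KL(P‖U) = log n − H(P), which is largest when the entropy H(P) is zero, i.e. when all queries come from one place. The sketch below computes both divergences of a per-location query histogram from uniform. It is an illustration under the assumption that P is the normalized count of a string's queries per location; the function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(P || Q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: mean KL of P and Q to their midpoint M."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def divergence_from_uniform(counts):
    """Return (KL, JS) of the per-location distribution of `counts` from uniform."""
    total = sum(counts)
    p = [c / total for c in counts]
    u = [1 / len(counts)] * len(counts)
    return kl_divergence(p, u), js_divergence(p, u)
```

With four locations, `divergence_from_uniform([100, 0, 0, 0])` gives a KL divergence of log2 4 = 2 bits, while an evenly spread string such as `[25, 25, 25, 25]` scores 0 on both measures.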

  18. Party Like a Rockstar in 2007 • Week 6: The string “party like a rockstar” is detected by the algorithm • Week 8: Atlanta’s popularity chart (Feb 18th) • Week 15: Atlanta-based Shop Boyz sign a contract with Universal Recordings • Week 18: The song first enters the Billboard Hot 100 (at 80th position) • Week 23: Reaches 2nd position on the Billboard Hot 100 • Ranked only 10,156 on the global chart

  19. Party Like a Rockstar • [Figure: Shop Boyz related queries in February 2007] • [Figure: Shop Boyz popularity and divergence in 2007]

  20. Soulja Boy • Detected by our algorithm as early as 2006 • The string “soulja boy” entered the “Atlanta queries top 100” as early as October 2006 • Entered the Bubbling Under R&B/Hip-Hop Singles chart on June 23rd, 2007 • Later ranked first on the following Billboard charts: Hot 100, Hot Rap Tracks, Hot Videoclip, Hot RingMasters and Hot Ringtones

  21. Yung Berg • Active in LA • Week 2: Entered the LA top 100 • Week 15: First appeared on the Billboard charts • Week 32: Reached No. 18 on the Billboard Hot 100

  22. Madonna

  23. The Detection Algorithm • Input: a list of geo-identified P2P query strings • Output: a list of locally popular query strings with a high probability of becoming globally popular • Build local and global popularity charts • Local popularity is detected using local and global popularity thresholds • Look for week-to-week growth trends in local popularity • Filtering: non-music content and already familiar artists, which are characterized by a uniform geographic distribution
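The chart-building and threshold steps can be sketched as follows. This is a hypothetical reconstruction, not the published algorithm: the slides do not give threshold values or the exact growth test, so `local_top`, `global_floor` and `growth_weeks` are illustrative parameters, and "growth" is taken to mean that a string's local rank strictly improves each week.

```python
from collections import Counter


def popularity_chart(queries):
    """Rank strings by count: {string: rank}, rank 1 = most queried (ties alphabetical)."""
    counts = Counter(queries)
    ranked = sorted(counts, key=lambda s: (-counts[s], s))
    return {s: rank for rank, s in enumerate(ranked, start=1)}


def detect_candidates(weeks, city, local_top=100, global_floor=1000, growth_weeks=2):
    """Hypothetical detector: locally popular, not yet globally popular, and rising.

    weeks: weekly lists of (city, query_string) pairs, oldest first.
    """
    local_charts = [popularity_chart(q for c, q in week if c == city) for week in weeks]
    global_charts = [popularity_chart(q for _, q in week) for week in weeks]
    recent = list(zip(local_charts, global_charts))[-growth_weeks:]
    candidates = []
    for s in recent[-1][0]:  # strings queried in `city` during the latest week
        local_ranks = [local.get(s) for local, _ in recent]
        if None in local_ranks:
            continue  # missing from the local chart in one of the recent weeks
        locally_popular = all(r <= local_top for r in local_ranks)         # local threshold
        not_yet_global = all(glob[s] > global_floor for _, glob in recent)  # global threshold
        rising = all(a > b for a, b in zip(local_ranks, local_ranks[1:]))   # rank improves
        if locally_popular and not_yet_global and rising:
            candidates.append(s)
    return sorted(candidates)
```

In this sketch a string that climbs from 3rd to 1st place in one city while staying outside the global top `global_floor` is flagged, whereas an established name that is already at the top globally is not.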

  24. Local Popularity • Not all queries are “products” (e.g., rare typos), so divergence is not effective on its own • Detection is based on local popularity:

  25. ATPL: All Times Popular List • Initialization: all strings that reached global popularity in 2006 • Weekly aggregation • Filters non-volatile strings: • adult-related, e.g., “porn” • well-established artists, e.g., “madonna”, “avril lavigne” • movies, software, etc.
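A minimal sketch of the ATPL as a set-based filter. The class name and the `global_top` cutoff are hypothetical, and interpreting "weekly aggregation" as adding every string that reaches the global top N in a given week is an assumption; the slide only states the initialization and what the list is meant to filter out.

```python
class AllTimesPopularList:
    """Hypothetical ATPL sketch: strings that have ever been globally popular."""

    def __init__(self, initial_strings):
        # Initialization: strings that reached global popularity in 2006.
        self.strings = set(initial_strings)

    def weekly_update(self, global_chart, global_top=100):
        # Assumed weekly aggregation: add every string ranked in this week's global top N.
        self.strings.update(s for s, rank in global_chart.items() if rank <= global_top)

    def filter_candidates(self, candidates):
        # Drop non-volatile strings (adult terms, established artists, movies, software, ...).
        return [s for s in candidates if s not in self.strings]
```

For example, seeding the list with `["madonna", "avril lavigne", "porn"]` removes “madonna” from a candidate list while keeping “soulja boy”. Once a new artist becomes globally popular, it joins the list and stops being reported as emerging.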

  26. Algorithm's Flow

  27. Detection Time

  28. Local Threshold

  29. Local Threshold

  30. Manual inspection of the Atlanta data

  31. Correlation Between Billboard and downloads

  32. Correlation Measurements • Modified time series correlation • P2P correlation with the Billboard:

  33. Finding The Optimal Time Shift
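One way to search for the time shift is to correlate the P2P series with the Billboard series at increasing lags and keep the best lag. The slides use a *modified* time-series correlation whose definition is not given here, so this sketch substitutes plain Pearson correlation. It also assumes both series are weekly popularity scores (higher = more popular) aligned on the same weeks; `best_time_shift`, `max_shift` and `min_overlap` are hypothetical names.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def best_time_shift(p2p, billboard, max_shift=8, min_overlap=4):
    """Find the lead (in weeks) at which P2P popularity best matches later Billboard popularity.

    Shift s pairs p2p[t] with billboard[t + s]. Returns (best_shift, best_r);
    best_r stays -inf if no shift leaves at least min_overlap overlapping weeks.
    """
    n = min(len(p2p), len(billboard))
    best_shift, best_r = 0, float("-inf")
    for shift in range(max_shift + 1):
        if n - shift < min_overlap:
            break
        r = pearson(p2p[: n - shift], billboard[shift:n])
        if r > best_r:
            best_shift, best_r = shift, r
    return best_shift, best_r
```

If the Billboard curve is an exact copy of the P2P curve delayed by three weeks, the search returns a shift of 3 with a correlation of about 1.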

  34. Prediction Results • Example: when a song enters the Billboard chart, will it reach the “top 20”? • Precision: 89%, Recall: 80% • On average, songs pass the threshold 2.83 weeks before reaching their top Billboard rank • More details: Koenigstein, Shavitt, and Zilberman, AdMIRe 2009
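For reference, precision is the fraction of songs flagged as future top-20 hits that actually reached the top 20, and recall is the fraction of actual top-20 songs that were flagged. A minimal sketch with toy, made-up song IDs (not data from the study):

```python
def precision_recall(predicted, actual):
    """predicted: songs flagged as future top-20 hits; actual: songs that reached the top 20."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall
```

Flagging songs a, b, c, d when a, b, c, e, f actually reach the top 20 gives 3 true positives, so precision = 3/4 and recall = 3/5.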

  35. Summary • Following activity on the Internet can help us detect trends before they become visible • P2P networks • Social networks • Blogs • Talk-backs • Searches • More at http://www.eng.tau.ac.il/~shavitt
