Flash Floods and Ripples: The Spread of Media Content through the Blogosphere
Meeyoung Cha, Juan A. Navarro, Hamed Haddadi
Max Planck Institute for Software Systems (MPI-SWS) and TU Berlin Deutsche Telekom Lab
ICWSM Data Challenge 2009
Motivation: How does content spread in blogs? What kinds of content are shared?
• Blogs play a significant role in today’s Internet culture
• Blogs are used to propagate information
  • Discuss political issues
  • Review new products and online content
  • Form communities and special interest groups
• Increasingly, media content is shared through blogs
Our goal
• Characterize how the structure of the blogosphere influences the patterns of content spreading
• 1. Understand the structure of the blogosphere
  • Is the structure ideal for content dissemination?
• 2. Understand the spreading patterns of content
  • What types of content spread?
  • How quickly does content spread?
Outline
• Part 1. Measurement methodology
• Part 2. Analysis of network properties
• Part 3. Analysis of spreading patterns
Spinn3r dataset
• Extracted post URL, site, host, language, timestamps, etc.
• Step 1: Focus on the top 15 blog domains
• Step 2: Scrape content to find embedded HTML links
  • Code available at http://www.mpi-sws.org/~jnavarro/tools/
• Limitations
  • Comments and blogrolls are missing
  • Some blogs only post summaries
  • Only the part of the dataset with numbered ‘tiers’ was used
Step 1: Top 15 blog sites
• [Table: one row per blog site, ending in a Total row; values not preserved]
Step 2: Extracting HTML links
• Links to media content
• Links to other blogs
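
The link-extraction step can be illustrated with a minimal sketch. This is not the authors' tool (that is at the URL given on the dataset slide); it only assumes that each post is available as an HTML string and that links are sorted by host name. The host lists `MEDIA_HOSTS` and `BLOG_HOSTS`, the `LinkExtractor` class, and `classify_links` are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical host lists, for illustration only (not taken from the slides).
MEDIA_HOSTS = {"youtube.com", "www.youtube.com"}
BLOG_HOSTS = {"example-blog-a.com", "example-blog-b.com"}

# Which attribute carries the URL for each tag we inspect.
URL_ATTRIBUTE = {"a": "href", "embed": "src", "iframe": "src"}


class LinkExtractor(HTMLParser):
    """Collect URLs found in <a href>, <embed src> and <iframe src>."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = URL_ATTRIBUTE.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(value)


def classify_links(html):
    """Split the links of one post into media, blog and other links."""
    parser = LinkExtractor()
    parser.feed(html)
    result = {"media": [], "blog": [], "other": []}
    for url in parser.links:
        host = urlparse(url).netloc.lower()
        if host in MEDIA_HOSTS:
            result["media"].append(url)
        elif host in BLOG_HOSTS:
            result["blog"].append(url)
        else:
            result["other"].append(url)
    return result


post = (
    '<p><a href="http://www.youtube.com/watch?v=abc">video</a> '
    '<a href="http://example-blog-b.com/post/1">another blog</a> '
    '<embed src="http://www.youtube.com/v/abc"> '
    '<a href="http://news.example.org/story">news</a></p>'
)
links = classify_links(post)
print(links)
```

Links to other blogs give the edges of the blog network, while links to media content feed the spreading analysis.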
Network of blogs
• Directed network of 85,013 nodes and 129,079 edges
• [Figure: example nodes A and B]
Network structure
• 73% of blogs are in the largest connected component
• Average node degree: 1.5
• Power-law degree distribution
• 6% of links are reciprocal
• 35% of links cross blog domains
• 7% of links cross language boundaries
Network structure – 2
• The network is sparser than social networks
• Density = ratio of observed links to all possible links
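
The structural measures can be sketched from the reported node and edge counts. The sketch assumes a directed graph without self-loops, so that "all possible links" is N(N − 1), and it treats a link as reciprocal when the reverse link also exists; the slides do not spell out either convention. The function names and the toy edge list are illustrative.

```python
def average_degree(num_nodes, num_edges):
    """Edges per node (equal to mean in-degree and mean out-degree)."""
    return num_edges / num_nodes


def density(num_nodes, num_edges):
    """Observed links over all possible directed links, N * (N - 1)."""
    return num_edges / (num_nodes * (num_nodes - 1))


def reciprocity(edges):
    """Fraction of directed links whose reverse link is also present."""
    edge_set = set(edges)
    reciprocal = sum(
        1 for (src, dst) in edge_set if src != dst and (dst, src) in edge_set
    )
    return reciprocal / len(edge_set)


# Node and edge counts as reported for the blog network.
N, E = 85_013, 129_079
print(f"average degree: {average_degree(N, E):.2f}")
print(f"density: {density(N, E):.2e}")

# Toy edge list (hypothetical) to show the reciprocity measure.
toy_edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
print(f"toy reciprocity: {reciprocity(toy_edges):.2f}")
```

With 129,079 edges over 85,013 nodes, edges per node comes to about 1.5, consistent with the reported average node degree.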
Insights for information propagation
• Sparse structure and power-law degree distribution
  • Bloggers show a clear preference for particular topics or sources
  • Trend setters (high in-degree) and recommenders (high out-degree)
• Potential factors that can limit spreading
  • Blog domains had no visible effect on linking
  • Language barriers inhibit the flow of information
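
A minimal sketch of how trend setters and recommenders could be identified from the edge list, assuming each edge is a `(source_blog, target_blog)` pair meaning that the source links to the target. The function `degree_rankings` and the toy edges are hypothetical.

```python
from collections import Counter


def degree_rankings(edges, top_k=3):
    """Return the top_k blogs by in-degree and by out-degree."""
    in_degree = Counter(dst for _, dst in edges)
    out_degree = Counter(src for src, _ in edges)
    return in_degree.most_common(top_k), out_degree.most_common(top_k)


# Toy edges (hypothetical): "hub" is linked by many, "fan" links to many.
edges = [
    ("b1", "hub"), ("b2", "hub"), ("b3", "hub"), ("fan", "hub"),
    ("fan", "b1"), ("fan", "b2"),
]
trend_setters, recommenders = degree_rankings(edges, top_k=1)
print("trend setters (high in-degree):", trend_setters)
print("recommenders (high out-degree):", recommenders)
```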
Spreading of media content
• What types of content are shared?
• How quickly does information spread?
Types of content shared
• Sharing of user-generated content is popular
Popularity of YouTube videos
• Video popularity follows a power-law distribution
  • Very large diffusion processes exist
  • Preferential attachment may drive linking
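
One common way to check a power-law claim is to estimate the exponent by maximum likelihood. The sketch below uses the continuous-tail estimator α = 1 + n / Σ ln(x_i / x_min); the slides do not say how the distribution was fitted, so this is an assumed technique, and applying it to integer link counts is only an approximation. Synthetic data stands in for the per-video link counts, and both function names are illustrative.

```python
import math
import random


def powerlaw_alpha_mle(values, x_min=1.0):
    """Continuous maximum-likelihood estimate of alpha for p(x) ~ x^(-alpha)."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("need at least one value above x_min")
    return 1.0 + len(tail) / log_sum


def sample_powerlaw(n, alpha, x_min=1.0, seed=0):
    """Draw n synthetic values by inverse-transform sampling."""
    rng = random.Random(seed)
    exponent = -1.0 / (alpha - 1.0)
    # 1 - rng.random() lies in (0, 1], which avoids raising 0 to a negative power.
    return [x_min * (1.0 - rng.random()) ** exponent for _ in range(n)]


synthetic = sample_powerlaw(20_000, alpha=2.5)
estimate = powerlaw_alpha_mle(synthetic)
print(f"estimated alpha: {estimate:.2f}")
```

On real link counts, `x_min` would have to be chosen with care, since the power-law form usually holds only in the tail.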
Popular video categories
• We downloaded metadata of the top 10,000 videos
• [Figure annotations: "Music most popular", "Still spread!", "Keen on politics"]
Time lag in the spread of videos
• [Figure labels: Flash floods, Ripples]
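
The time lag can be sketched as the gap between a video's upload time and each blog post that links to it. The sketch assumes both timestamps are available (upload time from the video metadata, link time from the post timestamp); the function name and the example dates are hypothetical. No threshold separating flash floods from ripples is given in the slides, so the code only reports the lags.

```python
from datetime import datetime
from statistics import median

SECONDS_PER_DAY = 86_400


def spreading_lags_days(upload_time, link_times):
    """Sorted lags, in days, between upload and each linking blog post."""
    return sorted(
        (t - upload_time).total_seconds() / SECONDS_PER_DAY for t in link_times
    )


# Hypothetical video linked within days of upload (flash-flood-like).
fast = spreading_lags_days(
    datetime(2008, 8, 1),
    [datetime(2008, 8, 1, 12), datetime(2008, 8, 2), datetime(2008, 8, 3)],
)

# Hypothetical video linked years after upload (ripple-like).
slow = spreading_lags_days(
    datetime(2006, 8, 1),
    [datetime(2008, 8, 1), datetime(2008, 8, 11)],
)

print("median lag, fast video (days):", median(fast))
print("median lag, slow video (days):", median(slow))
```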
Example spreading pattern
• Blogs linking the same video are connected = diffusion through the blogosphere
• Example: McCain’s political campaign, linked by 79 blogs
• [Figure legend: Other]
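
A minimal sketch of the diffusion test, assuming that "connected" means a direct link, in either direction, between two blogs that both link the same video. The slides do not state whether link direction or timing was also considered, so this is one possible reading. The function names and the toy data are hypothetical.

```python
def has_blog_graph_spreading(linking_blogs, edges):
    """True if at least one blog-to-blog link joins two blogs linking the video."""
    blogs = set(linking_blogs)
    return any(
        src in blogs and dst in blogs and src != dst for src, dst in edges
    )


def fraction_with_spreading(video_to_blogs, edges):
    """Share of videos for which some pair of linking blogs is connected."""
    spread = sum(
        1 for blogs in video_to_blogs.values()
        if has_blog_graph_spreading(blogs, edges)
    )
    return spread / len(video_to_blogs)


# Toy blog graph and video-to-blog mapping (hypothetical).
blog_edges = [("a", "b"), ("c", "d")]
video_to_blogs = {
    "v1": {"a", "b"},       # a links to b: counts as spreading
    "v2": {"a", "c"},       # no link between a and c
    "v3": {"d"},            # a single blog cannot spread
    "v4": {"c", "d", "e"},  # c links to d: counts as spreading
}
print(f"videos with spreading: {fraction_with_spreading(video_to_blogs, blog_edges):.0%}")
```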
Insights from spreading patterns
• Videos in different genres spread with very different patterns
  • Flash floods: found quickly and spread rapidly
  • Ripples: took longer to spread, rediscovered years after upload
• Diffusion through links in the blogosphere
  • 24% of videos showed any spreading in the blog graph
  • Other spreading factors: featuring and search
Conclusion
• Identified spreading patterns and factors that limit spreading
• Blogs serve as a medium to filter and spread media content
• Potential implication: recommendation systems can take into account and exploit different spreading patterns
• Future work: spreading patterns of other types of content