On the Bursty Evolution of Blogspace

Presentation Transcript

  1. On the Bursty Evolution of Blogspace Ravi Kumar, Jasmine Novak, Prabhakar Raghavan and Andrew Tomkins IBM Almaden Research Center, Verity Inc. WWW 2003

  2. Main contributions • Time graph and blog graph • Communities in Blogspace • Temporal bursts: from a sequence of documents to sets of blogs • Links among blogs that are topically and temporally focused • Blogspace evolution

  3. Community Extraction in Blogspace • Communities are collections of pages that provide information on a similar topic or share a point of view • Kleinberg (2000): co-citation, with a dense bipartite subgraph as the community signature • Flake (2000): network flow

  4. Bursts • Events: bursts are modeled over a stream of events • Trade-off: reporting a large number of short spurious bursts vs. fragmenting long bursts into many smaller bursts • E.g. email: NSF grant (Kleinberg 2002) • Events are either relevant or irrelevant • Bursty: the fraction of relevant events alternates between periods in which it is large and periods in which it is small

  5. Bursty communities of blogs • Activity on a given topic within a community is concentrated within a time interval • Example: one member of a community of blog poets posts a series of daily poems about other bloggers • Example: a blogger, Dawn, hosts a poll to determine the funniest and sexiest blogger

  6. Approach • Community Extraction • Burst Analysis

  7. Time Graph • A set $V$ of nodes, where each node $v \in V$ has an associated interval $D(v)$ on the time axis (called the duration of $v$) • A set $E$ of edges, where each $e \in E$ is a triple $(u, v, t)$ such that $u$ and $v$ are nodes in $V$ and $t$ is a point in time in the interval $D(u) \cap D(v)$ • Prefix of the time graph at time $t$: $G_t = (V_t, E_t)$
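
The time graph definition above can be sketched as a small data structure. This is a minimal illustration and not the authors' implementation: the class name `TimeGraph` and its methods are hypothetical, durations are assumed to be closed intervals of numbers, and the prefix at time t is assumed to keep every node whose duration starts by t and every edge stamped at or before t.

```python
from dataclasses import dataclass, field


@dataclass
class TimeGraph:
    """Hypothetical container for a time graph (names are illustrative)."""
    durations: dict = field(default_factory=dict)  # node -> (start, end) = D(v)
    edges: list = field(default_factory=list)      # triples (u, v, t)

    def add_node(self, v, start, end):
        self.durations[v] = (start, end)

    def add_edge(self, u, v, t):
        (su, eu), (sv, ev) = self.durations[u], self.durations[v]
        # t must lie in the intersection of D(u) and D(v)
        if not (max(su, sv) <= t <= min(eu, ev)):
            raise ValueError("t must lie in the intersection of D(u) and D(v)")
        self.edges.append((u, v, t))

    def prefix(self, t):
        """Return (V_t, E_t): nodes alive by time t and edges stamped <= t."""
        nodes = {v for v, (start, _end) in self.durations.items() if start <= t}
        edges = [(u, v, tt) for (u, v, tt) in self.edges if tt <= t]
        return nodes, edges


g = TimeGraph()
g.add_node("a", 0, 10)
g.add_node("b", 3, 8)
g.add_node("c", 6, 12)
g.add_edge("a", "b", 4)
g.add_edge("b", "c", 7)
nodes_at_5, edges_at_5 = g.prefix(5)
```

Taking prefixes at successive times gives a sequence of ordinary graphs, so standard graph measurements can be tracked as the graph evolves.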

  8. Community Extraction • Finding dense subgraphs is NP-hard, so a heuristic is used • 1. Preprocessing: remove all pages that have more than a certain number of in-links (too famous) • 2. Pruning: vertices of degree 1 and 2 are removed; vertices of degree 3 are checked for membership in a K3. The structures found become the seeds • 3. Expansion: determine the vertex with the most links to the current community and add it if the link count meets the threshold t_k
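
The expansion step can be sketched as a greedy loop. This is an illustration under stated assumptions, not the authors' code: the function name `expand_community` is hypothetical, the graph is treated as undirected, the threshold is passed in as a function of the current community size k (standing in for t_k, whose actual values are not given in the slides), and "meets the threshold" is assumed to mean "at least".

```python
def expand_community(adj, seed, threshold):
    """Greedily grow a community from a seed.

    adj:       dict mapping node -> set of neighbouring nodes (undirected)
    seed:      iterable of nodes forming the initial community
    threshold: function k -> minimum number of links into a community of
               size k (an assumed stand-in for t_k)
    """
    community = set(seed)
    while True:
        best, best_links = None, 0
        # sorted() only makes tie-breaking deterministic
        for v in sorted(adj):
            if v in community:
                continue
            links = len(adj[v] & community)
            if links > best_links:
                best, best_links = v, links
        # stop when no outside vertex is linked strongly enough
        if best is None or best_links < threshold(len(community)):
            return community
        community.add(best)


adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"d"},
}
grown = expand_community(adj, seed={"a", "b", "c"}, threshold=lambda k: 2)
```

Here `d` joins the triangle seed because it has two links into it, while `e` stays out because it has only one link into the grown community.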

  9. Burst analysis • Arrival of edges in the blog graph as an event stream • Kleinberg algorithm, obtain the weight of every burst in C • Apply on each extracted community in the graph
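
To make the burst-analysis step concrete, here is a simplified two-state sketch in the spirit of Kleinberg's burst automaton. It is not the algorithm exactly as used in the paper: the function names, the scaling factor `s`, the flat cost `gamma` for entering the burst state, and the per-interval counts of relevant and total events are all assumptions chosen for illustration. The weight of a burst is taken here as the cost saved by being in the burst state over that interval.

```python
import math


def binom_cost(r, d, p):
    """Negative log-likelihood of r relevant events out of d at rate p."""
    log_choose = math.lgamma(d + 1) - math.lgamma(r + 1) - math.lgamma(d - r + 1)
    return -(log_choose + r * math.log(p) + (d - r) * math.log(1 - p))


def two_state_bursts(relevant, total, s=2.0, gamma=1.0):
    """Return the cheapest state sequence (0 = base, 1 = burst) and the rates."""
    p0 = sum(relevant) / sum(total)   # base rate; assumes 0 < p0 < 1
    p1 = min(s * p0, 0.9999)          # elevated rate, kept below 1
    rates = (p0, p1)
    cost = [0.0, float("inf")]        # the automaton starts in the base state
    back = []
    for r, d in zip(relevant, total):
        new_cost, ptr = [], []
        for q in (0, 1):
            # entering the burst state costs gamma; leaving it is free
            cands = [cost[prev] + (gamma if (prev == 0 and q == 1) else 0.0)
                     for prev in (0, 1)]
            best = 0 if cands[0] <= cands[1] else 1
            new_cost.append(cands[best] + binom_cost(r, d, rates[q]))
            ptr.append(best)
        cost = new_cost
        back.append(ptr)
    # trace the optimal path backwards from the cheapest final state
    q = 0 if cost[0] <= cost[1] else 1
    states = [q]
    for ptr in reversed(back[1:]):
        q = ptr[q]
        states.append(q)
    states.reverse()
    return states, rates


def burst_weights(relevant, total, states, rates):
    """List (start, end, weight) for each maximal run of burst states."""
    bursts, start, weight = [], None, 0.0
    for i, q in enumerate(states + [0]):   # sentinel closes a trailing burst
        if q == 1:
            if start is None:
                start, weight = i, 0.0
            weight += (binom_cost(relevant[i], total[i], rates[0])
                       - binom_cost(relevant[i], total[i], rates[1]))
        elif start is not None:
            bursts.append((start, i - 1, weight))
            start = None
    return bursts


relevant = [1, 1, 1, 9, 9, 9, 1, 1, 1]   # made-up counts for illustration
total = [10] * 9
states, rates = two_state_bursts(relevant, total)
bursts = burst_weights(relevant, total, states, rates)
```

For a community, the "relevant" events would be the edges created inside that community in each interval, and the totals would be all edges created in the blog graph in the same interval.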

  10. Data acquisition • From 7 blog sites: • http://www.blogger.com • http://www.memepool.com • http://www.globeofblogs.com • http://www.metafilter.com • http://blogs.salon.com • http://www.blogtree.com • Web_Logs subtree of Yahoo

  11. Resulting blog graph • 750K links among 25K blogs • 22,299 nodes, 70,472 unique edges, 777,653 edges counting multiplicity (777,653 / 70,472 ≈ 11 per unique edge on average) • Generate the time graph

  12. Results – Degree Distribution

  13. Results – Connectivity • Strongly connected components

  14. Results – Distribution of community sizes

  15. Results – Community Evolution

  16. Conclusion • The work presents a detailed picture of a web publishing phenomenon • Around the end of 2001, Blogspace began a dramatic increase in connectedness and in local-scale community structure • Bursty link-creation behavior also increased dramatically • The tools are applicable to other evolving hyperlinked corpora