## On the Bursty Evolution of Blogspace

Ravi Kumar, Jasmine Novak, Prabhakar Raghavan and Andrew Tomkins. IBM Almaden Research Center, Verity Inc. WWW 2003.

**Main contributions**

• Time graph and blog graph
• Communities in Blogspace
• Temporal bursts: from a sequence of documents to sets of blogs
• Link blogs that are topically and temporally focused
• Blogspace evolution

**Community Extraction of Blogspace**

• Communities are collections of pages that provide information on a similar topic or share a point of view.
• Kleinberg (2000): co-citation, dense bipartite subgraph (signature)
• Flake (2000): network flow

**Bursts**

• Events: bursts are modeled over a stream of events
• Trade-off: reporting a large number of short spurious bursts vs. fragmenting long bursts into many smaller bursts
• Example: email about an NSF grant (Kleinberg 2002)
• Relevant events and irrelevant events
• Bursty: the fraction of relevant events shifts between large and small

**Bursty communities of blogs**

• A given topic within a community, within a time interval
• Example: one member of the blog poets posts a series of daily poems about other bloggers
• Example: a blogger, Dawn, hosts a poll to determine the funniest and sexiest blogger

**Approach**

• Community extraction
• Burst analysis

**Time Graph**

• A set V of nodes, where each node v ∈ V has an associated interval D(v) on the time axis (called the duration of v)
• A set E of edges, where each e ∈ E is a triple (u, v, t) such that u and v are nodes in V and t is a point in time in the interval D(u) ∩ D(v)
• G_t = (V_t, E_t): the graph as of time t

**Community Extraction**

• Finding a dense subgraph is NP-hard
• 1. Preprocessing: remove all pages that have more than a certain number of in-links (too famous)
• 2. Pruning: vertices of degree 1 and 2 are removed; vertices of degree 3 are checked (K3). These are the seeds
• 3. Expansion: find the vertex with the most links to the current community and compare its link count against the threshold t_k

**Burst analysis**

• Treat the arrival of edges in the blog graph as an event stream
• Use Kleinberg's algorithm to obtain the weight of every burst in a community C
• Apply this to each extracted community in the graph

**Data acquisition**

• From 7 blog sites:
• http://www.blogger.com
• http://www.memepool.com
• http://www.globeofblogs.com
• http://www.metafilter.com
• http://blogs.salon.com
• http://www.blogtree.com
• Web_Logs subtree of Yahoo

**Resulting blog graph**

• 750K links among 25K blogs
• 22,299 nodes, 70,472 unique edges, 777,653 edges counting multiplicity (an average of about 11 multiple edges per unique edge)
• A time graph is generated from this data

**Results - Connectivity**

• Strongly connected components

**Conclusion**

• Presents a detailed picture of a web publishing phenomenon
• Around the end of 2001, Blogspace began a dramatic increase in connectedness and in local-scale community structure
• Dramatic increase in bursty link-creation behavior
• The tools are applicable to other evolving hyperlinked corpora
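
The time graph definition from the Time Graph slide can be made concrete with a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `TimeGraph`, its methods, and the choice to clip node durations at t when taking the prefix are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TimeGraph:
    """Hypothetical container for a time graph (names are illustrative)."""
    durations: dict = field(default_factory=dict)  # node -> (start, end) = D(v)
    edges: list = field(default_factory=list)      # triples (u, v, t)

    def add_node(self, v, start, end):
        if start > end:
            raise ValueError("duration must satisfy start <= end")
        self.durations[v] = (start, end)

    def add_edge(self, u, v, t):
        # An edge (u, v, t) is only valid if t lies in D(u) ∩ D(v).
        for node in (u, v):
            start, end = self.durations[node]
            if not start <= t <= end:
                raise ValueError(f"time {t} is outside the duration of {node}")
        self.edges.append((u, v, t))

    def prefix(self, t):
        """Return G_t: nodes that exist by time t and edges created by time t.

        Assumption: durations are clipped at t in the returned graph.
        """
        g = TimeGraph()
        g.durations = {
            v: (start, min(end, t))
            for v, (start, end) in self.durations.items()
            if start <= t
        }
        g.edges = [(u, v, ts) for (u, v, ts) in self.edges if ts <= t]
        return g
```

Because every edge timestamp must fall inside both endpoint durations, any edge kept by `prefix` automatically has both endpoints in the returned node set.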
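
The expansion step from the Community Extraction slide can be illustrated with a greedy loop. This is a sketch under stated assumptions rather than the procedure from the paper: the graph is treated as undirected, the function name `expand_community` is hypothetical, and the threshold t_k is passed in as a caller-supplied function of the current community size k because the slides do not give its form.

```python
def expand_community(adj, seed, threshold):
    """Greedily grow a community from a seed set.

    adj:       dict mapping each node to the set of its neighbours
               (assumed undirected for this sketch).
    seed:      iterable of nodes that form the initial community.
    threshold: hypothetical function k -> t_k giving the minimum number of
               links a candidate needs when the community has k members.
    """
    community = set(seed)
    while True:
        # Count, for each outside vertex, its links into the community.
        counts = {}
        for member in community:
            for nbr in adj.get(member, ()):
                if nbr not in community:
                    counts[nbr] = counts.get(nbr, 0) + 1
        if not counts:
            return community
        # Pick the best-connected candidate; ties go to the smallest name.
        best = max(sorted(counts), key=counts.get)
        if counts[best] < threshold(len(community)):
            return community
        community.add(best)
```

For instance, with a triangle `a, b, c` as the seed, a vertex `d` linked to all three, and a vertex `e` linked only to `d`, a threshold of `lambda k: (k + 1) // 2` admits `d` and then stops, since `e` has a single link into a community of four.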
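
The burst analysis step relies on Kleinberg's algorithm, which the slides name but do not spell out. The sketch below shows one common form of it, a two-state model over batched counts solved with dynamic programming. It is an illustration only: the function name `detect_bursts`, the parameter defaults `s=2.0` and `gamma=1.0`, and the choice of this particular variant are assumptions, not details taken from the slides.

```python
import math


def detect_bursts(relevant, total, s=2.0, gamma=1.0):
    """Label each time period 0 (baseline) or 1 (burst).

    relevant[t]: number of relevant events in period t.
    total[t]:    number of all events in period t.
    s:           assumed ratio between burst rate and baseline rate.
    gamma:       assumed scale of the cost of entering the burst state.
    """
    n = len(relevant)
    if n == 0:
        return []
    p0 = sum(relevant) / sum(total)   # baseline fraction of relevant events
    if p0 <= 0 or p0 >= 1:
        return [0] * n                # degenerate stream: nothing to detect
    p1 = min(0.9999, s * p0)          # elevated fraction in the burst state
    probs = (p0, p1)

    def fit_cost(state, r, d):
        # Negative log-likelihood of r relevant out of d events; the
        # binomial coefficient is the same for both states and is dropped.
        p = probs[state]
        return -(r * math.log(p) + (d - r) * math.log(1 - p))

    up_cost = gamma * math.log(n)     # moving down to baseline is free
    cost = [0.0, math.inf]            # the sequence starts in the baseline
    back = []
    for r, d in zip(relevant, total):
        new_cost, pointers = [], []
        for state in (0, 1):
            via = [cost[prev] + (up_cost if (prev, state) == (0, 1) else 0.0)
                   for prev in (0, 1)]
            prev_best = 0 if via[0] <= via[1] else 1
            pointers.append(prev_best)
            new_cost.append(via[prev_best] + fit_cost(state, r, d))
        back.append(pointers)
        cost = new_cost

    # Trace the cheapest state sequence backwards.
    state = 0 if cost[0] <= cost[1] else 1
    states = []
    for pointers in reversed(back):
        states.append(state)
        state = pointers[state]
    states.reverse()
    return states
```

Calling `detect_bursts([1, 1, 1, 8, 9, 8, 1, 1, 1, 1], [10] * 10)` marks only the three middle periods as a burst. Raising `gamma` makes the model more reluctant to enter the burst state, which is the knob that trades short spurious bursts against fragmented long ones.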