Attention and Event Detection • Identifying, attributing and describing spatial bursts • Early online identification of attention items in social media Louis Gong firstname.lastname@example.org www.louisgong.com
Identifying, attributing and describing spatial bursts Michael Mathioudakis, Nilesh Bansal, and Nick Koudas. 2010. Identifying, attributing and describing spatial bursts. Proc. VLDB Endow. 3, 1-2 (September 2010), 1091-1102. • Problem Description • Related Works • Solution • Experiment & Result • Q&A
BlogScope • Automatically collects information (blogosphere, news sources, social networks, online forums). • Supports advanced information retrieval tasks with data mining and language processing. • Warehouses metadata about the content (time of creation, demographic profile of the author).
Problem Description • User-generated content on blogs, microblogging websites, wikis and social networks is growing at a very high rate. • Goal: automate information discovery over this vast collection of content. • Examples: Barack Obama in 2008; Bin Laden more recently.
Related Works • 1. J. Kleinberg. Bursty and hierarchical structure in streams. In KDD, 2002. • Proposes a model for burst identification over document streams. • 2. J. M. Kleinberg and E. Tardos. Approximation algorithms for classification problems with pairwise relationships: metric labeling and Markov random fields. J. ACM, 2002. • Provides a 2-approximation algorithm, based on linear programming, that applies to the spatial burst detection problem. • 3. Statistical discrepancy functions quantify the difference between distributions and are commonly used to identify regions where two spatial distributions differ significantly. Such regions can be interpreted as areas where one spatial distribution exhibits a burst in comparison with the other.
Solution • Spatial burst identification • Burst attribution • Keyword-based description
Spatial Bursts • G: a grid; for a suitable choice of granularity, each geographical entity of interest (e.g. a city) corresponds to a cell. • Rs: the spatial distribution of the related documents published within the time interval t. • Ds: the spatial distribution of all documents published within t. • Spatial bursts are identified as cells for which the value of Rs is large in comparison with Ds.
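The comparison of Rs against Ds can be illustrated with a minimal Python sketch. It is not the discrepancy-based method of the paper: it swaps in a plain share ratio per cell, and the function name `bursty_cells`, the `ratio_threshold` and `min_related` parameters, and the toy grid are all assumptions made for illustration.

```python
from collections import Counter


def bursty_cells(related_cells, all_cells, ratio_threshold=2.0, min_related=5):
    """Return grid cells whose share of related documents (Rs) is large
    compared with their share of all documents (Ds).

    Simplified stand-in for a statistical discrepancy function: a cell is
    flagged when (share in Rs) / (share in Ds) >= ratio_threshold.
    """
    rs = Counter(related_cells)   # related documents per cell
    ds = Counter(all_cells)       # all documents per cell
    total_r = sum(rs.values())
    total_d = sum(ds.values())
    flagged = []
    for cell, r_count in rs.items():
        d_count = ds.get(cell, 0)
        # Skip cells with too little evidence or no background documents.
        if r_count < min_related or d_count == 0:
            continue
        r_share = r_count / total_r
        d_share = d_count / total_d
        if r_share / d_share >= ratio_threshold:
            flagged.append(cell)
    return sorted(flagged)


# Toy grid: each document is represented by the (row, col) cell it falls in.
all_docs = [(0, 0)] * 100 + [(0, 1)] * 100 + [(1, 1)] * 100
related_docs = [(0, 0)] * 30 + [(0, 1)] * 5 + [(1, 1)] * 5

print(bursty_cells(related_docs, all_docs))
```

In this toy input, cell (0, 0) holds 75% of the related documents but only a third of all documents, so it is the only cell flagged.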
Burst Attribution • Attribute the burst to profile features. • 1. Focus on a specific set of bursty cells and ask which demographic factors, if absent, would have caused no burst to be detected (e.g. “Toronto Film Festival”). • 2. Compare a bursty region with a non-bursty region and find the demographic factors that account for the difference.
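The first attribution question (which demographic factors, if absent, would remove the burst) can be sketched as a leave-one-out check. This is a brute-force illustration under stated assumptions, not the efficient technique of the paper: the share-ratio burst test, the threshold, the field names `cell`, `age` and `related`, and the toy data are all hypothetical.

```python
def share_ratio(related, everything, cell):
    """Share of related documents in `cell` divided by the cell's share of all documents."""
    d_in = sum(1 for d in everything if d["cell"] == cell)
    if not related or not everything or d_in == 0:
        return 0.0
    r_in = sum(1 for d in related if d["cell"] == cell)
    return (r_in / len(related)) / (d_in / len(everything))


def attributing_values(related, everything, cell, feature, threshold=2.0):
    """Return the values of `feature` whose removal makes the burst in `cell` disappear."""
    values = {d[feature] for d in related if d["cell"] == cell}
    responsible = []
    for value in sorted(values):
        # Recompute both distributions without documents carrying this value.
        rel_without = [d for d in related if d[feature] != value]
        all_without = [d for d in everything if d[feature] != value]
        if share_ratio(rel_without, all_without, cell) < threshold:
            responsible.append(value)
    return responsible


def make_docs(cell, age, total, n_related):
    """Build `total` toy documents, the first `n_related` of which match the query."""
    return [{"cell": cell, "age": age, "related": i < n_related} for i in range(total)]


docs = (make_docs("T", "18-25", 20, 16) + make_docs("T", "26-40", 20, 2)
        + make_docs("O", "18-25", 60, 3) + make_docs("O", "26-40", 60, 3))
related = [d for d in docs if d["related"]]

print(share_ratio(related, docs, "T"))                # burst strength of cell "T"
print(attributing_values(related, docs, "T", "age"))  # age groups the burst depends on
```

Here cell "T" is bursty with all documents included; dropping the "18-25" authors pushes the ratio below the threshold, while dropping the "26-40" authors does not, so only "18-25" is reported.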
Keyword-based description of bursts • Query expansion: • Identify the keywords w_i that are highly related to q (the query whose bursts are being described) and form the expanded queries q ∪ w_i. • Curve estimation: • Keywords w that frequently occur together with q often exhibit a burst themselves over the same interval. • q′[t]_est = (1 + ) · min{ b(q)[t], b(w_i)[t] }
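The curve-estimation formula can be sketched in Python as a pointwise minimum of two curves. This assumes b(q)[t] and b(w_i)[t] are per-time-step values given as equal-length lists; the term added to 1 is not legible on the slide, so the `slack` parameter below is a hypothetical stand-in for it, and the function name is likewise an assumption.

```python
def estimate_joint_curve(b_q, b_w, slack=0.0):
    """Estimate the curve of the expanded query q ∪ w_i at each time step as
    (1 + slack) * min(b(q)[t], b(w_i)[t]).

    `slack` is a placeholder for the factor that is missing from the slide.
    """
    if len(b_q) != len(b_w):
        raise ValueError("curves must cover the same time steps")
    return [(1 + slack) * min(x, y) for x, y in zip(b_q, b_w)]


b_query = [1, 4, 9, 3]     # toy values of b(q)[t]
b_keyword = [2, 3, 10, 1]  # toy values of b(w_i)[t]

print(estimate_joint_curve(b_query, b_keyword, slack=0.5))
# → [1.5, 4.5, 13.5, 1.5]
```

The estimate only needs the two individual curves, so candidate keywords can be ranked without issuing the expanded query for each one.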
Experiment & Result • Average running time of the algorithms
Experiment & Result • Queries q were submitted to BlogScope, with temporal interval qt set as the first 10 days of March 2009. • Retrieving distributions Rs and Ds for a query.
Experiment & Result • Parameter Sensitivity
Summary • Scalable method to identify spatial information bursts. • Efficient techniques to attribute bursts to specific demographic factors. • Techniques to analyze bursts and effectively identify sets of keywords that describe the burst.
Early online identification of attention items in social media Michael Mathioudakis, Nick Koudas, and Peter Marbach. 2010. Early online identification of attention items in social media. In Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM '10). ACM, New York, NY, USA, 301-310. • Problem Description • ISIS Model • Experiment • Result • Q&A
Problem Description • Activity in social media is manifested via interactions that involve text, images, links and other information items. • Naturally, some items attract more attention than others, expressed as large volumes of linking, commenting or tagging activity. • Identifying the information items that gather the most attention in such a real-time information collective is a challenging task.
Comparison (traditional & social media) • Traditional webpages – Graph Model (PageRank) • Differences: • 1. Social media activity is associated with individual items such as documents, pictures and news articles, so it is reasonable to measure the importance or attention-gathering potential of each item separately. • 2. Linking activity in social media is the product of continuous interaction between participating individuals. The dynamic aspects of this process are not captured by the graph model.
Comparison (traditional & social media) • 3. Linking is not the only action by which structure arises in social media, as individuals also interact by commenting, sharing, recommending or rating.
Subject • Proposes the first formal definition and analysis of such a model and uses it as a basis for identifying attention-gathering items in an online fashion. • Identifies individual items that attract a significant number of actions; the main focus is ‘early identification’ of such items.
ISIS Model • An abstraction of social media activity. • Information units (units) – items such as blog posts, status messages, photos, etc. in the social media stream. • Information sources (sources) – individuals contributing information. • A source participates in two sets of stochastic processes: • 1. The process of emitting information units in a streaming fashion. • 2. Processes of interaction with other sources.
ISIS Model • Each unit is associated with a timestamp tp and a validity period dp. • The validity periods of units emitted by the same source might overlap.
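The unit bookkeeping described above can be sketched as a small Python data structure. The class and field names are assumptions, as is the choice to treat a unit as valid on the half-open interval [tp, tp + dp); the slides do not state either.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """An information unit emitted by a source (e.g. a blog post)."""
    source: str
    timestamp: float  # tp: emission time
    validity: float   # dp: length of the validity period

    def is_valid_at(self, t):
        # Assumption: valid on the half-open interval [tp, tp + dp).
        return self.timestamp <= t < self.timestamp + self.validity


def valid_units(units, t):
    """Return the units whose validity period contains time t."""
    return [u for u in units if u.is_valid_at(t)]


stream = [
    Unit("blogA", timestamp=0.0, validity=10.0),
    Unit("blogA", timestamp=6.0, validity=10.0),  # overlaps the first blogA unit
    Unit("blogB", timestamp=20.0, validity=5.0),
]

active = valid_units(stream, t=8.0)
print([(u.source, u.timestamp) for u in active])
```

At t = 8.0 both units from blogA are valid at once, which is the overlap case the slide mentions.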
ISIS Model • Source interaction
Result • Interaction weights of posts in • (a) engadget.com • (b) techcrunch.com
Result • Attention Gathering Posts
Result • Quality vs Efficiency Trade-offs
Summary • ISIS Model: a general stochastic model for interacting streaming information sources. • A measure of the attention-gathering potential of information units. • Experimental results on real data collected from a period of blogging activity.
Thank You Louis Gong email@example.com www.louisgong.com