Hashtags as Milestones in Time

Stewart Whiting (University of Glasgow) and Omar Alonso (Microsoft/Bing). Time Aware Information Access Workshop, SIGIR, Oregon, 2012. (Work done while on internship at Microsoft.) Identifying the hashtags for meaningful events using Twitter search logs and Wikipedia data.


Presentation Transcript


  1. Hashtags as Milestones in Time. Stewart Whiting, University of Glasgow; Omar Alonso, Microsoft/Bing. Time Aware Information Access Workshop, SIGIR, Oregon, 2012. (Work done while on internship at Microsoft.) Identifying the hashtags for meaningful events using Twitter search logs and Wikipedia data

  2. Outline • Hashtags as milestones in time • Introduction • Why milestones? • Why hashtags? Can they be useful as milestones? • Motivation • Approach • Data preparation • Approach steps • Constructing a timeline – examples • Preliminary conclusions

  3. Abstract: Hashtags as milestones in time • What we want to do: • Identify event-based hashtags for timeline creation • Currently using historic/past data • Filter out junk • Find the most temporally significant hashtags • Use multiple signals: Twitter search logs + related Wikipedia article popularity • We are not doing topic detection/tracking! • Why? • Hashtags are a good way to express (anchor) a topic on a timeline • They help users make sense of and navigate temporal information #what?

  4. Introduction • Hashtags are used by authors to explicitly denote the relevant topic(s) in a message • “Great passing, great game #euro2012” • Used by authors and searchers • To broadcast and consume a specific topic • Especially useful in short-text retrieval, where bag-of-words and language-modelling approaches struggle • Reflect mainstream events (or memes!) in real time • See trending topics right now • Timelines are very good for displaying events • But you need to express the events as a meaningful marker, or milestone!
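The #euro2012 example can be made concrete with a small sketch. It assumes that a hashtag is `#` followed by word characters and that hashtags are compared case-insensitively. The code pulls hashtags out of message text and counts their use per day, which matches the shape of the per-day hashtag data described in the data-preparation slide. The function names are hypothetical. The real data came from query logs, not tweet text.

```python
import re
from collections import Counter, defaultdict
from datetime import date

# Assumption: a hashtag is '#' followed by letters, digits or underscores.
HASHTAG_RE = re.compile(r"#(\w+)")

def extract_hashtags(text: str) -> list[str]:
    """Return the lower-cased hashtags in a message, in order of appearance."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]

def daily_hashtag_counts(messages):
    """Aggregate hashtag use per day from (date, text) pairs.

    Returns a mapping date -> Counter(hashtag -> number of uses that day).
    """
    counts: dict[date, Counter] = defaultdict(Counter)
    for day, text in messages:
        counts[day].update(extract_hashtags(text))
    return counts
```

For example, `extract_hashtags("Great passing, great game #euro2012")` gives `["euro2012"]`.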

  5. Introduction to the data • Two crowds of people • Authors/searchers on Twitter • Editors/browsers on Wikipedia • Correlation between signals from the two crowds • People search for what is happening • People edit Wikipedia with what is happening • Two very distinctive signals!

  6. Twitter hashtag signals (in search logs) • But plenty of memes too… • #20PeopleWhoIWantToMeet • #PresentingInTheBatCave • #whiteppldoitbutblackppldont

  7. Wikipedia signals • Whitney Houston • TV appearances • Her death in February 2012 • These events were reflected in hashtag discussion on Twitter, e.g. • #ripwhitney • #bgtwhitney (BGT = Britain’s Got Talent)

  8. Motivation • Both signals have large coverage • Celebrities, news, weather, people, science, movies etc. • Two robust signals coming from large crowds • Difficult for individuals to influence (spam?) • Less reliant on single-signal analysis (e.g. wavelets or burst detection) • Discard memes by looking for associated Wikipedia articles • Meaningful milestones in timelines provide strong features for navigating temporal content • Alonso et al. (2010), Matthews et al. (2010), From et al. (2003)

  9. Data Preparation – Hashtag Data • Extracted from Bing Social and IE8 query logs • Provides hashtag use, aggregated per day • (Proprietary, but could be extracted from other sources) • Hashtags are mostly a mix of unigrams and bigrams! • We also want the words in the hashtag • Need to use a word breaker… • We used Microsoft Web N-Gram Services • Breaks #crosstownshootout into ‘cross town shootout’ and #basketballwivesla into ‘basketball wives la’
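The slides used Microsoft Web N-Gram Services as the word breaker. That API is not reproduced here. As a stand-in, this sketch uses a different but common technique: dynamic-programming segmentation that picks the split with the highest total unigram log-probability. The probability table is hypothetical and tiny, chosen only so that the two slide examples split as described. A real word breaker would rely on web-scale n-gram statistics.

```python
import math
from functools import lru_cache

# Hypothetical unigram probabilities, for illustration only.
UNIGRAM_P = {
    "cross": 1e-4, "town": 1e-4, "shoot": 5e-5, "out": 1e-3, "shootout": 1e-5,
    "basket": 2e-5, "ball": 5e-5, "basketball": 5e-5, "wives": 2e-5, "la": 1e-4,
}
MAX_WORD_LEN = 20  # longest candidate word considered at each split point

def word_logp(word: str) -> float:
    """Log-probability of one word; unknown words get a length-based penalty."""
    if word in UNIGRAM_P:
        return math.log(UNIGRAM_P[word])
    return math.log(1e-12) - 2.0 * len(word)

@lru_cache(maxsize=None)
def segment(text: str) -> tuple[float, tuple[str, ...]]:
    """Best (log-probability, words) split of text under the unigram model."""
    if not text:
        return 0.0, ()
    best: tuple[float, tuple[str, ...]] = (-math.inf, ())
    for i in range(1, min(len(text), MAX_WORD_LEN) + 1):
        head = text[:i]
        tail_score, tail_words = segment(text[i:])  # memoised sub-problem
        score = word_logp(head) + tail_score
        if score > best[0]:
            best = (score, (head,) + tail_words)
    return best

def break_hashtag(hashtag: str) -> str:
    """Strip the '#', lower-case, and return the best segmentation as words."""
    return " ".join(segment(hashtag.lstrip("#").lower())[1])
```

With this table, `break_hashtag("#crosstownshootout")` returns `"cross town shootout"`. That happens because the single word "shootout" scores higher than "shoot" followed by "out".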

  10. Data Preparation – Wikipedia Data • Created a Lucene index using the Wikipedia Extraction (WEX) data. • Wikipedia article viewing popularity statistics • Dump available for each hour since Dec 2007 • Published near real-time, for the past hour (on the hour) • Huge number of data points! • So we sampled 8am/8pm each day • Transformed into a daily aggregated time-series (therefore comparable with hashtag signals) • Smoothed with exponential smoothing (alpha = 0.2) • Over 2 billion data points!
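Here is a minimal sketch of the Wikipedia time-series preparation. It assumes the hourly page-view counts for one article are already loaded into a dict keyed by `datetime`. Two details are assumptions the slide does not state: the daily value is the average of the 8am and 8pm samples, and a missing hour counts as zero. The smoothing step is standard simple exponential smoothing with the slide's alpha = 0.2.

```python
from datetime import date, datetime, timedelta

SAMPLE_HOURS = (8, 20)  # 8am and 8pm, as sampled on the slide
ALPHA = 0.2             # smoothing factor from the slide

def daily_series(hourly_views: dict[datetime, int], start: date, end: date) -> list[float]:
    """One value per day from start to end inclusive: the mean of the sampled hours.

    Assumption: a missing hourly sample counts as 0 views.
    """
    series = []
    day = start
    while day <= end:
        samples = [hourly_views.get(datetime(day.year, day.month, day.day, h), 0)
                   for h in SAMPLE_HOURS]
        series.append(sum(samples) / len(samples))
        day += timedelta(days=1)
    return series

def exponential_smoothing(xs: list[float], alpha: float = ALPHA) -> list[float]:
    """Simple exponential smoothing: s_0 = x_0, s_t = alpha*x_t + (1 - alpha)*s_{t-1}."""
    smoothed: list[float] = []
    for x in xs:
        smoothed.append(x if not smoothed else alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed
```

A small alpha such as 0.2 gives each new day only a fifth of the weight. This damps single-day noise, but it also delays and spreads out sharp spikes.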

  11. Approach Outline • For each hashtag from the logs, use the word breaker service to extract the hashtag terms • Use the separated terms to query the Wikipedia index – this maps each hashtag to a set of possibly associated articles • For each hashtag/article pair, prepare comparable time series of the same length: • Frequency of the hashtag over time • Popularity of the article over time • Compute the Pearson correlation coefficient • It measures the association between the temporal pattern of hashtag occurrence and Wikipedia article popularity
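The correlation step can be sketched as follows. The Lucene query over WEX is replaced by a hypothetical toy lookup that returns article titles sharing at least one term with the broken-out hashtag. Only the Pearson coefficient is the standard formula. Both series in a hashtag/article pair must cover the same days, as the slide requires.

```python
import math

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a flat series carries no temporal signal
    return cov / (sx * sy)

def candidate_articles(terms: list[str], titles: list[str]) -> list[str]:
    """Hypothetical stand-in for the Lucene/WEX query: titles sharing any term."""
    wanted = set(terms)
    return [t for t in titles if wanted & set(t.lower().split())]

def rank_articles(hashtag_series, terms, article_series):
    """Return (title, r) pairs for candidate articles, highest correlation first.

    article_series maps title -> daily popularity over the same days as hashtag_series.
    """
    candidates = candidate_articles(terms, list(article_series))
    scored = [(title, pearson(hashtag_series, article_series[title]))
              for title in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

On Python 3.10+, `statistics.correlation` computes the same coefficient. The formula is written out here to make it explicit. Note that, unlike this sketch, `statistics.correlation` raises an error on a constant series.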

  12. Example Correlations

  13. Constructing a Timeline
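The timeline-construction slide survives only as a title in this transcript, so what follows is a hypothetical illustration and not the authors' method. It assumes each hashtag already has its daily counts and its best-correlated Wikipedia article, for example from the correlation step. It then does two things:

* It drops hashtags that lack a sufficiently correlated article, treating them as likely memes. The 0.7 cut-off is invented.
* It anchors each remaining hashtag at its peak-usage day.

```python
from datetime import date, timedelta

MIN_CORRELATION = 0.7  # hypothetical cut-off; the slides do not give one

def build_timeline(hashtags, start: date):
    """Build (day, hashtag, article) milestones sorted by day.

    hashtags maps tag -> (daily_counts, best_article_or_None, best_r),
    where daily_counts[0] is the count for `start`.
    """
    milestones = []
    for tag, (counts, article, r) in hashtags.items():
        if article is None or r < MIN_CORRELATION:
            continue  # no well-correlated Wikipedia article: treat as a meme
        peak = max(range(len(counts)), key=counts.__getitem__)  # first peak day
        milestones.append((start + timedelta(days=peak), tag, article))
    return sorted(milestones)
```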

  14. Conclusions • Early work, but correlating the signals does surface high-profile temporal events • Hashtags can therefore be used to anchor events on a timeline • Occasional spurious correlations (better hashtag frequency data is needed to reduce these) • Correlation does not imply causation! • Future work… • Automatic construction of timelines • Improving correlation quality – examine time windows • Designing an evaluation framework to assess overall timeline quality
