Detecting Newsworthy Topics in Twitter. Steven Van Canneyt and Matthias Feys, April 8th, 2014


Presentation Transcript


  1. Detecting Newsworthy Topics in Twitter. Steven Van Canneyt and Matthias Feys, April 8th, 2014

  2. Methodology

  3. [Diagram] Input → News publisher detection, for each time interval i

  4. [Diagram] Topic detection yields topic 1, topic 2, topic 3 in time interval i; topic ranking orders them 1., 2., 3.

  5. [Diagram] Topic enrichment for time interval i • 1. topic 3: headline 3, tweets, pictures… • 2. topic 2: headline 2, tweets, pictures… • 3. topic 1: headline 1, tweets, pictures…

  6. News publisher detection • Bayesian Network Classifier • Estimates the probability that a user is a ‘news publisher’ • Only use tweets from users with probability > α (0.04) • Training set: 10,000 manually annotated users • Which tweets contain newsworthy content? → Which tweets are posted by users who mostly talk about newsworthy stories?
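The thresholding step on this slide can be sketched as follows. This is only an illustration: the per-user probabilities are assumed to come from an already trained classifier, and the names `filter_by_publisher` and `publisher_prob`, as well as the example users and scores, are hypothetical.

```python
ALPHA = 0.04  # probability threshold alpha, as given on the slide


def filter_by_publisher(tweets, publisher_prob, alpha=ALPHA):
    """Keep tweets whose author has a 'news publisher' probability > alpha.

    `publisher_prob` maps user name -> probability (assumed to be produced
    by a separate, already trained classifier). Unknown users count as 0.
    """
    return [t for t in tweets if publisher_prob.get(t["user"], 0.0) > alpha]


# Illustrative data only; the scores are made up.
tweets = [
    {"user": "bbcworld", "text": "Pro-Ukraine paint job in Sofia"},
    {"user": "someone", "text": "lunch was great"},
]
probs = {"bbcworld": 0.93, "someone": 0.01}
kept = filter_by_publisher(tweets, probs)
```

The comparison is strict, so a user whose probability equals α exactly is dropped.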

  7. News publisher detection • Features

  8. Topic detection • DBSCAN clustering algorithm • Cosine similarity • Boosted tf-idf representation of the tweets
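A minimal, dependency-free sketch of DBSCAN over sparse tweet vectors with cosine distance (1 minus cosine similarity) might look like this. The slides do not give the `eps` and `min_pts` settings, so the values below are assumptions, and the example vectors use plain term counts instead of the boosted tf-idf weights.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def dbscan(vectors, eps, min_pts):
    """Return one label per vector: a cluster id (0, 1, ...) or -1 for noise."""
    n = len(vectors)
    labels = [None] * n

    def neighbours(i):
        # All points (including i itself) within cosine distance eps of i.
        return [j for j in range(n)
                if 1.0 - cosine(vectors[i], vectors[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:   # j is a core point, keep expanding
                queue.extend(nb)
    return labels


# Illustrative vectors; eps and min_pts are assumed values.
vectors = [
    {"kiev": 1, "protest": 1},
    {"kiev": 1, "protest": 1, "parliament": 1},
    {"kiev": 1, "protest": 1, "museum": 1},
    {"lunch": 1, "pizza": 1},
]
labels = dbscan(vectors, eps=0.5, min_pts=2)
```

The three Kiev tweets end up in one cluster and the unrelated tweet is marked as noise, which is the property that makes DBSCAN convenient here: the number of topics does not have to be fixed in advance.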

  9. Topic detection • Boosted tf-idf weight of word w in time interval i combines: • standard term frequency-inverse document frequency of word w • boosting of proper nouns and verbs (1.5) • boosting of bursty words during time interval i
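One possible reading of the boosted weighting is sketched below. The exact formula is not preserved in this transcript, so multiplying the three factors is an assumption, as are the smoothed idf variant and the hypothetical inputs `boosted_pos_words` (words already tagged as proper nouns or verbs) and `burst_factor` (a per-word burstiness multiplier for the current interval).

```python
import math
from collections import Counter

POS_BOOST = 1.5  # boost for proper nouns and verbs, as given on the slide


def boosted_tfidf(tokens, doc_freq, n_docs, boosted_pos_words, burst_factor):
    """Return {word: weight} for one tweet.

    ASSUMPTION: weight = tf * idf * pos_boost * burst_boost. The slide only
    names the three components; how they are combined is not recoverable.
    """
    weights = {}
    for word, tf in Counter(tokens).items():
        # Smoothed idf (an assumed variant, chosen to avoid division by zero).
        idf = math.log((1 + n_docs) / (1 + doc_freq.get(word, 0))) + 1.0
        pos = POS_BOOST if word in boosted_pos_words else 1.0
        burst = burst_factor.get(word, 1.0)  # 1.0 means "not bursty"
        weights[word] = tf * idf * pos * burst
    return weights


# Illustrative call; document frequencies and burst factors are made up.
w = boosted_tfidf(
    tokens=["kiev", "protest", "the"],
    doc_freq={"kiev": 1, "protest": 1, "the": 9},
    n_docs=9,
    boosted_pos_words={"kiev"},
    burst_factor={"protest": 2.0},
)
```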

  10. Topic ranking • SVM Classifier • Estimates probability that a topic is ‘newsworthy’ • Only use topics with probability > β (0.5) • Training set: 116 manually annotated topics retrieved from the ‘2012 US elections’ training dataset

  11. Topic ranking • Features • Tweet features • e.g. number of tweets • User features • e.g. % of users with ‘news publisher’ probability > 0.9 • Topical coherence • e.g. % of tweets containing the most informative word in the cluster • Non-duplicate features • e.g. highest cosine similarity between the cluster and the newsworthy topics detected in previous time intervals
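The four example features listed above could be computed roughly as follows. This is a sketch under assumptions: the cluster centroid is taken to be a sparse word-weight dict, the "most informative word" is taken to be the highest-weighted word in that centroid, and all names and example values are hypothetical. The resulting feature dict would then be fed to the classifier.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def topic_features(cluster_tweets, publisher_prob, centroid, previous_centroids):
    """Compute one example feature per feature group named on the slide."""
    users = {t["user"] for t in cluster_tweets}
    # ASSUMPTION: "most informative word" = highest-weighted centroid word.
    top_word = max(centroid, key=centroid.get)
    return {
        "n_tweets": len(cluster_tweets),
        "pct_news_users": sum(publisher_prob.get(u, 0.0) > 0.9 for u in users)
        / len(users),
        "pct_top_word": sum(top_word in t["tokens"] for t in cluster_tweets)
        / len(cluster_tweets),
        "max_sim_previous": max(
            (cosine(centroid, p) for p in previous_centroids), default=0.0
        ),
    }


# Illustrative data only.
cluster = [
    {"user": "bbcworld", "tokens": ["sofia", "monument"]},
    {"user": "a", "tokens": ["sofia", "makeover"]},
    {"user": "b", "tokens": ["paint", "job"]},
    {"user": "bbcworld", "tokens": ["sofia"]},
]
features = topic_features(
    cluster,
    publisher_prob={"bbcworld": 0.95, "a": 0.2},
    centroid={"sofia": 0.9, "monument": 0.4, "makeover": 0.3},
    previous_centroids=[{"sofia": 1.0}],
)
```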

  12. Topic enrichment • Objective: enrich detected newsworthy topic s • Headline • Split the tweets in cluster s into sentences • Select the sentence with the highest cosine similarity to the center of the cluster • Rule-based approach to clean the sentence • e.g. removing URLs, emoticons, ‘#’-symbol, ‘@’-symbol
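The headline step can be illustrated with a small sketch. It assumes a naive punctuation-based sentence splitter, plain term counts for the sentence vectors (the slides use boosted tf-idf), and a deliberately small emoticon pattern; the cleaning rules shown are examples, not the authors' full rule set.

```python
import math
import re
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def clean(sentence):
    """Rule-based cleanup: drop URLs, a few ASCII emoticons, '#' and '@'."""
    sentence = re.sub(r"https?://\S+", "", sentence)
    sentence = re.sub(r"[:;]-?[()DPp]", "", sentence)  # illustrative subset
    sentence = sentence.replace("#", "").replace("@", "")
    return " ".join(sentence.split())  # collapse leftover whitespace


def headline(tweets, centroid):
    """Pick the sentence closest to the cluster center, then clean it."""
    sentences = [
        s.strip()
        for t in tweets
        for s in re.split(r"(?<=[.!?])\s+", t)  # naive sentence splitter
        if s.strip()
    ]
    if not sentences:
        return ""

    def vec(s):
        return Counter(re.findall(r"\w+", s.lower()))

    return clean(max(sentences, key=lambda s: cosine(vec(s), centroid)))


# Illustrative cluster; the centroid weights are made up.
tweets = [
    "Pro-Ukraine paint job in #Sofia provokes protest from Russia http://bbc.in/1frf9UN",
    "Wow :) Look at this. Sofia monument gets a makeover!",
]
centroid = {"sofia": 1.0, "paint": 0.8, "protest": 0.8, "russia": 0.6, "monument": 0.5}
title = headline(tweets, centroid)
```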

  13. Topic enrichment • Keywords • words in headline which are in the top 50% of the most important words of topic s
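A sketch of the keyword rule, assuming that word importance is the word's weight in the topic's centroid vector (the slide does not say how importance is measured). The function name and the example weights are hypothetical.

```python
import math
import re


def keywords(headline, centroid):
    """Return headline words that fall in the top 50% of centroid words.

    ASSUMPTION: importance of a word = its weight in the topic centroid.
    Headline order is preserved and repeated words are reported once.
    """
    ranked = sorted(centroid, key=centroid.get, reverse=True)
    top_half = set(ranked[: math.ceil(len(ranked) / 2)])
    seen, result = set(), []
    for word in re.findall(r"\w+", headline):
        key = word.lower()
        if key in top_half and key not in seen:
            seen.add(key)
            result.append(word)
    return result


# Illustrative weights only.
centroid = {
    "sofia": 1.0, "monument": 0.9, "makeover": 0.8, "provokes": 0.7,
    "protest": 0.3, "russia": 0.2, "from": 0.1, "latest": 0.05,
}
tags = keywords(
    "Sofia monument's latest makeover provokes protest from Russia", centroid
)
```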

  14. Topic enrichment • Representative tweets • Select all tweets posted during time interval i • Also tweets not posted by news publishers • Discard tweets with cosine similarity to the center of s < λ (0.6) • Sort obtained tweets based on their relevance to s • relevance = cosine similarity between tweet and topic center, multiplied by user_factor • user_factor = 1.5 if user is ‘news publisher’, 1 otherwise • Remove near-duplicates • ‘near-duplicate’ if cosine similarity > μ (0.7)

  15. Topic enrichment • Representative pictures • Select all tweets posted during time interval i • Also tweets not posted by news publishers • Discard tweets with cosine similarity to the center of s < λ (0.6) • Select picture URLs from the tweets • Sort picture URLs based on the sum of the relevance values of their associated tweets
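The picture ranking reduces to a sum per URL, which might be sketched like this. The input is assumed to be the already filtered tweets as `(relevance, picture_urls)` pairs, with relevance computed as for the representative tweets; the function name and example URLs are hypothetical.

```python
from collections import defaultdict


def rank_pictures(scored_tweets):
    """Sort picture URLs by the summed relevance of the tweets that carry them.

    `scored_tweets` is an iterable of (relevance, list_of_picture_urls).
    A URL repeated inside one tweet is counted once for that tweet.
    """
    totals = defaultdict(float)
    for relevance, urls in scored_tweets:
        for url in set(urls):
            totals[url] += relevance
    return sorted(totals, key=totals.get, reverse=True)


# Illustrative data only.
ranking = rank_pictures([
    (1.5, ["https://example.com/a.jpg"]),
    (0.7, ["https://example.com/b.jpg"]),
    (0.9, ["https://example.com/b.jpg", "https://example.com/c.jpg"]),
])
```

A picture shared by several moderately relevant tweets can therefore outrank one attached to a single highly relevant tweet, as `b.jpg` does here.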

  16. Results

  17. tags: Sofia, monument, makeover, provokes • tweets: • Pro-Ukraine paint job - Sofia monument's latest makeover provokes protest from Russia http://bbc.in/1frf9UN • Kijow w Sofii. RT: @BBCWorld Pro-Ukraine paint job in Sofia provokes protest from Russia http://bbc.in/1frf9UN • Pro-#Ukraine paint job-Sofia monument's latest makeover provokes #protest from R http://bbc.in/1frf9UN via @BBCWorld

  18. tags: Jubilant, protesters, driving, vehicle, Museum, Parliament • tweets: • Jubilant protesters driving military vehicle from a Kiev Museum around Parliament building #Kiev #Ukraine • Another #Russia|n armored vehicles spotted in #Sevastopol in #Crimea. #Ukraine http://qn.quotidiano.net/esteri/2014...

  19. tags: Jubilant, protesters, driving, vehicle, Museum, Parliament • tweets: • Jubilant protesters driving military vehicle from a Kiev Museum around Parliament building #Kiev #Ukraine • Another #Russia|n armored vehicles spotted in #Sevastopol in #Crimea. #Ukraine http://qn.quotidiano.net/esteri/2014...

  20. Steven.VanCanneyt@intec.ugent.be Questions?