Challenges in Mining Social Media Sparsity and Quality

Presentation Transcript

  1. Challenges in Mining Social Media – Sparsity and Quality • Dagstuhl Seminar 11171: Challenges in Document Mining • Thomas Gottron, gottron@uni-koblenz.de

  2. Social Media • Definition from Wikipedia: "Social media are media for social interaction, using highly accessible and scalable communication techniques. Social media is the use of web-based and mobile technologies to turn communication into interactive dialogue. … Social authority is developed when an individual or organization establishes themselves as an 'expert' in their given field or area, thereby becoming an influencer in that field or area." • Prominent examples: • Media sharing, collaboration, reviews, communication • Microblogging • Twitter & Co. (also Facebook)

  3. Microblogging – Twitter • Tweet by @janedoe: "My dear @johndoe had troubles to wake up this #morning" • Retweet shown to followers: "RT @janedoe: My dear @johndoe had troubles to wake up this #morning"
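
The tweet syntax on this slide (mentions with `@`, hashtags with `#`, and the `RT @user:` retweet prefix) can be picked apart with a few regular expressions. The following is a minimal sketch, not the speaker's tooling; the function name `parse_tweet` and the returned dictionary keys are assumptions made for illustration.

```python
import re

# Simplified patterns; real Twitter usernames and hashtags have more rules.
MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")
RETWEET = re.compile(r"^RT @(\w+):\s*(.*)$")


def parse_tweet(text):
    """Split a tweet into retweet source, body, mentions and hashtags."""
    match = RETWEET.match(text)
    if match:
        source, body = match.group(1), match.group(2)
    else:
        source, body = None, text
    return {
        "retweet_of": source,                # user being retweeted, or None
        "body": body,                        # text without the "RT @user:" prefix
        "mentions": MENTION.findall(body),   # users addressed in the body
        "hashtags": HASHTAG.findall(body),   # tags without the leading '#'
    }


# The example tweet from the slide and its retweet.
original = "My dear @johndoe had troubles to wake up this #morning"
retweet = "RT @janedoe: " + original
parsed = parse_tweet(retweet)
```

Parsing the retweet recovers the original text as `body`, which is what a content-based analysis would work on.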

  4. Microblogging – Sparsity • Twitter: 140 characters, few terms • 85% of all tweets do not contain any term more than once
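
A statistic like the one on this slide can be computed by counting term frequencies per tweet. The sketch below assumes naive lowercased whitespace tokenization, which the slides do not specify; the helper names are hypothetical, and the third sample tweet is invented to show a repeated term.

```python
from collections import Counter


def has_repeated_term(text):
    """True if any lowercased, whitespace-separated term occurs more than once."""
    counts = Counter(text.lower().split())
    return any(c > 1 for c in counts.values())


def share_without_repeats(tweets):
    """Fraction of tweets in which every term occurs exactly once."""
    if not tweets:
        return 0.0
    unique_only = sum(1 for t in tweets if not has_repeated_term(t))
    return unique_only / len(tweets)


sample = [
    "My kitten is pretending to be a laptop",
    "imon the phone rite now",
    "beer beer and more beer",          # invented example with a repeated term
    "Which is the best Online RSS Reader?",
]
share = share_without_repeats(sample)   # 3 of 4 tweets have no repeated term
```

When almost every term frequency is 1, term-frequency weighting carries little signal, which is the sparsity problem the slide points at.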

  5. Microblogging – Quality • Facets/aspects of quality: • Purpose (interaction, news propagation, etc.) • Presentation (humor, irony, etc.) • Language (writing style) • Interestingness • Example tweets: • "Question: Which is the best Online RSS Reader? I need some recommendations, cheers everyone :)" • "My kitten is pretending to be a laptop" • "imon the phone rite now" • "Interesting timeline of major events in the history of information retrieval http://tinyurl.com/ya7rcqt"

  6. Measuring Quality? • (Social) Network measures • PageRank • Clustering coefficients • Centrality measures • These measure the quality of people, not messages!
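
To illustrate why network measures score people rather than messages, here is a minimal power-iteration PageRank over a toy follower graph. It is a sketch under stated assumptions: the graph, the user names, the damping factor of 0.85 and the fixed iteration count are all illustrative choices, not taken from the talk.

```python
def pagerank(follows, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict mapping user -> list of users they follow."""
    users = set(follows) | {v for targets in follows.values() for v in targets}
    n = len(users)
    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        # Every user receives the teleportation share first.
        new_rank = {u: (1.0 - damping) / n for u in users}
        for u in users:
            targets = follows.get(u, [])
            if targets:
                # Pass rank along each "follows" edge in equal parts.
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                # Dangling user (follows nobody): spread rank evenly.
                share = damping * rank[u] / n
                for v in users:
                    new_rank[v] += share
        rank = new_rank
    return rank


# Hypothetical follower graph: alice and bob follow carol, carol follows alice.
follows = {
    "alice": ["carol"],
    "bob": ["carol"],
    "carol": ["alice"],
}
scores = pagerank(follows)
```

The result is one score per user. Every tweet by the same user inherits the same value, so the measure cannot tell a good message from a poor one by that author.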

  7. Retweets • Sign of quality: interesting for a wider audience • Depends on: • Content • Social network (number of followers, activity of followers) • Content-based retweet prediction: odds of retweet as a sign of quality

  8. Retweets – Features

  9. Retweets – Prediction Model • Logistic regression • Model parameters learned on training data
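
A logistic regression model of this kind can be sketched in plain Python. Under such a model the odds p / (1 - p) of a retweet equal exp(w · x + b), so the learned weights translate directly into retweet odds. The two binary features, the toy training data, the learning rate and the function names below are assumptions for illustration only; the features actually used in the talk are not listed in this transcript.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this example.
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += error * xj
            grad_b += error
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def retweet_odds(x, w, b):
    """Odds p / (1 - p), which equal exp(w . x + b) under a logistic model."""
    return math.exp(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Invented toy data: [has_url, has_hashtag] -> retweeted (1) or not (0).
X = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
y = [1, 1, 0, 0, 1, 0]
w, b = train_logistic(X, y)
odds_with_url = retweet_odds([1, 0], w, b)
odds_without_url = retweet_odds([0, 1], w, b)
```

Odds above 1 mean a retweet is predicted to be more likely than not; odds below 1 mean the opposite.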

  10. Feature Weights

  11. Feature Weights – Topics

  12. Application: Tweet retrieval • Query: "beer"

  13. Application: Tweet retrieval • Rerank top-100 according to retweet-odds
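
The reranking step can be sketched as follows: take the first k results of an existing ranking, reorder them by descending retweet odds, and leave the rest untouched. The function name, the `odds_fn` parameter and the toy scoring rule are hypothetical; in practice `odds_fn` would be a trained retweet prediction model.

```python
def rerank_by_retweet_odds(ranked_tweets, odds_fn, k=100):
    """Reorder the top-k results by descending retweet odds; keep the tail as is."""
    head, tail = ranked_tweets[:k], ranked_tweets[k:]
    # sorted() is stable, so ties keep their original retrieval order.
    return sorted(head, key=odds_fn, reverse=True) + tail


def toy_odds(tweet):
    """Stand-in scoring rule for illustration: tweets with a link score higher."""
    return 2.0 if "http" in tweet else 0.5


# Invented results for the query "beer", in their original retrieval order.
results = [
    "beer is good",
    "new beer festival http://example.org",
    "beer",
]
reranked = rerank_by_retweet_odds(results, toy_odds, k=2)
```

Restricting the reordering to the top k keeps the original retrieval model in charge of relevance and uses retweet odds only to order tweets that are already relevant.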

  14. Application: Tweet retrieval – Evaluation

  15. Summary & Outlook • Microblogging • Data sparsity in short messages • Quality is an issue • Interestingness: (one) sign of quality • Use retweet odds for better ranking • Notion of content quality • Influence / potential influence of users

  16. Thank you! • Contact: WeST – Institute for Web Science and Technologies, Universität Koblenz-Landau, gottron@uni-koblenz.de