Challenges in Mining Social Media: Sparsity and Quality
Dagstuhl Seminar 11171 Challenges in Document Mining
Thomas Gottron gottron@uni-koblenz.de
Social Media • Definition from Wikipedia: Social media are media for social interaction, using highly accessible and scalable communication techniques. Social media is the use of web-based and mobile technologies to turn communication into interactive dialogue. … Social authority is developed when an individual or organization establishes themselves as an "expert" in their given field or area, thereby becoming an influencer in that field or area. • Prominent examples: • Media sharing, collaboration, reviews, communication • Microblogging • Twitter & Co. (also Facebook)
Microblogging – Twitter • Tweet by @janedoe: My dear @johndoe had troubles to wake up this #morning • Retweet passed on to the followers: RT @janedoe: My dear @johndoe had troubles to wake up this #morning
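The example uses three Twitter conventions: a retweet is prefixed with "RT @user:", "@name" mentions a user, and "#tag" marks a hashtag. A minimal Python sketch that splits a tweet along these conventions, assuming the simplified regular expressions below (real handle and hashtag rules are stricter; the function name `parse_tweet` is hypothetical):

```python
import re

# Simplified patterns (an assumption, not Twitter's exact rules):
# "RT @user: body" marks a retweet, "@name" a mention, "#tag" a hashtag.
RETWEET_RE = re.compile(r"^RT @(\w+):\s*(.*)$", re.DOTALL)
MENTION_RE = re.compile(r"@(\w+)")
HASHTAG_RE = re.compile(r"#(\w+)")


def parse_tweet(text):
    """Split a tweet into retweet source, body, mentions and hashtags."""
    match = RETWEET_RE.match(text)
    if match:
        source, body = match.group(1), match.group(2)
    else:
        source, body = None, text
    return {
        "retweet_of": source,
        "body": body,
        "mentions": MENTION_RE.findall(body),
        "hashtags": HASHTAG_RE.findall(body),
    }


rt = parse_tweet("RT @janedoe: My dear @johndoe had troubles to wake up this #morning")
print(rt["retweet_of"], rt["mentions"], rt["hashtags"])
# prints janedoe ['johndoe'] ['morning']
```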
Microblogging – Sparsity • Twitter: at most 140 characters per message, hence few terms • 85% of all tweets do not contain any term more than once
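To make the sparsity statistic concrete, the sketch below computes the share of messages in which no term occurs more than once. The tokenizer (lowercase, split on non-word characters) and the function names are assumptions; the toy sample only illustrates the computation and does not reproduce the 85% figure.

```python
import re
from collections import Counter


def tokenize(text):
    # Simplistic tokenizer (an assumption): lowercase, keep runs of word characters.
    return re.findall(r"\w+", text.lower())


def has_repeated_term(text):
    """True if at least one term occurs more than once in the message."""
    counts = Counter(tokenize(text))
    return any(c > 1 for c in counts.values())


def share_without_repeats(tweets):
    """Fraction of messages in which every term occurs at most once."""
    if not tweets:
        return 0.0
    return sum(not has_repeated_term(t) for t in tweets) / len(tweets)


sample = [
    "My kitten is pretending to be a laptop",
    "imon the phone rite now",
    "the best of the best",
]
print(share_without_repeats(sample))
# prints 0.6666666666666666 (two of the three messages repeat no term)
```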
Microblogging – Quality • Facets/aspects of quality: • Purpose (interaction, news propagation, etc.) • Presentation (humor, irony, etc.) • Language (writing style) • Interestingness • Example tweets: • Question: Which is the best Online RSS Reader? I need some recommendations, cheers everyone :) • My kitten is pretending to be a laptop • imon the phone rite now • Interesting timeline of major events in the history of information retrieval http://tinyurl.com/ya7rcqt
Measuring Quality? • (Social) network measures: • PageRank • Clustering coefficients • Centrality measures • These rate the quality of people, not of messages!
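As an illustration of why such measures rate people rather than messages, here is a minimal power-iteration PageRank over a hypothetical follower graph, assuming an edge from each user to every account they follow. The output is one score per user; no individual tweet receives a score. Damping 0.85 and 50 iterations are common defaults, not values from the slides.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    graph maps each user to the list of accounts they follow (assumed edge
    direction). Users without outgoing edges spread their score uniformly.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Score held by dangling users is redistributed over all users.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v, targets in graph.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical toy follower graph.
follows = {
    "alice": ["janedoe"],
    "bob": ["janedoe", "johndoe"],
    "johndoe": ["janedoe"],
    "janedoe": ["johndoe"],
}
scores = pagerank(follows)
top = max(scores, key=scores.get)
print(top)
# prints janedoe
```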
Retweets • Sign of quality: interesting for a wider audience • Whether a tweet is retweeted depends on: • Content • Social network • Number of followers • Activity of followers • Content-based retweet prediction: odds of a retweet as a sign of quality
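Content-based prediction needs features computed from the tweet text alone, independent of the author's network. The slides do not list the features used, so the ones below (URL, hashtags, mentions, question mark, length) are purely hypothetical choices, shown in a minimal Python sketch:

```python
import re


def content_features(text):
    """Map a tweet to a hypothetical content-only feature vector."""
    return [
        1.0,                                               # bias term
        1.0 if re.search(r"https?://\S+", text) else 0.0,  # contains a URL
        float(len(re.findall(r"#\w+", text))),             # number of hashtags
        float(len(re.findall(r"@\w+", text))),             # number of mentions
        1.0 if "?" in text else 0.0,                       # contains a question mark
        len(text) / 140.0,                                 # length relative to the 140-character limit
    ]


tweet = ("Interesting timeline of major events in the history of "
         "information retrieval http://tinyurl.com/ya7rcqt")
print(content_features(tweet)[:5])
# prints [1.0, 1.0, 0.0, 0.0, 0.0]
```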
Retweets – Prediction Model • Logistic regression • Model parameters learned on training data
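A minimal sketch of such a model: logistic regression fitted by batch gradient ascent on toy data. The training procedure, learning rate, epoch count and the two-feature data set are assumptions for illustration. Under logistic regression the odds p/(1 - p) of a retweet equal exp(w · x), so ranking by odds and ranking by predicted probability give the same order.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights w by batch gradient ascent on the mean log-likelihood."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, label in zip(X, y):
            error = label - sigmoid(dot(w, x))  # label: 1 = retweeted, 0 = not
            for j, xj in enumerate(x):
                grad[j] += error * xj
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def retweet_odds(w, x):
    """Odds p / (1 - p) of a retweet, equal to exp(w . x) under logistic regression."""
    return math.exp(dot(w, x))


# Toy training data: features are (bias, contains_url); labels mark retweeted tweets.
X = [[1, 1], [1, 1], [1, 1], [1, 0], [1, 0], [1, 0]]
y = [1, 1, 0, 1, 0, 0]
w = train_logistic_regression(X, y)
print(round(retweet_odds(w, [1, 1]), 2), round(retweet_odds(w, [1, 0]), 2))
# prints 2.0 0.5
```

On this toy data two of three tweets with a URL and one of three without are retweeted, so the fitted odds converge to 2/1 and 1/2.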
Application: Tweet retrieval • Query: "beer"
Application: Tweet retrieval • Re-rank the top-100 results according to their retweet odds
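The re-ranking step can be sketched as follows, assuming the baseline retrieval already returns tweets in ranked order. The scoring function and its weights are hypothetical stand-ins for a trained model, and the result strings are toy inputs, not actual results for the query. Only the top-k (k = 100 as on the slide) are re-sorted; the remaining results keep their baseline order.

```python
import math
import re

# Hypothetical weights for (bias, contains_url, number_of_hashtags);
# a real system would learn them from training data.
WEIGHTS = (-1.0, 1.2, 0.3)


def retweet_odds(text):
    """Odds exp(w . x) from a hypothetical logistic-regression model."""
    x = (
        1.0,
        1.0 if re.search(r"https?://\S+", text) else 0.0,
        float(len(re.findall(r"#\w+", text))),
    )
    return math.exp(sum(w * xi for w, xi in zip(WEIGHTS, x)))


def rerank(results, k=100):
    """Re-rank the top-k baseline results by descending retweet odds."""
    head, tail = results[:k], results[k:]
    # sorted() is stable (also with reverse=True), so equal odds keep baseline order.
    return sorted(head, key=retweet_odds, reverse=True) + tail


results = [
    "beer time",
    "Great #beer guide http://example.com/guide",
    "#beer #friday",
]
print(rerank(results))
# prints ['Great #beer guide http://example.com/guide', '#beer #friday', 'beer time']
```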
Summary & Outlook • Microblogging • Data sparsity in short messages • Quality is an issue • Interestingness: (one) sign of quality • Use retweet odds for better ranking • Notion of content quality • Influence / potential influence of users
Thank you! Contact: WeST – Institute for Web Science and Technologies Universität Koblenz-Landau firstname.lastname@example.org