Semantic Modelling of User Interests Based on Cross-Folksonomy Analysis Martin Szomszor, Harith Alani, Kieron O’Hara, Nigel Shadbolt University of Southampton Iván Cantador Universidad Autónoma de Madrid TAGora: Semiotic Dynamics of Online Social Communities EU-IST-2006-034721
Outline • Introduction and Motivation • Why is your folksonomy interaction useful? • How could it be exploited? • Architecture • Matching user accounts • Collecting Data • Tag Filtering • Profile Building • Experiment and Evaluation • Conclusions and Future Work
Introduction http://news.bbc.co.uk/ http://slashdot.org/ Dream Theater Metallica Rush delicious.com
Increasing number of online identities • A recent Ofcom study found that UK adults have, on average, 1.6 profiles; 39% of those with a profile have at least two • Many predict that in the near future individuals will have more than 10 profiles • [Ofcom 2008] Social Networking: A quantitative and qualitative research report into attitudes, behaviours, and use.
The Big Picture Profile of Interests delicious.com
Personalisation • Profiles could be exported to other sites to improve recommendation quality • Profiles could be used to support personalised searching • Better user experience (diagram labels: Profile of Interests, delicious.com)
Consolidation and Integration cuba cuba hotels holiday travel 2008 currency http://dbpedia.org/resource/Cuba http://dbpedia.org/resource/Travel http://dbpedia.org/resource/Holiday http://dbpedia.org/resource/Category:Tourism
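The slide does not say how tags are linked to DBpedia resources. As a minimal sketch, assuming each tag has already been resolved to an English Wikipedia article title (the `tag_links` mapping below is hypothetical), the DBpedia resource URI can be derived from that title, since DBpedia names its resources after Wikipedia articles:

```python
from urllib.parse import quote

# DBpedia resource URIs mirror English Wikipedia article titles.
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def dbpedia_uri(title: str) -> str:
    """Build a DBpedia resource URI from an already-resolved page title.

    Spaces become underscores and the first character is upper-cased,
    following Wikipedia title conventions. Other unsafe characters are
    percent-encoded; ':' is kept so category pages such as
    'Category:Tourism' stay readable.
    """
    name = title.strip().replace(" ", "_")
    name = name[:1].upper() + name[1:]
    return DBPEDIA_RESOURCE + quote(name, safe=":()")


# Hypothetical tag-to-title links based on the example on the slide.
tag_links = {"cuba": "Cuba", "travel": "Travel", "holiday": "Holiday"}
uris = {tag: dbpedia_uri(title) for tag, title in tag_links.items()}
```

The hard part, choosing the right title for an ambiguous tag, is deliberately left outside this helper.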
User Tagging delicious.com
Tag Clouds delicious.com
Tagging Variation: Filtered Tags, Raw Tags • Szomszor, M., Cantador, I. and Alani, H. (2008). Correlating User Profiles from Multiple Folksonomies. In: ACM Conference on Hypertext and Hypermedia, 2008, Pittsburgh, Pennsylvania.
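The filtering used in the cited paper is not detailed on this slide. One simple way to collapse surface variants of the same tag is sketched below; `normalise` and `filter_tags` are hypothetical helpers, and a fuller filter would likely also need to deal with plurals, misspellings and compound terms.

```python
from collections import Counter


def normalise(tag: str) -> str:
    """Lower-case a tag and keep only letters and digits.

    Variants such as 'New_York', 'new-york' and 'NewYork' all map to
    'newyork'.
    """
    return "".join(ch for ch in tag.lower() if ch.isalnum())


def filter_tags(raw_counts: dict) -> Counter:
    """Merge the usage counts of raw tags that share a normalised form."""
    merged = Counter()
    for tag, count in raw_counts.items():
        key = normalise(tag)
        if key:  # drop tags made only of punctuation
            merged[key] += count
    return merged


raw = {"New_York": 3, "new-york": 2, "NewYork": 1, "!!!": 4}
filtered = filter_tags(raw)  # Counter({'newyork': 6})
```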
Account Correlation • Using Google’s Social Graph API http://users.ecs.soton.ac.uk/mns2 account homepage delicious.com
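Google's Social Graph API is not reproduced here. The sketch below only illustrates the idea the slide depicts, linking a Delicious and a Flickr account when both point to the same homepage; the account records and usernames are hypothetical, and a real crawl would need more careful URL canonicalisation.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def canonical_home(url: str) -> str:
    """Reduce a homepage URL to host + path, ignoring scheme, 'www.' and a trailing slash."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return host + parts.path.rstrip("/")


def correlate(accounts):
    """Group (service, username, homepage) records by canonical homepage.

    Returns only homepages claimed by both a Delicious and a Flickr account.
    """
    by_home = defaultdict(dict)
    for service, username, homepage in accounts:
        by_home[canonical_home(homepage)][service] = username
    return {home: ids for home, ids in by_home.items()
            if "delicious" in ids and "flickr" in ids}


accounts = [
    ("delicious", "user_a", "http://users.ecs.soton.ac.uk/mns2"),
    ("flickr", "user_a_photos", "https://users.ecs.soton.ac.uk/mns2/"),
    ("flickr", "user_b", "http://example.org/~b"),
]
matches = correlate(accounts)
```

Only the host is lower-cased, because URL paths can be case-sensitive.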
Data Collection • Delicious • Custom Python scripts • Flickr • Using the public API • Only public information is harvested
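The slide only says that Flickr's public API was used. Assuming a tag-list method such as `flickr.tags.getListUser` is the relevant one (the slide does not name it), a request URL for Flickr's REST endpoint could be built as below; the API key and user id are placeholders, and the network call itself is left out.

```python
from urllib.parse import urlencode

FLICKR_REST = "https://api.flickr.com/services/rest/"


def flickr_tags_url(api_key: str, user_id: str) -> str:
    """Build a REST request URL asking for a user's tag list as plain JSON."""
    params = {
        "method": "flickr.tags.getListUser",
        "api_key": api_key,
        "user_id": user_id,
        "format": "json",
        "nojsoncallback": "1",  # plain JSON rather than a JSONP wrapper
    }
    return FLICKR_REST + "?" + urlencode(params)


# Placeholder key and a hypothetical user id.
url = flickr_tags_url("API_KEY", "12345@N00")
```

The resulting URL could then be fetched with `urllib.request.urlopen` and parsed with `json.loads`.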
Creating User Profiles • Three-stage process: • Identify the Wikipedia page • e.g. the tag London is matched with http://en.wikipedia.org/wiki/London • Extract the category list • Host cities of the Summer Olympic Games | Host cities of the Commonwealth Games | London | 1st century establishments | British capitals | Capitals in Europe | Port cities and towns in the United Kingdom • Select representative categories • Only choose categories that match the tag string • This excludes spurious categories such as: • Host cities of the Summer Olympic Games • Needs more sources
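The three stages can be sketched as follows. The slide does not define what "match the tag string" means, so this sketch assumes a case-insensitive substring test, and it simplifies stage 1 to building the URL directly from the tag; `wikipedia_url` and `representative_categories` are hypothetical helpers, and the category list is the London example above.

```python
WIKI_BASE = "http://en.wikipedia.org/wiki/"


def wikipedia_url(tag: str) -> str:
    """Stage 1 (simplified): guess the article URL directly from the tag."""
    title = tag.strip().replace(" ", "_")
    return WIKI_BASE + title[:1].upper() + title[1:]


def representative_categories(tag: str, categories: list) -> list:
    """Stage 3: keep only categories whose name contains the tag string."""
    needle = tag.strip().lower()
    return [c for c in categories if needle in c.lower()]


# Stage 2 output for the London example on the slide.
london_categories = [
    "Host cities of the Summer Olympic Games",
    "Host cities of the Commonwealth Games",
    "London",
    "1st century establishments",
    "British capitals",
    "Capitals in Europe",
    "Port cities and towns in the United Kingdom",
]
print(wikipedia_url("london"))                                 # http://en.wikipedia.org/wiki/London
print(representative_categories("london", london_categories))  # ['London']
```

Plain substring matching is crude: a short tag such as `art` would also match a category like `Martial arts`, so a word-boundary test may be preferable.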
Experiment Setup • Bootstrapped using 667,141 Delicious profiles obtained in previous work • Only accounts with a matching Flickr profile and more than 50 distinct tags were included • The final list contains 1,392 users
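The selection rule can be written as a simple predicate. The slide does not say whether the 50-tag threshold applies to each account or to both combined; this sketch assumes each account must exceed it, and `eligible` is a hypothetical name.

```python
from typing import Optional, Set

MIN_DISTINCT_TAGS = 50  # slide: "> 50 distinct tags"


def eligible(delicious_tags: Set[str], flickr_tags: Optional[Set[str]]) -> bool:
    """Keep a user only if a Flickr account was matched and both accounts
    use more than MIN_DISTINCT_TAGS distinct tags (assumed reading)."""
    if flickr_tags is None:  # no matching Flickr profile
        return False
    return (len(delicious_tags) > MIN_DISTINCT_TAGS
            and len(flickr_tags) > MIN_DISTINCT_TAGS)
```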
Evaluation • Four evaluation procedures: • The performance of the tag filtering and matching to Wikipedia entries • The difference between the most common categories found in Delicious and Flickr • The amount learnt from merging profiles from the two folksonomies • The accuracy of matching tags to Wikipedia categories
Global Category View • What are the differences in the interests that are learnt from each domain?
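One way to compare the interests learnt from each folksonomy is to count, for each site, how many users have each category, and then compare the top-n lists. This is an assumed approach rather than the one behind the slide, and the category names in the example are made up.

```python
from collections import Counter


def category_user_counts(profiles):
    """Count how many users' profiles contain each category."""
    counts = Counter()
    for categories in profiles:
        counts.update(set(categories))  # each category counted once per user
    return counts


def compare_domains(delicious_profiles, flickr_profiles, n=10):
    """Split the top-n categories of each site into shared and site-specific sets."""
    d = {c for c, _ in category_user_counts(delicious_profiles).most_common(n)}
    f = {c for c, _ in category_user_counts(flickr_profiles).most_common(n)}
    return {"both": d & f, "delicious_only": d - f, "flickr_only": f - d}


delicious = [["Software", "Programming"], ["Software", "Programming"], ["Software", "Travel"]]
flickr = [["Travel", "Photography"], ["Travel", "Photography"], ["Travel", "Software"]]
result = compare_domains(delicious, flickr, n=3)
```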
Learning More About Users • How much more can we learn by using multiple profiles?
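A simple measure of what merging adds is the number of categories in a user's Flickr profile that are absent from their Delicious profile, averaged over users. The slides do not state the exact baseline behind the "15 new concepts per user" figure, so this sketch, with hypothetical profiles, illustrates the idea rather than reproducing that result.

```python
from statistics import mean


def new_concepts(base, other):
    """Number of categories in `other` that `base` does not already contain."""
    return len(set(other) - set(base))


def average_gain(users):
    """Mean number of new categories Flickr adds to each user's Delicious profile.

    `users` is a list of (delicious_categories, flickr_categories) pairs.
    """
    return mean(new_concepts(d, f) for d, f in users)


users = [
    ({"Software", "Web design"}, {"Software", "Travel", "Cuba"}),
    ({"Photography"}, {"Photography"}),
]
gain = average_gain(users)  # (2 + 0) / 2 = 1
```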
Category Matching • How good is the category matching? • Take 100 random users and choose one Delicious tag and one Flickr tag from each • Classify each tag into one of three classes: • Correct • Unresolved (not matched to any category) • Ambiguous (disambiguation required)
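The sample contains 100 × 2 = 200 tags. Assuming the three classes are exhaustive and that the Future Work figures (22.5% unresolved, 13% ambiguous) refer to this sample, they correspond to 45 and 26 tags, leaving 129 tags (64.5%) classed as correct. The tally below checks that arithmetic; `class_shares` is a hypothetical helper.

```python
from collections import Counter

CLASSES = ("correct", "unresolved", "ambiguous")


def class_shares(labels):
    """Percentage of sampled tags in each evaluation class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {c: 100 * counts[c] / total for c in CLASSES}


# 200 labels reconstructed from the percentages under the stated assumption.
labels = ["correct"] * 129 + ["unresolved"] * 45 + ["ambiguous"] * 26
shares = class_shares(labels)  # {'correct': 64.5, 'unresolved': 22.5, 'ambiguous': 13.0}
```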
Conclusions • We have proposed a novel method for creating Profiles of Interest by exploiting an individual’s tagging activities across two popular folksonomy sites • Frequently used tags often specify areas of interest, but not always! • Common Delicious tags include daily, toread and howto • Flickr tags often include names of people • Expanding the analysis across folksonomies increases the amount learnt • On average, 15 new concepts per user
Future Work • Improve page matching • 22.5% of sample tags unresolved • Handle disambiguation • 13% of sample tags refer to ambiguous terms • Co-occurrence networks • Category hierarchy • Increase network coverage • We already have the data to include Last.fm • Understand which tags actually specify an interest of the individual • Filter out categories such as ‘Surname’
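The simplest baseline for discarding tags and categories that do not describe an interest is a fixed stop list, seeded here with the examples named on these slides (daily, toread, howto, and categories about surnames). This is a hypothetical sketch, not the method the authors propose; a hand-made stop list does not generalise, which is why the open question above remains.

```python
# Hypothetical stop lists built from the examples on the slides.
NON_INTEREST_TAGS = {"daily", "toread", "howto"}
NON_INTEREST_CATEGORY_TERMS = ("surname",)


def interest_tags(tags):
    """Drop tags that appear on the stop list (case-insensitive)."""
    return [t for t in tags if t.lower() not in NON_INTEREST_TAGS]


def interest_categories(categories):
    """Drop categories whose name contains a stop term, e.g. 'Surnames'."""
    return [c for c in categories
            if not any(term in c.lower() for term in NON_INTEREST_CATEGORY_TERMS)]
```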