Kwan Yi School of Library and Information Science

Mining a Web 2.0 service for the discovery of semantically similar terms: A case study with Del.icio.us. Kwan Yi, School of Library and Information Science, College of Communications and Information Studies, University of Kentucky. Social bookmarking: Del.icio.us.

Presentation Transcript


  1. Mining a Web 2.0 service for the discovery of semantically similar terms: A case study with Del.icio.us • Kwan Yi, School of Library and Information Science, College of Communications and Information Studies, University of Kentucky

  2. Social bookmarking: Del.icio.us • Del.icio.us is one of the most popular social bookmarking systems: • 3 million registered users and • 100 million unique URLs bookmarked, as of September 2007

  3. Folksonomy • We define folksonomy as a collective set of tags (keywords or terms) assigned by participants in a social tagging system. • User-created vocabulary • Uncontrolled vocabulary • Built in a collaborative manner

  4. Example: A folksonomy in Delicious.com • [Screenshot with labeled elements: resource title, resource URL, resource taggers, popular tags, tagging history]

  5. Objective of the Study • To examine an effective way of mining semantically similar terms from a folksonomy, in order to investigate the feasibility of folksonomies as a data source for semantically similar terms

  6. Proposed algorithms for mining similar terms from Folksonomy • Co-occurrence-based similarity algorithm • Correlation-based similarity algorithm

  7. Experiment (I) • To identify similar terms for each of the 121 most popular tags on Del.icio.us, as posted on May 15, 2008

  8. Result: How many similar terms for the 121 popular tags?
     • Co-occurrence-based algorithm
       • 2.6 similar terms (level of similarity = 0.9)
       • 5.1 similar terms (level of similarity = 0.7)
       • 10.1 similar terms (level of similarity = 0.5)
     • Correlation-based algorithm
       • 0.9 similar terms (level of similarity = 0.9)
       • 1.6 similar terms (level of similarity = 0.7)
       • 2.6 similar terms (level of similarity = 0.5)

  9. Experiment (II) • To identify similar terms for each of the 32 tags (out of the 121) that are not listed in the online version of the Merriam-Webster Dictionary

  10. Result: How many similar terms for the 32 not-in-the-dictionary tags?
     • Co-occurrence-based algorithm
       • 3.3 similar terms (level of similarity = 0.9)
       • 5.9 similar terms (level of similarity = 0.7)
       • 10.1 similar terms (level of similarity = 0.5)
     • Correlation-based algorithm
       • 1 similar term (level of similarity = 0.9)
       • 1.7 similar terms (level of similarity = 0.7)
       • 2.4 similar terms (level of similarity = 0.5)

  11. Webdesign (similarity level: 0.9) • Co-occurrence [12]: resources, css, web, design, reference, html, tutorial, tutorials, inspiration, gallery, development, webdev • Correlation [4]: css, design, html, inspiration

  12. Findings • The correlation-based algorithm is more selective than the co-occurrence-based algorithm. • The co-occurrence-based algorithm appears to be most attractive at a similarity level of 0.7.

  13. Conclusion • As social bookmarking systems become more widely used, the potential of their folksonomies for this mining task will increase.

  14. Thanks!

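  15. Co-occurrence-based similarity algorithm (identifying similar terms of the term W)
     • Step 1: Tag lists, with counts, of four resources tagged with W:
       • W (100), A (50), B (20), C (10)
       • W (87), B (57), C (40), A (30)
       • W (1032), A (250), F (120), D (78)
       • W (37), A (29), B (16), F (9)
     • Step 2: Number of these lists in which each term co-occurs with W: A (4), B (3), C (2), F (2), D (1)
     • Step 3: Resulting similarities (in this example, s equals the Step 2 count divided by the four lists): CoSA(s=1: A → W), CoSA(s=0.75: B → W), CoSA(s=0.5: C → W), CoSA(s=0.5: F → W), CoSA(s=0.25: D → W)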
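
The worked example on this slide can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not the author's implementation: the function names `cosa_scores` and `similar_terms` are hypothetical, s is taken to be the fraction of W's tag lists in which a term appears (which matches the numbers on the slide), and a term is treated as similar when s is at least the chosen level.

```python
from collections import Counter

def cosa_scores(target, tag_lists):
    """Map each co-occurring term to the fraction of the target's tag lists
    in which it appears (assumed reading of the slide's s value)."""
    with_target = [tags for tags in tag_lists if target in tags]
    if not with_target:
        return {}
    # Count each term once per list; the per-tag counts themselves are not used.
    counts = Counter(
        term for tags in with_target for term in set(tags) if term != target
    )
    return {term: n / len(with_target) for term, n in counts.items()}

def similar_terms(target, tag_lists, level):
    """Terms whose score reaches the similarity level (assumption: s >= level)."""
    scores = cosa_scores(target, tag_lists)
    return sorted(term for term, s in scores.items() if s >= level)

# The four tag lists shown on the slide, as {tag: count} dictionaries.
slide_lists = [
    {"W": 100, "A": 50, "B": 20, "C": 10},
    {"W": 87, "B": 57, "C": 40, "A": 30},
    {"W": 1032, "A": 250, "F": 120, "D": 78},
    {"W": 37, "A": 29, "B": 16, "F": 9},
]

scores = cosa_scores("W", slide_lists)
print(scores)
print(similar_terms("W", slide_lists, 0.75))
```

With the slide's data, A scores 1.0, B 0.75, C and F 0.5, and D 0.25, so only A and B pass at a level of 0.75.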

  16. Correlation-based similarity algorithm • Term X is said to be similar to term W on the basis of the correlation-based algorithm: CrSA(s: X → W) • CrSA(s: X → W) holds only if both CoSA(s: X → W) and CoSA(s: W → X) are satisfied.
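
The two-way condition on this slide can be sketched as follows. This is a hedged illustration, not the study's code: `cosa`, `crsa`, and the small `corpus` of tag sets are hypothetical, the tag lists for a term are assumed to be all resources carrying that term, and a CoSA relation is assumed to be satisfied when its score is at least the chosen level.

```python
def cosa(x, w, resources):
    """Fraction of the resources tagged with w that are also tagged with x."""
    with_w = [tags for tags in resources if w in tags]
    if not with_w:
        return 0.0
    return sum(1 for tags in with_w if x in tags) / len(with_w)

def crsa(x, w, resources, level):
    """True only if the co-occurrence test passes in both directions."""
    return cosa(x, w, resources) >= level and cosa(w, x, resources) >= level

# Hypothetical corpus: each set holds the tags of one bookmarked resource.
corpus = [
    {"webdesign", "css", "reference"},
    {"webdesign", "css", "reference"},
    {"webdesign", "css", "reference"},
    {"webdesign", "css", "html"},
    {"python", "reference"},
    {"java", "reference"},
    {"linux", "reference"},
]

print(crsa("css", "webdesign", corpus, 0.7))        # both directions pass
print(crsa("reference", "webdesign", corpus, 0.7))  # the reverse direction fails
```

In this made-up corpus, "reference" appears on 3 of the 4 "webdesign" resources (0.75), but "webdesign" appears on only 3 of the 6 "reference" resources (0.5). The one-way test accepts the pair at a level of 0.7 while the two-way test rejects it, which is consistent with the finding that the correlation-based algorithm is more selective.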
