CUBELSI : AN EFFECTIVE AND EFFICIENT METHOD FOR SEARCHING RESOURCES IN SOCIAL TAGGING SYSTEMS


Presentation Transcript


  1. CUBELSI: AN EFFECTIVE AND EFFICIENT METHOD FOR SEARCHING RESOURCES IN SOCIAL TAGGING SYSTEMS Bin Bi, Sau Dan Lee, Ben Kao, Reynold Cheng The University of Hong Kong {bbi, sdlee, kao, ckcheng}@cs.hku.hk

  2. SOCIAL TAGGING SYSTEMS Tags

  3. SEARCH IN SOCIAL TAGGING SYSTEMS • Two Problems: • Tag Inconsistency • A Multitude of Aspects

  4. Tag Inconsistency • car? automobile? • Example tag sets given to similar resources: "car"; "car, Benz"; "car, automobile"; "Audi"; "automobile"

  5. A Multitude of Aspects • Example tag sets covering different aspects: "moon, worm moon, Perigee moon, lunar"; "cherry blossoms, Sakura, cherry blossom"; "Nikon, astrophotography, D40"

  6. SOLUTION • Analyzing semantic relations among tags by taking into account the role of taggers • LSI (Latent Semantic Indexing), built on SVD (Singular Value Decomposition) • Adding taggers: CubeLSI, built on Tucker Decomposition

  7. PROPOSED RANKING FRAMEWORK CubeLSI Algorithm: Input: tag assignments Output: pairwise tag semantic distances

  8. CONCEPT DISTILLATION • Tags with pairwise distances: photo, photos, music, mp3, video, movie • Concepts/Clusters: {photo, photos}, {music, mp3}, {video, movie}
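
The grouping of tags into concepts can be sketched as follows. The slide does not say which clustering algorithm is used, so this example substitutes a simple threshold-based single-linkage grouping (via union-find) as a stand-in. The function name `distill_concepts`, the threshold, and all distance values are hypothetical.

```python
def distill_concepts(tags, distances, threshold):
    """Group tags whose pairwise distance is at most `threshold`.

    Stand-in clustering (single linkage via union-find); the deck does
    not specify the actual clustering method.
    """
    parent = {t: t for t in tags}

    def find(t):
        # Follow parent links to the cluster representative.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for (a, b), d in distances.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for t in tags:
        clusters.setdefault(find(t), []).append(t)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical pairwise distances; unlisted pairs are treated as far apart.
tags = ["photo", "photos", "music", "mp3", "video", "movie"]
distances = {
    ("photo", "photos"): 0.10,
    ("music", "mp3"): 0.20,
    ("video", "movie"): 0.15,
    ("photo", "music"): 0.90,
    ("music", "video"): 0.80,
}
concepts = distill_concepts(tags, distances, threshold=0.3)
```

Each resulting cluster plays the role of one distilled concept.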

  9. PROPOSED RANKING FRAMEWORK

  10. BAG-OF-CONCEPTS REPRESENTATION Distilled Concepts
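
A minimal sketch of a bag-of-concepts representation, assuming each tag maps to exactly one distilled concept and that a resource's vector counts how often each concept occurs among its tags. The concept labels, the mapping, and the function name `bag_of_concepts` are hypothetical.

```python
from collections import Counter

# Hypothetical mapping from tags to distilled concepts.
tag_to_concept = {
    "photo": "concept_photo", "photos": "concept_photo",
    "music": "concept_music", "mp3": "concept_music",
    "video": "concept_video", "movie": "concept_video",
}


def bag_of_concepts(tags, tag_to_concept):
    """Count concept occurrences for a list of tags; unknown tags are skipped."""
    return Counter(tag_to_concept[t] for t in tags if t in tag_to_concept)


# Tags attached to one resource (possibly by several users).
resource_tags = ["music", "mp3", "mp3", "video", "unknown_tag"]
vector = bag_of_concepts(resource_tags, tag_to_concept)
```

Synonymous tags such as "music" and "mp3" then contribute to the same dimension, which is the point of moving from words to concepts.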

  11. PROPOSED RANKING FRAMEWORK

  12. PROPOSED RANKING FRAMEWORK

  13. RANKING SEARCH RESULTS • Search results are sorted in descending order of their cosine similarity scores. • [Figure: the query, Resource 1 and Resource 2 drawn as vectors in a space with axes x, y, z]
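
The ranking step can be illustrated with a short sketch, assuming the query and each resource are sparse concept vectors stored as dictionaries. The resource names, concept names, and weights are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors given as {concept: weight}."""
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, resources):
    """Return (resource, score) pairs sorted by descending cosine similarity."""
    scored = [(name, cosine(query, vec)) for name, vec in resources.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical bag-of-concepts vectors.
resources = {
    "resource_1": {"music": 3.0, "video": 1.0},
    "resource_2": {"photo": 2.0, "video": 2.0},
}
query = {"music": 1.0}
ranking = rank(query, resources)
```

Here `resource_1` ranks first because it shares the "music" concept with the query, while `resource_2` shares nothing and scores zero.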

  14. PROPOSED RANKING FRAMEWORK CubeLSI Algorithm: Input: tag assignments Output: pairwise tag semantic distances

  15. CUBELSI • Tensor • Second-order Tensor (a matrix) • Third-order Tensor

  16. REPRESENTING DATA AS A THIRD-ORDER TENSOR
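
One way to build such a tensor from tag assignments is sketched below, assuming binary entries (1 if the user assigned the tag to the resource, 0 otherwise) and the mode order user × tag × resource. Both choices, the function name `build_tensor`, and the sample data are assumptions made for illustration.

```python
import numpy as np


def build_tensor(assignments):
    """Build a binary user x tag x resource tensor from (user, tag, resource) triples."""
    users = sorted({u for u, _, _ in assignments})
    tags = sorted({t for _, t, _ in assignments})
    resources = sorted({r for _, _, r in assignments})
    u_idx = {u: i for i, u in enumerate(users)}
    t_idx = {t: i for i, t in enumerate(tags)}
    r_idx = {r: i for i, r in enumerate(resources)}

    tensor = np.zeros((len(users), len(tags), len(resources)))
    for u, t, r in assignments:
        tensor[u_idx[u], t_idx[t], r_idx[r]] = 1.0  # duplicates collapse to 1
    return tensor, users, tags, resources


# Hypothetical tag assignments; the last one repeats an earlier triple.
assignments = [
    ("alice", "car", "img1"),
    ("alice", "automobile", "img1"),
    ("bob", "car", "img1"),
    ("bob", "audi", "img2"),
    ("bob", "car", "img1"),
]
tensor, users, tags, resources = build_tensor(assignments)
```

For real datasets a dense array would be far too large, so a sparse representation would be needed in practice.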

  17. PAIRWISE TAG DISTANCE • Two sources of noise: • An absent tag assignment may not result from the user considering the tag to be irrelevant to the resource • Tagging is a casual and ad-hoc activity

  18. TUCKER DECOMPOSITION • [Figure: the original tensor (User × Tag × Resource) is decomposed into a core tensor and three factor matrices (User, Tag, Resource), which together give the purified tensor] • Purified Tag Distance:
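
A rough sketch of the decomposition and of a tag distance on the purified tensor. The transcript does not say how the Tucker decomposition is computed, so this example uses a truncated higher-order SVD (HOSVD) as a stand-in. The distance is taken to be the Frobenius norm of the difference between two tag slices, which is what the cost slide suggests. All function names and the random data are hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: a matrix whose rows are indexed by the given mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_dot(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along `mode` (matrix shape: new_dim x old_dim)."""
    moved = np.moveaxis(tensor, mode, -1)
    return np.moveaxis(moved @ matrix.T, -1, mode)


def truncated_hosvd(tensor, ranks):
    """Stand-in Tucker decomposition: returns (core tensor, factor matrices)."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])  # leading left singular vectors
    core = tensor
    for mode, factor in enumerate(factors):
        core = mode_dot(core, factor.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Purified tensor: the core multiplied by every factor matrix."""
    out = core
    for mode, factor in enumerate(factors):
        out = mode_dot(out, factor, mode)
    return out


def purified_tag_distance(purified, i, j):
    """Frobenius norm of the difference between tag slices i and j (mode 1 = tags)."""
    return float(np.linalg.norm(purified[:, i, :] - purified[:, j, :]))


rng = np.random.default_rng(0)
original = rng.random((4, 3, 5))           # users x tags x resources
core, factors = truncated_hosvd(original, (2, 2, 3))
purified = reconstruct(core, factors)
d01 = purified_tag_distance(purified, 0, 1)
```

Keeping only a few components per mode is what discards noise. With full ranks the reconstruction simply reproduces the original tensor.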

  19. SPACE & TIME COSTS • Last.fm dataset (3897 users, 3326 tags, 2849 resources) • Space cost: the purified tensor has 36.9 billion entries (3897 × 3326 × 2849) • Computational cost: each tag pair involves 11.1 million entries (3897 × 2849), so computing the Frobenius norm for EACH tag pair requires 11.1 million subtractions, squarings and additions • There are 5.5 million tag pairs for 3326 tags • The amount of computation needed would be prohibitively large
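
The figures on this slide can be checked with a few lines of arithmetic, using only the dataset sizes quoted above. The variable names are illustrative.

```python
users, tags, resources = 3897, 3326, 2849

# Entries in a dense user x tag x resource tensor.
tensor_entries = users * tags * resources       # 36,927,091,278 (about 36.9 billion)

# Entries compared when taking the distance between two tag slices.
entries_per_tag_pair = users * resources        # 11,102,553 (about 11.1 million)

# Number of unordered tag pairs.
tag_pairs = tags * (tags - 1) // 2              # 5,529,475 (about 5.5 million)

# Entry-level operations if every pair is evaluated directly.
total_entry_operations = entries_per_tag_pair * tag_pairs
```

The product in the last line is on the order of 6 × 10^13 entry comparisons, which is why evaluating every pair directly is considered impractical.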

  20. SHORT-CUT TO EVALUATING THE PURIFIED TAG DISTANCE • Evaluating the distance directly from the purified tensor is impractical • The new formula depends only on the core tensor and a factor matrix • There is no need to compute any entries of the purified tensor • The relatively low dimensions of the quantities in the new formula mean far fewer computations are needed • The formula uses a matrix that can be readily computed from the core tensor
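
The formula itself did not survive in the transcript, so the sketch below is a reconstruction under stated assumptions rather than the authors' exact statement. It assumes the user and resource factor matrices have orthonormal columns. Under that assumption, the Frobenius distance between two tag slices of the purified tensor equals sqrt(d Σ dᵀ), where d is the difference of the two tags' rows in the tag factor matrix and Σ is the tag-mode unfolding of the core tensor multiplied by its own transpose. All names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_tags, n_resources = 6, 5, 7
r_users, r_tags, r_resources = 3, 2, 4

# Assumption: user and resource factor matrices have orthonormal columns.
U, _ = np.linalg.qr(rng.standard_normal((n_users, r_users)))
R, _ = np.linalg.qr(rng.standard_normal((n_resources, r_resources)))
T = rng.standard_normal((n_tags, r_tags))        # tag factor matrix
core = rng.standard_normal((r_users, r_tags, r_resources))

# Purified tensor, built here only to check the shortcut against it.
purified = np.einsum("abc,ua,tb,rc->utr", core, U, T, R)


def direct_distance(i, j):
    """Frobenius norm over all user x resource entries of two tag slices."""
    return float(np.linalg.norm(purified[:, i, :] - purified[:, j, :]))


# Small (r_tags x r_tags) matrix computed once from the core tensor.
core_tag_unfolding = np.moveaxis(core, 1, 0).reshape(r_tags, -1)
sigma = core_tag_unfolding @ core_tag_unfolding.T


def shortcut_distance(i, j):
    """Same distance using only the tag factor rows and sigma."""
    diff = T[i] - T[j]
    return float(np.sqrt(max(diff @ sigma @ diff, 0.0)))


pairs = [(i, j) for i in range(n_tags) for j in range(i + 1, n_tags)]
max_gap = max(abs(direct_distance(i, j) - shortcut_distance(i, j)) for i, j in pairs)
```

The shortcut touches only vectors of length `r_tags` and one small matrix per pair, instead of a full user × resource slice.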

  21. EXPERIMENTAL RESULTS • Dataset statistics: #users, #records, #resources, #tags

  22. SAMPLE TAG CLUSTERS

  23. OTHER RANKING METHODS • Freq: Resources are ranked in descending order of # of users who annotate the resource with query tags. • BOW (Bag-of-Words) : Use IR; each resource is a document and each tag is a word. • FolkRank [Hotho et al. 2006]: A modified version of PageRank. It follows the assumption that votes cast by important users with important tags would make the annotated resources important.
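
The Freq baseline can be sketched in a few lines. The slide does not say whether a user must have used all query tags or just one of them; this example assumes any one query tag is enough. The function name `freq_rank` and the sample data are hypothetical.

```python
from collections import defaultdict


def freq_rank(assignments, query_tags):
    """Rank resources by the number of distinct users who tagged them with a query tag."""
    query_tags = set(query_tags)
    users_per_resource = defaultdict(set)
    for user, tag, resource in assignments:
        if tag in query_tags:  # assumption: any single query tag counts
            users_per_resource[resource].add(user)
    # Descending by user count, ties broken by resource name for determinism.
    return sorted(users_per_resource,
                  key=lambda r: (-len(users_per_resource[r]), r))


# Hypothetical (user, tag, resource) tag assignments.
assignments = [
    ("alice", "car", "img1"),
    ("bob", "car", "img1"),
    ("bob", "car", "img2"),
    ("carol", "automobile", "img3"),
]
freq_result = freq_rank(assignments, ["car"])
```

Note that `img3` is never returned for the query "car" even though "automobile" means the same thing, which is the tag inconsistency problem described earlier.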

  24. OTHER RANKING METHODS • LSI: This method projects the third-order tensor onto a 2D tag-resource matrix, and then applies traditional LSI on the tag-resource matrix using SVD. • CubeSim: This method is similar to CubeLSI except that it computes the distance between two tags directly from the original tensor rather than from the purified tensor.
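
A minimal sketch of the LSI baseline. It assumes the projection onto the tag-resource matrix is done by summing the tensor over users, and that tags are compared by Euclidean distance in the truncated SVD space. Neither detail is stated on the slide, and the function names are hypothetical.

```python
import numpy as np


def lsi_tag_embeddings(tensor, k):
    """Project a user x tag x resource tensor to tag x resource, then keep k SVD dimensions."""
    tag_resource = tensor.sum(axis=0)            # assumption: sum over users
    u, s, _ = np.linalg.svd(tag_resource, full_matrices=False)
    return u[:, :k] * s[:k]                      # one k-dimensional row per tag


def lsi_tag_distance(embeddings, i, j):
    """Euclidean distance between two tag embeddings."""
    return float(np.linalg.norm(embeddings[i] - embeddings[j]))


# Hypothetical data: 2 users, 3 tags, 2 resources.
# Tags 0 and 1 are both applied to resource 0; tag 2 only to resource 1.
tensor = np.zeros((2, 3, 2))
tensor[0, 0, 0] = tensor[1, 0, 0] = 1.0
tensor[0, 1, 0] = tensor[1, 1, 0] = 1.0
tensor[0, 2, 1] = 1.0

embeddings = lsi_tag_embeddings(tensor, k=2)
near = lsi_tag_distance(embeddings, 0, 1)
far = lsi_tag_distance(embeddings, 0, 2)
```

Summing over users discards who applied each tag, which is the information CubeLSI keeps by working on the full tensor.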

  25. RANKING QUALITY • 16 users, each proposing 8 queries • Evaluation Metric • Normalized Discounted Cumulative Gain (NDCG) • NDCG rewards relevant resources that are top-ranked more heavily than those that appear lower down in the list. • The metric is evaluated only on the top-ranked resources in the list, uses the relevance level of the resource at each rank, and includes a normalization factor chosen so that the optimal ranking's NDCG score is 1.
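
The exact formula is missing from the transcript. The sketch below uses one common NDCG variant, with gain 2^rel − 1 and a log2(rank + 1) discount, and it normalizes by the best ordering of the same relevance list. All three choices are assumptions and may differ from the authors' definition.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k relevance levels."""
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances, k):
    """DCG@k divided by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(relevances, k) / ideal


# Relevance levels of the returned resources, in ranked order (made-up values).
perfect = ndcg_at_k([3, 2, 1, 0], k=4)
reversed_order = ndcg_at_k([0, 1, 2, 3], k=4)
```

A ranking that places the most relevant resources first scores 1, and the same resources in reverse order score lower.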

  26. RANKING QUALITY (Delicious)

  27. RANKING QUALITY (Bibsonomy)

  28. RANKING QUALITY (Last.fm)

  29. EFFICIENCY • Offline: pre-processing times (hours) • Online: query processing times (seconds) • Storage size:

  30. RELATED WORK • Matrix Factorization • Our work differs from MF in two ways: • We aim at capturing semantic relations among tags. • We deal with a three-dimensional tensor. • Hotho et al. 2006 • Our work differs from FolkRank in that our approach performs offline semantic analysis, which allows online query processing to be efficiently done. • Wu et al. 2006 • Our approach is technically different from that work. • Bi et al. 2009 • Our approach scales to large social tagging databases, which the previous work is unable to handle.

  31. CONCLUSIONS • We introduce a novel tag-based framework for searching resources in social tagging systems. • We study the role of taggers in search quality for social tagging systems. • We propose CubeLSI, which is a 3D extension of LSI, for semantic analysis over the third-order tensor of resources, taggers, and tags. • We present a comprehensive empirical evaluation of CubeLSI against a number of ranking methods on real datasets.

  32. THANK YOU! Bin Bi, Sau Dan Lee, Ben Kao, Reynold Cheng The University of Hong Kong {bbi, sdlee, kao, ckcheng}@cs.hku.hk
