Document Recommendation in Social Tagging Services

Z. Guan, C. Wang, J. Bu, C. Chen, K. Yang, D. Cai, and X. He. Zhejiang University, China. WWW 2010. July 22, 2010. Hyunwoo Kim.


Presentation Transcript


  1. Document Recommendation in Social Tagging Services
     Z. Guan, C. Wang, J. Bu, C. Chen, K. Yang, D. Cai, and X. He
     Zhejiang University, China
     WWW 2010
     July 22, 2010
     Hyunwoo Kim

  2. Contents
     • Introduction
     • Multi-type Interrelated Objects Embedding
     • Experiments
     • Conclusion

  3. Introduction [1/5]
     • Social tagging services
       • Allowing users to annotate various online resources with tags
       • Helping users find and organize online resources
       • Providing meaningful collaborative semantic data
     • Recommender systems
       • Traditional studies focus on user rating data
       • Social tagging data has recently become more and more prevalent
     • In this paper
       • The problem of document recommendation using purely tagging data

  4. Introduction [2/5]
     • Searching in most tagging services
       • Keyword-based search
       • The number of returned results is very large
       • Returning resources which literally match the given tags
       • Ignoring semantically related tags
         • Searching for “automobile”: resources tagged with “car” may not be retrieved
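The limitation of literal tag matching can be illustrated with a minimal sketch. The resource names and tags below are hypothetical toy data, and `keyword_search` is an illustrative helper, not the search routine of any particular tagging service.

```python
# Hypothetical toy data: resource -> set of tags applied to it.
RESOURCE_TAGS = {
    "doc_a": {"automobile", "review"},
    "doc_b": {"car", "review"},
    "doc_c": {"cooking"},
}


def keyword_search(query_tag, resource_tags):
    """Return resources whose tag set literally contains query_tag."""
    return sorted(r for r, tags in resource_tags.items() if query_tag in tags)


# doc_b is tagged "car" only, so a literal search for "automobile" misses it.
print(keyword_search("automobile", RESOURCE_TAGS))  # ['doc_a']
```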

  5. Introduction [3/5]
     • Differences between tagging data and rating data
       • Tagging data doesn’t have users’ explicit preference information on resources
       • Tagging data: user, tag and resource
       • Rating data: user and resource
     • Collaborative filtering method

  6. Introduction [4/5]
     • Multi-type Interrelated Objects Embedding (MIOE)
       • Annotation relationships between tags and documents
       • Usage relationships between tags and users
       • Bookmarking relationships between users and documents
       • Affinity relationships among documents
       • 3 bipartite graphs and 1 affinity graph
     • Optimal semantic space
       • Preserving the connectivity structure of these graphs
       • Representing users, tags and documents in the same space
     if (two objects are strongly connected) {
         the corresponding edge has a high weight;
         the two objects should be mapped close to each other in the space;
     }
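As a rough illustration of the three bipartite graphs, the sketch below derives edge weights from tagging records. The records are hypothetical, and using raw co-occurrence counts as edge weights is an assumption; the slides do not state how the weights are computed. The document affinity graph is omitted because it needs a separate similarity measure between documents.

```python
from collections import Counter

# Hypothetical tagging records: (user, tag, document).
RECORDS = [
    ("u1", "python", "d1"),
    ("u1", "python", "d2"),
    ("u2", "python", "d1"),
    ("u2", "graphs", "d3"),
]


def build_graphs(records):
    """Count co-occurrences to obtain edges for the three bipartite graphs."""
    usage = Counter()       # (user, tag): how often the user applied the tag
    annotation = Counter()  # (tag, doc): how often the tag was applied to the doc
    bookmarks = set()       # (user, doc): the user bookmarked the doc
    for user, tag, doc in records:
        usage[(user, tag)] += 1
        annotation[(tag, doc)] += 1
        bookmarks.add((user, doc))
    return usage, annotation, bookmarks


usage, annotation, bookmarks = build_graphs(RECORDS)
print(usage[("u1", "python")], annotation[("python", "d1")])
```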

  7. Introduction [5/5]
     • Goal of MIOE
       • Given a user, the closest documents which have not been bookmarked by this user are recommended to her
       • Naturally capturing the correlations among tags
       • Applicable to any social tagging data as long as a notion of similarity between resources is defined
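The recommendation step can be sketched as a nearest-neighbour lookup. The 2-D coordinates below are hypothetical stand-ins for a learned semantic space, and `recommend` is an illustrative helper rather than the authors' implementation.

```python
import math


def recommend(user, coords_users, coords_docs, bookmarked, k=2):
    """Return the k documents nearest to the user that she has not bookmarked."""
    seen = bookmarked.get(user, set())
    candidates = [d for d in coords_docs if d not in seen]
    candidates.sort(key=lambda d: math.dist(coords_users[user], coords_docs[d]))
    return candidates[:k]


# Hypothetical coordinates standing in for a learned semantic space.
USERS = {"u1": (0.0, 0.0)}
DOCS = {"d1": (0.1, 0.0), "d2": (0.5, 0.5), "d3": (0.2, 0.1), "d4": (2.0, 2.0)}
BOOKMARKED = {"u1": {"d1"}}

# d1 is the closest document but is already bookmarked, so it is skipped.
print(recommend("u1", USERS, DOCS, BOOKMARKED))  # ['d3', 'd2']
```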

  8. Contents
     • Introduction
     • Multi-type Interrelated Objects Embedding
     • Experiments
     • Conclusion

  9. Multi-type Interrelated Objects Embedding [1/7]
     The basic intuition behind MIOE
     if (a user u has used a tag t many times) {
         she has a strong interest in the topic represented by the tag t;
     }
     if (t has been applied to document d many times) {
         d is strongly related to the topic represented by t;
     }
     We should recommend such a document d to the user u.
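The intuition above can be sketched as a two-hop score through tags. This is not MIOE itself, only an illustration of the user → tag → document reasoning; the counts are hypothetical and the product-sum scoring is an assumption made for the example.

```python
def tag_path_scores(user, usage, annotation):
    """Score each document by summing usage(user, t) * annotation(t, d) over tags t."""
    scores = {}
    for tag, n_used in usage.get(user, {}).items():
        for doc, n_applied in annotation.get(tag, {}).items():
            scores[doc] = scores.get(doc, 0) + n_used * n_applied
    return scores


# Hypothetical counts: how often u1 used each tag, and how often each tag
# was applied to each document.
USAGE = {"u1": {"python": 5, "graphs": 1}}
ANNOTATION = {"python": {"d1": 4, "d2": 1}, "graphs": {"d2": 2, "d3": 3}}

scores = tag_path_scores("u1", USAGE, ANNOTATION)
print(sorted(scores, key=scores.get, reverse=True))  # ['d1', 'd2', 'd3']
```

Here d1 ranks first because u1 used "python" often and "python" was applied to d1 often.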

  10. MIOE [2/7] - Learning the Optimal Semantic Space

  11. MIOE [3/7] - Learning the Optimal Semantic Space
      [Figure: users, tags and documents plotted as points in a shared space with axes x, y and z]
      • Representing users, tags and documents in the same space
      • Two strongly connected objects should be mapped close to each other in the learned space

  12. MIOE [4/7] - Learning the Optimal Semantic Space
      • The problem
        • Finding a semantic space for users, tags and documents which best preserves the connectivity structures of the graphs
          • Annotation relationship, usage relationship, bookmarking relationship and affinity relationship
        • Given a user, recommending a list of documents in which the user would be interested with the highest probabilities
      M. Belkin et al., “Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering”, Advances in Neural Information Processing Systems 14, 2001
      W. Min et al., “Locality Pursuit Embedding”, Pattern Recognition 37, 2004
      X. He et al., “Learning a Maximum Margin Subspace for Image Retrieval”, IEEE Transactions on Knowledge and Data Engineering 20, 2008

  13. MIOE [5/7] - Learning the Optimal Semantic Space
      • Projections*
        • PCA (Principal Component Analysis)
        • LPE (Locality Pursuit Embedding)
      * W. Min et al., “Locality Pursuit Embedding”, Pattern Recognition 37, 2004

  14. MIOE [6/7] - Learning the Optimal Semantic Space
      [Figure: two points A and B shown in one, two and three dimensions: A(a), B(b); A(a1, a2), B(b1, b2); A(a1, a2, a3), B(b1, b2, b3)]
      • Distance metric: Euclidean distance
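A minimal sketch of the Euclidean distance between two points A and B, written so that the same function covers the one-, two- and three-dimensional cases on the slide. The sample coordinates are arbitrary.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points with the same number of coordinates."""
    if len(a) != len(b):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(euclidean((1.0,), (4.0,)))                    # 3.0
print(euclidean((0.0, 0.0), (3.0, 4.0)))            # 5.0
print(euclidean((1.0, 2.0, 3.0), (3.0, 5.0, 9.0)))  # 7.0
```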

  15. MIOE [7/7] - Learning the Optimal Semantic Space
      • In practice
        • New objects will continually join the tagging data
        • Re-computing the optimal space for each new object is costly
      • Solution
        • Approximating the positions of new objects in the learned space by using approximated eigenfunctions based on the kernel trick*
        • Re-computing the optimal space periodically
      * Y. Bengio et al., “Out-of-sample extensions for LLE, Isomap, MDS, Eigenmaps, and spectral clustering”, Advances in Neural Information Processing Systems 16, 2003

  16. Contents
      • Introduction
      • Multi-type Interrelated Objects Embedding
      • Experiments
      • Conclusion

  17. Experiments [1/6]
      • Data sets: Del.icio.us and CiteULike
      • Compared algorithms
        • User-CF: a version of the user-based CF algorithm for unary data
        • Funk-SVD: Singular Value Decomposition to approximate the original user-item matrix using a low-rank matrix
        • TVS: Tag Vector Similarity to represent users and documents in the tag space as TF-IDF tag profile vectors
        • CVS: Content Vector Similarity to maintain multiple for a user to better capture the user’s interests
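The TVS baseline can be sketched roughly as follows. The slide only says that users and documents are represented as TF-IDF tag profile vectors, so the details here are assumptions: raw tag counts as term frequency, IDF computed over documents as log(N / df), and cosine similarity for ranking. All data and helper names are hypothetical.

```python
import math


def idf_weights(doc_tags):
    """IDF per tag over the document collection: log(N / document frequency)."""
    n = len(doc_tags)
    df = {}
    for counts in doc_tags.values():
        for tag in counts:
            df[tag] = df.get(tag, 0) + 1
    return {tag: math.log(n / freq) for tag, freq in df.items()}


def profile(counts, idf):
    """TF-IDF tag profile: raw tag count times the tag's IDF weight."""
    return {tag: c * idf.get(tag, 0.0) for tag, c in counts.items()}


def cosine(p, q):
    """Cosine similarity between two sparse tag profiles."""
    dot = sum(w * q.get(tag, 0.0) for tag, w in p.items())
    norm = math.sqrt(sum(w * w for w in p.values())) * math.sqrt(sum(w * w for w in q.values()))
    return dot / norm if norm else 0.0


# Hypothetical tag counts for documents and for one user.
DOC_TAGS = {"d1": {"python": 3}, "d2": {"python": 1, "graphs": 2}, "d3": {"cooking": 4}}
USER_TAGS = {"python": 2, "graphs": 1}

idf = idf_weights(DOC_TAGS)
user_vec = profile(USER_TAGS, idf)
ranking = sorted(DOC_TAGS, key=lambda d: cosine(user_vec, profile(DOC_TAGS[d], idf)), reverse=True)
print(ranking)  # ['d2', 'd1', 'd3']
```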

  18. Experiments [2/6]
      • Evaluation methodology
        • Total 300 users
          • 270 users as training users
          • 30 users as test users
        • 50% of bookmarks are used for model construction (training)
        • The remaining 50% of bookmarks are used for evaluation (ground truth)
      • Evaluation metrics
        • Precision
        • Mean Average Precision (MAP)
        • Normalized Discounted Cumulative Gain (NDCG)
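The three metrics can be sketched under a binary-relevance assumption, where a recommended document counts as relevant if it appears in the held-out bookmarks. The exact variants used in the paper are not given on the slide, so the definitions below are common textbook forms and the sample ranking is hypothetical. MAP is the mean of `average_precision` over all test users.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def ndcg_at_k(ranked, relevant, k):
    """DCG with binary gains, normalized by the DCG of an ideal ranking."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, d in enumerate(ranked[:k], start=1) if d in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(k, len(relevant)) + 1))
    return dcg / ideal if ideal else 0.0


# Hypothetical ranking for one test user and her held-out bookmarks.
RANKED = ["d3", "d7", "d1", "d9"]
RELEVANT = {"d3", "d1"}

print(precision_at_k(RANKED, RELEVANT, 2))  # 0.5
print(average_precision(RANKED, RELEVANT))
print(ndcg_at_k(RANKED, RELEVANT, 4))
```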

  19. Experiments [3/6]

  20. Experiments [4/6]

  21. Experiments [5/6]

  22. Experiments [6/6]
      • Case studies
        • Recommended Web pages
        • Nearest tags

  23. Contents
      • Introduction
      • Multi-type Interrelated Objects Embedding
      • Experiments
      • Conclusion

  24. Conclusion
      • Focusing on the problem of document recommendation in social tagging services
        • Modeling it as a representation learning problem
        • Proposing a novel semantic space learning algorithm (MIOE)
        • Optimal semantic space for users, tags and documents by keeping related objects close in the target space
      • Future work
        • Examining the tag ambiguity issue, which is harmful to MIOE
        • Improving MIOE’s scalability so that it can be applied to very large datasets

  25. Thank You

  26. Appendix [1/9]
      Q(f, g, p): cost function
      f: |U|x1 vector for U, f_i is the coordinate of u_i on the line
      g: |T|x1 vector for T, g_i is the coordinate of t_i on the line
      p: |D|x1 vector for D, p_i is the coordinate of d_i on the line
      R_ut, R_td, R_ud: weighted adjacency matrices
      W: affinity matrix
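The formula for Q is not preserved in this transcript. The sketch below evaluates one plausible form that matches the stated intuition: each edge contributes its weight times the squared distance between the coordinates of its endpoints. The unnormalized form and the trade-off `weights` parameter are assumptions for illustration; the original cost function may differ, for example by normalizing with the degree matrices defined on the following slides.

```python
def cost(f, g, p, R_ut, R_td, R_ud, W, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of squared coordinate differences over all four graphs.

    f, g, p: 1-D coordinates of users, tags and documents.
    weights: hypothetical trade-off factors, one per graph.
    """
    a, b, c, e = weights
    users, tags, docs = range(len(f)), range(len(g)), range(len(p))
    q = a * sum(R_ut[i][j] * (f[i] - g[j]) ** 2 for i in users for j in tags)
    q += b * sum(R_td[i][j] * (g[i] - p[j]) ** 2 for i in tags for j in docs)
    q += c * sum(R_ud[i][j] * (f[i] - p[j]) ** 2 for i in users for j in docs)
    q += e * sum(W[i][j] * (p[i] - p[j]) ** 2 for i in docs for j in docs)
    return q


# Toy instance: one user, one tag, two documents.
f, g, p = [0.0], [1.0], [1.0, 3.0]
R_ut = [[2.0]]            # the user applied the tag twice
R_td = [[1.0, 0.0]]       # the tag was applied to the first document
R_ud = [[0.0, 1.0]]       # the user bookmarked the second document
W = [[0.0, 1.0], [1.0, 0.0]]

print(cost(f, g, p, R_ut, R_td, R_ud, W))  # 19.0
```

Moving strongly connected objects closer together lowers the cost, which is the behaviour the embedding is meant to reward.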

  27. Appendix [2/9]
      D_ut: diagonal matrix, (i, i)-th elements equal to the sum of the i-th row of R_ut
      D_tu: diagonal matrix, (i, i)-th elements equal to the sum of the i-th column of R_ut

  28. Appendix [3/9]
      D_td: diagonal matrix, (i, i)-th elements equal to the sum of the i-th row of R_td
      D_dt: diagonal matrix, (i, i)-th elements equal to the sum of the i-th column of R_td
      D_ud: diagonal matrix, (i, i)-th elements equal to the sum of the i-th row of R_ud
      D_du: diagonal matrix, (i, i)-th elements equal to the sum of the i-th column of R_ud

  29. Appendix [4/9]
      Using graph Laplacian matrix*
      D: diagonal matrix, (i, i)-th elements equal to the sum of the i-th row of W
      W: affinity matrix
      * M. Belkin et al., “Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering”, Advances in Neural Information Processing Systems 14, 2001
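A small numerical check of the standard graph Laplacian identity may help here: with L = D - W and a symmetric W, the pairwise sum of W_ij (p_i - p_j)^2 equals 2 p^T L p. The affinity matrix and coordinates below are arbitrary examples.

```python
def laplacian(W):
    """Graph Laplacian L = D - W, where D holds the row sums of W on its diagonal."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def quadratic_form(L, p):
    """Compute p^T L p."""
    n = len(p)
    return sum(p[i] * L[i][j] * p[j] for i in range(n) for j in range(n))


def pairwise_cost(W, p):
    """Sum of W_ij * (p_i - p_j)^2 over all ordered pairs (i, j)."""
    n = len(p)
    return sum(W[i][j] * (p[i] - p[j]) ** 2 for i in range(n) for j in range(n))


# Arbitrary symmetric affinity matrix over three documents and 1-D coordinates.
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 2.0],
     [0.0, 2.0, 0.0]]
p = [0.0, 1.0, 3.0]

L = laplacian(W)
print(pairwise_cost(W, p), 2 * quadratic_form(L, p))  # 18.0 18.0
```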

  30. Appendix [5/9]
      Using Rayleigh quotient* in order to remove an arbitrary scaling factor
      * J. Ham et al., “Semisupervised alignment of manifolds”, the Annual Conference on Uncertainty in Artificial Intelligence, 2005

  31. Appendix [6/9] Using Rayleigh quotient

  32. Appendix [7/9]
      • By the Rayleigh-Ritz theorem*
        • The solution of this optimization problem is given by the eigenvector corresponding to the second smallest eigenvalue of
      * H. Lutkepohl, “Handbook of Matrices”, Wiley, 1996
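The matrix on this slide is not preserved in the transcript, so the sketch below uses a plain graph Laplacian as a stand-in. For a connected graph, the smallest eigenvalue of the Laplacian is 0 with a constant eigenvector, which is the trivial solution that maps every object to the same point; the eigenvector of the second smallest eigenvalue gives the first informative coordinate. The code approximates it by power iteration on a shifted matrix, a swapped-in technique chosen so that only the standard library is needed; the toy graph and function name are hypothetical.

```python
import math


def second_smallest_eigvec(L, iters=500):
    """Approximate the eigenvector for the second smallest eigenvalue of a
    symmetric graph Laplacian L (connected graph assumed)."""
    n = len(L)
    # Twice the largest degree bounds the largest eigenvalue (Gershgorin),
    # so shift*I - L has nonnegative eigenvalues in reversed order.
    shift = 2.0 * max(L[i][i] for i in range(n))
    v = [float(i) for i in range(n)]  # deterministic starting vector
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # project out the constant eigenvector
        v = [shift * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
L = [[0.0] * 6 for _ in range(6)]
for i, j in EDGES:
    L[i][i] += 1.0
    L[j][j] += 1.0
    L[i][j] -= 1.0
    L[j][i] -= 1.0

v = second_smallest_eigvec(L)
# Nodes of the same triangle share a sign, so the two clusters are separated.
print([round(x, 3) for x in v])
```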

  33. Appendix [8/9]
      Maximizing the global variance in the target subspace instead of maximizing the variance of f, g and p*
      * F. R. K. Chung, “Spectral Graph Theory”, American Mathematical Society, 1997

  34. Appendix [9/9]
      The optimization problem becomes
      This optimization problem can be solved by finding the generalized eigenvector corresponding to the second smallest eigenvalue of
