CoNMF: Exploiting User Comments for Clustering Web2.0 Items

Presenter: He Xiangnan

28 June 2013

Email: [email protected]

School of Computing

National University of Singapore

Introduction
  • Motivations:
    • Users comment on items based on their own interests.
    • Most users’ interests are limited.
    • The categories of items can be inferred from the comments.
  • Proposed problem:
    • Clustering items by exploiting user comments.
  • Applications:
    • Improve search diversity.
    • Automatic tag generation from comments.
    • Group-based recommendation

WING, NUS

Challenges
  • Traditional solution:
    • Represent items as a feature space.
    • Apply any clustering algorithm, e.g. k-means.
  • Key challenges:
    • Items have heterogeneous features:
      • Own features (e.g. words for articles, pixels for images)
      • Comments
        • Usernames
        • Textual contents
    • Simply concatenating all features does not perform well.
    • How to meaningfully combine the heterogeneous views to produce better clustering (i.e. multi-view clustering)?
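The traditional pipeline above (represent items as feature vectors, then run k-means) can be sketched in a few lines of numpy. The toy data and the simple deterministic initialization are illustrative assumptions, not the talk's setup; real code would use k-means++ initialization and multiple restarts.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Minimal Lloyd's k-means, a sketch of the 'traditional solution'."""
    # Simple deterministic init: evenly spaced data points as initial centers.
    idx = np.linspace(0, len(X) - 1, k).astype(int)
    centers = X[idx].astype(float)
    for _ in range(iters):
        # Assign each item to its nearest center (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        # Recompute each center as the mean of its assigned items.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

# Toy "items": two well-separated groups in a 3-d feature space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (10, 3)), rng.normal(5, 0.1, (10, 3))])
labels = kmeans(X, 2)
```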

Proposed solution
  • Extend NMF (Nonnegative Matrix Factorization) to support multi-view clustering…

NMF (Non-negative Matrix Factorization)
  • Factorize the data matrix V (#doc × #words) as V ≈ WH,
    • where W is #doc × k, H is k × #words, and each entry is non-negative.
  • Goal is minimizing the objective function ||V - WH||²,
    • where || || denotes the Frobenius norm.
  • Alternating optimization:
    • With Lagrange multipliers, differentiate with respect to W and H respectively, yielding multiplicative update rules.

Local optimum, not global!
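The alternating optimization above leads to the well-known multiplicative update rules (Lee and Seung); a minimal numpy sketch, with the toy matrix sizes as assumptions:

```python
import numpy as np

def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """NMF via multiplicative updates: minimize ||V - W H||_F^2 subject to
    W, H >= 0. Each update rescales one factor with the other fixed, so
    non-negativity is preserved automatically."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + 0.1  # non-negative random init
    H = rng.random((k, m)) + 0.1
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update H with W fixed
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update W with H fixed
    return W, H

# Toy check on an exactly rank-2, non-negative matrix (#doc=20, #words=10).
rng = np.random.default_rng(1)
V = rng.random((20, 2)) @ rng.random((2, 10))
W, H = nmf(V, k=2)
```

As the slide notes, this reaches only a local optimum; the result depends on the initialization.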

Characteristics of NMF
  • Matrix factorization with a non-negativity constraint
    • Reduces the dimension of the data; derives the latent space
  • Difference from SVD (LSI): the factors are non-negative, so they can be read directly as soft cluster memberships.
  • Theoretically proved suitable for clustering (Ding et al. 2005)
  • Practically shown to outperform SVD and k-means in document clustering (Xu et al. 2003)
Extensions of NMF
  • Relationships with other clustering algorithms:
    • K-means: Orthogonal NMF = K-means
    • PLSI: KL-Divergence NMF = PLSI
    • Spectral clustering
  • Extensions:
    • Tri-factorization of NMF (V = WSH) (Ding et al. 2006)
    • NMF with sparsity constraints (Hoyer 2004)
    • NMF with graph regularization (Cai et al. 2011)
    • However, studies on NMF-based multi-view clustering approaches are quite limited. (Liu et al. 2013)
  • My proposal:
    • Extend NMF to support multi-view clustering

Proposed solution - CoNMF
  • Idea:
    • Couple the factorization processes of NMF across views.
  • Example:
    • Single NMF:
      • Factorization equation: V ≈ WH
      • Objective function: ||V - WH||²
      • Constraints: all entries of W and H are non-negative.
    • 2-view CoNMF:
      • Factorization equations: V1 ≈ W1H1 and V2 ≈ W2H2
      • Objective function: ||V1 - W1H1||² + ||V2 - W2H2||², plus a regularization term that couples W1 and W2.

CoNMF Framework
  • Couples the factorization processes of multiple matrices (i.e., views) via regularization.
  • Objective function: a weighted sum of the per-view NMF losses plus the regularization terms.
    • A similar alternating optimization with Lagrange multipliers solves it.
  • Different options for the regularization:
    • Mutual-based:
      • Point-wise: penalize differences between the item factors of each pair of views, e.g. ||W1 - W2||².
      • Cluster-wise: penalize differences between views at the cluster level.
    • Centroid-based (Liu et al. 2013): regularize each view's factors toward a common consensus matrix.
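A minimal numpy sketch of the point-wise coupling idea. This is a simplified 2-view objective with a single weight lam, an assumption for illustration rather than the exact CoNMF formulation (which has per-view weights and normalization): minimize ||V1 - W1H1||² + ||V2 - W2H2||² + lam·||W1 - W2||², which again yields multiplicative updates.

```python
import numpy as np

def conmf_pointwise(V1, V2, k, lam=0.1, iters=500, seed=0, eps=1e-9):
    """Simplified 2-view point-wise coupled NMF (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = V1.shape[0]  # both views share the same items (rows)
    W1 = rng.random((n, k)) + 0.1; H1 = rng.random((k, V1.shape[1])) + 0.1
    W2 = rng.random((n, k)) + 0.1; H2 = rng.random((k, V2.shape[1])) + 0.1
    for _ in range(iters):
        H1 *= (W1.T @ V1) / (W1.T @ W1 @ H1 + eps)
        H2 *= (W2.T @ V2) / (W2.T @ W2 @ H2 + eps)
        # The coupling term lam*||W1 - W2||^2 adds lam*W2 to the numerator
        # and lam*W1 to the denominator of the standard update for W1.
        W1 *= (V1 @ H1.T + lam * W2) / (W1 @ H1 @ H1.T + lam * W1 + eps)
        W2 *= (V2 @ H2.T + lam * W1) / (W2 @ H2 @ H2.T + lam * W2 + eps)
    return W1, W2, H1, H2

# Toy check: two views generated from one shared item-factor matrix.
rng = np.random.default_rng(2)
Wt = rng.random((15, 2))
V1 = Wt @ rng.random((2, 8))
V2 = Wt @ rng.random((2, 6))
W1, W2, H1, H2 = conmf_pointwise(V1, V2, k=2)
```

Cluster assignments can then be read off each row of W1 (or of the coupled factors) via argmax.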

Experiments
  • Last.fm dataset.
  • 3 views: item description words, comment words, and commenting users.
  • Ground-truth:
    • Music type of each artist, provided by Last.fm.
  • Evaluation metrics:
    • Accuracy and F1.
  • Performance averaged over 20 runs.
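Clustering accuracy requires mapping predicted cluster ids to ground-truth labels before counting matches. The talk does not spell out the matching procedure; a brute-force sketch over label permutations is shown below (fine for small k; the Hungarian algorithm is the usual scalable choice).

```python
import numpy as np
from itertools import permutations

def clustering_accuracy(true, pred):
    """Accuracy under the best one-to-one mapping from predicted cluster
    ids to ground-truth labels (brute force over all permutations)."""
    true, pred = np.asarray(true), np.asarray(pred)
    cluster_ids = sorted(set(pred.tolist()))
    best = 0.0
    for perm in permutations(sorted(set(true.tolist())), len(cluster_ids)):
        mapping = dict(zip(cluster_ids, perm))  # cluster id -> true label
        acc = np.mean([mapping[p] == t for p, t in zip(pred, true)])
        best = max(best, float(acc))
    return best
```

For example, a prediction that swaps the two cluster ids still scores 1.0, since the mapping absorbs the relabeling.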

Statistics of datasets

[Figures: distributions of #items per user and #clusters per user]

P(T ≤ 3) = 0.6229, P(T ≤ 5) = 0.8474, P(T ≤ 10) = 0.9854

This verifies our assumption: each user usually comments on a limited number of music types.

Experimental results (Accuracy)

1. Users > Comm. > Desc., and the combined view is best.

2. SVD performs badly on the user view (non-textual).

3. Users > Comm. > Desc., but here the combined view does worse.

4. Initialization is important for NMF.

5. CoNMF-point performs best.

6. The other two methods are state-of-the-art baselines.

Conclusions
  • Comments benefit clustering.
  • Mining different views from the comments is important:
    • The two views (comment words and commenting users) contribute differently to clustering.
    • For this Last.fm dataset, the user view is more useful.
    • Combining all views works best.
  • For NMF-based methods, initialization is important.

Ongoing
  • More experiments on other datasets.
  • Improve the CoNMF framework by adding sparseness constraints.
  • Study the influence of normalization on CoNMF.


Thanks!

Q&A

References (I)
  • Chris Ding, Xiaofeng He, and Horst D. Simon. 2005. On the equivalence of nonnegative matrix factorization and spectral clustering. In Proc. of SIAM Data Mining Conf. (SDM 2005).
  • Wei Xu, Xin Liu, and Yihong Gong. 2003. Document clustering based on non-negative matrix factorization. In Proc. of SIGIR 2003.
  • Chris Ding, Tao Li, and Wei Peng. 2006. Orthogonal nonnegative matrix tri-factorizations for clustering. In Proc. of SIGKDD 2006.
  • Patrik O. Hoyer. 2004. Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research 2004.
  • Deng Cai, Xiaofei He, Jiawei Han, and Thomas S. Huang. 2011. Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell. 2011.
  • Jialu Liu, Chi Wang, Jing Gao, and Jiawei Han. 2013. Multi-view clustering via joint nonnegative matrix factorization. In Proc. of SIAM Data Mining Conf. (SDM 2013).
