
CoNMF: Exploiting User Comments for Clustering Web2.0 Items

Presenter: He Xiangnan

28 June 2013

Email: [email protected]

School of Computing

National University of Singapore


Introduction

  • Motivations:

    • Users comment on items based on their own interests.

    • Most users’ interests are limited.

    • The categories of items can be inferred from the comments.

  • Proposed problem:

    • Clustering items by exploiting user comments.

  • Applications:

    • Improve search diversity.

    • Automatic tag generation from comments.

    • Group-based recommendation

WING, NUS


Challenges

  • Traditional solution:

    • Represent items as a feature space.

    • Apply any clustering algorithm, e.g. k-means.

  • Key challenges:

    • Items have heterogeneous features:

      • Own features (e.g. words for articles, pixels for images)

      • Comments

        • Usernames

        • Textual contents

    • Simply concatenating all features does not perform well.

    • How to meaningfully combine the heterogeneous views to produce better clustering (i.e. multi-view clustering)?
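As a point of reference, the traditional baseline described above (concatenate every view into one feature space, then run k-means) can be sketched as follows. This is only an illustration: the toy matrices `own`, `users` and `words`, the helper name `kmeans`, and the farthest-point initialization are assumptions chosen to keep the example deterministic, not part of the presentation.

```python
import numpy as np

def kmeans(X, k, n_iter=20):
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(k - 1):
        # Squared distance from every point to its nearest chosen center.
        dists = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[dists.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assign each item to its closest center, then recompute the centers.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Hypothetical toy views for 4 items (rows): own features, commenting users, comment words.
own = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
users = np.array([[1.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
words = np.array([[2.0, 0.0], [1.0, 0.0], [0.0, 2.0], [0.0, 1.0]])

# Naive combination: stack all views side by side into one feature matrix.
X = np.hstack([own, users, words])
labels = kmeans(X, k=2)
```

In this concatenated space every column is weighted equally regardless of which view it came from, which is one reason a more principled multi-view combination is worth investigating.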

Proposed solution

  • Extend NMF (Nonnegative Matrix Factorization) to support multi-view clustering…

NMF (Non-negative Matrix Factorization)

  • Factorize data matrix V (#doc×#words) as: $V \approx WH$

    • where W is #doc×k and H is k×#words, and each entry is non-negative

  • The goal is to minimize the objective function: $\min_{W, H \ge 0} \|V - WH\|_F^2$

    • where $\|\cdot\|_F$ denotes the Frobenius norm

  • Alternating optimization:

    • Using Lagrange multipliers, differentiate with respect to W and H in turn.

Local optimum, not global!
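The update formulas from this slide are not in the transcript. A common way to realize the alternating optimization is the multiplicative update scheme of Lee and Seung, so the NumPy sketch below assumes those standard rules. The names `nmf` and `frobenius_loss` and the parameters `n_iter`, `eps` and `seed` are illustrative.

```python
import numpy as np

def nmf(V, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative V (n x m) by W (n x k) @ H (k x m)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Fix W and update H; the ratio form keeps every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # Fix H and update W.
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def frobenius_loss(V, W, H):
    """Squared Frobenius reconstruction error ||V - WH||_F^2."""
    return np.linalg.norm(V - W @ H, "fro") ** 2
```

For clustering, a typical reading is to assign document i to the cluster `W[i].argmax()`. Because the result is only a local optimum, different `seed` values can produce different factorizations.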

Characteristics of NMF

  • Matrix Factorization with a non-negative constraint

    • Reduce the dimension of the data; derive the latent space

  • Difference from SVD (LSI):

  • Theoretically shown to be suitable for clustering (Ding et al. 2005)

  • Empirically shown to outperform SVD and k-means in document clustering (Xu et al. 2003)


Extensions of NMF

  • Relationships with other clustering algorithms:

    • K-means: Orthogonal NMF = K-means

    • PLSI: KL-Divergence NMF = PLSI

    • Spectral clustering

  • Extensions:

    • Tri-factorization NMF (V = WSH) (Ding et al. 2006)

    • NMF with sparsity constraints (Hoyer 2004)

    • NMF with graph regularization (Cai et al. 2011)

    • However, studies on NMF-based multi-view clustering approaches are quite limited (Liu et al. 2013).

  • My proposal:

    • Extend NMF to support multi-view clustering

Proposed solution - CoNMF

  • Idea:

    • Couple the factorization process of NMF

  • Example:

    • Single NMF:

      • Factorization equation: $V \approx WH$

      • Objective function: $\min \|V - WH\|_F^2$

      • Constraints: all entries of W and H are non-negative.

    • 2-view CoNMF:

      • Factorization equation:

      • Objective function:
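The 2-view equations from this slide are not in the transcript. Purely as an illustration of what "coupling" two factorizations can look like, one common construction adds a penalty that pulls the two item-factor matrices together. The penalty form and the weight $\lambda$ below are assumptions, not the slide's own formulas. Both views describe the same items, so $W_1$ and $W_2$ have the same shape (#items×k).

```latex
\[
V_1 \approx W_1 H_1, \qquad V_2 \approx W_2 H_2
\]
\[
\min_{W_1, H_1, W_2, H_2 \ge 0}\;
\|V_1 - W_1 H_1\|_F^2 + \|V_2 - W_2 H_2\|_F^2 + \lambda \, \|W_1 - W_2\|_F^2
\]
```

With $\lambda = 0$ this reduces to two independent single-view NMF problems. Larger $\lambda$ forces the two views toward a shared cluster assignment.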

CoNMF Framework

  • Coupling the factorization process of multiple matrices (i.e. views) via regularization.

  • Objective function:

    • Similar alternating optimization with Lagrange multipliers can solve it.

  • Different options of regularization:

    • Centroid-based (Liu et al. 2013):

    • Mutual-based:

      • Point-wise:

      • Cluster-wise:
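The regularizer formulas are not in the transcript, so the sketch below is only one plausible reading of a point-wise, mutual-based coupling. It assumes the regularizer is the squared Frobenius distance between the item-factor matrices of each pair of views, weighted by a hypothetical parameter `lam`. The multiplicative updates are derived for that assumed objective and are not taken from the presentation. The function names are illustrative.

```python
import numpy as np

def conmf_objective(Vs, Ws, Hs, lam):
    """Sum of per-view reconstruction errors plus the assumed point-wise coupling."""
    recon = sum(np.linalg.norm(V - W @ H, "fro") ** 2 for V, W, H in zip(Vs, Ws, Hs))
    coupling = sum(
        np.linalg.norm(Ws[s] - Ws[t], "fro") ** 2
        for s in range(len(Ws))
        for t in range(s + 1, len(Ws))
    )
    return recon + lam * coupling

def conmf_pointwise(Vs, k, lam=1.0, n_iter=200, eps=1e-9, seed=0):
    """Jointly factorize views Vs (all with the same number of rows) as W_s @ H_s."""
    rng = np.random.default_rng(seed)
    n = Vs[0].shape[0]
    Ws = [rng.random((n, k)) + eps for _ in Vs]
    Hs = [rng.random((k, V.shape[1])) + eps for V in Vs]
    for _ in range(n_iter):
        for s, V in enumerate(Vs):
            # H_s does not appear in the coupling term, so its update is plain NMF.
            Hs[s] *= (Ws[s].T @ V) / (Ws[s].T @ Ws[s] @ Hs[s] + eps)
            # The coupling adds the other views' W to the numerator and
            # lam * (number of other views) * W_s to the denominator.
            others = sum(Ws[t] for t in range(len(Vs)) if t != s)
            Ws[s] *= (V @ Hs[s].T + lam * others) / (
                Ws[s] @ Hs[s] @ Hs[s].T + lam * (len(Vs) - 1) * Ws[s] + eps
            )
    return Ws, Hs
```

One simple way to read off clusters under these assumptions is `np.mean(Ws, axis=0).argmax(axis=1)`, which averages the per-view factors before taking the strongest component for each item.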

Experiments

  • Last.fm dataset:

  • 3 views: descriptions (Desc.), comment words (Comm.), and commenting users (Users)

  • Ground-truth:

    • Music type of each artist provided by Last.fm

  • Evaluation metrics:

    • Accuracy and F1

  • Average performance of 20 runs.
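The slide does not say how accuracy is computed. Cluster IDs are arbitrary, so a common convention is to score the best one-to-one mapping from clusters to ground-truth classes. The sketch below assumes that convention and uses brute force over permutations, which is only practical for a small number of clusters. The function name and the toy labels are hypothetical.

```python
from itertools import permutations

def clustering_accuracy(true_labels, cluster_ids):
    """Accuracy under the best one-to-one mapping of clusters to classes."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(classes):
        raise ValueError("this sketch assumes no more clusters than classes")
    best_hits = 0
    # Try every assignment of distinct classes to the clusters.
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == t for c, t in zip(cluster_ids, true_labels))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)

# Toy example: five artists, three music types, two predicted clusters.
truth = ["rock", "rock", "jazz", "jazz", "pop"]
predicted = [1, 1, 0, 0, 0]
accuracy = clustering_accuracy(truth, predicted)  # 0.8
```

Averaging this score over repeated runs with different random initializations mirrors the "average of 20 runs" protocol above.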

Statistics of datasets

Figure panels: statistics of #items/user; statistics of #clusters/user.

  • P(T<=3) = 0.6229

  • P(T<=5) = 0.8474

  • P(T<=10) = 0.9854

  • This verifies our assumption: each user usually comments on a limited number of music types.
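Cumulative statistics of this kind can be computed directly from a user-to-items mapping. The sketch below assumes T is the number of distinct music types a user has commented on (the slide does not define T). The data and the function name are made up for illustration.

```python
def prob_at_most(types_per_user, t):
    """Empirical P(T <= t), where T is the number of distinct types per user."""
    counts = [len(set(types)) for types in types_per_user.values()]
    return sum(c <= t for c in counts) / len(counts)

# Hypothetical toy data: the music type of each artist a user commented on.
commented_types = {
    "u1": ["rock", "rock", "metal"],
    "u2": ["jazz"],
    "u3": ["pop", "rock", "jazz", "folk"],
    "u4": ["pop", "pop"],
}

p1 = prob_at_most(commented_types, 1)  # 0.5
p3 = prob_at_most(commented_types, 3)  # 0.75
```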

Experimental results (Accuracy)

1. Users>Comm.>Desc., while combined is best.

2. SVD performs poorly on the users view (non-textual).

3. Users>Comm.>Desc., while combined does worse.

4. Initialization is important for NMF.

5. CoNMF-point performs best.

6. Other two state-of-the-art baselines.

Conclusions

  • Comments benefit clustering.

  • Mining different views from the comments is important:

    • The two views (commenting words and users) contribute differently to clustering.

    • For this Last.fm dataset, the users view is more useful.

    • Combining all views works best.

  • For NMF-based methods, initialization is important.

Ongoing

  • More experiments on other datasets.

  • Improve the CoNMF framework by adding sparseness constraints.

  • Study the influence of normalization on CoNMF.



Thanks!

Q&A?

References (I)

  • Chris Ding, Xiaofeng He, and Horst D. Simon. 2005. On the equivalence of nonnegative matrix factorization and spectral clustering. In Proc. of the SIAM Data Mining Conference 2005.

  • Wei Xu, Xin Liu, and Yihong Gong. 2003. Document clustering based on non-negative matrix factorization. In Proc. of SIGIR 2003.

  • Chris Ding, Tao Li, and Wei Peng. 2006. Orthogonal nonnegative matrix tri-factorizations for clustering. In Proc. of SIGKDD 2006.

  • Patrik O. Hoyer. 2004. Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research, 2004.

  • Deng Cai, Xiaofei He, Jiawei Han, and Thomas S. Huang. 2011. Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell., 2011.

  • Jialu Liu, Chi Wang, Jing Gao, and Jiawei Han. 2013. Multi-view clustering via joint nonnegative matrix factorization. In Proc. of the SIAM Data Mining Conference (SDM'13).
