
Efficient Semi-supervised Spectral Co-clustering with Constraints

Xiaoxiao Shi, Wei Fan, Philip S. Yu


Presentation Transcript


  1. Efficient Semi-supervised Spectral Co-clustering with Constraints
     Xiaoxiao Shi, Wei Fan, Philip S. Yu

  2. Motivation
     • Co-clustering with constraints
     • Document-word co-clustering
     [Figure: documents Doc 1 (ICNP), Doc 2 (ICDM), Doc 3 (AAAI), Doc 4 (KDD) and words "Co-clustering", "Network", "Clustering"; constraints are marked C, with the question "How to use?"]

  3. Motivation
     • Co-clustering with constraints
     • Author-conference co-clustering
     [Figure: authors John, Mary, Jack, Cathy, Tom, with "Collaborators" constraints among them; conferences ICDM 07, ICDM 08, ICDM 09, AAAI 08, AAAI 09, grouped as ICDM and AAAI; question "How to use?"]

  4. Straightforward solution I: transform the constraints into edges, and solve a global graph partition problem
     [Figure: keyword-conference co-clustering; conferences ICDM, KDD, AAAI, ICNP and keywords "Co-clustering", "Clustering", "Network"; two candidate cuts, Cut I and Cut II]
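Solution I can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the constraints are taken to be must-link pairs between rows, the constraint weight `w` and the function name are hypothetical, every node is assumed to have at least one edge, and the two-way cut is read off the sign of the second-smallest eigenvector of the normalized graph Laplacian (a standard spectral relaxation of graph partitioning).

```python
import numpy as np

def partition_with_constraint_edges(A, row_must_links, w=1.0):
    """Bipartition the joint row/column graph after adding constraint edges.

    A              : (m, n) non-negative row-column weight matrix
    row_must_links : list of (i, j) row-index pairs (assumed constraint format)
    w              : weight of each constraint edge (hypothetical parameter)
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    # Adjacency of the bipartite graph over m row nodes and n column nodes.
    W = np.zeros((m + n, m + n))
    W[:m, m:] = A
    W[m:, :m] = A.T
    # Each constraint becomes an extra row-row edge, so the graph is no
    # longer bipartite and must be partitioned as a general graph.
    for i, j in row_must_links:
        W[i, j] += w
        W[j, i] += w
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    L = np.eye(m + n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    fiedler = d_inv_sqrt * vecs[:, 1]    # second-smallest eigenvector
    labels = (fiedler > 0).astype(int)
    return labels[:m], labels[m:]
```

The eigenproblem here is of size (m + n) × (m + n) and includes the added edges, which is where the extra cost of this approach comes from.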

  5. Straightforward solution II: transform the constraints into nodes, and solve a bipartite graph partition problem in a larger graph
     [Figure: the same keyword-conference graph (ICDM, KDD, AAAI, ICNP; "Co-clustering", "Clustering", "Network") with an added pseudo node; two candidate cuts, Cut I and Cut II]
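One way to picture solution II is sketched below. The slide does not spell out how a pseudo node is wired, so this assumes one plausible reading: each must-link constraint between two rows becomes an extra column (the pseudo node) connected to both rows, which keeps the graph bipartite but makes it larger. The function name and the weight `w` are hypothetical.

```python
import numpy as np

def add_pseudo_columns(A, row_must_links, w=1.0):
    """Append one pseudo column per row constraint to the matrix A.

    A              : (m, n) row-column weight matrix
    row_must_links : list of (i, j) row-index pairs (assumed constraint format)
    w              : weight linking a pseudo node to its two rows (hypothetical)
    """
    A = np.asarray(A, dtype=float)
    m, _ = A.shape
    P = np.zeros((m, len(row_must_links)))
    for k, (i, j) in enumerate(row_must_links):
        # Pseudo node k is adjacent to both constrained rows.
        P[i, k] = w
        P[j, k] = w
    # The enlarged matrix is still bipartite: m rows, n + (#constraints) columns.
    return np.hstack([A, P])
```

The enlarged matrix can then be handed to any ordinary bipartite co-clustering routine; the number of columns grows with the number of constraints.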

  6. Problems of the two straightforward solutions
     • Not efficient
       • More edges are added (solution I) or more nodes are included (solution II)
       • 10 to 80 times slower than the original co-clustering without constraints
     • Not effective
       • The graph becomes more complicated, so its optimal partition is harder to find
       • In some cases, the Normalized Mutual Information (NMI) drops by 30% compared with the original co-clustering without constraints
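For readers unfamiliar with the Normalized Mutual Information score quoted above, here is a small sketch of how it can be computed from two label vectors. It assumes normalization by the arithmetic mean of the two entropies; other normalizations exist, and the slides do not say which one was used in the experiments.

```python
import numpy as np

def normalized_mutual_information(labels_true, labels_pred):
    """NMI = 2 * I(T; P) / (H(T) + H(P)), in [0, 1]."""
    t = np.asarray(labels_true).ravel()
    p = np.asarray(labels_pred).ravel()
    _, t_idx = np.unique(t, return_inverse=True)
    _, p_idx = np.unique(p, return_inverse=True)
    # Contingency table of co-occurring labels, turned into joint probabilities.
    C = np.zeros((t_idx.max() + 1, p_idx.max() + 1))
    np.add.at(C, (t_idx, p_idx), 1)
    P = C / t.size
    pt = P.sum(axis=1)   # marginal over true labels
    pp = P.sum(axis=0)   # marginal over predicted labels
    nz = P > 0           # skip empty cells to avoid log(0)
    mi = np.sum(P[nz] * np.log(P[nz] / np.outer(pt, pp)[nz]))
    h_t = -np.sum(pt * np.log(pt))
    h_p = -np.sum(pp * np.log(pp))
    if h_t + h_p == 0:
        # Both labelings put everything in one cluster; treat as identical.
        return 1.0
    return 2.0 * mi / (h_t + h_p)
```

The score depends only on how the two labelings group the items, not on the label values themselves, so a clustering with permuted cluster ids still scores 1 against the ground truth.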

  7. Formulate the problem as an optimization problem
     • The solution can be obtained directly from the left and right eigenvectors of a matrix built from the objective (more details in Theorem 2 of the paper)
     [Equation image not transcribed; its annotated terms are: minimize the number of inter-group edges; maximize the number of satisfied constraints; graph Laplacian]
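The constrained matrix itself is not reproduced in this transcript, so the sketch below shows only the unconstrained baseline that the "left and right eigenvectors" idea builds on: spectral co-clustering in the style of Dhillon (2001), where rows and columns are split using the second left and right singular vectors of the degree-normalized matrix. It is not the authors' constrained formulation; the function name is illustrative, and every row and column is assumed to have a nonzero sum.

```python
import numpy as np

def spectral_cocluster_bipartition(A):
    """Two-way co-clustering of rows and columns of a non-negative matrix A."""
    A = np.asarray(A, dtype=float)
    d1 = A.sum(axis=1)   # row degrees
    d2 = A.sum(axis=0)   # column degrees
    # Normalized matrix A_n = D1^{-1/2} A D2^{-1/2}.
    An = A / np.sqrt(np.outer(d1, d2))
    # Singular values come back in descending order; the first pair is the
    # trivial solution, so the second pair carries the partition.
    U, _, Vt = np.linalg.svd(An, full_matrices=False)
    z_rows = U[:, 1] / np.sqrt(d1)
    z_cols = Vt[1, :] / np.sqrt(d2)
    # Signs of the left and right vectors are coupled, so a shared threshold
    # at zero assigns rows and columns to consistent groups.
    return (z_rows > 0).astype(int), (z_cols > 0).astype(int)
```

Only an m × n matrix is decomposed here, with no extra edges or nodes, which is the efficiency the proposed method aims to keep while also accounting for constraints.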

  8. Algorithm Flow

  9. Experiments
     • Document-word co-clustering

  10. Experiments
     • Graph-pattern co-clustering

  11. Results

  12. Conclusions
     • In many co-clustering applications, some prior knowledge exists about the relationships among rows and columns
     • Problem: how to use this knowledge (the constraints) to find better co-clusters?
     • Two straightforward solutions
       • Model the constraints as linkages
       • Model the constraints as additional pseudo nodes
       • Problem: not efficient; not effective
     • Proposed method: model the problem as an optimization problem, and solve it with the selected eigenvectors

  13. Related Work
     • Traditional co-clustering without constraints
       • Information-based co-clustering
         • Information-theoretic co-clustering (Dhillon et al., 2003)
       • Partition-based co-clustering
         • Spectral co-clustering (Dhillon, 2001)
     • Previous constraint-based co-clustering models
       • Co-clustering with row constraints (Chen et al., 2008)
       • Co-clustering with order-based constraints (Pensa et al., 2008); suited to one specific type of constraint and not comparable with the proposed model
     • Straightforward modifications of traditional co-clustering models to use constraints
       • Link-based constrained co-clustering
       • Node-based constrained co-clustering
