Integrating Constraints and Metric Learning in Semi-Supervised Clustering

Mikhail Bilenko, Sugato Basu, Raymond J. Mooney. ICML 2004. Presented by Xin Li.


Integrating Constraints and Metric Learning in Semi-Supervised Clustering

Mikhail Bilenko, Sugato Basu, Raymond J. Mooney

ICML 2004

Presented by Xin Li


Semi-Supervised Clustering

K=4



How to exploit supervision in clustering

  • Incorporate supervision as constraints

  • Learn a distance metric using supervision

  • Integration of these two approaches


K-means Clustering

X = {x1,x2,…}

L = {l1,l2,…,lk}

Euclidean Distance:

Minimizing:
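The formulas on this slide did not survive extraction. As a minimal sketch, assuming the standard K-means formulation (squared Euclidean distance from each point to its assigned centroid, summed over all points), the objective and the usual alternating minimization could look like this. The function names and the random initialization are illustrative choices, not taken from the slides.

```python
import numpy as np

def kmeans_objective(X, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    diffs = X - centroids[labels]  # (n, d) offsets to the assigned centroids
    return float(np.sum(diffs ** 2))

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd iterations: assign to nearest centroid, then recompute means."""
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared distance from every point to every centroid, shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for h in range(k):
            if np.any(labels == h):  # leave an empty cluster's centroid unchanged
                centroids[h] = X[labels == h].mean(axis=0)
    return labels, centroids
```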


Clustering with constraints

Pairwise constraints:

  • M – Must-link pairs

    • (xi, xj) should be in the same cluster

  • C – Cannot-link pairs

    • (xi, xj) should be in different clusters
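To make the two constraint sets concrete, here is a small sketch that checks a cluster assignment against M and C. Representing each constraint as an index pair `(i, j)` and the name `constraint_violations` are assumptions made for illustration.

```python
def constraint_violations(labels, must_link, cannot_link):
    """Return the must-link and cannot-link pairs that a labeling violates."""
    # A must-link pair is violated when its points land in different clusters.
    violated_m = [(i, j) for i, j in must_link if labels[i] != labels[j]]
    # A cannot-link pair is violated when its points land in the same cluster.
    violated_c = [(i, j) for i, j in cannot_link if labels[i] == labels[j]]
    return violated_m, violated_c
```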


Learning a pairwise distance metric

Binary Classification: (xi, xj) → 0/1

  • M → positive examples

    • (xi, xj) are in the same cluster

  • C → negative examples

    • (xi, xj) are in different clusters

  • Apply the learned distance metric in clustering

  • Metric learning and clustering are disjoint steps
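The pairwise view above can be sketched as building a binary training set from the constraints. The slides do not say which pair features are used, so the element-wise absolute difference |xi - xj| below is purely an assumption, as is the function name.

```python
import numpy as np

def pairs_to_dataset(X, must_link, cannot_link):
    """Turn constraint pairs into (features, targets) for a binary classifier."""
    pairs = list(must_link) + list(cannot_link)
    # Assumed pair representation: element-wise absolute difference.
    features = np.array([np.abs(X[i] - X[j]) for i, j in pairs])
    # Must-link pairs are positives (1), cannot-link pairs are negatives (0).
    targets = np.array([1] * len(must_link) + [0] * len(cannot_link))
    return features, targets
```

Any binary classifier trained on this set yields a pairwise similarity score that a clustering algorithm can then consume, which is exactly the decoupling the last bullet points out.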


Unsupervised Clustering with Metric Learning

Learn a distance metric that optimizes a quality function

Maximizing the complete data log-likelihood under generalized K-means
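The equation for this slide is missing from the transcript. A sketch under a common assumption: each cluster h carries a positive-definite matrix A_h, the distance is (x - μ)ᵀ A_h (x - μ), and maximizing the Gaussian complete-data log-likelihood amounts to minimizing that distance minus log det(A_h), summed over points. The names below are illustrative.

```python
import numpy as np

def mahalanobis_sq(x, y, A):
    """Squared distance between x and y under the metric matrix A."""
    d = x - y
    return float(d @ A @ d)

def generalized_kmeans_objective(X, labels, centroids, metrics):
    """Sum over points of the metric distance to the centroid minus log det(A)."""
    total = 0.0
    for x, h in zip(X, labels):
        A = metrics[h]
        _, logdet = np.linalg.slogdet(A)  # numerically stable log-determinant
        total += mahalanobis_sq(x, centroids[h], A) - logdet
    return total
```

With every A_h set to the identity, the log-determinant term vanishes and the value reduces to the ordinary K-means objective.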


Integrating Constraints and Metric Learning

Combining the previous two equations gives an objective function that minimizes cluster dispersion under the learned metrics while reducing constraint violations.


Penalty for violating constraints

  • The penalty for violating a must-link constraint between distant points should be higher than that between nearby points.

  • The penalty for violating a cannot-link constraint between nearby points should be higher than that between distant points.
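One way to realize these two properties is sketched below. The exact penalty functions are not preserved in the transcript, so the forms here are assumptions: the must-link penalty averages the distance under the metrics of the two clusters involved, and the cannot-link penalty subtracts the pair's distance from `max_sq_dist`, a hypothetical precomputed largest squared distance between any two points under the cluster's metric.

```python
import numpy as np

def sq_dist(x, y, A):
    """Squared distance between x and y under the metric matrix A."""
    d = x - y
    return float(d @ A @ d)

def must_link_penalty(xi, xj, A_i, A_j):
    """Cost of splitting a must-link pair across two clusters; grows with distance."""
    return 0.5 * sq_dist(xi, xj, A_i) + 0.5 * sq_dist(xi, xj, A_j)

def cannot_link_penalty(xi, xj, A, max_sq_dist):
    """Cost of merging a cannot-link pair into one cluster; shrinks with distance."""
    # Clamping at zero is an added safeguard, not something the slides state.
    return max(max_sq_dist - sq_dist(xi, xj, A), 0.0)
```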


MPCK-MEANS Algorithm

  • Constraints are utilized during cluster initialization and when assigning points to clusters.

  • The distance metric is adapted by re-estimating the weights in matrices Ah.


Initialization

  • An initial guess of the clusters.

  • Assign each point x to one of K clusters in a way that satisfies the constraints.

  • Compute the centroid of each cluster.
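The three steps above can be sketched as follows. This is a simplified stand-in, not necessarily the initialization used in the paper: it merges must-linked points with union-find, places each merged group greedily in the smallest cluster that holds no cannot-linked group, and falls back to the smallest cluster if every cluster conflicts. All names are illustrative.

```python
import numpy as np

def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def initialize(X, k, must_link, cannot_link):
    n = len(X)
    parent = list(range(n))
    for i, j in must_link:  # merge points that must share a cluster
        parent[find(parent, i)] = find(parent, j)
    roots = [find(parent, i) for i in range(n)]

    groups = {}  # root -> indices of the points in that must-link group
    for i, r in enumerate(roots):
        groups.setdefault(r, []).append(i)

    conflicts = {r: set() for r in groups}  # groups that must stay apart
    for i, j in cannot_link:
        conflicts[roots[i]].add(roots[j])
        conflicts[roots[j]].add(roots[i])

    labels = np.zeros(n, dtype=int)
    placed = [set() for _ in range(k)]  # groups already put in each cluster
    sizes = [0] * k
    for r in sorted(groups, key=lambda g: -len(groups[g])):  # largest first
        allowed = [h for h in range(k) if not (placed[h] & conflicts[r])]
        h = min(allowed or range(k), key=lambda c: sizes[c])
        labels[groups[r]] = h
        placed[h].add(r)
        sizes[h] += len(groups[r])

    overall_mean = X.mean(axis=0)  # fallback centroid for an empty cluster
    centroids = np.array([X[labels == h].mean(axis=0) if sizes[h] else overall_mean
                          for h in range(k)])
    return labels, centroids
```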


E-step

  • Every point x is assigned to the cluster that minimizes the sum of the distance of x to the cluster centroid according to the local metric and the cost of any constraint violations incurred by the cluster assignment.
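A sketch of this assignment rule for a single point, assuming every other point already has a label (for example from the initialization). The penalty forms, the single constraint weight `w`, and the log-determinant term are assumptions carried over from a common formulation, not details preserved in the transcript.

```python
import numpy as np

def sq_dist(x, y, A):
    """Squared distance between x and y under the metric matrix A."""
    d = x - y
    return float(d @ A @ d)

def max_pairwise_sq_dist(X, A):
    """Largest squared distance between any two points under A (O(n^2) sketch)."""
    return max(sq_dist(p, q, A) for p in X for q in X)

def assign_point(i, X, labels, centroids, metrics, must_link, cannot_link, w=1.0):
    """Return the cluster minimizing distance to centroid plus violation costs."""
    best_h, best_cost = None, np.inf
    for h in range(len(centroids)):
        A = metrics[h]
        cost = sq_dist(X[i], centroids[h], A) - np.linalg.slogdet(A)[1]
        for a, b in must_link:
            if i in (a, b):
                j = b if a == i else a
                if labels[j] != h:  # must-link partner sits in another cluster
                    A_j = metrics[labels[j]]
                    cost += w * 0.5 * (sq_dist(X[i], X[j], A) + sq_dist(X[i], X[j], A_j))
        for a, b in cannot_link:
            if i in (a, b):
                j = b if a == i else a
                if labels[j] == h:  # cannot-link partner sits in this cluster
                    gap = max_pairwise_sq_dist(X, A) - sq_dist(X[i], X[j], A)
                    cost += w * max(gap, 0.0)
        if cost < best_cost:
            best_h, best_cost = h, cost
    return best_h
```

A full E-step would call this for every point, typically in random order, and write each result back into `labels` before moving on.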


M-Step

Update Metrics: … = 0
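The update equation itself is not recoverable from the transcript. As a sketch of the unconstrained part only: setting the derivative of Σ (x - μ_h)ᵀ A_h (x - μ_h) - |X_h| log det(A_h) with respect to A_h to zero gives A_h = |X_h| · S_h⁻¹, where S_h is the scatter matrix of cluster h. The constraint-violation terms that the full update would include are omitted here, and the `ridge` term is an added assumption to keep S_h invertible.

```python
import numpy as np

def m_step(X, labels, k, ridge=1e-6):
    """Re-estimate centroids and per-cluster metric matrices (constraints ignored)."""
    d = X.shape[1]
    centroids = np.zeros((k, d))
    metrics = np.stack([np.eye(d)] * k)  # default to the Euclidean metric
    for h in range(k):
        members = X[labels == h]
        if len(members) == 0:
            continue  # empty cluster: centroid stays at the origin, metric at identity
        centroids[h] = members.mean(axis=0)
        diffs = members - centroids[h]
        scatter = diffs.T @ diffs  # S_h, shape (d, d)
        metrics[h] = len(members) * np.linalg.inv(scatter + ridge * np.eye(d))
    return centroids, metrics
```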


Experimental Setting

Multiple Metrics, Full Matrix A


Conclusion and Discussion

  • This paper has presented MPCK-MEANS, a new approach to semi-supervised clustering.

  • Supervision and metric learning are helpful in clustering and multiple distance metrics are not necessary in most cases.

  • Question 1: If we have supervision in clustering, why not use it in the same way as in a typical classification task?

  • Question 2: If there is an infinite number of classes, can we gain from supervision on only some of them?

