Integrating Constraints and Metric Learning in Semi-Supervised Clustering

Mikhail Bilenko, Sugato Basu, Raymond J. Mooney

ICML 2004

Presented by Xin Li

K=4

- Incorporate supervision as constraints
- Learn a distance metric using supervision
- Integration of these two approaches

X = {x1,x2,…}

L = {l1,l2,…,lk}

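Euclidean distance between a point xi and the centroid of its assigned cluster: $\|x_i - \mu_{l_i}\|^2$

Minimizing: $\sum_{x_i \in X} \|x_i - \mu_{l_i}\|^2$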
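The quantity that unsupervised K-means minimizes can be sketched in a few lines. This is a minimal illustration, assuming points and centroids are plain tuples and that `labels[i]` holds the cluster index of `points[i]`; the function names are hypothetical and not taken from the slides.

```python
def squared_euclidean(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans_objective(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_euclidean(x, centroids[l]) for x, l in zip(points, labels))


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0)]
labels = [0, 0, 1]
centroids = [(0.0, 1.0), (10.0, 0.0)]
print(kmeans_objective(points, labels, centroids))  # 2.0
```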

Pairwise constraints:

- M: must-link pairs
  - (xi, xj) should be in the same cluster
- C: cannot-link pairs
  - (xi, xj) should be in different clusters
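One simple way to represent the two constraint sets is as lists of index pairs, which makes it easy to check a labeling against them. The sketch below assumes that representation; `violated_constraints` is a hypothetical helper, not something named in the slides.

```python
def violated_constraints(labels, must_link, cannot_link):
    """Return the must-link and cannot-link pairs that a labeling violates."""
    # A must-link pair is violated when its points land in different clusters.
    bad_must = [(i, j) for i, j in must_link if labels[i] != labels[j]]
    # A cannot-link pair is violated when its points share a cluster.
    bad_cannot = [(i, j) for i, j in cannot_link if labels[i] == labels[j]]
    return bad_must, bad_cannot


labels = [0, 0, 1, 1]
must_link = [(0, 1), (1, 2)]
cannot_link = [(0, 3), (2, 3)]
print(violated_constraints(labels, must_link, cannot_link))
# ([(1, 2)], [(2, 3)])
```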

Binary classification: (xi, xj) → 0/1

- M: positive examples
  - (xi, xj) are in the same cluster
- C: negative examples
  - (xi, xj) are in different clusters
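Treating constraints as labeled pairs can be sketched as follows. The feature representation (per-dimension absolute difference) is an assumption made purely for illustration; the slides do not say how a pair is encoded, and `pairs_to_training_set` is a hypothetical name.

```python
def pairs_to_training_set(points, must_link, cannot_link):
    """Turn pairwise constraints into (features, target) training examples.

    Must-link pairs become positive examples (target 1) and cannot-link
    pairs become negative examples (target 0).
    """
    examples = []
    for pairs, target in ((must_link, 1), (cannot_link, 0)):
        for i, j in pairs:
            # Assumed encoding: absolute difference in each dimension.
            features = [abs(a - b) for a, b in zip(points[i], points[j])]
            examples.append((features, target))
    return examples


points = [(0.0, 0.0), (1.0, 3.0), (5.0, 5.0)]
training_set = pairs_to_training_set(points, [(0, 1)], [(1, 2)])
print(training_set)  # [([1.0, 3.0], 1), ([4.0, 2.0], 0)]
```

Any binary classifier could then be trained on these examples, and its output used as a learned similarity between points.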

- Apply the learned distance metric in clustering
- Metric learning and clustering are performed as separate, disjoint steps

Maximizing the complete data log-likelihood under generalized K-means

Learn a distance metric that optimizes a quality function

Combining the previous two equations yields an objective function whose minimization reduces cluster dispersion under the learned metrics while also reducing constraint violations.
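The equation itself did not survive in this transcript. As a hedged reconstruction, the combined objective in the published MPCK-Means paper has the form below. The symbols $w_{ij}$, $\bar{w}_{ij}$ (constraint weights), $f_M$, $f_C$ (distance-dependent penalty functions) and $A_{l_i}$ (the metric of the cluster that $x_i$ is assigned to) follow the paper's notation rather than anything visible here, so the expression should be checked against the original slides.

```latex
\mathcal{J}_{\mathrm{mpckm}} =
  \sum_{x_i \in X} \Bigl( \lVert x_i - \mu_{l_i} \rVert^2_{A_{l_i}} - \log\det A_{l_i} \Bigr)
  + \sum_{(x_i, x_j) \in M} w_{ij}\, f_M(x_i, x_j)\, \mathbb{1}[l_i \neq l_j]
  + \sum_{(x_i, x_j) \in C} \bar{w}_{ij}\, f_C(x_i, x_j)\, \mathbb{1}[l_i = l_j]
```

The first sum is the cluster dispersion under the learned metrics; the second and third sums add a penalty for every violated must-link and cannot-link constraint, respectively.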

- The penalty for violating a must-link constraint between distant points should be higher than that between nearby points.
- The penalty for violating a cannot-link constraint between nearby points should be higher than that between distant points.
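The intent behind the two penalties is that a violated must-link costs more as the points get farther apart, while a violated cannot-link costs more as they get closer. The sketch below is one way to realize that, assuming diagonal metrics stored as weight vectors and a precomputed maximum squared distance `max_sq_dist` in the data set; the function names and the exact functional forms are illustrative assumptions, not a transcription of the slides.

```python
def weighted_sq_dist(x, y, a):
    """Squared distance under a diagonal metric with per-dimension weights a."""
    return sum(w * (p - q) ** 2 for p, q, w in zip(x, y, a))


def must_link_penalty(xi, xj, a_i, a_j):
    """Grows with distance; averages the metrics of the two clusters involved."""
    return 0.5 * weighted_sq_dist(xi, xj, a_i) + 0.5 * weighted_sq_dist(xi, xj, a_j)


def cannot_link_penalty(xi, xj, a, max_sq_dist):
    """Shrinks with distance: the largest possible distance minus the actual one."""
    return max_sq_dist - weighted_sq_dist(xi, xj, a)


a = (1.0, 1.0)
near = ((0.0, 0.0), (1.0, 0.0))   # squared distance 1
far = ((0.0, 0.0), (3.0, 0.0))    # squared distance 9
print(must_link_penalty(*near, a, a), must_link_penalty(*far, a, a))        # 1.0 9.0
print(cannot_link_penalty(*near, a, 16.0), cannot_link_penalty(*far, a, 16.0))  # 15.0 7.0
```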

- Constraints are utilized during cluster initialization and when assigning points to clusters.
- The distance metric is adapted by re-estimating the weights in matrices Ah.

- An initial guess of the clusters.
- Assign each point x to one of K clusters in a way that satisfies the constraints.
- Compute the centroid of each cluster.
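One way to obtain a constraint-respecting initial guess is to group points that are connected through must-link pairs and use the centroid of each group as a seed. The sketch below assumes that approach and uses a small union-find; it ignores cannot-link constraints and does not handle choosing exactly K groups, so it is only a partial illustration. The function names are hypothetical.

```python
def must_link_components(n_points, must_link):
    """Group point indices into the transitive closure of must-link pairs."""
    parent = list(range(n_points))

    def find(i):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in must_link:
        parent[find(i)] = find(j)

    groups = {}
    for i in range(n_points):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def centroid(points, members):
    """Mean of the points whose indices are listed in members."""
    dim = len(points[0])
    return tuple(sum(points[i][d] for i in members) / len(members) for d in range(dim))


points = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (10.0, 10.0), (12.0, 10.0)]
groups = must_link_components(len(points), [(0, 1), (1, 2), (3, 4)])
seeds = [centroid(points, g) for g in groups]
print(groups)  # [[0, 1, 2], [3, 4]]
print(seeds)   # [(2.0, 0.0), (11.0, 10.0)]
```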

- Every point x is assigned to the cluster that minimizes the sum of the distance of x to the cluster centroid according to the local metric and the cost of any constraint violations incurred by the cluster assignment.
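The assignment rule above can be sketched for a single point. This is a simplified version under stated assumptions: metrics are diagonal (one weight vector per cluster), every violated constraint costs a constant `w` rather than a distance-dependent penalty, and points not yet assigned carry the label `None`. The names `partner` and `assign_point` are hypothetical.

```python
def partner(pair, i):
    """Return the other endpoint of pair if i is one of its endpoints, else None."""
    u, v = pair
    if u == i:
        return v
    if v == i:
        return u
    return None


def assign_point(i, points, centroids, metrics, labels, must_link, cannot_link, w=1.0):
    """Pick the cluster minimizing metric distance plus constraint-violation cost."""
    best_cluster, best_cost = None, float("inf")
    for h, (mu, a) in enumerate(zip(centroids, metrics)):
        # Distance to centroid h under that cluster's own diagonal metric.
        cost = sum(wd * (p - q) ** 2 for p, q, wd in zip(points[i], mu, a))
        for pair in must_link:
            j = partner(pair, i)
            # Violated if the must-link partner is already in another cluster.
            if j is not None and labels[j] is not None and labels[j] != h:
                cost += w
        for pair in cannot_link:
            j = partner(pair, i)
            # Violated if the cannot-link partner is already in this cluster.
            if j is not None and labels[j] == h:
                cost += w
        if cost < best_cost:
            best_cluster, best_cost = h, cost
    return best_cluster


points = [(0.0, 0.0), (4.0, 0.0), (1.5, 0.0)]
centroids = [(0.0, 0.0), (4.0, 0.0)]
metrics = [(1.0, 1.0), (1.0, 1.0)]
labels = [0, 1, None]
unconstrained = assign_point(2, points, centroids, metrics, labels, [], [])
constrained = assign_point(2, points, centroids, metrics, labels, [(2, 1)], [], w=10.0)
print(unconstrained, constrained)  # 0 1
```

Without constraints the point goes to the nearer centroid; a heavily weighted must-link to point 1 pulls it into the farther cluster instead.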

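Set the partial derivative of the objective with respect to each metric to zero: $\partial J / \partial A_h = 0$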

Update Metrics:
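The update formula is missing from this transcript. For the simplest case, a single diagonal metric with weights $a_d$, a closed form can be derived directly: if the objective is $\sum_d a_d S_d - N \sum_d \log a_d$, where $S_d$ collects the squared differences in dimension $d$, then setting the derivative to zero gives $a_d = N / S_d$. The sketch below assumes that simplified objective, includes violated must-link pairs with a constant weight `w`, and leaves out the cannot-link term; it is an illustration under those assumptions, and `update_diagonal_metric` is a hypothetical name.

```python
def update_diagonal_metric(points, labels, centroids, must_link, w=1.0):
    """Closed-form weights a_d = N / S_d for a single diagonal metric.

    S_d sums, per dimension, the squared point-to-centroid differences plus
    w times the squared differences of violated must-link pairs.
    Raises ZeroDivisionError if some dimension has no spread at all.
    """
    dim = len(points[0])
    denom = [0.0] * dim
    for x, l in zip(points, labels):
        for d in range(dim):
            denom[d] += (x[d] - centroids[l][d]) ** 2
    for i, j in must_link:
        if labels[i] != labels[j]:  # only violated pairs contribute
            for d in range(dim):
                denom[d] += w * (points[i][d] - points[j][d]) ** 2
    return [len(points) / s for s in denom]


points = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0), (2.0, 3.0)]
labels = [0, 0, 1, 1]
centroids = [(1.0, 0.0), (1.0, 2.0)]
plain = update_diagonal_metric(points, labels, centroids, [])
with_must = update_diagonal_metric(points, labels, centroids, [(1, 3)])
print(plain)      # [1.0, 2.0]
print(with_must)
```

In the second call the violated must-link pair differs only in the second dimension, so that dimension's weight shrinks, which pulls the two points closer together under the updated metric.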

Single Metric, Diagonal Matrix A

Multiple Metrics, Full Matrix A

- This paper has presented MPCK-MEANS, a new approach to semi-supervised clustering.
- Supervision and metric learning are helpful in clustering, and multiple distance metrics are not necessary in most cases.
- Question 1: If we have supervision in clustering, why not use it in the same way as in a typical classification task?
- Question 2: If there is an infinite number of classes, can we gain from supervision on only some of them?