
Clustering on the Simplex


Presentation Transcript


  1. Clustering on the Simplex Morten Mørup, DTU Informatics, Intelligent Signal Processing, Technical University of Denmark. EMMDS 2009, July 3rd, 2009

  2. Joint work with Christian Walder (DTU Informatics, Intelligent Signal Processing, Technical University of Denmark) and Lars Kai Hansen (DTU Informatics, Intelligent Signal Processing, Technical University of Denmark)

  3. Clustering Cluster analysis or clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. (Wikipedia)

  4. Clustering approaches • K-means iterative refinement algorithm (Lloyd, 1982; Hartigan, 1979): Assignment step (S): assign each data point to the cluster with the closest mean value. Update step (C): calculate the new mean value for each cluster. (A minimal sketch follows below.) • The problem is NP-complete (Megiddo and Supowit, 1984). Relaxations of the hard assignment problem: • Annealing approaches based on a temperature parameter (as T → 0 the original clustering problem is recovered) (see for instance Hofmann and Buhmann, 1997) • Fuzzy clustering (Hathaway and Bezdek, 1988) • Expectation Maximization (Mixture of Gaussians) • Spectral clustering. Guarantee of optimality: no single change in assignment is better than the current assignment (1-spin stability). Drawbacks: previous relaxations are either not exact or depend on a problem-specific annealing parameter in order to recover the original binary combinatorial assignments.
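A minimal numpy sketch of the two-step refinement above (function and parameter names are illustrative, not from the talk):

```python
import numpy as np

def lloyd_kmeans(X, K, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment (S) and update (C) steps.
    X is an (N, M) array with one data point per row."""
    rng = np.random.default_rng(seed)
    # Initialize the means as K randomly chosen data points.
    mu = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step (S): each point goes to the closest mean.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        z = d2.argmin(axis=1)
        # Update step (C): each mean becomes the average of its points.
        for k in range(K):
            if np.any(z == k):
                mu[k] = X[z == k].mean(axis=0)
    return z, mu
```

Each sweep costs O(NKM), and the loop stops at an assignment that no single reassignment can improve given the means, which corresponds to the 1-spin-stability guarantee mentioned above.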

  5. From the K-means objective to Pairwise Clustering. K-means objective: minimize Σ_n ||x_n − μ_{c(n)}||², the distance of each point to its assigned cluster mean. Pairwise Clustering (Buhmann and Hofmann, 1994): maximize the within-cluster similarity Σ_k s_kᵀ K s_k / (1ᵀ s_k) for a similarity matrix K; choosing K = XᵀX makes this equivalent to the k-means objective.
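A quick numerical sanity check of this equivalence (my own script, not from the talk): with a hard assignment matrix S whose rows indicate clusters and K = XᵀX, the k-means cost equals tr(K) minus the pairwise-clustering score.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, Kc = 2, 60, 3
X = rng.normal(size=(M, N))               # columns are data points
z = np.arange(N) % Kc                     # an arbitrary hard assignment
S = np.zeros((Kc, N))
S[z, np.arange(N)] = 1.0                  # row k of S indicates cluster k

K = X.T @ X                               # similarity matrix
score = sum(S[k] @ K @ S[k] / S[k].sum() for k in range(Kc))

mu = np.stack([X[:, z == k].mean(axis=1) for k in range(Kc)], axis=1)
kmeans_cost = ((X - mu[:, z]) ** 2).sum()

assert np.isclose(kmeans_cost, np.trace(K) - score)
```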

  6. Although clustering is hard, there is room to be simple(x) minded! Binary Combinatorial (BC) Simplicial Relaxation (SR)

  7. The simplicial relaxation (SR) admits standard continuous optimization for solving the pairwise clustering problem, for instance by normalization-invariant projected gradient ascent (a sketch follows below):
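A hedged sketch of the idea, assuming the pairwise-clustering score f(S) = Σ_k s_kᵀ K s_k / (1ᵀ s_k) and substituting a plain Euclidean projection onto the simplex (Duchi et al., 2008) for the talk's normalization-invariant step, whose exact form is in the paper:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - 1) / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - (css[rho] - 1) / (rho + 1), 0.0)

def sr_cluster(K, n_clusters, n_iter=500, step=0.1, seed=0):
    """Maximize f(S) = sum_k s_k^T K s_k / (1^T s_k) over S whose columns
    lie on the simplex; K is a symmetric similarity matrix."""
    rng = np.random.default_rng(seed)
    N = K.shape[0]
    S = rng.random((n_clusters, N))
    S /= S.sum(axis=0)                         # columns start on the simplex
    for _ in range(n_iter):
        size = S.sum(axis=1)                   # 1^T s_k per cluster
        within = np.einsum('kn,nm,km->k', S, K, S)   # s_k^T K s_k
        grad = 2 * (S @ K) / size[:, None] - (within / size ** 2)[:, None]
        S = np.apply_along_axis(project_simplex, 0, S + step * grad)
    return S
```

According to the talk, hard assignments are recovered at stationarity, so labels can be read off with S.argmax(axis=0).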

  8. Synthetic data example: K-means vs. SR-clustering. The brown and grey clusters each contain 1,000 data points in R², whereas the remaining clusters each have 250 data points.

  9. The SR-clustering algorithm is driven by high-density regions.

  10. Thus, the solutions are in general substantially better than those of Lloyd's algorithm, at the same computational complexity. SR-clustering (init=1), SR-clustering (init=0.01), Lloyd's K-means

  11. K-means, SR-clustering (init=1), SR-clustering (init=0.01); 10 components, 50 components, 100 components

  12. SR-clustering for kernel-based semi-supervised learning. Kernel-based semi-supervised learning based on pairwise clustering (Basu et al., 2004; Kulis et al., 2005; Kulis et al., 2009).

  13. The simplicial relaxation admits solving the problem as a (non-convex) continuous optimization problem.

  14. Class labels can be handled by explicit fixing; must-links and cannot-links can be absorbed into the kernel. Hence the problem reduces more or less to a standard SR-clustering problem for the estimation of S. (A sketch of the kernel absorption is given below.)
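A minimal sketch of the kernel absorption, in the spirit of Kulis et al.: reward must-linked pairs and penalize cannot-linked pairs by adding ±w to the corresponding kernel entries. The additive form and the weight w are assumptions for illustration, not the authors' exact construction:

```python
import numpy as np

def absorb_links(K, must_links, cannot_links, w=1.0):
    """Return a modified kernel with pairwise supervision folded in."""
    K2 = np.array(K, dtype=float, copy=True)
    for i, j in must_links:        # encourage i and j in the same cluster
        K2[i, j] += w
        K2[j, i] += w
    for i, j in cannot_links:      # discourage i and j in the same cluster
        K2[i, j] -= w
        K2[j, i] -= w
    return K2
```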

  15. At stationarity, the gradients of the elements in each column of S that equal 1 are larger than the gradients of the elements that equal 0. Thus, the impact of the supervision can be evaluated by estimating the minimal Lagrange multipliers that guarantee stationarity of the solution obtained by the SR-clustering algorithm. This is a convex optimization problem. The Lagrange multipliers thus give a measure of the conflict between the data and the supervision.

  16. Digit classification with one mislabeled data observation from each class.

  17. Community Detection in Complex Networks. Communities/modules: natural divisions of network nodes into densely connected subgroups (Newman & Girvan, 2003). G(V,E); adjacency matrix A; permuted adjacency matrix PAPᵀ; community detection algorithm; permutation P of the graph from the clustering assignment S.

  18. Common community detection objectives: the Hamiltonian (Fu & Anderson, 1986; Reichardt & Bornholdt, 2004), H = −Σ_{i<j} (A_ij − γ p_ij) δ(c_i, c_j), and Modularity (Newman & Girvan, 2004), Q = (1/2m) Σ_ij (A_ij − k_i k_j / 2m) δ(c_i, c_j). Both are generic problems of a common quadratic form in the assignment S.
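For reference, the Newman–Girvan modularity of a hard assignment can be computed directly from the adjacency matrix; a small self-contained helper (my own naming):

```python
import numpy as np

def modularity(A, z):
    """Q = (1/2m) * sum_ij (A_ij - k_i * k_j / 2m) * delta(z_i, z_j)."""
    k = A.sum(axis=1)                   # node degrees
    two_m = k.sum()                     # 2m: twice the number of edges
    B = A - np.outer(k, k) / two_m      # modularity matrix
    same = z[:, None] == z[None, :]     # delta(z_i, z_j)
    return (B * same).sum() / two_m
```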

  19. Again we can make an exact relaxation to the simplex!

  20. [Figure slide]

  21. [Figure slide]

  22. SR-clustering of complex networks: the quality of the solutions is comparable to results obtained by extensive Gibbs sampling.

  23. So far we have demonstrated how binary combinatorial constraints are recovered at stationarity when relaxing the problems to the simplex. However, simplex constraints also hold promising data-mining properties of their own!

  24. The Convex Hull and the Principal Convex Hull (PCH). Def: The convex hull/convex envelope of X ∈ R^{M×N} is the minimal convex set containing X. (Informally it can be described as a rubber band wrapped around the data points.) Finding the convex hull is solvable in linear time, O(N) (McCallum and Avis, 1979). However, the size of the convex set grows exponentially with the dimensionality of the data, O(log^{M−1}(N)) (Dwyer, 1988). Def: The Principal Convex Hull is the best convex set of size K according to some measure of distortion D(·|·) (Mørup et al., 2009). (Informally it can be described as a less flexible rubber band that wraps most of the data points.)
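To make the contrast concrete, the exact convex hull of low-dimensional data is cheap to compute, here via SciPy's Qhull wrapper, while the PCH instead asks for the best size-K convex set:

```python
import numpy as np
from scipy.spatial import ConvexHull

X = np.random.default_rng(0).normal(size=(500, 2))
hull = ConvexHull(X)  # exact hull; vertex count grows with dimension
print(f"{len(hull.vertices)} hull vertices out of {len(X)} points")
```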

  25. The mathematical formulation of the Principal Convex Hull (PCH) is given by two simplex constraints: X ≈ X C S, with the columns of C and of S constrained to the simplex. "Principal" is meant in terms of the Frobenius norm. C: gives the fractions in which the observations in X are used to form each feature (distinct aspects/freaks); in general C will be very sparse! S: gives the fraction by which each observation resembles each distinct aspect XC. (Note that when K is large enough, the PCH recovers the convex hull.)
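A hedged sketch of fitting the PCH model by alternating projected gradient on the Frobenius cost 0.5 * ||X − X C S||²_F, projecting every column of C and S back onto the simplex. The step size, iteration count, and the Euclidean projection are illustrative choices, not the authors' algorithm:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - 1) / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - (css[rho] - 1) / (rho + 1), 0.0)

def pch(X, n_aspects, n_iter=500, step=1e-3, seed=0):
    """Fit X ~ X C S with columns of C (N x K) and S (K x N) on the simplex."""
    rng = np.random.default_rng(seed)
    _, N = X.shape
    C = rng.random((N, n_aspects)); C /= C.sum(axis=0)
    S = rng.random((n_aspects, N)); S /= S.sum(axis=0)
    for _ in range(n_iter):
        R = X @ C @ S - X                      # residual
        C = np.apply_along_axis(project_simplex, 0,
                                C - step * (X.T @ R @ S.T))
        R = X @ C @ S - X
        S = np.apply_along_axis(project_simplex, 0,
                                S - step * ((X @ C).T @ R))
    return C, S
```

The columns of XC are the distinct aspects, and S says how each observation mixes them.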

  26. Relation between the PCH model, low-rank decompositions, and clustering approaches: PCH naturally bridges clustering and low-rank approximations!

  27. Two important properties of the PCH model: the PCH model is invariant to affine transformation and scaling, and the PCH model is unique up to permutation of the components.

  28. A feature extraction example: more contrast in the features than obtained by clustering approaches. As such, PCH aims for distinct aspects/regions in the data; the PCH model strives to attain Platonic "Ideal Forms".

  29. PCH model for PET data (Positron Emission Tomography), X ≈ XC S. The data contain 3 components: high-binding regions, low-binding regions, and non-binding regions. Each voxel is given as a concentration fraction of these regions.

  30. NMF spectroscopy of samples of mixtures of propanol, butanol, and pentanol.

  31. Collaborative filtering example: medium-size and large-size MovieLens data (www.grouplens.org). Medium size: 1,000,209 ratings of 3,952 movies by 6,040 users. Large size: 10,000,054 ratings of 10,677 movies given by 71,567 users.

  32. Conclusion • The simplex offers unique data mining properties. • Simplicial relaxations (SR) form exact relaxations of common hard-assignment clustering problems, i.e. K-means, pairwise clustering, and community detection in graphs. • SR enables solving binary combinatorial problems using standard solvers from continuous optimization. • The proposed SR-clustering algorithm outperforms traditional iterative refinement algorithms. • No need for an annealing parameter; hard assignments are guaranteed at stationarity (Theorems 1 and 2). • Semi-supervised learning can be posed as a continuous optimization problem whose associated Lagrange multipliers give an evaluation measure of each supervised constraint.

  33. Conclusion cont. • The Principal Convex Hull (PCH) is formed by two types of simplex constraints. • It extracts distinct aspects of the data. • It is relevant for data mining in general, wherever low-rank approximation and clustering approaches have been invoked.

  34. A reformulation of "Lex Parsimoniae". The simplest explanation is usually the best. - William of Ockham. The simplex explanation is usually the best. Simplicity is the ultimate sophistication. - Leonardo da Vinci. Simplexity is the ultimate sophistication. The presented work is described in: M. Mørup and L. K. Hansen, "An Exact Relaxation of Clustering", submitted to JMLR, 2009. M. Mørup, C. Walder, and L. K. Hansen, "Simplicial Semi-supervised Learning", submitted. M. Mørup and L. K. Hansen, "Platonic Forms Revisited", submitted.
