Spectral Clustering
Most slides are from Eyal David’s presentation
Spectral Clustering Example – 2 Spirals • The dataset exhibits complex cluster shapes • K-means performs very poorly in this space due to its bias toward dense spherical clusters • In the embedded space given by the two leading eigenvectors, the clusters are trivial to separate
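Below is a minimal sketch (not from the slides) reproducing this comparison in Python with scikit-learn; the two-spirals generator, the nearest-neighbour affinity, and all parameter values are illustrative assumptions.

```python
# A minimal sketch (not from the slides): two interleaved spirals,
# clustered by plain K-means and by spectral clustering. The spiral
# generator and every parameter value below are illustrative choices.
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)

def two_spirals(n=300, noise=0.05):
    t = np.sqrt(rng.uniform(0.25, 1.0, n)) * 3 * np.pi
    arm = np.c_[t * np.cos(t), t * np.sin(t)] / (3 * np.pi)
    X = np.vstack([arm, -arm]) + rng.normal(0, noise, (2 * n, 2))
    y = np.r_[np.zeros(n), np.ones(n)]
    return X, y

X, y = two_spirals()

# K-means cuts the plane with a straight boundary (its bias toward
# dense spherical clusters), so its labels disagree badly with y.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Spectral clustering embeds the points via the leading eigenvectors
# of a graph Laplacian; in that space the spirals separate easily.
sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                        n_neighbors=10, random_state=0).fit_predict(X)

print("ARI k-means: ", adjusted_rand_score(y, km))   # typically near 0
print("ARI spectral:", adjusted_rand_score(y, sc))   # typically near 1
```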
Why? • If we eventually use K-means, why not just apply K-means to the original data? • The spectral embedding unfolds non-convex regions into nearly linearly separable groups, so the final K-means step can cluster shapes that K-means alone cannot
User’s Prerogative • Affinity matrix construction • Choice of scaling factor • Realistically, search over candidate values of the scaling factor and pick the one that gives the tightest clusters (see the sketch below) • Choice of k, the number of clusters • Choice of clustering method
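A sketch of these knobs (an assumed implementation, not the presenter's code): a Gaussian affinity A_ij = exp(−‖x_i − x_j‖² / 2σ²), the normalized-Laplacian embedding, and a brute-force search over σ that keeps the value whose embedded K-means clusters are tightest (lowest inertia).

```python
# Assumed implementation of the "user's prerogative" knobs: Gaussian
# affinity, normalized-Laplacian embedding, and a brute-force search
# over the scaling factor sigma for the tightest embedded clusters.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import KMeans

def spectral_embed(X, sigma, k):
    D2 = squareform(pdist(X, "sqeuclidean"))
    A = np.exp(-D2 / (2 * sigma ** 2))       # Gaussian affinity
    np.fill_diagonal(A, 0.0)
    d = A.sum(axis=1)
    # Normalized Laplacian L_sym = I - D^{-1/2} A D^{-1/2}
    L = np.eye(len(X)) - A / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(L)            # ascending eigenvalues
    Y = vecs[:, :k]                           # k smallest eigenvectors
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)  # row-normalize

def pick_sigma(X, k, sigmas):
    # "Search over [the scaling factor] and pick the value that gives
    # the tightest clusters" -- measured here by K-means inertia.
    def tightness(s):
        return KMeans(n_clusters=k, n_init=10, random_state=0) \
            .fit(spectral_embed(X, s, k)).inertia_
    return min(sigmas, key=tightness)
```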
How to select k? • Eigengap: the difference between two consecutive eigenvalues • The most stable clustering is generally given by the value of k that maximises the eigengap Δ_k = |λ_k − λ_{k+1}| [Figure: the largest eigenvalues of the CISI/Medline data; the large gap after λ2 suggests choosing k = 2]
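One possible implementation of the eigengap heuristic (an assumption, not code from the slides), using the largest eigenvalues of the normalized affinity D^{−1/2} A D^{−1/2} as in the plot:

```python
# Assumed implementation of the eigengap heuristic: find the k where
# the gap between consecutive (descending) eigenvalues is largest.
import numpy as np

def choose_k_by_eigengap(A, k_max=10):
    """A: symmetric affinity matrix with zero diagonal."""
    d = A.sum(axis=1)
    M = A / np.sqrt(np.outer(d, d))            # D^{-1/2} A D^{-1/2}
    vals = np.sort(np.linalg.eigvalsh(M))[::-1][:k_max + 1]  # descending
    gaps = vals[:-1] - vals[1:]                # Delta_k = lam_k - lam_{k+1}
    return int(np.argmax(gaps)) + 1            # k maximising the eigengap
```

On eigenvalues like those in the CISI/Medline plot, where the largest gap falls between λ2 and λ3, this function would return k = 2.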
Summary • Spectral clustering can help us with hard clustering problems • The technique is simple to understand • The solution comes from a standard linear-algebra problem (an eigendecomposition), which is not hard to implement • Great care should be taken in choosing the “starting conditions”: the affinity matrix, the scaling factor, and the number of clusters k