
Clustering, Self-Organizing Feature Map


Presentation Transcript


  1. Clustering, Self-Organizing Feature Map (2003.11.3) Yoonyoung Nam, Shakeel M., YoungIn Yeo

  2. Clustering

  3. Introduction [Figure: data points grouped into Cluster 1 and Cluster 2, with outliers marked] • Cluster • A group of similar objects • Clustering • An unsupervised form of classification • Unsupervised learning: no predefined classes

  4. What is Good Clustering? • High intra-cluster similarity • Objects are similar to one another within the same cluster • Low inter-cluster similarity • Objects are dissimilar to those in other clusters • Both depend on the similarity measure used

  5. The Problem of Unsupervised Clustering • Nearly identical to that of distribution estimation for classes with multi-modal features [Figure: example of 4 data sets with the same mean and covariance]

  6. Similarity Measures • The most obvious measure is the distance between objects • If the Euclidean distance between two objects is less than some threshold distance d0, they belong to the same cluster (sketched below)
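A minimal sketch of this threshold rule in Python; the sample points and the value of d0 are illustrative, not from the slides:

```python
import numpy as np

def same_cluster(x, x_prime, d0):
    """Assign two points to the same cluster iff their
    Euclidean distance falls below the threshold d0."""
    return np.linalg.norm(x - x_prime) < d0

x = np.array([1.0, 2.0])
y = np.array([1.5, 2.5])
print(same_cluster(x, y, d0=1.0))  # True: distance ~0.71 < 1.0
```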

  7. A simple scaling of the coordinate axes can result in a different grouping of the data into clusters

  8. To achieve invariance, normalize the data • Subtract the mean and divide by the standard deviation (sketched below) • Inappropriate if the spread is due to the presence of subclasses
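A sketch of this normalization, assuming the features are the columns of a NumPy array:

```python
import numpy as np

def standardize(X):
    """Z-score normalization: subtract each feature's mean and
    divide by its standard deviation (columns of X are features)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
print(standardize(X))  # each column now has mean 0 and unit variance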

  9. Mahalanobis distance: d(x, x') = sqrt((x − x')^T Σ⁻¹ (x − x')) • Similarity function s(x, x') • Using the angle between two vectors, the normalized inner product s(x, x') = x^T x' / (||x|| ||x'||) may be an appropriate similarity function (both are sketched below)
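A sketch of both measures; the covariance matrix Σ below is a made-up example (in practice it would be estimated from the data):

```python
import numpy as np

def mahalanobis(x, x_prime, cov):
    """Mahalanobis distance sqrt((x - x')^T Sigma^{-1} (x - x'))."""
    diff = x - x_prime
    return np.sqrt(diff @ np.linalg.inv(cov) @ diff)

def normalized_inner_product(x, x_prime):
    """s(x, x') = x . x' / (||x|| ||x'||): cosine of the angle between them."""
    return x @ x_prime / (np.linalg.norm(x) * np.linalg.norm(x_prime))

# Sigma is assumed known here for the example.
sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
print(mahalanobis(np.array([1.0, 0.0]), np.array([0.0, 1.0]), sigma))
print(normalized_inner_product(np.array([1.0, 0.0]), np.array([1.0, 1.0])))
```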

  10. Tanimoto coefficient • Using binary-valued features • The ratio of the number of shared attributes to the number possessed by x or x': s(x, x') = x^T x' / (x^T x + x'^T x' − x^T x')
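A sketch of the coefficient for binary attribute vectors; the two example vectors are illustrative:

```python
import numpy as np

def tanimoto(x, x_prime):
    """Tanimoto coefficient for binary vectors: the number of
    shared attributes over the number possessed by x or x'."""
    shared = np.sum(x & x_prime)  # attributes present in both
    total = np.sum(x | x_prime)   # attributes present in either
    return shared / total

x = np.array([1, 1, 0, 1], dtype=bool)
y = np.array([1, 0, 0, 1], dtype=bool)
print(tanimoto(x, y))  # 2 shared / 3 possessed = 0.667
```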

  11. Categories of Clustering Methods • Hierarchical clustering • Groups objects into a tree of clusters • AGNES (Agglomerative Nesting) • DIANA (Divisive Analysis) • Partitioning clustering • Constructs a partition of an object set V into k clusters (k: user input parameter) • K-means • K-medoids

  12. Hierarchical Method [Figure: objects a, b, c, d, e merged bottom-up by agglomerative AGNES over steps 0-4, and split top-down by divisive DIANA over the same steps in reverse]

  13. Hierarchical Method • Algorithm for agglomerative clustering (a sketch follows below) • Input: set V of objects • Put each object in its own cluster • Loop until the number of clusters is one • Calculate the inter-cluster similarities • Merge the most similar pair of current clusters
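A minimal sketch of this loop, assuming single-linkage as the similarity measure (the next slide lists the alternatives); the function name and stopping parameter are my own:

```python
import numpy as np
from itertools import combinations

def agglomerative(V, num_clusters=1):
    """Agglomerative clustering per the slide: start with one cluster
    per object, then repeatedly merge the closest pair of clusters.
    Single-linkage (minimum pairwise distance) is assumed here."""
    clusters = [[v] for v in V]  # each object starts in its own cluster
    while len(clusters) > num_clusters:
        # find the most similar (closest) pair of current clusters
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: min(np.linalg.norm(np.array(a) - np.array(b))
                                     for a in clusters[p[0]]
                                     for b in clusters[p[1]]))
        clusters[i] += clusters.pop(j)  # merge cluster j into cluster i
    return clusters

V = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9)]
print(agglomerative(V, num_clusters=2))
```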

  14. Hierarchical Method [Figure: two clusters Ci and Cj, with the single-link and complete-link distances drawn between them] • Similarity (linkage) methods, defined below • Single-linkage • Complete-linkage • Average-linkage
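The three linkage measures as small helper functions; this is a naive O(|Ci|·|Cj|) sketch, not an optimized implementation:

```python
import numpy as np

def _pairwise(Ci, Cj):
    """All pairwise distances between objects of clusters Ci and Cj."""
    return [np.linalg.norm(np.array(a) - np.array(b)) for a in Ci for b in Cj]

def single_link(Ci, Cj):    # distance of the closest pair
    return min(_pairwise(Ci, Cj))

def complete_link(Ci, Cj):  # distance of the farthest pair
    return max(_pairwise(Ci, Cj))

def average_link(Ci, Cj):   # mean distance over all pairs
    d = _pairwise(Ci, Cj)
    return sum(d) / len(d)
```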

  15. K-means • Uses the gravity center (mean) of the objects in each cluster • Algorithm (a sketch follows below) • Input: k (the number of clusters), set V of n objects • Output: a set of k clusters minimizing the sum-of-distances error criterion • Method: • Choose k objects as the initial cluster centers • Loop until the centers stabilize • For each object v, find the nearest center and assign v to it • Recompute each center as the mean of its cluster • Pro: quick convergence • Con: sensitive to noise, outliers, and initial seed selection
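A minimal K-means sketch following the slide's method; the convergence test is simplified and empty clusters are not handled:

```python
import numpy as np

def kmeans(V, k, max_iter=100, seed=0):
    """K-means per the slide: pick k objects as initial centers, then
    alternate nearest-center assignment and mean recomputation."""
    rng = np.random.default_rng(seed)
    V = np.asarray(V, dtype=float)
    centers = V[rng.choice(len(V), size=k, replace=False)]
    for _ in range(max_iter):
        # assign each object to its nearest center
        labels = np.argmin(np.linalg.norm(V[:, None] - centers[None], axis=2), axis=1)
        # recompute each center as the mean (gravity center) of its cluster
        new_centers = np.array([V[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):  # converged
            break
        centers = new_centers
    return labels, centers

labels, centers = kmeans([[0, 0], [0, 1], [5, 5], [5, 6]], k=2)
print(labels, centers)
```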

  16. K-means [Figure: cluster centres and the objects assigned to them]

  17. K-means clustering • Choose k and the initial cluster centers v

  18. K-means clustering • Assign each object to the cluster to which it is closest • Compute the center of each cluster

  19. K-means clustering • Reassign objects to the cluster whose centroid is nearest

  20. K-medoids • Medoid: the object whose average dissimilarity to all the objects in the cluster is minimal • Algorithm (the swap-cost test is sketched below) • Input: k (the number of clusters), set V of n objects • Output: a set of k clusters minimizing the sum-of-distances error criterion • Method: • Choose k objects as the initial cluster centers (medoids) • Loop • For each object v, find the nearest medoid and assign v to it • Randomly select a non-medoid object o_random • Compute the total cost S of swapping a medoid o_j with o_random • If S < 0, swap o_j with o_random • Break when the stopping threshold is met
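A sketch of the swap-cost test at the heart of this loop; `total_cost` and `try_swap` are hypothetical helper names, not from the slides:

```python
import numpy as np

def total_cost(V, medoids):
    """Sum of distances from each object to its nearest medoid."""
    V = np.asarray(V, dtype=float)
    M = np.asarray(medoids, dtype=float)
    return np.linalg.norm(V[:, None] - M[None], axis=2).min(axis=1).sum()

def try_swap(V, medoids, o_j, o_random):
    """Accept the swap of medoid o_j for non-medoid o_random
    only if it lowers the total cost (S < 0 on the slide)."""
    candidate = [o_random if m == o_j else m for m in medoids]
    S = total_cost(V, candidate) - total_cost(V, medoids)
    return candidate if S < 0 else medoids

V = [(0, 0), (1, 0), (5, 5), (6, 5), (9, 9)]
medoids = try_swap(V, [(0, 0), (9, 9)], o_j=(9, 9), o_random=(5, 5))
print(medoids)  # (5, 5) replaces (9, 9) because the total cost drops
```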

  21. K-medoids • Swapping cases [Figure: data objects p and cluster centers o_i, o_j before and after swapping o_j with o_random] • 1. p re-assigned to o_i • 2. p re-assigned to o_random • 3. no change • 4. p re-assigned to o_random

  22. K-medoids • PAM (Partitioning Around Medoids) • Algorithm • 5) For each of the k medoids o_j • 6) Examine all of the n−k non-medoid objects • 7) Swap o_j with o_new s.t. E_new = min E_i over all n−k candidates • Complexity • O(k(n−k)²) per iteration – very costly • Suitable for small databases

  23. Clustering Example • http://www.rzuser.uni-heidelberg.de/~mmaier4/clusteringdemo/applet.shtml

  24. Self Organizing Feature Map

  25. Self-Organizing Maps • Based on competitive learning (unsupervised) • Only one output neuron is activated at any one time • The winner-takes-all neuron, or winning neuron • In a Self-Organizing Map • Neurons are placed at the nodes of a lattice • One- or two-dimensional • Neurons become selectively tuned to input patterns • Through a competitive learning process • The locations of the tuned neurons become ordered • Forming a topographic map of the input patterns • The spatial locations of the neurons in the lattice reflect intrinsic statistical features of the input patterns

  26. Self-Organizing Maps • Topology-preserving transformation

  27. SOM as a Neural Model • A distinct feature of the human brain • It is organized in such a way that different sensory inputs are represented by topologically ordered computational maps • Computational map • A basic building block in the information-processing infrastructure of the nervous system • An array of neurons representing slightly differently tuned processors that operate on the sensory information-bearing signals in parallel

  28. Basic Feature-Mapping Models • Willshaw-von der Malsburg model (1976) • Proposed on biological grounds to explain retinotopic mapping from the retina to the visual cortex • Two 2-D lattices: presynaptic and postsynaptic neurons • The geometric proximity of presynaptic neurons is coded in the form of correlation, which is used in the postsynaptic lattice • Specialized to mappings where the input and output have the same dimension

  29. Basic Feature-Mapping Models • Kohonen model (1982) • Captures the essential features of computational maps in the brain • Yet remains computationally tractable • More general than, and has received more attention than, the Willshaw-von der Malsburg model • Capable of dimensionality reduction • A class of vector-coding algorithms

  30. Formation Process of SOM • After initialization of the synaptic weights, three essential processes • Competition • The neuron with the largest value of the discriminant function is selected • The winner of the competition • Cooperation • The spatial neighbors of the winning neuron are selected • Synaptic adaptation • The excited neurons adjust their synaptic weights

  31. Competitive Process • Input vector and synaptic weight vectors: x = [x1, x2, ..., xm]^T, w_j = [w_j1, w_j2, ..., w_jm]^T, j = 1, 2, ..., l • Best matching (winning) neuron: i(x) = arg min_j ||x − w_j||, j = 1, 2, ..., l • The winner determines the location where the topological neighborhood of excited neurons is to be centered

  32. Cooperative Process • For a winning neuron, the neurons in its immediate neighborhood are excited more than those farther away • The topological neighborhood decays smoothly with lateral distance • Symmetric about the maximum point defined by d_{j,i} = 0 • Monotonically decreasing to zero as d_{j,i} → ∞ • Neighborhood function, Gaussian case: h_{j,i(x)} = exp(−d²_{j,i} / 2σ²) • The size of the neighborhood shrinks with time as σ = σ(n) decreases (sketched below)
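A sketch of the Gaussian neighborhood with a shrinking width; the exponential schedule σ(n) = σ₀·exp(−n/τ) is one common choice, and the values of σ₀ and τ are illustrative:

```python
import numpy as np

def neighborhood(d_ji, n, sigma0=2.0, tau=1000.0):
    """Gaussian neighborhood h_{j,i} = exp(-d^2 / (2 sigma(n)^2)),
    where sigma(n) = sigma0 * exp(-n / tau) shrinks over time."""
    sigma = sigma0 * np.exp(-n / tau)
    return np.exp(-(d_ji ** 2) / (2.0 * sigma ** 2))

print(neighborhood(0.0, n=0))     # 1.0 at the winning neuron
print(neighborhood(3.0, n=0))     # smaller for distant neurons
print(neighborhood(3.0, n=2000))  # smaller still as sigma shrinks
```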

  33. Adaptive Process • The synaptic weight vector is changed in relation to the input vector: w_j(n+1) = w_j(n) + η(n) h_{j,i(x)}(n) (x − w_j(n)) • Applied to all neurons inside the neighborhood of the winning neuron i • Upon repeated presentation of the training data, the weights tend to follow the input distribution • Learning rate η(n): decays with time • May be decomposed into two phases • Self-organizing or ordering phase: topological ordering of the weight vectors • Convergence phase: after ordering, for accurate statistical quantification of the input space • A full training loop is sketched below
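A minimal SOM training loop combining the three processes (competition, cooperation, adaptation); the lattice size, decay schedules, and iteration count are illustrative assumptions, not values from the slides:

```python
import numpy as np

def train_som(X, grid=(10, 10), iters=5000, eta0=0.1, sigma0=3.0, tau=1000.0, seed=0):
    """Minimal SOM: competition (winner = nearest weight vector),
    cooperation (Gaussian neighborhood on the lattice), and
    adaptation (move weights toward the input)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    W = rng.random((rows * cols, X.shape[1]))          # synaptic weights
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for n in range(iters):
        x = X[rng.integers(len(X))]
        i = np.argmin(np.linalg.norm(W - x, axis=1))   # competition: winning neuron i(x)
        d2 = np.sum((coords - coords[i]) ** 2, axis=1) # lattice distances to the winner
        sigma = sigma0 * np.exp(-n / tau)              # shrinking neighborhood width
        h = np.exp(-d2 / (2.0 * sigma ** 2))           # cooperation: Gaussian neighborhood
        eta = eta0 * np.exp(-n / tau)                  # decaying learning rate
        W += eta * h[:, None] * (x - W)                # adaptation: weight update rule
    return W

X = np.random.default_rng(1).random((500, 2))          # uniform 2-D input distribution
W = train_som(X)
```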

  34. Summary of SOM • A continuous input space of activation patterns generated in accordance with a certain probability distribution • The topology of the network, in the form of a lattice of neurons, defines a discrete output space • A time-varying neighborhood function defined around the winning neuron • A learning rate that decreases gradually with time but never goes to zero

  35. SOFM Example (1): 2-D Lattice trained on a 2-D distribution

  36. SOFM Example (2): Phoneme Recognition • Phonotopic maps • Recognition result for "humppila"

  37. SOFM Example(3) • http://www-ti.informatik.uni-tuebingen.de/~goeppert/KohonenApp/KohonenApp.html • http://davis.wpi.edu/~matt/courses/soms/applet.html
