
Clustering, Self-Organizing Feature Map


Presentation Transcript


  1. Clustering, Self-Organizing Feature Map (2003.11.3) Yoonyoung Nam, Shakeel M., YoungIn Yeo

  2. Clustering

  3. Introduction [Figure: data points grouped into Cluster 1 and Cluster 2, with outliers marked] • Cluster • A group of similar objects • Clustering • An unsupervised form of classification • Unsupervised learning: no predefined classes

  4. What is Good Clustering? • High intra-cluster similarity • Objects are similar to one another within the same cluster • Low inter-cluster similarity • Objects are dissimilar to those in other clusters • Both depend on the similarity measure used

  5. The Problem of Unsupervised Clustering • Nearly identical to that of distribution estimation for classes with multi-modal features [Figure: example of 4 data sets with the same mean and covariance]

  6. Similarity Measures • The most obvious measure is the distance between objects • If the Euclidean distance between two objects is less than some threshold distance d0, they belong to the same cluster (sketched below)
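A minimal sketch of this threshold rule in Python; the sample points and the value of d0 are illustrative, not from the slides:

```python
import numpy as np

def same_cluster(x, x_prime, d0):
    """Assign two points to the same cluster iff their
    Euclidean distance falls below the threshold d0."""
    return np.linalg.norm(x - x_prime) < d0

x = np.array([1.0, 2.0])
y = np.array([1.5, 2.5])
print(same_cluster(x, y, d0=1.0))  # True: distance ~0.71 < 1.0
```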

  7. A simple scaling of the coordinate axes can result in a different grouping of the data into clusters

  8. To achieve invariance, normalize the data • Subtract the mean and divide by the standard deviation (sketched below) • Inappropriate if the spread is due to the presence of subclasses
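A sketch of this normalization, assuming the features are the columns of a NumPy array:

```python
import numpy as np

def standardize(X):
    """Z-score normalization: subtract each feature's mean and
    divide by its standard deviation (columns of X are features)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
print(standardize(X))  # each column now has mean 0 and unit variance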

  9. Mahalanobis distance: d(x, x') = sqrt((x − x')^T Σ⁻¹ (x − x')) • Similarity function s(x, x') • Using the angle between two vectors, the normalized inner product s(x, x') = x^T x' / (||x|| ||x'||) may be an appropriate similarity function (both are sketched below)
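A sketch of both measures; the covariance matrix Σ below is a made-up example (in practice it would be estimated from the data):

```python
import numpy as np

def mahalanobis(x, x_prime, cov):
    """Mahalanobis distance sqrt((x - x')^T Sigma^{-1} (x - x'))."""
    diff = x - x_prime
    return np.sqrt(diff @ np.linalg.inv(cov) @ diff)

def normalized_inner_product(x, x_prime):
    """s(x, x') = x . x' / (||x|| ||x'||): cosine of the angle between them."""
    return x @ x_prime / (np.linalg.norm(x) * np.linalg.norm(x_prime))

# Sigma is assumed known here for the example.
sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
print(mahalanobis(np.array([1.0, 0.0]), np.array([0.0, 1.0]), sigma))
print(normalized_inner_product(np.array([1.0, 0.0]), np.array([1.0, 1.0])))
```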

  10. Tanimoto coefficient • Using binary-valued features • The ratio of the number of shared attributes to the number possessed by x or x': s(x, x') = x^T x' / (x^T x + x'^T x' − x^T x')
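A sketch of the coefficient for binary attribute vectors; the two example vectors are illustrative:

```python
import numpy as np

def tanimoto(x, x_prime):
    """Tanimoto coefficient for binary vectors: the number of
    shared attributes over the number possessed by x or x'."""
    shared = np.sum(x & x_prime)  # attributes present in both
    total = np.sum(x | x_prime)   # attributes present in either
    return shared / total

x = np.array([1, 1, 0, 1], dtype=bool)
y = np.array([1, 0, 0, 1], dtype=bool)
print(tanimoto(x, y))  # 2 shared / 3 possessed = 0.667
```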

  11. Categories of Clustering Methods • Hierarchical clustering • Groups objects into a tree of clusters • AGNES (Agglomerative Nesting) • DIANA (Divisive Analysis) • Partitioning clustering • Constructs a partition of an object set V into k clusters (k: user input parameter) • K-means • K-medoids

  12. Hierarchical Method [Figure: objects a, b, c, d, e merged bottom-up by agglomerative AGNES over steps 0-4, and split top-down by divisive DIANA over the same steps in reverse]

  13. Hierarchical Method • Algorithm for agglomerative clustering (a sketch follows below) • Input: set V of objects • Put each object in its own cluster • Loop until the number of clusters is one • Calculate the inter-cluster similarities • Merge the most similar pair of current clusters
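A minimal sketch of this loop, assuming single-linkage as the similarity measure (the next slide lists the alternatives); the function name and stopping parameter are my own:

```python
import numpy as np
from itertools import combinations

def agglomerative(V, num_clusters=1):
    """Agglomerative clustering per the slide: start with one cluster
    per object, then repeatedly merge the closest pair of clusters.
    Single-linkage (minimum pairwise distance) is assumed here."""
    clusters = [[v] for v in V]  # each object starts in its own cluster
    while len(clusters) > num_clusters:
        # find the most similar (closest) pair of current clusters
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: min(np.linalg.norm(np.array(a) - np.array(b))
                                     for a in clusters[p[0]]
                                     for b in clusters[p[1]]))
        clusters[i] += clusters.pop(j)  # merge cluster j into cluster i
    return clusters

V = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9)]
print(agglomerative(V, num_clusters=2))
```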

  14. Hierarchical Method [Figure: two clusters Ci and Cj, with the single-link and complete-link distances drawn between them] • Similarity (linkage) methods, defined below • Single-linkage • Complete-linkage • Average-linkage
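The three linkage measures as small helper functions; this is a naive O(|Ci|·|Cj|) sketch, not an optimized implementation:

```python
import numpy as np

def _pairwise(Ci, Cj):
    """All pairwise distances between objects of clusters Ci and Cj."""
    return [np.linalg.norm(np.array(a) - np.array(b)) for a in Ci for b in Cj]

def single_link(Ci, Cj):    # distance of the closest pair
    return min(_pairwise(Ci, Cj))

def complete_link(Ci, Cj):  # distance of the farthest pair
    return max(_pairwise(Ci, Cj))

def average_link(Ci, Cj):   # mean distance over all pairs
    d = _pairwise(Ci, Cj)
    return sum(d) / len(d)
```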

  15. K-means • Uses the gravity center (mean) of the objects in each cluster • Algorithm (a sketch follows below) • Input: k (the number of clusters), set V of n objects • Output: a set of k clusters minimizing the sum-of-distances error criterion • Method: • Choose k objects as the initial cluster centers • Loop until the centers stabilize • For each object v, find the nearest center and assign v to it • Recompute each center as the mean of its cluster • Pro: quick convergence • Con: sensitive to noise, outliers, and initial seed selection
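A minimal K-means sketch following the slide's method; the convergence test is simplified and empty clusters are not handled:

```python
import numpy as np

def kmeans(V, k, max_iter=100, seed=0):
    """K-means per the slide: pick k objects as initial centers, then
    alternate nearest-center assignment and mean recomputation."""
    rng = np.random.default_rng(seed)
    V = np.asarray(V, dtype=float)
    centers = V[rng.choice(len(V), size=k, replace=False)]
    for _ in range(max_iter):
        # assign each object to its nearest center
        labels = np.argmin(np.linalg.norm(V[:, None] - centers[None], axis=2), axis=1)
        # recompute each center as the mean (gravity center) of its cluster
        new_centers = np.array([V[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):  # converged
            break
        centers = new_centers
    return labels, centers

labels, centers = kmeans([[0, 0], [0, 1], [5, 5], [5, 6]], k=2)
print(labels, centers)
```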

  16. K-means [Figure: cluster centres and the objects assigned to them]

  17. K-means clustering • Choose k and the initial cluster centers v

  18. K-means clustering • Assign each object to the cluster to which it is closest • Compute the center of each cluster

  19. K-means clustering • Reassign objects to the cluster whose centroid is nearest

  20. K-medoids • Medoid: the object whose average dissimilarity to all the objects in the cluster is minimal • Algorithm (the swap-cost test is sketched below) • Input: k (the number of clusters), set V of n objects • Output: a set of k clusters minimizing the sum-of-distances error criterion • Method: • Choose k objects as the initial cluster centers (medoids) • Loop • For each object v, find the nearest medoid and assign v to it • Randomly select a non-medoid object o_random • Compute the total cost S of swapping a medoid o_j with o_random • If S < 0, swap o_j with o_random • Break when the stopping threshold is met
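A sketch of the swap-cost test at the heart of this loop; `total_cost` and `try_swap` are hypothetical helper names, not from the slides:

```python
import numpy as np

def total_cost(V, medoids):
    """Sum of distances from each object to its nearest medoid."""
    V = np.asarray(V, dtype=float)
    M = np.asarray(medoids, dtype=float)
    return np.linalg.norm(V[:, None] - M[None], axis=2).min(axis=1).sum()

def try_swap(V, medoids, o_j, o_random):
    """Accept the swap of medoid o_j for non-medoid o_random
    only if it lowers the total cost (S < 0 on the slide)."""
    candidate = [o_random if m == o_j else m for m in medoids]
    S = total_cost(V, candidate) - total_cost(V, medoids)
    return candidate if S < 0 else medoids

V = [(0, 0), (1, 0), (5, 5), (6, 5), (9, 9)]
medoids = try_swap(V, [(0, 0), (9, 9)], o_j=(9, 9), o_random=(5, 5))
print(medoids)  # (5, 5) replaces (9, 9) because the total cost drops
```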

  21. K-medoids • Swapping cases [Figure: data objects p and cluster centers o_i, o_j before and after swapping o_j with o_random] • 1. p re-assigned to o_i • 2. p re-assigned to o_random • 3. no change • 4. p re-assigned to o_random

  22. K-medoids • PAM (Partitioning Around Medoids) • Algorithm • 5) For each of the k medoids o_j • 6) Examine all of the n−k non-medoid objects • 7) Swap o_j with o_new s.t. E_new = min E_i over all n−k candidates • Complexity • O(k(n−k)²) per iteration – very costly • Suitable for small databases

  23. Clustering Example • http://www.rzuser.uni-heidelberg.de/~mmaier4/clusteringdemo/applet.shtml

  24. Self Organizing Feature Map

  25. Self-Organizing Maps • Based on competitive learning (unsupervised) • Only one output neuron is activated at any one time • The winner-takes-all neuron, or winning neuron • In a Self-Organizing Map • Neurons are placed at the nodes of a lattice • One- or two-dimensional • Neurons become selectively tuned to input patterns • Through a competitive learning process • The locations of the tuned neurons become ordered • Forming a topographic map of the input patterns • The spatial locations of the neurons in the lattice reflect intrinsic statistical features of the input patterns

  26. Self-Organizing Maps • Topology-preserving transformation

  27. SOM as a Neural Model • A distinct feature of the human brain • It is organized in such a way that different sensory inputs are represented by topologically ordered computational maps • Computational map • A basic building block in the information-processing infrastructure of the nervous system • An array of neurons representing slightly differently tuned processors that operate on the sensory information-bearing signals in parallel

  28. Basic Feature-Mapping Models • Willshaw-von der Malsburg model (1976) • Proposed on biological grounds to explain retinotopic mapping from the retina to the visual cortex • Two 2-D lattices: presynaptic and postsynaptic neurons • The geometric proximity of presynaptic neurons is coded in the form of correlation, which is used in the postsynaptic lattice • Specialized to mappings where the input and output have the same dimension

  29. Basic Feature-Mapping Models • Kohonen model (1982) • Captures the essential features of computational maps in the brain • Yet remains computationally tractable • More general than, and has received more attention than, the Willshaw-von der Malsburg model • Capable of dimensionality reduction • A class of vector-coding algorithms

  30. Formation Process of SOM • After initialization of the synaptic weights, three essential processes • Competition • The neuron with the largest value of the discriminant function is selected • The winner of the competition • Cooperation • The spatial neighbors of the winning neuron are selected • Synaptic adaptation • The excited neurons adjust their synaptic weights

  31. Competitive Process • Input vector and synaptic weight vectors: x = [x1, x2, ..., xm]^T, w_j = [w_j1, w_j2, ..., w_jm]^T, j = 1, 2, ..., l • Best matching (winning) neuron: i(x) = arg min_j ||x − w_j||, j = 1, 2, ..., l • The winner determines the location where the topological neighborhood of excited neurons is to be centered

  32. Cooperative Process • For a winning neuron, the neurons in its immediate neighborhood are excited more than those farther away • The topological neighborhood decays smoothly with lateral distance • Symmetric about the maximum point defined by d_{j,i} = 0 • Monotonically decreasing to zero as d_{j,i} → ∞ • Neighborhood function, Gaussian case: h_{j,i(x)} = exp(−d²_{j,i} / 2σ²) • The size of the neighborhood shrinks with time as σ = σ(n) decreases (sketched below)
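A sketch of the Gaussian neighborhood with a shrinking width; the exponential schedule σ(n) = σ₀·exp(−n/τ) is one common choice, and the values of σ₀ and τ are illustrative:

```python
import numpy as np

def neighborhood(d_ji, n, sigma0=2.0, tau=1000.0):
    """Gaussian neighborhood h_{j,i} = exp(-d^2 / (2 sigma(n)^2)),
    where sigma(n) = sigma0 * exp(-n / tau) shrinks over time."""
    sigma = sigma0 * np.exp(-n / tau)
    return np.exp(-(d_ji ** 2) / (2.0 * sigma ** 2))

print(neighborhood(0.0, n=0))     # 1.0 at the winning neuron
print(neighborhood(3.0, n=0))     # smaller for distant neurons
print(neighborhood(3.0, n=2000))  # smaller still as sigma shrinks
```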

  33. Adaptive Process • The synaptic weight vector is changed in relation to the input vector: w_j(n+1) = w_j(n) + η(n) h_{j,i(x)}(n) (x − w_j(n)) • Applied to all neurons inside the neighborhood of the winning neuron i • Upon repeated presentation of the training data, the weights tend to follow the input distribution • Learning rate η(n): decays with time • May be decomposed into two phases • Self-organizing or ordering phase: topological ordering of the weight vectors • Convergence phase: after ordering, for accurate statistical quantification of the input space • A full training loop is sketched below
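A minimal SOM training loop combining the three processes (competition, cooperation, adaptation); the lattice size, decay schedules, and iteration count are illustrative assumptions, not values from the slides:

```python
import numpy as np

def train_som(X, grid=(10, 10), iters=5000, eta0=0.1, sigma0=3.0, tau=1000.0, seed=0):
    """Minimal SOM: competition (winner = nearest weight vector),
    cooperation (Gaussian neighborhood on the lattice), and
    adaptation (move weights toward the input)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    W = rng.random((rows * cols, X.shape[1]))          # synaptic weights
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for n in range(iters):
        x = X[rng.integers(len(X))]
        i = np.argmin(np.linalg.norm(W - x, axis=1))   # competition: winning neuron i(x)
        d2 = np.sum((coords - coords[i]) ** 2, axis=1) # lattice distances to the winner
        sigma = sigma0 * np.exp(-n / tau)              # shrinking neighborhood width
        h = np.exp(-d2 / (2.0 * sigma ** 2))           # cooperation: Gaussian neighborhood
        eta = eta0 * np.exp(-n / tau)                  # decaying learning rate
        W += eta * h[:, None] * (x - W)                # adaptation: weight update rule
    return W

X = np.random.default_rng(1).random((500, 2))          # uniform 2-D input distribution
W = train_som(X)
```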

  34. Summary of SOM • A continuous input space of activation patterns generated in accordance with a certain probability distribution • The topology of the network, in the form of a lattice of neurons, defines a discrete output space • A time-varying neighborhood function defined around the winning neuron • A learning rate that decreases gradually with time but never goes to zero

  35. SOFM Example (1): 2-D Lattice trained on a 2-D distribution

  36. SOFM Example (2): Phoneme Recognition • Phonotopic maps • Recognition result for "humppila"

  37. SOFM Example(3) • http://www-ti.informatik.uni-tuebingen.de/~goeppert/KohonenApp/KohonenApp.html • http://davis.wpi.edu/~matt/courses/soms/applet.html
