
Unsupervised learning
Rudolf Mak, TU/e Computer Science



1. Unsupervised learning
Unsupervised learning is the process of finding structure, patterns, or correlations in the given data. We distinguish:
• Unsupervised Hebbian learning
• Principal component analysis
• Unsupervised competitive learning
• Clustering
• Data compression

2. Unsupervised Competitive Learning
In unsupervised competitive learning the neurons take part in a competition for each input. The winner of the competition, and sometimes some other neurons, are allowed to change their weights.
• In simple competitive learning only the winner is allowed to learn (change its weights).
• In self-organizing maps other neurons in the neighborhood of the winner may also learn.

3. Applications
• Speech recognition
• OCR, e.g. handwritten characters
• Image compression (using code-book vectors)
• Texture maps
• Classification of cloud patterns (cumulus etc.)
• Contextual maps

4. Network topology
For simple competitive learning the network consists of a single layer of linear neurons, each connected to all inputs. (Lateral inhibition is not indicated in the figure.)

5. Definition of the Winner
There are various criteria to define which neuron i becomes the winner of the competition for input x. When the weights are normalized these criteria are identical, as can be seen from the equation (a reconstruction is given below).
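The equation is only a figure on the slide; a plausible reconstruction of the two standard winner criteria (largest inner product, or smallest Euclidean distance, neither of which is spelled out in this transcript) and of the identity relating them is

i = \arg\max_{j} \, w_j^{\top} x
\qquad\text{or}\qquad
i = \arg\min_{j} \, \lVert x - w_j \rVert ,
\qquad\text{where}\qquad
\lVert x - w_j \rVert^{2} = \lVert x \rVert^{2} - 2\, w_j^{\top} x + \lVert w_j \rVert^{2} .

When all weight vectors are normalized, the term \lVert w_j \rVert^{2} is the same for every neuron, so minimizing the distance is equivalent to maximizing the inner product.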

6. Training Set
A training set for unsupervised learning consists only of input vectors (no targets!). Given a network with weight matrix W, the training set can be partitioned into clusters Xi according to the classification made by the network.

7. Simple Competitive Learning (incremental version)
This technique is sometimes called 'the winner takes it all': for each input only the winning neuron updates its weights (a sketch of the update rule is given below).
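The algorithm itself is only a figure on the slide; a minimal sketch of an incremental winner-takes-all update, assuming a Euclidean-distance winner and a fixed learning rate eta (the function name and defaults are illustrative, not taken from the slides):

import numpy as np

def scl_incremental(X, k, eta=0.1, epochs=20, seed=0):
    # Simple competitive learning, incremental ('winner takes all') version.
    # X: (n_samples, n_features) array of input vectors; k: number of neurons.
    rng = np.random.default_rng(seed)
    # Initialize the weight vectors with k distinct training vectors.
    W = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            # The winner is the neuron whose weight vector is closest to x.
            i = np.argmin(np.linalg.norm(W - x, axis=1))
            # Only the winner learns: move its weights a fraction eta towards x.
            W[i] += eta * (x - W[i])
    return W

Note that with a fixed eta the weight vectors keep oscillating around the cluster centers, which is exactly the convergence issue discussed on the next slide.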

8. Convergence (incremental version)
• Unless the learning parameter tends to 0, the incremental version of simple competitive learning does not converge.
• In the absence of convergence the weight vectors oscillate around the centers of their clusters.

9. Simple Competitive Learning (batch version)

10. Cluster Means
Let ni be the number of elements in cluster Xi, and define the mean mi of cluster i as the average of the vectors in Xi. In the batch version the weights of the winning neuron are moved in the direction of the mean of its cluster (a reconstruction of the formulas is given below).
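The formulas are only figures on the slide; a plausible reconstruction of the cluster mean and of the usual batch update (summing the incremental updates of all inputs won by neuron i, with learning rate eta) is

m_i = \frac{1}{n_i} \sum_{x \in X_i} x ,
\qquad
w_i^{\text{new}} = w_i + \eta \sum_{x \in X_i} (x - w_i) = w_i + \eta\, n_i\, (m_i - w_i) ,

so each batch step moves w_i a fraction \eta n_i of the way toward the mean m_i of its cluster.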

11. Data Compression
The final values of the weight vectors are sometimes called code-book vectors. This nomenclature stems from data compression applications.
• Compress (encode): map vector x to code-word i = win(W, x).
• Decompress (decode): map code-word i to code-book vector wi, which is presumably close to the original vector x.
Note that this is a form of lossy data compression.
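A minimal sketch of this encode/decode scheme (assuming a trained weight matrix W such as the one returned by scl_incremental above; the function names are illustrative):

import numpy as np

def encode(W, x):
    # Compress: map input vector x to the code-word i, the index of the nearest code-book vector.
    return int(np.argmin(np.linalg.norm(W - x, axis=1)))

def decode(W, i):
    # Decompress: map code-word i back to its code-book vector, an approximation of the original x.
    return W[i]

The compression is lossy because decode(W, encode(W, x)) returns the code-book vector, not x itself.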

12. Convergence (batch version)
• In the batch version of simple competitive learning the weight vector wi can be shown to converge to the mean of the input vectors that have i as winning neuron.
• In fact the batch version is a gradient descent method that converges to a local minimum of a suitably chosen error function.

13. Error Function
For a network with weight matrix W and a given training set we define the error function E(W) as follows (a reconstruction of the slide's formula is given below).
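The formula itself appears only as a figure on the slide; a plausible reconstruction, consistent with the gradient and cluster-mean slides (assuming X_i denotes the set of training vectors for which neuron i wins), is the quantization error

E(W) = \frac{1}{2} \sum_{i=1}^{k} \sum_{x \in X_i} \lVert x - w_i \rVert^{2} .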

14. Gradients of the Error Function
From the definition of E(W) it follows that the gradient of the error in the i-th cluster is given by the expression reconstructed below.
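Assuming the reconstructed error function above, the gradient with respect to the weight vector w_i of the i-th cluster would be

\nabla_{w_i} E(W) = -\sum_{x \in X_i} (x - w_i) = -\, n_i \, (m_i - w_i) ,

so the gradient vanishes exactly when w_i equals the cluster mean m_i, which is what the next slide states.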

15. Minima are Cluster Means
After termination of the learning algorithm all gradients are zero, i.e. wi = mi for all i, 1 ≤ i ≤ k. So after learning, the weight vectors of the non-empty clusters have converged to the mean vectors of those clusters. Note that learning stops in a local minimum, so better clusterings may exist.

16. Dead neurons & minima

17. K-means Clustering as SCL
• K-means clustering is a popular statistical method to organize multi-dimensional data into K groups.
• K-means clustering can be seen as an instance of simple competitive learning, where each neuron has its own learning rate. Slides 18 to 25 transform the batch SCL algorithm step by step into K-means; a sketch of the resulting algorithm is given after slide 25.

18. SCL (batch version)

19. Move learning factor outside repetition

20. Set individual learning rate

21. Split such that

22. Eliminate

23. Introduce separate cluster variables

24. Reuse mj: K-means clustering I

25. K-means Clustering II
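The algorithms on slides 18 to 25 appear only as figures; a minimal sketch of the batch K-means algorithm that the derivation arrives at (names and structure are illustrative, not taken from the slides):

import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    # Batch K-means: alternate between assigning every vector to the nearest
    # cluster mean and recomputing the means, until the means are stable.
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iters):
        # Assignment step: each vector joins the cluster with the nearest mean.
        labels = np.argmin(
            np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2), axis=1)
        # Update step: each non-empty cluster's mean becomes the mean of its members.
        new_means = means.copy()
        for i in range(k):
            if np.any(labels == i):          # leave 'dead' (empty) clusters unchanged
                new_means[i] = X[labels == i].mean(axis=0)
        if np.allclose(new_means, means):    # stable means: a (possibly local) minimum
            break
        means = new_means
    return means, labels

Each of the two steps corresponds to one of the two facts in the convergence argument on the next slide: reassigning vectors does not increase the error, and updating the means does not increase the error.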

26. Convergence of K-means Clustering
The convergence proof of the K-means clustering algorithm involves showing two facts:
• Reassigning a vector to a different cluster does not increase the error function.
• Updating the mean of a cluster does not increase the error function.

27. Reassigning a Vector
Assume vector x(p) moves from cluster j to cluster i. Then it follows that the contribution of x(p) to the error does not increase (a reconstruction of the argument is given below).
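The inequalities are only figures on the slide; a plausible reconstruction (assuming vectors are always reassigned to the cluster with the nearest mean) is

\lVert x^{(p)} - m_i \rVert \le \lVert x^{(p)} - m_j \rVert
\quad\Longrightarrow\quad
E_{\text{new}} - E_{\text{old}} = \tfrac{1}{2} \bigl( \lVert x^{(p)} - m_i \rVert^{2} - \lVert x^{(p)} - m_j \rVert^{2} \bigr) \le 0 ,

hence the reassignment cannot increase the error function.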

28. Updating the Mean of a Cluster
Consider cluster Xi with its old mean and its new mean. Updating the mean to the new value does not increase the error function (a reconstruction of the argument is given below).
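The derivation is only a figure on the slide; the standard argument, assuming m_i^{new} is the mean of the current members of X_i and m is any other point (in particular the old mean), uses the identity

\sum_{x \in X_i} \lVert x - m \rVert^{2}
  = \sum_{x \in X_i} \lVert x - m_i^{\text{new}} \rVert^{2} + n_i \, \lVert m - m_i^{\text{new}} \rVert^{2}
  \;\ge\; \sum_{x \in X_i} \lVert x - m_i^{\text{new}} \rVert^{2} ,

so replacing the old mean by the new mean cannot increase the error function.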

29. Non-optimal Stable Clusters
(Figure: a one-dimensional example with the values 0, 3, 6, 8, 10, illustrating a clustering that is stable but not optimal.)
