Ch. 9 Unsupervised Learning. Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009. Based on slides from Stephen Marsland and some slides from the Internet. Collected and modified by Longin Jan Latecki, Temple University.
Introduction • Suppose we don’t have good training data • Hard and boring to generate targets • Don’t always know target values • Biologically implausible to have targets? • Two cases: • Know when we’ve got it right • No external information at all
Unsupervised Learning • We have no external error information • No task-specific error criterion • Generate internal error • Must be general • Usual method is to cluster data together according to activation of neurons • Competitive learning
Competitive Learning • Set of neurons compete to fire • Neuron that ‘best matches’ the input (has the highest activation) fires • Winner-take-all • Neurons ‘specialise’ to recognise some input • Grandmother cells
The k-Means Algorithm • Suppose that you know the number of clusters, but not what the clusters look like • How do you assign each data point to a cluster? • Position k centers at random in the space • Assign each point to its nearest center according to some chosen distance measure • Move the center to the mean of the points that it represents • Iterate
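The four steps above can be sketched in plain Python (a minimal illustration on made-up 2-D tuple data, not the book's code):

```python
import random

def kmeans(points, k, iterations=100):
    """Minimal k-means for 2-D points using squared Euclidean distance."""
    # position k centers at random (here: at k random data points)
    centers = list(random.sample(points, k))
    for _ in range(iterations):
        # assign each point to its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                + (p[1] - centers[c][1]) ** 2)
            clusters[j].append(p)
        # move each center to the mean of the points it represents
        for j, cl in enumerate(clusters):
            if cl:
                centers[j] = (sum(p[0] for p in cl) / len(cl),
                              sum(p[1] for p in cl) / len(cl))
    return centers

random.seed(0)
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = sorted(kmeans(data, 2))
```

With two well-separated clusters like these, the centers converge to the two cluster means, (1/3, 1/3) and (31/3, 31/3); on harder data the result depends on the random starting positions, which is exactly the local-minimum problem illustrated next.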
Euclidean Distance • [Figure: two points in the plane; the Euclidean distance between (x1, y1) and (x2, y2) is sqrt((x1 − x2)² + (y1 − y2)²).]
The k-Means Algorithm • [Figure: a 2-D data set clustered with 4 means.]
The k-Means Algorithm • [Figure: local-minimum solutions on the same data.]
The k-Means Algorithm • [Figure: more perfectly valid, but wrong, solutions.]
The k-Means Algorithm • [Figure: if you don’t know the number of means, the problem is worse.]
The k-Means Algorithm • One solution is to run the algorithm for many values of k • Pick the one with the lowest error (beware of overfitting: adding more means always lowers the error) • Run the algorithm from many starting points • Avoids local minima? • What about noise? • Median instead of mean?
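On the noise question: a single outlier drags the mean of a cluster a long way, while the median barely moves — a quick illustration with toy numbers (not from the book):

```python
# one cluster of readings around 1.0, plus a single noisy outlier
values = [1.0, 1.1, 0.9, 1.05, 100.0]

mean = sum(values) / len(values)           # pulled far away by the outlier
median = sorted(values)[len(values) // 2]  # stays inside the cluster
```

Here the mean is about 20.8 while the median is 1.05, which is why a median-based update (k-medians) is more robust to noisy data.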
k-Means Neural Network • Neuron activation measures distance between input and neuron position in weight space
Weight Space • Imagine we plot neuronal positions according to their weights • [Figure: neurons plotted as points in a space with axes w1, w2, w3.]
k-Means Neural Network • Use winner-take-all neurons • Winning neuron is the one closest to input • Best-matching cluster • How do we do training? • Update weights - move neuron positions • Move winning neuron towards current input • Ignore the rest
Normalisation • Suppose the weights are: • (0.2, 0.2, -0.1) • (0.15, -0.15, 0.1) • (10, 10, 10) • The input is (0.2, 0.2, -0.1)
Normalisation • For a perfect match with the first neuron, compare the dot products: • 0.2*0.2 + 0.2*0.2 + -0.1*-0.1 = 0.09 • 0.15*0.2 + -0.15*0.2 + 0.1*-0.1 = -0.01 • 10*0.2 + 10*0.2 + 10*-0.1 = 3 • Can only compare activations if the weights are about the same size
Normalisation • Make the distance between each neuron and the origin be 1 • All neurons lie on the unit hypersphere • Need to stop the weights growing unboundedly
k-Means Neural Network • Normalise inputs too • Then use the dot product of the weights and the input as the activation, and pick the neuron with the highest value as the winner • That’s it • Simple and easy
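A sketch of why the normalisation matters, reusing the weights from the slides above (illustrative plain Python, not the book's code):

```python
import math

def normalise(v):
    """Scale a vector to length 1, so it lies on the unit hypersphere."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def activation(w, x):
    """Neuron activation as the dot product of weights and input."""
    return sum(wi * xi for wi, xi in zip(w, x))

weights = [(0.2, 0.2, -0.1), (0.15, -0.15, 0.1), (10, 10, 10)]
x = normalise((0.2, 0.2, -0.1))

# with raw weights, the large (10, 10, 10) neuron dominates (cf. the 3 above)...
raw_winner = max(range(3), key=lambda i: activation(weights[i], x))
# ...but with normalised weights the genuinely matching neuron wins
winner = max(range(3), key=lambda i: activation(normalise(weights[i]), x))
```

After normalisation the first neuron wins (its activation is exactly 1, a perfect match), whereas on the raw weights the third neuron wins purely because its weights are large.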
Vector Quantisation (VQ) • Think about the problem of data compression • Want to store a set of data (say, sensor readings) in as small an amount of memory as possible • We don’t mind some loss of accuracy • Could make a codebook of typical data and index each data point by reference to a codebook entry • Thus, VQ is a coding method: each data point x is encoded by replacing it with the closest codeword
Vector Quantisation • The codebook (indices 0–4: 10110, 01001, 11010, 11100, 11001) is sent to the receiver • At least 25 bits (5 codewords of 5 bits)
Vector Quantisation • The data 01001 11100 11101 00101 11110 is encoded one word at a time • 01001 is in the codebook, so its index 1 is sent - 3 bits • 11100 is also in the codebook: index 3 is sent - 3 bits • 11101 is not in the codebook, so we pick the nearest codeword according to some measure - here 11100 - and send index 3: still 3 bits, but information is lost
Vector Quantisation • The data is sent as 1 3 3 1 3, which takes 15 bits instead of 25 • Of course, sending the codebook is inefficient for this little data, but with a lot more data the cost would be reduced
Vector Quantisation • The problem is that the receiver decodes only 2 distinct codewords - 01001 and 11100 - instead of the 5 distinct words we had • If the codebook had been picked more carefully, this would have been a lot better • How can you pick the codebook? • Usually k-means is used - Learning Vector Quantisation
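The encoding step can be sketched using the 5-bit codebook from the slides above, with Hamming distance as the measure (note the last word, 11110, is equally close to codewords 0 and 3, so the result depends on tie-breaking; `min` below picks the lower index, 0):

```python
codebook = ["10110", "01001", "11010", "11100", "11001"]

def hamming(a, b):
    """Number of bit positions in which two words differ."""
    return sum(ca != cb for ca, cb in zip(a, b))

def encode(word):
    """Replace a data word by the index of its nearest codeword."""
    return min(range(len(codebook)), key=lambda i: hamming(word, codebook[i]))

data = ["01001", "11100", "11101", "00101", "11110"]
indices = [encode(w) for w in data]        # 5 indices x 3 bits = 15 bits sent
decoded = [codebook[i] for i in indices]   # the lossy reconstruction
```

Comparing `decoded` with `data` shows exactly where the lossy coding loses information: words already in the codebook survive, the rest are replaced by their nearest codeword.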
Voronoi Tessellation • Join neighbouring points • Draw lines equidistant from each pair of points • These are the perpendicular bisectors of the lines joining the points
Two Dimensional Voronoi Diagram Codewords in 2-dimensional space. Input vectors are marked with an x, codewords are marked with red circles, and the Voronoi regions are separated with boundary lines.
Self Organizing Maps • Self-organizing maps (SOMs) are a data visualization technique invented by Professor Teuvo Kohonen • Also called Kohonen Networks, Competitive Learning, Winner-Take-All Learning • Generally reduces the dimensions of data through the use of self-organizing neural networks • Useful for data visualization; humans cannot visualize high dimensional data so this is often a useful technique to make sense of large data sets
Neurons in the Brain • Although heterogeneous, at a low level the brain is composed of neurons • A neuron receives input from other neurons (generally thousands) through its synapses • Inputs are approximately summed • When the input exceeds a threshold the neuron sends an electrical spike that travels from the body, down the axon, to the next neuron(s)
Feature Maps • [Figure: a tonotopic feature map - neurons responding to low, high, and higher pitches arranged in order.]
Feature Maps • Sounds that are similar (‘close together’) excite neurons that are near to each other • Sounds that are very different excite neurons that are a long way off • This is known as topology preservation • The ordering of the inputs is preserved • If possible (perfectly topology-preserving)
Topology Preservation • [Figure: inputs mapped to outputs with their ordering preserved.]
Self-Organizing Maps (Kohonen Maps) (from Introduction to Cognitive Science, Lecture 21: Self-Organizing Maps) • Common output-layer structures: • One-dimensional (completely interconnected for determining the “winner” unit) • Two-dimensional (connections omitted, only neighborhood relations shown) • [Figure: the neighborhood of neuron i in each structure.]
The Self-Organising Map • [Figure: a grid of output neurons, each connected to all of the inputs.]
Neuron Connections? • We don’t actually need the inhibitory connections • Just use a neighbourhood of positive connections • How large should this neighbourhood be? • Early in learning, network is unordered • Big neighbourhood • Later on, just fine-tuning network • Small neighbourhood
The Self-Organising Map • The weight vectors are randomly initialised • Input vectors are presented to the network • The neurons are activated according to the Euclidean distance between the input and the weight vector • The winning node has its weight vector moved closer to the input • So do the neighbours of the winning node • Over time, the network self-organises so that the input topology is preserved
Self-Organisation • Global ordering from local interactions • Each neuron sees its neighbours • The whole network becomes ordered • Understanding self-organisation is part of complexity science • Appears all over the place
Basic “Winner Take All” Network • Two-layer network • Input units I1, I2, I3; output units O1, O2; each input unit i is connected to each output unit j with weight Wi,j • [Figure: the input layer fully connected to the output layer.]
Basic Algorithm (the same as the k-Means Neural Network) • Initialize the map (randomly assign weights) • Loop over training examples • Assign input unit values according to the values in the current example • Find the “winner”, i.e. the output unit that most closely matches the input units, using some distance metric - e.g., over all output units j = 1 to m, find the one that minimizes the sum over input units i = 1 to n of (Ii - Wi,j)^2 • Modify weights on the winner to more closely match the input: Wi,j := Wi,j + c(Ii - Wi,j), where c is a small positive learning constant that usually decreases as the learning proceeds
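The loop above, written out for a tiny network with n = 3 inputs and m = 2 outputs (an illustrative sketch; the two training examples and the learning constant c are made up):

```python
import random

n, m = 3, 2                         # input units, output units
random.seed(1)
w = [[random.random() for _ in range(n)] for _ in range(m)]  # w[j][i]

def winner(x):
    """Output unit whose weight vector is closest to the input."""
    return min(range(m),
               key=lambda j: sum((x[i] - w[j][i]) ** 2 for i in range(n)))

def train(x, c=0.3):
    """Move only the winner's weights towards the input."""
    j = winner(x)
    for i in range(n):
        w[j][i] += c * (x[i] - w[j][i])

for _ in range(50):                 # alternate two training examples
    train((1.0, 0.0, 0.0))
    train((0.0, 0.0, 1.0))
```

After training, the output units have specialised: each unit's weight vector has moved onto one of the two inputs, so the two examples have different winners — the unsupervised clustering described on the next slide.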
Result of Algorithm • Initially, some output nodes will randomly be a little closer to some particular type of input • These nodes become “winners” and the weights move them even closer to the inputs • Over time nodes in the output become representative prototypes for examples in the input • Note there is no supervised training here • Classification: • Given new input, the class is the output node that is the winner
Typical Usage: 2D Feature Map • In typical usage the output nodes form a 2D “map” organized in a grid-like fashion and we update weights in a neighborhood around the winner • [Figure: a 5×5 grid of output nodes O11–O55, with input units I1–I3 connected to the grid.]
Modified Algorithm • Initialize Map (randomly assign weights) • Loop over training examples • Assign input unit values according to the values in the current example • Find the “winner”, i.e. the output unit that most closely matches the input units, using some distance metric, e.g. • Modify weights on the winner to more closely match the input • Modify weights in a neighborhood around the winner so the neighbors on the 2D map also become closer to the input • Over time this will tend to cluster similar items closer on the map
Unsupervised Learning in SOMs • For n-dimensional input space and m output neurons: (1) Choose random weight vector wi for neuron i, i = 1, ..., m (2) Choose random input x (3) Determine winner neuron k: ||wk – x|| = mini ||wi – x|| (Euclidean distance) (4) Update the weight vectors of all neurons i in the neighborhood of neuron k: wi := wi + η·h(i, k)·(x – wi) (wi is shifted towards x) (5) If convergence criterion met, STOP. Otherwise, narrow the neighborhood function h and learning parameter η and go to (2).
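Steps (1)–(5) as a runnable sketch: a 1-D chain of 10 neurons self-organising over random inputs from the unit square (the Gaussian neighbourhood and the exponential decay schedules for η and σ are made-up but typical choices):

```python
import math
import random

random.seed(0)
m = 10
# (1) random weight vectors w_i in the 2-D input space
w = [[random.random(), random.random()] for _ in range(m)]

def h(i, k, sigma):
    """Gaussian neighbourhood: 1 at the winner, decaying with grid distance."""
    return math.exp(-((i - k) ** 2) / (2 * sigma ** 2))

for t in range(2000):
    # (5) narrow the neighbourhood and learning rate over time
    eta = 0.5 * math.exp(-t / 1000)
    sigma = 0.1 + 2.9 * math.exp(-t / 1000)
    # (2) choose a random input x
    x = [random.random(), random.random()]
    # (3) winner k: the neuron with the closest weight vector
    k = min(range(m),
            key=lambda i: (w[i][0] - x[0]) ** 2 + (w[i][1] - x[1]) ** 2)
    # (4) shift every neuron towards x, scaled by the neighbourhood
    for i in range(m):
        g = eta * h(i, k, sigma)
        w[i][0] += g * (x[0] - w[i][0])
        w[i][1] += g * (x[1] - w[i][1])
```

Every update is a convex combination of values in [0, 1], so the weights stay inside the unit square; plotting the chain w_0 … w_9 shows it unfolding from a random tangle to cover the input space.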
The Self-Organising Map • [Figure: the map before training (large neighbourhood).]
The Self-Organising Map • [Figure: the map after training (small neighbourhood).]