Ch. 9 Unsupervised Learning
Stephen Marsland, Machine Learning: An Algorithmic Perspective, CRC 2009
Based on slides from Stephen Marsland and some slides from the Internet
Collected and modified by Longin Jan Latecki, Temple University, latecki@temple.edu


Introduction
• Suppose we don't have good training data
• Hard and boring to generate targets
• Don't always know target values
• Biologically implausible to have targets?
• Two cases:
  • Know when we've got it right
  • No external information at all

Unsupervised Learning
• We have no external error information
• No task-specific error criterion
• Generate an internal error, which must be general
• The usual method is to cluster data together according to the activation of neurons
• Competitive learning

Competitive Learning
• A set of neurons compete to fire
• The neuron that 'best matches' the input (has the highest activation) fires
• Winner-take-all
• Neurons 'specialise' to recognise some input
• Grandmother cells

The k-Means Algorithm
• Suppose that you know the number of clusters, but not what the clusters look like
• How do you assign each data point to a cluster?
• Position k centers at random in the space
• Assign each point to its nearest center according to some chosen distance measure
• Move the center to the mean of the points that it represents
• Iterate (a sketch of these steps follows below)
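The steps above can be written directly in NumPy. This is a minimal sketch, not the book's implementation: the function and variable names are mine, Euclidean distance is assumed, and the centers are initialised at randomly chosen data points.

```python
import numpy as np

def kmeans(data, k, n_iters=100, seed=0):
    """Plain k-means: data is an (N, d) array, k the number of clusters."""
    rng = np.random.default_rng(seed)
    # Position k centers at random in the space (here: at randomly chosen data points)
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each point to its nearest center (Euclidean distance)
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of the points it represents
        new_centers = np.array([data[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break  # assignments have stabilised
        centers = new_centers
    return centers, labels
```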

The k-Means Algorithm
[Figure: a scatter of data points with candidate cluster centers placed among them]
If you don't know the number of means, the problem is worse.

The k-Means Algorithm
• One solution is to run the algorithm for many values of k (see the sketch after this list)
  • Pick the one with the lowest error
  • Up to overfitting
• Run the algorithm from many starting points
  • Avoids local minima?
• What about noise?
  • Median instead of mean?
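One way to compare values of k is to record, for each k, the lowest within-cluster error found over several random restarts. A rough sketch using the kmeans function above; the ranges and names are illustrative, and note that the raw error keeps falling as k grows, which is exactly the overfitting caveat above.

```python
def error_curve(data, k_values=range(1, 10), n_restarts=10):
    """Lowest within-cluster sum of squared errors found for each k."""
    errors = {}
    for k in k_values:
        best = np.inf
        for seed in range(n_restarts):  # several starting points help avoid poor local minima
            centers, labels = kmeans(data, k, seed=seed)
            err = float(((data - centers[labels]) ** 2).sum())
            best = min(best, err)
        errors[k] = best
    return errors  # plot errors vs. k; taking the minimum alone always favours larger k
```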

k-Means Neural Network
• Neuron activation measures the distance between the input and the neuron's position in weight space

Weight Space
• Imagine we plot neuronal positions according to their weights
[Figure: neurons plotted by their weight values (w1, w2, w3)]

k-Means Neural Network
• Use winner-take-all neurons
  • The winning neuron is the one closest to the input
  • Best-matching cluster
• How do we do training? (see the sketch below)
  • Update weights: move neuron positions
  • Move the winning neuron towards the current input
  • Ignore the rest
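A minimal sketch of one training step for this winner-take-all network, using NumPy as above; the learning-rate name eta is an assumption, not from the slides.

```python
def winner_take_all_step(weights, x, eta=0.1):
    """weights: (m, d) array of neuron positions; x: one input vector of shape (d,)."""
    # The winning neuron is the one closest to the input (best-matching cluster)
    winner = np.linalg.norm(weights - x, axis=1).argmin()
    # Move only the winning neuron towards the current input; ignore the rest
    weights[winner] += eta * (x - weights[winner])
    return winner
```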

Normalisation
• Suppose the weights are:
  • (0.2, 0.2, -0.1)
  • (0.15, -0.15, 0.1)
  • (10, 10, 10)
• The input is (0.2, 0.2, -0.1)

Normalisation
• Even though the input is a perfect match with the first neuron, the third neuron gives the largest activation:
  • 0.2*0.2 + 0.2*0.2 + (-0.1)*(-0.1) = 0.09
  • 0.15*0.2 + (-0.15)*0.2 + 0.1*(-0.1) = -0.01
  • 10*0.2 + 10*0.2 + 10*(-0.1) = 3
• We can only compare activations if the weights are about the same size

Normalisation
• Make the distance between each neuron and the origin be 1
• All neurons then lie on the unit hypersphere
• Need to stop the weights growing unboundedly

k-Means Neural Network
• Normalise the inputs too
• Then use the winner-take-all rule: the winner is the neuron with the largest activation (the dot product w·x), and its weights are moved towards the input, for example Δw = η(x - w), renormalising afterwards
• That's it: simple and easy
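Putting the last few slides together, a hedged sketch of the normalised version: inputs and weights are kept at unit length, the activation is the dot product, and the winner is nudged towards the input and renormalised. The particular update rule and eta are assumptions consistent with the competitive-learning description above, not the book's exact code.

```python
def normalise(v):
    return v / np.linalg.norm(v)

def normalised_step(weights, x, eta=0.1):
    """weights: (m, d) array whose rows have unit length; x: one input vector."""
    x = normalise(x)
    activations = weights @ x          # dot products are comparable because all
    winner = activations.argmax()      # weight vectors have the same (unit) size
    weights[winner] += eta * (x - weights[winner])   # move the winner towards the input
    weights[winner] = normalise(weights[winner])     # keep it on the unit hypersphere
    return winner
```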

Vector Quantisation (VQ)
• Think about the problem of data compression
• We want to store a set of data (say, sensor readings) in as small an amount of memory as possible
• We don't mind some loss of accuracy
• We could make a codebook of typical data and index each data point by reference to a codebook entry
• Thus, VQ is a coding method that maps each data point x to the closest codeword, i.e. we encode x by replacing it with the index of the closest codeword

Vector Quantisation
The codebook (index → codeword): 0 → 10110, 1 → 01001, 2 → 11010, 3 → 11100, 4 → 11001
The codebook is sent to the receiver (at least 30 bits).

The data to send: 01001, 11100, 11101, 00101, 11110
• 01001 matches codeword 1 exactly, so index 1 is sent (3 bits)
• 11100 matches codeword 3 exactly, so index 3 is sent (3 bits)
• 11101, 00101 and 11110 are not in the codebook: pick the nearest codeword according to some measure and send its index (3 bits each, but information is lost)
• The data is sent as 1 3 3 1 3, which takes 15 bits instead of 30

Of course, sending the codebook is inefficient for this data, but if there were a lot more data the cost would be reduced.
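A small sketch of the encoding in this example. The slides do not say which distance measure is used, so Hamming distance over the bit strings is assumed here, and ties between equally near codewords are broken by the lowest index.

```python
codebook = ["10110", "01001", "11010", "11100", "11001"]   # indices 0..4
data = ["01001", "11100", "11101", "00101", "11110"]

def hamming(a, b):
    # Number of bit positions in which the two strings differ
    return sum(c1 != c2 for c1, c2 in zip(a, b))

# Encode: replace each data point by the index of its nearest codeword
indices = [min(range(len(codebook)), key=lambda i: hamming(codebook[i], x)) for x in data]
print(indices)                         # the transmitted indices, 3 bits each
# Decode: the receiver looks each index up in the codebook
# (lossy for the points that were not exact codewords)
print([codebook[i] for i in indices])
```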

Vector Quantisation
• The problem is that we have effectively sent only 2 different pieces of data (01001 and 11100) instead of the 5 we had
• If the codebook had been picked more carefully, this would have been a lot better
• How can you pick the codebook?
  • Usually k-means is used; this is learning vector quantisation

Voronoi Tessellation
• Join neighbouring points
• Draw the lines equidistant to each pair of points
• These are the perpendicular bisectors of the lines joining the points

Two Dimensional Voronoi Diagram Codewords in 2-dimensional space. Input vectors are marked with an x, codewords are marked with red circles, and the Voronoi regions are separated with boundary lines.

Self Organizing Maps
• Self-organizing maps (SOMs) are a data visualization technique invented by Professor Teuvo Kohonen
• Also called Kohonen Networks, Competitive Learning, Winner-Take-All Learning
• Generally reduce the dimensions of data through the use of self-organizing neural networks
• Useful for data visualization; humans cannot visualize high-dimensional data, so this is often a useful technique for making sense of large data sets

Neurons in the Brain
• Although heterogeneous, at a low level the brain is composed of neurons
• A neuron receives input from other neurons (generally thousands) through its synapses
• Inputs are approximately summed
• When the input exceeds a threshold, the neuron sends an electrical spike that travels from the body, down the axon, to the next neuron(s)

Feature Maps
• Sounds that are similar ('close together') excite neurons that are near to each other
• Sounds that are very different excite neurons that are a long way off
• This is known as topology preservation
• The ordering of the inputs is preserved
  • If possible (perfectly topology-preserving)

Self-Organizing Maps (Kohonen Maps)
• Common output-layer structures:
  • One-dimensional (completely interconnected for determining the "winner" unit)
  • Two-dimensional (connections omitted, only neighborhood relations shown)
[Figure: the neighborhood of neuron i in each structure]

Neuron Connections?
• We don't actually need the inhibitory connections
• Just use a neighbourhood of positive connections
• How large should this neighbourhood be?
  • Early in learning, the network is unordered: big neighbourhood
  • Later on, just fine-tuning the network: small neighbourhood

The Self-Organising Map
• The weight vectors are randomly initialised
• Input vectors are presented to the network
• The neurons are activated in proportion to the Euclidean distance between the input and the weight vector
• The winning node has its weight vector moved closer to the input
• So do the neighbours of the winning node
• Over time, the network self-organises so that the input topology is preserved

Self-Organisation
• Global ordering from local interactions
• Each neuron sees only its neighbours
• The whole network becomes ordered
• Understanding self-organisation is part of complexity science
• It appears all over the place

Basic "Winner Take All" Network
• Two-layer network
• Input units and output units; each input unit is connected to each output unit
[Figure: input layer I1-I3 fully connected to output layer O1-O2 via weights Wi,j]

Basic Algorithm (the same as the k-Means Neural Network)
• Initialize the map (randomly assign weights)
• Loop over the training examples:
  • Assign input unit values according to the values in the current example
  • Find the "winner", i.e. the output unit that most closely matches the input units, using some distance metric: for all output units j = 1 to m and input units i = 1 to n, find the j that minimizes sum_i (xi - wij)^2
  • Modify the weights on the winner to more closely match the input: wij := wij + c*(xi - wij), where c is a small positive learning constant that usually decreases as the learning proceeds

Result of Algorithm
• Initially, some output nodes will randomly be a little closer to some particular type of input
• These nodes become "winners" and the weights move them even closer to the inputs
• Over time, nodes in the output become representative prototypes for examples in the input
• Note there is no supervised training here
• Classification: given a new input, the class is the output node that is the winner

Typical Usage: 2D Feature Map
• In typical usage the output nodes form a 2D "map" organized in a grid-like fashion, and we update weights in a neighborhood around the winner
[Figure: input layer I1-I3 connected to a 5x5 grid of output nodes O11-O55]

Modified Algorithm
• Initialize the map (randomly assign weights)
• Loop over the training examples:
  • Assign input unit values according to the values in the current example
  • Find the "winner", i.e. the output unit that most closely matches the input units, using some distance metric
  • Modify the weights on the winner to more closely match the input
  • Modify the weights in a neighborhood around the winner, so the neighbors on the 2D map also become closer to the input
• Over time this will tend to cluster similar items closer together on the map

Unsupervised Learning in SOMs
For an n-dimensional input space and m output neurons:
(1) Choose a random weight vector wi for each neuron i, i = 1, ..., m
(2) Choose a random input x
(3) Determine the winner neuron k: ||wk - x|| = min_i ||wi - x|| (Euclidean distance)
(4) Update the weight vectors of all neurons i in the neighborhood of neuron k: wi := wi + η·h(i, k)·(x - wi), so each wi is shifted towards x
(5) If the convergence criterion is met, STOP; otherwise, narrow the neighborhood function h and the learning parameter η and go to (2)
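A compact sketch of steps (1)-(5) for a two-dimensional grid of output neurons. The Gaussian neighbourhood function and the linear decay schedules for η and the neighbourhood width are assumptions; the slides only say that both should shrink over time.

```python
import numpy as np

def train_som(data, grid=(10, 10), n_steps=5000, eta0=0.5, sigma0=3.0, seed=0):
    """Self-organising map: data is (N, d); returns weights of shape (rows, cols, d)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    d = data.shape[1]
    # (1) Random initial weight vector for each output neuron
    weights = rng.random((rows, cols, d))
    # Grid coordinates of every output neuron, used by the neighbourhood function
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for t in range(n_steps):
        # (2) Choose a random input
        x = data[rng.integers(len(data))]
        # (3) Winner = neuron whose weight vector is nearest the input
        dists = np.linalg.norm(weights - x, axis=2)
        k = np.unravel_index(dists.argmin(), dists.shape)
        # (5) Shrink the learning rate and neighbourhood width over time (linear decay assumed)
        frac = t / n_steps
        eta = eta0 * (1.0 - frac)
        sigma = sigma0 * (1.0 - frac) + 0.5
        # Gaussian neighbourhood h(i, k) around the winner on the 2D grid
        grid_dist2 = ((coords - np.array(k)) ** 2).sum(axis=-1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        # (4) Move every neuron's weights towards x, weighted by its neighbourhood value
        weights += eta * h[..., None] * (x - weights)
    return weights
```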

The Self-Organising Map
[Figures: the map before training (large neighbourhood) and after training (small neighbourhood)]