An overview of the history, structure, and capabilities of perceptrons, neural network models used in pattern recognition, covering the functions they can compute, their learning methods, and their limitations.
Perceptrons • Introduced in 1957 by Rosenblatt • Used for pattern recognition • The name is in use both for a particular artificial neuron model and for entire systems built from these neurons • Introduced as a model for the visual system • Heavily criticized by Minsky and Papert (1969); this caused a recession in ANN research that lasted for more than a decade, until the advent of BP-learning for MLFF networks (Rumelhart et al., 1986) and RNN networks (Hopfield et al., 1982-85)
Single-layer Perceptrons • A discrete-neuron single-layer perceptron consists of • an input layer of n real-valued input nodes (not neurons) • an output layer of m neurons • the output of a discrete neuron can only have the values zero (non-firing) and one (firing) • each neuron has a real-valued threshold and fires if and only if its accumulated input exceeds that threshold • each connection from an input node j to an output neuron i has a real-valued weight w_ij • It computes a vector function f: R^n → {0,1}^m
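The slide gives no code, but as a rough illustration here is a minimal NumPy sketch of the forward pass of such a discrete single-layer perceptron (the function name slp_forward and the example weights and thresholds are invented for this sketch):

```python
import numpy as np

def slp_forward(W, theta, x):
    """Forward pass of a discrete single-layer perceptron.

    W     : (m, n) weight matrix, W[i, j] = w_ij (input node j -> neuron i)
    theta : (m,)   real-valued thresholds, one per output neuron
    x     : (n,)   real-valued input vector
    Returns a vector in {0, 1}^m: neuron i fires iff its accumulated
    input W[i] . x exceeds its threshold theta[i].
    """
    return (W @ x > theta).astype(int)

# Example: two neurons on a 3-dimensional input
W = np.array([[2.0, 2.0, 0.0],
              [1.0, -1.0, 1.0]])
theta = np.array([3.0, 0.5])
print(slp_forward(W, theta, np.array([1.0, 1.0, 0.0])))   # -> [1 0]
```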
Questions • A perceptron with n input nodes and m output nodes computes a function f: R^n → {0,1}^m, so we study two questions: • Which functions can be computed? • Does there exist a learning method, i.e. is there an algorithm that optimizes the weights?
Single-layer Single-output Perceptron We start with the simplest configuration: a single-layer single-output perceptron consists of a single neuron whose output is either zero or one, and is given by y = H(w_1 x_1 + ... + w_n x_n + w_0), so the neuron fires iff w_1 x_1 + ... + w_n x_n > -w_0; hence -w_0 is called the threshold.
Where do we put the threshold? Two equivalent views: (1) a linear combiner followed by a Heaviside function with threshold, or (2) an affine combiner followed by the standard Heaviside function.
Artificial Neuron: an affine combiner followed by a transfer function.
From affine to linear combiners
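The formula on this slide is not reproduced in the text version; the standard construction, which is presumably what is meant, absorbs the threshold into the weight vector by adding a constant input x_0 = 1:

```latex
% Absorbing the threshold into the weights: an affine combiner on
% (x_1, ..., x_n) becomes a linear combiner on the extended input
% (1, x_1, ..., x_n).
\[
  w_0 + \sum_{j=1}^{n} w_j x_j \;=\; \sum_{j=0}^{n} w_j x_j
  \qquad\text{with } x_0 \equiv 1 ,
\]
\[
  y \;=\; H\!\Bigl(\sum_{j=0}^{n} w_j x_j\Bigr)
  \;=\; H(\mathbf{w}^{\mathsf T}\hat{\mathbf x}),
  \qquad
  \hat{\mathbf x} = (1, x_1, \dots, x_n)^{\mathsf T}.
\]
```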
Boolean Function: AND • logical and geometrical view: with weights 2, 2 and threshold 3, the neuron fires where 2x + 2y > 3 (only the point (1,1)) and stays silent where 2x + 2y < 3.
Boolean Function: OR
Boolean Function: XOR (not linearly separable)
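A small sketch may help here. The following Python snippet (the weights and the brute-force search range are chosen for illustration only) verifies that single neurons with the weights from the previous slides compute AND and OR, and that no single discrete neuron with small integer weights computes XOR, because its positive and negative points are not linearly separable:

```python
import itertools
import numpy as np

def neuron(w, w0, x):
    """Single discrete neuron with weights w and bias w0 (threshold -w0)."""
    return int(np.dot(w, x) + w0 > 0)

# AND: weights (2, 2), threshold 3  ->  fires only for (1, 1)
# OR : weights (2, 2), threshold 1  ->  fires for everything except (0, 0)
for name, w, w0 in [("AND", (2, 2), -3), ("OR", (2, 2), -1)]:
    table = {x: neuron(w, w0, x) for x in itertools.product((0, 1), repeat=2)}
    print(name, table)

# XOR: a brute-force search over a small grid of integer weights and biases
# finds no single neuron reproducing the truth table, since the positive
# points (0,1), (1,0) and the negative points (0,0), (1,1) cannot be
# separated by a single line.
xor = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
found = any(
    all(neuron((a, b), c, x) == t for x, t in xor.items())
    for a in range(-3, 4) for b in range(-3, 4) for c in range(-3, 4)
)
print("single neuron computing XOR found:", found)   # -> False
```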
Linearly Separable Sets A set X ⊆ R^n × {0,1} is called (absolutely) linearly separable if there exists a vector w ∈ R^{n+1} such that for each pair (x,t) ∈ X: w^T(1,x) > 0 if t = 1 and w^T(1,x) < 0 if t = 0. A training set X is correctly classified by a perceptron if for each (x,t) ∈ X the output of the perceptron with input x is also t. A finite set X can be classified correctly by a one-layer perceptron if and only if it is linearly separable.
A Linearly Separable Set (in 2D)
A set that is not linearly separable (in 2D)
One-layer Perceptron Learning Since the output neurons of a one-layer perceptron are independent, it suffices to study a perceptron with a single output. Consider a finite set X ⊆ R^n × {0,1}, also called a training set. We say that such a set X is correctly classified by a perceptron if for each pair (x,t) in X the output of the perceptron with input x is t. A finite set X can be classified correctly by a one-layer perceptron if and only if it is linearly separable.
Perceptron Learning Rule (incremental version): on a misclassified training pair (x,t), the weight vector is updated with the (extended) input vector scaled by the learning parameter, added if the neuron should have fired and subtracted if it fired wrongly.
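The rule itself appears on the slide as a formula that is not reproduced here; the following Python sketch implements the standard incremental perceptron learning rule under that reading (the function name perceptron_train, the learning parameter eta, and the stopping criterion are choices made for this sketch):

```python
import numpy as np

def perceptron_train(X, t, eta=1.0, max_epochs=100):
    """Incremental perceptron learning on extended inputs.

    X : (N, n) array of input vectors
    t : (N,)   array of targets in {0, 1}
    The threshold is absorbed as weight w0 on a constant input x0 = 1.
    On a misclassified pair the weights move by eta * (t - y) * x_hat:
    towards the input if the neuron should have fired, away otherwise.
    """
    X_hat = np.hstack([np.ones((len(X), 1)), X])   # prepend x0 = 1
    w = np.zeros(X_hat.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(X_hat, t):
            y = int(x @ w > 0)
            if y != target:
                w += eta * (target - y) * x
                errors += 1
        if errors == 0:                            # training set classified
            break
    return w

# Usage: learn the AND function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 0, 0, 1])
print(perceptron_train(X, t))
```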
Geometric Interpretation (case w^T x < 0 while the target is 1): the weights are modified such that the angle with the input vector is decreased.
Geometric Interpretation (case w^T x > 0 while the target is 0): the weights are modified such that the angle with the input vector is increased.
Perceptron Convergence Theorem Let X be a finite, linearly separable training set, let the initial weight vector be chosen arbitrarily, and let the learning parameter be an arbitrary positive number. Then for each infinite sequence of training pairs from X, the sequence of weight vectors obtained by applying the perceptron learning rule converges in a finite number of steps.
Proof sketch 1
Proof sketch 2
Proof sketch 3
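The three proof-sketch slides are figures that are not reproduced here; the following is a condensed version of the standard (Novikoff-style) argument, with the margin δ, the radius R, and the zero initial weight vector introduced purely for this sketch:

```latex
% Assume X is linearly separable by a unit vector w^* with margin
% \delta > 0, i.e. t = 1 implies {w^*}^{\mathsf T}\hat{x} \ge \delta and
% t = 0 implies {w^*}^{\mathsf T}\hat{x} \le -\delta, and let
% R = \max \|\hat{x}\|.  Start from w_0 = 0 and let w_k be the weight
% vector after k updates with learning parameter \eta.
\[
  {w^*}^{\mathsf T} w_k \;\ge\; k\,\eta\,\delta
  \qquad\text{(each update adds at least } \eta\delta\text{)},
\]
\[
  \|w_k\|^2 \;\le\; k\,\eta^2 R^2
  \qquad\text{(updates happen only on misclassified points,}
  \text{ so the cross term is non-positive)},
\]
\[
  k\,\eta\,\delta \;\le\; {w^*}^{\mathsf T} w_k \;\le\; \|w_k\|
  \;\le\; \eta R \sqrt{k}
  \;\;\Longrightarrow\;\;
  k \;\le\; \frac{R^2}{\delta^2},
\]
% so only finitely many weight updates can occur.
```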
Remarks • The perceptron learning algorithm is a form of reinforcement learning and is due to Rosenblatt • By adjusting the weights the network may learn the current training vector; other vectors, however, may become unlearned in the process • Although the learning algorithm converges for any positive learning parameter, faster convergence can be obtained by a suitable choice, possibly dependent on the observed error • Scaling of the input vectors can also be beneficial to the convergence of the algorithm
Perceptron Learning Rule (batch version): the updates for all misclassified training pairs in a sweep are accumulated and applied as a single weight update.
Learning by Error Minimization Consider an error function E(w) that measures how badly the current weights classify the training set. Taking the gradient of E(w) with respect to w and performing gradient descent yields the weight updates of the batch version.
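The error function, its gradient, and the update formula appear on the slide as formulas that are not reproduced here. A common choice consistent with the batch rule is the perceptron criterion, sketched below in Python (the function name and parameter values are illustrative, not taken from the slide):

```python
import numpy as np

def perceptron_train_batch(X, t, eta=0.5, max_epochs=100):
    """Batch perceptron learning: one weight update per sweep.

    With the perceptron criterion E(w) = -sum_p (t_p - y_p) w.x_p
    (a common choice; the exact error function on the slide may differ),
    the negative gradient sums the contributions of all misclassified
    patterns, so the batch update is
        w <- w + eta * sum_p (t_p - y_p) * x_p .
    """
    X_hat = np.hstack([np.ones((len(X), 1)), X])     # absorb the threshold
    w = np.zeros(X_hat.shape[1])
    for _ in range(max_epochs):
        y = (X_hat @ w > 0).astype(int)
        if np.array_equal(y, t):                     # all patterns correct
            break
        w += eta * (t - y) @ X_hat                   # summed update
    return w

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(perceptron_train_batch(X, np.array([0, 1, 1, 1])))   # learn OR
```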
Capacity of One-layer Perceptrons • The number of boolean functions of n arguments is 2^(2^n) • Each boolean function defines a dichotomy of the corner points of the n-dimensional hypercube • The number B_n of linearly separable dichotomies of the corner points of the hypercube is bounded by C(2^n, n), where C(m, n) is the number of linear dichotomies of m points in R^n (in general position), which is given by C(m, n) = 2 · Σ_{i=0}^{n} (m-1 choose i)
# boolean functions versus # linearly separable dichotomies
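The table on this slide is not reproduced here, but it can be regenerated: the snippet below compares 2^(2^n) with the bound C(2^n, n), using the counting formula quoted above (the function name num_linear_dichotomies is invented for this sketch). Already for n = 4 the bound (3882) is tiny compared with the 65536 boolean functions, so most boolean functions are not linearly separable.

```python
from math import comb

def num_linear_dichotomies(m, n):
    """C(m, n): number of linear dichotomies of m points in general
    position in R^n, as in the counting formula on the previous slide."""
    return 2 * sum(comb(m - 1, i) for i in range(n + 1))

# Compare with the total number of boolean functions of n arguments.
for n in range(1, 7):
    total = 2 ** (2 ** n)                        # all boolean functions
    bound = num_linear_dichotomies(2 ** n, n)    # bound on the separable ones
    print(f"n={n}:  2^(2^n) = {total:>22}   C(2^n, n) = {bound}")
```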
Multi-layer Perceptrons • A discrete-neuron multi-layer perceptron consists of • an input layer of n real-valued input nodes (not neurons) • an output layer of m neurons • several intermediate (hidden) layers consisting of one or more neurons • with the exception of the last layer, the nodes of each layer serve as inputs to the nodes of the next layer • each connection from node j in layer k-1 to node i in layer k has a real-valued weight w_ijk • It computes a function f: R^n → {0,1}^m
Graphical representation: input nodes on the left, hidden layers in between, output nodes on the right; edges are directed from left to right (direction not drawn).
Discrete Multi-layer Perceptrons • The computational capabilities of multi-layer perceptrons for two and three layers are given by: • Every boolean function can be computed by a two-layer perceptron • Every region in R^n that is bounded by a finite number of (n-1)-dimensional hyperplanes can be classified by a three-layer perceptron • Unfortunately there is no simple learning algorithm for multi-layer perceptrons
Disjunctive Normal Form: logic table for a boolean function f, with a clause C_j built from the literals x1 ... x5.
Perceptron for a Clause
2-layer perceptron for a boolean function
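The slide shows this construction as a picture; the following Python sketch makes it concrete (the helper name dnf_to_perceptron and the specific weight and threshold choices are mine): one hidden neuron per clause, firing exactly when all literals of its clause are satisfied, and a single output neuron computing the OR of the hidden neurons.

```python
import itertools
import numpy as np

def dnf_to_perceptron(clauses, n):
    """Build a 2-layer discrete perceptron computing a DNF formula.

    clauses : list of clauses, each a list of signed literals, e.g.
              [+1, -3] means  x1 AND (NOT x3).
    n       : number of boolean input variables.
    Hidden neuron j fires iff every literal of clause j is satisfied;
    the output neuron fires iff at least one hidden neuron fires (OR).
    """
    W_h = np.zeros((len(clauses), n))
    b_h = np.zeros(len(clauses))
    for j, clause in enumerate(clauses):
        for lit in clause:
            W_h[j, abs(lit) - 1] = 1.0 if lit > 0 else -1.0
        # The weighted sum reaches (#positive literals) exactly when all
        # positive literals are 1 and all negative literals are 0.
        b_h[j] = 0.5 - sum(1 for lit in clause if lit > 0)
    W_o = np.ones(len(clauses))       # OR of the hidden units
    b_o = -0.5

    def f(x):
        h = (W_h @ np.asarray(x, dtype=float) + b_h > 0).astype(int)
        return int(W_o @ h + b_o > 0)
    return f

# Usage: f(x1, x2, x3) = (x1 AND NOT x2) OR (x2 AND x3)
f = dnf_to_perceptron([[1, -2], [2, 3]], n=3)
for x in itertools.product((0, 1), repeat=3):
    print(x, f(x))
```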
XOR revisited
XOR revisited again
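As a concrete instance of the two-layer construction, here is one possible XOR perceptron (the specific weights are a standard choice, not necessarily the ones drawn on the slides): hidden unit h1 detects x AND NOT y, hidden unit h2 detects NOT x AND y, and the output unit computes their OR.

```python
import itertools

def heaviside(s):
    return 1 if s > 0 else 0

def xor_perceptron(x, y):
    """2-layer discrete perceptron for XOR (one possible set of weights)."""
    h1 = heaviside(x - y - 0.5)    # fires for (1, 0)
    h2 = heaviside(-x + y - 0.5)   # fires for (0, 1)
    return heaviside(h1 + h2 - 0.5)

for x, y in itertools.product((0, 1), repeat=2):
    print((x, y), xor_perceptron(x, y))   # 0 1 1 0
```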
Minsky-Papert observation • No diameter-limited perceptron can determine whether a geometric figure is connected (illustrated by figures A-D)
Diameter-limited perceptron
Star Region
3-layer perceptron for star region
Summary • One-layer perceptrons have limited computational capabilities: only linearly separable sets can be classified • For one-layer perceptrons there exists a learning algorithm with robust convergence properties • Multi-layer perceptrons have larger computational capabilities (all boolean functions for two-layer perceptrons), but for those there does not exist a simple learning algorithm
Rudolf Mak, TU/e Computer Science