Introduction to Neural Networks

Introduction to Neural Networks

Chapter 1 Introduction

Introduction • Why ANN • Some tasks can be done easily (effortlessly) by humans but are hard by conventional paradigms on Von Neumann machine with algorithmic approach • Pattern recognition (old friends, hand-written characters) • Content addressable recall • Approximate, common sense reasoning (driving, playing piano, baseball player) • These tasks are often ill-defined, experience based, hard to apply logic

Introduction • Human Brain • -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- • Large # (1011) of low speed processors (ms) with limited computing power • Large # (1015) of low speed connections • Content addressable recall (CAM) • Problem-solving knowledge resides in the connectivity of neurons • Adaptation by changing the connectivity Von Neumann machine -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- • One or a few high speed (ns) processors with considerable computing power • One or a few shared high speed buses for communication • Sequential memory access by address • Problem-solving knowledge is separated from the computing component • Hard to be adaptive

Biological neural activity • Each neuron has a body, an axon, and many dendrites • Can be in one of the two states: firing and rest. • Neuron fires if the total incoming stimulus exceeds the threshold • Synapse: thin gap between axon of one neuron and dendrite of another. • Signal exchange • Synaptic strength/efficiency

Introduction • What is an (artificial) neural network • A set of nodes (units, neurons, processing elements) • Each node has input and output • Each node performs a simple computation by its node function • Weightedconnections between nodes • Connectivity gives the structure/architecture of the net • What can be computed by a NN is primarily determined by the connections and their weights • A very much simplified version of networks of neurons in animal nerve systems

Introduction • Bio NN • -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- • Cell body • signal from other neurons • firing frequency • firing mechanism • Synapses • synaptic strength ANN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- • Nodes • input • output • node function • Connections • connection strength • Highly parallel, simple local computation (at neuron level) achieves global results as emerging property of the interaction (at network level) • Pattern directed (meaning of individual nodes only in the context of a pattern) • Fault-tolerant/graceful degrading • Learning/adaptation plays important role.

History of NN • Pitts & McCulloch (1943) • First mathematical model of biological neurons • All Boolean operations can be implemented by these neuron-like nodes (with different threshold and excitatory/inhibitory connections). • Competitor to Von Neumann model for general purpose computing device • Origin of automata theory. • Hebb (1949) • Hebbian rule of learning: increase the connection strength between neurons i and j whenever both i and j are activated. • Or increase the connection strength between nodes i and j whenever both nodes are simultaneously ON or OFF.

x1 x2 xn History of NN • Early booming (50’s – early 60’s) • Rosenblatt (1958) • Perceptron: network of threshold nodes for pattern classification Perceptron learning rule • Percenptron convergence theorem: everything that can be represented by a perceptron can be learned • Widow and Hoff (1960, 19062) • Learning rule based on gradient descent (with differentiable unit) • Minsky’s attempt to build a general purpose machine with Pitts/McCullock units

History of NN • The setback (mid 60’s – late 70’s) • Serious problems with perceptron model (Minsky’s book 1969) • Single layer perceonptrons cannot represent (learn) simple functions such as XOR • Multi-layer of non-linear units may have greater power but there is no learning rule for such nets • Scaling problem: connection weights may grow infinitely • The first two problems overcame by latter effort in 80’s, but the scaling problem persists • Death of Rosenblatt (1964) • Striving of Von Neumann machine and AI

History of NN • Renewed enthusiasm and flourish (80’s – present) • New techniques • Backpropagation learning for multi-layer feed forward nets (with non-linear, differentiable node functions) • Thermodynamic models (Hopfield net, Boltzmann machine, etc.) • Unsupervised learning • Impressive application (character recognition, speech recognition, text-to-speech transformation, process control, associative memory, etc.) • Traditional approaches face difficult challenges • Caution: • Don’t underestimate difficulties and limitations • Poses more problems than solutions

ANN Neuron Models • Each node has one or more inputs from other nodes, and one output to other nodes • Input/output values can be • Binary {0, 1} • Bipolar {-1, 1} • Continuous • All inputs to one node come in at the same time and remain activated until the output is produced • Weights associated with links General neuron model Weighted input summation

Node Function • Step (threshold) function where c is called the threshold • Ramp function Step function Ramp function

Node Function • Sigmoid function • S-shaped • Continuous and everywhere differentiable • Rotationally symmetric about some point (net = c) • Asymptotically approach saturation points • Examples: Sigmoid function When y = 0 and z = 0: a = 0, b = 1, c = 0. When y = 0 and z = -0.5 a = -0.5, b = 0.5, c = 0. Larger x gives steeper curve

Node Function • Gaussian function • Bell-shaped (radial basis) • Continuous • f(net) asymptotically approaches 0 (or some constant) when |net| is large • Single maximum (when net = ) • Example: Gaussian function

Network Architecture • (Asymmetric) Fully Connected Networks • Every node is connected to every other node • Connection may be excitatory (positive), inhibitory (negative), or irrelevant ( 0). • Most general • Symmetric fully connected nets: weights are symmetric (wij = wji) Input nodes: receive input from the environment Output nodes: send signals to the environment Hidden nodes: no direct interaction to the environment

Network Architecture • Layered Networks • Nodes are partitioned into subsets, called layers. • No connections that lead from nodes in layer j to those in layer k if j > k. • Inputs from the environment are applied to nodes in layer 0 (input layer). • Nodes in input layer are place holders with no computation occurring (i.e., their node functions are identity function)

Network Architecture • Acyclic Networks • Connections do not form directed cycles. • Multi-layered feedforward nets are acyclic • Recurrent Networks • Nets with directed cycles. • Much harder to analyze than acyclic nets.

Network Architecture • Feedforward Networks • A connection is allowed from a node in layer i only to nodes in layer i + 1. • Most widely used architecture. Conceptually, nodes at higher levels successively abstract features from preceding layers

Network Architecture • Modular nets • Consists of several modules, each of which is itself a neural net for a particular sub-problem • Sparse connections between modules

Neural Learning • Correlation leaning • sometimes called "Hebbian learning .“ • a gradual increase in strength of connections among nodes having similar outputs when presented with the same input.

Neural Learning • Competitive learning • when an input pattern is presented to a network, different nodes compete to be " winners" with high levels of activity . • The connections between input nodes and the winner node are then modified , increasing the likelihood that the same winner continues to win in future competitions (for input patterns similar to the one that caused the adaptation). • each node specializes to be the winner for a set of similar patterns.

Neural Learning • Feedback-based weight adaptation • if increasing a particular weight leads to diminished performance or larger error, then that weight is decreased as the network is trained to perform better. • The amount of change made at every step is very small in most networks to ensure that a network does not stray too far from its partially evolved state. • Some training methods cleverly vary the rate at which a network is modified.

What Can Neural Networks Be Used for? • the tasks performed using neural networks can be classified as those requiring supervised or unsupervised learning. • In supervised learning, a teacher Is available to indicate whether a system is performing correctly • "classification“ • Ex: An archaeologist needs to determine if skeleton belong to a man or woman • In unsupervised learning, no teacher is available and learning must rely on guidance obtained heuristically by the system examining different sample data or the environment • Clustering • Ex: An archaeologist needs to determine whether a set of skeleton fragments belong to the same dinosaur species or need to be differentiated into different species.

What Can Neural Networks Be Used for? • Classification • the assignment of each object to a specific "class“ • We are provided with a "training set“ • Recognizing printed or handwritten characters • Classifying loan applications into credit-worthy and non-credit-worthy groups

What Can Neural Networks Be Used for? • Clustering • Clustering requires grouping together objects that are similar to each other

What Can Neural Networks Be Used for? • The number of clusters depends on the problem, but should be as small as possible.

What Can Neural Networks Be Used for? • Vector quantization • Neural networks have been used for compressing voluminous input data into a small number of weight vectors associated with nodes in the networks • Vector quantization is the process of dividing up space into several connected regions (called "Voronoi regions") • Each region is represented using a single vector (called a "codebook vector"). • Every point in the input space belongs to one of these regions, and is mapped to the corresponding (nearest ) codebook vector. • For two-dimensional input space the boundaries of Voronoi regions are obtained by sketching the perpendicular bisectors of the lines joining neighboring codebook vectors

What Can Neural Networks Be Used for?

What Can Neural Networks Be Used for? • Pattern association • the presentation of an input sample should trigger the generation of a specific output pattern • Auto-associative • Hetero-associative

What Can Neural Networks Be Used for? • In auto-association or associative memory tasks, the input sample is presumed to be a corrupted, noisy, or partial version of the desired output pattern.

What Can Neural Networks Be Used for? • In hetero-association, the output pattern may be any arbitrary pattern that is to be associated with a set of input patterns.

What Can Neural Networks Be Used for? • Function approximation • Function approximation is the task of learning or constructing a function that generates approximately the same outputs from input vectors as the process being modeled, based on available training data. the same finite set of samples can be used to obtain many different functions

What Can Neural Networks Be Used for? • Forecasting • predicting the behavior of stock market indices • Though perfect prediction is hardly ever possible, neural networks can be used to obtain reasonably good predictions in a number of cases.

Introduction to Neural Networks