Learning in Neural and Belief Networks

Feed Forward Neural Network • March 28, 2001 • 20013329 안순길

Contents • How the Brain works • Neural Networks • Perceptrons

Introduction • Two viewpoints in this chapter • Computational viewpoint: representing functions using networks • Biological viewpoint: a mathematical model of the brain • Neuron: a simple computing element • Neural network: a collection of interconnected neurons

How the Brain Works • Cell body (soma): provides the support functions and structure of the cell • Axon: a branching fiber that carries signals away from the cell body • Synapse: converts an electrical signal into a chemical signal • Dendrites: branching fibers that receive signals from other nerve cells • Action potential: an electrical pulse • Synapse • excitatory: increases the potential • inhibitory: decreases the potential • synaptic connections exhibit plasticity • A collection of simple cells can lead to thought, action, and consciousness.

Comparing brains with digital computers • They perform quite different tasks and have different properties • Speed • In raw switching speed, the computer is about a million times faster • Thanks to massive parallelism, the brain ends up about a billion times faster at what it does • Brain • Performs complex tasks • More fault-tolerant: graceful degradation • Neural networks • Designed to be trained using an inductive learning algorithm

Neural Networks • NN: nodes (units) connected by links • Each link has a numeric weight • Learning: updating the weights • Two computational components • linear component: input function • nonlinear component: activation function

Notation

Simple computing elements • Total weighted input: $in_i = \sum_j W_{j,i}\, a_j$ • Output, by applying the activation function g: $a_i = g(in_i)$
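The two steps of a unit, the linear input function followed by the activation function g, can be sketched in Python. The function names and the choice of a sigmoid as the default g are assumptions made for illustration, not something the slides specify.

```python
import math

def weighted_input(weights, activations):
    """Linear component: total weighted input, sum_j W_j * a_j."""
    return sum(w * a for w, a in zip(weights, activations))

def sigmoid(x):
    """One possible nonlinear activation function g (assumed here)."""
    return 1.0 / (1.0 + math.exp(-x))

def unit_output(weights, activations, g=sigmoid):
    """Nonlinear component: apply g to the total weighted input."""
    return g(weighted_input(weights, activations))

# Two incoming links with weights 0.5 and -1.0, fed activations 2.0 and 1.0.
total = weighted_input([0.5, -1.0], [2.0, 1.0])
output = unit_output([0.5, -1.0], [2.0, 1.0])
```

Passing a different callable as `g` swaps the activation function without touching the linear part.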

Three activation functions • Step • Sign • Sigmoid

Threshold • The minimum total weighted input that causes the neuron to fire • If the input is greater than the threshold t, output 1 • Otherwise 0 • Can be replaced with an extra input weight: a fixed input of -1 whose weight is t
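A minimal sketch of the step, sign, and sigmoid activation functions, together with the threshold-as-extra-weight trick. The helper names are hypothetical, and the convention of a fixed extra input of -1 carrying weight t is an assumption of this sketch.

```python
import math

def step(x, t=0.0):
    """Step function: 1 if x is greater than the threshold t, otherwise 0."""
    return 1 if x > t else 0

def sign(x):
    """Sign function: +1 for non-negative input, -1 otherwise."""
    return 1 if x >= 0 else -1

def sigmoid(x):
    """Sigmoid function: a smooth, differentiable squashing function."""
    return 1.0 / (1.0 + math.exp(-x))

def fires_with_threshold(weights, inputs, t):
    """Unit with an explicit threshold t."""
    total = sum(w * a for w, a in zip(weights, inputs))
    return step(total, t)

def fires_with_extra_weight(weights, inputs, t):
    """Same unit, with t folded into the weights.

    Assumed convention: an extra input fixed at -1 whose weight is t,
    so the remaining threshold is 0.
    """
    all_weights = [t] + list(weights)
    all_inputs = [-1] + list(inputs)
    total = sum(w * a for w, a in zip(all_weights, all_inputs))
    return step(total, 0.0)
```

Folding the threshold into the weights means a learning algorithm only has to adjust weights, with no separate rule for t.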

Applying Neural Networks to Logic Gates
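As an illustration, single step-function units can act as AND, OR, and NOT gates. The weights and thresholds below are one workable choice assumed for this sketch; many other values give the same truth tables.

```python
def threshold_unit(weights, t, inputs):
    """Output 1 if the total weighted input is greater than t, otherwise 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > t else 0

def and_gate(a, b):
    # Both inputs must be on for the sum (2) to exceed 1.5.
    return threshold_unit([1, 1], 1.5, [a, b])

def or_gate(a, b):
    # A single active input (sum 1) already exceeds 0.5.
    return threshold_unit([1, 1], 0.5, [a, b])

def not_gate(a):
    # A negative weight inverts the input: 0 > -0.5, but -1 does not exceed -0.5.
    return threshold_unit([-1], -0.5, [a])
```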

Network Structures(I) • Feed-forward networks • Unidirectional links, no cycles • DAG (directed acyclic graph) • In a layered feed-forward network: no links between units in the same layer, no links backward to a previous layer, no links that skip a layer • Processing proceeds uniformly from input units to output units • No internal state

Input units / output units / hidden units • Perceptron: no hidden units • Multilayer networks: one or more layers of hidden units • With a fixed structure and activation function g, the network is a specific parameterized function of its weights • Learning is nonlinear regression, because g is a nonlinear function
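A feed-forward pass through such a network can be sketched as below. The list-of-weight-matrices representation and the function names are assumptions chosen for brevity; a network with a single weight matrix is a perceptron, and one with two matrices has one hidden layer.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(weights, inputs, g=sigmoid):
    """Compute one layer's activations.

    weights[i][j] is the weight on the link from input j to unit i.
    """
    return [g(sum(w * a for w, a in zip(row, inputs))) for row in weights]

def feed_forward(layers, inputs, g=sigmoid):
    """Propagate activations layer by layer; no cycles, no internal state."""
    activations = list(inputs)
    for weights in layers:
        activations = layer_forward(weights, activations, g)
    return activations

# Two inputs -> two hidden units -> one output unit.
layers = [
    [[1, 2], [3, 4]],  # hidden layer
    [[1, 1]],          # output layer
]
# With an identity g the hidden activations are [3, 7] and the output is 10.
linear_output = feed_forward(layers, [1, 1], g=lambda x: x)
```

Because the output depends only on the current input and the weights, calling `feed_forward` twice with the same arguments always gives the same result.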

Network Structures(II) • Recurrent networks • The brain resembles a recurrent network: it has backward links • Recurrent networks have internal state stored in the activation levels • Can become unstable, oscillate, or exhibit chaotic behavior • Long computation time • Need advanced mathematical methods

Network Structures(III) • Examples • Hopfield networks • Bidirectional connections with symmetric weights • Associative memory: settles on the stored pattern that most closely resembles the new stimulus • Boltzmann machines • Stochastic (probabilistic) activation function
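The associative-memory behavior of a Hopfield network can be illustrated with a very small sketch. The Hebbian storage rule, the +1/-1 unit states, and the function names are assumptions of this example rather than details given in the slides.

```python
def train_hopfield(patterns):
    """Build symmetric weights with a Hebbian rule (assumed here).

    W[i][j] = sum over stored patterns of p[i] * p[j], with a zero diagonal.
    """
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j]
    return weights

def recall(weights, state, sweeps=5):
    """Update units one at a time until the state stops changing."""
    state = list(state)
    for _ in range(sweeps):
        changed = False
        for i in range(len(state)):
            total = sum(w * s for w, s in zip(weights[i], state))
            # Keep the current value on a tie; otherwise take the sign.
            new = state[i] if total == 0 else (1 if total > 0 else -1)
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            break
    return state

stored = [1, -1, 1, -1]
weights = train_hopfield([stored])
# Present a stimulus that differs from the stored pattern in the last unit.
recalled = recall(weights, [1, -1, 1, 1])
```

Here the corrupted stimulus settles back onto the stored pattern, which is the sense in which the network acts as an associative memory.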

Optimal Network Structure(I) • Too small a network: incapable of representing the desired function • Too big a network: does not generalize well • Overfitting when there are too many parameters • Feed-forward NN with one hidden layer • can approximate any continuous function, given enough hidden units • Feed-forward NN with two hidden layers • can approximate any function, including discontinuous ones

Optimal Network Structure(II) • NERF (Network Efficiently Representable Functions) • Functions that can be approximated with a small number of units • Genetic algorithms: each candidate requires running the whole NN training protocol • Hill-climbing search (modifying an existing network structure) • Start with a big network: optimal brain damage • Removes weights from a fully connected model • Start with a small network: tiling algorithm • Starts with a single unit and adds subsequent units • Cross-validation techniques

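Perceptrons • Perceptron: single-layer, feed-forward network • Each output unit is independent of the others • Each weight affects only one of the outputs • Output of a single unit: $O = Step_t\left(\sum_j W_j I_j\right)$, where $W_j$ is the weight on input $I_j$ and $t$ is the threshold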

What perceptrons can represent • Boolean functions AND, OR, and NOT • Majority function: $W_j = 1$, $t = n/2$, which needs 1 unit and n weights • A decision tree for the same function needs $O(2^n)$ nodes • Can represent only linearly separable functions • Cannot represent XOR
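The majority function and the XOR limitation can both be checked with a short sketch. The brute-force search over a small grid of weights and thresholds is an assumption of this example; it illustrates the limitation but is not a proof of it.

```python
from itertools import product

def perceptron(weights, t, inputs):
    """Output 1 if the weighted sum is greater than the threshold t."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > t else 0

def majority(inputs):
    """Majority function with every weight 1 and threshold n/2."""
    n = len(inputs)
    return perceptron([1] * n, n / 2, inputs)

def find_perceptron(target, grid):
    """Search a grid of (w1, w2, t) for a unit matching a 2-input function.

    Returns the first matching triple, or None if the grid holds none.
    """
    cases = list(product([0, 1], repeat=2))
    for w1, w2, t in product(grid, repeat=3):
        if all(perceptron([w1, w2], t, x) == target(*x) for x in cases):
            return (w1, w2, t)
    return None

GRID = [k / 2 for k in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
and_solution = find_perceptron(lambda a, b: a & b, GRID)
xor_solution = find_perceptron(lambda a, b: a ^ b, GRID)
```

The search finds weights for AND but none for XOR. For XOR, both weights would have to exceed a non-negative threshold while their sum stays at or below it, which is impossible for any choice of values, not just those on the grid.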

Examples of Perceptrons • The entire input space is divided in two along a boundary defined by $\sum_j W_j I_j = t$ • In Figure 19.9(a): n=2, so the boundary is a line • In Figure 19.10(a): n=3, so the boundary is a plane

Learning linearly separable functions(I) • Bad news: not many problems fall in this set • Good news: given enough training examples, there is a perceptron algorithm that learns any function in it • Neural network learning algorithm • Current-best-hypothesis (CBH) scheme • Hypothesis: a network defined by the current values of the weights • Initial network: weights randomly assigned in [-0.5, 0.5] • Repeat the update phase until convergence • Each epoch: update all the weights for all the examples

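Learning linearly separable functions(II) • Learning • The error: $Err = T - O$ (T: target output, O: actual output) • Update rule: $W_j \leftarrow W_j + \alpha \times I_j \times Err$ (Rosenblatt, 1960) • $\alpha$: learning rate • Error positive • Need to increase O • Error negative • Need to decrease O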

Algorithm
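A minimal sketch of perceptron learning under the scheme described above: random initial weights in [-0.5, 0.5], repeated epochs, and the update of each weight by learning rate times input times error (Err = T - O). The function names, the fixed -1 input that carries the threshold, the default learning rate, and the stopping rule are assumptions of this sketch.

```python
import random

def predict(weights, inputs):
    """weights[0] is the threshold, carried by a fixed extra input of -1."""
    total = -weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if total > 0 else 0

def train_perceptron(examples, alpha=0.1, max_epochs=1000, seed=0):
    """Train on (inputs, target) pairs; stop after an error-free epoch."""
    rng = random.Random(seed)
    n = len(examples[0][0])
    # Initial network: weights assigned randomly in [-0.5, 0.5].
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n + 1)]
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in examples:
            err = target - predict(weights, inputs)  # Err = T - O
            if err != 0:
                mistakes += 1
                augmented = [-1] + list(inputs)
                # W_j <- W_j + alpha * I_j * Err
                weights = [w + alpha * x * err
                           for w, x in zip(weights, augmented)]
        if mistakes == 0:
            break
    return weights

# AND is linearly separable, so training is expected to converge.
AND_EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
trained = train_perceptron(AND_EXAMPLES)
```

A positive error raises the weights on active inputs and so increases O; a negative error lowers them. On data that is not linearly separable, such as XOR, this loop would simply run until `max_epochs`.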

Perceptrons (Minsky and Papert, 1969) • Limited to linearly separable functions • Learning is a gradient descent search through weight space • For perceptrons, the weight space has no local minima • Difference between NNs and other attribute-based methods such as decision trees • Inputs are real numbers in some fixed range vs. values from a discrete set • Dealing with discrete sets • Local encoding: a single input unit, with one value per discrete attribute value • None=0.0, Some=0.5, Full=1.0 (WillWait) • Distributed encoding: one input unit for each value of the attribute
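The two encodings of a discrete attribute can be compared in a short sketch, using the None/Some/Full values from the WillWait example. The function names and the even spacing of the local encoding over [0, 1] are assumptions of this illustration.

```python
ATTRIBUTE_VALUES = ["None", "Some", "Full"]

def local_encoding(value, values=ATTRIBUTE_VALUES):
    """A single input unit: spread the values evenly over [0, 1]."""
    return values.index(value) / (len(values) - 1)

def distributed_encoding(value, values=ATTRIBUTE_VALUES):
    """One input unit per attribute value: 1.0 for the match, 0.0 elsewhere."""
    return [1.0 if v == value else 0.0 for v in values]

local_some = local_encoding("Some")
distributed_some = distributed_encoding("Some")
```

The local encoding uses fewer inputs but imposes an ordering on the values; the distributed encoding avoids that at the cost of one input unit per value.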

Example

Summary(I) • Neural networks are modeled on the human brain • Computers are faster in raw switching speed, but the brain is still superior overall thanks to massive parallelism • The brain is more fault-tolerant • Neural network • nodes (units) connected by links • Each link has a numeric weight • Learning: updating the weights • Two computational components • linear component: input function • nonlinear component: activation function

Summary(II) • In this text, we consider only • Feed-forward networks • Unidirectional links, no cycles • DAG (directed acyclic graph) • No links between units in the same layer, no links backward to a previous layer, no links that skip a layer • Processing proceeds uniformly from input units to output units • No internal state

Summary(III) • Network size determines representational power • Overfitting when there are too many parameters • Feed-forward NN with one hidden layer • can approximate any continuous function, given enough hidden units • Feed-forward NN with two hidden layers • can approximate any function, including discontinuous ones

Summary(IV) • Perceptron: single-layer, feed-forward network • Each output unit is independent of the others • Each weight affects only one of the outputs • Can represent only linearly separable functions • If the problem space is flat, a neural network works well • In other words, a problem that is easy from an algorithmic perspective is also easy for a neural network • Back-propagation guarantees only a locally optimal solution
