
# Presented by: Yevgeniy Gershteyn Larisa Perman 04 - PowerPoint PPT Presentation

Debate 2 : Boltzmann Machine and Simulated Annealing. Presented by: Yevgeniy Gershteyn Larisa Perman 04/24/2003. Boltzmann Machine. Boltzmann Machine neural net was introduced by Hinton and Sejnowski in 1983. Used for solving constrained optimization problems. Typical Boltzmann Machine:



Boltzmann Machine and Simulated Annealing

Presented by:

Yevgeniy Gershteyn

Larisa Perman

04/24/2003

• The Boltzmann Machine neural net was introduced by Hinton and Sejnowski in 1983.

• Used for solving constrained optimization problems.

• Typical Boltzmann Machine:

• Weights are fixed to represent the constraints of the problem and the function to be optimized.

• The net seeks the solution by changing the activations of the units (0 or 1) based on a probability distribution and the effect that the change would have on the energy function or consensus function for the net.

• The objective of the neural net is to maximize the consensus function:

$$C = \sum_{i} \Big[ \sum_{j \le i} w_{ij} x_i x_j \Big]$$

Where:

$w_{ij}$ – weight of the connection between units $X_i$ and $X_j$

$x_i$, $x_j$ – the states of units $X_i$ and $X_j$

If units are connected: $w_{ij} \ne 0$

The bidirectional nature of connections: $w_{ij} = w_{ji}$

• The sum runs over all units of the net.
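The consensus function can be evaluated directly from a weight matrix and a state vector. The following is a minimal sketch, assuming NumPy and a symmetric weight matrix; the function name `consensus` and the three-unit weights in `w_example` are hypothetical and chosen only for illustration.

```python
import numpy as np

def consensus(w, x):
    """Consensus C = sum_i sum_{j<=i} w[i, j] * x[i] * x[j] for 0/1 states x."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    # np.tril keeps only the entries with j <= i (diagonal included), so each
    # symmetric connection is counted once and the self-weights w[i, i] are kept.
    return float(x @ np.tril(w) @ x)

# Hypothetical 3-unit net: units 0 and 1 inhibit each other, unit 2 is unconnected.
w_example = np.array([[ 1.0, -2.0, 0.0],
                      [-2.0,  1.0, 0.0],
                      [ 0.0,  0.0, 0.5]])

for state in ([1, 0, 1], [1, 1, 0], [1, 1, 1]):
    print(state, consensus(w_example, state))
```

In this illustrative net, turning on both mutually inhibiting units lowers the consensus, which is how fixed weights can encode a constraint.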

• The net finds the maximum or at least a local maximum by letting each unit attempt to change its state (from 1 to 0 or vice versa):

$$\Delta C(i) = [1 - 2x_i] \Big[ w_{ii} + \sum_{j \ne i} w_{ij} x_j \Big]$$

Where:

$\Delta C(i)$ – the change in consensus if unit $X_i$ were to change its state

$x_i$ – the current state of unit $X_i$

$[1 - 2x_i]$ – +1 if $X_i$ is ‘off’; -1 if $X_i$ is ‘on’
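The change in consensus only needs the weights attached to unit Xi, which is what makes the update local. A minimal sketch, assuming NumPy and symmetric weights, that compares the local formula with a brute-force recomputation of the consensus (the names `delta_consensus` and `consensus` and the random test weights are assumptions for illustration):

```python
import numpy as np

def consensus(w, x):
    """Consensus C = sum_i sum_{j<=i} w[i, j] * x[i] * x[j] for 0/1 states x."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(x @ np.tril(w) @ x)

def delta_consensus(w, x, i):
    """Change in consensus if unit i flipped, using only row i of the weights."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    others = w[i] @ x - w[i, i] * x[i]  # sum over j != i of w[i, j] * x[j]
    return float((1.0 - 2.0 * x[i]) * (w[i, i] + others))

# Check the local formula against recomputing C before and after the flip.
rng = np.random.default_rng(0)
a = rng.normal(size=(4, 4))
w_sym = (a + a.T) / 2  # symmetric weights, w[i, j] == w[j, i]
x0 = [1, 0, 1, 1]
x1 = [1, 1, 1, 1]      # x0 with unit 1 flipped
print(delta_consensus(w_sym, x0, 1), consensus(w_sym, x1) - consensus(w_sym, x0))
```

The two printed values should agree up to floating-point rounding.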

• Unit $X_i$ does not necessarily change its state; the probability of the net accepting a change in state for $X_i$ is:

$$A(i,T) = \frac{1}{1 + \exp(-\Delta C(i) / T)}$$

Where:

$T$ (temperature) – control parameter that is reduced as the net searches for a maximal consensus

• This process of gradually reducing the temperature is called simulated annealing. It reduces the probability of the net becoming trapped in a local maximum of the consensus (equivalently, a local minimum of the energy) which is not the global one.
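The acceptance rule and the temperature reduction can be combined into a short search loop. This is a sketch under stated assumptions, not a definitive implementation: the geometric cooling schedule and the parameters `t_start`, `t_end`, `cooling` and `sweeps_per_t` are illustrative choices that do not come from the slides.

```python
import math
import random

def acceptance_probability(delta_c, temperature):
    """A(i, T) = 1 / (1 + exp(-delta_c / T)), written to avoid overflow."""
    z = delta_c / temperature
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # underflows harmlessly to 0.0 for very negative z
    return e / (1.0 + e)

def anneal(w, x, t_start=10.0, t_end=0.1, cooling=0.9, sweeps_per_t=20, seed=0):
    """Search for a high-consensus 0/1 state; the schedule is an assumption."""
    rng = random.Random(seed)
    x = list(x)
    n = len(x)
    t = t_start
    while t > t_end:
        for _ in range(sweeps_per_t * n):
            i = rng.randrange(n)  # pick a unit at random
            others = sum(w[i][j] * x[j] for j in range(n) if j != i)
            delta_c = (1 - 2 * x[i]) * (w[i][i] + others)
            if rng.random() < acceptance_probability(delta_c, t):
                x[i] = 1 - x[i]   # accept the change of state
        t *= cooling              # gradually reduce the temperature
    return x

# Hypothetical 2-unit net: unit 0 is rewarded for being on, unit 1 is penalized.
print(anneal([[5, 0], [0, -5]], [0, 1]))
```

At high temperature the acceptance probability is close to 0.5 regardless of the change in consensus, so the net explores freely; as the temperature falls, changes that lower the consensus are accepted less and less often.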

• The Learning Rule was proposed by Ackley, Hinton and Sejnowski in 1985.

• Extends Hopfield model with learning:

• Each neuron fires with bipolar values.

• All connections are symmetric.

• In activation passing, the next neuron whose state we wish to update is selected randomly.

• There is no self-feedback (no connection from a neuron to itself).

• Training is based on probabilistic operation and on correlations between neuron states.

• Operation is deterministic once the weights have been determined.

• Boltzmann Machine with hidden and visible neurons. The network is fully connected with symmetric connections.

• The Boltzmann Machine is a Hopfield network, in which

• The neurons are divided into two subsets:

• Visible, which is further divided into:

• Input

• Output

• Hidden

This allows a much richer representation of the input data.

• The neurons are stochastic: at any time there is a probability attached to whether a neuron fires, whereas the Hopfield net is based on deterministic principles.

• May use either supervised or unsupervised learning.

• We will concentrate on the unsupervised learning methods.

• There are three phases in operation of the network:

• The clamped phase, in which the input and output visible neurons are held fixed while the hidden neurons are allowed to vary.

• The free running phase in which only the inputs are held fixed and other neurons are allowed to vary.

• The learning phase.

• These phases iterate until learning has created a Boltzmann Machine which can be said to have learned the input patterns and will converge to the learned patterns when a noisy or incomplete pattern is presented.

• Generally the initial weights of the net are randomly set to values in a small range e.g. -0.5 to +0.5.

• Then an input pattern is presented to the net and clamped to the visible neurons.

• Now perform simulated annealing on the net: choose a hidden neuron at random and flip its state from $s_j$ to $-s_j$ with probability:

$$P(s_j \to -s_j) = \frac{1}{1 + \exp(-\Delta E / T)}$$

Where the energy of the net is:

$$E = -\frac{1}{2} \sum_{j=1}^{N} \sum_{\substack{i=1 \\ i \ne j}}^{N} w_{ji} s_j s_i$$
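The energy of a bipolar state, and the change in energy caused by flipping one neuron, can be computed as in the sketch below. It assumes NumPy and symmetric weights. Sign conventions for ∆E differ between texts, so the sketch states its own: `flip_energy_change` returns the energy after the flip minus the energy before it. Both function names are hypothetical.

```python
import numpy as np

def energy(w, s):
    """E = -1/2 * sum over i != j of w[j, i] * s[j] * s[i] for bipolar states s."""
    w = np.asarray(w, dtype=float)
    s = np.asarray(s, dtype=float)
    off_diagonal = w - np.diag(np.diag(w))  # drop the i == j terms
    return float(-0.5 * s @ off_diagonal @ s)

def flip_energy_change(w, s, j):
    """Energy after flipping neuron j minus energy before (assumed convention)."""
    w = np.asarray(w, dtype=float)
    s = np.asarray(s, dtype=float)
    local_field = w[j] @ s - w[j, j] * s[j]  # sum over i != j of w[j, i] * s[i]
    return float(2.0 * s[j] * local_field)

# Hypothetical 2-neuron net with one excitatory connection.
w_pair = [[0.0, 1.0], [1.0, 0.0]]
print(energy(w_pair, [1, 1]), energy(w_pair, [1, -1]))
print(flip_energy_change(w_pair, [1, 1], 1))
```

Only the weights into neuron j are needed for the energy change, so a flip can be evaluated without recomputing the energy of the whole net.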

• The activation passing can continue till the net reaches a low energy state.

• Because of the stochastic nature of the Boltzmann Machine, we cannot specify a single state which will be the attractor of the system.

• But the net will reach a state of thermal equilibrium in which individual neurons still change state, and the probability of any single state can be calculated.

• For a system being in any state $\alpha$ with associated energy $E_\alpha$ and temperature $T$ the probability will be:

$$P(\alpha) = \frac{\exp(-E_\alpha / T)}{\sum_\beta \exp(-E_\beta / T)}$$
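For a very small net the equilibrium distribution can be computed exactly by enumerating every state. A minimal sketch, assuming NumPy, bipolar states and symmetric weights; the function names and the two-neuron weights are hypothetical, and enumeration is only practical for a handful of neurons because the number of states doubles with each one.

```python
import itertools
import numpy as np

def energy(w, s):
    """E = -1/2 * sum over i != j of w[j, i] * s[j] * s[i] for bipolar states s."""
    w = np.asarray(w, dtype=float)
    s = np.asarray(s, dtype=float)
    off_diagonal = w - np.diag(np.diag(w))
    return float(-0.5 * s @ off_diagonal @ s)

def boltzmann_distribution(w, temperature):
    """Return all bipolar states and their equilibrium probabilities P(alpha)."""
    n = len(w)
    states = list(itertools.product([-1, 1], repeat=n))
    energies = np.array([energy(w, s) for s in states])
    # Subtracting the minimum energy avoids overflow and cancels in the ratio.
    weights = np.exp(-(energies - energies.min()) / temperature)
    return states, weights / weights.sum()

w_pair = [[0.0, 1.0], [1.0, 0.0]]  # hypothetical excitatory pair
for t in (10.0, 1.0, 0.1):
    states, probs = boltzmann_distribution(w_pair, t)
    print(t, dict(zip(states, np.round(probs, 3))))
```

At high temperature all states are nearly equally likely; at low temperature the probability mass concentrates on the lowest-energy states.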

• The updating can be presented as a local operation since:

$$\Delta E = -\Delta s_j \sum_{i=1}^{N} w_{ji} s_i = -2 |v_j|$$

Where: $|v_j|$ – the absolute value of the jth neuron’s activation, $v_j = \sum_{i} w_{ji} s_i$ (the last equality holds for a flip that aligns $s_j$ with the sign of $v_j$, since $\Delta s_j = \pm 2$)

• As the temperature is gradually dropped, the net settles into as low an energy state as it can at each temperature.

• So, the correlations between the firing of pairs of neurons at the final temperature are:

$$\rho^{+}_{ij} = \langle s_j s_i \rangle^{+}$$

Where: ‘+’ indicates that the correlation is calculated when the visible neurons are in a clamped state

• Here we need to repeat the same calculations, but do not clamp the visible neurons.

• After presentation of the input patterns all neurons can update their states and the annealing schedule is performed (as before).

• And again, the correlations between the firing of pairs of neurons at the final temperature are:

$$\rho^{-}_{ij} = \langle s_j s_i \rangle^{-}$$

Where: ‘-’ indicates that the correlation is calculated when the visible neurons are not in a clamped state

• Here we use the Boltzmann Machine’s learning rule to update the weights:

$$\Delta w_{ij} = \eta (\rho^{+}_{ij} - \rho^{-}_{ij}), \quad \forall i, j$$

Where: $\eta$ – is a learning rate.

This means: whether the weights are changed depends on the difference between the correlations in clamped vs. free mode.

• By applying this learning rule the pattern completion property of the Boltzmann Machine is established.
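One weight update can be sketched from two sets of sampled states, one collected in the clamped phase and one in the free running phase. This is an illustration under assumptions: the correlations are estimated as plain averages of products over the samples, the samples are supplied by the caller rather than produced by an annealing run, the diagonal is zeroed to reflect the absence of self-feedback, and the names `pair_correlations` and `boltzmann_update` are hypothetical.

```python
import numpy as np

def pair_correlations(samples):
    """Estimate rho[i, j] = <s_j s_i> as the average product over sampled states."""
    s = np.asarray(samples, dtype=float)  # shape (num_samples, num_neurons)
    return s.T @ s / s.shape[0]

def boltzmann_update(w, clamped_samples, free_samples, eta=0.1):
    """Return w + eta * (rho_plus - rho_minus) with no self-connections."""
    rho_plus = pair_correlations(clamped_samples)  # visible neurons clamped
    rho_minus = pair_correlations(free_samples)    # nothing clamped
    w_new = np.asarray(w, dtype=float) + eta * (rho_plus - rho_minus)
    np.fill_diagonal(w_new, 0.0)                   # assumption: no self-feedback
    return w_new

# Hypothetical samples for a 2-neuron net with bipolar states.
clamped = [[1, 1], [1, 1]]   # the two neurons always agree when clamped
free = [[1, 1], [1, -1]]     # they agree only half of the time when free
print(boltzmann_update(np.zeros((2, 2)), clamped, free))
```

Here the neurons agree more often in the clamped phase than in the free phase, so the connection between them is strengthened; when the two correlations match, the weight stops changing.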

• For supervised learning the set of visible neurons is split into input and output neurons, and the machine will be used to associate an input pattern with an output pattern.

• During the clamped phase, the input and output patterns are clamped to the appropriate units.

• As before the hidden neurons’ activations can settle at the various temperatures.

• During the free running phase, only the input neurons are clamped; both the output neurons and the hidden neurons can pass activation around until the activations in the network settle.

• The learning rule here must be modulated by the probability of the input patterns:

$$\Delta w_{ij,\alpha} = \eta P_\alpha (\rho^{+}_{ij} - \rho^{-}_{ij}), \quad \forall i, j$$

Where: $P_\alpha$ – a priori probability of state $\alpha$ at the input neurons.

• The weighted matching problem:

• A set of N points with a known “distance” between each pair.

• Link the points together in pairs so as to minimize the total length of the links.

• The Traveling Salesman Problem.

• Graph bipartitioning:

• A set of points which will be split into two disjoint sets with as low an associated cost as possible.
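As an illustration of one of the problems listed above, graph bipartitioning can be attacked with plain simulated annealing. The sketch below makes several assumptions that are not in the slides: the cost is the total weight of edges crossing between the two sets, the sets are kept equal in size by swapping one point from each, worse moves are accepted with the Metropolis probability exp(-delta / T) rather than the logistic rule used earlier, and the schedule parameters are arbitrary.

```python
import math
import random

def cut_cost(edges, side):
    """Total weight of edges whose endpoints lie in different sets."""
    return sum(wt for u, v, wt in edges if side[u] != side[v])

def bipartition(n, edges, t_start=5.0, t_end=0.05, cooling=0.95,
                steps_per_t=50, seed=0):
    """Split points 0..n-1 into two equal-sized sets with a low cut cost."""
    rng = random.Random(seed)
    side = [0] * (n // 2) + [1] * (n - n // 2)  # balanced starting split
    rng.shuffle(side)
    cost = cut_cost(edges, side)
    best_side, best_cost = side[:], cost
    t = t_start
    while t > t_end:
        for _ in range(steps_per_t):
            # Propose swapping one point from each set, which keeps sizes fixed.
            a = rng.choice([i for i in range(n) if side[i] == 0])
            b = rng.choice([i for i in range(n) if side[i] == 1])
            side[a], side[b] = 1, 0
            delta = cut_cost(edges, side) - cost
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost += delta                  # accept the swap
                if cost < best_cost:
                    best_side, best_cost = side[:], cost
            else:
                side[a], side[b] = 0, 1        # reject: undo the swap
        t *= cooling
    return best_side, best_cost

# Hypothetical graph: two triangles joined by a single edge (2, 3).
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1),
         (3, 4, 1), (4, 5, 1), (3, 5, 1),
         (2, 3, 1)]
print(bipartition(6, edges))
```

Tracking the best split seen so far is a convenience of this sketch; the accepted state at the final temperature is usually, but not necessarily, the same split.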

• The Boltzmann Machine: Necker Cube Example http://www.cs.cf.ac.uk/Dave/JAVA/boltzman/Necker.html

• Simulated Annealing Demo http://www.taygeta.com/annealing/demo1.html

• Simulated Annealing: Animation http://goethe.ira.uka.de/~syrjakow/anim_env/startenv_mpi.html

• Learning in the Boltzmann Machine is accomplished using the Simulated Annealing technique, which is stochastic in nature.

• Boltzmann Machine:

• Global search: finding the global minimum/maximum of potential energy.

• The idea is that the neural net is first operated at a high temperature, which is gradually lowered until the net is trapped in an equilibrium configuration around a single minimum of the energy function.

• Simulated Annealing:

• Local search: finding relative minimum/maximum of potential energy.

• Force search out of local regions by accepting suboptimal state transitions with decreasing probability.

• L. Fausett, Fundamentals of Neural Networks, Prentice Hall, 1994

• J. Freeman, D. Skapura, Neural Networks: Algorithms, Applications, and Programming Techniques, Addison-Wesley, 1991

• B. Muller, J. Reinhardt, M.T. Strickland, Neural Networks: An Introduction, Springer-Verlag, 1995