Presented by: Yevgeniy Gershteyn Larisa Perman 04/24/2003


Debate 2:

Boltzmann Machine and Simulated Annealing

Presented by:

Yevgeniy Gershteyn

Larisa Perman

04/24/2003

Boltzmann Machine
• The Boltzmann Machine neural net was introduced by Hinton and Sejnowski in 1983.
• It is used for solving constrained optimization problems.
• In a typical Boltzmann Machine:
• The weights are fixed to represent the constraints of the problem and the function to be optimized.
• The net seeks a solution by changing the activations of the units (0 or 1) according to a probability distribution that depends on the effect the change would have on the energy function, or consensus function, of the net.
Boltzmann Machine
• The objective of the neural net is to maximize the consensus function:

$$C = \sum_{i} \left[ \sum_{j \le i} w_{ij}\, x_i x_j \right]$$

Where:

wij – weight of the connection between units Xi and Xj

xi, xj – the states (0 or 1) of units Xi and Xj

If the units are connected: wij ≠ 0

The connections are bidirectional: wij = wji

• The sum runs over all units of the net; the j = i term, wii xi, acts as a bias (self-connection) for unit Xi.
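To make the double sum concrete, here is a minimal Python sketch of the consensus function. It assumes the weights are stored as a symmetric list of lists `w` with the self-weights wii on the diagonal; the helper name `consensus` and the two-unit example net are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def consensus(w: Sequence[Sequence[float]], x: Sequence[int]) -> float:
    """Consensus C = sum_i sum_{j <= i} w[i][j] * x[i] * x[j] for 0/1 states.

    Each connected pair is counted once (j < i); the diagonal term
    w[i][i] * x[i] * x[i] == w[i][i] * x[i] acts as a bias on unit i.
    Assumes w is symmetric, so only its lower triangle is read.
    """
    total = 0.0
    for i in range(len(x)):
        for j in range(i + 1):  # j runs over 0..i, i.e. j <= i
            total += w[i][j] * x[i] * x[j]
    return total


# Hypothetical two-unit net: each unit has bias +1 and the pair is
# mutually inhibitory (w01 = w10 = -2), so "exactly one unit on" is best.
w = [[1.0, -2.0],
     [-2.0, 1.0]]
for x in ([0, 0], [1, 0], [0, 1], [1, 1]):
    print(x, consensus(w, x))
```

In this toy net the two one-unit-on states score 1.0 and the other two states score 0.0, so the consensus maxima encode the constraint "at most one unit on, but not none".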

Boltzmann Machine
• The net finds the maximum, or at least a local maximum, by letting each unit attempt to change its state (from 1 to 0 or vice versa):

$$\Delta C(i) = [1 - 2x_i] \left[ w_{ii} + \sum_{j \ne i} w_{ij}\, x_j \right]$$

Where:

∆C(i) – the change in consensus if unit Xi were to change its state

xi – the current state of unit Xi

[1 – 2xi] – equals +1 if Xi is currently ‘off’ and -1 if Xi is ‘on’
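The point of ∆C(i) is that it is local: it needs only unit Xi's own weights and its neighbors' states, not a full recomputation of C. The sketch below checks this on a small, hypothetical three-unit net (weights chosen arbitrarily) by comparing the local formula against the brute-force difference of two consensus values. The helper names are illustrative only.

```python
from typing import Sequence


def consensus(w: Sequence[Sequence[float]], x: Sequence[int]) -> float:
    # C = sum_i sum_{j <= i} w[i][j] * x[i] * x[j]; diagonal terms act as biases
    return sum(w[i][j] * x[i] * x[j] for i in range(len(x)) for j in range(i + 1))


def delta_consensus(w: Sequence[Sequence[float]], x: Sequence[int], i: int) -> float:
    # dC(i) = [1 - 2 x_i] * [w_ii + sum_{j != i} w_ij x_j], using only unit i's neighbors
    field = w[i][i] + sum(w[i][j] * x[j] for j in range(len(x)) if j != i)
    return (1 - 2 * x[i]) * field


# Hypothetical symmetric weights for three units (diagonal = biases).
w = [[1.0, -2.0, 0.5],
     [-2.0, 1.0, 0.0],
     [0.5, 0.0, -1.0]]
x = [1, 0, 1]
for i in range(len(x)):
    flipped = list(x)
    flipped[i] = 1 - flipped[i]
    brute_force = consensus(w, flipped) - consensus(w, x)
    print(i, delta_consensus(w, x, i), brute_force)
```

The two columns agree for every unit, which relies on w being symmetric: the pair term counted once in C is read as w[j][i] there and as w[i][j] in the local formula.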

Boltzmann Machine
• Unit Xi does not necessarily change its state; the probability of the net accepting a change in state for Xi is:

$$A(i,T) = \frac{1}{1 + \exp\left( -\Delta C(i) / T \right)}$$

Where:

T (temperature) – a control parameter that is reduced as the net searches for maximal consensus

• This process of gradually reducing the temperature is called simulated annealing. It reduces the probability of the net becoming trapped in a local maximum of the consensus (equivalently, a local minimum of the energy) that is not the global one.
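Putting ∆C(i) and A(i,T) together gives the search procedure described above. The following is a minimal sketch, not the presenters' code: it assumes a geometric cooling schedule (T is multiplied by `cooling` after a fixed number of tries) with hypothetical parameters `t0`, `cooling`, `t_min`, `tries_per_unit` and `seed`; the slides do not specify a schedule.

```python
import math
import random
from typing import List, Sequence


def acceptance(delta_c: float, t: float) -> float:
    """A(i, T) = 1 / (1 + exp(-dC(i) / T)), evaluated without overflow."""
    z = delta_c / t
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so exp(z) < 1 cannot overflow
    return e / (1.0 + e)


def delta_consensus(w: Sequence[Sequence[float]], x: Sequence[int], i: int) -> float:
    # dC(i) = [1 - 2 x_i] [w_ii + sum_{j != i} w_ij x_j]
    field = w[i][i] + sum(w[i][j] * x[j] for j in range(len(x)) if j != i)
    return (1 - 2 * x[i]) * field


def anneal_consensus(w, x, t0=10.0, cooling=0.95, t_min=0.01,
                     tries_per_unit=20, seed=0) -> List[int]:
    # Geometric cooling T <- cooling * T is an assumed schedule, not from the slides.
    rng = random.Random(seed)
    x = list(x)
    n = len(x)
    t = t0
    while t > t_min:
        for _ in range(tries_per_unit * n):
            i = rng.randrange(n)  # a randomly chosen unit attempts to change state
            if rng.random() < acceptance(delta_consensus(w, x, i), t):
                x[i] = 1 - x[i]
        t *= cooling
    return x


# Hypothetical net: biases +1, mutual inhibition -2 ("exactly one unit on").
w = [[1.0, -2.0],
     [-2.0, 1.0]]
print(anneal_consensus(w, [1, 1]))
```

At high T, A(i,T) is close to 1/2 whatever the sign of ∆C(i), so the net wanders; as T falls, moves that lower the consensus are almost never accepted and the net settles into a maximum, here a state with exactly one unit on.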

Boltzmann Machine Learning Rule
• The learning rule was proposed by Ackley, Hinton and Sejnowski in 1985.
• It extends the Hopfield model with learning:
• Each neuron fires with bipolar values.
• All connections are symmetric.
• In activation passing, the next neuron whose state is to be updated is selected at random.
• There is no self-feedback (no connection from a neuron to itself).
• Training is based on probabilistic operation and on correlations between neuron states.
• Operation is deterministic once the weights have been determined.
Simple Boltzmann Machine
• Boltzmann Machine with hidden and visible neurons. The network is fully connected with symmetric connections.
Boltzmann Machine Structure
• The Boltzmann Machine is a Hopfield network, in which
• The neurons are divided into two subsets:
• Visible, which is further divided into:
• Input
• Output
• Hidden

This allows a much richer representation of the input data.

• The neurons are stochastic: at any time there is a probability attached to whether a neuron fires, whereas the Hopfield net operates on deterministic principles.
• It may use either supervised or unsupervised learning.
Boltzmann Machine Operation
• We will concentrate on unsupervised learning.
• There are three phases in the operation of the network:
• The clamped phase, in which the visible neurons (input and output) are held fixed while the hidden neurons are allowed to vary.
• The free-running phase, in which the visible neurons are released (in the supervised variant, only the inputs stay fixed) and the other neurons are allowed to vary.
• The learning phase.
• These phases are repeated until the Boltzmann Machine has learned the input patterns, so that it converges to a learned pattern when a noisy or incomplete pattern is presented.
Clamped Phase
• Generally, the initial weights of the net are set randomly to values in a small range, e.g. -0.5 to +0.5.
• Then an input pattern is presented to the net and clamped to the visible neurons.
• Now perform simulated annealing on the net: choose a hidden neuron at random and flip its state from sj to -sj with probability:

$$P(s_j \to -s_j) = \frac{1}{1 + \exp\left( \Delta E / T \right)}$$

where ∆E is the change in energy the flip would cause (so flips that lower the energy are favored), and the energy of the net is:

$$E = -\frac{1}{2} \sum_{j=1}^{N} \sum_{i \ne j} w_{ji}\, s_j s_i$$
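The clamped phase can be sketched as follows, under stated assumptions: bipolar states, a zero-diagonal symmetric weight matrix initialized uniformly in [-0.5, +0.5], and a flip probability written as 1 / (1 + exp(∆E / T)) with ∆E taken as E(after flip) minus E(before flip), which is the energy form of the acceptance rule A(i,T) used for consensus. All function names, the 5-unit layout and the temperature list are hypothetical.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def logistic(z: float) -> float:
    """1 / (1 + exp(-z)), computed without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def energy(w: Sequence[Sequence[float]], s: Sequence[int]) -> float:
    # E = -1/2 * sum_j sum_{i != j} w[j][i] * s[j] * s[i], with bipolar s in {-1, +1}
    n = len(s)
    return -0.5 * sum(w[j][i] * s[j] * s[i] for j in range(n) for i in range(n) if i != j)


def flip_probability(w, s, j: int, t: float) -> float:
    # P(s_j -> -s_j) = 1 / (1 + exp(dE / T)), dE = E(after flip) - E(before flip)
    flipped = list(s)
    flipped[j] = -flipped[j]
    d_e = energy(w, flipped) - energy(w, s)
    return logistic(-d_e / t)


def random_symmetric_weights(n: int, rng: random.Random, scale: float = 0.5) -> Matrix:
    # Small random symmetric weights in [-scale, +scale]; zero diagonal (no self-feedback)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            w[i][j] = w[j][i] = rng.uniform(-scale, scale)
    return w


def clamped_anneal(w, s, hidden: List[int], temps: List[float],
                   tries_per_t: int, rng: random.Random) -> List[int]:
    # Visible units keep their clamped values; only hidden units are chosen and flipped.
    s = list(s)
    for t in temps:
        for _ in range(tries_per_t):
            j = rng.choice(hidden)
            if rng.random() < flip_probability(w, s, j, t):
                s[j] = -s[j]
    return s


rng = random.Random(42)
w = random_symmetric_weights(5, rng)
pattern = [1, -1, 1]          # hypothetical pattern clamped on visible units 0..2
s0 = pattern + [1, 1]         # hidden units 3 and 4 start at +1
temps = [5.0 * 0.9 ** k for k in range(40)]
print(clamped_anneal(w, s0, hidden=[3, 4], temps=temps, tries_per_t=10, rng=rng))
```

Recomputing the full energy twice per flip is deliberately naive here so that the code mirrors the formula for E; a local shortcut is possible because only the terms involving neuron j change.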

Clamped Phase (cont)
• The activation passing can continue until the net reaches a low-energy state.
• Because of the stochastic nature of the Boltzmann Machine, we cannot specify a single state that will be the attractor of the system.
• However, the net will reach a state of thermal equilibrium, in which individual neurons keep changing state but the probability of each state of the net is fixed and can be calculated.
• For a system in state α with associated energy Eα at temperature T, this probability is:

$$P(\alpha) = \frac{\exp(-E_\alpha / T)}{\sum_{\beta} \exp(-E_\beta / T)}$$
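For a very small net the equilibrium distribution can be computed exactly by enumerating all 2^N bipolar states, which makes the role of T visible. This is only an illustrative sketch (enumeration is infeasible beyond a few dozen units); the two-unit weight matrix and the helper names are hypothetical.

```python
import itertools
import math
from typing import Dict, Sequence, Tuple


def energy(w: Sequence[Sequence[float]], s: Sequence[int]) -> float:
    # E = -1/2 * sum_j sum_{i != j} w[j][i] * s[j] * s[i]
    n = len(s)
    return -0.5 * sum(w[j][i] * s[j] * s[i] for j in range(n) for i in range(n) if i != j)


def boltzmann_distribution(w, t: float) -> Dict[Tuple[int, ...], float]:
    """P(alpha) = exp(-E_alpha / T) / sum_beta exp(-E_beta / T) over all 2**N bipolar states."""
    states = list(itertools.product((-1, 1), repeat=len(w)))
    energies = [energy(w, s) for s in states]
    e_min = min(energies)
    # Shifting every energy by e_min cancels in the ratio but keeps exp() from overflowing.
    weights = [math.exp(-(e - e_min) / t) for e in energies]
    z = sum(weights)  # the denominator (partition function)
    return {s: wt / z for s, wt in zip(states, weights)}


w = [[0.0, 1.0],
     [1.0, 0.0]]   # hypothetical: one excitatory connection
for t in (10.0, 1.0, 0.1):
    p = boltzmann_distribution(w, t)
    print(f"T={t}: P(aligned) = {p[(1, 1)] + p[(-1, -1)]:.4f}")
```

At high T all four states are nearly equally likely; as T drops, the probability mass concentrates on the two low-energy (aligned) states, which is what the annealing schedule exploits.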

Clamped Phase (cont)
• The update can be computed as a local operation, since:

$$\Delta E = -\Delta s_j \sum_{i=1}^{N} w_{ji}\, s_i = 2\, s_j v_j$$

Where: vj = ∑i wji si is the net input (activation) of the jth neuron, and ∆sj = -2sj for the flip sj → -sj. When sj currently disagrees with the sign of vj, this equals -2|vj|, i.e. the flip lowers the energy by 2|vj|.

• As the temperature is gradually lowered, the net settles into as low an energy state as it can reach at each temperature.
• The correlations between the firing of pairs of neurons are then measured at the final temperature:

$$\rho^{+}_{ij} = \langle s_j s_i \rangle^{+}$$

Where: ‘+’ indicates that the correlations are calculated while the visible neurons are clamped.
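Both quantities on this slide are easy to compute directly. The sketch below assumes a zero diagonal (no self-feedback) so that the sum over all i is the same as the sum over i ≠ j, and it treats the correlation ⟨sj si⟩ as a plain average over states that would be sampled at the final temperature; the sample list shown is hypothetical.

```python
from typing import List, Sequence


def net_input(w: Sequence[Sequence[float]], s: Sequence[int], j: int) -> float:
    # v_j = sum_i w[j][i] * s[i]; w[j][j] is assumed to be 0 (no self-feedback)
    return sum(w[j][i] * s[i] for i in range(len(s)))


def local_delta_e(w, s, j: int) -> float:
    # dE for s_j -> -s_j: -ds_j * v_j with ds_j = -2 * s_j, i.e. 2 * s_j * v_j
    return 2 * s[j] * net_input(w, s, j)


def correlations(samples: List[Sequence[int]]) -> List[List[float]]:
    # rho_ij = <s_i s_j>: average of s_i * s_j over states sampled at the final temperature
    n, m = len(samples[0]), len(samples)
    return [[sum(s[i] * s[j] for s in samples) / m for j in range(n)] for i in range(n)]


w = [[0.0, 1.0],
     [1.0, 0.0]]
print(local_delta_e(w, [-1, 1], 0))   # s_0 disagrees with v_0 = 1, so dE = -2|v_0|
print(correlations([[1, 1], [1, -1], [1, 1], [-1, -1]]))
```

The diagonal of the correlation matrix is always 1 for bipolar units, since si · si = 1.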

Free Running Phase
• Here the same calculations are repeated, but the visible neurons are not clamped.
• After an input pattern is presented, all neurons can update their states and the annealing schedule is performed as before.
• Again, the correlations between the firing of pairs of neurons are measured at the final temperature:

$$\rho^{-}_{ij} = \langle s_j s_i \rangle^{-}$$

Where: ‘-’ indicates that the correlations are calculated while the visible neurons are not clamped.
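The clamped and free-running phases differ only in which neurons are held fixed, so one sampler can produce both ρ+ and ρ-. This is a sketch under assumptions not stated on the slides: it samples at a single fixed final temperature `t`, discards a hypothetical `burn_in` number of sweeps, and averages si·sj over the following `n_samples` sweeps; the three-unit weights are made up for illustration.

```python
import math
import random
from typing import Collection, List


def logistic(z: float) -> float:
    # 1 / (1 + exp(-z)) without overflow
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sample_correlations(w, s, clamped: Collection[int], t: float, n_samples: int,
                        rng: random.Random, burn_in: int = 200) -> List[List[float]]:
    """Estimate <s_i s_j> at a fixed (final) temperature t.

    clamped: indices held fixed; the visible units in the clamped phase,
    an empty collection in the unsupervised free-running phase.
    """
    s = list(s)
    n = len(s)
    free = [k for k in range(n) if k not in clamped]
    sums = [[0.0] * n for _ in range(n)]
    for step in range(burn_in + n_samples):
        for _ in range(len(free)):            # one sweep of random single-unit updates
            j = rng.choice(free)
            v = sum(w[j][i] * s[i] for i in range(n) if i != j)
            d_e = 2 * s[j] * v                # energy change if s_j flips
            if rng.random() < logistic(-d_e / t):
                s[j] = -s[j]
        if step >= burn_in:                   # collect only after the burn-in sweeps
            for i in range(n):
                for k in range(n):
                    sums[i][k] += s[i] * s[k]
    return [[total / n_samples for total in row] for row in sums]


w = [[0.0, 2.0, 0.0],
     [2.0, 0.0, -1.0],
     [0.0, -1.0, 0.0]]                        # hypothetical weights
rng = random.Random(0)
rho_plus = sample_correlations(w, [1, 1, -1], clamped={0}, t=1.0, n_samples=2000, rng=rng)
rho_minus = sample_correlations(w, [1, 1, -1], clamped=set(), t=1.0, n_samples=2000, rng=rng)
print(rho_plus[0][1], rho_minus[0][1])
```

Averaging over many sweeps rather than taking a single final state reflects the point above that, at thermal equilibrium, the net keeps changing state and only the statistics are stable.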

Learning Phase
• Here we use the Boltzmann Machine’s learning rule to update the weights:

$$\Delta w_{ij} = \eta \left( \rho^{+}_{ij} - \rho^{-}_{ij} \right), \quad \forall i, j$$

Where: η is the learning rate.

That is, each weight changes in proportion to the difference between the correlations measured in the clamped and free-running phases; a weight stays unchanged when the two correlations agree.

• Applying this learning rule gives the Boltzmann Machine its pattern-completion property.
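The update itself is a single element-wise step. In this minimal sketch the correlation matrices are hypothetical numbers standing in for the outputs of the two sampling phases, and `boltzmann_update` is an illustrative name.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def boltzmann_update(w: Sequence[Sequence[float]], rho_plus: Sequence[Sequence[float]],
                     rho_minus: Sequence[Sequence[float]], eta: float) -> Matrix:
    """dw_ij = eta * (rho+_ij - rho-_ij) for all i, j.

    For bipolar units rho_ii = <s_i s_i> = 1 in both phases, so the
    diagonal (no self-feedback) is left unchanged automatically.
    """
    n = len(w)
    return [[w[i][j] + eta * (rho_plus[i][j] - rho_minus[i][j]) for j in range(n)]
            for i in range(n)]


w = [[0.0, 0.1], [0.1, 0.0]]
rho_plus = [[1.0, 0.8], [0.8, 1.0]]     # hypothetical clamped-phase correlations
rho_minus = [[1.0, 0.2], [0.2, 1.0]]    # hypothetical free-running correlations
print(boltzmann_update(w, rho_plus, rho_minus, eta=0.5))
```

Because the pair fires together more often when the data are clamped than when the net runs freely, the connection is strengthened (here from 0.1 to about 0.4); the update also preserves the symmetry wij = wji.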

Supervised Learning
• For supervised learning the set of visible neurons is split into input and output neurons, and the machine will be used to associate an input pattern with an output pattern.
Supervised Learning (cont)
• During the clamped phase, the input and output patterns are clamped to the appropriate units.
• As before, the hidden neurons’ activations are allowed to settle at each temperature.
• During the free-running phase, only the input neurons are clamped; both the output neurons and the hidden neurons can pass activation around until the activations in the network settle.
• The learning rule here must be weighted by the probability of each input pattern:

$$\Delta w_{ij,\alpha} = \eta P_\alpha \left( \rho^{+}_{ij} - \rho^{-}_{ij} \right), \quad \forall i, j$$

Where: Pα is the a priori probability of state α at the input neurons.
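A sketch of the probability-weighted rule follows. It assumes that the per-pattern updates ∆wij,α are simply summed into one weight change per learning step; the slide gives only the per-pattern term, so that accumulation, the helper name and the toy statistics are all assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def supervised_update(w: Sequence[Sequence[float]],
                      per_pattern: Sequence[Tuple[float, Matrix, Matrix]],
                      eta: float) -> Matrix:
    """Apply dw_ij,alpha = eta * P_alpha * (rho+_ij - rho-_ij) for every pattern alpha.

    per_pattern holds (P_alpha, rho_plus, rho_minus) for each input pattern;
    summing the per-pattern updates into one step is an assumption of this sketch.
    """
    n = len(w)
    new_w = [list(row) for row in w]
    for p_alpha, rho_plus, rho_minus in per_pattern:
        for i in range(n):
            for j in range(n):
                new_w[i][j] += eta * p_alpha * (rho_plus[i][j] - rho_minus[i][j])
    return new_w


same = [[1.0, 1.0], [1.0, 1.0]]
weaker = [[1.0, 0.6], [0.6, 1.0]]
# Hypothetical statistics: pattern A (P = 0.75) pushes the link up, pattern B (P = 0.25) down.
per_pattern = [(0.75, same, weaker), (0.25, weaker, same)]
print(supervised_update([[0.0, 0.0], [0.0, 0.0]], per_pattern, eta=1.0))
```

The more probable pattern dominates: its correlation gap of 0.4 counts three times as much as the opposing gap of the less probable pattern, giving a net change of about 0.2.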

Applications
• The weighted matching problem:
• A set of N points with a known “distance” between each pair.
• Link the points together in pairs so as to minimize the total length of the links.
• The Traveling Salesman Problem.
• Graph bipartitioning:
• A set of points is split into two disjoint sets with as low an associated cost as possible.
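As an illustration of the graph bipartitioning application, here is a simulated-annealing sketch. Its formulation is an assumption, not taken from the slides: the cost is the number of edges cut, the two sets are kept equal in size by swapping one point from each side per move, and moves are accepted with the same sigmoid rule used for the Boltzmann Machine. The schedule parameters and the example graph are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple


def logistic(z: float) -> float:
    # 1 / (1 + exp(-z)) without overflow
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cut_size(edges: Sequence[Tuple[int, int]], side: Sequence[int]) -> int:
    # Cost = number of edges whose endpoints lie in different sets
    return sum(1 for u, v in edges if side[u] != side[v])


def anneal_bipartition(n: int, edges, t0: float = 2.0, cooling: float = 0.95,
                       t_min: float = 0.01, tries_per_t: int = 200, seed: int = 0):
    """Split n points (n even) into two equal halves with as few cut edges as possible."""
    rng = random.Random(seed)
    side = [k % 2 for k in range(n)]          # balanced start, then shuffled
    rng.shuffle(side)
    cost = cut_size(edges, side)
    t = t0
    while t > t_min:
        for _ in range(tries_per_t):
            a = rng.choice([k for k in range(n) if side[k] == 0])
            b = rng.choice([k for k in range(n) if side[k] == 1])
            side[a], side[b] = 1, 0           # swapping one point each way keeps the halves equal
            new_cost = cut_size(edges, side)
            if rng.random() < logistic(-(new_cost - cost) / t):
                cost = new_cost               # accept the move
            else:
                side[a], side[b] = 0, 1       # reject: undo the swap
        t *= cooling
    return side, cost


# Hypothetical graph: two 4-cliques joined by a single edge (3, 4).
edges = [(i, j) for i in range(4) for j in range(i + 1, 4)]
edges += [(i, j) for i in range(4, 8) for j in range(i + 1, 8)]
edges.append((3, 4))
print(anneal_bipartition(8, edges))
```

On this graph the best split separates the two cliques and cuts only the bridging edge. Recomputing the full cut after every swap keeps the sketch short; a practical version would update the cost incrementally.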
Demos
• The Boltzmann Machine: Necker Cube Example http://www.cs.cf.ac.uk/Dave/JAVA/boltzman/Necker.html
• Simulated Annealing Demo http://www.taygeta.com/annealing/demo1.html
• Simulated Annealing: Animation http://goethe.ira.uka.de/~syrjakow/anim_env/startenv_mpi.html
Conclusion
• Learning in the Boltzmann Machine is accomplished with simulated annealing, a technique that is stochastic in nature.
• Boltzmann Machine:
• Global search: finding the global minimum/maximum of the potential energy.
• The neural net is first operated at a high temperature, which is gradually lowered until the net settles into an equilibrium configuration around a single minimum of the energy function.
• Simulated Annealing:
• Local search: finding a relative minimum/maximum of the potential energy.
• Forces the search out of local regions by accepting suboptimal state transitions with a probability that decreases as the temperature falls.
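The last point, accepting suboptimal transitions with decreasing probability, can be seen numerically. This sketch evaluates the sigmoid acceptance rule from the earlier slides for a fixed move that worsens the cost by 1, along an assumed geometric schedule T_k = 10 · 0.8^k; the schedule and helper names are hypothetical.

```python
import math
from typing import List


def uphill_acceptance(delta: float, t: float) -> float:
    """Sigmoid acceptance 1 / (1 + exp(delta / T)) for a move that worsens the cost by delta > 0."""
    z = delta / t
    if z > 700.0:  # math.exp would overflow; the probability is effectively zero
        return 0.0
    return 1.0 / (1.0 + math.exp(z))


def geometric_schedule(t0: float, cooling: float, steps: int) -> List[float]:
    # Assumed schedule T_k = t0 * cooling**k
    return [t0 * cooling ** k for k in range(steps)]


temps = geometric_schedule(10.0, 0.8, 30)
probs = [uphill_acceptance(1.0, t) for t in temps]
for t, p in list(zip(temps, probs))[::5]:
    print(f"T = {t:8.4f}   P(accept a +1 worse move) = {p:.3e}")
```

Early in the schedule such a move is accepted almost half the time, which lets the search escape local regions; by the end it is essentially never accepted, so the search behaves like pure descent.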