Debate 2:

Boltzmann Machine and Simulated Annealing

Presented by:

Yevgeniy Gershteyn

Larisa Perman

04/24/2003

- The Boltzmann Machine neural net was introduced by Hinton and Sejnowski in 1983.
- Used for solving constrained optimization problems.
- Typical Boltzmann Machine:
  - Weights are fixed to represent the constraints of the problem and the function to be optimized.
  - The net seeks the solution by changing the activations of the units (0 or 1) based on a probability distribution and on the effect that the change would have on the energy function (or consensus function) of the net.

- The objective of the neural net is to maximize the consensus function:

$$C = \sum_{i} \sum_{j \le i} w_{ij}\, x_i x_j$$

Where:

wij – weight of the connection between units Xi and Xj

xi, xj – the states of units Xi and Xj

If the units are connected: wij ≠ 0

The connections are bidirectional: wij = wji

- The sum runs over all units of the net. Because j ≤ i, each connection is counted once, and the j = i term, wii xi, acts as a bias for unit Xi.
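The consensus function can be sketched in a few lines of Python. This is a minimal illustration, not code from the presentation: it assumes binary states stored in a list and a symmetric weight matrix given as a list of lists, and the name `consensus` and the 3-unit weights `W` are hypothetical. Because the inner sum runs over j ≤ i, each connection is counted once, and the diagonal weight wii enters as a bias (xi xi = xi for binary states).

```python
def consensus(w, x):
    """Consensus C = sum_i sum_{j <= i} w[i][j] * x[i] * x[j].

    w: symmetric weight matrix (list of lists); w[i][i] acts as a bias.
    x: binary unit states (0 or 1).
    """
    n = len(x)
    return sum(w[i][j] * x[i] * x[j] for i in range(n) for j in range(i + 1))


# Hypothetical 3-unit net: each unit "wants" to be on (bias +2),
# but every pair of units inhibits each other (weight -3).
W = [[2, -3, -3],
     [-3, 2, -3],
     [-3, -3, 2]]

print(consensus(W, [1, 0, 0]))  # 2: one unit on
print(consensus(W, [1, 1, 0]))  # 1: two units on, the -3 link is paid once
```

With these weights the consensus is largest when exactly one unit is on, so the net encodes a small "pick exactly one" constraint.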

- The net finds the maximum (or at least a local maximum) by letting each unit attempt to change its state (from 1 to 0 or vice versa):

$$\Delta C(i) = [1 - 2x_i]\left[w_{ii} + \sum_{j \ne i} w_{ij} x_j\right]$$

Where:

∆C(i) – the change in consensus if unit Xi were to change its state

xi – the current state of unit Xi

[1 – 2xi] – equals +1 if Xi is ‘off’ and -1 if Xi is ‘on’
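The change in consensus needs only the weights on unit Xi and the current states, which is what makes each update local. The sketch below (hypothetical names `consensus` and `delta_consensus`; the 3-unit weights are an assumed example) checks the formula against a full recomputation of C for every state and every unit:

```python
from itertools import product


def consensus(w, x):
    """C = sum_i sum_{j <= i} w[i][j] * x[i] * x[j] for binary states x."""
    n = len(x)
    return sum(w[i][j] * x[i] * x[j] for i in range(n) for j in range(i + 1))


def delta_consensus(w, x, i):
    """Change in consensus if unit i flipped: [1 - 2x_i][w_ii + sum_{j != i} w_ij x_j].

    Only row i of the weights and the current states are needed.
    """
    field = w[i][i] + sum(w[i][j] * x[j] for j in range(len(x)) if j != i)
    return (1 - 2 * x[i]) * field


# Hypothetical weights: bias +2 on every unit, mutual inhibition -3.
W = [[2, -3, -3],
     [-3, 2, -3],
     [-3, -3, 2]]

# Check the local formula against a full recomputation for every state and unit.
for x in product([0, 1], repeat=3):
    for i in range(3):
        flipped = list(x)
        flipped[i] = 1 - flipped[i]
        assert consensus(W, flipped) - consensus(W, list(x)) == delta_consensus(W, list(x), i)

print(delta_consensus(W, [1, 0, 0], 1))  # -1: turning on a second unit lowers C
```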

- Unit Xi does not necessarily change its state; the probability that the net accepts a change in state for Xi is:

$$A(i,T) = \frac{1}{1 + \exp\left(-\Delta C(i)/T\right)}$$

Where:

T (temperature) – a control parameter that is reduced as the net searches for a maximal consensus

- This process of gradually reducing the temperature is called simulated annealing. It reduces the probability of the net becoming trapped in a local maximum of the consensus (a local minimum of the energy) that is not the global one.
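A minimal sketch of the resulting search loop, assuming a geometric cooling schedule T ← 0.95 T and one epoch of n random update attempts per temperature. The schedule parameters (`t0`, `cooling`, `t_min`), the function names and the example weights are illustrative assumptions, not values from the presentation. The acceptance function is A(i,T) written in a form that cannot overflow:

```python
import math
import random


def acceptance(delta_c, t):
    """A(i, T) = 1 / (1 + exp(-delta_c / T)), written to avoid overflow."""
    z = delta_c / t
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def delta_consensus(w, x, i):
    """[1 - 2x_i][w_ii + sum_{j != i} w_ij x_j] for binary states x."""
    field = w[i][i] + sum(w[i][j] * x[j] for j in range(len(x)) if j != i)
    return (1 - 2 * x[i]) * field


def anneal(w, x, t0=10.0, cooling=0.95, t_min=0.05, seed=0):
    """Maximize consensus by simulated annealing (geometric cooling is an assumption)."""
    rng = random.Random(seed)
    x = list(x)
    t = t0
    while t > t_min:
        for _ in range(len(x)):  # one epoch: n randomly chosen units attempt a flip
            i = rng.randrange(len(x))
            if rng.random() < acceptance(delta_consensus(w, x, i), t):
                x[i] = 1 - x[i]
        t *= cooling  # T is reduced after every epoch
    return x


W = [[2, -3, -3],
     [-3, 2, -3],
     [-3, -3, 2]]
print(acceptance(0.0, 1.0))  # 0.5: a neutral change is a coin flip
print(anneal(W, [1, 1, 1]))  # typically ends with exactly one unit on
```

While T is large, A(i,T) stays close to 1/2 and the net wanders; as T falls, changes that lower the consensus are accepted less and less often.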

- The learning rule was proposed by Ackley, Hinton and Sejnowski in 1985.
- It extends the Hopfield model with learning:
  - Each neuron fires with bipolar values.
  - All connections are symmetric.
  - In activation passing, the next neuron whose state is to be updated is selected at random.
  - There is no self-feedback (no connection from a neuron to itself).

- Training is based on probabilistic operation and on the correlations between neuron states.
- Operation is deterministic once the weights have been determined.

- Boltzmann Machine with hidden and visible neurons. The network is fully connected with symmetric connections.

- The Boltzmann Machine is a Hopfield network in which:
  - The neurons are divided into two subsets:
    - Visible, which is further divided into:
      - Input
      - Output
    - Hidden, which allows a much richer representation of the input data.
  - The neurons are stochastic: at any time there is a probability attached to whether a neuron fires, whereas the Hopfield net is based on deterministic principles.
  - Learning may be either supervised or unsupervised.

- We will concentrate on the unsupervised learning methods.
- There are three phases in the operation of the network:
  - The clamped phase, in which the input and output (visible) neurons are held fixed while the hidden neurons are allowed to vary.
  - The free-running phase, in which only the inputs are held fixed and the other neurons are allowed to vary.
  - The learning phase, in which the weights are updated.

- These phases are repeated until the Boltzmann Machine has learned the input patterns, so that it converges to a learned pattern when a noisy or incomplete pattern is presented.

- Generally, the initial weights of the net are set randomly to values in a small range, e.g. -0.5 to +0.5.
- Then an input pattern is presented to the net and clamped to the visible neurons.
- Now perform simulated annealing on the net: choose a hidden neuron at random and flip its state from sj to -sj with probability:

$$P(s_j \to -s_j) = \frac{1}{1 + \exp(\Delta E / T)}$$

where ∆E is the change in the energy of the net caused by the flip (energy after minus energy before), and the energy of the net is:

$$E = -\frac{1}{2} \sum_{j=1}^{N} \sum_{i \ne j} w_{ji}\, s_j s_i$$
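For bipolar neurons the energy and the flip probability can be sketched as follows. This is an illustration under stated assumptions: states are +1/-1 in a list, the weight matrix is symmetric with a zero diagonal, and ∆E is taken as the energy after the flip minus the energy before it, so the flip probability is written as 1/(1 + exp(∆E/T)) and energy-lowering flips are the likely ones. The names `energy` and `flip_probability` and the 2-neuron weights are hypothetical.

```python
import math


def energy(w, s):
    """E = -1/2 * sum_j sum_{i != j} w[j][i] * s[j] * s[i] for bipolar states (+1/-1)."""
    n = len(s)
    return -0.5 * sum(w[j][i] * s[j] * s[i]
                      for j in range(n) for i in range(n) if i != j)


def flip_probability(w, s, j, t):
    """Probability of flipping s[j] -> -s[j] at temperature t.

    delta_e = E(after) - E(before); flips that lower the energy (delta_e < 0)
    get a probability above 1/2, and the preference sharpens as t falls.
    """
    flipped = list(s)
    flipped[j] = -flipped[j]
    z = (energy(w, flipped) - energy(w, s)) / t
    if z >= 0:
        ez = math.exp(-z)
        return ez / (1.0 + ez)  # same as 1 / (1 + exp(z)), without overflow
    return 1.0 / (1.0 + math.exp(z))


# Hypothetical 2-neuron net with one excitatory link (w01 = w10 = 1).
W = [[0, 1],
     [1, 0]]
print(energy(W, [1, -1]))                    # 1.0: the neurons disagree
print(flip_probability(W, [1, -1], 1, 1.0))  # about 0.88: flipping lowers E by 2
```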

- The activation passing can continue until the net reaches a low energy state.
- Because of the stochastic nature of the Boltzmann Machine, we cannot specify a single state that will be the attractor of the system.
- However, the net will reach a state of thermal equilibrium, in which individual neurons keep changing state but the probability of any single state can be calculated.
- For a system in any state α with associated energy Eα at temperature T, the probability of that state is:

$$P(\alpha) = \frac{\exp(-E_\alpha / T)}{\sum_{\beta} \exp(-E_\beta / T)}$$
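For a very small net, the thermal-equilibrium distribution can be computed exactly by enumerating all 2^N bipolar states. The sketch below does that (hypothetical names, assumed 2-neuron weights); it is only practical for a handful of neurons, since the normalizing sum over β grows exponentially with N:

```python
import itertools
import math


def energy(w, s):
    """E = -1/2 * sum_j sum_{i != j} w[j][i] * s[j] * s[i] for bipolar states."""
    n = len(s)
    return -0.5 * sum(w[j][i] * s[j] * s[i]
                      for j in range(n) for i in range(n) if i != j)


def boltzmann_distribution(w, t):
    """P(alpha) = exp(-E_alpha / T) / sum_beta exp(-E_beta / T), by brute-force enumeration."""
    states = list(itertools.product([-1, 1], repeat=len(w)))
    weights = [math.exp(-energy(w, s) / t) for s in states]
    z = sum(weights)  # the normalizing sum over all states beta
    return {s: wt / z for s, wt in zip(states, weights)}


W = [[0, 1],
     [1, 0]]
for t in (1.0, 0.1):
    p = boltzmann_distribution(W, t)
    print(t, round(p[(1, 1)], 3), round(p[(1, -1)], 3))
```

Lowering T moves the probability mass onto the two lowest-energy states, (1, 1) and (-1, -1).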

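- The updating can be performed as a local operation, since:

$$\Delta E = -\Delta s_j \sum_{i=1}^{N} w_{ji} s_i = 2 s_j v_j, \qquad |\Delta E| = 2\,|v_j|$$

Where:

vj = ∑i wji si – the activation of the jth neuron (wjj = 0, since there is no self-feedback)

∆sj = -2sj – the change in sj when it flips

| vj | – the absolute value of the jth neuron’s activation

- As the temperature is gradually lowered, the net settles into as low an energy state as it can reach at each temperature.
- The correlations between the firing of pairs of neurons are then measured at the final temperature:

$$\rho^{+}_{ij} = \langle s_j s_i \rangle^{+}$$

Where: ‘+’ indicates that the correlation is calculated while the visible neurons are in a clamped state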
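The locality of the update can be checked numerically: the energy change from flipping one neuron depends only on that neuron's state and its activation vj. This sketch (hypothetical names; a random 5-neuron net with symmetric integer weights and a zero diagonal is assumed) compares the local formula with a full recomputation of E:

```python
import random


def energy(w, s):
    """E = -1/2 * sum_j sum_{i != j} w[j][i] * s[j] * s[i] for bipolar states."""
    n = len(s)
    return -0.5 * sum(w[j][i] * s[j] * s[i]
                      for j in range(n) for i in range(n) if i != j)


def local_delta_e(w, s, j):
    """Energy change from flipping s[j], using only neuron j's activation v_j.

    Flipping gives delta_s_j = -2 * s_j, so delta_E = -delta_s_j * v_j = 2 * s_j * v_j.
    """
    v_j = sum(w[j][i] * s[i] for i in range(len(s)) if i != j)
    return 2 * s[j] * v_j


# Hypothetical random net: symmetric integer weights, zero diagonal (no self-feedback).
rng = random.Random(1)
n = 5
W = [[0] * n for _ in range(n)]
for a in range(n):
    for b in range(a + 1, n):
        W[a][b] = W[b][a] = rng.randint(-3, 3)
s = [rng.choice([-1, 1]) for _ in range(n)]

for j in range(n):
    flipped = s[:j] + [-s[j]] + s[j + 1:]
    assert energy(W, flipped) - energy(W, s) == local_delta_e(W, s, j)
print([local_delta_e(W, s, j) for j in range(n)])
```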

- Next, we repeat the same calculations without clamping the visible neurons.
- After the input pattern is presented, all neurons can update their states, and the annealing schedule is performed as before.
- Again, the correlations between the firing of pairs of neurons are measured at the final temperature:

$$\rho^{-}_{ij} = \langle s_j s_i \rangle^{-}$$

Where: ‘-’ indicates that the correlation is calculated while the visible neurons are not in a clamped state
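Both correlation estimates can be sketched with one routine: anneal while only the unclamped neurons may flip, then average sj si over samples taken at the final temperature. This is a minimal Monte Carlo sketch under assumptions not stated in the presentation: the temperature schedule, the number of samples, one update attempt per free neuron per sweep, and the names `correlations` and `flip_prob` are all illustrative. Passing only the hidden neurons as free gives ρ+; passing every neuron gives ρ-:

```python
import math
import random


def flip_prob(delta_e, t):
    """1 / (1 + exp(delta_e / t)), computed without overflow."""
    z = delta_e / t
    if z >= 0:
        ez = math.exp(-z)
        return ez / (1.0 + ez)
    return 1.0 / (1.0 + math.exp(z))


def correlations(w, s, free_units, temps, n_samples, seed=0):
    """Anneal over `temps`, then estimate <s_i s_j> at the final temperature.

    Neurons listed in free_units may flip; all others stay clamped at s.
    Clamped phase (rho+): free_units = hidden neurons only.
    Free-running phase (rho-): free_units = all neurons.
    """
    rng = random.Random(seed)
    s = list(s)
    n = len(s)

    def sweep(t):
        for _ in range(len(free_units)):
            j = rng.choice(free_units)
            v_j = sum(w[j][i] * s[i] for i in range(n) if i != j)
            if rng.random() < flip_prob(2 * s[j] * v_j, t):  # delta_E = 2 s_j v_j
                s[j] = -s[j]

    for t in temps:  # annealing schedule
        sweep(t)
    rho = [[0.0] * n for _ in range(n)]
    for _ in range(n_samples):  # sampling at the final temperature
        sweep(temps[-1])
        for i in range(n):
            for j in range(n):
                rho[i][j] += s[i] * s[j] / n_samples
    return rho


# Hypothetical net: neurons 0 and 1 visible, neuron 2 hidden, strong 0-2 link.
W = [[0, 0, 5],
     [0, 0, 0],
     [5, 0, 0]]
temps = [2.0, 1.0, 0.5, 0.2]
rho_plus = correlations(W, [1, 1, -1], free_units=[2], temps=temps, n_samples=200)
rho_minus = correlations(W, [1, 1, -1], free_units=[0, 1, 2], temps=temps, n_samples=200)
print(round(rho_plus[0][1], 2), round(rho_minus[0][1], 2))
```

In this toy net neurons 0 and 1 have no connection, so they agree only when clamped together: ρ+ for that pair is 1, while ρ- stays near 0.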

- Here we use the Boltzmann Machine’s learning rule to update the weights:

$$\Delta w_{ij} = \eta\,(\rho^{+}_{ij} - \rho^{-}_{ij}), \quad \forall i,j$$

Where: η – the learning rate.

That is, each weight changes by an amount proportional to the difference between the correlations measured in the clamped and free-running modes.

- By applying this learning rule, the pattern completion property of the Boltzmann Machine is established.
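The weight update itself is a direct transcription of the rule. A minimal sketch, assuming weights and correlation estimates stored as lists of lists; the name `boltzmann_update` and the numbers are hypothetical:

```python
def boltzmann_update(w, rho_plus, rho_minus, eta):
    """Return new weights w_ij + eta * (rho+_ij - rho-_ij) for all i, j.

    For bipolar neurons <s_i s_i> = 1 in both phases, so the diagonal
    (no self-feedback) is left unchanged by the rule itself.
    """
    n = len(w)
    return [[w[i][j] + eta * (rho_plus[i][j] - rho_minus[i][j]) for j in range(n)]
            for i in range(n)]


# Hypothetical correlations for a 2-neuron net: the pair agrees more often
# when the visible neurons are clamped than when the net runs freely.
w = [[0.0, 0.1], [0.1, 0.0]]
rho_plus = [[1.0, 0.8], [0.8, 1.0]]
rho_minus = [[1.0, 0.2], [0.2, 1.0]]
new_w = boltzmann_update(w, rho_plus, rho_minus, eta=0.5)
print(new_w[0][1])  # about 0.4: the weight grows because rho+ > rho-
```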

- For supervised learning, the set of visible neurons is split into input and output neurons, and the machine is used to associate an input pattern with an output pattern.

- During the clamped phase, the input and output patterns are clamped to the appropriate units.
- As before, the hidden neurons’ activations are allowed to settle at the various temperatures.
- During the free-running phase, only the input neurons are clamped; both the output neurons and the hidden neurons pass activation around until the activations in the network settle.
- Here the learning rule must be weighted by the probability of each input pattern:

$$\Delta w_{ij,\alpha} = \eta\, P_\alpha\,(\rho^{+}_{ij} - \rho^{-}_{ij}), \quad \forall i,j$$

Where: Pα – the a priori probability of state α at the input neurons.
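For the supervised case, one plausible way to apply the weighted rule is to sum the per-pattern changes over one pass through the patterns, so that frequent patterns shape the weights more. That accumulation scheme, the name `supervised_epoch_update` and the example statistics are assumptions for illustration; applying each pattern's change immediately would be an equally valid reading.

```python
def supervised_epoch_update(w, stats, eta):
    """Apply delta w_ij,alpha = eta * P_alpha * (rho+_ij - rho-_ij) for every pattern alpha.

    stats: list of (p_alpha, rho_plus, rho_minus) tuples, one per input pattern,
    where p_alpha is the a priori probability of that pattern at the input neurons.
    The per-pattern changes are summed over one pass through the patterns.
    """
    n = len(w)
    new_w = [row[:] for row in w]
    for p_alpha, rho_plus, rho_minus in stats:
        for i in range(n):
            for j in range(n):
                new_w[i][j] += eta * p_alpha * (rho_plus[i][j] - rho_minus[i][j])
    return new_w


# Hypothetical statistics for two input patterns with priors 0.75 and 0.25.
stats = [
    (0.75, [[1.0, 1.0], [1.0, 1.0]], [[1.0, 0.5], [0.5, 1.0]]),
    (0.25, [[1.0, -1.0], [-1.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]),
]
w = supervised_epoch_update([[0.0, 0.0], [0.0, 0.0]], stats, eta=1.0)
print(w[0][1])  # 0.125 = 0.75 * 0.5 + 0.25 * (-1.0)
```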

- The weighted matching problem:
  - A set of N points with a known “distance” between each pair.
  - Link the points together in pairs so as to minimize the total length of the links.

- The Traveling Salesman Problem.
- Graph bipartitioning:
  - A set of points is split into two disjoint sets with as low an associated cost as possible.
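Graph bipartitioning gives a compact example of simulated annealing on one of these problems. The sketch below is a plain simulated-annealing search, not a Boltzmann Machine: it uses the Metropolis acceptance rule exp(-∆/T) instead of a sigmoid, counts cut edges as the cost, and keeps the two sets equal in size by swapping one vertex from each. The cost function, move set, schedule parameters, function names and test graph are all assumptions chosen for illustration.

```python
import math
import random


def cut_size(edges, side):
    """Cost: number of edges whose endpoints lie in different sets."""
    return sum(1 for a, b in edges if side[a] != side[b])


def bipartition(n, edges, t0=2.0, cooling=0.95, steps_per_t=50, t_min=0.01, seed=0):
    """Split vertices 0..n-1 into two equal halves with few cut edges (simulated annealing)."""
    rng = random.Random(seed)
    side = [0] * (n // 2) + [1] * (n - n // 2)
    rng.shuffle(side)
    cost = cut_size(edges, side)
    best, best_cost = side[:], cost
    t = t0
    while t > t_min:
        for _ in range(steps_per_t):
            # Move: swap one vertex from each set, so the halves stay balanced.
            a = rng.choice([v for v in range(n) if side[v] == 0])
            b = rng.choice([v for v in range(n) if side[v] == 1])
            side[a], side[b] = 1, 0
            new_cost = cut_size(edges, side)
            delta = new_cost - cost
            # Metropolis rule: always accept improvements, accept worse moves
            # with probability exp(-delta / T), which shrinks as T falls.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = side[:], cost
            else:
                side[a], side[b] = 0, 1  # reject: undo the swap
        t *= cooling
    return best, best_cost


# Hypothetical graph: two 4-vertex cliques joined by a single edge (3, 4).
clique_a = [(a, b) for a in range(4) for b in range(a + 1, 4)]
clique_b = [(a, b) for a in range(4, 8) for b in range(a + 1, 8)]
edges = clique_a + clique_b + [(3, 4)]
best, best_cost = bipartition(8, edges)
print(best, best_cost)  # expected to separate the cliques, cutting only edge (3, 4)
```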

- The Boltzmann Machine: Necker Cube Example http://www.cs.cf.ac.uk/Dave/JAVA/boltzman/Necker.html
- Simulated Annealing Demo http://www.taygeta.com/annealing/demo1.html
- Simulated Annealing: Animation http://goethe.ira.uka.de/~syrjakow/anim_env/startenv_mpi.html

- Learning in the Boltzmann Machine is accomplished with simulated annealing, a technique that is stochastic in nature.
- Boltzmann Machine:
  - Global search: finding the global minimum/maximum of the potential energy.
  - The neural net is first operated at a high temperature, which is gradually lowered until the net is trapped in an equilibrium configuration around a single minimum of the energy function.

- Simulated Annealing:
  - Local search: finding a relative minimum/maximum of the potential energy.
  - Forces the search out of local regions by accepting suboptimal state transitions with a probability that decreases as the temperature falls.
