Soft Computing. Lecture 10 Boltzmann machine. Definition of wikipedia. A Boltzmann machine is a type of stochastic recurrent neural network originally invented by Geoffrey Hinton and Terry Sejnowski . Boltzmann machines can be seen as the stochastic , generative counterpart
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
Soft Computing
Lecture 10
Boltzmann machine
A Boltzmann machine is a type of stochastic recurrent neural network
originally invented by Geoffrey Hinton and Terry Sejnowski.
Boltzmann machines can be seen as the stochastic, generative counterpart
of Hopfield nets.
They were an early example of neural networks capable of forming internal
representations. Because they are very slow to simulate they are
not very useful for most practical purposes.
However, they are theoretically intriguing due to the biological plausibility
of their training algorithm.
Definition of BM (2)
Definition of BM (3)
Thus, the difference in the global energy that results from
a single unit i being 0 or 1, written ΔEi, is given by:
A Boltzmann machine is made up of stochastic units.
The probability, piof the ithunit being on is given by:
(The scalarT is referred to as the "temperature“
of the system.)
Notice that temperature T plays a crucial role in the equation, and that
in the course of running the network, the value of T will start high, and
gradually 'cool down' to a lower value.
This is a continuous function that transforms any inputs - from –infinity
to +infinity - into real numbers in the interval [0, 1]. This is the logistic function,
& has a characteristic sigmoid shape:
So when Net = 0, e-Net = 1, because
any number raised to the power 0 is 1.
This true for all temperatures.
So always prob(A = 1) = 1/2.
I.e. if the netinput is 0, it's as likely
to fire as not.
Definition of BM (6)
Units are divided into "visible" units, V, and "hidden" units, H.
The visible units are those which receive information from the "environment",
i.e. those units that receive binary state vectors for training.
The connections in a Boltzmann machine have three restrictions on them:
(No unit has a connection with itself)
(All connections are symmetric)
(Visible units have no connections between them)
Boltzmann machines can be viewed as a type of maximum likelihood model,
i.e. training involves modifying the parameters (weights) in the network
to maximize the probability of the network producing the data as it was seen
in the training set. In other words, the network must successfully model
the probabilities of the data in the environment.
There are two phases to Boltzmann machine training. One is the "positive“
phase where the visible units' states are clamped to a particular binary state
vector from the training set. The other is the "negative" phase where
the network is allowed to run freely, i.e. no units have their state determined
by external data.
A vector over the visible units is denoted Vα and a vector over the hidden
units is denoted as Hβ. The probabilities P+(S) and P−(S) represent
the probability for a given state, S, in the positive and negative phases
respectively.
Note that this means that P+(Vα) is determined by the environment
for every Vα, because the visible units are set by the environment
in the positive phase.
Boltzmann machines are trained using a gradient descent algorithm,
so a given weight, wij is changed by subtracting the partial derivative
of a cost function with respect to the weight. The cost function used
for Boltzmann machines, G, is given as:
This means that the cost function is lowest when the probability of a vector
in the negative phase is equivalent to the probability of the same vector
in the positive phase. As well, it ensures that the most probable vectors
in the data have the greatest effect on the cost
This cost function would seem to be complicated to perform gradient descent
with. Suprisingly though, the gradient with respect to a given weight, wij,
at thermal equilibrium is given by the very simple equation:
This result follows from the fact that at thermal equilibrium the probability
of any state when the network is free-running is given by the Boltzmann
distribution (hence the name "Boltzmann machine").
A state of thermal equilibrium is crucial for this though, hence the network
must be brought to thermal equilibrium before the probabilities of two units
both being on are calculated. Thermal equilibrium is achieved with
simulated annealing in a Boltzmann machine. It is the necessity of simulated
annealing which can make training a Boltzmann machine on a digital
computer a very slow process.
However, this learning rule is fairly biologically plausible because the only
information needed to change the weights is provided by "local" information.
That is, the connection (or synapse biologically speaking) does not need
information about anything other than the two neurons it connects.
This is far more biologically realistic than the information needed by
a connection in many other neural network training algorithms, such as
backpropagation.
Simulated annealing (SA) is a generic probabilisticmeta-algorithm
for the global optimization problem, namely locating a good approximation
to the global optimum of a given function in a large search space.
It was independently invented by S. Kirkpatrick, C. D. Gelatt and
M. P. Vecchi in 1983, and by V. Cerny in 1985.
The name and inspiration come from annealing in metallurgy,
a technique involving heating and controlled cooling of a material
to increase the size of its crystals and reduce their defects.
The heat causes the atoms to become unstuck from their initial positions
(a local minimum of the internal energy) and wander randomly through
states of higher energy; the slow cooling gives them more chances
of finding configurations with lower internal energy than the initial one.
Sometimes machine boltzmann are used in combination
with Hopfield model and perceptron as device for find global
minimum of energy function or error function correspondingly.
In first case during of working (recall) state of any neuron
is changed according of T (temperature) and if energy
function decreases then this changing is accepted and
process continues.
In perceptron during of learning any weight is changed and this
changing is accepted or not in according with estimation
of error function.
This process of changing may be executed in mixture with
usual process of working or learning or after it to improve result.
Boltzmann Machine Simulator
/******************************************************************************
==========================================
Network: Boltzmann Machine with Simulated Annealing
==========================================
Application: Optimization Traveling Salesman Problem
void InitializeApplication(NET* Net)
{ INT n1,n2;
REAL x1,x2,y1,y2;
REAL Alpha1, Alpha2;
Gamma = 7;
for (n1=0; n1<NUM_CITIES; n1++)
{ for (n2=0; n2<NUM_CITIES; n2++)
{ Alpha1 = ((REAL) n1 / NUM_CITIES) * 2 * PI;
Alpha2 = ((REAL) n2 / NUM_CITIES) * 2 * PI;
x1 = cos(Alpha1);
y1 = sin(Alpha1);
x2 = cos(Alpha2);
y2 = sin(Alpha2);
Distance[n1][n2] = sqrt(sqr(x1-x2) + sqr(y1-y2));
} }
f = fopen("BOLTZMAN.txt", "w");
fprintf(f, "Temperature Valid Length Tour\n\n"); }
void CalculateWeights(NET* Net)
{ INT n1,n2,n3,n4;
INT i,j;
INT Pred_n3, Succ_n3;
REAL Weight;
for (n1=0; n1<NUM_CITIES; n1++)
{ for (n2=0; n2<NUM_CITIES; n2++)
{ i = n1*NUM_CITIES+n2;
for (n3=0; n3<NUM_CITIES; n3++)
{ for (n4=0; n4<NUM_CITIES; n4++)
{ j = n3*NUM_CITIES+n4; Weight = 0;
if (i!=j)
{ Pred_n3 = (n3==0 ? NUM_CITIES-1 : n3-1);
Succ_n3 = (n3==NUM_CITIES-1 ? 0 : n3+1);
if ((n1==n3) OR (n2==n4))
Weight = -Gamma;
else if ((n1 == Pred_n3) OR (n1 == Succ_n3))
Weight = -Distance[n2][n4];
}
Net->Weight[i][j] = Weight;
}
}
Net->Threshold[i] = -Gamma/2;
} } }
void PropagateUnit(NET* Net, INT i)
{ INT j; REAL Sum, Probability;
Sum = 0;
for (j=0; j<Net->Units; j++)
{ Sum += Net->Weight[i][j] * Net->Output[j];
}
Sum -= Net->Threshold[i];
Probability = 1 / (1 + exp(-Sum / Net->Temperature));
if (RandomEqualREAL(0, 1) <= Probability)
Net->Output[i] = TRUE;
else
Net->Output[i] = FALSE; }
void BringToThermalEquilibrium(NET* Net)
{ INT n,i;
for (i=0; i<Net->Units; i++)
{ Net->On[i] = 0;
Net->Off[i] = 0;
}
for (n=0; n<1000*Net->Units; n++)
{ PropagateUnit(Net, i = RandomEqualINT(0, Net->Units-1));
}
for (n=0; n<100*Net->Units; n++)
{
PropagateUnit(Net, i = RandomEqualINT(0, Net->Units-1));
if (Net->Output[i])
Net->On[i]++;
else
Net->Off[i]++;
}
for (i=0; i<Net->Units; i++)
{ Net->Output[i] = Net->On[i] > Net->Off[i];
}
}
void Anneal(NET* Net)
{ Net->Temperature = 100;
do { BringToThermalEquilibrium(Net);
WriteTour(Net); Net->Temperature *= 0.99;
}
while (NOT ValidTour(Net));
}
void main()
{ NET Net;
InitializeRandoms();
GenerateNetwork(&Net);
InitializeApplication(&Net);
CalculateWeights(&Net);
SetRandom(&Net);
Anneal(&Net);
FinalizeApplication(&Net);
}