Back Propagation. Amir Ali Hooshmandan, Mehran Najafi, Mohamad Ali Honarpisheh. Contents. What is it? History, Architecture, Activation Function, Learning Algorithm, EBP Heuristics, How Long to Train, Virtues and Limitations of BP, About Initialization, Accelerating Training, An Application
Back Propagation
Amir Ali Hooshmandan
Mehran Najafi
Mohamad Ali Honarpisheh
second order back propagation
second order direct propagation
[Network diagram: input units X_i feed hidden units Z_j through weights V_i,j; hidden units Z_j feed output units Z_k through weights W_j,k]
Characteristics:
y_j(n) = output of neuron j
Error of output neuron k: e_k(n) = d_k(n) − y_k(n)
Energy of the error produced by output neuron k in epoch n: E_k(n) = ½ e_k²(n)
Total energy of the output of the net: E(n) = ½ Σ_k e_k²(n)
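A minimal numeric sketch of these error-energy definitions; the desired and actual output values are illustrative, not taken from the slides.

```python
import numpy as np

d = np.array([1.0, 0.0, 1.0])    # desired outputs d_k(n) (illustrative)
y = np.array([0.8, 0.2, 0.6])    # actual outputs y_k(n) (illustrative)

e = d - y                        # error signal e_k(n) = d_k(n) - y_k(n)
E = 0.5 * np.sum(e ** 2)         # total energy E(n) = 1/2 * sum_k e_k(n)^2
print(E)                         # approximately 0.5 * (0.04 + 0.04 + 0.16) = 0.12
```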
Purpose
Minimizing E(n), where e_k(n) = d_k(n) − y_k(n)
Local field: v_k(n) = Σ_j w_jk(n) y_j(n)
Chain rule in derivation:
∂E(n)/∂w_jk(n) = ∂E(n)/∂e_k(n) · ∂e_k(n)/∂y_k(n) · ∂y_k(n)/∂v_k(n) · ∂v_k(n)/∂w_jk(n) = −e_k(n) f′(v_k(n)) y_j(n)
Local gradient: δ_k(n) = e_k(n) f′(v_k(n))
Problem
For a hidden neuron j we have no error term e_j(n), because a hidden neuron has no target of its own; it is responsible for (contributes to) the errors of many output neurons.
Find another way to compute δ_j: δ_j(n) = f′(v_j(n)) Σ_k δ_k(n) w_jk(n)
(Weight correction) = (learning rate parameter) × (local gradient) × (input signal from the previous-layer neuron):
Δw_jk(n) = η δ_k(n) y_j(n)
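A hedged numeric sketch of this weight-correction rule for a single output weight; all values (eta, e_k, v_k, y_j) are illustrative, and a logistic activation is assumed so that f′(v) = f(v)(1 − f(v)).

```python
import numpy as np

def f(x):
    return 1.0 / (1.0 + np.exp(-x))   # assumed logistic activation

eta = 0.5      # learning rate parameter (illustrative)
e_k = 0.2      # error d_k - y_k at output neuron k (illustrative)
v_k = 0.3      # local field of output neuron k (illustrative)
y_j = 0.7      # input signal from previous-layer neuron j (illustrative)

y_k = f(v_k)
delta_k = e_k * y_k * (1.0 - y_k)  # local gradient: e_k * f'(v_k)
dw_jk = eta * delta_k * y_j        # weight correction = eta * delta_k * y_j
```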
Step 0: Initialize weights
(set to random values with zero mean and variance one).
Step 1: While stopping condition is false, do Steps 2–9.
Step 2:
For each training pair, do Steps 3–8.
Feed forward
Step 3: Each input unit (X_i, i=1,…,n) receives input signal x_i and broadcasts this signal to all units in the layer above (the hidden units).
Step 4: Each hidden unit (Z_j, j=1,…,p) sums its weighted input signals,
z_in_j = v_0j + Σ_i x_i v_ij,
applies its activation function to compute its output signal,
z_j = f(z_in_j),
and sends this signal to all units in the layer above (the output units).
Step 5: Each output unit (Y_k, k=1,…,m) sums its weighted input signals,
y_in_k = w_0k + Σ_j z_j w_jk,
and applies its activation function to compute its output signal,
y_k = f(y_in_k).
Backpropagation of error:
Step 6: Each output unit (Y_k, k=1,…,m) receives a target pattern t_k corresponding to the input training pattern and computes its error information term,
δ_k = (t_k − y_k) f′(y_in_k),
calculates its weight correction term (used to update w_jk later),
Δw_jk = α δ_k z_j,
calculates its bias correction term (used to update w_0k later),
Δw_0k = α δ_k,
and sends δ_k to units in the layer below.
Step 7: Each hidden unit (Z_j, j=1,…,p) sums its delta inputs from units in the layer above,
δ_in_j = Σ_k δ_k w_jk,
multiplies by the derivative of its activation function to calculate its error information term,
δ_j = δ_in_j f′(z_in_j),
calculates its weight correction term (used to update v_ij later),
Δv_ij = α δ_j x_i,
and calculates its bias correction term (used to update v_0j later),
Δv_0j = α δ_j.
Update weights and biases:
Step 8: Each output unit (Y_k, k=1,…,m) updates its bias and weights (j=0,…,p):
w_jk(new) = w_jk(old) + Δw_jk.
Each hidden unit (Z_j, j=1,…,p) updates its bias and weights (i=0,…,n):
v_ij(new) = v_ij(old) + Δv_ij.
Step 9: Test stopping condition
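Steps 0–9 above can be sketched as a short training loop. This is a hedged illustration, not the presenters' code: the XOR task is borrowed from the results mentioned later in the deck, the logistic activation is assumed, and the layer sizes, learning rate alpha, and epoch count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):                          # assumed binary sigmoid activation
    return 1.0 / (1.0 + np.exp(-x))

# XOR training pairs (illustrative task)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
n, p, m = 2, 4, 1                  # input, hidden, output unit counts
alpha = 0.5                        # learning rate (illustrative)

# Step 0: initialize weights (row 0 of each matrix holds the biases)
V = rng.uniform(-0.5, 0.5, size=(n + 1, p))
W = rng.uniform(-0.5, 0.5, size=(p + 1, m))

def forward(x):
    z = f(V[0] + x @ V[1:])        # Steps 3-4: hidden signals z_j = f(z_in_j)
    y = f(W[0] + z @ W[1:])        # Step 5: output signals y_k = f(y_in_k)
    return z, y

def total_error():
    return 0.5 * sum(np.sum((t - forward(x)[1]) ** 2) for x, t in zip(X, T))

E_start = total_error()

for epoch in range(10000):         # Step 1: crude stopping condition
    for x, t in zip(X, T):         # Step 2: each training pair
        z, y = forward(x)
        delta_k = (t - y) * y * (1 - y)              # Step 6: output terms
        delta_j = (delta_k @ W[1:].T) * z * (1 - z)  # Step 7: hidden terms
        W[1:] += alpha * np.outer(z, delta_k)        # Step 8: update weights
        W[0] += alpha * delta_k
        V[1:] += alpha * np.outer(x, delta_j)
        V[0] += alpha * delta_j

E_end = total_error()              # Step 9 would compare this to a threshold
```

Training drives the total error energy down from its initial value; with a tighter stopping test, Step 9 would exit the loop once the error falls below a chosen threshold.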
“… data until you have first tried a single-layer model”
Both architectures are theoretically able to approximate any continuous function to the desired degree of accuracy.
(Minimizing the cost function is not necessarily a good idea.)
Saarinen (1992): local convergence rates of the BP algorithm are linear.
The contribution of the hidden-layer (HL) neurons to the output error is indirect, and consequently the effect of the hidden-layer weights is not visible enough.
Training of a net initialized as discussed…
Net whose weights are initialized to random values between −0.5 and 0.5
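The two initializations discussed so far, side by side: Step 0's zero-mean, unit-variance random weights versus uniform values in [−0.5, 0.5]. Matrix sizes and the seed are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(42)

n_in, n_hidden = 4, 3  # illustrative layer sizes; +1 row holds the biases

# Step 0 style: zero mean, variance one
W_normal = rng.normal(loc=0.0, scale=1.0, size=(n_in + 1, n_hidden))

# Alternative from this slide: uniform in [-0.5, 0.5]
W_uniform = rng.uniform(-0.5, 0.5, size=(n_in + 1, n_hidden))
```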
Momentum parameter μ is in the range from 0 to 1.
Use information from the current and past derivatives to form a “delta-bar” (the delta-bar-delta rule).
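A hedged sketch of the delta-bar idea: keep an exponential average of past derivatives and compare its sign with the current derivative to adapt each weight's learning rate. The constants theta, kappa, and phi are hypothetical settings, not values from the slides.

```python
# Hypothetical delta-bar-delta settings (not from the slides)
theta, kappa, phi = 0.7, 0.01, 0.1

def delta_bar_delta(lr, delta_bar, delta):
    """Adapt one weight's learning rate from the current derivative `delta`
    and the exponential average of past derivatives `delta_bar`."""
    if delta * delta_bar > 0:        # derivatives agree: increase lr additively
        lr = lr + kappa
    elif delta * delta_bar < 0:      # sign flipped: decrease lr multiplicatively
        lr = lr * (1 - phi)
    delta_bar = (1 - theta) * delta + theta * delta_bar
    return lr, delta_bar

lr, bar = delta_bar_delta(0.1, 0.5, 0.3)   # same sign, so lr grows
```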
Results for XOR …
Computer Network Intrusion Detection Via Neural Networks Methods
1. gradient descent back propagation (BP)
2. gradient descent BP with momentum
3. variable learning rate gradient descent BP
4. Conjugate Gradient BP (CGP)
detecting unauthorized users who attempt to access the information on those computers.
x = input vector consisting of a user’s attributes
y = {authorized user, intruder}
We want to map the input set x to an output
The error of our model is: e = d − y
d = desired output
y = actual output
Sigmoidal activation function: f(x) = 1 / (1 + e^(−x))
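One plausible reading of the sigmoidal activation here is the logistic (binary sigmoid) function, whose derivative has the convenient form f′(x) = f(x)(1 − f(x)); the slides do not spell out the formula, so this is an assumption.

```python
import math

def sigmoid(x):
    """Binary sigmoid f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    """Derivative expressed via the output: f'(x) = f(x) * (1 - f(x))."""
    fx = sigmoid(x)
    return fx * (1.0 - fx)
```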
command, host, time, execution time
Training Data :
90% Authorized traffic
10% Intrusion traffic
Testing Data:
98% Authorized traffic
2% Intrusion traffic
File1 consists of 5 CUs in each input vector.
File2 consists of 6 CUs in each input vector.
File3 consists of 7 CUs in each input vector.
Each CU has 4 elements
1. Momentum allows a network to respond not only to the local gradient, but also to recent trends in the error surface.
2. Momentum allows the network to ignore small features in the error surface.
3. Without momentum, a network may get stuck in a shallow local minimum; with momentum, a network can slide through such a minimum.
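The three points above can be sketched with the standard momentum update, where each step keeps a fraction mu of the previous step. The values of eta and mu and the toy quadratic objective are illustrative choices, not the slides' settings.

```python
import numpy as np

eta, mu = 0.1, 0.5       # illustrative learning rate and momentum parameter

w = np.array([1.0, -2.0])          # illustrative starting weights
velocity = np.zeros_like(w)
for _ in range(20):
    grad = 2.0 * w                 # gradient of the toy objective f(w) = w . w
    velocity = mu * velocity - eta * grad   # blend previous step with gradient
    w = w + velocity               # momentum-augmented descent step
```

Because the velocity accumulates recent gradient directions, the trajectory smooths over small bumps that a plain gradient step would react to.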
If the learning rate is set too high, the algorithm will become unstable; if it is set too low, the algorithm will take a long time to converge.
…along conjugate directions, which generally produces faster convergence than steepest-descent directions.
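To make the contrast concrete, here is a hedged sketch of linear conjugate gradient on a small quadratic f(x) = ½ xᵀAx − bᵀx: n conjugate steps solve an n-dimensional quadratic exactly, while steepest descent would zigzag. The matrix A and vector b are illustrative; this is not the CGP trainer used in the study.

```python
import numpy as np

# Illustrative symmetric positive-definite quadratic
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

x = np.zeros(2)
r = b - A @ x              # residual = negative gradient at x
d = r.copy()               # first search direction: steepest descent
for _ in range(2):         # 2 conjugate steps for a 2-dimensional problem
    alpha = (r @ r) / (d @ A @ d)      # exact line search along d
    x = x + alpha * d
    r_new = r - alpha * (A @ d)
    beta = (r_new @ r_new) / (r @ r)   # Fletcher-Reeves-style coefficient
    d = r_new + beta * d               # next direction, conjugate to d
    r = r_new
```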
Output Values of the Two Classes
(Memorize & Generalize Problem)
Different Problems Require Different Learning Rate Adaptive Methods
General parameter settings
Network Architecture : standard feedforward neural network.
The maximum number of parameters was constant for all tasks.
Activation functions (AFs) for non-input units: standard hyperbolic tangent.
Correctness: all output units produce the correct answer.
Seven networks for each configuration…
Fixed total number of iterations.
Free Parameters
LRs
Each element represents the number of times the AM solve the task (max=7) for a given parameter combination. The lines correspond to different LRs and the columns to different free parameter values.
MOM and DBD achieved a performance of 6/7.
For DBD and MOM, although many parameter combinations solved the task, none resulted in one-hundred-percent efficacy.