BACKPROPAGATION: An Example of Supervised Learning

  1. BACKPROPAGATION: An Example of Supervised Learning • One useful network is the feed-forward network (often trained using the backpropagation algorithm) called the Multi-Layer Perceptron (MLP), shown in the figure.

  2. MLP • Three layers of (real-valued) units with activations in the range (0..1). • INPUT LAYER: the data presented to the network, connected by weighted connections to the: • HIDDEN LAYER: connected by weighted connections to the: • OUTPUT LAYER: represents the network's output for a given input.

  3. MLP training • The network is repeatedly presented with sample inputs and desired targets. • The outputs and targets are compared and the error is measured. • The network adjusts its weights until it gives the correct output for every (or most) input(s). • For XOR, the inputs and target outputs are as in the figure.

  4. Training has two phases: • Forward pass (next figure): • (A) One of the training patterns is presented to the input layer: • xp = (xp1, xp2, ..., xpn), which may be a binary or real-numbered vector. • (B) The activations of the hidden layer units are calculated from their net input (the sum, over the input layer units they are connected to, of each unit's value multiplied by the connection weight), which is then passed through a transfer function.

  5. i) Net input to hidden layer unit j: i.e. take the value of each of the n input units connected to it, multiply by the connection weight between them, and sum.
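
The formula itself appeared as an image on the original slide; a reconstruction consistent with the Delphi code later in the presentation (w_{ij} is the weight between input unit i and hidden unit j, x_i the value of input unit i for the current pattern) is:

  net_j = \sum_{i=1}^{n} w_{ij} \, x_i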

  6. ii) Output (activation) of hidden layer unit j: i.e. take the net input of j and pass it through the sigmoid (s-shaped) transfer function.
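
Reconstructed on the same assumptions, using the logistic sigmoid that the Delphi code applies:

  o_j = f(net_j) = \frac{1}{1 + e^{-net_j}}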

  7. (C) The activations of the hidden layer units are used to find the activation(s) of the output unit(s) (net input = hidden activations * connection weights), again passed through the transfer function. Net input to output unit k:
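
A reconstruction of the slide's formula (w_{jk} is the weight between hidden unit j and output unit k):

  net_k = \sum_{j} w_{jk} \, o_j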

  8. Output of output unit k:
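
As for the hidden layer, the output unit passes its net input through the same sigmoid transfer function:

  o_k = \frac{1}{1 + e^{-net_k}}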

  9. Backward pass • (A) The difference between the actual activation of each output unit and its desired target (dk) is found and used to generate an error signal for each output. A quantity called delta is then calculated for all output units.

  10. i) The error signal for each output unit k is the difference between its output o_k and its target d_k:
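
In symbols (this matches the line output_error[i] := target[i] - output_act[i] in the Delphi code):

  e_k = d_k - o_k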

  11. ii) Delta = error signal*output of that unit*(1 - its output).
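
Written out in the same notation (matching output_delta[i] in the Delphi code):

  \delta_k = e_k \, o_k \, (1 - o_k) = (d_k - o_k) \, o_k \, (1 - o_k)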

  12. Errors and Deltas • (B) The error signal for each hidden layer unit is calculated as the sum, over the output units that hidden unit connects to, of (delta of the output unit) * (weight between the hidden and output unit). Deltas for the hidden layer are then calculated.

  13. i) Error signal for each hidden unit j:
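
Reconstructed to match hidden_error[j] in the Delphi code (the sum runs over the output units k that hidden unit j connects to):

  e_j = \sum_{k} \delta_k \, w_{jk}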

  14. ii) Delta term for hidden j = error signal*output*(1 - output)
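
In the same notation (matching hidden_delta[i] in the Delphi code):

  \delta_j = e_j \, o_j \, (1 - o_j)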

  15. WEDs • (C) The weight error derivative (WED) for each weight between the hidden and output layers is calculated as (delta of the output unit) * (activation of the hidden unit). The WEDs are used to change the weights between the hidden and output layers.

  16. WED between hidden unit j and output unit k = (delta term of output k) * (activation of hidden j). • (D) The WEDs for the input-to-hidden weights = (delta of the hidden unit) * (activation of the input unit it connects to, i.e. xi, that component of the input pattern). These WEDs are used to change the weights between the input and hidden layers.
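
In symbols, matching hid_to_out_wed and in_to_hid_wed in the Delphi code:

  wed_{jk} = \delta_k \, o_j        (hidden-to-output weights)
  wed_{ij} = \delta_j \, x_i        (input-to-hidden weights)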

  17. A learning rate parameter η (epsilon in the Delphi code below) is used to control the amount by which the weights are updated during each cycle. The weights at time (t + 1) between the hidden and output layers are set using the weights at time t and the WEDs between the hidden and output layers. The weights between the input and hidden units are changed in the same way.
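
The update rules, reconstructed from the weight-change loops in the Delphi code:

  w_{jk}(t+1) = w_{jk}(t) + \eta \, wed_{jk}
  w_{ij}(t+1) = w_{ij}(t) + \eta \, wed_{ij}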

  18. Two passes repeated • In this way, each unit in the network receives an error signal describing its contribution to the total error between the output(s) and target(s). • The two passes are repeated many times for different input patterns and their targets, until the error between the actual outputs and the targets is small for all members of the training set.

  19. BACKPROPAGATION algorithm in DELPHI

  {assumes all values have been initialised to 0, except weights which have
   been randomised to values between +1 and -1; epsilon = learning rate parameter}

  { PROPAGATE FORWARDS }
  for i := 1 to num_hidden do
  begin
    hidden_net[i] := 0;                        {clear net for hidden unit}
    for j := 1 to num_input do                 {sum inputs to hidden unit}
    begin
      hidden_net[i] := hidden_net[i] + (input_act[j] * input_to_hidden_wt[j,i]);
    end;
    hidden_act[i] := 1 / (1 + exp(-1 * hidden_net[i]));
      {apply transfer function to get activation of hidden unit}
  end;

  20.
  for i := 1 to num_output do
  begin
    output_net[i] := 0;                        {clear net for output unit}
    for j := 1 to num_hidden do                {sum inputs to output unit}
    begin
      output_net[i] := output_net[i] + (hidden_act[j] * hidden_to_output_wt[j,i]);
    end;
    output_act[i] := 1 / (1 + exp(-1 * output_net[i]));
      {apply transfer function to get activation of output unit}
  end;

  21.
  { PROPAGATE BACKWARDS }
  for i := 1 to num_output do                  {initialise output error terms}
  begin
    output_error[i] := 0;
  end;
  for i := 1 to num_hidden do                  {initialise hidden error terms}
  begin
    hidden_error[i] := 0;
  end;
  for i := 1 to num_output do
  begin
    output_error[i] := target[i] - output_act[i];
      {difference between output and target is error for output layer}
    output_delta[i] := output_error[i] * output_act[i] * (1 - output_act[i]);
      {calculate delta for output layer}
    for j := 1 to num_hidden do                {error for hidden layer}
    begin
      hidden_error[j] := hidden_error[j] + (output_delta[i] * hidden_to_output_wt[j,i]);
    end;
  end;

  22.
  for i := 1 to num_hidden do
  begin
    hidden_delta[i] := hidden_error[i] * hidden_act[i] * (1 - hidden_act[i]);
      {delta for hidden layer}
  end;
  for i := 1 to num_hidden do                  {calculate wed's from hidden to output}
  begin
    for j := 1 to num_output do
    begin
      hid_to_out_wed[i,j] := hid_to_out_wed[i,j] + (output_delta[j] * hidden_act[i]);
    end;
  end;
  for i := 1 to num_input do                   {calculate wed's from input to hidden}
  begin
    for j := 1 to num_hidden do
    begin
      in_to_hid_wed[i,j] := in_to_hid_wed[i,j] + (hidden_delta[j] * input_act[i]);
    end;
  end;

  23.
  for i := 1 to num_output do                  {change weights from hidden to output}
  begin
    for j := 1 to num_hidden do
    begin
      hidden_to_output_wt[j,i] := hidden_to_output_wt[j,i] + (epsilon * hid_to_out_wed[j,i]);
      hid_to_out_wed[j,i] := 0;                {clear wed}
    end;
  end;
  for i := 1 to num_hidden do                  {change weights from input to hidden}
  begin
    for j := 1 to num_input do
    begin
      input_to_hidden_wt[j,i] := input_to_hidden_wt[j,i] + (epsilon * in_to_hid_wed[j,i]);
      in_to_hid_wed[j,i] := 0;                 {clear wed}
    end;
  end;
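
The fragments above assume that the counters, arrays and weights have already been declared and initialised. A minimal sketch of those declarations in the same Delphi style is given below; the layer sizes (2 inputs, 2 hidden units, 1 output, as in the XOR example), the value of epsilon and the InitialiseWeights routine are assumptions rather than part of the original slides.

  const
    num_input  = 2;                            {assumed sizes, e.g. for the XOR problem}
    num_hidden = 2;
    num_output = 1;
    epsilon    = 0.5;                          {learning rate - assumed value}

  var
    input_act: array[1..num_input] of real;
    hidden_net, hidden_act, hidden_error, hidden_delta: array[1..num_hidden] of real;
    output_net, output_act, output_error, output_delta, target: array[1..num_output] of real;
    input_to_hidden_wt, in_to_hid_wed: array[1..num_input, 1..num_hidden] of real;
    hidden_to_output_wt, hid_to_out_wed: array[1..num_hidden, 1..num_output] of real;
    i, j: integer;

  procedure InitialiseWeights;                 {assumed helper: random weights in -1..+1, weds cleared}
  begin
    Randomize;
    for i := 1 to num_input do
      for j := 1 to num_hidden do
      begin
        input_to_hidden_wt[i,j] := (2 * Random) - 1;
        in_to_hid_wed[i,j] := 0;
      end;
    for i := 1 to num_hidden do
      for j := 1 to num_output do
      begin
        hidden_to_output_wt[i,j] := (2 * Random) - 1;
        hid_to_out_wed[i,j] := 0;
      end;
  end;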

  24. Example: Processing Consumer Credit Applications • Partitioning of the available data: details are available for 5000 previous credit agreements; these could be split into a training set of 4000 and a test set of 1000 (randomly selected from the original 5000 and held in reserve to test the predictive accuracy of the network once it has been trained).

  25. Credit example • Training inputs (See fig): Details from the applications such as age, salary and size of other financial commitments. • Target outputs: Two units signifying whether the applicant repaid the loan or not, or one output to indicate whether the applicant repaid the loan and another to indicate the time taken to repay the loan.

  26. Credit example • The network is trained by repeated presentation of the training inputs and loan outcomes, until the error between the output and target units is acceptably small. The data for the 1000 examples in the test set are then presented to measure the predictive accuracy of the system on novel examples. The trained network could then be used to process new loan applications and to decide whether to provide credit to a person, depending on their application details.

  27. Backprop summary • Backpropagation is an example of supervised learning • Training inputs and their corresponding outputs are supplied to the network • The network calculates error signals, and uses these to adjust the weights • After many passes, the network settles to a low error on the training data • It is then tested on test data that it has not seen before, to measure its generalisation ability
