
Presentation Transcript


  1. Outline • Concept of Learning • Basis of Artificial Neural Network • Neural Network with Supervised Learning (Single Layer Net) • Hebb • Perceptron • Modelling a Simple Problem

  2. CONCEPT OF LEARNING Nooraini Yusoff Computer Science Department Faculty of Information Technology Universiti Utara Malaysia

  3. Learning and Weights..? Think… Can you identify which is “the Simpsons”?

  4. Learning and Weights..?

  5. Learning and Weights..? “the Simpsons” vs. “not the Simpsons”

  6. Learning and Weights..? Learning generalizes from data / observations / past experience to new data…

  7. Learning and Weights..? [Figure: a feedforward network for heart-attack prediction. Input layer: AGE, GENDER, SUGAR, HYPERTENSION; weights w1…wk feed the hidden layer; weights v1, v2…vj feed the output layer: HEART ATTACK / NO HEART ATTACK.]

  8. Learning and Weights..? [Figure: a biological neuron annotated with nucleus, dendrites, synapses, and axon, mapped onto inputs x1, x2 with weights w1, w2 and output y.]
Activation function: f(y_in) = 1 if y_in >= θ, and f(y_in) = 0 otherwise, where y_in = x1w1 + x2w2.
• A neuron receives input, determines the strength (the weight) of each input, calculates the total weighted input, and compares the total with a threshold value θ.
• The threshold value is in the range of 0 and 1.
• If the total weighted input is greater than or equal to the threshold, the neuron produces an output; if it is less than the threshold, no output is produced.
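As a concrete illustration, here is a minimal Python sketch of this threshold neuron; the weight values (0.5, 0.5) and the threshold (0.7) are made-up examples, not from the slide:

```python
# A minimal sketch of the threshold neuron above; weights and
# threshold are illustrative assumptions.
def threshold_neuron(x1, x2, w1, w2, theta):
    y_in = x1 * w1 + x2 * w2            # total weighted input
    return 1 if y_in >= theta else 0    # fire only at/above threshold

print(threshold_neuron(1, 1, 0.5, 0.5, 0.7))  # 1.0 >= 0.7 -> fires (1)
print(threshold_neuron(1, 0, 0.5, 0.5, 0.7))  # 0.5 <  0.7 -> silent (0)
```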

  9. BASIS OF ARTIFICIAL NEURAL NETWORK Nooraini Yusoff Computer Science Department Faculty of Information Technology Universiti Utara Malaysia

  10. Basis of Artificial Neural Network (Content) • Data Preparation • Activation Functions • Biases and Threshold • Weight initialization and update • Linear Separability

  11. Basis of Artificial Neural Network Data Preparation • In NN, inputs and outputs are to be represented numerically. • Garbage-in garbage-out principle: flawed data used in developing a network would result in a flawed network. • Unsuitable representation affects learning and could eventually turn a NN project into a failure.

  12. Basis of Artificial Neural Network • Why preprocess the data? • Main goal – to ensure that the statistical distribution of values for each net input and output is roughly uniform. • NN will not produce accurate forecasts with incomplete, noisy and inconsistent data. • Decisions made in this phase of development are critical to the performance of the network.

  13. Basis of Artificial Neural Network • Input & output representation: binary vs. bipolar • Binary values lie in [0, 1]; bipolar values lie in [-1, 1] (e.g., -1, -0.5, 0, 0.5, 1).

  14. Basis of Artificial Neural Network • Example: binary representation (targets are one-of-three codes using 1 and 0)

V1    V2    V3    V4    | T
0.63  0.68  0.21  0.04  | 1 0 0
0.56  0.68  0.16  0.04  | 1 0 0
0.76  0.9   0.18  0.08  | 1 0 0
0.75  1     0.28  0.16  | 1 0 0
0.71  0.88  0.24  0.16  | 1 0 0
0.67  0.79  0.21  0.12  | 1 0 0
0.68  0.61  0.59  0.56  | 0 1 0
0.65  0.45  0.53  0.4   | 0 1 0
0.77  0.68  0.63  0.6   | 0 1 0
0.78  0.5   0.6   0.4   | 0 1 0
0.8   0.65  0.71  0.56  | 0 1 0
0.73  0.65  0.54  0.52  | 0 1 0
1     0.68  1     0.84  | 0 0 1
0.64  0.48  0.68  0.68  | 0 0 1
0.96  0.52  0.95  0.72  | 0 0 1
0.88  0.56  0.87  0.72  | 0 0 1
0.94  0.81  0.92  1     | 0 0 1
0.85  0.72  0.77  0.8   | 0 0 1

  15. Basis of Artificial Neural Network • Example: bipolar representation (targets use 1 and -1)

V1    V2    V3    V4    | T
0.63  0.68  0.21  0.04  |  1 -1 -1
0.56  0.68  0.16  0.04  |  1 -1 -1
0.76  0.9   0.18  0.08  |  1 -1 -1
0.75  1     0.28  0.16  |  1 -1 -1
0.71  0.88  0.24  0.16  |  1 -1 -1
0.67  0.79  0.21  0.12  |  1 -1 -1
0.68  0.61  0.59  0.56  | -1  1 -1
0.65  0.45  0.53  0.4   | -1  1 -1
0.77  0.68  0.63  0.6   | -1  1 -1
0.78  0.5   0.6   0.4   | -1  1 -1
0.8   0.65  0.71  0.56  | -1  1 -1
0.73  0.65  0.54  0.52  | -1  1 -1
1     0.68  1     0.84  | -1 -1  1
0.64  0.48  0.68  0.68  | -1 -1  1
0.96  0.52  0.95  0.72  | -1 -1  1
0.88  0.56  0.87  0.72  | -1 -1  1
0.94  0.81  0.92  1     | -1 -1  1
0.85  0.72  0.77  0.8   | -1 -1  1
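The two tables differ only in the target coding, so one can be mapped to the other. A small sketch of the standard conversion (the linear map t_bipolar = 2·t_binary - 1 is assumed; the example row is taken from the tables above):

```python
import numpy as np

# Converting binary targets {0, 1} to bipolar {-1, 1} and back.
t_binary = np.array([1, 0, 0])
t_bipolar = 2 * t_binary - 1    # -> [ 1 -1 -1]
t_back = (t_bipolar + 1) // 2   # -> [ 1  0  0]
print(t_bipolar, t_back)
```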

  16. Basis of Artificial Neural Network • Common activation functions: identity function, binary step function, binary sigmoid, bipolar sigmoid [Figure: plots of the four functions.]
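A sketch of these four functions in their common textbook forms; the steepness parameter sigma is an assumption, often set to 1:

```python
import numpy as np

def identity(x):
    return x

def binary_step(x, theta=0.0):
    return np.where(x >= theta, 1, 0)

def binary_sigmoid(x, sigma=1.0):        # output range (0, 1)
    return 1.0 / (1.0 + np.exp(-sigma * x))

def bipolar_sigmoid(x, sigma=1.0):       # output range (-1, 1)
    return 2.0 / (1.0 + np.exp(-sigma * x)) - 1.0
```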

  17. Basis of Artificial Neural Network Bias and Thresholds • A bias acts exactly like a weight on a connection from a unit whose activation is always 1. • The weight of the bias is trainable just like any other weight. • Increasing the bias increases the net input to the unit. • If a bias is included, the activation function is typically taken to be: f(net) = 1 if net >= 0, and f(net) = -1 if net < 0, where net = b + Σi xi·wi.

  18. Basis of Artificial Neural Network [Figure: a 4-2-2 network. Input layer x1…x4 connects to hidden layer z1, z2 through weights v11…v42; the hidden layer connects to output layer y1, y2 through weights w11…w22; bias units with activation 1 supply v01, v02 to the hidden units and w01, w02 to the output units.]

  19. Basis of Artificial Neural Network • Why use a bias? To shift the value of y_in. Without bias: net = Σi xi·wi. With bias: net = b + Σi xi·wi. What happens if all xi are 0? Without a bias, net = 0 regardless of the weights, so there is no response and no learning. With a bias, net = b, so there is still a response and learning can proceed.
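A quick sketch of this point, showing why an all-zero input silences a neuron without a bias:

```python
# With all inputs zero, net is 0 no matter what the weights are;
# a bias term keeps the neuron responsive.
def net_without_bias(x, w):
    return sum(xi * wi for xi, wi in zip(x, w))

def net_with_bias(x, w, b):
    return b + sum(xi * wi for xi, wi in zip(x, w))

x = [0, 0]
print(net_without_bias(x, [5, -3]))    # 0, whatever the weights are
print(net_with_bias(x, [5, -3], 0.5))  # 0.5, a usable response
```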

  20. Basis of Artificial Neural Network Weight Initialization and Update • You may set the initial weights to any values. • The choice of initial weights influences whether the net reaches a global (or only a local) minimum of the error, and how quickly it converges. • It is important to avoid initial weights that make it likely that either the activations or their derivatives are zero. • The weights must not be too large: the initial input signals to each hidden or output unit would then likely fall in the region where the derivative of the sigmoid function has a very small value (the saturation region).

  21. Basis of Artificial Neural Network • The weights must not be too small: the net input to a hidden or output unit would be close to zero, causing extremely slow learning. • Methods of generating initial weights: random weights; Nguyen-Widrow weights.

  22. Basis of Artificial Neural Network • Random Initialization • A common procedure is to initialize the weights (and biases) to random values within a suitable interval, such as [-0.5, 0.5] or [-1, 1]. • The values may be positive or negative because the final weights after training may be of either sign as well.

  23. Basis of Artificial Neural Network • Nguyen-Widrow Initialization • n = number of input units, p = number of hidden units, scale factor β = 0.7·p^(1/n) • For each hidden unit (j = 1, …, p): • Initialize its weight vector (from the input units): vij(old) = random number between -0.5 and 0.5 • Compute || vj(old) ||

  24. Basis of Artificial Neural Network • Reinitialize weights: vij = β·vij(old) / || vj(old) || • Set bias: v0j = random number between -β and β
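A sketch of the whole Nguyen-Widrow procedure from slides 23-24; the layer sizes (n = 4, p = 2) are illustrative assumptions:

```python
import numpy as np

def nguyen_widrow(n, p, rng=np.random.default_rng(0)):
    beta = 0.7 * p ** (1.0 / n)                # scale factor
    v = rng.uniform(-0.5, 0.5, size=(n, p))    # v_ij(old)
    norms = np.linalg.norm(v, axis=0)          # ||v_j(old)|| per hidden unit
    v = beta * v / norms                       # reinitialize weights
    v0 = rng.uniform(-beta, beta, size=p)      # biases in (-beta, beta)
    return v, v0

v, v0 = nguyen_widrow(n=4, p=2)
```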

  25. Basis of Artificial Neural Network Linear Separability • A problem is “linearly separable” if there are weights (and a bias) such that all training input vectors for which the correct response is +1 lie on one side of the decision boundary (the positive region) and all training input vectors for which the correct response is -1 lie on the other side (the negative region).

  26. Basis of Artificial Neural Network • Problems: AND, OR, XOR [Figure: the four corners of the unit square labelled with the outputs of AND, OR, and XOR. AND and OR are linearly separable; XOR is not, since no single line separates its positive points from its negative points.]

  27. NEURAL NETWORK WITH SUPERVISED LEARNING (Single Layer Net) Nooraini Yusoff Computer Science Department Faculty of Information Technology Universiti Utara Malaysia

  28. Modelling a Simple Problem • Should I attend this lecture? • x1 = weather (hot or raining) • x2 = day (weekday or weekend) [Figure: a single neuron y with inputs x1, x2 and bias input 1; all three weights are unknown (?) and must be learned.]

  29. Example: 2-input AND (binary vs. bipolar)

binary:   (x1, x2) → t:  (1, 1) → 1,  (1, 0) → 0,  (0, 1) → 0,  (0, 0) → 0
bipolar:  (x1, x2) → t:  (1, 1) → 1,  (1, -1) → -1,  (-1, 1) → -1,  (-1, -1) → -1

  30. Hebb’s Rule • 1949: increase the weight between two neurons that are both “on”. • 1988: also increase the weight between two neurons that are both “off”. • wi(new) = wi(old) + xi*y

  31. Hebb’s Algorithm
1. Set initial weights: wi = 0 for 0 <= i <= n
2. For each training vector:
3.   Set xi = si for all input units
4.   Set y = t
5.   wi(new) = wi(old) + xi*y

  32. Training Procedure Initial weights: w0 = 0, w1 = 0, w2 = 0 (bipolar AND, with x0 = 1 as the bias input)

x0 x1 x2 |  t | Δw0 Δw1 Δw2 | w0 w1 w2
 1  1  1 |  1 |  1   1   1  |  1  1  1
 1  1 -1 | -1 | -1  -1   1  |  0  0  2
 1 -1  1 | -1 | -1   1  -1  | -1  1  1
 1 -1 -1 | -1 | -1   1   1  | -2  2  2
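A short sketch that replays this Hebb training run and reproduces the table above (the pattern order follows the table):

```python
# Hebb training on bipolar AND; each pattern is (x, t) with the
# bias input x0 = 1 as the first element.
patterns = [([1, 1, 1], 1),
            ([1, 1, -1], -1),
            ([1, -1, 1], -1),
            ([1, -1, -1], -1)]

w = [0, 0, 0]                                  # w0 = w1 = w2 = 0
for x, t in patterns:
    w = [wi + xi * t for wi, xi in zip(w, x)]  # wi(new) = wi(old) + xi*y, y = t
    print(x, t, w)
# Final weights: [-2, 2, 2], i.e. the boundary -2 + 2*x1 + 2*x2 = 0.
```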

  33. Result Interpretation • -2 + 2x1 + 2x2 = 0, or • x2 = -x1 + 1 • This training procedure is order dependent, and convergence is not guaranteed.

  34. Perceptrons (1958) • A very important early neural network • Guaranteed training procedure under certain circumstances [Figure: a perceptron with bias input 1 (weight w0) and inputs x1…xn (weights w1…wn) feeding output y.]

  35. Activation Function • f(yin) = 1 if yin > θ • f(yin) = 0 if -θ <= yin <= θ • f(yin) = -1 otherwise

  36. Learning Rule • wi(new) = wi(old) + α*t*xi if error • α is the learning rate • Typically, 0 < α <= 1

  37. Perceptron’s Algorithm
1. Set initial weights: wi = 0 for 0 <= i <= n (can be random)
2. For each training exemplar do
3.   xi = si
4.   yin = Σ xi*wi
5.   y = f(yin)
6.   wi(new) = wi(old) + α*t*xi if error (y ≠ t)
7. If stopping condition not reached, go to 2
f(yin) = 1 if yin > θ; f(yin) = 0 if -θ <= yin <= θ; f(yin) = -1 otherwise
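A minimal Python sketch of this algorithm; the function and variable names are my own, and the stopping condition is assumed to be one full epoch with no weight change. It is reused for the AND example on the following slides:

```python
def f(y_in, theta):
    """Threshold activation from slide 35."""
    if y_in > theta:
        return 1
    if y_in >= -theta:
        return 0
    return -1

def train_perceptron(patterns, n, theta=0.0, alpha=1.0, max_epochs=100):
    w = [0.0] * (n + 1)                       # w[0] is the bias weight, x0 = 1
    for epoch in range(max_epochs):
        changed = False
        for x, t in patterns:
            y_in = sum(xi * wi for xi, wi in zip(x, w))
            y = f(y_in, theta)
            if y != t:                        # update only on error
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                changed = True
        if not changed:                       # a clean epoch: stop
            return w, epoch + 1
    return w, max_epochs

# Bipolar AND from slide 38 (theta = 0, alpha = 1).
AND = [([1, 1, 1], 1), ([1, 1, -1], -1), ([1, -1, 1], -1), ([1, -1, -1], -1)]
print(train_perceptron(AND, n=2))             # ([-1.0, 1.0, 1.0], 2)
```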

  38. Example: AND concept • bipolar inputs • bipolar target • θ = 0 • α = 1

  39. Training Procedure - Epoch 1 Initial weights: w0 = 0, w1 = 0, w2 = 0

x0 x1 x2 |  t | yin |  y | Δw0 Δw1 Δw2 | w0 w1 w2
 1  1  1 |  1 |  0  |  0 |  1   1   1  |  1  1  1
 1  1 -1 | -1 |  1  |  1 | -1  -1   1  |  0  0  2
 1 -1  1 | -1 |  2  |  1 | -1   1  -1  | -1  1  1
 1 -1 -1 | -1 | -3  | -1 |  0   0   0  | -1  1  1

  40. Exercise • Continue the above example until the learning is finished.

  41. Training Procedure - Epoch 2

x0 x1 x2 |  t | yin |  y | Δw0 Δw1 Δw2 | w0 w1 w2
 1  1  1 |  1 |  1  |  1 |  0   0   0  | -1  1  1
 1  1 -1 | -1 | -1  | -1 |  0   0   0  | -1  1  1
 1 -1  1 | -1 | -1  | -1 |  0   0   0  | -1  1  1
 1 -1 -1 | -1 | -3  | -1 |  0   0   0  | -1  1  1

No weights change in epoch 2, so learning is finished with w0 = -1, w1 = 1, w2 = 1.

  42. Perceptron Learning Rule Convergence Theorem • If a weight vector exists that correctly classifies all of the training examples, then the perceptron learning rule will converge to some weight vector that gives the correct response for all training patterns. This will happen in a finite number of steps.

  43. Comparison between Hebb and Perceptron • Hebb: the output is taken directly from the target (y = t); weights are updated on every pattern; training is a single pass, and convergence is not guaranteed. • Perceptron: the output is computed from the net input (y = f(yin)); weights are updated only on error; training repeats until an epoch passes without error, and convergence is guaranteed for linearly separable problems.

  44. Exercise Prepare a Perceptron learning table for epoch 1 and epoch 2 for the problem of AND logic with bias, using the following learning requirements: • bipolar input and target • learning rate α = 1 • threshold θ = 0.2
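As a check on the exercise, the perceptron sketch from slide 37 can be rerun with θ = 0.2 (same bipolar AND patterns assumed):

```python
# Exercise settings: bipolar AND, alpha = 1, theta = 0.2. With these
# data it happens to converge to the same weights as with theta = 0.
print(train_perceptron(AND, n=2, theta=0.2, alpha=1.0))  # ([-1.0, 1.0, 1.0], 2)
```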

  45. Single Layer Net (Problem Analysis) Nooraini Yusoff Universiti Utara Malaysia

  46. Perceptron [Figure: a perceptron with bias x0 = 1 (weight w0) and inputs x1…xn (weights w1…wn); the net input Σ xi·wi passes through the activation function f to produce y.]

  47. Problem Description: to predict whether a student’s application to stay in college is accepted, KIV (kept in view), or rejected. Problem type: classification.

  48. Original Data

  49. Original Data (after selection)

  50. Representation of Data (data for learning) • Gender: Male (1), Female (-1) • CGPA: ≥ 3.00 (1), < 3.00 (-1) • Result: Accepted (1), KIV (0), Rejected (-1)
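A small sketch of this encoding; the function name and the sample record are made up for illustration:

```python
# Encode one applicant record into bipolar inputs per the scheme above.
def encode(gender, cgpa):
    x1 = 1 if gender == "male" else -1     # Male (1) / Female (-1)
    x2 = 1 if cgpa >= 3.00 else -1         # CGPA >= 3.00 (1) / < 3.00 (-1)
    return [1, x1, x2]                     # leading 1 is the bias input

RESULT = {1: "Accepted", 0: "KIV", -1: "Rejected"}
print(encode("female", 3.45))              # [1, -1, 1]
```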
