
Chapter 2 Fundamental Neurocomputing Concepts



  1. Chapter 2 Fundamental Neurocomputing Concepts 國立雲林科技大學 資訊工程研究所 張傳育 (Chuan-Yu Chang), Ph.D. Office: EB212 TEL: 05-5342601 ext. 4337 E-mail: chuanyu@yuntech.edu.tw HTTP://MIPL.yuntech.edu.tw

  2. Basic Models of Artificial neurons • An artificial neuron can be referred to as a processing element, node, or threshold logic unit. • There are four basic components of a neuron: • A set of synapses with associated synaptic weights. • A summing device: each input is multiplied by its associated synaptic weight and the products are summed. • An activation function, which serves to limit the amplitude of the neuron output. • An externally applied threshold, which lowers the cumulative input to the activation function.
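As a minimal illustration of these four components, the following Python sketch (the function name and the choice of a hard-limiter activation are assumptions for illustration, not taken from the slides) computes a neuron output from an input vector, a weight vector, and an externally applied threshold.

    import numpy as np

    def neuron_output(x, w, theta, f):
        """Basic artificial neuron: weighted sum, threshold, then activation."""
        v = np.dot(w, x)        # summing device: inputs weighted by the synaptic weights
        return f(v - theta)     # the threshold lowers the cumulative input to the activation

    # Example with a binary hard limiter (see the activation-function slides below)
    y = neuron_output(np.array([0.5, -1.0, 2.0]),
                      np.array([0.3, 0.7, 0.1]),
                      theta=0.2,
                      f=lambda v: 1.0 if v >= 0 else 0.0)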

  3. Basic Models of Artificial neurons

  4. Basic Models of Artificial neurons • For neuron q, the summing device, threshold, and activation function give u_q = Σ_j w_qj x_j (2.2), v_q = u_q − θ_q (2.3), and y_q = f(v_q) (2.4).

  5. Basic Models of Artificial neurons • The threshold (or bias) is incorporated into the synaptic weight vector wq for neuron q.

  6. Basic Models of Artificial neurons

  7. Basic Activation Functions • The activation function is also called the transfer function. • It may be linear or nonlinear. • Linear (identity) activation function: f(v) = v.

  8. Basic Activation Functions • Hard limiter: a binary (threshold) function with outputs in {0, 1}. • The output of the binary hard limiter can be written as f(v) = 0 for v < 0 and f(v) = 1 for v ≥ 0. Hard limiter activation function

  9. Basic Activation Functions • Bipolar (symmetric) hard limiter with outputs in {−1, 1}. • The output of the symmetric hard limiter can be written as f(v) = −1 for v < 0, 0 for v = 0, and +1 for v > 0. • Sometimes referred to as the signum (or sign) function. Symmetric hard limiter activation function

  10. Basic Activation Functions • Saturation linear function (piecewise linear function). • The output of the saturation linear function is linear over a central range of v and saturates at 0 and 1 outside that range. Saturation linear activation function

  11. Basic Activation Functions • Symmetric saturation linear function. • The output of the symmetric saturation linear function is linear over a central range of v and saturates at −1 and +1 outside that range. Symmetric saturation linear activation function

  12. Basic Activation Functions • Sigmoid function (S-shaped function) • Binary sigmoid function • The output of the binary sigmoid function is given by f(v) = 1/(1 + e^(−av)), where a is the slope parameter of the binary sigmoid function. • Unlike the hard limiter, which has no derivative at the origin, the binary sigmoid is a continuous and differentiable function. Binary sigmoid function

  13. Basic Activation Functions • The derivative of the binary sigmoid function for two different values of the slope parameter a; it is given by f′(v) = a f(v)[1 − f(v)].

  14. Basic Activation Functions • Sigmoid function (S-shaped function) • Bipolar sigmoid function (hyperbolic tangent sigmoid) • The output of the bipolar sigmoid function is given by f(v) = (1 − e^(−av))/(1 + e^(−av)) = tanh(av/2).
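The activation functions of slides 7 through 14 can be collected in one short Python sketch. Where the slide equations were not captured (the two saturation linear functions), the saturation limits of 0/1 and −1/+1 around the origin are assumptions based on the descriptions above.

    import numpy as np

    def hard_limiter(v):                  # binary hard limiter, outputs in {0, 1}
        return np.where(v >= 0, 1.0, 0.0)

    def symmetric_hard_limiter(v):        # signum function, outputs in {-1, 0, +1}
        return np.sign(v)

    def saturating_linear(v):             # linear between the saturation limits 0 and 1
        return np.clip(v, 0.0, 1.0)

    def symmetric_saturating_linear(v):   # linear between the limits -1 and +1
        return np.clip(v, -1.0, 1.0)

    def binary_sigmoid(v, a=1.0):         # 1 / (1 + exp(-a v)); a is the slope parameter
        return 1.0 / (1.0 + np.exp(-a * v))

    def binary_sigmoid_derivative(v, a=1.0):   # a f(v) (1 - f(v)), cf. slide 13
        f = binary_sigmoid(v, a)
        return a * f * (1.0 - f)

    def bipolar_sigmoid(v, a=1.0):        # hyperbolic tangent sigmoid, outputs in (-1, +1)
        return np.tanh(a * v / 2.0)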

  15. Basic Activation Functions • The effect of the threshold θ_q and bias b_q can be illustrated by observing the binary sigmoid activation function. • Three plots of the binary sigmoid function (a = 1): • Threshold = 2 (θ_q = 2) • Bias = 2 (b_q = 2) • Nominal case (θ_q = b_q = 0) • Applying a threshold is analogous to delaying a time-domain signal, while adding a bias is analogous to advancing the signal.

  16. The Hopfield Model of the Artificial Neuron • Hopfield clearly described the relationship between neural network theory and practice. • The Hopfield neural network is an asynchronous, parallel-processing, fully interconnected network. • Content-addressable memory (associative memory): retrieving a stored pattern in response to the presentation of a noisy or incomplete version of that pattern. • Discrete-time model of the Hopfield neuron

  17. The Hopfield Model of the Artificial Neuron • The unit delay z^(−1) delays the output of the activation function by one sample period to give y_q(k).
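A minimal sketch of the discrete-time model, assuming the structure suggested by the figure: the delayed outputs y(k) of the fully interconnected neurons are fed back through the weights, and the symmetric hard limiter is used as the activation. The weight, threshold, and state values would come from the application.

    import numpy as np

    def hopfield_update(W, y, theta, q):
        """One asynchronous update of neuron q: y_q(k+1) = f( sum_j w_qj y_j(k) - theta_q )."""
        v_q = np.dot(W[q], y) - theta[q]       # activity driven by the delayed outputs y(k)
        y_next = y.copy()
        y_next[q] = 1.0 if v_q >= 0 else -1.0  # symmetric hard limiter
        return y_next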

  18. The Hopfield Model of the Artificial Neuron • Continuous-time model of the Hopfield artificial neuron • T_cq = R_q C_q is the integration time constant of the qth neuron, and θ_q is an externally applied threshold. • The integrator can be realized with an operational amplifier, a capacitor C_q, and a resistor R_q. • g_q > 0 is called the leakage factor of the integrator.

  19. Adaline and Madaline • Least-Mean-Square (LMS) algorithm • Also known as the Widrow-Hoff learning rule or delta rule. • The LMS is an adaptive algorithm that computes adjustments of the neuron's synaptic weights. • The algorithm is based on the method of steepest descent. • It adjusts the neuron weights to minimize the mean square error between the inner product of the weight vector with the input vector and the desired output of the neuron. • Adaline (adaptive linear element): a single neuron whose synaptic weights are updated according to the LMS algorithm. • Madaline (Multiple Adaline)

  20. Simple adaptive linear combiner • The bias is implemented as a weight on a constant input: x_0 = 1, w_0 = b (bias).

  21. Simple adaptive linear combiner • The difference between the desired response and the network response is e(k) = d(k) − x^T(k)w(k). (2.22) • The MSE criterion can be written as J(w) = ½ E[e²(k)]. (2.23) • Expanding Eq. (2.23) gives J(w) = ½ E[d²(k)] − E[d(k)x^T(k)]w + ½ w^T E[x(k)x^T(k)]w (2.24), that is, J(w) = ½ E[d²(k)] − p^T w + ½ w^T C_x w. (2.25)

  22. Simple adaptive linear combiner • Cross-correlation vector between the desired response and the input patterns: p = E[d(k)x(k)]. • Covariance matrix of the input patterns: C_x = E[x(k)x^T(k)]. • The MSE surface J(w) has a single minimum, so the optimal weights are found by setting the gradient to zero: ∇_w J(w) = C_x w − p = 0. (2.26) • The optimal weight vector is therefore w* = C_x^(−1) p. (2.27)
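A short sketch of this closed-form solution, estimating p and C_x from sample data rather than from known statistics (function and variable names are illustrative).

    import numpy as np

    def optimal_weights(X, d):
        """X: (n_samples, n_inputs) input patterns; d: (n_samples,) desired responses."""
        Cx = (X.T @ X) / len(d)        # sample estimate of E[x x^T]
        p = (X.T @ d) / len(d)         # sample estimate of E[d x]
        return np.linalg.solve(Cx, p)  # w* = Cx^{-1} p, without explicitly inverting Cx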

  23. The LMS Algorithm • The solution above has two limitations: • Computing the inverse of the covariance matrix is time-consuming. • It is not suited to updating the weights in real time, because in most cases the covariance matrix and the cross-correlation vector are not known in advance. • To avoid these problems, Widrow and Hoff proposed the LMS algorithm: • Obtain the optimal values of the synaptic weights at which J(w) is minimum. • Search the error surface using a gradient descent method to find the minimum value. • The bottom of the error surface is reached by changing the weights in the direction of the negative gradient of the surface.

  24. The LMS Algorithm Typical MSE surface of an adaptive linear combiner

  25. The LMS Algorithm • Because the gradient on the surface cannot be computed without knowledge of the input covariance matrix and the cross-correlation vector, these must be estimated during an iterative procedure. • An estimate of the MSE gradient can be obtained by taking the gradient of the instantaneous error surface. • The gradient of J(w) is approximated as ∇J(w(k)) ≈ −e(k)x(k). (2.28) • The learning rule for updating the weights using the steepest-descent gradient method is then w(k+1) = w(k) + μ e(k)x(k). (2.29) • The learning rate μ specifies the magnitude of the update step for the weights in the negative gradient direction.

  26. The LMS Algorithm • If the value of μ is chosen to be too small, the learning algorithm will modify the weights slowly and a relatively large number of iterations will be required. • If the value of μ is set too large, the learning rule can become numerically unstable, leading to the weights not converging.

  27. The LMS Algorithm • The scalar form of the LMS algorithm can be written from (2.22) and (2.29) as e(k) = d(k) − Σ_i w_i(k)x_i(k) (2.30) and w_i(k+1) = w_i(k) + μ e(k)x_i(k). (2.31) • From (2.29) and (2.31), an upper bound must be placed on the learning rate to maintain the stability of the network (Haykin, 1996): 0 < μ < 2/λ_max (2.32), where λ_max is the largest eigenvalue of the input covariance matrix C_x.

  28. The LMS Algorithm • For minimally acceptable stability of LMS convergence, the acceptable learning rate can be restricted to 0 < μ < 2/tr(C_x). (2.33) • Equation (2.33) is a reasonable approximation because tr(C_x) = Σ_i λ_i ≥ λ_max. (2.34)

  29. The LMS Algorithm • From (2.32) and (2.33), determining the learning rate requires at least computing the covariance matrix of the input samples, which is very difficult to do in practical applications. • Even when it can be obtained, such a fixed learning rate raises questions about the accuracy of the results. • Robbins and Monro's root-finding algorithm therefore introduced a learning rate that varies with time (stochastic approximation): μ(k) = κ/k, where κ is a very small constant. (2.35) • Drawback: the learning rate decreases too quickly.

  30. The LMS Algorithm • Ideally, the learning rate μ should be relatively large at the start of training and then decrease gradually as learning proceeds (schedule-type adjustment). • Darken and Moody: search-then-converge algorithm, μ(k) = μ_0/(1 + k/τ). (2.36) • Search phase: μ is relatively large and almost constant. • Converge phase: μ decreases exponentially toward zero. • μ_0 > 0 and τ >> 1, typically 100 ≤ τ ≤ 500. • These methods of adjusting the learning rate are commonly called learning rate schedules.

  31. The LMS Algorithm • Adaptive normalization approach (non-schedule-type): μ is adjusted according to the input data at every time step, μ(k) = μ_0/||x(k)||², where μ_0 is a fixed constant. (2.37) • Stability is guaranteed if 0 < μ_0 < 2; the practical range is 0.1 ≤ μ_0 ≤ 1.
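The three learning-rate rules, Eqs. (2.35) to (2.37), can be written as small Python helpers. The formulas follow the standard stochastic-approximation, search-then-converge, and normalized forms described above; where the slide equations were not captured they should be read as assumed reconstructions.

    import numpy as np

    def mu_stochastic_approximation(k, kappa=0.01):
        """Eq. (2.35): mu(k) = kappa / k; decreases quickly with k."""
        return kappa / k

    def mu_search_then_converge(k, mu0=0.1, tau=200.0):
        """Eq. (2.36): nearly constant while k << tau, then decays toward zero."""
        return mu0 / (1.0 + k / tau)

    def mu_adaptive_normalization(x, mu0=0.5):
        """Eq. (2.37): mu adjusted by the current input; stable for 0 < mu0 < 2."""
        return mu0 / np.dot(x, x)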

  32. The LMS Algorithm • Comparison of two learning rate schedules: the stochastic approximation schedule, Eq. (2.35), and the search-then-converge schedule, Eq. (2.36), shown together with a constant μ.

  33. Summary of the LMS algorithm • Step 1: Set k = 1, initialize the synaptic weight vector w(k = 1), and select values for μ_0 and τ. • Step 2: Compute the learning rate parameter μ(k) = μ_0/(1 + k/τ). • Step 3: Compute the error e(k) = d(k) − w^T(k)x(k). • Step 4: Update the synaptic weights w(k+1) = w(k) + μ(k)e(k)x(k). • Step 5: If convergence is achieved, stop; else set k = k + 1 and go to step 2.
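A minimal Python sketch of the five steps, using the search-then-converge schedule; the stopping test on the RMS error over one pass through the data is an assumption made for illustration.

    import numpy as np

    def lms_train(X, d, mu0=0.1, tau=200.0, tol=1e-3, max_epochs=100):
        """LMS (Widrow-Hoff) training of a linear combiner.
        X: (n_samples, n_inputs) inputs; d: (n_samples,) desired responses."""
        w = np.zeros(X.shape[1])                  # Step 1: initialize w, k = 1
        k = 1
        for _ in range(max_epochs):
            errors = []
            for x, dk in zip(X, d):
                mu = mu0 / (1.0 + k / tau)        # Step 2: learning rate parameter
                e = dk - w @ x                    # Step 3: error e(k) = d(k) - w^T(k) x(k)
                w = w + mu * e * x                # Step 4: weight update
                errors.append(e)
                k += 1
            if np.sqrt(np.mean(np.square(errors))) < tol:   # Step 5: convergence test
                return w
        return w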

  34. Example 2.1: Parametric system identification • Input data consist of 1000 zero-mean Gaussian random vectors with three components. The bias is set to zero. The variances of the components of x are 5, 1, and 0.5. The assumed linear model is given by b = [1, 0.8, −1]^T. • To generate the target values, the 1000 input vectors are used to form a matrix X = [x_1 x_2 … x_1000], and the desired outputs are computed according to d = b^T X. • The learning process was terminated when the RMS value of the performance measure dropped below a preset threshold. • The figure shows the progress of the learning rate parameter as it is adjusted according to the search-then-converge schedule.

  35. Example 2.1 (cont.) • Parametric system identification: estimating a parameter vector associated with a dynamic model of a system given only input/output data from the system. • The root mean square (RMS) value of the performance measure.
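The data of Example 2.1 can be reproduced as follows and fed to the lms_train sketch given after the algorithm summary above; the random seed is arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    # 1000 zero-mean Gaussian input vectors whose components have variances 5, 1, and 0.5
    X = rng.normal(0.0, np.sqrt([5.0, 1.0, 0.5]), size=(n, 3))
    b = np.array([1.0, 0.8, -1.0])   # assumed linear model
    d = X @ b                        # desired outputs, d = b^T x for each input vector
    w_hat = lms_train(X, d)          # the LMS estimate should approach b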

  36. Adaline and Madaline • Adaline • An adaptive pattern-classification network trained by the LMS algorithm. • An adjustable bias (weight) is provided through the constant input x_0(k) = 1. • The output is bipolar (+1, −1); with a different activation function the output can instead be (0, 1).

  37. Adaline • Linear error • The difference between the desired output and the output of the linear combiner. • Quantizer error • The difference between the desired output and the output of the symmetric hard limiter.

  38. Adaline • Adaline training procedure: • The input vector x must be presented to the Adaline together with its corresponding desired output d. • The synaptic weights w are adjusted dynamically according to the linear LMS algorithm. • The activation function is not used during training (it is used only in the testing phase). • Once the network weights have been properly adjusted, the response of the Adaline can be tested with patterns that were not used in training. • If the Adaline's outputs for these test inputs are highly accurate, the network is said to have generalized.

  39. Adaline • One common application of the Adaline is the realization of a small class of logic functions: only those logic functions that are linearly separable can be realized by the Adaline. • AND

  40. Adaline • OR

  41. Adaline • Majority

  42. Adaline • Linear separability • The Adaline acts as a classifier that separates all possible input patterns into two categories. • For two inputs, the output of the linear combiner is given as v(k) = w_1(k)x_1(k) + w_2(k)x_2(k) + w_0(k); setting v(k) = 0 defines the separating line in the (x_1, x_2) plane.
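For a concrete two-input example, the sketch below checks one set of weights that realizes the AND function over bipolar inputs; these particular weight values are an illustrative assumption, not values given in the slides.

    import numpy as np

    # Bipolar AND: output +1 only when both inputs are +1
    w = np.array([-1.0, 1.0, 1.0])          # [w0 (bias weight on x0 = 1), w1, w2]
    for x1, x2 in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
        v = w @ np.array([1.0, x1, x2])     # linear combiner output
        y = 1 if v >= 0 else -1             # symmetric hard limiter
        print((x1, x2), "->", y)            # separating line: -1 + x1 + x2 = 0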

  43. Adaline • Linear separability of the Adaline • The Adaline can only separate patterns that are linearly separable.

  44. Adaline • Nonlinear separation problem • Since the separating boundary is not a straight line, the Adaline cannot be used to accomplish this task.

  45. Adaline • Adaline with nonlinearly transformed inputs (polynomial discriminant function) • To solve the classification problem for patterns that are not linearly separable, the inputs to the Adaline can be preprocessed with fixed nonlinearities. (polynomial discriminant function) (2.45)

  46. Adaline • The critical thresholding condition for this Adaline with nonlinearly transformed inputs occurs when v(k) in (2.45) is set to zero. • Realizing a nonlinearly separable function (XNOR) • If the appropriate nonlinearities are chosen, the network can be trained to separate the input space into two subspaces which are not linearly separable.
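One common choice of fixed nonlinearity is the product term x1·x2: with bipolar inputs, XNOR(x1, x2) equals x1·x2, so the transformed problem becomes linearly separable. The preprocessing and weights below are illustrative assumptions and are not necessarily the form of Eq. (2.45).

    import numpy as np

    def phi(x1, x2):
        """Fixed nonlinear preprocessing: augment the inputs with the product term."""
        return np.array([1.0, x1, x2, x1 * x2])

    w = np.array([0.0, 0.0, 0.0, 1.0])      # weight on the product term realizes XNOR
    for x1, x2 in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
        y = 1 if w @ phi(x1, x2) >= 0 else -1
        print((x1, x2), "->", y)            # +1 when the inputs are equal, -1 otherwise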

  47. Adaline (cont.) • Linear error correction rules • There are two basic linear correction rules for dynamically adjusting the network weights (the weight change depends on the difference between the network's actual output and the desired output). • μ-LMS: the same as (2.22) and (2.29). • α-LMS: a self-normalizing version of the μ-LMS learning rule, w(k+1) = w(k) + α e(k)x(k)/||x(k)||². (2.46) • The α-LMS algorithm is based on the minimal-disturbance principle: when the weights are adjusted to accommodate a new pattern, the responses to previously learned patterns should be disturbed as little as possible. • μ-LMS is based on minimizing the MSE surface, whereas α-LMS updates the weights so as to reduce the current error.
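A one-step Python sketch of the α-LMS update of Eq. (2.46), normalizing the weight change by the squared length of the current input (function and variable names are illustrative).

    import numpy as np

    def alpha_lms_step(w, x, d, alpha=0.5):
        """alpha-LMS: w(k+1) = w(k) + alpha * e(k) * x(k) / ||x(k)||^2."""
        e = d - w @ x                     # current (linear) error
        return w + alpha * e * x / np.dot(x, x)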

  48. Adaline (cont.) • Consider the change in the error for α-LMS: Δe(k) = −α e(k). (2.47) • From (2.47), the choice of α controls stability and speed of convergence; it is typically set in the range 0.1 < α < 1.0. (2.48) • α-LMS is called self-normalizing because the choice of α does not depend on the magnitude of the network inputs.

  49. Adaline (cont.) • Detailed comparison of the μ-LMS and α-LMS rules. • From (2.46), the α-LMS update can be written as w(k+1) = w(k) + α (d(k) − w^T(k)x(k)) x(k)/||x(k)||². (2.49) • Define the normalized desired response and normalized training vector d̂(k) = d(k)/||x(k)|| and x̂(k) = x(k)/||x(k)||. (2.50-51) • Eq. (2.49) can then be rewritten as w(k+1) = w(k) + α (d̂(k) − w^T(k)x̂(k)) x̂(k). (2.52) • This has the same form as μ-LMS, so α-LMS is simply μ-LMS applied to normalized input patterns.

  50. Multiple Adaline (Madaline) • A single Adaline cannot solve problems whose decision regions are not linearly separable. • Multiple Adalines can be used instead: • Multiple Adaline (Madaline) • Madaline I: a single-layer network with a single output. • Madaline II: a multi-layer network with multiple outputs.
