
Artificial Intelligence


Presentation Transcript


  1. Artificial Intelligence Statistical learning methods Chapter 20, AIMA (only ANNs & SVMs)

  2. Artificial neural networks The brain is a pretty intelligent system. Can we "copy" it? There are approx. 10¹¹ neurons in the brain. There are approx. 23×10⁹ neurons in the male cortex (females have about 15% fewer).

  3. The simple model • The McCulloch-Pitts model (1943): y = g(w0 + w1x1 + w2x2 + w3x3), a weighted sum of the inputs passed through a transfer function g. (Figure: a neuron with inputs x1, x2, x3 and weights w1, w2, w3; image from Neuroscience: Exploring the Brain by Bear, Connors, and Paradiso.)

  4. Firing rate Depolarization → neuron firing rate. (Figure: firing rate as a function of membrane depolarization.)

  5. Transfer functions g(z): the logistic function and the Heaviside (step) function.
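
Both transfer functions, and the McCulloch-Pitts unit itself, fit in a few lines of code. This is a minimal sketch; the example weights and inputs are arbitrary, not taken from the slides.

```python
import math

def heaviside(z):
    """Heaviside step function: 1 for z >= 0, else 0."""
    return 1.0 if z >= 0 else 0.0

def logistic(z):
    """Logistic function: a smooth, differentiable version of the step."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, w0, g):
    """McCulloch-Pitts unit: y = g(w0 + w1*x1 + w2*x2 + w3*x3)."""
    return g(w0 + sum(wi * xi for wi, xi in zip(w, x)))

# Arbitrary example values (not from the slides):
x, w, w0 = [1.0, 0.0, 1.0], [0.8, -0.4, 0.3], -0.5
print(neuron(x, w, w0, heaviside), neuron(x, w, w0, logistic))
```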

  6. The simple perceptron With the {-1,+1} output representation. Traditionally (early 1960s) trained with perceptron learning.

  7. Perceptron learning Repeat until no errors are made anymore • Pick a random example [x(n), f(n)], where f(n) is the desired output • If the classification is correct, i.e. if y(x(n)) = f(n), then do nothing • If the classification is wrong, then update the parameters as w ← w + η f(n) x(n) and w0 ← w0 + η f(n) (η, the learning rate, is a small positive number)

  8. Example: Perceptron learning The AND function (plot axes: x1, x2). Initial values: η = 0.3

  9. Example: Perceptron learning The AND function. This one is correctly classified, no action.

  10. Example: Perceptron learning The AND function. This one is incorrectly classified, learning action.

  11. Example: Perceptron learning The AND function. This one is incorrectly classified, learning action.

  12. Example: Perceptron learning The AND function. This one is correctly classified, no action.

  13. Example: Perceptron learning The AND function. This one is incorrectly classified, learning action.

  14. Example: Perceptron learning The AND function. This one is incorrectly classified, learning action.

  15. Example: Perceptron learning The AND function. Final solution.
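
The worked example above translates directly into code. Below is a minimal sketch of perceptron learning on the AND function in the {-1,+1} representation, with η = 0.3 as on slide 8; the zero initial weights and the random visiting order are assumptions, so the final line may differ from the one in the figures.

```python
import random

# AND in the {-1,+1} representation: output +1 only for input (+1,+1).
data = [([-1, -1], -1), ([-1, +1], -1), ([+1, -1], -1), ([+1, +1], +1)]

eta = 0.3                    # learning rate, as on slide 8
w0, w = 0.0, [0.0, 0.0]      # bias and weights; zero start is an assumption

def classify(x):
    z = w0 + w[0] * x[0] + w[1] * x[1]
    return +1 if z >= 0 else -1

errors = True
while errors:                # repeat until no errors are made anymore
    errors = False
    for x, f in random.sample(data, len(data)):  # random visiting order
        if classify(x) != f:                     # wrong: update the parameters
            w0 += eta * f
            w[0] += eta * f * x[0]
            w[1] += eta * f * x[1]
            errors = True

print(w0, w)  # one of infinitely many separating lines
```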

  16. Perceptron learning • Perceptron learning is guaranteed to find a solution in finite time, if a solution exists. • Perceptron learning cannot be generalized to more complex networks. • Better to use gradient descent, based on formulating an error function and using differentiable transfer functions.

  17. Gradient search "Go downhill" on the error surface E(W): W(k+1) = W(k) + ΔW(k), with the step taken in the negative gradient direction. The learning rate (η) is set heuristically.
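
A minimal sketch of the update W(k+1) = W(k) + ΔW(k) with ΔW(k) = -η ∇E(W(k)); the quadratic error surface here is an arbitrary stand-in for a real training error.

```python
import numpy as np

def E(W):
    """Toy error surface (a quadratic bowl) standing in for E(W)."""
    return (W[0] - 1.0) ** 2 + 10.0 * (W[1] + 2.0) ** 2

def grad_E(W):
    """Gradient of the toy error surface."""
    return np.array([2.0 * (W[0] - 1.0), 20.0 * (W[1] + 2.0)])

eta = 0.05                   # learning rate, set heuristically
W = np.array([4.0, 3.0])     # W(0), an arbitrary starting point
for k in range(100):
    W = W - eta * grad_E(W)  # W(k+1) = W(k) + dW(k), dW(k) = -eta * grad E

print(W, E(W))               # approaches the minimum at (1, -2)
```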

  18. The Multilayer Perceptron (MLP) • Combine several single-layer perceptrons, e.g. input → hidden layer → output • Each single-layer perceptron uses a sigmoid transfer function • Can be trained using gradient descent

  19. Example: One hidden layer • Can approximate any continuous function, with θ(z) = sigmoid or linear (output unit) and φ(z) = sigmoid (hidden units).

  20. Training: Backpropagation (gradient descent)
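
The slide gives no details, so here is a compact sketch of backpropagation for a one-hidden-layer MLP. It is trained on XOR, chosen as an assumed test problem because a single perceptron cannot represent it; the layer sizes, learning rate, and epoch count are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR (not representable by a single perceptron) as an illustrative task.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer: 2 inputs -> 3 hidden sigmoid units -> 1 sigmoid output.
W1 = rng.normal(0, 1, (2, 3)); b1 = np.zeros(3)
W2 = rng.normal(0, 1, (3, 1)); b2 = np.zeros(1)

eta = 0.5
for epoch in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)           # hidden activations
    y = sigmoid(h @ W2 + b2)           # network output
    # Backward pass: squared-error gradients, using sigmoid'(z) = s*(1-s)
    d_out = (y - T) * y * (1 - y)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent updates
    W2 -= eta * h.T @ d_out; b2 -= eta * d_out.sum(axis=0)
    W1 -= eta * X.T @ d_hid; b1 -= eta * d_hid.sum(axis=0)

print(np.round(y.ravel(), 2))  # typically approaches [0, 1, 1, 0]
```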

  21. Support vector machines

  22. Linear classifier on a linearly separable problem There are infinitely many lines that have zero training error. Which line should we choose?

  23. Linear classifier on a linearly separable problem There are infinitely many lines that have zero training error. Which line should we choose? → Choose the line with the largest margin: the "large margin classifier".

  24. Linear classifier on a linearly separable problem There are infinitely many lines that have zero training error. Which line should we choose? → Choose the line with the largest margin: the "large margin classifier". The data points that touch the margin are the "support vectors".

  25. Computing the margin The plane separating the two classes is defined by wTx = a. The dashed planes on either side, through the closest examples, are given by wTx = a + b and wTx = a − b.

  26. Computing the margin Divide by b and define new w = w/b and a = a/b; the dashed planes become wTx − a = ±1. We have thereby defined a scale for w and a.

  27. Computing the margin We have wTx − a = −1 at a point x on one dashed plane, and wT(x + λw) − a = +1 at the point x + λw on the other, which gives λ = 2/||w||². The margin is therefore ||λw|| = 2/||w||.

  28. Linear classifier on a linearly separable problem Maximizing the margin is equal to minimizing ||w||, subject to the constraints wTx(n) − a ≥ +1 for all x(n) in the positive class and wTx(n) − a ≤ −1 for all x(n) in the negative class. This is a quadratic programming problem; the constraints can be included with Lagrange multipliers.
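
To make this concrete, the sketch below solves the quadratic program numerically with SciPy's general-purpose SLSQP solver on a tiny made-up data set. The data, the solver choice, and the active-set tolerance are assumptions for illustration; a dedicated QP solver would normally be used.

```python
import numpy as np
from scipy.optimize import minimize

# Tiny made-up separable data set: inputs x(n) with desired outputs f(n).
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
f = np.array([+1.0, +1.0, -1.0, -1.0])

# Pack the variables as v = [w1, w2, a] and minimize ||w||^2 / 2.
def cost(v):
    return 0.5 * (v[0] ** 2 + v[1] ** 2)

# Both inequalities in one form: f(n) * (w^T x(n) - a) - 1 >= 0.
cons = [{"type": "ineq", "fun": lambda v, n=n: f[n] * (X[n] @ v[:2] - v[2]) - 1.0}
        for n in range(len(X))]

res = minimize(cost, x0=np.zeros(3), constraints=cons)  # SLSQP by default here
w, a = res.x[:2], res.x[2]
print("w =", w, "a =", a, "margin =", 2.0 / np.linalg.norm(w))

# The support vectors are the examples whose constraints are active.
support = [n for n in range(len(X)) if abs(f[n] * (X[n] @ w - a) - 1.0) < 1e-4]
print("support vectors:", support)
```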

  29. Quadratic programming problem Minimize the cost (Lagrangian) Lp = ½||w||² − Σn α(n)[f(n)(wTx(n) − a) − 1]. The minimum of Lp occurs at the maximum of the Wolfe dual LD = Σn α(n) − ½ Σn Σm α(n) α(m) f(n) f(m) x(n)Tx(m), subject to α(n) ≥ 0 and Σn α(n) f(n) = 0. Only scalar products x(n)Tx(m) in the cost. IMPORTANT!

  30. Linear Support Vector Machine Test phase, the predicted output: y(x) = sign(Σn α(n) f(n) x(n)Tx − a), where a is determined e.g. by looking at one of the support vectors. Still only scalar products in the expression.
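
A sketch of this test phase, assuming the multipliers α(n) are available from the dual optimization; the function names are illustrative, not from the slides.

```python
import numpy as np

def svm_predict(x, alpha, f, X, a):
    """Predicted output y(x) = sign( sum_n alpha(n) f(n) x(n)^T x - a ).
    alpha(n) are the Lagrange multipliers (nonzero only for support
    vectors), f the desired outputs, X the training inputs."""
    return np.sign(np.sum(alpha * f * (X @ x)) - a)

def offset_from_support_vector(alpha, f, X, s):
    """Determine a from one support vector s: f(s)(w^T x(s) - a) = 1 with
    f(s) = +/-1 gives a = w^T x(s) - f(s), where w = sum_n alpha(n) f(n) x(n)."""
    w = (alpha * f) @ X
    return w @ X[s] - f[s]
```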

  31. How to deal with the nonlinear case? • Project the data into a high-dimensional space Z, where we know it will be linearly separable (due to the VC dimension of the linear classifier). • We don't even have to know the projection...!

  32. Scalar product kernel trick If we can find a kernel K such that K[x(i), x(j)] = Φ(x(i))TΦ(x(j)), then we don't even have to know the mapping Φ to solve the problem...

  33. Valid kernels (Mercer's theorem) Define the matrix K with entries Kij = K[x(i), x(j)] over the data. If K is symmetric, K = KT, and positive semi-definite, then K[x(i), x(j)] is a valid kernel.

  34. Examples of kernels First, the Gaussian kernel: K[x, y] = exp(−γ||x − y||²). Second, the polynomial kernel: K[x, y] = (1 + xTy)^d; with d = 1 we have the linear SVM. The linear SVM is often used with good success on high-dimensional data (e.g. text classification).
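
A sketch of both kernels, together with a finite-sample version of the Mercer check from slide 33: build the Gram matrix on a data sample and verify that it is symmetric and positive semi-definite. The sample data are arbitrary.

```python
import numpy as np

def gaussian_kernel(x, y, gamma=1.0):
    """Gaussian kernel: K[x, y] = exp(-gamma * ||x - y||^2)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def polynomial_kernel(x, y, d=2):
    """Polynomial kernel: K[x, y] = (1 + x^T y)^d; d = 1 is the linear SVM."""
    return (1.0 + x @ y) ** d

def mercer_check(K, X):
    """Finite-sample Mercer check: the Gram matrix with entries
    K[x(i), x(j)] must be symmetric and positive semi-definite."""
    G = np.array([[K(xi, xj) for xj in X] for xi in X])
    return np.allclose(G, G.T) and np.all(np.linalg.eigvalsh(G) >= -1e-10)

X = np.random.default_rng(0).normal(size=(5, 2))
print(mercer_check(gaussian_kernel, X))                            # True
print(mercer_check(lambda x, y: polynomial_kernel(x, y, d=3), X))  # True
```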

  35. Example: Robot color vision (Competition 1999) Classify the Lego pieces into red, blue, and yellow. Also classify the white balls, the black sideboard, and the green carpet.

  36. What the camera sees (RGB space). (Figure: pixel clusters labelled yellow, red, and green.)

  37. Mapping RGB (3D) to rgb (2D)
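
The mapping itself is presumably the standard chromaticity normalization, which divides out the intensity; a sketch (the zero-intensity fallback is an added assumption):

```python
def rgb_normalize(R, G, B):
    """Map RGB (3D) to normalized rgb (2D): divide out the intensity so
    that only chromaticity remains; b = 1 - r - g carries no new information."""
    s = R + G + B
    if s == 0:               # pure black: this fallback is an added assumption
        return (1.0 / 3.0, 1.0 / 3.0)
    return (R / s, G / s)

print(rgb_normalize(200, 40, 20))  # a reddish pixel maps to large r
```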

  38. Lego in normalized rgb space Input is 2D (the two normalized color coordinates, x1 and x2); output is 6D: {red, blue, yellow, green, black, white}.

  39. MLP classifier 2-3-1 MLP trained with Levenberg–Marquardt. E_train = 0.21%, E_test = 0.24%. Training time (150 epochs): 51 seconds.

  40. SVM classifier SVM with Gaussian kernel, γ = 1000. E_train = 0.19%, E_test = 0.20%. Training time: 22 seconds.
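
For reference, this configuration could be set up today with scikit-learn's SVC. This is an assumption for illustration (the original experiment predates the library), and the stand-in data below are random, so the printed error will not match the slide.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Random stand-in data: the original Lego set is not available here, so
# 2D inputs (normalized rgb) get arbitrary labels 0..5 for the six classes.
X = rng.random((600, 2))
y = rng.integers(0, 6, size=600)

clf = SVC(kernel="rbf", gamma=1000)   # Gaussian kernel with gamma = 1000
clf.fit(X[:500], y[:500])
print("test error: %.2f%%" % (100.0 * (1.0 - clf.score(X[500:], y[500:]))))
```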
