
Neural Networks


Presentation Transcript


  1. Neural Networks Marco Loog

  2. Previously in ‘Statistical Methods’... • Agents can handle uncertainty by using the methods of probability and decision theory • But first they must learn their probabilistic theories of the world from experience...

  3. Previously in ‘Statistical Methods’... • Key Concepts : • Data : evidence, i.e., instantiation of one or more random variables describing the domain • Hypotheses : probabilistic theories of how the domain works

  4. Previously in ‘Statistical Methods’... • Outline • Bayesian learning • Maximum a posteriori and maximum likelihood learning • Instance-based learning • Neural networks...

  5. Outline • Some slides from last week... • Network structure • Perceptrons • Multilayer Feed-Forward Neural Networks • Learning Networks?

  6.–12. Neural Networks and Games [seven image-only slides; the figures are not in the transcript]

  13. So First... Neural Networks • According to Robert Hecht-Nielsen, a neural network is simply “a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs” • Simply... • We skip the biology for now • And provide the bare basics

  14. Network Structure • Units are arranged in layers : input units, hidden units, output units

  15. Network Structure • Feed-forward networks • Recurrent networks : feedback from the output units back to the input

  16. Feed-Forward Network • Feed-forward network = a parameterized family of nonlinear functions : every unit computes a_i = g(in_i) = g( Σ_j W_j,i a_j ) • g is the activation function • the weights W_j,i are what gets adapted, i.e., the learning
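To make the “parameterized family of nonlinear functions” view concrete, here is a minimal sketch in Python/NumPy (not from the slides; all names and sizes are illustrative): the whole network is just a function h_W(x) whose behavior is fixed by the weight matrices.

```python
import numpy as np

def sigmoid(x):
    # Logistic activation function g
    return 1.0 / (1.0 + np.exp(-x))

def feed_forward(x, W1, W2):
    """One forward pass: the network is a nonlinear function h_W(x)
    parameterized by the weight matrices W1 and W2."""
    hidden = sigmoid(W1 @ x)       # activations of the hidden units
    output = sigmoid(W2 @ hidden)  # activations of the output units
    return output

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2))  # 2 inputs -> 3 hidden units
W2 = rng.normal(size=(1, 3))  # 3 hidden units -> 1 output
print(feed_forward(np.array([0.5, -1.0]), W1, W2))
```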

  17. Activation Functions • Often have the form of a step function [a threshold] or a sigmoid • N.B. thresholding = a ‘degenerate’ sigmoid : the step function is the limit of a sigmoid of increasing steepness
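A small numerical illustration of that remark (my own sketch, Python/NumPy): as the steepness k of the sigmoid grows, it approaches the hard threshold.

```python
import numpy as np

def step(x):
    # Hard threshold: 1 if the weighted input exceeds 0, else 0
    return (x > 0).astype(float)

def sigmoid(x, k=1.0):
    # Logistic function with steepness k
    return 1.0 / (1.0 + np.exp(-k * x))

# As k grows, the sigmoid approaches the step function,
# which is why thresholding can be seen as a 'degenerate' sigmoid.
inputs = np.array([-0.5, -0.1, 0.1, 0.5])
for k in (1, 10, 100):
    print(k, sigmoid(inputs, k).round(3))
print('step', step(inputs))
```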

  18. Perceptrons • Single-layer neural network • Expressiveness • Perceptron with g = step function can learn AND, OR, NOT, majority, but not XOR
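The expressiveness claim can be checked by hand. The weights below are chosen for illustration (they are not given in the slides); they realize AND, OR, and NOT with a threshold perceptron, while no weight setting exists for XOR.

```python
import numpy as np

def perceptron(weights, bias):
    # Threshold perceptron: fires iff w . x + b > 0
    return lambda x: int(np.dot(weights, x) + bias > 0)

AND = perceptron([1, 1], -1.5)   # fires only when both inputs are 1
OR  = perceptron([1, 1], -0.5)   # fires when at least one input is 1
NOT = perceptron([-1], 0.5)      # inverts a single input

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, 'AND:', AND(x), 'OR:', OR(x))
print('NOT:', NOT([0]), NOT([1]))

# XOR is not linearly separable: no line w1*x1 + w2*x2 + b = 0 puts
# (0,1) and (1,0) on one side and (0,0) and (1,1) on the other,
# so no single perceptron computes it.
```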

  19. Learning in Sigmoid Perceptrons • The idea is to adjust the weights so as to minimize some measure of error on the training set • Learning is optimization of the weights • This can be done using general optimization routines for continuous spaces

  20. Learning in Sigmoid Perceptrons • The idea is to adjust the weights so as to minimize some measure of error on the training set • The error measure most often used for NN is the sum of squared errors E = ½ Σ ( y − h_W(x) )²

  21. Learning in Sigmoid Perceptrons • Error measure most often used for NN is the sum of squared errors • Perform the optimization search by gradient descent • Weight update rule : W_j ← W_j + α × Err × g′(in) × x_j , with Err = y − h_W(x) [α is the learning rate]
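A sketch of this training loop (Python/NumPy; the function names and the OR example are my choices), implementing exactly the update W_j ← W_j + α × Err × g′(in) × x_j:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_sigmoid_perceptron(X, y, alpha=0.5, epochs=1000):
    """Gradient descent on the sum of squared errors for a sigmoid perceptron.
    X includes a bias column of ones; alpha is the learning rate."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_j, y_j in zip(X, y):
            in_j = w @ x_j                                  # weighted input
            err = y_j - sigmoid(in_j)                       # Err = y - h_W(x)
            g_prime = sigmoid(in_j) * (1 - sigmoid(in_j))   # g'(in)
            w += alpha * err * g_prime * x_j                # the update rule
    return w

# Learn OR (linearly separable), with a bias input of 1 prepended
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
y = np.array([0, 1, 1, 1], dtype=float)
w = train_sigmoid_perceptron(X, y)
print(np.round(sigmoid(X @ w)))  # should round to [0. 1. 1. 1.]
```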

  22. Simple Comparison

  23. Some Remarks • [Thresholded] perceptron learning rule converges to a consistent function for any linearly separable data set • [Sigmoid] perceptron output can be interpreted as conditional probability • Also interpretation in terms of maximum likelihood [ML] estimation possible
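The probabilistic and ML readings can be made concrete with the standard equations (they are not spelled out on the slide):

```latex
% Sigmoid perceptron output read as a conditional class probability:
h_W(x) = g(W \cdot x) = \frac{1}{1 + e^{-W \cdot x}} \;\approx\; P(y = 1 \mid x)

% Maximum-likelihood view: for training examples (x_i, y_i), y_i \in \{0,1\},
% the log-likelihood of the weights is
L(W) = \sum_i \Big[ y_i \log h_W(x_i) + (1 - y_i) \log\big(1 - h_W(x_i)\big) \Big]
% and choosing W to maximize L(W) is ML estimation of the weights.
```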

  24. Multilayer Feed-Forward NN • Network with hidden units • Adding hidden layers enlarges the hypothesis space • Most common : a single hidden layer

  25. Expressiveness • 2-input perceptron • 2-input single-hidden-layer neural network [by ‘adding’ perceptron outputs]

  26. Expressiveness • With a single, sufficiently large, hidden layer it is possible to approximate any continuous function • With two layers, discontinuous functions can be approximated as well • For particular networks it is hard to say what exactly can be represented

  27. Learning in Multilayer NN • Back-propagation is used to perform the weight updates in the network • Similar to perceptron learning • The major difference : the error at the output is clear, but how do we measure the error at the nodes in the hidden layers? • Additionally, it has to deal with multiple outputs

  28. Learning in Multilayer NN • At the output layer the weight-update rule is similar to that for the perceptron, but now for multiple outputs i : W_j,i ← W_j,i + α × a_j × Δ_i , where Δ_i = Err_i × g′(in_i) • Idea of back-propagation : every hidden unit contributes some fraction to the error of the output nodes to which it connects

  29. Learning in Multilayer NN • [...] contributes some fraction to the error of the output nodes to which it connects • Thus errors are divided according to connection strength [the weights] : Δ_j = g′(in_j) Σ_i W_j,i Δ_i • Update rule : W_k,j ← W_k,j + α × a_k × Δ_j
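Putting slides 27–29 together, here is a minimal back-propagation sketch for one hidden layer (Python/NumPy; the network size, seed, and learning rate are my own choices). It is trained on XOR, the function the single perceptron could not represent; as slide 33 warns, gradient descent can get stuck in local minima, so another seed may occasionally be needed.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W1, W2):
    a_h = sigmoid(W1 @ x)                    # hidden activations a_j
    a_o = sigmoid(W2 @ np.append(1.0, a_h))  # outputs (W2[:, 0] is a bias weight)
    return a_h, a_o

def backprop_step(x, y, W1, W2, alpha):
    a_h, a_o = forward(x, W1, W2)
    # Output layer: Delta_i = Err_i * g'(in_i), as for the perceptron
    d_o = (y - a_o) * a_o * (1 - a_o)
    # Hidden layer: the output error is divided according to the weights W2
    d_h = a_h * (1 - a_h) * (W2[:, 1:].T @ d_o)
    # Updates: W <- W + alpha * a * Delta
    W2 += alpha * np.outer(d_o, np.append(1.0, a_h))
    W1 += alpha * np.outer(d_h, x)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 3))   # inputs (bias, x1, x2) -> 3 hidden units
W2 = rng.normal(size=(1, 4))   # (bias, 3 hidden units) -> 1 output
data = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 0)]
for _ in range(10000):
    for x, y in data:
        backprop_step(np.array(x, float), np.array([y], float), W1, W2, 0.5)
for x, y in data:
    print(x[1:], y, forward(np.array(x, float), W1, W2)[1].round(2))
```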

  30. Example • Training curve for 100 restaurant examples : the network reaches an exact fit

  31. Learning NN Structures? • How to find the best network structure? • Too big results in ‘lookup table’ behavior / overtraining • Too small in ‘undertraining’ / not exploiting the full expressiveness • Possibility : try different structures and validate using, for example, cross-validation • But which different structures to consider? • Start with fully connected network and remove nodes : optimal brain damage • Growing larger networks [from smaller ones], e.g. tiling and NEAT
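A sketch of the “try different structures and cross-validate” idea, assuming scikit-learn (not part of the lecture) purely for brevity; the dataset and candidate sizes are illustrative:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=200, noise=0.25, random_state=0)

# Score several candidate structures by 5-fold cross-validation:
# too small undertrains, too big risks 'lookup table' behavior.
for hidden in [(1,), (3,), (10,), (100,)]:
    net = MLPClassifier(hidden_layer_sizes=hidden, max_iter=5000, random_state=0)
    score = cross_val_score(net, X, y, cv=5).mean()
    print(hidden, round(score, 3))
```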

  32. Learning NN Structures : Topic for later Lecture?

  33. Finally, Some Remarks • NN = a possibly complex nonlinear function with many parameters that have to be tuned • Problems : slow convergence, local minima • Back-propagation was explained, but other optimization schemes are possible • A perceptron can handle linearly separable functions • A multilayer NN can represent virtually any function • It is hard to come up with the optimal network • Learning rate, initial weights, etc. have to be set • NN : not much magic there... “Keine Hekserei, nur Behändigkeit!” [“No witchcraft, just dexterity!”]

  34. And with that Disappointing Message... • We take a break...
