
Artificial Neural Network



  1. Artificial Neural Network Yalong Li Some slides are from http://www.cs.cmu.edu/~tom/10701_sp11/slides/NNets-701-3_24_2011_ann.pdf

  2. Structure • Motivation • Artificial neural networks • Learning: Backpropagation Algorithm • Overfitting • Expressive Capabilities of ANNs • Summary

  3. Some facts about our brain • Performance tends to degrade gracefully under partial damage • Learns (reorganizes itself) from experience • Recovery from damage is possible • Performs massively parallel computations extremely efficiently • For example, complex visual perception occurs within less than 100 ms, that is, about 10 processing steps! (synapses operate at roughly 100 Hz) • Supports our intelligence and self-awareness

  4. Neural Networks in the Brain • Cortex, midbrain, brainstem and cerebellum • Visual system • 10 or 11 processing stages have been identified • feedforward connections: from earlier processing stages (near the sensory input) to later ones (near the motor output) • feedback connections: in the opposite direction

  5. Neurons and Synapses • Basic computational unit in the nervous system is the nerve cell, or neuron.

  6. Synaptic Learning • One way the brain learns is by altering the strengths of connections between neurons, and by adding or deleting connections between neurons • LTP (long-term potentiation) • Long-Term Potentiation: • An enduring (>1 hour) increase in synaptic efficacy that results from high-frequency stimulation of an afferent (input) pathway • The efficacy of a synapse can change as a result of experience, providing both memory and learning through long-term potentiation. One way this happens is through release of more neurotransmitter. • Hebb's Postulate: • "When an axon of cell A ... excite[s] cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased." • Points to note about LTP: • Synapses become more or less important over time (plasticity) • LTP is based on experience • LTP is based only on local information (Hebb's postulate)
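
Hebb's postulate above is local: the change to a connection depends only on the activity of the two cells it joins. A minimal sketch of such a Hebbian update (not from the slides; the learning rate eta and all variable names are illustrative assumptions):

```python
import numpy as np

def hebbian_update(w, x, y, eta=0.01):
    """Hebb's postulate as a local rule: when presynaptic activity x and
    postsynaptic activity y coincide, strengthen the connection weights."""
    return w + eta * x * y  # uses only locally available quantities

# toy usage: one postsynaptic unit driven by three presynaptic inputs
w = np.array([0.4, 0.1, 0.3])
x = np.array([1.0, 0.0, 1.0])      # presynaptic activity
y = float(np.dot(w, x) > 0.5)      # postsynaptic firing (thresholded)
w = hebbian_update(w, x, y)        # active inputs to a firing cell are strengthened
```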

  7. Brain ? ?

  8. Structure • Motivation • Artificial neural networks • Backpropagation Algorithm • Overfitting • Expressive Capabilities of ANNs • Summary

  9. Artificial Neural Networks to learn f: X → Y • f might be a non-linear function • X: (vector of) continuous and/or discrete vars • Y: (vector of) continuous and/or discrete vars • Represent f by a network of logistic units • Each unit is a logistic function of its inputs; unit output o = σ(w·x) = 1/(1 + exp(-w·x)) (sketched below) • MLE: train weights of all units to minimize sum of squared errors of predicted network outputs • MAP: train to minimize sum of squared errors plus weight magnitudes
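
A minimal sketch of one such logistic unit and of the two training criteria named above (sum of squared errors for MLE, squared errors plus a weight-magnitude penalty for MAP); the penalty coefficient lam is an illustrative assumption:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def unit_output(w, x):
    # logistic unit: o = sigma(w . x)
    return sigmoid(np.dot(w, x))

def mle_error(w, X, t):
    # sum of squared errors of the predicted outputs over the training set
    return 0.5 * np.sum((t - sigmoid(X @ w)) ** 2)

def map_error(w, X, t, lam=0.01):
    # squared errors plus a penalty on the weight magnitudes
    return mle_error(w, X, t) + 0.5 * lam * np.sum(w ** 2)
```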

  10. Artificial Neural Networks to learn f: X → Y • f(·) is: a nonlinear activation function for classification, the identity for regression • the basis functions depend on parameters, and these parameters are adjusted along with the coefficients {wj} during training • the sigmoid function can be logistic or tanh

  11. Artificial Neural Networks to learn f: X → Y • a_j = Σ_i w_ji^(1) x_i + w_j0^(1): activations of the hidden units • h(·): nonlinear function giving the hidden-unit outputs z_j = h(a_j) • σ(·): output activation function, determined by the nature of the data and the assumed distribution of target variables
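
A sketch of the forward pass these definitions describe: activations a_j, hidden outputs z_j = h(a_j), and an output layer. The tanh nonlinearity, the identity output (regression case), and the layer sizes are assumptions for illustration:

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    a = W1 @ x + b1      # activations a_j of the hidden units
    z = np.tanh(a)       # z_j = h(a_j), nonlinear hidden-unit outputs
    y = W2 @ z + b2      # output activations (identity output here)
    return y

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # 3 inputs -> 4 hidden units
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # 4 hidden units -> 2 outputs
print(forward(np.array([0.1, -0.5, 0.3]), W1, b1, W2, b2))
```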

  12. Artificial Neural Networks to learn f: X → Y How to define the output activation function? • for standard regression, it is the identity: y_k = a_k • for multiple binary classifications, each output unit activation is transformed using a logistic sigmoid function, so that y_k = σ(a_k), where σ(a) = 1/(1 + exp(-a)) • for multiclass problems, a softmax activation of the form y_k = exp(a_k) / Σ_j exp(a_j)
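
The three output activations above, written out as a minimal sketch:

```python
import numpy as np

def identity(a):                  # regression: y_k = a_k
    return a

def logistic(a):                  # binary classifications: y_k = sigma(a_k)
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):                   # multiclass: y_k = exp(a_k) / sum_j exp(a_j)
    e = np.exp(a - np.max(a))     # subtract the max for numerical stability
    return e / np.sum(e)
```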

  13. Artificial Neural Networks to learn f: X → Y Why these choices? There is a natural choice of both output unit activation function and matching error function, according to the type of problem being solved. • Regression: linear outputs, Error = sum-of-squares error • (Multiple independent) binary classifications: logistic sigmoid outputs, Error = cross-entropy error function • Multiclass classification: softmax outputs, Error = multiclass cross-entropy error function Two classes? A single logistic sigmoid output with the cross-entropy error suffices. In each case the derivative of the error function with respect to the activation of a particular output unit takes the simple form ∂E/∂a_k = y_k - t_k. A probabilistic interpretation of the network outputs is given in the book PRML, C. M. Bishop.
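
A sketch of the matching error functions, with a finite-difference check that ∂E/∂a_k = y_k - t_k for the softmax / cross-entropy pairing (the check itself is an illustrative addition, not part of the slides):

```python
import numpy as np

def sum_of_squares(y, t):             # regression
    return 0.5 * np.sum((y - t) ** 2)

def binary_cross_entropy(y, t):       # independent binary classifications
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

def multiclass_cross_entropy(y, t):   # multiclass, t is one-hot
    return -np.sum(t * np.log(y))

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / np.sum(e)

# numerical check that dE/da_k = y_k - t_k for softmax outputs + cross-entropy
a = np.array([0.2, -1.0, 0.5])
t = np.array([0.0, 0.0, 1.0])
eps = 1e-6
grad_fd = np.array([
    (multiclass_cross_entropy(softmax(a + eps * np.eye(3)[k]), t)
     - multiclass_cross_entropy(softmax(a - eps * np.eye(3)[k]), t)) / (2 * eps)
    for k in range(3)
])
print(np.allclose(grad_fd, softmax(a) - t, atol=1e-6))  # True
```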

  14. Multilayer Networks of Sigmoid Units

  15. Multilayer Networks of Sigmoid Units

  16. Connectionist Models Consider humans: • Neuron switching time ~0.001 second • Number of neurons ~10^10 • Connections per neuron ~10^4-5 • Scene recognition time ~0.1 second • 100 inference steps doesn't seem like enough → much parallel computation Properties of artificial neural nets (ANNs): • Many neuron-like threshold switching units • Many weighted interconnections among units • Highly parallel, distributed processing

  17. Structure • Motivation • Artificial neural networks • Learning: Backpropagation Algorithm • Overfitting • Expressive Capabilities of ANNs • Summary

  18. Backpropagation Algorithm • Looks for the minimum of the error function in weight space using the method of gradient descent (sketched below). • The combination of weights which minimizes the error function is considered to be a solution of the learning problem.
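
A minimal sketch of that gradient-descent search; the toy error surface, learning rate, and step count are illustrative assumptions:

```python
import numpy as np

def gradient_descent(grad_E, w0, eta=0.1, n_steps=1000):
    """Repeatedly step downhill in weight space: w <- w - eta * grad E(w)."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_steps):
        w = w - eta * grad_E(w)
    return w

# toy quadratic error surface E(w) = ||w - [1, 2]||^2 with a known minimum
grad_E = lambda w: 2.0 * (w - np.array([1.0, 2.0]))
print(gradient_descent(grad_E, w0=[0.0, 0.0]))  # converges near [1, 2]
```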

  19. Sigmoid unit

  20. Error Gradient for a Sigmoid Unit

  21. Gradient Descent

  22. Incremental (Stochastic) Gradient Descent
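
In the incremental (stochastic) variant the weights are updated after every training example rather than after a full pass over the data. A sketch for a single sigmoid unit trained on squared error; the learning rate and epoch count are illustrative assumptions:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def stochastic_gradient_descent(X, t, eta=0.5, n_epochs=100, seed=0):
    """Incremental gradient descent for one sigmoid unit with squared error."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for i in rng.permutation(len(X)):               # visit examples in random order
            o = sigmoid(np.dot(w, X[i]))
            w += eta * (t[i] - o) * o * (1 - o) * X[i]  # update after each example
    return w
```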

  23. Backpropagation Algorithm (MLE)

  24. Backpropagation Algorithm (MLE) Derivation of the BP rule: Error: E_d = ½ Σ_k (t_k - o_k)² Notation: x_ji = the i-th input to unit j, w_ji = the weight on that input, net_j = Σ_i w_ji x_ji, o_j = σ(net_j) = the output of unit j, t_j = the target output of unit j Goal: ∂E_d/∂w_ji = (∂E_d/∂net_j)(∂net_j/∂w_ji)

  25. Backpropagation Algorithm (MLE) For output unit j: δ_j ≡ -∂E_d/∂net_j = o_j (1 - o_j)(t_j - o_j), so Δw_ji = η δ_j x_ji

  26. Backpropagation Algorithm (MLE) For hidden unit j: δ_j = o_j (1 - o_j) Σ_{k ∈ downstream(j)} w_kj δ_k, so Δw_ji = η δ_j x_ji
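
Putting the output-unit and hidden-unit rules together, a minimal sketch of one backpropagation update for a single-hidden-layer sigmoid network trained on squared error; bias terms are omitted and the learning rate is an illustrative assumption:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_step(x, t, W1, W2, eta=0.1):
    """One stochastic update: forward pass, output/hidden deltas, weight changes."""
    h = sigmoid(W1 @ x)                        # hidden-unit outputs o_h
    o = sigmoid(W2 @ h)                        # network outputs o_k
    delta_o = o * (1 - o) * (t - o)            # output units: o_k (1 - o_k)(t_k - o_k)
    delta_h = h * (1 - h) * (W2.T @ delta_o)   # hidden units: o_h (1 - o_h) sum_k w_kh delta_k
    W2 += eta * np.outer(delta_o, h)           # Delta w_kh = eta * delta_k * o_h
    W1 += eta * np.outer(delta_h, x)           # Delta w_hi = eta * delta_h * x_i
    return W1, W2
```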

  27. More on Backpropagation

  28. Structure • Motivation • Artificial neural networks • Learning: Backpropagation Algorithm • Overfitting • Expressive Capabilities of ANNs • Summary

  29. Overfitting in ANNs

  30. Dealing with Overfitting

  31. Dealing with Overfitting

  32. K-Fold Cross Validation

  33. Leave-One-Out Cross Validation
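
A sketch of K-fold cross-validation; leave-one-out is the special case where K equals the number of training examples. The train_fn and error_fn callables are assumptions standing in for the actual network training and evaluation code:

```python
import numpy as np

def k_fold_cv(X, t, train_fn, error_fn, k=10, seed=0):
    """Average held-out error over k folds; k = len(X) gives leave-one-out CV."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    errors = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        model = train_fn(X[train], t[train])              # fit on the other k-1 folds
        errors.append(error_fn(model, X[fold], t[fold]))  # evaluate on the held-out fold
    return float(np.mean(errors))
```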

  34. Structure • Motivation • Artificial neural networks • Backpropagation Algorithm • Overfitting • Expressive Capabilities of ANNs • Summary

  35. Expressive Capabilities of ANNs • Single Layer: Perceptron • XOR problem • 8-3-8 problem

  36. Single Layer: Perceptron

  37. Single Layer: Perceptron • Representational power of perceptrons: a hyperplane decision surface in the n-dimensional space of instances, w·x = 0 • Linearly separable sets • Logical functions: AND, OR, … • How to learn w? (see the sketch below)
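
One standard answer to "How to learn w?" is the perceptron training rule; a minimal sketch with targets in {-1, +1} (the learning rate and epoch count are illustrative assumptions):

```python
import numpy as np

def perceptron_train(X, t, eta=0.1, n_epochs=50):
    """Perceptron rule: w <- w + eta * (t - o) * x, where o = sign(w . x)."""
    X = np.hstack([X, np.ones((len(X), 1))])    # append a constant bias input
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for x, target in zip(X, t):
            o = 1.0 if np.dot(w, x) > 0 else -1.0
            w += eta * (target - o) * x         # no change when the prediction is right
    return w

# AND is linearly separable, so the rule converges
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([-1, -1, -1, 1], dtype=float)
print(perceptron_train(X, t))
```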

  38. Single Layer: Perceptron • Nonlinearly separable sets of examples?

  39. Multi-layer perceptron, XOR • k = y1 AND NOT y2 = (x1 OR x2) AND NOT (x1 AND x2) = x1 XOR x2 Boundaries: x1 + x2 - 0.5 = 0 (OR unit) and x1 + x2 - 1.5 = 0 (AND unit); a numeric check follows below.
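
A numeric check of the construction above: a threshold unit computing OR and one computing AND of the inputs, combined as y1 AND NOT y2, reproduce XOR. The 0.5 and 1.5 thresholds come from the decision boundaries on the slide:

```python
def step(a):                       # hard-threshold (perceptron) unit
    return 1 if a > 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        y1 = step(x1 + x2 - 0.5)   # OR unit,  boundary x1 + x2 - 0.5 = 0
        y2 = step(x1 + x2 - 1.5)   # AND unit, boundary x1 + x2 - 1.5 = 0
        k = y1 * (1 - y2)          # k = y1 AND NOT y2 = x1 XOR x2
        print(x1, x2, "->", k)     # prints the XOR truth table
```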

  40. Multi-layer perceptron

  41. Expressive Capabilities of ANNs

  42. Learning Hidden Layer Representations • 8-3-8 problem

  43. Learning Hidden Layer Representations • 8-3-8 problem

  44. Learning Hidden Layer Representations • 8-3-8 problem Autoencoder? (see the sketch below)
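
The 8-3-8 problem can indeed be read as a tiny autoencoder: eight one-hot inputs must be reproduced at eight outputs through only three hidden units, which forces a compact (roughly binary) hidden code. A minimal sketch of training it with the backpropagation rules from earlier in this transcript; the learning rate, epoch count, and initialization are illustrative assumptions:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
X = np.eye(8)                              # the eight one-hot input/target patterns
W1 = rng.normal(scale=0.5, size=(3, 8))    # 8 inputs -> 3 hidden units
b1 = np.zeros(3)
W2 = rng.normal(scale=0.5, size=(8, 3))    # 3 hidden units -> 8 outputs
b2 = np.zeros(8)

for _ in range(5000):                      # train the identity mapping with backprop
    for x in X:
        h = sigmoid(W1 @ x + b1)
        o = sigmoid(W2 @ h + b2)
        delta_o = o * (1 - o) * (x - o)    # target equals the input
        delta_h = h * (1 - h) * (W2.T @ delta_o)
        W2 += 0.3 * np.outer(delta_o, h); b2 += 0.3 * delta_o
        W1 += 0.3 * np.outer(delta_h, x); b1 += 0.3 * delta_h

print(np.round(sigmoid(W1 @ X[0] + b1), 2))   # learned 3-value hidden code for pattern 1
```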

  45. Training

  46. Training

  47. Training

  48. Neural Nets for Face Recognition

  49. Learning Hidden Layer Representations

  50. Learning Hidden Layer Representations
