Neural Networks - PowerPoint PPT Presentation

Presentation Transcript

  1. Neural Networks Mundhenk and Itti, 2008 CS460, Fall 2008, L. Itti

  2. What are neural networks? • Neural networks are diverse and do many things. • Some are meant to solve AI problems such as classification. • Some are meant to simulate the workings of the brain and the nervous system for biological research.

  3. What are neural networks? • Neural networks use many nodes (neurons) connected by edges (axons, dendrites) to transform an input into a desired output. • The neurons and their edges are adjusted, frequently through gradual changes, to train the neural network. • Neural networks can be trained in many ways: • Actor / Critic Learning • External reinforcement • Hebbian Learning (association) • Internal reinforcement • Linear Algebra (closed-form solution) • We tend to use gradient-descent-like learning to make incremental changes over time. • We can introduce Boltzmann-like mechanisms in some networks.

  4. Examples of neural networks (we won’t cover these, but they are important) • Classical McCulloch-Pitts model (1943) • The first neural network model devised. • Very basic binary neuron model designed to test the early feasibility of neural computation. • Hopfield Networks (1982) • Are associative networks • Can be used to solve the TSP (but not very well) • Kohonen Networks (1982) • Classify things based on similarity • Need a metric for the properties of things in order to say how similar they are.
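
As a minimal sketch of the binary neuron idea behind the McCulloch-Pitts model, the unit below outputs 1 when the weighted sum of its binary inputs reaches a threshold. The function name `mp_neuron` and the particular weights and thresholds are illustrative assumptions, not taken from the slides.

```python
def mp_neuron(inputs, weights, threshold):
    """Binary threshold unit: output 1 if the weighted input sum reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# With unit weights, the threshold alone selects the logic function (assumed values).
def logical_and(a, b):
    return mp_neuron([a, b], [1, 1], threshold=2)

def logical_or(a, b):
    return mp_neuron([a, b], [1, 1], threshold=1)

and_table = [logical_and(a, b) for a in (0, 1) for b in (0, 1)]
or_table = [logical_or(a, b) for a in (0, 1) for b in (0, 1)]
```

The same unit computes AND or OR depending only on its threshold, which is the sense in which such neurons were used to test the feasibility of neural computation.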

  5. Single Layer Perceptron [Figure: scatter plot of Accountants, Engineers, and Hair Dressers along two axes, Good at Math and Likes Star Trek]

  6. Single Layer Perceptron [Figure: the same plot divided into decision regions Ri, Rj, and Rk]

  7. Single Layer Perceptron [Figure: single-layer network with inputs x0, x1 … xd (Likes Star Trek, Good at Math), weights wki, and outputs y1 … yc (Engineer / Hair Dresser / Accountant?)]

  8. Single Layer Perceptron Training involves minimizing the error in the network. That is, we seek to minimize the difference between the expected output and what actually comes out of the network. We do this by changing the weights in the network in a logical fashion. [Equation labels: target variable (output); vector of activations (input); learning rate; update by gradient descent (perceptron criterion)]
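
The slide's update equation did not survive in this transcript. The sketch below shows one common form of a perceptron-criterion update, w <- w + eta * x * t for each misclassified sample, with targets coded as -1 and +1. That coding, the function names, and the toy data are assumptions made for illustration.

```python
def train_perceptron(samples, targets, eta=0.1, max_epochs=100):
    """Perceptron-criterion learning for targets in {-1, +1}.

    Each sample is an activation vector whose first entry is a constant 1,
    so the bias is just another weight.
    """
    w = [0.0] * len(samples[0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in zip(samples, targets):
            y = sum(wi * xi for wi, xi in zip(w, x))
            if y * t <= 0:  # misclassified (or exactly on the boundary)
                # Gradient-descent step on the perceptron criterion.
                w = [wi + eta * xi * t for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # every sample is on the correct side
            break
    return w

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1

# Hypothetical data: (bias, good_at_math, likes_star_trek); +1 = engineer, -1 = hair dresser.
X = [(1, 0.9, 0.8), (1, 0.8, 0.9), (1, 0.2, 0.1), (1, 0.1, 0.3)]
T = [1, 1, -1, -1]
w = train_perceptron(X, T)
predictions = [predict(w, x) for x in X]
```

Only misclassified samples change the weights, so training stops as soon as every sample falls on the correct side of the boundary.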

  9. Notes about Single Layer Perceptrons • The classification problem must be linearly separable (more on that on the next slide). • Can be solved using a closed-form linear algebra solution, with no need to actually train. • Are simple to create and use, but are limited in power.
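
One way to read the closed-form remark is as a least-squares fit: choose the weights that minimize the sum of squared errors by solving the normal equations (X^T X) w = X^T t. The sketch below assumes that reading. The helper names and the toy data are illustrative, and the small Gauss-Jordan solver is there only to keep the example free of external libraries.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def least_squares_weights(samples, targets):
    """Weights minimizing the sum of squared errors: solve (X^T X) w = X^T t."""
    d = len(samples[0])
    xtx = [[sum(x[i] * x[j] for x in samples) for j in range(d)] for i in range(d)]
    xtt = [sum(x[i] * t for x, t in zip(samples, targets)) for i in range(d)]
    return solve_linear_system(xtx, xtt)

# Hypothetical data: (bias, good_at_math, likes_star_trek); +1 = engineer, -1 = hair dresser.
X = [(1, 0.9, 0.8), (1, 0.8, 0.9), (1, 0.2, 0.1), (1, 0.1, 0.3)]
T = [1, 1, -1, -1]
w = least_squares_weights(X, T)
outputs = [sum(wi * xi for wi, xi in zip(w, x)) for x in X]
```

The sign of each output gives the predicted class, and no iterative training loop is involved.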

  10. Linear Separability • Sometimes, some areas cannot be separated by a single line. • Can we draw a single line that divides the state Ohionois from Indiana? • Single layer perceptrons fail at this, since each boundary they draw is a single straight line.
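
XOR is a standard small example of a problem that is not linearly separable. It does not appear in the slides and is used here only as an assumed illustration. The sketch trains the same single-layer perceptron on AND (separable) and on XOR (not separable) and compares the resulting accuracy.

```python
def perceptron_accuracy(samples, targets, eta=1.0, epochs=1000):
    """Train a single-layer perceptron, then return the fraction classified correctly."""
    w = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            if t * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + eta * xi * t for wi, xi in zip(w, x)]
    correct = sum(1 for x, t in zip(samples, targets)
                  if t * sum(wi * xi for wi, xi in zip(w, x)) > 0)
    return correct / len(samples)

# The first entry of each sample is a constant 1 (bias input).
inputs = [(1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
and_targets = [-1, -1, -1, 1]   # linearly separable
xor_targets = [-1, 1, 1, -1]    # not linearly separable

and_accuracy = perceptron_accuracy(inputs, and_targets)
xor_accuracy = perceptron_accuracy(inputs, xor_targets)
```

No amount of extra training helps on XOR, because no single straight line puts all four points on the correct side.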

  11. Single Layer Perceptron [Figure: scatter plot of Accountants, College Students, and Engineers along the axes Good at Math and Likes Star Trek] No longer linearly separable!

  12. Solutions: Multi Layer Perceptron (Back Propagation Neural Network in practice) [Figure: scatter plot of Accountants, College Students, and Engineers along the axes Good at Math and Likes Star Trek]

  13. Multi Layer Perceptron [Figure: two-layer network. Inputs x0, x1 … xd (Good at Math, Likes Star Trek) feed through weights wij into sigmoid (S) hidden units z0, z1 … zm, which feed through weights wkj into sigmoid (S) output units y1 … yc.] Note: the bias has been absorbed into the computation weights.
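
The forward pass through the network in this diagram can be sketched as follows, with the bias absorbed by fixing x0 and z0 at 1. The layer sizes, weight values, and function names are hypothetical choices for illustration.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def layer(inputs, weights):
    """One layer: each row of `weights` feeds one sigmoid unit."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def forward(x, w_hidden, w_output):
    """Forward pass; x[0] and z[0] are fixed at 1 so the bias is an ordinary weight."""
    z = [1.0] + layer(x, w_hidden)   # hidden activations z0 .. zm
    y = layer(z, w_output)           # outputs y1 .. yc
    return z, y

# Hypothetical sizes and weights: d = 2 inputs, m = 2 hidden units, c = 1 output.
x = [1.0, 0.9, 0.8]                               # x0 = 1, good_at_math, likes_star_trek
w_hidden = [[0.1, 0.5, -0.3], [-0.2, 0.4, 0.6]]   # one row per hidden unit
w_output = [[0.05, 1.0, -1.0]]                    # one row per output unit
z, y = forward(x, w_hidden, w_output)
```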

  14. Multi Layer Perceptron • Adding the sigmoid after each node takes on additional importance for a multi layer perceptron. • Without the sigmoid, a multi layer perceptron is the product of two linear systems. The product of linear systems is itself a linear system, so without a sigmoid non-linearity, adding a second layer buys us nothing. • A non-linearity is the icing on the multi-layer cake that is the multi-layer perceptron. • The sigmoid on the final output layer is optional, depending on the application.
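
The claim that two linear layers collapse into one can be checked numerically. In this sketch the two weight matrices and the input are arbitrary made-up values. Applying the layers in sequence gives the same result as applying their single combined product matrix.

```python
def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

w1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]   # layer one: 2 inputs -> 3 hidden units
w2 = [[1.0, 0.0, 2.0], [-1.0, 1.0, 0.0]]     # layer two: 3 hidden units -> 2 outputs
x = [2.0, -3.0]

two_layers = matvec(w2, matvec(w1, x))       # apply the layers in sequence
one_layer = matvec(matmul(w2, w1), x)        # apply the single combined matrix
```

Because `two_layers` and `one_layer` agree for any input, a second layer adds expressive power only once a non-linearity such as the sigmoid sits between the two.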

  15. Multi Layer Perceptron • Main Problem: • Assignment of credit in the “hidden” layers • How do you find the weights for the hidden layers when their output is transformed by the next layer?

  16. Assignment of Credit • Example: • A mad scientist wants to make billions of dollars by controlling the stock market. He will do this by controlling the stock purchases of several wealthy people. The scientist controls information that insider CEOs can give to brokers, who can forward it to wealthy people. He also has a device to control how much different people trust each other. • Using his ability to input insider information and control trust between people, he will control the purchases made by wealthy individuals and so manipulate the stock market.

  17. [Figure: Planted Insider Information → Fat Cat CEOs → (information weighted by trust) → Brokers → (information weighted by trust) → Rich Dudes → Purchases]

  18. Idea • Isolate information pathways that produce the purchases you desire and leave them alone. Isolate pathways that produce purchases you do not desire and alter their trust. • Carefully adjust trust over several attempts until purchases are what you desire (gradient descent). • Remember: adjustments are (usually) derived from the sum of squared error differences. • Adjust weights in proportion to the difference between what you got out of the network and what you expected. • Note: There are other methods for determining error.
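
A sum-of-squares error and the gradient-descent adjustment it leads to are commonly written as below. The symbols are assumptions chosen to match the earlier diagrams: y_k for the network outputs, t_k for the expected (target) outputs, w for any weight, and η for the learning rate.

```latex
E = \frac{1}{2} \sum_{k=1}^{c} \left( y_k - t_k \right)^2,
\qquad
\Delta w = -\eta \, \frac{\partial E}{\partial w}
```

Each weight moves a small step in the direction that reduces E, so outputs that are far from what was expected drive larger adjustments.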

  19. [Figure: information weighted by trust feeds three sigmoid (S) units that produce purchases. The units are 65%, 25%, and 95% correct, and the corresponding weight changes are: do some, do much, do not much. One connection is marked “Weaken”.]

  20. [Figure: the same three sigmoid (S) units (65%, 25%, and 95% correct), with connections labeled as high or low error contributors]

  21. Backward Propagation [Figure: the weight changes (do some, do much, do not much) are carried back toward the planted insider information. One connection is marked “Weaken”.]

  22. Single Update, Simple Case: The weights are changed by subtracting from each weight the error term, scaled by a learning rate and the input variable. [Equation labels: single update or batch; error; learning rate; layer one; layer two; error term; derivative of the activation function (sigmoid)]
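
The slide's equations are not preserved here, so the following is a sketch of a standard single-sample back propagation update for a two-layer sigmoid network with a sum-of-squares error. It is not a reconstruction of the slide. The weights, the training pair, and the function names are hypothetical.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, w1, w2):
    """x and z carry a leading 1 so that biases are ordinary weights."""
    z = [1.0] + [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    y = [sigmoid(sum(w * zj for w, zj in zip(row, z))) for row in w2]
    return z, y

def sse(y, t):
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))

def backprop_step(x, t, w1, w2, eta=0.5):
    """One single-sample update; returns new (w1, w2) without mutating the inputs."""
    z, y = forward(x, w1, w2)
    # Layer two error term: output error times the sigmoid derivative y(1 - y).
    delta_out = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, t)]
    # Layer one error term: output deltas propagated back through w2, times the
    # sigmoid derivative z(1 - z). z[0] is the constant bias unit, so it is skipped.
    delta_hid = [
        z[j] * (1.0 - z[j]) * sum(delta_out[k] * w2[k][j] for k in range(len(w2)))
        for j in range(1, len(z))
    ]
    # Subtract learning rate * error term * input from each weight.
    new_w2 = [[w - eta * dk * zj for w, zj in zip(row, z)]
              for row, dk in zip(w2, delta_out)]
    new_w1 = [[w - eta * dj * xi for w, xi in zip(row, x)]
              for row, dj in zip(w1, delta_hid)]
    return new_w1, new_w2

# Hypothetical weights and a single training pair.
x, t = [1.0, 0.9, 0.8], [1.0]
w1 = [[0.1, 0.5, -0.3], [-0.2, 0.4, 0.6]]
w2 = [[0.05, 1.0, -1.0]]

error_before = sse(forward(x, w1, w2)[1], t)
for _ in range(200):
    w1, w2 = backprop_step(x, t, w1, w2)
error_after = sse(forward(x, w1, w2)[1], t)
```

A batch version would sum the same weight changes over all training pairs before applying them.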

  23. NSL Matlab Back Propagation Model

  24. Pretty Outputs [Figure: outputs of Layer 1 and Layer 2]

  25. Double Layer Perceptron • Any given decision boundary can be approximated arbitrarily closely by a two-layer network having sigmoidal activation functions. • May use the logistic sigmoid or the tanh sigmoid. • May be thought of as a set of linear functions with unknown basis functions connected to it. (Thus, if you know the basis functions, you don’t need a double layer perceptron.) • Is frequently considered the second-best solution to many problems. Thus it is very flexible, but not necessarily optimal.
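
The logistic and tanh sigmoids are closely related: tanh(a) = 2σ(2a) − 1, where σ is the logistic function. A short numerical check of that identity follows. The function names and the range of test points are illustrative choices.

```python
import math

def logistic(a):
    return 1.0 / (1.0 + math.exp(-a))

def tanh_from_logistic(a):
    """tanh expressed as a shifted and rescaled logistic sigmoid."""
    return 2.0 * logistic(2.0 * a) - 1.0

# Largest disagreement with math.tanh over a grid of points in [-5, 5].
max_gap = max(abs(math.tanh(a) - tanh_from_logistic(a))
              for a in [i / 10.0 for i in range(-50, 51)])
```

Since one is a linear rescaling of the other, the choice between them changes the output range (0 to 1 versus -1 to 1) rather than what the network can represent.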

  26. Multi Layer Perceptron (Neural Networks for Pattern Recognition, Christopher M. Bishop, 1995, Oxford University Press, Oxford)

  27. (Neural Networks for Pattern Recognition, Christopher M. Bishop, 1995, Oxford University Press, Oxford)

  28. What about biological models? • Sometimes we create neural networks to simulate a brain (human or animal). • These models need to be biologically feasible so that we can draw conclusions from them about the workings of a natural neural mechanism. • They are frequently complex and computationally expensive, but not always. • They sometimes have direct applications. • Biological models sometimes reverse engineer processes in the human brain that we don’t otherwise know how to carry out.

  29. Example – Prey Selection • This is the Didday / Amari-Arbib model of prey selection. • How do animals such as dragonflies know how to select and snap at prey? • An insect may contrast against a background such as the sky. • The activity on the retina should be at a maximum there, but how does it stick out?

  30. Amari-Arbib Winner-Take-All Model (MaxSelector) This is the model with global inhibition (TMB2 Sec. 3.4) that restructures the Didday model (TMB2 Sec. 3.3), which has a whole layer of inhibitory neurons.

  31. The Two NSL Modules of the Maximum Selector Model The MaxSelector module (parent module) contains two interconnected child modules, Ulayer and Vlayer.

  32. Visualizing the MaxSelector [Figure: a spigot fills a leaky beaker that sits on a scale. Labels: rate of water flow, spigot control, trap door, weight.] The weight on the scale controls the spigot flow, which is faster for more weight.

  33. Visualizing the MaxSelector • Below a certain threshold, the trap door opens. • The trap door remains open unless the flow can get back above threshold. • Inflow and outflow balance at 1.

  34. Max Selector [Equation labels: Ulayer; Vlayer; time constant; leak; constants; step function]
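
The equations on this slide are not preserved. The sketch below assumes a commonly cited leaky-integrator form of the maximum selector: each Ulayer unit obeys tau_u du_i/dt = -u_i + w_u f(u_i) - w_m g(v) - h1 + s_i, and the single Vlayer unit obeys tau_v dv/dt = -v + w_n Σ f(u_i) - h2, where f is a step function and g is a ramp. All parameter values, the Euler integration, and the function names are assumptions made for illustration.

```python
def step(u, threshold=0.0):
    return 1.0 if u > threshold else 0.0

def ramp(v):
    return v if v > 0.0 else 0.0

def max_selector(inputs, steps=400, dt=0.05, tau_u=1.0, tau_v=1.0,
                 w_u=1.0, w_m=1.0, w_n=1.0, h1=0.1, h2=0.5):
    """Euler-integrate the assumed Ulayer/Vlayer leaky-integrator equations."""
    u = [0.0] * len(inputs)   # Ulayer membrane potentials, one per input
    v = 0.0                   # single Vlayer unit providing global inhibition
    for _ in range(steps):
        uf = [step(ui) for ui in u]   # Ulayer firing (step function)
        vf = ramp(v)                  # Vlayer firing
        du = [(-ui + w_u * fi - w_m * vf - h1 + si) / tau_u
              for ui, fi, si in zip(u, uf, inputs)]
        dv = (-v + w_n * sum(uf) - h2) / tau_v
        u = [ui + dt * d for ui, d in zip(u, du)]
        v = v + dt * dv
    return [step(ui) for ui in u]

# Hypothetical input strengths; the second unit receives the largest input.
winner = max_selector([0.5, 1.0, 0.2])
```

With these assumed values every unit switches on at first, the shared inhibition from the Vlayer then builds up, and the units with weaker input drop out one by one until only the unit with the largest input is still firing.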

  35. Dominey, Arbib, Joseph Model

  36. Review… • Takes in visual features • Decides where to saccade (move the eyes) to, given input features • Can learn sequences of eye movements • Uses reinforcement learning modulated by dopamine (a neurotransmitter)

  37. A little bio review • The brain contains many neurotransmitters. Today we are interested in: • Dopamine – Related to reinforcement learning and reward • GABA – A general-purpose neural inhibitor • Other neurotransmitters, such as acetylcholine and norepinephrine, may also be related to reinforcement learning but will not be covered.

  38. Dopamine fun-facts • Is related to reinforcement learning • Is related to reward • Is particularly present in the SNc and striatum • Connects strongly with GABAergic interneurons and may provide indirect inhibition through them • Is implicated in Parkinson’s disease and schizophrenia • Dopamine agonists include amphetamines as well as the precursor DOPA used to treat Parkinson’s disease. • Dopamine antagonists include “typical” neuroleptics used to treat schizophrenia (e.g. Thorazine, Haldol)

  39. GABA fun-facts • Is short for γ-aminobutyric acid • Is related to inhibition of neural activity • GABAergic interneurons provide most of the brain’s inhibition • They act as brakes and gates in the brain • Is implicated in anxiety and perhaps epilepsy • GABA agonists include benzodiazepines (e.g. Valium), used to treat anxiety and epilepsy

  40. A little bio review, cont’d • We will be interested in a few major parts of the brain (for instance) • Basal Ganglia (BG) – Plays a major role in critic and reinforcement • Prefrontal Cortex (PFC) – Is associated with working memory and task-related storage • Inferotemporal Cortex (IT) – Is related to the “what” and feature understanding • Posterior Parietal Cortex (PP) – Is related to the “how” and feature location

  41. Where are these things? [Figure: brain diagram labeling BG, Caudate, Thalamus, PP, Superior Colliculus, SNr/SNc, PFC, and IT]

  42. What this model should do • Learn the proper motor reaction to some perceived stimulus. • In this case, a visual feature tells us where we should saccade to. • Learn a sequence of motor reactions. • We augment the model so that it not only saccades to the correct location, but can also repeat sequences of saccades. • Utilize reinforcement learning to build meaningful neural connections that create correct saccades. • We will reinforce connections from IT to learn correct saccades. • We will reinforce reciprocal connections with PFC to learn sequences.

  43. Why this is cool • Demonstrates how we could learn complex sequences such as eye movements, speech, or grasping movements. • Take note of how this model may be generalized! • Shows how dopamine may work in the brain and gives us clues to its workings, including its involvement in several disease processes.

  44. Model Generalization: Scenario • You know that monkeys make better stock purchases than brokers, so you want to create a machine that uses monkeys to make stock purchases for you. You get a bunch of monkeys that will pull on a lever when they see stock X’s P/E graph in the Wall Street Journal. Each monkey will always pull the same amount, but you don’t know how much that will be. The pull determines how much of stock X to purchase or sell.

  45. [Figure: lever-pulling monkeys see a bar graph of stock X’s P/E]

  46. [Figure: lever-pulling monkeys see a bar graph of stock X’s P/E. The lever causes a current to run down some wires, which increases or decreases a peanut smell so that hamsters run faster or slower. This increases or decreases the flow of water, making buckets heavier or lighter.]

  47. [Figure: as on the previous slide, with the buckets now tipping a scale between SELL X and BUY X, which tells how much to buy or sell]

  48. [Figure: as on the previous slide, with a feedback step. Was this what you wanted? Punishment/reward: decrease the connections that were active during a wrong choice and increase the less active ones.]

  49. [Figure: circuit diagram with IT, V4, Weights, Caudate, SNc/Striatum, SNr, FEF, and Superior Colliculus, ending in Left and Right outputs] Where to saccade to? Note: This is way oversimplified!

  50. Model Working Overview (Single Saccade) • Abstract visual features come in through IT. • A signal is sent to the caudate and causes a random saccade. • PP/FEF notes how far off the saccade is and signals to create dopamine reinforcement. Weights from IT are adjusted according to the reinforcement signal. • We use Hebbian and anti-Hebbian rules to update the weights. • The next saccade is less random and causes a gradient in reinforcement. • IT connection weights to the caudate are adjusted until the saccade goes where it is supposed to. • The only learning at this stage is on the IT-to-caudate weights.
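
The loop described on this slide can be caricatured in a few lines. This is a heavily simplified sketch and makes no claim to match the actual model's equations. It assumes a scalar reward of +1 or -1 standing in for the dopamine signal, a Hebbian update when rewarded and an anti-Hebbian update when punished, and uniform noise to make early saccades random. All names and values are hypothetical.

```python
import random

def train_saccade_weights(targets, trials=200, eta=0.1, seed=0):
    """Reward-gated Hebbian learning of feature -> saccade-direction weights.

    targets[f] is the rewarded direction index for visual feature f.
    """
    rng = random.Random(seed)
    n_features = len(targets)
    n_directions = max(targets) + 1
    # w[d][f]: strength of the connection from feature f to direction d.
    w = [[0.0] * n_features for _ in range(n_directions)]
    for trial in range(trials):
        f = trial % n_features   # present the features in turn
        # Noisy activation makes early saccades effectively random.
        activation = [w[d][f] + rng.uniform(-0.5, 0.5) for d in range(n_directions)]
        chosen = activation.index(max(activation))
        reward = 1.0 if chosen == targets[f] else -1.0   # stand-in for the dopamine signal
        # Hebbian when rewarded (strengthen the active connection),
        # anti-Hebbian when punished (weaken it).
        w[chosen][f] += eta * reward
    return w

def saccade(w, f):
    """Noise-free readout: the direction with the strongest connection from feature f."""
    column = [row[f] for row in w]
    return column.index(max(column))

targets = [1, 0]   # hypothetical: feature 0 -> right (1), feature 1 -> left (0)
w = train_saccade_weights(targets)
learned = [saccade(w, f) for f in range(len(targets))]
```

As the correct connections grow stronger than the noise, the saccades stop being random and settle on the rewarded direction for each feature.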