
Advancing Motivated Learning with Goal Creation

This paper explores enhancements to motivated learning through the creation of abstract goals and motivations. It discusses bias calculation, probabilistic goal selection, and determining desired resource levels. A comparison to reinforcement learning algorithms is also presented.





Presentation Transcript


  1. Advancing Motivated Learning with Goal Creation James Graham¹, Janusz A. Starzyk¹,², Zhen Ni³ and Haibo He³. ¹School of Electrical Engineering and Computer Science, Ohio University, Athens, OH, USA; ²University of Information Technology and Management, Rzeszow, Poland; ³Electrical, Computer, and Biomedical Engineering, University of Rhode Island, Kingston, RI, USA

  2. Overview • Introduction • Enhancements to Motivated Learning • Bias calculation • Use of desirability and availability • Probabilistic goal selection • Desired Resource Levels • Resource level as an optimization problem • Resource dependencies • Desirability calculations • Comparison to RL algorithms • Conclusions

  3. Motivated Learning • Controlled by underlying “primitive” motivations • Builds on these motivations to create additional “abstract” motivations • Unlike in RL, the focus is not on maximizing externally set rewards, but on intrinsic rewards and on creating mission-related new goals and motivations. (Figure: motivation hierarchy; levels labeled Intrinsic, Intrinsic, Intrinsic, Extrinsic.)

  4. Improvements to ML • Bias/Pain calculations • Resource availability • Learning to select actions • Probabilistic goal selection • Determining desired resource levels

  5. Significance of bias signals • Initially we only have primitive needs (no biases) • Bias is a foundation for the creation of new needs • Bias is a preference for or aversion to something (a resource or an action) • Bias results from an existing need being helped or hurt by a resource or action • The level of bias is measured relative to the availability of a resource or the likelihood of an action

  6. Bias based on availability and desirability • Availability-based bias (formulas shown on slide): Rd is the desired resource value (at a sensory input si), Rc is the current resource value, A is the availability calculation, dc is the current distance to another agent, and dd is the desired (comfortable) distance to another agent
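The availability formulas themselves appear only as images on the slide, so the following Python sketch is an assumption: it treats availability as the ratio of the current value to the desired value, clipped to [0, 1], for both resource levels and inter-agent distance.

```python
# Illustrative sketch only; the slide's exact availability formulas are not in the transcript.

def resource_availability(r_current: float, r_desired: float) -> float:
    """Availability of a resource: 1 when the current level Rc meets or
    exceeds the desired level Rd, falling toward 0 as the resource runs out."""
    if r_desired <= 0:
        return 1.0  # nothing is desired, so the need is trivially satisfied
    return min(max(r_current / r_desired, 0.0), 1.0)

def distance_availability(d_current: float, d_desired: float) -> float:
    """Availability based on distance to another agent: 1 at (or beyond) the
    comfortable distance dd, smaller as the other agent gets closer."""
    if d_desired <= 0:
        return 1.0
    return min(max(d_current / d_desired, 0.0), 1.0)
```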

  7. Bias based on availability and desirability • Bias for a desired resource • Bias for a desired action • Bias for an undesired action • Bias for an undesired resource
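The four bias formulas are likewise only shown as images. Below is a minimal sketch of the common idea, under the assumption that bias strength grows when a desired item is scarce and an undesired item is plentiful, with the sign encoding preference versus aversion.

```python
# Hypothetical bias forms; the slide's actual expressions are not reproduced here.

def bias_desired_resource(availability: float) -> float:
    return 1.0 - availability   # the scarcer a desired resource, the stronger the preference for it

def bias_undesired_resource(availability: float) -> float:
    return -availability        # the more plentiful an undesired resource, the stronger the aversion

def bias_desired_action(likelihood: float) -> float:
    return 1.0 - likelihood     # a helpful but rarely taken action attracts a stronger preference

def bias_undesired_action(likelihood: float) -> float:
    return -likelihood          # a harmful, frequently taken action attracts a stronger aversion
```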

  8. Probabilistic goal selection • Uses normalized wPG weights to select actions based on probability. • However, the previous wPG calculation could lead to weight saturation at αg, so we used the update shown on the slide instead. • This causes the weights to saturate at (3/π)atan(ds/dŝ) • (ds/dŝ) measures how useful an action is at restoring a resource
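A sketch of the selection step follows. The sampling procedure is assumed (the transcript only states that normalized wPG weights act as probabilities); the saturation level is the expression quoted on the slide.

```python
import math
import random

def select_goal(w_pg: list[float]) -> int:
    """Draw a goal index with probability proportional to its pain-goal weight.
    Assumes at least one weight is positive; negative weights are clipped to 0."""
    weights = [max(w, 0.0) for w in w_pg]
    return random.choices(range(len(weights)), weights=weights, k=1)[0]

def wpg_saturation_level(ds: float, ds_hat: float) -> float:
    """Ceiling of a wPG weight as quoted on the slide: (3/pi)*atan(ds/ds_hat),
    where ds/ds_hat measures how useful the action is at restoring the resource."""
    return (3.0 / math.pi) * math.atan(ds / ds_hat)
```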

  9. Probabilistic goal selection • wPG weights (plot shown on slide) • The weights saturate at the level determined by (ds/dŝ) as their updates tend toward zero

  10. Probabilistic goal selection • Here we show how the wBP weights are affected by the different goal selection approaches. (Plots shown on slide: without probabilistic selection vs. with probabilistic selection.)

  11. Determining desired resource levels • Desired values should be set according to the agent’s needs. • To begin, the agent is given the initial “primitive” resource level, Rdp. • The agent must learn the rate at which “desired” resources are used (∆p). • The agent can use its knowledge of the environment to set the desired resource levels. • Resource levels are established only for resources that the agent cares about. • The frequency of performing tasks cannot be too great as the agent’s time is limited. The agent also needs to “learn”.
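A small sketch of how the consumption rate Δp might be estimated follows. The transcript does not specify the estimator; an exponential moving average of the observed per-step drop is one simple, assumed choice.

```python
class UsageRateEstimator:
    """Tracks the per-step consumption rate (delta_p) of one resource."""

    def __init__(self, smoothing: float = 0.05):
        self.smoothing = smoothing   # how quickly new observations override old ones
        self.delta_p = 0.0           # current estimate of the consumption rate
        self._last_level = None

    def observe(self, level: float) -> float:
        """Update the estimate from the latest observed resource level."""
        if self._last_level is not None:
            drop = max(self._last_level - level, 0.0)  # ignore upward jumps from restorations
            self.delta_p += self.smoothing * (drop - self.delta_p)
        self._last_level = level
        return self.delta_p
```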

  12. Determining desired resource levels • To establish the optimum level of desired resources, we solve an optimization problem (objective shown on slide) • subject to constraints, including that the sum of all restoration frequencies is less than 1 • where the restoration frequency is defined per resource (formula shown on slide)
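The objective, constraints, and frequency formula are visible only as equations on the slide, so the following sketch is an assumption: it takes the restoration frequency of resource i to be Δp_i / R_d,i, keeps the frequencies within the time budget (sum below 1), bounds each level from below by the primitive level R_dp, and uses a purely illustrative objective (keep the total desired stock small).

```python
import numpy as np
from scipy.optimize import minimize

delta_p = np.array([0.2, 0.5, 0.1])   # learned consumption rates (example values)
r_dp = np.array([0.3, 0.3, 0.3])      # primitive (initial) desired levels (example values)

def total_stock(r_d: np.ndarray) -> float:
    # Illustrative objective: keep the total desired stock as small as possible.
    return float(np.sum(r_d))

constraints = [
    # Time budget: the restoration frequencies delta_p_i / R_d_i must sum to less than 1.
    {"type": "ineq", "fun": lambda r_d: 1.0 - np.sum(delta_p / r_d)},
]
bounds = [(level, None) for level in r_dp]  # never set a desired level below the primitive one

result = minimize(total_stock, x0=np.ones_like(r_dp), bounds=bounds, constraints=constraints)
print("desired levels:", result.x)
print("restoration frequencies:", delta_p / result.x)
```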

  13. Determining desired resource levels - example • The agent starts with the levels for multiple resources set to the initially observed environment state. • As it learns to use specific resources, it adjusts the levels at which it wants to maintain those resources. • Each resource equilibrates to a different level

  14. Reinforcement Learning • Reinforcement learning maximizes external reward • Learns by approximating value functions • Usually a single function • May include “subgoal” generation and “curiosity” • Primarily reactive • Objectives are set by the designer

  15. Motivated Learning • Controlled by underlying motivations • Uses existing motivations to create additional “abstract” motivations • The ML focus is not on maximizing externally set objectives (as in RL), but on learning new motivations and on building and supporting its internal reward system • Minimax – minimize pain • Primarily deliberative

  16. Comparison to other RL algorithms • Algorithms tested: • Q-learning • SARSA • Hierarchical RL – MAXQ • Neural Fitted Q Iteration (NFQ) • TD-FALCON

  17. Comparison to other RL algorithms – test environment • Testing environment is a simplified version of what we use in NeoAxis. • In NeoAxis we have pains, tasks, triggering pains, and (maybe) NACs. • Comparison test is a “Black Box” that has no NACs and is run as a simplified environment making RL algorithms more compatible and easier to interface.

  18. Comparison to other RL algorithms - results • Algorithms tested: Q-learning, SARSA, HRL, ML (results plot shown on slide; traces labeled ML; HRL, Q-Learning, SARSA; NFQ; TD-Falcon)

  19. NFQ Results • Note the highlighted lines and observe both when they occur and their general profile

  20. Conclusion • Designed and implemented several enhancements to the Motivated Learning architecture • Bias calculations • Goal Selection • Setting desired resource levels • Compared ML to several RL algorithms using a basic test environment and simple reward scenario. • ML achieved higher average reward faster than other algorithms tested

  21. Questions?

  22. Bias signal calculation for resources • For resource-related pain (formula shown on slide): Rd is the desired resource value (at a sensory input si), Rc is the current resource value, ε is a small positive number, γ regulates how quickly the pain increases, and δr = 1 when the resource is desired, δr = -1 when it is undesired, δr = 0 otherwise
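The pain formula itself is an image on the slide, so the following is only a hedged sketch built from the listed quantities; the functional form is assumed, not taken from the paper.

```python
def resource_pain(r_desired: float, r_current: float, delta_r: int,
                  gamma: float = 2.0, eps: float = 1e-6) -> float:
    """Illustrative resource-related pain: grows as a desired resource (delta_r = +1)
    falls below its desired level Rd, flips sign for an undesired resource
    (delta_r = -1), and is zero for neutral ones (delta_r = 0). gamma controls
    how quickly the pain increases; eps avoids division by zero."""
    shortfall = max(r_desired - r_current, 0.0) / (r_desired + eps)
    return delta_r * shortfall ** gamma
```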

  23. Learning and selecting actions • Goals are selected based on pain-goal weights (update rule shown on slide): • δp indicates how the associated pain changed • ∆a, outside of μg, ensures the weights stay below the ceiling of αg = 1 • μg determines the rate of change
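The exact update rule is again shown only as an equation on the slide; the sketch below assumes a simple form in which the step toward the ceiling αg shrinks as the weight approaches it, consistent with the listed roles of δp, μg, and αg.

```python
def update_pain_goal_weight(w: float, delta_p: float,
                            mu_g: float = 0.1, alpha_g: float = 1.0) -> float:
    """Hypothetical pain-goal weight update: delta_p is the observed reduction in
    the associated pain (positive when the action helped), mu_g is the learning
    rate, and the (alpha_g - w) factor keeps the weight below the ceiling alpha_g."""
    return w + mu_g * delta_p * (alpha_g - w)
```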

  24. Comparing Reinforcement Learning to Motivated Learning • Compare ML and RL
