IJCNN, International Joint Conference on Neural Networks, San Jose 2011. Motivated Learning in Autonomous Systems. Pawel Raif, Silesian University of Technology, Poland; Janusz A. Starzyk, Ohio University, USA. Outline. Reinforcement Learning (RL)
Motivated Learning in Autonomous Systems
Pawel Raif
Silesian University of Technology, Poland
Janusz A. Starzyk
Ohio University, USA
PROBLEMS IN „REAL WORLD” APPLICATIONS such as AUTONOMOUS SYSTEMS
„curse of dimensionality”
lack of motivation for development
ML can combine internal goal creation system (GCS)
and reinforcement learning (RL).
How to motivate a machine?
We suggest that the hostility of the environment
is the most effective motivational factor.
An intelligent agent learns how
to survive in a hostile environment.
1. ML agent is independent: it can act autonomously in its environment and is able to choose its own way of development.
2. ML agent’s interface to the environment is the same as an RL agent’s.
3. Environment is hostile to the agent.
4. Hostility may be active or passive (depleted resources).
5. Environment is fully observable.
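The passive form of hostility listed above can be illustrated with a minimal Python sketch. It assumes that a resource shrinks by a fixed amount each step and that the agent's pain grows linearly with the shortage; the `Resource` class, the `decay` parameter, and the `pain_from_shortage` function are hypothetical names chosen for illustration, not taken from the presentation.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """A resource in the environment (hypothetical model)."""
    name: str
    amount: float        # units currently available
    decay: float = 0.0   # assumed per-step passive depletion

    def step(self) -> None:
        # Passive hostility: the resource shrinks even if the agent does nothing.
        self.amount = max(0.0, self.amount - self.decay)


def pain_from_shortage(amount: float, desired: float) -> float:
    """Assumed linear pain: 0 when fully stocked, 1 when nothing is left."""
    if desired <= 0:
        return 0.0
    return max(0.0, 1.0 - amount / desired)


food = Resource("food", amount=4.0, decay=1.0)
pains = []
for _ in range(5):
    pains.append(pain_from_shortage(food.amount, desired=4.0))
    food.step()

print(pains)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

With these assumed numbers the pain climbs steadily as the resource runs out, which is the pressure that forces the agent to act.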
by solving a lower level pain
Hierarchy of resources (and the agent’s possible goals):
Resources are distributed all over the „grid world”.
The most abstract
The least abstract
Agent must localize resources and learn how to utilize them
This environment is:
Resources present in the environment
can be used to satisfy the agent’s needs
Resources are distributed all over the „grid world”.
Perception of resources
Internal need signals
Built by discovering useful resources and their dependencies,
the learned hierarchy of internal goals expresses the complexity of the environment.
Subjective sense of „lack of resources”
Relationships between internal goals do not have to form a linear hierarchy.
They may constitute a tree structure or a complex network of resource dependencies.
Top level resources
By discovering subsequent resources and their dependencies, the agent grows the complexity of its internal goal network.
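A network of resource dependencies can be sketched as a plain adjacency map. In the example below the resource names (`food`, `money`, `work`, `water`, `well`, `tools`) and the function `abstraction_levels` are hypothetical. The abstraction level of a resource is assumed to be its smallest number of dependency steps from a primitive need.

```python
from collections import deque

# resource -> resources that replenish it (all names are hypothetical)
depends_on = {
    "food": ["money"],
    "money": ["work"],
    "water": ["well"],
}


def abstraction_levels(depends_on, primitive):
    """Breadth-first search from the primitive needs.

    Level 0 is the least abstract; each dependency step adds one level.
    """
    level = {r: 0 for r in primitive}
    queue = deque(primitive)
    while queue:
        resource = queue.popleft()
        for dep in depends_on.get(resource, []):
            if dep not in level:
                level[dep] = level[resource] + 1
                queue.append(dep)
    return level


levels = abstraction_levels(depends_on, ["food", "water"])

# Discovering a new dependency grows the goal network by one node.
depends_on.setdefault("work", []).append("tools")
levels_after = abstraction_levels(depends_on, ["food", "water"])

print(levels)
print(levels_after["tools"])
```

Because the map allows several entries per resource and several primitive needs, the same structure covers a linear chain, a tree, or a general network.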
BUT each system may have unique experiences
(reflecting personal history of development)
Designer’s specified needs
Every resource discovered by the agent becomes a potential goal and is assigned a value function „level”.
The Goal Creation System establishes new goals and switches the agent’s activity between them.
The RL algorithm learns value functions at the different levels.
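The interplay between goal creation and per-goal value functions can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes tabular Q-learning as the RL algorithm, assumes the active goal is the one with the currently dominant pain, and uses hypothetical names (`GoalCreationSystem`, `add_goal`, `current_goal`, `hunger`, `money`).

```python
import random
from collections import defaultdict


class GoalCreationSystem:
    """Keeps one Q-table per goal and switches to the most painful goal."""

    def __init__(self, primitive_needs, actions, alpha=0.5, gamma=0.9):
        self.actions = list(actions)
        self.alpha, self.gamma = alpha, gamma
        self.pains = {need: 0.0 for need in primitive_needs}
        self.q = {need: defaultdict(float) for need in primitive_needs}

    def add_goal(self, resource):
        # A newly discovered resource becomes a potential goal with its own table.
        if resource not in self.q:
            self.pains[resource] = 0.0
            self.q[resource] = defaultdict(float)

    def current_goal(self):
        # Assumed switching rule: pursue the goal whose pain dominates.
        return max(self.pains, key=self.pains.get)

    def act(self, state, epsilon=0.1, rng=random):
        goal = self.current_goal()
        if rng.random() < epsilon:
            return goal, rng.choice(self.actions)
        table = self.q[goal]
        return goal, max(self.actions, key=lambda a: table[(state, a)])

    def learn(self, goal, state, action, reward, next_state):
        # Standard Q-learning update, applied only to the active goal's table.
        table = self.q[goal]
        best_next = max(table[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        table[(state, action)] += self.alpha * (target - table[(state, action)])


gcs = GoalCreationSystem(["hunger"], ["left", "right"])
gcs.add_goal("money")                      # resource discovered by the agent
gcs.pains.update(hunger=0.2, money=0.7)    # the "money" pain now dominates
gcs.learn("money", 0, "right", 1.0, 1)
goal, action = gcs.act(0, epsilon=0.0)
print(goal, action)  # money right
```

Keeping the tables separate means that experience gathered while pursuing one goal does not overwrite what was learned for another.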
at the beginning …
Initially the agent uses many iterations to reach a goal (red dots).
Sometimes it abandons the goal when another pain dominates.
Final runs are shorter and more successful.
… and at the end.
Comparing Primitive Pain Levels of RL & ML
Initially the RL agent learns better.
Its performance deteriorates as the resources are depleted.
Moving average of the primitive pain signal.
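For reference, a trailing moving average of a pain signal can be computed as in the sketch below. The window length and the averaging of a shorter prefix at the start of the signal are assumptions; the presentation does not state how the average was taken.

```python
def moving_average(signal, window):
    """Trailing moving average; early samples average over what is available."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running = 0.0
    for i, value in enumerate(signal):
        running += value
        if i >= window:
            running -= signal[i - window]  # drop the sample leaving the window
        averages.append(running / min(i + 1, window))
    return averages


smoothed = moving_average([1.0, 3.0, 5.0, 7.0], window=2)
print(smoothed)  # [1.0, 2.0, 4.0, 6.0]
```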
Effectiveness in terms of cumulative reward:
Reward determined by the designer of the experiment.
Single value function
Objectives set by designer
Maximizes the reward
Learning effort increases with complexity
Multiple value functions
One for each goal
Sets its own objectives
Solves minimax problem
Learns better than RL in complex environments
Acts when needed
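One possible reading of the minimax objective is that the agent picks the action that minimizes its largest predicted pain, in contrast to an RL agent that maximizes a single reward. The sketch below illustrates only that reading; the function `minimax_action`, the action names, and the predicted pain values are hypothetical.

```python
def minimax_action(predicted_pains):
    """Return the action whose worst (maximum) predicted pain is smallest.

    predicted_pains maps action -> {pain name: predicted level after the action}.
    """
    return min(predicted_pains, key=lambda a: max(predicted_pains[a].values()))


predicted = {
    "eat":   {"hunger": 0.1, "thirst": 0.8},
    "drink": {"hunger": 0.5, "thirst": 0.2},
    "rest":  {"hunger": 0.6, "thirst": 0.9},
}

best = minimax_action(predicted)
print(best)  # drink
```

Here `eat` removes the most hunger but leaves a thirst of 0.8, so `drink`, whose worst remaining pain is 0.5, is preferred under this rule.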
The motivated learning method, based on a goal creation system, can improve the learning of autonomous agents in a special class of problems.
ML is especially useful in complex, dynamic environments, where it works according to a learned hierarchy of goals.
Individual goals use well-known reinforcement learning algorithms to learn their corresponding value functions.
ML concerns building internal representations of useful environment percepts, through interaction with the environment.
ML switches the machine’s attention and sets intended goals, becoming an important mechanism for a cognitive system.
„The real danger is not that computers will begin to think like man, but that man will begin to think like computers.”
Sydney J. Harris