
Learning Behaviourally Grounded State Representations for Reinforcement Learning Agents


Presentation Transcript


  1. Warning: Long title… Learning Behaviourally Grounded State Representations for Reinforcement Learning Agents Vinay Papudesi and Manfred Huber

  2. Staged skill learning involves: • To Begin: • “Skills” are innate reflexes and raw representation of the world. • The Process: • Abstract away details of learnt skills • Use these abstractions as part of a higher-level representation: • Behavioural results • Affordances • Rinse and repeat Introduction

  3. The state representation encodes only those aspects of the environmental state that have behavioural and reward implications in the context of the agent's current capabilities. • A compact representation • Becomes more and more abstract over time • But how to model this?... The Developmental Learner

  4. Three yummy flavours: • External (World) State Space (…maps to…) • Internal State Space (…composed of…) • Action State Spaces • Internal and External spaces are good friends: • Si ← I(Se), where Si is the internal state, Se is the external state and I is the mapping function • Objective: Don’t hard-code the mapping function, automate it! • Internal State Space is a vector of Action Spaces, one for each action the agent provides… State-Spaces
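As a rough illustration of the three spaces, the sketch below represents the internal state Si as a vector with one abstract entry per action, each derived from the raw external state Se by a per-action abstraction. The names and feature keys are hypothetical, not the authors' implementation.

```python
from typing import Callable, Dict, List

# Hypothetical sketch: the external (world) state Se is a raw feature dict,
# and the internal state Si = I(Se) is a tuple of per-action abstractions.
ExternalState = Dict[str, float]                      # Se: raw sensor readings
ActionAbstraction = Callable[[ExternalState], int]    # maps Se -> discrete symbol

def make_internal_state(se: ExternalState,
                        action_abstractions: List[ActionAbstraction]) -> tuple:
    """The mapping function I: one abstract value per available action."""
    return tuple(abstraction(se) for abstraction in action_abstractions)

# Example: two toy abstractions, one per action the agent provides.
abstractions = [
    lambda se: int(se["object_visible"] > 0.5),   # e.g. for a 'Find' action
    lambda se: int(se["object_distance"] < 1.0),  # e.g. for a 'GoTo' action
]

se = {"object_visible": 1.0, "object_distance": 3.2}
print(make_internal_state(se, abstractions))  # -> (1, 0)
```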

  5. An action space is defined as a vector of paired (indicator, predicator) conditions. • Conditions are task-agnostic • Can be reused for learning different tasks • Improvement over previous work • When an action is performed: • Signals a transition between internal states, S1 → S2. • Observes an outcome from the world, oʹ. • Two conditions are constructed: • Indicator: Cind(S2) = oʹ • Predicator: Cpre(S1) = oʹ Action Space
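A minimal sketch of how the two conditions could be recorded when an action fires: the indicator ties the observed outcome oʹ to the resulting internal state S2, and the predicator ties it to the originating state S1. The class and field names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical bookkeeping for one action's (indicator, predicator) conditions.
@dataclass
class ActionSpace:
    indicator: Dict[Tuple, str] = field(default_factory=dict)   # C_ind(S2) = o'
    predicator: Dict[Tuple, str] = field(default_factory=dict)  # C_pre(S1) = o'

    def record_transition(self, s1: Tuple, s2: Tuple, outcome: str) -> None:
        """After executing the action in S1 and landing in S2 with outcome o'."""
        self.indicator[s2] = outcome
        self.predicator[s1] = outcome

find = ActionSpace()
find.record_transition(s1=(0, 0), s2=(1, 0), outcome="object_in_view")
print(find.indicator, find.predicator)
```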

  6. The world state space is potentially vast • Must measure outcomes somehow • Genetic Algorithms (GAs) are used to train hierarchical, rule-based classifiers • What if an outcome cannot be accurately measured? • The classifiers simply flag the world state as non-deterministic. • The outcome is thus a triple: (success %, failure %, undetermined) Outcomes, Genetic Algorithms, Non-Determinism, oh my!
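A minimal way to represent the three-valued outcome is sketched below. In the paper these values come from GA-trained, rule-based classifiers; the container and threshold here are illustrative assumptions.

```python
from dataclasses import dataclass

# Illustrative container for an outcome estimate (success %, failure %, undetermined).
@dataclass
class Outcome:
    success: float       # estimated probability of success
    failure: float       # estimated probability of failure
    undetermined: float  # probability mass the classifier could not resolve

    def is_deterministic(self, threshold: float = 0.95) -> bool:
        """Treat the world state as deterministic only if one case dominates."""
        return max(self.success, self.failure) >= threshold

print(Outcome(success=0.6, failure=0.1, undetermined=0.3).is_deterministic())  # False
```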

  7. ‘Find’ action “Rotate 360° or until an object is visible”
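A toy sketch of such a primitive action; the sensing and turning calls are placeholders, not the authors' robot interface.

```python
import random

# Placeholder sensing/actuation; a real agent would query its camera and base.
def object_visible() -> bool:
    return random.random() < 0.2

def rotate(degrees: float) -> None:
    pass  # turn the robot in place by the given amount

def find(step: float = 15.0) -> bool:
    """Rotate in increments up to 360 degrees, stopping early if an object appears."""
    turned = 0.0
    while turned < 360.0:
        if object_visible():
            return True
        rotate(step)
        turned += step
    return object_visible()

print(find())
```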

  8. With the abstract state space constructed, the agent can now learn optimal policies for completing tasks. • Treat the problem as a Markov Decision Process (MDP). • From some internal state the agent must select an appropriate action to progress toward completing the task optimally. • Reinforcement learning is used to compute such policies: • Select the policy which maximises the expected future return. • Future reward is estimated from prior experience. Tasks
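For concreteness, here is a generic tabular Q-learning update over the abstract internal states (a standard RL sketch under assumed action names and parameters, not necessarily the exact algorithm used in the paper).

```python
from collections import defaultdict
import random

# Generic tabular Q-learning over (internal state, action) pairs; the abstract
# state space keeps this table small enough to be practical.
Q = defaultdict(float)
alpha, gamma, epsilon = 0.1, 0.95, 0.1
actions = ["find", "goto", "pick"]

def choose_action(state) -> str:
    if random.random() < epsilon:
        return random.choice(actions)                  # explore
    return max(actions, key=lambda a: Q[(state, a)])   # exploit

def update(state, action, reward, next_state) -> None:
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# One illustrative step:
s, s_next = (0, 0, 0), (1, 0, 0)
a = choose_action(s)
update(s, a, reward=-1.0, next_state=s_next)
```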

  9. Must acquire a Task Model • The agent interacts with the environment, recording experiences as it does so. • The internal source and destination states are updated with new conditions. • The reward function is re-computed as the average reinforcement value over all recorded experiences pertaining to the chosen action. • The learned model eventually converges on the true model The Task Model
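A minimal sketch of this bookkeeping, assuming simple dictionaries keyed by (state, action): record each experience, and estimate the reward as the running average of the reinforcement received for that pair.

```python
from collections import defaultdict

# Illustrative task-model bookkeeping: for each (state, action) pair, keep the
# observed successor states and the average reinforcement over all experiences.
transitions = defaultdict(set)     # (s, a) -> set of observed next states
reward_sum = defaultdict(float)    # (s, a) -> accumulated reinforcement
reward_count = defaultdict(int)    # (s, a) -> number of recorded experiences

def record_experience(s, a, s_next, reinforcement: float) -> None:
    transitions[(s, a)].add(s_next)
    reward_sum[(s, a)] += reinforcement
    reward_count[(s, a)] += 1

def estimated_reward(s, a) -> float:
    """Reward re-computed as the average over all recorded experiences."""
    n = reward_count[(s, a)]
    return reward_sum[(s, a)] / n if n else 0.0

record_experience((0,), "goto", (1,), reinforcement=1.0)
record_experience((0,), "goto", (1,), reinforcement=0.0)
print(estimated_reward((0,), "goto"))  # 0.5
```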

  10. Not all tasks can be optimally represented with this approach. • Actions are individually encapsulated; the knowledge contained within them is not shared among them. • E.g. ‘GOTO’ and ‘PICK’ • The solution is to build ‘bipartition’ states • Allow the GOTO task a condition on whether the item can be PICKed • … but only if the reward for doing so is significant and the condition is statistically stable (low variance) and deterministic. Task-Specific Conditions
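One way such a gate could look is sketched below: a task-specific condition (e.g. "the target can be PICKed" added to GOTO's state) is accepted only if the reward gain is significant, its variance is low, and the associated outcome is deterministic. The thresholds and function name are illustrative assumptions, not values from the paper.

```python
import statistics

# Hypothetical gate for adding a task-specific 'bipartition' condition.
def should_add_condition(rewards_with, rewards_without,
                         min_gain=0.5, max_variance=0.05,
                         determinism=0.95, observed_determinism=1.0) -> bool:
    gain = statistics.mean(rewards_with) - statistics.mean(rewards_without)
    stable = statistics.pvariance(rewards_with) <= max_variance   # low variance
    deterministic = observed_determinism >= determinism
    return gain >= min_gain and stable and deterministic

print(should_add_condition(rewards_with=[1.0, 0.9, 1.0],
                           rewards_without=[0.2, 0.1, 0.3]))  # True
```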

  11. Left: • A hard-coded, expert-designed state space and policy. • Right: • Dynamically acquired equivalent. Results - Foraging

  12. As the agent interacts with the environment, the proposed algorithm maintains near-constant state-space complexity because the representation is continually abstracted. Results – State Space Size

  13. The presented technique is comparable to manually-designed behaviour. • Domain-specific models are slower to converge. • Their state spaces are more complex and therefore harder to learn. Results – Policy Performance

  14. The paper describes an approach that constructs an abstract internal state space grounded in the set of actions the agent provides. Reinforcement learning is used to select actions for completing tasks. By applying an inherently epigenetic design, the authors have devised a developmental learner that produces results comparable to hand-rolled solutions. Task learning is performed in a bottom-up fashion (actions to tasks), but the representation of new tasks can thereafter be constructed top-down from previously acquired state abstractions. Conclusionary Sentiments
