Trials and Tribulations

Architectural Constraints on Modeling a Visuomotor Task within the Reinforcement Learning Paradigm

Subject of Investigation
  • How humans integrate visual object properties into their action policy when learning a novel visuomotor task.
    • BubblePop!
  • Problem: Too many possible questions…
  • Solution: Motivate behavioral research by looking at modeling difficulties.
    • Nonobvious crossroads in the modeling process
Approach
  • Since the task provides only a scalar performance signal, the model must use reinforcement learning.
    • Temporal-difference learning with backpropagation (TD-backprop); see the sketch after this list.
  • Start with an extremely simplified version of the task and add back the complexity once you have a successful model.
  • Analyze the representational and architectural constraints necessary for each model.
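As a reference point, here is a minimal sketch of TD(0) learning with backpropagation through one hidden layer. The layer sizes echo the architecture slide, but the learning rate, discount factor, and tanh nonlinearity are illustrative assumptions rather than details taken from the model.

```python
# Minimal TD(0)-with-backprop sketch. Learning rate, discount, and nonlinearity
# are illustrative assumptions, not values taken from the presentation.
import numpy as np

rng = np.random.default_rng(0)
N_IN, N_HID = 25, 8                       # 25 grid units, 8 hidden units
W1 = rng.normal(0.0, 0.1, (N_HID, N_IN))
W2 = rng.normal(0.0, 0.1, N_HID)
ALPHA, GAMMA = 0.05, 0.9

def value(x):
    """Forward pass: predicted expected reward for input vector x."""
    h = np.tanh(W1 @ x)
    return float(W2 @ h), h

def td_update(x, r, x_next, terminal):
    """One TD(0) step: backpropagate the TD error through both layers."""
    global W1, W2
    v, h = value(x)
    v_next = 0.0 if terminal else value(x_next)[0]
    delta = r + GAMMA * v_next - v                    # TD error
    dW2 = delta * h                                   # dv/dW2 = h
    dW1 = delta * np.outer(W2 * (1.0 - h ** 2), x)    # backprop through tanh
    W2 += ALPHA * dW2
    W1 += ALPHA * dW1
    return delta
```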
First Steps: Dummy World
  • 5x5 grid-world
  • 4 possible actions
    • Up, down, left, right
  • 1 unmoving target
  • Starting locations of target and agent randomly assigned
  • Fixed reward upon reaching the target, after which a new target is generated
  • Epoch ends after a fixed number of steps (a minimal environment sketch follows this list)
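A sketch of this dummy world as described above; the class name, epoch length, and reward magnitude are illustrative assumptions.

```python
# 5x5 grid world: 4 actions, one stationary target, fixed reward on contact
# (then a new target appears), epoch ends after a fixed number of steps.
import numpy as np

class DummyWorld:
    MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up, down, left, right

    def __init__(self, size=5, epoch_len=100, reward=1.0, rng=None):
        self.size, self.epoch_len, self.reward = size, epoch_len, reward
        self.rng = rng or np.random.default_rng()
        self.reset()

    def reset(self):
        """Randomly place agent and target and start a new epoch."""
        self.agent = tuple(self.rng.integers(0, self.size, 2))
        self._new_target()
        self.steps = 0
        return self.agent, self.target

    def _new_target(self):
        self.target = tuple(self.rng.integers(0, self.size, 2))

    def step(self, action):
        dr, dc = self.MOVES[action]
        r = min(max(self.agent[0] + dr, 0), self.size - 1)
        c = min(max(self.agent[1] + dc, 0), self.size - 1)
        self.agent = (r, c)
        reward = 0.0
        if self.agent == self.target:        # fixed reward, then a new target
            reward = self.reward
            self._new_target()
        self.steps += 1
        done = self.steps >= self.epoch_len  # epoch ends after a fixed step count
        return (self.agent, self.target), reward, done
```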
Dummy World Architectures

[Architecture diagram: a single output unit for expected reward, an 8-unit hidden layer, 25 units representing the grid (either the whole grid, allocentric, or agent-centered, egocentric), 4 action units, and a context layer used only in the egocentric version. The two grid encodings are sketched below.]
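A sketch of the two grid encodings compared on this slide, assuming the 4 action units are inputs alongside the 25 grid units (one plausible reading of the diagram) and omitting the egocentric context units for brevity. The sign convention for marking agent versus target is an arbitrary illustration.

```python
# Allocentric vs. egocentric input encodings for a 25-unit grid layer.
import numpy as np

def allocentric_input(agent, target, size=5):
    """Whole-grid encoding: agent and target marked at absolute positions."""
    grid = np.zeros((size, size))
    grid[agent] = 1.0
    grid[target] = -1.0          # illustrative: sign distinguishes the two
    return grid.ravel()

def egocentric_input(agent, target, size=5):
    """Agent-centered encoding: the agent sits implicitly at the center cell."""
    grid = np.zeros((size, size))
    center = size // 2
    r = center + (target[0] - agent[0])
    c = center + (target[1] - agent[1])
    if 0 <= r < size and 0 <= c < size:   # targets outside the window are lost,
        grid[r, c] = 1.0                  # one reason the ego net may need memory
    return grid.ravel()

def network_input(state_units, action, n_actions=4):
    """Concatenate the state units with a one-hot action, per the diagram."""
    a = np.zeros(n_actions)
    a[action] = 1.0
    return np.concatenate([state_units, a])
```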

Building in symmetry
  • Current architectures learn each action independently.
  • ‘Up’ is like ‘Down’, but different.
    • It shifts the world
  • 1 action, 4 different inputs (see the rotation sketch below)
    • “In which rotation of the world would you rather go ‘up’?”
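One way to cash this out, as an illustrative sketch rather than the presentation's actual mechanism: learn a value function for a single canonical action ('up'), evaluate it on all four rotations of the egocentric grid, and translate the winning rotation back into a move.

```python
# Exploiting 4-fold rotational symmetry: one value function for the canonical
# action "up", scored against the four rotations of the (flattened) ego grid.
import numpy as np

def best_action(ego_grid, value_up, size=5):
    """value_up scores a flattened grid for the single canonical action 'up'."""
    grid = ego_grid.reshape(size, size)
    scores = [value_up(np.rot90(grid, k).ravel()) for k in range(4)]
    k_best = int(np.argmax(scores))
    # np.rot90 rotates counterclockwise, so preferring "up" after k rotations
    # corresponds to these moves in the original frame:
    return ["up", "right", "down", "left"][k_best]
```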
World scaling
  • Scaled grid size up to 10x10
    • Not as unrealistic as one might think… (tile coding; a minimal sketch follows this list)
  • Scaled number of targets
    • Going from 1 target to 2 makes a difference; going from 2 to many does not.
  • Confirmed ‘winning-est’ representation
  • Added memory
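For reference, a minimal tile-coding sketch; the number of tilings and tiles per dimension are arbitrary illustrations. The point is that a few coarse, offset tilings give a compact distributed code for a position, so scaling the world up does not require one dedicated input unit per cell.

```python
# Minimal tile coding for a position in a size x size world.
import numpy as np

def tile_code(x, y, size=10, n_tilings=4, tiles_per_dim=4):
    """Binary feature vector with exactly one active tile per tiling."""
    features = np.zeros(n_tilings * tiles_per_dim * tiles_per_dim)
    tile_width = size / tiles_per_dim
    for t in range(n_tilings):
        offset = t * tile_width / n_tilings           # each tiling is shifted
        tx = int(min((x + offset) // tile_width, tiles_per_dim - 1))
        ty = int(min((y + offset) // tile_width, tiles_per_dim - 1))
        features[t * tiles_per_dim ** 2 + tx * tiles_per_dim + ty] = 1.0
    return features
```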
No low-hanging fruit: The ripeness problem
  • Added a ‘ripeness’ dimension to the target and changed the reward function:

    if target.ripeness > 0.60:
        reward = 1
    else:
        reward = -0.66667

How the problem occurs (see the exploration sketch below):

  • At a high exploration temperature the agent moves, and pops, essentially at random.
  • Random pops net roughly zero reward: with ripeness spread roughly uniformly, about 40% of pops earn +1 and 60% earn -2/3, which cancel in expectation.
  • By the time the temperature has annealed down, the agent has already learned to ignore the target entirely.
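For concreteness, a sketch of Boltzmann (softmax) action selection with an annealed exploration temperature; the initial temperature, decay rate, and floor are illustrative assumptions.

```python
# Softmax (Boltzmann) action selection with a geometrically annealed temperature:
# high temperature -> near-random actions, low temperature -> near-greedy ones.
import numpy as np

def softmax_action(q_values, temperature, rng=np.random.default_rng()):
    z = np.asarray(q_values, dtype=float) / temperature
    z -= z.max()                      # numerical stability
    p = np.exp(z)
    p /= p.sum()
    return int(rng.choice(len(q_values), p=p))

def annealed_temperature(step, t0=2.0, decay=0.999, floor=0.05):
    """High early in learning, decaying toward a small floor."""
    return max(t0 * decay ** step, floor)
```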
A psychologically plausible solution
  • No feedback for ‘almost ripe’ pops
  • So how could we anneal our ripeness criterion?
  • Anneal the amount you care about unripe pops (sketched below)
  • Differentiate internal and external reward functions
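One way to realize this, as a sketch under assumed numbers (the ramp length and linear schedule are not from the slides): keep the external reward fixed and anneal the unripe-pop penalty into the internal reward as learning proceeds.

```python
# External reward stays fixed; the internal reward phases in the unripe penalty.
def external_reward(ripeness):
    return 1.0 if ripeness > 0.60 else -0.66667

def internal_reward(ripeness, step, ramp=5000):
    care = min(step / ramp, 1.0)      # how much the agent "cares" about unripe pops
    if ripeness > 0.60:
        return 1.0
    return care * -0.66667            # penalty is near zero early, full strength later
```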
Future directions
  • Investigate how the type of ripeness difficulty impacts computational demands.
    • Difficulty due to reward schedule vs. perceptual acuity vs. redundancy vs. conjunctiveness vs. ease of prediction
  • How to handle the ‘feature binding problem’ in this context
    • Emergent binding through deep learning?
  • Just keep increasing complexity and see what problems crop up.
    • If the model gets to human-level performance without a hitch, then that’d be pretty good too.
Summary & discussion
  • Egocentric representations pay off in this domain, even with the added memory cost.
    • In any domain with a single agent?
  • Symmetries in the action space can be exploited to greatly expedite learning.
    • Could there be a general mechanism for detecting such symmetries?
  • Difficult reward functions might be learnt via annealing internal reward signals.
    • How could we have this annealing emerge from the model?