Adaptive Intelligent Mobile Robots

Leslie Pack Kaelbling

Artificial Intelligence Laboratory

MIT

Two projects
  • Making reinforcement learning work on real robots
  • Solving huge problems
    • dynamic problem reformulation
    • explicit uncertainty management
Reinforcement learning
  • Given a connection to the environment
  • Find a behavior that maximizes long-run reinforcement

[Diagram: the agent-environment loop. The agent receives an observation and a reinforcement signal from the Environment and emits an action.]

Why reinforcement learning?
  • Unknown or changing environments
  • Easier for a human to provide a reinforcement function than an entire behavior
Q-Learning
  • Learn to choose actions because of their long-term consequences
    • Given experience ⟨s, a, r, s′⟩, update Q(s, a) ← Q(s, a) + α [r + γ max_a′ Q(s′, a′) − Q(s, a)]
    • Given a state s, take the action a that maximizes Q(s, a)
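
For concreteness, a minimal tabular sketch of this update, assuming a discrete action set and epsilon-greedy exploration (both illustrative; the talk's real setting is continuous, handled on the following slides):

    import random
    from collections import defaultdict

    ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # step size, discount, exploration rate

    Q = defaultdict(float)                   # Q[(state, action)] -> value estimate

    def q_update(s, a, r, s_next, actions):
        # One Q-learning step from experience <s, a, r, s'>.
        best_next = max(Q[(s_next, a2)] for a2 in actions)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

    def act(s, actions):
        # Epsilon-greedy: usually take the action that maximizes Q(s, a).
        if random.random() < EPSILON:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(s, a)])
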
Does it Work?
  • Yes and no.
    • Successes in simulated domains: backgammon, elevator scheduling
    • Successes in manufacturing and juggling with strong constraints
    • No strong successes in more general online robotic learning
Why is RL on robots hard?
  • Need fast, robust supervised learning
  • Continuous input and action spaces
  • Q-learning slow to propagate values
  • Need strong exploration bias
Making RL on robots easier
  • Need fast, robust supervised learning
    • locally weighted regression (sketched below)
  • Continuous input and action spaces
    • search and caching of optimal action
  • Q-learning slow to propagate values
    • model-based acceleration
  • Need strong exploration bias
    • start with human-supplied policy
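
The slides name these techniques without detail; as an illustration, one standard form of locally weighted regression, fitting a small linear model around each query point (the Gaussian kernel, bandwidth, and ridge term are assumptions):

    import numpy as np

    def lwr_predict(query, X, y, bandwidth=0.5):
        # Weight stored samples by a Gaussian kernel on distance to the query,
        # then solve a weighted (ridge-regularized) least-squares fit.
        d2 = np.sum((X - query) ** 2, axis=1)
        w = np.exp(-d2 / (2 * bandwidth ** 2))       # nearby samples dominate
        Xb = np.hstack([X, np.ones((len(X), 1))])    # affine model: add bias column
        A = Xb.T @ (w[:, None] * Xb) + 1e-6 * np.eye(Xb.shape[1])
        beta = np.linalg.solve(A, Xb.T @ (w * y))
        return np.append(query, 1.0) @ beta

Fast, robust supervised learners of this kind are what the Q-value and model boxes in the diagrams below would be built from.
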
Start with human-provided policy

[Diagram: the human-supplied policy closes the loop. State flows from the Environment to the Human Policy, which sends an action back.]

Do supervised policy learning

[Diagram: the Human Policy still drives the Environment; each (s, a) pair it generates becomes a supervised training example for a learned Policy.]
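
A sketch of that supervised step, with a hypothetical nearest-neighbor learner standing in for whatever function approximator is actually used (all names are illustrative):

    import numpy as np

    class LearnedPolicy:
        def __init__(self):
            self.states, self.actions = [], []

        def train(self, state, action):
            # Record one demonstration step from the human-provided policy.
            self.states.append(np.asarray(state))
            self.actions.append(action)

        def act(self, state):
            # Imitate: return the action taken in the most similar logged state.
            d = [np.linalg.norm(np.asarray(state) - s) for s in self.states]
            return self.actions[int(np.argmin(d))]

    policy = LearnedPolicy()
    for s, a in [([0.1, 0.2], "forward"), ([0.9, 0.8], "turn_left")]:
        policy.train(s, a)                   # human drives, policy watches
    print(policy.act([0.15, 0.25]))          # -> "forward"
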

When the policy is learned, let it drive

[Diagram: the learned Policy now drives the Environment; the Human Policy remains in place as a source of training data.]

Q-Learning

[Diagram: experience is stored in a database D; an RL learner uses it to train a Q-Value function mapping (s, a) to a value v, while the Policy continues to drive the Environment.]

Acting based on Q values

[Diagram: given state s, evaluate Q(s, a1), Q(s, a2), ..., Q(s, an) for the candidate actions; take the max and output the index of the winning action a.]
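
In a continuous action space there is no finite a1, ..., an to enumerate; the "search and caching of optimal action" mentioned earlier can be sketched as sampling candidate actions and scoring each with Q (bounds, dimensionality, and candidate count are assumptions):

    import numpy as np

    def best_action(q_fn, state, n_candidates=64, low=-1.0, high=1.0, dim=2):
        # Sample candidate actions, score each with Q(s, a), keep the best.
        # A cache keyed on state could avoid repeating the search.
        candidates = np.random.uniform(low, high, size=(n_candidates, dim))
        scores = [q_fn(state, a) for a in candidates]
        return candidates[int(np.argmax(scores))]

    q = lambda s, a: -np.sum((a - 0.3) ** 2)    # hypothetical Q, peaked at a = 0.3
    print(best_action(q, state=np.zeros(2)))    # close to [0.3, 0.3]
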

Letting the Q-learner drive

[Diagram: the max over Q(s, ·) now selects the action that drives the Environment; experience still flows into D for RL training of the Q-Value function, and the Policy continues to be trained.]

Train policy with max Q values

[Diagram: for each state s′, the action that maximizes the Q-Value function becomes a supervised training target for the Policy; the Q function itself is still trained by RL from the experience in D.]

Add model learning

[Diagram: alongside the Q-Value function, a Model mapping (s, a) to (s′, r) is trained from the same experience database D.]

When model is good, train Q with it

[Diagram: once the learned Model is accurate, it generates simulated transitions that are fed to the RL learner to train the Q-Value function, so values propagate without waiting for real experience.]
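
The slides do not name an algorithm here, but the pattern matches Dyna-style model-based acceleration: replay simulated transitions from the learned model so values propagate faster than real experience alone allows. A sketch under assumed data structures (model[(s, a)] -> (r, s') and a list of visited pairs):

    import random

    def model_based_updates(Q, model, visited, actions, n_sim=50,
                            alpha=0.1, gamma=0.95):
        # Replay imagined transitions from the learned model.
        for _ in range(n_sim):
            s, a = random.choice(visited)       # a previously seen (s, a) pair
            r, s_next = model[(s, a)]           # imagined outcome
            best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
            q = Q.get((s, a), 0.0)
            Q[(s, a)] = q + alpha * (r + gamma * best_next - q)
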

Other forms of human knowledge
  • hard safety constraints on action choices
  • partial models or constraints on models
  • value estimates or value orderings on states
We will have succeeded if
  • It takes less human effort and total development time to
    • provide prior knowledge
    • run and tune the learning algorithm
  • than to
    • write and debug the program without learning
Test domain
  • Indoor mobile-robot navigation and delivery tasks
    • quick adaptation to new buildings
    • quick adaptation to sensor change or failure
    • quick incorporation of human information
Solving huge problems
  • We have lots of good techniques for small-to-medium-sized problems
    • reinforcement learning
    • probabilistic planning
    • Bayesian inference
  • Rather than scale them to tackle huge problems directly, formulate right-sized problems on the fly
Dynamic problem reformulation

[Diagram: a fixed-size working memory sits between perception and action; only the variables currently held in working memory enter the decision problem.]

Reformulation strategy
  • Dynamically swap variables in and out of working memory (see the sketch below)
    • a constant-sized problem is always tractable
    • adapt to changing situations, goals, etc.
  • Given more time pressure, decrease problem size
  • Given less time pressure, increase problem size
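
A toy sketch of the idea: rank candidate variables by relevance and let time pressure set how many fit in working memory (the relevance scores and the budget rule are invented for illustration):

    def reformulate(variables, relevance, time_budget):
        # More available time -> a larger, richer problem; less -> smaller.
        budget = max(2, int(time_budget * 10))
        ranked = sorted(variables, key=lambda v: relevance[v], reverse=True)
        return ranked[:budget]                  # variables swapped into working memory

    rel = {"door_open": 0.9, "battery": 0.7, "weather": 0.1, "far_room": 0.05}
    print(reformulate(list(rel), rel, time_budget=0.3))   # keeps the top 3
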
Multiple-resolution plans
  • Fine view of near-term, high-probability events
  • Coarse view of distant, low-probability events

Information gathering
  • Explicit models of the robot’s uncertainty allow information-gathering actions
    • drive to the top of a hill for a better view
    • open a door to see what’s inside
    • ask a human for guidance

[Illustration: the robot asks “Where is the supply depot?” and a human answers “Two miles up this road.”]

Explicit uncertainty modeling
  • POMDP (partially observable Markov decision process) work gives us a theoretical understanding
  • Derive practical solutions from
    • learning explicit memorization policies
    • approximating optimal control
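
Underlying all of this is the POMDP belief-state update, sketched here as a discrete Bayes filter (the table layouts T[s][a][s'] and O[s'][a][o] are illustrative assumptions):

    def belief_update(belief, action, obs, T, O):
        # Predict with the transition model, then correct with the observation model.
        states = list(belief)
        new_b = {}
        for s2 in states:
            pred = sum(belief[s] * T[s][action][s2] for s in states)
            new_b[s2] = O[s2][action][obs] * pred
        z = sum(new_b.values())                 # normalizing constant
        return {s: p / z for s, p in new_b.items()}
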
Huge-domain experiments
  • Simulation of very complex task environment
    • large number of buildings and other geographical structures
    • concurrent, competing tasks such as
      • surveillance
      • supply delivery
      • self-preservation
    • other agents from whom information can be gathered