Presentation Transcript
Learning to Navigate Through Crowded Environments

Peter Henry¹, Christian Vollmer², Brian Ferris¹, Dieter Fox¹

Tuesday, May 4, 2010

¹ University of Washington, Seattle, USA

² Ilmenau University of Technology, Germany

The Goal

Enable robot navigation within crowded environments

Motivation
  • Robots should move naturally and predictably within crowded environments
    • Move amongst people in a socially transparent way
    • More efficient and safer motion
  • Humans trade off various factors
    • To move with the flow
    • To avoid high density areas
    • To walk on the left/right side
    • To reach the goal
Challenge
  • Humans naturally balance various factors
    • It is relatively easy to list the factors
    • But people cannot specify how they make the tradeoff
  • Previous work typically uses heuristics with hand-tuned parameters
    • Shortest path with collision avoidance [Burgard, et al., AI 1999]
    • Track and follow a single person [Kirby, et al., HRI 2007]
    • Follow people moving in same direction [Mueller, et al., CogSys 2008]
Contribution
  • Learn how humans trade off various factors
  • A framework for learning to navigate as humans do within crowded environments
  • Extension of Maximum Entropy Inverse Reinforcement Learning [Ziebart, et al., AAAI 2008] to incorporate:
    • Limited locally observable area
    • Dynamic crowd flow features
Markov Decision Processes

  • States
  • Actions
  • Rewards / Costs
  • (Transition Probabilities)
  • (Discount Factor)

[Slide diagram: example state sequence S0 → S1 → S2 → S3 → Goal]

Navigating in a Crowd as an MDP
  • States s_i
    • In crowd scenario: Grid cell + orientation
  • Actions a_ij from s_i to s_j
    • In crowd scenario: Move to adjacent cell
  • Cost = An unknown linear combination of action features
    • Cost weights to be learned: θ
    • Path: τ
    • Features: f_τ
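As a rough illustration only (not the authors' code), the oriented-grid state indexing and the linear action cost described above might look like this; the helper names are placeholders:

```python
import numpy as np

N_ORIENTATIONS = 8  # one state per grid cell and heading (see "Map and Features")

def state_index(x, y, orientation, grid_width):
    """Flatten (cell x, cell y, orientation) into a single MDP state id."""
    return (y * grid_width + x) * N_ORIENTATIONS + orientation

def action_cost(theta, action_features):
    """Cost of one action: linear combination of its features, with weights theta."""
    return float(np.dot(theta, action_features))

def path_cost(theta, path_action_features):
    """Cost of a path tau: theta . f_tau, where f_tau sums the per-action features."""
    f_tau = np.sum(path_action_features, axis=0)
    return float(np.dot(theta, f_tau))
```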
Inverse Reinforcement Learning
  • Inverse Reinforcement Learning (IRL):
    • Given: The MDP structure and a set of example paths
    • Find: The reward function resulting in the same behavior
    • (Also called “Inverse Optimal Control”)
  • Has been previously applied with success
    • Lane changing [Abbeel ICML 2004]
    • Parking lot navigation [Abbeel IROS 2008]
    • Driving route choice and prediction [Ziebart AAAI 2008]
    • Pedestrian route prediction [Ziebart IROS 2009]
Maximum Entropy IRL
  • Exponential distribution over paths: P(τ | θ) ∝ exp(−θᵀ f_τ), with cost weights θ and path feature counts f_τ
  • Learning: Maximize the likelihood of the demonstrated paths with respect to θ
  • Gradient: Match observed and expected feature counts
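A minimal sketch of the feature-matching idea, using the cost convention above. In Ziebart et al.'s algorithm the expected feature counts are computed exactly by dynamic programming; here a user-supplied path sampler stands in for that step:

```python
import numpy as np

def feature_counts(path):
    """Sum the per-action feature vectors along a path to get f_tau."""
    return np.sum(path, axis=0)

def maxent_gradient(theta, demo_paths, sample_paths, n_samples=100):
    """
    Gradient of the demonstration log-likelihood w.r.t. cost weights theta:
    expected feature counts under P(tau | theta) minus observed counts.
    `sample_paths(theta, n)` must return n paths drawn from the current
    distribution (a stand-in for the exact DP computation in MaxEnt IRL).
    """
    observed = np.mean([feature_counts(p) for p in demo_paths], axis=0)
    expected = np.mean([feature_counts(p) for p in sample_paths(theta, n_samples)], axis=0)
    return expected - observed

def update(theta, grad, lr=0.01):
    """One gradient-ascent step; at convergence observed and expected counts match."""
    return theta + lr * grad
```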
Locally Observable Features
  • It is unrealistic to assume the agent has global knowledge of the crowd
    • Contrast: the Continuum Crowds simulator explicitly finds a global solution for the entire crowd
    • We do assume knowledge of the map itself
  • Training: Only provide flow features for small radius around current position
    • Assumes that these are the features available to the “expert”
    • A single demonstration path becomes many small demonstrations of locally motivated paths
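A minimal sketch of restricting flow features to a locally observable radius around the agent (the dictionary layout and Chebyshev-distance cutoff are my assumptions, not the paper's implementation):

```python
import numpy as np

def mask_to_local_area(flow_features, agent_cell, radius):
    """
    Keep crowd-flow features only for cells within `radius` of the agent;
    everything outside the locally observable area is zeroed out.
    `flow_features` maps (x, y) grid cells to feature vectors.
    """
    ax, ay = agent_cell
    masked = {}
    for (x, y), feats in flow_features.items():
        visible = max(abs(x - ax), abs(y - ay)) <= radius
        masked[(x, y)] = feats if visible else np.zeros_like(feats)
    return masked
```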
Locally Observable Dynamic Features
  • Crowd flow changes as the agent moves
  • Locally observable dynamic feature training:
    • Update flow features within local horizon
    • Compute feature gradient within grid
    • Perform stochastic update of weights
    • Take the next step of the observed path
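A hedged sketch of that training loop (all helper names are placeholders, not the authors' API; the gradient helper is assumed to return expected-minus-observed local feature counts as in the MaxEnt sketch above):

```python
def train_on_demonstration(theta, demo_path, local_features, local_gradient, lr=0.01):
    """
    Stochastic weight updates along one observed path, mirroring the slide's steps.
    demo_path: sequence of states visited by the demonstrator.
    local_features(t): current flow features within the local horizon at time t.
    local_gradient(theta, t, feats): feature-count gradient restricted to the horizon.
    """
    for t in range(len(demo_path) - 1):
        feats = local_features(t)               # 1. update flow features within the horizon
        grad = local_gradient(theta, t, feats)  # 2. gradient within the local horizon
        theta = theta + lr * grad               # 3. stochastic update of the weights
        # 4. take the next step of the observed path (loop advances to demo_path[t + 1])
    return theta
```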
Locally Observable Dynamic IRL
  • The path probability decomposes over timesteps into many short paths, each evaluated over the current features in the locally observable horizon
    • At each time t, only the features for actions within the local horizon at time t are used

Locally Observable Dynamic Gradient
  • Uses current estimate of features at time t
  • Computes gradient only within the local horizon H: observed features within H vs. expected features for actions within H
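One plausible way to write this gradient (my reconstruction from the slide text, using the cost-weight convention; the sign flips if rewards are used instead):

```latex
% Gradient restricted to the local horizon H at each time t:
% expected feature counts for actions within H under the current policy,
% using the features f^t observed at time t, minus the demonstrated counts.
\nabla_\theta L \;\approx\; \sum_t \Big( \mathbb{E}_{P(\tau \mid \theta, f^t)}\!\big[ f^t_H(\tau) \big] \;-\; \tilde{f}^{\,t}_H \Big)
```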

Map and Features
  • Each grid cell encompasses 8 oriented states
    • Allows for flow features relative to orientation
  • Features
    • Distance
    • Crowd flow speed and direction
    • Crowd density
    • (many others possible…)
  • Chosen because they are reasonable to obtain with current sensors
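For illustration, a per-action feature vector built from those quantities might look like this (the exact feature definitions are assumptions; only the categories come from the slide):

```python
import numpy as np

def action_features(step_distance, flow_speed, flow_direction, agent_heading, density):
    """
    Example feature vector for one action: distance traveled, crowd flow speed,
    flow direction relative to the agent's orientation (+1 moving with the flow,
    -1 moving against it), and crowd density in the target cell.
    """
    relative_flow = np.cos(flow_direction - agent_heading)
    return np.array([step_distance, flow_speed, relative_flow, density])
```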
Experimental Setup

We used ROS [Willow Garage] to integrate the crowd simulator with our IRL learner and planner

  • Extract individual crowd traces and observable features
  • Learn feature weights with our IRL algorithm
  • Use weights for a simulated robot in test scenarios
    • Planning is A* search
    • Re-planning occurs at every grid cell with updated features
    • The robot is represented to the crowd simulator as just another person, so the crowd reacts to it realistically
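A rough sketch of that plan / step / re-plan loop (the planner and simulator interfaces here are placeholders, not the actual ROS nodes):

```python
def navigate(start, goal, theta, sense_local_features, astar_plan, take_step):
    """
    Re-planning loop: sense locally observable crowd features, run A* with the
    learned linear cost weights, move one grid cell, and repeat until the goal.
    """
    state = start
    while state != goal:
        features = sense_local_features(state)            # updated local crowd features
        path = astar_plan(state, goal, theta, features)   # A* search with learned costs
        state = take_step(path[1])                        # path[0] is the current cell
    return state
```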
Quantitative Results
  • Measure similarity to “human” path
    • Shortest Path (baseline): Ignores crowd
    • Learned Path: The path from our learned planner
  • Mean / Maximum Difference: Over all path cells, difference to closest “human” path cell

(The difference between the learned and baseline paths is significant at the p = 0.05 level)
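A small sketch of how that per-cell difference could be computed (grid coordinates and Euclidean distance are my assumptions):

```python
import numpy as np

def path_difference(candidate_path, human_path):
    """
    For every cell on the candidate path (learned or shortest-path baseline),
    take the distance to the closest cell of the "human" path, then report the
    mean and maximum over all candidate cells.
    """
    diffs = [min(np.hypot(cx - hx, cy - hy) for (hx, hy) in human_path)
             for (cx, cy) in candidate_path]
    return float(np.mean(diffs)), float(np.max(diffs))
```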

Future Work
  • Train on real crowd data
    • Overhead video + tracking?
    • Wearable sensors to mimic robot sensor input?
  • Implement on actual robot
    • Is the method effective for raw sensor data?
    • Which are the most useful features?
  • Pedestrian prediction
    • Compare / incorporate other recent work [Ziebart IROS 2009]
Conclusion
  • We have presented a framework for learning to imitate human behavior from example traces
  • We learn weights that produce paths matching observed behavior from whatever features are made available
  • Our inverse reinforcement learning algorithm handles locally observable dynamic features
  • Resulting paths are more similar to observed human paths