
Probabilistic Robotics


Presentation Transcript


  1. Probabilistic Robotics Planning and Control: Markov Decision Processes Based on slides from the book's website

  2. Motivation • Robot perception is not the ultimate goal: • Choose the right sequence of actions to achieve the goal. • Planning/control applications: • Navigation, surveillance, monitoring, collaboration, etc. • Ground, air, sea, underground! • Action selection is non-trivial in real-world problems: • The state is not fully observable. • Actions are non-deterministic. • Dynamic performance is required.

  3. Problem Classes • Deterministic vs. stochastic actions: • Classical approaches assume known action outcomes; in practice actions are typically non-deterministic, so only the likelihoods of outcomes can be predicted. • Full vs. partial observability: • A completely observable system state never happens in real-world applications. • Instead, build a representation of the world by performing actions and observing the outcomes. • Current and anticipated uncertainty.

  4. Deterministic, Fully Observable

  5. Stochastic, Fully Observable • Example with no noise in the motion model. • Example with a noisy motion model.

  6. Stochastic, Partially Observable

  7. Markov Decision Process (MDP) [Figure: example MDP with five states s1–s5 carrying rewards r=20, r=1, r=0, r=0, and r=-10, connected by stochastic transitions with probabilities such as 0.7/0.3, 0.9/0.1, 0.99/0.01, and 0.8/0.2.]

  8. Markov Decision Process (MDP) • Given: • States x ∈ X. • Actions u ∈ U. • Transition probabilities p(x' | u, x). • Reward / payoff function r(x, u). • Wanted: • Policy π(x) that maximizes the future expected reward.
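
To make the definition concrete, below is a minimal sketch of a finite MDP as a plain Python container. The dataclass fields and the tiny two-state example are illustrative assumptions, not part of the original slides.

```python
from dataclasses import dataclass

@dataclass
class MDP:
    """A finite Markov Decision Process.

    states:  list of state labels
    actions: list of action labels
    p:       transition probabilities, p[(x, u, x_next)] = P(x_next | x, u)
    r:       immediate reward, r[(x, u)]
    gamma:   discount factor in (0, 1]
    """
    states: list
    actions: list
    p: dict
    r: dict
    gamma: float = 0.9

# Hypothetical two-state example: "stay" is safe, "go" is risky but can pay off.
toy = MDP(
    states=["s1", "s2"],
    actions=["stay", "go"],
    p={("s1", "stay", "s1"): 1.0,
       ("s1", "go", "s2"): 0.8, ("s1", "go", "s1"): 0.2,
       ("s2", "stay", "s2"): 1.0,
       ("s2", "go", "s1"): 1.0},
    r={("s1", "stay"): 0.0, ("s1", "go"): -1.0,
       ("s2", "stay"): 1.0, ("s2", "go"): 0.0},
)
```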

  9. Policies • Policy (general case): π : z_{1:t-1}, u_{1:t-1} → u_t • All past data mapped to control commands. • Policy (fully observable case): π : x_t → u_t • State mapped to control commands.

  10. Rewards • Expected cumulative payoff: R_T = E[ Σ_{τ=1..T} γ^τ r_{t+τ} ]. • Maximize the sum of future payoffs! • Discount factor γ: reward in the future is worth less due to inflation! • T = 1: greedy policy – look only at the next step; the discount does not matter as long as γ > 0. • 1 < T < ∞: finite-horizon case, typically no discount (γ = 1). • T = ∞: infinite-horizon case; the reward is finite if γ < 1.
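
As a quick worked illustration of the discounted sum (the reward values below are made up, not from the slides):

```python
# Discounted return R_T = sum_{tau=1..T} gamma^tau * r_{t+tau}
gamma = 0.9
future_rewards = [1.0, 0.0, 10.0]   # r_{t+1}, r_{t+2}, r_{t+3}

R = sum(gamma ** (tau + 1) * r for tau, r in enumerate(future_rewards))
print(R)   # 0.9*1.0 + 0.81*0.0 + 0.729*10.0 = 8.19
```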

  11. Optimal Policies • Expected cumulative payoff of a policy π executed from time t: R_T^π(x_t) = E[ Σ_{τ=1..T} γ^τ r_{t+τ} | u_{t+τ} = π(z_{1:t+τ-1}, u_{1:t+τ-1}) ]. • Optimal policy: π* = argmax_π R_T^π(x_t). • 1-step optimal policy: π_1(x) = argmax_u r(x, u). • Value function of the 1-step optimal policy: V_1(x) = γ max_u r(x, u).

  12. 2-step Policies • Optimal policy: π_2(x) = argmax_u [ r(x, u) + ∫ V_1(x') p(x' | u, x) dx' ]. • Value function: V_2(x) = γ max_u [ r(x, u) + ∫ V_1(x') p(x' | u, x) dx' ].

  13. T-step Policies • Optimal policy: π_T(x) = argmax_u [ r(x, u) + ∫ V_{T-1}(x') p(x' | u, x) dx' ]. • Value function: V_T(x) = γ max_u [ r(x, u) + ∫ V_{T-1}(x') p(x' | u, x) dx' ].

  14. Infinite Horizon • Optimal value function: V_∞(x) = γ max_u [ r(x, u) + ∫ V_∞(x') p(x' | u, x) dx' ]. • This is the Bellman equation. • It is a necessary and sufficient condition for the optimal policy.
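
In the discrete case the integral becomes a sum, so a single Bellman backup for one state is easy to write down. The following self-contained snippet is a sketch with made-up two-state numbers, meant only to show the shape of the update:

```python
# One Bellman backup for state s1:
#   V(x) <- gamma * max_u [ r(x, u) + sum_x' p(x' | u, x) * V(x') ]
# All numbers below are illustrative, not taken from the slides.
gamma = 0.9
states = ["s1", "s2"]
actions = ["a", "b"]
p = {("s1", "a", "s1"): 1.0,
     ("s1", "b", "s2"): 0.8, ("s1", "b", "s1"): 0.2,
     ("s2", "a", "s2"): 1.0, ("s2", "b", "s1"): 1.0}
r = {("s1", "a"): 0.0, ("s1", "b"): -1.0,
     ("s2", "a"): 1.0, ("s2", "b"): 0.0}
V = {"s1": 0.0, "s2": 5.0}          # current value estimate

V["s1"] = gamma * max(
    r[("s1", u)] + sum(p.get(("s1", u, x2), 0.0) * V[x2] for x2 in states)
    for u in actions
)
print(V["s1"])   # best action is b: -1 + 0.8*5 = 3, times gamma = 2.7
```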

  15. Value Iteration • For all x do: V̂(x) ← r_min • EndFor • Repeat until convergence: • For all x do: V̂(x) ← γ max_u [ r(x, u) + ∫ V̂(x') p(x' | u, x) dx' ] • EndFor • EndRepeat • Return V̂. • Action choice: π(x) = argmax_u [ r(x, u) + ∫ V̂(x') p(x' | u, x) dx' ]. • After convergence, choose the greedy policy!

  16. Value Iteration – Discrete Case • Replace the integration by a summation: • For i = 1 to N do: V̂(x_i) ← r_min • EndFor • Repeat until convergence: • For i = 1 to N do: V̂(x_i) ← γ max_u [ r(x_i, u) + Σ_j V̂(x_j) p(x_j | u, x_i) ] • EndFor • EndRepeat • Return V̂. • Action choice: π(x_i) = argmax_u [ r(x_i, u) + Σ_j V̂(x_j) p(x_j | u, x_i) ].
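
Putting the discrete algorithm together, here is a minimal self-contained Python sketch of value iteration. The dict-based representation, the r_min initialization, and the convergence tolerance eps are my own choices for illustration:

```python
def value_iteration(states, actions, p, r, gamma=0.9, eps=1e-6):
    """Discrete value iteration.

    p[(x, u, x2)] = P(x2 | x, u); r[(x, u)] = immediate reward.
    Returns the value function V and the greedy policy pi.
    """
    V = {x: min(r.values()) for x in states}            # initialize with r_min
    while True:
        delta = 0.0
        for x in states:
            v_new = gamma * max(
                r[(x, u)] + sum(p.get((x, u, x2), 0.0) * V[x2] for x2 in states)
                for u in actions
            )
            delta = max(delta, abs(v_new - V[x]))
            V[x] = v_new
        if delta < eps:                                  # converged
            break
    # After convergence, choose the greedy policy.
    pi = {
        x: max(actions, key=lambda u: r[(x, u)]
               + sum(p.get((x, u, x2), 0.0) * V[x2] for x2 in states))
        for x in states
    }
    return V, pi
```

Applied to the toy MDP sketched earlier, calling value_iteration(toy.states, toy.actions, toy.p, toy.r, toy.gamma) returns a value per state and the greedy action for each.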

  17. Value Iteration for Motion Planning

  18. Value Function and Policy Iteration • Often the optimal policy has been reached long before the value function has converged. • Policy iteration calculates a new policy based on the current value function and then calculates a new value function based on this policy. • This process often converges faster to the optimal policy.
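
A minimal sketch of policy iteration under the same dict-based MDP representation used in the value iteration sketch above; the fixed number of evaluation sweeps is an illustrative simplification (solving the linear system for the policy's value exactly would also work):

```python
def policy_iteration(states, actions, p, r, gamma=0.9, eval_sweeps=50):
    """Alternate policy evaluation and greedy policy improvement."""
    pi = {x: actions[0] for x in states}                 # arbitrary initial policy
    V = {x: 0.0 for x in states}
    while True:
        # Policy evaluation: sweep the Bellman equation for the current policy.
        # Gamma multiplies the whole backup, matching the slides' convention.
        for _ in range(eval_sweeps):
            for x in states:
                u = pi[x]
                V[x] = gamma * (r[(x, u)] +
                                sum(p.get((x, u, x2), 0.0) * V[x2] for x2 in states))
        # Policy improvement: act greedily with respect to the current V.
        stable = True
        for x in states:
            best_u = max(actions, key=lambda u: r[(x, u)] +
                         sum(p.get((x, u, x2), 0.0) * V[x2] for x2 in states))
            if best_u != pi[x]:
                pi[x] = best_u
                stable = False
        if stable:                                       # policy no longer changes
            return V, pi
```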

  19. Value-Iteration Game  • Move from start state (color1) to end state (color2). • Maximize reward. • Four actions. • Twelve colors. • Exploration and exploitation.

  20. Value Iteration • Explore first and then exploit! • One-step look ahead values? • N-step look ahead? • Optimal policy?

  21. Value Iteration – A few steps…

  22. Value Iteration – A few steps…

  23. Value Iteration – A few steps…

  24. Value Iteration – A few steps…
