
Probabilistic Robotics

Presentation Transcript


  1. Probabilistic Robotics Planning and Control: Markov Decision Processes Based on slides from the book's website

  2. Motivation • Robot perception is not the ultimate goal: • Choose the right sequence of actions to achieve the goal. • Planning/control applications: • Navigation, surveillance, monitoring, collaboration, etc. • Ground, air, sea, underground! • Action selection is non-trivial in real-world problems: • State is not fully observable. • Actions are non-deterministic. • Dynamic performance is required.

  3. Problem Classes • Deterministic vs. stochastic actions. • Classical approaches assume known action outcomes, but actions are typically non-deterministic; we can only predict the likelihoods of their outcomes. • Full vs. partial observability. • A completely observable system state essentially never happens in real-world applications. • Build a representation of the world by performing actions and observing their outcomes. • Current and anticipated uncertainty.

  4. Deterministic, Fully Observable

  5. Stochastic, Fully Observable • No noise in motion models vs. noisy motion models.

  6. Stochastic, Partially Observable

  7. Markov Decision Process (MDP) • [Figure: example MDP with five states s1–s5, per-state rewards (1, 0, 20, 0, -10) and stochastic transition probabilities labeling the edges.]

  8. Markov Decision Process (MDP) • Given: • States: x ∈ X. • Actions: u ∈ U. • Transition probabilities: p(x' | u, x). • Reward / payoff function: r(x, u). • Wanted: • Policy π(x) that maximizes future expected reward.
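As a concrete illustration of these ingredients (not part of the slides), a small discrete MDP can be written down as a handful of plain Python containers. All state names, probabilities, and rewards below are invented placeholders.

```python
# Hypothetical 3-state, 2-action MDP written as plain Python containers.
# All numbers are illustrative placeholders, not taken from the slides.
STATES = ["s1", "s2", "s3"]
ACTIONS = ["stay", "move"]

# Transition probabilities p(x' | u, x), indexed as P[x][u][x'].
P = {
    "s1": {"stay": {"s1": 1.0}, "move": {"s2": 0.8, "s3": 0.2}},
    "s2": {"stay": {"s2": 1.0}, "move": {"s3": 0.9, "s1": 0.1}},
    "s3": {"stay": {"s3": 1.0}, "move": {"s1": 1.0}},
}

# Reward / payoff function r(x, u).
R = {
    "s1": {"stay": 0.0, "move": -1.0},
    "s2": {"stay": 0.0, "move": -1.0},
    "s3": {"stay": 10.0, "move": -1.0},
}

GAMMA = 0.9  # discount factor

# In the fully observable case a policy is just a map from state to action.
policy = {"s1": "move", "s2": "move", "s3": "stay"}
```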

  9. Policies • Policy (general case): all past data mapped to control commands, π : z_{1:t-1}, u_{1:t-1} → u_t. • Policy (fully observable case): state mapped to control commands, π : x_t → u_t.

  10. Rewards • Expected cumulative payoff: R_T = E[ Σ_{τ=1..T} γ^τ r_{t+τ} ]. • Maximize the sum of future payoffs! • Discount factor γ: future reward is worth less! • T=1: greedy policy; the discount does not matter as long as γ > 0. • 1<T<∞: finite-horizon case, typically no discount (γ = 1). • T=∞: infinite-horizon case, finite reward if γ < 1.
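To make the discounted cumulative payoff concrete, here is a small worked computation; the reward sequence is invented for illustration.

```python
# Discounted cumulative payoff R_T = sum_{tau=1..T} gamma^tau * r_{t+tau}
# for an invented reward sequence; gamma < 1 keeps the infinite-horizon sum finite.
gamma = 0.9
rewards = [1.0, 0.0, 5.0, 2.0]  # r_{t+1}, r_{t+2}, ...

R_T = sum(gamma ** (tau + 1) * r for tau, r in enumerate(rewards))
print(R_T)  # 0.9*1 + 0.81*0 + 0.729*5 + 0.6561*2 ≈ 5.86
```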

  11. Optimal Policies • Expected cumulative payoff of a policy: R_T^π(x_t) = E[ Σ_{τ=1..T} γ^τ r_{t+τ} | u_{t+τ} = π(z_{1:t+τ-1}, u_{1:t+τ-1}) ]. • Optimal policy: π* = argmax_π R_T^π(x_t). • 1-step optimal policy: π_1(x) = argmax_u r(x, u). • Value function of the 1-step optimal policy: V_1(x) = γ max_u r(x, u).

  12. Value Functions • Value function for a specific policy π (Bellman equation for V^π): V^π(x) = γ [ r(x, π(x)) + ∫ V^π(x') p(x' | π(x), x) dx' ].

  13. Optimal Value Functions • Optimal policy: π(x) = argmax_u [ r(x, u) + ∫ V(x') p(x' | u, x) dx' ]. • Bellman optimality equation: V(x) = γ max_u [ r(x, u) + ∫ V(x') p(x' | u, x) dx' ]. • Satisfying it is a necessary and sufficient condition for the optimal policy.
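One way to read the "necessary and sufficient" statement: a candidate value function is optimal exactly when its Bellman residual is zero. The sketch below assumes the dict-based toy MDP representation from the earlier illustration (P[x][u][x'], R[x][u]); the function name is hypothetical.

```python
def bellman_residual(V, states, actions, P, R, gamma):
    """Largest deviation from V(x) = gamma * max_u [ r(x,u) + sum_x' V(x') p(x'|u,x) ].
    A residual of (numerically) zero means V is the optimal value function."""
    return max(
        abs(V[x] - gamma * max(
            R[x][u] + sum(p * V[x2] for x2, p in P[x][u].items())
            for u in actions))
        for x in states)
```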

  14. Value Iteration • For all x do: V(x) ← r_min • EndFor • Repeat until convergence: • For all x do: V(x) ← γ max_u [ r(x, u) + ∫ V(x') p(x' | u, x) dx' ] • EndFor • EndRepeat • Action choice: π(x) = argmax_u [ r(x, u) + ∫ V(x') p(x' | u, x) dx' ]

  15. Value Iteration – Discrete Case • For all x do: V(x) ← r_min • EndFor • Repeat until convergence: • For all x do: V(x) ← γ max_u [ r(x, u) + Σ_{x'} V(x') p(x' | u, x) ] • EndFor • EndRepeat • Action choice: π(x) = argmax_u [ r(x, u) + Σ_{x'} V(x') p(x' | u, x) ]
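A minimal Python sketch of the discrete update above, using the same dict-based toy representation as the earlier illustration; the function name, convergence threshold, and zero initialization (instead of r_min) are my own choices.

```python
def value_iteration(states, actions, P, R, gamma, eps=1e-6):
    """Discrete value iteration:
    V(x) <- gamma * max_u [ r(x,u) + sum_x' V(x') p(x'|u,x) ]."""
    V = {x: 0.0 for x in states}                     # initialise the value estimate
    while True:
        delta = 0.0
        for x in states:                             # for all x do ...
            backup = max(R[x][u] + sum(p * V[x2] for x2, p in P[x][u].items())
                         for u in actions)
            new_v = gamma * backup                   # gamma outside the bracket, as on the slide
            delta = max(delta, abs(new_v - V[x]))
            V[x] = new_v
        if delta < eps:                              # repeat until convergence
            break
    # Action choice: pi(x) = argmax_u [ r(x,u) + sum_x' V(x') p(x'|u,x) ]
    pi = {x: max(actions,
                 key=lambda u: R[x][u] + sum(p * V[x2] for x2, p in P[x][u].items()))
          for x in states}
    return V, pi
```

For example, `value_iteration(STATES, ACTIONS, P, R, GAMMA)` on the toy three-state MDP sketched earlier yields, as expected, a policy that moves toward s3 and then stays there.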

  16. Value Iteration for Motion Planning

  17. Value Function and Policy Iteration • Often the optimal policy has been reached long before the value function has converged. • Policy iteration calculates a new policy based on the current value function and then calculates a new value function based on this policy. • This process often converges faster to the optimal policy.
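A rough sketch of that alternation (policy evaluation followed by greedy policy improvement), again on the same toy dict representation; the fixed number of evaluation sweeps is an arbitrary simplification.

```python
def policy_iteration(states, actions, P, R, gamma, eval_sweeps=100):
    """Alternate policy evaluation and greedy policy improvement until the policy is stable."""
    pi = {x: actions[0] for x in states}             # arbitrary initial policy
    V = {x: 0.0 for x in states}
    while True:
        # Policy evaluation: V(x) <- gamma * (r(x, pi(x)) + sum_x' V(x') p(x'|pi(x), x))
        for _ in range(eval_sweeps):
            for x in states:
                u = pi[x]
                V[x] = gamma * (R[x][u] + sum(p * V[x2] for x2, p in P[x][u].items()))
        # Policy improvement: act greedily with respect to the current value function
        stable = True
        for x in states:
            best_u = max(actions,
                         key=lambda u: R[x][u] + sum(p * V[x2] for x2, p in P[x][u].items()))
            if best_u != pi[x]:
                pi[x] = best_u
                stable = False
        if stable:                                   # policy no longer changes: done
            return V, pi
```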

  18. Value-Iteration Game • Move from start state (color1) to end state (color2). • Maximize reward. • Four actions. • Twelve colors. • Exploration and exploitation.

  19. Value Iteration • Explore first and then exploit! • One-step look ahead values? • N-step look ahead? • Optimal policy?

  20. Value Iteration – A few steps…

  21. Value Iteration – A few steps…

  22. Value Iteration – A few steps…

  23. Value Iteration – A few steps…

  24. More Information • See Chapter 14 of the textbook. • Better resource: Chapters 3-4 of the Reinforcement Learning textbook by Sutton and Barto: http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html
