
Partially Observable Markov Decision Process (Chapter 15 & 16)

Presentation Transcript


  1. Partially Observable Markov Decision Process (Chapter 15 & 16) José Luis Peralta

  2. Contents • POMDP • Example POMDP • Finite World POMDP algorithm • Practical Considerations • Approximate POMDP Techniques

  3. Partially Observable Markov Decision Processes (POMDP) • POMDP: • Uncertainty in measurements → uncertainty about the state • Uncertainty in control effects • Adapt the previous Value Iteration algorithm (VI)

  4. Partially Observable Markov Decision Processes (POMDP) • POMDP: • The world can't be sensed directly • Measurements: incomplete, noisy, etc. • Partial observability • The robot has to estimate a posterior distribution over possible world states.

  5. Partially Observable Markov Decision Processes (POMDP) • POMDP: • An algorithm to find the optimal control policy exists for a FINITE WORLD: • State space • Action space • Space of observations • Planning horizon → all finite • Computation is complex • For the continuous case there are approximations

  6. Partially Observable Markov Decision Processes (POMDP) • The algorithms we are going to study are all based on Value Iteration (VI) • The same recursion as before, but the state is not observable • The robot has to make decisions in the BELIEF STATE • The robot's internal knowledge about the state of the environment • The space of posterior distributions over states
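For reference, the backup this belief-space value iteration performs, in the notation of [1] (reconstructed here, since the slide's formulas did not survive extraction), is:

```latex
V_T(b) = \gamma \,\max_{u}\Big[\, r(b,u) + \int V_{T-1}(b')\, p(b' \mid u, b)\, db' \,\Big],
\qquad
r(b,u) = \int r(x,u)\, b(x)\, dx .
```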

  7. Partially Observable Markov Decision Processes (POMDP) • So the value function is now defined over beliefs b, with • Control policy: π(b) → u, a mapping from beliefs to controls

  8. Partially Observable Markov Decision Processes (POMDP) • Belief b → bel(x), the posterior over states • Each value in a POMDP is a function of the entire probability distribution • Problems: • Finite state space → continuous belief space • Continuous state space → infinitely-dimensional belief continuum • There is also complexity in calculating the value function, because of the integral over all distributions

  9. Partially Observable Markov Decision Processes (POMDP) • In the end, an optimal solution exists for an interesting special case, the finite world: • State space, action space, space of observations, planning horizon → all finite • The solutions of the value function are piecewise linear functions over the belief space • This arises because: • Expectation is a linear operation • The robot can select different controls in different parts of the belief space
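A minimal sketch of that piecewise linear representation: the value function is stored as a set of linear functions (often called alpha vectors) and evaluated by taking their maximum. The numbers and names below are illustrative only, not taken from the slides.

```python
import numpy as np

# Piecewise linear, convex value function over beliefs:
# each linear function assigns a value to every state; V(b) is the maximum of b . alpha.
alphas = [np.array([1.0, 0.0]),
          np.array([0.0, 1.0]),
          np.array([0.6, 0.6])]

def value(b, alphas):
    """V(b) = max over linear functions of sum_x b(x) * alpha(x)."""
    return max(float(np.dot(b, a)) for a in alphas)

print(value(np.array([0.5, 0.5]), alphas))   # 0.6: the flat function wins at the uniform belief
```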

  10. Example POMDP • 2 states: x1, x2 (in the notation of [1]) • 3 control actions: u1, u2, u3

  11. Example POMDP • When executing u1 or u2, the robot receives a payoff • Dilemma → opposite payoff in each state → knowledge of the state translates directly into payoff

  12. Example POMDP • To acquire knowledge, the robot has the control u3 (cost of waiting, cost of sensing, etc.), which affects the state of the world in a non-deterministic manner:

  13. Example POMDP • Benefit → before each control decision, the robot can sense • By sensing, the robot gains knowledge about the state • It can make better control decisions • Higher payoff expectation • In the case of control action u3, the robot senses without taking a terminal action

  14. Example POMDP • The measurement model is governed by the probability distribution p(z | x); see the sketch after this slide
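A compact sketch of this two-state example in code. The payoff, transition, and measurement numbers below are those of the corresponding example in [1]; treat them as assumptions if the slides' figures differ.

```python
import numpy as np

states = ["x1", "x2"]
actions = ["u1", "u2", "u3"]        # u1, u2 are terminal actions; u3 is the sensing/waiting action
observations = ["z1", "z2"]

# Payoff r(x, u): rows are states, columns are actions (values from [1], assumed).
R = np.array([[-100.0, 100.0, -1.0],
              [ 100.0, -50.0, -1.0]])

# Transition model p(x' | x, u3) for the non-terminal action u3 (values from [1], assumed).
T_u3 = np.array([[0.2, 0.8],
                 [0.8, 0.2]])

# Measurement model p(z | x): rows are states, columns are observations (values from [1], assumed).
Z = np.array([[0.7, 0.3],
              [0.3, 0.7]])
```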

  15. Example POMDP • This example is easy to graph over the belief space (2 states) • Belief state: b = (p1, p2) with p2 = 1 − p1, so a belief is fully described by p1 = b(x1) ∈ [0, 1]

  16. Example POMDP • Control policy: a function that maps the unit interval [0, 1] to the space of all actions • Example: a policy assigns one control to every value of p1

  17. Example POMDP – Control Choice • Control choice (when to execute what control?) • First consider the immediate payoff r • The payoff is now a function of the belief state: for a control u, the expected payoff is r(b, u) = p1 r(x1, u) + (1 − p1) r(x2, u) • This is the payoff in POMDPs

  18. Example POMDP – Control Choice

  19. Example POMDP – Control Choice

  20. Example POMDP – Control Choice

  21. Example POMDP – Control Choice • First we calculate V1(b) = max_u r(b, u) → the robot simply selects the action of highest expected payoff • This is a piecewise linear, convex function: the maximum of the individual payoff functions

  22. Example POMDP – Control Choice • First we calculate V1(b) = max_u r(b, u) → the robot simply selects the action of highest expected payoff • Piecewise linear, convex function: the maximum of the individual payoff functions

  23. Example POMDP – Control Choice • First we calculate V1(b) = max_u r(b, u) → the robot simply selects the action of highest expected payoff • The transition in the optimal policy occurs at the belief where r(b, u1) = r(b, u2)
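Reusing the model sketch from above, a short example of computing V1 and locating the policy switch point numerically (function names are illustrative):

```python
def expected_payoff(p1, u):
    """r(b, u) = p1 * r(x1, u) + (1 - p1) * r(x2, u), using R from the earlier sketch."""
    return p1 * R[0, u] + (1.0 - p1) * R[1, u]

def V1(p1):
    """Horizon-1 value: the robot picks the action with the highest expected payoff."""
    return max(expected_payoff(p1, u) for u in range(R.shape[1]))

# Locate where the expected payoffs of u1 and u2 cross (the policy transition).
p_grid = np.linspace(0.0, 1.0, 10001)
crossing = p_grid[np.argmin(np.abs(expected_payoff(p_grid, 0) - expected_payoff(p_grid, 1)))]
print(f"optimal policy switches near p1 = {crossing:.3f}")
```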

  24. Example POMDP – Sensing • Now we have perception • What if the robot can sense before it chooses a control? • How does it affect the optimal value function? • Sensing information about the state enables the robot to choose a better control action • In the previous example the expected payoff was V1(b): how much better will it be after sensing?

  25. Example POMDP – Control Choice • The belief after sensing, as a function of the belief before sensing, is given by Bayes' rule: p1' = p(z | x1) p1 / p(z), with p(z) = p(z | x1) p1 + p(z | x2) (1 − p1)
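The same update in code, using the measurement matrix Z from the earlier sketch (rows = states, columns = observations):

```python
def belief_update(p1, z):
    """Bayes' rule: p1' = p(z | x1) * p1 / p(z)."""
    pz = Z[0, z] * p1 + Z[1, z] * (1.0 - p1)   # normalizer p(z)
    return Z[0, z] * p1 / pz
```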

  26. Example POMDP – Control Choice • How does this affect the value function?

  27. Example POMDP – Control Choice • Mathematically, that is just replacing p1 by the updated belief p1' in the value function

  28. Example POMDP – Control Choice • However, our interest is the complete expected value function after sensing, which also accounts for the probability of obtaining the other measurement • It is given by the expectation over measurements:

  29. Example POMDP – Control Choice • And this results in:

  30. Example POMDP – Control Choice Mathematically
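A sketch of that expectation in code, reusing V1, belief_update, and Z from the earlier sketches:

```python
def V1_after_sensing(p1):
    """Expected horizon-1 value after sensing: sum over z of p(z) * V1(updated belief)."""
    value = 0.0
    for z in range(Z.shape[1]):
        pz = Z[0, z] * p1 + Z[1, z] * (1.0 - p1)   # p(z) under the current belief
        value += pz * V1(belief_update(p1, z))
    return value
```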

  31. Example POMDP – Prediction • To plan at a horizon larger than 1, we have to take the state transition into account and project our value function accordingly, using the transition probability model of u3 • If the state is x1 with certainty, the predicted belief is p1' = p(x1' | x1, u3); if it is x2, then p1' = p(x1' | x2, u3) • In between, the expectation is linear
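In code, the belief prediction through the non-terminal action u3 (using T_u3 from the earlier sketch) is a linear map of p1:

```python
def predict_belief(p1):
    """p1' = p(x1' | x1, u3) * p1 + p(x1' | x2, u3) * (1 - p1)."""
    return T_u3[0, 0] * p1 + T_u3[1, 0] * (1.0 - p1)
```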

  32. Example POMDP – Prediction • And this results in:

  33. Example POMDP – Prediction • And adding the expected payoffs of u1 and u2 (and taking the maximum) we have:

  34. Example POMDP – Prediction • Mathematically (note that the cost of executing u3 still has to be accounted for):
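Putting the pieces together, a rough horizon-2 backup for this example, reusing the earlier helper functions. The ordering (pay the cost of u3, predict, then sense) follows the slides and should be read as a sketch, not the exact listing of [1]:

```python
def V2(p1):
    """Horizon-2 value: u1 and u2 are terminal; u3 pays its cost, the state transitions,
    the robot senses, and then the horizon-1 value applies to the updated belief."""
    v_u1 = expected_payoff(p1, 0)
    v_u2 = expected_payoff(p1, 1)
    v_u3 = expected_payoff(p1, 2) + V1_after_sensing(predict_belief(p1))
    return max(v_u1, v_u2, v_u3)
```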

  35. Example POMDP – Pruning • A full backup generates many more linear functions than are needed: most of them never attain the maximum and can be pruned • Even so, exact backups quickly become impractical → efficient approximate POMDP techniques are needed
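A minimal sketch of pruning for the two-state case: keep only the linear functions that win somewhere on a grid of beliefs. An exact pruning step would use a linear program instead; everything below is illustrative.

```python
def prune(alphas, n_grid=1001):
    """Discard linear functions that are never the maximum at any sampled belief."""
    p = np.linspace(0.0, 1.0, n_grid)
    values = np.array([a[0] * p + a[1] * (1.0 - p) for a in alphas])
    winners = sorted(set(np.argmax(values, axis=0)))
    return [alphas[i] for i in winners]
```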

  36. Finite World POMDP algorithm To understand this read Mathematical Derivation of POMDPs pp.531-536 in [1]

  37. Finite World POMDP algorithm To understand this read Mathematical Derivation of POMDPs pp.531-536 in [1]
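For a general finite world, one exact value-iteration backup over the linear-function representation can be sketched as follows. It follows the structure of the derivation in [1] (pp. 531-536), but the code, names, and indexing conventions are illustrative assumptions, not the book's listing:

```python
from itertools import product
import numpy as np

def pomdp_backup(alphas, R, T, Z, gamma=1.0):
    """One exact backup: for every control and every assignment of an old linear
    function to each observation, build a new linear function over states.
    R[x, u] payoffs, T[u][x, x'] transition probabilities, Z[x, z] measurement probabilities."""
    n_x, n_u = R.shape
    n_z = Z.shape[1]
    new_alphas = []
    for u in range(n_u):
        for pick in product(range(len(alphas)), repeat=n_z):   # one old alpha per observation
            a = np.zeros(n_x)
            for x in range(n_x):
                future = sum(T[u][x, xp] * Z[xp, z] * alphas[pick[z]][xp]
                             for xp in range(n_x) for z in range(n_z))
                a[x] = gamma * (R[x, u] + future)
            new_alphas.append(a)
    return new_alphas
```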

  38. Example POMDP – Practical Considerations • It looks easy, so let's try something more “real”… the probabilistic robot “RoboProb”

  39. Example POMDP – Practical Considerations • It looks easy, let's try something more “real”… probabilistic robot “RoboProb” • 11 states • 5 control actions: four motions plus sensing without moving • Transition model: the intended motion succeeds with probability 0.8, and with probability 0.1 each the robot drifts to either side

  40. Example POMDP – Practical Considerations • It looks easy, let's try something more “real”… probabilistic robot “RoboProb” • “Reward” → payoff • The same payoff set is used for all control actions • Example:

  41. Example POMDP – Practical Considerations • It's getting kind of hard :S… probabilistic robot “RoboProb” • Transition probability example → the intended motion has probability 0.8, and each sideways drift has probability 0.1

  42. Example POMDP – Practical Considerations • It's getting kind of hard :S… probabilistic robot “RoboProb” • Transition probability example (continued)

  43. Example POMDP – Practical Considerations • It's getting kind of hard :S… probabilistic robot “RoboProb” • Measurement probability

  44. Example POMDP – Practical Considerations • It's getting kind of hard :S… probabilistic robot “RoboProb” • Belief states: with 11 states the belief lives in a 10-dimensional simplex → impossible to graph!

  45. Example POMDP – Practical Considerations • It's getting kind of hard :S… probabilistic robot “RoboProb” • Each linear function results from executing a control, followed by observing a measurement, and then executing another control

  46. Example POMDP – Practical Considerations • It's getting kind of hard :S… probabilistic robot “RoboProb” • Defining the measurement probability • Defining the “reward”/payoff • Defining the transition probability • Merging the transition (control) probability • A sketch of these data structures follows below
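The slides build these quantities in a numerical tool; the sketch below only shows plausible shapes for an 11-state, 5-action robot. All sizes and values here are placeholders, not the deck's actual numbers.

```python
import numpy as np

n_x, n_u, n_z = 11, 5, 11      # states, controls, observations (n_z is a placeholder)

# Measurement probability p(z | x): each row is a distribution over observations.
Z = np.full((n_x, n_z), 1.0 / n_z)

# "Reward"/payoff r(x, u): the same payoff set repeated for every control action.
R = np.tile(np.full((n_x, 1), -1.0), (1, n_u))

# Transition probability p(x' | x, u): one row-stochastic matrix per control,
# merged into a single list indexed by the control.
T = [np.eye(n_x) for _ in range(n_u)]
```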

  47. Example POMDP – Practical Considerations • It's getting kind of hard :S… probabilistic robot “RoboProb” • Setting the beliefs • Executing a control • Sensing • Executing a control

  48. Example POMDP – Practical Considerations • Now what…? Probabilistic robot “RoboProb” • Calculating the value function → the real problem is to compute p(b' | u, b)

  49. Example POMDP – Practical Considerations • The real problem is to compute p(b' | u, b) → the key factor in this update is this conditional probability • It specifies a distribution over probability distributions: given a belief b and a control action u, the outcome is a distribution over posterior beliefs b' • The posterior belief also depends on the next measurement, and the measurement itself is generated stochastically

  50. Example POMDP – Practical Considerations • So we condition on the measurement: p(b' | u, b) = Σ_z p(b' | u, b, z) p(z | u, b) • For a fixed b, u, and z the posterior belief is uniquely determined, so p(b' | u, b, z) contains only one non-zero term
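Since the posterior belief is unique once the measurement is fixed, p(b' | u, b) has one atom per observation. A sketch of enumerating it, assuming per-control transition matrices T[u] with entries p(x' | x, u) and a measurement matrix Z with entries p(z | x):

```python
import numpy as np

def next_belief_distribution(b, u, T, Z):
    """Return the pairs (p(z | u, b), posterior belief) that make up p(b' | u, b)."""
    b_bar = T[u].T @ b                       # prediction: sum_x p(x' | x, u) b(x)
    outcomes = []
    for z in range(Z.shape[1]):
        pz = float(Z[:, z] @ b_bar)          # probability of observing z
        if pz > 0.0:
            outcomes.append((pz, Z[:, z] * b_bar / pz))   # Bayes update of the belief
    return outcomes
```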
