
Unveiling POMDPs: Optimizing Agent's Decision Strategy

Delve into Partially Observable Markov Decision Processes (POMDPs) and learn to optimize decision-making strategies for agents in applications such as teaching, medicine, and industrial engineering. Explore techniques for solving POMDPs and maximizing reward by globally or locally optimizing key parameters. Gain insight into learning with a model, updating beliefs, and the complexity of POMDP value functions.


Presentation Transcript


  1. 10% probability we are wrong; 10% probability we misheard once; 1% probability we misheard twice (assuming independent mishearings, 0.1 × 0.1 = 0.01).

  2. Partially Observable Markov Decision Process (POMDP), by Sailesh Prabhu, Department of Computer Science, Rice University. Sources: Douglas Aberdeen, National ICT Australia, 2003; Anthony R. Cassandra, Leslie Kaelbling, and Michael Littman, NCAI 1995.

  3. Applications • Teaching • Medicine • Industrial Engineering

  4. Overview • Describe a Partially Observable Markov Decision Process (POMDP) • Consider the agent • Solve the POMDP the way we solved MDPs

  5. Describing a POMDP: how it extends an MDP with reward, control/action, and partial observability.
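
To keep the three ingredients straight, here is a minimal sketch of the POMDP as a data structure; the names T, O, R and the array shapes are illustrative assumptions, not the presentation's notation.

```python
# A hedged sketch of the POMDP ingredients; names and shapes are assumed.
from dataclasses import dataclass
import numpy as np

@dataclass
class POMDP:
    T: np.ndarray  # T[s, u, s']: transition probability P(s' | s, u)  (control/action)
    O: np.ndarray  # O[s, y]: observation probability P(y | s)         (partial observability)
    R: np.ndarray  # R[s, u]: expected reward for control u in state s (reward)
```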

  6. The Agent • Internal states, e.g. "I have a load" / "I don't have a load" • Parametrized policy: the probability of a control, given the parameter θ, the internal state, and the observation
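
As a concrete illustration, here is a minimal softmax sketch of such a policy; the table layout, function name, and the softmax form itself are assumptions, not the presentation's method.

```python
# Parametrized policy sketch: softmax over controls, scored by a table theta
# indexed by (internal state g, observation y, control u). All names assumed.
import numpy as np

def policy(theta, g, y):
    """Distribution over controls given internal state g and observation y."""
    scores = theta[g, y]               # one score per control
    e = np.exp(scores - scores.max())  # numerically stable softmax
    return e / e.sum()

# Example: 2 internal states ("I have a load" / "I don't have a load"),
# 2 observations, 2 controls.
rng = np.random.default_rng(0)
theta = rng.normal(size=(2, 2, 2))
u = rng.choice(2, p=policy(theta, g=0, y=1))  # sample a control
```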

  7. The Agent • Parametrized I-state transition: the probability of the future internal state, given the parameter Φ, the current internal state, and the observation (e.g. switching between "I have a load" and "I don't have a load")
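
The same softmax sketch carries over to the I-state transition; again the names and the softmax form are assumptions.

```python
# Parametrized I-state transition sketch: a table phi indexed by
# (current internal state g, observation y, future internal state h).
import numpy as np

def istate_transition(phi, g, y):
    """Distribution over the future internal state given current g and observation y."""
    scores = phi[g, y]                 # one score per future internal state
    e = np.exp(scores - scores.max())
    return e / e.sum()

rng = np.random.default_rng(1)
phi = rng.normal(size=(2, 2, 2))       # (g, y, h) score table
h = rng.choice(2, p=istate_transition(phi, g=0, y=1))
```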

  8. Recap The agent 1) updates its internal state and 2) acts, as sketched below.
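
Putting the two pieces together, a hedged sketch of one agent step; the update-then-act ordering and all names are assumptions.

```python
# One agent step: 1) update the internal state, 2) choose a control.
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

def agent_step(theta, phi, g, y, rng):
    g = rng.choice(phi.shape[-1], p=softmax(phi[g, y]))      # 1) update I-state
    u = rng.choice(theta.shape[-1], p=softmax(theta[g, y]))  # 2) act
    return g, u
```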

  9. Solve the POMDP • Globally or locally optimize θ and Φ • Maximize the long-term average reward: η(θ, Φ) = lim_{T→∞} (1/T) E[r_0 + r_1 + … + r_{T−1}] • Alternatively, maximize the discounted sum of rewards: J_β(θ, Φ) = E[Σ_t β^t r_t], 0 ≤ β < 1 • Suitably mixing: under suitable mixing conditions the two criteria agree in the limit, (1 − β) J_β → η as β → 1
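
A small numeric illustration of how the two criteria relate (the periodic reward stream below is a made-up stand-in): for a well-mixing stream, (1 − β)·J_β approaches the average reward as β → 1.

```python
# Made-up periodic reward stream with average 0.3: (1 - beta) * J_beta -> eta.
import numpy as np

rewards = np.tile([1.0, 0.0, 0.0, 0.2], 50_000)  # stand-in stream, mean 0.3
eta = rewards.mean()                             # long-term average reward

for beta in (0.9, 0.99, 0.999):
    J = np.sum(beta ** np.arange(len(rewards)) * rewards)  # discounted sum
    print(f"beta={beta}: (1 - beta) * J = {(1 - beta) * J:.4f}, eta = {eta:.4f}")
```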

  10. Learning with a Model • The agent knows the model: the transition probabilities, the observation probabilities, and the rewards • Observation/action history: the sequence of observations and controls seen so far • Belief state: a probability distribution over states given the history [Slide figure: a grid world with a Goal, annotated with belief values 1/3, 1/3, 1/3 and 1/2, 1/2, 1.]
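
A minimal sketch of what "knowing the model" might look like in code; the array names, shapes, and uniform placeholder numbers (echoing the slide's uniform 1/3 belief) are assumptions.

```python
# Known model: T[s, u, s'] = P(s' | s, u), O[s, y] = P(y | s), R[s, u] = reward.
import numpy as np

n_states, n_controls, n_obs = 3, 2, 2
T = np.full((n_states, n_controls, n_states), 1.0 / n_states)  # placeholder dynamics
O = np.full((n_states, n_obs), 1.0 / n_obs)                    # placeholder observations
R = np.zeros((n_states, n_controls))                           # placeholder rewards

belief = np.full(n_states, 1.0 / n_states)  # uniform initial belief: (1/3, 1/3, 1/3)
```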

  11. Learning with a Model • Update beliefs by Bayes' rule: b'(s') ∝ P(y | s') Σ_s P(s' | s, u) b(s) • Long-term value of a belief state • Define: V(b) as the expected long-term reward from acting optimally starting at belief b, V(b) = max_u [r(b, u) + γ Σ_y P(y | b, u) V(b')]
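
A hedged sketch of the Bayes belief update, matching the array conventions assumed above; the function name is illustrative.

```python
# Bayes update after taking control u and observing y:
# b'(s') ∝ O[s', y] * sum_s T[s, u, s'] * b(s)
import numpy as np

def update_belief(b, u, y, T, O):
    predicted = b @ T[:, u, :]          # predict: sum_s b(s) P(s' | s, u)
    unnormalized = O[:, y] * predicted  # correct: weight by P(y | s')
    return unnormalized / unnormalized.sum()
```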

  12. Finite-Horizon POMDP • The value function is piecewise linear and convex • Represent it as a maximum over a finite set of α-vectors: V(b) = max_{α ∈ Γ} α · b
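
A tiny sketch of that representation; the α-vectors below are made-up numbers for illustration.

```python
# Piecewise-linear convex value function: V(b) = max over alpha-vectors of <alpha, b>.
import numpy as np

Gamma = np.array([[1.0, 0.0, 0.5],   # one alpha-vector per row (made-up values)
                  [0.2, 0.9, 0.2],
                  [0.0, 0.3, 1.0]])

def value(b, Gamma):
    return float(np.max(Gamma @ b))  # linear in b on each piece, convex overall

print(value(np.array([1/3, 1/3, 1/3]), Gamma))
```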

  13. Complexity • The number of states is exponential in the number of state variables (n Boolean variables give 2^n states) • The number of reachable belief states grows exponentially with the horizon • Exact finite-horizon planning is PSPACE-hard • Even restricted versions of the problem are NP-hard
