10% Probability we are wrong

Presentation Transcript



10% Probability we are wrong

10% Probability we misheard once

1% Probability we misheard twice
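(As a quick check, assuming the two mishearings are independent, which is what the slide seems to imply: 0.10 × 0.10 = 0.01, i.e. a 1% probability of mishearing twice.)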



Douglas Aberdeen, National ICT Australia 2003

Anthony R. Cassandra, Leslie Kaelbling, and Michael Littman, NCAI 1995

Partially Observable Markov Decision Process (POMDP)

by Sailesh Prabhu

Department of Computer Science

Rice University



Applications

  • Teaching

  • Medicine

  • Industrial Engineering



Overview

  • Describe a Partially Observable Markov Decision Process (POMDP)

  • Consider the agent

  • Solve the POMDP like we solved MDPs


[Slide figure: an agent-environment diagram with the labels Reward, Partial Observability, and Control/Action; caption: "Describing an MDP using a POMDP: How?"]
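To make the comparison concrete, here is the standard POMDP tuple in common notation (these symbols are an assumption; the slide's own formulas are not recoverable from the transcript): an MDP is a tuple ⟨S, A, T, R⟩ with states S, controls/actions A, transition probabilities T(s' | s, a), and rewards R(s, a). A POMDP adds partial observability as ⟨S, A, T, R, Ω, O⟩, where Ω is a set of observations and O(o | s', a) is the probability of observing o after control a leads to state s'.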


The Agent

[Slide figure: the agent maintains an internal state, e.g. "I have a load" vs. "I don't have a load". A parametrized policy with parameter θ gives the probability of each control given the internal state and the current observation.]

Parametrized policy: probability of a control, given the parameter θ, the observation, and the internal state.
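One way to write the parametrized policy down, loosely following the internal-state policy-gradient formulation of the cited Aberdeen (2003) work (the exact symbols here are an assumption, not read off the slide):

μ(u | θ, g, y): the probability of issuing control u, given the parameter vector θ, the current internal state g (e.g. "I have a load" or "I don't have a load"), and the current observation y.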


The Agent

[Slide figure: in addition to the parametrized policy, the agent has a parametrized internal-state (I-state) transition with parameter Φ: the probability of the future internal state, given the current internal state (e.g. "I have a load" / "I don't have a load") and the observation.]

Parametrized policy: as on the previous slide.

Parametrized I-state transition: probability of the future internal state, given the parameter Φ, the observation, and the current internal state.
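Continuing the same assumed notation, the parametrized I-state transition can be sketched as

ω(h | Φ, g, y): the probability of moving from internal state g to internal state h, given the parameter vector Φ and the observation y.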



Recap

The agent 1) updates its internal state and 2) acts (a minimal code sketch of this loop follows below).
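A minimal Python sketch of that two-step loop for a stochastic finite-state controller. Everything here (class and variable names, the softmax parametrization, the two-I-state "load / no load" example) is an illustrative assumption, not code from the presentation.

import numpy as np

rng = np.random.default_rng(0)

class FiniteStateController:
    """Agent with a finite internal state: each step it 1) updates the I-state, then 2) acts."""

    def __init__(self, n_istates, n_obs, n_controls):
        # phi parametrizes P(next I-state | current I-state, observation)
        self.phi = rng.normal(size=(n_istates, n_obs, n_istates))
        # theta parametrizes P(control | I-state, observation)
        self.theta = rng.normal(size=(n_istates, n_obs, n_controls))
        self.g = 0  # current internal state

    @staticmethod
    def _softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    def step(self, obs):
        # 1) update the internal state (sample from the parametrized I-state transition)
        self.g = rng.choice(self.phi.shape[0], p=self._softmax(self.phi[self.g, obs]))
        # 2) act (sample a control from the parametrized policy)
        return rng.choice(self.theta.shape[2], p=self._softmax(self.theta[self.g, obs]))

# Usage: two internal states ("I have a load" / "I don't"), three observations, two controls.
agent = FiniteStateController(n_istates=2, n_obs=3, n_controls=2)
control = agent.step(obs=1)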



Solve POMDP

  • Globally or locally optimize θ and Φ

  • Maximize the long-term average reward, or

  • Alternatively, maximize the discounted sum of rewards

  • Under suitable mixing assumptions the two objectives are linked (all of these are sketched below)
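A sketch of these objectives in common notation (the symbols are assumptions; the slide's equations did not survive the transcript):

Long-term average reward:    η(θ, Φ) = lim_{T→∞} (1/T) E[ Σ_{t=0}^{T−1} r_t ]

Discounted sum of rewards:   J_β(θ, Φ) = E[ Σ_{t=0}^{∞} β^t r_t ],  with 0 ≤ β < 1

For a suitably mixing (ergodic) process the two are linked in the limit: (1 − β) J_β → η as β → 1.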



Learning with a Model

  • The agent knows the model: transition probabilities, observation probabilities, and rewards

  • Observation/action history: the sequence of controls issued and observations received so far

  • Belief state: a probability distribution over states given that history (illustrated in the figure below; a definition is sketched after it)

[Slide figure (belief-state example): candidate states marked with probabilities 1/3, 1/3, 1/3, then 1/2, 1/2, then 1, alongside a Goal state, illustrating how the belief sharpens as observations arrive.]
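A sketch of the belief state in common notation (symbols assumed): after the history h_t = (u_0, o_1, u_1, o_2, …, o_t) of controls and observations, the belief is b_t(s) = P(s_t = s | h_t), a probability distribution over the hidden states that summarizes everything the history tells the agent.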



Learning with a Model

  • Update beliefs after each control and observation (a Bayes-rule sketch follows below)

  • Long-term value of a belief state

  • Define the value function over belief states (also sketched below)
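Sketches of both items in common notation (symbols assumed, not from the slide):

Belief update after issuing control u and observing o:
b'(s') ∝ O(o | s', u) Σ_s T(s' | s, u) b(s),  normalized so that Σ_{s'} b'(s') = 1.

Long-term (discounted) value of a belief state:
V(b) = max_u [ Σ_s b(s) R(s, u) + β Σ_o P(o | b, u) V(b'_{u,o}) ],  where b'_{u,o} is the belief updated with (u, o).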



Finite Horizon POMDP

  • The value function is piecewise linear and convex

  • Represent it as a finite set of α-vectors (sketched below)
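A sketch of that representation in common notation (symbols assumed): at horizon t, keep a finite set Γ_t of |S|-dimensional α-vectors and evaluate

V_t(b) = max_{α ∈ Γ_t} Σ_s α(s) b(s),

which is exactly a piecewise-linear, convex function of the belief b.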



Complexity

  • Exponentially many states in the number of state variables

  • Exponentially many belief states

  • PSPACE-hard

  • NP-hard
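For example, with n binary state variables there are 2^n underlying states, and a belief state is a point in the (2^n − 1)-dimensional probability simplex over those states, which is why exact solution methods scale so poorly.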

