
POMDP



Presentation Transcript


  1. POMDP – Simple Example Andrey Andreyevich Markov

  2. POMDP – Instance general description Problem description: A robot is situated on the following course, with states S = { S1, S2, S3 }. However, it cannot tell in which state it starts.

  3. Instance parameters – The actions set The actions set: Move clockwise => am; Stay put => as; Activate colour sensor => ac. A = { am, as, ac }

  4. Instance parameters – The score function The score function: R(s, a) is the (immediate) reward for doing a ∈ A at state s. R(S2, as) = R(S3, as) = -1; R(S1, as) = +1. R(S1, ac) = R(S2, ac) = R(S3, ac) = -0.1. R(S1, am) = R(S2, am) = R(S3, am) = 0. * We will use γ = 0.5.
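The score function above fits in a small table. A minimal Python sketch (the dict layout and names `GAMMA` and `R` are illustrative, not from the slides):

```python
# The score (reward) function R(s, a) from this slide, encoded as a plain
# dict keyed by (state, action).
GAMMA = 0.5  # the discount factor γ used throughout the example

R = {
    ("S1", "as"): +1.0, ("S2", "as"): -1.0, ("S3", "as"): -1.0,
    ("S1", "ac"): -0.1, ("S2", "ac"): -0.1, ("S3", "ac"): -0.1,
    ("S1", "am"):  0.0, ("S2", "am"):  0.0, ("S3", "am"):  0.0,
}

# Staying put is rewarded only in S1; sensing always costs a little.
print(R[("S1", "as")], R[("S2", "ac")])   # 1.0 -0.1
```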

  5. Instance parameters – The transition function The transition function: Tr(s, a, s’) is the probability of reaching s’ from s using a. For am: Tr(S1, am, S2) = 0.9, Tr(S1, am, S1) = 0.1, Tr(S1, am, S3) = 0; Tr(S2, am, S3) = 0.9, Tr(S2, am, S2) = 0.1, Tr(S2, am, S1) = 0; Tr(S3, am, S1) = 0.9, Tr(S3, am, S3) = 0.1, Tr(S3, am, S2) = 0. For as and ac the state never changes: Tr(s, as, s) = Tr(s, ac, s) = 1 for every s, and all other entries are 0.
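The transition function can be sketched as nested dicts indexed `Tr[a][s][s']`, with a sanity check that each row is a probability distribution (the encoding is illustrative, not from the slides):

```python
# The transition function Tr(s, a, s') as nested dicts: Tr[a][s][s'].
# am moves clockwise with probability 0.9 (staying put with 0.1);
# as and ac never change the state.
STATES = ("S1", "S2", "S3")

Tr = {
    "am": {
        "S1": {"S1": 0.1, "S2": 0.9, "S3": 0.0},
        "S2": {"S1": 0.0, "S2": 0.1, "S3": 0.9},
        "S3": {"S1": 0.9, "S2": 0.0, "S3": 0.1},
    },
    "as": {s: {t: 1.0 if s == t else 0.0 for t in STATES} for s in STATES},
    "ac": {s: {t: 1.0 if s == t else 0.0 for t in STATES} for s in STATES},
}

# Sanity check: for each action and source state, probabilities sum to 1.
for a in Tr:
    for s in Tr[a]:
        assert abs(sum(Tr[a][s].values()) - 1.0) < 1e-9
```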

  6. Instance parameters – The observation set & function The observations set: Black observation => ob; Red observation => or. Ω = { ob, or }. The observation function: O(s, a, o) is the probability of observing o ∈ Ω after doing a and reaching s. For am and as: O(s, am, ob) = O(s, am, or) = O(s, as, ob) = O(s, as, or) = 0.5 for every s. For ac: O(S1, ac, or) = 0.8, O(S1, ac, ob) = 0.2; O(S2, ac, ob) = 0.8, O(S2, ac, or) = 0.2; O(S3, ac, ob) = 0.8, O(S3, ac, or) = 0.2.
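In the same spirit, the observation function can be written as nested dicts `O[a][s][o]` (again an illustrative encoding): only the colour sensor is informative, reading red in S1 and black in S2/S3 with probability 0.8.

```python
# The observation function O(s, a, o) as nested dicts: O[a][s][o].
# Moving (am) and staying (as) give an uninformative 50/50 reading;
# the colour sensor (ac) is right with probability 0.8.
STATES = ("S1", "S2", "S3")

O = {
    "am": {s: {"ob": 0.5, "or": 0.5} for s in STATES},
    "as": {s: {"ob": 0.5, "or": 0.5} for s in STATES},
    "ac": {
        "S1": {"ob": 0.2, "or": 0.8},
        "S2": {"ob": 0.8, "or": 0.2},
        "S3": {"ob": 0.8, "or": 0.2},
    },
}

# Each (action, state) pair yields a distribution over observations.
for a in O:
    for s in O[a]:
        assert abs(sum(O[a][s].values()) - 1.0) < 1e-9
```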

  7. Calculations – notes Notation – the belief state probabilities: We will use the following notations: P(s1) = p, P(s2) = q, P(s3) = 1 - p - q; b = { P(s1) = p, P(s2) = q } (short for (p, q, 1 - p - q)). Notation – the next belief state: We will use a shorthand notation for the resulting belief state (i.e. the next belief state) after applying action a at the belief state b and observing o. Start state: We start at the belief state bi = { P(s1) = ⅓, P(s2) = ⅓ }.

  8. 1-step policy – schema: Concrete policies: The belief state value function: The initial belief state value:

  9. The belief state value function (1-step)
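A 1-step policy has only the immediate reward to collect, so its belief-state value is V1(b) = max_a Σ_s b(s)·R(s, a). A minimal sketch (the helper name `v1` is illustrative):

```python
# Optimal 1-step value: with a single step left, the best a POMDP agent
# can do is maximise the immediate expected reward under its belief b.
R = {
    ("S1", "as"): +1.0, ("S2", "as"): -1.0, ("S3", "as"): -1.0,
    ("S1", "ac"): -0.1, ("S2", "ac"): -0.1, ("S3", "ac"): -0.1,
    ("S1", "am"):  0.0, ("S2", "am"):  0.0, ("S3", "am"):  0.0,
}

def v1(b):
    """Value and best action of the optimal 1-step policy at belief b."""
    values = {a: sum(b[s] * R[(s, a)] for s in b) for a in ("am", "as", "ac")}
    best = max(values, key=values.get)
    return values[best], best

# At the initial belief (1/3, 1/3, 1/3): as averages to -1/3, ac to -0.1,
# and am to 0, so moving (am) is 1-step optimal.
print(v1({"S1": 1/3, "S2": 1/3, "S3": 1/3}))   # (0.0, 'am')
```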

  10. 2-step policy – 1st round schema: Concrete policy (1 out of 27): The belief state value function: The initial belief state value:

  11. Calculations – The new belief state & observation probabilities We will calculate: a) b)

  12. Calculations – ob: new belief state & probability

  13. Calculations – ob: new belief state & probability (cont’d)

  14. Calculations – or: new belief state & probability

  15. Calculations – or: new belief state & probability (cont’d)
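The calculations on slides 11–15 are the standard Bayes belief update, b'(s) ∝ O(s, a, o) · Σ_{s'} Tr(s', a, s) · b(s'), with the normalising constant equal to P(o | b, a). Since ac leaves the state unchanged, the transition sum reduces to b(s). A self-contained sketch (the function name `update_ac` is illustrative):

```python
# Belief update after doing ac and observing o:
#   b'(s) ∝ O(s, ac, o) * b(s)    (ac has identity transitions)
# The normaliser is the observation probability P(o | b, ac).
O_ac = {"S1": {"ob": 0.2, "or": 0.8},
        "S2": {"ob": 0.8, "or": 0.2},
        "S3": {"ob": 0.8, "or": 0.2}}

def update_ac(b, o):
    """Return (next belief, P(o | b, ac)) after action ac, observation o."""
    unnorm = {s: O_ac[s][o] * b[s] for s in b}
    p_o = sum(unnorm.values())                 # P(o | b, ac)
    return {s: v / p_o for s, v in unnorm.items()}, p_o

b0 = {"S1": 1/3, "S2": 1/3, "S3": 1/3}         # initial belief
b_ob, p_ob = update_ac(b0, "ob")   # belief (1/9, 4/9, 4/9), P(ob) = 0.6
b_or, p_or = update_ac(b0, "or")   # belief (2/3, 1/6, 1/6), P(or) = 0.4
```

Seeing red (or) concentrates the belief on S1, which is exactly what makes sensing worth its -0.1 cost.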

  16. Calculations – The belief state value function

  17. 2-step policy – 1st round schema: Concrete policy (1 out of 27): The belief state value function: The initial belief state value:

  18. The belief state value function (2-step)
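As a concrete check of the 2-step value: consider the policy "first activate the sensor (ac), then take the best 1-step action for the updated belief", one of the 27 concrete 2-step policies. Its value at the initial belief is V2(b) = R(b, ac) + γ · Σ_o P(o | b, ac) · V1(b_ac^o), with R(b, ac) = -0.1 because ac costs 0.1 in every state. A sketch of that arithmetic, reusing the belief updates derived on the earlier calculation slides:

```python
GAMMA = 0.5
R = {
    ("S1", "as"): +1.0, ("S2", "as"): -1.0, ("S3", "as"): -1.0,
    ("S1", "ac"): -0.1, ("S2", "ac"): -0.1, ("S3", "ac"): -0.1,
    ("S1", "am"):  0.0, ("S2", "am"):  0.0, ("S3", "am"):  0.0,
}

def v1(b):
    # Optimal 1-step (immediate-reward-only) value at belief b.
    return max(sum(b[s] * R[(s, a)] for s in b) for a in ("am", "as", "ac"))

# Updated beliefs and observation probabilities after ac at b = (1/3, 1/3, 1/3):
b_ob, p_ob = {"S1": 1/9, "S2": 4/9, "S3": 4/9}, 0.6   # saw black
b_or, p_or = {"S1": 2/3, "S2": 1/6, "S3": 1/6}, 0.4   # saw red

# V2(b) = R(b, ac) + γ * [ P(ob) * V1(b_ob) + P(or) * V1(b_or) ]
v2 = -0.1 + GAMMA * (p_ob * v1(b_ob) + p_or * v1(b_or))
print(round(v2, 4))   # -0.0333
```

With these numbers, sensing first earns -0.1 + 0.5·(0.6·0 + 0.4·(1/3)) = -1/30 ≈ -0.033, slightly worse than the 0 earned by simply moving twice; the example shows how the γ-discounted value of information trades off against the sensing cost.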
