An Introduction to PO-MDP
Presented by Alp Sardağ

MDP
Components: State, Action, Transition, Reinforcement
Problem: choose the action that makes the right tradeoffs between the immediate rewards and the future gains, to yield the best possible solution.
Solution:
Q(x,a) ← Q(x,a) + α (r + γ max_b Q(y,b) − Q(x,a))
where α is the learning rate and γ is the discount rate.
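The update rule above can be sketched as a small tabular routine. This is only a minimal sketch: the function name `q_update`, the dictionary-based table, and the state and action labels are hypothetical choices made for illustration, and `alpha` and `gamma` stand for the learning rate and the discount rate.

```python
from collections import defaultdict


def q_update(Q, x, a, r, y, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update for the transition (x, a) -> y with reward r."""
    # Best value reachable from the next state y over all actions b.
    best_next = max(Q[(y, b)] for b in actions)
    # Move Q(x, a) toward the target r + gamma * max_b Q(y, b).
    Q[(x, a)] += alpha * (r + gamma * best_next - Q[(x, a)])
    return Q[(x, a)]


# Hypothetical two-action problem; unseen entries of the table default to 0.
Q = defaultdict(float)
actions = ["a1", "a2"]
new_value = q_update(Q, "s0", "a1", 1.0, "s1", actions, alpha=0.5, gamma=0.9)
```

With an all-zero table, the first update simply moves Q(x,a) a fraction `alpha` of the way toward the immediate reward.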
cbf: current belief state, a: action, o: observation
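The three quantities listed above are the inputs to a belief state update. The sketch below shows the standard Bayesian form of that update, not necessarily the exact formula from the slides. The nested-list models `T` and `O`, their layout, and all the numbers are assumptions made up for illustration.

```python
def update_belief(belief, action, observation, T, O):
    """Return the new belief after taking `action` and seeing `observation`.

    Assumed layout (hypothetical):
      T[action][s][s2]           = P(s2 | s, action)
      O[action][s2][observation] = P(observation | s2, action)
    """
    n = len(belief)
    # Predict the next-state distribution, then weight it by the observation likelihood.
    unnormalized = [
        O[action][s2][observation]
        * sum(T[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under this belief and action")
    # Normalize so the new belief sums to 1.
    return [p / total for p in unnormalized]


# Made-up two-state model with a single action "a" and two observations (0 and 1).
T = {"a": [[0.9, 0.1], [0.2, 0.8]]}
O = {"a": [[0.7, 0.3], [0.4, 0.6]]}
new_belief = update_belief([0.5, 0.5], "a", 0, T, O)
```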
GOAL: for each iteration of value iteration, find a finite number of linear segments that make up the value function.
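To illustrate what a value function made of linear segments looks like, here is a minimal sketch that evaluates one at a belief point. Each segment is stored as a vector with one coefficient per state, and the value of a belief is the largest dot product. The function names and the two segments are hypothetical values chosen for illustration, not results from the slides.

```python
def segment_value(segment, belief):
    """Dot product of one linear segment with a belief vector."""
    return sum(w * p for w, p in zip(segment, belief))


def value(belief, segments):
    """Value of a belief: the maximum over all linear segments."""
    return max(segment_value(seg, belief) for seg in segments)


def best_segment(belief, segments):
    """Index of the segment that attains the maximum at this belief."""
    return max(range(len(segments)), key=lambda i: segment_value(segments[i], belief))


# Two made-up segments over a two-state problem.
segments = [[1.0, 0.0], [0.0, 1.5]]
v = value([0.25, 0.75], segments)
winner = best_segment([0.25, 0.75], segments)
```

Because each segment is linear in the belief, the maximum over them is piecewise linear and convex, which is why a finite set of segments is enough to represent the value function at each iteration.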
a2 is the best
a1 is the best

PO-MDP Value Iteration Example
V(a1,b) = 0.25×1 + 0.75×0 = 0.25
0.6×0.8 + 0.25×0.7 + 0.15×1.2 = 0.835
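The last sum can be checked with a few lines of code. Reading 0.6, 0.25 and 0.15 as probabilities (for example, of three possible observations) and 0.8, 0.7 and 1.2 as the values they weight is an assumption, since the transcript does not label the terms. The function name `expected_value` is likewise hypothetical.

```python
def expected_value(weights, values):
    """Probability-weighted sum of values; the weights are expected to sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * v for w, v in zip(weights, values))


# Numbers taken from the line above; their interpretation is assumed.
weights = [0.6, 0.25, 0.15]
values = [0.8, 0.7, 1.2]
total = expected_value(weights, values)
```

Up to floating-point rounding, `total` matches the 0.835 shown in the example.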