
Probabilistic Robotics

Planning and Control:

Markov Decision Processes

Problem Classes
  • Deterministic vs. stochastic actions
  • Full vs. partial observability
Markov Decision Process (MDP)

[Figure: example MDP with five states s1–s5, per-state rewards r=20, r=1, r=0, r=0, and r=-10, and stochastic transitions with probabilities ranging from 0.01 to 0.99.]
Markov Decision Process (MDP)
  • Given:
    • States x
    • Actions u
    • Transition probabilities p(x'|u,x)
    • Reward / payoff function r(x,u)
  • Wanted:
    • Policy π(x) that maximizes the future expected reward (a small example encoding follows below)
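A minimal sketch of how the five-state example above could be encoded in Python, assuming discrete states and a single action per state; the action names and several transition probabilities are illustrative, since the exact structure of the diagram is not recoverable from the transcript, and per-state payoffs r(x) are used in place of the more general r(x,u).

    # Hypothetical encoding of a small discrete MDP (state/action names and some
    # probabilities are illustrative, not taken from the original diagram).
    # transitions[x][u] is a list of (next_state, probability) outcomes of action u in state x.
    transitions = {
        "s1": {"go": [("s2", 0.7), ("s3", 0.3)]},
        "s2": {"go": [("s2", 0.9), ("s4", 0.1)]},
        "s3": {"go": [("s3", 0.99), ("s5", 0.01)]},
        "s4": {"go": [("s1", 1.0)]},
        "s5": {"go": [("s1", 1.0)]},
    }
    # Per-state payoff r(x)
    rewards = {"s1": 20.0, "s2": 1.0, "s3": 0.0, "s4": 0.0, "s5": -10.0}

These dictionaries are reused by the value-iteration and policy-iteration sketches later in the transcript.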
Rewards and Policies
  • Policy (general case): maps the measurements and actions observed so far to the next action
  • Policy (fully observable case): maps the current state to the next action
  • Expected cumulative payoff (reconstructed below):
    • T=1: greedy policy
    • T>1: finite-horizon case, typically no discount
    • T=∞: infinite-horizon case, finite reward if the discount factor γ < 1
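The equations on this slide did not survive the transcript; the standard definitions, written in the notation used throughout (measurements z, controls u, states x, discount factor γ), are:

    % General policy: choose the next control from all data gathered so far
    \pi : z_{1:t-1}, u_{1:t-1} \;\rightarrow\; u_t
    % Fully observable case: the state is known, so the policy maps states to controls
    \pi : x_t \;\rightarrow\; u_t
    % Expected cumulative (discounted) payoff over horizon T
    R_T = E\left[ \sum_{\tau=1}^{T} \gamma^{\tau} \, r_{t+\tau} \right]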
Policies contd.
  • Expected cumulative payoff of policy:
  • Optimal policy:
  • 1-step optimal policy:
  • Value function of 1-step optimal policy:
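The four formulas referenced on this slide are likewise missing; their standard forms are:

    % Expected cumulative payoff when actions are chosen by policy pi
    R_T^{\pi}(x_t) = E\left[ \sum_{\tau=1}^{T} \gamma^{\tau} r_{t+\tau} \;\middle|\; u_{t+\tau} = \pi(z_{1:t+\tau-1}, u_{1:t+\tau-1}) \right]
    % Optimal policy: maximizes the expected cumulative payoff
    \pi^{*} = \operatorname{argmax}_{\pi} \, R_T^{\pi}(x_t)
    % 1-step optimal policy: act greedily on the immediate payoff
    \pi_1(x) = \operatorname{argmax}_{u} \, r(x,u)
    % Value function of the 1-step optimal policy
    V_1(x) = \gamma \, \max_{u} \, r(x,u)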
2-step Policies
  • Optimal policy:
  • Value function:
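Reconstructing the missing equations: the 2-step policy is greedy with respect to the immediate payoff plus the expected 1-step value of the successor state.

    % 2-step optimal policy
    \pi_2(x) = \operatorname{argmax}_{u} \left[ r(x,u) + \int V_1(x') \, p(x' \mid u, x) \, dx' \right]
    % 2-step value function
    V_2(x) = \gamma \, \max_{u} \left[ r(x,u) + \int V_1(x') \, p(x' \mid u, x) \, dx' \right]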
T-step Policies
  • Optimal policy:
  • Value function:
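The same recursion holds for an arbitrary horizon T, each step building on the (T-1)-step value function:

    % T-step optimal policy
    \pi_T(x) = \operatorname{argmax}_{u} \left[ r(x,u) + \int V_{T-1}(x') \, p(x' \mid u, x) \, dx' \right]
    % T-step value function
    V_T(x) = \gamma \, \max_{u} \left[ r(x,u) + \int V_{T-1}(x') \, p(x' \mid u, x) \, dx' \right]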
Infinite Horizon
  • Optimal policy:
  • Bellman equation
  • Its fixed point is the value function of the optimal policy
  • Satisfying the Bellman equation is a necessary and sufficient condition for an optimal value function
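The Bellman equation referred to here, written in the same notation:

    % Bellman equation: the optimal value function is its fixed point (finite for gamma < 1)
    V(x) = \gamma \, \max_{u} \left[ r(x,u) + \int V(x') \, p(x' \mid u, x) \, dx' \right]
    % The optimal policy acts greedily with respect to V
    \pi(x) = \operatorname{argmax}_{u} \left[ r(x,u) + \int V(x') \, p(x' \mid u, x) \, dx' \right]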
Value Iteration
  • for all x do
    • V̂(x) ← r_min (initialize with the minimum possible payoff)
  • endfor
  • repeat until convergence
    • for all x do
      • V̂(x) ← γ max_u [ r(x,u) + ∫ V̂(x') p(x'|u,x) dx' ]
    • endfor
  • endrepeat
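A runnable Python sketch of this loop for a finite MDP, assuming the hypothetical transitions/rewards dictionaries from the earlier example (per-state payoff r(x), so r(x,u) reduces to r(x)); the function name and the convergence threshold are illustrative.

    def value_iteration(transitions, rewards, gamma=0.9, eps=1e-6):
        """Finite-state value iteration; returns the value function and a greedy policy."""
        V = {x: min(rewards.values()) for x in transitions}  # initialize every state with r_min
        while True:
            delta = 0.0
            for x in transitions:
                # Backup: V(x) <- gamma * max_u [ r(x) + sum_x' V(x') p(x'|u,x) ]
                best = max(
                    rewards[x] + sum(p * V[x2] for x2, p in outcomes)
                    for outcomes in transitions[x].values()
                )
                new_v = gamma * best
                delta = max(delta, abs(new_v - V[x]))
                V[x] = new_v
            if delta < eps:  # converged: largest update fell below the threshold
                break
        # Extract the policy that is greedy with respect to the converged value function
        policy = {
            x: max(
                transitions[x],
                key=lambda u: rewards[x] + sum(p * V[x2] for x2, p in transitions[x][u]),
            )
            for x in transitions
        }
        return V, policy

For example, value_iteration(transitions, rewards) applied to the dictionaries sketched earlier returns the converged values and the greedy action for each of the five states.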
Value Function and Policy Iteration
  • Often the optimal policy has been reached long before the value function has converged.
  • Policy iteration calculates a new policy based on the current value function and then calculates a new value function based on this policy.
  • This process often converges to the optimal policy faster than plain value iteration (a sketch follows below).
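A minimal policy-iteration sketch under the same assumptions as the value-iteration code above (finite states, per-state payoffs, the hypothetical transitions/rewards dictionaries); policy evaluation is done by simple iterative sweeps rather than by solving the linear system exactly.

    def policy_iteration(transitions, rewards, gamma=0.9, eps=1e-6):
        """Alternate policy evaluation and greedy improvement until the policy is stable."""
        # Start from an arbitrary policy: the first listed action in each state
        policy = {x: next(iter(transitions[x])) for x in transitions}
        V = {x: 0.0 for x in transitions}
        while True:
            # Policy evaluation: sweep V(x) <- gamma * (r(x) + sum_x' V(x') p(x'|policy(x),x))
            while True:
                delta = 0.0
                for x in transitions:
                    new_v = gamma * (rewards[x] + sum(p * V[x2] for x2, p in transitions[x][policy[x]]))
                    delta = max(delta, abs(new_v - V[x]))
                    V[x] = new_v
                if delta < eps:
                    break
            # Policy improvement: act greedily with respect to the evaluated value function
            stable = True
            for x in transitions:
                best_u = max(
                    transitions[x],
                    key=lambda u: rewards[x] + sum(p * V[x2] for x2, p in transitions[x][u]),
                )
                if best_u != policy[x]:
                    policy[x] = best_u
                    stable = False
            if stable:  # no state changed its action, so the policy has converged
                return V, policy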