Hierarchical Reinforcement Learning
Mausam
A Survey and Comparison of HRL Techniques

Outline of the Talk:
MDPs and Bellman's curse of dimensionality
RL: simultaneous learning and planning
Explore avenues to speed up RL
Illustrate prominent HRL methods
The core question of sequential decision making: what action next? (Slide courtesy Dan Weld.)
Episodic MDP ≡ MDP with absorbing goals
Find a policy π: S → A that maximizes expected discounted reward.
|S| is exponential in the number of features in the domain: Bellman's curse of dimensionality.
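To make the planning problem concrete, here is a minimal sketch of solving a small tabular MDP by value iteration; the arrays P and R and all names are illustrative, not from the talk.

    import numpy as np

    def value_iteration(P, R, gamma=0.95, tol=1e-6):
        # Tabular value iteration for a small MDP.
        # P: shape (S, A, S), transition probabilities P[s, a, s'].
        # R: shape (S, A), expected immediate reward R[s, a].
        S, A = R.shape
        V = np.zeros(S)
        while True:
            # Bellman backup: Q[s,a] = R[s,a] + gamma * sum_s' P[s,a,s'] V[s']
            Q = R + gamma * (P @ V)
            V_new = Q.max(axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                return V_new, Q.argmax(axis=1)  # optimal V and greedy policy S -> A
            V = V_new

The exponential blow-up in |S| is exactly why this direct tabular approach breaks down, which motivates RL and the hierarchical methods below.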
Decision Making while Learning*
*Known as Reinforcement Learning
The optimal policy chooses, in each state, the action with maximum Q* value.
SMDP Q-learning update:
Q(s,a) ← (1 − α) Q(s,a) + α (r + γ^N max_a' Q(s',a'))
Q(s,a) on the right is the old estimate of the Q value; r + γ^N max_a' Q(s',a') is the new estimate,
where the experience tuple is ⟨s, a, s', r, N⟩, N is the number of steps action a took to complete, and
r = accumulated discounted reward while action a was executing.
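A minimal sketch of that update, assuming a dictionary-backed Q table; the function name and arguments are illustrative:

    def smdp_q_update(Q, s, a, s_prime, r, N, actions, alpha=0.1, gamma=0.95):
        # One SMDP Q-learning backup from the experience tuple <s, a, s', r, N>.
        # r: discounted reward accumulated while a was executing.
        # N: number of primitive time steps a took to complete.
        best_next = max(Q[(s_prime, a2)] for a2 in actions)   # max_a' Q(s', a')
        new_estimate = r + (gamma ** N) * best_next
        Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * new_estimate

With N = 1 this reduces to ordinary Q-learning, which is why temporally extended actions slot into the same learning machinery.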
Gated behaviours: g is a gate, b_i is a behaviour.
*Can be a multi-…
*Can be a policy over lower level behaviours
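A minimal sketch of the gate/behaviour structure the slide names, assuming the gate switches among lower-level behaviours; every name here is illustrative:

    def gate(state, behaviours):
        # The gate g picks which behaviour b_i controls the agent now.
        # Hand-coded here; per the slide, it can itself be a policy.
        return behaviours["avoid"] if state["obstacle_ahead"] else behaviours["move"]

    behaviours = {
        "move":  lambda state: "forward",    # b1: head down the hallway
        "avoid": lambda state: "turn_left",  # b2: collision avoidance
    }

    state = {"obstacle_ahead": False}
    action = gate(state, behaviours)(state)  # select a behaviour, then ask it for an action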
Machine: Move + Collision Avoidance.
[Machine diagram: the machine's Return state is triggered at the end of the hallway.]
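A sketch of such a machine in the spirit of HAMs (Parr & Russell), where machine states are actions, calls, choices, and returns; the env interface and all names are assumptions for illustration:

    def collision_avoidance(env):
        # Sub-machine: sidestep until the path ahead is clear.
        while env.obstacle_ahead():
            env.execute("sidestep")        # action state
        # exiting the loop is this machine's Return state

    def move_machine(env, choose):
        # Top-level machine: traverse the hallway, calling the
        # collision-avoidance machine whenever an obstacle appears.
        while not env.at_end_of_hallway():  # Return state triggers here
            if env.obstacle_ahead():
                collision_avoidance(env)    # call state
            else:
                env.execute(choose(env))    # choice state, resolved by a
                                            # learned policy over actions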
Children of a task are unordered
V([s,n]): reward received while navigating (while subtask n executes).
C([s,m],n): reward received after navigation (after n completes, within parent task m).
Q([s,m], n) = V([s,n]) + C([s,m], n) + Cex([s,m])
*Can define Bellman equations for both.
**Adv. of using …
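A minimal sketch of evaluating that three-part decomposition with tabular estimates. The dictionary layout is an assumption, and reading Cex as the completion value external to the current task follows ALisp-style decompositions rather than anything stated on the slide:

    def q_value(V, C, Cex, s, m, n):
        # Q([s,m], n) = V([s,n]) + C([s,m], n) + Cex([s,m])
        # V:   reward while subtask n executes
        # C:   reward after n completes, within parent task m
        # Cex: reward after parent task m itself completes (assumed reading)
        return V[(s, n)] + C[(s, m, n)] + Cex[(s, m)]

    def greedy_subtask(V, C, Cex, s, m, children):
        # Pick the child subtask with the largest decomposed Q value.
        return max(children, key=lambda n: q_value(V, C, Cex, s, m, n))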
Images courtesy various sources
"... consider maze domains. Reinforcement learning researchers, including this author, have spent countless years of research solving a solved problem! Navigating in grid worlds, even with stochastic dynamics, has been far from rocket science since the advent of search techniques such as A*.” -- David Andre