
Space-Indexed Dynamic Programming: Learning to Follow Trajectories

J. Zico Kolter, Adam Coates, Andrew Y. Ng, Yi Gu, Charles DuHadway. Computer Science Department, Stanford University. July 2008, ICML.


Presentation Transcript


  1. Space-Indexed Dynamic Programming: Learning to Follow Trajectories J. Zico Kolter, Adam Coates, Andrew Y. Ng, Yi Gu, Charles DuHadway Computer Science Department, Stanford University July 2008, ICML

  2. Outline • Reinforcement Learning and Following Trajectories • Space-indexed Dynamical Systems and Space-indexed Dynamic Programming • Experimental Results

  3. Reinforcement Learning and Following Trajectories

  4. Trajectory Following • Consider task of following trajectory in a vehicle such as a car or helicopter • State space too large to discretize, can’t apply tabular RL/dynamic programming

  5. Trajectory Following • Dynamic programming algorithms w/ non-stationary policies seem well-suited to the task • Policy Search by Dynamic Programming (Bagnell et al.), Differential Dynamic Programming (Jacobson and Mayne)

  6. Dynamic Programming t=1 Divide control task into discrete time steps

  7. Dynamic Programming t=1 t=2 Divide control task into discrete time steps

  8. Dynamic Programming t=4 t=5 t=3 t=1 t=2 Divide control task into discrete time steps

  9. Dynamic Programming t=4 t=5 t=3 t=1 t=2 Proceeding backwards in time, learn policies for t = T, T-1, …, 2, 1

  10. Dynamic Programming t=4 t=5 t=3 t=1 t=2 Proceeding backwards in time, learn policies for t = T, T-1, …, 2, 1

  11. Dynamic Programming t=4 t=5 t=3 t=1 t=2 Proceeding backwards in time, learn policies for t = T, T-1, …, 2, 1

  12. Dynamic Programming t=4 t=5 t=3 t=1 t=2 Proceeding backwards in time, learn policies for t = T, T-1, …, 2, 1

  13. Dynamic Programming t=4 t=5 t=3 t=1 t=2 Key Advantage: Policies are local (each only needs to perform well over a small portion of the state space)
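
Slides 9-13 describe the backward pass only pictorially. Below is a minimal sketch of that backward-in-time pass, assuming an illustrative one-step simulator step(state, action), per-step cost cost(state, action), a state sampler sample_states(t), and a finite action set; none of these names come from the paper's code.

```python
def learn_policies(T, sample_states, actions, step, cost):
    """Learn a separate policy for each time step, proceeding t = T, T-1, ..., 1."""
    policies = [None] * (T + 1)              # policies[t]: state -> action (1-indexed)

    def cost_to_go(state, t):
        """Cost of following the already-learned policies from time t onward."""
        total = 0.0
        for k in range(t, T + 1):
            a = policies[k](state)
            total += cost(state, a)
            state = step(state, a)
        return total

    for t in range(T, 0, -1):                # proceed backwards in time
        data = []
        for s in sample_states(t):           # states drawn from the distribution at time t
            # One-step lookahead against the cost-to-go of the later (already-learned) policies
            best = min(actions,
                       key=lambda a: cost(s, a) + cost_to_go(step(s, a), t + 1))
            data.append((s, best))
        # Fit any supervised learner to the (state, best action) pairs;
        # here a simple nearest-example lookup stands in for it.
        policies[t] = lambda state, data=data: min(
            data, key=lambda pair: squared_dist(pair[0], state))[1]
    return policies

def squared_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

The policy for step t is fit only on states sampled at step t, which is the "local policies" advantage highlighted on slide 13.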

  14. Problems with Dynamic Programming Problem #1: Policies from traditional dynamic programming algorithms are time-indexed

  15. Problems with Dynamic Programming Suppose we learned the policy assuming this distribution over states

  16. Problems with Dynamic Programming But, due to natural stochasticity of environment, car is actually here at t = 5

  17. Problems with Dynamic Programming Resulting policy will perform very poorly

  18. Problems with Dynamic Programming Partial Solution: Re-indexing. Execute the policy learned for the location closest to the current state, regardless of time
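
A minimal sketch of the re-indexing heuristic, assuming each time-indexed policy is stored alongside the nominal trajectory position it was learned around (illustrative names, not the paper's code):

```python
def reindexed_action(position, state, policies, nominal_positions):
    """Re-indexing: pick the policy learned for the nominal trajectory point
    closest to the vehicle's current position, then apply it to the full
    current state, regardless of the current time step."""
    closest_t = min(range(len(policies)),
                    key=lambda t: sum((a - b) ** 2
                                      for a, b in zip(nominal_positions[t], position)))
    return policies[closest_t](state)
```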

  19. Problems with Dynamic Programming Problem #2: Uncertainty over future states makes it hard to learn any good policy

  20. Problems with Dynamic Programming Dist. over states at time t = 5 Due to stochasticity, large uncertainty over states in distant future

  21. Problems with Dynamic Programming Dist. over states at time t = 5 DP algorithms require learning policy that performs well over entire distribution

  22. Space-Indexed Dynamic Programming • Basic idea of Space-Indexed Dynamic Programming (SIDP): Perform DP with respect to space indices (planes tangent to trajectory)

  23. Space-Indexed Dynamical Systems and Dynamic Programming

  24. Difficulty with SIDP • No guarantee that taking single action will move to next plane along trajectory • Introduce notion of space-indexed dynamical system

  25. Time-Indexed Dynamical System • Creating time-indexed dynamical systems:

  26. Time-Indexed Dynamical System • Creating time-indexed dynamical systems: current state

  27. Time-Indexed Dynamical System • Creating time-indexed dynamical systems: current state, control action

  28. Time-Indexed Dynamical System • Creating time-indexed dynamical systems: current state, control action, time derivative of state

  29. Time-Indexed Dynamical System • Creating time-indexed dynamical systems: Euler integration
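
The equations on slides 25-29 are images and do not survive in the transcript. A standard reconstruction, assuming the usual notation with state s, control action u, dynamics f, and time step Δt (not copied from the slides), is:

```latex
% Time-indexed dynamical system (standard form; notation assumed)
\dot{s} = f(s, u)
    % f maps the current state s and control action u to the time derivative of the state
s_{t+1} = s_t + f(s_t, u_t)\,\Delta t
    % Euler integration with time step \Delta t
```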

  30. Space-Indexed Dynamical Systems • Creating space-indexed dynamical systems: • Simulate forward until the vehicle hits the next tangent plane (from space index d to space index d+1)

  31. Space-Indexed Dynamical Systems • Creating space-indexed dynamical systems (space index d to space index d+1):

  32. Space-Indexed Dynamical Systems • Creating space-indexed dynamical systems (space index d to space index d+1): • A positive solution for the plane-crossing time exists as long as the controller makes some forward progress

  33. Space-Indexed Dynamical Systems • Result is a dynamical system indexed by spatial-index variable d rather than time • Space-indexed dynamic programming runs DP directly on this system
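
A minimal sketch of one space-indexed transition as described on slides 30-33, assuming an illustrative time-derivative function f(state, action) and a tangent plane given by a point and unit normal; the integration scheme and state layout are assumptions, not the paper's code:

```python
import numpy as np

def space_indexed_step(state, action, f, plane_point, plane_normal,
                       dt=0.01, max_steps=10000):
    """Simulate forward in small Euler sub-steps until the vehicle's position
    crosses the tangent plane at the next space index; return the state there."""
    def position(s):
        return s[:2]                           # assume the first two state entries are x, y
    for _ in range(max_steps):
        signed_dist = np.dot(position(state) - plane_point, plane_normal)
        if signed_dist >= 0.0:                 # reached or crossed the plane at index d+1
            return state
        state = state + f(state, action) * dt  # Euler sub-step of the time-indexed dynamics
    raise RuntimeError("controller made no forward progress toward the next plane")
```

The guard on max_steps reflects slide 32's caveat: the crossing time has a positive solution only if the controller makes some forward progress toward the next plane.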

  34. Space-Indexed Dynamic Programming d=1 Divide trajectory into discrete space planes

  35. Space-Indexed Dynamic Programming d=1 d=2 Divide trajectory into discrete space planes

  36. Space-Indexed Dynamic Programming d=4 d=5 d=3 d=1 d=2 Divide trajectory into discrete space planes

  37. Space-Indexed Dynamic Programming d=4 d=5 d=3 d=1 d=2 Proceeding backwards, learn policies for d = D, D-1, …, 2, 1

  38. Space-Indexed Dynamic Programming d=4 d=5 d=3 d=1 d=2 Proceeding backwards, learn policies for d = D, D-1, …, 2, 1

  39. Space-Indexed Dynamic Programming d=4 d=5 d=3 d=1 d=2 Proceeding backwards, learn policies for d = D, D-1, …, 2, 1

  40. Space-Indexed Dynamic Programming d=4 d=5 d=3 d=1 d=2 Proceeding backwards, learn policies for d = D, D-1, …, 2, 1

  41. Problems with Dynamic Programming Problem #1: Policies from traditional dynamic programming algorithms are time-indexed

  42. Space-Indexed Dynamic Programming Space-indexed DP: always executes the policy for the current spatial index. Time-indexed DP: can execute a policy learned for a different location.

  43. Problems with Dynamic Programming Problem #2: Uncertainty over future states makes it hard to learn any good policy

  44. Space-Indexed Dynamic Programming Dist. over states at time t = 5 Dist. over states at index d = 5 Space indexed DP: much tighter distribution over future states Time indexed DP: wide distribution over future states

  45. Space-Indexed Dynamic Programming Dist. over states at time t = 5 Dist. over states at index d = 5 Space indexed DP: much tighter distribution over future states Time indexed DP: wide distribution over future states

  46. Experiments

  47. Experimental Domain Task: following a race track trajectory with an RC car, with randomly placed obstacles

  48. Experimental Setup • Implemented a space-indexed version of the PSDP algorithm • Policy chooses the steering angle using an SVM classifier (constant velocity) • Used a simple textbook model simulator of the car dynamics to learn the policy • Evaluated time-indexed PSDP, time-indexed PSDP with re-indexing, and space-indexed PSDP
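
A minimal sketch of how one per-space-index steering policy might be fit with an SVM classifier, per the setup above; the feature representation, steering-angle grid, kernel, and scoring function are illustrative assumptions, not details from the paper:

```python
import numpy as np
from sklearn.svm import SVC

STEERING_ANGLES = np.linspace(-0.5, 0.5, 9)        # assumed discrete steering grid (radians)

def fit_steering_policy(sampled_states, score_action):
    """Fit the steering policy for one space index (velocity held constant).

    sampled_states: state vectors drawn at this space index.
    score_action(state, angle): cost of steering by `angle` here and then following
    the already-learned policies at later space indices (PSDP-style backward pass).
    """
    X, y = [], []
    for s in sampled_states:
        costs = [score_action(s, angle) for angle in STEERING_ANGLES]
        X.append(s)
        y.append(int(np.argmin(costs)))            # label = index of the cheapest steering angle
    if len(set(y)) == 1:                           # SVC needs at least two classes to fit
        return lambda state, a=STEERING_ANGLES[y[0]]: a
    clf = SVC(kernel="rbf").fit(np.array(X), np.array(y))
    return lambda state: STEERING_ANGLES[int(clf.predict(np.array(state).reshape(1, -1))[0])]
```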

  49. Time-Indexed PSDP

  50. Time-Indexed PSDP w/ Re-indexing
