
LEAP Algorithm Reinforcement Learning with Adaptive Partitioning



  1. LEAP Algorithm: Reinforcement Learning with Adaptive Partitioning — Tsufit Cohen, Eyal Radiano. Supervised by: Andrey Bernstein

  2. Agenda • Intro • Q-learning • LEAP Algorithm • Simulation • LEAP vs Q-Learn • Conclusions

  3. Intro • Reinforcement Learning • Learn an optimal policy by trial and error • Reward for “good” steps • Performance improves as the agent interacts with its environment [Figure: agent–environment interaction loop]
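The trial-and-error loop the slide describes can be sketched as follows; the toy environment and all names here are illustrative, not taken from the project:

```python
# A minimal sketch of the reinforcement-learning interaction loop:
# the agent tries actions, the environment returns a reward, and the
# agent's goal is to accumulate as much reward as possible.

def run_episode(env_step, choose_action, n_steps):
    """Accumulate reward over one episode of trial and error."""
    state, total_reward = 0, 0
    for _ in range(n_steps):
        action = choose_action(state)
        state, reward = env_step(state, action)
        total_reward += reward
    return total_reward

# Toy environment (an assumption, for illustration): the agent walks
# along a line; reaching cell 5 pays +2, every other step costs -1.
def env_step(state, action):
    new_state = state + action
    return new_state, (2 if new_state == 5 else -1)

total = run_episode(env_step, choose_action=lambda s: 1, n_steps=5)  # -> -2
```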

  4. Q-learn • Definitions: • Key specification: table representation of Q • [Speaker note: write the Q formula and the definitions from page 6 of the paper, so we can explain what Q is] • Exploration policy: ε-greedy • [Speaker note: maybe split this into two slides]
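The speaker note asks for the Q formula; for completeness, the standard one-step tabular Q-learning update (the quantity the table representation stores) is

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \big]$$

where $\alpha$ is the learning rate and $\gamma$ the discount factor. The ε-greedy exploration policy takes a uniformly random action with probability $\varepsilon$ and $\arg\max_a Q(s, a)$ otherwise.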

  5. LEAP — Learning Entity (LE) Adaptive Partitioning • Key specifications: • Macro states • Multi-partitioning (each partition is called an LE) • Pruning and joining

  6. Algorithm • Action Selection • Incoherence Criterion • JLE Generation • Update • Pruning Mechanism
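The first step, action selection, might be sketched as below. This is a hedged schematic, not the paper's exact rule: it assumes every LE whose partition contains the current state is "active" and contributes the Q-value of the macro state covering that state, with contributions summed per action.

```python
# Schematic sketch of LEAP-style action selection across multiple
# partitions (LEs). The aggregation rule (a plain sum) is an assumption
# for illustration; the precise rule is defined in the LEAP paper.

def active_les(state, les):
    """Return the LEs whose partitions contain this state."""
    return [le for le in les if state in le["q"]]

def select_action(state, les, actions):
    """Pick the action with the highest Q-value summed over active LEs."""
    def agg_q(a):
        return sum(le["q"][state][a] for le in active_les(state, les))
    return max(actions, key=agg_q)

# Toy example: two basic LEs (think of the X and Y partitions of a
# grid), each storing per-macro-state Q-values for two actions.
le_x = {"q": {"s0": {"up": 0.1, "right": 0.4}}}
le_y = {"q": {"s0": {"up": 0.3, "right": 0.2}}}

action = select_action("s0", [le_x, le_y], ["up", "right"])  # -> "right"
```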

  7. Action Selection

  8. Action Selection ( Cont. )

  9. Incoherence Criterion

  10. JLE Generation

  11. Update

  12. Pruning Mechanism

  13. Changes and Add-ons to the Algorithm • Changed the order of pruning and updating • The ε-greedy policy starts from ε = 0.9 • Boundary condition: Q = 0 at the end of a game.

  14. Implementation • Key operations: • Finding the active LE list for a given state • Finding a macro state within an LE • Adding/removing a JLE and/or a macro state • Data structures: • LE (base class): CList<macrostate> Macro_list, int* ID_arr_, int order • Basic LE (derives from LE): CList<JLE>* Sons_lists_arr • JLE: inherits from LE
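The slide's C++ data structures might be sketched in Python roughly as follows; field names follow the slide, while the container types and constructor signatures are assumptions for illustration:

```python
# Sketch of the slide's LE class hierarchy (original is C++ with CList).

class LE:
    """A Learning Entity: one partition, holding macro states."""
    def __init__(self, id_arr, order):
        self.macro_list = []   # Macro_list: macro states in this partition
        self.id_arr = id_arr   # ID_arr_: IDs of the basic LEs it involves
        self.order = order     # 1 for a basic LE, >1 for a JLE

class BasicLE(LE):
    """Order-1 LE that also indexes its joint LEs (JLEs) by order."""
    def __init__(self, le_id, max_order):
        super().__init__([le_id], order=1)
        # Sons_lists_arr: one list of descendant JLEs per higher order
        self.sons_lists = [[] for _ in range(max_order - 1)]

class JLE(LE):
    """Joint LE, generated when basic LEs are incoherent on a region."""
    pass

# Example mirroring the 3D grid world slide: basic LEs X and Y share
# the order-2 joint LE "XY".
x, y = BasicLE("X", max_order=3), BasicLE("Y", max_order=3)
xy = JLE(["X", "Y"], order=2)
x.sons_lists[0].append(xy)
y.sons_lists[0].append(xy)
```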

  15. General Data Structure Implementation • Basic LE array: Basic LE 1, Basic LE 2, Basic LE 3 • Basic LE 1, magnified: macro list, ID array, order • Sons list array: pointer to the JLE list of order 1 (empty), pointer to the JLE list of order 2, pointer to the JLE list of order 3

  16. 3D Grid World Implementation Example • Basic LE array: Basic LE X, Basic LE Y, Basic LE Z • Each basic LE holds a sons list array (slots 0–2) pointing to its JLEs: JLE XY, JLE XZ, JLE YZ, JLE XYZ

  17. Simulation 1 – 2D Grid World • Environment properties: • Size: 20x20 • Step cost: -1 • Award: +2 • Available moves: up, down, left, right • Wall bumping – no movement • Award taking – starts a new episode • Basic LEs: X, Y [Figure: the grid with its start point and prize, partitioned by x and by y]
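The environment properties above can be sketched as a single step function; the coordinate convention and function names are assumptions, while the numeric rewards and rules come from the slide:

```python
# Minimal sketch of the slide's 2D grid world: 20x20, step cost -1,
# award +2 at the prize cell, wall bumping leaves the agent in place,
# and taking the award ends the episode.

SIZE = 20
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def step(pos, move, prize):
    """One environment step; returns (new_pos, reward, episode_done)."""
    dx, dy = MOVES[move]
    nx, ny = pos[0] + dx, pos[1] + dy
    if not (0 <= nx < SIZE and 0 <= ny < SIZE):
        nx, ny = pos                 # wall bumping: no movement
    if (nx, ny) == prize:
        return (nx, ny), 2, True     # award taking: start a new episode
    return (nx, ny), -1, False

pos, r, done = step((0, 0), "left", prize=(19, 19))  # bumps the wall
```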

  18. Simulation 1 Results – Policy [Figure: the learned policy on the grid, from start to prize]

  19. Results – Average Reward and Refined Macro State Count

  20. Simulation 2 – Grid World with an Obstacle • Environment properties: • Size: 5x5 • Step cost: -1 • Award: +2 • Obstacle: -3 [Figure: the grid with its start point and prize]

  21. Simulation 2 – Grid World with an Obstacle (cont.)

  22. Simulation 2 Results • Note: the policy changes due to the ε-greedy exploration [Figure: the learned policy from the start state]

  23. LEAP vs Q-Learn

  24. Conclusions • Memory reduction • Implementation complexity • Deviation from the optimal policy

  25. Questions ?
