
Reinforcement Learning and Genetic Algorithms


Presentation Transcript


  1. Reinforcement Learning and Genetic Algorithms Staffan Järn

  2. Reinforcement learning • Intelligent learning algorithm • Doesn’t require the presence of a teacher • The algorithm is given a reward (a reinforcement) for good actions • The algorithm tries to figure out the best action to take in a given state, without knowing the final optimal solution • The actions are based on rewards and penalties

  3. Areas • Robot control • Elevator scheduling (search for patterns) • Telecommunications (finding networks) • Games (Chess, Backgammon) • Financial trading

  4. Cliffwalker program in Matlab • Gridworld (4 x 12) • The walker (agent) is supposed to find the shortest or safest way to the finish without falling into the cliff (blue area) • Falling into the cliff gives 100 penalty points, and the walker has to start over again

  5. Q-learning algorithm • A matrix, called the Q-matrix • 48 x 4 matrix: 48 states (the 12 x 4 gridworld) by 4 actions (the four directions) • The Q-matrix contains a ”price” for taking a certain action • Initialized randomly in the beginning • The walker has two options: take the optimal action, according to the smallest Q-value, or explore the gridworld by taking a random step (it cannot walk into the wall) • The Q-value is updated according to the update equation every time the walker takes an action
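The Q-matrix and the walker's two options can be sketched in Python (the original program was in Matlab, which is not shown here, so this is an illustrative reconstruction; the wall check is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES = 48   # the 12 x 4 gridworld, one row per cell
N_ACTIONS = 4   # the four directions

# Q-matrix initialized randomly, as on the slide
Q = rng.random((N_STATES, N_ACTIONS))

def choose_action(Q, state, epsilon):
    """Epsilon-greedy choice: usually the optimal action
    (the smallest Q-value, since Q stores a 'price'),
    sometimes a random exploratory step."""
    if rng.random() < epsilon:
        return int(rng.integers(N_ACTIONS))   # explore
    return int(np.argmin(Q[state]))           # exploit: smallest price
```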

  6. The new value in the Q-matrix for the previous state and the previously taken action is updated as: Q(s, a) ← (1 − α) · Q(s, a) + α · (c + γ · min over a′ of Q(s′, a′)) • c is the cost of taking a step (usually 1; 100 for the cliff) • α (alpha) is the learning factor • γ (gamma) is the reward factor • the min term is the best action the walker can take in the next state (the optimal action)
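A minimal Python sketch of that update rule (NumPy assumed; because Q stores prices, the best next action is the one with the smallest Q-value):

```python
import numpy as np

def q_update(Q, s, a, cost, s_next, alpha, gamma):
    """One Q-learning update as described on the slide:
    new value = (1 - alpha) * old value
              + alpha * (step cost + gamma * best next value),
    where 'best' means the smallest price in the next state."""
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (cost + gamma * Q[s_next].min())
    return Q
```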

  7. SARSA algorithm • Another way of updating the Q-matrix • Not based on the next optimal move, but on the next actual move • This means it takes into account the risk of falling into the cliff, and will eventually arrive at a safer path • Result: a longer, but safer path
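The SARSA variant differs only in which next-state value it uses: the Q-value of the action actually taken next, not the optimal one. A minimal sketch, again treating Q as a NumPy matrix of prices:

```python
import numpy as np

def sarsa_update(Q, s, a, cost, s_next, a_next, alpha, gamma):
    """SARSA update: uses the action actually taken in the next
    state (a_next) instead of the optimal one, so risky moves
    along the cliff edge keep their high prices and a safer
    path eventually emerges."""
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (cost + gamma * Q[s_next, a_next])
    return Q
```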

  8. The program... • ...is based on 3 parameters: • α (learning factor): the higher, the faster the walker learns • γ (reward factor): the higher, the more reward is given for good actions • ε (exploration factor): a high value leads to more randomness • In the following example these values were used: α = 0.1, γ = 1, ε = 0.05

  9. Results • Fig 1) Q-learning, the 100th walk • Fig 2) Q-learning, optimal solution • Fig 3) SARSA, the 100th walk • Fig 4) SARSA, optimal solution

  10. Results • Random steps over the cliff

  11. Genetic Algorithms • GAs can be applied to the Cliffwalker problem by: • replacing the Reinforcement learning algorithm with a GA to find the best path in the gridworld, or • using a GA to find the best learning parameters for the Reinforcement learning algorithm • The conclusion is that GAs will probably not improve the results remarkably over Reinforcement learning algorithms, since the RL algorithm very soon finds out which the best parameters are
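The parameter-tuning idea can be sketched as a tiny GA over (α, γ, ε) triples. The fitness function below is a hypothetical stand-in that merely scores closeness to the slide's values; in a real experiment it would run the Cliffwalker with each candidate triple and score the resulting paths:

```python
import random

random.seed(0)

# Stand-in fitness target: the slide's parameter values.
# A real fitness would simulate the Cliffwalker and return,
# e.g., the negative average path cost.
TARGET = (0.1, 1.0, 0.05)

def fitness(params):
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))

def mutate(params, scale=0.05):
    """Small Gaussian perturbation, clamped to [0, 1]."""
    return tuple(min(1.0, max(0.0, p + random.gauss(0, scale)))
                 for p in params)

def evolve(pop_size=20, generations=30):
    """Select the fitter half each generation and refill the
    population with mutated copies of the survivors."""
    pop = [tuple(random.random() for _ in range(3)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]
        pop = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return max(pop, key=fitness)
```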

  12. Sources • Reinforcement Learning (PDF), Jonas Waller, 2005 • Cliffwalker program, Jonas Waller, 2005 • Sutton and Barto, Reinforcement Learning: An Introduction
