
Reinforcement Learning [Intro]

Reinforcement Learning [Intro]. Marco Loog. Introduction. How can an agent learn if there is no teacher around who tells it with every action what’s right and what’s wrong?


Presentation Transcript


  1. Reinforcement Learning [Intro] • Marco Loog

  2. Introduction • How can an agent learn if there is no teacher around who tells it with every action what’s right and what’s wrong? • E.g., an agent can learn how to play chess by supervised learning, given that examples of states and their correct actions are provided • But what if these examples are not available?

  3. Introduction • But what if these examples are not available? • Through random moves, i.e., exploratory behavior, the agent may be able to infer knowledge about the environment it is in • But what is good and what is bad? = necessary knowledge to decide what to do in order to reach its goal

  4. Introduction • But what is good and what is bad? = necessary knowledge to decide what to do in order to reach its goal • ‘Rewarding’ the agent when it did something good and ‘punishing’ it when it did something bad is called reinforcement • Task of reinforcement learning is to use observed rewards to learn a [best] policy for the environment
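The exploratory behavior and observed rewards described on slides 3 and 4 can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not taken from the slides: the five-state corridor, the reward values in `TRUE_REWARD`, and the helper name `explore`. The agent moves at random and simply records the reward it observes in each state it visits.

```python
import random

# Hypothetical environment (not from the slides): a corridor of states 0..4.
# The agent does not know these rewards in advance; it only sees the reward
# of a state when it is actually in that state.
TRUE_REWARD = {0: -1.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0}


def explore(steps, start=2, seed=0):
    """Take random moves and record the reward observed in each visited state."""
    rng = random.Random(seed)
    state = start
    observed = {state: TRUE_REWARD[state]}
    for _ in range(steps):
        move = rng.choice([-1, 1])               # random move: left or right
        state = min(4, max(0, state + move))     # walls at both ends
        observed[state] = TRUE_REWARD[state]     # 'reward' or 'punishment' seen here
    return observed


observed = explore(steps=50)
```

The dictionary `observed` is the knowledge gained by exploring. Turning such observations into a good policy is the actual learning task, which this sketch does not attempt.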

  5. E.g. [D. Terzopoulos et al.]

  6. E.g. [T. Streeter]

  7. E.g. [K. Sims]

  8. Reinforcement Learning • Use observed rewards to learn an [almost?] optimal policy for an environment • Reward R(s) assigns to every state s a number • Utility of an environment history is [as an example] the sum of the rewards received • Policy describes agent’s action from any state s in order to reach the goal • Optimal policy is policy with highest expected utility
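As a rough sketch of the definitions on slide 8, the code below computes the utility of an environment history as the sum of its rewards, and compares two policies by expected utility. The state names, reward values, outcome probabilities, and the function names `history_utility` and `expected_utility` are all hypothetical, chosen only to make the example concrete.

```python
from typing import Dict, List, Tuple

State = str

# Hypothetical reward function R(s): one number per state.
reward: Dict[State, float] = {"a": -0.04, "b": -0.04, "win": 1.0, "lose": -1.0}


def history_utility(history: List[State], reward: Dict[State, float]) -> float:
    """Additive utility: the sum of R(s) over the states in the history."""
    return sum(reward[s] for s in history)


def expected_utility(outcomes: List[Tuple[float, List[State]]],
                     reward: Dict[State, float]) -> float:
    """Probability-weighted average utility over the histories a policy can produce."""
    return sum(p * history_utility(h, reward) for p, h in outcomes)


# Assumed outcome distributions for two made-up policies.
policy_outcomes = {
    "cautious": [(0.8, ["a", "b", "win"]), (0.2, ["a", "lose"])],
    "hasty": [(0.5, ["a", "win"]), (0.5, ["a", "lose"])],
}

# The optimal policy (among these two) is the one with highest expected utility.
best = max(policy_outcomes, key=lambda name: expected_utility(policy_outcomes[name], reward))
```

In a real reinforcement learning problem the outcome probabilities are not handed to the agent like this; they are listed explicitly here only so that the definition of expected utility can be evaluated directly.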

  9. Rewards, Utilities, &c. [Figure; surviving labels: +1, -1]

  10. Rewards, Utilities, &c. [Figure; surviving labels: +1, -1]

  11. Reinforcement Learning • How to learn a policy like the previous one? • Complicating factors • Normally, both the environment and the reward function are unknown • In many complex domains reinforcement learning is the only feasible route to success

  12. Reinforcement Learning • Might be considered to encompass all of AI : an agent is dropped off somewhere and must figure everything out by itself • We will concentrate on simple settings and agent designs to keep things manageable • E.g. fully observable environment

  13. 3 Agent Designs • Utility-based agent : learns a utility function, based on which it chooses actions • Q-learning agent : learns an action-value function giving the expected utility of taking a given action in a given state • Reflex agent : learns a policy that maps directly from states to actions
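The difference between the three designs on slide 13 shows up in what each agent consults when it picks an action. The sketch below covers only that selection step, not the learning itself, and assumes the learned quantities are already available as plain dictionaries. All names and numbers (`model`, `utility`, `q`, `policy`, the states and actions) are hypothetical. Note that the utility-based agent is given a transition model here so that it can evaluate the outcomes of its actions, whereas the other two are not.

```python
from typing import Dict, List, Tuple

State = str
Action = str


def utility_based_action(state: State, actions: List[Action],
                         model: Dict[Tuple[State, Action], List[Tuple[State, float]]],
                         utility: Dict[State, float]) -> Action:
    """Choose the action whose successor states have the highest expected utility."""
    return max(actions,
               key=lambda a: sum(p * utility[s2] for s2, p in model[(state, a)]))


def q_learning_action(state: State, actions: List[Action],
                      q: Dict[Tuple[State, Action], float]) -> Action:
    """Choose the action with the highest learned action value Q(s, a); no model needed."""
    return max(actions, key=lambda a: q[(state, a)])


def reflex_action(state: State, policy: Dict[State, Action]) -> Action:
    """Look the action up directly in the learned state-to-action mapping."""
    return policy[state]


# Made-up values standing in for what each agent would have learned.
actions = ["left", "right"]
model = {("s0", "left"): [("bad", 0.9), ("good", 0.1)],
         ("s0", "right"): [("good", 0.8), ("bad", 0.2)]}
utility = {"good": 1.0, "bad": -1.0}
q = {("s0", "left"): -0.8, ("s0", "right"): 0.6}
policy = {"s0": "right"}

choices = (utility_based_action("s0", actions, model, utility),
           q_learning_action("s0", actions, q),
           reflex_action("s0", policy))
```

With these toy values all three agents agree on the same action; they differ in how much they must know or store to reach that decision.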

  14. More • Next week...
