Reinforcement learning
Reinforcement Learning. 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning. Outline. Motivation Applications Markov Decision Processes Q-learning Examples. How to program a robot to ride a bicycle?. Reinforcement Learning: The Idea.

Presentation Transcript


Reinforcement Learning

16 January 2009

RG Knowledge Based Systems

Hans Kleine Büning


Outline

  • Motivation

  • Applications

  • Markov Decision Processes

  • Q-learning

  • Examples


How to program a robot to ride a bicycle?


Reinforcement Learning: The Idea

  • A way of programming agents by reward and punishment without specifying how the task is to be achieved


Learning to Ride a Bicycle

[Figure: loop between the learner and the Environment, with arrows labeled state, €€€ (the reward signal), and action]


Learning to Ride a Bicycle

States:

  • Angle of handle bars

  • Angular velocity of handle bars

  • Angle of bicycle to vertical

  • Angular velocity of bicycle to vertical

  • Acceleration of angle of bicycle to vertical


Learning to Ride a Bicycle

Actions:

  • Torque to be applied to the handle bars

  • Displacement of the center of mass from the bicycle’s plane (in cm)


Reward:

  • Angle of bicycle to vertical is greater than 12° (yes): Reward = -1

  • Otherwise (no): Reward = 0
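
The reward rule above can be sketched as a small function. This is only an illustration: the function name `bicycle_reward`, the use of radians for the input, and treating a tilt to either side symmetrically are assumptions that do not come from the slides; only the 12° threshold and the reward values do.

```python
import math

FALL_ANGLE_DEG = 12.0  # threshold taken from the slide


def bicycle_reward(tilt_rad: float) -> int:
    """Return -1 if the bicycle tilts more than 12 degrees from vertical, else 0.

    Assumption: the tilt is given in radians and a tilt to the left or to the
    right is penalised in the same way.
    """
    if abs(math.degrees(tilt_rad)) > FALL_ANGLE_DEG:
        return -1
    return 0
```

Because the reward is 0 everywhere except at a fall, the learner is told only when it has failed, never how to balance.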


Learning to Ride a Bicycle

[Figure: Reinforcement Learning]


Reinforcement Learning: Applications

  • Board Games

    • The TD-Gammon program, based on reinforcement learning, has become a world-class backgammon player

  • Mobile Robot Control

    • Learning to Ride a Bicycle

    • Navigation

    • Pole-balancing

    • Acrobot

  • Sequential Process Control

    • Elevator Dispatching


Key Features of Reinforcement Learning

  • Learner is not told which actions to take

  • Trial and error search

  • Possibility of delayed reward:

    • Sacrifice of short-term gains for greater long-term gains

  • Explore/Exploit trade-off

  • Considers the whole problem of a goal-directed agent interacting with an uncertain environment
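
One common way to handle the explore/exploit trade-off is epsilon-greedy action selection. The slides do not name a specific strategy, so the sketch below is an assumption chosen for illustration; the function name `epsilon_greedy` and the dictionary of action-value estimates are hypothetical.

```python
import random


def epsilon_greedy(q_values: dict, epsilon: float, rng: random.Random) -> str:
    """Pick a random action with probability epsilon, else the best-known one."""
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)          # explore: try any action
    return max(actions, key=q_values.get)   # exploit: use current estimates
```

With `epsilon = 0` the learner never explores and can get stuck with a poor early estimate; with `epsilon = 1` it never uses what it has learned.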


The Agent-Environment Interaction

  • Agent and environment interact at discrete time steps: t = 0, 1, 2, …

    • Agent observes state at step t: $s_t \in S$

    • produces action at step t: $a_t \in A$

    • gets resulting reward: $r_{t+1} \in \mathbb{R}$

    • and resulting next state: $s_{t+1} \in S$
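
The interaction steps listed above can be sketched as a loop. The environment here, `CorridorEnv`, is a hypothetical toy problem (four states in a row, goal at the right end) invented for this illustration; `run_episode` and the fixed "always move right" policy are likewise assumptions.

```python
class CorridorEnv:
    """Hypothetical toy environment: states 0..3, goal at state 3."""

    def __init__(self):
        self.state = 0

    def step(self, action: int):
        # Move left (-1) or right (+1), clipped to the ends of the corridor.
        self.state = min(3, max(0, self.state + action))
        reward = 1.0 if self.state == 3 else 0.0
        return reward, self.state


def run_episode(env, policy, max_steps=20):
    """Run the loop s_t -> a_t -> (r_{t+1}, s_{t+1}) and record each step."""
    history = []
    s = env.state
    for t in range(max_steps):
        a = policy(s)              # agent produces action a_t
        r, s_next = env.step(a)    # environment returns r_{t+1} and s_{t+1}
        history.append((t, s, a, r, s_next))
        s = s_next
        if s == 3:                 # goal reached, episode ends
            break
    return history


history = run_episode(CorridorEnv(), policy=lambda s: +1)
```

Each tuple in `history` holds one time step in the order (t, state, action, reward, next state).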


The Agent’s Goal:

  • Coarsely, the agent’s goal is to get as much reward as it can over the long run

  • A policy $\pi$ is a mapping from states to actions: $\pi(s) = a$

  • Reinforcement learning methods specify how the agent changes its policy as a result of experience


Deterministic Markov Decision Process


Example


Example: Corresponding MDP


Example: Policy


Value of Policy and Rewards


Value of Policy and Agent’s Task


Nondeterministic Markov Decision Process

[Figure: an action with three possible outcomes, with probabilities P = 0.8, P = 0.1, and P = 0.1]


Example with South-Eastern Wind


Methods

  • Model (reward function and transition probabilities) is known

    • discrete states: Dynamic Programming

    • continuous states: Value Function Approximation + Dynamic Programming

  • Model (reward function or transition probabilities) is unknown

    • discrete states: Reinforcement Learning, Monte Carlo Methods

    • continuous states: Value Function Approximation + Reinforcement Learning


Q-learning Algorithm
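
A minimal sketch of one tabular Q-learning update, assuming the standard rule Q(s,a) ← Q(s,a) + α (r + γ max_a' Q(s',a') − Q(s,a)). The function name `q_learning_update`, the learning rate `alpha`, the discount factor `gamma`, and the toy states and actions are illustrative assumptions, not values from the slides. With `alpha = 1` the rule reduces to Q(s,a) ← r + γ max_a' Q(s',a'), the form commonly used for deterministic MDPs.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Apply one Q-learning step to the table Q and return the new Q(s, a)."""
    # Value of the best action available in the next state.
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    # Move Q(s, a) towards the target r + gamma * best_next.
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


Q = defaultdict(float)           # Q-table initialised to zero
actions = ["left", "right"]      # hypothetical action set
q_learning_update(Q, s=0, a="right", r=1.0, s_next=1, actions=actions)
```

After this single call, `Q[(0, "right")]` is 0.5: the old value 0 moved halfway towards the target 1.0 + 0.9 · 0.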


Example


Example: Q-table Initialization


Example: Episode 1


Example: Q-table


Example: Episode 2


Example: Q-table after Convergence


Example: Value Function after Convergence


Example: Optimal Policy


Q-learning


Convergence of Q-learning


Blackjack

  • Standard rules of blackjack hold

  • State space:

    • element[0] - current value of player's hand (4-21)

    • element[1] - value of dealer's face-up card (2-11)

    • element[2] - player does not have usable ace (0/1)

  • Starting states:

    • player has any 2 cards (uniformly distributed), dealer has any 1 card (uniformly distributed)

  • Actions:

    • HIT

    • STICK

  • Rewards:

    • -1 for a loss

    • 0 for a draw

    • 1 for a win
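
The state space, actions, and rewards above can be written down as a small data model. This is a sketch only: the names `BlackjackState`, `is_valid`, and `all_states` are hypothetical, and the third field simply stores the 0/1 flag as the slide defines it.

```python
from typing import NamedTuple


class BlackjackState(NamedTuple):
    player_sum: int    # element[0]: current value of player's hand, 4-21
    dealer_card: int   # element[1]: value of dealer's face-up card, 2-11
    ace_flag: int      # element[2]: usable-ace flag, 0 or 1


ACTIONS = ("HIT", "STICK")
REWARDS = {"loss": -1, "draw": 0, "win": 1}


def is_valid(state: BlackjackState) -> bool:
    """Check that every element lies in the range given on the slide."""
    return (4 <= state.player_sum <= 21
            and 2 <= state.dealer_card <= 11
            and state.ace_flag in (0, 1))


def all_states():
    """Enumerate the full state space: 18 sums x 10 dealer cards x 2 flags."""
    return [BlackjackState(p, d, f)
            for p in range(4, 22) for d in range(2, 12) for f in (0, 1)]
```

With 360 states and 2 actions, a Q-table for this problem has 720 entries, small enough to store as a plain dictionary.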


Blackjack: Optimal Policy


Reinforcement Learning: Example

  • States:

    • Grid cells

  • Actions:

    • Left

    • Up

    • Right

    • Down

  • Rewards:

    • Bonus: 20

    • Food: 1

    • Predator: -10

    • Empty grid cell: -0.1

  • Transition probabilities:

    • 0.80: the agent goes where it intends to go

    • 0.20: the agent moves to any other adjacent grid cell or remains where it was (if it is on the border of the grid world, it wraps around to the other side)
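
The transition rule of this grid world can be sketched as follows. Two points are assumptions, not taken from the slides: the 0.20 probability is split uniformly over the three other moves and staying put, and the grid size is passed in as a parameter because the slides do not state it. The names `next_cell` and `sample_transition` are hypothetical.

```python
import random

# Moves as (row change, column change).
MOVES = {"Left": (0, -1), "Up": (-1, 0), "Right": (0, 1), "Down": (1, 0)}
REWARDS = {"bonus": 20, "food": 1, "predator": -10, "empty": -0.1}


def next_cell(cell, move, rows, cols):
    """Apply a move, wrapping around to the other side at the grid border."""
    (r, c), (dr, dc) = cell, move
    return ((r + dr) % rows, (c + dc) % cols)


def sample_transition(cell, action, rows, cols, rng):
    """Sample the next cell: intended move with probability 0.8, otherwise
    one of the three other moves or staying put (uniform split assumed)."""
    if rng.random() < 0.8:
        move = MOVES[action]
    else:
        others = [m for a, m in MOVES.items() if a != action] + [(0, 0)]
        move = rng.choice(others)
    return next_cell(cell, move, rows, cols)
```

For example, `next_cell((0, 0), MOVES["Left"], 5, 5)` gives `(0, 4)`: stepping left from the first column lands in the last column of the same row.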

