Reinforcement learning

Reinforcement Learning. 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning. Outline. Motivation Applications Markov Decision Processes Q-learning Examples. How to program a robot to ride a bicycle?. Reinforcement Learning: The Idea.

Presentation Transcript


Reinforcement Learning

16 January 2009

RG Knowledge Based Systems

Hans Kleine Büning


Outline

  • Motivation

  • Applications

  • Markov Decision Processes

  • Q-learning

  • Examples


How to program a robot to ride a bicycle?


Reinforcement Learning: The Idea

  • A way of programming agents by reward and punishment without specifying how the task is to be achieved


[Figure: agent-environment loop showing state, reward (€€€), and action]

Learning to Ride a Bicycle


States:

  • Angle of handle bars

  • Angular velocity of handle bars

  • Angle of bicycle to vertical

  • Angular velocity of bicycle to vertical

  • Acceleration of angle of bicycle to vertical

Learning to Ride a Bicycle


Learning to Ride a Bicycle


Actions:

  • Torque to be applied to the handle bars

  • Displacement of the center of mass from the bicycle’s plane (in cm)

Learning to Ride a Bicycle


Learning to Ride a Bicycle


Is the angle of the bicycle to the vertical greater than 12°?

  • yes: Reward = -1

  • no: Reward = 0
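
The reward rule on this slide can be written as a one-line function. This is a minimal sketch: the function name and the use of the absolute value (treating a tilt to either side the same way) are assumptions, not something the slide states.

```python
FALL_ANGLE_DEG = 12.0  # threshold taken from the slide


def bicycle_reward(tilt_angle_deg: float) -> int:
    """Return -1 once the bicycle tilts more than 12 degrees, else 0.

    Assumption: the sign of the angle only encodes the side of the tilt,
    so the absolute value is compared against the threshold.
    """
    return -1 if abs(tilt_angle_deg) > FALL_ANGLE_DEG else 0


print(bicycle_reward(3.5), bicycle_reward(-14.0))
```

Because the only signal is a penalty when the bicycle falls, the agent is never told how to balance; it has to discover that by trial and error.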


Learning To Ride a Bicycle

Reinforcement Learning


Reinforcement Learning: Applications

  • Board Games

    • TD-Gammon program, based on reinforcement learning, has become a world-class backgammon player

  • Mobile Robot Control

    • Learning to Ride a Bicycle

    • Navigation

    • Pole-balancing

    • Acrobot

  • Sequential Process Control

    • Elevator Dispatching


Key Features of Reinforcement Learning

  • Learner is not told which actions to take

  • Trial and error search

  • Possibility of delayed reward:

    • Sacrifice of short-term gains for greater long-term gains

  • Explore/Exploit trade-off

  • Considers the whole problem of a goal-directed agent interacting with an uncertain environment


The Agent-Environment Interaction

  • Agent and environment interact at discrete time steps: $t = 0, 1, 2, \dots$

    • Agent observes state at step $t$: $s_t \in S$

    • produces action at step $t$: $a_t \in A$

    • gets resulting reward: $r_{t+1} \in \mathbb{R}$

    • and resulting next state: $s_{t+1} \in S$
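
The interaction described above (observe a state, act, receive a reward and the next state) can be sketched as a short loop. The `Corridor` environment, its five states, and the `run_episode` helper are hypothetical and exist only for illustration; they do not come from the slides.

```python
import random


class Corridor:
    """Hypothetical toy environment: states 0..length-1, goal at the right end."""

    def __init__(self, length=5):
        self.goal = length - 1
        self.state = 0

    def step(self, action):
        # action is -1 (left) or +1 (right); the walls clamp the position
        self.state = min(self.goal, max(0, self.state + action))
        reward = 1.0 if self.state == self.goal else 0.0
        return reward, self.state


def run_episode(env, choose_action, max_steps=100):
    """Run the loop: observe s_t, pick a_t, receive r_{t+1} and s_{t+1}."""
    state = env.state
    total_reward = 0.0
    steps = 0
    while steps < max_steps and state != env.goal:
        action = choose_action(state)      # a_t
        reward, state = env.step(action)   # r_{t+1}, s_{t+1}
        total_reward += reward
        steps += 1
    return total_reward, steps


# Usage: an agent that acts at random, with a fixed seed for repeatability
rng = random.Random(0)
print(run_episode(Corridor(), lambda s: rng.choice([-1, 1])))
```

The agent is passed in as a plain function from state to action, so the same loop works for a random agent and for a learned one.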


The Agent’s Goal:

  • Coarsely, the agent’s goal is to get as much reward as it can over the long run

  • A policy $\pi$ is a mapping from states to actions: $\pi(s) = a$

  • Reinforcement learning methods specify how the agent changes its policy as a result of experience


Deterministic Markov Decision Process


Example


Example: Corresponding MDP


Example: Policy


Value of Policy and Rewards


Value of Policy and Agent’s Task


Nondeterministic Markov Decision Process

[Figure: an action with three possible outcomes, with probabilities P = 0.8, P = 0.1, and P = 0.1]
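
A nondeterministic transition with these probabilities can be sampled as below. The transcript does not say which outcome each probability belongs to, so this sketch assumes one common reading: the intended direction has probability 0.8 and each of the two perpendicular directions has probability 0.1. The `SLIPS` table and `sample_outcome` are hypothetical names.

```python
import random

# Assumed reading of the figure: for each intended direction,
# the two perpendicular directions the agent may slip into.
SLIPS = {
    "up": ("left", "right"),
    "down": ("right", "left"),
    "left": ("down", "up"),
    "right": ("up", "down"),
}


def sample_outcome(intended, rng):
    """Sample the direction actually taken for an intended direction."""
    side_a, side_b = SLIPS[intended]
    outcomes = [intended, side_a, side_b]
    return rng.choices(outcomes, weights=[0.8, 0.1, 0.1], k=1)[0]


rng = random.Random(0)
print([sample_outcome("up", rng) for _ in range(10)])
```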


Example with South-Eastern Wind


Methods

  • Model (reward function and transition probabilities) is known

    • discrete states: Dynamic Programming

    • continuous states: Value Function Approximation + Dynamic Programming

  • Model (reward function or transition probabilities) is unknown

    • discrete states: Reinforcement Learning, Monte Carlo Methods

    • continuous states: Value Function Approximation + Reinforcement Learning


Q-learning Algorithm


Q-learning Algorithm
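
The algorithm on these slides is not preserved in the transcript, so the following is a sketch of tabular Q-learning in its common textbook form, which may differ from the slides. The update is Q(s, a) ← Q(s, a) + α (r + γ max Q(s', a') − Q(s, a)). The learning rate `alpha`, discount `gamma`, epsilon-greedy exploration, and the five-state chain used to exercise it are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def q_learning(step, actions, start_state, is_terminal,
               episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q-table: (state, action) -> value, initially 0
    for _ in range(episodes):
        state = start_state
        while not is_terminal(state):
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                # exploit, breaking ties at random so an all-zero table
                # does not lock the agent into a single action
                best = max(q[(state, a)] for a in actions)
                action = rng.choice(
                    [a for a in actions if q[(state, a)] == best])
            reward, next_state = step(state, action)
            if is_terminal(next_state):
                best_next = 0.0
            else:
                best_next = max(q[(next_state, a)] for a in actions)
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


def chain_step(state, action):
    """Hypothetical chain 0..4: reward 1 on entering state 4, else 0."""
    next_state = min(4, max(0, state + action))
    return (1.0 if next_state == 4 else 0.0), next_state


q = q_learning(chain_step, [-1, 1], 0, lambda s: s == 4)
greedy = {s: max([-1, 1], key=lambda a: q[(s, a)]) for s in range(4)}
print(greedy)
```

With `alpha=1.0` the update reduces to Q(s, a) ← r + γ max Q(s', a'), the form usually given for deterministic environments.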


Example


Example: Q-table Initialization


Example: Episode 1


Example: Episode 1


Example: Episode 1


Example: Episode 1


Example: Episode 1


Example: Q-table


Example: Episode 1


Episode 1


Example: Q-table


Example: Episode 2


Example: Episode 2


Example: Episode 2


Example: Q-table after Convergence


Example: Value Function after Convergence


Example: Optimal Policy


Example: Optimal Policy


Q-learning


Convergence of Q-learning


Blackjack

  • Standard rules of blackjack hold

  • State space:

    • element[0] - current value of player's hand (4-21)

    • element[1] - value of dealer's face-up card (2-11)

    • element[2] - player does not have usable ace (0/1)

  • Starting states:

    • player has any 2 cards (uniformly distributed), dealer has any 1 card (uniformly distributed)

  • Actions:

    • HIT

    • STICK

  • Rewards:

    • -1 for a loss

    • 0 for a draw

    • 1 for a win
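
The state, action, and reward definitions above can be encoded as follows. This is a sketch under simplifying assumptions: the names `BlackjackState` and `final_reward` are hypothetical, the dealer's ace is assumed to count as 11 (which gives the 2-11 range), and special cases such as a natural blackjack are ignored.

```python
from typing import NamedTuple


class BlackjackState(NamedTuple):
    player_sum: int   # element[0]: current value of player's hand, 4-21
    dealer_card: int  # element[1]: dealer's face-up card, 2-11
    usable_ace: int   # element[2]: usable-ace flag, 0 or 1


ACTIONS = ("HIT", "STICK")

# Every combination of the three state elements
ALL_STATES = [
    BlackjackState(p, d, u)
    for p in range(4, 22)
    for d in range(2, 12)
    for u in (0, 1)
]


def final_reward(player_sum: int, dealer_sum: int) -> int:
    """Reward at the end of a hand: -1 loss, 0 draw, 1 win (simplified)."""
    if player_sum > 21:
        return -1  # player went bust
    if dealer_sum > 21 or player_sum > dealer_sum:
        return 1
    if player_sum == dealer_sum:
        return 0
    return -1


print(len(ALL_STATES), final_reward(20, 18))
```

The state space is small enough (18 × 10 × 2 combinations) for a Q-table with one row per state and one column per action.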


Blackjack: Optimal Policy


  • States

    • Grid cells

  • Actions

    • Left

    • Up

    • Right

    • Down

  • Rewards

    • Bonus: 20

    • Food: 1

    • Predator: -10

    • Empty grid cell: -0.1

  • Transition probabilities

    • 0.80: the agent goes where it intends to go

    • 0.20: the agent moves to any other adjacent cell or remains where it was (if it is on the border of the grid world, it goes to the other side)
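
One step in this grid world could be simulated as below. The transcript does not say how the 0.20 is divided, so this sketch assumes it is spread uniformly over the three other directions and staying in place. The grid layout, the cell labels, and the names `move` and `step` are hypothetical.

```python
import random

MOVES = {"Left": (0, -1), "Up": (-1, 0), "Right": (0, 1), "Down": (1, 0)}
REWARDS = {"bonus": 20.0, "food": 1.0, "predator": -10.0, "empty": -0.1}


def move(position, direction, rows, cols):
    """Apply a direction with wrap-around at the border (None = stay)."""
    d_row, d_col = MOVES.get(direction, (0, 0))
    return ((position[0] + d_row) % rows, (position[1] + d_col) % cols)


def step(position, action, grid, rng):
    """Sample the next position and the reward for an intended action."""
    if rng.random() < 0.80:
        actual = action  # the agent goes where it intends to go
    else:
        # Assumption: uniform over the other three directions and staying
        actual = rng.choice([a for a in MOVES if a != action] + [None])
    new_position = move(position, actual, len(grid), len(grid[0]))
    reward = REWARDS[grid[new_position[0]][new_position[1]]]
    return new_position, reward


grid = [
    ["empty", "food", "empty"],
    ["empty", "predator", "empty"],
    ["empty", "empty", "bonus"],
]
print(step((0, 0), "Right", grid, random.Random(0)))
```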

Reinforcement Learning: Example

