## PowerPoint Slideshow about 'Reinforcement Learning' - zinna


Presentation Transcript

Outline

- Motivation
- Applications
- Markov Decision Processes
- Q-learning
- Examples

Reinforcement Learning: The Idea

- A way of programming agents by reward and punishment without specifying how the task is to be achieved

Learning to Ride a Bicycle

States:

- Angle of handle bars
- Angular velocity of handle bars
- Angle of bicycle to vertical
- Angular velocity of bicycle to vertical
- Acceleration of angle of bicycle to vertical

Actions:

- Torque to be applied to the handle bars
- Displacement of the center of mass from the bicycle's plane (in cm)

Reinforcement Learning: Applications

- Board Games
  - The TD-Gammon program, based on reinforcement learning, has become a world-class backgammon player
- Mobile Robot Controlling
  - Learning to Drive a Bicycle
  - Navigation
  - Pole-balancing
  - Acrobot
- Sequential Process Controlling
  - Elevator Dispatching

Key Features of Reinforcement Learning

- Learner is not told which actions to take
- Trial and error search
- Possibility of delayed reward:
  - Sacrifice of short-term gains for greater long-term gains
- Explore/exploit trade-off
- Considers the whole problem of a goal-directed agent interacting with an uncertain environment
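
One common way to handle the explore/exploit trade-off is an epsilon-greedy rule. The slides do not name a specific rule, so the sketch below is an assumption chosen for illustration; `epsilon_greedy`, `q_values`, and `epsilon` are hypothetical names.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # Explore: choose any action uniformly at random
        return rng.randrange(len(q_values))
    # Exploit: choose a best-valued action, breaking ties at random
    best = max(q_values)
    return rng.choice([i for i, q in enumerate(q_values) if q == best])

# With epsilon = 0 the rule is purely greedy
action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

A larger `epsilon` spends more steps on trial-and-error search; a smaller one commits earlier to what currently looks best.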

The Agent-Environment Interaction

- Agent and environment interact at discrete time steps: $t = 0, 1, 2, \dots$
  - Agent observes state at step $t$: $s_t \in S$
  - produces action at step $t$: $a_t \in A$
  - gets resulting reward: $r_{t+1} \in \mathbb{R}$
  - and resulting next state: $s_{t+1} \in S$
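
The interaction loop above can be sketched in a few lines. `CoinFlipEnv` is a hypothetical toy environment invented here for illustration (it is not from the slides); only the shape of the loop, observe state, act, receive reward and next state, follows the description.

```python
import random

class CoinFlipEnv:
    """Hypothetical toy environment: states 0/1, reward 1 when the action matches the state."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = 0

    def reset(self):
        self.state = self.rng.randint(0, 1)
        return self.state

    def step(self, action):
        reward = 1.0 if action == self.state else 0.0  # r_{t+1}
        self.state = self.rng.randint(0, 1)            # s_{t+1}
        return reward, self.state

def run(env, policy, steps):
    """Run the loop and record one (s_t, a_t, r_{t+1}, s_{t+1}) tuple per time step."""
    trajectory = []
    s = env.reset()
    for _ in range(steps):
        a = policy(s)             # agent produces action a_t
        r, s_next = env.step(a)   # environment returns r_{t+1} and s_{t+1}
        trajectory.append((s, a, r, s_next))
        s = s_next
    return trajectory

# A policy that copies the state always earns the reward in this toy environment
traj = run(CoinFlipEnv(), policy=lambda s: s, steps=5)
total = sum(r for _, _, r, _ in traj)
```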

The Agent’s Goal:

- Coarsely, the agent's goal is to get as much reward as it can over the long run
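
One standard way to make "reward over the long run" precise is the discounted return. The discount factor $\gamma$ is an assumption introduced here; the slide does not state which notion of long-run reward it uses.

$$G_t = r_{t+1} + \gamma\, r_{t+2} + \gamma^2 r_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k\, r_{t+k+1}, \qquad 0 \le \gamma < 1$$

For example, with $\gamma = 0.9$ and rewards $r_{t+1} = 0$, $r_{t+2} = 0$, $r_{t+3} = 10$ followed by zeros, $G_t = 0.9^2 \cdot 10 = 8.1$, so a delayed reward still counts, only somewhat less than an immediate one.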

Policy is

- a mapping from states to actions: $\pi(s) = a$
- Reinforcement learning methods specify how the agent changes its policy as a result of experience
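
As one illustration of changing a policy from experience, here is a minimal sketch of a tabular Q-learning step (Q-learning is listed in the outline). The step size `alpha`, discount `gamma`, and the function names are assumptions made for this sketch, and terminal states are not handled.

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q[s, a] toward r + gamma * max_b Q[s_next, b]."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def greedy_policy(Q, s, actions):
    """pi(s): the action with the highest current estimate in state s."""
    return max(actions, key=lambda b: Q[(s, b)])

actions = ["left", "right"]
Q = defaultdict(float)  # every estimate starts at 0.0

# One piece of experience: in state 0, action "right" gave reward 1 and led to state 1
q_learning_update(Q, s=0, a="right", r=1.0, s_next=1, actions=actions)
chosen = greedy_policy(Q, 0, actions)
```

After this single update the estimate for `(0, "right")` rises above the estimate for `(0, "left")`, so the greedy policy in state 0 changes as a direct result of the experience.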

Methods

| | Model (reward function and transition probabilities) is known | Model (reward function or transition probabilities) is unknown |
|---|---|---|
| Discrete states | Dynamic Programming | Reinforcement Learning, Monte Carlo Methods |
| Continuous states | Value Function Approximation + Dynamic Programming | Value Function Approximation + Reinforcement Learning |

Blackjack

- Standard rules of blackjack hold
- State space:
  - element[0] - current value of player's hand (4-21)
  - element[1] - value of dealer's face-up card (2-11)
  - element[2] - player does or does not have a usable ace (0/1)
- Starting states:
  - player has any 2 cards (uniformly distributed), dealer has any 1 card (uniformly distributed)
- Actions:
  - HIT
  - STICK
- Rewards:
  - -1 for a loss
  - 0 for a draw
  - 1 for a win
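
A minimal sketch of this blackjack setup as an episode simulator follows. Several details are assumptions not given on the slide: cards are drawn with replacement from an infinite deck, the dealer draws until reaching 17 or more, a natural blackjack gets no special treatment, and element[2] is encoded as 1 when the player holds a usable ace.

```python
import random

CARDS = [2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11]  # J, Q, K count 10; ace counts 11

def hand_value(cards):
    """Return (total, usable_ace); aces count 11 and drop to 1 while the hand would bust."""
    total = sum(cards)
    aces = cards.count(11)
    while total > 21 and aces:
        total -= 10
        aces -= 1
    return total, aces > 0

def play_episode(policy, rng):
    """Play one hand; policy(state) returns "HIT" or "STICK". Returns the reward -1, 0, or 1."""
    player = [rng.choice(CARDS), rng.choice(CARDS)]
    dealer = [rng.choice(CARDS)]
    while True:
        total, usable = hand_value(player)
        if total > 21:
            return -1  # player busts: loss
        state = (total, dealer[0], int(usable))  # the three state elements
        if policy(state) == "STICK":
            break
        player.append(rng.choice(CARDS))
    # Assumption: the dealer draws until reaching 17 or more
    while hand_value(dealer)[0] < 17:
        dealer.append(rng.choice(CARDS))
    dealer_total = hand_value(dealer)[0]
    player_total = hand_value(player)[0]
    if dealer_total > 21 or player_total > dealer_total:
        return 1
    return 0 if player_total == dealer_total else -1

# Example policy: hit below 17, otherwise stick
threshold_policy = lambda s: "HIT" if s[0] < 17 else "STICK"
reward = play_episode(threshold_policy, random.Random(0))
```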

Grids

Actions

- Left
- Up
- Right
- Down

Rewards

- Bonus: 20
- Food: 1
- Predator: -10
- Empty grid: -0.1

Transition probabilities

- 0.80: the agent goes where it intends to go
- 0.20: the agent goes to any other adjacent grid or remains where it was (if it is on the border of the grid world, it goes to the other side)
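
The grid-world dynamics above can be sketched as a sampling function. How the 0.20 is divided is not stated on the slide; the sketch assumes it is split evenly over the three other moves and staying put. The names `step`, `reward`, `MOVES`, and `REWARDS` are hypothetical.

```python
import random

MOVES = {"Left": (0, -1), "Up": (-1, 0), "Right": (0, 1), "Down": (1, 0), "Stay": (0, 0)}
REWARDS = {"bonus": 20.0, "food": 1.0, "predator": -10.0, "empty": -0.1}

def step(pos, action, rows, cols, rng):
    """Sample the next cell: the intended move with probability 0.8, otherwise an alternative."""
    if rng.random() < 0.8:
        move = action
    else:
        # Assumption: the remaining 0.2 is split evenly over the other three moves and staying
        move = rng.choice([m for m in MOVES if m != action])
    dr, dc = MOVES[move]
    # Wrap around: leaving one edge of the grid re-enters on the opposite side
    return ((pos[0] + dr) % rows, (pos[1] + dc) % cols)

def reward(cell_contents):
    """Look up the reward for entering a cell with the given contents."""
    return REWARDS[cell_contents]

next_pos = step((0, 0), "Left", rows=3, cols=4, rng=random.Random(0))
```

From the corner `(0, 0)` on a 3 x 4 grid, moving Left most often lands on `(0, 3)` because of the wrap-around rule.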

Reinforcement Learning: Example