Tópicos Especiais em Aprendizagem (Special Topics in Learning)

Prof. Reinaldo Bianchi

Centro Universitário da FEI

2012

Objective of this Lecture

This is a more informative, survey-style lecture.

  • Reinforcement Learning:
    • Planning and Learning.
    • Relational Reinforcement Learning.
    • Using heuristics to speed up RL.
    • Dimensions of RL: conclusions.
  • Today's lecture:
    • Chapters 9 and 10 of Sutton & Barto.
    • The Relational RL paper (Machine Learning Journal, 2001).
    • Bianchi's PhD thesis.

Planning and Learning

Chapter 9 of Sutton & Barto

  • Use of environment models.
  • Integration of planning and learning methods.
  • Model: anything the agent can use to predict how the environment will respond to its actions.
    • Distribution model: a description of all possible outcomes and their probabilities,
      • e.g., the transition probabilities P(s' | s, a) and expected rewards for all s, a, s'.
    • Sample model: produces sample experiences,
      • e.g., a simulation model.
  • Both types of models can be used to produce simulated experience.
  • Often sample models are much easier to come by.
  • Planning: any computational process that uses a model to create or improve a policy.
  • Planning in AI:
    • state-space planning
    • plan-space planning (e.g., partial-order planner)
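Neither kind of model is shown as code on the slides; this minimal Python sketch (the class names and toy dynamics are invented here) contrasts the two for a two-state MDP:

```python
import random

# Toy MDP: two states, one action; invented purely for illustration.
# A distribution model enumerates every outcome with its probability.
class DistributionModel:
    def __init__(self):
        # (state, action) -> list of (next_state, reward, probability)
        self.dynamics = {
            ("s0", "a"): [("s1", 1.0, 0.8), ("s0", 0.0, 0.2)],
            ("s1", "a"): [("s1", 0.0, 1.0)],
        }

    def outcomes(self, state, action):
        return self.dynamics[(state, action)]

# A sample model only has to *generate* one plausible outcome at a
# time, e.g. by running a simulator; it never exposes probabilities.
class SampleModel:
    def __init__(self, distribution_model):
        self.dm = distribution_model

    def sample(self, state, action):
        outcomes = self.dm.outcomes(state, action)
        states_rewards = [(s, r) for s, r, _ in outcomes]
        probs = [p for _, _, p in outcomes]
        return random.choices(states_rewards, weights=probs)[0]
```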
Planning in RL
  • We take the following (unusual) view:
    • all state-space planning methods involve computing value functions, either explicitly or implicitly.
    • they all apply backups to simulated experience.
Planning Cont.
  • Classical DP methods are state-space planning methods.
  • Heuristic search methods are state-space planning methods.
Q-Learning Planning
  • A planning method based on Q-learning:

Random-Sample One-Step Tabular Q-Planning
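The pseudocode box on the slide is an image; below is a hedged Python sketch of random-sample one-step tabular Q-planning as described in Sutton & Barto (function and parameter names are mine, and sample_model is assumed to be a callable returning a (next_state, reward) pair):

```python
import random
from collections import defaultdict

def q_planning(sample_model, states, actions, n_steps,
               alpha=0.1, gamma=0.95):
    """One-step tabular Q-planning from random samples."""
    Q = defaultdict(float)  # Q[(state, action)], initialized to 0
    for _ in range(n_steps):
        # 1. Select a state-action pair at random.
        s = random.choice(states)
        a = random.choice(actions)
        # 2. Ask the sample model for a next state and reward.
        s_next, r = sample_model(s, a)
        # 3. Apply a one-step tabular Q-learning backup to the sample.
        best_next = max(Q[(s_next, a2)] for a2 in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q
```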

Learning, Planning, and Acting
  • Two uses of real experience:
    • model learning: to improve the model
    • direct RL: to directly improve the value function and policy
  • Improving the value function and/or policy via a model is sometimes called indirect RL or model-based RL. Here, we call it planning.
Direct vs. Indirect RL

  • Indirect methods:
    • make fuller use of experience: they get a better policy from fewer environment interactions.
  • Direct methods:
    • are simpler;
    • are not affected by bad models.
  • But the two are very closely related and can be usefully combined:
    • planning, acting, model learning, and direct RL can occur simultaneously and in parallel.

The Dyna-Q Algorithm

[Slide shows the tabular Dyna-Q algorithm box, with the direct RL update and the model-learning step labeled.]
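Since the algorithm box is an image, here is a hedged Python sketch of tabular Dyna-Q; the env_step interface and all names are assumptions of this sketch rather than the slide's own code:

```python
import random
from collections import defaultdict

def dyna_q(env_step, state, actions, n_steps, n_planning=5,
           alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Dyna-Q sketch.

    env_step(s, a) -> (next_state, reward) is an assumed interface
    to the real environment.
    """
    Q = defaultdict(float)
    model = {}  # deterministic model: (s, a) -> (s', r)

    def greedy(s):
        return max(actions, key=lambda a: Q[(s, a)])

    for _ in range(n_steps):
        # (a) Act: epsilon-greedy on the current Q.
        a = (random.choice(actions) if random.random() < epsilon
             else greedy(state))
        s_next, r = env_step(state, a)

        # (b) Direct RL: one-step Q-learning on the real experience.
        Q[(state, a)] += alpha * (r + gamma * Q[(s_next, greedy(s_next))]
                                  - Q[(state, a)])

        # (c) Model learning: remember what the environment just did.
        model[(state, a)] = (s_next, r)

        # (d) Planning: n_planning simulated backups from the model.
        for _ in range(n_planning):
            (ps, pa), (ps_next, pr) = random.choice(list(model.items()))
            Q[(ps, pa)] += alpha * (pr + gamma * Q[(ps_next, greedy(ps_next))]
                                    - Q[(ps, pa)])

        state = s_next
    return Q
```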


Dyna-Q on a Simple Maze

Rewards are 0 on all steps until the goal is reached, when the reward is 1.

Prioritized Sweeping
  • Which states or state-action pairs should be generated during planning?
  • Work backwards from states whose values have just changed:
    • Maintain a queue of state-action pairs whose values would change a lot if backed up, prioritized by the size of the change
    • When a new backup occurs, insert predecessors according to their priorities
    • Always perform backups from first in queue
  • (Moore & Atkeson, 1993; Peng & Williams, 1993)
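A hedged Python sketch of the planning loop (the deterministic model and predecessor interfaces are assumptions of this sketch):

```python
import heapq
from collections import defaultdict

def prioritized_sweeping(model, predecessors, actions, seed_pairs,
                         n_sweeps, alpha=0.1, gamma=0.95, theta=1e-4):
    """Planning loop of prioritized sweeping (illustrative sketch).

    Assumed interfaces: model[(s, a)] -> (next_state, reward) is a
    learned deterministic model; predecessors[s] is the set of
    (s_prev, a_prev) pairs observed to lead to s; seed_pairs are the
    (s, a) pairs whose values just changed (e.g. after a real step).
    """
    Q = defaultdict(float)
    pqueue = []  # heapq is a min-heap, so priorities are negated

    def td_error(s, a):
        s_next, r = model[(s, a)]
        return r + gamma * max(Q[(s_next, b)] for b in actions) - Q[(s, a)]

    def push(s, a):
        p = abs(td_error(s, a))
        if p > theta:               # ignore changes too small to matter
            heapq.heappush(pqueue, (-p, s, a))

    for s, a in seed_pairs:
        push(s, a)
    for _ in range(n_sweeps):
        if not pqueue:
            break
        _, s, a = heapq.heappop(pqueue)   # largest expected change first
        Q[(s, a)] += alpha * td_error(s, a)
        # s's value moved, so backups at its predecessors may now matter.
        for s_prev, a_prev in predecessors[s]:
            push(s_prev, a_prev)
    return Q
```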
Prioritized Sweeping vs. Dyna-Q

[Figure: comparison of prioritized sweeping and Dyna-Q; both use N = 5 backups per environment interaction.]

Summary
  • Emphasized close relationship between planning and learning
  • Important distinction between distribution models and sample models
  • Looked at some ways to integrate planning and learning
    • synergy among planning, acting, model learning
  • Distribution of backups: focus of the computation
    • trajectory sampling: backup along trajectories
    • prioritized sweeping
    • heuristic search
  • Size of backups: full vs. sample; deep vs. shallow

Relational Reinforcement Learning

Based on a lecture by Tayfun Gürel

Relational Representations
  • In most applications the state space is too large.
  • Generalization over states is essential.
  • Many states are similar in some respects.
  • Representations for RL have to be enriched to support generalization.
  • RRL was proposed to address this (Džeroski, De Raedt, and Blockeel, 1998).
Blocks World

An action: move(a, b), with preconditions

  clear(a) ∈ S
  clear(b) ∈ S

A state: s1 = {clear(a), clear(b), on(b,c), on(c,floor), on(a,floor)}
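To make the encoding concrete, here is a small Python sketch (the tuple-based atom encoding is invented for this writeup) representing such a state and the move action with its precondition check:

```python
# A state is a set of ground atoms; each atom is a tuple
# (predicate, arg1, ...). This encoding is invented for illustration.
s1 = {("clear", "a"), ("clear", "b"),
      ("on", "b", "c"), ("on", "c", "floor"), ("on", "a", "floor")}

def move(state, x, y):
    """Apply move(x, y): put block x on top of y.

    Returns the successor state, or None if the preconditions
    clear(x) ∈ S and clear(y) ∈ S do not hold.
    """
    if ("clear", x) not in state or ("clear", y) not in state:
        return None
    # Find what x currently rests on, to retract the old on/2 atom.
    below = next(atom[2] for atom in state
                 if atom[0] == "on" and atom[1] == x)
    succ = set(state)
    succ.remove(("on", x, below))
    succ.add(("on", x, y))
    succ.discard(("clear", y))          # y is now covered by x
    if below != "floor":
        succ.add(("clear", below))      # whatever was under x is freed
    return succ

s2 = move(s1, "a", "b")
# s2 == {("clear", "a"), ("on", "a", "b"), ("on", "b", "c"),
#        ("on", "c", "floor")}
```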

Relational Representations
  • With relational representations for states:
    • Abstraction from details; e.g., a learned rule such as:

      q_value(0.72) :- goal_unstack, numberofblocks(A),
                       action_move(B,C), height(D,E), E=2,
                       on(C,D), !.

    • Flexibility to goal changes:
      • retraining from scratch is not necessary.
    • Transfer of experience to more complex domains.
Relational Reinforcement Learning
  • How does it work?
  • It is an integration of RL with ILP (Inductive Logic Programming):
  • Do forever (a sketch follows this list):
    • Use Q-learning to generate sample Q-values for sample state-action pairs.
    • Generalize them using ILP (in this case, TILDE).
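A hedged Python sketch of this loop; run_episode and tilde_fit are hypothetical stand-ins for the Q-learning episode and the TILDE tree learner, and the discounted-return backup is a simplification of the paper's Q-value computation:

```python
def relational_q_learning(run_episode, tilde_fit, n_iterations,
                          gamma=0.9):
    """RRL outer loop: alternate Q-learning episodes with ILP
    generalization. run_episode(q_fn) -> list of (state, action,
    reward) triples and tilde_fit(examples) -> callable q_fn are
    assumed interfaces, not part of the original paper's code.
    """
    q_fn = lambda s, a: 0.0          # start with a trivial Q-function
    examples = []
    for _ in range(n_iterations):
        # 1. Collect an episode, acting on the current (generalized) Q.
        trace = run_episode(q_fn)
        # 2. Turn the trace into sample Q-values by backing up rewards.
        g = 0.0
        for s, a, r in reversed(trace):
            g = r + gamma * g
            examples.append((s, a, g))
        # 3. Generalize all samples into a logical regression tree.
        q_fn = tilde_fit(examples)
    return q_fn
```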
TILDE (Top-Down Induction of Logical Decision Trees)
  • A generalization of the Q-values is represented by a logical decision tree (sketched below).
  • Logical decision trees:
    • Nodes are first-order logic atoms (Prolog queries used as tests),
      • e.g., on(A, c): is there any block on c?
    • Training data is a relational database or a Prolog knowledge base.
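As an illustration only (the class, the query function, and the toy Q-values below are invented for this summary, not taken from TILDE):

```python
class Node:
    """Inner node of a logical decision tree (illustrative sketch).

    `test` plays the role of a Prolog query such as on(A, c): it is
    given the state's set of ground atoms and succeeds or fails.
    """
    def __init__(self, test, yes, no):
        self.test, self.yes, self.no = test, yes, no

    def predict(self, state):
        branch = self.yes if self.test(state) else self.no
        # Leaves hold plain numbers (Q-values); inner nodes recurse.
        return branch.predict(state) if isinstance(branch, Node) else branch

# The query on(A, c) with A unbound: "is there any block on c?"
def on_any_c(state):
    return any(atom[0] == "on" and atom[2] == "c" for atom in state)

tree = Node(on_any_c, yes=0.72, no=0.31)    # toy Q-values at the leaves
print(tree.predict({("clear", "a"), ("on", "b", "c")}))   # -> 0.72
```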
Logical Decision Tree vs. Decision Tree

[Figure: a propositional decision tree and a logical decision tree deciding whether the blocks are stacked.]


TILDE Algorithm

  • Declarative bias: e.g., on(+, -), a mode declaration that restricts which tests the learner may generate.
  • Background knowledge: a Prolog program; the slide shows an excerpt of such a program.
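The program excerpt on the slide is an image; as a hedged stand-in, here is the kind of derived predicate such background knowledge typically supplies, e.g. a height relation like the height(D,E) literal in the q_value rule above, written in Python over the atom-set state encoding used earlier:

```python
def height(state, block):
    """Background-knowledge-style derived predicate: the height of a
    block's position, counting from the floor (the floor itself = 0).
    Mirrors a recursive Prolog definition over on/2 facts."""
    if block == "floor":
        return 0
    below = next(atom[2] for atom in state
                 if atom[0] == "on" and atom[1] == block)
    return 1 + height(state, below)

s1 = {("clear", "a"), ("clear", "b"),
      ("on", "b", "c"), ("on", "c", "floor"), ("on", "a", "floor")}
print(height(s1, "b"))  # 2: b sits on c, which sits on the floor
```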


Q-RRL Algorithm

[Slide shows the Q-RRL algorithm box and examples generated by Q-RRL learning.]

  • Tested for three different goals:
    • 1. one-stack
    • 2. on(a,b)
    • 3. unstack
  • Tested for the following as well:
    • Fixed number of blocks
    • Number of blocks changed after learning
    • Number of blocks changed while learning
  • P-RRL vs. Q-RRL
Results: Fixed number of blocks

Accuracy: percentage of (s, a) pairs correctly classified as optimal vs. non-optimal.

[Figure: accuracy curves, with the accuracy of random policies shown as a baseline.]

Conclusions

  • RRL shows promising initial results, but needs more research.
  • RRL is more successful when the number of blocks is increased: it generalizes better to more complex domains.
  • Theoretical research explaining why it works is still missing.