Tópicos Especiais em Aprendizagem

Prof. Reinaldo Bianchi

Centro Universitário da FEI

2012

Objective of this Lecture

This is a mostly informative lecture.

  • Reinforcement Learning:
    • Planning and Learning.
    • Relational Reinforcement Learning.
    • Use of heuristics to speed up RL.
    • Dimensions of RL - Conclusions.
  • Today's lecture:
    • Chapters 9 and 10 of Sutton & Barto.
    • The Relational RL paper (MLJ, 2001).
    • Bianchi's PhD thesis.

Planning and Learning

Chapter 9 of Sutton & Barto

Objectives
  • Use of environment models.
  • Integration of planning and learning methods.
Models
  • Model: anything the agent can use to predict how the environment will respond to its actions
    • Distribution model: description of all possibilities and their probabilities
      • e.g., the transition probabilities and expected rewards for all state-action pairs
    • Sample model: produces sample experiences
      • e.g., a simulation model
  • Both types of models can be used to produce simulated experience
  • Often sample models are much easier to come by
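
To make the distinction concrete, here is a minimal sketch (not from the slides) of the two kinds of model for a toy problem, using plain Python dictionaries; the names distribution_model and sample_model are illustrative only:

    import random

    # Distribution model: for each (state, action), every possible outcome
    # is listed together with its probability.
    distribution_model = {
        ("s0", "a"): [(0.8, "s1", 1.0),   # (probability, next state, reward)
                      (0.2, "s0", 0.0)],
    }

    # Sample model: only needs to produce one plausible outcome per call,
    # e.g. by sampling from the distribution above (or from a simulator).
    def sample_model(state, action):
        outcomes = distribution_model[(state, action)]
        r = random.random()
        cumulative = 0.0
        for prob, next_state, reward in outcomes:
            cumulative += prob
            if r <= cumulative:
                return next_state, reward
        return outcomes[-1][1:]   # fall back to the last listed outcome
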
Planning
  • Planning: any computational process that uses a model to create or improve a policy.
  • Planning in AI:
    • state-space planning
    • plan-space planning (e.g., partial-order planner)
Planning in RL
  • We take the following (unusual) view:
    • all state-space planning methods involve computing value functions, either explicitly or implicitly.
    • they all apply backups to simulated experience.
Planning Cont.
  • Classical DP methods are state-space planning methods.
  • Heuristic search methods are state-space planning methods.
Q-Learning Planning
  • A planning method based on Q-learning:

Random-Sample One-Step Tabular Q-Planning
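
A rough sketch of what such a loop could look like in Python, assuming a tabular Q dictionary, finite lists of states and actions, and a sample model like the one sketched above (all illustrative, not from the slides):

    import random

    GAMMA, ALPHA = 0.95, 0.1

    def q_planning_step(Q, states, actions, sample_model):
        # 1. Select a state and an action at random.
        state, action = random.choice(states), random.choice(actions)
        # 2. Obtain a simulated next state and reward from the sample model.
        next_state, reward = sample_model(state, action)
        # 3. Apply a one-step tabular Q-learning backup to the simulated transition.
        best_next = max(Q.get((next_state, a), 0.0) for a in actions)
        old = Q.get((state, action), 0.0)
        Q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)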

Learning, Planning, and Acting
  • Two uses of real experience:
    • model learning: to improve the model
    • direct RL: to directly improve the value function and policy
  • Improving the value function and/or policy via a model is sometimes called indirect RL or model-based RL. Here, we call it planning.

Direct vs. Indirect RL

  • Indirect methods:
    • make fuller use of experience: get better policy with fewer environment interactions
  • Direct methods:
    • simpler
    • not affected by bad models
  • But they are very closely related and can be usefully combined:
    • planning, acting, model learning, and direct RL can occur simultaneously and in parallel

The Dyna-Q Algorithm

  • The algorithm interleaves three processes (a sketch follows):
    • direct RL
    • model learning
    • planning
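
A compact sketch of one step of tabular Dyna-Q in Python, following the structure in Sutton & Barto; the environment interface env.step and the number of planning backups are assumptions for illustration:

    import random

    GAMMA, ALPHA, EPSILON, N_PLANNING = 0.95, 0.1, 0.1, 5

    def epsilon_greedy(Q, state, actions):
        if random.random() < EPSILON:
            return random.choice(actions)
        return max(actions, key=lambda a: Q.get((state, a), 0.0))

    def dyna_q_step(Q, model, env, state, actions):
        # (a) act in the real environment
        action = epsilon_greedy(Q, state, actions)
        next_state, reward = env.step(state, action)   # assumed environment API
        # (b) direct RL: one-step Q-learning backup on the real transition
        best_next = max(Q.get((next_state, a), 0.0) for a in actions)
        old = Q.get((state, action), 0.0)
        Q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)
        # (c) model learning: remember the last observed outcome (deterministic model)
        model[(state, action)] = (next_state, reward)
        # (d) planning: N backups on transitions replayed from the learned model
        for _ in range(N_PLANNING):
            (s, a), (s2, r) = random.choice(list(model.items()))
            best = max(Q.get((s2, b), 0.0) for b in actions)
            Q[(s, a)] = Q.get((s, a), 0.0) + ALPHA * (r + GAMMA * best - Q.get((s, a), 0.0))
        return next_state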

Dyna-Q on a Simple Maze

Reward is 0 on every step until the goal is reached, when it is 1.

Prioritized Sweeping
  • Which states or state-action pairs should be generated during planning?
  • Work backwards from states whose values have just changed:
    • Maintain a queue of state-action pairs whose values would change a lot if backed up, prioritized by the size of the change
    • When a new backup occurs, insert predecessors according to their priorities
    • Always perform backups from first in queue
  • Moore and Atkeson 1993; Peng and Williams, 1993
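
A minimal sketch of the queue management described above, using Python's heapq (priorities are negated because heapq is a min-heap); predecessors_of is an assumed helper returning the (state, action, reward) triples predicted to lead into a given state:

    import heapq

    GAMMA, ALPHA, THETA = 0.95, 0.1, 1e-4
    pqueue = []   # entries: (-priority, state, action)

    def push(state, action, priority):
        # only queue pairs whose value would change by more than the threshold
        if priority > THETA:
            heapq.heappush(pqueue, (-priority, state, action))

    def planning_sweep(Q, model, predecessors_of, actions, n_backups):
        for _ in range(n_backups):
            if not pqueue:
                break
            # always back up the highest-priority pair first
            _, s, a = heapq.heappop(pqueue)
            s2, r = model[(s, a)]   # deterministic model: (s, a) -> (next state, reward)
            best = max(Q.get((s2, b), 0.0) for b in actions)
            Q[(s, a)] = Q.get((s, a), 0.0) + ALPHA * (r + GAMMA * best - Q.get((s, a), 0.0))
            # insert the predecessors of s, prioritized by how much they would now change
            for (sp, ap, rp) in predecessors_of(s):
                best_s = max(Q.get((s, b), 0.0) for b in actions)
                priority = abs(rp + GAMMA * best_s - Q.get((sp, ap), 0.0))
                push(sp, ap, priority)
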
Prioritized Sweeping vs. Dyna-Q

Both use N = 5 backups per environment interaction.

Summary
  • Emphasized close relationship between planning and learning
  • Important distinction between distribution models and sample models
  • Looked at some ways to integrate planning and learning
    • synergy among planning, acting, model learning
Summary
  • Distribution of backups: focus of the computation
    • trajectory sampling: backup along trajectories
    • prioritized sweeping
    • heuristic search
  • Size of backups: full vs. sample; deep vs. shallow

Relational Reinforcement Learning

Based on a lecture by Tayfun Gürel

Relational Representations
  • In most applications the state space is too large.
  • Generalization over states is essential.
  • Many states are similar in some respects.
  • Representations for RL have to be enriched to support generalization.
  • RRL was proposed by Džeroski, De Raedt, and Blockeel (1998).

Blocks World

  • An action: move(a,b)
    • preconditions: clear(a) ∈ S and clear(b) ∈ S
  • An example state:
    • s1 = {clear(b), clear(a), on(b,c), on(c,floor), on(a,floor)}
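
A small illustration (not from the slides) of how such a relational state and the move precondition could be encoded in Python, with a set of ground atoms standing in for the logical state:

    # A state is a set of ground atoms; each atom is a tuple (predicate, args...).
    s1 = {
        ("clear", "b"), ("clear", "a"),
        ("on", "b", "c"), ("on", "c", "floor"), ("on", "a", "floor"),
    }

    def can_move(state, x, y):
        # precondition of move(x, y): both x and y must be clear in the state
        return ("clear", x) in state and ("clear", y) in state

    print(can_move(s1, "a", "b"))   # True: both a and b are clear in s1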

Relational Representations
  • With Relational Representations for states:
    • Abstraction from details

For example, a single abstract rule can assign one Q-value to many concrete states:

    q_value(0.72) :- goal_unstack, numberofblocks(A), action_move(B,C),
                     height(D,E), E=2, on(C,D), !.

  • Flexible to goal changes
    • Retraining from the beginning is not necessary
  • Transfer of experience to more complex domains

Relational Reinforcement Learning
  • How does it work?
  • It is an integration of RL with ILP (Inductive Logic Programming).
  • Do forever (a sketch follows this list):
    • Use Q-learning to generate sample Q-values for sample state-action pairs
    • Generalize them using ILP (in this case TILDE)
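
A highly simplified sketch of that loop in Python; run_q_learning_episode and induce_tree are hypothetical placeholders standing in for the Q-learning episode generator and the TILDE induction step:

    def rrl_loop(n_episodes, run_q_learning_episode, induce_tree):
        """Alternate between generating (state, action, Q) examples and generalizing them."""
        examples = []   # accumulated (state, action, q_value) examples
        q_tree = None   # current generalized Q-function (a logical decision tree)
        for _ in range(n_episodes):
            # 1. Run one Q-learning episode, using the current tree (if any) to
            #    estimate Q-values, and collect the visited (s, a, q) triples.
            examples.extend(run_q_learning_episode(q_tree))
            # 2. Generalize all examples seen so far into a new logical decision tree.
            q_tree = induce_tree(examples)
        return q_tree
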
TILDE (Top-Down Induction of Logical Decision Trees)
  • A generalization of Q-values is represented by a logical decision tree
  • Logical decision trees:
    • Nodes are first-order logic atoms (Prolog queries used as tests)
      • e.g. on(A, c): is there any block on c?
    • Training data is a relational database or a Prolog knowledge base
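
A toy illustration (an assumption, not the slide's figure) of why such node tests generalize more than propositional ones, following the on(A, c) example above:

    # Relational state (as in the Blocks World example above): block b is on c.
    state = {("on", "b", "c"), ("on", "c", "floor"),
             ("on", "a", "floor"), ("clear", "a"), ("clear", "b")}

    # Propositional test: a fixed, fully ground attribute, e.g. "is a on c?"
    def prop_test(state):
        return ("on", "a", "c") in state   # False for this state

    # Logical test, like the node on(A, c): "is there ANY block on c?"
    def logical_test(state):
        return any(a[0] == "on" and a[2] == "c" for a in state if len(a) == 3)   # True here
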

Logical Decision Tree vs. Decision Tree

(Slide figure: a propositional decision tree and a logical decision tree for deciding whether the blocks are stacked.)


TILDE Algorithm

  • Declarative bias:
    • e.g. on(+,-)
  • Background knowledge: a Prolog program
    • (an example fragment of the program was shown on the slide)


Q-RRL Algorithm

(Slide figure: the Q-RRL algorithm, with examples generated by Q-RRL learning.)

Experiments
  • Tested for three different goals:
    • 1. one-stack
    • 2. on(a,b)
    • 3. unstack
  • Tested for the following as well:
    • Fixed number of blocks
    • Number of blocks changed after learning
    • Number of blocks changed while learning
  • P-RRL vs. Q-RRL
Results: Fixed number of blocks

Accuracy: percentage of correctly classified (s,a) pairs (optimal vs. non-optimal)

Baseline for comparison: accuracy of random policies

Conclusion
  • RRL shows promising initial results, but needs more research
  • RRL is more successful when the number of blocks is increased (it generalizes better to more complex domains)
  • Theoretical research proving why it works is still missing