NEW TIES WP2 Agent and learning mechanisms


Decision making and learning

  • Agents have a controller (decision tree, DQT)

    • Input: the situation (as perceived: seen, heard, interpreted)

    • Output: action

  • Decision making = using DQT

  • Learning = modifying DQT

  • Decisions also depend on inheritable “attitude genes” (learned through evolution); a minimal controller sketch follows below
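
To make the controller loop concrete, here is a minimal sketch in Python. The node classes, the situation-as-dictionary encoding, and all names are illustrative assumptions, not the NEW TIES code; only the structure (bias, test, and action nodes) follows the slides.

    import random

    class ActionNode:
        # Leaf: decision making ends by returning an action
        def __init__(self, action):
            self.action = action
        def decide(self, situation):
            return self.action

    class TestNode:
        # Boolean test on the perceived situation, e.g. the native concept "see_food"
        def __init__(self, concept, yes, no):
            self.concept, self.yes, self.no = concept, yes, no
        def decide(self, situation):
            branch = self.yes if situation.get(self.concept, False) else self.no
            return branch.decide(situation)

    class BiasNode:
        # Stochastic choice among children, weighted by biases
        # (combining learned and genetic bias is shown on a later slide)
        def __init__(self, children, biases):
            self.children, self.biases = children, biases
        def decide(self, situation):
            child = random.choices(self.children, weights=self.biases)[0]
            return child.decide(situation)

    # Decision making = one top-down pass through the tree
    dqt = BiasNode(
        [TestNode("see_food", ActionNode("eat"), ActionNode("move")),
         ActionNode("turn_left")],
        biases=[0.5, 0.5],
    )
    print(dqt.decide({"see_food": True}))  # "eat" or "turn_left", by bias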


Example of a DQT

[Figure: an example DQT. A bias node (B) at the root branches with biases 0.5 / 0.5 into two test nodes (T), “VISUAL: FRONT FOOD REACHABLE” and “BAG: FOOD”. The YES/NO outcome of each test leads to action nodes (A) carrying biases 1.0, 0.6, 0.2, 0.2 over the actions EAT (respectively PICKUP), MOVE, TURN LEFT, and TURN RIGHT. Legend: B = bias, T = test, A = action (decision); numbers such as 0.2 mark genetic biases, YES/NO labels mark Boolean choices.]


Interaction: evolution & individual learning

  • Bias node with n children, each with bias b_i

  • Bias ≠ probability

    • Bias b_i is learned and changes during the agent’s lifetime (the “learned bias”)

    • Genetic bias g_i is inherited, part of the genome, and constant

  • Actual probability of choosing child i:

    p(b_i, g_i) = b_i + (1 - b_i) ∙ g_i

  • Learned and inherited behaviour are thus linked through this formula (worked example below)
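
A worked example of the formula, under the assumption that the per-child values are normalised into a probability distribution over the siblings of a bias node:

    def child_weight(b, g):
        # p(b, g) = b + (1 - b) * g, from the slide above
        return b + (1.0 - b) * g

    learned = [0.5, 0.5]   # learned biases b_i (updated during lifetime)
    genetic = [0.8, 0.1]   # genetic biases g_i (inherited, constant)

    weights = [child_weight(b, g) for b, g in zip(learned, genetic)]
    total = sum(weights)
    probs = [w / total for w in weights]  # normalisation over siblings is an assumption
    print(probs)  # approximately [0.62, 0.38]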


DQT nodes & parameters cont’d

  • Test node language: native concepts + emerging concepts

  • Native: see_agent, see_mother, see_food, have_food, see_mate, …

  • New concepts can emerge by categorisation (the discrimination game; sketch below)
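
As a rough illustration of how a discrimination game can create new concepts, here is a minimal Steels-style round over a one-dimensional feature. The interval categories and the splitting rule are assumptions for illustration only; the NEW TIES categorisation mechanism may differ.

    import random

    def play_round(categories, context, topic):
        # categories: list of (low, high) intervals over a 1-D feature value.
        # Success = some category matches the topic but no other context object.
        for low, high in categories:
            if low <= topic < high and not any(
                low <= obj < high for obj in context if obj != topic
            ):
                return True, categories
        # Failure: refine the repertoire by splitting a random category in two,
        # creating a finer-grained (emergent) concept.
        low, high = random.choice(categories)
        mid = (low + high) / 2.0
        refined = [c for c in categories if c != (low, high)]
        refined += [(low, mid), (mid, high)]
        return False, refined

    cats = [(0.0, 1.0)]                                        # one all-covering category
    ok, cats = play_round(cats, context=[0.2, 0.8], topic=0.8) # fails, splits
    ok, cats = play_round(cats, context=[0.2, 0.8], topic=0.8) # now succeeds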


Learning: the heart of the emergence engine

  • Evolutionary learning:

    • not within an agent (i.e., not during its lifetime), but over generations

    • by variation + selection

  • Individual learning:

    • within one agent, during lifetime

    • by reinforcement learning

  • Social learning:

    • during lifetime, in interacting agents

    • by sending/receiving + adopting knowledge pieces


Types of learning: properties

  • Evolutionary learning:

    • Agent does not create new knowledge during lifetime

    • Basic DQTree + genetic biases are inheritable

    • “knowledge creator” = crossover and mutation

  • Individual learning:

    • Agent does create new knowledge during lifetime

    • DQTree + learned biases are modified

    • “knowledge creator” = reinforcement learning (driven by rewards)

    • Individually learnt knowledge dies with its host agent

  • Social learning:

    • Agent imports knowledge already created elsewhere (new? not new?)

    • Adoption of imported knowledge ≈ crossover

    • Importing knowledge pieces

      • can save effort for recipient

      • can create novel combinations

    • Exporting knowledge helps its preservation after death of host


Present status of types of learning

  • Evolutionary learning:

    • Demonstrated in 2 NT scenarios

    • Autonomous selection/reproduction causes problems with population stability (implosion/explosion)

  • Individual learning:

    • Code exists, but it has never been demonstrated in NT scenarios

  • Social learning:

    • Under construction/design based on the “telepathy” approach

    • Communication protocols + adoption mechanisms needed


Evolution: variation operators

  • Operators for DQT:

    • Crossover = subtree swap

    • Mutation =

      • Substitute a subtree with a random subtree

      • Change concepts in test nodes

      • Change bias on an edge

  • Operators for attitude genes:

    • Crossover = full arithmetic crossover (sketch after this list)

    • Mutation =

      • Add Gaussian noise

      • Replace with random value
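
A sketch of the attitude-gene operators named above; the rates, the [0, 1] gene range, and all parameter names are illustrative assumptions:

    import random

    def arithmetic_crossover(mum, dad, alpha=0.5):
        # Full (whole) arithmetic crossover: child_i = alpha*mum_i + (1-alpha)*dad_i
        return [alpha * m + (1.0 - alpha) * d for m, d in zip(mum, dad)]

    def mutate(genes, p_reset=0.01, p_gauss=0.1, sigma=0.05):
        out = []
        for g in genes:
            if random.random() < p_reset:
                g = random.random()               # replace with a random value
            elif random.random() < p_gauss:
                g += random.gauss(0.0, sigma)     # add Gaussian noise
            out.append(min(1.0, max(0.0, g)))     # clamp to [0, 1] (assumption)
        return out

    child_genes = mutate(arithmetic_crossover([0.2, 0.9], [0.6, 0.1]))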


Evolution: selection operators

  • Mate selection:

    • Mate action chosen by DQT

    • One agent proposes, the other accepts the proposal

    • Both must have reached adulthood

  • Survivor selection (predicate sketch below):

    • Dies if too old (≥ 80 years)

    • Dies if energy reaches zero
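
The survivor-selection rule reduces to a simple predicate. The Agent shape below is assumed; the thresholds come from the slide:

    from dataclasses import dataclass

    @dataclass
    class Agent:
        age: float      # in simulated years
        energy: float

    MAX_AGE = 80        # dies if too old (>= 80 years)

    def survives(agent: Agent) -> bool:
        # An agent stays in the population only while both conditions hold
        return agent.age < MAX_AGE and agent.energy > 0

    print(survives(Agent(age=81, energy=5.0)))   # False: too old
    print(survives(Agent(age=30, energy=0.0)))   # False: out of energy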


Experiment: Simple world - Setup: Environment

  • World size: 200 x 200 grid cells

  • Agents and food only (no tokens, roads, etc.); both vary in number.

  • Initial distribution of agents (500): all in the upper left corner

  • Initial distribution of food (10000): 5000 each in the upper left and lower right corners (see the config sketch below)
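
The same setup as a configuration sketch; the key names are illustrative, the values are taken from the slide:

    SIMPLE_WORLD = {
        "grid_size": (200, 200),          # 200 x 200 grid cells
        "n_agents": 500,
        "agent_placement": "upper_left",  # all agents start in one corner
        "n_food": 10_000,
        "food_placement": {"upper_left": 5_000, "lower_right": 5_000},
    }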


Experiment: Simple world - Setup: Agents

  • Native knowledge (concepts and DQT sub-trees)

    • Navigating (random walk)

    • Eating (identify, pick up, and eat plants)

    • Mating (identify mates, propose/agree)

  • Random DQT branches

    • Differs per agent

    • Based on the “pool” of native concepts


Experiment: Simple world

The simulation ran for 3 months of real time to test stability.


Experiment: Poisonous food - Setup: Environment

  • Two types of food: poisonous (decreases energy) and edible (increases energy)

  • World size: 200 x 200 grid cells

  • Agents and food only (no tokens, roads, etc.); both vary in number.

  • Initial distribution of agents (500): uniform random over the grid.

  • Initial distribution of food (10000): 5000 of each type, uniform random over the same grid as the agents.


Experiment: Poisonous food - Setup: Agents

  • Native knowledge

    • Identical to the simple-world experiment

  • Additional native knowledge

    • Agents can distinguish poisonous from edible plants

    • The relation to eating/picking up is not built in

  • No random DQT branches


Experiment: Poisonous food - Measures

  • Population size

  • Welfare (energy)

  • Number of poisonous and edible plants

  • Complexity of the controller (number of nodes)

  • Age



