Machine Learning and ILP for Multi-Agent Systems

Daniel Kudenko & Dimitar Kazakov

Department of Computer Science

University of York, UK

ACAI-01, Prague, July 2001

Why Learning Agents?

  • Agent designers are not able to foresee all situations that the agent will encounter.

  • To display full autonomy, agents need to learn from and adapt to novel environments.

  • Learning is a crucial part of intelligence.

A Brief History

[Diagram not preserved in this transcript; its only surviving label is “Machine Learning”.]

  • Principles of Machine Learning (ML)

  • ML for Single Agents

  • ML for Multi-Agent Systems

  • Inductive Logic Programming for Agents

What is Machine Learning?

  • Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. [Mitchell 97]

  • Example: T = “play tennis”, E = “playing matches”, P = “score”

Types of Learning

  • Inductive Learning (Supervised Learning)

  • Reinforcement Learning

  • Discovery (Unsupervised Learning)

Inductive Learning

[An inductive learning] system aims at determining a description of a given concept from a set of concept examples provided by the teacher and from background knowledge. [Michalski et al. 98]

Inductive Learning

Examples of Category C1, Examples of Category C2, …, Examples of Category Cn → Inductive Learning → (Procedure to Classify New Examples)

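Inductive Learning Example

Training examples:

  • Ammo: low, Monster: near, Light: good (category not preserved in this transcript)

  • Ammo: low, Monster: far, Light: medium, Category: ¬shoot

  • Ammo: high, Monster: far, Light: good (category not preserved in this transcript)

Result of Inductive Learning:

If (Ammo = high) and (Light ∈ {medium, good}) then shoot;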
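The learned rule can be read as a small classifier. The sketch below is only an illustration in Python: the dictionary keys and the function name `should_shoot` are assumptions, and the rule is transcribed directly from the slide (it ignores the Monster attribute).

```python
from typing import Dict

def should_shoot(example: Dict[str, str]) -> bool:
    """Apply the slide's rule: shoot iff Ammo is high and Light is medium or good."""
    return example["ammo"] == "high" and example["light"] in {"medium", "good"}

# The three attribute vectors from the slide (key names are illustrative).
examples = [
    {"ammo": "low", "monster": "near", "light": "good"},
    {"ammo": "low", "monster": "far", "light": "medium"},
    {"ammo": "high", "monster": "far", "light": "good"},
]

predictions = [should_shoot(e) for e in examples]
```

Only the third vector satisfies both conditions, so it is the only one the rule classifies as shoot.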


Performance Measure

  • Classification accuracy on unseen test set.

  • Alternatively: measure that incorporates cost of false-positives and false-negatives (e.g. recall/precision).
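As a rough illustration of these measures, the following Python sketch computes accuracy, precision and recall from boolean labels. The label lists are made-up test data, not results from the talk.

```python
def evaluate(y_true, y_pred):
    """Return (accuracy, precision, recall) for boolean labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Hypothetical held-out test set: true categories vs. predicted categories.
y_true = [True, True, False, False, True]
y_pred = [True, False, False, True, True]
accuracy, precision, recall = evaluate(y_true, y_pred)
```

Accuracy treats both kinds of error alike, whereas precision penalises false positives and recall penalises false negatives.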

Where’s the knowledge?

  • Example (or Object) language

  • Hypothesis (or Concept) language

  • Learning bias

  • Background knowledge

Example Language

  • Feature-value vectors, logic programs.

  • Which features are used to represent examples (e.g., ammunition left)?

  • For agents: which features of the environment are fed to the agent (or the learning module)?

  • Constructive Induction: automatic feature selection, construction, and generation.

Hypothesis Language

  • Decision trees, neural networks, logic programs, …

  • Further restrictions may be imposed, e.g., depth of decision trees, form of clauses.

  • Choice of hypothesis language influences choice of learning methods and vice versa.

Learning bias

  • Preference relation between legal hypotheses.

  • Accuracy on training set.

  • Hypothesis with zero error on training data is not necessarily the best (noise!).

  • Occam’s razor: the simpler hypothesis is the better one.

Inductive Learning

  • No “real” learning without language or learning bias.

  • IL is search through space of hypotheses guided by bias.

  • Quality of hypothesis depends on proper distribution of training examples.

Inductive Learning for Agents

  • What is the target concept (i.e., categories)?

  • Example: do(a), ¬do(a) for specific action a.

  • Real-valued categories/actions can be discretized.

  • Where does the training data come from and what form does it take?

Batch vs Incremental Learning

  • Batch Learning: collect a set of training examples and compute hypothesis.

  • Incremental Learning: update hypothesis with each new training example.

  • Incremental learning more suited for agents.

Batch Learning for Agents

  • When should (re-)computation of hypothesis take place?

  • Example: after experienced accuracy of hypothesis drops below threshold.

  • Which training examples should be used?

  • Example: sequences of actions that led to success.

Eager vs. Lazy learning

  • Eager learning: commit to hypothesis computed after training.

  • Lazy learning: store all encountered examples and perform classification based on this database (e.g. nearest neighbour).
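A lazy learner can be sketched in a few lines of Python. This is a minimal 1-nearest-neighbour illustration, assuming nominal features compared by Hamming distance; the class name, the stored examples and their labels are hypothetical.

```python
class NearestNeighbour:
    """Lazy learner: training only stores examples; the work happens at query time."""

    def __init__(self):
        self.memory = []  # list of (features, label) pairs

    def observe(self, features, label):
        self.memory.append((features, label))

    def classify(self, features):
        # Hamming distance: number of attributes on which two examples differ.
        def distance(stored):
            return sum(stored[key] != features[key] for key in features)

        _, label = min(self.memory, key=lambda pair: distance(pair[0]))
        return label

learner = NearestNeighbour()
learner.observe({"ammo": "high", "monster": "far", "light": "good"}, "shoot")
learner.observe({"ammo": "low", "monster": "far", "light": "medium"}, "no_shoot")

# The query differs from the first stored example in one attribute, from the second in three.
answer = learner.classify({"ammo": "high", "monster": "near", "light": "good"})
```

Storing an example is cheap, but every classification scans the whole memory, which matters for agents that must act under time pressure.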

Active Learning

  • Learner decides which training data to receive (i.e. generates training examples and uses oracle to classify them).

  • Closed Loop ML: learner suggests hypothesis and verifies it experimentally. If hypothesis is rejected, the collected data gives rise to a new hypothesis.

Black-Box vs. White-Box

  • Black-Box Learning: Interpretation of the learning result is unclear to a user.

  • White-Box Learning: Creates (symbolic) structures that are comprehensible.

Reinforcement Learning

  • Agent learns from environmental feedback indicating the benefit of states.

  • No explicit teacher required.

  • Learning target: optimal policy (i.e., state-action mapping)

  • Optimality measure: e.g., cumulative discounted reward.

Q Learning

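Value of a state: discounted cumulative reward

$V^{\pi}(s_t) = \sum_{i \ge 0} \gamma^{i} \, r(s_{t+i}, a_{t+i})$

$0 \le \gamma < 1$ is a discount factor ($\gamma = 0$ means that only immediate reward is considered).

$r(s_{t+i}, a_{t+i})$ is the reward determined by performing actions specified by policy $\pi$.

$Q(s,a) = r(s,a) + \gamma V^{*}(\delta(s,a))$, where $\delta(s,a)$ is the state reached by performing $a$ in $s$.

Optimal Policy:

$\pi^{*}(s) = \arg\max_{a} Q(s,a)$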

Q Learning

Initialize all Q(s,a) to 0

In some state s choose some action a. Let s’ be the resulting state.

Update Q:

$Q(s,a) \leftarrow r + \gamma \max_{a'} Q(s',a')$
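The update loop can be tried out on a toy problem. The Python sketch below assumes a hypothetical deterministic four-state corridor with a reward of 1 for reaching the rightmost state, a discount factor of 0.9 and purely random exploration; none of these choices come from the slides.

```python
import random

GAMMA = 0.9                 # discount factor (assumed value)
N_STATES = 4                # hypothetical corridor: states 0..3
ACTIONS = ("left", "right")
GOAL = N_STATES - 1         # reaching state 3 ends an episode

def step(state, action):
    """Deterministic transition; reward 1.0 only when the goal is reached."""
    nxt = max(state - 1, 0) if action == "left" else state + 1
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward

def q_learning(episodes=200, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}  # initialise to 0
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            a = rng.choice(ACTIONS)  # random exploration
            s2, r = step(s, a)
            # Deterministic Q update: immediate reward plus discounted best next value.
            q[(s, a)] = r + GAMMA * max(q[(s2, a2)] for a2 in ACTIONS)
            s = s2
    return q

q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
```

In this toy world the greedy policy read off the table moves right in every state, and the Q values fall off by a factor of 0.9 per step away from the goal.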

Q Learning

  • Guaranteed convergence towards optimum (state-action pairs have to be visited infinitely often).

  • Exploration strategy can speed up convergence.

  • Basic Q Learning does not generalize: replace state-action table with function approximation (e.g. neural net) in order to handle unseen states.

Pros and Cons of RL

  • Clearly suited to agents acting and exploring an environment.

  • Simple.

  • Engineering of suitable reward function may be tricky.

  • May take a long time to converge.

  • Learning result may not be transparent (depending on representation of Q function).

Combination of IL and RL

  • Relational reinforcement learning [Dzeroski et al. 98]: leads to more general Q function representation that may still be applicable even if the goals or environment change.

  • Explanation-based learning and RL [Dietterich and Flann, 95].

  • More ILP and RL: see later.

Unsupervised Learning

  • Acquisition of “useful” or “interesting” patterns in input data.

  • Usefulness and interestingness are based on agent’s internal bias.

  • Agent does not receive any external feedback.

  • Discovered concepts are expected to improve agent performance on future tasks.

Learning and Verification

  • Need to guarantee agent safety.

  • Pre-deployment verification for non-learning agents.

  • What to do with learning agents?

Learning and Verification [Gordon ’00]

  • Verification after each self-modification step.

  • Problem: Time-consuming.

  • Solution 1: use property-preserving learning operators.

  • Solution 2: use learning operators which permit quick (partial) re-verification.

Learning and Verification

What to do if verification fails?

  • Repair (multi)-agent plan.

  • Choose different learning operator.

Learning in Multi-Agent Systems

  • Classification

  • Social Awareness.

  • Communication

  • Role Learning.

  • Distributed Learning.

Types of Multi-Agent Learning [Weiss & Dillenbourg 99]

  • Multiplied Learning: No interference in the learning process by other agents (except for exchange of training data or outputs).

  • Divided Learning: Division of learning task on functional level.

  • Interacting Learning: cooperation beyond the pure exchange of data.

Social Awareness

  • Awareness of the existence of other agents and (possibly) knowledge about their behavior.

  • Not necessary to achieve near optimal MAS behavior: rock sample collection [Steels 89].

  • Can it degrade performance?

Levels of Social Awareness [Vidal & Durfee 97]

  • 0-level agent: no knowledge about existence of other agents.

  • 1-level agent: recognizes that other agents exist; models other agents as 0-level agents.

  • 2-level agent: has some knowledge about the behavior of other agents; models other agents as 1-level agents.

  • k-level agent: models other agents as (k-1)-level agents.

Social Awareness and Q Learning

  • 0-level agents already learn implicitly about other agents.

  • [Mundhe and Sen, 00]: study of two Q learning agents up to level 2.

  • Two 1-level agents display slowest and least effective learning (worse than two 0-level agents).

Agent models and Q Learning

  • $Q: S \times A^n \rightarrow \mathbb{R}$, where n is the number of agents.

  • If other agents’ actions are not observable, an assumption about their actions is needed.

  • Pessimistic assumption: given an agent’s action choice other agents will minimize reward.

  • Optimistic assumption: other agents will maximize reward.

Agent Models and Q Learning

  • Pessimistic Assumption leads to overly cautious behavior.

  • Optimistic Assumption guarantees convergence towards optimum [Lauer & Riedmiller ‘00].

  • If knowledge of other agents’ behavior is available, the Q value update can be based on probabilistic computation [Claus and Boutilier ‘98]. But: no guarantee of optimality.
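The difference between the two assumptions shows up even in a single-state game. The Python sketch below uses a made-up joint Q table for two agents with actions "x" and "y"; it only illustrates how each assumption collapses the joint table into a value per own action, and is not the algorithm of any of the cited papers.

```python
# Hypothetical joint Q values indexed by (own action, other agent's action).
JOINT_Q = {
    ("x", "x"): 10.0,
    ("x", "y"): -20.0,
    ("y", "x"): 0.0,
    ("y", "y"): 5.0,
}
ACTIONS = ("x", "y")

def projected_value(own_action, assumption):
    """Collapse the joint table under an assumption about the other agent."""
    values = [JOINT_Q[(own_action, other)] for other in ACTIONS]
    # Optimistic: the other agent maximises reward; pessimistic: it minimises reward.
    return max(values) if assumption == "optimistic" else min(values)

def best_action(assumption):
    return max(ACTIONS, key=lambda a: projected_value(a, assumption))

optimistic_choice = best_action("optimistic")
pessimistic_choice = best_action("pessimistic")
```

With these numbers the optimistic agent aims for the joint optimum via "x", while the pessimistic agent settles for the safer "y", which illustrates the overly cautious behaviour mentioned above.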

Q Learning and Communication [Tan 93]

Types of communication:

  • Sharing sensation

  • Sharing or merging policies

  • Sharing episodes


  • Communication generally helps

  • Extra sensory information may hurt

Role Learning

  • Often useful for agents to specialize in specific roles for joint tasks.

  • Pre-defined roles: reduce flexibility, often not easy to define optimal distribution, may be expensive.

  • How to learn roles?

  • [Prasad et al. 96]: learn optimal distribution of pre-defined roles.

Q Learning of roles

  • [Crites & Barto 98]: elevator domain; regular Q learning; no specialization achieved (but highly efficient behavior).

  • [Ono & Fukumoto 96]: Hunter-Prey domain, specialization achieved with greatest mass merging strategy.

Q Learning of Roles [Balch 99]

  • Three types of reward function: local performance-based, local shaped, global.

  • Global reward supports specialization.

  • Local reward supports emergence of homogeneous behaviors.

  • Some domains benefit from learning team heterogeneity (e.g., robotic soccer), others do not (e.g., multi-robot foraging).

  • Heterogeneity measure: social entropy.
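One simple way to quantify team heterogeneity is Shannon entropy over the proportions of agents in each behavioural group. The Python sketch below assumes that reading; the measure used in [Balch 99] may differ in detail (for example, in how agents are grouped), and the role names are hypothetical.

```python
import math
from collections import Counter

def social_entropy(roles):
    """Shannon entropy (in bits) of the distribution of agents over roles."""
    counts = Counter(roles)
    total = len(roles)
    # Each group contributes p * log2(1/p), where p is its share of the team.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

homogeneous_team = ["forager", "forager", "forager", "forager"]
mixed_team = ["attacker", "attacker", "defender", "defender"]

h_homogeneous = social_entropy(homogeneous_team)
h_mixed = social_entropy(mixed_team)
```

A team in which every agent behaves alike scores 0, and the score grows as agents spread evenly over more distinct roles.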

Distributed Learning

  • Motivation: Agents learning a global hypothesis from local observations.

  • Application of MAS techniques to (inductive) learning.

  • Applications: Distributed Data Mining [Provost & Kolluri ‘99], Robotic Soccer.

Distributed Data Mining

  • [Provost & Hennessy 96]: Individual learners see only subset of all training examples and compute a set of local rules based on these.

  • Local rules are evaluated by other learners based on their data.

  • Only rules with good evaluation are carried over to the global hypothesis.
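The cooperative evaluation step can be sketched as follows. This Python illustration assumes that a rule is "good" when its accuracy on the examples it covers reaches a threshold at every other site; the site names, data, rules and threshold are all hypothetical, and the cited system may use a different criterion.

```python
def rule_accuracy(rule, data):
    """Fraction of the examples covered by the rule that are actually positive."""
    covered = [label for features, label in data if rule(features)]
    return sum(covered) / len(covered) if covered else 0.0

def global_rules(local_rules, partitions, threshold=0.8):
    """Keep a locally learned rule only if every other site evaluates it well."""
    accepted = []
    for owner, rules in local_rules.items():
        for name, rule in rules:
            other_data = [d for site, d in partitions.items() if site != owner]
            if all(rule_accuracy(rule, d) >= threshold for d in other_data):
                accepted.append(name)
    return accepted

# Each site sees only its own subset of the training examples.
partitions = {
    "site_a": [({"ammo": "high"}, True), ({"ammo": "low"}, False)],
    "site_b": [({"ammo": "high"}, True), ({"ammo": "high"}, True), ({"ammo": "low"}, False)],
}
# Rules proposed locally (rule induction itself is outside this sketch).
local_rules = {
    "site_a": [("ammo_high", lambda f: f["ammo"] == "high")],
    "site_b": [("always", lambda f: True)],
}
accepted = global_rules(local_rules, partitions)
```

Here the overly general rule proposed at one site is rejected because it covers a negative example held by the other site.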


[Mitchell 97] T. Mitchell. Machine Learning. McGraw Hill, 1997.

[Michalski et al. 98] R.S. Michalski, I. Bratko, M. Kubat. Machine Learning and Data Mining: Methods and Applications. Wiley, 1998.

[Dietterich&Flann 95] T. Dietterich and N.Flann. Explanation-based Learning and Reinforcement Learning. In Proceedings of the Twelfth International Conference on Machine Learning, 1995.

[Dzeroski et al. 98] S. Dzeroski, L. DeRaedt, and H. Blockeel. Relational Reinforcement Learning. In: Proceedings of the Eighth International Conference on Inductive Logic Programming ILP-98. Springer, 1998.

[Gordon 00] D. Gordon: Asimovian Adaptive Agents. Journal of Artificial Intelligence Research, 13, 2000.

[Weiss & Dillenbourg 99] G. Weiss and P. Dillenbourg. What is ‘Multi’ in Multi-Agent Learning? In P. Dillenbourg (ed.), Collaborative Learning. Cognitive and Computational Approaches. Pergamon Press, 1999.

[Vidal & Durfee 97] J.M. Vidal and E. Durfee. Agents Learning about Agents: A Framework and Analysis. In Working Notes of the AAAI-97 workshop on Multiagent Learning, 1997.

[Mundhe & Sen 00] M. Mundhe and S. Sen. Evaluating Concurrent Reinforcement Learners. Proceedings of the Fourth International Conference on Multiagent Systems, IEEE Press, 2000.

[Claus & Boutilier 98] C. Claus and C. Boutilier. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems. AAAI 98.

[Lauer & Riedmiller 00] M. Lauer and M. Riedmiller. An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems. In Proceedings of the Seventeenth International Conference on Machine Learning, 2000.


[Tan 93] M. Tan. Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents. In: Proceedings of the Tenth International Conference on Machine Learning, 1993.

[Prasad et al. 96] M.V.N. Prasad, S.E. Lander and V.R. Lesser. Learning Organizational Roles for Negotiated Search. International Journal of Human-Computer Studies, 48(1), 1996.

[Ono & Fukumoto 96] N. Ono and K. Fukumoto. A Modular Approach to Multi-Agent Reinforcement Learning. Proceedings of the First International Conference on Multi-Agent Systems, 1996.

[Crites & Barto 98] R. Crites and A. Barto. Elevator Group Control Using Multiple Reinforcement Learning Agents. Machine Learning, 1998.

[Balch 99] T. Balch. Reward and Diversity in Multi-Robot Foraging. Proceedings of the IJCAI-99 Workshop on Agents Learning About, From, and With other Agents, 1999.

[Provost & Kolluri 99] F. Provost and V. Kolluri. A Survey of Methods for Scaling Up Inductive Algorithms. Data Mining and Knowledge Discovery, 3, 1999.

[Provost & Hennessy 96] F. Provost and D. Hennessy. Scaling up: Distributed Machine Learning with Cooperation. AAAI 96, 1996.


Machine Learning and ILP for MAS: Part II

Integration of ML and Agents

ILP and its potential for MAS

Agent Applications of ILP

Learning, Natural Selection and Language

From Machine Learning to Learning Agents

Machine Learning: Learning as the only goal

Classic Machine Learning

Active Learning

Closed Loop Machine Learning

Learning as one of many goals: Learning Agent(s)

Integrating Machine Learning into the Agent Architecture

  • Time constraints on learning

  • Synchronisation between agents’ actions

  • Learning and Recall

Time Constraints on Learning

  • Machine Learning alone:

    • predictive accuracy matters, time doesn’t (just a price to pay)

  • ML in Agents

    • Soft deadlines: resources must be shared with other activities (perception, planning, control)

    • Hard deadlines: imposed by environment: Make up your mind now! (or they’ll eat you)

Doing Eager vs. Lazy Learning under Time Pressure

  • Eager Learning

    • Theories typically more compact…

    • …and faster to use

    • Takes more time to learn – do it when the agent is idle

  • Lazy Learning

    • Knowledge acquired at (almost) no cost

    • May be much slower when a test example comes

“Clear-cut” vs. Any-time Learning

Consider two types of algorithms:

  • Running a prescribed number of steps guarantees finding a solution

    • can use worst case complexity analysis to find an upper bound on the execution time

  • Any-time algorithms

    • a longer run may result in a better solution

    • don’t know an optimal solution when they see one

    • example: Genetic Algorithms

    • policies: halt learning to meet hard deadlines or when cost outweighs expected improvements of accuracy
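The "halt to meet a hard deadline" policy can be sketched with a simple any-time loop. The Python example below substitutes plain random search for a genetic algorithm to keep it short; the objective function, sampling range and deadlines are all assumptions made for illustration.

```python
import random
import time

def anytime_maximise(score, sample, deadline_s, seed=0):
    """Keep the best candidate found so far and stop when the deadline passes."""
    rng = random.Random(seed)
    best = sample(rng)              # always have some answer available
    best_score = score(best)
    stop_at = time.monotonic() + deadline_s
    while time.monotonic() < stop_at:
        candidate = sample(rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score

def score(x):
    # Hypothetical objective with its maximum (0.0) at x = 3.
    return -(x - 3.0) ** 2

def sample(rng):
    return rng.uniform(-10.0, 10.0)

quick = anytime_maximise(score, sample, deadline_s=0.0)
longer = anytime_maximise(score, sample, deadline_s=0.05)
```

Because both runs use the same seed, the longer run revisits the quick run's candidates and then continues, so its result can only be equal or better. The loop cannot tell when it has hit the optimum; it simply stops when time runs out.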

Time Constraints on Learning in Simulated Environments

  • Consider various cases:

    • Unlimited time for learning

    • Upper bound on time for learning

    • Learning in real time

  • Gradually tightening the constraints makes integration easier

  • Not limited to simulations: real-world problems have similar setting

    • e.g., various types of auctions

Synchronisation  Time Constraints

Learning and Recall

Agent must strike a balance between:

  • Learning, which updates the model of the world

  • Recall, which applies existing model of the world to other tasks

Learning and Recall (2)

Recall current model of world to choose and carry out an action

Observe the action outcome

Update sensory information

Learn new model of the world

Learning and Recall (3)

Update sensory information

Recall current model of world to choose and carry out an action

Learn new model of the world

  • In theory, the two can run in parallel

  • In practice, must share limited resources

Learning and Recall (4)

Possible strategies:

  • Parallel learning and recall at all times

  • Mutually exclusive learning and recall

    • After incremental, eager learning, examples are discarded…

    • …or kept if batch or lazy learning used

  • Cheap on-the-fly learning (preprocessing), off-line computationally expensive learning

    • reduce raw information, change object language

    • analogy with human learning and the role of sleep

Machine Learning Revisited

ML can be seen as the task of:

  • taking a set of observations represented in a given object/data language and

  • representing (the information in) that set in another language called concept/hypothesis language.

    A side effect of this step – the ability to deal with unseen observations.

Object and Concept Language

  • Object Language: (x,y,+/-).

  • Concept Language: any ellipse (5 param.)









Machine Learning Biases

  • The concept/hypothesis language specifies the language bias, which limits the set of all concepts/hypotheses that can be expressed/considered/learned.

  • The preference bias allows us to decide between two hypotheses if they both classify the training data equally.

  • The search bias defines the order in which hypotheses will be considered.

    • Important if one does not search the whole hypothesis space.

Preference Bias, Search Bias & Version Space

Version space: the subset of hypotheses that have zero training error.

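[Diagram: hypothesis space ranging from the most general concept to the most specific concept; details not preserved in this transcript.]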
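A version space is easy to enumerate when the hypothesis language is tiny. The Python sketch below assumes a made-up concept language of integer thresholds ("x is positive iff x >= t") and three made-up training examples.

```python
def consistent(threshold, examples):
    """A threshold hypothesis has zero training error if it agrees with every label."""
    return all((x >= threshold) == label for x, label in examples)

def version_space(thresholds, examples):
    return [t for t in thresholds if consistent(t, examples)]

# Hypothetical training data: (observation, is_positive).
examples = [(2, False), (5, True), (7, True)]
vs = version_space(range(0, 11), examples)

most_general = min(vs)    # lowest threshold: covers the most observations
most_specific = max(vs)   # highest threshold: covers the fewest observations
```

All hypotheses between the two boundaries fit the training data equally well, so a preference bias is needed to pick one of them.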




Inductive Logic Programming

Based on three pillars:

  • Logic Programming (LP) to represent data and concepts (i.e., object and concept language)

  • Background Knowledge to extend the concept language

  • Induction as learning method

LP as ILP Object Language

  • A subset of First Order Predicate Logic (FOPL) called Logic Programming.

  • Often limited to ground facts, i.e., propositional logic (cf. ID3 etc.).

  • In the latter case, data can be represented as a single table.

ILP Object Language Example

LP as ILP Concept Language

  • The concept language of ILP is relations expressed as Horn clauses, e.g.:

    equal(X,X).
    greater(X,Y) :- X > Y.

  • Cf. propositional logic representation: (arg1=1 & arg2=1) or (arg1=2 & arg2=2) ...

    • Tedious for finite domains and impossible otherwise.

  • Most often there is one target predicate (concept) only.

    • exceptions exist, e.g., Progol 5.

    Modes in ILP

    Used to distinguish between

    • input attributes (mode +)

    • output attributes (mode -) of the predicate learned.

    • Mode # used to describe attributes that must contain a constant in the predicate definition.

    • E.g., use mode car_type(+,+,#) to learn
      car_type(Doors,Roof,sports_car) :- Doors =< 2, Roof = convertible.

    • E.g., use mode car_type(-,-,#) to learn
      car_type(Doors,Roof,sports_car) :- (Doors = 1 ; Doors = 2), Roof = convertible.

    Types in ILP

    • Specify the range for each argument

    • User-defined types represented as unary predicates: colour(blue). colour(red). colour(black).

    • Built-in types also provided: nat/1, real/1, any/1 in Progol.

    • These definitions may or may not be generative: colour(X) instantiates X, nat(X) does not.

    ILP Types and Modes: Example

    Positive Only Learning

    • A way of dealing with domains where no negative examples are available.

      • Learn the concept of non-self-destructive actions.

    • The trivial definition “Anything belongs to the target concept” looks all right!

    • Trick: generate random examples and treat them as negative.

      • Requires generative type definitions.
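The random-negatives trick can be sketched as follows. This Python illustration assumes two small generative types (door counts and colours), a made-up set of positive examples, and a simple score of covered positives minus covered pseudo-negatives; Progol's actual positive-only evaluation function is different.

```python
import random

# Hypothetical generative types: every legal value can be enumerated.
DOORS = [2, 3, 4, 5]
COLOURS = ["blue", "red", "black"]

def pseudo_negatives(positives, n, seed=0):
    """Draw random examples from the type definitions and treat them as negative."""
    rng = random.Random(seed)
    universe = [(d, c) for d in DOORS for c in COLOURS]
    candidates = [example for example in universe if example not in positives]
    return rng.sample(candidates, min(n, len(candidates)))

def score(hypothesis, positives, negatives):
    """Reward covered positives, penalise covered pseudo-negatives."""
    covered_pos = sum(1 for d, c in positives if hypothesis(d, c))
    covered_neg = sum(1 for d, c in negatives if hypothesis(d, c))
    return covered_pos - covered_neg

positives = {(2, "red"), (2, "black")}
negatives = pseudo_negatives(positives, n=5)

def anything(doors, colour):
    return True            # the trivial definition

def two_door(doors, colour):
    return doors == 2      # a more specific candidate

trivial_score = score(anything, positives, negatives)
specific_score = score(two_door, positives, negatives)
```

The trivial hypothesis covers every pseudo-negative and is penalised accordingly, so a more specific definition can now win.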

    Background Knowledge

    • Only very simple math. relations, such as identity and “greater than” used so far: equal(X,X). greater(X,Y) :- X > Y.

    • These can also be easily hard-wired in the concept language of propositional learners.

    • ILP’s big advantage: one can extend the concept language with user-defined concepts or background knowledge.

    Background Knowledge (2)

    • The use of certain BK predicates may be a necessary condition for learning the right hypothesis.

    • Redundant or irrelevant BK slows down the learning.

    • Example

      BK: prod(Miles,Price,Threshold) :- Miles * Price < Threshold.

      Modes:
      modeh(1,gbc(#model,+miles,+price))?
      modeb(1,prod(+miles,+price,#threshold))?

      Th: gbc(z3,Miles,Price) :- prod(Miles,Price,250000001).

    Choice of Background Knowledge

    In an ideal world one should start from a complete model of the background knowledge of the target population. In practice, even with the most intensive anthropological studies, such a model is impossible to achieve. We do not even know what it is that we know ourselves. The best that can be achieved is a study of the directly relevant background knowledge, though it is only when a solution is identified that one can know what is or is not relevant.

    The Critical Villager, Eric Dudley

    ILP Preference Bias

    • Typically a trade-off between generality and complexity:

      • cover as many positive examples (and as few negative ones) as you can…

      • …with as simple a theory as possible

    • Some ILP learners allow the users to specify their own preference bias.

    Induction in ILP

    • Bottom-up (least general generalisation)

      • Map a term into a variable

      • Drop a literal from the clause body

    • Top-down (refinement operator)

      • Instantiate a variable

      • Add a literal to the clause body

    • Mixed techniques (e.g., Progol)
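The bottom-up step of mapping terms into variables can be sketched as anti-unification of two atoms. In the Python illustration below, compound terms are encoded as tuples whose first element is the functor, constants are strings, and generated variables are named V0, V1, …; this encoding is an assumption made for the sketch, and clause-level generalisation (dropping literals) is not covered.

```python
def lgg(t1, t2, table=None):
    """Least general generalisation (anti-unification) of two terms."""
    if table is None:
        table = {}
    if t1 == t2:
        return t1
    same_shape = (
        isinstance(t1, tuple) and isinstance(t2, tuple)
        and t1[0] == t2[0] and len(t1) == len(t2)
    )
    if same_shape:
        # Same functor and arity: generalise argument by argument.
        args = tuple(lgg(a, b, table) for a, b in zip(t1[1:], t2[1:]))
        return (t1[0],) + args
    # Different terms: replace the pair by a variable, reusing it for repeated pairs.
    if (t1, t2) not in table:
        table[(t1, t2)] = f"V{len(table)}"
    return table[(t1, t2)]

general = lgg(("p", "b", "a"), ("p", "f", "g"))
shared = lgg(("p", "a", "a"), ("p", "b", "b"))
```

Reusing one variable for a repeated pair of terms is what keeps the result least general: p(a,a) and p(b,b) generalise to p(V0,V0) rather than p(V0,V1).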

    Example of Induction



    Training examples:

    p(b,a).
    p(f,g).
    :- p(i,j).


    p(b,a) :- q(b).

    p(X,Y) :- q(X).


    Induction in Progol

    • For each training example

      • Find the most general theory (clause) T

      • Find the most specific theory (clause) ⊥

      • Search the space in between in a top-down fashion:

    T = p(X,Y)

    ⊥ = p(X,a) :- q(X).

    p(X,Y) :- q(X)


    Summary of ILP Basics

    • Symbolic

    • Eager

    • Knowledge-oriented (white-box) learner

    • Complex, flexible hypothesis space

    • Based on Induction

    Learning Pure Logic Programs vs. Decision Lists

    • Pure logic programs: the order of clauses is irrelevant, and they must not contradict each other.

    • Decision lists: the concept language includes the predicate cut (!).

    • The use of decision lists can make for simpler (more concise) theories.

    Decision List Example





    Updating Decision Lists with Exceptions

    action(Cat,caesar,run):- !.




    Updating Decision Lists with Exceptions

    • Could be very beneficial in agents when immediate updating of the agent’s knowledge is important: just add the exception at the top of the list.

    • Computationally inexpensive – does not need to modify the rest of the list.

    • Exceptions could be compiled into rules when agent is inactive.
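The update pattern can be mimicked outside Prolog with an ordered rule list in which the first matching rule wins, playing the role of the cut. The Python sketch below is only an analogy: the class, the default action and the observation format are hypothetical, while the dog name follows the clause shown earlier.

```python
class DecisionList:
    """Ordered rules; the first rule whose condition holds decides (like a cut)."""

    def __init__(self, default):
        self.rules = []          # list of (condition, action), checked in order
        self.default = default

    def add_exception(self, condition, action):
        # Prepending is cheap and leaves the rest of the list untouched.
        self.rules.insert(0, (condition, action))

    def decide(self, observation):
        for condition, action in self.rules:
            if condition(observation):
                return action
        return self.default

cat_policy = DecisionList(default="stay")
before = cat_policy.decide({"dog": "caesar"})

# New experience: this particular dog is dangerous, so record an exception.
cat_policy.add_exception(lambda obs: obs["dog"] == "caesar", "run")
after = cat_policy.decide({"dog": "caesar"})
other = cat_policy.decide({"dog": "fido"})
```

The agent's behaviour towards the one dog changes immediately, and the accumulated exceptions could later be generalised into rules when the agent is idle.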

    Replacing Exceptions with Rules: Before

    action(Cat,caesar,run):- !.

    action(Cat,rex,run):- !.

    action(Cat,rusty,run):- !.


    Replacing Exceptions with Rules: After





    Eager ILP vs. Analogical Prediction

    • Eager Learning: learn theory, dispose of observations.

    • Lazy Learning:

      • keep all observations

      • compare new with old ones to classify

      • no explanation provided.

    • Analogical Prediction (Muggleton, Bain ‘98)

      • Combines the often higher accuracy of lazy learning with an intelligible, explicit hypothesis typical for ILP

      • Constructs a local theory for each new observation that is consistent with the largest number of training examples.

    Analogical Prediction Example









    Analogical Prediction Example










    Timing Analysis of Theories Learned with ILP

    • The more training examples, the more accurate the theory…

    • …but how long does it take to produce an answer?

    • No theoretical work on the subject so far

    • Experiment shows nontrivial behaviour (reminiscent of the phase transitions observed in SAT learning).

    Timing Analysis of ILP Theories: Example

    • Kazakov, PhD Thesis:

    • left: simple theory with low coverage; succeeds or quickly fails → high speed

    • middle: medium coverage, fragmentary theory, lots of backtracking → low speed

    • right: general theory with high coverage; less backtracking → high speed

    Agent Applications of ILP

    Relational Reinforcement Learning (Džeroski, De Raedt, Driessens)

    • combines reinforcement learning with ILP

    • generalises over previous experience and goals (Q-table) to produce logical decision trees

    • results can be used to address new situations

    • Don’t miss the next talk (~11:40–13:10h)!

    Agent Applications of ILP

    ILP for Verification and Validation of MAS (Jacob, Driessens, De Raedt)

    • Also uses FOPL decision trees

    • Observes agents’ behaviour and represents it as a logical decision tree

    • The rules in the decision tree can be compared with the designers’ intentions

    • Test domain: RoboCup

    Agent Applications of ILP

    Reid & Ryan 2000:

    • ILP used to help hierarchical reinforcement learning

    • ILP constructs high-level features that help discriminate between (state,action) transitions with non-deterministic behaviour

    Agent Applications of ILP

    Matsui et al. 2000:

    • Proposed an ILP agent that avoids actions which will probably fail to achieve the goal.

    • Application domain: RoboCup

    Alonso & Kudenko ‘99:

    • ILP and EBL for conflict simulations.

    The York MA Environment

    • Species of 2D agents competing for renewable, limited resources.

    • Agents have simple hard-coded behaviour based on the notion of drives.

    • Each agent can optionally have an ILP (Progol) mind – a separate process receiving observations and suggesting actions.

    • Allows the values of inherited features to be selected through natural selection.

    The York MA Environment

    • ILP hasn’t been used in experiments yet (to come soon).

    • A number of experiments using inheritance studied Kinship-driven Altruism among Agents.

    • The start-up project was sponsored by Microsoft.

    • Undergraduate students involved so far: Lee Mallabone, Steve Routledge, John Barton.

    Learning and Natural Selection

    • In learning, search is trivial, choosing the right bias is hard.

    • But the choice of learning bias is always external to the learner!

    • To find the best-suited bias, one could combine arbitrary choices of bias with evolution and natural selection of the fittest individuals.

    Darwinian vs. Lamarckian Evolution

    • Darwinian evolution: nothing learned by the individual is encoded in the genes and passed on to the offspring.

    • The Baldwin effect: learning abilities (good biases) are selected in evolution because they give the individual a better chance in a dynamic environment.

    • What is passed on to the offspring is useful, but very general.

    Darwinian vs. Lamarckian Evolution (2)

    • Lamarckian Evolution: individual experience acquired in life can be inherited.

    • Not the case in nature.

    • Doesn’t mean we can’t use it.

    • The inherited concepts may be too specific and not of general importance.

    Learning and Language

    • Language uses concepts which are

      • specific enough to be useful to most/all speakers of that language

      • general enough to correspond to shared experience (otherwise, how would one know what the other is talking about!)

    • The concepts of a language serve as a learning bias which is “inherited” not in genes but through education.

    Communication and Learning

    • Language

      • helps one learn (in addition to inherited biases)

      • allows one to communicate knowledge.

    • Distinguish between

      • Knowledge: things that one can explain to another by means of a language.

      • Skills: the rest, require individual learning, cannot be communicated.

        If watching was enough to learn, the dog would have become a butcher. Bulgarian proverb.

    Communication and Learning (2)

    • In NLP, forgetting [examples] may be harmful (van den Bosch et al.)

    • An expert is someone who does not think anymore – he knows. Frank Lloyd Wright.

    • It may be difficult to communicate what one has learned because of

      • Limited bandwidth (for lazy learning)

      • The absence of appropriate concepts in the language (for black-box learning)

    Communication and Learning (3)

    In a society of communicating agents, less accurate white-box learning may be better than more accurate but expensive learning that cannot be communicated, since the reduced performance could be outweighed by the much lower cost of learning.

    Our Current Research

    • Inductive Bias Selection (Shane Greenaway)

    • Role Learning (Spiros Kapetanakis)

    • Inductive Learning for Games (Alex Champandard)

    • Machine Learning of Natural Language in MAS (Mark Bartlett)

    The End
