Multi-Agent Strategic Modeling in a Robotic Soccer Domain
Talk Outline
• Overview of the Problem
• Multi-Agent Strategy Discovering Algorithm
• Results on the RoboCup Domain
• Results on the 3vs2 Keepaway Domain*
*not in the paper (latest results)!
Schema of the Multi-Agent Strategy Discovering Algorithm (MASDA)
Input: Basic domain knowledge (e.g. basic soccer and RoboCup domain knowledge)
Input: Multi-agent action sequence (e.g. a RoboCup game)
Output: Strategic concepts (e.g. describing a specific RoboCup game)
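To make the schema concrete, here is a minimal sketch of the MASDA interface in Python. The type and function names (DomainKnowledge, ActionSequence, discover_strategies) are illustrative placeholders, not identifiers from the paper or its implementation.

```python
from dataclasses import dataclass, field

@dataclass
class DomainKnowledge:
    roles: list = field(default_factory=list)    # e.g. "left-forward"
    actions: list = field(default_factory=list)  # e.g. "control dribble"
    areas: list = field(default_factory=list)    # e.g. "penalty box"

@dataclass
class ActionSequence:
    events: list = field(default_factory=list)   # one observed trace, e.g. a logged RoboCup game

def discover_strategies(knowledge: DomainKnowledge, trace: ActionSequence) -> list:
    """Return human-readable strategic concepts describing the trace."""
    # steps I-III (preprocessing, graphical description, rule induction) go here
    return []
```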
Goal: Human description of a strategic action concept
• left forward player dribbles from the left half of the middle third into the penalty box
• left forward makes a pass into the penalty box
• center forward in the center of the penalty box successfully shoots into the right part of the goal
Multi-Agent Strategy Discovering Algorithm (MASDA)
Pipeline (with increasing abstraction): I.1 → I.2, I.3 → II.1 → II.2 → II.3 → III.1, III.2, III.3
Step I. Data preprocessing: I.1. Detection of actions in raw data
Step I. Data preprocessing: I.2. Action sequence generation
Step I. Data preprocessing: I.3. Introduction of domain knowledge
Step II: Graphical description: II.1. Action graph creation
Example graph nodes: L-MF:attack support, L-MF:creating space, L-MF:dribble, C-MF:creating space, C-MF:pass to player, C-MF:dribble
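As a rough illustration of step II.1, the sketch below builds a weighted action graph from a sequence of (role, action) events; the node labels match the ones shown on the slide, but the data layout and function name are assumptions.

```python
from collections import defaultdict

def build_action_graph(action_sequence):
    """action_sequence: list of (role, action) pairs in the order they occurred."""
    graph = defaultdict(lambda: defaultdict(int))
    labels = [f"{role}:{action}" for role, action in action_sequence]
    for src, dst in zip(labels, labels[1:]):
        graph[src][dst] += 1          # directed edge src -> dst, weight = frequency
    return graph

# example with node labels from the slide
g = build_action_graph([("L-MF", "creating space"),
                        ("C-MF", "pass to player"),
                        ("C-MF", "dribble")])
```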
Step II: Graphical description: II.2. Abstraction process (abstraction level axis: 0-16)
Step II: Graphical description: II.3. Strategy selection (abstraction level axis: 0-16)
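The slide does not spell out the selection criterion, so the sketch below assumes a simple frequency threshold: edges of the abstracted action graph that occur often enough are kept as strategy edges. Treat this as one plausible reading, not the actual MASDA rule.

```python
def select_strategy_edges(graph, min_support=3):
    """Keep edges of the (abstracted) action graph with weight >= min_support."""
    return [(src, dst, weight)
            for src, targets in graph.items()
            for dst, weight in targets.items()
            if weight >= min_support]
```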
Step III: Symbolic description learning: III.1. Generation of action descriptions
Example action descriptions: LTeam.C-MF: Successful shoot; LTeam.MF: Pass to player; LTeam.R-FW: Pass to space; LTeam.R-FW: Long dribble
Step III: Symbolic description learning: III.2. Generation of learning examples
Step III: Symbolic description learning: III.3. Rule induction
• Each edge in a strategy represents one class.
• 2-class learning problem:
  • positive examples: action instances for a given edge
  • negative examples: all other action instances
• Induce rules for the positive class (i.e. the edge)
• Repeat for all edges in a strategy
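The per-edge learning setup above can be sketched as a one-vs-rest loop. MASDA uses a symbolic rule learner; in this sketch a scikit-learn decision tree merely stands in for it, so the loop structure, not the learner, is the point.

```python
from sklearn.tree import DecisionTreeClassifier

def induce_edge_rules(examples, example_edges, strategy_edges):
    """examples: attribute vectors; example_edges: the strategy edge each example belongs to."""
    rules = {}
    for edge in strategy_edges:
        labels = [1 if e == edge else 0 for e in example_edges]   # positive vs. all other instances
        rules[edge] = DecisionTreeClassifier(max_depth=4).fit(examples, labels)
    return rules
```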
Testing on the RoboCup Simulated League Domain
• Input:
  • 10 RoboCup games: a fixed team vs. various opponent teams
  • Basic soccer knowledge (no knowledge of strategy, tactics, or the rules of the game):
    • soccer roles (e.g. left-forward)
    • soccer actions (e.g. control dribble)
    • relations between players (e.g. behind)
    • playing-field areas (e.g. penalty box)
• Output:
  • strategic concepts (shown on the next slide)
http://www.robocup.org/
RoboCup Domain: an example strategic concept
• LTeam.FW:Long dribble: RTeam.C-MF:Moving-away-slow, RTeam.L-FB:Still, RTeam.R-FB:Short-distance
• LTeam.FW:Pass to player: RTeam.R-FB:Immediate
• LTeam.FW:Successful shoot: RTeam.C-FW:Moving-away, LTeam.R-FW:Short-distance
• LTeam.FW:Successful shoot (end): RTeam.RC-FB:Left, RTeam.RC-FB:Moving-away-fast, RTeam.R-FB:Long-distance
RoboCup Domain: testing methodology
• Create a reference strategic concept on 10 RoboCup games
• Leave-one-out cross validation to generate 10 learning tasks (learn: 9 games, test: 1 game)
  • positive examples: examples matching the reference strategic concept
  • negative examples: all other examples
• Generate strategic concepts on the 9 learning games and test on the remaining game
• Measure accuracy, recall and precision for a given strategy using:
  • only the action description
  • only the generated rules
  • both
• Varying level of abstraction: 1-20
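A sketch of this leave-one-out loop, assuming each game is available as labelled examples (positive = matches the reference concept) and that learn_concepts and match are supplied by MASDA; both names are placeholders.

```python
def leave_one_out_scores(games, learn_concepts, match):
    scores = []
    for i, test_game in enumerate(games):
        train_games = [g for j, g in enumerate(games) if j != i]
        concept = learn_concepts(train_games)          # strategic concept from the 9 learning games
        tp = fp = fn = tn = 0
        for example, is_positive in test_game:
            predicted = match(concept, example)
            if predicted and is_positive:       tp += 1
            elif predicted and not is_positive: fp += 1
            elif is_positive:                   fn += 1
            else:                               tn += 1
        accuracy  = (tp + tn) / max(tp + fp + fn + tn, 1)
        precision = tp / max(tp + fp, 1)
        recall    = tp / max(tp + fn, 1)
        scores.append((accuracy, precision, recall))
    return scores
```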
3vs2 Keepaway Domain
• Motivation:
  • RoboCup is too complex to play with learned concepts
  • In the 3vs2 Keepaway domain we are able to play with learned concepts
• Basic domain info: 5 agents, 3 high-level agent actions, 13 state variables
http://www.cs.utexas.edu/~AustinVilla/sim/keepaway/ (Peter Stone et al.)
3vs2 Keepaway Domain
• Measure average episode duration
• Two handcoded reference strategies:
  • good strategy: hand (14 s) - hold the ball until the nearest opponent is within 5 m, then pass to the most open player
  • random: rand (5.2 s) - randomly choose among the possible actions
• Our task: learn rules for the reference strategies and play as similarly as possible
• MASDA remains identical
• Only the domain knowledge is modified:
  • roles (K1, K2, K3, T1, T2)
  • actions (hold, passK2, passK3)
  • 13 domain attributes
Testing Methodology
Reference game with a known strategy → MASDA (rule induction) → rules are handcoded into the program → game with a learned strategy
Compute the average episode duration of the reference game and of the learned game, then compare the two durations
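The comparison step reduces to averaging episode durations on both sides; a minimal sketch, assuming episode durations (in seconds) have already been extracted from the keepaway logs:

```python
def average_duration(episode_durations):
    return sum(episode_durations) / len(episode_durations)

def compare_strategies(reference_durations, learned_durations):
    ref, learned = average_duration(reference_durations), average_duration(learned_durations)
    print(f"reference: {ref:.1f} s, learned: {learned:.1f} s, difference: {abs(ref - learned):.1f} s")
```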
Visual comparison of reference and learned games
• reference game: handcoded (hand.avi)
• reference game: random (rand.avi)
• learned random (rand-pass4.avi)
• learned handcoded (hand-holdpass2.avi)
Comparison of handcoded strategy and learned rules
Handcoded strategy:
• if dist(K1, T1) > 5 m => hold
• if dist(K1, T1) <= 5 m and player K2 is not free => pass to K3
• if dist(K1, T1) <= 5 m and player K2 is free => pass to K2
Learned rules:
• DistK1T1 ∈ [6, 16) ∧ DistK1T2 ∈ [6, 16) ∧ DistK1C ∈ [6, 12) ∧ MinAngK3K1T1T2 ∈ [0, 90) => hold
• DistK1T1 ∈ [6, 12) ∧ DistK1T2 ∈ [6, 16) ∧ DistK1K3 ∈ [10, 14) ∧ DistK1K2 ∈ [8, 14) => hold
• MinDistK2T1T2 ∈ [12, 16) ∧ DistK3C ∈ [8, 16) ∧ DistK1T2 ∈ [2, 10) ∧ DistK1T1 ∈ [0, 6) ∧ MinAngK2K1T1T2 ∈ [15, 135) => pass to K2
• DistK1T1 ∈ [2, 6) ∧ MinDistK3T1T2 ∈ [10, 16) ∧ DistK1K2 ∈ [10, 16) ∧ DistK2C ∈ [4, 14) ∧ DistK1T2 ∈ [2, 8) ∧ MinAngK2K1T1T2 ∈ [0, 15) => pass to K3
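For reference, a sketch of the handcoded policy above as it might be coded against the state variables named in the learned rules; the state dictionary keys reuse the attribute names from the rules, while handcoded_policy and the is_free openness test are illustrative inventions, not part of the keepaway code base.

```python
def handcoded_policy(state):
    """state: dict of keepaway state variables, e.g. {"DistK1T1": 7.3, "MinAngK2K1T1T2": 42.0, ...}"""
    if state["DistK1T1"] > 5.0:     # nearest taker still far away: keep holding the ball
        return "hold"
    if is_free(state, "K2"):        # otherwise pass to the most open teammate
        return "passK2"
    return "passK3"

def is_free(state, keeper):
    # hypothetical openness test: the smallest passing angle past either taker is wide enough
    return state[f"MinAng{keeper}K1T1T2"] >= 15.0
```

Note how this mirrors the learned rules above, where a wide MinAngK2K1T1T2 leads to a pass to K2 and a narrow one to a pass to K3.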
Conclusion
• We have designed a domain-independent strategy learning algorithm (MASDA), which learns from an action trace and basic domain knowledge
• Successfully applied to:
  • the RoboCup domain, evaluated by a human expert and by cross validation
  • the 3vs2 Keepaway domain, evaluated by comparison with two reference strategies through episode duration, visual comparison and rule inspection
Questions http://dis.ijs.si/andraz/logalyzer/
RoboCup Domain: successful attack strategies
• L-FW:long dribble → L-FW:pass → FW:shoot
• L-FW:pass to player → FW:dribble → FW:shoot
• C-FW:long dribble → C-FW:pass → FW:dribble → FW:shoot
• R-FW:pass to player → FW:control dribble → FW:shoot
• R-FW:dribble → R-FW:pass to player → FW:shoot
• FW:pass to player → L-FW:control dribble → L-FW:shoot