Multi-Agent Strategic Modeling in a Robotic Soccer Domain
Talk Outline
• Overview of the Problem
• Multi-Agent Strategy Discovering Algorithm
• Results on the RoboCup Domain
• Results on the 3vs2 Keepaway Domain*
*not in the paper (latest results)!
Schema of the Multi-Agent Strategy Discovering Algorithm (MASDA)
Input: Basic domain knowledge (e.g. basic soccer and RoboCup domain knowledge)
Input: Multi-agent action sequence (e.g. a RoboCup game)
Output: Strategic concepts (e.g. describing a specific RoboCup game)
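To make the schema concrete, here is a minimal sketch of the MASDA interface in Python. The type and function names (DomainKnowledge, ActionSequence, discover_strategies) are illustrative placeholders, not identifiers from the paper or its implementation.

```python
from dataclasses import dataclass, field

@dataclass
class DomainKnowledge:
    roles: list = field(default_factory=list)    # e.g. "left-forward"
    actions: list = field(default_factory=list)  # e.g. "control dribble"
    areas: list = field(default_factory=list)    # e.g. "penalty box"

@dataclass
class ActionSequence:
    events: list = field(default_factory=list)   # one observed trace, e.g. a logged RoboCup game

def discover_strategies(knowledge: DomainKnowledge, trace: ActionSequence) -> list:
    """Return human-readable strategic concepts describing the trace."""
    # steps I-III (preprocessing, graphical description, rule induction) go here
    return []
```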
Goal: Human description of a strategic action concept
• left forward player dribbles from the left half of the middle third into the penalty box
• left forward makes a pass into the penalty box
• center forward in the center of the penalty box successfully shoots into the right part of the goal
Multi-Agent Strategy Discovering Algorithm (MASDA)
Pipeline (with increasing abstraction): I.1 → I.2, I.3 → II.1 → II.2 → II.3 → III.1, III.2, III.3
Step I. Data preprocessing: I.1. Detection of actions in raw data
Step I. Data preprocessing: I.2. Action sequence generation
Step I. Data preprocessing: I.3. Introduction of domain knowledge
Step II: Graphical description: II.1. Action graph creation
Example graph nodes: L-MF:attack support, L-MF:creating space, L-MF:dribble, C-MF:creating space, C-MF:pass to player, C-MF:dribble
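As a rough illustration of step II.1, the sketch below builds a weighted action graph from a sequence of (role, action) events; the node labels match the ones shown on the slide, but the data layout and function name are assumptions.

```python
from collections import defaultdict

def build_action_graph(action_sequence):
    """action_sequence: list of (role, action) pairs in the order they occurred."""
    graph = defaultdict(lambda: defaultdict(int))
    labels = [f"{role}:{action}" for role, action in action_sequence]
    for src, dst in zip(labels, labels[1:]):
        graph[src][dst] += 1          # directed edge src -> dst, weight = frequency
    return graph

# example with node labels from the slide
g = build_action_graph([("L-MF", "creating space"),
                        ("C-MF", "pass to player"),
                        ("C-MF", "dribble")])
```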
Step II: Graphical description: II.2. Abstraction process (abstraction level axis: 0-16)
Step II: Graphical description: II.3. Strategy selection (abstraction level axis: 0-16)
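The slide does not spell out the selection criterion, so the sketch below assumes a simple frequency threshold: edges of the abstracted action graph that occur often enough are kept as strategy edges. Treat this as one plausible reading, not the actual MASDA rule.

```python
def select_strategy_edges(graph, min_support=3):
    """Keep edges of the (abstracted) action graph with weight >= min_support."""
    return [(src, dst, weight)
            for src, targets in graph.items()
            for dst, weight in targets.items()
            if weight >= min_support]
```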
Step III: Symbolic description learning: III.1. Generation of action descriptions
Example action descriptions: LTeam.C-MF: Successful shoot; LTeam.MF: Pass to player; LTeam.R-FW: Pass to space; LTeam.R-FW: Long dribble
Step III: Symbolic description learning: III.2. Generation of learning examples
Step III: Symbolic description learning: III.3. Rule induction
• Each edge in a strategy represents one class.
• 2-class learning problem:
  • positive examples: action instances for a given edge
  • negative examples: all other action instances
• Induce rules for the positive class (i.e. the edge)
• Repeat for all edges in a strategy
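The per-edge learning setup above can be sketched as a one-vs-rest loop. MASDA uses a symbolic rule learner; in this sketch a scikit-learn decision tree merely stands in for it, so the loop structure, not the learner, is the point.

```python
from sklearn.tree import DecisionTreeClassifier

def induce_edge_rules(examples, example_edges, strategy_edges):
    """examples: attribute vectors; example_edges: the strategy edge each example belongs to."""
    rules = {}
    for edge in strategy_edges:
        labels = [1 if e == edge else 0 for e in example_edges]   # positive vs. all other instances
        rules[edge] = DecisionTreeClassifier(max_depth=4).fit(examples, labels)
    return rules
```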
Testing on the RoboCup Simulated League Domain
• Input:
  • 10 RoboCup games: a fixed team vs. various opponent teams
  • Basic soccer knowledge (no knowledge of strategy, tactics, or the rules of the game):
    • soccer roles (e.g. left-forward)
    • soccer actions (e.g. control dribble)
    • relations between players (e.g. behind)
    • playing-field areas (e.g. penalty box)
• Output:
  • strategic concepts (shown on the next slide)
http://www.robocup.org/
RoboCup Domain: an example strategic concept
• LTeam.FW:Long dribble: RTeam.C-MF:Moving-away-slow, RTeam.L-FB:Still, RTeam.R-FB:Short-distance
• LTeam.FW:Pass to player: RTeam.R-FB:Immediate
• LTeam.FW:Successful shoot: RTeam.C-FW:Moving-away, LTeam.R-FW:Short-distance
• LTeam.FW:Successful shoot (end): RTeam.RC-FB:Left, RTeam.RC-FB:Moving-away-fast, RTeam.R-FB:Long-distance
RoboCup Domain: testing methodology
• Create a reference strategic concept on 10 RoboCup games
• Leave-one-out cross validation to generate 10 learning tasks (learn: 9 games, test: 1 game)
  • positive examples: examples matching the reference strategic concept
  • negative examples: all other examples
• Generate strategic concepts on the 9 learning games and test on the remaining game
• Measure accuracy, recall and precision for a given strategy using:
  • only the action description
  • only the generated rules
  • both
• Varying level of abstraction: 1-20
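A sketch of this leave-one-out loop, assuming each game is available as labelled examples (positive = matches the reference concept) and that learn_concepts and match are supplied by MASDA; both names are placeholders.

```python
def leave_one_out_scores(games, learn_concepts, match):
    scores = []
    for i, test_game in enumerate(games):
        train_games = [g for j, g in enumerate(games) if j != i]
        concept = learn_concepts(train_games)          # strategic concept from the 9 learning games
        tp = fp = fn = tn = 0
        for example, is_positive in test_game:
            predicted = match(concept, example)
            if predicted and is_positive:       tp += 1
            elif predicted and not is_positive: fp += 1
            elif is_positive:                   fn += 1
            else:                               tn += 1
        accuracy  = (tp + tn) / max(tp + fp + fn + tn, 1)
        precision = tp / max(tp + fp, 1)
        recall    = tp / max(tp + fn, 1)
        scores.append((accuracy, precision, recall))
    return scores
```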
3vs2 Keepaway Domain
• Motivation:
  • RoboCup is too complex to play with learned concepts
  • In the 3vs2 Keepaway domain we are able to play with learned concepts
• Basic domain info: 5 agents, 3 high-level agent actions, 13 state variables
http://www.cs.utexas.edu/~AustinVilla/sim/keepaway/ (Peter Stone et al.)
3vs2 Keepaway Domain
• Measure average episode duration
• Two handcoded reference strategies:
  • good strategy: hand (14 s) - hold the ball until the nearest opponent is within 5 m, then pass to the most open player
  • random: rand (5.2 s) - randomly choose among the possible actions
• Our task: learn rules for the reference strategies and play as similarly as possible
• MASDA remains identical
• Only the domain knowledge is modified:
  • roles (K1, K2, K3, T1, T2)
  • actions (hold, passK2, passK3)
  • 13 domain attributes
Testing Methodology
Reference game with a known strategy → MASDA (rule induction) → rules are handcoded into the program → game with a learned strategy
Compute the average episode duration of the reference game and of the learned game, then compare the two durations
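The comparison step reduces to averaging episode durations on both sides; a minimal sketch, assuming episode durations (in seconds) have already been extracted from the keepaway logs:

```python
def average_duration(episode_durations):
    return sum(episode_durations) / len(episode_durations)

def compare_strategies(reference_durations, learned_durations):
    ref, learned = average_duration(reference_durations), average_duration(learned_durations)
    print(f"reference: {ref:.1f} s, learned: {learned:.1f} s, difference: {abs(ref - learned):.1f} s")
```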
Visual comparison of reference and learned games
• reference game: handcoded (hand.avi)
• reference game: random (rand.avi)
• learned random (rand-pass4.avi)
• learned handcoded (hand-holdpass2.avi)
Comparison of handcoded strategy and learned rules
Handcoded strategy:
• if dist(K1, T1) > 5 m => hold
• if dist(K1, T1) <= 5 m and player K2 is not free => pass to K3
• if dist(K1, T1) <= 5 m and player K2 is free => pass to K2
Learned rules:
• DistK1T1 ∈ [6, 16) ∧ DistK1T2 ∈ [6, 16) ∧ DistK1C ∈ [6, 12) ∧ MinAngK3K1T1T2 ∈ [0, 90) => hold
• DistK1T1 ∈ [6, 12) ∧ DistK1T2 ∈ [6, 16) ∧ DistK1K3 ∈ [10, 14) ∧ DistK1K2 ∈ [8, 14) => hold
• MinDistK2T1T2 ∈ [12, 16) ∧ DistK3C ∈ [8, 16) ∧ DistK1T2 ∈ [2, 10) ∧ DistK1T1 ∈ [0, 6) ∧ MinAngK2K1T1T2 ∈ [15, 135) => pass to K2
• DistK1T1 ∈ [2, 6) ∧ MinDistK3T1T2 ∈ [10, 16) ∧ DistK1K2 ∈ [10, 16) ∧ DistK2C ∈ [4, 14) ∧ DistK1T2 ∈ [2, 8) ∧ MinAngK2K1T1T2 ∈ [0, 15) => pass to K3
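For reference, a sketch of the handcoded policy above as it might be coded against the state variables named in the learned rules; the state dictionary keys reuse the attribute names from the rules, while handcoded_policy and the is_free openness test are illustrative inventions, not part of the keepaway code base.

```python
def handcoded_policy(state):
    """state: dict of keepaway state variables, e.g. {"DistK1T1": 7.3, "MinAngK2K1T1T2": 42.0, ...}"""
    if state["DistK1T1"] > 5.0:     # nearest taker still far away: keep holding the ball
        return "hold"
    if is_free(state, "K2"):        # otherwise pass to the most open teammate
        return "passK2"
    return "passK3"

def is_free(state, keeper):
    # hypothetical openness test: the smallest passing angle past either taker is wide enough
    return state[f"MinAng{keeper}K1T1T2"] >= 15.0
```

Note how this mirrors the learned rules above, where a wide MinAngK2K1T1T2 leads to a pass to K2 and a narrow one to a pass to K3.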
Conclusion
• We have designed a domain-independent strategy learning algorithm (MASDA), which learns from an action trace and basic domain knowledge
• Successfully applied to:
  • the RoboCup domain, evaluated by a human expert and by cross validation
  • the 3vs2 Keepaway domain, evaluated by comparison with two reference strategies through episode duration, visual comparison and rule inspection
Questions http://dis.ijs.si/andraz/logalyzer/
RoboCup Domain: successful attack strategies
• L-FW:long dribble → L-FW:pass → FW:shoot
• L-FW:pass to player → FW:dribble → FW:shoot
• C-FW:long dribble → C-FW:pass → FW:dribble → FW:shoot
• R-FW:pass to player → FW:control dribble → FW:shoot
• R-FW:dribble → R-FW:pass to player → FW:shoot
• FW:pass to player → L-FW:control dribble → L-FW:shoot