
Conflicts about Teamwork: Hybrids to the Rescue



Presentation Transcript


  1. Conflicts about Teamwork: Hybrids to the Rescue with Emma Bowring, Hyuckchul Jung, Gal Kaminka, Rajiv Maheswaran, Janusz Marecki, Jay Modi, Ranjit Nair, Steven Okamoto, Praveen Paruchuri, Jonathan Pearce, David Pynadath, Nathan Schurr, Pradeep Varakantam, Paul Scerri. TEAMCORE GROUP, teamcore.usc.edu. Milind Tambe, University of Southern California

  2. Long-Term Research Goal • Building heterogeneous, dynamic teams • Types of entities: agents, people, sensors, resources, robots, … • Scale: 1000s or more • Domains: highly uncertain, real-time, dynamic • Example settings: space missions, agent-facilitated human organizations, large-scale disaster rescue

  3. Key Approaches in Multiagent Systems • Distributed Constraint Optimization (DCOP) • Distributed POMDPs • Market mechanisms / auctions • Belief-Desire-Intention (BDI) logics and psychology • Hybrid DCOP / POMDP / auctions / BDI • Essential in large-scale multiagent teams • Synergistic interactions [Figure: joint persistent goal definition (JPG p) ≡ (MB ¬p) ∧ (MG p) ∧ (Until [(MB p) ∨ (MB ¬p)] (WMG p)), alongside a DCOP constraint graph over x1–x4]

  4. Why Hybrid Approaches? The individual techniques (DCOP, distributed POMDPs, BDI, markets) address different concerns (local interactions, uncertainty, human usability & plan structure, local utility); hybrids such as DCOP-POMDP and BDI-POMDP combine complementary strengths.

  5. Hybrid Multiagent Systems: Examples • Teamcore proxies coordinating office agents: Meet Maker (reschedule meetings, decide presenters, order our meals), Interest Matcher (papers), scheduler agent • “More & more computers are ordering food, … we need to think about marketing [to these computers]” (local Subway owner) • Axes: team scale & complexity (small-scale homogeneous → small-scale heterogeneous → large-scale heterogeneous) vs. task complexity (low → medium → high)

  6. Hybrid Teams: Underlying Algorithms • BDI team plan with roles Role_1 … Role_i … Role_n, executed by TEAMCORE PROXIES (Scerri et al 03, Pynadath/Tambe 03) • Proxy algorithms: communication, adjustable autonomy, task/role allocation • Underlying techniques for the task scheduling/allocation and communication algorithms: DCOP, POMDP, games, BDI

  7. Outline: Sampling Recent Results • Task allocation: Agent-human, offline (adjustable autonomy) • Key result: BDI/POMDP hybrid; speedup POMDPs • Domain: • Communication: Multiagent, explicit & implicit • Domain: • Task allocation: Multiagent, off-line • Domain: • Task allocation: Multiagent, on-line • Domain:

  8. Adjustable Autonomy in Teams (Scerri, Pynadath & T, JAIR 02; Chalupsky et al, IAAI 01) • Agents dynamically adjust their level of autonomy • Key question: when to transfer control to humans & when not? • Previous work: one-shot transfer of control to human or agent • Too rigid in teams: e.g., if the human fails to decide quickly, the team miscoordinates (“agents misbehave, humans apologize”) • Solution: MDPs for flexible back-&-forth transfer-of-control policies • e.g. ask-human, delay, ask-human, cancel-team-activity • Address user response uncertainty, costs to the team, … • Exploit hybrids for decomposition into smaller MDPs
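
The back-and-forth transfer-of-control idea can be sketched as a tiny finite-horizon MDP solved by backward induction. This is a hypothetical illustration, not the E-Elves model: the horizon, response probability, rewards, and wait cost below are invented numbers.

```python
# A hypothetical sketch (not the E-Elves model) of a finite-horizon
# transfer-of-control MDP solved by backward induction: at each step the
# agent either decides autonomously now or waits one more step for the
# human. All numbers are invented for illustration.

HORIZON = 4
P_HUMAN_RESPONDS = 0.3   # chance the human decides during a waited step
REWARD_HUMAN = 10.0      # value of a (higher-quality) human decision
REWARD_AGENT = 6.0       # value of an autonomous decision
WAIT_COST = 1.0          # team-coordination cost per step of delay

def solve():
    """Backward induction; returns (value function, policy) over time steps."""
    V = {HORIZON: 0.0}   # nothing can be gained after the deadline
    policy = {}
    for t in range(HORIZON - 1, -1, -1):
        q_act = REWARD_AGENT
        q_wait = (P_HUMAN_RESPONDS * REWARD_HUMAN
                  + (1 - P_HUMAN_RESPONDS) * V[t + 1]) - WAIT_COST
        V[t] = max(q_act, q_wait)
        policy[t] = "act" if q_act >= q_wait else "wait-for-human"
    return V, policy

V, POLICY = solve()
```

With these numbers the policy waits for the human early on and falls back to autonomous action at the last step before the deadline, i.e. a flexible back-and-forth policy rather than a one-shot transfer.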

  9. E-Elves: Hybrids and Adjustable Autonomy • BDI team plans provide structure with roles, e.g. meeting M1; role: enable the user to attend, avoid wasting team time • Proxy algorithms: communication, adjustable autonomy, role allocation • MDPs operate within the BDI structure: adjustable autonomy via MDP transfer-of-control policies (e.g. a Teamcore proxy rescheduling meetings)

  10. Personal Assistants: Observational Uncertainty • Problem: monitor the user over time & decide, e.g., a transfer-of-control policy • Observational uncertainty, e.g. noisy user observations • POMDPs, not MDPs: policy generation extremely slow • Dynamic belief bounds (max belief probability): big speedups • Reduces the region of search for dominant policy vectors [Figure: table of belief probabilities over time and states]
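
The observational uncertainty above is what forces reasoning over beliefs (distributions over user states) rather than known states. A minimal sketch of the Bayes belief update a POMDP monitor would run each step; the two-state user model and all probabilities are hypothetical, not taken from the cited work.

```python
# A minimal Bayes belief update over a hypothetical two-state user model
# ("busy"/"free" with noisy "typing"/"idle" observations); none of these
# numbers come from the cited work.

def belief_update(belief, obs, trans, obs_model):
    """b'(s') ∝ O(obs | s') * sum_s T(s' | s) * b(s), then normalize."""
    states = list(belief)
    unnorm = {s2: obs_model[s2][obs] *
                  sum(trans[s][s2] * belief[s] for s in states)
              for s2 in states}
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}

TRANS = {"busy": {"busy": 0.8, "free": 0.2},
         "free": {"busy": 0.3, "free": 0.7}}
OBS_MODEL = {"busy": {"typing": 0.9, "idle": 0.1},
             "free": {"typing": 0.2, "idle": 0.8}}

# Seeing "idle" shifts belief sharply toward "free".
b1 = belief_update({"busy": 0.5, "free": 0.5}, "idle", TRANS, OBS_MODEL)
```

The belief vector this computes is exactly the kind of quantity the dynamic-belief-bounds idea constrains: bounding the maximum reachable belief probability shrinks the region where dominant policy vectors must be searched for.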

  11. Speedups: Dynamic Belief Bounds (DB) (Varakantam, Maheswaran, T AAMAS’05) • Using Lagrangian techniques, solve the following in poly-time:

  12. Outline • Task allocation: Agent-human, offline (adjustable autonomy) • Domain: • Communication: Multiagent, explicit & implicit • Key result: Distributed POMDPs; Analysis of BDI programs • Domain: • Task allocation: Multiagent, off-line • Domain: • Task allocation: Multiagent, on-line • Domain:

  13. Multiagent Tiger Problem • Two agents face doors A and B; each must reason about what the other heard • Per-agent actions: open left, open right, listen, or communicate • Shared reward • Listen has a small cost and is unreliable • Communication has a cost • The world resets on open • What is the best joint policy over horizon T?

  14. COM-MTDP (Pynadath & T, JAIR’02) Communicating Multiagent Team Decision Problem ⟨S, A, P, Ω, O, R, Σ, RΣ, B⟩: • S: set of world states, S = {SL, SR} • A = ×iAi: set of joint actions, Ai = {OpenLeft, OpenRight, Listen} • P: state transition function • Ω: set of joint observations, Ωi = {HL, HR} • O = ⟨Oi⟩: joint observation function • R: joint reward function

  15. COM-MTDP • Σ: communication capabilities • RΣ: communication cost • Bi: belief state (each Bi a history of observations and messages) • Individual domain-action policies πiA : Bi → Ai, e.g. ⟨HL11, HR12⟩ → Listen • Individual communication policies πiΣ : Bi → Σi • Goal: find joint policies πA and πΣ that maximize total expected reward over horizon T
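
The COM-MTDP components can be made concrete for the multiagent tiger problem by enumerating states and joint observations to score one fixed joint policy. The sketch below is under assumed numbers: the observation accuracy, rewards, and costs are hypothetical, not those of the cited papers.

```python
import itertools

# A COM-MTDP-style evaluation for the two-agent tiger problem: enumerate
# states and joint observations to score one fixed joint policy. The
# observation accuracy, rewards, and costs are hypothetical, not the
# numbers from the cited papers.

STATES = ["SL", "SR"]          # tiger behind the left / right door
OBS = ["HL", "HR"]             # an agent hears the tiger on the left / right
P_CORRECT = 0.85               # accuracy of a Listen observation
LISTEN_COST = -2.0             # shared cost of the joint Listen step

def obs_prob(state, obs):
    """O_i: P(hear `obs` | tiger in `state`) after a Listen."""
    correct = (state == "SL") == (obs == "HL")
    return P_CORRECT if correct else 1.0 - P_CORRECT

def joint_reward(state, a1, a2):
    """R: shared reward; opening the tiger's door is bad, miscoordination worse."""
    if a1 != a2:
        return -100.0
    good = "OpenRight" if state == "SL" else "OpenLeft"
    return 20.0 if a1 == good else -50.0

def open_away(obs):
    """A simple individual policy: open the door away from the noise."""
    return "OpenRight" if obs == "HL" else "OpenLeft"

def expected_value():
    """Expected reward of: both Listen once, then each opens away from
    what it alone heard (no communication)."""
    total = LISTEN_COST
    for s, o1, o2 in itertools.product(STATES, OBS, OBS):
        p = 0.5 * obs_prob(s, o1) * obs_prob(s, o2)
        total += p * joint_reward(s, open_away(o1), open_away(o2))
    return total
```

With these numbers the no-communication policy loses most of its value to the cases where the agents hear different things and open different doors, which is precisely the gap that the communication policies πΣ (and their costs RΣ) are meant to close.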

  16. Complexity Results in COM-MTDP • Local optimality • Hybrids Complexity:

  17. JESP: Joint Equilibrium Search in COM-MTDP (Nair et al, IJCAI 03) • Repeat until convergence to a local equilibrium: for each agent K • Fix the policies of all agents except K • Generate K’s optimal response • The optimal response for K, given the fixed policies of the other agents, is transformed into a single-agent POMDP problem: • “Extended” state defined as ⟨world state, other agents’ observation histories⟩, not the world state alone • Define a new transition function • Define a new observation function • Define a multiagent belief state • Dynamic programming over belief states
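
The outer loop of JESP, stripped of the POMDP machinery, is coordinate ascent over policies: fix all agents but K, give K its best response, and repeat until nobody improves. A sketch over tiny enumerable policy spaces, with a hypothetical joint value function standing in for the single-agent POMDP evaluation described above.

```python
# The outer loop of JESP only: coordinate ascent to a local equilibrium
# in joint value. The joint value function below is a hypothetical
# stand-in for the single-agent POMDP evaluation of the best response.

def jesp(policy_spaces, joint_value, init):
    """Alternate best responses until no single agent can improve."""
    joint = list(init)
    improved = True
    while improved:
        improved = False
        for k, space in enumerate(policy_spaces):
            current = joint_value(joint)
            best = max(space,
                       key=lambda p: joint_value(joint[:k] + [p] + joint[k + 1:]))
            if joint_value(joint[:k] + [best] + joint[k + 1:]) > current:
                joint[k] = best
                improved = True
    return joint

# A toy two-agent coordination problem: ("a", "a") is globally best,
# ("b", "b") is a local equilibrium.
SPACES = [["a", "b"], ["a", "b"]]

def value(joint):
    if joint == ["a", "a"]:
        return 10
    if joint == ["b", "b"]:
        return 6
    return 0
```

Starting from the mixed profile ["a", "b"], the loop settles on ["b", "b"]: a local equilibrium that is not the global optimum, which is exactly the "local equilibrium" caveat in the slide.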

  18. JESP and Communication (Nair, T, Yokoo & Roth, AAMAS’04) • Synchronized communication yields a compact belief state; e.g. at t=3 the multiagent belief state is a distribution over (state, joint observation history) pairs: (SL (HL HL) p1), (SL (HL HR) p2), (SL (HR HR) p3), (SL (HR HL) p4), (SR (HL HL) p5), (SR (HL HR) p6), (SR (HR HR) p7), (SR (HR HL) p8) • [Figure: run-time vs. communication frequency]

  19. BDI + COM-MTDP Hybrid I • Why hybridize with BDI? • Ease of use for human developers • Hierarchical plan & organization structure improve scalability • BUT: quantitative team optimization is difficult (given uncertainty/cost) • COM-MTDP: quantitative evaluation of communication heuristics • RULE1 (“joint intentions” [Levesque et al 90]): If (fact F ∈ agent’s private state) AND F matches a goal of the team’s plan AND (F ∉ team state), Then adopt a possible communicative goal CG to communicate F • RULE2 (TEAMCORE: Rule1 + Rule2): If possible communicative goal CG AND (expected miscoordination cost > communication cost), Then communicate CG
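
The two rules can be read as a simple decision procedure. The sketch below uses placeholder data structures (sets of fact strings, scalar costs) that are assumptions for illustration, not the proxies' actual representation.

```python
# A hedged reading of the two communication rules as a decision procedure;
# the data structures (sets of fact strings, scalar costs) are placeholder
# assumptions, not the proxies' actual representation.

def rule1_candidate(fact, private_state, team_goals, team_state):
    """RULE1 (joint intentions): a privately known fact that matches a team
    goal but is not yet mutually believed is a candidate for communication."""
    return (fact in private_state
            and fact in team_goals
            and fact not in team_state)

def rule2_communicate(candidate, miscoordination_cost, communication_cost):
    """RULE2 (TEAMCORE): send a candidate only when the expected cost of
    staying silent exceeds the cost of the message."""
    return candidate and miscoordination_cost > communication_cost
```

RULE1 alone always communicates eligible facts; RULE2 adds the cost comparison, which is what COM-MTDP lets one evaluate quantitatively against the optimal policy.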

  20. BDI + POMDP Hybrid I • COM-MTDP evaluates alternate communication policies: fix the domain-action policy πA, then compare heuristic communication policies πΣ against locally and globally optimal ones • The results feed back into modifying the proxies’ communication algorithms (communication, adjustable autonomy, role allocation)

  21. Compare Communication Policies • Given a domain, for different observability conditions & communication costs: • Evaluate Teamcore (rule1 + rule2), Jennings, and others; compare with the optimal policy • Optimal: O((|S|·|Ω|)^T)

  22. Outline • Task allocation: Agent-human, offline (adjustable autonomy) • Domain: • Communication: Multiagent, explicit & implicit • Domain: • Task allocation: Multiagent, off-line • Key result: Synergistic interaction: BDI+POMDP • Domain: • Task allocation: Multiagent, on-line • Domain:

  23. BDI+POMDP II: Role Allocation • Task: urgently move supplies from X to refugees at Y • Three routes of varying length; scouts make a route safe for transports • Uncertainty in actions and observations: • Scouts may fail along a route (and transports may replace scouts) • A scout’s failure may not be observable to the transports • How many scouts? On which routes?

  24. BDI+POMDP II: Role Allocation RoboCup Rescue

  25. Hybrid BDI+POMDP for Role Allocation (Nair & T JAIR 05) • POMDPs optimize the team plan: role allocation • Team plans constrain the POMDP policy search: significant speedup • The distributed POMDP (MTDP) model evaluates alternate role-taking policies, searches the policy space for the optimal role-taking policy, and feeds a specific role allocation back into the team plans • Proxy algorithms: policy for executing roles, role allocation, adjustable autonomy, communication

  26. BDI-POMDP Hybrids: Advantage II • Belief-based policy evaluation • Not entire observation histories: only the beliefs required by BDI plans • E.g. the histories T=1: ⟨Scout1 okay, Scout2 fail⟩, T=2: ⟨Scout1 fail, Scout2 fail⟩ and T=1: ⟨Scout1 okay, Scout2 okay⟩, T=2: ⟨Scout1 fail, Scout2 fail⟩ both collapse to the same belief at T=2: ⟨CriticalFailure⟩

  27. BDI-POMDP Hybrids: Advantage III Organization hierarchy Plan hierarchy • Exploit BDI team plan structure for more efficient policy search • Humans specify BDI team plans • Best role allocation: How many helos in SctTeam A, B, C & Transport • More efficient policy search exploiting structure

  28. BDI-POMDP Hybrid: Advantage III Hierarchical Policy Groups [Figure: tree of policy groups over the number of helos in the task force, with group values including 1926, 2773, 2926, 3420 and 4167]

  29. BDI-POMDP Hybrid: Advantage III Obtaining an Upper Bound on Policy Group Value [Figure: component decomposition DoScouting [Scout 2; Transport 4] → DoTransport [Transports from previous] → RemainScouts [Scouts from previous], with component values [84], [3300], [36], allocations such as SafeRoute=2 Transport=4, Team-A=2 Team-B=0 Team-C=0, Team-A=1 Team-B=1 Team-C=0, SafeRoute=1 Transport=3, and policy-group value 3420] • Obtain the max for each component over all start states & observation histories • Dependence: the start of the next component is based on the end state of the previous • Why speedup: avoids duplicated start states & observation histories

  30. Helicopter Domain: BDI + POMDP Synergy BDI helps POMDPs POMDPs help BDI

  31. RoboCup Rescue BDI helps POMDPs POMDPs help BDI Distributed POMDP

  32. Outline • Task allocation: Agent-human, offline (adjustable autonomy) • Domain: • Communication: Multiagent, explicit & implicit • Domain: • Task allocation: Multiagent, off-line • Domain: • Task allocation: Multiagent, on-line • Key results: Distributed constraint optimization + distributed POMDPs; distributed constraint optimization + graphical game perspective • Domain:

  33. ADOPT algorithm for DCOPs (Modi, Shen, T, Yokoo AIJ 05) [Figure: constraint graph over x1–x4 with pairwise cost table f(di, dj); one assignment yields cost 0, another cost 7] • ADOPT: the first asynchronous complete algorithm • ADOPT’s asynchrony yields significant speedups
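
The DCOP objective ADOPT optimizes can be stated in a few lines: variables over a small domain, a constraint graph with pairwise costs, and a minimum-total-cost assignment. The graph and cost function below are hypothetical stand-ins for the slide's figure, and the brute-force solver is centralized, for reference only; ADOPT itself solves this distributedly and asynchronously.

```python
import itertools

# The DCOP objective in a few lines: variables, a constraint graph with
# pairwise costs, and a minimum-total-cost assignment. Graph and costs
# are hypothetical stand-ins for the slide's figure; the brute-force
# solver is centralized, unlike ADOPT.

DOMAIN = [0, 1]
VARS = ["x1", "x2", "x3", "x4"]
EDGES = [("x1", "x2"), ("x1", "x3"), ("x2", "x4")]   # constraint graph

def cost(di, dj):
    """Pairwise cost: agreeing neighbors pay 2, disagreeing pay 0."""
    return 2 if di == dj else 0

def total_cost(assign):
    return sum(cost(assign[i], assign[j]) for i, j in EDGES)

def solve():
    """Reference brute force over all |DOMAIN|^|VARS| assignments."""
    best = min((dict(zip(VARS, vals))
                for vals in itertools.product(DOMAIN, repeat=len(VARS))),
               key=total_cost)
    return best, total_cost(best)
```

The brute force is exponential in the number of variables; ADOPT's contribution is reaching the same optimum with asynchronous, distributed search over the constraint graph.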

  34. Speeding up ADOPT via Preprocessing(Maheswaran et al, AAMAS’04; Ali, Koenig, T, AAMAS’05)

  35. Hybrid DCOP-POMDP (Nair, T, Varakantam, Yokoo, AAAI’05) x1 x2 x3 x4 • Add uncertainty in DCOP • Add interaction structure to distributed POMDPs • Networked-distributed-POMDPs • Exploits network interaction locality • Locally interacting distributed JESP • Locally optimal joint policy • Significant speedups over JESP

  36. Hybrid: DCOP and Graphical Games (Pearce, Maheswaran, T PDCS’04) • Incomplete, i.e. locally optimal, algorithms • k-optimal algorithms: • groups of up to k agents maximize local utility • no k-subset can deviate to improve solution quality • Key features: • multiple diverse k-optimal solutions • each of high relative quality • robust against k failures • higher global quality
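
The k-optimality condition can be checked directly: no group of at most k agents may be able to change their values, with everyone else fixed, and raise global quality. A brute-force sketch on a hypothetical two-agent problem (the `quality` function and domains are invented for illustration):

```python
import itertools

# Checking k-optimality directly: an assignment is k-optimal if no group
# of at most k agents can change values (others fixed) and raise global
# quality. The two-agent problem below is a hypothetical illustration.

def is_k_optimal(assign, domains, quality, k):
    names = list(assign)
    base = quality(assign)
    for size in range(1, k + 1):
        for group in itertools.combinations(names, size):
            for vals in itertools.product(*(domains[n] for n in group)):
                trial = dict(assign)
                trial.update(zip(group, vals))
                if quality(trial) > base:
                    return False          # a k-subset can deviate and gain
    return True

DOMAINS = {"x": ["a", "b"], "y": ["a", "b"]}

def quality(a):
    if a["x"] == a["y"] == "a":
        return 10
    if a["x"] == a["y"] == "b":
        return 6
    return 0
```

Here {"x": "b", "y": "b"} is 1-optimal (no single agent gains by deviating alone) but not 2-optimal (both switching to "a" raises quality), matching the slide's point that k-optimality defines local optima of bounded group size.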

  37. Summary • Techniques: Distributed Constraint Optimization (DCOP), distributed POMDPs, game theory, auctions, Belief-Desire-Intention (BDI) logics and folk psychology • Hybrid techniques: first-class citizenship • Synergies: build on each other’s strengths [Figure: joint persistent goal definition (JPG p) ≡ (MB ¬p) ∧ (MG p) ∧ (Until [(MB p) ∨ (MB ¬p)] (WMG p)), and a DCOP constraint graph over x1–x4] • Key future research: this is just the beginning of hybrid techniques • A science of hybrid techniques in multiagent systems • Positive (or negative) interactions among techniques? • How to exploit (or avoid) them? • Exploit hybrids in large-scale multiagent systems

  38. Humans in Agent Worlds • Virtual environments • Training, entertainment, education • Large numbers of agents • Realistic interactions with humans • Los Angeles Fire Dept • Large-scale disaster simulations (Schurr, Marecki, T, Scerri, IAAI’05)

  39. Agents in Human Worlds • World is growing flat & interconnected • Command and control disappearing • Geography is history • Collaborations important • Cross regional, national boundaries • Research on agent teams: • “agents net” infrastructure • Allow rapid virtual organizations

  40. Thank You • Mentors • Collaborators Prof. Lesser Prof. Grosz Prof. Yokoo Prof. Kraus

  41. Thank You! TEAMCORE@USC Spring’05

  42. CONTACT • Milind Tambe • tambe@usc.edu • http://teamcore.usc.edu THANK YOU!

  43. Thank You! TEAMCORE@USC
