Software Multiagent Systems: Lecture 13

Milind Tambe, University of Southern California, tambe@usc.edu

Teamwork: When agents act together

Understanding Teamwork • Ordinary traffic • Driving in a convoy • Two friends A & B together drive in a convoy • B is secretly following A • Pass play in Soccer • Contracting with a software company • Orchestra

Understanding Teamwork • Together • Joint Goal • Co-labor → Collaborate • Not just a union of simultaneous coordinated actions • Different from contracting

Why Teamwork? Why not: Master-Slave? Contracts?

Why Teams • Robust organizations • Responsibility to substitute • Mutual assistance • Information communicated to peers • Still capable of structure (not necessarily flat) • Subteams, subsubteams • Variations in capabilities and limitations

Approach • Theory • Practical teamwork architectures

Taking a step back…

Key Approaches in Multiagent Systems • Distributed Constraint Optimization (DCOP) • Distributed POMDP • Market mechanisms, auctions • Belief-Desire-Intention (BDI): logics and psychology • Hybrid DCOP/POMDP/Auctions/BDI • Essential in large-scale multiagent teams • Synergistic interactions • (JPG Θ p) ⇔ (MB ¬p) ∧ (MG p) ∧ (Until [(MB p) ∨ (MB □¬p)] (WMG p)) • [Figure labels: x1, x2, x3, x4]

Key Approaches for Multiagent Teams • DCOP: local interactions • Distributed POMDPs: uncertainty • BDI: human usability & plan structure • Markets: local utility • BDI-POMDP hybrid

Distributed POMDPs • Three papers on the web pages • What to read: • Ignore all the proofs • Ignore complexity results • JAIR article: the model and the results at the end • Understand fundamental principles

Domain: Teamwork for Disaster Response

Multiagent Team Decision Problem (MTDP) • MTDP: ⟨S, A, P, Ω, O, R⟩ • S: world states s1, s2, s3, … • Single global world state, one per epoch • A: domain-level actions; A = {A1, A2, A3, …, An} • Ai is the set of actions available to agent i • Joint action: one action per agent, ⟨a1, a2, …, an⟩

MTDP • P: transition function • P(s’ | s, a1, a2, …, an) • R: reward function (the domain-action reward, R_A) • R(s, a1, a2, …, an) • One common reward for the team, not a separate reward per agent • Central to teamwork

MTDP (cont’d) • Ω: observations • Each agent has its own finite set of possible observations • Ω1, Ω2, … • O: probability of observation • O(destination state, joint action, joint observation) • P(o1, o2, …, on | a1, a2, …, an, s’)
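A minimal Python sketch of the MTDP tuple defined above, to make the pieces concrete. The class name `MTDP`, its field names, and the two-state toy fire problem (its states, actions, and the +20 team bonus) are assumptions for illustration, not the lecture's actual domain model.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Sequence, Tuple

Joint = Tuple[str, ...]  # one entry per agent


@dataclass
class MTDP:
    states: Sequence[str]                           # S
    actions: Sequence[Sequence[str]]                # A_i for each agent i
    observations: Sequence[Sequence[str]]           # Omega_i for each agent i
    transition: Callable[[str, Joint, str], float]  # P(s' | s, a1..an)
    observe: Callable[[str, Joint, Joint], float]   # O(s', joint action, joint obs)
    reward: Callable[[str, Joint], float]           # R(s, a1..an), shared by the team

    def joint_actions(self) -> List[Joint]:
        return list(product(*self.actions))


def transition(s: str, a: Joint, s2: str) -> float:
    # Toy assumption: the fire goes out only if both agents fight it together.
    if s == "out":
        return 1.0 if s2 == "out" else 0.0
    nxt = "out" if a == ("fight", "fight") else "fire"
    return 1.0 if s2 == nxt else 0.0


def observe(s2: str, a: Joint, o: Joint) -> float:
    # Toy assumption: every agent sees the fire status exactly.
    return 1.0 if all(oi == s2 for oi in o) else 0.0


def reward(s: str, a: Joint) -> float:
    # One common team reward: -0.2 per fight action, +20 for a joint extinguish.
    cost = -0.2 * sum(ai == "fight" for ai in a)
    bonus = 20.0 if s == "fire" and a == ("fight", "fight") else 0.0
    return cost + bonus


toy = MTDP(
    states=["fire", "out"],
    actions=[["fight", "wait"], ["fight", "wait"]],
    observations=[["fire", "out"], ["fire", "out"]],
    transition=transition,
    observe=observe,
    reward=reward,
)
```

Note that `reward` takes the joint action and returns a single number: there is no per-agent reward anywhere in the structure.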

Simple Scenario • Cost of action: -0.2 • Must fight fires together • Observe own location and fire status • [Figure: scenario diagram with labels +20 and +40]

MTDP Policy • The problem: find optimal JOINT policies • One policy for each agent • πi: action policy • Maps belief state into domain actions • (Bi → Ai) for each agent • Belief state: sequence of observations
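One way to picture such a policy, as a sketch: a lookup table from an agent's own observation sequence to one of its actions. The observation and action names and the two example policies below are hypothetical.

```python
from typing import Dict, Sequence, Tuple

History = Tuple[str, ...]   # belief state: the agent's own observations so far
Policy = Dict[History, str]  # maps a belief state to a domain action

# Hypothetical policies covering histories of length 0 and 1.
policy_1: Policy = {(): "move", ("no_fire",): "move", ("fire",): "fight"}
policy_2: Policy = {(): "wait", ("no_fire",): "move", ("fire",): "fight"}


def joint_action(policies: Sequence[Policy],
                 histories: Sequence[History]) -> Tuple[str, ...]:
    # Each agent looks up its action using only its OWN history.
    return tuple(p[h] for p, h in zip(policies, histories))


chosen = joint_action([policy_1, policy_2], [("fire",), ("no_fire",)])
```

The joint policy is just the tuple of per-agent tables; no agent sees another agent's history when it acts.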

MTDP Domain Types • Collectively partially observable: general case, no assumptions • Collectively observable: the team (as a whole) observes the state • For every joint observation ⟨o1, o2, …, on⟩ there is a state s such that, for all other states s’ ≠ s, Pr(o1, o2, …, on | s’) = 0 • Pr(o1, o2, …, on | s) = ? • Pr(s | o1, o2, …, on) = ? • Individually observable: each agent observes the state • For every individual observation oi there is a state s such that, for all other states s’ ≠ s, Pr(oi | s’) = 0
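The two observability conditions above have the same shape, so a single check can be sketched for both. The function name `identifies_state` and the two-bit toy world are assumptions: the state is a pair of bits, agent 1 sees the first bit and agent 2 the second.

```python
from itertools import product


def identifies_state(states, observations, pr):
    """True if every observation has nonzero probability in at most one state.

    pr(o, s) is Pr(o | s), as on the slide.
    """
    return all(sum(pr(o, s) > 0 for s in states) <= 1 for o in observations)


states = list(product([0, 1], repeat=2))     # (bit1, bit2)
joint_obs = list(product([0, 1], repeat=2))  # (o1, o2)


def pr_joint(o, s):
    # Together the agents see both bits.
    return 1.0 if o == s else 0.0


def pr_agent1(o1, s):
    # Agent 1 alone sees only the first bit.
    return 1.0 if o1 == s[0] else 0.0


collectively = identifies_state(states, joint_obs, pr_joint)
individually = identifies_state(states, [0, 1], pr_agent1)
```

In this toy world the team is collectively observable but agent 1 is not individually observable, since its observation 0 is consistent with both (0, 0) and (0, 1).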

From MTDP to COM-MTDP • Two separate kinds of action: communication vs. domain actions • Two separate reward types: communication rewards and domain rewards • Total reward: sum of the two rewards • Explicit treatment of communication • Analysis

Communicative MTDPs (COM-MTDPs) • Σ: communication capabilities, possible “speech acts” • e.g., “I am moving to fire1.” • R_Σ: communication cost (over messages) • e.g., saying, “I am moving to fire1,” has a cost • R_Σ ≤ 0 • Why ever communicate?

Two Stage Decision Process • P1: communication policy • P2: action policy • Two state estimators (SE1, SE2) • Two belief state updates (b1, b2) • [Figure: Agent (SE1, SE2, P1, P2, beliefs b1, b2) interacting with the World: observes, acts, communications to and from]

COM-MTDP Continued • B: belief state (each Bi is a history of observations and communication) • Two-stage belief update • Stage 1: pre-communication belief state for agent i (updated from observations only): ⟨⟨Ωi^0, Σ^0⟩, ⟨Ωi^1, Σ^1⟩, …, ⟨Ωi^(t-1), Σ^(t-1)⟩, ⟨Ωi^t, ·⟩⟩ • Stage 2: post-communication belief state for agent i (updated from observations and communication): ⟨⟨Ωi^0, Σ^0⟩, ⟨Ωi^1, Σ^1⟩, …, ⟨Ωi^(t-1), Σ^(t-1)⟩, ⟨Ωi^t, Σ^t⟩⟩ • Cannot create probability distribution over states

COM-MTDP Continued • The problem: find optimal JOINT policies • One policy for each agent • π_Σ: communication policy • Maps pre-communication belief state into a message • (Bi → Σ) for each agent • π_A: action policy • Maps post-communication belief state into domain actions • (Bi → Ai) for each agent

More Domain Types • General communication: no assumptions on R_Σ • Free communication: R_Σ(s, σ) = 0 • No communication: R_Σ(s, σ) = −∞
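The two-stage COM-MTDP decision process can be sketched as one epoch of a small simulation: each agent first picks a message from its pre-communication belief, then picks an action from its post-communication belief. The function names, the fire/clear observations, the example policies, and the cost of -1 per message are all assumptions for illustration.

```python
def epoch(histories, observations, comm_policies, action_policies):
    # Stage 1: pre-communication belief ends with <observation, nothing yet>.
    pre = [h + ((o, None),) for h, o in zip(histories, observations)]
    messages = tuple(pi(b) for pi, b in zip(comm_policies, pre))
    # Stage 2: post-communication belief ends with <observation, all messages>.
    post = [h + ((o, messages),) for h, o in zip(histories, observations)]
    actions = tuple(pi(b) for pi, b in zip(action_policies, post))
    return messages, actions, post


def comm_policy(belief):
    # Hypothetical rule: announce a fire when you see one, else stay silent.
    obs, _ = belief[-1]
    return "fire" if obs == "fire" else None


def action_policy(belief):
    # Hypothetical rule: fight if you saw a fire or anyone reported one.
    obs, msgs = belief[-1]
    return "fight" if obs == "fire" or "fire" in msgs else "wait"


def comm_reward(messages, cost_per_message=-1.0):
    # R_Sigma <= 0: every message actually sent costs something.
    return cost_per_message * sum(m is not None for m in messages)


messages, actions, beliefs = epoch(
    histories=[(), ()],
    observations=["fire", "clear"],
    comm_policies=[comm_policy, comm_policy],
    action_policies=[action_policy, action_policy],
)
```

Here agent 2 sees nothing itself, yet joins the fight because of agent 1's message. That is the trade-off behind "Why ever communicate?": the team pays `comm_reward(messages)` in exchange for a better joint action.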

Teamwork Complexity Results

Classifying Different Models

True or False • If agents communicated all their observations at each step, then the distributed POMDP would essentially be a single-agent POMDP • In distributed POMDPs, each agent plans its own policy • Solving a distributed POMDP with two agents is of the same complexity as solving two separate individual POMDPs

Algorithms

NEXP-complete • No known efficient algorithms • Brute force search: 1. Generate space of possible joint policies 2. For each policy in policy space 3. Evaluate over finite horizon T • Complexity: (cost of evaluation) × (number of policies)

Locally optimal search • Joint Equilibrium-based Search for Policies (JESP)
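To get a feel for the "number of policies" factor, here is a small counting sketch. It assumes deterministic policies that map each observation history of length 0 to T-1 to an action, and that all agents have the same numbers of actions and observations; the function names are illustrative.

```python
def num_histories(n_obs: int, horizon: int) -> int:
    # Observation histories of length 0, 1, ..., horizon - 1.
    return sum(n_obs ** t for t in range(horizon))


def num_policies(n_actions: int, n_obs: int, horizon: int) -> int:
    # One action choice per history.
    return n_actions ** num_histories(n_obs, horizon)


def num_joint_policies(n_agents: int, n_actions: int, n_obs: int, horizon: int) -> int:
    # A joint policy picks one policy per agent.
    return num_policies(n_actions, n_obs, horizon) ** n_agents


# Two agents, 3 actions, 2 observations, horizon 3.
per_agent = num_policies(3, 2, 3)          # 3 ** 7
joint = num_joint_policies(2, 3, 2, 3)     # (3 ** 7) ** 2
```

Even this tiny setting has 2187 policies per agent and 4,782,969 joint policies, and the exponent itself grows exponentially with the horizon.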

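Nash Equilibrium in Team Games • Nash equilibrium vs. global optimal reward for the team • [Figure: two payoff tables; agent A chooses among x, y, z and agent B between u, v]

JESP: Locally Optimal Joint Policy • Iterate keeping one agent’s policy fixed • More complex policies the same way • [Figure: payoff table; agent A chooses among x, y, z and agent B among u, v, w]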

Joint Equilibrium-based Search • Description of algorithm: 1. Repeat until convergence 2. For each agent i 3. Fix policy of all agents apart from i 4. Find policy for i that maximizes joint reward • Exhaustive-JESP: • Brute force search in policy space of agent i • Expensive
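The four steps above can be sketched on a one-shot team game, where each agent's "policy" is just a single action. This is a simplification of JESP for illustration, and the payoff numbers are hypothetical: they are chosen so that (y, v) is an equilibrium that is worse than the team optimum (x, u).

```python
# Shared team payoff for each (A's action, B's action). Hypothetical values.
REWARD = {
    ("x", "u"): 10, ("x", "v"): 0,
    ("y", "u"): 0,  ("y", "v"): 5,
    ("z", "u"): 2,  ("z", "v"): 3,
}
ACTION_SETS = [["x", "y", "z"], ["u", "v"]]


def jesp(reward, action_sets, start):
    joint = list(start)
    improved = True
    while improved:                      # 1. repeat until convergence
        improved = False
        for i, actions in enumerate(action_sets):   # 2. for each agent i
            def value(a):
                # 3. everyone except agent i keeps its current choice
                trial = joint[:i] + [a] + joint[i + 1:]
                return reward[tuple(trial)]

            best = max(actions, key=value)          # 4. best response for i
            if value(best) > value(joint[i]):       # switch only on strict gain
                joint[i] = best
                improved = True
    return tuple(joint), reward[tuple(joint)]


stuck = jesp(REWARD, ACTION_SETS, ("z", "v"))
lucky = jesp(REWARD, ACTION_SETS, ("z", "u"))
```

Starting from (z, v) the search stops at (y, v) with reward 5, because neither agent can improve alone; starting from (z, u) it reaches (x, u) with reward 10. Which equilibrium is found depends on the starting point.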

JESP: Joint Equilibrium Search (Nair et al., IJCAI 03) • Repeat until convergence to local equilibrium, for each agent K: • Fix policy for all except agent K • Find optimal response policy for agent K • Optimal response policy for K, given fixed policies for others in MTDP: • Transformed to a single-agent POMDP problem • “Extended” state defined as … not as … • Define new transition function • Define new observation function • Define multiagent belief state • Dynamic programming over belief states • Fast computation of optimal response

Extended State, Belief State • Sample progression of beliefs • HL and HR are observations • a2: Listen

Run-time Results

Is JESP guaranteed to find the global optimum? • Random restarts
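A sketch of the random-restart idea: run a locally optimal search from several random starting points and keep the best result. The payoff table and function names are hypothetical, and the local search is a plain alternating best response on a one-shot team game rather than full JESP.

```python
import random

# Rows: agent A's choices; columns: agent B's choices. Hypothetical team payoffs.
PAYOFF = [[10, 0],
          [0, 5],
          [2, 3]]
N_ROWS, N_COLS = len(PAYOFF), len(PAYOFF[0])


def local_search(row, col):
    # Alternate best responses until neither agent can strictly improve.
    changed = True
    while changed:
        changed = False
        r = max(range(N_ROWS), key=lambda k: PAYOFF[k][col])
        if PAYOFF[r][col] > PAYOFF[row][col]:
            row, changed = r, True
        c = max(range(N_COLS), key=lambda k: PAYOFF[row][k])
        if PAYOFF[row][c] > PAYOFF[row][col]:
            col, changed = c, True
    return row, col, PAYOFF[row][col]


def with_restarts(n_restarts, seed=0):
    rng = random.Random(seed)
    results = [local_search(rng.randrange(N_ROWS), rng.randrange(N_COLS))
               for _ in range(n_restarts)]
    return max(results, key=lambda result: result[2])


best_found = with_restarts(n_restarts=20)
```

A single run can stop at the payoff-5 equilibrium; restarts raise the chance of landing in the basin of the payoff-10 one, but they give no guarantee of finding the global optimum.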

Not All Agents are Equal • Scaling up Distributed POMDPs for Agent Networks

Runtime

POMDP vs. distributed POMDP • Distributed POMDPs more complex • Joint transition and observation functions • Better policy • Free communication = POMDP • Less dependency = lower complexity

BDI vs. distributed POMDP
