
Software Multiagent Systems: Lecture 13



  1. Software Multiagent Systems: Lecture 13 Milind Tambe University of Southern California tambe@usc.edu

  2. Teamwork When agents act together

  3. Understanding Teamwork • Ordinary traffic • Driving in a convoy • Two friends A & B together drive in a convoy • B is secretly following A • Pass play in Soccer • Contracting with a software company • Orchestra

  4. Understanding Teamwork • Together • Joint Goal • Co-labor → Collaborate • Not just a union of simultaneous coordinated actions • Different from contracting

  5. Why Teamwork? Why not: Master-Slave? Contracts?

  6. Why Teams • Robust organizations • Responsibility to substitute • Mutual assistance • Information communicated to peers • Still capable of structure (not necessarily flat) • Subteams, subsubteams • Variations in capabilities and limitations

  7. Approach • Theory • Practical teamwork architectures

  8. Taking a step back…

  9. Key Approaches in Multiagent Systems • Distributed Constraint Optimization (DCOP) • Distributed POMDPs • Market mechanisms / auctions • Belief-Desire-Intention (BDI): logics and psychology • Hybrid DCOP / POMDP / auctions / BDI • Essential in large-scale multiagent teams • Synergistic interactions • BDI joint persistent goal: (JPG p) ≡ (MB ¬p) ∧ (MG p) ∧ (Until [(MB p) ∨ (MB □¬p)] (WMG p)) • [Figure: DCOP constraint graph over variables x1, x2, x3, x4]

  10. Key Approaches for Multiagent Teams • [Figure: DCOP, distributed POMDPs, BDI, and markets positioned against the dimensions local interactions, uncertainty, human usability & plan structure, and local utility, with the BDI-POMDP hybrid combining their strengths]

  11. Distributed POMDPs • Three papers on the web pages • What to read: • Ignore all the proofs • Ignore the complexity results • JAIR article: the model and the results at the end • Understand the fundamental principles

  12. Domain: Teamwork for Disaster Response

  13. Multiagent Team Decision Problem (MTDP) • MTDP: ⟨S, A, P, Ω, O, R⟩ • S: s1, s2, s3, … • Single global world state, one per epoch • A: domain-level actions; A = {A1, A2, A3, …, An} • Ai is the set of actions for agent i • Agents choose a joint action ⟨a1, …, an⟩ each epoch

  14. MTDP • P: transition function • P(s' | s, a1, a2, …, an) • RA: reward • RA(s, a1, a2, …, an) • One common reward, not separate rewards per agent • Central to teamwork

  15. MTDP (cont'd) • Ω: observations • Each agent has its own finite set of possible observations • Ω1, Ω2, … • O: observation probability • O(destination-state, joint-action, joint-observation) • P(o1, o2, …, on | a1, a2, …, an, s')
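To make the tuple concrete, here is a minimal Python sketch (not from the lecture) of how an MTDP could be represented; the class layout, field names, and type choices are illustrative assumptions, not part of the model's definition.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Tuple

JointAction = Tuple[str, ...]        # one domain action per agent
JointObservation = Tuple[str, ...]   # one observation per agent

@dataclass
class MTDP:
    states: List[str]                                     # S: global world states
    actions: List[List[str]]                              # A: actions[i] = agent i's action set Ai
    transition: Callable[[str, JointAction, str], float]  # P(s' | s, a1..an)
    observations: List[List[str]]                         # Omega: observations[i] = agent i's set
    observation_fn: Callable[[str, JointAction, JointObservation], float]  # O(o1..on | a1..an, s')
    reward: Callable[[str, JointAction], float]           # R(s, a1..an): one shared team reward

    def joint_actions(self) -> List[JointAction]:
        """Enumerate the joint action space (cross product of the individual action sets)."""
        return list(product(*self.actions))
```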

  16. Simple Scenario • Cost of action: -0.2 • Must fight fires together • Observe own location and fire status • [Grid figure: two fires worth +20 and +40]
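A hypothetical encoding of this scenario's shared reward, assuming the +20 and +40 values are the payoffs of the two fires in the slide's figure; the state and action names are made up for illustration.

```python
# Hypothetical encoding of the scenario's shared reward: every action costs 0.2,
# and a fire pays off (+20 or +40) only if both agents fight it together.
FIRE_REWARD = {"fire1": 20.0, "fire2": 40.0}
ACTION_COST = -0.2

def team_reward(state: str, joint_action: tuple) -> float:
    """R(s, a1, a2): one shared reward for the whole team."""
    reward = ACTION_COST * len(joint_action)        # each agent pays the action cost
    a1, a2 = joint_action
    if a1 == a2 and a1 in FIRE_REWARD and state == f"{a1}-burning":
        reward += FIRE_REWARD[a1]                   # fires must be fought together
    return reward

print(team_reward("fire1-burning", ("fire1", "fire1")))   # 19.6: coordinated
print(team_reward("fire1-burning", ("fire1", "fire2")))   # -0.4: uncoordinated
```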

  17. MTDP Policy The problem: find optimal JOINT policies • One policy for each agent • πi: action policy • Maps belief state into domain actions • (Bi → A) for each agent • Belief state: sequence of observations
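One way to picture such a policy: since an agent's belief state here is just its observation history, a deterministic policy can be stored as a lookup table. The sketch below is illustrative; the observation and action names are invented.

```python
# A minimal sketch of a deterministic MTDP policy for one agent: the belief state
# is the agent's observation history, so the policy is a lookup from observation
# sequences to domain actions (all entries are illustrative).
from typing import Dict, Tuple

ObservationHistory = Tuple[str, ...]
Policy = Dict[ObservationHistory, str]        # pi_i : B_i -> A_i

policy_agent1: Policy = {
    (): "scan",                                        # before any observation
    ("smoke-near-fire1",): "goto-fire1",
    ("smoke-near-fire1", "at-fire1"): "fight-fire1",
}

def act(policy: Policy, history: ObservationHistory) -> str:
    """Look up the domain action prescribed for the current belief state."""
    return policy[history]
```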

  18. MTDP Domain Types • Collectively partially observable: general case, no assumptions • Collectively observable: the team (as a whole) observes the state • For all joint observations, there is a state s such that, for all other states s' not equal to s, Pr(o1, o2, …, on | s') = 0 • Pr(o1, o2, …, on | s) = ? • Pr(s | o1, o2, …, on) = ? • Individually observable: each agent observes the state • For all individual observations, there is a state s such that, for all other states s' not equal to s, Pr(oi | s') = 0
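A small sketch of the collective-observability test, assuming a helper `prob(o, s)` that returns Pr(o1, …, on | s); all names are illustrative.

```python
# Check the slide's condition: every joint observation is compatible with at most
# one world state, i.e. for each o there is a state s with prob(o, s') = 0 for all
# other states s'. `prob(o, s)` is an assumed helper returning Pr(o1..on | s).
def is_collectively_observable(states, joint_observations, prob) -> bool:
    for o in joint_observations:
        compatible = [s for s in states if prob(o, s) > 0]
        if len(compatible) > 1:
            return False     # the team's joint observation does not pin down the state
    return True
```

Under that condition, the single compatible state s satisfies Pr(s | o1, …, on) = 1, which is presumably what the questions on the slide are pointing toward.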

  19. From MTDP to COM-MTDP • Two separate kinds of actions: communication vs. domain actions • Two separate reward types: • Communication rewards and domain rewards • Total reward: sum of the two rewards • Explicit treatment of communication • Enables analysis

  20. Communicative MTDPs (COM-MTDPs) • Σ: communication capabilities, possible "speech acts" • e.g., "I am moving to fire1." • RΣ: communication cost (over messages) • e.g., saying "I am moving to fire1" has a cost • RΣ ≤ 0 • Why ever communicate?
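A minimal sketch of these additions, with an invented message set and cost; the only points it encodes are that RΣ is non-positive and that it is added to the domain reward.

```python
# Minimal sketch of the COM-MTDP additions (all names illustrative): a message set
# Sigma and a non-positive communication reward R_Sigma, summed with the domain reward.
MESSAGES = ("I am moving to fire1", "I am moving to fire2", None)   # None = stay silent

def communication_reward(state: str, message) -> float:
    """R_Sigma(s, sigma) <= 0: speaking has a cost, staying silent is free."""
    return 0.0 if message is None else -1.0

def total_reward(state: str, domain_reward: float, joint_messages) -> float:
    """Total reward at one epoch: domain reward plus every agent's communication reward."""
    return domain_reward + sum(communication_reward(state, m) for m in joint_messages)

print(total_reward("fire1-burning", 19.6, ("I am moving to fire1", None)))   # 18.6
```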

  21. Two-Stage Decision Process • P1: communication policy • P2: action policy • Two state estimators • Two belief-state updates • [Figure: the agent observes the world and exchanges communications with teammates; state estimators SE1 and SE2 produce belief states b1 and b2, which feed the communication policy P1 and the action policy P2, whose chosen actions act on the world]

  22. COM-MTDP Continued • B: belief state (each Bi is a history of observations and communications) • Two-stage belief update • Stage 1: pre-communication belief state for agent i (updated just from observations): ⟨⟨Ωi0, Σ0⟩, ⟨Ωi1, Σ1⟩, …, ⟨Ωi t-1, Σ t-1⟩, ⟨Ωi t, ·⟩⟩ • Stage 2: post-communication belief state for i (updated from observations and communication): ⟨⟨Ωi0, Σ0⟩, ⟨Ωi1, Σ1⟩, …, ⟨Ωi t-1, Σ t-1⟩, ⟨Ωi t, Σ t⟩⟩ • In general the agent cannot collapse this history into a probability distribution over states
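The two-stage update can be pictured as bookkeeping over a raw history rather than over a distribution on states; the sketch below is illustrative, with invented observation and message strings.

```python
# Sketch of the two-stage belief bookkeeping: the belief state is a history of
# (observation, communications) pairs, not a distribution over world states.
from typing import List, Optional, Tuple

BeliefState = List[Tuple[str, Optional[tuple]]]     # [(Omega_i^t, Sigma^t), ...]

def pre_communication_update(belief: BeliefState, new_observation: str) -> BeliefState:
    """Stage 1: append the new observation; this epoch's communications are not yet known."""
    return belief + [(new_observation, None)]

def post_communication_update(belief: BeliefState, received: tuple) -> BeliefState:
    """Stage 2: fill in the communications exchanged during this epoch."""
    *earlier, (last_observation, _) = belief
    return earlier + [(last_observation, received)]

b = pre_communication_update([], "smoke-near-fire1")
b = post_communication_update(b, ("teammate: I am moving to fire1",))
print(b)   # [('smoke-near-fire1', ('teammate: I am moving to fire1',))]
```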

  23. COM-MTDP Continued The problem: find optimal JOINT policies • One policy for each agent • πΣ: communication policy • Maps pre-communication belief state into a message • (Bi → Σ) for each agent • πA: action policy • Maps post-communication belief state into domain actions • (Bi → A) for each agent

  24. More Domain Types • General communication: no assumptions on RΣ • Free communication: RΣ(s, σ) = 0 • No communication: RΣ(s, σ) is negatively infinite

  25. Teamwork Complexity Results

  26. Classifying Different Models

  27. True or False • If agents communicated all their observations at each step, then the distributed POMDP would essentially be a single-agent POMDP • In distributed POMDPs, each agent plans its own policy • Solving a distributed POMDP with two agents is of the same complexity as solving two separate individual POMDPs

  28. Algorithms

  29. NEXP-complete • No known efficient algorithms • Brute-force search: 1. Generate the space of possible joint policies 2. For each policy in the policy space 3. Evaluate over finite horizon T • Complexity: (number of joint policies) × (cost of evaluating one policy)
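A rough sketch of this brute-force procedure, with `evaluate` assumed to compute the expected joint reward of a joint policy over horizon T; everything here is illustrative, and the policy space explodes quickly, which is the point of the complexity estimate above.

```python
# Enumerate every deterministic joint policy (each agent maps its observation
# histories to actions) and keep the best one under `evaluate`.
from itertools import product

def all_histories(observations, horizon):
    """All observation sequences of length 0 .. horizon-1 for one agent."""
    hists, frontier = [()], [()]
    for _ in range(horizon - 1):
        frontier = [h + (o,) for h in frontier for o in observations]
        hists += frontier
    return hists

def all_policies(actions, observations, horizon):
    """Every deterministic mapping from observation histories to one of the agent's actions."""
    hists = all_histories(observations, horizon)
    for choice in product(actions, repeat=len(hists)):
        yield dict(zip(hists, choice))

def brute_force(agents, evaluate, horizon):
    """agents: list of (actions_i, observations_i) pairs; returns the best joint policy."""
    spaces = [list(all_policies(a, o, horizon)) for a, o in agents]
    best, best_value = None, float("-inf")
    for joint_policy in product(*spaces):          # number of joint policies ...
        value = evaluate(joint_policy, horizon)    # ... times the cost of one evaluation
        if value > best_value:
            best, best_value = joint_policy, value
    return best, best_value
```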

  30. Locally Optimal Search • Joint Equilibrium-based Search for Policies (JESP)

  31. Nash Equilibrium in Team Games • Nash equilibrium vs. global optimal reward for the team • [Figure: identical-payoff matrices for agents A (actions x, y, z) and B (actions u, v), contrasting a Nash equilibrium with the global optimum]
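A hypothetical identical-payoff game that makes the slide's point: a joint action can be a Nash equilibrium (no agent gains by deviating alone) without being the team optimum. The numbers are invented.

```python
# Shared payoff for every joint action; rows are A's actions, columns are B's.
payoff = {
    ("x", "u"): 10, ("x", "v"): 0,
    ("y", "u"): 0,  ("y", "v"): 5,
}

def is_nash(a, b) -> bool:
    """Neither agent can raise the shared payoff by changing only its own action."""
    best_for_a = max(payoff[(a2, b)] for a2 in "xy")
    best_for_b = max(payoff[(a, b2)] for b2 in "uv")
    return payoff[(a, b)] >= best_for_a and payoff[(a, b)] >= best_for_b

print([ab for ab in payoff if is_nash(*ab)])      # [('x', 'u'), ('y', 'v')]
print(max(payoff, key=payoff.get))                # ('x', 'u') -- the global optimum
```

Here ('y', 'v') is an equilibrium with payoff 5, yet the global optimum ('x', 'u') pays 10, so single-agent deviation reasoning alone does not guarantee the best team reward.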

  32. JESP: Locally Optimal Joint Policy • Iterate, keeping one agent's policy fixed • More complex policies are handled the same way • [Figure: payoff matrix over A's actions x, y, z and B's actions u, v, w]

  33. Joint Equilibrium-based Search • Description of algorithm: 1. Repeat until convergence 2. For each agent i 3. Fix the policies of all agents apart from i 4. Find the policy for i that maximizes joint reward • Exhaustive-JESP: • brute-force search in the policy space of agent i • Expensive

  34. JESP: Joint Equilibrium Search (Nair et al, IJCAI 03) • Repeat until convergence to a local equilibrium; for each agent K: • Fix policies for all agents except agent K • Find the optimal response policy for agent K Optimal response policy for K, given fixed policies for the others in the MTDP: • Transformed into a single-agent POMDP problem: • "Extended" state defined as the world state plus the other agents' observation histories, not just the world state • Define a new transition function • Define a new observation function • Define a multiagent belief state • Dynamic programming over belief states • Fast computation of the optimal response
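A rough sketch of Exhaustive-JESP as described on the last two slides: cycle over the agents, holding everyone else fixed and replacing the current agent's policy with a best response, until no single-agent change improves the joint reward. `policy_space(i)` and `evaluate(joint)` are assumed helpers (enumerate agent i's policies; compute expected joint reward); this is an illustration, not the authors' implementation.

```python
def jesp(initial_joint_policy, policy_space, evaluate):
    """Alternating best-response search; returns a locally optimal joint policy."""
    joint = list(initial_joint_policy)
    best_value = evaluate(joint)
    improved = True
    while improved:                               # repeat until a local equilibrium
        improved = False
        for i in range(len(joint)):               # for each agent i
            for candidate in policy_space(i):     # best response, others held fixed
                trial = joint[:i] + [candidate] + joint[i + 1:]
                value = evaluate(trial)
                if value > best_value:
                    joint, best_value = trial, value
                    improved = True
    return joint, best_value
```

The DP-JESP variant on the slide above replaces the inner enumeration with dynamic programming over the extended state and multiagent belief states, which is what makes the best-response step fast.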

  35. Extended State, Belief State • Sample progression of beliefs: HL and HR are observations; a2 = Listen

  36. Run-time Results

  37. Is JESP guaranteed to find the global optimum? • Random restarts

  38. Not All Agents are Equal • Scaling up Distributed POMDPs for Agent Networks

  39. Runtime

  40. POMDP vs. distributed POMDP • Distributed POMDPs are more complex • Joint transition and observation functions • Better joint policies • With free communication, a distributed POMDP reduces to a single-agent POMDP • Less inter-agent dependency = lower complexity

  41. BDI vs. distributed POMDP
