
Building Practical Agent Teams: A hybrid perspective


Presentation Transcript


  1. SBIA 2004 Building Practical Agent Teams: A hybrid perspective Milind Tambe tambe@usc.edu Computer Science Dept University of Southern California Joint work with the TEAMCORE GROUP http://teamcore.usc.edu

  2. Long-Term Research Goal • Building large-scale heterogeneous teams • Types of entities: agents, people, sensors, resources, robots, … • Scale: 1000s or more • Domains: highly uncertain, real-time, dynamic • Activities: form teams, persist for long durations, coordinate, adapt, … • Some applications: agent-facilitated human organizations, large-scale disaster rescue, large-area security

  3. Domains and Motivations [Chart: task & domain complexity (low / medium / high) vs. team scale & complexity (small-scale homogeneous, small-scale heterogeneous, large-scale heterogeneous)]

  4. Motivation: BDI+POMDP Hybrids [Diagram: team-oriented program for Execute Rescue (RAP team), with subplans Extinguish Fires (fire company), Rescue Civilians (ambulance team), and Clear Roads; each team served by a Teamcore/team proxy] • BDI approach: Frameworks: Teamcore/Machinetta, GPGP, … +ve: ease of use for human developers; coordinates large-scale teams. -ve: quantitative team evaluation difficult (given uncertainty/cost) • Distributed POMDP approach: compute an optimal policy using distributed partially observable Markov decision processes (POMDPs). Frameworks: MTDP, DEC-MDP/DEC-POMDP, POIPSG, … +ve: quantitative evaluation of team performance easy (with uncertainty). -ve: scale-up difficult; hard for human developers to program policies

  5. BDI + POMDP Synergy [Diagram: the same team-oriented program, with distributed POMDPs used for TOP & proxy analysis and refinement] • Combine “traditional” TOP approaches with distributed POMDPs • POMDPs improve the TOP/proxies: e.g., improved role allocation • The TOP constrains the POMDP policy search: orders-of-magnitude speedup

  6. Overall Research Framework • Distributed POMDP analysis: Multiagent Team Decision Problem (MTDP) (Nair et al. 03b; Nair et al. 04; Paruchuri et al. 04) • Role allocation algorithms • Communication algorithms • Teamwork proxy infrastructure

  7. Electric Elves: 24/7 from 6/00 to 12/00 (Chalupsky et al., IAAI’2001) [Diagram: Teamcore proxies for each user, connected to an Interest Matcher, Meet Maker, and Scheduler agent] • Reschedule meetings • Decide presenters • Order our meals “More & more computers are ordering food, … we need to think about marketing [to these computers]” (local Subway owner)

  8. Modules within the Proxies: AA (Scerri, Pynadath and Tambe, JAIR’2002) • Team-oriented program: Meeting; role: user arrives on time • Proxy algorithms: communication, adjustable autonomy, role allocation • Adjustable autonomy: MDPs for transfer-of-control policies • MDP policies: a planned sequence of transfers of control and coordination changes (e.g., reschedule meetings) • E.g., ADAH: ask, delay, ask, cancel (a minimal MDP sketch follows)
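The transfer-of-control idea can be made concrete with a toy finite-horizon MDP. This is only an illustrative sketch in the spirit of the JAIR’2002 policies: every state, action, probability, and reward below is an invented assumption, not the actual Electric Elves model.

```python
# Toy finite-horizon MDP for adjustable autonomy, in the spirit of
# transfer-of-control policies (Scerri et al., JAIR'02). All numbers
# and names here are invented assumptions.

STATES = ["user_silent", "user_responsive", "resolved"]
ACTIONS = ["ask", "delay", "act_autonomously", "cancel"]

# P[s][a]: list of (next_state, probability); R[s][a]: immediate reward.
P = {
    "user_silent": {
        "ask":              [("user_responsive", 0.3), ("user_silent", 0.7)],
        "delay":            [("user_responsive", 0.2), ("user_silent", 0.8)],
        "act_autonomously": [("resolved", 1.0)],
        "cancel":           [("resolved", 1.0)],
    },
    "user_responsive": {
        "ask":              [("resolved", 0.9), ("user_silent", 0.1)],
        "delay":            [("user_responsive", 1.0)],
        "act_autonomously": [("resolved", 1.0)],
        "cancel":           [("resolved", 1.0)],
    },
    "resolved": {a: [("resolved", 1.0)] for a in ACTIONS},
}
R = {
    "user_silent":     {"ask": -1, "delay": -2, "act_autonomously": -5, "cancel": -20},
    "user_responsive": {"ask": 10, "delay": -1, "act_autonomously": -5, "cancel": -20},
    "resolved":        {a: 0 for a in ACTIONS},
}

def transfer_of_control_policy(horizon):
    """Backward induction: policy[t][s] = best action with t steps elapsed."""
    V = {s: 0.0 for s in STATES}
    steps = []
    for _ in range(horizon):
        Q = {s: {a: R[s][a] + sum(p * V[s2] for s2, p in P[s][a])
                 for a in ACTIONS} for s in STATES}
        steps.append({s: max(Q[s], key=Q[s].get) for s in STATES})
        V = {s: max(Q[s].values()) for s in STATES}
    return list(reversed(steps))

if __name__ == "__main__":
    for t, step in enumerate(transfer_of_control_policy(4)):
        print(t, step["user_silent"], step["user_responsive"])
```

Reading off the best action per state at each time step yields a planned sequence of transfers of control, analogous in shape to a policy like ADAH.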

  9. Back to Hybrid BDI-POMDP Frameworks

  10. Motivation: Communication in Proxies Proxy’s heuristic “BDI” communication rules, for example: • RULE1 (“joint intentions” {Levesque et al. 90}): If (fact F ∈ agent’s private state) AND F matches a goal of the team’s plan AND (F ∉ team state), Then possible communicative goal CG to communicate F • RULE2: If possible communicative goal CG AND (expected miscoordination cost > communication cost), Then communicate CG (see the sketch below)
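These two rules translate almost directly into guard-and-fire code. Below is a minimal, hypothetical sketch; the names and the two cost estimators are assumptions for illustration, not the actual proxy implementation.

```python
from dataclasses import dataclass, field

@dataclass
class TeamState:
    facts: set = field(default_factory=set)  # mutually believed facts

def rule1_candidate_goals(private_facts, team_plan_goals, team_state):
    """RULE1 ("joint intentions"): a privately known fact F that matches
    a goal of the team's plan, but is not yet in the mutual team state,
    becomes a candidate communicative goal CG."""
    return {f for f in private_facts
            if f in team_plan_goals and f not in team_state.facts}

def rule2_communicate(candidates, miscoordination_cost, communication_cost):
    """RULE2: fire a candidate CG only when the expected miscoordination
    cost exceeds the communication cost (both estimators are assumed to
    be supplied by the domain)."""
    return [f for f in candidates
            if miscoordination_cost(f) > communication_cost(f)]

# Hypothetical usage with constant cost estimates, for illustration only.
team = TeamState()
candidates = rule1_candidate_goals(
    private_facts={"route_2_unsafe", "low_fuel"},
    team_plan_goals={"route_2_unsafe"},
    team_state=team)
print(rule2_communicate(candidates, lambda f: 10.0, lambda f: 1.0))
```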

  11. Motivation: Earlier BDI Evaluation • Testing teamwork in RoboCup (Tambe et al., IJCAI’99) • Testing communication selectivity (Pynadath & Tambe, JAAMAS’03), helicopter domain • Quantitative analysis of optimality, or of the complexity of the optimal response, is difficult • A challenge in domains with significant uncertainty and costs

  12. Distributed POMDPs COM-MTDP (Pynadath and Tambe, 02); RMTDP (Nair, Tambe, Marsella 03) [Diagram: world states Si] • S: states of the world (e.g., helicopter position, enemy position) • Ai: actions (communication actions, domain actions) • P: state transition probabilities • R: reward, subdivided based on action types

  13. COM-MTDP: Analysis of Communication [Diagram: states with Landmark1, Landmark2; observation table per state and previous action] • Ω: observations (e.g., E = enemy-on-radar, NE = enemy-not-on-radar) • O: probability of an observation given the destination state & past action • B: belief state (each Bi a history of observations and messages) • Individual policies πiA : Bi → Ai (domain action) and πiΣ : Bi → Σi (communication) • Goal: find joint policies πA and πΣ that maximize the total expected reward
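Collecting the definitions from slides 12 and 13, the model can be written compactly. This is a reconstruction from the bullets above; the notation approximates, but may not exactly match, the COM-MTDP paper’s.

```latex
% Reconstruction of the COM-MTDP ingredients named on slides 12-13.
\begin{align*}
\text{COM-MTDP: } & \langle S,\ \{A_i\},\ \{\Sigma_i\},\ P,\ \{\Omega_i\},\ O,\ B,\ R \rangle\\
P(s' \mid s, a) &: \text{state transition probabilities}\\
O(\omega \mid s', a) &: \text{observation probabilities}\\
\pi_{iA} : B_i \to A_i &: \text{domain-action policy}\qquad
\pi_{i\Sigma} : B_i \to \Sigma_i : \text{communication policy}\\
(\pi_A^{*}, \pi_\Sigma^{*}) &= \arg\max_{\pi_A,\,\pi_\Sigma}\
\mathbb{E}\Big[\sum_{t=0}^{T} R(s_t, a_t, \sigma_t)\Big]
\end{align*}
```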

  14. Complexity Results in COM-MTDP [Table: complexity of computing optimal joint policies under different observability and communication assumptions] • Consequence: settle for a locally optimal solution (no global team optimality) • Hybrid approach: POMDP + BDI

  15. Approach I: Locally Optimal Policy (Nair et al. 03) • Repeat until convergence to a local equilibrium, for each agent K: fix the policies of all agents except K; find the optimal response policy for agent K • Finding agent K’s optimal response, given fixed policies for the others, becomes finding an optimal policy for a single-agent POMDP: • “Extended” state defined as <world state, other agents’ observation histories>, not just the world state • Define a new transition function • Define a new observation function • Define a multiagent belief state • Dynamic programming over belief states • Significant speedup over exhaustive search, but problem size limited (see the sketch below)
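A skeletal version of this alternating best-response loop might look as follows. The best-response and evaluation routines are left abstract, and all names are illustrative assumptions rather than the paper’s implementation.

```python
def locally_optimal_search(agents, initial_policies, best_response,
                           joint_value, max_iters=100):
    """Alternating best responses, in outline. `best_response(k, policies)`
    must solve the single-agent POMDP that agent k faces over extended
    states <world state, other agents' observation histories> when all
    other policies are held fixed; `joint_value` evaluates a joint policy."""
    policies = dict(initial_policies)
    for _ in range(max_iters):
        improved = False
        for k in agents:
            candidate = best_response(k, policies)   # fix all but agent k
            trial = {**policies, k: candidate}
            if joint_value(trial) > joint_value(policies) + 1e-9:
                policies, improved = trial, True
        if not improved:
            break  # local equilibrium: no agent can unilaterally improve
    return policies
```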

  16. II: Hybrid BDI + POMDP Feedback for modifying proxy communication algorithms: team-oriented program + domain → distributed POMDP model (exploiting the TOP) → COM-MTDP evaluates alternate communication policies (fixed action policy πA; vary communication policies π1, π2, π3, …) → derive the locally or globally optimal communication policy πOptimal → feed back into the proxy algorithms (communication, adjustable autonomy, role allocation)

  17. Compare Communication Policies over Different Domains • Given a domain, for different observability conditions & communication costs: • Evaluate Teamcore (RULE1 + RULE2), Jennings, and others; compare with optimal • Optimal: O((|S|·|Ω|)^T)

  18. Distributed POMDPs to Analyze Role Allocations: RMTDP

  19. Role Allocation: Illustration • Task: move cargo from X to Y; large reward for cargo at the destination • Three routes with varying length and failure rates • Scouts make a route safe for transports • Uncertainty: in actions and observations • Scouts may fail along a route (and transports may replace failed scouts) • The scouts’ failure rate decreases if more scouts are assigned to a route • A scout’s failure may not be observable to the transports

  20. Team-Oriented Program [Diagram: organization hierarchy and plan hierarchy] • Best initial role allocation: how many helos in SctTeam A, B, C & Transport? • TOP: almost the entire RMTDP policy is completely fixed • Policy gap only at step 1: the best role allocation in the initial state for each agent • Assume six helicopter agents: 84 combinations (84 RMTDP policies), as the check below confirms
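The 84 figure is just the number of ways to divide six interchangeable helicopters among the four roles (SctTeam A, B, C, and Transport): a stars-and-bars count, C(6+3, 3) = 84. A quick check:

```python
from itertools import product

# Allocations of 6 interchangeable helicopters to SctTeam A, B, C;
# the Transport team takes the remaining 6 - a - b - c.
count = sum(1 for a, b, c in product(range(7), repeat=3) if a + b + c <= 6)
print(count)  # 84, matching C(6+3, 3) = C(9, 3)
```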

  21. Analyzing Role Allocation in Teamwork Feedback for a specific role allocation in the TOP: team-oriented program + domain → distributed POMDP model → R-MTDP evaluates alternate role-taking policies (role-execution policy fixed; fill in the gaps in the policies) → search the policy space for the optimal role-taking policy πOptRole-taking → feed back into the proxy algorithms (role allocation, adjustable autonomy, communication) [Diagram: partial policy over states S1–S5 with a gap “?” at S1]

  22. RMTDP Policy Search: Efficiency Improvements • Belief-based policy evaluation: keep not entire observation histories, only the beliefs required by the TOP • E.g., the histories T=1: <Scout1 okay, Scout2 fail>; T=2: <Scout1 fail, Scout2 fail> and T=1: <Scout1 okay, Scout2 okay>; T=2: <Scout1 fail, Scout2 fail> both reduce to the belief T=2: <CriticalFailure> • Form hierarchical policy groups for branch-&-bound search (see the sketch below): obtain an upper bound on the values of the policies within a policy group; if an individual policy is higher valued than a group’s bound, prune the group • Exploit the TOP for generating policy groups and for the upper bounds
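In code, the branch-and-bound search over policy groups might look like the following sketch. The group structure, leaf enumeration, evaluation, and upper-bound functions are abstract placeholders, not the RMTDP implementation.

```python
import heapq

def branch_and_bound(root_group, upper_bound, children, leaf_policies, evaluate):
    """Best-first branch-and-bound over hierarchical policy groups.
    Expands the group with the highest upper bound; prunes any group
    whose bound cannot beat the best fully evaluated policy so far."""
    best_value, best_policy = float("-inf"), None
    frontier = [(-upper_bound(root_group), 0, root_group)]
    tie = 1  # tie-breaker so heapq never compares group objects
    while frontier:
        neg_ub, _, group = heapq.heappop(frontier)
        if -neg_ub <= best_value:
            continue  # prune: the whole group is dominated
        for policy in leaf_policies(group):
            value = evaluate(policy)  # full (belief-based) evaluation
            if value > best_value:
                best_value, best_policy = value, policy
        for child in children(group):
            bound = upper_bound(child)
            if bound > best_value:
                heapq.heappush(frontier, (-bound, tie, child))
                tie += 1
    return best_policy, best_value
```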

  23. MaxExp: Hierarchical Policy Groups [Figure: tree of policy groups branching on the number of helicopters assigned to each role, annotated with upper-bound values such as 1926, 2773, 2926, 4167, and 3420]

  24. MaxExp: Upper Bound on Policy Group Value [Diagram: components DoScouting [Scout 2; Transport 4] → DoTransport [transport from previous] → RemainScouts [scout from previous], with policy-group sizes 84, 36, and 3300, branches such as SafeRoute=1/Transport=3 and SafeRoute=2/Transport=4, and bound 3420] • Obtain the max for each component over all start states & observation histories • If each component were independent: each could be evaluated separately • Dependence: the start of the next component is based on the end state of the previous one • Why the speedup: no duplicate start states (multiple paths of the previous component merge), and no duplicate observation histories (see the sketch below)
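The “no duplicate start states” point can be pictured as propagating a distribution over component entry states instead of individual trajectories. A hypothetical sketch, with the component models left as assumptions:

```python
from collections import defaultdict

def evaluate_by_components(components, initial_dist):
    """Evaluate a sequenced TOP (e.g., DoScouting -> DoTransport ->
    RemainScouts) component by component. Trajectories that leave a
    component in the same state merge into one weighted entry, so the
    next component never re-evaluates duplicate start states.
    Each component(state) must return [(next_state, prob, reward), ...]."""
    dist, expected_reward = dict(initial_dist), 0.0
    for component in components:
        next_dist = defaultdict(float)
        for state, weight in dist.items():
            for nxt, prob, reward in component(state):
                next_dist[nxt] += weight * prob
                expected_reward += weight * prob * reward
        dist = dict(next_dist)  # merged: one entry per distinct end state
    return expected_reward
```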

  25. Helicopter Domain: Computational Savings • NOPRUNE-OBS: No pruning, maintain full observation history • NOPRUNE: No pruning, maintain beliefs not observation histories • MAXEXP: Pruning using MAXEXP heuristic, using beliefs • NOFAIL: MAXEXP enhanced with “no failure” for quicker upper bound

  26. Does RMTDP Improve Role Allocation?

  27. RoboCup Rescue: Computational Savings

  28. RoboCup Rescue: RMTDP Improves Role Allocation

  29. SUMMARY [Diagram: TOP (team plans, organizations, agents) with team proxies] COM-MTDP & R-MTDP: distributed POMDPs for analysis • Combine “traditional” TOP approaches with distributed POMDPs • Exploit POMDPs to improve the TOP/Teamcore proxies • Exploit the TOP to constrain the POMDP policy search • Key policy-evaluation complexity results

  30. Future Work [Diagram: trainee interacting with agent-based simulation technology and visualization]

  31. Thank You Contact: • Milind Tambe • tambe@usc.edu • http://teamcore.usc.edu/tambe • http://teamcore.usc.edu

  32. Key Papers Cited in this Presentation • Rajiv T. Maheswaran, Jonathan P. Pearce, and Milind Tambe. Distributed algorithms for DCOP: A graphical-game-based approach. In Proceedings of the 17th International Conference on Parallel and Distributed Computing Systems (PDCS-2004). • Praveen Paruchuri, Milind Tambe, Fernando Ordonez, and Sarit Kraus. Towards a formalization of teamwork with resource constraints. In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2004. • Ranjit Nair, Maayan Roth, Makoto Yokoo, and Milind Tambe. Communication for improving policy computation in distributed POMDPs. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-04), 2004. • Rajiv T. Maheswaran, Milind Tambe, Emma Bowring, Jonathan P. Pearce, and Pradeep Varakantham. Taking DCOP to the real world: Efficient complete solutions for distributed event scheduling. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2004. • P. J. Modi, W. Shen, M. Tambe, and M. Yokoo. Solving distributed constraint optimization problems optimally, efficiently and asynchronously. Artificial Intelligence Journal (accepted). • D. V. Pynadath and M. Tambe. Automated teamwork among heterogeneous software agents and humans. Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS), 7:71-100, 2003. • R. Nair, M. Tambe, M. Yokoo, D. Pynadath, and S. Marsella. Taming decentralized POMDPs: Towards efficient policy computation for multiagent settings. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2003. • R. Nair, M. Tambe, and S. Marsella. Role allocation and reallocation in multiagent teams: Towards a practical analysis. In Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2003. • P. Scerri, L. Johnson, D. Pynadath, P. Rosenbloom, M. Si, N. Schurr, and M. Tambe. A prototype infrastructure for distributed robot, agent, person teams. In Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2003. • P. Scerri, D. Pynadath, and M. Tambe. Towards adjustable autonomy for the real world. Journal of AI Research (JAIR), 17:171-228, 2002. • D. Pynadath and M. Tambe. The communicative multiagent team decision problem: Analyzing teamwork theories and models. Journal of AI Research (JAIR), 2002. • G. Kaminka, D. Pynadath, and M. Tambe. Monitoring teams by overhearing: A multiagent plan-recognition approach. Journal of AI Research (JAIR), 2002.

  33. All the Co-authors
