
http://teamcore.usc.edu

Lafayette College. Towards a Theoretic Understanding of DCEE. Scott Alfeld, Matthew E. Taylor, Prateek Tandon, and Milind Tambe. http://teamcore.usc.edu. Forward Pointer: When Should There be a “Me” in “Team”? Distributed Multi-Agent Optimization Under Uncertainty





Presentation Transcript


  1. Lafayette College. Towards a Theoretic Understanding of DCEE. Scott Alfeld, Matthew E. Taylor, Prateek Tandon, and Milind Tambe http://teamcore.usc.edu

  2. Forward Pointer: When Should There be a “Me” in “Team”? Distributed Multi-Agent Optimization Under Uncertainty. Matthew E. Taylor, Manish Jain, Yanquin Jin, Makoto Yokoo, & Milind Tambe. Wednesday, 8:30 – 10:30, Coordination and Cooperation 1

  3. Teamwork: Foundational MAS Concept • Joint actions improve outcomes • But they increase communication & computation • Over two decades of work • This paper: increased teamwork can harm the team • Even without considering communication & computation • Only considering team reward • Multiple algorithms, multiple settings • But why?

  4. DCOPs: Distributed Constraint Optimization Problems • Multiple domains • Meeting scheduling • Traffic light coordination • RoboCup soccer • Multi-agent plan coordination • Sensor networks • Distributed • Robust to failure • Scalable • (In)Complete • Quality bounds

  5. DCOP Framework [diagram: agents a1, a2, a3 with constraint links]

  6. DCOP Framework [diagram: agents a1, a2, a3 with constraint links]

  7. DCOP Framework [diagram: agents a1, a2, a3] • Different “levels” of teamwork possible • Complete solution is NP-hard

  8. DCEE: Distributed Coordination of Exploration and Exploitation • Environment may be unknown • Maximize on-line reward over some number of rounds • Exploration vs. Exploitation • Demonstrated on a mobile ad-hoc network • Simulation [Released] & Robots [Released Soon]

  9. DCOP: Distributed Constraint Optimization Problem

  10. DCOP → DCEE Distributed Coordination of Exploration and Exploitation

  11. DCEE Algorithm: SE-Optimistic (will build upon later) • Rewards on [1,200] • “If I move, I’d get R = 200” [diagram: chain a1-a2-a3-a4 with link rewards 50, 75, 99]

  12. DCEE Algorithm: SE-Optimistic (will build upon later) • Rewards on [1,200] [diagram: chain a1-a2-a3-a4 with link rewards 50, 75, 99; estimated gains from moving: 275, 251, 101, 125] • Explore or Exploit?
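The optimistic gain rule on this slide can be sketched in Python. R_MAX = 200 follows from rewards on [1,200]; the chain wiring below is an assumption (a2's gain of 275 matches the slide, but the remaining gains suggest the diagram's topology differs slightly), so treat the example as illustrative.

```python
# Sketch of the SE-Optimistic gain estimate: each agent optimistically
# assumes that moving yields the maximum reward (200) on every link it
# shares with a neighbor, so its estimated gain is the sum of
# (R_MAX - current reward) over its incident links.

R_MAX = 200

def optimistic_gain(agent, links):
    """Sum of (R_MAX - reward) over links incident to `agent`."""
    return sum(R_MAX - r for (a, b), r in links.items() if agent in (a, b))

# Hypothetical chain a1 - a2 - a3 - a4 with link rewards from the slide.
links = {("a1", "a2"): 50, ("a2", "a3"): 75, ("a3", "a4"): 99}

gains = {a: optimistic_gain(a, links) for a in ["a1", "a2", "a3", "a4"]}
# a2 touches the 50 and 75 links: (200 - 50) + (200 - 75) = 275
```

Under this rule the inner agents always report the largest gains, since they touch more links.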

  13. Success! [ATSN-09][IJCAI-09] • Both classes of (incomplete) algorithms • In simulation and on robots • Ad-hoc wireless network (improvement if performance > 0)

  14. k-Optimality • Increased coordination – originally a DCOP formulation • In DCOP, increased k = increased team reward • Find groups of agents to change variables • Joint actions • Neighbors of the moving group cannot move • Defines the amount of teamwork (higher communication & computation overheads)

  15. “k-Optimality” in DCEE • k=1, 2, ... • Groups of size k form, those with the most to gain move (change the value of their variable) • A group can only move if no other agents in its neighborhood move
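The movement restriction above can be sketched as follows. The greedy highest-gain selection is an assumption for illustration, not a protocol taken from the slides; the rule it enforces is the one stated here, that a group moves only if no other agent in its neighborhood moves.

```python
# Hedged sketch of the "k-optimal"-style movement restriction in DCEE:
# groups of up to k agents bid their estimated gain, and a group may move
# only if no agent in its neighborhood (its members plus their neighbors)
# belongs to another moving group.

def neighborhood(group, adj):
    """Group members plus every neighbor of a member."""
    hood = set(group)
    for a in group:
        hood |= adj[a]
    return hood

def select_movers(bids, adj):
    """Greedily pick highest-gain groups whose neighborhoods don't overlap."""
    frozen, movers = set(), []
    for gain, group in sorted(bids, reverse=True):
        if not (set(group) & frozen):
            movers.append(group)
            frozen |= neighborhood(group, adj)
    return movers

# Chain a1 - a2 - a3 - a4 - a5; hypothetical k = 1 groups and gains.
adj = {"a1": {"a2"}, "a2": {"a1", "a3"}, "a3": {"a2", "a4"},
       "a4": {"a3", "a5"}, "a5": {"a4"}}
bids = [(10, ("a1",)), (9, ("a3",)), (8, ("a5",)), (7, ("a2",))]
movers = select_movers(bids, adj)
# a1, a3, a5 can all move; a2 is frozen because its neighbor a1 moves
```

Note the winners form an independent set of the graph, which is exactly the connection to L-movement developed on the later slides.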

  16. Example: SE-Optimistic-2 • Rewards on [1,200] [diagram: chain a1-a2-a3-a4 with link rewards 50, 75, 99; individual gains 275, 251, 101, 125 combine into pair gains, shown as 275 + 250 - 150, 251 + 275 - 150, 101 + 251 - 101, 125 + 275 - 125]

  17. Sample coordination results • Omniscient: confirms DCOP result, as expected [plots: artificially supplied rewards (DCOP); complete graph; chain graph]

  18. Physical Implementation • iRobot Create robots • Mobile ad-hoc wireless network

  19. Confirms Team Uncertainty Penalty • Averaged over 10 trials each • Trend confirmed! • (Huge standard error) [plots: total gain on chain and complete graphs]

  20. Problem with “k-Optimal” • Unknown rewards: an agent cannot know whether moving will increase reward! • Define a new term: L-Movement • The number of agents that can change variables per round • Independent of the exploration algorithm • Graph dependent • An alternate measure of teamwork

  21. L-Movement • Example: k = 1 algorithms • L is the size of the largest maximal independent set of the graph • NP-hard to calculate for a general graph • Harder for higher k • Consider ring & complete graphs, both with 5 vertices • Ring graph: largest independent set has size 2 • Complete graph: largest independent set has size 1 • For k = 1: L = 1 for a complete graph; for an n-vertex ring graph, L = ⌊n/2⌋ • General DCOP analysis tool?
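The L values quoted above can be checked with a brute-force maximum independent set computation. This is exponential (the general problem is NP-hard, as the slide notes), but fine for toy graphs of this size.

```python
# Brute-force maximum independent set, used here only to verify the L
# values for small ring and complete graphs.
from itertools import combinations

def max_independent_set_size(n, edges):
    """Largest set of vertices with no edge between any two members."""
    edge_set = {frozenset(e) for e in edges}
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all(frozenset(p) not in edge_set
                   for p in combinations(subset, 2)):
                return size
    return 0

ring5 = [(i, (i + 1) % 5) for i in range(5)]
complete5 = list(combinations(range(5), 2))
# ring:     largest independent set is 2  ->  L = 2 for k = 1
# complete: largest independent set is 1  ->  L = 1 for k = 1
```

For an n-vertex ring this returns ⌊n/2⌋, consistent with the growth of L in ring graphs noted on the later slides.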

  22. Configuration Hypercube • No (partial) assignment is believed to be better than another, so wlog agents can select their next value when exploring • Define the configuration hypercube C: each agent is a dimension, and C[v1, ..., vn] is the total reward when each agent i takes value vi • C cannot be calculated without exploration; its values are drawn from a known reward distribution • Moving along an axis in the hypercube → an agent changing its value • Example: 3 agents (C is 3-dimensional): changing from C[a, b, c] to C[a, b, c'] means agent A3 changes from c to c'

  23. How many agents can move? (1/2) • In a ring graph with 5 nodes • k = 1 : L = 2 • k = 2 : L = 3  • In a complete graph with 5 nodes • k = 1 : L = 1 • k = 2 : L = 2

  24. How many agents can move? (2/2) • A configuration C[x1, ..., xn] is reachable by an algorithm with movement L in s steps if and only if max_i(x_i) ≤ s and Σ_i x_i ≤ s · L • Example: C[2,2] is reachable for L = 1 iff s ≥ 4
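A reachability test consistent with the C[2,2] example on this slide. The condition (each coordinate fits in s rounds, and the total movement fits in s · L single-agent moves) is reconstructed from that example, so the paper's exact statement may differ.

```python
# Reconstructed reachability condition: starting from the origin of the
# hypercube, at most L agents each move one step along their axis per
# round. Cell (x1, ..., xn) is then reachable in s steps iff no single
# coordinate needs more than s rounds and the total number of moves
# fits within s * L.

def reachable(cell, s, L):
    return max(cell) <= s and sum(cell) <= s * L

# Slide's example: C[2,2] with L = 1 needs 2 + 2 = 4 moves, so s >= 4.
```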

  25. L-Movement Experiments • For various DCEE problems, distributions, and L: • For steps s = 1...30: construct a hypercube with s values per dimension; find M, the max achievable reward in s steps, given L; return the average of 50 runs • Example: 2D hypercube • Only half the cells reachable if L = 1 • All locations reachable if L = 2
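The experiment loop on this slide, in minimal form. The distribution, cube sizes, and the reachability condition inside are assumptions carried over from the earlier slides, so this is a sketch of the procedure rather than the paper's exact setup.

```python
# Minimal L-movement experiment: build an s-per-dimension hypercube of
# sampled rewards, restrict to the cells reachable in s steps under
# movement L, and record the best reachable reward. Averaging over runs
# smooths the sampling noise.
import random
from itertools import product

def reachable(cell, s, L):
    # Reconstructed condition: per-agent steps fit in s rounds and the
    # total number of moves fits in s * L.
    return max(cell) <= s and sum(cell) <= s * L

def max_reachable_reward(s, L, n_agents=2,
                         draw=lambda: random.uniform(1, 200)):
    """Best reward among the cells reachable in s steps under movement L."""
    cube = {cfg: draw() for cfg in product(range(s), repeat=n_agents)}
    return max(r for cfg, r in cube.items() if reachable(cfg, s, L))

avg = sum(max_reachable_reward(10, 1) for _ in range(50)) / 50
```

With L equal to the number of agents every cell is reachable, matching the slide's 2D example; with L = 1 roughly half the 2D cube is cut off, which is what drives the gap between the curves on the next slides.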

  26. Restricting to L-Movement: Complete, L = 1 → 2 • Complete graph: k = 1 → L = 1; k = 2 → L = 2 [plot: average maximum reward discovered]

  27. Restricting to L-Movement: Ring, L = 2 → 3 • Ring graph: k = 1 → L = 2; k = 2 → L = 3 [plot: average maximum reward discovered]

  28. [plots: ring and complete graphs] • Uniform distribution of rewards • 4 agents • Different normal distributions

  29. k and L: 5-agent graphs • Increasing k changes L less in a ring than in a complete graph • The configuration hypercube gives an upper bound • Posit a consistent negative effect • Suggests why increasing k has different effects: larger improvement in the complete graph than the ring for increasing k

  30. L-Movement May Help Explain the Team Uncertainty Penalty • An algorithm with L = 2 can explore more of C than an algorithm with L = 1 • Independent of the exploration algorithm! Determined by k and graph structure • C is an upper bound – posit a constant negative effect • Any algorithm experiences diminishing returns as k increases • Consistent with DCOP results • L-Movement differs between k = 1 and k = 2 algorithms • Larger difference in graphs with more agents • For k = 1, L = 1 for a complete graph • For k = 1, L increases with the number of vertices in a ring graph

  31. Thank you • Towards a Theoretic Understanding of DCEE • Scott Alfeld, Matthew E. Taylor, Prateek Tandon, and Milind Tambe • http://teamcore.usc.edu
