Sequential Decision Making in Repeated Coalition Formation under Uncertainty
Georgios Chalkiadakis (School of Electronics and Computer Science, University of Southampton, Southampton, United Kingdom) and Craig Boutilier (Department of Computer Science, University of Toronto, Toronto, Canada)

Coalition Formation
Agents have to:
• decide who to join
• decide how to act
• decide how to share the coalitional value / utility
(Figure: an example coalition structure CS = {C0, C1, C2} over agents p1-p3, c1-c3, e1-e3, with coalition C0 = {p1, c2, e1}; action vector a = <aC0, aC1, aC2>; coalitional value u(C0 | aC0) = 30; allocation <p1 = 12, c2 = 3, e1 = 15>.)

Type Uncertainty: It Matters!
• Agents have their own beliefs about the types (capabilities) of others.
• Beliefs are over types; types reflect capabilities (private information).
• Type uncertainty is then translated to value uncertainty: according to i, what is the value (quality) of <C, a>?
+ Action-related uncertainty
+ Action outcomes are stochastic
+ No superadditivity assumptions

Reasoning under Type Uncertainty
"I believe that some guys are better than my current partners... but is there any possible coalition that can guarantee me a higher payoff share?"

A Bayesian Coalition Formation Model
• N agents; each agent i has a type t ∈ Ti
• Set of type profiles: T = T1 × … × TN
• For any coalition C, agent i has beliefs about the types of the members of C
• Coalitional actions (i.e., choice of task) for C: AC
• An action's outcome s ∈ S depends stochastically on the actual members' types, with probability Pr(s | aC, tC)
• Each s results in some reward R(s)
• Each i has a (possibly different) estimate of the value of any coalition C

Optimal Repeated Coalition Formation
Belief-state MDP formulation to address the induced exploration-exploitation problem:
• Takes into account the immediate reward from forming a coalition and executing an action
• Takes into account the long-term impact of a coalitional agreement (i.e., the value of information, through belief-state updating and incorporation of the belief-state value into the calculations)
i.e., the equations account for the sequential value of coalitional agreements.

Approximation Algorithms
• One-step lookahead (OSLA): performs a one-step lookahead in belief space
• VPI exploration: estimates the Value of Perfect Information regarding coalitional agreements (see the sketch after this list)
  • Balances the expected gain against the expected cost of executing a suboptimal action
  • Uses the current model to myopically evaluate actions' EU
  • Assumes an action results in perfect information regarding its Q-value
  • This perfect information has non-zero value only if it results in a change of policy
  • EVPI is calculated and accounted for in action selection (act greedily towards EU + EVPI)
  • a) Bayesian, and yet b) efficient: uses myopic evaluation of actions, but boosts their desirability by EVPI estimates
• VPI-over-OSLA: combines VPI with OSLA
• Maximum a Posteriori (MAP): uses the most likely type vector given beliefs
• Myopic: calculates expectations disregarding the sequential value of formation decisions
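The EU + EVPI selection rule can be made concrete with a small sketch. This is not the authors' implementation: purely for illustration, it assumes the agent keeps a Gaussian belief (mean and standard deviation) over the long-term value of each candidate coalitional agreement and applies the standard value-of-perfect-information gain formulas; the names Agreement and vpi_select are hypothetical.

```python
import math
from dataclasses import dataclass

def _phi(z):   # standard normal pdf
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _Phi(z):   # standard normal cdf
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

@dataclass
class Agreement:
    """A candidate coalitional agreement <C, aC> with a Gaussian belief
    over its value (mean = current expected-utility estimate)."""
    name: str
    mu: float      # estimated expected utility (EU)
    sigma: float   # uncertainty (std dev) of that estimate

def expected_gain_above(theta, mu, sigma):
    """E[max(X - theta, 0)] for X ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(mu - theta, 0.0)
    z = (mu - theta) / sigma
    return (mu - theta) * _Phi(z) + sigma * _phi(z)

def expected_gain_below(theta, mu, sigma):
    """E[max(theta - X, 0)] for X ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(theta - mu, 0.0)
    z = (theta - mu) / sigma
    return (theta - mu) * _Phi(z) + sigma * _phi(z)

def vpi_select(agreements):
    """Pick the agreement maximising EU + EVPI (assumes >= 2 candidates).

    Perfect information about the currently best agreement only matters if
    its true value falls below the second-best estimate; information about
    any other agreement only matters if its true value exceeds the best
    estimate. In either case the policy changes, so the information has value."""
    ranked = sorted(agreements, key=lambda ag: ag.mu, reverse=True)
    best, second = ranked[0], ranked[1]
    scored = []
    for ag in agreements:
        if ag is best:
            evpi = expected_gain_below(second.mu, ag.mu, ag.sigma)
        else:
            evpi = expected_gain_above(best.mu, ag.mu, ag.sigma)
        scored.append((ag.mu + evpi, ag))
    return max(scored, key=lambda pair: pair[0])[1]

# Toy example with hypothetical numbers:
candidates = [
    Agreement("stay with C0", mu=30.0, sigma=1.0),
    Agreement("join C1",      mu=28.0, sigma=8.0),
    Agreement("join C2",      mu=20.0, sigma=2.0),
]
print(vpi_select(candidates).name)   # -> "join C1"
```

In this toy run, "join C1" is selected even though its expected utility is slightly lower than staying with C0: its high uncertainty gives it a large EVPI, which is exactly the "boosted desirability" described above.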
Example experiment: The Good, the Bad, and the Ugly
(Figures: discounted accumulated rewards, and total actual rewards gathered during the "Big Crime" phase. VPI is a winner!)
• VPI consistently outperforms the other approximation algorithms
• It scales to dozens / hundreds of agents, unlike the lookahead approaches

Ongoing and Future Work
• We have recast RL algorithms / sequential decision-making ideas within a computational trust framework (we beat the winner of the international ART competition!). Paper in this AAMAS: W. T. L. Teacy, Georgios Chalkiadakis, A. Rogers and N. R. Jennings, "Sequential Decision Making with Untrustworthy Service Providers".
• We have also applied VPI in a disaster management setting.
• We are investigating overlapping coalition formation models.
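As a closing note, the belief-state updating that the MDP formulation above relies on can be illustrated with a minimal Bayesian update over partners' type profiles. This is an illustrative sketch only, assuming a discrete set of candidate type profiles and a known outcome model Pr(s | aC, tC); the function names and numbers are hypothetical.

```python
def update_type_beliefs(beliefs, outcome, action, outcome_model):
    """Posterior over partners' type profiles after coalition C executes
    `action` and `outcome` is observed.

    beliefs:       dict mapping candidate type profile t_C -> prior Pr(t_C)
    outcome_model: function (outcome, action, t_C) -> Pr(outcome | action, t_C)
    """
    posterior = {t_C: p * outcome_model(outcome, action, t_C)
                 for t_C, p in beliefs.items()}
    z = sum(posterior.values())
    if z == 0.0:
        return dict(beliefs)  # outcome impossible under every profile; keep the prior
    return {t_C: p / z for t_C, p in posterior.items()}


def outcome_model(s, a, t):
    # Pr(outcome | action, type): "good" partners succeed more often (illustrative numbers).
    p_success = {"good": 0.8, "bad": 0.3}[t]
    return p_success if s == "success" else 1.0 - p_success


# Toy usage: two candidate type profiles for a partner, updated after a success.
priors = {"good": 0.5, "bad": 0.5}
print(update_type_beliefs(priors, "success", "task1", outcome_model))
# -> roughly {'good': 0.727, 'bad': 0.273}
```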
