
Increasing Security through Communication and Policy Randomization in Multiagent Systems


Presentation Transcript


  1. Increasing Security through Communication and Policy Randomization in Multiagent Systems Praveen Paruchuri, Milind Tambe, Fernando Ordonez (University of Southern California); Sarit Kraus (Bar-Ilan University, Israel, and University of Maryland, College Park)

  2. Motivation: The Prediction Game • A UAV (Unmanned Aerial Vehicle) flies between 4 regions • Can you predict the UAV's flight pattern? • Pattern 1: 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, … • Pattern 2: 1, 4, 3, 1, 1, 4, 2, 4, 2, 3, 4, 3, … (as generated by a four-sided die) • Could you predict pattern 2 even if its first 100 numbers were given? • Randomization decreases predictability and thereby increases security (see the sketch below)
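To make the "prediction game" concrete, here is a minimal sketch (not from the original slides; the two patterns follow slide 2) that estimates how predictable the next region is under each pattern using empirical entropy:

```python
import math
import random
from collections import Counter

def empirical_entropy(sequence):
    """Shannon entropy (bits) of the empirical distribution over symbols."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Pattern 1: deterministic round-robin over regions 1..4
pattern1 = [(i % 4) + 1 for i in range(1000)]

# Pattern 2: each step drawn uniformly at random, like rolling a four-sided die
pattern2 = [random.randint(1, 4) for _ in range(1000)]

# Both patterns visit each region ~25% of the time, so compare instead the
# conditional distribution of the *next* region given the current one:
def next_region_entropy(seq):
    transitions = {}
    for cur, nxt in zip(seq, seq[1:]):
        transitions.setdefault(cur, []).append(nxt)
    return {cur: empirical_entropy(nxts) for cur, nxts in transitions.items()}

print(next_region_entropy(pattern1))  # ~0 bits: the next region is fully predictable
print(next_region_entropy(pattern2))  # ~2 bits: the next region is maximally uncertain
```

Pattern 1 carries no uncertainty about the next region, while pattern 2 carries the maximum two bits for four regions; that predictability gap is exactly what the talk exploits for security.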

  3. Problem Definition • Problem: increase security by decreasing predictability for an agent team acting in an adversarial environment • Even if the policy is given to the adversary, it remains secure • The environment is stochastic and observable (MDP-based) • Communication is a limited resource • Goal: efficient algorithms for the reward/randomization/communication tradeoff

  4. Assumptions • Assumptions for the agent team: • The adversary is unobservable • The adversary's actions, capabilities, and payoffs are unknown • Communication is encrypted (safe) • Assumptions for the adversary: • Knows the agents' plan/policy • Exploits action predictability • Can observe the agents' state

  5. Solution Technique • Technique developed: intentional policy randomization • CMDP-based framework (CMDP = Constrained Markov Decision Process): • Sequential decision making • Limited communication resources • Increase security => solve a multi-criteria problem for the agents: • Maximize action unpredictability (policy randomization) • Keep reward above a threshold (quality constraints) • Keep communication usage below a threshold (resource constraints)

  6. Domains • Scheduled activities at airports, such as security checks, refueling, etc. • These can be observed by adversaries • Randomization of schedules is helpful • A UAV team patrolling a humanitarian mission • An adversary may disrupt the mission: disrupt food delivery, harm refugees, shoot down UAVs, etc. • Randomize the UAV patrol policy

  7. Our Contributions • Randomized policies for multiagent CMDPs (MCMDPs): maximize policy randomization subject to expected team reward > threshold and communication resource usage < threshold • Solving miscoordination: naively randomized policies in team settings may not be implementable, because the reward constraint gets violated

  8. Miscoordination: Effect of Randomization • Example: a meeting tomorrow, with each agent independently choosing 9am with probability 40% and 10am with probability 60% • If the agents pick different times, the reward that should have been earned becomes 0, violating the reward threshold • The agents must communicate to coordinate, but communication is limited (see the worked example below)
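A small worked example of why independent randomization breaks coordination; the 40%/60% probabilities come from slide 8, while the reward values are assumptions for illustration:

```python
# Each agent independently picks 9am with probability 0.4 and 10am with 0.6.
p_9, p_10 = 0.4, 0.6

p_coordinate = p_9 * p_9 + p_10 * p_10  # both pick the same time
p_miscoordinate = 1.0 - p_coordinate    # they pick different times

print(p_coordinate)     # 0.52
print(p_miscoordinate)  # 0.48

# If a miscoordinated meeting is worth 0 and a coordinated one is worth R,
# the team only gets 0.52 * R in expectation -- randomizing independently
# can push the expected reward below the required threshold.
```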

  9. Communication Issue • Generate randomized, implementable policies under limited communication • The communication problem: M coordination points but only N units of communication • Generate the best communication policy (the communication policy can itself be randomized) • Approach: transform the MCMDP into an implementable MCMDP, then solve the transformed MCMDP with a dedicated algorithm

  10. MCMDP: Formally Defined • An MCMDP (for a 2-agent case) is a tuple <S, A, P, R, C1, C2, T1, T2, N, Q> where: • S, A, R – joint states, actions, rewards • P – transition function • Ck – cost vector for resource k • Tk – threshold on the expected consumption of resource k • N – joint communication cost vector • Q – threshold on communication costs • Basic terms used: • x(s,a): expected number of times action a is taken in state s • Policy (as a function of x): pi(s,a) = x(s,a) / Σ_a' x(s,a')
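A minimal data-structure sketch of that tuple for the two-agent case, with the policy recovered from the flow variables x(s,a); the names and types here are illustrative assumptions, not the paper's notation:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

State = str                    # joint state label
JointAction = Tuple[str, str]  # (agent A action, agent B action)

@dataclass
class MCMDP:
    """Multiagent Constrained MDP <S, A, P, R, C, T, N, Q> for a 2-agent team."""
    S: List[State]
    A: List[JointAction]
    P: Dict[Tuple[State, JointAction, State], float]  # transition probabilities
    R: Dict[Tuple[State, JointAction], float]         # joint reward
    C: List[Dict[Tuple[State, JointAction], float]]   # cost vector for each resource k
    T: List[float]                                     # threshold for each resource k
    N: Dict[Tuple[State, JointAction], float]          # joint communication cost
    Q: float                                           # communication threshold

def policy_from_flow(x: Dict[Tuple[State, JointAction], float], s: State,
                     actions: List[JointAction]) -> Dict[JointAction, float]:
    """pi(s, a) = x(s, a) / sum_a' x(s, a'), where x(s, a) is the expected
    number of times joint action a is taken in state s."""
    total = sum(x.get((s, a), 0.0) for a in actions)
    return {a: x.get((s, a), 0.0) / total for a in actions} if total > 0 else {}
```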

  11. Entropy: A Measure of Randomness • Randomness (information content) is quantified using entropy (Shannon, 1948) • Entropy for a CMDP is defined over the randomized action choices pi(s,a) derived from x • Additive entropy – add the entropies of each state • Weighted entropy – weigh each state's entropy by its contribution to the total flow, where alpha_j is the initial flow into state j (see the sketch below)
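A sketch of the two entropy measures as I read the slide's definitions; the weighting by the total initial flow is an assumed interpretation of "contribution to total flow", not the paper's exact formula:

```python
import math
from typing import Dict, Tuple

def state_entropy(x: Dict[Tuple[str, str], float], s: str) -> float:
    """Entropy of the randomized action choice in state s, where
    pi(s, a) = x(s, a) / sum_a' x(s, a')."""
    flows = [v for (state, _), v in x.items() if state == s and v > 0]
    total = sum(flows)
    if total == 0:
        return 0.0
    return -sum((v / total) * math.log2(v / total) for v in flows)

def additive_entropy(x):
    """Additive entropy: add up the entropies of each state."""
    states = {s for (s, _) in x}
    return sum(state_entropy(x, s) for s in states)

def weighted_entropy(x, alpha: Dict[str, float]):
    """Weighted entropy: weigh each state's entropy by its share of the total
    flow; alpha[j] is the initial flow into state j (assumed interpretation)."""
    states = {s for (s, _) in x}
    total_initial_flow = sum(alpha.values())
    return sum((sum(v for (s2, _), v in x.items() if s2 == s) / total_initial_flow)
               * state_entropy(x, s)
               for s in states)
```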

  12. Issue 1: Randomized Policy Generation • Non-linear program: maximize entropy subject to reward above its threshold and communication cost below its threshold (sketched below) • Obtains the required randomization • But it attaches communication to every action • Issue 2: generate the communication policy explicitly
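A toy sketch of such a non-linear program for a single state with three actions, using scipy.optimize; the rewards, costs, and thresholds are invented for illustration, and the real formulation optimizes over the flow variables x(s,a) of the whole MCMDP:

```python
import numpy as np
from scipy.optimize import minimize

r = np.array([10.0, 6.0, 4.0])   # expected reward of each action (assumed)
n = np.array([2.0, 1.0, 0.0])    # communication cost of each action (assumed)
reward_threshold = 7.0           # expected reward must stay above this
comm_threshold = 1.2             # expected communication must stay below this

def neg_entropy(p):
    """Minimizing negative entropy = maximizing the policy's entropy."""
    p = np.clip(p, 1e-12, 1.0)
    return float(np.sum(p * np.log(p)))

constraints = [
    {"type": "eq",   "fun": lambda p: np.sum(p) - 1.0},           # valid distribution
    {"type": "ineq", "fun": lambda p: p @ r - reward_threshold},  # reward >= threshold
    {"type": "ineq", "fun": lambda p: comm_threshold - p @ n},    # communication <= threshold
]

result = minimize(neg_entropy, x0=np.array([0.4, 0.3, 0.3]),
                  bounds=[(0.0, 1.0)] * 3, constraints=constraints, method="SLSQP")
print(result.x)  # the most random policy that still meets both thresholds
```

With these numbers the solver trades some entropy away from the uniform distribution to keep the expected reward above the threshold while respecting the communication budget.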

  13. Issue 2: Transformed MCMDP • [Slide figure: original state S1 expands into intermediate states for agent A's actions a1/a2, each with a communicate (C) and a no-communicate branch, followed by agent B's actions b1/b2] • For each state and each joint action, introduce a communicate (C) and a no-communicate (NC) version of agent A's individual action and add the corresponding new states • Add transitions between the original state and the new states • Add transitions between the new states and the original target states
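A rough sketch of the state-space part of that transformation as I interpret the slide's diagram; the state and action names (S1, a1, b1, C/NC) mirror the figure labels, but the exact bookkeeping is an assumption:

```python
def transform_state_space(states, actions_a, actions_b):
    """Sketch of the transformed MCMDP's extra states and transitions.

    For every original state s and every action a of agent A, add two states:
      (s, a, 'C')  -- A communicated its choice, so the state is observable to B
      (s, a, 'NC') -- A stayed silent, so B cannot distinguish these states
    Agent B then chooses its own action b from the intermediate state.
    """
    new_states = []
    transitions = []  # (from, label, to); the final 'to' entry is a placeholder
                      # for the original target states reached under P(s, (a, b), .)
    for s in states:
        for a in actions_a:
            for comm in ("C", "NC"):
                mid = (s, a, comm)
                new_states.append(mid)
                transitions.append((s, (a, comm), mid))
                for b in actions_b:
                    transitions.append((mid, b, ("targets of", s, (a, b))))
    return new_states, transitions

# Example mirroring the slide's figure: state S1, A's actions a1/a2, B's b1/b2
new_states, transitions = transform_state_space(["S1"], ["a1", "a2"], ["b1", "b2"])
print(new_states)  # [('S1','a1','C'), ('S1','a1','NC'), ('S1','a2','C'), ('S1','a2','NC')]
```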

  14. Non-linear Constraints • Non-linear constraints must be introduced • For each original state, and for each new state introduced by a no-communication action, the conditional probabilities of agent B's corresponding actions must be equal • Example (the specific state names are lost in the transcript): P(b1 | state reached by a communication action, observable to B) = P(b1 | state reached by a no-communication action, unobservable to B), and likewise for b2
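Written over the flow variables x, such a constraint could take the following form (a hedged reconstruction; s' and s'' denote two of the new states whose action distributions for agent B must agree, and b ranges over B's actions):

```latex
\frac{x(s', b)}{\sum_{b'} x(s', b')} \;=\; \frac{x(s'', b)}{\sum_{b''} x(s'', b'')}
\qquad \text{for every action } b \text{ of agent B.}
```

Because each side is a ratio of flow variables, the constraint is non-linear in x, which is why the resulting program is a non-linear program rather than a linear one.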

  15. Non-linear Constraints: Handling Miscoordination • If A takes a no-communication (NC) action, agent B has no hint of which state it is in • So B's actions must be made independent of the source state: the probability of action b1 from one such indistinguishable state must equal the probability of b1 from the other • Meeting scenario: irrespective of agent A's plan, if agent B's plan is 20% 9am and 80% 10am, then B's behavior is independent of A • Miscoordination is avoided because actions are independent of the (unobserved) state

  16. Experimental Results • [3-D plots shown on the slide; axis labels are not recoverable from the transcript]

  17. Experimental Conclusions • As the reward threshold decreases, entropy increases • As communication increases, the agents coordinate better • This coordination is invisible to the adversary • The agents coordinate to fool the adversary • Increased communication => higher entropy!

  18. Summary • Randomized policies in multiagent MDP settings • Developed a non-linear program (NLP) to maximize weighted entropy under reward and communication constraints • Provided a transformation algorithm to reason explicitly about communication actions • Showed that communication increases security

  19. Thank You Any Questions ???
