
Security in Multiagent Systems by Policy Randomization


Presentation Transcript


  1. Security in Multiagent Systems by Policy Randomization Praveen Paruchuri, Milind Tambe, Fernando Ordonez (University of Southern California); Sarit Kraus (Bar-Ilan University, Israel, and University of Maryland, College Park)

  2. Motivation: The Prediction Game • A UAV (Unmanned Aerial Vehicle) flies between 4 regions • Can you predict the UAV's flight pattern? • Pattern 1: 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, … • Pattern 2: 1, 4, 3, 1, 1, 4, 2, 4, 2, 3, 4, 3, … (as generated by a 4-sided die) • Could you predict Pattern 2 even if given 100 of its numbers? • Randomization decreases predictability and thereby increases security

  3. Problem Definition • Problem: Increase security by decreasing predictability for an agent/agent team acting in uncertain adversarial environments • The policy remains secure even if it is given to the adversary • Efficient algorithms for the reward/randomness tradeoff • Assumptions for the agent/agent team: • The adversary is unobservable • The adversary's actions, capabilities, and payoffs are unknown • Assumptions for the adversary: • Knows the agents' plan/policy • Exploits the action predictability • Can see the agent's state (or belief state)

  4. Solution Technique • Technique developed: intentional policy randomization in an MDP/POMDP framework • Sequential decision making: MDP = Markov Decision Process, POMDP = Partially Observable MDP • Increasing security => solving a multi-criteria problem for the agents: • Maximize action unpredictability (policy randomization) • Maintain reward above a threshold (quality constraints)

  5. Domains • Scheduled activities at airports, e.g., security checks, refueling • Observable by anyone • Randomization of schedules is helpful • UAV/UAV team patrolling a humanitarian mission • An adversary disrupting the mission can cut off food supplies, harm refugees, shoot down UAVs, etc. • Randomize the UAV patrol policy

  6. My Contributions • Two main contributions • Single-agent case: • Formulate as a nonlinear program with an entropy-based metric • Convert to a linear program called BRLP (Binary Search for Randomization LP) • Randomize single-agent policies while keeping reward > threshold • Multi-agent case: RDR (Rolling Down Randomization) • Randomized policies for decentralized POMDPs • Threshold on the team reward

  7. MDP-based single-agent case • An MDP is a tuple <S, A, P, R> • S – set of states • A – set of actions • P – transition function • R – reward function • Basic terms used: • x(s,a): expected number of times action a is taken in state s • Policy (as a function of the MDP flows): reconstructed below
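The policy-from-flows formula on this slide did not survive the transcript; the standard reconstruction from the definition of x(s,a) above is the flow out of each state, normalized per state:

```latex
\pi(s,a) = \frac{x(s,a)}{\sum_{a' \in A} x(s,a')}
```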

  8. Entropy: Measure of Randomness • Randomness or information content: entropy (Shannon 1948) • Entropy for an MDP: • Additive entropy – add the entropies of each state (π is a function of x) • Weighted entropy – weigh each state's entropy by its contribution to the total flow, where alpha_j is the initial flow of the system • Both are reconstructed below
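The formulas themselves were dropped from the transcript; one plausible reconstruction, consistent with the descriptions above (additive: sum the per-state entropies; weighted: scale each state's entropy by that state's share of the total flow, with alpha_j the initial flows), is:

```latex
H_A(x) = -\sum_{s \in S} \sum_{a \in A} \pi(s,a) \log \pi(s,a)

H_W(x) = -\sum_{s \in S} \left( \frac{\sum_{a \in A} x(s,a)}{\sum_{j} \alpha_j} \right) \sum_{a \in A} \pi(s,a) \log \pi(s,a)
```

with π(s,a) obtained from the flows x as on the previous slide.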

  9. Tradeoff: Reward vs. Entropy • Nonlinear program: maximize entropy subject to reward above a threshold (sketched below) • The objective (entropy) is nonlinear in the flows • BRLP (Binary Search for Randomization LP): • A linear program • Avoids evaluating the nonlinear entropy objective directly, since entropy is induced as a function of the flows
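A sketch of the nonlinear program described here, assuming the weighted entropy H_W(x) from the previous slide, flow-conservation constraints with initial flows alpha_j, and a reward threshold written E_min (a symbol introduced here for illustration):

```latex
\begin{aligned}
\max_{x \ge 0} \quad & H_W(x) \\
\text{s.t.} \quad & \sum_{a \in A} x(j,a) - \sum_{s \in S} \sum_{a \in A} P(s,a,j)\, x(s,a) = \alpha_j \quad \forall j \in S \\
& \sum_{s \in S} \sum_{a \in A} R(s,a)\, x(s,a) \ge E_{\min}
\end{aligned}
```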

  10. BRLP • Inputs: a high-entropy policy and a target reward (n% of the maximum reward) • Polynomial-time convergence • Monotonicity: entropy decreases (or stays constant) as the required reward increases • Control is exercised through the parameter β • The input can be any high-entropy policy • One such input is the uniform policy: equal probability for all actions out of every state

  11. LP for Binary Search • The policy is expressed as a function of the flows x and the parameter β • Linear program (sketched below)
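A sketch of the linear program solved for each fixed β during the binary search, assuming x̂ denotes the flows of the high-entropy input policy (e.g. the uniform policy); the lower bounds x(s,a) ≥ β x̂(s,a) are what inject randomness while the objective keeps the reward as high as possible:

```latex
\begin{aligned}
\max_{x} \quad & \sum_{s \in S} \sum_{a \in A} R(s,a)\, x(s,a) \\
\text{s.t.} \quad & \sum_{a \in A} x(j,a) - \sum_{s \in S} \sum_{a \in A} P(s,a,j)\, x(s,a) = \alpha_j \quad \forall j \in S \\
& x(s,a) \ge \beta\, \hat{x}(s,a) \quad \forall s \in S,\ a \in A
\end{aligned}
```

At β = 0 this reduces to the standard reward-maximizing LP; at β = 1 it reproduces the input policy.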

  12. BRLP in Action • β = 1: the max-entropy input policy • β = 0: the deterministic maximum-reward policy (entropy = 0) • Binary search starts at β = 0.5 and adjusts β until the target reward is reached
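A minimal Python sketch of this binary search, using scipy.optimize.linprog; the array shapes, the undiscounted flow-conservation form, and the function names are illustrative assumptions rather than the authors' implementation:

```python
import numpy as np
from scipy.optimize import linprog

def solve_beta_lp(R, P, alpha, x_hat, beta):
    """Reward-maximizing LP for a fixed beta.

    R: (S, A) rewards; P: (S, A, S) transition probabilities;
    alpha: (S,) initial flows; x_hat: (S, A) flows of the
    high-entropy input policy (e.g. uniform). The variables are
    the flows x(s, a), flattened to length S*A.
    """
    S, A = R.shape
    c = -R.flatten()                      # linprog minimizes, so negate
    A_eq = np.zeros((S, S * A))           # flow conservation per state j
    for j in range(S):
        for s in range(S):
            for a in range(A):
                A_eq[j, s * A + a] = (1.0 if s == j else 0.0) - P[s, a, j]
    bounds = [(beta * xh, None) for xh in x_hat.flatten()]  # x >= beta * x_hat
    res = linprog(c, A_eq=A_eq, b_eq=alpha, bounds=bounds, method="highs")
    return res.x.reshape(S, A), -res.fun  # flows and achieved expected reward

def brlp(R, P, alpha, x_hat, target_reward, tol=1e-4):
    """Binary search on beta until the LP reward meets the target."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        beta = 0.5 * (lo + hi)
        _, reward = solve_beta_lp(R, P, alpha, x_hat, beta)
        if reward >= target_reward:
            lo = beta        # reward still high enough: allow more randomness
        else:
            hi = beta        # too much randomness: back off
    return solve_beta_lp(R, P, alpha, x_hat, lo)
```

Because the input flows x_hat themselves satisfy flow conservation, the LP stays feasible for every β in [0, 1], which is what makes the binary search well defined.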

  13. Results (averaged over 10 MDPs) • Highest entropy: the expected-entropy method, with a 10% average gain over BRLP • Fastest: BRLP, with a 7-fold average speedup over the expected-entropy method

  14. Multi-Agent Case: Problem • Maximize entropy for agent teams subject to a reward threshold • For the agent team: • A decentralized POMDP framework is used • Agents know the initial joint belief state • No communication is possible between agents • For the adversary: • Knows the agents' policy • Exploits the action predictability • Can calculate the agents' belief state

  15. RDR: Rolling Down Randomization • Inputs: • The best (local or global) deterministic policy • The percentage of reward loss allowed • The d parameter – determines how many turns each agent gets • Ex: d = 0.5 => number of steps = 1/d = 2, i.e., each agent gets one turn (in the 2-agent case) • A single-agent MDP problem is solved at each step • On agent 1's turn: • Fix the policies of the other agents (agent 2) • Find a randomized policy that • Maximizes the joint entropy (w1 * Entropy(agent1) + w2 * Entropy(agent2)) • Maintains the joint reward above the threshold • (A sketch of the loop follows this slide)
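A high-level Python sketch of the loop just described; `randomize_single_agent` is a hypothetical problem-specific callable (it could, for instance, wrap a BRLP-style solve with the other agent's policy fixed) and is not a function named in the slides:

```python
def rdr(joint_policy, max_reward, reward_loss_fraction, d,
        randomize_single_agent, num_agents=2):
    """Rolling Down Randomization (sketch).

    joint_policy: list of per-agent policies, initialized to the best
        deterministic joint policy.
    reward_loss_fraction: total fraction of the maximum joint reward the
        team may give up (e.g. 0.2 for the 80% example on the next slide).
    d: fraction of that loss rolled down per step; 1/d steps in total.
    """
    steps = int(round(1.0 / d))
    threshold = max_reward
    for step in range(steps):
        agent = step % num_agents                          # agents take turns
        threshold -= d * reward_loss_fraction * max_reward  # lower the reward bar
        # Fix the other agents' policies and re-randomize this agent's policy
        # to maximize the weighted joint entropy while keeping the joint
        # reward above the current threshold.
        joint_policy[agent] = randomize_single_agent(
            agent, joint_policy, reward_threshold=threshold)
    return joint_policy
```

With max_reward = 1.0, reward_loss_fraction = 0.2 and d = 0.5, this reproduces the 90% / 80% schedule shown on the next slide.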

  16. RDR with d = 0.5 • Step 1 (agent 1's turn): maximize the joint entropy subject to joint reward > 90% of the maximum reward (reward rolled down to 90%) • Step 2 (agent 2's turn): maximize the joint entropy subject to joint reward > 80% of the maximum reward (reward rolled down to 80%)

  17. Experimental Results : Reward Threshold vs Weighted Entropy ( Averaged 10 instances )

  18. Summary • Intentional randomization as the main focus • Single-agent case: BRLP algorithm introduced • Multi-agent case: RDR algorithm introduced • A multi-criteria problem is solved that • Maximizes entropy • Maintains reward > threshold

  19. Thank You • Any comments/questions ??

  20. Difference between safety and security? • Security: the ability of the system to deal with threats that are intentionally caused by other intelligent agents and/or systems • Safety: the system's ability to deal with any other threats to its goals

  21. Probing Results : Single agent Case

  22. Probing Results : Multi agent Case

  23. Define POMDP • A POMDP is a tuple <S, A, P, Ω, O, R>, where • S – set of states • A – set of actions • P – transition function • Ω – set of observations • O – observation function: probability of an observation given the current state and the previous action • R – reward function • Because the state is not directly observable, the agent acts on a belief state (a probability distribution over S)

  24. Define Distributed POMDP • A Dec-POMDP is a tuple <S, A, P, Ω, O, R>, where • S – set of states • A – joint action set <a1, a2, …, an> • P – transition function • Ω – set of joint observations • O – observation function: probability of a joint observation given the current state and the previous joint action; the agents' observations are independent of each other • R – immediate joint reward • A DEC-MDP is a DEC-POMDP with the restriction that at each time step the agents' observations together uniquely determine the state
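A minimal Python container mirroring the Dec-POMDP tuple <S, A, P, Ω, O, R> described on this slide; the concrete Python types chosen here are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

JointAction = Tuple[str, ...]       # <a1, a2, ..., an>, one action per agent
JointObservation = Tuple[str, ...]  # one observation per agent

@dataclass
class DecPOMDP:
    states: Sequence[str]                                               # S
    joint_actions: Sequence[JointAction]                                # A
    transition: Callable[[str, JointAction, str], float]                # P(s' | s, a)
    joint_observations: Sequence[JointObservation]                      # Omega
    observation: Callable[[JointObservation, JointAction, str], float]  # O(o | a, s')
    reward: Callable[[str, JointAction], float]                         # R(s, a)
```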

  25. Counterexample: Entropy • Suppose the adversary tries to shoot down the UAV • It therefore targets the most probable action; the probability of that action is called the hit rate • Assume the UAV has 3 actions and consider 2 possible probability distributions • H(1/2, 1/2, 0) = 1 (log base 2) • H(1/2 - delta, 1/4 + delta, 1/4) ≈ 3/2 • Entropy = 3/2, hit rate = 1/2 - delta • Entropy = 1, hit rate = 1/2 • Higher entropy, but the hit rate is only lower by delta
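As a quick check of the arithmetic on this slide, a short Python snippet (the value of delta is chosen arbitrarily for illustration):

```python
import math

def entropy(p):
    """Shannon entropy in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def hit_rate(p):
    """Probability of the adversary's best guess: the most likely action."""
    return max(p)

delta = 0.01
d1 = (0.5, 0.5, 0.0)
d2 = (0.5 - delta, 0.25 + delta, 0.25)

print(entropy(d1), hit_rate(d1))   # ~1.0 bit,  hit rate 0.5
print(entropy(d2), hit_rate(d2))   # ~1.5 bits, hit rate 0.5 - delta
```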

  26. d-parameter & Comments on Results • Effect of the d-parameter (average of 10 instances) • [Table: RDR average runtime in seconds and (entropy), T = 2] • Conclusions: • Greater tolerance of reward loss => higher entropy • Reaching maximum entropy is harder than in the single-agent case • A lower miscoordination cost implies higher entropy • A d-parameter of 0.5 is good for practical purposes

  27. Example where uniform policy is not best

  28. Entropies • For the uniform policy: 1 + ½ * 1 + 2 * ¼ * 1 + 4 * 1/8 * 1 = 2.5 • For a policy that is initially deterministic and uniform afterwards: 0 + 1 * 1 + 2 * ½ * 1 + 4 * ¼ * 1 = 3 • Hence, uniform policies need not always be optimal
