Zahy Bnaya and Ariel Felner, ISE Department, Ben-Gurion University.

A Macro action in which agent l moves forward along path Ii until reaching target; if a blocked edge is encountered, the agent returns along the path and stops at source. An action in which agent Al moves forward Along path Ii until reaching (without crossing) ei,j , or a blocked edge, whichever occurs first. In either case the agent returns to source. Followers-Committing policy – A policy where A0 uses only TRY actions and whenever A0 reaches t through path Ii, for some i < k, the agents A1-A(n-1) traverse Ii as well. the action TRY (0,i) executed in a followers-committing policy. The following order on the k paths. I<* j iff Where Qi is the probability of path Ii being unblocked; and E[TRYn(Ii)] is the expected cost of action TRYn(Ii). The followers-committing policy where A0 executes the committing policy of trying the paths by increasing order of <* and A1-An-1 follow A0. s a d(s) d(x) x Lemma: Let П be a followers-committing policy for M. Then cost(П*M) ≤ cost(П) Theorem: П*M is an optimal policy for M. e y t Zahy Bnaya and Ariel Felner, ISE Department, Ben-Gurion University. Dror Fried, Eyal Solomon Shimony and Olga Maksin, CS department, Ben-Gurion University Experimental Results Canadian Traveler Problem The Canadian Traveler Problem (CTP) is a shortest path navigation problem on a given graph G=(V,E). Each edge e E, might be blocked with a known probability p(e). Otherwise, edge e is traversable. The exact status (traversable/blocked) of each edge is known only upon reaching an adjacent vertex. Objective: Navigating to the goal while minimizing the total travel cost. Finding the optimal policy is known to be PSPACE-complete (C. Papadimitriou and M. Yannakakis) CTP can be modeled as a POMDP but has an exponential state space. Multi-Agent CTP Several (n) agents operate in the given graph. Agents are fully cooperative and aim to minimize their total travel cost. Agents share their knowledge about edges. Repeated task CTP CTP-REP(n) Only one agent is active at a time. All other agents are inactive until currently active agent reaches the target. Example: Daily delivery service on a partially known environment. • Theoretical Results • There are 4 different ways to generalize UCT to work for repeated • CTP depending on the following attributes: • Behavior of consecutive agents • Followers – Follow the shortest-path revealed by the first agent. • Reasoning – Activates a CTP-UCT on the currently known graph • 2) Considering consecutive agents. • Inconsiderate agents – Each agent searches for a policy for himself • Considerate agents – the agent considers the remaining n-1 agents and their expected travel cost to the goal. This can be done by introducing an estimate Fk(бi) of the cost consecutive agents incurs. • So the modified UCT rule is: Executed On Delaunay graphs with 20 edges and 8 unknowns (100x100) Execution time in minutes C-blind is a lower bound that uses only the known edges of the graph to find a shortest path and then follows it. Any unknown edges revealed while traversing, are ignored. Agent is located in s d(x) – Best bypass from x d(s) – Best bypass from s Example Travel cost results UCT for Repeated-Task CTP Summary and Conclusions • Repeated task CTP is introduced, and an optimal followers policy for the special case of disjoint-path graphs is defined and proved. • Follower policy is very strong as shown empirically. • Future Work • Experiment CTP-REP(n) on other environments such as grids and real maps. • Understand better under what circumstances each of the versions performs better. UCTR(б i) = UCT(б i)+(n-1)Fk(бi) Should agents explore more for the benefit of others?

A Macro action in which agent l moves forward along path Ii until reaching target; if a blocked edge is encountered, the agent returns along the path and stops at source. An action in which agent Al moves forward Along path Ii until reaching (without crossing) ei,j , or a blocked edge, whichever occurs first. In either case the agent returns to source. Followers-Committing policy – A policy where A0 uses only TRY actions and whenever A0 reaches t through path Ii, for some i < k, the agents A1-A(n-1) traverse Ii as well. the action TRY (0,i) executed in a followers-committing policy. The following order on the k paths. I<* j iff Where Qi is the probability of path Ii being unblocked; and E[TRYn(Ii)] is the expected cost of action TRYn(Ii). The followers-committing policy where A0 executes the committing policy of trying the paths by increasing order of <* and A1-An-1 follow A0. Lemma: Let П be a followers-committing policy for M. Then cost(П*M) ≤ cost(П) Theorem: П*M is an optimal policy for M.

Experimental Results Delaunay graphs with 50 vertices (100x100) Sensing cost - Constant Sensing cost - Distance

Theoretic Results Disjoint graph without sensing Lemma 1 The optimal committing travel policy is to try the paths in non-decreasing order of the value Di Disjoint graph without sensing Lemma2 For disjoint paths, there exists a committing policy, that is optimal among all policies. Theorem 1 The optimal travel policy for disjoint paths is to TRYthe paths in order of non-decreasing Di. Lemma 1 The optimal committing travel policy is to try the paths in non-decreasing order of the value Di Weight of partial path VOI policies Expected “backtrack” cost committing policy: once traversal along any path i begins, it commits the agent to continue along path i until either the target t or a blocked edge is reached committing policy: once traversal along any path i begins, it commits the agent to continue along path i until either the target t or a blocked edge is reached Path1 Path1 Pathi Pathi S S T T Disjoint graph with sensing Lemma 3 - An optimal sensing policy is to order the sensing actions in non-increasing order of Theorem 3 For a set of sensing operations to be made along a single disjoint path, the optimal sensing policy is to perform the sensing operations in non-increasing order of Pathn Pathn n! Committing policies for n-disjoint path graph n! Committing policies for n-disjoint path graph Weight of partial path Expected “backtrack” cost CTP With Sensing On General Graph FSSNscheme (Free space assumption sensing based navigation) * Unknown edges are assumed traversable * Pluggable policies Disjoint graph with sensing Lemma 3 - An optimal sensing policy is to order the sensing actions in non-increasing order of Theorem 3 For a set of sensing operations to be made along a single disjoint path, the optimal sensing policy is to perform the sensing operations in non-increasing order of Brute Force Policies BFAS – Always senses an edge BFNS – Never senses an edge Expected cost of sense Expected cost of sense Expected cost of not sense Expected cost of not sense Value of information Value of information Sense if condition is met Sense if condition is met EXP FSSN Single step VOI Generates sample graphs and averages costs for each of the four scenarios CTP With Sensing On General Graph FSSN scheme (Free space assumption sensing based navigation) * Unknown edges are assumed traversable * Pluggable policies Policies BFAS – Always sense an edge BFNS – Never sense an edge VOI policies EXP FSSN Single step VOI Generates sample graphs and averages costs for each of the four scenarios

CTP With Sensing On General Graph FSSN scheme (Free space assumption sensing based navigation) * Unknown edges are assumed traversable * Pluggable policies Policies BFAS – Always sense an edge BFNS – Never sense an edge VOI policies EXP Disjoint graph without sensing Lemma 1 The optimal committing travel policy is to try the paths in non-decreasing order of the value Di committing policy: once traversal along any path i begins, it commits the agent to continue along path i until either the target t or a blocked edge is reached Path1 Pathi S T Pathn n! Committing policies for n-disjoint path graph Weight of partial path FSSN Single step VOI Generates sample graphs and averages costs for each of the four scenarios Expected “backtrack” cost Disjoint graph with sensing Lemma 3 - An optimal sensing policy is to order the sensing actions in non-increasing order of Theorem 3 For a set of sensing operations to be made along a single disjoint path, the optimal sensing policy is to perform the sensing operations in non-increasing order of Expected cost of sense Expected cost of not sense Value of information Sense if condition is met Lemma 2 For disjoint paths, there exists a committing policy, that is optimal among all policies. Theorem 1 The optimal travel policy for disjoint paths is to TRYthe paths in order of non-decreasing Di. Considers the state of a single edge

? Shortest path S Longer path U Blocked edge might affect travel cost M P G ? N committing policy: once traversal along any path i begins, it commits the agent to continue along path i until either the target t or a blocked edge is reached Path1 Pathi S T Pathn n! Committing policies for n-disjoint path graph Weight of partial path Expected “backtrack” cost

s a d(v) d(x) x e y t

Zahy Bnaya and Ariel Felner, ISE Department, Ben-Gurion University.