
### MURI Review MIT

Eric Feron

Joint Work with

Jan De Mot, Vishwesh Kulkarni, Sommer Gentry, Tom Schouwenaars, and Vladislav Gavrilets.

at the Laboratory for Information and Decision Systems, MIT.

How many UAVs are needed? We view the spatial distribution of the UAVs as a key factor and present original results concerning the UAV separations and the UAV placements.

Overview

- Efficient multi-agent operations require robust, optimal coordination policies.
- UAV specifications constrain deployable coordination policies.
- How can we improve our understanding of these constraints?
- How can we use that understanding to synthesize more efficient coordination policies?

Coordinated Path Planning (CPP)

- CPP Problem Setting
- UAVs need to go from a point s to a point t.
- Environment is dynamic and uncertain.
- UAVs cooperate by sharing the acquired local information.
- UAVs have limited resources.

GOAL: Optimize the traversal efficiency.

- Questions
- What is the spatial distribution under an optimal policy?
- We have characterized the separation bounds.
- How many UAVs are needed?
- We do not know the full answer yet!

Multi-Agent Exploration of Unknown Environments

- Probabilistic map building of Burgard et al [2002] uses deterministic value iteration to determine the next optimal observation point.
- The market architecture of Zlot et al [2002] auctions off the next optimal observation points obtained by solving a TSP.
- The end goal is spanning rather than CPP.

- CPP as Multi-Agent MDPs
- Multi-agent MDPs of Boutilier et al [2000]; in contrast, we consider partially observable MDPs.
- Greedy-policy pursuit-evasion games of Hespanha et al [2002].

(Figure: an agent exploring; known, unknown, and newly observed regions.)

We present new results in a coordinated target acquisition setting using DP.

Our CPP Problem

- Gray zones: obstacles
- Red zones: danger

- The terrain is mapped into regions with payoffs indicating, for example, the distance to a target t or potential threats.
- To each region is associated a node.
- Links connect the nodes, forming a graph.
- To each link is associated a link cost reflecting:
- the payoff of the goal region, and
- the traverse cost of the link.
- A cluster of agents (e.g. UAVs) needs to reach t from s.

Model of the Environment: Cylindrical Graph

- Cylindrical: reduces the size of the state space and therefore the computational complexity (eliminating boundaries).
- Infinitely long: target sits at infinity, reduces computational complexity.
- Graph Gm contains m horizontal arrays of nodes. The figure shows the cylinder cut open and flattened out.
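As a concrete sketch, the successor structure of such a cylindrical graph can be coded as follows. The three-successor pattern (one row down, straight, one row up, with the row index wrapping modulo m) is an illustrative assumption, since the slides do not fully specify the link structure.

```python
def cylinder_successors(node, m):
    """Successor nodes of node = (stage, row) on the cylindrical graph Gm.

    Illustrative assumption: each node links to three nodes in the next
    stage (one row down, straight ahead, one row up), and the row index
    wraps modulo m. The wrap-around is what makes the graph cylindrical,
    eliminating boundaries and shrinking the state space.
    """
    stage, row = node
    return [(stage + 1, (row + dr) % m) for dr in (-1, 0, 1)]

# On G7 the rows wrap modulo 7: from row 6, "up" leads back to row 0.
successors = cylinder_successors((0, 6), m=7)
```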

Observation Zones

- Each agent observes a set of links from its current location, the observation zone, using on board sensors.
- The observation zone used in the rest of this presentation is:
- Each agent observes the link costs of two links in the target direction,
- Each agent has no local information on any other link cost.
- Links not belonging to any observation zone are subject to a set of assumptions:
- their costs belong to the set {0, 0.5, …, 3}, and
- they are independent and identically distributed (i.i.d.).

- For example:
- two UAVs,
- each observes the cost of the two red links,
- on the black links, no additional local information is available.
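The assumptions on unobserved links can be sketched in code. The uniform weighting over {0, 0.5, …, 3} is an illustrative assumption, since the slides state only that the unobserved costs are i.i.d. on that set.

```python
import random

# The allowed cost values {0, 0.5, ..., 3} for unobserved links.
COST_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]

def sample_unobserved_costs(n, rng):
    """Draw n unobserved link costs i.i.d. from COST_VALUES.

    The uniform distribution is an illustrative assumption; the slides
    state only that the costs are i.i.d. on {0, 0.5, ..., 3}.
    """
    return [rng.choice(COST_VALUES) for _ in range(n)]

rng = random.Random(0)
costs = sample_unobserved_costs(1000, rng)
```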

Cooperating Agents

- Agents share information and have memory:
- What each agent observes is shared with all other agents.
- Once a link cost is known, it remains known.
- Therefore, the agents in the cluster cooperate.

Goal for the Agents

- Goal: find a path for each agent such that, under the assumptions stated, the expected discounted cost

E[ Σ_{k=0}^∞ a^k Σ_{i=1}^N ci,k ]

is minimized, where:

- N: number of agents
- a: problem discount factor (0 < a < 1)
- ci,k: cost of the link traversed by agent i leaving stage k
- E[.]: expected value over the unknown costs ci,k, given the initial position of the cluster

- Remark: the discount factor a gives less importance to future costs.
- In other words: find a path for each agent so that the whole cluster moves in the cheapest possible way, infinitely long, in the direction of the target.
- Remark: after each move, the cluster sits on the same vertical line, or stage; the agents move synchronously.
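To make the objective concrete, the discounted cluster cost can be evaluated over a finite number of stages (a sketch with invented numbers; truncating the infinite sum at K stages leaves a tail bounded by a^K / (1 - a) times the maximum stage cost).

```python
def discounted_cluster_cost(link_costs, a):
    """Discounted cost sum_k a**k * sum_i c[i][k] for an N-agent cluster.

    link_costs[i][k] is the cost of the link agent i traverses when
    leaving stage k; a is the discount factor, 0 < a < 1.
    """
    horizon = len(link_costs[0])
    return sum(a ** k * sum(agent[k] for agent in link_costs)
               for k in range(horizon))

# Two agents over three stages with a = 0.8 (invented costs):
# stage 0: 1.0 + 0.5, stage 1: 0.8 * (0.0 + 3.0), stage 2: 0.64 * (2.0 + 0.0)
cost = discounted_cluster_cost([[1.0, 0.0, 2.0], [0.5, 3.0, 0.0]], a=0.8)
```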

Dynamic Programming Formulation

- To formulate the problem as a discrete dynamic program, split the total cost (the cost to go from stage 0 to infinity, given the initial state x0) into two parts: the cost to go from stage 0 to stage 1, plus the cost to go from stage 1 to infinity, discounted by the factor a.

- x0 comprises:
- the initial cluster position, and
- the link costs of the links observed at stage 0, i.e. the initial stage.

- u0 stands for the decision the agents take at stage 0.
- p1 stands for the set of policies mk each agent uses at each stage in the future.
- A policy mk(xk) takes the current state and gives us the input to be applied.

Dynamic Programming Formulation

- Rewriting the split, with x1 denoting the state at stage 1 and g(x0, u0) the one-step cost of decision u0: J(x0) = min over u0 of E[ g(x0, u0) + a J(x1) ].
- Formulating this for a general stage k yields: J(xk) = min over uk of E[ g(xk, uk) + a J(xk+1) ].

Dynamic Programming Formulation

- Since the cost to go to infinity from stage k is equal, as a function of the state, to the cost to go from stage k+1, Bellman’s equation for an infinite-horizon discrete-system dynamic program is: J*(x) = min over u of E[ g(x, u) + a J*(xnext) ].
- Remark: the state is x = (s, a1, a2, b1, b2), where the separation s = 0 in case the two agents sit on one node, as in (i); in that case (a1, b1) = (a2, b2) = (a, b).
- In (ii), for both agents to go straight, xnext = (1, b1, b2, ·, ·), where the last two components denote link costs unknown at the current stage (and over which E[.] is taken in Bellman’s equation).

Value Iteration

- We need to find J*(x), the optimal cost of the discounted problem and the unique solution of Bellman’s equation (J*(x) is the optimal value function).
- A stationary policy m can then be computed as: m(x) = arg min over u of E[ g(x, u) + a J*(xnext) ].

(a stationary policy m is independent of the stage and is optimal iff, for all states x, m attains the minimum in Bellman’s eq.)

- How do we compute J*(x)?

With value iteration: a numerical iterative algorithm which, starting from arbitrary initial conditions, converges to J*(x).

Value Iteration

- The algorithm: Jk+1(x) = min over u of E[ g(x, u) + a Jk(xnext) ].
- This is the finite-horizon version of Bellman’s equation, with terminal cost J0 as initial condition (the index k has been reversed for notational convenience). It can be proven that as k → ∞, this yields J*.
- So: plug J0 into the right-hand side to get J1, plug that back in, and so on, infinitely many times.
- Properties of value iteration:
- After k iterations the error is bounded: ||Jk − J*|| ≤ a^k d, where d is a constant.
- Each iterate also brackets J* (the standard value-iteration bounds): Jk(x) + ck ≤ J*(x) ≤ Jk(x) + Ck, where ck = (a / (1 − a)) min over x of [Jk(x) − Jk−1(x)] is the lower-bound correction and Ck = (a / (1 − a)) max over x of [Jk(x) − Jk−1(x)] is the upper-bound correction.
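Value iteration and its bracketing bounds can be sketched on a toy two-state, two-action problem. All states, actions, costs, and transitions below are invented for illustration; this is not the CPP graph.

```python
def value_iteration(states, actions, cost, trans, a, iters):
    """Value iteration J_{k+1}(x) = min_u [ g(x,u) + a * E[J_k(x_next)] ].

    Returns the last iterate together with the standard bracketing bounds
    c_lo = a/(1-a) * min_x (J_k - J_{k-1}) and
    c_hi = a/(1-a) * max_x (J_k - J_{k-1}), so that
    J_k(x) + c_lo <= J*(x) <= J_k(x) + c_hi for every state x.
    """
    J = {x: 0.0 for x in states}
    for _ in range(iters):
        J_new = {x: min(cost[x][u] + a * sum(p * J[y] for y, p in trans[x][u])
                        for u in actions)
                 for x in states}
        diff = [J_new[x] - J[x] for x in states]
        J = J_new
    c_lo = a / (1 - a) * min(diff)
    c_hi = a / (1 - a) * max(diff)
    return J, c_lo, c_hi

# Invented toy problem: 2 states, 2 actions, deterministic transitions.
states, actions = [0, 1], [0, 1]
cost = {0: {0: 1.0, 1: 2.0}, 1: {0: 0.5, 1: 1.5}}
trans = {0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
         1: {0: [(1, 1.0)], 1: [(0, 1.0)]}}
J, c_lo, c_hi = value_iteration(states, actions, cost, trans, a=0.8, iters=50)
# Exact solution: J*(0) = 4.0 (move to state 1), J*(1) = 2.5 (stay put).
```

A handful of sweeps already makes c_hi − c_lo small, which is exactly the property the separation proof exploits later (per the slides, ~10 iterations suffice in practice).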

Two Agent Example

G7, infinite horizon, discount factor a = 0.8

- Using optimal paths for two agents in G7, configurations with separation 0, 1, or 2 do not evolve into configurations with separation l > 2: the UAV separation is bounded in G7.
- Conjecture 1: the UAV separation is bounded in Gm for general m; extra nodes should not affect the separation adversely.
- Conjecture 2: in the n-agent setting, the UAV separation is bounded in a pair-wise sense; i.e. Conjecture 1 should hold pair-wise.

The CPP Separation Results (G7, infinite horizon, discount factor a = 0.8)

- Communication power, hierarchy tier sizes

Outline of the Proof of the Bounded Agent Separation

- What do we want to prove?

Two agents using optimal policies don’t separate more than two nodes apart.

- Cn is the configuration in which the agents are n nodes apart.
- In other words: we want to show that configuration C3 cannot be reached.
- How? We show that there is no possible state in which it is optimal for the agents to reach a three-node separation (C3).

How can the agents reach a three node separation…

- Given that the agents are 0, 1, or 2 nodes apart (C0, C1, or C2), there are three situations (i, ii, and iii) in which some policy leads to C3:
- The green dots denote the positions of the agents.
- The blue arrows denote the policy leading to C3. Note that even in case (ii) this is the case, since the graph is cylindrical.

… but never actually reach C3?

- In each case, the red arrows, which do not lead to C3, denote a policy that is cheaper than the blue policy according to the optimal value function.
- Note that the red policy is not necessarily optimal; we only claim that it is better than the blue policy.

Value Function J as function of separation s

- Observation zone:
- Link costs: Pr(0) = 0.5, Pr(3) = 0.3, Pr(7) = 0.2

- # nodes per stage: m = 40
- Note:
- Minimum at s = 1 (adjacent nodes): smin = 1.
- E[J(.)|s] increases monotonically with s.
- For large s, the cost two cooperating agents incur equals the sum of the costs two single non-cooperating agents incur.

(Plot: E[J(.)|s] versus separation s.)

How can we prove this using a numerically obtained value function?

- Value iteration is a numerical iterative method that solves Bellman’s equation of this problem if infinitely many iterations are done.
- It also provides an upper and a lower bound on the optimal value function J*. These bounds can be made arbitrarily close.
- In practice, after a limited number of iterations (~10), we find reasonable upper and lower bounds on J*.

- Let’s prove case (i): for any values of a1 and b1, the red policy is better than the blue policy; that is, the agents prefer to move closer rather than to move apart.

Case (i)

- Using Bellman’s equation we can calculate, for both policies, the cost of continuing infinitely long on the graph: the expected cost to go from the current stage to the next, plus the discounted expected cost to go from the next stage to infinity using an optimal policy.
- Blue: the continuation cost involves J3*, the optimal value function for a separation of 3, a function of the four observed links (only b1 known).
- Red: the continuation cost involves J1*, the optimal value function for a separation of 1, again a function of the four observed links (only b1 known).

Case (i)

- For the red policy to be better than the blue policy, the following must hold for each b1: the expected cost of taking red is lower than the expected cost of taking blue.
- Replacing the unknown J1* and J3* with the appropriate bounds (an upper bound on the red side and a lower bound on the blue side) yields a sufficient condition: if this stronger inequality is valid for all b1, then the original inequality is also valid.
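The bound-substitution step amounts to an interval-dominance check: red is provably better whenever an upper bound on its cost lies below a lower bound on the blue cost for every observed b1. A sketch with invented placeholder bounds (not the actual G7 values):

```python
def red_dominates_blue(red_upper, blue_lower):
    """Sufficient condition from the proof: red is better than blue if,
    for every possible observed cost b1, the upper bound on the red
    policy's cost is below the lower bound on the blue policy's cost.

    red_upper, blue_lower: dicts mapping each b1 to value-iteration bounds.
    """
    return all(red_upper[b1] <= blue_lower[b1] for b1 in red_upper)

# Invented placeholder bounds for b1 in {0, 0.5, ..., 3}:
b1_values = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
red_up = {b1: 10.0 + b1 for b1 in b1_values}
blue_lo = {b1: 11.0 + b1 for b1 in b1_values}
dominates = red_dominates_blue(red_up, blue_lo)
```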

Case (i)

- Conceptually, the upper bound on the red cost must lie below the lower bound on the blue cost; this is verified numerically. Cases (ii) and (iii) are similar.

Current Research

- Characteristics of the general value function on a graph with infinite diameter lead to properties of the optimal two-agent policy. For example, if E[J(.)|s] can be shown to increase monotonically with s, the separation will be bounded and sbound = smin + 1.

- The separation bound can be shown to be independent of the particular assumptions on the link costs (the pdf and the cost values).
- Translate a real environment into a stochastic graph:
- Hierarchical approach: each layer solves a mapping problem at a different scale using DP, to deal with scalability.
- At which level should cooperation occur?

Current Research

- Performance of agent waves compared to synchronous motion:
- Agents (with different functionality) navigate sequentially.
- Later agents use information gathered by previous agents.
- Time is a key factor in determining optimality: synchronous motion requires less time than sequential motion.
- In between these two extremes, the 2nd agent starts before the 1st agent has reached the target. What is the optimal delay?
- Mixed worst-case probabilistic approach:
- Pdf on link costs is an element of a bounded set of pdfs.
- Compute the worst-case value function Jwc.
- Use local information on the actual pdf of the link costs to devise a better, local value function via value iteration with Jwc as terminal cost.
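This refinement step can be sketched as value iteration initialized from Jwc instead of zero. The one-state toy numbers below are invented, and `backup` is a hypothetical stand-in for one Bellman sweep under the locally observed pdf.

```python
def refine_from_terminal(J_wc, sweeps, backup):
    """Run `sweeps` value-iteration sweeps starting from the worst-case
    value function J_wc as terminal cost.

    backup(J): one Bellman sweep under the locally observed link-cost pdf
    (a hypothetical stand-in for the real CPP backup operator).
    """
    J = dict(J_wc)
    for _ in range(sweeps):
        J = backup(J)
    return J

# Invented one-state example: self-loop, local expected cost 1.0, a = 0.8.
# Worst-case cost 3.0 gives J_wc = 3.0 / (1 - 0.8) = 15.0; local sweeps
# pull the value down toward the local fixed point 1.0 / (1 - 0.8) = 5.0.
a = 0.8
local_backup = lambda J: {0: 1.0 + a * J[0]}
J_refined = refine_from_terminal({0: 15.0}, sweeps=10, backup=local_backup)
```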

- Different observation zones lead to different separation bounds.

For example: add two diagonal links.

The separation is then most likely not bounded by a hard bound. BUT: with high probability the separation will not exceed…

- Do all observation zones lead to better performance per agent? Is the extra cost of moving close together balanced by a cost decrease thanks to the extra relevant information?

- Link costs varying with time, and consequently, agents which have the option to wait or return.
- Observation zone follows the direction in which each agent moves.
- Agents follow each other rather than staying parallel.

The previous extensions increase the computational complexity dramatically.

Can we extract properties of the navigation strategies and separations using approximate Dynamic Programming?

Optimal Number of UAVs

- Trade-off between the agent DOC (direct operating cost) and the benefit to the mission.
- How many UAVs are optimal? Efficiency is a function of the number of UAVs: the more UAVs used, the higher the benefit of the cluster of UAVs.
