
Approximate Dynamic Programming Methods for Resource Constrained Sensor Management



Presentation Transcript


  1. Approximate Dynamic Programming Methods for Resource Constrained Sensor Management • John W. Fisher III, Jason L. Williams and Alan S. Willsky • MIT CSAIL & LIDS

  2. Outline • Motivation & assumptions • Constrained Markov Decision Process formulation • Communication constraint • Entropy constraint • Approximations • Linearized Gaussian • Greedy sensor subset selection • n-Scan pruning • Simulation results

  3. Problem statement • Object tracking using a network of sensors • Sensors provide localization in their close vicinity • Sensors can communicate locally at a cost that increases with range (∝ f(range)) • Energy is a limited resource • Consumed by communication, sensing and computation • Communication is orders of magnitude more expensive than the other tasks • The sensor management algorithm must determine • Which sensors to activate at each time • Where the probabilistic model should be stored at each time • While trading off the competing objectives of estimation performance and communication cost

  4. Heuristic solution • At each time step activate the sensor with the most informative measurement (e.g. minimum expected posterior entropy)

  5. Heuristic solution • At each time step activate the sensor with the most informative measurement (e.g. minimum expected posterior entropy) • Must transmit probabilistic model of object state every time most informative sensor changes • Could benefit from using more measurements • Estimation quality vs energy cost trade-off

  6. Notation • Time index: k • Position of sensor s: l^s • Activated sensors: S_k ⊆ S • Leader node: l_k • Object state / location: x_k / L x_k • Probabilistic model: X_k = p(x_k | z_{0:k-1}) • Sensor s measurement: z_k^s • History of incorporated measurements: z_{0:k-1}

  7. Formulation • We formulate the problem as a constrained Markov Decision Process • Maximize estimation performance subject to a constraint on communication cost, or • Minimize communication cost subject to a constraint on estimation performance • State is the PDF (X_k = p(x_k | z_{0:k-1})) and the previous leader node (l_{k-1}) • Dynamic programming equation for an N-step rolling horizon: • such that E{G(X_k, l_{k-1}, u_{k:k+N-1}) | X_k, l_{k-1}} ≤ 0
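The DP equation itself did not survive transcription; in the notation of slide 6 it takes roughly the following form (a reconstruction from the surrounding text, not the authors' exact slide):

```latex
\bar{J}(\mathcal{X}_k, l_{k-1}) =
  \min_{u_{k:k+N-1}} \mathbb{E}\Big\{ \textstyle\sum_{i=k}^{k+N-1}
    g(\mathcal{X}_i, l_{i-1}, u_i) \,\Big|\, \mathcal{X}_k, l_{k-1} \Big\}
\quad \text{s.t.} \quad
\mathbb{E}\{ G(\mathcal{X}_k, l_{k-1}, u_{k:k+N-1}) \mid \mathcal{X}_k, l_{k-1} \} \le 0
```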

  8. Lagrangian relaxation • We address the constraint using a Lagrangian relaxation: • Strong duality does not hold since the primal problem is discrete
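The relaxed objective (lost in transcription) can be sketched as follows; because the primal problem is discrete, maximizing over λ ≥ 0 yields only a lower bound on the primal optimum J*:

```latex
J_\lambda(\mathcal{X}_k, l_{k-1}) =
  \min_{u_{k:k+N-1}} \mathbb{E}\Big\{ \textstyle\sum_{i=k}^{k+N-1}
    g(\mathcal{X}_i, l_{i-1}, u_i)
    + \lambda\, G(\mathcal{X}_k, l_{k-1}, u_{k:k+N-1}) \Big\},
\qquad \max_{\lambda \ge 0} J_\lambda \le J^{*}
```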

  9. Communication-constrained formulation • Cost-per-stage is such that the system minimizes joint expected conditional entropy of object state over planning horizon: • Constraint applies to expected communication cost over planning horizon:

  10. Communication-constrained formulation • Integrating the communication costs into the per-stage cost: • Per-stage cost now contains both information gain and communication cost
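A sketch of the integrated per-stage cost (reconstructed, not the slide's exact expression; here C(l_{i-1}, l_i, S_i) denotes the communication cost of handing the model from l_{i-1} to l_i and querying the sensors in S_i):

```latex
\bar{g}_\lambda(\mathcal{X}_i, l_{i-1}, u_i) =
  \mathbb{E}\big\{ H(x_i \mid z_{0:i}) \big\}
  + \lambda\, C(l_{i-1}, l_i, S_i)
```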

  11. Information-constrained formulation • Cost-per-stage is such that the system minimizes the energy consumed over the planning horizon: • Constraint ensures that the joint entropy over the planning horizon is less than given threshold:

  12. Solving the dual problem • The dual optimization can be solved using a subgradient method • Iteratively perform the following procedure • Evaluate the DP for a given value of λ • Adjust the value of λ • Increase if the constraint is exceeded • Decrease (to a minimum value of zero) if the constraint has slack • We employ a practical approximation which suits problems involving sequential replanning

  13. Evaluating the DP • The DP has an infinite state space, hence it cannot be evaluated exactly • Conceptually, it could be evaluated through simulation • Complexity is O((N_s 2^{N_s})^N N_p^N)

  14. Branching due to measurement values • The first source of branching is due to the different possible measurement values • If we approximate the measurement model as linearized Gaussian locally around a nominal trajectory, then the future costs depend only on the control choices, not on the measurement values • Hence this source of branching can be eliminated entirely
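Under the linearized Gaussian approximation the posterior covariance, and hence the expected posterior entropy, does not depend on the realized measurement value; a minimal sketch (hypothetical matrices, not the slides' tracking model):

```python
import numpy as np

def kalman_covariance_update(P, H, R):
    """Posterior covariance of a linear-Gaussian measurement update.
    Depends only on the prior covariance P, observation matrix H and
    noise covariance R -- never on the measurement value itself."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return (np.eye(P.shape[0]) - K @ H) @ P, K

def gaussian_entropy(P):
    """Differential entropy of an n-dimensional Gaussian with covariance P."""
    n = P.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(P))

P = np.diag([4.0, 4.0])          # prior covariance
H = np.array([[1.0, 0.0]])       # observe first state component
R = np.array([[1.0]])            # measurement noise
P_post, K = kalman_covariance_update(P, H, R)

x = np.zeros(2)
for z in (0.3, -5.0):
    # the mean update uses the measurement value z ...
    x_post = x + K @ (np.array([z]) - H @ x)
# ... but P_post (and so the entropy) is the same for every z,
# which is what removes the measurement-value branching in the DP tree.
```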

  15. Greedy sensor subset selection • For large sensor networks complexity is high even for a single lookahead step due to consideration of sensor subsets • We decompose each decision stage into a generalized stopping problem, where at each substage we can • Add unselected sensor to the current selection • Terminate with the current selection • The per-stage cost can be conveniently decomposed into a per-substage cost which directly trades off the cost of obtaining each measurement against the information it returns:
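The substage structure can be sketched as follows. This is a hypothetical illustration with fixed per-sensor information gains; in the actual method the gain of each candidate would be recomputed conditional on the sensors already selected:

```python
def greedy_subset(sensors, info_gain, cost, lam):
    """Generalized stopping problem: at each substage either add the
    sensor whose (gain - lambda * cost) is largest, or terminate when
    no remaining sensor is worth its communication/sensing cost."""
    selected, remaining = [], list(sensors)
    while remaining:
        scores = {s: info_gain[s] - lam * cost[s] for s in remaining}
        best = max(scores, key=scores.get)
        if scores[best] <= 0.0:   # terminating beats any further addition
            break
        selected.append(best)
        remaining.remove(best)
    return selected

gains = {'s1': 2.0, 's2': 0.8, 's3': 0.1}   # hypothetical entropy reductions
costs = {'s1': 1.0, 's2': 1.0, 's3': 1.0}   # hypothetical activation costs
sel = greedy_subset(gains.keys(), gains, costs, lam=0.5)
```

Note the complexity is linear-in-substages rather than exponential in the number of sensor subsets, which is the point of the decomposition.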

  16. Greedy sensor subset selection • Outer DP recursion • Inner DP sub-problem

  17. Greedy sensor subset selection • [Figure: stopping-problem tree in which each substage either adds a sensor s1, s2, s3 or terminates (T)] • The per-stage cost can be conveniently decomposed into a per-substage cost which directly trades off the cost of obtaining each measurement against the information it returns:

  18. n-Scan approximation • [Figure: hypothesis tree over leader-node choices s1, s2, s3, with greedy (G) subset selection at each branch] • The greedy subset selection is embedded within an n-scan pruning method (similar to the MHT) which addresses the growth due to different choices of the leader node sequence
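A minimal sketch of n-scan pruning over leader-node sequences, with hypothetical sequences and costs (in the slides this is applied jointly with the greedy subset selection at each branch):

```python
def n_scan_prune(hypotheses, n):
    """Keep, for each distinct leader-node choice over the last n steps,
    only the lowest-cost hypothesis; all others are pruned, so the tree
    width stays bounded instead of growing exponentially in time.
    Each hypothesis is a (leader_sequence, cost) pair."""
    best = {}
    for seq, cost in hypotheses:
        key = tuple(seq[-n:])          # last n leader choices
        if key not in best or cost < best[key][1]:
            best[key] = (seq, cost)
    return list(best.values())

hyps = [(['a', 'b'], 3.0),   # hypothetical leader sequences with costs
        (['c', 'b'], 2.0),
        (['a', 'c'], 1.5)]
pruned = n_scan_prune(hyps, n=1)
```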

  19. Sequential subgradient update • [Figure: overlapping planning horizons k, ..., k+N-1 at time k and k+1, ..., k+N at time k+1] • This method can be used to evaluate the dualized DP for a given value of the dual variable λ • The subgradient method evaluates the DP for many different values of λ • Updates between adjacent time intervals will typically be small since the planning horizons are highly overlapping • Thus, as an approximation, in each decision stage we plan with a single value of λ, and update it for the next time step using a single subgradient step

  20. Sequential subgradient update • We also need to constrain the value of the dual variable to avoid anomalous behavior when the constrained problem is infeasible • Update has two forms: additive and multiplicative
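Both update forms can be sketched as follows; the step size and the bounds on the dual variable are illustrative assumptions, not values from the slides:

```python
def update_lambda(lam, constraint_value, step=0.1,
                  lam_min=0.0, lam_max=1e3, multiplicative=False):
    """One subgradient step on the dual variable lambda: raise it when
    the constraint is violated (constraint_value > 0), lower it when
    the constraint has slack. Clipping to [lam_min, lam_max] avoids
    anomalous behavior when the constrained problem is infeasible."""
    if multiplicative:
        lam *= (1 + step) if constraint_value > 0 else 1 / (1 + step)
    else:
        lam += step * constraint_value
    return min(max(lam, lam_min), lam_max)
```

Used sequentially, each planning stage reuses the previous stage's λ and applies a single such step, rather than iterating the subgradient method to convergence.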

  21. Surrogate constraints • The information-constrained formulation constrains the joint entropy of the object state within the planning horizon: • Often a more meaningful constraint is on the minimum entropy within the planning horizon: • This form cannot be integrated into the per-stage cost • An approximation: use the joint entropy constraint formulation in the DP, and the minimum entropy constraint in the subgradient update

  22. Example – observation/communication models • [Figure: observation model as a function of range r, and communication cost along a multi-hop path i = i0, i1, i2, i3 to j = i4]

  23. Simulation results

  24. Simulation results • Typical behavior of the subgradient adaptation in the communication-constrained case: • Take many measurements when there is no sensor hand-off in the planning horizon • Take no measurements when there is a sensor hand-off in the planning horizon • Longer planning horizons reduce this anomalous behavior

  25. Simulation results

  26. Significant Cost Reduction • Decomposed the cost structure into a form in which greedy approximations are able to capture the trade-off • Complexity O((N_s 2^{N_s})^N N_p^N) reduced to O(N N_s^3) • N_s = 20, N_p = 50, N = 10: 1.6×10^90 → 8×10^5 • Strong non-myopic planning for horizon lengths > 20 steps is possible • Proposed an approximate subgradient update which can be used for adaptive sequential re-planning

  27. Conclusions • We have presented a principled formulation for explicitly trading off estimation performance and energy consumed within an integrated objective • Twin formulations: optimizing estimation performance subject to an energy constraint, and optimizing energy consumption subject to an estimation performance constraint • Decomposed the cost structure into a form in which greedy approximations are able to capture the trade-off • Complexity O((N_s 2^{N_s})^N N_p^N) reduced to O(N N_s^3) • N_s = 20, N_p = 50, N = 10: 1.6×10^90 → 8×10^5 • Strong non-myopic planning for horizon lengths > 20 steps is possible • Proposed an approximate subgradient update which can be used for adaptive sequential re-planning
