Planning to Gather Information


  1. Planning to Gather Information Richard Dearden University of Birmingham Joint work with Moritz Göbelbecker (ALU), Charles Gretton, Bramley Merton (NOC), Zeyn Saigol, Mohan Sridharan (Texas Tech), Jeremy Wyatt

  2. Underwater Vent Finding • AUV used to find vents • Can detect the vent itself (reliably) and the plume of fresh water it emits • Problem is where to go to collect data to find the vents as efficiently as possible • Hard because plume detection is unreliable and we can’t easily assign ‘blame’ for the detections we do make

  3. Vision Algorithm Planning • Goal: Answer queries and execute commands. • Is there a red triangle in the scene? • Move the mug to the right of the blue circle. • Our operators: colour, shape, SIFT identification, viewpoint change, zoom etc. • Problem: Build a plan to achieve the goal with high confidence

  4. Assumptions • The visual operators are unreliable • Reliability can be represented by a confusion matrix, computed from data • Speed of response and answering the query correctly are what really matter • We want to build the fastest plan that is ‘reliable enough’ • We should include planning time in our performance estimate too

  5. POMDPs • Partially Observable Markov Decision Problems • Markov Decision Problem: • (discrete) States, stochastic actions, reward • Maximise expected (discounted) long-term reward • Assumption: state is completely observable • POMDPs: MDPs with observations • Infer state from (sequence of) observations • Typically maintain belief state, plan over that
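
As background (standard POMDP machinery rather than anything specific to these slides): after taking action a in belief b and receiving observation o, the belief is updated as

    b'(s') \propto O(o \mid s', a) \sum_s T(s' \mid s, a) \, b(s)

and the policy is then computed over these beliefs rather than over the hidden state directly.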

  6. POMDP Formulation • States: Cartesian product of the individual state vectors • Actions: A = {Colour, Shape, SIFT, terminal actions} • Observations: {red, green, blue, circle, triangle, square, empty, unknown} • Transition function • Observation function given by the confusion matrices • Reward specification: time cost of actions, large +ve/-ve rewards on terminal actions • Maintain a belief over states and the likelihood of action outcomes
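
A minimal sketch of how a confusion matrix plays the role of the observation function in the belief update for a single ROI (illustrative only: the state set, observation labels, and numbers below are invented, and the sketch uses the fact that applying a visual operator does not change the underlying scene):

    import numpy as np

    # Toy subset of per-ROI states and observations (hypothetical labels).
    STATES = ["RedCircle", "RedTriangle", "BlueCircle", "BlueTriangle"]
    OBS = ["red", "blue", "circle", "triangle", "unknown"]

    # Confusion matrix for the Colour operator: rows = true states, cols = OBS,
    # i.e. P(observation | true state), estimated from data in practice.
    COLOUR_CONFUSION = np.array([
        [0.85, 0.05, 0.00, 0.00, 0.10],   # RedCircle
        [0.85, 0.05, 0.00, 0.00, 0.10],   # RedTriangle
        [0.05, 0.85, 0.00, 0.00, 0.10],   # BlueCircle
        [0.05, 0.85, 0.00, 0.00, 0.10],   # BlueTriangle
    ])

    def belief_update(belief, confusion, obs_index):
        """b'(s) is proportional to P(o | s) * b(s); the transition is the
        identity because the visual operators do not change the scene."""
        posterior = belief * confusion[:, obs_index]
        return posterior / posterior.sum()

    belief = np.full(len(STATES), 1.0 / len(STATES))   # uniform prior over the ROI
    belief = belief_update(belief, COLOUR_CONFUSION, OBS.index("red"))
    print(dict(zip(STATES, belief.round(3))))          # mass shifts to the Red* states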

  7. POMDP Formulation • For a broad query: ‘what is that?’ • For each ROI: • 26 states (5 colours x 5 shapes + term) • 12 actions (2 operations, 10 terminal actions SayBlueSquare, SayRedTriangle, SayUnknown, …) • 8 observations • For n ROIs: • 25^n + 1 states • Impractical for even a very small number of ROIs • BUT: There’s lots of structure. How to exploit it?
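
To make the blow-up concrete (simple arithmetic on the numbers above, assuming the per-ROI states combine as a Cartesian product with a single shared terminal state): 3 ROIs already give 25^3 + 1 = 15,626 joint states, and 5 ROIs give 25^5 + 1 = 9,765,626, before even considering the joint action and observation spaces.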

  8. A Hierarchical POMDP • Proposed solution: Hierarchical Planning in POMDPs – HiPPo • One LL-POMDP for planning the actions in each ROI • Higher-level POMDP to choose which LL-POMDP to use at each step • Significantly reduces the complexity of the state-action-observation space • Model creation and policy generation are automatic, based on the input query • [Diagram: the HL POMDP answers ‘Which region to process?’, the LL POMDP answers ‘How to process it?’]
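
A sketch of the resulting control loop, to show how the two levels interact (illustrative structure only, not the HiPPo implementation; the function arguments are hypothetical interfaces):

    def hippo_episode(hl_best_action, hl_update, ll_execute, hl_belief):
        """The HL-POMDP picks which ROI to process next; the chosen LL-POMDP
        policy runs visual operators on that ROI and returns a definite label,
        which is fed back into the HL belief as an observation."""
        while True:
            action = hl_best_action(hl_belief)    # e.g. "DoR1", "DoR2", "SayR1", "SayNo"
            if action.startswith("Say"):          # terminal HL action: answer the query
                return action
            label = ll_execute(action)            # e.g. "FoundR1" or "NotFoundR1"
            hl_belief = hl_update(hl_belief, action, label)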

  9. Low-level POMDP • The LL-POMDP is the same as the flat POMDP • Only ever operates on a single ROI • 26 states, 12 Actions • Reward combines time-based cost for actions and answer quality • Terminal actions are answering the query for this region

  10. Example • Query: ‘where is the blue circle?’ • State space: {RedCircle, RedTriangle, BlueCircle, BlueTriangle, …, Terminal} • Actions: {Colour, Shape, …, SayFound, …} • Observations: {Red, Blue, NoColour, UnknownColour, Triangle, Circle, NoShape, UnknownShape, …} • Observation probabilities given by confusion matrix

  11. Policy • Policy tree for a uniform-prior initial state • We limit all LL policies to a fixed maximum number of steps • [Figure: example policy tree — the root applies the Colour operator, branches on the colour observation (B/R), then applies the Shape operator, and the leaves are the terminal answers sFound / sNotFound]

  12. High-level POMDP • State space consists of the regions the object of interest is in • Actions are regions to process • Observations are whether the object of interest was found in a particular region • We derive the observation function and action costs for the HL-POMDP from the policy tree for the LL-POMDP • Treat the LL-POMDP as a black box that returns definite labels (not belief densities)
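
One way to compute those derived quantities, sketched below (illustrative only; the tree encoding and the obs_prob callback are assumptions, not the authors' code): walk the LL policy tree once per true underlying state, weight each branch by the confusion-matrix probability of its observation, and accumulate the probability mass arriving at each terminal answer. The resulting P(sFound | state) values give the HL-POMDP's observation probabilities for ‘process ROI i’, and the expected depth of the tree is one natural source for the HL action cost.

    def answer_distribution(node, true_state, obs_prob, prob=1.0, out=None):
        """node is either a terminal answer string (e.g. 'sFound') or a pair
        (operator, {observation: child_node}); obs_prob(operator, true_state, obs)
        returns the confusion-matrix probability of seeing that observation."""
        out = {} if out is None else out
        if isinstance(node, str):                 # leaf: a terminal answer
            out[node] = out.get(node, 0.0) + prob
            return out
        operator, children = node
        for obs, child in children.items():       # branch on every possible observation
            answer_distribution(child, true_state, obs_prob,
                                prob * obs_prob(operator, true_state, obs), out)
        return out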

  13. Example • Query: ‘where is the blue circle?’ • State space: • Actions: {DoR1, DoR2, SayR1, SayR2, SayR1^R2, SayNo} • Observations: {FoundR1, ¬FoundR1, FoundR2, ¬FoundR2} • Observation probabilities are computed from the LL-POMDP

  14. Results (very briefly)

  15. Vent Finding Approach • Assume mapping using an occupancy grid • Rewards only for visiting cells that contain vents • State space also too large to solve the POMDP exactly • Instead do fixed-length lookahead in belief space • Reasoning in belief space lets us account for the value of the information gained from observations • Use P(vent|all observations so far) as the heuristic value at the end of the lookahead
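
A sketch of what fixed-length lookahead in belief space looks like here (illustrative only; `model` is a hypothetical interface bundling the occupancy-grid belief update, the plume-observation likelihoods, the movement rewards, and the P(vent | all observations so far) heuristic from the slide):

    def lookahead(belief, pose, depth, model):
        """Return (value, action) for the best fixed-horizon move, taking an
        expectation over the possible plume/vent observations after each move."""
        if depth == 0:
            return model.heuristic(belief), None            # e.g. P(vent | data so far)
        best_value, best_action = float("-inf"), None
        for action in model.moves(pose):                    # e.g. move to an adjacent cell
            value = model.expected_reward(belief, pose, action)
            for obs, p_obs in model.obs_distribution(belief, pose, action):
                next_belief = model.update(belief, pose, action, obs)
                future, _ = lookahead(next_belief, model.next_pose(pose, action),
                                      depth - 1, model)
                value += p_obs * future
            if value > best_value:
                best_value, best_action = value, action
        return best_value, best_action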

  16. What we’re working on now • Most of these POMDPs are too big to solve • Take a domain and problem description written in a very general language and generate a classical planning problem from it • Assume we can observe any variable we care about • For each such observation, use a POMDP planner to determine the value of the variable with high confidence
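
A very rough sketch of that idea (not the authors' system; every name below is a hypothetical interface): treat the variables we care about as directly observable so a classical planner can produce a plan skeleton, then attach a small POMDP sensing sub-problem to each assumed observation.

    def plan_with_assumed_observability(problem, classical_planner, pomdp_planner):
        relaxed = problem.assume_all_observable()       # pretend partial observability away
        skeleton = classical_planner.solve(relaxed)     # classical plan over assumed values
        sensing_plans = {
            var: pomdp_planner.solve(problem.sensing_subproblem(var))
            for var in skeleton.observed_variables()    # variables the plan relies on
        }
        return skeleton, sensing_plans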
