
Representing hierarchical POMDPs as DBNs for multi-scale robot localization



Presentation Transcript


  1. Representing hierarchical POMDPs as DBNs for multi-scale robot localization G. Theocharous, K. Murphy, L. Kaelbling Presented by: Hannaneh Hajishirzi

  2. Outline • Define H-HMM • Flattening H-HMM • Define H-POMDP • Flattening H-POMDP • Approximate H-POMDP with DBN • Inference and Learning in H-POMDP

  3. Introduction • H-POMDPs represent the state space at multiple levels of abstraction • They scale much better to large environments • They simplify planning: abstract states are more deterministic • They simplify learning: the number of free parameters is reduced

  4. Hierarchical HMMs • A generalization of HMMs for modeling domains with hierarchical structure • Application: NLP • Concrete states emit single observations • Abstract states emit strings of observations • The strings emitted by abstract states are governed by sub-HMMs

  5. Example • An HHMM representing the language a(xy)+b | c(xy)+d • When a sub-HMM is finished, control is returned to wherever it was called from
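A minimal generative sketch of this control flow (probabilities and function names are assumed for illustration, not taken from the slides): the root abstract state calls one of two sub-models, the shared (xy)+ sub-HMM emits its string, and control returns to the caller when it finishes.

```python
import random

P_REPEAT = 0.6  # assumed probability of repeating the (xy) loop

def sample_xy_loop():
    """Sub-HMM that emits 'xy' one or more times, then finishes."""
    out = []
    while True:
        out += ["x", "y"]
        if random.random() > P_REPEAT:   # reached the end state
            return out                   # control returns to the caller

def sample_branch(prefix, suffix):
    """Abstract state: emit prefix, call the (xy)+ sub-HMM, emit suffix."""
    return [prefix] + sample_xy_loop() + [suffix]

def sample_hhmm():
    # Root: choose between the a(xy)+b and the c(xy)+d sub-models.
    if random.random() < 0.5:
        return sample_branch("a", "b")
    return sample_branch("c", "d")

print("".join(sample_hhmm()))   # e.g. 'axyxyb' or 'cxyd'
```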

  6. HHMM to HMM • Create a state for every leaf in HHMM

  7. HHMM to HMM • Create a state for every leaf in the HHMM • Flat transition probability between two leaves = sum of P(all HHMM paths between them) • Disadvantages: flattening loses modularity, and learning requires more samples
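A hedged sketch of this flattening for a two-level HHMM (array names and the exit/entry parameterization are assumptions): the flat probability of moving from leaf (p, i) to leaf (q, j) sums the path that stays inside sub-HMM p with the path that exits p, transitions at the top level, and enters q.

```python
import numpy as np

def flatten(A_top, A_sub, p_exit, pi_entry):
    """
    A_top      : (P, P) top-level transitions between abstract states
    A_sub[p]   : (K, K) within-sub-HMM transitions of abstract state p
    p_exit[p]  : (K,)   probability that sub-HMM p finishes from state i
    pi_entry[q]: (K,)   entry distribution over the concrete states of q
    Returns the (P*K, P*K) flat transition matrix over leaves (p, i).
    """
    P, K = len(A_sub), A_sub[0].shape[0]
    A_flat = np.zeros((P * K, P * K))
    for p in range(P):
        for i in range(K):
            for q in range(P):
                for j in range(K):
                    stay = (1 - p_exit[p][i]) * A_sub[p][i, j] if p == q else 0.0
                    leave = p_exit[p][i] * A_top[p, q] * pi_entry[q][j]
                    A_flat[p * K + i, q * K + j] = stay + leave
    return A_flat
```

Note how the sub-model parameters get multiplied together into one big matrix: this is the loss of modularity mentioned above.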

  8. Representing HHMMs as DBNs • One state variable per level: the state at level d • One binary indicator per level, turned on when the HMM at level d has finished

  9. H-POMDPs • HHMMs with inputs (actions) and a reward function • Problems: • Planning: find a mapping from belief states to actions • Filtering: compute the belief state online • Smoothing: compute the belief state offline, given the whole sequence • Learning: find the MLE of the model parameters
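For reference, the filtering and smoothing targets can be written in generic POMDP notation (not taken from the slides) as:

```latex
% Filtering (online): update the belief after executing a_t and observing y_t
b_t(x) \;=\; P(X_t = x \mid y_{1:t}, a_{1:t})
       \;\propto\; P(y_t \mid x)\sum_{x'} P(x \mid x', a_t)\, b_{t-1}(x')

% Smoothing (offline): condition on the whole sequence of length T
\gamma_t(x) \;=\; P(X_t = x \mid y_{1:T}, a_{1:T})
```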

  10. H-POMDP for Robot Navigation • Flat model: robot position Xt (1..10) • Hierarchical model: abstract state Xt1 (1..4), concrete state Xt2 (1..3) • Observation: Yt (4 bits) • In this paper, the problem of how to choose the actions is ignored

  11. State Transition Diagram for a 2-level H-POMDP • [figure: the transition diagram, with a sample path]

  12. State Transition Diagram for a Corridor Environment • [figure: abstract states, exit states, concrete states, and entry states]

  13. Flattening H-POMDPs • Advantages of H-POMDP over corresponding POMDP: • Learning is easier: Learn sub-models • Planning is easier: Reason in terms of “macro” actions

  14. Dynamic Bayesian Networks • [figure: transition probabilities and number of parameters of a state (flat) POMDP vs. a factored DBN POMDP]
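A back-of-the-envelope comparison of parameter counts (the sizes and the simple two-variable factorization are illustrative assumptions; the paper's DBN also has exit and orientation nodes):

```python
# A flat POMDP stores P(s' | s, a) as one big table; a factored DBN stores
# one small conditional table per node.
n_abs, n_con, n_act = 10, 20, 4      # assumed numbers of abstract states,
                                     # concrete states per abstract state, actions

flat = n_act * (n_abs * n_con) ** 2                 # P(s' | s, a): 160000 entries
factored = (n_act * n_abs * n_abs                   # P(X1' | X1, a):      400
            + n_act * n_abs * n_con * n_con)        # P(X2' | X1', X2, a): 16000
print(flat, factored)                               # 160000 vs 16400
```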

  15.–19. Representing H-POMDPs as DBNs • [animated figure: the state H-POMDP and the factored DBN H-POMDP for a corridor, with EAST/WEST orientations]

  20. Representing H-POMDPs as DBNs • Nodes per time slice: abstract location, concrete location, orientation, exit node (5 values: no-exit, s-exit, n-exit, l-exit, r-exit), observation, and action node

  21. Transition Model • Abstract level: if e = no-exit, the abstract state stays where it is; otherwise the next abstract state is drawn from the abstract horizontal transition matrix

  22. Transition Model (cont.) • Exit node: the probability of entering exit state e • Concrete level: if e = no-exit, the next concrete state is drawn from the concrete horizontal transition matrix; otherwise it is drawn from the concrete vertical entry vector of the abstract state being entered
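A minimal sketch of this piecewise transition model (variable and array names are assumptions; in the full model the exit type also conditions the abstract transition):

```python
import numpy as np

def step(x1, x2, e, a, T_abs, T_con, V_entry):
    """
    x1, x2     : current abstract / concrete state
    e          : exit node ('no-exit' or an exit type)
    a          : action
    T_abs[a]   : abstract horizontal transition matrix
    T_con[a]   : concrete horizontal transition matrix
    V_entry[e] : concrete vertical entry vectors, indexed by the new abstract state
    """
    if e == "no-exit":
        # Stay inside the current abstract state; the concrete state moves
        # according to the concrete horizontal transition matrix.
        x1_next = x1
        x2_next = np.random.choice(len(T_con[a][x2]), p=T_con[a][x2])
    else:
        # The sub-POMDP has exited: the abstract state moves according to the
        # abstract horizontal transition matrix, and the new concrete state is
        # drawn from the vertical entry vector of the abstract state we enter.
        x1_next = np.random.choice(len(T_abs[a][x1]), p=T_abs[a][x1])
        x2_next = np.random.choice(len(V_entry[e][x1_next]), p=V_entry[e][x1_next])
    return x1_next, x2_next
```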

  23. Observation Model • Probability of seeing a wall or an opening on each of the 4 sides of the robot • Naive Bayes assumption: the 4 observation bits are conditionally independent given the state • Map the global coordinate frame to the robot's local coordinate frame, then learn the appearance of each cell in all directions
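A small sketch of this observation model (the names and the 4-direction encoding are assumptions): the 4 observation bits are treated as independent Bernoulli variables whose parameters are the learned appearance of the cell, rotated into the robot's frame.

```python
import numpy as np

def obs_likelihood(y, cell_appearance, orientation):
    """
    y               : observed 4-bit vector (wall = 1 / opening = 0), robot-centered
    cell_appearance : learned P(wall) for the 4 global directions of the cell
    orientation     : robot heading, as an index into the global directions
    """
    # Map the global frame into the robot's local frame by rotation.
    local = np.roll(cell_appearance, -orientation)
    # Naive Bayes: multiply the per-side Bernoulli likelihoods.
    lik = 1.0
    for side in range(4):
        lik *= local[side] if y[side] == 1 else (1.0 - local[side])
    return lik
```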

  24. Example

  25. Inference • Online filtering: the MLE of the abstract and concrete states is the input to the controller • Offline smoothing: exact inference is O(D K^(1.5D) T), where D = # of levels in the hierarchy and K = # of states per level; 1.5D is the size of the largest clique in the DBN (the state nodes at t-1 plus half of the state nodes at t) • Approximation (belief propagation): O(DKT)
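For concreteness, here is the generic forward-filtering step written over the flattened state space (the paper runs inference directly on the DBN; this sketch just shows what one online update computes, at O(S^2) cost per step for S flat states):

```python
import numpy as np

def filter_step(belief, A_flat, obs_lik):
    """
    belief  : (S,)   P(X_{t-1} | y_{1:t-1}, a_{1:t-1})
    A_flat  : (S, S) transition matrix for the executed action
    obs_lik : (S,)   likelihood of the new observation in each state
    """
    predicted = A_flat.T @ belief     # predict: sum over previous states
    updated = obs_lik * predicted     # correct with the new observation
    return updated / updated.sum()    # normalize back to a belief state
```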

  26. Learning • Maximum-likelihood parameter estimation using EM • E step: compute the expected counts of the transitions, exits, entries, and observations • M step: normalize the matrices of expected counts to obtain the new parameters
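A generic sketch of the M-step bookkeeping (not the paper's exact re-estimation formulas): the expected counts accumulated in the E step are normalized row-wise to give the new conditional probability tables.

```python
import numpy as np

def m_step(expected_counts, eps=1e-12):
    """expected_counts[a] is an (S, S) matrix of E[# transitions i -> j under action a]."""
    new_T = {}
    for a, counts in expected_counts.items():
        row_sums = counts.sum(axis=1, keepdims=True) + eps   # avoid division by zero
        new_T[a] = counts / row_sums                         # rows become P(j | i, a)
    return new_T
```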

  27. Learning (cont.) • Re-estimation (normalized expected counts) of: the concrete horizontal transition matrix, the exit probabilities, and the vertical transition vector

  28. Estimating the Observation Model • Map local observations into world-centered coordinates using the robot's orientation • [figure: the probability of observing y when facing north]

  29. The Hierarchical Model Localizes Better • [figure: localization performance, before training, of the state POMDP, the state H-POMDP, and the factored DBN H-POMDP]

  30. Conclusions • H-POMDPs can be represented as DBNs • Large models can be learned with less data • Difference from SLAM: SLAM is harder to generalize

  31. Complexity of Inference • [figure: number of states in the state H-POMDP vs. the factored DBN H-POMDP]
