CMPUT 551 Analyzing abstraction and approximation within MDP/POMDP environment


  1. CMPUT 551 Analyzing abstraction and approximation within MDP/POMDP environment Magdalena Jankowska (M.Sc. - Algorithms) Ilya Levner (M.Sc. - AI/ML)

  2. OUTLINE • Project Overview • Analytical results • Maze Domain • Experiments • Results • Conclusions and Future Work

  3. MDP environment: Maze domain • states • actions • transitions between states • immediate rewards • Markov property

  4. MDP environment: Maze domain • states • actions • transitions between states • immediate rewards • Markov property • → optimal value V* of each state
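
The maze-domain MDP on the slide above can be made concrete with a short value-iteration sketch. This is a hypothetical illustration rather than the project's code; the step reward of -1, the discount of 0.95, and names such as value_iteration and walls are assumptions.

```python
import numpy as np

# Hypothetical sketch of a deterministic grid-maze MDP and value iteration
# to obtain the optimal value V* of each state (not the original project code).
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def value_iteration(walls, goal, gamma=0.95, step_reward=-1.0, tol=1e-6):
    """walls: 2D boolean array, True where a cell is blocked; goal: (x, y) tuple."""
    V = np.zeros(walls.shape)
    while True:
        delta = 0.0
        for x in range(walls.shape[0]):
            for y in range(walls.shape[1]):
                if walls[x, y] or (x, y) == goal:
                    continue  # walls and the goal state keep value 0
                best = -np.inf
                for dx, dy in ACTIONS:
                    nx, ny = x + dx, y + dy
                    # bumping into a wall or the border leaves the agent in place
                    if not (0 <= nx < walls.shape[0] and 0 <= ny < walls.shape[1]) or walls[nx, ny]:
                        nx, ny = x, y
                    best = max(best, step_reward + gamma * V[nx, ny])
                delta = max(delta, abs(best - V[x, y]))
                V[x, y] = best
        if delta < tol:
            return V
```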

  5. Project Overview • Analyze MDP/POMDP domain in the presence of: • State Abstraction • Errors in state transition function • Errors in V* function due to • State Abstraction • Machine Learning • Evaluate effectiveness of lookahead search policy in the presence of errors.

  6. Questions • When is the problem an MDP? • if not MDP: can we recast the Markov property? • limited lookahead: does it help?

  7. No state abstraction, imperfect value function V • MDP • V can now be used as a heuristic • limited lookahead: usually an admissible heuristic function • Combining lookahead with learning: • Learning Real Time A* • Real Time Dynamic Programming
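
The lookahead that uses an imperfect V as a heuristic at the search frontier can be sketched as a depth-limited search. This is only the plain lookahead step that Learning Real Time A* and Real Time Dynamic Programming build on; the helpers step and actions and the discount value are assumptions.

```python
# Hypothetical sketch of depth-limited lookahead using an (imperfect) value
# estimate V as the heuristic at the frontier; "step" and "actions" are
# assumed helpers, not code from the original project.
def lookahead(state, depth, V, step, actions, gamma=0.95):
    """Return the best (value, action) found by a depth-limited search."""
    if depth == 0:
        return V(state), None            # back up the heuristic value
    best_val, best_act = float("-inf"), None
    for a in actions:
        next_state, reward = step(state, a)   # deterministic maze transition
        val, _ = lookahead(next_state, depth - 1, V, step, actions, gamma)
        val = reward + gamma * val
        if val > best_val:
            best_val, best_act = val, a
    return best_val, best_act
```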

  8. “Abstracted” value function • We know where we are, but the value function is the same for all states in the abstracted state G

  9. “Abstracted” value function • In a given abstracted state the value is the average of V* over all states in that abstracted state • not admissible • lookahead may help to get outside the abstraction boundary
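
A small sketch of the “abstracted” value function as defined on the slide above: every cell receives the average of V* over its k-by-k abstraction tile. The tiling scheme and the function name are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch: assign to each cell the average of V* over all cells
# in its k-by-k abstraction tile.
def abstract_value(V_star, k):
    """V_star: 2D array of optimal values; k: abstraction tile size."""
    V_abs = np.empty_like(V_star)
    n, m = V_star.shape
    for i in range(0, n, k):
        for j in range(0, m, k):
            tile = V_star[i:i + k, j:j + k]      # one abstracted state
            V_abs[i:i + k, j:j + k] = tile.mean()
    return V_abs
```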

  10. Does lookahead always help? [maze figure: goal G, lookahead depth 1]

  11. Does lookahead always help? [maze figure: goal G, lookahead depth 1]

  12. Does lookahead always help? [maze figure: goal G, lookahead depth 1]

  13. Does lookahead always help? [maze figure: goal G, lookahead depth 3]

  14. Does lookahead always help? [maze figure: goal G, lookahead depth 3]

  15. Does lookahead always help? [maze figure: goal G, lookahead depth 3]

  16. Does lookahead always help? [maze figure: goal G, lookahead depth 3]

  17. State abstraction • not Markovian • a special case of POMDP • transitions from one abstracted state to another and rewards depend on the history • some special cases when it is Markovian

  18. How to recast the Markov property? • If we know the underlying MDP: updating a belief over states → a fully observed MDP in belief space • solve the belief MDP • use V* of the underlying states as a heuristic • Real-Time Dynamic Programming in belief space
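
When the underlying MDP is known, the belief update mentioned above is a Bayes filter over the concrete states. The sketch below assumes the observation is simply which abstracted state the agent is in; the names P and consistent are hypothetical.

```python
import numpy as np

# Hypothetical sketch of a belief update over the underlying MDP states when
# only the abstracted state is observed.
def belief_update(belief, action, observed_abstract, P, consistent):
    """
    belief: length-N probability vector over concrete states
    P:      P[action] is an N x N matrix with P[action][s, s2] = Pr(s2 | s, action)
    consistent(s, observed_abstract) -> True if concrete state s lies in the
    observed abstracted state
    """
    predicted = belief @ P[action]                       # propagate through the model
    mask = np.array([consistent(s, observed_abstract)    # keep only states that
                     for s in range(len(predicted))])    # match the observation
    posterior = predicted * mask
    total = posterior.sum()
    return posterior / total if total > 0 else posterior
```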

  19. How to recast the Markov property? • If we do not know the underlying MDP: use the history as part of the state description • How long a history do we need to use? • In general: the whole history • Special cases: only a part
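
One minimal way to fold history into the state description, assuming a fixed-length window is enough (in general the whole history may be needed); the class name is illustrative.

```python
from collections import deque

# Hypothetical sketch: restore the Markov property by treating a bounded
# window of recent (action, observation) pairs as the agent's state.
class HistoryState:
    def __init__(self, length):
        self.window = deque(maxlen=length)   # keep only the last `length` pairs

    def update(self, action, observation):
        self.window.append((action, observation))
        return tuple(self.window)            # hashable key usable as an MDP state
```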

  20. Error in transition function • Can be crucial • Agent can be easily trapped in loops

  21. Error in the transition function example: no state abstraction, perfect V* [corridor figure: states 1–9 and goal state 10 (G); two actions: right and left]

  22. Error in the transition function example: no state abstraction, perfect V* [corridor figure; real transitions: each action succeeds 100% of the time]

  23. Error in the transition function example: no state abstraction, perfect V* [corridor figure; real transitions are deterministic (100%), but the agent's model assigns 35%/65% and 65%/35% to the outcomes of the two actions]
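
The corridor example can be replayed as a tiny simulation: the real dynamics are deterministic, but a greedy one-step policy computed with the mistaken 35%/65% model keeps picking the action it wrongly believes leads toward the goal and never reaches it. The layout (goal at state 10, start at state 1) and the helper names are illustrative assumptions.

```python
# Hypothetical simulation in the spirit of the corridor example: perfect V*,
# deterministic real transitions, but a mistaken transition model.
GOAL = 10
V_star = {s: -(GOAL - s) for s in range(1, GOAL + 1)}   # perfect values: -distance to goal

def real_step(s, action):                 # real, deterministic dynamics (100%)
    return min(GOAL, s + 1) if action == "right" else max(1, s - 1)

def believed_next(s, action):             # mistaken model the agent plans with
    if action == "right":
        return [(0.35, min(GOAL, s + 1)), (0.65, max(1, s - 1))]
    return [(0.65, min(GOAL, s + 1)), (0.35, max(1, s - 1))]

def greedy_action(s):
    def expected_value(action):
        return sum(p * V_star[s2] for p, s2 in believed_next(s, action))
    return max(["right", "left"], key=expected_value)

s = 1
for t in range(10):
    a = greedy_action(s)                  # always "left": the wrong model prefers it
    s = real_step(s, a)                   # really moves the agent away from the goal
    print(t, a, s)                        # the agent stays trapped near state 1
```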

  24. Experimental Setup • 48x48 cell maze • 3 Experiments • State Abstraction • Machine Learning (ANN) • State Abstraction and Machine Learning • Error Measurements • Relative Score (global policy error) • Distance to goal (sample score error)

  25. State Abstraction Error(s) • Abstraction Tile size varied • k = 1, 2, 3, 4, 6, 8, 12, 24, 48 • Ply Depth 1 – 7 @ 10 games/ply depth

  26. Machine Learning Error • 2 – h – 100 ANN, inputs (x,y), output V*(s) • Error achieved by varying the number of hidden nodes (h) within the ANN (1–20)
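
The slide describes a small feed-forward network with inputs (x, y) and output V*(s). As a hypothetical stand-in for the original ANN, the sketch below uses scikit-learn's MLPRegressor with a single hidden layer of h units, so the approximation error can be controlled by varying h.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical sketch of the value-function approximation step; the original
# project's ANN implementation is not shown in the slides.
def fit_value_network(V_star, h):
    """V_star: 2D array of optimal values over the maze grid; h: hidden units."""
    xs, ys = np.meshgrid(np.arange(V_star.shape[0]),
                         np.arange(V_star.shape[1]), indexing="ij")
    X = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)   # inputs (x, y)
    y = V_star.ravel()                                            # targets V*(s)
    net = MLPRegressor(hidden_layer_sizes=(h,), max_iter=5000)
    return net.fit(X, y)   # varying h varies the approximation error
```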

  27. State Abstraction + ML Error(s)

  28. Conclusion Most important results: • analysis of lookahead for the “abstracted” value function, especially experimentally • demonstration of possible adverse effects of errors in the transition function • answers to questions about the Markov property and an investigation of ways to restore it

  29. Future Work • Improve Policy Error Evaluation Measures • Further analytical work on lookahead
