
Lookahead pathology in real-time pathfinding






Presentation Transcript


  1. Lookahead pathology in real-time pathfinding Mitja Luštrek Jožef Stefan Institute, Department of Intelligent Systems Vadim Bulitko University of Alberta, Department of Computer Science

  2. Introduction • Problem • Explanation • Remedy

  3. Real-time single-agent heuristic search • Task: • find a path from a start state to a goal state • Complete search: • plan the whole path to the goal state • execute the plan • example: A* [Hart et al. 68] • good: given an admissible heuristic, the path is optimal • bad: the delay before the first move can be large

  4. Real-time single-agent heuristic search • Incomplete search: • plan a part of the path to the goal • execute the plan • repeat • example: LRTA* [Korf 90], LRTS [Bulitko & Lee 06] • good: delay before the first move small, amount of planning per move bounded • bad: the path is typically not optimal
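
A minimal sketch of such an incomplete-search loop (depth-1, in the spirit of LRTA*) is below. The grid representation and the neighbors/cost/h0 helpers are illustrative assumptions, not the LRTS code used in the talk.

```python
# Sketch of a depth-1 real-time search loop in the spirit of LRTA*.
# `neighbors`, `cost` and `h0` are assumed helpers, not the talk's LRTS code.

def real_time_search(start, goal, neighbors, cost, h0, max_steps=100_000):
    """Repeatedly plan one move, update the heuristic, and execute the move."""
    h = dict(h0)                      # learned heuristic, updated along the way
    current, path = start, [start]
    for _ in range(max_steps):
        if current == goal:
            return path
        # Plan: pick the neighbour minimising move cost + estimated cost-to-go.
        best = min(neighbors(current),
                   key=lambda s: cost(current, s) + h.get(s, 0.0))
        # Learn: raise h(current) so the agent cannot loop forever.
        h[current] = max(h.get(current, 0.0),
                         cost(current, best) + h.get(best, 0.0))
        # Execute the single planned move, then repeat.
        current = best
        path.append(current)
    return path                       # step budget exhausted before the goal
```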

  5. Why do we need it? • Picture a real-time strategy game • The user commands dozens of units to move towards a distant goal • Complete search would have to compute the whole paths for all of them • Incomplete search computes just the first couple of steps

  6. Heuristic lookahead search (figure: the lookahead area of depth d around the current state, with the goal state outside the area)

  7. Heuristic lookahead search (figure: a frontier state scored by f = g + h, where g is the true shortest distance from the current state and h is the estimated shortest distance to the goal)

  8. Heuristic lookahead search (figure: the frontier state with the lowest f, denoted fopt)

  9. Heuristic lookahead search

  10. Heuristic lookahead search (figure: the current state's heuristic is updated to h = fopt)

  11. Heuristic lookahead search
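
The planning step sketched on slides 6-11 can be written roughly as below. This is an illustrative reconstruction, not the talk's implementation: the lookahead area is expanded Dijkstra-style, frontier states are scored by f = g + h, the current state's heuristic is raised to the lowest frontier f (fopt), and the best frontier state becomes the movement target. For simplicity the sketch bounds the area by path cost rather than by number of moves, and states are assumed to be comparable tuples such as (x, y).

```python
import heapq

def lookahead_step(current, goal, neighbors, cost, h, depth):
    """One lookahead of depth `depth`: return (best frontier state, fopt)
    and update h[current] to fopt. All helpers are assumed for illustration."""
    g = {current: 0.0}                        # true distances inside the area
    frontier, open_heap = [], [(0.0, current)]
    while open_heap:
        g_s, s = heapq.heappop(open_heap)
        if g_s > g.get(s, float("inf")):
            continue                          # stale heap entry
        if g_s >= depth or s == goal:         # state lies on the frontier
            frontier.append(s)
            continue
        for n in neighbors(s):                # expand interior states
            new_g = g_s + cost(s, n)
            if new_g < g.get(n, float("inf")):
                g[n] = new_g
                heapq.heappush(open_heap, (new_g, n))
    if not frontier:                          # degenerate case: tiny closed area
        frontier = list(g)
    best = min(frontier, key=lambda s: g[s] + h.get(s, 0.0))  # lowest f = g + h
    f_opt = g[best] + h.get(best, 0.0)
    h[current] = max(h.get(current, 0.0), f_opt)              # h = fopt (slide 10)
    return best, f_opt
```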

  12. Lookahead pathology • It is generally believed that larger lookahead depths produce better solutions • Solution-length pathology: larger lookahead depths produce worse solutions (figure: example with degree of pathology = 2)

  13. Lookahead pathology • Pathology measured on states that do not form a path • Error pathology: larger lookahead depths produce more suboptimal decisions (figure: example with degree of pathology = 2 – there is pathology)
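
One plausible reading of "degree of pathology = 2" (my reading of the example, not a formula stated on the slide) is the number of times the measure worsens as the lookahead depth grows by one step:

```python
def degree_of_pathology(measure_by_depth):
    """Count how often the measure (solution length or error) gets worse when
    the lookahead depth increases by one. A positive count means pathology."""
    return sum(1 for shallow, deep in zip(measure_by_depth, measure_by_depth[1:])
               if deep > shallow)

# Hypothetical example: lengths 20, 18, 21, 19, 23 over depths 1..5 worsen
# twice, so the degree of pathology is 2.
assert degree_of_pathology([20, 18, 21, 19, 23]) == 2
```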

  14. Related: minimax pathology • Minimax backs up heuristic values from the leaves of the game tree to the root • Attempts to explain why backed-up heuristic values are better than static values • Theoretical analyses show that they are worse – pathology [Nau 79, Beal 80] • Explanations: • similarity of nearby positions in real games • realistic modeling of error • ... • Focus on why the pathology does not appear in practice

  15. Related: pathology in single-agent search • Discovered on synthetic search trees [Bulitko et al. 03] • Observed in eight puzzle [Bulitko 03] • appears with different evaluation functions • shown that the benefit from knowing the optimal lookahead depth is large • Explained on synthetic search trees [Luštrek 05] • caused by certain properties of trees • caused by inconsistent and inadmissible heuristics • Unexplored in pathfinding

  16. Introduction • Problem • Explanation • Remedy

  17. Our setting • HOG – Hierarchical Open Graph [Sturtevant et al.] • Maps from commercial computer games (Baldur’s Gate, Warcraft III) • Initial heuristic: octile distance (true distance assuming an empty map) • 1,000 problems (map, start state, goal state)
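
The octile distance used as the initial heuristic is the true path length on an empty 8-connected grid; a standard way to compute it (the coordinate representation is an assumption for illustration) is:

```python
import math

def octile_distance(a, b):
    """Shortest-path length between cells a = (x1, y1) and b = (x2, y2) on an
    empty map with 8-connected moves: diagonal steps cost sqrt(2), straight
    steps cost 1."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return max(dx, dy) + (math.sqrt(2) - 1) * min(dx, dy)
```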

  18. On-policy experiments • The agent follows a path from the start state to the goal state, updating the heuristic along the way • Solution length and error over the whole path computed for each lookahead depth -> pathology (figure: example paths for d = 1, 2, 3)

  19. Off-policy experiments • The agent spawns in a number of states • It takes one move towards the goal state • Heuristic not updated • Error is computed from these first moves -> pathology (figure: first moves chosen at depths d = 1, 2, 3)
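
The two protocols differ only in what is measured and whether the heuristic is updated. A rough sketch, with run_lrts, first_move and optimal_move as assumed helpers (they are not named in the talk):

```python
def on_policy_error(start, goal, depths, run_lrts, optimal_move):
    """On-policy (slide 18): follow the whole path produced by the learning
    agent at each depth and count suboptimal moves along it."""
    errors = {}
    for d in depths:
        path = run_lrts(start, goal, depth=d)          # heuristic updated inside
        wrong = sum(1 for s, nxt in zip(path, path[1:])
                    if nxt != optimal_move(s, goal))
        errors[d] = wrong / max(len(path) - 1, 1)
    return errors

def off_policy_error(states, goal, depths, first_move, optimal_move):
    """Off-policy (slide 19): spawn the agent in fixed states, take a single
    move per depth with no heuristic update, and count suboptimal first moves."""
    return {d: sum(first_move(s, goal, depth=d) != optimal_move(s, goal)
                   for s in states) / len(states)
            for d in depths}
```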

  20. Basic on-policy experiment • A lot of pathology – over 60%! • First explanation: a lot of states are intrinsically pathological (off-policy mode) • Not true: only 3.9% are • If the topology of the maps is not at fault, perhaps the algorithm is to blame?

  21. Off-policy experiment on 188 states • Comparison not fair: • On-policy: pathology computed from error over a number of states • Off-policy: whether individual states are pathological • Fair: off-policy error over the same number of states as on-policy – 188 (chosen randomly) • Only error can be used – there is no solution length off-policy • Not much less pathology than on-policy: 42.2% vs. 61.5%

  22. Tolerance • The first off-policy experiment showed little pathology, the second one quite a lot • Perhaps off-policy pathology is caused by minor differences in error – noise • Introduce tolerance t: • an increase in error counts towards the pathology only if error (d1) > t ∙ error (d2) for depths d1 > d2 • set t so that the pathology in the off-policy experiment on 188 states is < 5%: t = 1.09
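
With the tolerance in place, a pair of depths contributes to the pathology only if the error at the larger depth exceeds the smaller depth's error by more than the factor t. A sketch of the adjusted test (consecutive depths assumed):

```python
def is_pathological(error_by_depth, t=1.09):
    """An increase in error from a shallower to a deeper lookahead counts
    only if error(deeper) > t * error(shallower); smaller rises are treated
    as noise."""
    return any(deeper > t * shallower
               for shallower, deeper in zip(error_by_depth, error_by_depth[1:]))

# With t = 1.09 a 5% relative rise in error is noise, a 20% rise is not.
assert not is_pathological([0.20, 0.21])
assert is_pathological([0.20, 0.24])
```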

  23. Experiments with t = 1.09 • On-policy changes little vs. t = 1: 57.7% vs. 61.9% • Apparently on-policy pathology is more severe than off-policy • Investigate why! • These two experiments (both with t = 1.09) serve below as the basic on-policy experiment and the basic off-policy experiment

  24. Introduction • Problem • Explanation • Remedy

  25. Hypothesis 1 • LRTS tends to visit pathological states with an above-average frequency • Test: compute pathology from states visited on-policy instead of 188 random states • More pathology than in random states: 6.3% vs. 4.3% • Much less pathology than basic on-policy: 6.3% vs. 57.7% • Hypothesis 1 is correct, but it is not the main reason for on-policy pathology

  26. Is learning the culprit? • There is learning (updating the heuristic) on-policy, but not off-policy • Learning necessary on-policy, otherwise the agent gets caught in infinite loops • Test: traverse paths in the normal on-policy manner, measure error without learning • Less pathology than basic on-policy: 20.2% vs. 57.7% • Still more pathology than basic off-policy: 20.2% vs. 4.3% • Learning is a reason, although not the only one

  27. Hypothesis 2 • Larger fraction of updated states at smaller depths (figure: current lookahead area with updated states marked)

  28. Hypothesis 2 • Smaller lookahead depths benefit more from learning • This makes their decisions better than the mere depth suggests • Thus they are closer to larger depths • If they are closer to larger depths, cases where a larger depth happens to be worse than a smaller depth are more common • Test: equalize depths by learning as much as possible in the whole lookahead area – uniform learning
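
One way to realise uniform learning is sketched below, under my own assumptions about the update rule and the dist_in_area helper, which the talk does not spell out: every state inside the lookahead area has its heuristic raised to the cheapest route through the frontier as seen from that state, instead of updating only the current state.

```python
def uniform_update(interior, frontier, dist_in_area, h):
    """Uniform-learning sketch (slides 29-37): update every interior state of
    the lookahead area, not just the current one. dist_in_area(s, f) is an
    assumed helper giving the shortest distance from s to frontier state f
    inside the area."""
    for s in interior:
        best_f = min(dist_in_area(s, f) + h.get(f, 0.0) for f in frontier)
        h[s] = max(h.get(s, 0.0), best_f)   # never lower the heuristic
```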

  29. Uniform learning

  30. Uniform learning (search step)

  31. Uniform learning (update step)

  32. Uniform learning (search step)

  33. Uniform learning (update step)

  34. Uniform learning

  35. Uniform learning

  36. Uniform learning

  37. Uniform learning

  38. Pathology with uniform learning • Even more pathology than basic on-policy: 59.1% vs. 57.7% • Is Hypothesis 2 wrong? • Let us look at the volume of heuristic updates encountered per state generated during search • This seems to be the best measure of the benefit of learning

  39. Volume of updates encountered (chart: heuristic updates encountered per state generated, by lookahead depth) • Hypothesis 2 is correct after all

  40. Consistency • Initial heuristic is consistent • the difference in heuristic value between two states does not exceed the actual shortest distance between them • Updates make it inconsistent • Research on synthetic trees showed inconsistency causes pathology [Luštrek 05] • Uniform learning preserves consistency • It is more pathological than regular learning • Consistency is not a problem in our case
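
The consistency property from this slide can be checked directly from its definition. An all-pairs sketch with an assumed true_distance oracle (in practice one would check edge by edge):

```python
def is_consistent(h, states, true_distance):
    """True if, for every pair of states, the difference in heuristic values
    does not exceed the true shortest distance between them."""
    return all(abs(h.get(a, 0.0) - h.get(b, 0.0)) <= true_distance(a, b)
               for a in states for b in states)
```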

  41. Hypothesis 3 • On-policy: one search every d moves, so fewer searches at larger depths • Off-policy: one search every move

  42. Hypothesis 3 • The difference between depths in the amount of search is smaller on-policy than off-policy • This makes the depths closer on-policy • If they are closer, cases where a larger depth happens to be worse than a smaller depth are more common • Test: search every move on-policy
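
The two on-policy regimes differ only in how much of each plan is executed before the next search. A sketch with plan(state, depth) as an assumed helper returning the LRTS plan (a list of up to depth moves) from that state:

```python
def run_on_policy(start, goal, depth, plan, search_every_move=False,
                  max_steps=100_000):
    """Basic LRTS executes all (up to `depth`) moves of each plan before
    searching again; the slide-43 test keeps only the first move of every
    plan, so the amount of search per move is the same at all depths."""
    current, path = start, [start]
    while current != goal and len(path) < max_steps:
        moves = plan(current, depth)
        if not moves:
            break                      # no plan available: give up
        if search_every_move:
            moves = moves[:1]          # slide 43: search before every move
        for nxt in moves:              # basic LRTS: execute the whole plan
            current = nxt
            path.append(current)
            if current == goal:
                break
    return path
```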

  43. Pathology when searching every move • Less pathology than basic on-policy: 13.1% vs. 57.7% • Still more pathology than basic off-policy: 13.1% vs. 4.3% • Hypothesis 3 is correct, the remaining pathology due to Hypotheses 1 and 2 • Further test: number of states generated per move

  44. States generated / move (chart: states generated per move, by lookahead depth) • Hypothesis 3 confirmed again

  45. Summary of explanation • On-policy pathology caused by different lookahead depths being closer to each other in terms of decision quality than the mere depths would suggest: • due to the volume of heuristic updates encountered per state generated • due to the number of states generated per move • LRTS tends to visit pathological states with an above-average frequency

  46. Introduction • Problem • Explanation • Remedy

  47. Is a remedy worth looking for? • Optimal lookahead depth selected for each problem: • Solution length = 107.9 • States generated / move = 73.6 • The answer is yes – solution length improved by 38.5%

  48. What can we do? • House + garden • Precompute the optimal depth for every start state
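
A sketch of the precomputation, where solution_length is an assumed helper that runs LRTS from the given start at the given depth and returns the resulting path length:

```python
def optimal_depth_per_start(starts, goal, depths, solution_length):
    """For a fixed map and goal, try every lookahead depth from every start
    state offline and remember the depth that yields the shortest solution."""
    return {start: min(depths, key=lambda d: solution_length(start, goal, d))
            for start in starts}
```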

  49. Optimal depth per start state • Optimal lookahead depth selected for each start state: • Solution length: 132.4 • States generated / move: 59.3 • Results similar to those over the 1,000 problems – the map is representative

  50. Optimal depth per start state
