
Using Plans to Reduce Search



Presentation Transcript


  1. Using Plans to Reduce Search Wei Wei

  2. Topics I’ll try to address • Wilkins’ work on using plans to reduce search in chess • Junghanns and Schaeffer’s work on search in Sokoban • Our attempt to reduce search in solving Sokoban problems

  3. Reduce search in chess • Chess is a point of pride for AI, but the most successful chess programs use brute-force search. • Search is not practical in many other games, such as Go, because of the large branching factor. • Human players use very narrow search trees.

  4. Wilkins’ work • Use knowledge, patterns and planning to control search. • PARADISE (Wilkins 1979) finds the best move in tactically sharp positions. • “Tactically sharp”: success can be judged by the winning of material.

  5. Wilkins’ work, cont. • Finds the right answer in 89 out of 92 positions, some as deep as 26 plies. • Successful in this restricted domain. • The knowledge base is a set of about 200 rules.

  6. Production Rules: An example • ((DMP1) (NEVER (EXISTS (SQ) (PATTERN MOBIL DMP1 SQ))) (NEVER (EXISTS (P1) (PATTERN ENPRIS P1 DMP1))) (ACTION ATTACK ((OTHER-COLOR DMP1) (LOCATION DMP1)) (THREAT (WIN DMP1)) (LIKELY 0))) • This rule captures “a trapped piece”.

  7. Templates in plans • (P SQ) move P to SQ • (NIL SQ) move any piece to SQ • (P NIL) move P to any SQ • (ANYBUT P) move any piece other than P • NIL matches any defensive move
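The template forms on slide 7 can be read as a small pattern-matching scheme. The sketch below is only an illustration of that reading, not PARADISE's code: the function name `matches`, the tuple encoding of moves, and the use of Python's `None` for NIL are all assumptions made here.

```python
from typing import Optional, Tuple

Move = Tuple[str, str]  # (piece, square), e.g. ("WR", "Q7"); encoding assumed here


def matches(template: Optional[tuple], move: Move) -> bool:
    """Return True if a concrete move fits a plan template.

    None stands in for NIL. A bare None template matches any move,
    mirroring "NIL matches any defensive move" on the slide.
    """
    piece, square = move
    if template is None:
        return True
    if template[0] == "ANYBUT":          # (ANYBUT P): any piece other than P
        return piece != template[1]
    t_piece, t_square = template         # (P SQ), (NIL SQ) or (P NIL)
    piece_ok = t_piece is None or t_piece == piece
    square_ok = t_square is None or t_square == square
    return piece_ok and square_ok
```

For example, `matches(("BK", None), ("BK", "K1"))` accepts any king move, while `matches(("ANYBUT", "BK"), ("BK", "K1"))` rejects it.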

  8. A plan produced by PARADISE (((WN N5) (((BN N4)(SAFEMOVE WR Q7) (((BK NIL)(SAFECAPTURE WR BR)) ((ANYBUT BK)(SAFECAPTURE WR BK)))) ((BN N4)(CHECKMOVE WR Q7)(BK NIL)(SAFECAPTURE WR BQ)))) ((THREAT (PLUS(EXCHVAL WN N5)(FORK WR BK BR))) (LIKELY 0)) ((THREAT(PLUS(EXCHVAL WN N5)(EXCH WR BQ))) (LIKELY 0)))

  9. Knowledge Source (KS) • In the previous plan, SAFEMOVE, CHECKMOVE, and SAFECAPTURE are all KSs. • Each KS provides the knowledge needed to understand and reason about one abstract concept.

  10. KS cont. • A KS is a group of productions and a list of variables. • For example, ATTACK is a KS with 2 variables, COL and SQ, and a set of productions that know how to attack SQ for side COL. • PARADISE treats every KS as a subgoal and produces a plan to achieve this subgoal.

  11. Plan • THREAT • SAVE • LOSS • LIKELY: branches. If every step is forcing, the LIKELY value is 0.

  12. Creating plans • The static analysis process posts a THREAT KS. • The THREAT KS posts other KSs, such as MOVE, SAFEMOVE, etc.

  13. Modified search methods • B* search (Berliner 1979): uses ranges to express values. • A plan in PARADISE is a tree. • The tree search is knowledge-controlled rather than parameter-controlled.

  14. B* search • Uses ranges rather than single numbers to express values, which gives alpha-beta pruning more room to work. • Best-first search. • A threshold is defined: whenever the offense wins by 2 pawns, stop the search.
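As a rough illustration of how ranges and a threshold can end a search, the sketch below shows two stopping tests. It is a minimal sketch under stated assumptions, not PARADISE's or Berliner's code: the names `proven_best` and `can_stop`, the dictionary of `(pessimistic, optimistic)` values in pawns, and the way the 2-pawn threshold is combined with the range test are all choices made here.

```python
from typing import Dict, Optional, Tuple

Range = Tuple[float, float]  # (pessimistic, optimistic) value in pawns; assumed encoding


def proven_best(ranges: Dict[str, Range]) -> Optional[str]:
    """Return a move whose pessimistic value is at least every other
    move's optimistic value (the B*-style separation test), else None."""
    for move, (low, _) in ranges.items():
        others = (high for other, (_, high) in ranges.items() if other != move)
        if all(low >= high for high in others):
            return move
    return None


def can_stop(ranges: Dict[str, Range], threshold: float = 2.0) -> bool:
    """Stop if one move is proven best, or if some move is guaranteed
    to win at least `threshold` pawns (the slide's 2-pawn cut-off)."""
    if proven_best(ranges) is not None:
        return True
    return any(low >= threshold for low, _ in ranges.values())
```

With `{"a": (3, 5), "b": (0, 2)}` move `a` is proven best. With `{"a": (2.5, 6), "b": (0, 4)}` the ranges still overlap, but the threshold test alone allows the search to stop.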

  15. The limited domain helps search • The program knows each position is sharp, in the sense that the offense can gain material. • The threshold (2 pawns) helps PARADISE terminate the search, and thus makes it “parameter-controlled”. • It is easier to make plans in sharp positions because more explicit concepts are involved.

  16. Why doesn’t it work in general? • Advantages other than material are hard to capture. • Without a clear threshold, there is no way to terminate a search. • Wilkins did not have sophisticated planners at that time.

  17. Recap: PARADISE • Developed in the late 1970s. • Simplifies the problem by picking “sharp” positions. • Achieves the goal of knowledge-controlled search through planning and complicated pruning techniques. • PARADISE: 10-100 nodes • Brute force: 1000-100,000 nodes

  18. Revisit this problem • We revisit this problem because • It is a core problem in AI. • With the recent advances in searching, planning, and learning, we have more powerful tools than ever.

  19. Why not chess again? • Deep Blue has beaten the human champion. Can we do better? • Chess is a complicated problem with many rules involved. • We prefer a problem with fewer rules that is more closely related to practical use. • A better understanding of how to reduce search will lead to new applications in, e.g., theorem proving and program verification.

  20. Sokoban

  21. A game demo: stage 17

  22. Sokoban is PSPACE-complete • J. Culberson 1997. Proven by using Sokoban to simulate a finite-tape Turing machine. • The complexity of “popular” Sokoban instances, in which all goals are contiguous, is unknown.

  23. Junghanns and Schaeffer’s work • They use A* search plus domain-specific enhancements to solve this problem. • Pure A* solves none of the 90 instances.

  24. What makes it hard for domain-independent methods? • The underlying graph is directed (a push cannot always be undone), so deadlock is possible. • Long solutions (up to 674) and a large branching factor produce a large search space. • Solutions are sequential, and subgoals are interrelated. • No simple lower bound on solution length.

  25. Domain-dependent enhancements • Over 3 years, they have solved 52 out of the 90 instances (instances solved so far in parentheses). • Lower bound (0) • Transposition table (6) • Move ordering (6) • 4×5 deadlock table (8) • Tunnel macros (10)

  26. Domain-dependent enhancements cont. • Goal macros (26) • Pattern search (46) • Relevance cuts (47): not safe • Overestimation (52): not optimal

  27. Insights: • What improves performance most is “dynamic” knowledge (gleaned from search). • Examples: deadlock table, pattern search, transposition table.

  28. Conclusion: • A* search plus all kinds of domain-dependent enhancements can improve the performance dramatically, though still not satisfactory. • Search power, rather than human advice, works.

  29. Our goal: • Use knowledge and planning to reduce search in this domain. • Ideally, we could use learning to acquire the knowledge needed in a short period of exploration. • Junghanns and Schaeffer’s work gives us a good baseline for comparison.

  30. How about current planners? • Blackbox took more than an hour to solve a two-ball instance (and a few seconds to solve a one-ball instance). • Planners are not good at dealing with long-range goal interactions. (McDermott 1998)

  31. Domain knowledge is essential • We need to formalize the knowledge humans have. It is hard to formalize some “easy” concepts. • For example, rooms, tunnels, dead ends, entrances, goal area, etc.

  32. We have a complex definition of room here: • A room is … • Any square in a 2×2 grid is a REG square. • Any non-REG square next to a REG square is a WIR square. • A room is a set of REG or WIR squares such that any two of them are connected by a path that passes only through REG squares.
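One way to make the REG/WIR definition concrete is sketched below. The slide leaves several details open, so this is only one possible reading: it assumes a "2×2 grid" means a 2×2 block of non-wall squares, that "next to" means sharing an edge, and that a room is a connected group of REG squares plus the WIR squares bordering it. The level encoding (`#` for walls) and the function names are hypothetical.

```python
def neighbours(sq):
    """Edge-adjacent squares (assumed meaning of "next to")."""
    r, c = sq
    return [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]


def classify(level):
    """Return (REG, WIR) square sets for a level given as a list of strings."""
    free = {(r, c) for r, row in enumerate(level)
            for c, ch in enumerate(row) if ch != "#"}
    reg = set()
    for (r, c) in free:
        block = {(r, c), (r + 1, c), (r, c + 1), (r + 1, c + 1)}
        if block <= free:            # every square of the 2x2 block is non-wall
            reg |= block
    wir = {sq for sq in free - reg if any(n in reg for n in neighbours(sq))}
    return reg, wir


def rooms(level):
    """Group REG squares into connected components and attach bordering WIR squares."""
    reg, wir = classify(level)
    seen, result = set(), []
    for start in sorted(reg):
        if start in seen:
            continue
        component, stack = set(), [start]
        seen.add(start)
        while stack:                 # flood fill that moves through REG squares only
            sq = stack.pop()
            component.add(sq)
            for n in neighbours(sq):
                if n in reg and n not in seen:
                    seen.add(n)
                    stack.append(n)
        border = {w for w in wir if any(n in component for n in neighbours(w))}
        result.append(component | border)
    return result


LEVEL = [
    "#########",
    "#  ###  #",
    "#       #",
    "#########",
]
```

On `LEVEL`, the two 2×2 areas become separate rooms, each with one WIR square at its entrance, and the middle corridor square belongs to neither room.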

  33. Why do we need rooms?

  34. An essential concept: deadlock • If we could decide deadlock exactly, we could always take as the next goal “push one ball into a goal without causing deadlock”, and from a solvable position that goal is always achievable. • So deciding deadlock is as hard as solving Sokoban itself: PSPACE-complete. • Still, we need to recognize “local” deadlocks.
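The simplest familiar example of a “local” deadlock is a ball stuck in a corner: with a wall on one vertical side and one horizontal side, no push can ever move it. The sketch below checks only that case and is not taken from the slides. The level encoding (`#` wall, `.` goal, space for floor), the function name, and the assumption that the level is fully enclosed by walls are all choices made here.

```python
def corner_deadlock(level, ball):
    """Return True if a ball on a non-goal square is wedged in a corner.

    `level` is a list of equal-length strings; `ball` is (row, col).
    Assumes the ball is on an interior square of a wall-enclosed level,
    so the four neighbouring squares always exist.
    """
    r, c = ball
    if level[r][c] == ".":           # a ball resting on a goal is not a problem
        return False
    blocked_vertically = level[r - 1][c] == "#" or level[r + 1][c] == "#"
    blocked_horizontally = level[r][c - 1] == "#" or level[r][c + 1] == "#"
    return blocked_vertically and blocked_horizontally


SMALL_LEVEL = [
    "#####",
    "#  .#",
    "#   #",
    "#####",
]
```

In `SMALL_LEVEL`, a ball at `(1, 1)` is deadlocked, a ball at `(1, 3)` sits on the goal, and a ball at `(2, 2)` can still slide along the bottom wall.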

  35. Deadlocks

  36. How to detect deadlocks (figure with panels a and b)

  37. More complicated situations (figure with panels a and b)

  38. Deadlocks cont. (figure with panels a and b)

  39. Deadlocks: substitution rules • Classes: Wall > Ball > Empty > Goal • Replacing a lower-class square with a higher-class square preserves a deadlock.
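The substitution rule suggests a simple way to reuse one known deadlock pattern for many board regions: a region is also deadlocked if each of its squares is of the same or a higher class than the corresponding pattern square. The sketch below encodes just that comparison. The character encoding (`.` goal, space for empty, `$` ball, `#` wall), the flattened-string representation, and the function name are assumptions for illustration.

```python
# Class ordering from the slide: Wall > Ball > Empty > Goal.
RANK = {".": 0, " ": 1, "$": 2, "#": 3}


def preserves_deadlock(pattern: str, region: str) -> bool:
    """Return True if `region` can be obtained from the known deadlock
    `pattern` by only replacing squares with same-or-higher-class squares."""
    if len(pattern) != len(region):
        return False
    return all(RANK[new] >= RANK[old] for old, new in zip(pattern, region))
```

Taking four balls in a 2×2 block (`"$$$$"`, read row by row) as the known deadlock, `preserves_deadlock("$$$$", "$#$#")` holds because two balls were replaced by walls, while `preserves_deadlock("$$$$", "$ $$")` does not because a ball was replaced by an empty square.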

  40. Deadlock: another method • Proposed in Junghanns and Schaeffer, 1999. • Basic idea: solve the one-ball problem; if there are balls in either the ball path or the man path, add those balls and solve it again. • Shrink: after finding a deadlock, try all proper subsets to find smaller deadlocks.
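The shrink step can be sketched independently of how deadlocks are actually tested. In the code below, `is_deadlock` is a hypothetical caller-supplied test (in the method described on the slide it would be a search), and `shrink` simply tries every proper non-empty subset, smallest first. This is an illustrative sketch, not Junghanns and Schaeffer's implementation.

```python
from itertools import combinations


def shrink(balls, is_deadlock):
    """Return a smallest subset of `balls` that `is_deadlock` still reports
    as deadlocked, or all of `balls` if no proper subset qualifies."""
    ordered = sorted(balls)
    for size in range(1, len(ordered)):          # proper subsets only
        for subset in combinations(ordered, size):
            if is_deadlock(frozenset(subset)):
                return frozenset(subset)
    return frozenset(ordered)


def toy_test(subset):
    """Toy stand-in for a real deadlock test: deadlocked iff both a and b are present."""
    return {"a", "b"} <= subset
```

Here `shrink({"a", "b", "c"}, toy_test)` returns `frozenset({"a", "b"})`. The number of subsets grows exponentially with the number of balls, so each call to the real test adds up quickly.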

  41. Deadlock: another method, cont. • Advantage: find some global deadlocks. • Disadvantages: • The method is neither sufficient nor necessary • Computationally expensive

  42. Why do we need logic? • The database has a cut-off size, so it never solves problems like: (figure)

  43. Tasks: • Formalize the knowledge humans use • Incorporate all the knowledge into a planner • Find a planner suitable for a large amount of domain knowledge • Hopefully, beat brute-force methods • Can we learn this knowledge automatically?

  44. Difficulties • Hard to formalize the vague concepts • No current planner can generate long plans • Category III rules cannot be captured in constraint-based planners (Huang et al. 1999). Category III: control that depends on the current state and requires dynamic user-defined predicates.
