Monte Carlo Tree Search: Insights and Applications (BCS Real AI Event)

Simon Lucas

Game Intelligence Group

University of Essex

- General machine intelligence: the ingredients
- Monte Carlo Tree Search
- A quick overview and tutorial

- Example application: Mapello
- Note: Game AI is Real AI!!!

- Example test problem: Physical TSP
- Results of open competitions
- Challenges and future directions

- Evolution
- Reinforcement Learning
- Function approximation
- Neural nets, N-Tuples etc

- Selective search / Sample based planning / Monte Carlo Tree Search

- Minimax with alpha-beta pruning, transposition tables
- Works well when:
- A good heuristic value function is known
- The branching factor is modest

- E.g. Chess: Deep Blue, Rybka
- Super-human on a smartphone!
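For reference, a minimal sketch of minimax with alpha-beta pruning, the approach described above. The game interface (`legal_moves`, `apply`, `is_terminal`, `heuristic_value`) is a hypothetical stand-in, not code from the talk:

```python
def alphabeta(state, depth, alpha, beta, maximizing):
    """Minimax with alpha-beta pruning over a hypothetical game interface."""
    # Fall back to the heuristic value function at the depth limit or at a terminal state.
    if depth == 0 or state.is_terminal():
        return state.heuristic_value()
    if maximizing:
        value = float('-inf')
        for move in state.legal_moves():
            value = max(value, alphabeta(state.apply(move), depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:   # beta cut-off: the opponent will avoid this line
                break
        return value
    else:
        value = float('inf')
        for move in state.legal_moves():
            value = min(value, alphabeta(state.apply(move), depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:   # alpha cut-off
                break
        return value
```

Both conditions listed above show up directly in the code: the depth-limit fallback needs a good heuristic value function, and the loop over legal moves is where a high branching factor hurts.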

- Tree grows exponentially with search depth

- Much tougher for computers
- High branching factor
- No good heuristic value function
- MCTS to the rescue!

“Although progress has been steady, it will take many decades of research and development before world-championship-calibre Go programs exist.” (Jonathan Schaeffer, 2001)

Monte Carlo Tree Search (MCTS) and Upper Confidence Bounds for Trees (UCT)

- Anytime
- Scalable
- Tackle complex games and planning problems better than before
- May be logarithmically better with increased CPU

- No need for heuristic function
- Though usually better with one

- Next we’ll look at:
- General MCTS
- UCT in particular

- Tree policy: choose which node to expand (not necessarily a leaf of the tree)
- Default (simulation) policy: random playout until end of game

- Decompose into 6 parts:
- MCTS main algorithm
- Tree policy
- Expand
- Best Child (UCT Formula)

- Default Policy
- Back-propagate

- Tree policy
- We’ll run through these then show demos
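To make the run-through concrete, here is a minimal Python sketch of the main algorithm. The `Node` class and the game interface (`legal_moves`, `apply`, `is_terminal`) are illustrative assumptions, not code from the talk; `tree_policy`, `default_policy`, `backup`, and the final-selection helper are sketched under the corresponding parts below.

```python
import math
import random

class Node:
    """One node of the MCTS tree (illustrative structure)."""
    def __init__(self, state, parent=None, move=None):
        self.state = state                             # game state at this node
        self.parent = parent
        self.move = move                               # move that led here from the parent
        self.children = []
        self.untried_moves = list(state.legal_moves())
        self.visits = 0
        self.total_reward = 0.0

def mcts_search(root_state, n_iterations):
    """MCTS main algorithm: repeat select / expand / simulate / back-propagate."""
    root = Node(root_state)
    for _ in range(n_iterations):
        v = tree_policy(root)               # selection + expansion
        reward = default_policy(v.state)    # random playout from the new node
        backup(v, reward)                   # propagate the result up to the root
    return select_final_move(root)          # final BestChild call at the root
```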

- BestChild simply picks the best child node of the root according to some criterion, e.g. best mean value
- In our pseudo-code BestChild is called from TreePolicy and from MctsSearch, but different versions can be used
- E.g. final selection can be the max-value child or the most frequently visited one, as in the sketch below
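Here is that final-selection sketch, with both criteria (the names are illustrative):

```python
def select_final_move(root, criterion="robust"):
    """Final BestChild call from MctsSearch: pick the move actually played."""
    if criterion == "max":    # "max child": best mean value
        chosen = max(root.children, key=lambda c: c.total_reward / c.visits)
    else:                     # "robust child": most frequently visited
        chosen = max(root.children, key=lambda c: c.visits)
    return chosen.move
```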

- Note that node selected for expansion does not need to be a leaf of the tree
- But it must have at least one untried action
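A sketch of TreePolicy and Expand consistent with this: any node that still has an untried action may be expanded, not only leaves. `best_child` is defined with the UCT formula below; the exploration constant is a tunable assumption:

```python
def tree_policy(node):
    """TreePolicy: descend via BestChild until a node with an untried action."""
    while not node.state.is_terminal():
        if node.untried_moves:             # need not be a leaf of the tree
            return expand(node)
        node = best_child(node, c=1.0)     # c is a tunable exploration constant
    return node

def expand(node):
    """Expand: add one child for a previously untried action."""
    move = node.untried_moves.pop()
    child = Node(node.state.apply(move), parent=node, move=move)
    node.children.append(child)
    return child
```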

- This is the standard UCT equation
- Used in the tree

- Higher values of c lead to more exploration
- Other terms can be added, and usually are
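The standard UCT equation referred to here selects, among the children $j$ of a node, the one maximising

\[
UCT(j) = \bar{X}_j + c \sqrt{\frac{2 \ln n}{n_j}}
\]

where $\bar{X}_j$ is the mean reward of child $j$, $n$ is the visit count of the parent, $n_j$ the visit count of child $j$, and $c$ the exploration constant. A direct translation into the sketch's terms:

```python
def best_child(node, c):
    """BestChild inside the tree: maximise the UCT value over the children."""
    def uct(child):
        mean = child.total_reward / child.visits                             # exploitation
        explore = c * math.sqrt(2.0 * math.log(node.visits) / child.visits)  # exploration
        return mean + explore
    return max(node.children, key=uct)
```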

- Each time a new node is added to the tree, the default policy randomly rolls out from the current state until a terminal state of the game is reached
- The standard is to do this uniformly randomly
- But better performance may be obtained by biasing with knowledge
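A sketch of the uniformly random default policy; the `reward` method on terminal states is an assumed part of the illustrative game interface:

```python
def default_policy(state):
    """Default policy: roll out uniformly at random to a terminal state."""
    while not state.is_terminal():
        state = state.apply(random.choice(list(state.legal_moves())))
    return state.reward()   # e.g. +1 / 0 / -1 for win / draw / loss
```

Biasing the playout with knowledge would replace the `random.choice` line with a knowledge-weighted sampler.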

- Note that v is the new node added to the tree by the tree policy
- Back up the values from the added node up the tree to the root
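A sketch of back-propagation; the sign flip assumes a two-player zero-sum game in which the reward perspective alternates at each ply:

```python
def backup(v, reward):
    """Back up the playout result from the new node v up the tree to the root."""
    node = v
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        reward = -reward    # alternate perspective (two-player zero-sum assumption)
        node = node.parent
```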

- Additional term in UCT equation:
- Treat actions / moves the same independently of where they occur in the move sequence
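This appears to describe the AMAF/RAVE family of enhancements. A common formulation (an assumption here, since the slide gives no formula) blends the usual value estimate with an all-moves-as-first estimate:

\[
Q^{\star}(s,a) = (1-\beta)\,Q(s,a) + \beta\,Q_{\mathrm{AMAF}}(s,a)
\]

where $Q_{\mathrm{AMAF}}(s,a)$ averages the outcomes of all simulations in which $a$ was played anywhere in the move sequence, and the weight $\beta$ decays towards zero as visit counts grow.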

- Each move must pincer one or more opponent counters between the counter you place and an existing counter of your own colour (see the sketch after this list)
- Pincered counters are flipped to your own colour
- Winner is player with most pieces at the end
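A minimal sketch of the pincer/flip rule. The board representation is an assumption for illustration: a dict mapping cells to 'B', 'W', or None, with off-map cells and obstacles simply absent, which suits Mapello-style maps:

```python
# The eight compass directions on a square grid.
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def pincered(board, row, col, player):
    """Return every opponent counter that placing at (row, col) would flip."""
    flips = []
    for dr, dc in DIRECTIONS:
        line, r, c = [], row + dr, col + dc
        # Walk along a run of opponent counters...
        while (r, c) in board and board[(r, c)] not in (player, None):
            line.append((r, c))
            r, c = r + dr, c + dc
        # ...and keep the run only if it ends on one of our own counters.
        if line and board.get((r, c)) == player:
            flips.extend(line)
    return flips
```

A placement is legal exactly when `pincered` returns a non-empty list.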

- Simple rules
- Balance
- Sense of drama
- Outcome should not be obvious

- Take the counter-flipping drama of Othello
- Apply it to novel situations
- Obstacles
- Power-ups (e.g. triple square score)
- Large maps with power-plays e.g. line fill

- Novel games
- Allow users to design maps that they are expert in
- The map design is part of the game

- Research bonus: large set of games to experiment with

- Give players a challenging game
- Even when the game map can be new each time

- Obvious, easy-to-apply approaches:
- TD Learning
- Monte Carlo Tree Search (MCTS)
- Combinations of these …
- E.g. Silver et al, ICML 2008
- Robles et al, CIG 2011

- Simple algorithm
- Anytime
- No need for a heuristic value function
- Built-in exploration/exploitation (E-E) balance
- Works well across a range of problems

- TDL learns reasonable weights rapidly
- How well will this play at 1 ply versus limited roll-out MCTS?
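For context, a minimal sketch of one TD(0) update for a linear value function, the kind of weight learning TDL refers to; the feature vector `phi` and the learning constants are hypothetical, since the slides give no implementation detail:

```python
def td0_update(weights, phi_s, phi_s_next, reward, alpha=0.01, gamma=1.0):
    """One TD(0) step for V(s) = w . phi(s): move V(s) towards r + gamma * V(s')."""
    v_s = sum(w * f for w, f in zip(weights, phi_s))
    v_next = sum(w * f for w, f in zip(weights, phi_s_next))
    delta = reward + gamma * v_next - v_s            # the TD error
    return [w + alpha * delta * f for w, f in zip(weights, phi_s)]
```

With an N-tuple network, `phi` would be the sparse indicator features addressing the N-tuples' lookup-table entries.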

- Combine MCTS, TDL, N-Tuples

- Coming to Android (November 2012)
- Nestorgames (http://www.nestorgames.com)

- Hard to get long-term planning without good heuristics

- Better handling of problems with continuous action spaces
- Some work already done on this

- Better understanding of handling real-time problems
- Use of approximations and macro-actions

- Stochastic and partially observable problems / games of incomplete and imperfect information
- Hybridisation:
- with evolution
- with other tree search algorithms

- MCTS: a major new approach to AI
- Works well across a range of problems
- Good performance even with vanilla UCT
- Best performance requires tuning and heuristics
- Sometimes the UCT formula is modified or discarded

- Can be used in conjunction with RL
- Self tuning

- And with evolution
- E.g. evolving macro-actions

- http://ptsp-game.net/
- http://www.pacman-vs-ghosts.net/