Monte Carlo Tree Search: Insights and Applications

BCS Real AI Event

Simon Lucas

Game Intelligence Group

University of Essex


Outline

  • General machine intelligence: the ingredients

  • Monte Carlo Tree Search

    • A quick overview and tutorial

  • Example application: Mapello

    • Note: Game AI is Real AI !!!

  • Example test problem: Physical TSP

  • Results of open competitions

  • Challenges and future directions


General Machine Intelligence: the ingredients

  • Evolution

  • Reinforcement Learning

  • Function approximation

    • Neural nets, N-Tuples etc

  • Selective search / Sample based planning / Monte Carlo Tree Search


Conventional Game Tree Search

  • Minimax with alpha-beta pruning, transposition tables

  • Works well when:

    • A good heuristic value function is known

    • The branching factor is modest

  • E.g. Chess: Deep Blue, Rybka

    • Super-human on a smartphone!

  • Tree grows exponentially with search depth
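As a point of comparison for what follows, here is a minimal sketch of minimax with alpha-beta pruning. It is not taken from the talk: the `children` and `evaluate` hooks and the nested-list toy tree are assumptions made for illustration, and transposition tables are left out.

```python
import math

def alphabeta(node, depth, alpha, beta, maximizing, children, evaluate):
    """Minimax value of `node`, skipping branches that cannot change the result.

    `children(node)` lists successor positions and `evaluate(node)` is the
    heuristic value function; both are caller-supplied (hypothetical) hooks.
    """
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    if maximizing:
        value = -math.inf
        for child in kids:
            value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                         False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # cut-off: the minimizer already has a better option
        return value
    value = math.inf
    for child in kids:
        value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                     True, children, evaluate))
        beta = min(beta, value)
        if alpha >= beta:
            break  # cut-off: the maximizer already has a better option
    return value

# Toy two-ply tree as nested lists; the integers are heuristic leaf values.
TREE = [[3, 5], [2, 9], [0, 1]]

def children(node):
    return node if isinstance(node, list) else []

def evaluate(node):
    return node

best = alphabeta(TREE, 2, -math.inf, math.inf, True, children, evaluate)
```

In this toy tree the second and third subtrees are each abandoned after one leaf, because the first subtree already guarantees the maximizer a value of 3. The sketch depends entirely on `evaluate`, which is the ingredient the slide says must be good for this approach to work.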


Go

  • Much tougher for computers

  • High branching factor

  • No good heuristic value function

  • MCTS to the rescue!

“Although progress has been steady, it will take many decades of research and development before world-championship–calibre go programs exist”. Jonathan Schaeffer, 2001


Monte Carlo Tree Search (MCTS), Upper Confidence bounds for Trees (UCT)

Further reading:


Attractive Features

  • Anytime

  • Scalable

    • Tackle complex games and planning problems better than before

    • Playing strength may improve logarithmically with increased CPU time

  • No need for heuristic function

    • Though usually better with one

  • Next we’ll look at:

    • General MCTS

    • UCT in particular


MCTS: the main idea

  • Tree policy: choose which node to expand (not necessarily leaf of tree)

  • Default (simulation) policy: random playout until end of game


MCTS Algorithm

  • Decompose into 6 parts:

  • MCTS main algorithm

    • Tree policy

      • Expand

      • Best Child (UCT Formula)

    • Default Policy

    • Back-propagate

  • We’ll run through these then show demos


MCTS Main Algorithm

  • BestChild simply picks the best child node of the root according to some criterion, e.g. best mean value

  • In our pseudo-code BestChild is called from TreePolicy and from MctsSearch, but different versions can be used

    • E.g. final selection can be the max value child or the most frequently visited one
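The slide's pseudo-code is not part of this transcript, so the following is only a sketch of how the named parts (TreePolicy, DefaultPolicy, Backup, BestChild) might fit together. The `NimState` toy game, the win/loss reward convention and the default value of `c` are assumptions chosen to keep the example small.

```python
import math
import random

class NimState:
    """Toy game (assumed for illustration): take 1-3 stones; taking the last stone wins."""
    def __init__(self, stones, player=1):
        self.stones, self.player = stones, player  # player to move: 1 or 2
    def actions(self):
        return [k for k in (1, 2, 3) if k <= self.stones]
    def next(self, k):
        return NimState(self.stones - k, 3 - self.player)
    def is_terminal(self):
        return self.stones == 0
    def winner(self):
        return 3 - self.player  # the player who just moved took the last stone

class Node:
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children, self.untried = [], state.actions()
        self.n, self.q = 0, 0.0  # visit count and total reward

def best_child(v, c):
    return max(v.children,
               key=lambda w: w.q / w.n + c * math.sqrt(2 * math.log(v.n) / w.n))

def tree_policy(v, c):
    while not v.state.is_terminal():
        if v.untried:  # expand: add one child for an untried action
            a = v.untried.pop()
            child = Node(v.state.next(a), v, a)
            v.children.append(child)
            return child
        v = best_child(v, c)
    return v

def default_policy(state, rng):
    while not state.is_terminal():  # uniformly random playout
        state = state.next(rng.choice(state.actions()))
    return state.winner()

def backup(v, winner):
    while v is not None:
        v.n += 1
        # Credit a win to v if the player who moved into v won the playout.
        if v.parent is not None and winner == v.parent.state.player:
            v.q += 1.0
        v = v.parent

def mcts_search(root_state, iterations, c=1 / math.sqrt(2), seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(iterations):
        v = tree_policy(root, c)
        winner = default_policy(v.state, rng)
        backup(v, winner)
    return best_child(root, 0).action  # c = 0: final choice by best mean value

move = mcts_search(NimState(5), iterations=2000)
```

Calling `best_child(root, 0)` at the end corresponds to the "max value child" option mentioned above; returning `max(root.children, key=lambda w: w.n).action` instead would give the "most frequently visited" option. With a pile of 5, perfect play is to take 1 stone, and the search would be expected to favour that move as the iteration count grows.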


TreePolicy

  • Note that node selected for expansion does not need to be a leaf of the tree

  • But it must have at least one untried action
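A small sketch of the selection rule described above: descend through fully expanded nodes and stop at the first node that still has an untried action, even if that node already has children. The hand-built tree, the node fields and the statistics are all made up for illustration.

```python
import math

class Node:
    def __init__(self, name, untried=(), terminal=False):
        self.name = name
        self.untried = list(untried)  # actions not yet expanded from this node
        self.terminal = terminal
        self.children = []
        self.parent = None
        self.n, self.q = 0, 0.0       # visit count and total reward

    def add(self, child, n, q):
        child.parent, child.n, child.q = self, n, q
        self.children.append(child)
        return child

def uct(parent, child, c):
    return child.q / child.n + c * math.sqrt(2 * math.log(parent.n) / child.n)

def select_for_expansion(root, c=1 / math.sqrt(2)):
    """Follow the best UCT child while the current node is fully expanded."""
    v = root
    while not v.terminal and not v.untried:
        v = max(v.children, key=lambda w: uct(v, w, c))
    return v

# Hand-built tree: "a" is an internal node that still has one untried action.
root = Node("root")
root.n = 10
a = root.add(Node("a", untried=["a3"]), n=7, q=5.0)
b = root.add(Node("b", untried=["b1", "b2"]), n=3, q=1.0)
a.add(Node("a1", untried=["x"]), n=4, q=3.0)
a.add(Node("a2", untried=["y"]), n=2, q=1.0)

chosen = select_for_expansion(root)
```

Here `chosen` is node "a", which already has two children, so the node picked for expansion is not a leaf. The Expand step would then add a child for its remaining action "a3".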


Expand


Best Child (UCT)

  • This is the standard UCT equation

    • Used in the tree

  • Higher values of c lead to more exploration

  • Other terms can be added, and usually are

    • More on this later
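The equation itself did not survive in this transcript. The sketch below assumes the form used in the Browne et al. survey cited later in the talk, Q(v')/N(v') + c * sqrt(2 ln N(v) / N(v')), and shows how the constant `c` shifts the choice. The action names and statistics are invented.

```python
import math

def uct_value(q_child, n_child, n_parent, c):
    """Mean reward of the child plus an exploration bonus scaled by c."""
    return q_child / n_child + c * math.sqrt(2.0 * math.log(n_parent) / n_child)

def best_child(stats, n_parent, c):
    """`stats` maps action -> (total reward Q, visit count N); returns the argmax action."""
    return max(stats, key=lambda a: uct_value(stats[a][0], stats[a][1], n_parent, c))

# One child with a good, well-sampled mean and one that has barely been tried.
stats = {"well_explored": (60.0, 100), "rarely_tried": (2.0, 5)}

greedy = best_child(stats, n_parent=105, c=0.0)   # pure exploitation
curious = best_child(stats, n_parent=105, c=1.0)  # exploration bonus dominates
```

With `c = 0` the child with the higher mean (0.6 against 0.4) is chosen. With `c = 1` the bonus for the rarely visited child outweighs the difference in means, which is the "more exploration" effect noted above.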


DefaultPolicy

  • Each time a new node is added to the tree, the default policy randomly rolls out from the current state until a terminal state of the game is reached

  • The standard is to do this uniformly randomly

    • But better performance may be obtained by biasing with knowledge
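To make the difference between a uniform and a knowledge-biased default policy concrete, here is a sketch with the game supplied through plain functions. The counting game (add 1 or 2, score only by landing exactly on 10) and the `no_overshoot` weighting are assumptions made for this example.

```python
import random

def rollout(state, actions, step, is_terminal, reward, rng, weight=None):
    """Play from `state` to a terminal state and return the terminal reward.

    With weight=None actions are drawn uniformly at random; otherwise
    weight(state, action) biases the draw with domain knowledge.
    """
    while not is_terminal(state):
        acts = actions(state)
        if weight is None:
            a = rng.choice(acts)
        else:
            a = rng.choices(acts, weights=[weight(state, x) for x in acts], k=1)[0]
        state = step(state, a)
    return reward(state)

TARGET = 10

def actions(s):
    return [1, 2]

def step(s, a):
    return s + a

def is_terminal(s):
    return s >= TARGET

def reward(s):
    return 1.0 if s == TARGET else 0.0

def no_overshoot(s, a):
    # Knowledge: never pick a step that jumps past the target.
    return 0.0 if s + a > TARGET else 1.0

rng = random.Random(0)
uniform = [rollout(0, actions, step, is_terminal, reward, rng) for _ in range(200)]
biased = [rollout(0, actions, step, is_terminal, reward, rng, weight=no_overshoot)
          for _ in range(200)]
```

Every biased playout ends exactly on the target, while uniform playouts sometimes overshoot from 9 to 11. In a real game the bias is rarely this decisive, and heavier playouts cost more time per simulation, so the trade-off has to be measured.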


Backup

  • Note that v is the new node added to the tree by the tree policy

  • Back up the values from the added node up the tree to the root
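A sketch of the backup step under simple assumptions: each node stores a visit count and a total reward, and has a `parent` link. The sign-flipping variant for two-player zero-sum games follows the negamax-style backup described in the Browne et al. survey; whether the talk's own pseudo-code used it is not known from the transcript.

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.n = 0      # visit count
        self.q = 0.0    # total reward

def backup(v, delta):
    """Add one visit and reward `delta` to v and to every ancestor up to the root."""
    while v is not None:
        v.n += 1
        v.q += delta
        v = v.parent

def backup_negamax(v, delta):
    """Two-player zero-sum variant: the reward changes sign at each level."""
    while v is not None:
        v.n += 1
        v.q += delta
        delta = -delta
        v = v.parent

# A three-node path: root -> mid -> leaf, where leaf is the newly added node v.
root = Node()
mid = Node(root)
leaf = Node(mid)
backup(leaf, 1.0)

# The same path again with the sign-flipping variant.
root2 = Node()
mid2 = Node(root2)
leaf2 = Node(mid2)
backup_negamax(leaf2, 1.0)
```

After `backup`, all three nodes have one visit and a total reward of 1.0. After `backup_negamax`, the totals alternate: 1.0 at the leaf, -1.0 at its parent and 1.0 at the root.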


MCTS Builds Asymmetric Trees (demo)


All Moves As First (AMAF), Rapid Action Value Estimation (RAVE)

  • Additional term in UCT equation:

    • Treat actions / moves the same independently of where they occur in the move sequence
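The slide's modified equation is not in the transcript, so this is a sketch of one common way the idea is realised, not necessarily the one shown in the talk. It keeps a second set of statistics per action that ignores where in the playout the action occurred, and blends it with the ordinary mean. The weighting schedule beta = sqrt(k / (3n + k)) and the value of `k` are assumptions, and crediting only the moves of the player to act is omitted for brevity.

```python
import math

def amaf_update(amaf, playout_actions, reward):
    """Credit every distinct action seen anywhere in the playout, regardless of position.

    `amaf` maps action -> (total reward, count).
    """
    for a in set(playout_actions):
        q, n = amaf.get(a, (0.0, 0))
        amaf[a] = (q + reward, n + 1)

def rave_value(q, n, q_amaf, n_amaf, k=1000.0):
    """Blend the node's own mean with its AMAF mean.

    beta is close to 1 when n is small (trust AMAF) and shrinks towards 0 as n grows.
    """
    beta = math.sqrt(k / (3.0 * n + k))
    mean = q / n if n else 0.0
    amaf_mean = q_amaf / n_amaf if n_amaf else 0.0
    return (1.0 - beta) * mean + beta * amaf_mean

amaf = {}
amaf_update(amaf, ["b2", "c3", "b2"], 1.0)  # "b2" occurs twice but is credited once

unvisited = rave_value(q=0.0, n=0, q_amaf=1.0, n_amaf=1)
well_visited = rave_value(q=5e8, n=10**9, q_amaf=1.0, n_amaf=1)
```

For an unvisited child the blended value is the AMAF mean alone, which gives the search an early estimate before any direct samples exist. For a heavily visited child the value is dominated by its own mean.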


Using MCTS for a new problem: implement the State interface
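The interface shown on the slide is not in the transcript. The sketch below is a hypothetical minimal version of such an interface, written as a Python `Protocol`, with an invented single-player toy problem implementing it. The method names are assumptions and not those of the talk's code.

```python
import random
from typing import Hashable, Protocol, Sequence

class State(Protocol):
    """Hypothetical minimal interface a problem exposes to a generic MCTS."""
    def actions(self) -> Sequence[Hashable]: ...
    def next(self, action: Hashable) -> "State": ...
    def is_terminal(self) -> bool: ...
    def reward(self) -> float: ...

class BitString:
    """Toy problem: choose 4 bits one at a time; reward is the fraction of ones."""
    def __init__(self, bits=(), length=4):
        self.bits, self.length = tuple(bits), length
    def actions(self):
        return [] if self.is_terminal() else [0, 1]
    def next(self, action):
        return BitString(self.bits + (action,), self.length)
    def is_terminal(self):
        return len(self.bits) == self.length
    def reward(self):
        return sum(self.bits) / self.length

def random_playout(state: State, rng: random.Random) -> float:
    """Generic code that only relies on the State interface."""
    while not state.is_terminal():
        state = state.next(rng.choice(state.actions()))
    return state.reward()

value = random_playout(BitString(), random.Random(0))
```

The point of such an interface is that the search code never inspects the game directly: anything offering these four operations can be plugged in, which is what makes the approach quick to apply to a new problem.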


Example Application: Mapello


Othello

  • Each move you must pincer one or more opponent counters between the counter you place and an existing one of your colour

  • Pincered counters are flipped to your own colour

  • Winner is player with most pieces at the end
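The pincer rule can be sketched as a function that walks outward from the placed counter in all eight directions. The board encoding (a list of strings using 'B', 'W' and '.' for empty squares) and the function names are assumptions for this example, which covers standard Othello on a square board and not Mapello's obstacles or power-ups.

```python
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def flips(board, row, col, me):
    """Return the opponent counters pincered by placing `me` at (row, col)."""
    if board[row][col] != ".":
        return []  # square already occupied
    other = "W" if me == "B" else "B"
    size = len(board)
    captured = []
    for dr, dc in DIRECTIONS:
        run = []
        r, c = row + dr, col + dc
        # Collect a contiguous run of opponent counters in this direction.
        while 0 <= r < size and 0 <= c < size and board[r][c] == other:
            run.append((r, c))
            r, c = r + dr, c + dc
        # The run is pincered only if it ends on one of our own counters.
        if run and 0 <= r < size and 0 <= c < size and board[r][c] == me:
            captured.extend(run)
    return captured

def is_legal(board, row, col, me):
    """A move is legal only if it flips at least one counter."""
    return bool(flips(board, row, col, me))

START = ["........",
         "........",
         "........",
         "...WB...",
         "...BW...",
         "........",
         "........",
         "........"]

opening = flips(START, 2, 3, "B")
```

From the standard starting position, Black placing at row 2, column 3 pincers the white counter at (3, 3) against the black counter at (4, 3). Placing at row 2, column 4 flips nothing and is therefore not a legal move.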


Basics of Good Game Design

  • Simple rules

  • Balance

  • Sense of drama

  • Outcome should not be obvious


Othello Example – white leads: -58 (from http://radagast.se/othello/Help/strategy.html)





Black wins with score of 16


Mapello

  • Take the counter-flipping drama of Othello

  • Apply it to novel situations

    • Obstacles

    • Power-ups (e.g. triple square score)

    • Large maps with power-plays e.g. line fill

  • Novel games

    • Allow users to design maps that they are expert in

    • The map design is part of the game

  • Research bonus: large set of games to experiment with


Example Initial Maps


Or how about this?


Need Rapidly Smart AI

  • Give players a challenging game

    • Even when the game map can be new each time

  • Obvious, easy-to-apply approaches

    • TD Learning

    • Monte Carlo Tree Search (MCTS)

    • Combinations of these …

      • E.g. Silver et al, ICML 2008

      • Robles et al, CIG 2011


MCTS (see Browne et al, TCIAIG 2012)

  • Simple algorithm

  • Anytime

  • No need for a heuristic value function

  • Exploration-exploitation (E-E) balance

  • Works well across a range of problems


Demo

  • TDL learns reasonable weights rapidly

  • How well will this play at 1 ply versus limited roll-out MCTS?


For Strong Play …

  • Combine MCTS, TDL, N-Tuples


Where to play / buy

  • Coming to Android (November 2012)

  • Nestorgames (http://www.nestorgames.com)


MCTS in Real-Time Games: PTSP

  • Hard to get long-term planning without good heuristics


Optimal TSP order != PTSP Order


MCTS: Challenges and Future Directions

  • Better handling of problems with continuous action spaces

    • Some work already done on this

  • Better understanding of handling real-time problems

    • Use of approximations and macro-actions

  • Stochastic and partially observable problems / games of incomplete and imperfect information

  • Hybridisation:

    • with evolution

    • with other tree search algorithms


Conclusions

  • MCTS: a major new approach to AI

  • Works well across a range of problems

    • Good performance even with vanilla UCT

    • Best performance requires tuning and heuristics

    • Sometimes the UCT formula is modified or discarded

  • Can be used in conjunction with RL

    • Self tuning

  • And with evolution

    • E.g. evolving macro-actions


Further reading and links

  • http://ptsp-game.net/

  • http://www.pacman-vs-ghosts.net/

