Monte Carlo Tree Search: Insights and Applications

BCS Real AI Event

Simon Lucas

Game Intelligence Group

University of Essex


Outline

  • General machine intelligence: the ingredients

  • Monte Carlo Tree Search

    • A quick overview and tutorial

  • Example application: Mapello

    • Note: Game AI is Real AI !!!

  • Example test problem: Physical TSP

  • Results of open competitions

  • Challenges and future directions


General Machine Intelligence: the ingredients

  • Evolution

  • Reinforcement Learning

  • Function approximation

    • Neural nets, N-Tuples, etc.

  • Selective search / Sample based planning / Monte Carlo Tree Search


Conventional Game Tree Search

  • Minimax with alpha-beta pruning, transposition tables

  • Works well when:

    • A good heuristic value function is known

    • The branching factor is modest

  • E.g. Chess: Deep Blue, Rybka

    • Super-human on a smartphone!

  • Tree grows exponentially with search depth
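
To make the exponential growth concrete, here is a rough worked estimate. The symbols b (branching factor) and d (search depth in plies) are introduced for illustration, and the branching factors used are commonly quoted approximations, not figures from the talk.

```latex
% Positions examined by plain minimax, and by alpha-beta with ideal move ordering
N_{\text{minimax}} \approx b^{d}, \qquad N_{\alpha\beta}^{\text{best case}} \approx b^{d/2}

% Illustration at depth d = 4 with rough branching factors
b = 35 \ (\text{chess-like}): \quad 35^{4} = 1\,500\,625
\qquad
b = 250 \ (\text{Go-like}): \quad 250^{4} \approx 3.9 \times 10^{9}
```

Even in the best case, pruning only halves the exponent, which is why a modest branching factor matters so much.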


Go

  • Much tougher for computers

  • High branching factor

  • No good heuristic value function

  • MCTS to the rescue!

“Although progress has been steady, it will take many decades of research and development before world-championship–calibre go programs exist”. Jonathan Schaeffer, 2001


Monte Carlo Tree Search (MCTS) and Upper Confidence bounds for Trees (UCT): Further reading


Attractive Features

  • Anytime

  • Scalable

    • Tackle complex games and planning problems better than before

    • Playing strength may improve roughly logarithmically with increased CPU time

  • No need for heuristic function

    • Though usually better with one

  • Next we’ll look at:

    • General MCTS

    • UCT in particular


MCTS: the main idea

  • Tree policy: choose which node to expand (not necessarily leaf of tree)

  • Default (simulation) policy: random playout until end of game


MCTS Algorithm

  • Decompose into 6 parts:

  • MCTS main algorithm

    • Tree policy

      • Expand

      • Best Child (UCT Formula)

    • Default Policy

    • Back-propagate

  • We’ll run through these then show demos


MCTS Main Algorithm

  • BestChild simply picks the best child node of the root according to some criterion, e.g. best mean value

  • In our pseudo-code BestChild is called from TreePolicy and from MctsSearch, but different versions can be used

    • E.g. final selection can be the max value child or the most frequently visited one
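
As a minimal sketch of the main loop and of the two final-selection rules mentioned above, the following Python passes the tree policy, default policy and backup in as functions. The names `Node`, `mcts_search`, `max_value_child` and `most_visited_child` are illustrative assumptions, not the talk's own pseudo-code.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    # Hypothetical minimal node: only the statistics the selection rules need.
    visits: int = 0
    total_reward: float = 0.0
    children: list = field(default_factory=list)


def mcts_search(root, tree_policy, default_policy, backup, final_choice,
                iterations=1000):
    """Run a fixed number of MCTS iterations, then pick a child of the root."""
    for _ in range(iterations):        # anytime: the loop can stop at any budget
        node = tree_policy(root)       # select (and possibly expand) a node
        reward = default_policy(node)  # estimate its value with a roll-out
        backup(node, reward)           # propagate the result towards the root
    return final_choice(root)


def max_value_child(node):
    """Final selection by best mean value (unvisited children rank last)."""
    def mean(child):
        return child.total_reward / child.visits if child.visits else float("-inf")
    return max(node.children, key=mean)


def most_visited_child(node):
    """Final selection by visit count."""
    return max(node.children, key=lambda child: child.visits)
```

The two rules can disagree: a child with a high mean from very few visits wins under `max_value_child` but loses to a heavily visited sibling under `most_visited_child`.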


TreePolicy

  • Note that the node selected for expansion does not need to be a leaf of the tree

  • But it must have at least one untried action
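
One way to express the two points above is a descent that stops at the first node with an untried action, whether or not that node already has children. This is a sketch under assumptions: `Node`, its fields, and the `expand` and `best_child` callables are hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    # Hypothetical node: actions not yet tried, children already added,
    # and a flag marking terminal game states.
    untried_actions: list = field(default_factory=list)
    children: list = field(default_factory=list)
    terminal: bool = False


def tree_policy(node, expand, best_child):
    """Descend from `node` until a node with an untried action is found."""
    while not node.terminal:
        if node.untried_actions:
            # Not necessarily a leaf: it may already have some children.
            return expand(node)
        # Fully expanded: follow the child preferred by the selection rule.
        node = best_child(node)
    return node  # reached a terminal state; nothing left to expand
```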


Expand
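
A minimal sketch of an expansion step consistent with the TreePolicy notes: take one untried action, build the resulting child, and attach it to the tree. The `Node` fields and the `next_state` and `legal_actions` callables are assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    # Hypothetical node layout, assumed for this sketch only.
    state: object
    untried_actions: list
    parent: object = None
    action: object = None
    children: list = field(default_factory=list)
    visits: int = 0
    total_reward: float = 0.0


def expand(node, next_state, legal_actions, rng=random):
    """Add one new child to `node` for a randomly chosen untried action."""
    index = rng.randrange(len(node.untried_actions))
    action = node.untried_actions.pop(index)   # this action is now "tried"
    child_state = next_state(node.state, action)
    child = Node(state=child_state,
                 untried_actions=list(legal_actions(child_state)),
                 parent=node,
                 action=action)
    node.children.append(child)
    return child
```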


Best Child (UCT)

  • This is the standard UCT equation

    • Used in the tree

  • Higher values of c lead to more exploration

  • Other terms can be added, and usually are

    • More on this later
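
The equation itself is not reproduced in this transcript. In the form given in the Browne et al. survey cited later in the talk, a child v' of node v is scored as Q(v')/N(v') + c * sqrt(2 ln N(v) / N(v')), where Q is the total reward and N the visit count. The sketch below assumes that form; the `Node` class and the default value of `c` (a value often quoted for rewards in [0, 1]) are assumptions.

```python
import math
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    # Hypothetical node holding only the statistics UCT needs.
    visits: int = 0
    total_reward: float = 0.0
    children: list = field(default_factory=list)


def uct_value(parent, child, c):
    """Mean reward plus an exploration bonus that shrinks with child visits."""
    if child.visits == 0:
        return math.inf  # unvisited children are tried first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(2.0 * math.log(parent.visits) / child.visits)
    return exploit + explore


def best_child(node, c=1.0 / math.sqrt(2.0)):
    """Return the child with the highest UCT value."""
    return max(node.children, key=lambda child: uct_value(node, child, c))
```

With `c = 0` the choice is purely greedy on the mean; raising `c` shifts the choice towards rarely visited children, which is the exploration effect described above.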


DefaultPolicy

  • Each time a new node is added to the tree, the default policy randomly rolls out from the current state until a terminal state of the game is reached

  • The standard is to do this uniformly randomly

    • But better performance may be obtained by biasing with knowledge
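
A uniformly random roll-out can be sketched in a few lines. The game is supplied through four callables (`legal_actions`, `next_state`, `is_terminal`, `reward`), all of which are hypothetical names chosen for this example.

```python
import random


def default_policy(state, legal_actions, next_state, is_terminal, reward,
                   rng=random):
    """Play uniformly random moves from `state` until the game ends."""
    while not is_terminal(state):
        action = rng.choice(legal_actions(state))  # uniform over legal moves
        state = next_state(state, action)
    return reward(state)  # value of the terminal state reached
```

Biasing with knowledge would replace the uniform `rng.choice` with a weighted draw, for example `rng.choices(actions, weights=...)` with weights from a heuristic.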


Backup

  • Note that v is the new node added to the tree by the tree policy

  • Back up the values from the added node up the tree to the root
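
The backup step can be sketched as a walk along parent links that updates the visit count and total reward of every node on the path. The `Node` fields are assumptions, and this is the single-perspective version.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Node:
    # Hypothetical node with a parent link and running statistics.
    parent: object = None
    visits: int = 0
    total_reward: float = 0.0


def backup(node, reward):
    """Add `reward` to every node from `node` up to and including the root."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent
```

For two-player zero-sum games a common variant negates the reward at each level on the way up, so that each node stores the value from the point of view of the player who moved into it.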


MCTS Builds Asymmetric Trees (demo)


All Moves As First (AMAF), Rapid Action Value Estimation (RAVE)

  • Additional term in UCT equation:

    • Treat actions / moves the same independently of where they occur in the move sequence
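
A simplified sketch of the idea: every action seen anywhere in a playout is credited with the playout's reward, and the resulting AMAF mean is blended with the ordinary mean. The weighting schedule, the constant `k`, and all function names here are assumptions for illustration. A full two-player implementation would also credit only the moves made by the relevant player.

```python
import math


def amaf_update(amaf_stats, playout_actions, reward):
    """Credit each distinct action in the playout, ignoring where it occurred."""
    for action in set(playout_actions):
        count, total = amaf_stats.get(action, (0, 0.0))
        amaf_stats[action] = (count + 1, total + reward)


def rave_beta(visits, k=1000.0):
    """Illustrative weight on the AMAF estimate: 1 at zero visits, then decaying."""
    return math.sqrt(k / (3.0 * visits + k))


def rave_value(q_mean, amaf_mean, visits, k=1000.0):
    """Blend the ordinary mean with the AMAF mean."""
    beta = rave_beta(visits, k)
    return (1.0 - beta) * q_mean + beta * amaf_mean
```

The blended value would take the place of the plain mean in the selection rule, so early decisions lean on the quickly gathered AMAF statistics and later ones on the node's own results.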


Using for a new problem: Implement the State interface
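
The interface itself is not shown in this transcript. As a rough idea of what such an interface might contain, here is a sketch in Python with a toy implementation. The method names and the `CountdownState` game are hypothetical, not the interface from the talk.

```python
from abc import ABC, abstractmethod


class State(ABC):
    """Hypothetical minimum a search needs to know about a problem."""

    @abstractmethod
    def legal_actions(self):
        """Return the actions available in this state."""

    @abstractmethod
    def next_state(self, action):
        """Return the state reached by taking `action` (no mutation)."""

    @abstractmethod
    def is_terminal(self):
        """Return True when no further moves are possible."""

    @abstractmethod
    def reward(self):
        """Return the value of this state, meaningful at terminal states."""


class CountdownState(State):
    """Toy problem: subtract 1 or 2 from a counter until it reaches zero."""

    def __init__(self, remaining):
        self.remaining = remaining

    def legal_actions(self):
        return [a for a in (1, 2) if a <= self.remaining]

    def next_state(self, action):
        return CountdownState(self.remaining - action)

    def is_terminal(self):
        return self.remaining == 0

    def reward(self):
        return 1.0 if self.is_terminal() else 0.0
```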


Example Application: Mapello


Othello

  • On each move you must pincer one or more opponent counters between the counter you place and an existing one of your colour

  • Pincered counters are flipped to your own colour

  • Winner is player with most pieces at the end
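
The three rules above can be sketched directly in code. The board encoding ("B", "W" and "." for empty, as a list of rows) and the function names are assumptions made for this example.

```python
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def flips_for_move(board, row, col, player):
    """Return the opponent counters pincered by placing `player` at (row, col)."""
    if board[row][col] != ".":
        return []
    opponent = "W" if player == "B" else "B"
    size = len(board)
    flipped = []
    for dr, dc in DIRECTIONS:
        run = []
        r, c = row + dr, col + dc
        # Walk over a contiguous run of opponent counters in this direction.
        while 0 <= r < size and 0 <= c < size and board[r][c] == opponent:
            run.append((r, c))
            r, c = r + dr, c + dc
        # The run counts only if it ends on one of the player's own counters.
        if run and 0 <= r < size and 0 <= c < size and board[r][c] == player:
            flipped.extend(run)
    return flipped


def play(board, row, col, player):
    """Return a new board after a legal move; a move must flip something."""
    flips = flips_for_move(board, row, col, player)
    if not flips:
        raise ValueError("illegal move: no counters would be flipped")
    new_board = [list(line) for line in board]
    new_board[row][col] = player
    for r, c in flips:
        new_board[r][c] = player
    return new_board


def score(board, player):
    """Count the player's counters; the higher count wins at the end."""
    return sum(list(line).count(player) for line in board)
```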


Basics of Good Game Design

  • Simple rules

  • Balance

  • Sense of drama

  • Outcome should not be obvious


Othello Example – white leads: -58 (from http://radagast.se/othello/Help/strategy.html)


Black wins with score of 16


Mapello

  • Take the counter-flipping drama of Othello

  • Apply it to novel situations

    • Obstacles

    • Power-ups (e.g. triple square score)

    • Large maps with power-plays e.g. line fill

  • Novel games

    • Allow users to design maps that they are expert in

    • The map design is part of the game

  • Research bonus: large set of games to experiment with


Example Initial Maps


Or how about this?


Need Rapidly Smart AI

  • Give players a challenging game

    • Even when the game map can be new each time

  • Obvious, easy-to-apply approaches

    • TD Learning

    • Monte Carlo Tree Search (MCTS)

    • Combinations of these …

      • E.g. Silver et al, ICML 2008

      • Robles et al, CIG 2011


MCTS (see Browne et al, TCIAIG 2012)

  • Simple algorithm

  • Anytime

  • No need for a heuristic value function

  • Exploration-exploitation (E-E) balance

  • Works well across a range of problems


Demo

  • TDL learns reasonable weights rapidly

  • How well will this play at 1 ply versus limited roll-out MCTS?


For Strong Play …

  • Combine MCTS, TDL, N-Tuples


Where to play / buy

  • Coming to Android (November 2012)

  • Nestorgames (http://www.nestorgames.com)


MCTS in Real-Time Games: PTSP

  • Hard to get long-term planning without good heuristics


Optimal TSP order != PTSP Order


MCTS: Challenges and Future Directions

  • Better handling of problems with continuous action spaces

    • Some work already done on this

  • Better understanding of handling real-time problems

    • Use of approximations and macro-actions

  • Stochastic and partially observable problems / games of incomplete and imperfect information

  • Hybridisation:

    • with evolution

    • with other tree search algorithms


Conclusions

  • MCTS: a major new approach to AI

  • Works well across a range of problems

    • Good performance even with vanilla UCT

    • Best performance requires tuning and heuristics

    • Sometimes the UCT formula is modified or discarded

  • Can be used in conjunction with RL

    • Self tuning

  • And with evolution

    • E.g. evolving macro-actions


Further reading and links

  • http://ptsp-game.net/

  • http://www.pacman-vs-ghosts.net/

