
Evaluation-Function Based Monte-Carlo LOA



Outline

- Background
- MC-LOA
- Evaluators and Simulations
- Experiments
- Conclusions

Background

- Monte-Carlo Tree Search (MCTS) (Kocsis et al., 2006; Coulom, 2007)
- Caused a revolution in computer Go
- Applied in Amazons, Clobber, Havannah, Hex, General Game Playing (GGP), etc.
- Here we focus on the sudden-death game Lines of Action (LOA)

Test Domain: Lines of Action (LOA)

- Two-person zero-sum chess-like connection game
- A piece moves in a straight line, exactly as many squares as there are pieces of either colour anywhere along its line of movement
- Goal is to group your pieces into one connected unit

Computer LOA

- First LOA program: Stanford AI Laboratory, 1975
- Since 2000, LOA has also been played at the Computer Olympiad
- Several programs: YL (Björnsson), Mona (Billings), Bing (Helmstetter), T-T (JAIST Lab), Lola (Coulom)
- All programs use αβ search + an evaluation function
- αβ search: null-move, multi-cut, realization-probability search (RPS)
- Evaluation functions are quite advanced

Monte-Carlo Tree Search (MCTS)

- A best-first search guided by the results of Monte-Carlo simulations
- We will discuss the steps for MC-LOA

Selection

- Traversal from the root until we reach a position that is not yet part of the tree
- Selection strategy: controls the balance between exploitation and exploration
- UCT (Kocsis and Szepesvári, 2006) combined with Progressive Bias (PB) (Chaslot et al., 2008)
- Selects the child k of node p according to:

  k = argmax_i ( v_i + C × √(ln n_p / n_i) + (W × P_mc) / (n_i + 1) )

- UCT term: v_i is the value of node i, n_i is the visit count of i, and n_p is the visit count of p; C is an exploration coefficient
- PB term: P_mc is the transition probability of the move category mc, and W is a constant
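The selection rule can be sketched as follows; the constants C and W and the dictionary-based node representation are illustrative assumptions, not the tuned values used in MC-LOA:

```python
import math

def select_child(parent_visits, children, C=0.6, W=10.0):
    """Pick the child maximising the UCT value plus the progressive-bias term.

    children: list of dicts with keys
      'value'  -- average simulation result v_i of child i
      'visits' -- visit count n_i
      'p_mc'   -- transition probability P_mc of the move's category
    C and W are illustrative constants, not the slides' tuned values.
    """
    def score(child):
        v_i, n_i, p_mc = child['value'], child['visits'], child['p_mc']
        if n_i == 0:
            return float('inf')  # always try unvisited children first
        uct = v_i + C * math.sqrt(math.log(parent_visits) / n_i)
        pb = W * p_mc / (n_i + 1)  # progressive bias fades as n_i grows
        return uct + pb
    return max(children, key=score)
```

Note how the PB term shrinks with 1/(n_i + 1): the heuristic move-category knowledge dominates early and the simulation statistics take over as the visit count grows.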

Progressive Bias: Transition Probabilities

- Transition Probabilities originally used in Realization Probability Search (Tsuruoka, 2002)
- Transition Probability: Probability that a move belonging to a category mc will be played
- Statistics are retrieved from self-play

Progressive Bias: Move Categories

- Move categories in LOA:
- Captures or non-captures
- The origin and destination squares of the move
- The number of squares the piece moves towards or away from the centre-of-mass

- 277 move categories

Simulation Strategy

- If a node has been visited fewer times than a threshold T, the next move is selected pseudo-randomly according to the simulation strategy
- Moves are drawn in proportion to their move-category weights
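The pseudo-random selection by move-category weight amounts to roulette-wheel sampling; a minimal sketch, assuming non-negative weights:

```python
import random

def choose_simulation_move(moves, weights, rng=random):
    """Pseudo-randomly select a move in proportion to its move-category weight.

    moves   -- candidate moves (any objects)
    weights -- one non-negative weight per move, from self-play statistics
    """
    total = sum(weights)
    pick = rng.random() * total
    for move, w in zip(moves, weights):
        pick -= w
        if pick <= 0:
            return move
    return moves[-1]  # guard against floating-point rounding
```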

Expansion

- At a leaf node:
- All moves are generated and checked for a mate in one

Play-out

- Play-out step: outside the tree, the game is finished by self-play

Backpropagation

- The result of a simulated game (+1 win, 0 draw, −1 loss) is back-propagated from the leaf node L through the previously traversed path

- The value v of a node is computed as the average of the results of all simulated games made through this node

MCTS – Solver: Backpropagation

- MCTS-Solver (Winands et al. 2008)
- For terminal positions assign the values ∞ (Win) or -∞ (Loss)
- Internal nodes are assigned values in the minimax way
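A minimal sketch of this back-propagation, assuming a negamax convention and a simple node object (both illustrative, not the paper's data structures):

```python
import math

WIN, LOSS = math.inf, -math.inf

class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total = 0.0
        self.value = 0.0  # average result, or +/-inf once proven

def backpropagate(leaf, result):
    """MCTS-Solver back-propagation, sketched (negamax convention).

    result is +1/0/-1 from the leaf's point of view.  A node is a proven
    win if some child is a proven loss for the opponent, and a proven
    loss only if every child is a proven win for the opponent.
    """
    node = leaf
    while node is not None:
        node.visits += 1
        if node.children and any(c.value == LOSS for c in node.children):
            node.value = WIN           # a winning reply exists
        elif node.children and all(c.value == WIN for c in node.children):
            node.value = LOSS          # every reply loses
        elif node.value not in (WIN, LOSS):
            node.total += result
            node.value = node.total / node.visits
        result = -result               # switch perspective each ply
        node = node.parent
```

The asymmetry is the key point: one losing child suffices to prove a win, but proving a loss requires all children to be proven wins for the opponent.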

Evaluators and Simulations

- MCTS does not require a positional evaluation function, but is based on the results of simulations
- Plain simulations are not good enough for LOA
- How can a positional evaluation function be used?
- Four different strategies:
- Evaluation Cut-off
- Corrective
- Greedy
- Mixed

Evaluation Function (MIA 2006)

- Concentration
- Centralisation
- Centre-of-mass position
- Quads
- Mobility
- Walls
- Connectedness
- Uniformity
- Variance

Evaluation Cut-off

- Stops a simulated game before a terminal state is reached if, according to heuristic knowledge, the game is judged to be effectively over
- Starting at the second ply of the play-out, the evaluation function is called every third ply
- If the value exceeds a threshold (700), the game is scored as a win
- If the value is below a threshold (−700), the game is scored as a loss
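A play-out with this cut-off might look as follows; evaluate and pick_move are assumed helpers standing in for the evaluation function and the simulation strategy, with scores taken from the side to move's perspective:

```python
# Constants from the slides: evaluate from the second ply on, every
# third ply, and treat scores beyond +/-700 as decided games.
EVAL_INTERVAL = 3
WIN_BOUND, LOSS_BOUND = 700, -700

def play_out(position, evaluate, pick_move, max_plies=200):
    """Play-out with the Evaluation Cut-off strategy (sketch).

    evaluate(position)  -- heuristic score for the side to move (assumed helper)
    pick_move(position) -- simulation strategy returning the next position
    Returns +1 (win), 0 (draw) or -1 (loss) for the side to move at entry.
    """
    sign = 1  # +1 while the entry player is to move, -1 otherwise
    for ply in range(1, max_plies + 1):
        position = pick_move(position)
        sign = -sign
        if ply >= 2 and (ply - 2) % EVAL_INTERVAL == 0:
            score = evaluate(position)
            if score >= WIN_BOUND:
                return sign        # side to move is winning
            if score <= LOSS_BOUND:
                return -sign       # side to move is losing
    return 0  # undecided within the horizon: score as a draw
```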

- The evaluation function can return a quite trustworthy score, more so than even elaborate simulation strategies
- The game can thus be safely terminated both earlier and with a more accurate score than if continuing the simulation

Corrective

- Minimises the risk of choosing an obviously bad move
- First, we evaluate the position for which we are choosing a move
- Next, we generate the moves and scan them to get their weights
- If a move leads to a successor with a lower evaluation score than its parent, we set the move's weight to a preset minimum value (close to zero)
- If a move leads to a win, it is played immediately
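A sketch of the weight adjustment, assuming an evaluate_child helper that scores the position after a move; detecting the immediate win via the same evaluation threshold is a simplification of the slides' mate check:

```python
MIN_WEIGHT = 1e-6  # preset minimum, close to zero

def corrective_weights(parent_score, moves, weights, evaluate_child,
                       win_bound=700):
    """Corrective strategy (sketch): damp moves that lower the evaluation.

    parent_score      -- evaluation of the current position
    weights           -- move-category weights of the candidate moves
    evaluate_child(m) -- score of the position after move m (assumed helper)
    Returns (winning_move_or_None, adjusted_weights).
    """
    adjusted = []
    for move, w in zip(moves, weights):
        child_score = evaluate_child(move)
        if child_score >= win_bound:
            return move, None          # winning move: play it immediately
        if child_score < parent_score:
            w = MIN_WEIGHT             # obviously bad: almost never pick it
        adjusted.append(w)
    return None, adjusted
```

The adjusted weights would then feed the usual pseudo-random simulation move choice.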

Greedy

- The move leading to the position with the highest evaluation score is selected
- We evaluate only moves that have good potential for being the best:
- The k best moves according to their move-category weights are fully evaluated (k = 7)
- When a move leads to a position with an evaluation score above a preset threshold, the play-out is stopped and scored as a win
- The remaining moves, which are not heuristically evaluated, are checked for a mate
- The evaluation function has a small random factor
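A sketch of the Greedy move choice, assuming an evaluate_child helper; the mate check on the remaining, unevaluated moves is omitted here:

```python
import random

def greedy_move(moves, weights, evaluate_child, k=7, win_bound=700,
                rng=random):
    """Greedy strategy (sketch): fully evaluate only the k best candidates.

    evaluate_child(m) is an assumed helper scoring the position after
    move m; win_bound mirrors the cut-off threshold from the slides.
    Returns (move, stop_as_win); stop_as_win ends the play-out as a win.
    """
    ranked = sorted(zip(moves, weights), key=lambda mw: mw[1], reverse=True)
    best_move, best_score = None, float('-inf')
    for move, _ in ranked[:k]:
        # Small random factor, as in the slides' evaluation function.
        score = evaluate_child(move) + 0.01 * rng.random()
        if score >= win_bound:
            return move, True          # over the threshold: scored as a win
        if score > best_score:
            best_move, best_score = move, score
    return best_move, False
```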

Mixed

- The Greedy strategy could be too deterministic
- The Mixed strategy combines the Corrective and Greedy strategies
- The Corrective strategy is used in the selection step, i.e., at tree nodes where a simulation strategy is needed (n < T), as well as in the first position entered in the play-out step
- For the remainder of the play-out, the Greedy strategy is applied

Experiments

- Test the strength against a non-MC program: MIA
- MIA won the 8th, 9th and 11th Computer Olympiad
- αβ depth-first iterative-deepening search in the PVS framework
- Enhancements: transposition table, ETC, killer moves, relative history heuristic, etc.
- Variable depth: multi-cut, (adaptive) null move, enhanced realization probability search, quiescence search

Parameter Tuning: Evaluation Cut-off

- Tune the bounds of the Evaluation Cut-off strategy
- Three programs: No-Bound, MIA III and MIA 4.5
- 1000-game matches, 1 second per move

Round-Robin Tournament Results

- 1000 Games
- 1 sec per move

MC-LOA vs. MIA

- Mixed strategy
- 5 sec per move
- Root parallelization

- There is no parallel version of MIA
- A two-threaded MC-LOA program competed against MIA, where MIA was given 50% more time
- This simulates a search-efficiency increase of 50% if MIA were given two processors
- A 1000-game match resulted in a 52% winning percentage for MC-LOA

Conclusions

- Applying an evaluation function to stop simulations increases the playing strength
- The Mixed strategy of playing greedily in the play-out phase, but exploring more in the earlier selection phase, works best
- MCTS using simple root parallelization outperforms MIA
- MCTS programs are competitive with αβ-based programs in the game of LOA
