How to Win a Chinese Chess Game

Reinforcement Learning

Cheng, Wen Ju


Set Up

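[Figure: the Chinese chess board at its starting position, with the river and the pieces labeled: General, Guard, Minister, Rook, Knight, Cannon, Pawn.]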


Training

  • How long does it take for a human?

  • How long does it take for a computer?

  • The chess program KnightCap used TD learning to learn its evaluation function while playing on the Free Internet Chess Server (FICS, fics.onenet.net). It improved from a 1650 rating to a 2100 rating (about the level of a US Master; world champions are rated around 2900) in just 308 games and 3 days of play.


Training

  • Play a series of games in self-play learning mode using temporal difference learning

  • The goal is to learn some simple strategies

    • piece values or weights


Why Temporal Difference Learning

  • the average branching factor for the game tree is usually around 30

  • the average game lasts around 100 ply

  • the size of the game tree is therefore about $30^{100}$ (about $5 \times 10^{147}$ positions)
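
To make the scale concrete, here is a minimal sketch of the arithmetic. It uses only the two averages quoted above; the variable names are illustrative and not from the presentation.

```python
import math

BRANCHING_FACTOR = 30   # average number of legal moves per position (from the slide)
GAME_LENGTH_PLY = 100   # average game length in ply (from the slide)

# Python integers have arbitrary precision, so this count is exact.
tree_size = BRANCHING_FACTOR ** GAME_LENGTH_PLY
digits = len(str(tree_size))
log10_size = GAME_LENGTH_PLY * math.log10(BRANCHING_FACTOR)

print(f"30^100 has {digits} decimal digits (about 10^{log10_size:.1f})")
```

A tree of this size cannot be searched exhaustively, which is the motivation for learning an evaluation function and searching only a few ply ahead.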


Searching

  • alpha-beta search

  • 3 ply search vs 4 ply search

  • horizon effect

  • quiescence cutoff search
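
The slide does not show the search code, so the following is only a generic sketch of depth-limited alpha-beta search on a toy tree. The `evaluate` and `children` callbacks and the nested-list tree are assumptions made for illustration, and the quiescence cutoff is not implemented here.

```python
import math

def alpha_beta(node, depth, alpha, beta, maximizing, evaluate, children):
    """Depth-limited minimax with alpha-beta pruning."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)          # static evaluation at the search horizon
    if maximizing:
        best = -math.inf
        for child in kids:
            score = alpha_beta(child, depth - 1, alpha, beta, False, evaluate, children)
            best = max(best, score)
            alpha = max(alpha, best)
            if alpha >= beta:
                break                  # the opponent would never allow this line
        return best
    best = math.inf
    for child in kids:
        score = alpha_beta(child, depth - 1, alpha, beta, True, evaluate, children)
        best = min(best, score)
        beta = min(beta, best)
        if alpha >= beta:
            break
    return best

# Toy 2-ply tree: inner nodes are lists, leaves are static evaluations.
tree = [[3, 5], [6, 9], [1, 2]]
children = lambda n: n if isinstance(n, list) else []
evaluate = lambda n: n if isinstance(n, int) else 0
value = alpha_beta(tree, 2, -math.inf, math.inf, True, evaluate, children)
print(value)
```

Because the search stops at a fixed depth, a threat that lands one ply beyond the cutoff is invisible to it. This is the horizon effect, and it is why a 4-ply search can choose differently from a 3-ply search.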


Horizon Effect

[Figure: board positions at times t, t+1, t+2, and t+3 illustrating the horizon effect.]


Evaluation Function

  • feature

    • property of the game

  • feature evaluators

    • Rook, Knight, Cannon, Minister, Guard, and Pawn

  • weight:

    • the value of a specific piece type

  • feature function: f

    • return the current player’s piece advantage on a scale from -1 to 1

  • evaluation function: Y

    $Y = \sum_{k=1}^{7} w_k f_k$
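
As a rough illustration of the weighted sum, the sketch below evaluates a position from piece counts. It covers only the six piece types named above (the slide sums over seven features but does not name the seventh). Dividing by each piece type's starting count is an assumption, chosen so that every feature falls between -1 and 1. All names are hypothetical.

```python
PIECE_TYPES = ["rook", "knight", "cannon", "minister", "guard", "pawn"]

# Number of pieces of each type that one side starts with in Chinese chess.
START_COUNT = {"rook": 2, "knight": 2, "cannon": 2, "minister": 2, "guard": 2, "pawn": 5}

def piece_advantage(own, opp, piece):
    """Feature f_k: the current player's piece advantage, scaled to [-1, 1]."""
    return (own.get(piece, 0) - opp.get(piece, 0)) / START_COUNT[piece]

def evaluate(weights, own, opp):
    """Y = sum over k of w_k * f_k, over the piece types listed above."""
    return sum(weights[p] * piece_advantage(own, opp, p) for p in PIECE_TYPES)

# Example: all weights 1; the opponent has lost one rook and two pawns.
weights = {p: 1.0 for p in PIECE_TYPES}
own = dict(START_COUNT)
opp = dict(START_COUNT, rook=1, pawn=3)
y = evaluate(weights, own, opp)   # 1/2 from the rook plus 2/5 from the pawns
print(y)
```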


TD(λ) and Updating the Weights

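$$
\begin{aligned}
w_{i,t+1} &= w_{i,t} + \alpha \,(Y_{t+1} - Y_t) \sum_{k=1}^{t} \lambda^{t-k} \, \nabla_{w_i} Y_k \\
          &= w_{i,t} + \alpha \,(Y_{t+1} - Y_t)\,\bigl(f_{i,t} + \lambda f_{i,t-1} + \lambda^{2} f_{i,t-2} + \dots + \lambda^{t-1} f_{i,1}\bigr)
\end{aligned}
$$

The second line follows because Y is linear in the weights, so $\nabla_{w_i} Y_k = f_{i,k}$.

  • α = 0.01

    learning rate

    how quickly the weights can change

  • λ = 0.01

    feedback coefficient

    how much to discount past values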
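
The update rule can be sketched in a few lines of Python. This is a direct transcription of the expanded sum, not the presenter's code; the function name and the `features_history` layout are assumptions.

```python
def td_lambda_update(weights, features_history, y_now, y_next, alpha=0.01, lam=0.01):
    """Return new weights after observing the evaluation change y_next - y_now.

    features_history[k][i] holds f_{i,k+1}, the value of feature i at time step
    k+1, so the last entry holds the features at the current time t.
    """
    t = len(features_history)
    td_error = y_next - y_now
    new_weights = []
    for i, w in enumerate(weights):
        # f_{i,t} + lam * f_{i,t-1} + ... + lam^(t-1) * f_{i,1}
        trace = sum(lam ** (t - 1 - k) * features_history[k][i] for k in range(t))
        new_weights.append(w + alpha * td_error * trace)
    return new_weights

# Two features over two time steps, with larger alpha and lambda than on the
# slide so that the effect of the update is easy to see.
history = [[0.5, 0.0], [1.0, -0.5]]
updated = td_lambda_update([1.0, 1.0], history, y_now=0.5, y_next=0.9, alpha=0.1, lam=0.5)
print(updated)
```

With λ as small as 0.01, the sum is dominated by the most recent features, so older positions contribute almost nothing to the update.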


Features Table

[Figure: the features table and its array of weights.]


Example

[Figure: worked example showing the board at t=5, t=6, t=7, and t=8.]


Final Reward

  • loser

    • if the game is a draw, the final reward is 0

    • if the board evaluation is negative, the final reward is twice the board evaluation

    • if the board evaluation is positive, the final reward is -2 times the board evaluation

  • winner

    • if the game is a draw, the final reward is 0

    • if the board evaluation is negative, the final reward is -2 times the board evaluation

    • if the board evaluation is positive, the final reward is twice the board evaluation
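
Taken together, these cases say the loser always receives minus twice the absolute board evaluation and the winner plus twice the absolute board evaluation. A minimal sketch follows; the function name and the outcome strings are hypothetical.

```python
def final_reward(board_evaluation, outcome):
    """Final reward for one player; outcome is "win", "loss", or "draw"."""
    if outcome == "draw":
        return 0.0
    magnitude = 2.0 * abs(board_evaluation)
    if outcome == "win":
        return magnitude     # always non-negative for the winner
    if outcome == "loss":
        return -magnitude    # always non-positive for the loser
    raise ValueError(f"unknown outcome: {outcome!r}")

print(final_reward(-0.3, "win"), final_reward(0.3, "loss"), final_reward(0.3, "draw"))
```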


Final Reward

  • the weights are normalized by dividing by the greatest weight

  • any negative weights are set to zero

  • the most valuable piece has weight 1
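
The normalization step might look like the sketch below. Clipping negative weights before dividing gives the same result as dividing first, as long as the greatest weight is positive. Leaving an all-zero weight list unchanged is an assumption, since the slide does not cover that case.

```python
def normalize_weights(weights):
    """Clip negative weights to zero, then scale so the greatest weight is 1."""
    clipped = [max(w, 0.0) for w in weights]
    largest = max(clipped)
    if largest == 0.0:
        return clipped       # assumption: nothing to scale by, leave as zeros
    return [w / largest for w in clipped]

print(normalize_weights([2.0, -0.5, 1.0, 4.0]))
```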


Summary of Main Events

  • Red’s turn

    • Update weights for Red using TD(λ)

    • Red does alpha-beta search

    • Red executes the best move found

  • Blue’s turn

    • Update weights for Blue using TD(λ)

    • Blue does alpha-beta search

    • Blue executes the best move found (then go back to Red’s turn)
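
The turn cycle above can be written as a short loop. This sketch shows only the control flow. The callbacks `is_over`, `td_update`, `search`, and `apply_move` are hypothetical stand-ins for the real game logic, and the demonstration uses a trivial counter instead of a chess position.

```python
def play_game(state, weights, is_over, td_update, search, apply_move):
    """Alternate Red and Blue turns until the game ends."""
    side = "red"
    while not is_over(state):
        weights[side] = td_update(weights[side], state)   # update weights using TD(lambda)
        move = search(state, weights[side])               # alpha-beta search
        state = apply_move(state, move)                   # execute the best move found
        side = "blue" if side == "red" else "red"         # hand over to the other player
    return state, weights

# Stand-in callbacks so the loop runs: a "game" that ends after four moves.
final_state, final_weights = play_game(
    state=0,
    weights={"red": [1.0], "blue": [1.0]},
    is_over=lambda s: s >= 4,
    td_update=lambda w, s: list(w),   # placeholder: no learning
    search=lambda s, w: 1,            # placeholder: always choose "move" 1
    apply_move=lambda s, m: s + m,
)
print(final_state, final_weights)
```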


After the Game Ends

  • Calculate and assign final reward for losing player

  • Calculate and assign final reward for winning player

  • Normalize the weights between 0 and 1


Results

  • 10-game series

  • 100-game series

  • learned weights are carried over into the next series

  • began with all weights initialized to 1

  • The goal is to learn piece values that are close to, or even better than, the default values defined by H.T. Lau


Observed Behavior

  • the early stages

    • played pretty randomly

  • after 20 games

    • had identified the most valuable piece – Rook

  • after 250 games

    • played better

    • protecting the valuable pieces, and trying to capture a valuable piece



Testing

  • self-play games

    • Red played using the learned weights after 250 games

    • Blue used H.T. Lau’s equivalent of the weights

  • 5 games

    • Red won 3

    • Blue won 1

    • 1 draw


Future Work

Eight types, or categories, of features:

  • Piece Values

  • Comparative Piece Advantage

  • Mobility

  • Board Position

  • Piece Proximity

  • Time Value of Pieces

  • Piece Combinations

  • Piece Configurations




Conclusion

  • Computer Chinese chess has been studied for more than twenty years. Recently, thanks to advances in AI research and improvements in the speed and capacity of computer hardware, several Chinese chess programs of grandmaster level (about 6-dan in Taiwan) have been developed.

  • Professor Shun-Chin Hsu of Chang-Jung University (CJU), who has long been involved in developing computer Chinese chess programs, points out that “the strength of Chinese chess programs increase 1-dan every three years.” He also predicts that a computer program will beat the “world champion of Chinese chess” before 2012.


When and What

  • 2004 World Computer Chinese Chess Championship

  • Competition dates:

    • June 25-26, 2004

  • Prizes:

    (1) First place: USD 1,500 and a gold medal

    (2) Second place: USD 900 and a silver medal

    (3) Third place: USD 600 and a bronze medal

    (4) Fourth place: USD 300


References

C. Szeto. Chinese Chess and Temporal Difference Learning

J. Baxter. KnightCap: A chess program that learns by combining TD(λ) with minimax search

T. Trinh. Temporal Difference Learning in Chinese Chess

http://chess.ncku.edu.tw/index.html

