
Reinforcement Learning and Tetris


Presentation Transcript


  1. Reinforcement Learning and Tetris Jared Christen

  2. Tetris • Markov decision processes • Large state space • Long-term strategy without long-term knowledge
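A one-line reminder of the MDP framing behind this slide (added for context, not part of the original deck): the agent looks for placements that maximize the expected discounted value of the resulting board. The discount factor and the lines-cleared reward below are generic choices, not details taken from the slides.

```latex
% Standard state-value recursion; the reward and discount are illustrative.
V(s) = \mathbb{E}\left[\, r(s, a) + \gamma\, V(s') \,\right],
\quad a = \text{piece placement},\ s' = \text{board after the drop}.
```

On a standard 10×20 board, the raw filled/empty state alone admits up to 2^200 configurations, which is the "large state space" the slide refers to.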

  3. Background • Hand-coded algorithms can clear > 1,000,000 lines • Genetic algorithm by Roger Llima averages 42,000 lines • Reinforcement learning algorithm by Kurt Driessens averages 30-40 lines

  4. Goals • Develop a Tetris agent that improves on previous reinforcement learning implementations • Secondary goals • Use as few handpicked features as possible • Encourage risk-taking • Include rarely-studied features of Tetris

  5. Approach • TD(λ) with a feedforward neural network
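The slide names the method without detail, so here is a minimal sketch of what TD(λ) with a small feedforward value network looks like. This is a generic illustration, not the author's implementation; the layer size, learning rate, discount, and trace-decay values are assumptions.

```python
# Minimal TD(lambda) sketch with a one-hidden-layer value network (numpy).
# Hyperparameters and layer sizes are assumed, not taken from the slides.
import numpy as np

class TDLambdaNet:
    def __init__(self, n_inputs, n_hidden=32, alpha=0.01, gamma=0.99, lam=0.8):
        rng = np.random.default_rng(0)
        self.W1 = rng.normal(scale=0.1, size=(n_hidden, n_inputs))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.1, size=n_hidden)
        self.b2 = 0.0
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.reset_traces()

    def reset_traces(self):
        # One eligibility trace per weight, reset at the start of each game.
        self.eW1 = np.zeros_like(self.W1)
        self.eb1 = np.zeros_like(self.b1)
        self.eW2 = np.zeros_like(self.W2)
        self.eb2 = 0.0

    def value(self, x):
        # V(x) from a tanh hidden layer and a linear output.
        h = np.tanh(self.W1 @ x + self.b1)
        return self.W2 @ h + self.b2, h

    def update(self, x, reward, x_next, terminal):
        v, h = self.value(x)
        v_next = 0.0 if terminal else self.value(x_next)[0]
        delta = reward + self.gamma * v_next - v  # TD error

        # Gradient of V(x) with respect to each parameter.
        gW2 = h
        gb2 = 1.0
        dh = self.W2 * (1.0 - h ** 2)
        gW1 = np.outer(dh, x)
        gb1 = dh

        # Decay and accumulate eligibility traces, then apply the TD(lambda) step.
        decay = self.gamma * self.lam
        self.eW1 = decay * self.eW1 + gW1
        self.eb1 = decay * self.eb1 + gb1
        self.eW2 = decay * self.eW2 + gW2
        self.eb2 = decay * self.eb2 + gb2
        self.W1 += self.alpha * delta * self.eW1
        self.b1 += self.alpha * delta * self.eb1
        self.W2 += self.alpha * delta * self.eW2
        self.b2 += self.alpha * delta * self.eb2
```

In use, traces would be reset at the start of each game and `update` called after every piece placement, with the reward presumably tied to lines cleared (again an assumption; the slides do not specify the reward).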

  6. Neural Net Control • Inputs: raw state (filled & empty blocks) and handpicked features • Outputs: movements and placements
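The slide lists the inputs only at a high level. The sketch below shows one plausible way to combine the raw filled/empty board with a few handpicked features; the specific features chosen here (column heights, hole count) are common Tetris choices assumed for illustration, not confirmed by the deck.

```python
import numpy as np

def board_features(board):
    """board: 2D array of 0/1, shape (rows, cols), row 0 at the top.

    Returns the raw occupancy bits concatenated with a few handpicked
    features (column heights and hole count). The feature set is
    illustrative, not the author's exact choice.
    """
    rows, cols = board.shape
    heights = np.zeros(cols)
    holes = 0
    for c in range(cols):
        filled = np.flatnonzero(board[:, c])
        if filled.size:
            top = filled[0]
            heights[c] = rows - top
            # Empty cells below the topmost filled cell count as holes.
            holes += int(np.sum(board[top:, c] == 0))
    raw = board.flatten().astype(float)
    return np.concatenate([raw, heights / rows, [holes / (rows * cols)]])
```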

  7. Contour Matching
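The slide is only a title, so the following is one common reading of contour matching: describe the board surface as the sequence of height differences between adjacent columns and count how many of those steps a piece's underside can follow. Everything here, including the piece encoding, is an assumption made for illustration.

```python
def contour(heights):
    # Surface contour as differences between adjacent column heights.
    return [heights[i + 1] - heights[i] for i in range(len(heights) - 1)]

def match_length(board_heights, piece_bottom):
    """Longest run of contour steps the piece's underside can follow.

    piece_bottom: per-column offsets of the piece's underside, e.g. [0, 1]
    (an illustrative encoding, not taken from the slides).
    """
    piece_contour = contour(piece_bottom)
    board_contour = contour(board_heights)
    best = 0
    for col in range(len(board_contour) - len(piece_contour) + 1):
        run = 0
        for i, step in enumerate(piece_contour):
            if board_contour[col + i] == step:
                run += 1
            else:
                break
        best = max(best, run)
    return best
```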

  8. Structure [Network diagram: inputs are the active, next, and held tetrominoes plus each candidate placement's score and contour match length; outputs are a hold value and a value for each placement 1…n]
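Reading the diagram, the network appears to produce a value for holding the piece and a value for each candidate placement, with the agent taking the highest-valued option. A hedged sketch of that decision step, reusing the `TDLambdaNet` sketch above and two hypothetical helpers (`enumerate_placements`, `encode`) that are not in the original slides:

```python
def choose_action(value_net, board, active, held, next_piece,
                  enumerate_placements, encode):
    """Pick the highest-valued option: one of the candidate placements, or hold.

    enumerate_placements and encode are hypothetical helpers: the first lists
    (placement, resulting_board) pairs for the active piece, the second builds
    the input vector described on the Neural Net Control slide.
    """
    candidates = []
    for placement, resulting_board in enumerate_placements(board, active):
        v, _ = value_net.value(encode(resulting_board, next_piece, held))
        candidates.append((v, ("place", placement)))
    # Holding swaps the active and held pieces instead of placing anything.
    v, _ = value_net.value(encode(board, next_piece, active))
    candidates.append((v, ("hold", None)))
    return max(candidates, key=lambda item: item[0])[1]
```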

  9. Experiments • 200 learning games • Averaged over 30 runs • Two-piece and six-piece configurations • Compare to benchmark contour matching agent

  10. Results [Charts: six-piece and two-piece configurations]

  11. Results

  12. Conclusions • Accidentally developed a heuristic that beats previous reinforcement learning techniques • The six-piece configuration outperforming the two-piece one suggests some pseudo-planning is going on • A better way to generalize the board state may be necessary
