
Reinforcement Learning and Tetris


Presentation Transcript


  1. Reinforcement Learning and Tetris Jared Christen

  2. Tetris • Markov decision processes • Large state space • Long-term strategy without long-term knowledge
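A one-line reminder of the MDP framing behind this slide (added for context, not part of the original deck): the agent looks for placements that maximize the expected discounted value of the resulting board. The discount factor and the lines-cleared reward below are generic choices, not details taken from the slides.

```latex
% Standard state-value recursion; the reward and discount are illustrative.
V(s) = \mathbb{E}\left[\, r(s, a) + \gamma\, V(s') \,\right],
\quad a = \text{piece placement},\ s' = \text{board after the drop}.
```

On a standard 10×20 board, the raw filled/empty state alone admits up to 2^200 configurations, which is the "large state space" the slide refers to.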

  3. Background • Hand-coded algorithms can clear > 1,000,000 lines • Genetic algorithm by Roger Llima averages 42,000 lines • Reinforcement learning algorithm by Kurt Driessens averages 30-40 lines

  4. Goals • Develop a Tetris agent that improves on previous reinforcement learning implementations • Secondary goals • Use as few handpicked features as possible • Encourage risk-taking • Include rarely-studied features of Tetris

  5. Approach • TD(λ) with a feedforward neural network
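The slide names the method without detail, so here is a minimal sketch of what TD(λ) with a small feedforward value network looks like. This is a generic illustration, not the author's implementation; the layer size, learning rate, discount, and trace-decay values are assumptions.

```python
# Minimal TD(lambda) sketch with a one-hidden-layer value network (numpy).
# Hyperparameters and layer sizes are assumed, not taken from the slides.
import numpy as np

class TDLambdaNet:
    def __init__(self, n_inputs, n_hidden=32, alpha=0.01, gamma=0.99, lam=0.8):
        rng = np.random.default_rng(0)
        self.W1 = rng.normal(scale=0.1, size=(n_hidden, n_inputs))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.1, size=n_hidden)
        self.b2 = 0.0
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.reset_traces()

    def reset_traces(self):
        # One eligibility trace per weight, reset at the start of each game.
        self.eW1 = np.zeros_like(self.W1)
        self.eb1 = np.zeros_like(self.b1)
        self.eW2 = np.zeros_like(self.W2)
        self.eb2 = 0.0

    def value(self, x):
        # V(x) from a tanh hidden layer and a linear output.
        h = np.tanh(self.W1 @ x + self.b1)
        return self.W2 @ h + self.b2, h

    def update(self, x, reward, x_next, terminal):
        v, h = self.value(x)
        v_next = 0.0 if terminal else self.value(x_next)[0]
        delta = reward + self.gamma * v_next - v  # TD error

        # Gradient of V(x) with respect to each parameter.
        gW2 = h
        gb2 = 1.0
        dh = self.W2 * (1.0 - h ** 2)
        gW1 = np.outer(dh, x)
        gb1 = dh

        # Decay and accumulate eligibility traces, then apply the TD(lambda) step.
        decay = self.gamma * self.lam
        self.eW1 = decay * self.eW1 + gW1
        self.eb1 = decay * self.eb1 + gb1
        self.eW2 = decay * self.eW2 + gW2
        self.eb2 = decay * self.eb2 + gb2
        self.W1 += self.alpha * delta * self.eW1
        self.b1 += self.alpha * delta * self.eb1
        self.W2 += self.alpha * delta * self.eW2
        self.b2 += self.alpha * delta * self.eb2
```

In use, traces would be reset at the start of each game and `update` called after every piece placement, with the reward presumably tied to lines cleared (again an assumption; the slides do not specify the reward).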

  6. Neural Net Control • Inputs: raw state (filled & empty blocks) and handpicked features • Outputs: movements and placements
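The slide lists the inputs only at a high level. The sketch below shows one plausible way to combine the raw filled/empty board with a few handpicked features; the specific features chosen here (column heights, hole count) are common Tetris choices assumed for illustration, not confirmed by the deck.

```python
import numpy as np

def board_features(board):
    """board: 2D array of 0/1, shape (rows, cols), row 0 at the top.

    Returns the raw occupancy bits concatenated with a few handpicked
    features (column heights and hole count). The feature set is
    illustrative, not the author's exact choice.
    """
    rows, cols = board.shape
    heights = np.zeros(cols)
    holes = 0
    for c in range(cols):
        filled = np.flatnonzero(board[:, c])
        if filled.size:
            top = filled[0]
            heights[c] = rows - top
            # Empty cells below the topmost filled cell count as holes.
            holes += int(np.sum(board[top:, c] == 0))
    raw = board.flatten().astype(float)
    return np.concatenate([raw, heights / rows, [holes / (rows * cols)]])
```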

  7. Contour Matching
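The slide is only a title, so the following is one common reading of contour matching: describe the board surface as the sequence of height differences between adjacent columns and count how many of those steps a piece's underside can follow. Everything here, including the piece encoding, is an assumption made for illustration.

```python
def contour(heights):
    # Surface contour as differences between adjacent column heights.
    return [heights[i + 1] - heights[i] for i in range(len(heights) - 1)]

def match_length(board_heights, piece_bottom):
    """Longest run of contour steps the piece's underside can follow.

    piece_bottom: per-column offsets of the piece's underside, e.g. [0, 1]
    (an illustrative encoding, not taken from the slides).
    """
    piece_contour = contour(piece_bottom)
    board_contour = contour(board_heights)
    best = 0
    for col in range(len(board_contour) - len(piece_contour) + 1):
        run = 0
        for i, step in enumerate(piece_contour):
            if board_contour[col + i] == step:
                run += 1
            else:
                break
        best = max(best, run)
    return best
```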

  8. Structure [Network diagram: inputs are the active, next, and held tetrominoes plus each candidate placement's score and contour match length; outputs are a hold value and a value for each placement 1…n]
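Reading the diagram, the network appears to produce a value for holding the piece and a value for each candidate placement, with the agent taking the highest-valued option. A hedged sketch of that decision step, reusing the `TDLambdaNet` sketch above and two hypothetical helpers (`enumerate_placements`, `encode`) that are not in the original slides:

```python
def choose_action(value_net, board, active, held, next_piece,
                  enumerate_placements, encode):
    """Pick the highest-valued option: one of the candidate placements, or hold.

    enumerate_placements and encode are hypothetical helpers: the first lists
    (placement, resulting_board) pairs for the active piece, the second builds
    the input vector described on the Neural Net Control slide.
    """
    candidates = []
    for placement, resulting_board in enumerate_placements(board, active):
        v, _ = value_net.value(encode(resulting_board, next_piece, held))
        candidates.append((v, ("place", placement)))
    # Holding swaps the active and held pieces instead of placing anything.
    v, _ = value_net.value(encode(board, next_piece, active))
    candidates.append((v, ("hold", None)))
    return max(candidates, key=lambda item: item[0])[1]
```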

  9. Experiments • 200 learning games • Averaged over 30 runs • Two-piece and six-piece configurations • Compare to benchmark contour matching agent

  10. Results [Charts: six-piece and two-piece configurations]

  11. Results

  12. Conclusions • Accidentally developed a heuristic that beats previous reinforcement learning techniques • The six-piece configuration outperforming the two-piece one suggests some pseudo-planning is going on • A better way to generalize the board state may be necessary
