This project explores algorithmic comparisons and evaluations in reinforcement learning and planning. It focuses on comparing policy rollout variants built on different bandit algorithms, and on variants of Monte-Carlo tree search. Algorithms such as Forward Search Sparse Sampling and Least-Squares Policy Iteration will be implemented, with an attempt to replicate results from the literature. Candidate problem domains include games such as Chess, Tetris, and StarCraft, as well as real-world challenges such as compiler scheduling and forest fire management. The goal is to assess the effectiveness of these algorithms on challenging sequential decision-making problems.
Algorithmic Evaluations/Comparisons
• Compare variants of (nested) policy rollout using different bandit algorithms
• Compare some variants of Monte-Carlo tree search
• Implement an algorithm from the literature and attempt to replicate its results, e.g.:
  • Forward Search Sparse Sampling (a type of Monte-Carlo tree search algorithm)
  • Anytime AO*
  • Least-Squares Policy Iteration
• I could give other pointers depending on interests
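As a minimal sketch of the first idea, the snippet below uses UCB1 as the bandit for allocating rollouts among actions at a single decision point. The `simulate(state, action)` function is an assumed interface (one random rollout returning a sampled return), not part of any particular framework:

```python
import math

def ucb1_rollout_action(state, actions, simulate, n_trials=200, c=1.4):
    """Pick an action via Monte-Carlo rollouts, with UCB1 deciding
    which action to sample next. `simulate(state, action)` is an
    assumed user-supplied function returning one sampled return."""
    counts = {a: 0 for a in actions}
    totals = {a: 0.0 for a in actions}
    # Sample each action once to initialize its estimate.
    for a in actions:
        totals[a] += simulate(state, a)
        counts[a] = 1
    for t in range(len(actions), n_trials):
        # UCB1 score: empirical mean plus exploration bonus.
        a = max(actions, key=lambda a: totals[a] / counts[a]
                + c * math.sqrt(math.log(t) / counts[a]))
        totals[a] += simulate(state, a)
        counts[a] += 1
    # Recommend the action with the best empirical mean.
    return max(actions, key=lambda a: totals[a] / counts[a])
```

Swapping the UCB1 score for epsilon-greedy or Thompson-sampling allocation gives the other rollout variants to compare.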
Algorithmic Comparisons
• Compare some reinforcement learning algorithms across some interesting problems, e.g. TD-based vs. policy-gradient-based methods
• You could use the domains I have in the Java framework for evaluation
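To make the TD-based side of that comparison concrete, here is a minimal tabular Q-learning sketch. The `env_step(s, a) -> (next_state, reward, done)` interface and the choice of state 0 as the start state are assumptions for illustration:

```python
import random

def q_learning(env_step, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Q-learning, a TD-based method: bootstrap each update
    from the current estimate of the next state's value."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False  # assume state 0 is the start state
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < eps:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda a: Q[s][a])
            s2, r, done = env_step(s, a)
            # TD target: r + gamma * max_a' Q(s', a').
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q
```

A policy-gradient method (e.g. REINFORCE) would instead parameterize the policy directly and adjust its parameters along sampled-return gradients, which is the contrast this comparison would measure.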
Solve a Particular Problem
• Pick a challenging sequential decision-making problem
• Apply one or more of our planning/learning approaches to it and evaluate
• Problems from past projects:
  • Games
    • Tetris
    • Pokemon
    • Blokus
    • Chess
    • Backgammon
    • Othello
    • Clue
    • Space Wars (Galcon Fusion)
    • StarCraft
    • Pac-Man
Solve a Particular Problem (continued)
• Problems from past projects:
  • Compiler scheduling
  • Adaptive Java program optimization
  • Forest fire management
  • Crop management
  • Optimizing policies for network protocols
  • Controllers for real-time strategy games
    • Subproblems of the game
  • Optimizing file-sharing policies
• Reinforcement learning and Monte-Carlo methods were the most commonly applied solution approaches
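Since Monte-Carlo methods were the most common approach in past projects, a minimal sketch of Monte-Carlo policy evaluation is shown below; it estimates a policy's expected return by averaging sampled episode returns. The `env_reset`/`env_step` interface is an assumption, not tied to any specific domain above:

```python
def mc_evaluate(policy, env_reset, env_step, n_episodes=1000, gamma=1.0):
    """Monte-Carlo policy evaluation: average the discounted return
    over sampled episodes. Assumed interface: env_reset() -> state,
    env_step(state, action) -> (next_state, reward, done)."""
    total = 0.0
    for _ in range(n_episodes):
        s, done = env_reset(), False
        g, discount = 0.0, 1.0
        while not done:
            s, r, done = env_step(s, policy(s))
            g += discount * r
            discount *= gamma
        total += g
    return total / n_episodes
```

The same loop works for any of the listed domains once a simulator exposing reset/step is available, which is typically the main engineering effort in these projects.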