Learning Motion Prediction Models for Opponent Interception



  1. Learning Motion Prediction Models for Opponent Interception
  Bulent Tastan, David Chang, Gita Sukthankar

  2. Intercepting Opponents
  • The ability to intercept opponents is an important aspect of many adversarial games.
  • Human players:
    • exhibit user-specific movement preferences
    • don't necessarily prefer the shortest routes
    • are capable of intercepting opponents in a partially occluded map
  • This paper presents a method for learning user path preferences from data and planning interception routes.

  3. Framework
  • (1) Learning motion models
  • (2) Opponent tracking using particle filters
  • (3) Planning to intercept

  4. Example Scenario
  • The bot plays a series of repeated games against a human to learn the human's evasion strategies.
  • The human needs to safely cross the area and is initially occluded from the bot.
  • The bot's goal is to intercept the human player before they leave the map.
  • The training map is a subsection of a larger Unreal Tournament maze.

  5. Related Work
  • Particle filters for opponent modeling: a sequential Monte Carlo state estimation technique in which the current probability distribution is represented as a set of particles and importance weights that are resampled and reweighted based on observed data.
  • Multiple aspects of the framework can be configured:
    • (Bererton 2004): modify the number of particles to adapt the difficulty level of the AI
    • (Weber et al. 2011): StarCraft unit movement
    • (Hladky and Bulitko 2008): learning motion models from game logs in Counter-Strike
  • Our method can learn motion models from a small number of logs and generalize them using inverse reinforcement learning.

  6. Learning to Intercept
  • Data collection
  • Max-Entropy IRL

  7. Data Collection
  • Gather a small set of traces from a specific human player.
  • The player is instructed to use a small subset of the entrances/exits on the map.
  • Log traces are converted into a feature-based representation.
  • Features include: 1) distance to corners, 2) distance to map center, 3) distance to nearest exit, 4) quadrant information (binary descriptor).
  • The data is used to learn a user-specific model of the player's evasion preferences in the form of expected feature counts (see the sketch below).
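A minimal sketch of this feature mapping, assuming a flat 2-D map whose corner, center, and exit coordinates are known; the function names and the two-bit quadrant encoding are illustrative choices, not the paper's exact representation.

```python
import numpy as np

def state_features(pos, corners, center, exits):
    """Feature vector for one logged position: distance to each map corner,
    distance to the map center, distance to the nearest exit, and a binary
    quadrant descriptor (the 2-bit encoding here is an assumption)."""
    pos = np.asarray(pos, dtype=float)
    corner_dists = [np.linalg.norm(pos - np.asarray(c, dtype=float)) for c in corners]
    center_dist = np.linalg.norm(pos - np.asarray(center, dtype=float))
    exit_dist = min(np.linalg.norm(pos - np.asarray(e, dtype=float)) for e in exits)
    quadrant = [float(pos[0] >= center[0]), float(pos[1] >= center[1])]
    return np.array(corner_dists + [center_dist, exit_dist] + quadrant)

def trace_feature_counts(trace, corners, center, exits):
    """Sum the per-position features over one logged trace; averaging these
    sums over all traces gives the expected feature counts used for learning."""
    return sum(state_features(p, corners, center, exits) for p in trace)
```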

  8. Learning to Intercept: Max-Entropy IRL (Ziebart et al., 2008)

  9. Inverse Reinforcement Learning
  • Inverse RL is a mechanism for learning the implicit reward structure of an MDP from demonstrations of a policy.
  • The problem is underconstrained: many reward vectors map to the same policy.
  • Assumption: the human player is acting to optimize a hidden reward metric.
  • Iterative process: select a reward, create a policy that optimizes that reward, then modify the reward to make the policy "more similar" to the demonstrations.
  • There are many different ways of defining the similarity function.
  [Diagram: demonstrations (policy) → IRL → reward vector]

  10. Max Entropy IRL (Ziebart et al., 2008)
  Input: frequency with which features were viewed in player traces
  Reward model: linear weighted combination of features
  Output: weights
  1) Calculate the expectation of the player's trajectory viewing the features.
  2) The policy is expressed as the probability of the player taking an action, conditioned on the state and the parameters.
  3) Find the weights that maximize the log-likelihood of seeing the trajectories.
  4) Use gradient descent to improve the weights.
  5) A forward-backward procedure is used to calculate the feature expectations for the weights.
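Steps 3 and 4 can be read as a simple gradient update: with the linear reward R(s) = θᵀf(s), the gradient of the trajectory log-likelihood with respect to θ is the difference between the empirical feature counts from the player traces and the feature counts expected under the policy induced by the current weights (the quantity step 5's forward-backward procedure computes). A hedged sketch, not the paper's code; `policy_feature_expectation` is a placeholder for that forward-backward routine.

```python
import numpy as np

def reward(theta, features):
    """Linear reward model: a weighted combination of the state features."""
    return theta @ features

def maxent_irl_update(theta, empirical_counts, policy_feature_expectation, lr=0.01):
    """One gradient step on the reward weights theta.

    empirical_counts:           feature counts observed in the player traces
    policy_feature_expectation: callable returning the expected feature counts
                                under the policy induced by the current theta
                                (the forward-backward procedure of step 5)
    """
    grad = empirical_counts - policy_feature_expectation(theta)  # d log-likelihood / d theta
    return theta + lr * grad
```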

  11. Framework

  12. Framework (2) Opponent tracking using Particle Filters

  13. Particle Filter Opponent Tracking
  • Generate candidate paths using IRL.
  • The candidate paths are used as the motion model for the particle filter.
  • The particle filter can be run forward in time to predict where the opponent will be at longer time horizons.
  • Two motion models are evaluated (sketched below):
    • the IRL motion model uses the candidate paths
    • Brownian motion assumes there is no path information
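The two motion models compared here could look roughly like this: the Brownian model simply diffuses the particles, while the IRL model advances each particle along the candidate path it was assigned to. The waypoint-list path representation and the noise scales are assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def brownian_step(particles, sigma=1.0):
    """No path information: diffuse every particle with isotropic Gaussian noise."""
    return particles + rng.normal(scale=sigma, size=particles.shape)

def irl_path_step(particles, path_state, candidate_paths, sigma=0.2):
    """Advance each particle one waypoint along its assigned IRL candidate path.
    path_state[i] = (path id, waypoint index) for particle i."""
    stepped = np.empty_like(particles)
    for i, (k, t) in enumerate(path_state):
        t = min(t + 1, len(candidate_paths[k]) - 1)   # move to the next waypoint
        path_state[i] = (k, t)
        stepped[i] = np.asarray(candidate_paths[k][t], dtype=float) + rng.normal(scale=sigma, size=2)
    return stepped
```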

  14. Particle Filter Tracker
  • Generate a set of particles that matches the prior probability distribution.
  • Use the motion model to make the prediction for the next time step.
  • Reweight the particles based on the observation (if any).
  • Use importance sampling to resample the particles based on the new weights (see the sketch below).
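A basic bootstrap-filter version of the four steps above; `motion_step` is one of the motion models from the previous sketch (wrapped so that it takes only the particle array), and the Gaussian observation likelihood is an assumed sensor model rather than the one used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def pf_update(particles, weights, motion_step, observation=None, obs_sigma=2.0):
    """One predict / reweight / resample cycle of the opponent tracker."""
    # Predict: push every particle through the motion model.
    particles = motion_step(particles)

    # Reweight on the observation, if the opponent is currently visible.
    if observation is not None:
        dist = np.linalg.norm(particles - observation, axis=1)
        weights = weights * np.exp(-0.5 * (dist / obs_sigma) ** 2)
    weights = weights / weights.sum()

    # Resample (importance sampling) according to the new weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```

For example, the IRL motion model can be passed in as `lambda p: irl_path_step(p, path_state, candidate_paths)`.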

  15. PF without Path Info

  16. PF with Max-Entropy IRL

  17. Tracking Error Results
  • Verification that the prediction part of the system works.
  • Tracking error is measured for specific entrance/goal pairs.
  • The error is reduced substantially using the IRL motion model.
  • However, tracking is not the whole story; the planning model matters as well.

  18. Framework (3) Planning to intercept

  19. Planning to Intercept
  • Centroid: the center of the entire particle set
  • Uncertainty elimination: the point that places the maximum number of particles within the bot's sensor radius
  • Cluster: particles are clustered, and the best cluster centroid is selected
  (A sketch of the three planners follows.)
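One way the three interception targets above could be computed from the particle set; SciPy's k-means stands in for whatever clustering the paper uses, and treating the largest cluster as the "best" one is an assumption.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def centroid_target(particles):
    """Centroid planner: aim at the mean of the whole particle set."""
    return particles.mean(axis=0)

def uncertainty_elimination_target(particles, sensor_radius):
    """Aim at the particle position whose sensor disk covers the most particles."""
    coverage = [
        (np.linalg.norm(particles - p, axis=1) <= sensor_radius).sum()
        for p in particles
    ]
    return particles[int(np.argmax(coverage))]

def cluster_target(particles, k=3):
    """Cluster the particles and aim at the centroid of the largest cluster."""
    centers, labels = kmeans2(particles, k, minit='points')
    largest = int(np.bincount(labels).argmax())
    return centers[largest]
```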

  20. Centroid Planner

  21. Uncertainty Elimination Planner

  22. Cluster Planner

  23. Planners

  24. Evaluations
  • Models: IRL, Brownian motion
  • Planners: centroid, uncertainty elimination, and cluster
  • Delays: running the particle filter at different time horizons

  25. Results

  26. Results

  27. Results

  28. Conclusion and Future Work
  • We introduce a general method for learning and incorporating user-specific evasion models into adversarial planning.
  • The motion model has the most impact on the results.
  • But the choice of planner and time horizon matters as well; without good planning, the modeling benefits are lost.
  • Future work includes combining the prediction model with a hierarchical POMDP planner.

  29. Thank You! Questions? University of Central Florida, gitars@eecs.ucf.edu

  30. Please come join us next month at AIIDE in Palo Alto, CA! The 8th Annual Conference on Artificial Intelligence and Interactive Digital Entertainment. General Chair: Mark Riedl. Program Chair: Gita Sukthankar. October 8-9, Palo Alto, California.
