
Evolving Multimodal Behavior

This PhD proposal explores the challenge of automatically discovering and evolving multimodal behavior in noisy, complex domains such as simulations, video games, and robotics. Previous design-based, value-function-based, and evolutionary approaches all have limitations that this work aims to address. The proposal suggests using multiobjective neuroevolution, specifically the NSGA-II algorithm, to evolve multimodal behavior, with a battle domain as the testbed for evolving multimodal teamwork. Research questions include comparing NSGA-II with the weighted-sum method and investigating the evolution of homogeneous and heterogeneous teams.


Presentation Transcript


  1. Evolving Multimodal Behavior PhD Proposal Jacob Schrum 11/4/09

  2. Introduction • Challenge: Discover behavior automatically • Simulations, video games, robotics • Why challenging? • Noisy sensors • Complex domains • Continuous states/actions • Multiple agents, teamwork • Multiple objectives • Multimodal behavior required (focus)

  3. What is Multimodal Behavior? • Working definition: • Agent exhibits distinct kinds of actions under different circumstances • Examples: • Offensive & defensive modes in soccer • Search for weapons or opponents in video game • Animal with foraging & fleeing modes • Very important for teams • Roles correspond to modes • Example domains will involve teamwork

  4. Previous Approaches • Design Approaches • Hand code in a structured manner • Value-Function Based Approaches • Learn the utility of actions (RL) • Evolutionary Approaches • Selectively search based on performance

  5. Design • Subsumption Architecture (Brooks 1986) • Hierarchical design • Lower levels independent of higher levels • Built incrementally • Common in robotics • Hand coded

  6. Value-Function Based (1/2) • MAXQ (Dietterich 1998) • Hand-designed hierarchy • TD learning at multiple levels • Reduce state space • Taxi domain • Still just a grid world • Discrete state & action

  7. Value-Function Based (2/2) • Basis Behaviors (Matarić 1997) • Low-level behaviors pre-defined • Learn high-level control • Discrete state space • High-level features (conditions) • Reward shaping necessary • Applied to real robots • Too much expert knowledge

  8. Evolutionary (1/2) • Layered Evolution (Togelius 2004) • Evolve components of subsumption architecture • Applied to: • EvoTanks (Thompson and Levine 2008) • Unreal Tournament 2004 (van Hoorn et al. 2009) • Must specify: • Hierarchy • Training tasks • Similar to Layered Learning (Stone 2000)

  9. Evolutionary (2/2) • Neuro-Evolving Robotic Operatives (Stanley et al. 2005) • Machine-learning game: train a robot army • Many objectives • Weighted sum: z-scores method • User changes weights during training • Dynamic objective management • Leads to multimodal behavior

  10. Multiple Objectives • Multimodal problems are typically multiobjective • Modes associated with objectives • Traditional: weighted sum (Cohon 1978) • Must tune the weights • Only one solution • Bad for non-convex surfaces • Need better formalism • [Figure: tradeoff curve between two objectives; each point on the convex hull corresponds to one specific set of weights, and solutions in non-convex regions cannot be captured by any weighted sum]

  11. Greatest Mass Sarsa (Sprague and Ballard 2003) • Multiple MDPs with shared action space • Learn via Sarsa(0) update rule: Qᵢ(s,a) ← Qᵢ(s,a) + α[rᵢ + γQᵢ(s′,a′) - Qᵢ(s,a)] • Best value is the sum of component values: Q(s,a) = Σᵢ Qᵢ(s,a) • Used in sidewalk navigation task • Like weighted sum
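
  A minimal sketch of these two pieces, assuming tabular Q-functions stored as Python dicts; the dict representation, action set, and parameter values are illustrative assumptions, not the authors' implementation:

  ```python
  def sarsa_update(q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
      """Sarsa(0) update applied independently to one module's Q-table."""
      old = q.get((s, a), 0.0)
      q[(s, a)] = old + alpha * (r + gamma * q.get((s2, a2), 0.0) - old)

  def greatest_mass_action(q_tables, state, actions):
      """Pick the action whose summed Q-value across all modules is highest."""
      return max(actions, key=lambda a: sum(q.get((state, a), 0.0) for q in q_tables))
  ```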

  12. Convex Hull Iteration (Barrett and Narayanan 2008) • Changes MDP formalism: vector-valued reward • Find solutions for all possible weightings w where wᵢ ≥ 0 and Σᵢ wᵢ = 1 • Maximize: w · V(s) • Results in compact set of solutions • Different trade-offs • Cannot capture non-convex surfaces • Discrete states/actions only • Need a way to capture non-convex surfaces!

  13. Pareto-based Multiobjective Optimization (Pareto 1890) • Imagine a game with two objectives: Damage Dealt and Health Remaining • v dominates u iff vᵢ ≥ uᵢ for every objective i and vⱼ > uⱼ for at least one objective j • The non-dominated points of a population are the best: the Pareto front • [Figure: Pareto front over the two objectives, ranging from high health but little damage dealt, through tradeoffs between the objectives, to high damage dealt at the cost of much health lost]
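
  The dominance test and front extraction are simple enough to sketch directly (all objectives maximized; the example scores are made up):

  ```python
  def dominates(v, u):
      """v Pareto-dominates u iff v is at least as good in every objective
      and strictly better in at least one."""
      return all(vi >= ui for vi, ui in zip(v, u)) and any(vi > ui for vi, ui in zip(v, u))

  def pareto_front(points):
      """Return the non-dominated subset of a collection of objective vectors."""
      return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

  # Usage: scores are (damage dealt, health remaining) pairs.
  print(pareto_front([(10, 90), (50, 50), (40, 40), (80, 10)]))
  # -> [(10, 90), (50, 50), (80, 10)]; (40, 40) is dominated by (50, 50)
  ```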

  14. Non-dominated Sorting Genetic Algorithm II (Deb et al. 2000) • Population P with size N; evaluate P • Use mutation to get P′ of size N; evaluate P′ • Calculate non-dominated fronts of {P ∪ P′}, size 2N • New population of size N from the highest fronts of {P ∪ P′}
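
  A simplified sketch of one NSGA-II generation, reusing dominates() from the Pareto sketch above; crowding-distance tie-breaking within the cut front is omitted for brevity:

  ```python
  def non_dominated_sort(pop, scores):
      """Layer the population into successive non-dominated fronts."""
      fronts, remaining = [], list(pop)
      while remaining:
          front = [p for p in remaining
                   if not any(dominates(scores[id(q)], scores[id(p)])
                              for q in remaining if q is not p)]
          fronts.append(front)
          remaining = [p for p in remaining if not any(p is f for f in front)]
      return fronts

  def nsga2_generation(population, evaluate, mutate):
      """Evaluate P and its mutated copy P', then keep the best N by front rank."""
      offspring = [mutate(p) for p in population]        # P' of size N
      combined = population + offspring                  # {P ∪ P'} of size 2N
      scores = {id(p): evaluate(p) for p in combined}
      next_pop = []
      for front in non_dominated_sort(combined, scores): # fill from highest fronts
          take = min(len(front), len(population) - len(next_pop))
          next_pop.extend(front[:take])
          if len(next_pop) == len(population):
              break
      return next_pop
  ```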

  15. Constructive Neuroevolution • Genetic Algorithms + Neural Networks • Build structure incrementally • Good at generating control policies • Three basic mutations (no crossover used): perturb weight, add connection, add node • Other structural mutations possible (more later)
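
  A sketch of the three basic mutations on an illustrative genome of node ids and connection dicts; the NEAT-style connection-splitting form of add-node is an assumption about the exact operator:

  ```python
  import random

  def perturb_weight(genome):
      """Perturb one connection weight with Gaussian noise."""
      random.choice(genome["connections"])["weight"] += random.gauss(0.0, 1.0)

  def add_connection(genome):
      """Add a random-weight connection between two nodes (duplicate check omitted)."""
      src, dst = random.sample(genome["nodes"], 2)
      genome["connections"].append({"src": src, "dst": dst,
                                    "weight": random.gauss(0.0, 1.0)})

  def add_node(genome, new_id):
      """Splice a new hidden node into an existing connection."""
      conn = random.choice(genome["connections"])
      genome["connections"].remove(conn)
      genome["nodes"].append(new_id)
      genome["connections"].append({"src": conn["src"], "dst": new_id, "weight": 1.0})
      genome["connections"].append({"src": new_id, "dst": conn["dst"],
                                    "weight": conn["weight"]})
  ```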

  16. Evolution of Teamwork • Homogeneous • Shared policy • Individuals know how teammates act • Individuals fill roles as needed: multimodal • Heterogeneous • Different roles • Cooperation harder to evolve • Team-level multimodal behavior

  17. Completed Work • Benefits of Multiobjective Neuroevolution • Pareto-based leads to multimodal behavior • Targeting Unachieved Goals (TUG) • Speed up evolution with objective management • Evolving Multiple Output Modes • Allow networks to have multiple policies/modes • Need a domain to experiment in …

  18. Battle Domain • Evolved monsters (yellow) • Scripted fighter (green) • Approach nearest monster • Swing bat repeatedly • Monsters can hurt fighter • Bat can hurt monsters • Multiple objectives • Deal damage • Avoid damage • Stay alive • Can multimodal teamwork evolve?

  19. Benefits of Multiobjective Neuroevolution • Research Questions: • NSGA-II better than z-scores (weighted sum)? • Homogeneous or heterogeneous teams better? • 30 trials for each combination • Three evaluations per individual • Average scores to overcome noisy evals

  20. Incremental Evolution • Hard to evolve against scripted strategy • Could easily fail to evolve interesting behavior • Incremental evolution against increasing speeds • 0%, 40%, 80%, 90%, 95%, 100% • Increase speed when all goals are met • End when goals met at 100%

  21. Goals • Is average population performance high enough? Then increase speed • Each objective has a goal: • At least 50 damage dealt to the bot (1 kill) • Less than 20 damage per monster on average (2 hits) • Survive at least 540 time steps (90% of trial) • A goal is achieved when the average population score for that objective meets the goal value
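
  A sketch of the goal check driving incremental evolution from the two slides above; objectives are framed so that higher scores are better, so the damage-received goal is stored negated. The exact representation is an assumption:

  ```python
  SPEEDS = [0.0, 0.4, 0.8, 0.9, 0.95, 1.0]   # fighter speed schedule from slide 20
  GOALS = {"damage_dealt": 50.0,             # at least 50 damage to the bot (1 kill)
           "neg_damage_received": -20.0,     # < 20 damage per monster (stored negated)
           "time_alive": 540.0}              # survive 90% of a 600-step trial

  def maybe_increase_speed(avg_scores, speed_index):
      """Advance to the next speed once every average objective meets its goal;
      return (new_speed_index, done), where done means goals were met at 100%."""
      all_met = all(avg_scores[obj] >= goal for obj, goal in GOALS.items())
      if not all_met:
          return speed_index, False
      if speed_index == len(SPEEDS) - 1:
          return speed_index, True
      return speed_index + 1, False
  ```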

  22. Evolved Behaviors • Baiting + Side-Swiping • Lure fighter • Turns allow team to catch up • Attacks on left side of fighter • Taking Turns • Hit and run • Next counter-clockwise monster rushes in • Fighter hit on left side • Multimodal behaviors!

  23. Multiobjective Conclusions • NSGA-II faster than z-scores • NSGA-II more likely to generate multimodal behavior • Many runs did not finish/were slow • Several “successful” runs did not have multimodal behavior

  24. Targeting Unachieved Goals • Research Question: How can evolution be sped up and made more reliable? • When an objective's goal is met, stop using it • Restore the objective if scores drop below the goal • Focuses on the most challenging objectives • Combine NSGA-II with TUG • [Figure: TUG steers evolution toward the tough objectives]
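
  A minimal sketch of TUG's objective management, assuming the same goal dictionary as above; because the active set is recomputed every generation, an objective whose average score falls back below its goal is restored automatically:

  ```python
  def active_objectives(avg_scores, goals):
      """Select with only the objectives whose goals are currently unmet."""
      active = {obj for obj, goal in goals.items() if avg_scores[obj] < goal}
      # If every goal is met, keep all objectives so selection still has a gradient.
      return active if active else set(goals)
  ```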

  25. Evolved Behaviors • Alternating Baiting • Bait until another monster hits • Then baiting monster attacks • Fighter knocked back and forth • Synchronized Formation • Move as a group • Fighter chases one bait • Other monster rushes in with side swipe attacks • More multimodal behaviors!

  26. TUG Conclusions • TUG results in huge speed-up • No wasted effort on achieved goals • TUG runs finish more reliably • Heterogeneous runs have more multimodal behavior than homogeneous • Some runs still did not finish • Some “successful” runs still did not have multimodal behavior

  27. Fight or Flight • Separate Fight and Flight trials • Fight = Battle Domain • Flight: • Scripted prey (red) instead of fighter • Has no bat; has to escape • Monsters confine and damage • New objective: Deal damage in Flight • Flight task requires teamwork • Requires multimodal behavior

  28. New-Mode Mutation • Encourage multimodal behavior • New mode receives inputs copied from a preexisting mode • Initially very similar to the source mode • The preference node with the maximum activation determines which mode controls the agent
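
  A sketch of New-Mode Mutation and max-preference arbitration on an illustrative genome where each mode records its incoming connections and a preference node id; this representation is an assumption:

  ```python
  import copy
  import random

  def new_mode_mutation(genome):
      """Add a new output mode that copies (and slightly perturbs) the incoming
      connections of a random existing mode, so it starts out behaving similarly."""
      new_mode = copy.deepcopy(random.choice(genome["modes"]))
      for conn in new_mode["incoming"]:
          conn["weight"] += random.gauss(0.0, 0.1)   # initially very similar
      genome["modes"].append(new_mode)

  def choose_mode(genome, activations):
      """Arbitration: the mode whose preference node is most active controls the agent."""
      return max(genome["modes"], key=lambda m: activations[m["preference_node"]])
  ```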

  29. Evolving Multiple Output Modes • Research Question: • How to evolve teams that do well in both tasks • Compare 1Mode to ModeMutation • Three evals in Fight and three in Flight • Same networks for two different tasks

  30. 1Mode Behaviors • Aggressive + Corralling • Aggressive in Fight task • Take lots of damage • Deal lots of damage • Corralling in Flight task • Run/Rush + Crowding • Run/Rush in Fight task • Good timing on attack • Kill fighter w/o taking too much damage • Crowding in Flight task • Get too close to prey • Knock prey out and it escapes • Networks can’t handle both tasks!

  31. ModeMutation Behaviors • Alternating Baiting + Corralling • Alternating baiting in Fight task • Corralling in Flight task • Spread out to prevent escape • Individuals rush in to attack • Hit into Crowd + Crowding • Hitting into Crowd in Fight task • One attacker knocks fighter into others • Crowding in Flight task • Rush prey, ricochet back and forth • Sometimes knocks prey free • Networks succeed at both tasks!

  32. Mode Mutation Conclusions • ModeMutation slower than 1Mode • ModeMutation better at producing multimodal behaviors • Harder task resulted in more failed runs • Many unused output modes created • Slows down execution • Bloats output layer

  33. Proposed Work • Extensions • Avoiding Stagnation by Promoting Diversity • Extending Evolution of Multiple Output Modes • Heterogeneous Teams Using Subpopulations • Open-Ended Evolution + TUG • Evaluate in new tasks • Killer App: Unreal Tournament 2004

  34. 1. Avoiding Stagnation by Promoting Diversity • Behavioral diversity avoids stagnation • Add a diversity objective (Mouret et al. 2009) • Behavior vector: given input vectors, concatenate outputs • Diversity objective: average distance from other behavior vectors in population • [Figure: example behavior vectors of concatenated real-valued network outputs]
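
  A sketch of the behavioral-diversity objective, assuming networks are callables returning numpy arrays; the fixed sample inputs and the Euclidean distance metric are assumptions:

  ```python
  import numpy as np

  def behavior_vector(network, sample_inputs):
      """Concatenate the network's outputs on a fixed set of sample inputs."""
      return np.concatenate([network(x) for x in sample_inputs])

  def diversity_scores(behaviors):
      """Each individual's average distance to all other behavior vectors."""
      b = np.stack(behaviors)
      dists = np.linalg.norm(b[:, None, :] - b[None, :, :], axis=-1)
      return dists.sum(axis=1) / (len(b) - 1)   # self-distance is zero
  ```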

  35. 2. Extending Evolution of Multiple Output Modes • Encourage mode differences • Random input sources • Probabilistic arbitration • Bad modes less likely to persist • Like softmax action selection • Restrict New-Mode Mutation • New objective: punish unused modes, reward used modes • Delete similar modes • Based on behavior metric • Limit modes: make best use of limited resources • Dynamically increase the limit?
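
  The probabilistic-arbitration extension above can be sketched as softmax selection over preference-node activations; the temperature parameter is an assumption:

  ```python
  import numpy as np

  def softmax_mode_selection(preferences, temperature=1.0):
      """Pick a mode with probability proportional to exp(preference / T), so weak
      modes are sampled rarely but are not cut off entirely."""
      prefs = np.asarray(preferences, dtype=float) / temperature
      probs = np.exp(prefs - prefs.max())   # subtract max for numerical stability
      probs /= probs.sum()
      return int(np.random.choice(len(probs), p=probs))
  ```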

  36. 3. Heterogeneous Teams Using Subpopulations • Each team member from a different subpopulation (Yong 2007) • Encourages division of labor across teammates • Different roles lead to multimodal team behavior
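
  A minimal sketch of team formation with one subpopulation per team slot; in a full implementation, the fitness from each team evaluation would be credited back to the sampled members:

  ```python
  import random

  def form_team(subpopulations):
      """Draw each team member from its own subpopulation so roles can specialize."""
      return [random.choice(subpop) for subpop in subpopulations]
  ```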

  37. 4. Open-Ended Evolution + TUG • Keep increasing goals • Evolution has something to strive towards • Preserves benefits of TUG • Does not settle early • When to increase goals? • When all goals are achieved • As individual goals are achieved
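
  A sketch of the per-goal variant of open-ended goal increase, assuming the same goal dictionary as before; the additive increment is an assumed schedule, and the all-goals-at-once variant would gate on every goal being met first:

  ```python
  def increase_goals(goals, avg_scores, increment=5.0):
      """Raise each achieved goal so evolution always has something to strive toward."""
      return {obj: goal + increment if avg_scores[obj] >= goal else goal
              for obj, goal in goals.items()}
  ```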

  38. New Tasks • More tasks require more modes • Investigate single-agent tasks • Only teams so far • Investigate complementary objectives • Does TUG only help with contradictory objectives? • Complementary objectives are hard when combined with others • Tasks: • Predator • Opposite of Flight • Partial observability • Sink the Ball • Very different from previous tasks • Needs more distinct modes? • Less mode sharing?

  39. Unreal Tournament 2004 • Commercial First-Person Shooter (FPS) • Challenging domain • Continuous state and action • Multiobjective • Partial information • Multimodal behaviors required • Programming API: Pogamut • Competitions: • Botprize • Deathmatch

  40. Unreal Deathmatch • Packaged bots are hand-coded • Previous winners of Botprize were hand-coded • Learning attempts • Simplified version of the game (van Hoorn et al. 2009) • Limited to certain behaviors (Kadlec 2008) • Multimodal behavior in the full game: not done yet

  41. Unreal Teams • Team Deathmatch • Largely ignored? • Capture the Flag • Teams protect own flag • Bring enemy flag to base • GP approach could not beat UT bots (Kadlec 2008) • Domination • King of the hill • Teams defend key locations • RL approach learned the group strategy of hand-coded bots (Smith et al. 2007)

  42. Review • System for developing multimodal behavior • Multiobjective Evolution • Targeting Unachieved Goals • New-Mode Mutation • Behavioral Diversity • Extending Mode Mutation • Subpopulations • Open-Ended Evolution • Final evaluation in Unreal Tournament 2004

  43. Conclusion • Create a system that: • Automatically discovers multimodal behavior • Needs no high-level hierarchy • Needs no low-level behaviors • Works in continuous, noisy environments • Discovers team behavior as well • Agents with an array of different useful behaviors • Leads to better agents/behaviors in simulations, games, and robotics

  44. Questions?
