
Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping



Presentation Transcript


  1. Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping Jacob Schrum and Risto Miikkulainen University of Texas at Austin Department of Computer Science

  2. Typical Uses of MOEAs • Where have MOEAs proven themselves? • Wireless Sensor Networks (Woehrle et al., 2010) • Groundwater Management (Siegfried et al., 2009) • Hydrologic model calibration (Tang et al., 2006) • Epoxy polymerization (Deb et al., 2004) • Voltage-controlled oscillator design (Chu et al., 2004) • Multi-spindle gear-box design (Deb & Jain, 2003) • Foundry casting scheduling (Deb & Reddy, 2001) • Multipoint airfoil design (Poloni & Pediroda, 1997) • Design of aerodynamic compressor blades (Obayashi, 1997) • Electromagnetic system design (Michielssen & Weile, 1995) • Microprocessor design (Stanley & Mudge, 1995) • Design of laminated ceramic composites (Belegundu et al., 1994) • Many engineering/design problems!

  3. New Domains for MOEAs • Simulated agents often face multiple objectives • Automatic discovery of intelligent behavior • Video game opponents in Unreal Tournament (van Hoorn, 2009) • Predator/prey scenarios (Schrum & Miikkulainen, 2009) • Race car driving in TORCS (Agapitos et al., 2008) • Comparatively little MOEA work in such agent domains so far • Direct application of an MOEA is seldom successful • Success often depends on “shaping”

  4. What is Shaping? • Term from Behavioral Psychology • Identified by B. F. Skinner (1938) • Task-Based Example: Train rat to press lever • First reward proximity • Then any interaction with lever • Then actual pressing of lever

  5. Evolutionary Shaping • Environment changes, making the task harder • Evolution shapes behavior across generations • Example: Migration given continental drift [1] • Animals become accustomed to short migration • Continental drift increases distance of migration • Ability to travel increasing distances required • EC models this with incremental evolution (e.g. [2]) [Images: Arctic Tern, Atlantic Salmon] [1] B. F. Skinner. The shaping of phylogenic behavior. Experimental Analysis of Behavior. 1975. [2] Schrum and Miikkulainen. Constructing Complex NPC Behavior via Multiobjective Neuroevolution. 2008.

  6. Fitness-Based Shaping • Not extensively used • Little/no domain knowledge needed • Multiobjective approach a good fit • Selection criteria change • Exploiting ignored objectives (TUG) • Exploiting unfilled niches (BD) [Figure: objective space with crowded niches and a dominated point exploiting a mostly ignored objective; behavior space with uncrowded niches]

  7. Multiobjective Optimization • Pareto dominance: x dominates y iff ∀i: xᵢ ≥ yᵢ and ∃j: xⱼ > yⱼ • Assumes maximization • Want nondominated points • NSGA-II used in this work • What to evolve? • NNs as control policies [Figure: nondominated points in objective space]
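
To make the dominance test concrete, here is a minimal sketch in Python (maximization over plain tuples). It only illustrates the definition above; it is not the NSGA-II implementation used in this work.

```python
def dominates(x, y):
    """True iff x Pareto-dominates y under maximization:
    x is at least as good in every objective and strictly better in at least one."""
    return all(xi >= yi for xi, yi in zip(x, y)) and any(xi > yi for xi, yi in zip(x, y))


def nondominated(points):
    """Return the points that no other point in the list dominates."""
    return [p for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]


if __name__ == "__main__":
    scores = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0), (1.0, 1.0)]
    print(nondominated(scores))  # (1.0, 1.0) is dominated; the other three remain
```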

  8. Constructive Neuroevolution • Genetic Algorithms + Neural Networks • Build structure incrementally (complexification) • Good at generating control policies • Three basic mutations (no crossover used): perturb weight, add connection, add node
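
A rough sketch of the three mutations on a toy genome (a node list plus weighted connections). The encoding, helper names, and mutation parameters here are illustrative assumptions, not the authors' actual neuroevolution code.

```python
import random

# Toy genome: nodes are ids; connections map (src, dst) -> weight.
genome = {
    "nodes": [0, 1, 2],                      # e.g. 0, 1 inputs; 2 output
    "conns": {(0, 2): 0.5, (1, 2): -0.3},
}

def perturb_weight(g, sigma=0.3):
    """Add Gaussian noise to one existing connection weight."""
    key = random.choice(list(g["conns"]))
    g["conns"][key] += random.gauss(0.0, sigma)

def add_connection(g):
    """Connect two previously unconnected nodes with a small random weight."""
    candidates = [(a, b) for a in g["nodes"] for b in g["nodes"]
                  if a != b and (a, b) not in g["conns"]]
    if candidates:
        g["conns"][random.choice(candidates)] = random.uniform(-1.0, 1.0)

def add_node(g):
    """Split an existing connection with a new node (complexification)."""
    (src, dst), w = random.choice(list(g["conns"].items()))
    new = max(g["nodes"]) + 1
    g["nodes"].append(new)
    del g["conns"][(src, dst)]
    g["conns"][(src, new)] = 1.0             # roughly preserve the old behavior
    g["conns"][(new, dst)] = w

if __name__ == "__main__":
    random.seed(0)
    perturb_weight(genome); add_connection(genome); add_node(genome)
    print(genome)
```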

  9. Targeting Unachieved Goals • Main ideas: • Temporarily deactivate “easy” objectives • Focus on “hard” objectives • “Hard” and “easy” defined in terms of goal values • Easy: average fitness “persists” above goal (achieved) • Hard: goal not yet achieved • Objectives reactivated when no longer achieved • Increase goal values when all achieved [Figure: evolution focuses selection on the hard objectives]
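
A simplified sketch of how deactivation and reactivation might be wired in front of selection, assuming one goal value and one recency-weighted population average per objective. The 10% goal increase and the exact persistence test are assumptions for illustration, not the paper's settings.

```python
def update_active_objectives(avg_fitness, goals, achieved):
    """Decide which objectives the MOEA should see this generation.

    avg_fitness[i]: recency-weighted population average for objective i
    goals[i]:       current goal value for objective i (mutated in place)
    achieved[i]:    whether objective i is currently considered achieved
    Returns the list of active (still "hard") objective indices.
    """
    for i, (avg, goal) in enumerate(zip(avg_fitness, goals)):
        if avg >= goal:
            achieved[i] = True             # "easy": temporarily deactivate
        elif achieved[i]:
            achieved[i] = False            # fell back below the goal: reactivate
    active = [i for i, done in enumerate(achieved) if not done]
    if not active:                         # all goals achieved: raise every goal
        goals[:] = [g * 1.1 for g in goals]    # 10% step is an assumption
        achieved[:] = [False] * len(achieved)
        active = list(range(len(goals)))
    return active

if __name__ == "__main__":
    goals, achieved = [50.0, 30.0, 10.0], [False, False, False]
    print(update_active_objectives([60.0, 20.0, 5.0], goals, achieved))  # [1, 2]
```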

  10. TUG Example [Figure: fitness plot with noisy evaluations; the recency-weighted average is reset when a goal is achieved; once the other goals are also achieved, the goals increase]

  11. Behavioral Diversity • Originally developed for single-objective tasks [3] • Add behavioral diversity objective • Encourage exploration of new behaviors • Domain-specific behavior measure required • Extensions in this work: • Multiobjective task • Domain-independent method • Only requires a policy mapping ℝ^N → ℝ^M (N senses to M actions), e.g. NNs [3] J.-B. Mouret and S. Doncieux. Using behavioral exploration objectives to solve deceptive problems in neuro-evolution. 2009.

  12. Behavioral Diversity Details • Behavior vector: given a shared set of input vectors, concatenate the policy's outputs • Behavioral diversity objective: average distance from the other behavior vectors [Figure: example behavior vectors; the vector with high average distance from the other points scores high]
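
A minimal sketch of the behavior vector and the diversity score, assuming a policy is any callable from an input vector to an output vector and using Euclidean distance; the input sampling and distance measure used in the paper may differ.

```python
import numpy as np

def behavior_vector(policy, input_vectors):
    """Concatenate the policy's outputs on a shared set of input vectors."""
    return np.concatenate([np.asarray(policy(x), dtype=float) for x in input_vectors])

def diversity_scores(policies, input_vectors):
    """Behavioral diversity objective: average distance to the other behavior vectors."""
    vecs = [behavior_vector(p, input_vectors) for p in policies]
    return [float(np.mean([np.linalg.norm(v - w) for j, w in enumerate(vecs) if j != i]))
            for i, v in enumerate(vecs)]

if __name__ == "__main__":
    # Toy "policies": scale the inputs by different gains.
    policies = [lambda x, w=w: [w * xi for xi in x] for w in (0.5, 1.0, 2.0)]
    inputs = [[1.0, 0.0], [0.0, 1.0]]
    print(diversity_scores(policies, inputs))  # the w=2.0 policy is the most distinct
```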

  13. Battle Domain • Evolved monsters (blue) • Monsters can hurt fighter • Scripted fighter (green) • Bat can hurt monsters • Three objectives • Deal damage • Avoid damage • Stay alive • Previous work required incremental evolution to solve

  14. Experimental Comparison • NN copied to 4 monsters • Homogeneous teams • In paper • Control: Plain NSGA-II • TUG: NSGA-II with TUG using expert initial goals • BD: NSGA-II with BD using random input vectors • Additional methods since publication • TUG-Low: NSGA-II with TUG using minimal initial goals • BD-Obs: NSGA-II with BD using inputs from evaluations • Each repeated 30 times

  15. Attainment Surfaces [4] • Result attainment surface • Shows space dominated by a single Pareto front (approximation set) • Summary attainment surface s • Union of space dominated in at least s out of n runs • Surface s weakly dominates surface s+1, etc. [Figure: intersecting individual Pareto fronts (approximation sets), their result attainment surfaces, and summary attainment surfaces 1–3] [4] J. Knowles. A summary-attainment surface plotting method for visualizing the performance of stochastic multiobjective optimizers. 2005.
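
The definition can be checked pointwise: a point in objective space lies within summary attainment surface s if the fronts of at least s runs weakly dominate it. The toy sketch below (maximization, made-up fronts) illustrates that counting step, not Knowles' plotting algorithm.

```python
def weakly_dominates(x, y):
    """x weakly dominates y under maximization: at least as good in every objective."""
    return all(xi >= yi for xi, yi in zip(x, y))

def attainment_count(point, fronts):
    """Number of runs whose Pareto front attains (weakly dominates) the point."""
    return sum(any(weakly_dominates(p, point) for p in front) for front in fronts)

if __name__ == "__main__":
    fronts = [[(3, 1), (1, 3)], [(2, 2)], [(3, 3)]]   # three toy runs
    print(attainment_count((1, 1), fronts))            # 3: inside summary surface 3
    print(attainment_count((2.5, 2.5), fronts))        # 1: only inside summary surface 1
```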

  16. Final Summary Attainment Surfaces [Animation: worst to best summary attainment surface for Control, TUG, BD, TUG-Low, and BD-Obs]

  17. Hypervolume Metric [5] • Hypervolume of result attainment surface • Simply “volume” for 3 domain objectives • With respect to a reference point • Set slightly below the minimum observed scores • Pareto-compliant metric [Figure: 2D example partitioning the dominated region into boxes, hypervolume = A + B + C + D] [5] E. Zitzler and L. Thiele. Multiobjective optimization using evolutionary algorithms – a comparative case study. 1998.
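
For intuition, the hypervolume of a front with respect to a reference point can be estimated by sampling points between the reference point and an upper bound and counting how many the front dominates. The Monte Carlo sketch below is only an illustrative approximation, not the exact computation reported here.

```python
import random

def weakly_dominates(x, y):
    """x weakly dominates y under maximization."""
    return all(xi >= yi for xi, yi in zip(x, y))

def hypervolume_mc(front, ref, upper, samples=100_000):
    """Monte Carlo estimate of hypervolume (maximization) w.r.t. reference point 'ref'.
    'upper' must bound the front from above in every objective."""
    box = 1.0
    for lo, hi in zip(ref, upper):
        box *= hi - lo                                   # volume of the sampling box
    hits = 0
    for _ in range(samples):
        point = [random.uniform(lo, hi) for lo, hi in zip(ref, upper)]
        if any(weakly_dominates(f, point) for f in front):
            hits += 1
    return box * hits / samples

if __name__ == "__main__":
    # Toy 2D check: the exact hypervolume of {(1, 2), (2, 1)} w.r.t. (0, 0) is 3.0
    print(hypervolume_mc([(1, 2), (2, 1)], ref=(0, 0), upper=(2, 2)))
```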

  18. Hypervolume

  19. Successful Behaviors [Examples shown for BD, TUG, BD-Obs, and TUG-Low]

  20. Discussion • Control: more extreme trade-offs • BD: more precise timing • BD-Obs and BD similar • “Real” inputs give no advantage • TUG: more teamwork • Given particular initial goal values • TUG-Low more like BD than TUG • ALL are better than Control

  21. Future Work • How to combine TUG and BD • Naïve combination doesn’t work • Scaling up • Many objectives • More complex domains • Current work in Unreal Tournament promising

  22. Conclusion • BD and TUG improve MO evolution • Domain independence! • Contrast to task-based shaping • Expand MOEAs to a new range of domains

  23. Questions? Email: schrum2@cs.utexas.edu See movies at: http://nn.cs.utexas.edu/?fitness-shaping

  24. TUG Details • Persistence: • Goal counts as achieved when the recency-weighted average of fitness surpasses it • Goals: • Initial values based on domain knowledge • Or simply the minimal values for objectives • Increase each goal when all are achieved • Objectives reactivated when no longer achieved
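
The persistence test can be read as a recency-weighted (exponential moving) average of per-generation fitness that must surpass the goal. The sketch below is an assumed formulation; the step size, the reset rule, and the toy fitness values are not taken from the paper.

```python
class RecencyWeightedAverage:
    """Exponential moving average used to test whether a goal is 'persistently' achieved."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha        # step size: larger values weight recent generations more
        self.value = None

    def update(self, fitness):
        """Move a fraction alpha of the way toward the newest per-generation fitness."""
        if self.value is None:
            self.value = float(fitness)
        else:
            self.value += self.alpha * (fitness - self.value)
        return self.value

    def reset(self):
        """Reset after the goal is achieved, so it must be re-earned before deactivating again."""
        self.value = None

if __name__ == "__main__":
    avg, goal = RecencyWeightedAverage(), 50.0
    for fitness in [40, 55, 60, 58, 62]:            # noisy per-generation averages (toy data)
        print(round(avg.update(fitness), 2), avg.value >= goal)
```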

  25. TUG Cycles
