Flexible Agent Based Simulation for Pedestrian Modelling on GPU Hardware

Paul Richmond
The Department of Computer Science, University of Sheffield, UK
paul@dcs.shef.ac.uk
www.dcs.shef.ac.uk/~paul

• Richmond Paul, Coakley Simon, Romano Daniela, "Cellular Level Agent Based Modelling on the Graphics Processing Unit (with FLAME GPU)", to appear in the special issue "Parallel and Ubiquitous methods and tools in Systems Biology" of the international journal Briefings in Bioinformatics, 2010
• Richmond Paul, Coakley Simon, Romano Daniela (2009), "Cellular Level Agent Based Modelling on the Graphics Processing Unit", Proc. of HiBi09 - High Performance Computational Systems Biology, 14-16 October 2009, Trento, Italy
• Richmond Paul, Coakley Simon, Romano Daniela (2009), "A High Performance Agent Based Modelling Framework on Graphics Card Hardware with CUDA", Proc. of 8th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2009), May 10–15, 2009, Budapest, Hungary
• Richmond Paul, Romano Daniela (2008), "A High Performance Framework For Agent Based Pedestrian Dynamics On GPU Hardware", Proceedings of EUROSIS ESM 2008 (European Simulation and Modelling), October 27-29, 2008, Universite du Havre, Le Havre, France

Introduction and Scope
• Agent Based Modelling (ABM)
  ◦ Complex natural behaviour emerges from simple rules
  ◦ Individuals are agents with memory
  ◦ Each agent updates its own memory by considering its neighbours
• Of pedestrian behaviour
  ◦ Mobile agents in continuous space
  ◦ Discrete time steps
• On the GPU
  ◦ Why? Performance and real-time visualisation
  ◦ The aim is flexibility: harness the GPU's power without modellers having to understand GPU programming
  ◦ Neither continuum based (Treuille 06) nor based on mobile discrete agents (D’Souza 07)
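To make the update rule concrete, here is a minimal C sketch of one discrete time step. The alignment rule, the `agent` struct and `abm_step` are hypothetical illustrations, not part of FLAME GPU; the point is that each agent reads only the previous state and writes only its own memory, which is what allows one GPU thread per agent.

```c
#include <stddef.h>

/* Hypothetical agent memory: 1D position and velocity. */
typedef struct {
    float x;
    float vel;
} agent;

/* One discrete time step of a toy alignment rule (an illustrative assumption,
   not the pedestrian model): each agent adopts the mean velocity of all agents
   within `radius`, itself included, then moves. Agents read only the previous
   state `cur` and write only their own slot in `next`, so the result does not
   depend on the order in which agents are updated. */
void abm_step(const agent *cur, agent *next, size_t n, float radius, float dt)
{
    for (size_t i = 0; i < n; ++i) {
        float sum = 0.0f;
        size_t count = 0;
        for (size_t j = 0; j < n; ++j) {
            float d = cur[j].x - cur[i].x;
            if (d * d <= radius * radius) { /* neighbour test */
                sum += cur[j].vel;
                ++count;
            }
        }
        next[i].vel = sum / (float)count; /* count >= 1: the agent sees itself */
        next[i].x = cur[i].x + next[i].vel * dt;
    }
}
```

Swapping `cur` and `next` after each call gives the double-buffered loop over time steps.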

FLAME and FLAME GPU
• What is FLAME (and what FLAME is not)?
  ◦ Flexible Large-scale Agent Modelling Environment
  ◦ XML model specification based on the X-Machine (state-based agents)
  ◦ Template system for generating simulation code
• Why extend FLAME to the GPU?
  ◦ Complete modelling environment (beyond that of simple swarms)
  ◦ Formal and portable specification technique based on the X-Machine
  ◦ Many existing models can be used for benchmarking
• What is FLAME GPU?
  ◦ Data parallel implementation of FLAME using CUDA (with real-time visualisation)
  ◦ Cost-effective solution for high performance ABM
  ◦ XSLT-driven templates (rather than the XParser)

[Figure: FLAME GPU toolchain: XML Schemas, XML Model File, Scripted Behaviour, XSLT Simulation Templates, XSLT Processor, Simulation Code, Simulation Program]
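As a rough illustration of X-Machine (state-based) agents, the following plain C sketch gives an agent memory, a current state, and transition functions that fire from one state to another, optionally guarded by a condition. All names (`pedestrian`, `transition`, `xmachine_step`, the `WALKING`/`ARRIVED` states and the exit at y = 10) are hypothetical; FLAME generates its own code from the XML model rather than using a table like this.

```c
#include <stddef.h>

/* Hypothetical states for a pedestrian X-machine. */
typedef enum { WALKING, ARRIVED } agent_state;

typedef struct {
    float x, y;         /* memory: position */
    float vel_x, vel_y; /* memory: velocity */
    agent_state state;  /* current X-machine state */
} pedestrian;

/* A transition applies fn to agents in state `from` whose condition holds,
   then moves them to state `to`. A NULL condition always fires. */
typedef struct {
    agent_state from, to;
    int (*condition)(const pedestrian *);
    void (*fn)(pedestrian *);
} transition;

static void move(pedestrian *p) { p->x += p->vel_x; p->y += p->vel_y; }
static int reached_exit(const pedestrian *p) { return p->y >= 10.0f; }
static void stop(pedestrian *p) { p->vel_x = 0.0f; p->vel_y = 0.0f; }

/* Hypothetical model: walk every step, then stop once past the exit. */
static const transition pedestrian_transitions[] = {
    { WALKING, WALKING, NULL, move },
    { WALKING, ARRIVED, reached_exit, stop },
};

/* Apply each transition in order (one layer each) to every agent. */
void xmachine_step(pedestrian *agents, size_t n, const transition *t, size_t nt)
{
    for (size_t k = 0; k < nt; ++k) {
        for (size_t i = 0; i < n; ++i) {
            pedestrian *p = &agents[i];
            if (p->state != t[k].from)
                continue;
            if (t[k].condition && !t[k].condition(p))
                continue;
            t[k].fn(p);
            p->state = t[k].to;
        }
    }
}
```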

Programming the GPU
• Purpose of the GPU
  ◦ Data parallel device for operating on streams of data
• Programming for general-purpose use
  ◦ Graphics API technique: not ideal
  ◦ High-level alternatives
    ◦ BrookGPU (Buck 04): SIMD stream programming extension for C
    ◦ Sh (McCool 02): C++ language with a compiler for GPU back ends
  ◦ Hardware specific
    ◦ Stream SDK: low-level ATI-specific native instruction set, with high-level support through Brook+
    ◦ CUDA: NVIDIA GPU programming using a compiler and C syntax with extensions
  ◦ OpenCL: a new but growing standard, with limited support
• CUDA
  ◦ The GPU is a coprocessor to the CPU (with its own global memory)
  ◦ Many lightweight parallel threads grouped into regular-sized blocks (execution units)
  ◦ Threads in the same execution unit perform the same instructions (SIMD)
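The block-and-thread mapping can be emulated serially in C. This sketch (hypothetical helper names) computes the number of blocks needed for n threads and the global index that a CUDA kernel would normally obtain from `blockIdx.x * blockDim.x + threadIdx.x`, including the guard that idles the padding threads in the last block.

```c
#include <stddef.h>

/* Number of fixed-size blocks needed to cover n threads: ceil(n / block_size). */
size_t grid_size(size_t n, size_t block_size)
{
    return (n + block_size - 1) / block_size;
}

/* Serial emulation of a 1D launch: each (block, thread) pair computes the
   global index a CUDA thread would, and applies the same instruction. */
void emulate_launch(float *data, size_t n, size_t block_size, float scale)
{
    size_t blocks = grid_size(n, block_size);
    for (size_t b = 0; b < blocks; ++b) {
        for (size_t t = 0; t < block_size; ++t) {
            size_t idx = b * block_size + t; /* blockIdx.x * blockDim.x + threadIdx.x */
            if (idx >= n)
                continue; /* padding threads in the last block do nothing */
            data[idx] *= scale;
        }
    }
}
```

For example, 1000 agents with 256 threads per block need 4 blocks, and the last 24 threads of the final block fall outside the list.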

Mapping Agent Functions to the GPU
• Each transition function is wrapped by a GPU kernel
• Each agent is a thread performing the function
• Functions can input and output messages
• Functions can output new agents (agent birth)
• An agent is removed (agent death) when its function returns a non-zero value

```c
__FLAME_GPU_FUNC__ int input_function(xmachine_memory_pedestrian* xmemory,
                                      xmachine_message_pedestrian_location_list* location_messages)
{
    /* Get the first message */
    xmachine_message_pedestrian_location* location_message =
        get_first_pedestrian_location_message(location_messages);

    /* Repeat until there are no more messages */
    while (location_message) {
        /* Process the message */
        if (distance_check(xmemory, location_message)) {
            updateSteerVelocity(xmemory, location_message);
        }
        /* Get the next message */
        location_message = get_next_pedestrian_location_message(location_message, location_messages);
    }

    /* Update any other xmemory variables */
    xmemory->x += xmemory->vel_x * TIME_STEP;
    /* ... */

    return 0; /* 0 keeps the agent alive; a non-zero value would remove it */
}
```
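The kernel wrapping can be sketched serially in C, assuming that a generated kernel simply calls the agent function once per thread and records its return value as a death flag. `agent_memory`, `run_agent_function`, `move_and_exit`, `TIME_STEP` and `EXIT_X` are hypothetical stand-ins, not the code FLAME GPU actually generates.

```c
#include <stddef.h>

#define TIME_STEP 0.5f /* hypothetical time step */
#define EXIT_X 10.0f   /* hypothetical exit position */

typedef struct { float x, y, vel_x, vel_y; } agent_memory;

/* An agent function returns non-zero to request its own removal. */
typedef int (*agent_function)(agent_memory *);

/* Serial stand-in for the kernel wrapped around an agent function:
   loop iteration i plays the role of GPU thread i. The return value becomes
   a per-agent death flag for a later list compaction pass. */
void run_agent_function(agent_memory *agents, int *dead, size_t n, agent_function fn)
{
    for (size_t i = 0; i < n; ++i)
        dead[i] = (fn(&agents[i]) != 0);
}

/* Example agent function: move, then request removal once past the exit. */
int move_and_exit(agent_memory *m)
{
    m->x += m->vel_x * TIME_STEP;
    m->y += m->vel_y * TIME_STEP;
    return m->x > EXIT_X;
}
```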

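Implementation Techniques used within FLAME GPU
• Avoiding divergence across agents in execution blocks
  ◦ Agents are stored and processed in state lists to avoid conditional branching
  ◦ Sparse lists are compacted during births, filters and optional message outputs
• Ensuring data access is performed efficiently
  ◦ Lists are stored as a Structure of Arrays (SoA) rather than an Array of Structures (AoS), so that neighbouring threads read neighbouring memory addresses

```c
/* N: maximum list size, defined elsewhere */

/* Array of Structures (AoS): x and y interleaved per agent */
typedef struct agent { float x; float y; } xm_memory_agent_list[N];

/* Structure of Arrays (SoA): all x values, then all y values
   (an alternative to the AoS definition above, not meant to coexist with it) */
typedef struct agent_list { float x[N]; float y[N]; } xm_memory_agent_list;
```

[Figure: memory layout of agents 0…N under AoS and SoA]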
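Compacting a sparse list is commonly done with an exclusive prefix sum (scan) over keep flags: each surviving agent's scan value is its new index. A serial C sketch over an SoA list follows; on the GPU the scan and the scatter would each be parallel passes. `agent_list_soa`, `MAX_AGENTS` and `compact` are hypothetical names, and scan-based compaction is a standard technique assumed here rather than a confirmed detail of FLAME GPU.

```c
#include <stddef.h>

#define MAX_AGENTS 8 /* hypothetical list capacity */

typedef struct {
    float x[MAX_AGENTS];
    float y[MAX_AGENTS];
    size_t count;
} agent_list_soa;

/* Exclusive prefix sum of keep flags: out[i] = number of kept agents before i.
   Returns the total number kept. */
static size_t exclusive_scan(const int *keep, size_t *out, size_t n)
{
    size_t sum = 0;
    for (size_t i = 0; i < n; ++i) {
        out[i] = sum;
        sum += keep[i] ? 1 : 0;
    }
    return sum;
}

/* Copy the agents whose keep flag is non-zero from src into dst (which must be
   a different list). Each kept agent writes to scan[i]; the iterations are
   independent, so this scatter maps to one thread per agent. */
void compact(const agent_list_soa *src, const int *keep, agent_list_soa *dst)
{
    size_t scan[MAX_AGENTS];
    size_t total = exclusive_scan(keep, scan, src->count);
    for (size_t i = 0; i < src->count; ++i) {
        if (!keep[i])
            continue;
        dst->x[scan[i]] = src->x[i];
        dst->y[scan[i]] = src->y[i];
    }
    dst->count = total;
}
```

Because the list is SoA, each of the two copy statements touches one contiguous array, which is the access pattern the slide is aiming for.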

Message Communication
• Brute force communication
  ◦ Tile blocks of the message list into shared memory to reduce global memory access (Nyland 07)
  ◦ Using shared memory gives roughly an order of magnitude performance improvement
• Spatially partitioned communication
  ◦ Split the environment into a uniform grid whose cell size is the message radius
  ◦ Each agent reads all messages from its own and each neighbouring partition
  ◦ Requires a parallel sort and a partition boundary matrix
  ◦ Roughly 2/3 of the messages read lie outside the message radius (in 2D the 3×3 block of partitions covers 9r² while the radius covers πr², and 1 − π/9 ≈ 0.65), but this is still much better than O(n²)
• Discrete agent message communication (CA)
  ◦ Large blocks of messages are loaded into shared memory
  ◦ Or the texture cache is used to minimise global memory reads
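A serial C sketch of spatially partitioned messaging, assuming a square environment exactly `GRID_DIM` partitions wide whose partition side equals the message radius. A counting sort stands in for the parallel sort, and the `start`/`end` arrays play the role of the partition boundary matrix. All names and sizes are hypothetical.

```c
#include <stddef.h>

#define GRID_DIM 4                      /* hypothetical: 4x4 partitions */
#define NUM_CELLS (GRID_DIM * GRID_DIM)
#define MAX_MESSAGES 64                 /* hypothetical message capacity */

typedef struct { float x, y; } message;

typedef struct {
    message sorted[MAX_MESSAGES]; /* messages grouped by partition */
    size_t start[NUM_CELLS];      /* first index of each partition in sorted[] */
    size_t end[NUM_CELLS];        /* one past the last index */
} partitioned_messages;

/* Grid coordinate of v for partitions of side `radius`, clamped to the grid. */
static int grid_coord(float v, float radius)
{
    int c = (int)(v / radius);
    if (c < 0)
        c = 0;
    if (c >= GRID_DIM)
        c = GRID_DIM - 1;
    return c;
}

static int cell_of(float x, float y, float radius)
{
    return grid_coord(y, radius) * GRID_DIM + grid_coord(x, radius);
}

/* Counting sort by partition, which also yields each partition's boundaries. */
void build_partitions(const message *msgs, size_t n, float radius, partitioned_messages *pm)
{
    size_t count[NUM_CELLS] = {0};
    for (size_t i = 0; i < n; ++i)
        count[cell_of(msgs[i].x, msgs[i].y, radius)]++;

    size_t offset = 0;
    for (int c = 0; c < NUM_CELLS; ++c) {
        pm->start[c] = offset;
        pm->end[c] = offset; /* advanced during the scatter below */
        offset += count[c];
    }
    for (size_t i = 0; i < n; ++i) {
        int c = cell_of(msgs[i].x, msgs[i].y, radius);
        pm->sorted[pm->end[c]++] = msgs[i];
    }
}

/* Count messages within `radius` of (x, y), visiting only the 3x3 block of
   partitions around the agent instead of the whole message list. */
size_t count_neighbours(const partitioned_messages *pm, float x, float y, float radius)
{
    int cx = grid_coord(x, radius), cy = grid_coord(y, radius);
    size_t found = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int nx = cx + dx, ny = cy + dy;
            if (nx < 0 || ny < 0 || nx >= GRID_DIM || ny >= GRID_DIM)
                continue;
            int c = ny * GRID_DIM + nx;
            for (size_t i = pm->start[c]; i < pm->end[c]; ++i) {
                float ddx = pm->sorted[i].x - x, ddy = pm->sorted[i].y - y;
                if (ddx * ddx + ddy * ddy <= radius * radius) /* radius test */
                    ++found;
            }
        }
    }
    return found;
}
```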

A Pedestrian Model Example
• Inter-agent interaction (using spatially partitioned messaging) is based on a hybrid of Reynolds and Social Forces
• Social repulsion force
  ◦ Navigates pedestrians to areas of low concentration
  ◦ Limited forward vision
  ◦ Preference given to agents in the direct line of sight
  ◦ Scaled depending on the distance to the neighbour
• Close range interaction force
  ◦ Very short range, with no vision limit
  ◦ Acts as collision avoidance
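A minimal C sketch of how the two forces could be evaluated for a single neighbour. The functional forms and all constants are assumptions chosen for illustration, since the slide does not give them; squared distances are used so that no square root is needed.

```c
/* Hypothetical parameters; the original model's values are not given. */
#define SOCIAL_RADIUS 4.0f
#define SOCIAL_STRENGTH 1.0f
#define CLOSE_RADIUS 0.5f
#define CLOSE_STRENGTH 10.0f

typedef struct { float x, y; } vec2;

static float dot(vec2 a, vec2 b) { return a.x * b.x + a.y * b.y; }

/* Social repulsion from one neighbour:
   - limited forward vision: neighbours with a non-positive projection onto
     the agent's velocity are ignored;
   - line-of-sight preference: weighted by cos^2 of the angle between the
     heading and the direction to the neighbour;
   - distance scaling: magnitude strength * w * (1/d - d/R^2), zero at R. */
vec2 social_force(vec2 pos, vec2 vel, vec2 other)
{
    vec2 f = { 0.0f, 0.0f };
    vec2 to_other = { other.x - pos.x, other.y - pos.y };
    float d2 = dot(to_other, to_other);
    float v2 = dot(vel, vel);
    float ahead = dot(to_other, vel);
    if (d2 <= 0.0f || d2 >= SOCIAL_RADIUS * SOCIAL_RADIUS || v2 <= 0.0f || ahead <= 0.0f)
        return f;
    float w = (ahead * ahead) / (d2 * v2); /* cos^2 of the viewing angle */
    float k = SOCIAL_STRENGTH * w * (1.0f / d2 - 1.0f / (SOCIAL_RADIUS * SOCIAL_RADIUS));
    f.x = -to_other.x * k; /* push away from the neighbour */
    f.y = -to_other.y * k;
    return f;
}

/* Close range force: acts in every direction, very short range,
   with magnitude strength * (1/d - d/r^2). */
vec2 close_range_force(vec2 pos, vec2 other)
{
    vec2 f = { 0.0f, 0.0f };
    vec2 away = { pos.x - other.x, pos.y - other.y };
    float d2 = dot(away, away);
    if (d2 <= 0.0f || d2 >= CLOSE_RADIUS * CLOSE_RADIUS)
        return f;
    float k = CLOSE_STRENGTH * (1.0f / d2 - 1.0f / (CLOSE_RADIUS * CLOSE_RADIUS));
    f.x = away.x * k;
    f.y = away.y * k;
    return f;
}
```

Summing both functions over the neighbours returned by the spatially partitioned message loop would give the steering contribution for one agent.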

Visualisation and Animation Technique
• Agent data is already on the GPU, so it can be visualised directly
• Need to draw a copy of the model for each agent in the simulation (instancing)
  ◦ The model geometry can be stored on the GPU to reduce draw calls
  ◦ Only requires a single call per agent
  ◦ Each agent is displaced and orientated
• Use levels of detail (LOD) to avoid rendering highly detailed models for every agent
  ◦ This runs on the GPU, so it must remain parallel
  ◦ Sort the agents by LOD level and render them in groups
• Animation is very simple
  ◦ Interpolate between two key frames
  ◦ Rotate the model according to the direction of the velocity
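A C sketch of the per-agent LOD selection and grouping step, plus the key frame blend. The thresholds, names and the serial counting sort are assumptions; on the GPU the grouping would be done with a parallel sort or scan so that each LOD level can be drawn as one contiguous instanced batch.

```c
#include <stddef.h>

#define NUM_LODS 3
/* Hypothetical squared-distance thresholds between LOD levels (10 and 40 units). */
static const float lod_threshold_sq[NUM_LODS - 1] = { 10.0f * 10.0f, 40.0f * 40.0f };

typedef struct { float x, y; } vec2;

/* LOD 0 is the most detailed (closest) model. Squared distances avoid a sqrt. */
int lod_level(vec2 agent, vec2 camera)
{
    float dx = agent.x - camera.x, dy = agent.y - camera.y;
    float d2 = dx * dx + dy * dy;
    int level = 0;
    while (level < NUM_LODS - 1 && d2 > lod_threshold_sq[level])
        ++level;
    return level;
}

/* Counting sort of agent indices by LOD level: group l occupies
   order[first[l]] .. order[first[l] + count[l] - 1]. */
void group_by_lod(const int *level, size_t n, size_t *order,
                  size_t first[NUM_LODS], size_t count[NUM_LODS])
{
    size_t next[NUM_LODS];
    for (int l = 0; l < NUM_LODS; ++l)
        count[l] = 0;
    for (size_t i = 0; i < n; ++i)
        count[level[i]]++;
    size_t offset = 0;
    for (int l = 0; l < NUM_LODS; ++l) {
        first[l] = offset;
        next[l] = offset;
        offset += count[l];
    }
    for (size_t i = 0; i < n; ++i)
        order[next[level[i]]++] = i;
}

/* Linear blend between two key frame values, with t in [0, 1]. */
float keyframe_lerp(float a, float b, float t)
{
    return a + (b - a) * t;
}
```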

Demo: agents coloured by LOD level

Performance Results
• Observables
  ◦ Performance depends on the communication radius
    ◦ A larger communication radius means fewer, larger partitions, so more agents are considered per update
  ◦ The LOD technique has a cost
    ◦ Don't use it for small populations
  ◦ Very large population sizes are possible in real time
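A back-of-envelope estimate of why performance depends on the communication radius, assuming a 2D environment of area A with uniform density and partitions whose side equals the radius r (these assumptions are not stated on the slide):

```latex
% Assumptions: 2D environment of area A, N agents, uniform density \rho = N / A,
% partition side equal to the message radius r.
\[
\begin{aligned}
P(r) &= A / r^2 && \text{number of partitions}\\
m(r) &\approx \rho\,(3r)^2 = 9\rho r^2 && \text{messages read per agent}\\
W(r) &\approx N\,m(r) = 9N\rho r^2 && \text{messages read per time step}\\
W(2r)/W(r) &= 4 && \text{doubling } r \text{ quadruples the work}
\end{aligned}
\]
```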

Environment Collision Avoidance
• A discrete grid of agents encodes the environment
  ◦ Static discrete agents
  ◦ Repulsive forces direct agents away from walls
  ◦ Automatically generated in advance
• Continuous pedestrian agents read the discrete messages
  ◦ Apply a collision force
  ◦ Displace pedestrian agents by the height value
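A C sketch of the environment grid, assuming each static discrete agent stores a precomputed repulsive force and a height. Reading the grid array directly stands in for reading the discrete agents' messages; the four-neighbour wall rule, the names and the grid size are hypothetical.

```c
#include <stddef.h>

#define ENV_DIM 4 /* hypothetical 4x4 environment grid */

typedef struct { float fx, fy, height; } env_cell; /* one static discrete agent */
typedef struct { float x, y, z, vel_x, vel_y; } pedestrian;

/* Generated in advance: each free cell is pushed away from any wall cell among
   its four direct neighbours (wall[i] == 1 marks an obstacle). Heights are
   left untouched. */
void generate_env_forces(const int wall[ENV_DIM * ENV_DIM], env_cell grid[ENV_DIM * ENV_DIM])
{
    static const int off[4][2] = { {1, 0}, {-1, 0}, {0, 1}, {0, -1} };
    for (int y = 0; y < ENV_DIM; ++y) {
        for (int x = 0; x < ENV_DIM; ++x) {
            env_cell *c = &grid[y * ENV_DIM + x];
            c->fx = 0.0f;
            c->fy = 0.0f;
            if (wall[y * ENV_DIM + x])
                continue;
            for (int k = 0; k < 4; ++k) {
                int nx = x + off[k][0], ny = y + off[k][1];
                if (nx < 0 || ny < 0 || nx >= ENV_DIM || ny >= ENV_DIM)
                    continue;
                if (wall[ny * ENV_DIM + nx]) { /* point away from the wall */
                    c->fx -= (float)off[k][0];
                    c->fy -= (float)off[k][1];
                }
            }
        }
    }
}

/* Map a continuous position to its environment cell, clamped to the grid. */
static size_t env_index(float x, float y, float cell_size)
{
    int cx = (int)(x / cell_size), cy = (int)(y / cell_size);
    if (cx < 0) cx = 0;
    if (cx >= ENV_DIM) cx = ENV_DIM - 1;
    if (cy < 0) cy = 0;
    if (cy >= ENV_DIM) cy = ENV_DIM - 1;
    return (size_t)(cy * ENV_DIM + cx);
}

/* Per pedestrian: apply the cell's collision force and take its height. */
void apply_environment(pedestrian *p, const env_cell grid[ENV_DIM * ENV_DIM],
                       float cell_size, float dt)
{
    const env_cell *c = &grid[env_index(p->x, p->y, cell_size)];
    p->vel_x += c->fx * dt;
    p->vel_y += c->fy * dt;
    p->z = c->height;
}
```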

Long Range Navigation
• Many agents follow similar paths, so a global solution is used
• A fluid flow route for each path through the environment
  ◦ Calculated offline in advance by backtracking from the exit point
  ◦ Gives smooth movement around obstacles
• Discrete agents are also responsible for allocating pedestrian births
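One simple way to backtrack from an exit is a breadth-first search over a walkable grid, followed by pointing each cell at its neighbour nearest the exit. This C sketch is an assumption for illustration: it yields grid-aligned steps rather than the smooth fluid flow field described above, and all names and sizes are hypothetical.

```c
#include <stddef.h>

#define NAV_W 5 /* hypothetical grid width */
#define NAV_H 3 /* hypothetical grid height */
#define NAV_CELLS (NAV_W * NAV_H)
#define UNREACHED (-1)

static const int nav_off[4][2] = { {1, 0}, {-1, 0}, {0, 1}, {0, -1} };

/* Breadth-first search outwards from the exit over walkable cells:
   dist[i] is the number of grid steps from cell i to the exit. */
void backtrack_from_exit(const int wall[NAV_CELLS], int exit_cell, int dist[NAV_CELLS])
{
    int queue[NAV_CELLS]; /* each cell is enqueued at most once */
    size_t head = 0, tail = 0;
    for (int i = 0; i < NAV_CELLS; ++i)
        dist[i] = UNREACHED;
    dist[exit_cell] = 0;
    queue[tail++] = exit_cell;
    while (head < tail) {
        int c = queue[head++];
        int cx = c % NAV_W, cy = c / NAV_W;
        for (int k = 0; k < 4; ++k) {
            int nx = cx + nav_off[k][0], ny = cy + nav_off[k][1];
            if (nx < 0 || ny < 0 || nx >= NAV_W || ny >= NAV_H)
                continue;
            int n = ny * NAV_W + nx;
            if (wall[n] || dist[n] != UNREACHED)
                continue;
            dist[n] = dist[c] + 1;
            queue[tail++] = n;
        }
    }
}

/* Flow direction per cell: one step towards the neighbour nearest the exit.
   Exit, wall and unreachable cells get (0, 0). */
void flow_directions(const int dist[NAV_CELLS], int dir_x[NAV_CELLS], int dir_y[NAV_CELLS])
{
    for (int c = 0; c < NAV_CELLS; ++c) {
        dir_x[c] = 0;
        dir_y[c] = 0;
        if (dist[c] <= 0)
            continue;
        int cx = c % NAV_W, cy = c / NAV_W, best = dist[c];
        for (int k = 0; k < 4; ++k) {
            int nx = cx + nav_off[k][0], ny = cy + nav_off[k][1];
            if (nx < 0 || ny < 0 || nx >= NAV_W || ny >= NAV_H)
                continue;
            int n = ny * NAV_W + nx;
            if (dist[n] != UNREACHED && dist[n] < best) {
                best = dist[n];
                dir_x[c] = nav_off[k][0];
                dir_y[c] = nav_off[k][1];
            }
        }
    }
}
```

Because the field is computed once offline, every pedestrian only has to look up the direction stored for its current cell at run time.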

Conclusions and Future Work
• Summary
  ◦ Flexible agent architecture for the GPU, suitable for force models
  ◦ Easily extendible
  ◦ Massive performance/cost benefits
• Scope for future work
  ◦ Multi-GPU
    ◦ Would enable extremely large populations and systems to be simulated
    ◦ With spatial partitioning, only partition boundaries would need to be communicated between GPU devices
  ◦ Improved pedestrian models
    ◦ More accurate collision detection
    ◦ Long range individual path planning without flow grids
    ◦ Physically accurate animation and movement
  ◦ Much larger models (these need appropriate scenarios)

References
• A. Treuille, S. Cooper, and Z. Popović. Continuum crowds. In SIGGRAPH '06: ACM SIGGRAPH 2006 Papers, pages 1160–1168. ACM, New York, NY, USA, 2006.
• R. M. D’Souza, M. Lysenko, and K. Rahmani. Sugarscape on steroids: simulating over a million agents at interactive rates. In Proceedings of Agent2007, 2007.
• S. Eilenberg. Automata, Languages, and Machines. Academic Press, Inc., Orlando, FL, USA, 1974.
• T. Balanescu, A. J. Cowling, H. Georgescu, M. Gheorghe, M. Holcombe, and C. Vertan. Communicating stream X-machines systems are no more than X-machines. Journal of Universal Computer Science, 5(9):494–507, 1999. http://www.jucs.org/jucs_5_9/communicating_stream_x_machines
• I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan. Brook for GPUs: stream computing on graphics hardware. ACM Transactions on Graphics, 23(3):777–786, 2004.
• M. D. McCool, Z. Qin, and T. S. Popa. Shader metaprogramming. In HWWS '02: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, pages 57–68. Eurographics Association, Aire-la-Ville, Switzerland, 2002.
• L. Nyland, M. Harris, and J. Prins. Fast N-body simulation with CUDA. In H. Nguyen, editor, GPU Gems 3, chapter 31. Addison-Wesley Professional, August 2007.
