SimuTools, Malaga, Spain, March 16, 2010. Efficient Simulation of Agent-based Models on Multi-GPU & Multi-Core Clusters. Kalyan S. Perumalla, Ph.D., Senior R&D Manager, Oak Ridge National Laboratory; Adjunct Professor, Georgia Institute of Technology. In a Nutshell.
SimuTools, Malaga, Spain
March 16, 2010
Kalyan S. Perumalla, Ph.D.
Senior R&D Manager, Oak Ridge National Laboratory
Adjunct Professor, Georgia Institute of Technology
Dramatic improvements in speed
Game of Life
Afghan Leadership
ABMS: Motivating Demonstrations
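For readers unfamiliar with the Game of Life demonstration named above, a minimal sketch of one update step follows. It is a plain-Python illustration, not the code used in the talk; the function name `life_step` and the wrap-around (toroidal) boundary are assumptions made for this example.

```python
def life_step(grid):
    """Return the next generation of a toroidal (wrap-around) Life grid.

    `grid` is a list of equal-length rows holding 0 (dead) or 1 (alive).
    """
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count the eight neighbours, wrapping at the edges.
            n = sum(
                grid[(r + dr) % rows][(c + dc) % cols]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            # Birth on exactly 3 neighbours; survival on 2 or 3.
            nxt[r][c] = 1 if n == 3 or (grid[r][c] == 1 and n == 2) else 0
    return nxt


# A "blinker": three live cells in a column, which oscillates with period 2.
blinker = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
after_one = life_step(blinker)
after_two = life_step(after_one)
```

Each cell reads only its immediate neighbours, which is why this kind of model maps naturally onto one thread per cell and why neighbouring blocks need to exchange only their boundary cells.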
Threads get “scheduled” in batches on GPU hardware
CUDA claims extremely efficient thread-launch implementation
Millions of CUDA threads at once
Computation Kernels on each GPU
E.g., CUDA Threads
Parallel Execution: Conventional Method
Our Solution: B2R Method
At any level in the hierarchy, total runtime F is given by:
Most interesting aspect
Cubic in R!
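The runtime expression F itself is not reproduced in this text, so it is not reconstructed here. As a hedged illustration of how a cubic dependence on R can arise, the sketch below counts the cell updates one block performs per communication round, assuming a 2-D B x B block with an R-wide halo whose valid region shrinks by one cell per side on each local step. The function names and this cost model are assumptions for illustration, not the slide's formula.

```python
def cell_updates_per_round(B, R):
    """Cell updates for one block over R local steps (direct sum).

    The last step updates B x B cells, the one before (B + 2) x (B + 2),
    and so on up to (B + 2(R - 1)) squared for the first step.
    """
    return sum((B + 2 * j) ** 2 for j in range(R))


def cell_updates_closed_form(B, R):
    """Same count in closed form; the last term grows as (4/3) * R**3."""
    return R * B * B + 2 * B * R * (R - 1) + 2 * R * (R - 1) * (2 * R - 1) // 3


def redundancy_ratio(B, R):
    """Total work divided by the useful work R * B * B."""
    return cell_updates_per_round(B, R) / (R * B * B)
```

Under these assumptions the redundant computation grows cubically in R while the number of exchanges falls as 1/R, which suggests an intermediate R that minimizes total time for a given block size and latency.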
Total Execution Time
B2R can be applied at all levels!
E.g., CUDA Hierarchy
Over 100× speedup with MPI+CUDA
Speedup relative to naïve method with no latency-hiding
Additional material at our webpage:
Discrete Computing Systems