
Presentation Transcript


  1. Large Scale Simulations / HRL Shared Software Framework / GPU Computing Cluster
Narayan Srinivasa, Aleksey Nogin
Work performed by HRL under DARPA contract HRL0011-09-C-001

  2. Shared Software Infrastructure
• Infrastructure overview – three aspects:
  • Legal: a limited, LGPL-like agreement
    • The General Public License (GPL) does not permit incorporating HRL code into proprietary programs. Since the HRL code is a subroutine library, it is more useful to permit linking proprietary applications (our partners' code) with the library; this is what the LGPL allows.
  • A Subversion server for sharing code
  • The API and the software itself
• Summary of the latest status:
  • The legal agreement is stuck on some technicalities, and resolving them will take time
    • In the meantime, we will rely on existing subcontracts for HRL <-> Sub sharing
  • The Subversion server "ExRep" is fully operational
    • It already contains the HRL Shared Infrastructure code
  • The GPU cluster is fully operational
  • We have ported our infrastructure to GPUs (full 1 ms updates!)
    • Most of the multi-GPU/multi-node code is written
    • Some refactoring of the initialization and "glue" code is still needed
Work performed by HRL under DARPA contract HRL0011-09-C-001

  3. HRL Shared Source Agreement
• Terms (reminder):
  • LGPL-style, but limited to "SyNAPSE Team Members" and "SyNAPSE purposes" only
  • "Shared Source" code can be modified and redistributed to any "SyNAPSE Team Members"
  • Object code has to be accompanied by source, or the source can be placed in the Subversion repository
  • Code for separate pieces that only use the "shared source" infrastructure through its APIs does not have to become part of the "shared source"
  • You do not have to release your models to the "shared source"
• Currently stuck on export-restriction technicalities
  • These will take time to resolve; unfortunately our legal turnaround is very slow
  • For now, we will rely on existing subcontracts for two-way HRL ↔ Sub sharing
  • Sub ↔ Sub sharing not covered by subcontracts is disabled:
    • We provide a shared area with read-only access for non-HRL people
    • Separate areas for those who want to share with HRL
Work performed by HRL under DARPA contract HRL0011-09-C-001

  4. Subversion Repository
• The Subversion server "ExRep" is fully operational
  • It already contains the HRL Shared Infrastructure code
• You have to agree to the ExRep Terms and Conditions to get access
  • This is not SyNAPSE-specific, and is separate from the subcontracts and the Shared Source Agreement
  • The agreement binds you as an ExRep user, not your institution
    • E.g., you promise not to share your account credentials with others
  • Aleksey emailed all prospective users a copy of the agreement
  • You need to send Aleksey an email stating that you agree
• SSH public keys are used to grant access
  • Aleksey has emailed all prospective users instructions
  • You need to email Aleksey a copy of your public key (see the sketch below)
• ExRep is capable of sending email notifications for all commits
  • We are waiting on IT to allow outgoing emails to non-HRL accounts
Work performed by HRL under DARPA contract HRL0011-09-C-001
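
If you do not already have an SSH key pair, one way to generate one (the key type and file name below are just one reasonable choice, not an ExRep requirement) is:

    ssh-keygen -t rsa -f ~/.ssh/id_rsa_exrep
    # Email the *public* half, ~/.ssh/id_rsa_exrep.pub, to Aleksey;
    # the private key (~/.ssh/id_rsa_exrep) never leaves your machine.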

  5. GPU-Based High Performance Computing Cluster
HRL has purchased a high-performance computing cluster at no cost to DARPA; the SyNAPSE project will be its primary user.
• Head node:
  • 2 of: NVIDIA Tesla C1060 GPUs, each with:
    • 933 GFLOPS peak performance
    • 4 GB of GDDR3 memory at 102 GB/sec
    • PCIe 2.0 x16 interconnect (16 GB/sec)
  • 48 GB RAM
  • 2 of: 4-core Nehalem 2.66 GHz CPUs (64-bit)
  • 11 TB of HDDs (RAID configuration, 8.5 TB usable)
• 91 compute nodes, each with:
  • 2 of: NVIDIA Tesla M1060 GPUs
  • 12 GB RAM
  • 2 of: 4-core Nehalem 2.26 GHz CPUs (64-bit)
• High-speed 20 Gbps InfiniBand interconnect
• 1 Gbps Ethernet switch
The cluster is now fully operational.
Work performed by HRL under DARPA contract HRL0011-09-C-001

  6. GPU Cluster – InfiniBand Fabric
[Diagram: a 96-port InfiniBand fabric built from six 36-port switches, with 16 compute nodes attached to each switch at 20 Gbps per node]
• The switches run at 40 Gbps; the interface cards run at 20 Gbps
• Each pair of switches is connected at 160 Gbps
Work performed by HRL under DARPA contract HRL0011-09-C-001

  7. GPU and Multi-GPU Code
• We have ported our infrastructure to GPUs
  • Full 1 ms updates; we do not have to rely on the UCI 1 s batching
    • A closer match to CPU simulations and to hardware
  • Axonal delays are not implemented
  • An artificial "80%/20%" uniformly connected network with 10^5 neurons and 10^7 synapses @ 10 Hz runs in real time
  • A 2D, 2-layer random Gaussian-connectivity network with 0.3×10^5 neurons and 0.8×10^7 synapses @ 10 Hz runs 3.2x faster than real time
  • Generic experiment code runs the same on CPU or GPU, based on a compilation flag in a configuration file
• We have mostly implemented an MPI-based framework (see the sketch below):
  • Running on multiple GPUs, multiple CPUs, or even a mix of the two
  • The initialization code needs to be rewritten to work with MPI
  • The API for specifying experiments needs to be updated to work with the new code
Work performed by HRL under DARPA contract HRL0011-09-C-001
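
To make the MPI-based design concrete, here is a minimal, self-contained sketch; it is not the HRL framework itself, and the names and the fixed-size exchange are simplifying assumptions. It only illustrates how ranks that each simulate a partition of the network could exchange spike information every 1 ms step:

    // Minimal illustrative sketch, NOT the HRL shared framework:
    // each MPI rank simulates a partition of the network and exchanges
    // per-step spike information with all other ranks.
    #include <mpi.h>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank = 0, nranks = 1;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        const int steps = 1000;                // 1000 steps x 1 ms = 1 s of simulated time
        for (int t = 0; t < steps; ++t) {
            int local_spikes = 0;              // placeholder: spikes produced by this rank's neurons this step
            std::vector<int> all_spikes(nranks);
            // Every rank learns how many spikes every other rank produced this step.
            MPI_Allgather(&local_spikes, 1, MPI_INT,
                          all_spikes.data(), 1, MPI_INT, MPI_COMM_WORLD);
            // ...deliver the remote spikes to the local synapses here...
        }
        MPI_Finalize();
        return 0;
    }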

  8. Shared Simulation & Experimentation Infrastructure
• For each experiment, a custom binary is compiled from four components:
  • Portions of the code are experiment-specific
  • Portions of the code are provided by the shared infrastructure
Work performed by HRL under DARPA contract HRL0011-09-C-001

  9. Neural Networks – Levels of Flexibility
• Currently we support three different levels of flexibility (illustrated in the sketch below):
  • Per-simulation: compile-time switches and compile-time global constants defined in the build scripts (including the "experiment definition files"). Fastest and most efficient, least flexible.
  • Per-neuron: including defining properties of synapses as a property of the pre- or post-synaptic neuron.
  • Per-synapse: memory-intensive; we would like to avoid it.
• In general, we would prefer to have the least flexibility that we can get away with.
• The simulator may support features that are not (yet?) expected to be included in hardware, but we have to be careful.
Work performed by HRL under DARPA contract HRL0011-09-C-001
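
Purely as an illustration of where each level of flexibility lives (all identifiers below are hypothetical, not the framework's actual names):

    // Per-simulation: a compile-time constant, e.g. baked into the generated
    // hrlsim/config.h by the build scripts (hypothetical name):
    //   #define SIM_ENABLE_STDP 1
    // Per-neuron: data stored once per neuron (cheap, moderately flexible):
    struct PerNeuronParams { float a, b, c, d; bool inhibitory; };
    // Per-synapse: data stored once per synapse (most flexible, memory-intensive):
    struct PerSynapseState { float weight; unsigned char delay_ms; };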

  10. Neural Model Flexibility
Work performed by HRL under DARPA contract HRL0011-09-C-001

  11. API Overview
[Diagram of the main framework components; arrows show API/control dependencies, not data flow]
• Compute – executes simulation steps (CPU or CUDA back end)
• Network – immutable portion of the network state (connectivity, parameters)
• State – mutable portion of the network state (weights, statistics)
• BuildNetwork – incremental construction of neural networks, driven by the user's code for constructing a network (users extend, if needed)
• Statistics – at a regular interval, saves data for future analysis and prints basic stats
• Experiment – controls the computation
• InputGen – call-back functions to fill in input spike trains and/or currents (users extend, if needed)
• Virtual Environment (optional)
• Main
Work performed by HRL under DARPA contract HRL0011-09-C-001

  12. Building Networks Incrementally – API Fragments (Simplified)

    struct NeuronKind {
        NeuronKind SetInhibitory(bool inhibitory = true);
        NumberGen a, b, c, d;   // Izhikevich parameters – constant, or probability distribution parameters
    };

    class BuildNetwork {
        // Add a new set of neurons to the network
        Population NewPopulation(int size, NeuronKind& neuron);
    };

    struct SynapseKind {
        NumberGen weight;       // Initial weight
        NumberGen delay;        // Axonal delay
    };

    class Population {
        // New synapses, to a different population. Each call returns the number of synapses created.
        int ConnectFull(NeuronPopulation& to, SynapseKind& synapse);
        int Connect1to1(NeuronPopulation& to, SynapseKind& synapse);
        int ConnectRandom(NeuronPopulation& to, float probability, SynapseKind& synapse);
        int ConnectGauss(NeuronPopulation& to, float max_probability, float expected_inputs, SynapseKind& synapse);
        int ConnectFixedPreNum(NeuronPopulation& to, float n, const SynapseKind& synapse);
    };

Work performed by HRL under DARPA contract HRL0011-09-C-001

  13. Building Networks Incrementally – Example

    BuildNetwork build;
    Population excitatory = build.NewPopulation(800, NeuronKind());
    Population inhibitory = build.NewPopulation(200, NeuronKind().SetInhibitory());
    SynapseKind syn;  // default synapse parameters (added here so the calls
                      // match the API fragment on the previous slide)
    excitatory.ConnectRandom(excitatory, 0.2, syn);  // E -> E
    excitatory.ConnectRandom(inhibitory, 0.2, syn);  // E -> I
    inhibitory.ConnectRandom(excitatory, 0.2, syn);  // I -> E
    inhibitory.ConnectRandom(inhibitory, 0.2, syn);  // I -> I

Work performed by HRL under DARPA contract HRL0011-09-C-001
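
As a further, hypothetical illustration of the same simplified API, a two-layer network with Gaussian connectivity (similar in spirit to the 2-layer network mentioned on slide 7) might be set up as follows; the population sizes and connection parameters are made up for illustration only:

    // Hypothetical sketch using the simplified API above; parameter values are
    // illustrative, not those of the actual experiments.
    BuildNetwork build;
    Population layer1 = build.NewPopulation(15000, NeuronKind());
    Population layer2 = build.NewPopulation(15000, NeuronKind());

    SynapseKind feedforward;  // default weight/delay distributions
    // Each layer-2 neuron receives on the order of 100 Gaussian-weighted inputs from layer 1.
    layer1.ConnectGauss(layer2, /*max_probability=*/0.5f,
                        /*expected_inputs=*/100.0f, feedforward);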

  14. Overview of the Shared Framework Code
Major components:
• The simulator and related glue code (meant to be immutable)
• The mk/config file – selects the main parts (see the sketch below):
  • Which experiment to run (some experiments have variants)
  • Which computation engine to use (cpu or cuda)
  • Which communication engine to use (null or mpi)
• Experiment definition file (roughly one per experiment):
  • Defines the per-simulation parameters
  • Specifies which files contain the experiment code modules
• Experiment code:
  • May be split into several files
  • Pieces of experiment code can be reused in different experiments
• Analyzers for off-line data analysis
  • Generic and experiment-specific
The code in ExRep contains the complete simulator, plus several sample experiments and analyzers.
Work performed by HRL under DARPA contract HRL0011-09-C-001
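
For orientation only, a mk/config file might look roughly like the following; the variable names here are hypothetical (the real file is generated, with comments, on the first run of omake, as described on slide 17):

    # Hypothetical sketch of mk/config – the variable names are NOT the real ones.
    EXPERIMENT = sample_experiment    # one of the experiments defined by the *.exp files
    COMPUTE    = cuda                 # computation engine: cpu or cuda
    COMM       = null                 # communication engine: null or mpi
    SEED       = 12345                # initial RNG seed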

  15. ExRep Directory Structure
• ExRep repository root: svn+ssh://svn@exrep.hrl.com/exrep/
  • Note: you do not have access to the ExRep root, only to particular subdirectories
    • Many versions of Subversion have a problem with this setup; make sure to use svn version 1.6.11, the latest version, which fixes some bugs related to this scenario
• SyNAPSE area in ExRep: …/CRAD/SyNAPSE/
• SyNAPSE Shared Area – the subdirectory …/SyNAPSE/Code/Shared
  • Mentioned by name in the Shared Source Agreement
  • Right now you will only get read access, and only to this subdirectory (see the checkout example below)
  • Other subdirectories of the SyNAPSE directory will be created as needed
Work performed by HRL under DARPA contract HRL0011-09-C-001
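
For example, a read-only working copy of just the shared area (the only path most users can read) can be obtained with a standard Subversion checkout; the local directory name "Shared" is arbitrary:

    svn checkout svn+ssh://svn@exrep.hrl.com/exrep/CRAD/SyNAPSE/Code/Shared/ Shared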

  16. SyNAPSE Shared Framework Directory Structure
Under svn+ssh://svn@exrep.hrl.com/exrep/CRAD/SyNAPSE/Code/Shared/:
• The OMake subdirectory contains generic build scripts for the OMake Build Tool – CUDA, MPI, etc.
• The Sim subdirectory contains the framework itself, with subdirectories:
  • hrlsim – core framework, C++ code and headers
    • hrlsim/config.h – generated by the build process; summarizes all the per-simulation parameters (with comments) – more on the next slide
  • mk – core framework, build scripts
    • mk/config – global configuration file for the build (not in ExRep; created on the first invocation of the build tool)
    • mk/compute-consts.om – default simulation parameters
  • sample_exp – sample simulation experiments and helper/template code
    • …/mk/*.exp – experiment definition files
    • …/src/ – C++ source files for experiments: set up a network, generate inputs, print extra statistics in on-line mode
    • …/analyzers/ – off-line analysis templates and samples (C++)
    • …/scripts/ – shell/Python scripts for follow-up analysis and visualization
  • Data – directory for temporary off-line data (weights, spikes, etc.)
Work performed by HRL under DARPA contract HRL0011-09-C-001

  17. Running an Existing Experiment
• Once: download the OMake Build Tool from http://omake.metaprl.org/
  • We will probably need to release an updated version soon
• Go to the Sim directory
• Run "omake" – this will create a default mk/config file
• Edit the mk/config file
  • It has several configuration variables, each fully commented
    • Which experiment, which computation engine, initial RNG seed, etc.
  • The file is re-created by OMake on every run
    • Only value changes for existing variables are allowed/preserved
  • The list of valid experiments is generated from the experiment definition files (sample_exp/mk/*.exp)
• Run "omake" to build the custom simulator
  • Generates the ./sim or ./sim-cuda binary
  • Generates hrlsim/config.h in the process – a useful summary of the per-simulation parameters
  • Also builds all applicable analyzers
• Run the custom simulator: "./sim N" (or "./sim-cuda N")
  • "N" is the simulation duration in virtual seconds
  • "N" can be omitted when the experiment definition file gives a default duration
The whole cycle is summarized in the sketch below.
Work performed by HRL under DARPA contract HRL0011-09-C-001
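
Put together, a typical session might look like this (the duration of 60 virtual seconds is an arbitrary example):

    cd Sim
    omake               # first run: creates a default mk/config
    vi mk/config        # (or any editor) pick the experiment, compute engine, seed, ...
    omake               # builds ./sim or ./sim-cuda, plus the analyzers
    ./sim-cuda 60       # run the experiment for 60 virtual seconds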

  18. Defining a New Experiment
• Create a Sim/private_exp directory
  • With subdirectories following the structure of sample_exp (see the sketch below)
• Create a new experiment definition file
  • It needs to go into Sim/private_exp/mk/
  • With a .exp extension
  • Use an existing sample file as a template
• Create the C++ code
  • It needs to go into Sim/private_exp/src/
  • The experiment definition file should list all the .cpp files you are using – from either private_exp/src or sample_exp/src
• Proceed as described on the previous slide
  • After you create your new experiment definition file and run "omake" for the first time, the list of available experiments in mk/config will include your new experiment
Work performed by HRL under DARPA contract HRL0011-09-C-001
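
The resulting layout might look like this (the experiment name "my_exp" and its file names are hypothetical):

    Sim/private_exp/
        mk/my_exp.exp     # experiment definition; start from a copy of a sample_exp/mk/*.exp file
        src/my_exp.cpp    # network setup, input generation, extra statistics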
