ARC Cluster

ARC Cluster
Frank Mueller, North Carolina State University

PIs & Funding
• NSF funding level: $550k
• NCSU: $60k (ETF) + $20+k (CSC)
• NVIDIA: donations ~$30k
• PIs/co-PIs: Frank Mueller, Vincent Freeh, Helen Gu, Xuxian Jiang, Xiaosong Ma
• Contributors: Nagiza Samatova, George Rouskas

ARC Cluster: In the News
• “NC State is Home to the Most Powerful Academic HPC in North Carolina” (CSC News, Feb 2011)
• “Crash-Test Dummy For High-Performance Computing” (NCSU, The Abstract, Apr 2011)
• “Supercomputer Stunt Double” (insideHPC, Apr 2011)

Purpose Create a mid-size computational infrastructure to support research in areas such as:

Researchers Already Active
In the first week of public access, users came from:
• Groups within NCSU: CSC, ECE, Chem/Bio Engineering, Materials, Operations Research
• ORNL
• Tsinghua University, Beijing, China

System Overview: ARC - A Root Cluster
[Figure: system diagram. Labels: Head/Login Nodes; Compute/Spare Nodes; I/O Nodes; Storage Array; SSD + SATA; GEther Switch Stack; IB Switch Stack; PFS Switch Stack. Tiers: Front Tier, Interconnect, Mid Tier, Back Tier.]

Hardware
• 108 compute nodes
• 2-way SMPs with AMD Opteron 6128 processors, 8 cores per socket
• 16 cores per node!
• 32 GB DRAM per node
• 1728 compute cores available
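The totals on this slide follow directly from the per-node figures. A minimal sketch of the arithmetic, using only the numbers given above (the variable names are illustrative):

```shell
#!/bin/bash
# Figures taken from the hardware slide.
nodes=108
sockets_per_node=2
cores_per_socket=8
mem_per_node_gb=32

# Derived quantities.
cores_per_node=$((sockets_per_node * cores_per_socket))
total_cores=$((nodes * cores_per_node))
total_mem_gb=$((nodes * mem_per_node_gb))
mem_per_core_gb=$((mem_per_node_gb / cores_per_node))

echo "cores per node: $cores_per_node"        # 16
echo "total cores:    $total_cores"           # 1728
echo "total DRAM:     $total_mem_gb GB"       # 3456 GB
echo "DRAM per core:  $mem_per_core_gb GB"    # 2 GB
```

The 2 GB per core figure is a useful rule of thumb when sizing a job that uses all 16 cores of a node.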

Interconnects
• Gigabit Ethernet
  • Interactive jobs, ssh, service
  • Home directories
• 40 Gbit/s InfiniBand (OFED stack)
  • MPI communication: Open MPI, MVAPICH
  • IP over IB

GPUs
Model                Where              Cores/GPU  Peak GFLOPS (SP / DP)  Memory  Interface  Bandwidth (GB/s)
NVIDIA Tesla C2050   1 login + 36 nodes 448        1030 / 515             3GB     384-bit    144
NVIDIA GTX480        10 nodes           480        1344.96 / 168          3GB     384-bit    177.4
NVIDIA Tesla C2070   2 nodes            448        1030 / 515             6GB     384-bit    144
NVIDIA 1060 GTX      1 node             -          -                      -       -          -
NVIDIA 8800 GTX      1 node             -          -                      -       -          -
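The peak figures can be cross-checked from core counts and clock rates. The sketch below assumes one fused multiply-add (2 FLOP) per core per cycle. The clock rates (1401 MHz for the GTX480, 1150 MHz for the Tesla C2050/C2070) and the double-precision ratios (1/8 and 1/2) come from NVIDIA's published specifications, not from these slides. peak_mflops is a hypothetical helper.

```shell
#!/bin/bash
# peak_mflops CORES CLOCK_MHZ: single-precision peak in MFLOPS,
# assuming 2 FLOP (one fused multiply-add) per core per cycle.
peak_mflops() {
    echo $(( $1 * $2 * 2 ))
}

# Clock rates are assumptions taken from NVIDIA's published specs.
gtx480_sp=$(peak_mflops 480 1401)
tesla_sp=$(peak_mflops 448 1150)

# Double precision runs at 1/8 of the SP rate on the GTX480
# and at 1/2 of the SP rate on the Tesla C2050/C2070.
gtx480_dp=$((gtx480_sp / 8))
tesla_dp=$((tesla_sp / 2))

echo "GTX480: $gtx480_sp MFLOPS SP, $gtx480_dp MFLOPS DP"
echo "Tesla:  $tesla_sp MFLOPS SP, $tesla_dp MFLOPS DP"
```

On both cards single precision is the larger figure. The Tesla cards have lower SP throughput than the GTX480 but roughly three times its DP throughput, which is the trade-off to keep in mind when picking a queue.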

Solid State Drives
All 108 compute nodes are equipped with an OCZ RevoDrive 120GB SSD:
• Read: up to 540 MB/s
• Write: up to 480 MB/s
• Sustained write: up to 400 MB/s
• Random write 4KB (aligned): 75,000 IOPS

File Systems
Available today:
• NFS home directories over Gigabit Ethernet
• Local per-node scratch on spinning disks (ext3)
• Local per-node 120GB SSD (ext2)
In the future:
• Parallel file systems (Lustre)
• Separate dedicated nodes are available for parallel file systems: 1 MDS + 4 clients
• Are you interested in helping us set this up for your research projects?

Power Monitoring
• Watts Up Pro meters; serial and USB available
• Connected in groups of:
  • Mostly 4 nodes (sometimes just 3)
  • 2x 1 node: 1 w/ GPU, 1 w/o GPU

Software Stack
• Additional packages and libraries upon request, but…
  • Not free? → you need to pay
  • License required? → you need to sign it
  • Installation required? → you need to
    • Test it
    • Provide install script
• Check the ARC website → constantly changing

Base System
• 64-bit Rocks 5.3 (based on CentOS)
• Batch system: Torque/Maui (PBS)
• All compilers and tools are available on the login nodes: gcc, gfortran, …
• Compute nodes share the same base OS and libraries as the login nodes.

MPI
• Open MPI
  • Operates over InfiniBand
  • Integrated with BLCR
  • Already in your default PATH (mpicc)
• MVAPICH
  • InfiniBand support
  • Requires changes to your PATH; see the ARC site.

OpenMP
The "#pragma omp" directive in C programs works:
  gcc -fopenmp -o fn fn.c
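An OpenMP binary built this way runs with as many threads as OMP_NUM_THREADS allows, so inside a batch job it can help to match that to the task slots PBS granted on the node. This is a minimal sketch under three assumptions: the job runs under PBS, $PBS_NODEFILE lists a host once per task slot (Torque's usual layout), and hostname prints the same name that appears in that file. threads_on_host is a hypothetical helper.

```shell
#!/bin/bash
# threads_on_host NODEFILE HOST: count the task slots assigned to HOST,
# i.e. the number of lines in NODEFILE that match HOST exactly.
threads_on_host() {
    grep -c -x -F -- "$2" "$1"
}

# Only adjust the thread count when running inside a PBS job.
if [ -n "${PBS_NODEFILE:-}" ]; then
    OMP_NUM_THREADS=$(threads_on_host "$PBS_NODEFILE" "$(hostname)")
    export OMP_NUM_THREADS
    echo "Using $OMP_NUM_THREADS OpenMP thread(s) on $(hostname)"
fi

# The OpenMP program would be started here, e.g.: ./fn
```

With a request such as nodes=1:ppn=16 this would yield 16 threads, one per core on an ARC compute node.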

CUDA SDK
• Ensure you are using a node with a GPU.
• Several types are available to fine-tune for your application's needs: well-performing single- or double-precision devices.
• Requires environment changes:
    export PATH=".:~/bin:/usr/local/bin:/usr/bin:$PATH"
    export PATH="/usr/local/cuda/bin:$PATH"
    export LD_LIBRARY_PATH="/usr/local/cuda/lib64:/usr/local/cuda/lib:$LD_LIBRARY_PATH"
    export MANPATH="/usr/share/man:$MANPATH"
• Or see the ARC site to make sure you have the latest paths…
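If these exports live in a startup file that is sourced more than once, the same directories pile up in PATH. One way to avoid that is an idempotent prepend, sketched here under the assumption that the CUDA paths are the ones listed on this slide. prepend_path is a hypothetical helper, not part of the ARC setup.

```shell
#!/bin/bash
# prepend_path VAR DIR: put DIR at the front of the colon-separated list
# stored in the variable named VAR, unless DIR is already in the list.
prepend_path() {
    eval "_cur=\${$1:-}"                 # current value of the named variable
    case ":$_cur:" in
        *":$2:"*) ;;                     # already present: nothing to do
        *) eval "$1=\"\$2\${_cur:+:\$_cur}\""
           export "$1" ;;
    esac
}

# Paths taken from the slide; lib64 ends up ahead of lib, as in the original.
prepend_path PATH /usr/local/cuda/bin
prepend_path LD_LIBRARY_PATH /usr/local/cuda/lib
prepend_path LD_LIBRARY_PATH /usr/local/cuda/lib64

# Quick sanity check that the CUDA compiler is now reachable.
command -v nvcc >/dev/null || echo "nvcc not found; check the ARC site for current paths" >&2
```

Running the snippet a second time leaves the variables unchanged, so it is safe to source from a login script.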

PGI Compiler (Experimental)
• Awaiting site license update.
    export PATH=".:~/bin:/usr/local/bin:/usr/bin:$PATH"
    export PATH="/usr/local/cuda/bin:$PATH"
    export LD_LIBRARY_PATH="/usr/local/cuda/lib64:/usr/local/cuda/lib"
    export MANPATH="/usr/share/man"

Virtualization
• Goal: allow a user to request VMs from the batch system just like any other resource.
• The user gets full root access to, and complete control over, each VM requested.
• VMs will share the same network or may be grouped together into private networks across single or multiple nodes.
• VM creation scripts already in place allow an entire machine to be created with a single line.

Job Submission
• You cannot SSH to a compute node.
• You must use PBS to submit jobs, either as batch or interactively.
• Presently there are “hard” limits for job times and sizes. In general, please be considerate of other users and do not abuse the system.
• There are special queues for nodes with a GPU.
• As we add additional specialized resources, even more queues will become available.

PBS Basics
On the login node:
• to submit a job: qsub …
• to list your jobs: qstat
• to list everyone’s jobs: qstat -a
• to delete/cancel/stop your job: qdel …
• to check node status: pbsnodes

qsub Basics
  qsub -q cuda ...                   # job submitted to GPU/CUDA queue
  qsub -l ncpus=4 ...                # ask for four tasks (processors), packed as up to 16 tasks per node
  qsub -l nodes=4:ppn=16 ...         # job for four nodes with 16 processors on each node (64 tasks)
  qsub -l nodes=2:ppn=1 -q cuda ...  # job for two tasks on two nodes with GPU/CUDA support
  qsub -l nodes=2,cput=00:05:00 ...  # job for two tasks + 5 minutes CPU time
To submit interactive jobs:
  qsub -I                            # one node, shell will open up
  qsub -I -l ncpus=20                # 20 tasks, which spans two nodes at up to 16 tasks per node
  qsub -I -l host=compute-0-54.local # specifically on node 54
  qsub -I -l host=compute-0-54.local+compute-0-55.local  # on nodes 54+55
  qsub -I -X ...                     # interactive with X11
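The same options can be stored in a job script as #PBS directives instead of being typed on the command line. This is a minimal sketch: the job name and the five-minute walltime are illustrative values, not ARC defaults, and the script falls back to harmless defaults when run outside PBS so it can be tried on a login node first.

```shell
#!/bin/bash
#PBS -N arc-example
#PBS -q cuda
#PBS -l nodes=2:ppn=1
#PBS -l walltime=00:05:00
#PBS -j oe

# Torque starts the job in $HOME; return to the directory qsub was run from.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# $PBS_NODEFILE holds one line per allocated task; use an empty list outside PBS.
nodefile="${PBS_NODEFILE:-/dev/null}"
ntasks=$(( $(wc -l < "$nodefile") ))

echo "Job ${PBS_JOBID:-none}: $ntasks task(s) on these nodes:"
sort -u "$nodefile"

# The real work (for example an mpirun command) would follow here.
```

Saved under a name of your choice (say job.sh, a hypothetical filename), it would be submitted with qsub job.sh; options given on the qsub command line take precedence over the #PBS lines.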

Listing your nodes
Once your job begins, $PBS_NODEFILE points to a file that contains a list of your requested nodes.
Open MPI is already integrated with PBS. Simply using mpirun … will automatically use all requested processes directly from PBS.
For example, a CUDA programmer who wants to use 4 GPU nodes:
  [dfiala@login-0-0 ~]$ qsub -I -lnodes=4:ppn=1 -qcuda
  qsub: waiting for job 1774.arcs.csc.ncsu.edu to start
  qsub: job 1774.arcs.csc.ncsu.edu ready
  [dfiala@compute-0-2 ~]$ cat $PBS_NODEFILE
  compute-0-2.local
  compute-0-32.local
  compute-0-35.local
  compute-0-38.local
SSHing between these nodes from within the PBS session is allowed.
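When a job requests several tasks per node, the node file repeats each hostname once per task. The sketch below summarizes such a file, assuming that one-line-per-task layout; summarize_nodes and count_nodes are hypothetical helpers.

```shell
#!/bin/bash
# summarize_nodes NODEFILE: print "<tasks> <host>" for each distinct host.
summarize_nodes() {
    sort "$1" | uniq -c | awk '{ print $1, $2 }'
}

# count_nodes NODEFILE: print the number of distinct hosts.
count_nodes() {
    sort -u "$1" | wc -l | tr -d ' '
}

# Inside a PBS job, report what was allocated.
if [ -n "${PBS_NODEFILE:-}" ]; then
    echo "$(count_nodes "$PBS_NODEFILE") node(s) allocated:"
    summarize_nodes "$PBS_NODEFILE"
fi
```

For the four-node session shown above, this would report four nodes with one task each.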

Handling problems
If you find a node that is giving you trouble, please report it to the mailing list. As a workaround, you can keep that node busy by queuing an empty job:
  echo sleep 600 | qsub -l host=compute-0-100,walltime=1000

Hardware in Action • 4 racks in server room

Running Large Jobs (and keeping cool)
While our new cluster is surely state of the art…
• The cooling system isn’t.
[Photos: the state-of-the-art cluster; our “dual action” cooling solution for the state-of-the-art cluster]

Temperature Monitoring
• It is the user’s responsibility to keep room temperatures below 80 degrees while utilizing the cluster.
• The ARC website has links to online browser-based temperature monitors.
• Building staff also carry pagers that alarm 24/7 when temperatures exceed the limit.

Connecting to ARC
• ARC access is restricted to on-campus IPs only. If you are ever unable to log in (the connection gets dropped immediately, before authentication), this is likely the cause.
• Non-NCSU users may request remote access by providing a remote machine that their connections must originate from.

Summary
Your ARC Cluster@Home: What can I do with it?
• Primary purpose: Advance Computer Science Research (HPC and beyond)
  • Want to run a job over the entire machine?
  • Want to replace parts of the software stack?
• Secondary purpose: Service to sciences, engineering & beyond
• Vision: Have domain scientists work w/ Computer Scientists on code
http://moss.csc.ncsu.edu/~mueller/cluster/arc/
• Equipment donations welcome
• Ideas how to improve ARC? → let us know
• Qs? → send to mailing list (once you have an account)
• To request an account: email dfiala<at>ncsu.edu
  • Research topic, abstract, and compute requirements/time
  • Must include your unity ID
  • NCSU students: advisor sends the email as means of their approval
  • Non-NCSU: same + preferred username + hostname (your remote login location)

Slides provided by David Fiala. Edited by Frank Mueller. Current as of May 11, 2011.
