The Design of the Red Storm High Performance Computer

March 19, 2008 • Sue Kelly • Sandia National Laboratories • http://www.sandia.gov/~smkelly

Abstract: Sandia National Laboratories has a long history of successfully applying massively parallel processing (MPP) technology to solve problems in the national interest for the US Department of Energy. We drew upon our experience with numerous architectural and design features when planning the Red Storm computer system. This talk presents the key issues that were considered. Two principles are central: performance balance between the hardware components, and scalability of the system software.

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

What is High Performance Computing? • (n.) A branch of computer science that concentrates on developing supercomputers and software to run on supercomputers. A main area of this discipline is developing parallel processing algorithms and software: programs that can be divided into little pieces so that each piece can be executed simultaneously by separate processors. (http://www.webopedia.com/TERM/H/High_Performance_Computing.html) • The idea/premise of parallel processing is not new (http://www.sandia.gov/ASC/news/stories.html#nineteen-twenty-two)

Red Storm – a First Look • Sandia/Cray Inc. partnership: • Sandia architecture • Sandia & Cray System Software • Cray engineering and manufacturing • Sandia systems HW/SW expertise

Red Storm is a Massively Parallel Processor Supercomputer • 12,960 2.4 GHz dual-core Opterons for computation (called nodes) • 2 GB memory per core (in progress) [Figure: system layout with Users, Service, Compute Partition, Parallel I/O, Net I/O, and /home]

3D Mesh/Torus + I/O • 27 × 20 × 24 compute node mesh (X=27, Y=20, Z=24), 12,960 compute nodes • Torus interconnect in Z • Two groups of 310 service & I/O nodes
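The mesh dimensions above can be made concrete with a small sketch. It assumes, as the slide labels suggest, that only the Z dimension wraps around (torus) while X and Y are plain mesh dimensions; the function names are hypothetical and this is not Red Storm's routing code.

```python
# Illustrative model of a 27 x 20 x 24 mesh with wraparound in Z only.
DIMS = (27, 20, 24)            # X, Y, Z sizes from the slide
WRAPS = (False, False, True)   # assumption: torus links exist in Z only


def hop_distance(a, b, dims=DIMS, wraps=WRAPS):
    """Minimum number of hops between nodes a and b, given as (x, y, z) tuples."""
    hops = 0
    for pa, pb, size, wrap in zip(a, b, dims, wraps):
        d = abs(pa - pb)
        if wrap:
            d = min(d, size - d)  # going the other way around the ring may be shorter
        hops += d
    return hops


def neighbors(node, dims=DIMS, wraps=WRAPS):
    """Directly connected nodes of `node`, an (x, y, z) tuple."""
    result = []
    for axis, (size, wrap) in enumerate(zip(dims, wraps)):
        for step in (-1, 1):
            coord = node[axis] + step
            if wrap:
                coord %= size
            elif not 0 <= coord < size:
                continue  # mesh edge: no link in this direction
            result.append(node[:axis] + (coord,) + node[axis + 1:])
    return result


print(DIMS[0] * DIMS[1] * DIMS[2])               # total compute nodes
print(hop_distance((0, 0, 0), (26, 19, 12)))     # a worst-case pair in this model
print(neighbors((0, 0, 0)))                      # a corner node has fewer links
```

In this model the node count is 27 × 20 × 24 = 12,960, matching the slide, and the wraparound halves the worst-case distance along Z.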

Usage Model [Figure: Linux login (service) node, compute resource, I/O]

Key Performance Characteristics that Lead to a Balanced System • 124.42 TeraFLOPS (trillion floating point operations per second) • Aggregate system memory bandwidth of 83 TB/s • Sustained aggregate interconnect bandwidth of 120 TB/s • High-performance I/O subsystem (minimum sustained file system bandwidth of 100 GB/s to 340 TB of parallel disk storage and sustained external network bandwidth of 50 GB/s)

Red Storm has an Excellent Ratio of Computation to Bandwidth
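The balance claim can be checked with a little arithmetic on the figures quoted on the slides. The sketch below assumes 2 floating point operations per core per clock cycle, which is not stated in the talk but reproduces the quoted 124.42 TeraFLOPS; the bytes-per-FLOP ratios are simply the quoted aggregate bandwidths divided by that peak.

```python
# Figures quoted on the slides.
NODES = 12_960
CORES_PER_NODE = 2
CLOCK_HZ = 2.4e9
MEMORY_BW = 83e12          # aggregate memory bandwidth, bytes/s
INTERCONNECT_BW = 120e12   # sustained aggregate interconnect bandwidth, bytes/s

# Assumption (not on the slides): each core retires 2 floating point ops per cycle.
FLOPS_PER_CYCLE = 2

PEAK_FLOPS = NODES * CORES_PER_NODE * CLOCK_HZ * FLOPS_PER_CYCLE


def bytes_per_flop(bandwidth, flops=PEAK_FLOPS):
    """Bytes of bandwidth available per peak floating point operation."""
    return bandwidth / flops


print(f"peak:         {PEAK_FLOPS / 1e12:.2f} TFLOPS")
print(f"memory:       {bytes_per_flop(MEMORY_BW):.2f} bytes/FLOP")
print(f"interconnect: {bytes_per_flop(INTERCONNECT_BW):.2f} bytes/FLOP")
```

Under these assumptions the system offers roughly two thirds of a byte of memory bandwidth and close to one byte of interconnect bandwidth per peak floating point operation.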

Additional Architectural Features • Scalability: Red Storm’s hardware and system software scale from a single cabinet system to a 32,000 node system. • Functional Partitioning: Hardware and system software are carefully engineered to optimize the scalability and the performance of the system. • Reliability: A full system Reliability, Availability, Serviceability (RAS) is designed into the architecture. • Upgrade-ability: There is a designed-in path for system upgrades. • Custom Packaging: Red Storm is designed to be a high density, relatively low power system. • Price/Performance: It has excellent performance per dollar through the use of high volume commodity parts where feasible.

Red Storm Systems Software • Operating Systems • LINUX on service and I/O nodes • LWK (Catamount) on compute nodes • LINUX on RAS nodes • Run-Time System • Logarithmic loader • Node allocator • Batch system – Torque/MOAB • Libraries – MPI-2, I/O, Math • Compilers – Fortran, C, C++ • Totalview Debuggers • Performance analysis tool • Parallel File System • Lustre by Cluster File Systems – recently purchased by Sun

In Addition to Balanced Hardware, System Software must be Scalable

Scalable System Software Concept #1 Do things in a hierarchical fashion

Job Launch is Hierarchical [Figure: the Red Storm user logs in to a Linux login node and launches the job with yod; a PBS node holds the PBS server, job queues, scheduler, and PBS mom; a database node holds the CPU inventory database used by the compute node allocator; the application is then fanned out across the compute nodes.]
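To see why a hierarchical fan-out matters at this scale, consider a simplified model in which every node that already holds the application image forwards it to a fixed number of new nodes per round. This is only an illustrative sketch, not the actual yod or loader algorithm; `launch_rounds` and `fanout` are hypothetical names.

```python
def launch_rounds(num_nodes, fanout=1):
    """Rounds needed until all nodes hold the image.

    Model: one node starts with the image; in each round every node that
    has it forwards it to `fanout` nodes that do not have it yet.
    """
    have = 1
    rounds = 0
    while have < num_nodes:
        have += have * fanout
        rounds += 1
    return rounds


NODES = 12_960
print("serial sends from one node:", NODES - 1)
print("rounds with fanout 1:", launch_rounds(NODES, fanout=1))
print("rounds with fanout 2:", launch_rounds(NODES, fanout=2))
```

In this model a single launcher would need 12,959 sequential sends, while the tree reaches all 12,960 nodes in 14 rounds with a fan-out of 1 and in 9 rounds with a fan-out of 2.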

RAS monitoring is hierarchical

Scalable System Software Concept #2 Minimize Compute Node Operating System Overhead

Light Weight Kernel Features • Virtual addressing, but no virtual memory • Portals communication protocol only • No dynamic process creation • No threads (thread support will become necessary as hardware core counts continue to increase) • Gets the application going and gets out of the way

Operating System Interruptions Impede Progress of the Application

Scalable System Software Concept #3 Minimize Compute Node Interdependencies

Calculating Weather Minute by Minute [Timeline: Calc 1, Calc 2, Calc 3, Calc 4 run back to back on a time axis from 0 min to 4 min.]

Calculation with Breaks • Calculation with asynchronous breaks [Timeline: Calc 1, Wait, Calc 2, Calc 3, Wait, Calc 4 on a time axis from 0 min to 6 min.]

Run Time Impact of UNIX/Linux System Services (aka Daemons) • Say breaks take 50 µs and occur once per second • On one CPU, wasted time is 50 µs every second • Negligible 0.005% impact • On 100 CPUs, wasted time is 5 ms every second • Negligible 0.5% impact • On 10,000 CPUs, wasted time is 500 ms every second • Significant 50% impact • Red Storm, with over 12,000 CPUs, does not have asynchronous services on compute nodes
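The numbers on this slide follow from a simple worst-case model: the application is tightly synchronized, the breaks on different CPUs never overlap, and so every break on any CPU stalls the whole job. A minimal sketch of that model, with hypothetical function and parameter names:

```python
def wasted_fraction(num_cpus, break_s=50e-6, period_s=1.0):
    """Worst-case fraction of run time lost to OS interruptions.

    Assumes each CPU takes one break of `break_s` seconds every `period_s`
    seconds, breaks never coincide, and all CPUs wait for the interrupted one.
    """
    return min(1.0, num_cpus * break_s / period_s)


for cpus in (1, 100, 10_000):
    print(f"{cpus:>6} CPUs: {wasted_fraction(cpus):.3%} of each second wasted")
```

The loss grows linearly with the CPU count until it saturates, which is why an interruption that is negligible on one CPU becomes significant on 10,000.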

Scalable System Software Concept #4 Avoid linear scaling of buffer requirements

Connection-oriented protocols have to reserve buffers for the worst case • If each node reserves a 100 KB buffer for each of its peers, that is 1 GB of memory per node for 10,000 processors. • Need to communicate using collective algorithms
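The buffer arithmetic can be sketched as follows. The per-peer reservation reproduces the slide's figure; the comparison with a tree-structured collective, where each node talks to about log2(N) partners, is an illustrative assumption and not a description of Red Storm's actual collectives.

```python
import math

PER_PEER_BYTES = 100_000  # 100 KB per peer, from the slide (decimal units)


def per_peer_buffers(num_procs, per_peer=PER_PEER_BYTES):
    """Bytes reserved on one node if it keeps a buffer for every peer."""
    return num_procs * per_peer


def tree_partners(num_procs):
    """Partners per node in a hypothetical binary-tree style collective."""
    return math.ceil(math.log2(num_procs))


procs = 10_000
print(per_peer_buffers(procs) / 1e9, "GB per node with one buffer per peer")
print(tree_partners(procs) * PER_PEER_BYTES / 1e6, "MB per node with log2(N) partners")
```

With one buffer per peer the reservation grows linearly with the machine size, while a logarithmic number of partners keeps it to a few megabytes in this model.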

Scalable System Software Concept #5 Parallelize wherever possible

Use parallel techniques for I/O • Parallel file system servers (I): 190 + MDS, 140 MB/s per FC × 2 × 190 = 53 GB/s • 10 GigE servers (N): 50, 500 MB/s × 50 = 25 GB/s • Login servers (L): 10, 1 GigE × 10 [Figure: compute nodes (C) connect over the high speed network to the I/O nodes: parallel file system servers (I) attached to RAIDs, network servers (N) on 10 Gbit Ethernet, and login servers (L) on 1 Gbit Ethernet.]
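The aggregate figures on this slide are products of per-link rate, links per server, and server count. A short sketch of that arithmetic, using a hypothetical helper name:

```python
def aggregate_mb_per_s(per_link_mb_s, links_per_server, servers):
    """Aggregate bandwidth in MB/s when all servers move data in parallel."""
    return per_link_mb_s * links_per_server * servers


# Parallel file system servers: 140 MB/s per Fibre Channel link, 2 links, 190 servers.
file_system = aggregate_mb_per_s(140, 2, 190)
# 10 GigE network servers: 500 MB/s each, 50 servers.
network = aggregate_mb_per_s(500, 1, 50)

print(f"file system: {file_system / 1000:.1f} GB/s")
print(f"network:     {network / 1000:.1f} GB/s")
```

No single server approaches these rates; the totals come only from running many modest servers side by side.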

Conclusion • Hardware, system software, and application software are all important participants in achieving a high performing system. • Although Red Storm was originally designed to address the needs of a specific project, it has become a very popular commercial product around the world.
