Jacquard: Architecture and Application Performance Overview NERSC Users’ Group October 2005


Outline An engineering-level overview of the HW and SW that make up jacquard. • CPUs • Memory • OS • Interconnect Will use seaborg as a point of reference.

seaborg.nersc.gov (review?) [Diagram: Seaborg is 380 x 16-way SMP NHII nodes, each with main memory, joined by the Colony switch (CSS0, CSS1 crossbar); labels also show MPI, GPFS, and HPSS] • 6080 dedicated CPUs, 96 shared login CPUs • Hierarchy of caching, speeds • Bottleneck determined by first depleted resource

jacquard.nersc.gov basics [Diagram: Jacquard is 320 x 2-way Opteron nodes, each with main memory, joined by the InfiniBand switch; labels also show HT, IB, MPI, GPFS, and HPSS] • 640 dedicated CPUs, 8 shared login CPUs • Smaller caches, HT, really fast • SMP? NUMA? SUMO.

Opteron Block Diagram: Not strictly SMP [Diagram labels: SDRAM, SDRAM, Switch, I/O] • 1 TLB per CPU: 1K entries x 4 KB pages = 4 MB coverage
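The TLB coverage figure on this slide is simple arithmetic, and a short sketch makes it explicit. The constant and function names below are illustrative, not taken from the talk; the entry count and page size are the ones quoted on the slide.

```python
# TLB reach: how much memory a fully populated TLB can map without a miss.
# Figures are the ones quoted on the slide; the names are illustrative.
TLB_ENTRIES = 1024            # "1K entries"
PAGE_SIZE_BYTES = 4 * 1024    # "4K pages"


def tlb_reach_bytes(entries: int, page_size: int) -> int:
    """Return the number of bytes covered by a fully populated TLB."""
    return entries * page_size


reach = tlb_reach_bytes(TLB_ENTRIES, PAGE_SIZE_BYTES)
reach_mib = reach / (1024 * 1024)
print(f"TLB reach: {reach_mib:.0f} MB")
```

A working set larger than this reach will start to incur TLB misses even if the data itself still fits in cache or main memory.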

HyperTransport: Good Stuff • Little conflict between data movement and computation

SMP size and memory contention • Why is Jacquard 2-way SMP? • Jacquard’s numbers: 1 task: 100%, 2 tasks: 98%

Flops @ 2.2 GHz • Peak Theoretical Flops • Double (64-bit) floats: 1 add + 1 mult = 2.2 GFlop/s • Single (32-bit) floats: 2 add + 2 mult = 4.4 GFlop/s • Peak Realized Flops • Double (64-bit) floats: 1.9 GFlop/s • Single (32-bit) floats: 3.4 GFlop/s • Your Flops? • Walltime is more important than flops • For a known algorithm flops are a sanity check • Memory BW: 4 GB/sec per CPU
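To make the "flops as a sanity check" point concrete, here is a minimal sketch that turns the slide's figures into a fraction of peak and a bytes-per-flop ratio, and converts a known operation count plus a walltime into an achieved rate. All names are illustrative, and the matrix-multiply timing in the usage lines is a made-up example, not a Jacquard measurement.

```python
# Figures quoted on the slide (GFlop/s and GB/s per CPU).
PEAK = {"double": 2.2, "single": 4.4}
REALIZED = {"double": 1.9, "single": 3.4}
MEMORY_BW_GB_S = 4.0


def fraction_of_peak(realized: float, peak: float) -> float:
    """Realized rate as a fraction of the theoretical peak."""
    return realized / peak


def bytes_per_flop(bandwidth_gb_s: float, gflops: float) -> float:
    """Memory bytes available per floating-point operation."""
    return bandwidth_gb_s / gflops


def achieved_gflops(flop_count: float, walltime_s: float) -> float:
    """Rate for an algorithm whose operation count is known."""
    return flop_count / walltime_s / 1e9


efficiency = {k: fraction_of_peak(REALIZED[k], PEAK[k]) for k in PEAK}
balance = bytes_per_flop(MEMORY_BW_GB_S, PEAK["double"])

# Hypothetical run: a dense n x n matrix multiply costs about 2*n**3 flops.
n = 1000
rate = achieved_gflops(2 * n**3, walltime_s=1.25)

for kind, frac in efficiency.items():
    print(f"{kind}: {frac:.0%} of peak")
print(f"balance: {balance:.2f} bytes/flop, hypothetical run: {rate:.2f} GFlop/s")
```

With the slide's numbers, the realized peaks come to roughly 86% (double) and 77% (single) of the theoretical figures. A rate far below that for a compute-bound kernel suggests looking at memory traffic first, since walltime remains the quantity that matters.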

MPI Bandwidth: seaborg

MPI Bandwidth: Jacquard

Linux for AIX Users Linux and AIX are more similar than different • Linux is not as good as AIX at keeping processes scheduled on the same CPU, hence the processor affinity work. • Linux has easy interfaces to architectural and process performance information: /proc/cpuinfo, /proc/self, etc. • AIX MPI is in /usr/{bin,lib}; Linux MPI is in modules • Linux doesn’t need -bmaxdata! • Little vs. big endian
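Three of the points above (the /proc interfaces, processor affinity, and byte order) can be illustrated with a short sketch. It assumes a Linux host with Python 3; the function names are hypothetical, and the affinity call is shown only as one possible way to pin a process, not as the mechanism used on Jacquard.

```python
import os
import struct
import sys


def count_processors(cpuinfo_text: str) -> int:
    """Count 'processor : N' entries in /proc/cpuinfo-style text."""
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        if line.split(":")[0].strip() == "processor"
    )


def pin_to_cpu(cpu: int) -> bool:
    """Pin the calling process to one CPU; return False if not possible."""
    if not hasattr(os, "sched_setaffinity"):  # only available on Linux
        return False
    if cpu not in os.sched_getaffinity(0):
        return False
    os.sched_setaffinity(0, {cpu})
    return True


def big_endian_doubles(raw: bytes) -> tuple:
    """Decode 64-bit floats that were written on a big-endian machine."""
    count = len(raw) // 8
    return struct.unpack(f">{count}d", raw[: count * 8])


if __name__ == "__main__":
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            print("processors:", count_processors(f.read()))
    print("native byte order:", sys.byteorder)
```

Unformatted binary files written on a big-endian system such as seaborg need an explicit byte-order conversion like `big_endian_doubles` when read on the little-endian Opterons; read without it, the same bytes decode to a different value.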

Conclusions • The underlying HW technologies (HT, IB, etc.) are quite promising. Opteron systems are delivering great price/performance. • Still working through some SDRAM, OS, and SW issues. • What’s useful to you? Let us know.
