Introduction to High Performance Cluster Computing

Presentation Transcript

  1. Introduction to High Performance Cluster Computing (Courseware Module H.1.a, August 2008)

  2. What is HPC?
  • HPC = High Performance Computing; includes supercomputing
  • HPCC = High Performance Cluster Computing (note: these are NOT high-availability clusters)
  • HPTC = High Performance Technical Computing
  • The ultimate aim of HPC users is to max out the CPUs!

  3. Agenda
  • Parallel Computing Concepts
  • Clusters
  • Cluster Usage

  4. Concurrency and Parallel Computing
  A central concept in computer science is concurrency:
  • Concurrency: computing in which multiple tasks are active at the same time.
  There are many ways to use concurrency:
  • Concurrency is key to all modern operating systems as a way to hide latencies.
  • Concurrency can be used together with redundancy to provide high availability.
  • Parallel computing uses concurrency to decrease program runtimes.
  HPC systems are based on parallel computing.
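The latency-hiding use of concurrency mentioned above can be sketched in a few lines of Python. This is a toy example: `fetch` and its 0.1 s delay are stand-ins for a real high-latency operation (a network or disk request), not anything from the slides.

```python
# Minimal sketch: overlapping (hiding) latency with threads.
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(i):
    # Simulate a high-latency operation (e.g. a network or disk request).
    time.sleep(0.1)
    return i * i

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch, range(4)))
elapsed = time.perf_counter() - start

# The four 0.1 s waits overlap, so the total is close to 0.1 s, not 0.4 s.
print(results, round(elapsed, 2))
```

The same tasks run serially would take four times as long; the threads do no computation in parallel here, they merely keep multiple waits in flight at once, which is exactly the latency-hiding role of concurrency described on the slide.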

  5. Hardware for Parallel Computing
  Parallel computers are classified in terms of streams of instructions and streams of data:
  • MIMD computers: multiple streams of instructions acting on multiple streams of data.
  • SIMD computers: a single stream of instructions acting on multiple streams of data.
  Parallel hardware comes in many forms:
  • On chip: instruction-level parallelism (e.g. IPF)
  • Multi-core: multiple execution cores inside a single CPU
  • Multiprocessor: multiple processors inside a single computer
  • Multi-computer: networks of computers working together

  6. Hardware for Parallel Computing
  Parallel Computers
  • Single Instruction Multiple Data (SIMD)*
  • Multiple Instruction Multiple Data (MIMD)
    • Shared Address Space
      • Non-uniform Memory Architecture (NUMA)
      • Symmetric Multiprocessor (SMP)
    • Disjoint Address Space
      • Massively Parallel Processor (MPP)
      • Cluster
      • Distributed Computing

  7. HPC Platform Generations
  In the 1980s, the HPC platform was a vector SMP: custom components throughout.
  In the 1990s, it was a massively parallel computer: commodity off-the-shelf CPUs, everything else custom.
  … but today, it is a cluster: COTS components everywhere.
  Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries.

  8. What is an HPC Cluster
  A cluster is a type of parallel or distributed processing system that consists of a collection of interconnected stand-alone computers working together cooperatively as a single, integrated computing resource. A typical cluster uses:
  • Commodity off-the-shelf parts
  • Low-latency communication protocols

  9. What is HPCC?
  [Diagram: compute nodes joined by an interconnect, a master node running cluster management tools, and a file server / gateway connecting the cluster to the LAN/WAN.]

  10. A Sample Cluster Design

  11. Cluster Architecture View (top to bottom)
  • Application: parallel benchmarks (Perf, Ring, HINT, NAS, …) and real applications
  • Middleware: shmem, MPI, PVM
  • OS: Linux and other OSes; TCP/IP, VIA, proprietary protocols
  • Interconnect: Ethernet, Quadrics, InfiniBand, Myrinet
  • Hardware: desktop workstation, 4U+ server, 1P/2P server

  12. Cluster Hardware: The Node
  A node is a single element within the cluster.
  • Compute node: just computes, little else; private IP address, no user access.
  • Master/head/front-end node: user login, job scheduler; public IP address, connects to the external network.
  • Management/administrator node: systems/cluster management functions; secure administrator address.
  • I/O node: access to data; generally internal to the cluster or to the data centre.

  13. Interconnect

  14. Agenda Parallel Computing Concepts Clusters Cluster Usage

  15. Cluster Usage
  • Performance measurements
  • Usage model
  • Application classification
  • Application behaviour

  16. The Mysterious FLOPS
  1 GFLOPS = 1 billion floating-point operations per second.
  Theoretical vs. real GFLOPS for a Xeon processor:
  • Single-core theoretical peak = 4 × clock speed (double precision)
  • Xeons have 128-bit SSE registers, which let the processor carry out 2 double-precision floating-point adds and 2 multiplies per clock cycle
  • 2 computational cores per processor
  • 2 processors per node (4 cores per node)
  Sustained (Rmax) = ~35-80% of theoretical peak (interconnect dependent). You'll NEVER hit peak!
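The slide's peak arithmetic can be checked directly. The slide leaves the clock speed symbolic, so the 3.0 GHz figure below is an assumption for illustration:

```python
# The slide's peak-FLOPS arithmetic for a hypothetical dual-core,
# dual-socket Xeon node. The 3.0 GHz clock is an assumed value.
clock_ghz = 3.0          # assumed clock speed
flops_per_cycle = 4      # 2 SSE dp adds + 2 SSE dp multiplies per cycle
cores_per_cpu = 2
cpus_per_node = 2

core_peak = flops_per_cycle * clock_ghz                  # GFLOPS per core
node_peak = core_peak * cores_per_cpu * cpus_per_node    # GFLOPS per node

# Sustained (Rmax) at the slide's 35-80% of theoretical peak:
rmax_low, rmax_high = 0.35 * node_peak, 0.80 * node_peak
print(core_peak, node_peak)   # 12.0 48.0
```

So a four-core node at the assumed clock would advertise 48 theoretical GFLOPS but, depending on the interconnect, sustain only roughly 17-38 GFLOPS on a real workload.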

  17. Other Measures of CPU Performance
  • SPEC (www.spec.org)
    • SPEC CPU2000/2006 Speed – single-core performance indicator
    • SPEC CPU2000/2006 Rate – node performance indicator
    • SPECfp – floating-point performance
    • SPECint – integer performance
  • Many other performance metrics may be required:
    • STREAM – memory bandwidth
    • HPL – High Performance Linpack
    • NPB – the NAS Parallel Benchmarks (NASA)
    • Pallas Parallel Benchmark – another suite
    • IOzone – file system throughput

  18. Technology Advancements in 5 Years
  [Comparison table not captured in the transcript.]
  • * From the November 2001 TOP500 supercomputer list (cluster of Dell Precision 530)
  • ** Intel internal cluster built in 2006

  19. Usage Model
  • Normal mixed usage: many users, mixed-size parallel/serial jobs; ability to partition and allocate jobs to nodes for best performance; job scheduling very important.
  • Many serial jobs (capacity): Monte Carlo, design optimisation, parallel search; batch usage; load balancing more important.
  • One big parallel job (capability): meteorology, seismic analysis, fluid dynamics, molecular chemistry, electronic design; appliance usage; interconnect more important.

  20. Application and Usage Model
  HPC clusters run parallel applications, and applications in parallel! One single application can take advantage of multiple computing platforms:
  • Fine-grained application: uses many systems to run one application; shares data heavily across systems. Example: PDVR3D (eigenvalues and eigenstates of a matrix).
  • Coarse-grained application: uses many systems to run one application; infrequent data sharing among systems. Example: Casino (Monte Carlo stochastic methods).
  • Embarrassingly parallel application: an instance of the entire application runs on each node; little or no data sharing among compute nodes. Example: BLAST (pattern matching).
  A shared-memory machine will run all sorts of applications.

  21. Types of Applications
  • Forward modelling
  • Inversion
  • Signal processing
  • Searching/comparing

  22. Forward Modelling
  • Solving linear equations
  • Grid based
  • Parallelization by domain decomposition (split and distribute the data)
  • Finite element / finite difference
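Parallelization by domain decomposition can be sketched for a 1-D finite-difference update: each "node" owns a chunk of the grid and receives one halo value from each neighbour before every step. The grid values and the simple averaging stencil are assumptions for illustration:

```python
# Minimal domain-decomposition sketch: split the grid, update each chunk
# using halo values from its neighbours (what an MPI halo exchange does).
def split(grid, parts):
    n = len(grid) // parts
    return [grid[i * n:(i + 1) * n] for i in range(parts)]

def step_chunk(left_halo, chunk, right_halo):
    ext = [left_halo] + chunk + [right_halo]
    # 3-point averaging stencil: u[i] <- (u[i-1] + u[i] + u[i+1]) / 3
    return [(ext[i - 1] + ext[i] + ext[i + 1]) / 3
            for i in range(1, len(ext) - 1)]

grid = [0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
chunks = split(grid, 2)                      # distribute the data
# Exchange boundary values between the two chunks; the domain edges use
# fixed 0.0 boundary conditions.
new = (step_chunk(0.0, chunks[0], chunks[1][0])
       + step_chunk(chunks[0][-1], chunks[1], 0.0))
print(new)  # [0.0, 1.0, 2.0, 2.0, 1.0, 0.0]
```

Because each chunk only needs its neighbours' edge values, communication per step is constant while the per-chunk compute grows with chunk size, which is why this decomposition scales well.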

  23. Inversion
  From measurements (F), compute models (M) representing properties (d) of the measured object(s).
  • Deterministic: matrix inversions, conjugate gradient
  • Stochastic: Monte Carlo, Markov chain, genetic algorithms
  • Generally large amounts of shared memory
  • Parallelism through multiple runs with different models
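The stochastic branch, with its "parallelism through multiple runs with different models", can be sketched as a Monte Carlo search: draw random candidate models, keep the one whose forward prediction best fits the measurements. The one-parameter forward model F = d·x is a toy assumption for illustration:

```python
# Toy Monte Carlo inversion: each random trial model is evaluated
# independently, which is why multiple runs parallelize trivially.
import random

random.seed(42)
xs = [1.0, 2.0, 3.0]
measured = [2.0, 4.0, 6.0]           # generated by a true d = 2

def misfit(d):
    # Sum of squared differences between prediction d*x and measurement.
    return sum((d * x - f) ** 2 for x, f in zip(xs, measured))

# 10,000 independent candidate models drawn at random; on a cluster each
# node would evaluate its own batch of trials.
trials = [random.uniform(0.0, 5.0) for _ in range(10_000)]
best = min(trials, key=misfit)
print(round(best, 2))
```

Real stochastic inversions (Markov chain Monte Carlo, genetic algorithms) sample far more cleverly than this uniform draw, but they share the same property: trials are independent, so the work scatters naturally across nodes.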

  24. Signal Processing / Quantum Mechanics
  • Convolution model (stencil)
  • Matrix computations (eigenvalues, …)
  • Conjugate gradient methods
  • Normally not very demanding on latency and bandwidth
  • Some algorithms are embarrassingly parallel
  • Examples: seismic migration/processing, medical imaging, SETI@home
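The convolution (stencil) model can be sketched as a direct 1-D convolution; the signal and the symmetric 3-point kernel below are illustrative values (with a symmetric kernel, correlation and convolution coincide):

```python
# Direct "valid" 1-D convolution: slide the kernel over the signal and
# take a weighted sum at each position. Each output element depends only
# on a small window of input, so the signal can be chunked across nodes.
def convolve(signal, kernel):
    n, k = len(signal), len(kernel)
    out = []
    for i in range(n - k + 1):
        out.append(sum(signal[i + j] * kernel[j] for j in range(k)))
    return out

# A 3-point weighted smoothing stencil applied to a single spike:
print(convolve([0, 0, 6, 0, 0], [1, 2, 1]))  # [6, 12, 6]
```

Chunks only need to share a kernel-width overlap at their edges, which matches the slide's point that these workloads are not very demanding on interconnect latency or bandwidth.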

  25. Signal Processing Example

  26. Searching/Comparing
  • Integer operations dominate over floating point
  • I/O intensive
  • Pattern matching
  • Embarrassingly parallel – very suitable for grid computing
  • Example domains: encryption/decryption, message interception, bio-informatics, data mining
  • Example applications: BLAST, HMMER
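The searching/comparing pattern (integer-heavy, embarrassingly parallel pattern matching) can be sketched as substring counting over independently searchable chunks. The sequences are toy stand-ins for BLAST-style data, and matches spanning a chunk boundary are ignored in this sketch:

```python
# Count exact occurrences of a pattern; purely integer/comparison work,
# no floating point, matching the slide's characterization.
def count_matches(text, pattern):
    return sum(1 for i in range(len(text) - len(pattern) + 1)
               if text[i:i + len(pattern)] == pattern)

# Each chunk can be searched on a different node with no communication;
# only the per-chunk counts are combined at the end.
chunks = ["GATTACAGAT", "TTGATGATGA", "ACGTACGT"]
total = sum(count_matches(c, "GAT") for c in chunks)
print(total)  # 4
```

The final reduction (summing counts) is the only shared step, which is why this class of workload also suits loosely coupled grid computing.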

  27. Application Classes
  • FEA – Finite Element Analysis: the simulation of hard physical materials, e.g. metal, plastic. Crash tests, product design, suitability for purpose. Examples: MSC Nastran, Ansys, LS-Dyna, Abaqus, ESI PAM-Crash, Radioss.
  • CFD – Computational Fluid Dynamics: the simulation of soft physical materials, gases and fluids. Engine design, airflow, oil-reservoir modelling. Examples: Fluent, Star-CD, CFX.
  • Geophysical sciences: seismic imaging – taking echo traces and building a picture of the sub-earth geology; reservoir simulation – CFD specific to oil-asset management. Examples: Omega, Landmark VIP and Pro/Max, GeoQuest Eclipse.

  28. Application Classes
  • Life sciences: understanding the living world – genome matching, protein folding, drug design, bio-informatics, organic chemistry. Examples: BLAST, Gaussian, others.
  • High-energy physics: understanding the atomic and sub-atomic world. Software from Fermilab or CERN, or home-grown.
  • Financial modelling: meeting internal and external financial targets, particularly regarding investment positions. VaR (Value at Risk) – assessing the impact of economic and political factors on the bank's investment portfolio. Trader risk analysis – what is the risk on a trader's position, or on a group of traders' positions?