
Introduction to High Performance Computing


  1. Introduction to High Performance Computing Jon Johansson Academic ICT University of Alberta

  2. Agenda • What is High Performance Computing? • What is a “supercomputer”? • is it a mainframe? • Supercomputer architectures • Who has the fastest computers? • Speedup • Programming for parallel computing • The GRID??

  3. High Performance Computing • HPC is the field that concentrates on developing supercomputers and software to run on supercomputers • a main area of this discipline is developing parallel processing algorithms and software • programs that can be divided into little pieces so that each piece can be executed simultaneously by separate processors
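To make that idea concrete, here is a minimal sketch (my example, not from the slides) of a program divided into little pieces with OpenMP: the loop iterations are independent, so the runtime can hand a chunk of them to each processor and combine the partial results at the end.

```c
/* Minimal illustration of splitting a loop across processors with
 * OpenMP.  Compile with: gcc -fopenmp sum.c */
#include <stdio.h>
#include <omp.h>

int main(void) {
    const int N = 1000000;
    double sum = 0.0;

    /* each thread gets a chunk of iterations; the reduction
     * combines the per-thread partial sums at the end */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += 1.0 / (double)(i + 1);

    printf("harmonic sum of %d terms = %f\n", N, sum);
    return 0;
}
```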

  4. High Performance Computing • HPC is about “big problems”, i.e. need: • lots of memory • many cpu cycles • big hard drives • no matter what field you work in, perhaps your research would benefit by making problems “larger” • 2d → 3d • finer mesh • increase number of elements in the simulation

  5. Grand Challenges • weather forecasting • economic modeling • computer-aided design • drug design • exploring the origins of the universe • searching for extra-terrestrial life • computer vision • nuclear power and weapons simulations

  6. Grand Challenges – Protein • to simulate the folding of a 300 amino acid protein in water: • # of atoms: ~32,000 • folding time: 1 millisecond • # of FLOPs: 3 × 10²² • machine speed: 1 PetaFLOP/s • simulation time: 1 year (source: IBM Blue Gene Project) • Ken Dill and Kit Lau’s protein folding model (figure: Charles L. Brooks III, Scripps Research Institute) • IBM’s answer: the Blue Gene Project – US$100 M of funding to build a 1 PetaFLOP/s computer

  7. Grand Challenges - Nuclear • National Nuclear Security Administration • http://www.nnsa.doe.gov/ • use supercomputers to run three-dimensional codes to simulate weapons instead of testing them • address critical problems of materials aging • simulate the environment of the weapon and try to gauge whether the device continues to be usable • stockpile science, molecular dynamics and turbulence calculations http://archive.greenpeace.org/comms/nukes/fig05.gif

  8. Grand Challenges - Nuclear • ASCI White • March 7, 2002: first full-system three-dimensional simulations of a nuclear weapon explosion • the simulation used more than 480 million cells (a 780×780×780 grid, if the grid is a cube) • 1,920 processors on IBM ASCI White at the Lawrence Livermore National Laboratory • 2,931 wall-clock hours, or 122.5 days • 6.6 million CPU hours • Test shot “Badger”, Nevada Test Site – Apr. 1953, yield: 23 kilotons • http://nuclearweaponarchive.org/Usa/Tests/Upshotk.html

  9. Grand Challenges - Nuclear • Advanced Simulation and Computing Program (ASC) • http://www.llnl.gov/asc/asc_history/asci_mission.html

  10. Agenda • What is High Performance Computing? • What is a “supercomputer”? • is it a mainframe? • Supercomputer architectures • Who has the fastest computers? • Speedup • Programming for parallel computing • The GRID??

  11. What is a “Mainframe”? • large and reasonably fast machines • the speed isn't the most important characteristic • high-quality internal engineering and resulting proven reliability • expensive but high-quality technical support • top-notch security • strict backward compatibility for older software

  12. What is a “Mainframe”? • these machines can, and do, run successfully for years without interruption (long uptimes) • repairs can take place while the mainframe continues to run • the machines are robust and dependable • IBM coined a term to advertise the robustness of their mainframe computers: • Reliability, Availability and Serviceability (RAS)

  13. What is a “Mainframe”? • Introducing IBM System z9 109 • “Designed for the On Demand Business” • IBM is delivering a holistic approach to systems design • designed and optimized with a total systems approach • helps keep your applications running with enhanced protection against planned and unplanned outages • extended security capabilities for even greater protection • increased capacity with more available engines per server

  14. What is a Supercomputer?? • at any point in time the term “Supercomputer” refers to the fastest machines currently available • a supercomputer this year might be a mainframe in a couple of years • a supercomputer is typically used for scientific and engineering applications that must do a great amount of computation

  15. What is a Supercomputer?? • the most significant difference between a supercomputer and a mainframe: • a supercomputer channels all its power into executing a few programs as fast as possible • if the system crashes, restart the job(s) – no great harm done • a mainframe uses its power to execute many programs simultaneously • e.g. – a banking system • must run reliably for extended periods

  16. What is a Supercomputer?? • to see the world’s “fastest” computers look at • http://www.top500.org/ • performance is measured with the Linpack benchmark • http://www.top500.org/lists/linpack.php • solve a dense system of linear equations • the performance numbers give a good indication of peak performance

  17. Terminology • combining a number of processors to run a program is called variously: • multiprocessing • parallel processing • coprocessing

  18. Terminology • parallel computing – harnessing a bunch of processors on the same machine to run your computer program • note that this is one machine • generally a homogeneous architecture • same processors, memory, operating system • all the machines in the Top 500 are in this category

  19. Terminology • distributed computing - harnessing a bunch of processors on different machines to run your computer program • heterogeneous architecture • different operating systems, cpus, memory • the terms “parallel” and “distributed” computing are often used interchangeably • the work is divided into sections so each processor does a unique piece

  20. Terminology • some distributed computing projects are built on BOINC (Berkeley Open Infrastructure for Network Computing): • SETI@home – Search for Extraterrestrial Intelligence • Proteins@home – deduces DNA sequence, given a protein • Hydrogen@home – enhance clean energy technology by improving hydrogen production and storage (currently in beta)

  21. Quantify Computer Speed • we want a way to compare computer speeds • count the number of “floating point operations” required to solve the problem • + − × / • benchmark results are reported as so many Floating point Operations Per Second (FLOPS) • a supercomputer is a machine that can provide a very large number of FLOPS

  22. Floating Point Operations • multiply two 1000×1000 matrices • for each resulting array element • 1000 multiplies • 999 adds • do this 1,000,000 times • ~2 × 10⁹ operations needed • increasing the array size has the number of operations increasing as O(N³)
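A rough sketch of this count in C (my example, not from the slides): the triple loop below performs exactly the multiplies and adds the slide describes, and dividing the operation count by the elapsed time gives a FLOPS estimate.

```c
/* Count the floating point operations in a naive N x N matrix
 * multiply and report the achieved FLOPS.
 * Compile with: gcc -O2 matmul.c */
#include <stdio.h>
#include <time.h>

#define N 500   /* kept small so the demo runs in seconds */

int main(void) {
    static double a[N][N], b[N][N], c[N][N];

    /* fill the inputs with something non-trivial */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = (double)(i + j);
            b[i][j] = (double)(i - j);
        }

    clock_t start = clock();
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];  /* one multiply, one add */
            c[i][j] = sum;
        }
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

    /* each of the N*N results needs N multiplies and N-1 adds */
    double flops = (double)N * N * (2.0 * N - 1.0);
    printf("c[0][0] = %g\n", c[0][0]);
    printf("%.2e operations in %.3f s -> %.2f GFLOPS\n",
           flops, secs, flops / secs / 1e9);
    return 0;
}
```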

  23. Agenda • What is High Performance Computing? • What is a “supercomputer”? • is it a mainframe? • Supercomputer architectures • Who has the fastest computers? • Speedup • Programming for parallel computing • The GRID??

  24. High Performance Computing • supercomputers use many CPUs to do the work • note that all supercomputing architectures have • processors and some combination of cache • some form of memory and I/O • the processors are separated from the other processors by some distance • there are major differences in the way that the parts are connected • some problems fit into different architectures better than others

  25. High Performance Computing • increasing computing power available to researchers allows • increasing problem dimensions • adding more particles to a system • increasing the accuracy of the result • improving experiment turnaround time

  26. Flynn’s Taxonomy • Michael J. Flynn (1972) • classified computer architectures based on the number of concurrent instructions and data streams available • single instruction, single data (SISD) – basic old PC • multiple instruction, single data (MISD) – redundant systems • single instruction, multiple data (SIMD) – vector (or array) processor • multiple instruction, multiple data (MIMD) – shared or distributed memory systems: symmetric multiprocessors and clusters • common extension: • single program (or process), multiple data (SPMD)
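A minimal SPMD sketch using MPI (an illustration, not from the slides): every process runs the same program and uses its rank to pick its share of the data, which is exactly the "single program, multiple data" pattern named above.

```c
/* SPMD illustration: one program, many data.  Every process runs
 * this same code and uses its rank to select its slice of the work.
 * Compile: mpicc spmd.c    Run: mpirun -np 4 ./a.out */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* each process sums a different stride of 1..1000 */
    long local = 0, total = 0;
    for (long i = rank + 1; i <= 1000; i += size)
        local += i;

    /* combine the partial results on rank 0 */
    MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum 1..1000 = %ld (computed by %d processes)\n",
               total, size);

    MPI_Finalize();
    return 0;
}
```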

  27. Architectures • we can also classify supercomputers according to how the processors and memory are connected • couple processors to a single large memory address space • couple computers, each with its own memory address space

  28. Architectures • Symmetric Multiprocessing (SMP) • Uniform Memory Access (UMA) • multiple CPUs, residing in one cabinet, share the same memory • processors and memory are tightly coupled • the processors share memory and the I/O bus or data path

  29. Architectures • SMP • a single copy of the operating system is in charge of all the processors • SMP systems range from two to as many as 32 or more processors

  30. Architectures • SMP • "capability computing" • one CPU can use all the memory • all the CPUs can work on a little memory • whatever you need

  31. Architectures • UMA-SMP negatives • as the number of CPUs get large the buses become saturated • long wires cause latency problems

  32. Architectures • Non-Uniform Memory Access (NUMA) • NUMA is similar to SMP - multiple CPUs share a single memory space • hardware support for shared memory • memory is separated into close and distant banks • basically a cluster of SMPs • memory on the same processor board as the CPU (local memory) is accessed faster than memory on other processor boards (shared memory) • hence "non-uniform" • NUMA architecture scales much better to higher numbers of CPUs than SMP

  33. Architectures

  34. Architectures • photos: the University of Alberta SGI Origin; SGI NUMA cables

  35. Architectures • Cache Coherent NUMA (ccNUMA) • each CPU has an associated cache • ccNUMA machines use special-purpose hardware to maintain cache coherence • typically done by using inter-processor communication between cache controllers to keep a consistent memory image when the same memory location is stored in more than one cache • ccNUMA performs poorly when multiple processors attempt to access the same memory area in rapid succession
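One classic way this contention shows up is false sharing. The sketch below (my example, not from the slides) has two threads updating adjacent counters that live on the same cache line, so the coherence hardware bounces the line between caches; padding the counters a cache line apart removes the contention and typically runs several times faster.

```c
/* Demonstrates the access pattern that hurts on cache-coherent
 * machines: two threads hammer counters that share one cache line.
 * Compile with: gcc -fopenmp -O2 false_sharing.c */
#include <stdio.h>
#include <omp.h>

#define ITERS 100000000L

/* volatile keeps the compiler from collapsing the increment loops */
volatile long counters[16];

int main(void) {
    double t0 = omp_get_wtime();
    #pragma omp parallel num_threads(2)
    {
        int id = omp_get_thread_num();
        for (long i = 0; i < ITERS; i++)
            counters[id]++;      /* adjacent longs: same cache line */
    }
    printf("adjacent counters: %.2f s\n", omp_get_wtime() - t0);

    t0 = omp_get_wtime();
    #pragma omp parallel num_threads(2)
    {
        int id = omp_get_thread_num();
        for (long i = 0; i < ITERS; i++)
            counters[id * 8]++;  /* 64 bytes apart: separate lines */
    }
    printf("padded counters:   %.2f s\n", omp_get_wtime() - t0);
    return 0;
}
```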

  36. Architectures • Distributed Memory Multiprocessor (DMMP) • each computer has its own memory address space • looks like NUMA but there is no hardware support for remote memory access • the special purpose switched network is replaced by a general purpose network such as Ethernet, or more specialized interconnects: • InfiniBand • Myrinet • Lattice: Calgary’s HP ES40 and ES45 cluster – each node has 4 processors

  37. Architectures • Massively Parallel Processing (MPP) – cluster of commodity PCs • processors and memory are loosely coupled • "capacity computing" • each CPU contains its own memory and copy of the operating system and application • each subsystem communicates with the others via a high-speed interconnect • in order to use MPP effectively, a problem must be breakable into pieces that can all be solved simultaneously

  38. Architectures

  39. Architectures • lots of “how to build a cluster” tutorials on the web – just Google: • http://www.beowulf.org/ • http://www.cacr.caltech.edu/beowulf/tutorial/building.html

  40. Architectures • Vector Processor or Array Processor • a CPU design that is able to run mathematical operations on multiple data elements simultaneously • a scalar processor operates on data elements one at a time • vector processors formed the basis of most supercomputers through the 1980s and into the 1990s • “pipeline” the data

  41. Architectures • Vector Processor or Array Processor • operate on many pieces of data simultaneously • consider the following add instruction: • C = A + B • on both scalar and vector machines this means: • add the contents of A to the contents of B and put the sum in C • on a scalar machine the operands are numbers • on a vector machine the operands are vectors and the instruction directs the machine to compute the pair-wise sum of each pair of vector elements
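A small sketch of the contrast (my example, not from the slides), using x86 SSE2 intrinsics as a stand-in for a vector unit: the scalar loop adds one pair of elements per instruction, while each vector instruction adds two pairs at once.

```c
/* Scalar vs. vector add, with SSE2 standing in for a vector machine.
 * Compile with: gcc -msse2 vecadd.c */
#include <stdio.h>
#include <emmintrin.h>   /* SSE2: 128-bit vectors = 2 doubles at a time */

#define N 8

int main(void) {
    double a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 10.0 * i; }

    /* scalar machine: one pair of operands per add instruction */
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    /* vector machine: one instruction computes the pair-wise sum of
     * a whole (short) vector; here _mm_add_pd adds two pairs at once */
    for (int i = 0; i < N; i += 2) {
        __m128d va = _mm_loadu_pd(&a[i]);
        __m128d vb = _mm_loadu_pd(&b[i]);
        _mm_storeu_pd(&c[i], _mm_add_pd(va, vb));
    }

    for (int i = 0; i < N; i++)
        printf("c[%d] = %g\n", i, c[i]);
    return 0;
}
```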

  42. Architectures • University of Victoria has 4 NEC SX-6/8A vector processors • in the School of Earth and Ocean Sciences • each has 32 GB of RAM • 8 vector processors in the box • peak performance is 72 GFLOPS

  43. Agenda • What is High Performance Computing? • What is a “supercomputer”? • is it a mainframe? • Supercomputer architectures • Who has the fastest computers? • Speedup • Programming for parallel computing • The GRID??

  44. BlueGene/L • The fastest on the Nov. 2007 top 500 list: • http://www.top500.org/ • installed at the Lawrence Livermore National Laboratory (LLNL) (US Department of Energy) • Livermore California

  45. http://www.llnl.gov/asc/platforms/bluegenel/photogallery.html

  46. BlueGene/L • processors: 212,992 • memory: 72 TB • 104 racks – each has 2,048 processors • the first 64 racks had 512 GB of RAM each (256 MB/processor) • the 40 new racks have 1 TB of RAM each (512 MB/processor) • Linpack performance of 478.2 TFlop/s • in Nov. 2005 it was the only system ever to exceed the 100 TFlop/s mark • there are now 10 machines over 100 TFlop/s

  47. The Fastest Six

  48. # of Processors with Time • The number of processors in the fastest machines has increased by about a factor of 200 in the last 15 years.

  49. # of GFLOPS Increase with Time • Machine speed has increased by more than a factor of 5000 in the last 15 years.

  50. Future BlueGene
