Scalable Parallel Computing

Presentation Transcript

  1. Scalable Parallel Computing, CENG 546, Dr. Esma Yıldırım

  2. What is a computing cluster? A computing cluster consists of a collection of interconnected stand-alone (complete) computers that work together cooperatively as a single, integrated computing resource. A cluster exploits parallelism at the job level and supports distributed computing with higher availability. A typical cluster: • Merges multiple system images into an SSI (single-system image) at certain functional levels • Applies low-latency communication protocols • Is more loosely coupled than an SMP with an SSI

  3. What is a Commodity Cluster? • It is a distributed/parallel computing system • It is constructed entirely from commodity subsystems • All subcomponents can be acquired commercially and separately • Computing elements (nodes) are employed as fully operational, standalone mainstream systems • Two major subsystems: • Compute nodes • System area network (SAN) • Employs industry-standard interfaces for integration • Uses industry-standard software for the majority of services • Incorporates additional middleware for interoperability among elements • Uses software for coordinated programming of elements in parallel

  4. Multicomputer Clusters: • Cluster: A network of computers supported by middleware and interacting by message passing • PC Cluster (Most Linux clusters) • Workstation Cluster (NOW, COW) • Server cluster or Server Farm • Cluster of SMPs or ccNUMA systems • Cluster-structured massively parallel processors (MPP) – about 85% of the top-500 systems
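Message passing, the interaction style named in the cluster definition above, can be sketched without a real cluster. The following is a minimal simulation under stated assumptions: threads stand in for nodes and `queue.Queue` objects stand in for the network. Each simulated node sums only its own chunk of the data and sends the partial result to node 0, in the spirit of an MPI reduce. The names `node` and `cluster_sum` are hypothetical; Python threads actually share memory, so the "separate address spaces" here are a convention only, whereas a real cluster would run an MPI library across separate machines.

```python
import queue
import threading


def node(rank, local_data, mailboxes, result):
    """Simulated cluster node: computes on its own data, then exchanges messages."""
    partial = sum(local_data)  # purely local computation on this node's chunk
    if rank == 0:
        total = partial
        # Node 0 receives one message from every other node.
        for _ in range(len(mailboxes) - 1):
            _, value = mailboxes[0].get()  # blocking receive: (sender rank, value)
            total += value
        result["total"] = total
    else:
        mailboxes[0].put((rank, partial))  # "send" the partial sum to node 0


def cluster_sum(data, num_nodes):
    """Scatter data over num_nodes simulated nodes and reduce the sum to node 0."""
    chunks = [data[i::num_nodes] for i in range(num_nodes)]  # cyclic split
    # One mailbox per node; only node 0's mailbox is used by this reduction.
    mailboxes = [queue.Queue() for _ in range(num_nodes)]
    result = {}
    threads = [
        threading.Thread(target=node, args=(r, chunks[r], mailboxes, result))
        for r in range(num_nodes)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["total"]


if __name__ == "__main__":
    print(cluster_sum(list(range(1, 101)), num_nodes=4))  # 5050
```

The key property is that no node reads another node's data directly; all cooperation happens through explicit send and receive operations.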

  5. Operational Benefits of Clustering • System availability (HA): A cluster offers inherently high system availability due to the redundancy of hardware, operating systems, and applications. • Hardware fault tolerance: A cluster has some degree of redundancy in most system components, including both hardware and software modules. • OS and application reliability: Running multiple copies of the OS and applications provides reliability through this redundancy. • Scalability: Servers can be added to a cluster, or more clusters to a network, as application needs arise. • High performance: Running cluster-enabled programs yields higher throughput.
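The availability-through-redundancy argument can be made quantitative with a common textbook model that the slides do not state explicitly: a single node's steady-state availability is MTTF / (MTTF + MTTR), and if replicas fail independently, a service that needs only one live replica is available with probability 1 − (1 − a)^n. The sketch below assumes independent failures, which real clusters only approximate (shared power, network, or software bugs cause correlated failures); the function names are hypothetical.

```python
def node_availability(mttf_hours, mttr_hours):
    """Steady-state availability of one node: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def redundant_availability(a, n):
    """Availability when any 1 of n replicas suffices.

    Assumes replicas fail independently, which real clusters only approximate.
    """
    return 1.0 - (1.0 - a) ** n


if __name__ == "__main__":
    # Hypothetical node: fails every 990 hours on average, 10 hours to repair.
    a = node_availability(990, 10)
    for n in (1, 2, 3):
        print(f"{n} replica(s): availability = {redundant_availability(a, n):.6f}")
```

With these assumed numbers, availability rises from 0.99 with one replica to 0.9999 with two, which is why even modest redundancy matters for HA clusters.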

  6. Scalability • The ability to deliver proportionally greater sustained performance through increased system resources • Strong scaling • Fixed-size application problem • Application size remains constant as system size increases • Weak scaling • Variable-size application problem • Application size scales proportionally with system size • Capability computing • In its purest form: strong scaling • Marketing claims tend toward this class • Capacity computing • Throughput computing • Includes job-stream workloads • In its simplest form: weak scaling • Cooperative computing • Interacting and coordinating concurrent processes • Not a widely used term • Also: “coordinated computing”
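Strong and weak scaling are usually modeled with two standard laws that the slide does not name: Amdahl's law for a fixed problem size (strong scaling) and Gustafson's law for a problem that grows with the system (weak scaling). The sketch below assumes a serial fraction `f` of 5%; note that in Amdahl's law `f` is the serial share of the single-processor run, while in Gustafson's law it is the serial share of the parallel run, so the two numbers are not directly interchangeable.

```python
def amdahl_speedup(serial_fraction, p):
    """Strong scaling, fixed problem size: S(p) = 1 / (f + (1 - f) / p)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)


def gustafson_speedup(serial_fraction, p):
    """Weak scaling, problem grows with p: S(p) = f + (1 - f) * p."""
    return serial_fraction + (1.0 - serial_fraction) * p


def efficiency(speedup, p):
    """Parallel efficiency: achieved speedup divided by processor count."""
    return speedup / p


if __name__ == "__main__":
    f = 0.05  # assumed: 5% of the work is inherently serial
    for p in (1, 16, 256, 4096):
        s_strong = amdahl_speedup(f, p)
        s_weak = gustafson_speedup(f, p)
        print(f"p={p:5d}  strong S={s_strong:7.2f} (eff {efficiency(s_strong, p):.3f})"
              f"  weak S={s_weak:8.2f} (eff {efficiency(s_weak, p):.3f})")
```

Under strong scaling the speedup saturates near 1/f = 20 no matter how many processors are added, while under weak scaling it keeps growing almost linearly. This is one reason capability (strong-scaling) claims are harder to sustain than capacity (weak-scaling) ones.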

  7. Performance Metrics • Peak floating-point operations per second (flops) • Peak instructions per second (ips) • Sustained throughput • Average performance over a period of time • flops, Mflops, Gflops, Tflops, Pflops • flops, Megaflops, Gigaflops, Teraflops, Petaflops • ips, Mips, ops, Mops … • Cycles per instruction • cpi • Alternatively: instructions per cycle, ipc • Memory access latency • Measured in cycles or in seconds (e.g., nanoseconds) • Memory access bandwidth • bytes per second (Bps) • bits per second (bps) • or gigabytes per second, GBps, GB/s
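The metrics above are related by simple arithmetic. The sketch below computes a theoretical peak as nodes × cores × clock rate × flops per core per cycle, the fraction of peak a measured run sustains, IPC as the reciprocal of CPI, a latency conversion from cycles to seconds, and a byte-to-bit bandwidth conversion. Every machine parameter in the example (128 nodes, 32 cores, 2.5 GHz, 16 flops per cycle, a 120 Tflops measurement) is hypothetical.

```python
def peak_flops(nodes, cores_per_node, clock_hz, flops_per_cycle):
    """Theoretical peak = nodes x cores x clock rate x flops per core per cycle."""
    return nodes * cores_per_node * clock_hz * flops_per_cycle


def sustained_fraction(measured_flops, peak):
    """Fraction of peak actually sustained over a run."""
    return measured_flops / peak


def ipc_from_cpi(cpi):
    """Instructions per cycle is the reciprocal of cycles per instruction."""
    return 1.0 / cpi


def latency_seconds(cycles, clock_hz):
    """Convert a latency given in clock cycles into seconds."""
    return cycles / clock_hz


def bytes_to_bits_per_second(bytes_per_second):
    """1 byte = 8 bits, so bandwidth in B/s times 8 gives bit/s."""
    return bytes_per_second * 8


if __name__ == "__main__":
    # All machine parameters below are hypothetical.
    peak = peak_flops(nodes=128, cores_per_node=32, clock_hz=2.5e9, flops_per_cycle=16)
    print(f"peak = {peak / 1e12:.1f} Tflops")  # 163.8 Tflops
    print(f"sustained fraction = {sustained_fraction(120e12, peak):.2f}")
    print(f"IPC at CPI 0.5 = {ipc_from_cpi(0.5)}")  # 2.0
    print(f"100-cycle latency at 2.5 GHz = {latency_seconds(100, 2.5e9) * 1e9:.0f} ns")  # 40 ns
    print(f"10 GB/s = {bytes_to_bits_per_second(10e9) / 1e9:.0f} Gbit/s")  # 80 Gbit/s
```

The gap between peak and sustained performance is the reason the slide lists both: peak is a property of the hardware, while sustained throughput depends on the application.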

  8. Basic Uni-processor Architecture elements • I/O Interface • Memory Interface • Cache hierarchy • Register Sets • Control • Execution pipeline • Arithmetic Logic Units

  9. Multiprocessor • A general class of system • Integrates multiple processors into an interconnected ensemble • MIMD: Multiple Instruction Stream, Multiple Data Stream • Different memory models • Distributed memory • Nodes support separate address spaces • Shared memory • Symmetric multiprocessor • UMA – uniform memory access • Cache coherent • Distributed shared memory • NUMA – non-uniform memory access • Cache coherent • PGAS • Partitioned global address space • NUMA • Not cache coherent • Hybrid: ensemble of distributed shared memory nodes • Massively Parallel Processor, MPP
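The PGAS model in the list above can be illustrated with a hypothetical block distribution: the global index space is partitioned so that each place (node) owns one contiguous block, and every global index maps to an (owner, local offset) pair. Languages such as UPC and X10 offer several distributions; the blocked layout and the function names here are assumptions made for this sketch only. An access is local (fast) when the owner is the calling place and remote (NUMA, and per the slide not cache coherent) otherwise.

```python
def block_owner(global_index, n, num_places):
    """Map a global array index to (owner place, local offset) under a block layout."""
    if not 0 <= global_index < n:
        raise IndexError("global index out of range")
    block = -(-n // num_places)  # ceil(n / num_places) elements per place
    return global_index // block, global_index % block


def is_local(global_index, n, num_places, my_place):
    """True when my_place owns the element, i.e. the access needs no communication."""
    owner, _ = block_owner(global_index, n, num_places)
    return owner == my_place


if __name__ == "__main__":
    n, places = 10, 4  # hypothetical: 10 elements over 4 places, blocks of 3
    for i in range(n):
        owner, offset = block_owner(i, n, places)
        print(f"global[{i}] -> place {owner}, local[{offset}]")
```

With 10 elements over 4 places, places 0 to 2 each own three elements and place 3 owns the last one, so a uniform block size can leave the final partition short.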

  10. Massively Parallel Processor • MPP • General class of large-scale multiprocessor • Represents the largest systems • IBM BG/L • Cray XT3 • Distinguished by memory strategy • Distributed memory • Distributed shared memory • Cache coherent • Partitioned global address space • Custom interconnect network • Potentially heterogeneous • May incorporate accelerators to boost peak performance

  11. DM - MPP

  12. IBM Blue Gene/L

  13. IBM BlueGene/L Supercomputer: the world's fastest message-passing MPP, built in 2005 • Built jointly by IBM and LLNL teams and funded by the US DOE ASCI Research Program

  14. Symmetric Multiprocessor (SMP) • Building block for large MPPs • Multiple processors • 2 to 32 processors • Now multicore • Uniform Memory Access (UMA) shared memory • Every processor has equal access, in equal time, to all banks of the main memory • Cache coherent • Multiple copies of a variable are kept consistent by hardware
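The shared-memory programming model of an SMP can be sketched with threads that all update one variable in a single address space. This is an illustration under assumptions, not an SMP benchmark: CPython's global interpreter lock prevents true parallel execution of Python bytecode, and the function name is hypothetical. The point it shows is that cache coherence keeps copies of a variable consistent, but a read-modify-write such as `+= 1` is still not atomic, so software must synchronize it, here with `threading.Lock`.

```python
import threading


def parallel_increment(num_threads, increments_per_thread):
    """All threads update one counter that lives in a single shared address space."""
    counter = {"value": 0}  # shared: every thread sees the same object
    lock = threading.Lock()

    def worker():
        for _ in range(increments_per_thread):
            # On real SMP hardware, coherent caches keep copies consistent,
            # but a read-modify-write is still not atomic, so guard it.
            with lock:
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


if __name__ == "__main__":
    print(parallel_increment(4, 10_000))  # 40000
```

Contrast this with message passing on a cluster, where no thread could touch `counter` directly and updates would have to travel as explicit messages.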

  15. SMP - UMA

  16. SMP Node Diagram [Figure: four microprocessors (MP) with L1, L2, and L3 caches, memory banks M1 … Mn-1, storage (S), a PCI-e controller, JTAG, Ethernet, USB peripherals, and NICs] Legend: MP: microprocessor; L1, L2, L3: caches; M1..: memory banks; S: storage; NIC: network interface card

  17. DSM-NUMA: Distributed Shared Memory, Non-Uniform Memory Access

  18. Commodity Clusters vs “Constellations” [Figure: a 64-processor constellation (4 nodes × 16 processors) and a 64-processor commodity cluster (16 nodes × 4 processors), each connected by a system area network] • An ensemble of N nodes, each comprising p computing elements • The p elements are tightly bound by shared memory (e.g., SMP, DSM) • The N nodes are loosely coupled, i.e., distributed memory • In a constellation, p is greater than N • The distinction is which layer gives the most power through parallelism
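The constellation rule on this slide (p processors per shared-memory node greater than the number of nodes N) is easy to state as code. This is a minimal sketch with a hypothetical function name; the slide only defines the p > N case, so treating p ≤ N as a commodity cluster is an assumption made here. The example inputs are the two 64-processor systems in the slide's figure.

```python
def classify(num_nodes, procs_per_node):
    """Classify an N-node ensemble with p processors per node, following the slide.

    The slide states p > N for a constellation; treating p <= N as a
    commodity cluster is an assumption made for this sketch.
    """
    if num_nodes < 1 or procs_per_node < 1:
        raise ValueError("need at least one node and one processor per node")
    kind = "constellation" if procs_per_node > num_nodes else "commodity cluster"
    return kind, num_nodes * procs_per_node


if __name__ == "__main__":
    # The two 64-processor systems from the slide's figure.
    print(classify(num_nodes=4, procs_per_node=16))  # ('constellation', 64)
    print(classify(num_nodes=16, procs_per_node=4))  # ('commodity cluster', 64)
```

Both systems have the same processor count; what differs is whether most of the parallelism sits inside the shared-memory nodes (constellation) or across the distributed-memory network (commodity cluster).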

  19. System Stack • Science problems: environmental modeling, physics, computational chemistry, etc. • Applications: coastal modeling, black hole simulations, etc. • Algorithms: PDE, Gaussian elimination, 12 Dwarves, etc. • Program source code • Programming languages: Fortran, C, C++, UPC, Fortress, X10, etc. • Compilers: Intel C/C++/Fortran compilers, PGI C/C++/Fortran, IBM XLC, XLC++, XLF, etc. • Runtime systems: Java runtime, MPI, etc. • Model of computation • Operating systems: Linux, Unix, AIX, etc. • Systems architecture: vector, SIMD array, MPP, commodity cluster • Firmware: motherboard chipset, BIOS, NIC drivers • Microarchitectures: Intel/AMD x86, Sun SPARC, IBM POWER5/6 • Logic design: RTL • Circuit design: ASIC, FPGA, custom VLSI • Device technology: NMOS, CMOS, TTL, optical

  20. Historical Top-500 List

  21. Clusters Dominate Top-500