Real Parallel Computers

Presentation Transcript


  1. Real Parallel Computers

  2. Modular data centers

  3. Background Information
     • Recent trends in the marketplace of high performance computing, Strohmaier, Dongarra, Meuer, Simon, Parallel Computing, 2005

  4. Short history of parallel machines
     • 1970s: vector computers
     • 1990s: Massively Parallel Processors (MPPs)
       • Standard microprocessors, special network and I/O
     • 2000s:
       • Cluster computers (using standard PCs)
       • Advanced architectures (BlueGene)
       • Comeback of vector computers (Japanese Earth Simulator)
       • IBM Cell/BE
     • 2010s:
       • Multi-cores, GPUs
       • Cloud data centers

  5. Performance development and predictions

  6. Clusters
     • Cluster computing
       • Standard PCs/workstations connected by a fast network
       • Good price/performance ratio
       • Exploit existing (idle) machines or use (new) dedicated machines
     • Cluster computers vs. supercomputers (MPPs)
       • Processing power similar: based on microprocessors
       • Communication performance was the key difference
       • Modern networks have bridged this gap (Myrinet, Infiniband, 10G Ethernet); see the ping-pong sketch below
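Not on the original slides: a minimal MPI ping-pong sketch in C, which is the usual way such point-to-point latency and throughput differences between cluster networks are measured. It assumes an MPI implementation (e.g. Open MPI or MPICH) is available; MSG_SIZE and REPS are arbitrary illustrative values.

    #include <mpi.h>
    #include <stdio.h>

    #define REPS 1000
    #define MSG_SIZE (1 << 20)   /* 1 MB messages for a bandwidth estimate */

    int main(int argc, char **argv) {
        static char buf[MSG_SIZE];
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < REPS; i++) {
            if (rank == 0) {            /* rank 0: send, then wait for the echo */
                MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {     /* rank 1: echo every message back */
                MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t = (MPI_Wtime() - t0) / (2.0 * REPS);  /* one-way time per message */
        if (rank == 0)
            printf("one-way time %.2f us, throughput %.2f Gbit/s\n",
                   t * 1e6, 8.0 * MSG_SIZE / t / 1e9);
        MPI_Finalize();
        return 0;
    }

Run it with two processes placed on different nodes (e.g. mpirun -np 2 ./pingpong under the cluster's scheduler); rerunning with a tiny MSG_SIZE gives one-way latency figures of the kind quoted for DAS-4 on slide 14.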

  7. Overview
     • Cluster computers at our department
       • DAS-1: 128-node Pentium-Pro / Myrinet cluster (gone)
       • DAS-2: 72-node dual-Pentium-III / Myrinet-2000 cluster
       • DAS-3: 85-node dual-core dual-Opteron / Myrinet-10G cluster
       • DAS-4: 72-node cluster with accelerators (GPUs etc.)
     • Part of a wide-area system: Distributed ASCI Supercomputer

  8. Distributed ASCI Supercomputer (1997-2001)

  9. DAS-2 Cluster (2002-2006)
     • 72 nodes, each with 2 CPUs (144 CPUs in total)
       • 1 GHz Pentium-III
       • 1 GB memory per node
       • 20 GB disk
       • Fast Ethernet 100 Mbit/s
       • Myrinet-2000 2 Gbit/s (crossbar)
     • Operating system: Red Hat Linux
     • Part of wide-area DAS-2 system (5 clusters with 200 nodes in total)
     (diagram: nodes connected to an Ethernet switch and a Myrinet switch)

  10. DAS-3 Cluster (Sept. 2006)
      • 85 nodes, each with 2 dual-core CPUs (340 cores in total)
        • 2.4 GHz AMD Opterons (64 bit)
        • 4 GB memory per node
        • 250 GB disk
        • Gigabit Ethernet
        • Myrinet-10G 10 Gb/s (crossbar)
      • Operating system: Scientific Linux
      • Part of wide-area DAS-3 system (5 clusters; 263 nodes), using SURFnet-6 optical network with 40-80 Gb/s wide-area links

  11. DAS-3 Networks
      (network diagram: 85 compute nodes with 85 * 1 Gb/s Ethernet to a Nortel 5530 + 3 * 5510 Ethernet switch and 85 * 10 Gb/s Myrinet to a Myri-10G switch with a 10 Gb/s Ethernet blade; headnode with 10 TB mass storage; 8 * 10 Gb/s Ethernet (fiber) to a Nortel OME 6500 with DWDM blade onto 80 Gb/s DWDM SURFnet6; 1 or 10 Gb/s campus uplink)

  12. DAS-3 Networks: the Myrinet and Nortel switches (images)

  13. DAS-4 (Sept. 2010)
      • 72 nodes (2 quad-core Intel Westmere Xeon E5620, 24 GB memory, 2 TB disk)
      • 2 fat nodes with 94 GB memory
      • Infiniband network + 1 Gb/s Ethernet
      • 16 NVIDIA GTX 480 graphics accelerators (GPUs)
      • 2 Tesla C2050 GPUs

  14. DAS-4 performance
      • Infiniband network:
        • One-way latency: 1.9 microseconds
        • Throughput: 22 Gbit/s
      • CPU performance:
        • 72 nodes (576 cores): 4399.0 GFLOPS
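A quick sanity check on the CPU figure (my arithmetic, not on the slide): the 72 nodes have 2 sockets x 4 cores = 576 cores at 2.4 GHz, and assuming 4 double-precision flops per core per cycle (SSE, ignoring turbo), the theoretical peak and the efficiency implied by the measured 4399.0 GFLOPS are roughly

\[
P_{\text{peak}} = 576 \times 2.4\,\text{GHz} \times 4 = 5529.6\ \text{GFLOPS},
\qquad
\frac{4399.0}{5529.6} \approx 0.80 ,
\]

i.e. about 80% of peak, in the range one would expect for a LINPACK-style run over Infiniband.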

  15. Blue Gene/L Supercomputer

  16. Blue Gene/L System (packaging hierarchy, from chip to full system)
      • Chip: 2 processors, 2.8/5.6 GF/s, 4 MB
      • Compute Card: 2 chips (1x2x1), 5.6/11.2 GF/s, 1.0 GB
      • Node Card: 32 chips (4x4x2), 16 compute + 0-2 I/O cards, 90/180 GF/s, 16 GB
      • Rack: 32 node cards, 2.8/5.6 TF/s, 512 GB
      • System: 64 racks (64x32x32), 180/360 TF/s, 32 TB
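The per-level figures are simply the chip numbers multiplied up the packaging hierarchy (with some rounding on the slide); for the peak flop rates, for instance,

\[
\text{node card} = 32 \times 2.8/5.6\ \text{GF/s} \approx 90/180\ \text{GF/s},\qquad
\text{rack} = 32 \times 90/180\ \text{GF/s} \approx 2.8/5.6\ \text{TF/s},\qquad
\text{system} = 64 \times 2.8/5.6\ \text{TF/s} \approx 180/360\ \text{TF/s},
\]

and likewise for memory: 16 GB per node card, 32 x 16 GB = 512 GB per rack, 64 x 512 GB = 32 TB for the full system.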

  17. Blue Gene/L Networks
      • 3 Dimensional Torus
        • Interconnects all compute nodes (65,536)
        • Virtual cut-through hardware routing
        • 1.4 Gb/s on all 12 node links (2.1 GB/s per node)
        • 1 µs latency between nearest neighbors, 5 µs to the farthest
        • Communications backbone for computations
        • 0.7/1.4 TB/s bisection bandwidth, 68 TB/s total bandwidth
      • Global Collective
        • One-to-all broadcast functionality
        • Reduction operations functionality
        • 2.8 Gb/s of bandwidth per link
        • Latency of one-way traversal 2.5 µs
        • Interconnects all compute and I/O nodes (1024)
      • Low Latency Global Barrier and Interrupt
        • Latency of round trip 1.3 µs
      • Ethernet
        • Incorporated into every node ASIC
        • Active in the I/O nodes (1:8-64)
        • All external comm. (file I/O, control, user interaction, etc.)
      • Control Network
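To connect the torus topology above with the latency numbers, here is a small C sketch (not from the slides) of minimal-path hop counting on a 3D torus: on the full 64x32x32 machine the farthest node pair is 32 + 16 + 16 = 64 hops away, which is why the worst-case latency (5 µs) sits a few microseconds above the nearest-neighbor figure (1 µs). The coordinate scheme and routing here are illustrative only, not Blue Gene's actual routing implementation.

    #include <stdio.h>
    #include <stdlib.h>

    /* Shortest hop count between positions a and b on a ring of the given size
     * (a torus ring can be traversed in either direction). */
    static int ring_hops(int a, int b, int size) {
        int d = abs(a - b);
        return d < size - d ? d : size - d;
    }

    /* Total hops on a 3D torus, assuming minimal routing in each dimension. */
    static int torus_hops(const int src[3], const int dst[3], const int dims[3]) {
        int hops = 0;
        for (int i = 0; i < 3; i++)
            hops += ring_hops(src[i], dst[i], dims[i]);
        return hops;
    }

    int main(void) {
        const int dims[3]     = {64, 32, 32};  /* full Blue Gene/L system */
        const int origin[3]   = {0, 0, 0};
        const int farthest[3] = {32, 16, 16};  /* halfway around each ring */
        printf("maximum hop count: %d\n", torus_hops(origin, farthest, dims));  /* 64 */
        return 0;
    }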
