BGL Photo (system)

Presentation Transcript


  1. BGL Photo (system). BlueGene/L. IBM Journal of Research and Development, Vol. 49, No. 2-3. <http://www.research.ibm.com/journal/rd49-23.html>

  2. Main Design Principles
  • Some science & engineering applications scale up to and beyond 10,000 parallel processes.
  • Improve computing capability while holding total system cost constant.
  • Cost/performance trade-offs consider the end use:
    • Applications <> Architecture <> Packaging
  • Reduce complexity and size.
    • ~25 kW/rack is the maximum for air cooling in a standard machine room.
    • Need to improve the performance/power ratio: the 700 MHz PowerPC 440 used in the ASIC has an excellent FLOP/Watt ratio (a rough per-node power budget is sketched after this slide).
  • Maximize integration:
    • On chip: an ASIC containing everything except main memory.
    • Off chip: maximize the number of nodes in a rack.
  • Large systems require excellent reliability, availability, and serviceability (RAS).
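  A rough back-of-the-envelope check of the power constraint, using only figures quoted in the slides: the ~25 kW/rack air-cooling limit from this slide and the 1024 nodes per rack from the "Midplane and Rack" slide. This is a sketch of the implied per-node budget, not a number stated in the deck.

  #include <stdio.h>

  int main(void) {
      /* Figures quoted in the slides. */
      const double rack_power_watts = 25000.0;  /* ~25 kW air-cooling limit per rack */
      const int nodes_per_rack = 1024;          /* from the "Midplane and Rack" slide */

      /* Per-node power budget implied by air-cooling a full rack. */
      double watts_per_node = rack_power_watts / nodes_per_rack;
      printf("Per-node power budget: ~%.1f W\n", watts_per_node);  /* ~24.4 W */
      return 0;
  }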

  3. Physical Layout of BG/L

  4. The Compute Chip
  • System-on-a-chip (SoC): a single ASIC containing
    • 2 PowerPC processors
    • L1 and L2 caches
    • 4 MB embedded DRAM
    • DDR DRAM interface and DMA controller
    • Network connectivity hardware
    • Control / monitoring equipment (JTAG)

  5. Compute and Node Cards

  6. Node Architecture
  • Built from IBM PowerPC embedded CMOS processors, embedded DRAM, and a system-on-a-chip technique.
  • 11.1-mm-square die size, allowing a very high density of processing.
  • The ASIC uses IBM CMOS CU-11 0.13-micron technology.
  • 700 MHz processor speed, close to memory speed.
  • Two processors per node; the second processor is intended primarily for handling message-passing operations (see the sketch after this list).
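  A hedged illustration of why dedicating the second processor to message passing helps: with non-blocking MPI calls, the application can post communication and keep computing while the communication processor progresses the messages. This is a generic MPI sketch, not BG/L-specific code; the neighbor rank, buffer layout, and compute_on() routine are made-up placeholders.

  #include <mpi.h>

  /* Hypothetical local work routine; stands in for the application's compute phase. */
  static void compute_on(double *data, int n) {
      for (int i = 0; i < n; i++) data[i] *= 2.0;
  }

  /* Called from an MPI program (after MPI_Init) to overlap a neighbor exchange
     with local computation. */
  void exchange_and_compute(double *send_buf, double *recv_buf, double *local,
                            int n, int neighbor_rank) {
      MPI_Request reqs[2];

      /* Post non-blocking receive and send; on a node with a dedicated
         communication processor, these can make progress while the other
         processor keeps computing. */
      MPI_Irecv(recv_buf, n, MPI_DOUBLE, neighbor_rank, 0, MPI_COMM_WORLD, &reqs[0]);
      MPI_Isend(send_buf, n, MPI_DOUBLE, neighbor_rank, 0, MPI_COMM_WORLD, &reqs[1]);

      compute_on(local, n);                       /* overlap computation with communication */

      MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);  /* complete the exchange */
  }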

  7. Midplane and Rack
  • One rack holds 1024 nodes.
  • Nodes are optimized for low power; the ASIC is based on SoC technology.
  • Outperforms commodity clusters while saving on power.
  • Aggressive packaging of processor, memory, and interconnect is both power-efficient and space-efficient.
  • Allows latencies and bandwidths significantly better than those of the nodes typically used in ASC-scale supercomputers.

  8. The Torus Network
  • The full system is a 64 x 32 x 32 3D torus.
  • Each compute node is connected to its six neighbors: x+, x-, y+, y-, z+, z- (see the sketch after this list).
  • A compute card is 1x2x1.
  • A node card is 4x4x2: 16 compute cards in a 4x2x2 arrangement.
  • A midplane is 8x8x8: 16 node cards in a 2x2x4 arrangement.
  • Each unidirectional link is 1.4 Gb/s, or 175 MB/s.
  • Each node can send and receive at 1.05 GB/s (six links x 175 MB/s in each direction).
  • Supports cut-through routing, along with both deterministic and adaptive routing.
  • Variable-sized packets of 32, 64, 96, ..., 256 bytes.
  • Guarantees reliable delivery.
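  A minimal sketch of how an application might address the six torus neighbors through MPI's Cartesian topology support. The 64 x 32 x 32 dimensions are the ones quoted above; here MPI_Dims_create factors whatever job size is actually launched, and it is only an assumption that the MPI library maps this Cartesian grid well onto the physical torus.

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv) {
      MPI_Init(&argc, &argv);

      int nprocs, dims[3] = {0, 0, 0};
      MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

      /* Let MPI factor the job into a 3D grid; on the full system this would be
         the 64 x 32 x 32 torus quoted on this slide. */
      MPI_Dims_create(nprocs, 3, dims);
      int periods[3] = {1, 1, 1};   /* periodic in x, y, z: a torus */
      MPI_Comm torus;
      MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 1, &torus);

      int rank, coords[3];
      MPI_Comm_rank(torus, &rank);
      MPI_Cart_coords(torus, rank, 3, coords);

      /* Six neighbors: x-, x+, y-, y+, z-, z+. */
      int xm, xp, ym, yp, zm, zp;
      MPI_Cart_shift(torus, 0, 1, &xm, &xp);
      MPI_Cart_shift(torus, 1, 1, &ym, &yp);
      MPI_Cart_shift(torus, 2, 1, &zm, &zp);

      printf("rank %d at (%d,%d,%d): x neighbors %d/%d, y %d/%d, z %d/%d\n",
             rank, coords[0], coords[1], coords[2], xm, xp, ym, yp, zm, zp);

      MPI_Comm_free(&torus);
      MPI_Finalize();
      return 0;
  }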

  9. BG/L System Software
  • System software supports efficient execution of parallel applications.
  • Compiler support for MPI-based C, C++, and Fortran (a minimal MPI example follows this list).
  • Front-end nodes are commodity PCs running Linux.
  • I/O nodes run a customized Linux kernel.
  • Compute nodes run an extremely lightweight custom kernel:
    • Space sharing, single thread per processor (dual-threaded per node).
    • Flat address space, no paging.
    • Physical resources are memory-mapped.
  • The service node is a single multiprocessor machine running a custom OS.
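  For concreteness, a minimal MPI C program of the kind this toolchain targets. The mpicc wrapper in the comment is a generic MPI convention assumed here, not the exact BG/L cross-compiler invocation, which the slides do not show.

  /* hello_mpi.c: compile with e.g. "mpicc hello_mpi.c -o hello_mpi" on a generic
     MPI installation (assumed; BG/L's own build commands are not given here). */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv) {
      MPI_Init(&argc, &argv);
      int rank, size;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      printf("Hello from rank %d of %d\n", rank, size);
      MPI_Finalize();
      return 0;
  }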

  10. Space Sharing
  • The BG/L system can be partitioned into electronically isolated sets of nodes (power-of-2 sizes).
  • Each partition is single-user and reservation-based.
  • Faulty hardware is electrically isolated, allowing the remaining nodes to continue running in the presence of component failures. (A small partition-sizing sketch follows.)
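  A small sketch of the power-of-2 constraint on partition sizes: rounding a requested node count up to the next power of two. The round_up_pow2() helper is illustrative only and is not part of BG/L's actual allocation software.

  #include <stdio.h>

  /* Round a requested node count up to the next power of two, since partitions
     come in power-of-2 sizes. Illustrative helper, not BG/L system code. */
  static unsigned round_up_pow2(unsigned n) {
      unsigned p = 1;
      while (p < n) p <<= 1;
      return p;
  }

  int main(void) {
      unsigned requests[] = {1, 100, 512, 600, 1024};
      for (int i = 0; i < 5; i++)
          printf("request %u nodes -> partition of %u nodes\n",
                 requests[i], round_up_pow2(requests[i]));
      return 0;
  }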
