BGL Photo (system)

Presentation Transcript


  1. BGL Photo (system). BlueGene/L. IBM Journal of Research and Development, Vol. 49, No. 2-3. <http://www.research.ibm.com/journal/rd49-23.html>

  2. Main Design Principles
  • Some science & engineering applications scale up to and beyond 10,000 parallel processes.
  • Improve computing capability while holding total system cost constant.
  • Cost/performance trade-offs consider the end use:
    • Applications <> Architecture <> Packaging
  • Reduce complexity and size.
    • ~25 kW/rack is the maximum for air cooling in a standard machine room.
    • Need to improve the performance/power ratio: the 700 MHz PowerPC 440 used in the ASIC has an excellent FLOP/Watt ratio (a rough per-node power budget is sketched after this slide).
  • Maximize integration:
    • On chip: an ASIC containing everything except main memory.
    • Off chip: maximize the number of nodes in a rack.
  • Large systems require excellent reliability, availability, and serviceability (RAS).
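  A rough back-of-the-envelope check of the power constraint, using only figures quoted in the slides: the ~25 kW/rack air-cooling limit from this slide and the 1024 nodes per rack from the "Midplane and Rack" slide. This is a sketch of the implied per-node budget, not a number stated in the deck.

  #include <stdio.h>

  int main(void) {
      /* Figures quoted in the slides. */
      const double rack_power_watts = 25000.0;  /* ~25 kW air-cooling limit per rack */
      const int nodes_per_rack = 1024;          /* from the "Midplane and Rack" slide */

      /* Per-node power budget implied by air-cooling a full rack. */
      double watts_per_node = rack_power_watts / nodes_per_rack;
      printf("Per-node power budget: ~%.1f W\n", watts_per_node);  /* ~24.4 W */
      return 0;
  }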

  3. Physical Layout of BG/L

  4. The Compute Chip
  • System-on-a-chip (SoC): a single ASIC containing
    • 2 PowerPC processors
    • L1 and L2 caches
    • 4 MB embedded DRAM
    • DDR DRAM interface and DMA controller
    • Network connectivity hardware
    • Control / monitoring equipment (JTAG)

  5. Compute and Node Cards

  6. Node Architecture
  • Built from IBM PowerPC embedded CMOS processors, embedded DRAM, and a system-on-a-chip technique.
  • 11.1-mm-square die size, allowing a very high density of processing.
  • The ASIC uses IBM CMOS CU-11 0.13-micron technology.
  • 700 MHz processor speed, close to memory speed.
  • Two processors per node; the second processor is intended primarily for handling message-passing operations (see the sketch after this list).
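  A hedged illustration of why dedicating the second processor to message passing helps: with non-blocking MPI calls, the application can post communication and keep computing while the communication processor progresses the messages. This is a generic MPI sketch, not BG/L-specific code; the neighbor rank, buffer layout, and compute_on() routine are made-up placeholders.

  #include <mpi.h>

  /* Hypothetical local work routine; stands in for the application's compute phase. */
  static void compute_on(double *data, int n) {
      for (int i = 0; i < n; i++) data[i] *= 2.0;
  }

  /* Called from an MPI program (after MPI_Init) to overlap a neighbor exchange
     with local computation. */
  void exchange_and_compute(double *send_buf, double *recv_buf, double *local,
                            int n, int neighbor_rank) {
      MPI_Request reqs[2];

      /* Post non-blocking receive and send; on a node with a dedicated
         communication processor, these can make progress while the other
         processor keeps computing. */
      MPI_Irecv(recv_buf, n, MPI_DOUBLE, neighbor_rank, 0, MPI_COMM_WORLD, &reqs[0]);
      MPI_Isend(send_buf, n, MPI_DOUBLE, neighbor_rank, 0, MPI_COMM_WORLD, &reqs[1]);

      compute_on(local, n);                       /* overlap computation with communication */

      MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);  /* complete the exchange */
  }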

  7. Midplane and Rack
  • One rack holds 1024 nodes.
  • Nodes are optimized for low power; the ASIC is based on SoC technology.
  • Outperforms commodity clusters while saving on power.
  • Aggressive packaging of processor, memory, and interconnect is both power-efficient and space-efficient.
  • Allows latencies and bandwidths significantly better than those of the nodes typically used in ASC-scale supercomputers.

  8. The Torus Network
  • The full system is a 64 x 32 x 32 3D torus.
  • Each compute node is connected to its six neighbors: x+, x-, y+, y-, z+, z- (see the sketch after this list).
  • A compute card is 1x2x1.
  • A node card is 4x4x2: 16 compute cards in a 4x2x2 arrangement.
  • A midplane is 8x8x8: 16 node cards in a 2x2x4 arrangement.
  • Each unidirectional link is 1.4 Gb/s, or 175 MB/s.
  • Each node can send and receive at 1.05 GB/s (six links x 175 MB/s in each direction).
  • Supports cut-through routing, along with both deterministic and adaptive routing.
  • Variable-sized packets of 32, 64, 96, ..., 256 bytes.
  • Guarantees reliable delivery.
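  A minimal sketch of how an application might address the six torus neighbors through MPI's Cartesian topology support. The 64 x 32 x 32 dimensions are the ones quoted above; here MPI_Dims_create factors whatever job size is actually launched, and it is only an assumption that the MPI library maps this Cartesian grid well onto the physical torus.

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv) {
      MPI_Init(&argc, &argv);

      int nprocs, dims[3] = {0, 0, 0};
      MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

      /* Let MPI factor the job into a 3D grid; on the full system this would be
         the 64 x 32 x 32 torus quoted on this slide. */
      MPI_Dims_create(nprocs, 3, dims);
      int periods[3] = {1, 1, 1};   /* periodic in x, y, z: a torus */
      MPI_Comm torus;
      MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 1, &torus);

      int rank, coords[3];
      MPI_Comm_rank(torus, &rank);
      MPI_Cart_coords(torus, rank, 3, coords);

      /* Six neighbors: x-, x+, y-, y+, z-, z+. */
      int xm, xp, ym, yp, zm, zp;
      MPI_Cart_shift(torus, 0, 1, &xm, &xp);
      MPI_Cart_shift(torus, 1, 1, &ym, &yp);
      MPI_Cart_shift(torus, 2, 1, &zm, &zp);

      printf("rank %d at (%d,%d,%d): x neighbors %d/%d, y %d/%d, z %d/%d\n",
             rank, coords[0], coords[1], coords[2], xm, xp, ym, yp, zm, zp);

      MPI_Comm_free(&torus);
      MPI_Finalize();
      return 0;
  }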

  9. BG/L System Software
  • System software supports efficient execution of parallel applications.
  • Compiler support for MPI-based C, C++, and Fortran (a minimal MPI example follows this list).
  • Front-end nodes are commodity PCs running Linux.
  • I/O nodes run a customized Linux kernel.
  • Compute nodes run an extremely lightweight custom kernel:
    • Space sharing, single thread per processor (dual-threaded per node).
    • Flat address space, no paging.
    • Physical resources are memory-mapped.
  • The service node is a single multiprocessor machine running a custom OS.
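  For concreteness, a minimal MPI C program of the kind this toolchain targets. The mpicc wrapper in the comment is a generic MPI convention assumed here, not the exact BG/L cross-compiler invocation, which the slides do not show.

  /* hello_mpi.c: compile with e.g. "mpicc hello_mpi.c -o hello_mpi" on a generic
     MPI installation (assumed; BG/L's own build commands are not given here). */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv) {
      MPI_Init(&argc, &argv);
      int rank, size;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      printf("Hello from rank %d of %d\n", rank, size);
      MPI_Finalize();
      return 0;
  }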

  10. Space Sharing
  • The BG/L system can be partitioned into electronically isolated sets of nodes (power-of-2 sizes).
  • Each partition is single-user and reservation-based.
  • Faulty hardware is electrically isolated, allowing the remaining nodes to continue running in the presence of component failures. (A small partition-sizing sketch follows.)
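  A small sketch of the power-of-2 constraint on partition sizes: rounding a requested node count up to the next power of two. The round_up_pow2() helper is illustrative only and is not part of BG/L's actual allocation software.

  #include <stdio.h>

  /* Round a requested node count up to the next power of two, since partitions
     come in power-of-2 sizes. Illustrative helper, not BG/L system code. */
  static unsigned round_up_pow2(unsigned n) {
      unsigned p = 1;
      while (p < n) p <<= 1;
      return p;
  }

  int main(void) {
      unsigned requests[] = {1, 100, 512, 600, 1024};
      for (int i = 0; i < 5; i++)
          printf("request %u nodes -> partition of %u nodes\n",
                 requests[i], round_up_pow2(requests[i]));
      return 0;
  }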
