
The Blue Gene Experience

Manish Gupta

IBM T. J. Watson Research Center

Yorktown Heights, NY



Blue Gene/L (2005)

136.8 Teraflop/s on LINPACK (64K processors)
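
For scale: using the per-processor peak of 2.8 GF/s quoted on the packaging slide that follows, 64K processors give a theoretical peak of roughly 183 TF/s, so 136.8 TF/s corresponds to about 75% LINPACK efficiency. A quick check of that arithmetic (plain C, nothing machine-specific):

```c
/* Rough LINPACK efficiency check using figures quoted in these slides:
 * 64K processors at 2.8 GF/s peak each vs. the measured 136.8 TF/s. */
#include <stdio.h>

int main(void)
{
    double procs   = 65536.0;   /* "64K processors"              */
    double peak_gf = 2.8;       /* per-processor peak, GF/s      */
    double rmax_tf = 136.8;     /* measured LINPACK result, TF/s */

    double rpeak_tf = procs * peak_gf / 1000.0;   /* ~183.5 TF/s */
    printf("Rpeak ~ %.1f TF/s, efficiency ~ %.0f%%\n",
           rpeak_tf, 100.0 * rmax_tf / rpeak_tf);
    return 0;
}
```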


Blue Gene/L System Packaging

  • System: 64 racks (64x32x32); 180/360 TF/s; 32 TB
  • Rack: 32 node cards; 2.8/5.6 TF/s; 512 GB
  • Node Card: 16 compute cards, 0-2 I/O cards (32 chips, 4x4x2); 90/180 GF/s; 16 GB
  • Compute Card: 2 chips (1x2x1); 5.6/11.2 GF/s; 1.0 GB
  • Chip: 2 processors; 2.8/5.6 GF/s; 4 MB

(The peak-rate arithmetic behind these figures is sketched below.)
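
A minimal sketch of the peak-rate arithmetic behind this hierarchy. The 700 MHz clock and 4 flops/cycle per core (two fused multiply-add pipes) are not on the slide; they are the published Blue Gene/L figures and are stated here as assumptions. The doubled rates correspond to using one versus both cores of each chip for computation.

```c
/* Peak-rate arithmetic for the Blue Gene/L packaging hierarchy.
 * Assumptions not on the slide: 700 MHz per-core clock and 4 flops/cycle
 * (two fused multiply-add pipes) -- the published Blue Gene/L figures. */
#include <stdio.h>

int main(void)
{
    double core_gf = 700e6 * 4.0 / 1e9;   /* 2.8 GF/s per processor core */

    int cores_per_chip     = 2;
    int chips_per_card     = 2;
    int cards_per_nodecard = 16;           /* compute cards only */
    int nodecards_per_rack = 32;
    int racks_per_system   = 64;

    /* One core computing (coprocessor mode) vs. both cores computing. */
    double chip_lo = core_gf;                       /* 2.8 GF/s */
    double chip_hi = core_gf * cores_per_chip;      /* 5.6 GF/s */

    double card_hi  = chip_hi * chips_per_card;             /*  11.2 GF/s */
    double ncard_hi = card_hi * cards_per_nodecard;         /* 179.2 GF/s */
    double rack_hi  = ncard_hi * nodecards_per_rack / 1e3;  /*  ~5.7 TF/s */
    double sys_hi   = rack_hi * racks_per_system;           /*  ~367 TF/s */

    printf("chip         %.1f / %.1f GF/s\n", chip_lo, chip_hi);
    printf("compute card %.1f GF/s\n", card_hi);
    printf("node card    %.1f GF/s\n", ncard_hi);
    printf("rack         %.2f TF/s\n", rack_hi);
    printf("system       %.0f TF/s (the slide rounds to 360)\n", sys_hi);
    return 0;
}
```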



Blue Gene/L Compute ASIC

  • Low power processors

  • Chip-level integration

  • Powerful networks



Blue Gene/L Networks

3 Dimensional Torus

  • Interconnects all compute nodes (65,536)

  • Virtual cut-through hardware routing

  • 1.4 Gb/s on each of the 12 node links (2.1 GB/s per node)

  • 1 µs latency between nearest neighbors, 5 µs to the farthest

  • Communications backbone for computations

  • 0.7/1.4 TB/s bisection bandwidth, 68 TB/s total bandwidth
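
Applications reach the torus through their message-passing library rather than programming it directly. As one illustration, the sketch below sets up a periodic 3-D Cartesian communicator and does a nearest-neighbour exchange along each axis using only standard MPI calls; the data and grid dimensions are placeholders, and nothing here is a Blue Gene-specific API.

```c
/* Nearest-neighbour exchange on a periodic 3-D grid, using standard MPI.
 * On Blue Gene/L the scheduler hands a job a partition shaped as part of
 * the torus; here MPI_Dims_create simply picks some 3-D factorisation. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int nprocs, dims[3] = {0, 0, 0}, periods[3] = {1, 1, 1};
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Dims_create(nprocs, 3, dims);

    MPI_Comm torus;
    MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 1 /* reorder */, &torus);

    int rank, coords[3];
    MPI_Comm_rank(torus, &rank);
    MPI_Cart_coords(torus, rank, 3, coords);

    /* Exchange one value with both neighbours along each of x, y, z. */
    double send = (double)rank, from_lo, from_hi;
    for (int axis = 0; axis < 3; axis++) {
        int lo, hi;
        MPI_Cart_shift(torus, axis, 1, &lo, &hi);
        MPI_Sendrecv(&send, 1, MPI_DOUBLE, hi, 0,
                     &from_lo, 1, MPI_DOUBLE, lo, 0, torus, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&send, 1, MPI_DOUBLE, lo, 1,
                     &from_hi, 1, MPI_DOUBLE, hi, 1, torus, MPI_STATUS_IGNORE);
    }

    if (rank == 0)
        printf("rank 0 at (%d,%d,%d) in a %dx%dx%d grid\n",
               coords[0], coords[1], coords[2], dims[0], dims[1], dims[2]);

    MPI_Comm_free(&torus);
    MPI_Finalize();
    return 0;
}
```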

    Global Collective

  • One-to-all broadcast functionality

  • Reduction operations functionality

  • 2.8 Gb/s of bandwidth per link

  • One-way traversal latency of 2.5 µs

  • Interconnects all compute and I/O nodes (1024)

    Low Latency Global Barrier and Interrupt

  • Round-trip latency of 1.3 µs
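
Most applications exercise these networks through ordinary MPI collectives: on Blue Gene/L, broadcast, reduction, and barrier operations issued on MPI_COMM_WORLD are the natural candidates for the MPI library to carry over the collective and barrier hardware. A minimal sketch with standard MPI calls only (the data values are made up):

```c
/* Global broadcast, reduction, and barrier expressed as standard MPI
 * collectives.  On Blue Gene/L the MPI library can carry such operations
 * over the collective (tree) and global-barrier networks when issued on
 * MPI_COMM_WORLD; the calls themselves are plain MPI. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* One-to-all broadcast (cf. the tree's broadcast functionality). */
    double params[4] = {0};
    if (rank == 0) { params[0] = 1.0; params[1] = 2.0; }
    MPI_Bcast(params, 4, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Global reduction (cf. the tree's reduction functionality). */
    double local_energy = 0.5 * rank, total_energy;
    MPI_Allreduce(&local_energy, &total_energy, 1, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);

    /* Global synchronisation (cf. the low-latency barrier network). */
    MPI_Barrier(MPI_COMM_WORLD);

    if (rank == 0)
        printf("total energy = %f\n", total_energy);

    MPI_Finalize();
    return 0;
}
```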

    Ethernet

  • Incorporated into every node ASIC

  • Active in the I/O nodes (1 I/O node per 8-64 compute nodes)

  • All external communication (file I/O, control, user interaction, etc.)

    Control Network



RAS (Reliability, Availability, Serviceability)

  • System designed for reliability from top to bottom

    • System issues

      • Redundant bulk supplies, power converters, fans, DRAM bits, cable bits

      • Extensive data logging (voltage, temp, recoverable errors … ) for failure forecasting

      • Nearly no single points of failure

    • Chip design

      • ECC on all SRAMs

      • All dataflow outside processors is protected by error-detection mechanisms

      • Access to all state via noninvasive back door

    • Low power, simple design leads to higher reliability

    • All interconnects have multiple levels of error detection and correction coverage

      • Virtually zero escape probability for link errors
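
The ECC and link-protection bullets above refer to error-correcting codes on memory arrays and on the interconnect. As a toy illustration of the idea (not IBM's implementation, and much narrower than the SECDED codes used on real memory), the sketch below encodes a 4-bit value with a Hamming(7,4) code and checks that any single flipped bit is corrected:

```c
/* Toy Hamming(7,4) encoder/decoder: corrects any single-bit error in a
 * 7-bit codeword carrying 4 data bits.  Illustration only -- real ECC
 * protects wider words (typically SECDED: correct 1 bit, detect 2). */
#include <stdio.h>

static unsigned getbit(unsigned w, int pos) { return (w >> (pos - 1)) & 1u; }

/* Codeword positions 1..7 hold p1 p2 d1 p3 d2 d3 d4 (p = parity, d = data). */
static unsigned encode(unsigned d)
{
    unsigned d1 = d & 1u, d2 = (d >> 1) & 1u, d3 = (d >> 2) & 1u, d4 = (d >> 3) & 1u;
    unsigned p1 = d1 ^ d2 ^ d4;    /* covers positions 1,3,5,7 */
    unsigned p2 = d1 ^ d3 ^ d4;    /* covers positions 2,3,6,7 */
    unsigned p3 = d2 ^ d3 ^ d4;    /* covers positions 4,5,6,7 */
    return p1 | (p2 << 1) | (d1 << 2) | (p3 << 3) | (d2 << 4) | (d3 << 5) | (d4 << 6);
}

/* The syndrome is the position of the flipped bit (0 means no error). */
static unsigned decode(unsigned c)
{
    unsigned s = (getbit(c,1) ^ getbit(c,3) ^ getbit(c,5) ^ getbit(c,7))
               | (getbit(c,2) ^ getbit(c,3) ^ getbit(c,6) ^ getbit(c,7)) << 1
               | (getbit(c,4) ^ getbit(c,5) ^ getbit(c,6) ^ getbit(c,7)) << 2;
    if (s)
        c ^= 1u << (s - 1);                            /* correct the bad bit */
    return getbit(c,3) | (getbit(c,5) << 1) | (getbit(c,6) << 2) | (getbit(c,7) << 3);
}

int main(void)
{
    int failures = 0;
    for (unsigned d = 0; d < 16; d++)
        for (int bad = 0; bad < 7; bad++)              /* flip each bit in turn */
            if (decode(encode(d) ^ (1u << bad)) != d)
                failures++;
    printf("%d of 112 single-bit errors left uncorrected\n", failures);
    return 0;
}
```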


Blue Gene/L System Architecture

[Architecture diagram. Each processing set (Pset 0 ... Pset 1023) groups compute nodes (C-Node 0 ... C-Node 63), which run applications (app) on the Compute Node Kernel (CNK), with one I/O node running Linux, ciod, and a file-system client; the compute nodes are connected by the torus, and the tree links them to their I/O node. The I/O nodes reach the front-end nodes and file servers over the functional Gigabit Ethernet. The service node, with the system console, CMCS, DB2, and LoadLeveler, manages the machine over the control Gigabit Ethernet, reaching the hardware through the IDo chip, JTAG, and I2C.]
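
The split between CNK on the compute nodes and ciod on the I/O nodes works by function shipping: the lightweight kernel forwards I/O system calls to the I/O node daemon, which runs them against the real file system under Linux. The sketch below is a toy, in-process illustration of that idea; the message format, names, and the use of a direct function call in place of the tree network are all assumptions, not IBM's protocol.

```c
/* Toy sketch of "function shipping": how a lightweight compute-node kernel
 * like CNK can forward an I/O system call to a daemon (ciod) on the I/O
 * node instead of implementing a file system itself.  Runs in one process
 * on any POSIX system; the transport is a plain function call here. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

enum { SYS_WRITE = 1 };

struct io_request {                 /* shipped from compute node to I/O node */
    int  syscall_id;
    int  fd;
    int  len;
    char payload[256];
};

/* What ciod would do on the I/O node: run the real system call under Linux. */
static long ciod_serve(const struct io_request *req)
{
    if (req->syscall_id == SYS_WRITE)
        return write(req->fd, req->payload, (size_t)req->len);
    return -1;
}

/* What CNK would do on the compute node: marshal the call and ship it. */
static long cnk_write(int fd, const void *buf, int len)
{
    struct io_request req = { SYS_WRITE, fd, len, {0} };
    if (len > (int)sizeof req.payload)
        len = req.len = (int)sizeof req.payload;   /* toy: truncate */
    memcpy(req.payload, buf, (size_t)len);
    /* Real system: send req over the tree network and wait for the reply. */
    return ciod_serve(&req);
}

int main(void)
{
    const char msg[] = "hello from a compute node\n";
    return cnk_write(1, msg, (int)sizeof msg - 1) > 0 ? 0 : 1;
}
```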



Example performance graphs (Molecular dynamics)

