Interconnection and Packaging in IBM Blue Gene/L

Yi Zhu, Feb 12, 2007

Outline

  • Design goals

  • Architecture

  • Design philosophy

Main Design Goals for Blue Gene/L

  • Improve computing capability while holding total system cost fixed.

  • Reduce cost/FLOP.

  • Reduce complexity and size.

    • ~25 kW/rack is the maximum for air cooling in a standard machine room.

    • The 700-MHz PowerPC 440 core used in the ASIC has excellent FLOPS per watt.

  • Maximize integration:

    • On chip: an ASIC containing everything except main memory.

    • Off chip: maximize the number of nodes in a rack.
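As a back-of-the-envelope check, the ~25 kW air-cooling ceiling spread over the 1024 nodes of a rack leaves only about 24 W per node, which illustrates why FLOPS per watt matters so much here. This is a sketch, not a calculation from the slides: the constant and function names are hypothetical, and the even split ignores non-node loads such as fans and power-conversion losses.

```python
# Back-of-the-envelope power ceiling per node (illustrative sketch).
# Assumption: the ~25 kW air-cooling limit is shared evenly by the
# 1024 nodes of a rack; non-node loads (fans, power conversion) are ignored.
RACK_POWER_LIMIT_W = 25_000  # ~25 kW/rack, air cooling in a standard room
NODES_PER_RACK = 1024        # two 512-node midplanes


def watts_per_node(rack_limit_w: float = RACK_POWER_LIMIT_W,
                   nodes: int = NODES_PER_RACK) -> float:
    """Upper bound on the average power available to one node."""
    return rack_limit_w / nodes


if __name__ == "__main__":
    print(f"{watts_per_node():.1f} W per node")  # prints "24.4 W per node"
```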

Blue Gene/L Packaging

  • 2 nodes per compute card.

  • 16 compute cards per node card.

  • 16 node cards per 512-node midplane.

  • Two midplanes in a 1024-node rack.

  • 64 racks in the full system.
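The multiplicities above compound level by level. A short Python sketch multiplies them out, confirming 512 nodes per midplane, 1024 per rack, and 65,536 nodes in a 64-rack system. The `LEVELS` table and `nodes_per_level` helper are illustrative names, not from the source.

```python
# Node counts contained in one unit of each packaging level,
# using the per-level multiplicities listed on the slide.
from itertools import accumulate
from operator import mul

LEVELS = [
    ("compute card", 2),  # 2 nodes per compute card
    ("node card", 16),    # 16 compute cards per node card
    ("midplane", 16),     # 16 node cards per midplane
    ("rack", 2),          # 2 midplanes per rack
    ("system", 64),       # 64 racks
]


def nodes_per_level(levels=LEVELS) -> dict[str, int]:
    """Running product of the multiplicities = nodes per unit of each level."""
    names = [name for name, _ in levels]
    counts = accumulate((n for _, n in levels), mul)
    return dict(zip(names, counts))


if __name__ == "__main__":
    for name, count in nodes_per_level().items():
        print(f"{name:>12}: {count:>6} nodes")
```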

Dimensions

  • Compute card: 206 mm x 55 mm

  • Node card: close to 0.46 m x 0.61 m

  • Midplane: 0.64 m tall x 0.8 m x 0.5 m

  • Rack: 2 m tall x 0.91 m x 0.91 m

Topology

  • On one midplane: 16 node cards x 16 compute cards x 2 chips = 512 nodes, wired as an 8x8x8 torus

  • Among midplanes: three network switches, one per dimension, connect the midplanes as an 8x4x4 torus
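Combining the two levels, an 8x4x4 arrangement of 8x8x8 midplanes corresponds to a 64x32x32 node torus. The sketch below splits a global node coordinate into the midplane that holds it and its position inside that midplane. It assumes the whole machine behaves as one torus whose dimensions are the product of the two levels, and that the axis with 8 midplanes is x; the slides do not say which axis that is. The names `MIDPLANE_DIMS`, `MIDPLANE_GRID`, and `locate` are hypothetical.

```python
# Sketch: split a global node coordinate into (midplane, local) parts.
# Assumptions: the full machine is one torus whose node dimensions are the
# product of the midplane size and the midplane grid, and the axis with
# 8 midplanes is x. The slides do not state which axis that is.
MIDPLANE_DIMS = (8, 8, 8)  # nodes per midplane along x, y, z
MIDPLANE_GRID = (8, 4, 4)  # midplanes along x, y, z (axis order assumed)

SYSTEM_DIMS = tuple(m * g for m, g in zip(MIDPLANE_DIMS, MIDPLANE_GRID))


def locate(node: tuple[int, int, int]):
    """Return (midplane coordinate, coordinate inside that midplane)."""
    if any(not 0 <= c < d for c, d in zip(node, SYSTEM_DIMS)):
        raise ValueError(f"{node} lies outside the {SYSTEM_DIMS} torus")
    midplane = tuple(c // m for c, m in zip(node, MIDPLANE_DIMS))
    local = tuple(c % m for c, m in zip(node, MIDPLANE_DIMS))
    return midplane, local


if __name__ == "__main__":
    print(SYSTEM_DIMS)          # (64, 32, 32)
    print(locate((17, 9, 31)))  # ((2, 1, 3), (1, 1, 7))
```

Under these assumptions, node x = 17 sits in midplane index 2 along x (nodes 16 to 23), at local position 1.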

Other Networks

  • A global combining/broadcast tree for collective operations

  • A Gigabit Ethernet network for connection to other systems, such as hosts and file systems.

  • A global barrier and interrupt network

  • A separate Gigabit Ethernet-to-JTAG network for machine control

Node Architecture

  • Built from IBM PowerPC embedded CMOS processors and embedded DRAM using a system-on-a-chip design.

  • 11.1-mm square die, allowing a very high density of processing.

  • The ASIC uses IBM CMOS CU-11 130-nm technology.

  • 700-MHz processor clock, close to the memory speed.

  • Two processors per node.

  • The second processor is intended primarily for handling message-passing operations.

First Level Packaging

  • Package dimensions: 32 mm x 25 mm

    • 474 pins, including:

      • 328 signals for the memory interface

      • A bit-serial torus bus

      • A 3-port, double-bit-wide bus for the collective network

      • 4 global OR signals for fast asynchronous barriers

Design Philosophy

  • Key: determine the parameters top-down, from the high-level package to the chip pin assignment

[Diagram: design flow covering Interconnection Networks; Compute Cards; Bus Widths; # Pins, # Ports; Card Connectors, Dimensions; Routing and Pin Assignment]

Interconnection Networks

  • Cables are bigger, costlier, and less reliable than circuit-card traces.

    • So the number of cables should be minimized.

    • A 3-dimensional torus was chosen as the main BG/L network, with each node connected to 6 neighbors.

    • Maximize the number of nodes connected via circuit cards only.
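To make the 6-neighbor torus concrete, here is a minimal Python sketch of neighbor lookup with wraparound. The function name `torus_neighbors` and the (x, y, z) coordinate convention are assumptions for illustration; the slides do not specify an addressing scheme.

```python
# Sketch: the 6 nearest neighbors of a node in a 3-D torus.
# Coordinates are (x, y, z) tuples; this addressing is an assumption.
Coord = tuple[int, int, int]


def torus_neighbors(node: Coord, dims: Coord) -> list[Coord]:
    """Return the +/-1 neighbors along each axis, wrapping at the edges.

    Assumes every dimension has at least 3 nodes, so the 6 neighbors
    are distinct (true for an 8x8x8 midplane)."""
    neighbors = []
    for axis in range(3):
        for step in (-1, +1):
            coord = list(node)
            # Python's % maps -1 to dims[axis] - 1, giving the wraparound link.
            coord[axis] = (coord[axis] + step) % dims[axis]
            neighbors.append(tuple(coord))
    return neighbors


if __name__ == "__main__":
    print(torus_neighbors((0, 0, 0), (8, 8, 8)))
    # [(7, 0, 0), (1, 0, 0), (0, 7, 0), (0, 1, 0), (0, 0, 7), (0, 0, 1)]
```

Because every dimension wraps, a corner node such as (0, 0, 0) still has exactly 6 neighbors, which is what keeps the torus regular.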

Interconnection Networks

  • A BG/L midplane has 8 x 8 x 8 = 512 nodes.

  • Fraction of connections that need cables = (number of cable connections) / (all connections)

    = (6 faces x 8 x 8 nodes) / (6 neighbors x 8 x 8 x 8 nodes)

    = 1/8
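The 1/8 ratio can be checked by brute force: walk every node of the midplane, try each of its 6 neighbor directions, and count the links whose neighbor falls outside the midplane (those cross a face and need a cable). This is an illustrative sketch; `cable_fraction` is a hypothetical name. For an n x n x n midplane the same count gives 6n^2 / 6n^3 = 1/n, so larger cube-shaped midplanes push the cable fraction down further.

```python
# Sketch: brute-force check of the cable fraction on an n x n x n midplane.
from fractions import Fraction
from itertools import product


def cable_fraction(n: int = 8) -> Fraction:
    """Fraction of node-to-neighbor connections that leave the midplane.

    A connection leaves the midplane (and needs a cable) when the
    neighbor's index along some axis falls outside 0..n-1."""
    total = 0
    off_midplane = 0
    for node in product(range(n), repeat=3):
        for axis in range(3):
            for step in (-1, +1):
                total += 1
                if not 0 <= node[axis] + step < n:
                    off_midplane += 1
    return Fraction(off_midplane, total)


if __name__ == "__main__":
    print(cable_fraction(8))  # 1/8
```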

Compute Card

  • Determined by the trade-off among space, function, and cost

  • The fewest possible compute ASICs per card gives the lowest cost for test, rework, and replacement

  • Two ASICs per card are more space-efficient because of the shared SDRAM

Bus Widths

  • The bus width of the torus network was decided primarily by the number of cables that could physically be connected to a midplane

  • The collective network and interrupt bus widths and topology were determined by the compute card form factor

# Pins and # Ports

  • The number of pins per ASIC is determined by the chosen collective network and interrupt bus widths plus the number of ports escaping each ASIC

  • The number of collective ports per ASIC and between card connectors was a trade-off between collective network latency and system form factor

Final Choices

  • 3 collective ports per ASIC

  • 2 bidirectional bits per collective port

  • 4 bidirectional global interrupt bits per interrupt bus

  • 32 mm x 25 mm package

  • Other factors (compute card form, widths of various buses…) were chosen to yield the maximum density of ASICs per rack

Design Philosophy

  • Next, determine:

    • Circuit card connectors

    • Card cross section

    • Card wiring

  • Objectives

    • Compactness

    • Low cost

    • Electrical signaling quality

Card-to-Card Connectors

  • Differential, because all high-speed buses are differential

  • Two differential signal pairs per column of pins

    • Lets signal buses spread out horizontally across nearly the entire width of each connector

    • Fewer layers needed to escape the connector, and fewer signal crossings

  • Final choice: Metral 4000 connector

Circuit Card Cross Sections

  • Fundamental requirement: high electrical signaling quality

  • Alternating signal and ground layers

  • 14 layers per card, except the midplane (18 layers)

  • The node card requires additional power layers to distribute the 1.5-V core voltage to the compute cards

Circuit Card Cross Sections

  • Layers carrying long-distance nets need low resistive loss

    • Wide (190 µm to 215 µm) 1.0-ounce copper traces

  • Other layers minimize card thickness

    • Narrow (100 µm) 0.5-ounce copper traces

  • Card dielectrics: low-cost FR4

    • Sufficient for the 1.4-Gb/s signaling rate
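For scale, the 1.4-Gb/s signaling rate translates into raw torus bandwidth as sketched below. This assumes each bit-serial torus link carries 1.4 Gb/s in each direction and ignores packet and protocol overhead; under that assumption one link direction moves 175 MB/s, and a node's 6 links in both directions total 2.1 GB/s. The constant and function names are hypothetical.

```python
# Sketch: raw torus bandwidth implied by the 1.4 Gb/s signaling rate.
# Assumption: each bit-serial torus link carries 1.4 Gb/s in each
# direction; packet and protocol overhead are ignored.
SIGNAL_RATE_GBPS = 1.4  # Gb/s per link direction (from the slide)
LINKS_PER_NODE = 6      # one link to each of the 6 torus neighbors


def link_mb_per_s(rate_gbps: float = SIGNAL_RATE_GBPS) -> float:
    """Raw bandwidth of one link in one direction, in MB/s (10**6 bytes/s)."""
    return rate_gbps * 1000 / 8


def node_gb_per_s(rate_gbps: float = SIGNAL_RATE_GBPS,
                  links: int = LINKS_PER_NODE) -> float:
    """Raw aggregate torus bandwidth of one node, both directions, in GB/s."""
    return links * 2 * link_mb_per_s(rate_gbps) / 1000


if __name__ == "__main__":
    print(f"{link_mb_per_s():.0f} MB/s per link direction")  # 175 MB/s per link direction
    print(f"{node_gb_per_s():.1f} GB/s per node")             # 2.1 GB/s per node
```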

Card Sizes

  • Determined by a combination of manufacturability and system form factor considerations

  • Node cards are close to the maximum card size obtainable from the industry-standard, low-cost 0.46 m x 0.61 m panel

  • Midplane is confined to the largest panel size that could still be manufactured by multiple card vendors

Card Wiring

  • Goal: minimize card layers (minimize card cost)

  • Routing order:

    • 3-D torus network on the cards (the most regular and numerous nets)

    • Pin assignment for the torus network to minimize net signal crossings
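Why pin assignment drives crossings can be illustrated with a simplified single-layer model; this model is an assumption for illustration, not the method described in the slides. Nets connect pins on two parallel rows, such as an ASIC package edge and a connector, and within the channel between the rows two nets must cross exactly when their left-to-right order differs between the rows. Counting forced crossings then reduces to counting order inversions. The function name `count_crossings` and the net labels are hypothetical.

```python
# Sketch: count forced crossings between two pin orderings (single layer).
from itertools import combinations


def count_crossings(row_a: list[str], row_b: list[str]) -> int:
    """Count net pairs whose relative order differs between two pin rows.

    Each such pair must cross somewhere in the channel between the rows,
    which on a real card costs a via or an extra routing layer."""
    if sorted(row_a) != sorted(row_b):
        raise ValueError("both rows must contain the same nets")
    pos_b = {net: i for i, net in enumerate(row_b)}
    return sum(
        1
        for left, right in combinations(row_a, 2)  # left precedes right in row A
        if pos_b[left] > pos_b[right]              # ...but follows it in row B
    )


if __name__ == "__main__":
    nets = ["x+", "x-", "y+", "y-"]  # hypothetical torus link nets
    print(count_crossings(nets, nets))                      # 0
    print(count_crossings(nets, ["y-", "y+", "x-", "x+"]))  # 6
```

Matching the pin order on the ASIC to the order on the connector drives this count to zero, which is the intuition behind assigning torus pins before routing the less regular nets.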

Card Wiring

  • Routing order (cont’d)

    • Global collective network & interrupt bus

      • Exact logical structure chosen to minimize the number of layers

    • Layout of 16-byte-wide SDRAM

      • Optimized for package escape and the number of routing layers

    • ASIC pin assignment

    • High-speed clocks

    • Low-speed nets
