Overview of Hitachi’s Super Technical Server SR8000

slide1

The Third International Workshop on Next Generation Climate Models

Overview of Hitachi’s

Super Technical Server SR8000

March, 2001

Yoshiro Aihara

Hitachi, Ltd. Enterprise Server Division

slide2

HITACHI Supercomputers

[Figure: peak performance (FLOPS, log scale from 0.01G to 10T) versus year of announcement, '77 to '01]

- IAP (Integrated Array Processor) systems: M-200H IAP, M-280H IAP, M-680 IAP
- Vector type: S-810 Series (first Japanese vector supercomputer), S-820 Series (single CPU peak performance 3 GFlops), S-3000 Series (single CPU peak performance 8 GFlops, fastest in the world)
- RISC parallel: SR2201 Series (first commercially available distributed memory parallel processor)
- Advanced RISC parallel: SR8000 (new concept machine for advanced HPC users, combining parallel and vector processing)

slide3

Design Concept of SR8000

SR8000: a new concept combining the advantages of vector processors and RISC parallel processors

Target of design → Hitachi's solution (SR8000 new features)

- High single-node performance, as with a vector processor:
  - PVP feature: vector processing
  - COMPAS feature: element-parallel processing
  - High memory throughput
- High scalability: Multi-dimensional Crossbar Network (high-speed inter-node network)
- Short development cycle and easy enhancement: RISC-based processor (developed by Hitachi)

PVP: Pseudo Vector Processing

COMPAS: Co-operative Micro-Processors in single Address Space

slide4

Basic Configuration of SR8000

[Figure: several nodes linked by the Multi-dimensional Crossbar Network (high-speed inter-node network). Each node contains multiple high-performance RISC microprocessors (Hitachi developed, with Pseudo-Vector Processing) and an SP, all sharing main memory under COMPAS, plus system control, network control and a PCI I/O adapter. I/O connections: Ethernet, ATM, HIPPI, RAID disk and other I/O devices; an SVP with an MCD is attached for maintenance.]

COMPAS: CO-operative Micro-Processors in single Address Space

SP: System Processor

SVP: SerVice Processor

MCD: Maintenance Console Device

slide5

Multi-dimensional Crossbar network

Ex) 8x8x2 (128 nodes) configuration

[Figure: 128 nodes arranged 8 along x, 8 along y and 2 along z; legend: nodes, X axis crossbar, Y axis crossbar]
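To make the 8x8x2 example concrete, here is a small sketch of node addressing and routing. It assumes (this is not stated on the slide) that each axis crossbar links every node sharing the other two coordinates, so one crossbar traversal fixes one coordinate, and that messages are routed in x, then y, then z order. The names `coord_of`, `node_of`, `route` and `hops` are hypothetical.

```python
# Hypothetical model of the 8x8x2 (128-node) example; not Hitachi's routing logic.
DIMS = (8, 8, 2)  # nodes along x, y, z


def coord_of(node, dims=DIMS):
    """Map a linear node number to (x, y, z) coordinates."""
    nx, ny, _ = dims
    return (node % nx, (node // nx) % ny, node // (nx * ny))


def node_of(coord, dims=DIMS):
    """Inverse of coord_of."""
    x, y, z = coord
    nx, ny, _ = dims
    return x + nx * (y + ny * z)


def route(src, dst, dims=DIMS):
    """Dimension-order route: correct x, then y, then z.

    Assumes each axis crossbar connects all nodes that share the other two
    coordinates, so one crossbar traversal changes exactly one coordinate.
    """
    cur = list(coord_of(src, dims))
    target = coord_of(dst, dims)
    path = [tuple(cur)]
    for axis in range(3):
        if cur[axis] != target[axis]:
            cur[axis] = target[axis]
            path.append(tuple(cur))
    return path


def hops(src, dst, dims=DIMS):
    """Number of crossbar traversals on the dimension-order route."""
    return len(route(src, dst, dims)) - 1


if __name__ == "__main__":
    total = DIMS[0] * DIMS[1] * DIMS[2]
    print(total, "nodes")
    print(route(0, total - 1))
    print("worst case:", max(hops(0, n) for n in range(total)), "crossbar hops")
```

Under this assumption any two of the 128 nodes are at most three crossbar traversals apart, and adding nodes along an axis widens that axis's crossbar rather than lengthening paths.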

slide7

Pseudo-Vector Processing (PVP)

Problems of conventional RISC processors

- Performance drops for large-scale simulations because the data overflow the cache
- Sustained performance: under 10% of peak

[Figure: main memory feeds the cache by prefetch and feeds the 160 extended floating-point registers directly by preload; the cache feeds the registers by ordinary loads, and the registers feed the FPU]

Prefetch

- Reads data from main memory into the cache before the calculation
- Accelerates sequential data access

Preload

- Reads data from main memory into the extended floating-point registers before the calculation
- Accelerates strided and indirectly addressed memory access
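Why issuing loads ahead of the calculation matters can be shown with a toy timing model. This is a sketch under simplifying assumptions, not a description of the SR8000 hardware: every iteration needs one operand from main memory, a load takes `latency` cycles to arrive, the arithmetic takes `compute` cycles, and loads are issued `distance` iterations ahead (0 means on demand). All parameter values are hypothetical.

```python
def loop_cycles(n, latency, compute, distance):
    """Toy timing model of a loop whose operands come from main memory.

    n:        iterations (one operand load each)
    latency:  cycles from issuing a load until the data is usable
    compute:  cycles of arithmetic per iteration
    distance: how many iterations ahead loads are issued (0 = on demand)
    """
    ready = [None] * n
    # Prologue: issue the first `distance` loads before the loop starts.
    for j in range(min(distance, n)):
        ready[j] = latency
    t = 0
    for i in range(n):
        if ready[i] is None:            # not issued ahead: load on demand
            ready[i] = t + latency
        t = max(t, ready[i])            # stall until the operand arrives
        ahead = i + distance
        if distance > 0 and ahead < n:
            ready[ahead] = t + latency  # issue a later iteration's load now
        t += compute
    return t


if __name__ == "__main__":
    for d in (0, 10, 25):
        print(d, loop_cycles(1000, latency=100, compute=4, distance=d))
```

In this model, on-demand loading costs `n * (latency + compute)` cycles, while once `distance * compute >= latency` the total falls to `latency + n * compute`: the memory latency is paid once instead of once per element. The model does not care whether the addresses are sequential, strided or indirect, which is the case preload into registers is meant to cover.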

slide8

COMPAS Feature of SR8000

Elementwise parallel processing of DO loops, as employed in vector supercomputers, realized by the multiple processors within a node

(Automatic elementwise parallelization within a node by the compiler)

[Figure, program behavior: one IP runs the scalar part while the other IPs wait for startup; at "Start Parallel Inst." each IP executes its part of the loop; at "End Parallel Inst." execution returns to the scalar part on one IP]

Hardware feature (COMPAS): high-speed processing by multiple processors is realized by a hardware high-speed communication mechanism between the IPs, the SC and the MS

IP: Instruction Processor

SC: Storage Controller

MS: Main Storage

COMPAS: CO-operative Micro-Processors in single Address Space
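The program behavior on this slide (scalar part, parallel start, loop split across IPs, parallel end) can be mimicked with a fork-join sketch. Here threads stand in for the IPs of one node and share the arrays in a single address space; the block distribution in `block_ranges` and the names `axpy_elementwise_parallel` and `num_ips` are assumptions for illustration, not what the SR8000 compiler or hardware actually emit. In CPython the threads will not run this pure-Python loop faster; the point is only the structure.

```python
from concurrent.futures import ThreadPoolExecutor


def block_ranges(n, parts):
    """Split iterations 0..n-1 into `parts` contiguous, nearly equal blocks."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for k in range(parts):
        size = base + (1 if k < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def axpy_elementwise_parallel(a, x, y, num_ips=4):
    """y[i] = a*x[i] + y[i], with the loop iterations split across workers.

    Threads stand in for the IPs of one node; they share `x` and `y`
    (a single address space) and each writes a disjoint index range.
    """
    # Scalar part: runs on one processor before the loop.
    n = len(x)

    def loop_part(bounds):
        lo, hi = bounds
        for i in range(lo, hi):
            y[i] = a * x[i] + y[i]

    # "Start Parallel Inst.": every worker receives its block of iterations.
    with ThreadPoolExecutor(max_workers=num_ips) as pool:
        list(pool.map(loop_part, block_ranges(n, num_ips)))
    # "End Parallel Inst.": leaving the with-block waits for all workers,
    # after which the scalar part continues.
    return y


if __name__ == "__main__":
    print(block_ranges(10, 4))
    print(axpy_elementwise_parallel(2.0, [float(i) for i in range(10)], [1.0] * 10))
```

Because every worker writes a different slice of `y`, no locking is needed; the only synchronization is the join at the end of the loop, which is the step COMPAS performs in hardware.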

slide10

Physical Data of SR8000

Example: 128-node configuration (G1 model)

- Power consumption: approx. 370 kVA
- Heat dissipation: approx. 330 kW
- Cooling air inlet temperature: 16 to 22 °C
- Weight: approx. 15,000 kg
- Floor space: approx. 50 m² (including service area)
- Footprint (128 nodes): approx. 8.0 m × 3.3 m
- Height: approx. 1.8 m
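The site-planning figures above can be turned into per-node and per-area numbers with simple arithmetic. This sketch uses only the approximate values on the slide. The per-node values are plain averages that fold the crossbar and other shared hardware into each node, and reading the remaining floor space as service area is an assumption.

```python
# Figures copied from the slide (128-node G1 configuration, all approximate).
NODES = 128
POWER_KVA = 370.0
HEAT_KW = 330.0
WEIGHT_KG = 15_000.0
FLOOR_M2 = 50.0           # including service area
FOOTPRINT_M = (8.0, 3.3)  # length x width of the cabinets

per_node_kva = POWER_KVA / NODES
per_node_heat_kw = HEAT_KW / NODES
floor_loading = WEIGHT_KG / FLOOR_M2           # averaged over the whole area
footprint_m2 = FOOTPRINT_M[0] * FOOTPRINT_M[1]

print(f"power per node     ~{per_node_kva:.2f} kVA")
print(f"heat per node      ~{per_node_heat_kw:.2f} kW")
print(f"average loading    ~{floor_loading:.0f} kg/m^2 over {FLOOR_M2:.0f} m^2")
print(f"cabinet footprint  ~{footprint_m2:.1f} m^2")
```

The footprint of roughly 26 m² is about half of the 50 m² floor-space figure, so presumably the other half is service area around the cabinets.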

slide11
Overview of Software Products

OS

- HI-UX/MPP (OSF/1 microkernel-based OS)
- NQS, BGT, DIFF, SFF, PFF

Program Development

- Language processors: Optimizing FORTRAN77/90, HPF, Optimizing C, C++, OpenMP (Ver. 1)
- Parallel libraries: MPI-2, PVM, PARALLELWARE
- Numerical calculation: MATRIX/MPP, MATRIX/MPP/SSS, MSL2
- Development support: Symbolic Debugger (Optimizing C/FORTRAN90), Performance Monitor (for HP-UX)

Graphics

- GUI: X11R6, Motif 1.2
- Graphic libraries: GKS, PEX, PHIGS

Network

- Ethernet/Fast Ethernet, GbE, HIPPI, ATM
- TCP/IP, NFS V3, telnet, rlogin

slide12

Single UNIX System

  • Single UNIX System: single system operation (file system, process control, network)
  • Open System (standardized OS, compiler, network)
  • Flexible System Operation (partitioning operation, automatic operation)
  • Scalable System (4 to 512 nodes)

[Figure: SR8000 nodes linked by the 3D-XB and managed as a single UNIX system, with disk, RAID, HIPPI, console and graphics devices attached; the system connects over Ethernet and the network to 3500 Series, H-9000V Series and SR2000 Series machines, workstations, PCs, X terminals and other vendors' systems (SGI, etc.). Node software: a UNIX (OSF/1) server providing functional co-operation with the other nodes, running on a micro-kernel that controls all IPs. COMPAS feature: within a node, the IPs and the SP share main storage.]

IP: Instruction Processor

3D-XB: 3-dimensional Cross-bar Network

COMPAS: CO-operative Micro-Processors in single Address Space

slide13

Remote DMA Transfer

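● Direct memory copy between user programs on different nodes, minimizing OS overhead (protocol processing, context switches, interrupt handling)

[Figure: in a normal transfer, the program's data is memory-copied into an OS send buffer, sent over the crossbar network into an OS receive buffer on the other node, and memory-copied again into the receiving program. In a remote DMA transfer, data moves directly between the two programs' memory over the crossbar network, with no buffering in the kernel and no OS system call.]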
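The difference between the two paths in the figure can be sketched by modelling each node as a pair of buffers and recording every data movement. The `Node` class and both transfer functions are hypothetical illustrations of the data flow only; they do not perform real DMA and are not an SR8000 or MPI API.

```python
class Node:
    """Hypothetical node with a user-program buffer and an OS (kernel) buffer."""

    def __init__(self, size):
        self.user = bytearray(size)
        self.kernel = bytearray(size)


def normal_transfer(sender, receiver, n):
    """Kernel-buffered path, as on the 'Normal Transfer' side of the slide."""
    steps = []
    sender.kernel[:n] = sender.user[:n]
    steps.append("memory copy: sender program -> OS send buffer")
    receiver.kernel[:n] = sender.kernel[:n]
    steps.append("crossbar: send buffer -> receive buffer")
    receiver.user[:n] = receiver.kernel[:n]
    steps.append("memory copy: OS receive buffer -> receiver program")
    return steps


def remote_dma_transfer(sender, receiver, n):
    """Remote DMA path: user memory to user memory, no kernel buffering."""
    receiver.user[:n] = sender.user[:n]
    return ["crossbar: sender program memory -> receiver program memory"]


if __name__ == "__main__":
    a, b = Node(16), Node(16)
    a.user[:5] = b"hello"
    for step in normal_transfer(a, b, 5):
        print(step)
    for step in remote_dma_transfer(a, b, 5):
        print(step)
```

Both paths deliver the same bytes, but the buffered one moves the data three times and, in a real system, also pays for the system calls, context switches and interrupts listed above.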

slide14

Examples of ISV Packages

- Structural analysis: MSC.Nastran, MSC.Marc, LS-DYNA, PAM-CRASH, ABAQUS/Standard, ABAQUS/Explicit
- Computational fluid dynamics: STAR-CD, PHOENICS, SCRYU, STREAM, FLUENT
- Chemical analysis: GAUSSIAN98, AMBER
- Libraries: NAG, IMSL
- Tools: TotalView, Vampir, AVS/EXPRESS

slide15

SR8000 Installation Sites (Example)

- Leibniz Rechenzentrum (Germany)
- High Energy Accelerator Research Organization
- University of Tokyo
- Japan Meteorological Agency
- University of Tokyo / Institute for Solid State Physics
- Tsukuba Advanced Computing Center (TACC) / AIST
- Meteorological Research Institute
- Hokkaido University
- Institute of Statistical Mathematics
- HWW / Universität Stuttgart & DLR (Germany)
- ...

slide16

TOP500 Supercomputing Sites - November 3rd, 2000

Rmax/Rpeak > 75% (Rmax: measured LINPACK performance; Rpeak: theoretical peak performance): the Hitachi SR8000 works efficiently.

slide17

TOP500 Supercomputing Sites - November 3rd, 2000

Rmax/Rpeak = 85.3% on SR8000/128

Rmax/Rpeak = 90.0% on SR8000-E1/80

The Hitachi SR8000 works efficiently.

slide18

SR8000 F1 & G1 LINPACK Performance

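[Chart: LINPACK results for SR8000F1 and SR8000G1]

313.30 Gflop/s on SR8000F1/32 with Nmax=65000

↓ 6% speed-up (Nmax raised from 65000 to 84800)

331.50 Gflop/s on SR8000F1/32 with Nmax=84800

↓ 20% speed-up (model F1 → G1 at the same Nmax)

398.50 Gflop/s on SR8000G1/32 with Nmax=84800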
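The rounded percentages on this slide can be checked directly from the three quoted results. The sketch below uses only the numbers above; the overall and per-node figures are derived here and do not appear on the slide.

```python
# Gflop/s values and problem sizes as quoted on the slide (32-node systems).
runs = [
    ("SR8000F1/32", 65000, 313.30),
    ("SR8000F1/32", 84800, 331.50),
    ("SR8000G1/32", 84800, 398.50),
]


def speedup_percent(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


for (name0, n0, r0), (name1, n1, r1) in zip(runs, runs[1:]):
    print(f"{name0} N={n0} -> {name1} N={n1}: +{speedup_percent(r1, r0):.1f}%")

print(f"overall: +{speedup_percent(runs[2][2], runs[0][2]):.1f}%")
print(f"per node on SR8000G1/32: {runs[2][2] / 32:.2f} Gflop/s")
```

The steps come out at about +5.8% and +20.2%, matching the slide's 6% and 20%, and compound to roughly +27% from the first F1 run to the G1 run.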

slide19

NAS Parallel Benchmark (FT)

Model G1 is 1.28 to 1.30 times faster than Model F1.

FT: a 3-D fast Fourier transform (FFT) partial differential equation benchmark
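The core of FT is a 3-D FFT. One common way to compute it is to apply 1-D FFTs along each axis in turn (x, then y, then z). The pure-Python sketch below shows only that decomposition with a textbook radix-2 FFT; it is not the NAS reference code. Grid sizes are assumed to be powers of two, and the `grid[z][y][x]` layout is a choice made for this example.

```python
import cmath


def fft(a):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft3d(grid):
    """3-D FFT of grid[z][y][x] computed as 1-D FFTs along x, then y, then z."""
    nz, ny, nx = len(grid), len(grid[0]), len(grid[0][0])
    # Along x: transform every row of every plane.
    g = [[fft(row) for row in plane] for plane in grid]
    # Along y: transform every column of each plane, then transpose back.
    g = [[list(r) for r in zip(*[fft(list(c)) for c in zip(*plane)])] for plane in g]
    # Along z: transform every pencil that runs through the planes.
    for y in range(ny):
        for x in range(nx):
            pencil = fft([g[z][y][x] for z in range(nz)])
            for z in range(nz):
                g[z][y][x] = pencil[z]
    return g


if __name__ == "__main__":
    n = 4
    delta = [[[1.0 if (x, y, z) == (0, 0, 0) else 0.0 for x in range(n)]
              for y in range(n)] for z in range(n)]
    print(fft3d(delta)[1][2][3])  # a unit impulse has a flat spectrum
```

Each axis pass is a batch of independent 1-D FFTs. On a distributed-memory machine the passes are typically separated by a data transpose between nodes, so FT exercises both floating-point speed and inter-node communication.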

slide20

NAS Parallel Benchmark (MG)

Model G1 is 1.22 to 1.24 times faster than Model F1.

MG: a simple 3-D multigrid benchmark
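MG applies multigrid V-cycles to a 3-D Poisson problem. The sketch below shows the same V-cycle structure (smooth, restrict the residual, correct on a coarser grid, interpolate back, smooth again) on a 1-D Poisson problem, `-u'' = f` with zero boundary values. The 1-D setting, weighted Jacobi smoothing with three sweeps, full-weighting restriction and linear interpolation are choices made for this illustration, not the NAS benchmark's exact operators.

```python
def apply_laplacian(u, h):
    """Discrete -u'' with zero Dirichlet boundaries: (2u_i - u_{i-1} - u_{i+1}) / h^2."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / (h * h))
    return out


def residual(u, f, h):
    return [fi - ai for fi, ai in zip(f, apply_laplacian(u, h))]


def smooth(u, f, h, sweeps=3, w=2.0 / 3.0):
    """Weighted Jacobi sweeps."""
    n = len(u)
    for _ in range(sweeps):
        new = []
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            jacobi = 0.5 * (left + right + h * h * f[i])
            new.append((1.0 - w) * u[i] + w * jacobi)
        u = new
    return u


def restrict(r):
    """Full weighting: fine grid of 2m+1 points -> coarse grid of m points."""
    m = (len(r) - 1) // 2
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2] for j in range(m)]


def prolong(e, n):
    """Linear interpolation: coarse grid of m points -> fine grid of n = 2m+1 points."""
    m = len(e)
    out = [0.0] * n
    for j in range(m):
        out[2 * j + 1] = e[j]          # points shared with the coarse grid
    for i in range(0, n, 2):           # midpoints between coarse points
        left = e[i // 2 - 1] if i // 2 - 1 >= 0 else 0.0
        right = e[i // 2] if i // 2 < m else 0.0
        out[i] = 0.5 * (left + right)
    return out


def v_cycle(u, f, h):
    n = len(u)
    if n == 1:                         # coarsest grid: solve 2u/h^2 = f exactly
        return [0.5 * h * h * f[0]]
    u = smooth(u, f, h)                # pre-smoothing
    rc = restrict(residual(u, f, h))   # move the residual to the coarse grid
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h)  # coarse-grid correction
    u = [ui + ei for ui, ei in zip(u, prolong(ec, n))]
    return smooth(u, f, h)             # post-smoothing


if __name__ == "__main__":
    n = 63                             # 2**6 - 1 interior points
    h = 1.0 / (n + 1)
    f = [1.0] * n
    u = [0.0] * n
    for cycle in range(10):
        u = v_cycle(u, f, h)
        print(cycle, max(abs(r) for r in residual(u, f, h)))
```

Each V-cycle cuts the residual by a roughly constant factor regardless of grid size. Most of the work sits on the finest grids, while the coarse grids involve little data per processor, which is why MG stresses both memory bandwidth and communication latency.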

slide21

MPI Ping-Pong Performance

Remote DMA (Direct Memory Access) is sender-driven and copies data directly from memory to memory.

Remote DMA provides high-speed inter-processor communication without redundant copying.
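Ping-pong results are usually read through a simple latency plus bandwidth model. The sketch below applies that model and adds an optional term for extra memory-to-memory copies, to show how avoiding redundant copying raises the achievable bandwidth. All parameter values in the example are hypothetical; the slide gives no latency or bandwidth figures.

```python
def one_way_time(nbytes, latency_s, link_bw, copies=0, copy_bw=None):
    """Simple latency/bandwidth model of one message.

    latency_s: fixed start-up cost per message (seconds)
    link_bw:   network bandwidth (bytes/s)
    copies:    extra memory-to-memory copies (e.g. through kernel buffers)
    copy_bw:   memory copy bandwidth (bytes/s), required when copies > 0
    """
    t = latency_s + nbytes / link_bw
    if copies:
        t += copies * nbytes / copy_bw
    return t


def pingpong_bandwidth(nbytes, **model):
    """Bandwidth as a ping-pong test reports it: bytes / (round-trip time / 2)."""
    round_trip = 2.0 * one_way_time(nbytes, **model)
    return nbytes / (round_trip / 2.0)


def half_bandwidth_size(latency_s, link_bw):
    """Message size at which the copy-free model reaches half of link_bw."""
    return latency_s * link_bw


if __name__ == "__main__":
    model = dict(latency_s=10e-6, link_bw=1e9)  # hypothetical values
    for size in (1_000, 10_000, 100_000, 1_000_000):
        direct = pingpong_bandwidth(size, **model)
        copied = pingpong_bandwidth(size, copies=2, copy_bw=1e9, **model)
        print(f"{size:>9} B  direct {direct / 1e6:7.1f} MB/s  two copies {copied / 1e6:7.1f} MB/s")
```

Small messages are dominated by the start-up cost. For large messages each extra copy adds time in proportion to the message size, so removing the send and receive buffer copies, as remote DMA does, mainly pays off at the large-message end of the ping-pong curve.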
