Cray SV1

by Kent Milfeld

U. of Texas, ACCES (Advanced Computing Center for Engineering and Science)


SV1 OUTLINE

  • SV1 Processor

  • SV1 Memory

  • Multi-Streaming Processing (MSP)

  • GigaRing


Cray SV1

aurora.hpc.utexas.edu

  • SMP System

  • 16 Processors

  • 16 GB Memory

  • Vector Processors

  • 64-bit representation

  • SV1-1A Evolved from J90 Series

  • Air Cooled, 4 processors/board


Software Overview

  • Queuing system

    NQS/NQE

  • Compilers and Programming Tools

    F90, C++, C

    { totalview, apprentice, ATExpert, profview, hpm }

  • Libraries

    libsci, FFIO, IMSL, NAG

  • Applications and Tools

    abaqus, ls-dyna3d …, G98, gamess, amber, …


Programming Model(s)

  • Shared Memory

    • Multitasking, OpenMP, Pthreads

    • MPI available through MPT (Message Passing Toolkit)

    • MSP (Multi-Streaming Processing)


Performance Considerations

  • 1 (scalar, local): Cache Latency, Instruction Issue; 0.5-1.0*

  • 2 (scalar, global): Memory Latency, Instruction Issue; 0.5*

  • 3 (vector, local): Cache Bandwidth; 0.75*

  • 4 (vector, global): Memory Bandwidth; .25-.3*

  • Other chart labels: pipelining or vectorization, cache blocking, Optimal Coding Style

*Performance Relative to T90
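
The "cache blocking" label refers to restructuring loops so that the data being worked on fits in cache. As a rough illustration of that loop structure only, here is a minimal Python sketch of a tiled matrix transpose. The function name and the tile size of 64 are assumptions made for the example, not values from the slides, and Python itself will not show the speedup that a compiled vector code would.

```python
def blocked_transpose(a, block=64):
    """Transpose a list-of-lists matrix one block x block tile at a time.

    `block` is a hypothetical tile size; on a real machine it would be
    chosen so that a source tile and a destination tile fit in cache.
    """
    n_rows = len(a)
    n_cols = len(a[0]) if a else 0
    out = [[None] * n_rows for _ in range(n_cols)]
    for i0 in range(0, n_rows, block):        # tile origin along rows
        for j0 in range(0, n_cols, block):    # tile origin along columns
            # Finish one tile completely before moving to the next one.
            for i in range(i0, min(i0 + block, n_rows)):
                for j in range(j0, min(j0 + block, n_cols)):
                    out[j][i] = a[i][j]
    return out
```

The two outer loops walk over tiles and the two inner loops stay inside one tile, which is the essential shape of a blocked loop nest.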


SV1 Processor

  • 300 MHz, 0.18 µm technology

  • 2-Vector Pipes

  • Pipes (8, 9, 16 CP pipes for +, *, /)

  • 1.2GF = 300MHz*2(flops/triad)*2(pipes)

  • 256KB Vector/Scalar-Cache

  • Cray 64-bit FP Representation
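
The peak-rate line above is a simple product, and a short Python sketch makes the terms explicit. It reads "2(flops/triad)" as one add plus one multiply completing per pipe per clock, which is an interpretation of the slide; the function and parameter names are illustrative.

```python
def peak_gflops(clock_mhz, flops_per_pipe_per_clock, pipes):
    """Peak rate in GFLOPS = clock * flops per pipe per clock * pipes."""
    return clock_mhz * flops_per_pipe_per_clock * pipes / 1000.0

# Values taken from the slide: 300 MHz, 2 flops per triad, 2 vector pipes.
sv1_cpu = peak_gflops(clock_mhz=300, flops_per_pipe_per_clock=2, pipes=2)
print(f"SV1 CPU peak: {sv1_cpu} GFLOPS")
```

With the slide's numbers this reproduces the quoted 1.2GF per processor.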


J90 Block Diagram

  • Functional Units: Vector Functional Units, Scalar Functional Units, Address Functional Units

  • Registers: 8x64 Vector Registers, 8 S, 64 T, 8 A, 64 B

  • Shared Registers: 32SM, 8SB, 8ST

  • Execution: Instruction Buffers 8x32

  • 128W Cache

  • Memory


SV1 Block Diagram (drawn over the J90 Block Diagram)

  • Functional Units: Vector Functional Units, 2nd Vector Functional Units, Scalar Functional Units, Address Functional Units

  • Registers: 8x64 Vector Registers, 8 S, 64 T, 8 A, 64 B

  • Shared Registers: 32SM, 8SB, 8ST

  • Execution: Instruction Buffers 8x32

  • 32KW “Vector/Scalar” Cache (J90 diagram: 128W Cache)

  • Memory


Vector/Scalar Cache

  • PE 1: CPU (1.2GF), Cache (256KB)

  • Cache: 4-way associative, 1 word per cache line

  • 9.6 GB/s (labelled twice in the diagram)

  • VA/VB: Memory fan-in/fan-out

  • Memory
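
The cache parameters on this slide fix its geometry, which the following Python sketch works out. It also shows why a large power-of-two stride can defeat a set-associative cache. The mapping of addresses to sets by low-order word-address bits is an assumption made for illustration; the slides do not describe the actual SV1 indexing scheme.

```python
WORD_BYTES = 8  # one 64-bit Cray word

def cache_geometry(size_bytes, ways, words_per_line=1):
    """Return (total_words, number_of_sets) for a set-associative cache."""
    total_words = size_bytes // WORD_BYTES
    lines = total_words // words_per_line
    return total_words, lines // ways

def set_index(word_address, num_sets):
    """ASSUMPTION: the set is selected by the low-order word-address bits."""
    return word_address % num_sets

# Slide values: 256KB, 4-way associative, 1 word per cache line.
total_words, num_sets = cache_geometry(256 * 1024, ways=4)

# Under the assumed mapping, a stride equal to num_sets sends every
# reference to the same set, so only 4 such words can be cached at once.
conflicting = {set_index(i * num_sets, num_sets) for i in range(16)}
unit_stride = {set_index(i, num_sets) for i in range(16)}
print(total_words, num_sets, len(conflicting), len(unit_stride))
```

The word count of 32768 agrees with the 32KW figure on the SV1 block diagram.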


Bandwidth Limits

  • CPU Modules: CPUs 0, 1, 2, 3 / 4, 5, 6, 7 / 8, 9, 10, 11 / 12, 13, 14, 15

  • Per CPU: 4 reads, or 2 reads and 2 writes

  • Per Module: 8 reads, or 4 writes + 4 reads (8 different sections)

  • Module Interface

  • Memory: section 0, section 1, section 2, section 3, section 4, section 5, section 6, section 7
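
The per-CPU limit of 4 memory references can be turned into a bandwidth figure with a short calculation. The sketch below assumes the limit applies per 300 MHz processor clock and that each reference moves one 64-bit word. Neither assumption is stated on the slide, but together they give the 9.6 GB/s shown on the Vector/Scalar Cache slide.

```python
WORD_BYTES = 8          # one 64-bit word per reference (assumed)
CPU_CLOCK_HZ = 300e6    # from the SV1 Processor slide

def bandwidth_gb_s(refs_per_clock, clock_hz, word_bytes=WORD_BYTES):
    """Bandwidth in GB/s for a given number of word references per clock."""
    return refs_per_clock * word_bytes * clock_hz / 1e9

# 4 reads, or 2 reads and 2 writes: 4 references per clock either way.
per_cpu = bandwidth_gb_s(refs_per_clock=4, clock_hz=CPU_CLOCK_HZ)
print(f"Per-CPU bandwidth: {per_cpu:.1f} GB/s")
```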


SV1 Multiprocessing

  • Shared processors

    • Autotasking (autotask lib: Compiler/Directives)

    • OpenMP (Directives)

  • “Dedicated” processors

    • MSP (Multi-Streaming Processor), 4 CPUs

    • Implemented in software (by compiler).

    • Compiler creates multiple instruction streams for vector operations on each PE (does not use autotask lib)
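
To make the idea of multiple instruction streams concrete, here is a minimal Python sketch that splits one loop into four pieces, one per PE. The contiguous, near-equal partitioning and both function names are assumptions for illustration; the slides do not say how the compiler actually divides the work, and the chunks here run one after another rather than in parallel.

```python
def split_iterations(n, streams=4):
    """Split range(n) into `streams` contiguous, near-equal chunks."""
    base, extra = divmod(n, streams)
    chunks, start = [], 0
    for s in range(streams):
        size = base + (1 if s < extra else 0)   # spread the remainder
        chunks.append(range(start, start + size))
        start += size
    return chunks

def streamed_triad(a, b, c, scalar, streams=4):
    """Compute a[i] = b[i] + scalar * c[i], one chunk per hypothetical PE."""
    for chunk in split_iterations(len(a), streams):
        # On an MSP each chunk would be vector code running on its own PE.
        for i in chunk:
            a[i] = b[i] + scalar * c[i]
    return a
```

For example, `split_iterations(10)` yields chunks of 3, 3, 2, and 2 iterations that together cover indices 0 through 9 exactly once.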


8-Pipe MSP

  • Four 4-PE Modules

  • MSP: PE 1, PE 2, PE 3, PE 4

  • 4.8 GFLOPS, 8 Pipes

  • 6.4 GB/s to Memory
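
The MSP figures follow from the per-processor numbers, and dividing memory bandwidth by peak rate gives a rough measure of machine balance. The Python sketch below does that arithmetic. The per-PE peak of 1.2 GFLOPS and 2 pipes come from the SV1 Processor slide; treating 6.4 GB/s as the memory bandwidth shared by the whole MSP is one reading of the diagram.

```python
PES_PER_MSP = 4
CPU_PEAK_GFLOPS = 1.2    # per PE, from the SV1 Processor slide
PIPES_PER_CPU = 2
MSP_MEMORY_GB_S = 6.4    # from the MSP diagram
WORD_BYTES = 8           # one 64-bit word

msp_peak = PES_PER_MSP * CPU_PEAK_GFLOPS        # GFLOPS
msp_pipes = PES_PER_MSP * PIPES_PER_CPU
bytes_per_flop = MSP_MEMORY_GB_S / msp_peak     # memory bytes per peak flop
words_per_flop = bytes_per_flop / WORD_BYTES

print(f"MSP peak: {msp_peak:.1f} GFLOPS on {msp_pipes} pipes")
print(f"Memory balance: {words_per_flop:.3f} words per flop")
```

Under that reading, memory supplies about one word for every six peak flops, which is consistent with the emphasis on cache blocking for vector code.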


GigaRing

  • Two “counter rotating rings”, each 400MB/sec.

  • One GigaRing channel adapter per module.

  • Clusters are interconnected through a GigaRing.
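
One consequence of two counter-rotating rings is that any node can be reached in either direction, so the shorter path can be used. The Python sketch below illustrates that idea on a generic ring. It is not a description of the actual GigaRing routing protocol, the function name is hypothetical, and the aggregate figure assumes both rings carry data at the same time.

```python
RING_MB_S = 400  # per ring, from the slide

def shortest_direction(src, dst, nodes):
    """Return (direction, hops) for the shorter way around a ring of `nodes`."""
    forward = (dst - src) % nodes     # hops on one ring
    backward = (src - dst) % nodes    # hops on the counter-rotating ring
    if forward <= backward:
        return ("forward", forward)
    return ("backward", backward)

# ASSUMPTION: both rings are usable simultaneously.
aggregate_mb_s = 2 * RING_MB_S
```

For example, on an 8-node ring `shortest_direction(0, 7, 8)` picks the backward ring with a single hop instead of seven hops forward.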


GigaRing Topology

  • Diagram labels: machine, DISK, TAPE, SV1, Ethernet, FDDI, HiPPI


GigaRing

  • 64 bit Client Interface, Fault Tolerant

  • MPN: FDDI, ATM, SCSI, Ethernet

  • FCN: RAID-3, 100 MB/sec (2 channels)

  • HPN: HiPPI (100/200 MB/s)

