
Cray SV1

Cray SV1, by Kent Milfeld, U. of Texas, ACCES (Advanced Computing Center for Engineering and Science). Outline: SV1 Processor, SV1 Memory, Multi-Streaming Processing (MSP), GigaRing. System: aurora.hpc.utexas.edu, an SMP system with 16 vector processors and 16 GB memory.

Presentation Transcript


  1. Cray SV1 • by Kent Milfeld, U. of Texas, ACCES (Advanced Computing Center for Engineering and Science)

  2. SV1 OUTLINE • SV1 Processor • SV1 Memory • Multi-Streaming Processing (MSP) • GigaRing

  3. Cray SV1 • aurora.hpc.utexas.edu • SMP System • 16 Processors • 16 GB Memory • Vector Processors • 64-bit representation • SV1-1A Evolved from J90 Series • Air Cooled, 4 processors/board

  4. Software Overview • Queuing system: NQS/NQE • Compilers and Programming Tools: F90, C++, C { totalview, apprentice, ATExpert, profview, hpm } • Libraries: libsci, FFIO, IMSL, NAG • Applications and Tools: abaqus, ls-dyna3d …, G98, gamess, amber, …

  5. Programming Model(s) • Shared Memory: Multitasking, OpenMP, Pthreads • MPI, available through MPT (Message Passing Toolkit) • MSP (MultiStreaming Processing)

  6. Performance Considerations (values marked * are performance relative to T90) • Local: (1) Cache Latency, Instruction Issue: 0.5-1.0*; (3) Cache Bandwidth: 0.75* • Global: (2) Memory Latency, Instruction Issue: 0.5*; (4) Memory Bandwidth: .25-.3* • Coding approaches labelled on the slide: scalar pipelining or vectorization; vector cache blocking; Optimal Coding Style
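
The "vector cache blocking" item on this slide can be illustrated with a tiled loop nest. The following is a minimal Python sketch, not code from the presentation: `tile_size` and `blocked_matmul` are hypothetical names, and the tile-size rule (three square tiles of 64-bit words fitting in the 256KB cache quoted on the processor slide, with 1 KB taken as 1024 bytes) is an assumption chosen for illustration.

```python
import math

# Assumption: 256KB cache, 8-byte (64-bit) words, 1 KB = 1024 bytes.
CACHE_WORDS = 256 * 1024 // 8


def tile_size(cache_words, tiles=3):
    """Largest square tile edge such that `tiles` tiles fit in the cache."""
    return math.isqrt(cache_words // tiles)


def blocked_matmul(a, b, n, bs):
    """Multiply two n x n matrices (lists of lists) tile by tile.

    Each (ii, kk, jj) step touches one bs x bs tile of a, b and c, so the
    working set stays small enough to be reused from cache.
    """
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                for i in range(ii, min(ii + bs, n)):
                    row_c = c[i]
                    for k in range(kk, min(kk + bs, n)):
                        aik = a[i][k]
                        row_b = b[k]
                        # Innermost loop is unit-stride: the vectorizable part.
                        for j in range(jj, min(jj + bs, n)):
                            row_c[j] += aik * row_b[j]
    return c


if __name__ == "__main__":
    print("tile edge for three tiles in cache:", tile_size(CACHE_WORDS))
```

On a real SV1 the equivalent loop nest would be written in Fortran or C and the tile edge tuned by measurement; the rule above only gives a starting point.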

  7. SV1 Processor • 300 MHz, 0.18 µm technology • 2 Vector Pipes • Pipes (8, 9, 16 CP pipes for +, *, /) • 1.2 GF = 300 MHz * 2 (flops/triad) * 2 (pipes) • 256KB Vector/Scalar Cache • Cray 64-bit FP Representation
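
The peak-rate formula on this slide can be checked with a few lines of arithmetic. This is only a sketch of the slide's own numbers; the function name `peak_gflops` and its parameter names are illustrative, not part of any Cray tool.

```python
def peak_gflops(clock_mhz, flops_per_pipe_per_clock, pipes):
    """Peak floating-point rate in GFLOPS from clock rate and pipe count."""
    return clock_mhz * 1e6 * flops_per_pipe_per_clock * pipes / 1e9


# Slide values: 300 MHz clock, 2 flops per triad (one multiply, one add),
# 2 vector pipes.
sv1_peak = peak_gflops(clock_mhz=300, flops_per_pipe_per_clock=2, pipes=2)

if __name__ == "__main__":
    print(f"SV1 peak per CPU: {sv1_peak:.1f} GFLOPS")
```

The figure is a theoretical peak: it assumes every clock period issues a chained multiply-add on both pipes.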

  8. J90 Block Diagram (diagram) • Functional Units: Vector, Scalar, Address • Registers: 8 S, 8 A, 64 T, 64 B, 8x64 Vector Registers • Shared Registers: 32 SM, 8 SB, 8 ST • Instruction Buffers: 8x32 • Execution • 128W Cache • Memory

  9. SV1 Block Diagram (diagram, shown against the J90 Block Diagram) • Functional Units: Vector, Scalar, Address, plus 2nd Vector Functional Units • Registers: 8 S, 8 A, 64 T, 64 B, 8x64 Vector Registers • Shared Registers: 32 SM, 8 SB, 8 ST • Instruction Buffers: 8x32 • Execution • 32KW "Vector/Scalar" Cache (J90: 128W Cache) • Memory

  10. Vector/Scalar Cache (diagram) • CPU (PE 1): 1.2 GF • 256KB Cache, 4-way associative, 1 word per cache line • 9.6 GB/s (labelled twice in the diagram) • VA/VB • Memory fan-in/fan-out • Memory
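
The cache figures on this slide imply a simple geometry and a bandwidth-to-compute balance, worked out below. This is a sketch under stated assumptions: a word is 8 bytes (the 64-bit representation mentioned earlier), 1 KB is 1024 bytes, and 1 GB/s is 10^9 bytes per second. All variable names are illustrative.

```python
CACHE_BYTES = 256 * 1024      # 256KB cache (assumes 1 KB = 1024 bytes)
WORD_BYTES = 8                # 64-bit word
WAYS = 4                      # 4-way associative
WORDS_PER_LINE = 1            # 1 word per cache line

cache_words = CACHE_BYTES // WORD_BYTES        # 32768 words, i.e. 32 KW
cache_lines = cache_words // WORDS_PER_LINE    # one line per word
cache_sets = cache_lines // WAYS               # 8192 sets

BANDWIDTH_BYTES_PER_S = 9.6e9   # 9.6 GB/s (assumes 1 GB = 10**9 bytes)
PEAK_FLOPS = 1.2e9              # 1.2 GF per CPU

words_per_second = BANDWIDTH_BYTES_PER_S / WORD_BYTES
words_per_flop = words_per_second / PEAK_FLOPS  # 1.0 word per flop

if __name__ == "__main__":
    print("cache words:", cache_words)
    print("cache sets:", cache_sets)
    print("words per flop at 9.6 GB/s:", words_per_flop)
```

The 32768-word result matches the 32KW "Vector/Scalar" cache label in the SV1 block diagram, and 9.6 GB/s works out to one 64-bit word per peak flop.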

  11. Bandwidth Limits (diagram) • CPU Modules: CPUs 0-15, grouped 4 per module • Per CPU: 4 reads, or 2 reads + 2 writes • Per Module (Module Interface): 8 reads, or 4 writes + 4 reads (8 different sections) • Memory: sections 0 through 7

  12. SV1 Multiprocessing • Shared processors: Autotasking (autotask lib: Compiler/Directives); OpenMP (Directives) • "Dedicated" processors: MSP (Multi-Streaming Processor), 4 CPUs. Implemented in software (by the compiler). The compiler creates multiple instruction streams for vector operations on each PE (does not use the autotask lib)
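
The idea of splitting one vector operation into several instruction streams can be sketched as follows. This is an illustration only: the slide does not say how the compiler partitions work, so the contiguous-chunk split below is an assumption, and `split_iterations` and `streamed_triad` are hypothetical names. The streams run one after another here; on an MSP each chunk would run on its own PE.

```python
def split_iterations(n, streams=4):
    """Divide iterations 0..n-1 into `streams` contiguous, near-equal chunks."""
    base, extra = divmod(n, streams)
    chunks = []
    start = 0
    for s in range(streams):
        # The first `extra` streams take one additional iteration.
        size = base + (1 if s < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def streamed_triad(a, x, y, streams=4):
    """Compute a*x[i] + y[i], visiting the index space one stream at a time."""
    out = [0.0] * len(x)
    for chunk in split_iterations(len(x), streams):
        for i in chunk:  # each chunk stands in for one PE's vector loop
            out[i] = a * x[i] + y[i]
    return out


if __name__ == "__main__":
    for pe, chunk in enumerate(split_iterations(10), start=1):
        print(f"PE {pe}: iterations {list(chunk)}")
```

Because the chunks do not overlap, the result is the same as the single-stream loop regardless of the order in which the streams finish.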

  13. 8-Pipe MSP (diagram) • Four 4-PE Modules • MSP: PE 1, PE 2, PE 3, PE 4 • 4.8 GFLOPS, 8 Pipes • 6.4 GB/s Memory
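
The MSP totals on this slide follow from the per-CPU figures given earlier (1.2 GF and 2 pipes per PE). The sketch below reproduces that arithmetic. Reading the 6.4 GB/s label as the memory bandwidth available to one MSP is an assumption, as is taking 1 GB/s as 10^9 bytes per second; variable names are illustrative.

```python
PE_PEAK_GFLOPS = 1.2   # per-CPU peak from the processor slide
PIPES_PER_PE = 2       # vector pipes per CPU
PES_PER_MSP = 4        # an MSP gangs four PEs together

msp_peak_gflops = PE_PEAK_GFLOPS * PES_PER_MSP   # 4.8 GFLOPS
msp_pipes = PIPES_PER_PE * PES_PER_MSP           # 8 pipes

# Assumption: 6.4 GB/s is the memory bandwidth seen by the whole MSP.
MSP_MEMORY_GB_PER_S = 6.4
bytes_per_flop = MSP_MEMORY_GB_PER_S / msp_peak_gflops

if __name__ == "__main__":
    print(f"MSP peak: {msp_peak_gflops:.1f} GFLOPS on {msp_pipes} pipes")
    print(f"memory bytes per peak flop: {bytes_per_flop:.2f}")
```

Under that reading, an MSP gets well under one 64-bit word from memory per peak flop, which is consistent with the emphasis on cache reuse elsewhere in the deck.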

  14. GigaRing • Two “counter rotating rings”, each 400MB/sec. • One GigaRing channel adapter per module. • Clusters are interconnected through a GigaRing.

  15. GigaRing Topology (diagram) • Nodes shown: machine, DISK, TAPE, SV1, Ethernet, FDDI, HiPPI

  16. GigaRing • 64-bit Client Interface, Fault Tolerant • MPN: FDDI, ATM, SCSI, Ethernet • FCN: RAID 3, 100 MB/sec (2 channels) • HPN: HiPPI (100/200 MB/s)
