
CAS2001

CAS2001. NEC Supercomputers for Meteorological Applications: Road Map and Product Strategy. Oct. 29, 2001. Tadashi Watanabe.


Presentation Transcript


  1. CAS2001: NEC Supercomputers for Meteorological Applications. Road Map and Product Strategy. Oct. 29, 2001. Tadashi Watanabe.

  2. History of High Performance Computers
     [Chart: peak performance in FLOPS (log scale, 10^6 to 10^13, marked M, G, T) versus year (1970 to 2005). Group labels: Scalar, Vector, Vector Multiprocessors, Microprocessors, ASCI.]
     Systems plotted: CDC6600, CDC7600, TI ASC, STAR-100, ILLIAC IV, CRAY-1, CYBER205, S-810/20, VP-200, CRAY Y-MP, S-820/80, SX-2, CRAY Y-MP8, VP2600, SX-3/44, CRAY C90, SX-3/44R, S3800/480, CRAY T932, CM-5, SR2201, VPP500/222, T3E, VPP700/512, SX-4/512, SX-5/512, VPP5000, SX-6, SR8000F1, ASCI White, ASCI Q, Earth Simulator.

  3. Architecture of Supercomputers
     [Diagram: evolution of supercomputer architectures and the limitation that motivated each step.]
     Architecture classes: Scalar Processing; Vector Processing (Memory to Memory); Vector Processing (Vector Register); Shared Memory Multiprocessors; Distributed Memory Parallel Processor; Distributed Shared PP.
     Limitations and enablers noted: Bottleneck in Memory Throughput; Vector Register; Vectorizing Compiler; Performance Limitation by Single Processor; Multiprocessor; Parallelizing Compiler; Distributed Memory; Difficult to Code; Distributed Shared Memory; Performance Limitation by Scalar Processing.
     Block labels: Scalar Processor, Vector Processor, Vector Pipes, Vector Register, Main Memory, Network, SMP.
     Example systems (in slide order): CRAY-XMP/YMP, CRAY-C90/T90, SX-3/SX-4/SX-5, VP2000, S3800; VPP500, T3E, SP-2, CM5, nCUBE, PARAGON; CRAY-1, SX-2, VP-200, S810/S820; CYBER200; Mainframe, CDC6600/7600; SX-5/SX-6, RS6000/SP, O2K, TX7.

  4. Capacity Computing and Capability Computing
     Capacity Computing
     ・Goals: workload and throughput; wall clock time is secondary
     ・Many small problems - not challenging
     ・Fit on μ-P based MPP or workstation clusters
     Capability Computing
     ・Goal: wall clock time (TAT, turnaround time)
     ・Large critical problems - do not fit in conventional systems
     ・Best fit on SMP with powerful processors

  5. Products for Capacity and Capability Computing
     [Chart: Performance for Capability Computing versus Performance for Capacity Computing (Throughput).]
     ・SX-6 Supercomputer Series for Capability Computing
     ・IA-64 Scalable Server for Capacity Computing

  6. Vector Processor vs Scalar Processor
     ・Vector: Capability Computing
     ・Scalar: Capacity Computing
     [Chart: application areas placed by Data Size (Small to Large) and Arithmetic Operations (Small to Large), with a Vector Oriented region and a Scalar Oriented region. Areas shown: Weather/Climate, Genome, Chemistry, Crash, CFD, Structural Analysis.]
     AzusA: IA-64 Architecture, Itanium (800 MHz), max 16-way, max 64 GB shared memory

  7. AzusA (code name): NEC’s Strategic Itanium Product
     • The world’s first large-scale Itanium server in operation
     • Leverages NEC’s expertise in supercomputers and mainframes to develop highly scalable and reliable Itanium servers

  8. AzusA Features
     • AzusA advanced features by NEC original chipset
     • Based on expertise in supercomputers and mainframes
     • High Performance: 16 Intel Itanium(TM) processors
     • Large Memory Space: 64-bit addressing, 64 GB main memory
     • Large Configuration: 128 PCI slots (33 MHz) or 64 slots (66 MHz)
     • High Availability: replaceable parts are hot-swappable (CPU cell, PCI card, fan, power supply); data paths are ECC and/or parity protected
     • Flexibility: partitioning (into up to 4 systems)
     • Higher scalability, availability, and flexibility
     [Diagram labels: Cell#0 to Cell#3, each with CPU and MEM; NEC; PCI Box; 8 Disk Drives.]

  9. Future Products: NEC Itanium Server Roadmap
     [Chart: scalability (CPU count) versus year, 2000 to 2003, for Low-End, Midrange, and High-End lines across the Itanium, McKinley, and Madison processor generations. Labels: Itanium 4CPU, McKinley 4CPU, Madison 4CPU; McKinley 8CPU, Madison 8CPU; AzusA, Itanium 16CPU; 16-32 CPU; 32-512 CPU.]
     Note: plan subject to change

  10. SX-6: The facts

  11. SX-Series History
     The latest technology in the SX-Series: NEC introduces SX-6, a new generation of supercomputers, using the latest technology to build up and develop the SX-Series.
     • 1983, SX Series: the first computer in the world surpassing 1 GFLOPS
     • 1989, SX-3 Series: shared memory / multi-function processor; UNIX OS
     • 1994, SX-4 Series: innovative CMOS technology; entirely air-cooled
     • 1998, SX-5 Series: high sustained performance; large-capacity shared memory
     • 2001, SX-6 Series: single-chip vector processor; greater scalability
     Foundations: accumulated HPC technology; state-of-the-art board packaging technology; global alliances (CRAY, BULL); collaboration with ISVs and users.
     Goal: to be the market leader in the large-scale HPC market.

  12. SX-6 single node system • High performance supercomputer • Ultra-high bandwidth shared memory subsystem • Maximum 8 processors, 8 Gigaflops each • Maximum 64 Gigabyte memory • Maximum 64 Gigaflops per node

  13. SX-6 multi node system • Maximum 128 nodes • Maximum 1024 CPUs, max 8 TFLOPS • Internode crossbar Switch • 8 GB/s interconnect bandwidth per node • 1 TB/s maximum interconnect bandwidth per system
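The maximum figures quoted on slides 12 and 13 follow from one another by simple multiplication. The short sketch below only reproduces that arithmetic; the variable names are illustrative and the decimal convention (1 TFLOPS = 1000 GFLOPS) is an assumption.

```python
# Figures as quoted on slides 12 and 13 of this transcript.
GFLOPS_PER_CPU = 8              # "8 Gigaflops each"
CPUS_PER_NODE = 8               # "Maximum 8 processors"
MAX_NODES = 128                 # "Maximum 128 nodes"
INTERCONNECT_GB_S_PER_NODE = 8  # "8 GB/s interconnect bandwidth per node"

node_peak_gflops = GFLOPS_PER_CPU * CPUS_PER_NODE
total_cpus = CPUS_PER_NODE * MAX_NODES
# Assumption: decimal units, 1 TFLOPS = 1000 GFLOPS.
system_peak_tflops = node_peak_gflops * MAX_NODES / 1000
aggregate_interconnect_gb_s = INTERCONNECT_GB_S_PER_NODE * MAX_NODES

print(f"Node peak:       {node_peak_gflops} GFLOPS")
print(f"Total CPUs:      {total_cpus}")
print(f"System peak:     {system_peak_tflops:.3f} TFLOPS")
print(f"Interconnect:    {aggregate_interconnect_gb_s} GB/s aggregate")
```

The results (64 GFLOPS per node, 1024 CPUs, roughly 8 TFLOPS, roughly 1 TB/s aggregate) agree with the rounded maxima stated on the slides.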

  14. SX-6 system software • Proven Operating System: SUPER-UX • Development Tools: C, C++, Fortran90, MPI, OpenMP, Vampir/SX, TotalView • Enhanced Multi-Node Batch System • Enhanced System Management Tools • User friendly middleware

  15. Focus Markets
     • Environment & Meteorology: DMI, DKRZ, CHMI, IAP, INGV, …; MSC, INPE, BOM, KMA, JAMSTEC, NIES, ...
     • Aerospace: NLR, DLR, EADS Airbus, ONERA, NAL, ...
     • Automotive: IFP, Mecalog, Volkswagen, Porsche, DaimlerChrysler, Renault, Toyota, Mazda, Nissan, ...
     • Research: HLRS Stuttgart, CSCS, MPG, …; NIFS, Tohoku University, Osaka University, ...
     • Seismic: Veritas, IFP, ...

  16. SX-6: The technology

  17. SX Series Processor Evolution
     • SX-4: 2 GFLOPS at 8.0 ns; 0.35 μm CMOS LSI; 37 chips
     • SX-5: 8 GFLOPS at 4.0 ns; 0.25 μm CMOS; 32 chips; 16 vector pipes
     • SX-6: 8 GFLOPS at 2.0 ns; 0.15 μm CMOS; single-chip vector CPU; 8 vector pipes
     Dimensions shown: 225 x 225 mm, 457 x 386 mm
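The peak rates and clock cycle times on slide 17 imply a clock frequency and a number of floating-point operations per cycle for each processor. The sketch below works these out; it assumes the slide's "16 Vector Pipe" and "8 Vector Pipe" labels refer to SX-5 and SX-6 respectively, and the function and variable names are illustrative only.

```python
# (peak GFLOPS, clock cycle in ns) as listed on slide 17.
processors = {
    "SX-4": (2.0, 8.0),
    "SX-5": (8.0, 4.0),
    "SX-6": (8.0, 2.0),
}

# Assumption: pipe counts read from the slide labels in SX-5, SX-6 order.
vector_pipes = {"SX-5": 16, "SX-6": 8}


def clock_mhz(cycle_ns: float) -> float:
    """Clock frequency in MHz for a cycle time given in nanoseconds."""
    return 1000.0 / cycle_ns


def flops_per_cycle(peak_gflops: float, cycle_ns: float) -> float:
    """GFLOPS x ns = (1e9 flop/s) x (1e-9 s) = flops completed per cycle."""
    return peak_gflops * cycle_ns


for name, (peak, cycle) in processors.items():
    per_cycle = flops_per_cycle(peak, cycle)
    line = f"{name}: {clock_mhz(cycle):.0f} MHz, {per_cycle:.0f} flops/cycle"
    if name in vector_pipes:
        line += f", {per_cycle / vector_pipes[name]:.0f} flops/cycle per pipe"
    print(line)
```

Under these assumptions SX-5 and SX-6 both come out at the same rate per pipe per cycle, so the SX-6 reaches the same 8 GFLOPS peak with half the pipes by halving the cycle time.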

  18. SX Series Memory Evolution
     • SX-4: card 457 x 386 mm; 256 MB / card
     • SX-5: card 457 x 386 mm; 4-8 GB / card
     • SX-6: card 105 x 176 mm; 2 GB / card
     Memory chips (in slide order): 4Mb SSRAM, 32Mb SDRAM, 64-128Mb SDRAM, 256Mb DDR-SDRAM

  19. Size Comparison
     • SX-6: CPU 128 GFLOPS (64 GF x 2 nodes); memory 128 GB; 64 GF per cabinet
     • SX-5: CPU 160 GFLOPS; memory 128 GB
     [Figure: cabinet dimensions labeled 2.0 m, 1.1 m, 2.8 m, 1.8 m, 3.2 m, 6.8 m to 7.4 m.]

  20. SX-6: Parallel Processing and Performance

  21. Keys for Efficiencies in Parallel Processing ・Load Balancing ・Communication Overhead ・Synchronization

  22. Load Balancing
     ・Few/powerful CPUs: small number of large tasks
     ・Many/less powerful CPUs: large number of small tasks
     Which is more efficient and easier?

  23. Communication Overhead
     Many/less powerful CPUs
     ・Large number of small tasks
     ・Low bandwidth and many paths among CPUs
     Few/powerful CPUs
     ・Small number of large tasks
     ・High bandwidth and few paths among CPUs
     Which is more efficient and easier?

  24. Synchronization
     [Diagram: fork/join execution for each case.]
     ・Many/less powerful CPUs: large number of small tasks between fork and join
     ・Few/powerful CPUs: small number of large tasks between fork and join
     Which is more efficient and easier?
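The fork/join pattern on slide 24 can be sketched in a few lines. The example below is an illustration only, not anything from the presentation: it splits the same work into a few large tasks or many small tasks, forks them onto a thread pool, and joins the partial results. The helper name `fork_join_sum` and the task counts are hypothetical, and the code makes no performance measurement.

```python
from concurrent.futures import ThreadPoolExecutor


def fork_join_sum(data, n_tasks, n_workers=4):
    """Split data into about n_tasks chunks, fork one task per chunk, then join."""
    chunk = -(-len(data) // n_tasks)  # ceiling division
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Fork: one task per chunk. Leaving the with-block is the join point.
        partials = list(pool.map(sum, chunks))
    return sum(partials), len(chunks)


data = list(range(100_000))

# Small number of large tasks versus large number of small tasks.
few_total, few_tasks = fork_join_sum(data, n_tasks=4)
many_total, many_tasks = fork_join_sum(data, n_tasks=1000)

print(f"{few_tasks} tasks  -> total {few_total}")
print(f"{many_tasks} tasks -> total {many_total}")
```

Both decompositions give the same answer; what differs is how many tasks have to be dispatched and waited for at the join, which is the overhead the slide asks the audience to weigh.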

  25. NEC’s Approach for Capability Computing (SX-6 Systems Configuration)
     [Diagram: SX-6 nodes connected by the IXS.]
     Between nodes: IXS, full non-blocking X-bar (8 GB/s; bisection bandwidth)
     Within a node: memory with a large number of independent memory banks (4096 banks); full non-blocking X-bar (256 GB/s); 32 GB/s bandwidth; vector CPUs at 8 GF/CPU
     ・Few but powerful CPUs with vector
     ・Powerful SMP
     ・High bandwidth with non-blocking X-bar
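The bandwidth labels on slide 25 can be related to the peak rate per CPU. The sketch below assumes the "32GB/Sec Bandwidth" label is the memory bandwidth per CPU (the transcript does not say so explicitly) and uses the 8-CPU node size from slide 12; the variable names are illustrative.

```python
# Labels from slides 12 and 25; the per-CPU reading of 32 GB/s is an assumption.
MEM_BW_GB_S_PER_CPU = 32   # "32GB/Sec Bandwidth" (assumed to be per CPU)
PEAK_GFLOPS_PER_CPU = 8    # "8GF/CPU"
CPUS_PER_NODE = 8          # slide 12: maximum 8 processors per node
XBAR_GB_S = 256            # "Full Non-blocking X-bar (256GB/Sec)"

# Memory bytes available per floating-point operation at peak.
bytes_per_flop = MEM_BW_GB_S_PER_CPU / PEAK_GFLOPS_PER_CPU

# Aggregate bandwidth if every CPU in a node streams at its full rate.
node_bw_gb_s = MEM_BW_GB_S_PER_CPU * CPUS_PER_NODE

print(f"{bytes_per_flop:.1f} bytes per flop at peak")
print(f"{node_bw_gb_s} GB/s across {CPUS_PER_NODE} CPUs "
      f"(crossbar label: {XBAR_GB_S} GB/s)")
```

Under that reading, eight CPUs at 32 GB/s each account exactly for the 256 GB/s crossbar figure, which is consistent with the slide's "non-blocking" claim.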

  26. SX-6 vs SX-5
     [Chart, climate codes: improvement ratio of user time (SX-5 user time / SX-6 user time; axis 0.5 to 2.5) versus vector operation ratio (%; axis 97.0 to 100.0). Legend: SX-5 [8GF], SX-6 [8GF].]

  27. Performance on SX-6/SX-5 (Electro Magnetic Field Analysis)
     [Chart: effective GFLOPS (0 to 24) versus peak GFLOPS (0 to 64). Series: SX-6 [8GF], SX-5 [8GF]; points labeled 2, 4, and 8 CPUs.]

  28. Performance on SX-6/SX-5 (Crystal Structure Analysis)
     [Chart: effective GFLOPS (0 to 24) versus peak GFLOPS (0 to 64). Series: SX-6 [8GF], SX-5 [8GF]; points labeled 2, 4, and 8 CPUs.]

  29. Vector vs Scalar (Climate App.)
     [Chart: effective GFLOPS (0 to 200) versus peak GFLOPS (0 to 600). Series: SX-6 (8GF/CPU), SX-5 (8GF/CPU), Scalar Server (10% eff.), Scalar Server (15% eff.); points labeled 32, 48, and 64 CPUs.]
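The two scalar-server reference lines on slide 29 are labeled only by an efficiency percentage. Assuming they plot effective performance as that fixed fraction of peak, the relationship can be sketched as below; the function names and the 100 GFLOPS target are hypothetical and chosen only for illustration.

```python
def effective_gflops(peak_gflops: float, efficiency_percent: float) -> float:
    """Sustained rate, assuming a fixed fraction of peak is achieved."""
    return peak_gflops * efficiency_percent / 100


def peak_needed(target_effective_gflops: float, efficiency_percent: float) -> float:
    """Peak rate required to sustain a target rate at a given efficiency."""
    return target_effective_gflops * 100 / efficiency_percent


# Right-hand edge of the chart's peak axis (600 GFLOPS).
for eff in (10, 15):
    print(f"{eff}% efficiency at 600 GFLOPS peak: "
          f"{effective_gflops(600, eff):.0f} GFLOPS effective")

# Hypothetical target, chosen only to illustrate the inverse calculation.
for eff in (10, 15):
    print(f"{eff}% efficiency needs {peak_needed(100, eff):.0f} GFLOPS peak "
          f"to sustain 100 GFLOPS")
```

On this reading, a machine sustaining 10% to 15% of peak needs several times the peak rate of a more efficient one to deliver the same effective performance, which is the comparison the chart is drawing.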

  30. Technological Competence All Technologies for High Performance Computing are available internally within NEC: • Semiconductor Devices • Packaging • Hardware Design • Interconnections and Network • Operating Systems Software • Languages and Tools • Applications Tuning and Support

  31. Memory Chip and Tr in μ-Processor
     [Chart: memory bits per chip and transistors (Tr) per chip versus feature size (nm). Source: ITRS’99.]

  32. Project of Science & Technology Agency: Simulating “Earth” on Supercomputer
     Supercomputer simulation:
     - can visualize
     - can virtually experiment
     - can forecast the future
     However, current supercomputers are not sufficient for further analysis of problems on Planet Earth.
     Each CPU executes its share of the computation (example shown: North American 24-hour precipitation).
     [Figure: from NEC SX-4 to the Earth Simulator: power x 1000, > 40 TFLOPS, 1Q2002.]

  33. Earth Simulator

  34. HPC Road Map
     [Chart: SX Series timeline, 1995 to 2001 and beyond. Products shown: SX-4, SX-5, SX-6, SX-6X, SX-6XX, Earth Simulator.]

  35. Where NEC is ・Technology Leader in High Performance Computing ・Leading Supplier of HPC Platforms for Large Scale Technical and Engineering Computing ・Committed to Development of Vector Supercomputing ・Key Contributor to Vector Supercomputer Development

  36. END
