
Hybrid Technology Petaflops System


Presentation Transcript


  1. WSCCG - Thomas Sterling

  2. Hybrid Technology Petaflops System
  [Diagram: superconducting section of 100 GHz processors P0…Pn, each with CRAM, joined by an interconnect; a liquid-N2 region of buffers and SRAM; an optical packet switch; DRAM; and optical storage]
  • New device technologies
  • New component designs
  • New subsystem architecture
  • New system architecture
  • New latency management paradigm and mechanisms
  • New algorithms/applications
  • New compile time and runtime software
  WSCCG - Thomas Sterling

  3. Complementing Technologies Yield Superior Power/Price/Performance
  [Diagram: single-chip PIM layout on a basic silicon macro — memory stacks, sense amps, decode, and node logic]
  • Superconductor RSFQ logic provides x100 performance
  • Processor in Memory (PIM): high memory bandwidth and low power
  • Data Vortex optical communication: very high bisection bandwidth with low latency
  • Holographic storage: high capacity with low power at moderate speeds
  WSCCG - Thomas Sterling

  4. DIVA PIM: Smart Memory for Irregular Data Structures and Dynamic Databases
  • Processor in Memory
    • Merges memory & logic on a single chip
    • Exploits high internal memory bandwidth
    • Enables row-wide in-place memory operations (see the sketch after this slide)
    • Reduces memory access latencies
    • Significant power reduction
    • Efficient fine grain parallel processing
  • DIVA PIM Project
    • DARPA sponsored, $12.2M; USC ISI prime with Caltech ($2.4M over 4 years), Notre Dame, U of Delaware
    • Greatly accelerates scientific computing on irregular data structures and commercial dynamic databases
    • 0.25 µm, 256 Mbit part delivered 4Q 2000
    • 4 processor/memory nodes
  • Key innovation: multithreaded execution for high efficiency through latency management
  • Active message driven object oriented computation
  • Direct PIM to PIM interaction without host processor intervention
  WSCCG - Thomas Sterling
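To make "row-wide in-place memory operations" concrete, a minimal sketch in ordinary host-side C follows, assuming a hypothetical 2 KB row of 64-bit words: the point is that logic sitting next to the sense amps can update an entire open row in a single pass, rather than streaming it word-by-word over a narrow external bus. This illustrates the idea only; it is not the DIVA instruction set or API.

    /* Illustration of a row-wide in-place operation: touch every word of
     * an open memory "row" in one pass.  ROW_WORDS is a hypothetical size;
     * real PIM hardware would perform this beside the sense amps. */
    #include <stdio.h>
    #include <stdint.h>

    #define ROW_WORDS 256                      /* assume a 2 KB row of 64-bit words */

    static void row_increment(uint64_t row[ROW_WORDS], uint64_t delta)
    {
        for (int i = 0; i < ROW_WORDS; i++)
            row[i] += delta;                   /* one pass over the whole row */
    }

    int main(void)
    {
        uint64_t row[ROW_WORDS] = {0};
        row_increment(row, 42);
        printf("row[0] = %llu, row[%d] = %llu\n",
               (unsigned long long)row[0], ROW_WORDS - 1,
               (unsigned long long)row[ROW_WORDS - 1]);
        return 0;
    }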

  5. HTMT Percolation Model
  [Diagram: percolation pipeline — run time system on SRAM-PIM with parcel assembly & disassembly, parcel dispatcher & dispenser, and parcel invocation & termination; A-, I-, D-, and T-queues plus a C-buffer and re-use path; DMA to CRAM in the cryogenic area with split-phase start/done synchronization back to SRAM; DMA to DRAM-PIM]
  WSCCG - Thomas Sterling
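A toy C/pthreads sketch of the split-phase (start/done) synchronization named in the diagram: work is handed off, serviced asynchronously, and the requester blocks only when it actually needs the result, so other work can hide the latency in between. The parcel structure and the dispenser thread here are hypothetical stand-ins, not the HTMT runtime system.

    /* Toy split-phase ("start"/"done") synchronization with pthreads.
     * Compile with: cc -pthread example.c */
    #include <pthread.h>
    #include <stdio.h>

    typedef struct {
        int input, result;
        int done;                        /* completion flag for the "done" phase */
        pthread_mutex_t lock;
        pthread_cond_t  cv;
    } parcel_t;                          /* hypothetical stand-in for an HTMT parcel */

    static void *dispenser(void *arg)    /* stands in for the parcel dispatcher/dispenser */
    {
        parcel_t *p = arg;
        int r = p->input * p->input;     /* the "remote" work */
        pthread_mutex_lock(&p->lock);
        p->result = r;
        p->done = 1;                     /* signal completion */
        pthread_cond_signal(&p->cv);
        pthread_mutex_unlock(&p->lock);
        return NULL;
    }

    int main(void)
    {
        parcel_t p = { .input = 7, .done = 0,
                       .lock = PTHREAD_MUTEX_INITIALIZER,
                       .cv   = PTHREAD_COND_INITIALIZER };
        pthread_t t;
        pthread_create(&t, NULL, dispenser, &p);  /* "start" phase: issue and continue */

        /* ... overlap other useful work here instead of stalling ... */

        pthread_mutex_lock(&p.lock);              /* only now wait for "done" */
        while (!p.done)
            pthread_cond_wait(&p.cv, &p.lock);
        pthread_mutex_unlock(&p.lock);

        printf("result = %d\n", p.result);
        pthread_join(t, NULL);
        return 0;
    }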

  6. From Toys to Teraflops: Bridging the Beowulf Gap
  Thomas Sterling
  California Institute of Technology / NASA Jet Propulsion Laboratory
  September 3, 1998

  7. Death of Commercial High-End Parallel Computers?
  • No market for high end computers
    • minimal growth in last five years
  • The Great Extinction
    • KSR, Alliant, TMC, Intel, CRI, CCC, Multiflow, Maspar, BBN, Convex, ...
  • Must use COTS
    • fabrication costs skyrocketing
    • development lead times too short
  • Federal agencies fleeing
    • NSF, DARPA, NIST, NIH
  • No new good ideas
  WSCCG - Thomas Sterling

  8. Beowulf-Class Systems
  • Cluster of PCs
    • Intel x86
    • DEC Alpha
    • Mac Power PC
  • Pure M2COTS
  • Unix-like O/S with source
    • Linux, BSD, Solaris
  • Message passing programming model (see the MPI sketch after this slide)
    • PVM, MPI, BSP, homebrew remedies
  • Single user environments
  • Large science and engineering applications
  WSCCG - Thomas Sterling
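For readers new to the message passing model listed above, a minimal standard MPI example of the kind a Beowulf user would launch with mpirun; nothing here is specific to Beowulf, only portable MPI calls.

    /* Minimal MPI message-passing example: rank 0 sends one integer to
     * every other rank.  Build with mpicc, run with mpirun -np <N>. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            int msg = 42;
            for (int dest = 1; dest < size; dest++)
                MPI_Send(&msg, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
        } else {
            int msg;
            MPI_Status status;
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            printf("rank %d of %d received %d\n", rank, size, msg);
        }

        MPI_Finalize();
        return 0;
    }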

  9. Emergence of Beowulf Clusters WSCCG - Thomas Sterling

  10. Focus Tasks for Beowulf R&D
  • Applications
  • Scalability to high end
  • Low level enabling software technology
  • Grendel: middle-ware for managing ensembles
  • Technology transfer
  WSCCG - Thomas Sterling

  11. Beowulf at Work WSCCG - Thomas Sterling

  12. Beowulf Scalability WSCCG - Thomas Sterling

  13. A 10 Gflops Beowulf
  Center for Advanced Computing Research, California Institute of Technology
  172 Intel Pentium Pro microprocessors
  WSCCG - Thomas Sterling

  14. Avalon architecture and price. WSCCG - Thomas Sterling

  15. The Background WSCCG - Thomas Sterling

  16. Network Topology Scaling Latencies (s) WSCCG - Thomas Sterling

  17. Petaflops Clusters at POWR
  David H. Bailey*, James Bieda, Remy Evard, Robert Clay, Al Geist, Carl Kesselman, David E. Keyes, Andrew Lumsdaine, James R. McGraw, Piyush Mehrotra, Daniel Savarese, Bob Voigt, Michael S. Warren
  WSCCG - Thomas Sterling

  18. Critical System Software
  • A cluster node Unix-based OS (e.g., Linux or the like), scalable to 12,500+ nodes.
  • Fortran-90, C, and C++ compilers, generating maximum performance object code, usable under the Linux OS.
  • An efficient implementation of MPI, scalable to 12,500+ nodes (see the sketch after this slide).
  • System management and job management tools, usable for systems of this size.
  WSCCG - Thomas Sterling
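As a concrete picture of what "scalable to 12,500+ nodes" demands of MPI, a small sketch timing a global sum: a collective like MPI_Allreduce is exactly the operation whose cost must grow only slowly (roughly logarithmically in the node count in a good implementation) as the machine grows. Only standard MPI calls are used; the timing harness is illustrative.

    /* Time a global sum across all ranks; the kind of collective whose
     * scaling behaviour matters at 12,500+ nodes. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double local = 1.0, global = 0.0;
        double t0 = MPI_Wtime();
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("sum over %d ranks = %g, allreduce took %g s\n",
                   size, global, t1 - t0);

        MPI_Finalize();
        return 0;
    }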

  19. System Software Research Tasks
  • Can a stripped down Linux-like operating system be designed that is scalable to 12,500+ nodes?
  • Can vendor compilers be utilized in a Linux node environment? If not, can high-performance Linux-compatible compilers be produced by third party vendors, keyed to the needs of scientific computing?
  • Can MPI be scaled to 12,500+ nodes?
  • Can system management and batch submission tools (e.g., PBS or LSF) be scaled to 12,500+ nodes?
  • Can an effective performance management tool be produced for systems with 12,500+ nodes?
  • Can an effective debugger be produced for systems with 12,500+ nodes? Can the debugger being specified by the Parallel Tools Consortium be adapted for these systems?
  WSCCG - Thomas Sterling

  20. Technology Transfer
  • Information-hungry neo-users
    • how to implement
    • how to maintain
    • how to apply
  • Web based assembly and how-to information
  • Red Hat CD-ROM including Extreme Linux
  • Tutorials
  • MIT Press book: “How to Build a Beowulf”
  • DOE and NASA workshops
  • JPC4: Joint Personal Computer Cluster Computing Conference
  • so many talks
  WSCCG - Thomas Sterling

  21. Godzilla Meets Bambi: NT versus Linux
  • Not in competition; they complement each other
  • Linux was not created by suits
    • created by people who wanted to create it
    • distributed by people who wanted to share it
    • used by people who want to use it
  • If Linux dies
    • it will not be killed by NT
    • it will be buried by Linux users
  • Linux provides
    • a Unix-like O/S, which has been the mainstream of scientific computing
    • open source code
    • low/no cost
  WSCCG - Thomas Sterling

  22. Have to Run Big Problems on Big Machines?
  • It's work, not peak flops (see the sketch after this slide)
    • a user's throughput over the application cycle
  • Big machines yield little slices
    • due to time and space sharing
  • But data set memory requirements vary
    • wide range of data set needs, three orders of magnitude
    • latency tolerant algorithms enable out-of-core computation
  • What is the Beowulf breakpoint for price-performance?
  WSCCG - Thomas Sterling
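A back-of-envelope C sketch of the "it's work, not peak flops" point: throughput measured over the full application cycle (queue wait plus run time) on a shared big machine can be a small fraction of peak. All of the numbers below are hypothetical placeholders, not measurements.

    /* Delivered throughput over the whole application cycle vs. machine peak. */
    #include <stdio.h>

    int main(void)
    {
        double peak_gflops = 1000.0;   /* hypothetical big-machine peak        */
        double efficiency  = 0.10;     /* fraction of peak actually sustained  */
        double run_hours   = 10.0;     /* time the job actually runs           */
        double wait_hours  = 40.0;     /* time spent queued for a "slice"      */

        double work_gflop    = peak_gflops * efficiency * run_hours * 3600.0;
        double cycle_seconds = (run_hours + wait_hours) * 3600.0;

        printf("throughput over full cycle: %.1f Gflops (machine peak %.0f Gflops)\n",
               work_gflop / cycle_seconds, peak_gflops);
        return 0;
    }

With these placeholder numbers the user sees about 20 Gflops of delivered throughput from a 1,000 Gflops-peak machine, which is the comparison the slide invites against a dedicated cluster.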

  23. WSCCG - Thomas Sterling

  24. Alternative APIs
  • Mostly MPI
  • PVM, also
    • custom messaging for performance
  • BSP
    • SPMD, global name space, implicit messaging
  • Hrunting
    • software supported distributed shared memory
  • EARTH
    • Guang Gao, Univ. of Delaware
    • software supported multithreading
  WSCCG - Thomas Sterling

  25. Grendel Suite
  • Targets effective management of ensembles
  • Embraces “NIH” (nothing in-house)
  • Surrogate customer for the Beowulf community
  • Borrows software products from research projects
  • Capabilities required:
    • communication layers
    • numerical libs
    • program development tools
    • scheduling and runtime
    • debug and availability
    • external I/O
    • secondary/mass storage
    • general system admin
  WSCCG - Thomas Sterling

  26. Towards the Future: What Can We Expect?
  • 2 GFLOPS peak processors
  • $1000 per processor
  • 1 Gbps at < $250 per port
  • new backplane performance, e.g. PCI++
  • light-weight communications, < 10 µsec latency
  • optimized math libraries
  • 1 Gbyte main memory per node
  • 24 Gbyte disk storage per node
  • de facto standardized middle-ware
  WSCCG - Thomas Sterling
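A quick arithmetic check on these projections, which also bears on slide 28's million-dollar teraflops question: the sketch below uses only the figures on this slide (2 GFLOPS peak and $1000 per processor) and deliberately ignores interconnect, storage, and integration.

    /* Back-of-envelope: nodes and processor cost for 1 Tflops peak from the
     * projected 2 GFLOPS, ~$1000 processors. */
    #include <stdio.h>

    int main(void)
    {
        double peak_per_node_gflops = 2.0;
        double cost_per_node_usd    = 1000.0;
        double target_gflops        = 1000.0;   /* 1 Tflops peak */

        double nodes = target_gflops / peak_per_node_gflops;
        printf("nodes needed: %.0f, processor cost alone: $%.0fK\n",
               nodes, nodes * cost_per_node_usd / 1000.0);
        return 0;
    }

That comes to 500 processors and roughly $0.5M in processors alone, leaving the other half of a $1M budget for network, disks, and integration, broadly consistent with the slide 28 target.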

  27. WSCCG - Thomas Sterling

  28. Million $$ Teraflops Beowulf?
  • Today: $3M per peak Tflops
  • Before year 2002: $1M per peak Tflops
  • Performance efficiency is a serious challenge
  • System integration
    • does vendor support of massive parallelism have to mean massive markup?
  • System administration: boring but necessary
  • Maintenance without vendors; how?
  • New kind of vendors for support
  • Heterogeneity will become a major aspect
  WSCCG - Thomas Sterling

  29. WSCCG - Thomas Sterling
