
Experience with a Windows NT Supercomputer


Presentation Transcript


  1. Experience with a Windows NT Supercomputer Andrew A. Chien University of California, San Diego and National Computational Science Alliance Workshop on Clusters and Computational Grids for Scientific Computing September 2-4, 1998 Blackberry Farms, TN

  2. NT Cluster Team Members • CSAG (UIUC Department of Computer Science) • Andrew Chien (Faculty) • Qian Liu, Greg Koenig (Research staff) • Scott Pakin, Louis Giannini, Kay Connelly, Matt Buchanan, Sudha Krishnamurthy, Geetanjali Sampemane, Luis Rivera, and Oolan Zimmer (Graduate Students) • NCSA Leading Edge Site • Robert Pennington (Technical Program Manager) • Mike Showerman (Systems Programmer) • Qian Liu*, Avneesh Pant (Systems Engineers)

  3. Outline • Motivation • Technology Base • HPVM Systems • Application Performance • Plans • => No discussion of the MANY computer systems research issues and important results which • Enable the demonstrated performance • Enable increasing capability of these platforms

  4. Motivation • Killer micros: low-cost Gigaflop processors here for a few kilo-$$ per processor • Killer networks: gigabit network hardware (Myrinet, SCI, FC-AL, Giganet, GigE, ATM), high performance software (e.g. Fast Messages), soon at centi-$$$ per connection • Leverage HW, commodity SW (Windows NT), build key technologies => high performance computing in a RICH software environment

  5. Scaleup Confusion • A pile of PC’s is not a large-scale server. • Why? • Performance and programming model

  6. Ideal Model: HPVM's • [Figure: an application program targets a "virtual machine interface" that hides the actual system configuration] • HPVM = High Performance Virtual Machine • Provides a simple uniform programming model; abstracts and encapsulates underlying resource complexity • Simplifies use of complex resources

  7. HPVM = Cluster Supercomputers • Turnkey supercomputing clusters • Standard APIs, high performance communication, convenient use, coordinated resource management • Windows NT and Linux; front-end queueing & management (LSF integrated) • [Figure: HPVM 1.0 software stack, released Aug 19, 1997: PGI HPF, MPI, Put/Get, and Global Arrays layered over Fast Messages on Myrinet and sockets]
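For concreteness (not part of the original slides), a minimal program written to the standard MPI API looks like the sketch below; code like this runs unchanged whether MPI sits on top of Fast Messages over Myrinet or on plain sockets:

```c
/* Minimal, generic MPI program (illustrative only -- not from the talk).
 * Any code written to the standard MPI API can run over an MPI
 * implementation layered on Fast Messages. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);                  /* start the MPI runtime      */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's rank        */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of processes  */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();                          /* shut down cleanly          */
    return 0;
}
```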

  8. High Performance Communication • [Figure: switched multi-gigabit networks with user-level access vs. switched 100 Mbit networks with OS-mediated access] • Level of network interface support + NIC/network router latency • Overhead and latency of communication -> effective bandwidth usable • High-performance communication enables programmability! • Low latency, low overhead, high bandwidth cluster communication • ... much more is needed ...

  9. FM on Commodity PC's • Host Library: API presentation, flow control, segmentation/reassembly, multithreading • Device driver: protection, memory mapping, scheduling monitors • NIC Firmware: link management, incoming buffer management, routing, multiplexing/demultiplexing • [Figure: FM host library on the Pentium Pro/II host (~400 MIPS, P6 bus), FM device driver, and FM firmware on the NIC (~50 MIPS), connected across PCI to a 1280 Mbps Myrinet link]
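To illustrate the streaming-send / handler-based-receive style the host library exposes, here is a small sketch. The names follow the published FM 2.x interface, but the header name, exact signatures, and return conventions are assumptions, not taken from the talk:

```c
/* Sketch of a Fast Messages-style streaming send and handler-based receive.
 * Names are modeled on the published FM 2.x interface; treat the header
 * name, signatures, and return convention as assumptions. */
#include <stdio.h>
#include "fm.h"          /* header name assumed */

/* Receive handler: run by FM_extract() as message data arrives. */
int print_handler(FM_stream *stream, unsigned sender)
{
    char buf[256];
    FM_receive(buf, stream, sizeof(buf));   /* pull the payload off the stream */
    printf("node %u says: %s\n", sender, buf);
    return 0;                               /* return convention assumed       */
}

void send_greeting(unsigned dest, unsigned handler_id)
{
    char msg[] = "hello from FM";
    /* Open a stream to the destination, naming the remote handler;
     * the library handles flow control and segmentation/reassembly. */
    FM_stream *s = FM_begin_message(dest, sizeof(msg), handler_id);
    FM_send_piece(s, msg, sizeof(msg));
    FM_end_message(s);
}

/* Receivers poll with FM_extract(), which runs pending handlers. */
```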

  10. Fast Messages 2.x Performance • [Figure: FM 2.x bandwidth (MB/s) vs. message size (4 bytes to 64 KB), reaching 90+ MB/s, with n1/2 marked] • Latency 9 µs, BW 90+ MB/s, N1/2 ~250 bytes • Fast in absolute terms (compares to MPPs, internal memory BW) • Delivers a large fraction of hardware performance for short messages • Technology transferred into emerging cluster standards: Intel/Compaq/Microsoft's Virtual Interface Architecture
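N1/2 is the message size at which half of peak bandwidth is delivered. Under the usual linear cost model T(n) = o + n/B, with o the fixed per-message cost and B the peak bandwidth, N1/2 = o x B. A back-of-envelope sketch (assuming the ~2.5 µs per-message overhead quoted on the next slide and the ~90 MB/s peak; not from the slides):

```c
/* Back-of-envelope effective-bandwidth model (illustrative only).
 * A message of n bytes costs T(n) = o + n/B and delivers n/T(n) bytes/s;
 * N1/2 = o*B is the size that reaches half of peak. */
#include <stdio.h>

int main(void)
{
    const double o = 2.5e-6;   /* per-message overhead (s), slide 11 figure */
    const double B = 90e6;     /* peak bandwidth (bytes/s), ~90 MB/s        */

    printf("N1/2 = %.0f bytes\n", o * B);   /* ~225 bytes, near the quoted ~250 */

    for (int n = 4; n <= 65536; n *= 4) {
        double eff = n / (o + n / B);       /* delivered bandwidth at n bytes */
        printf("%6d bytes: %5.1f MB/s\n", n, eff / 1e6);
    }
    return 0;
}
```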

  11. FM 2.x Evaluation (MPI) • [Figure: FM vs. MPI-FM bandwidth (MB/s) vs. message size (4 bytes to 4 KB)] • MPI-FM: 80+ MB/s, 11 µs latency, ~2.5 µs overhead • Short messages much better than IBM SP2; PCI limited • Latency ~ SGI O2K
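Numbers like these are conventionally obtained with an MPI ping-pong microbenchmark. The sketch below is a generic version of such a test (not the actual benchmark behind the plot), reporting one-way latency and bandwidth per message size:

```c
/* Minimal MPI ping-pong microbenchmark sketch (generic; not the code used
 * for the figures in the talk).  Rank 0 bounces a buffer off rank 1 and
 * reports half the round-trip time per message size. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define REPS 1000

int main(int argc, char **argv)
{
    int rank;
    char *buf = malloc(4096);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int n = 4; n <= 4096; n *= 2) {
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < REPS; i++) {
            if (rank == 0) {
                MPI_Send(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double dt = (MPI_Wtime() - t0) / (2.0 * REPS);   /* one-way time */
        if (rank == 0)
            printf("%5d bytes: %7.2f us, %6.1f MB/s\n",
                   n, dt * 1e6, n / dt / 1e6);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```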

  12. FM 2.x Evaluation (MPI), cont. • [Figure: MPI-FM transfer efficiency (%) vs. message size (4 bytes to 4 KB)] • High transfer efficiency, approaches 100% • Other systems much lower even at 1 KB (100 Mbit: 40%, 1 Gbit: 5%)

  13. Related Work • Drivers of the commercial VIA standard (AM, U-Net, PM, Shrimp, BIP); Trapeze, GM, etc. • Lightweight protocols, fast protocol processing • XTP [Chesson87, Strayer92], TCP optimizations [van JacobsonXX] • These don't solve the whole problem; the short-message problem remains • 0-copy networking • fbufs, ADCs [Druschel & Peterson 93] • Container Shipping [Pasquale & Anderson 94] • Optimizations [Brustoloni & Steenkiste 96] • High performance network hardware • ATM [various...] • Myrinet [Boden & Seitz 95] • Servernet [Horst 95] • Fujitsu Mercury [ISCA97], VMMC-2 [Li97]

  14. Real HPVM’s

  15. HPVM I • HPVM I: 30x Pentium Pro, January 1997, 6 Gflops • Myrinet interconnect • LSF front end; HPVM communication and scheduling • Windows NT, Linux • Joint w/ Marianne Winslett

  16. HPVM II • HPVM II: 64x Pentium II, January 1998, 20 Gflops • Dual networks (Myrinet, Servernet)

  17. HPVM III ("NT Supercluster") • 256x Pentium II, April 1998, 77 Gflops • 3-level fat tree (large switches), scalable bandwidth, modular extensibility • => 512x Pentium II (Deschutes), late '98 / early '99, 200 Gflops • Both with the National Center for Supercomputing Applications

  18. HPVM III • Andrew Chien, CS UIUC --> UCSD • Rob Pennington, NCSA • Myrinet network, HPVM, Fast Messages • Microsoft NT OS, MPI API, etc. • 192 Hewlett-Packard nodes (300 MHz) + 64 Compaq nodes (333 MHz)

  19. HPVM III

  20. Application Performance

  21. Applications on the HPVM NT Supercluster • Zeus-MP (192P, Mike Norman) • ISIS++ (192P, Robert Clay) • ASPCG (128P, Danesh Tafti) • Cactus (128P, Paul Walker/John Shalf/Ed Seidel) • QMC (128P, Lubos Mitas) • Boeing CFD Test Codes (128P, David Levine) • In progress: • SPRNG (Ashok Srinivasan), Gamess, MOPAC (John McKelvey), freeHEP (Doug Toussaint), AIPS++ (Dick Crutcher), Amber (Balaji Veeraraghavan), Delphi/Delco Codes, Parallel Sorting => No code retuning required (generally)

  22. Supercomputer Performance Characteristics

                              Mflops/Proc   Flops/Byte   Flops/NetworkRT
      Cray T3E                    1200         ~2              ~2,500
      SGI Origin2000               500         ~0.5            ~1,000
      HPVM NT Supercluster         300         ~3.2            ~6,000
      Berkeley NOW II              100         ~3.2            ~2,000
      IBM SP2                      550         ~3.7           ~38,000
      Beowulf (100Mbit)            300        ~25            ~500,000

  • Compute/communicate and compute/latency ratios • Clusters can provide programmable characteristics at a dramatically lower system cost
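As a rough consistency check (assuming the ~90 MB/s MPI-FM bandwidth and roughly a 2 x 10 µs round trip from slides 10-11), the NT Supercluster row follows from per-node compute rate, delivered bandwidth, and network round-trip time:

```latex
% Rough check of the HPVM NT Supercluster row (inputs assumed from slides 10-11)
\[
  \frac{\text{flops}}{\text{byte}}
    \approx \frac{300\ \text{Mflop/s per node}}{\sim 90\ \text{MB/s per node}}
    \approx 3.3,
  \qquad
  \frac{\text{flops}}{\text{network RT}}
    \approx 300\ \text{Mflop/s} \times \sim\! 20\ \mu\text{s}
    \approx 6000 .
\]
```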

  23. Solving 2D Navier-Stokes Kernel - Performance of Scalable Systems • Preconditioned Conjugate Gradient Method with Multi-level Additive Schwarz Richardson Pre-conditioner (2D 1024x1024) • Danesh Tafti, Rob Pennington, NCSA; Andrew Chien (UIUC, UCSD)

  24. NCSA NT Supercluster Solving Navier-Stokes Kernel • Single processor performance: MIPS R10k 117 MFLOPS, Intel Pentium II 80 MFLOPS • Preconditioned Conjugate Gradient Method with Multi-level Additive Schwarz Richardson Pre-conditioner (2D 1024x1024) • Danesh Tafti, Rob Pennington, Andrew Chien (NCSA)

  25. Solving 2D Navier-Stokes Kernel (cont.) • Excellent scaling to 128P; single precision ~25% faster • Preconditioned Conjugate Gradient Method with Multi-level Additive Schwarz Richardson Pre-conditioner (2D 4094x4094) • Danesh Tafti, Rob Pennington, NCSA; Andrew Chien (UIUC, UCSD)
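For readers unfamiliar with the kernel, the preconditioned conjugate gradient iteration at its core has the standard form sketched below. This is a serial sketch with a simple Jacobi (diagonal) preconditioner standing in for the multi-level additive Schwarz Richardson preconditioner, and without the MPI parallelization used on the cluster; in the parallel code the dot products become global reductions and the operator application needs halo exchange, which is exactly the communication the flops/byte ratios above govern.

```c
/* Serial sketch of preconditioned conjugate gradient (PCG).  The Jacobi
 * preconditioner and 1D Laplacian operator are stand-ins; the benchmarked
 * code uses a multi-level additive Schwarz Richardson preconditioner and
 * runs in parallel over MPI. */
#include <math.h>
#include <stdlib.h>

typedef void (*apply_fn)(const double *in, double *out, int n);

static double dot(const double *a, const double *b, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++) s += a[i] * b[i];
    return s;
}

/* Solve A x = b: apply_A computes out = A*in, apply_M computes out = M^{-1}*in. */
void pcg(apply_fn apply_A, apply_fn apply_M,
         const double *b, double *x, int n, int max_it, double tol)
{
    double *r  = calloc(n, sizeof(double));   /* residual r = b - A x */
    double *z  = calloc(n, sizeof(double));   /* z = M^{-1} r         */
    double *p  = calloc(n, sizeof(double));   /* search direction     */
    double *Ap = calloc(n, sizeof(double));

    apply_A(x, Ap, n);
    for (int i = 0; i < n; i++) r[i] = b[i] - Ap[i];
    apply_M(r, z, n);
    for (int i = 0; i < n; i++) p[i] = z[i];
    double rz = dot(r, z, n);

    for (int k = 0; k < max_it && sqrt(dot(r, r, n)) > tol; k++) {
        apply_A(p, Ap, n);
        double alpha = rz / dot(p, Ap, n);            /* step length      */
        for (int i = 0; i < n; i++) x[i] += alpha * p[i];
        for (int i = 0; i < n; i++) r[i] -= alpha * Ap[i];
        apply_M(r, z, n);                             /* precondition     */
        double rz_new = dot(r, z, n);
        double beta = rz_new / rz;                    /* direction update */
        for (int i = 0; i < n; i++) p[i] = z[i] + beta * p[i];
        rz = rz_new;
    }
    free(r); free(z); free(p); free(Ap);
}

/* Tiny demo: 1D Laplacian (tridiagonal) operator with Jacobi preconditioner. */
#define N 64
static void laplace_1d(const double *in, double *out, int n) {
    for (int i = 0; i < n; i++) {
        out[i] = 2.0 * in[i];
        if (i > 0)     out[i] -= in[i - 1];
        if (i < n - 1) out[i] -= in[i + 1];
    }
}
static void jacobi(const double *in, double *out, int n) {
    for (int i = 0; i < n; i++) out[i] = in[i] / 2.0;   /* diag(A) = 2 */
}

int main(void) {
    double b[N], x[N];
    for (int i = 0; i < N; i++) { b[i] = 1.0; x[i] = 0.0; }
    pcg(laplace_1d, jacobi, b, x, N, 1000, 1e-8);
    return 0;
}
```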

  26. Near-Perfect Scaling of Cactus - 3D Dynamic Solver for the Einstein GR Equations • Cactus was developed by Paul Walker (MPI-Potsdam, UIUC, NCSA) • Ratio of GFLOPS: Origin = 2.5x NT SC • Paul Walker, John Shalf, Rob Pennington, Andrew Chien (NCSA)

  27. Quantum Monte Carlo: Origin and HPVM Cluster • Origin is about 1.7x faster than NT SC • T. Torelli (UIUC CS), L. Mitas (NCSA, Alliance Nanomaterials Team)

  28. HPVM Project Directions • [Figure: evolving software stack beyond HPVM 1.0: PGI HPF, MPI, Put/Get, Global Arrays, Sockets, DCOM, and Java RMI over Fast Messages on Myrinet and sockets, Servernet, VIA, etc.] • Dynamic, wide-area clusters • Federated clusters w/ high speed WANs and heterogeneous networks • Distributed computing APIs • Network RPC == local RPC (47 µs Java RMI, 105 µs MSRPC) • Tight coupling with massive storage

  29. Windows NT Cluster Plans • Transition NCSA cluster to production usage: unassisted access; parallel jobs; production I/O; system stability • Scale up to 512 processors • Wide-area federation of large clusters • Link UCSD/CSE and NCSA LES clusters via vBNS (OC-12) or other high speed WANs (applications wanted) • Access resources using Grid technology • [Figure: planned software: HPF, MPI, DSM, Put/Get]

  30. Summary • FM and HPVM are vehicles for turnkey supercomputing • The key element is high performance communication, delivered by Fast Messages • Applications can be ported with minimal retuning (only NT issues) • Cluster systems can scale well and deliver high performance cost-effectively • Coming soon to a neighborhood near you! • Web pages: http://www-csag.cs.uiuc.edu/
