

  1. Making Red Storm a Success Subtitle: It Takes a Village to Build a Supercomputer June 7-9, 2006 Sue Kelly Sandia National Laboratories smkelly@sandia.gov, 505-845-9770 SAND-2006-3384P Unlimited Release Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

  2. Outline • Red Storm brief background • A year of continual improvement • Sandia contributions to Red Storm • Current status and future work

  3. Red Storm Project Background • ASC Program Capability (versus capacity) HPC machine • Resource to NNSA’s Stockpile Stewardship program for advanced simulations • Available on a limited basis to other national security programs and scientific endeavors • Timeline • Contract awarded to Cray in September 2002 • Hardware delivered September 2004 through January 2005 • Achieved initial operation in March 2005 • Achieved limited availability in September 2005 • Machine general availability targeted for September 2006

  4. Red Storm Configuration

  5. Topic 2 - A Year of Continual Improvement • Significant efforts in hardware and software reliability bore fruit • Performance improvements were integrated into the production source base, giving the best of both worlds – performance and reliability • Chart: MPI Latency (lower is better); a measurement sketch follows this slide
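
The latency chart referenced above reflects ping-pong style measurements. For context, below is a minimal sketch of how zero-byte MPI latency is typically measured; it is an illustrative microbenchmark written for this transcript, not the code used to produce the Red Storm numbers.

    /* Minimal zero-byte MPI ping-pong latency sketch (illustrative only). */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        const int iters = 10000;
        int rank;
        double t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(NULL, 0, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(NULL, 0, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(NULL, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(NULL, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        if (rank == 0)  /* one-way latency = half the average round-trip time */
            printf("zero-byte latency: %.2f usec\n",
                   (t1 - t0) / (2.0 * iters) * 1.0e6);

        MPI_Finalize();
        return 0;
    }

One-way latency is reported as half the average round-trip time, the usual convention for latency charts of this kind.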

  6. Topic 3 - Sandia Contributions During Initial Development • Programmatic: • Active management • Mentorship of Cray developers • Milestone-based payment schedule • Technical: • Sandia-developed architecture • Based on more than a decade of experience with MPP (Massively Parallel Processor) systems • Created a Statement of Work that embodied the design • Software for application run-time environment • Compute node lightweight kernel operating system • Virtual file system library • Logarithmic job launcher (see the fan-out sketch after this slide) • Compute processor allocator
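
The logarithmic job launcher mentioned above fans the launch request out over a tree instead of contacting every compute node serially, so a full-machine launch takes O(log N) forwarding steps rather than O(N). The sketch below illustrates the idea with a binomial-tree child calculation; compute_children() is a hypothetical helper written for illustration, not the actual Red Storm launcher interface.

    /* Sketch of a binomial-tree (logarithmic) fan-out, as used conceptually
     * by tree-structured job launchers.  Not the Red Storm launcher API. */
    #include <stdio.h>

    /* Fill children[] with the nodes that node `rank` should forward the
     * launch request to, in a binomial tree rooted at node 0.
     * A node forwards to rank + 2^k for every power of two greater than
     * its own rank, so the tree depth is ceil(log2(nnodes)). */
    static int compute_children(int rank, int nnodes, int children[], int max)
    {
        int n = 0;
        for (int bit = 1; bit < nnodes; bit <<= 1) {
            if (bit > rank && rank + bit < nnodes && n < max)
                children[n++] = rank + bit;
        }
        return n;
    }

    int main(void)
    {
        int children[32];
        const int nnodes = 8;           /* small example machine */
        for (int rank = 0; rank < nnodes; rank++) {
            int n = compute_children(rank, nnodes, children, 32);
            printf("node %d forwards launch to:", rank);
            for (int i = 0; i < n; i++)
                printf(" %d", children[i]);
            printf("\n");
        }
        return 0;
    }

On a machine with more than 10,000 compute nodes, this kind of fan-out completes the launch in roughly 14 forwarding steps instead of thousands of serial contacts.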

  7. Risk Mitigation Efforts • Developed C version of SeaStar NIC (Network Interface Chip) firmware • Implementation reduced latency from >25 μsec to 7 μsec • Deployed in version 1.2 of Cray’s XT3 software release • Provided first version of NIC-resident network software/firmware (protocol offload engine) • Introduced compatible MPI enhancements • Developed interim booting mechanism • Used for development • Scaled to several thousand nodes • Developed interim parallel I/O file system since Lustre was not ready in time for initial operation • The HCFS was an extension of PVFS

  8. Ties to Research • Light Weight Kernel OS (Operating System) • SUNMOS -> Puma -> Cougar -> Catamount • Network Interconnects • Ongoing Portals work; Red Storm implements Portals V3.3 • Prototypes of NIC-based Portals implementations • CANNOT OVEREMPHASIZE THE IMPORTANCE OF THIS PRIOR RESEARCH • MPI • Collectives • Overlap and independent progress (see the nonblocking MPI sketch after this slide) • I/O • Parallel I/O • High Performance I/O • Reliability, Availability, and Serviceability (RAS) • Cluster Integration Tool Kit • Theoretical RAS analysis • Supplemental slides contain references
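
The "overlap and independent progress" item concerns whether MPI can move data while the application computes. Below is a minimal sketch of the application-side pattern: post nonblocking operations, compute on independent data, then complete the communication. Whether the transfer actually progresses during the compute phase depends on NIC offload and the MPI progress engine, which is what the cited Sandia work evaluates; do_local_work() and the buffer sizes are placeholders.

    /* Sketch of communication/computation overlap with nonblocking MPI. */
    #include <mpi.h>
    #include <stdlib.h>

    #define N (1 << 20)

    static void do_local_work(double *x, int n)
    {
        for (int i = 0; i < n; i++)     /* stand-in for real computation */
            x[i] = x[i] * 1.000001 + 1.0;
    }

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Request reqs[2];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double *sendbuf = calloc(N, sizeof *sendbuf);
        double *recvbuf = calloc(N, sizeof *recvbuf);
        double *work    = calloc(N, sizeof *work);
        int right = (rank + 1) % size;
        int left  = (rank - 1 + size) % size;

        /* Post the exchange first ... */
        MPI_Irecv(recvbuf, N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(sendbuf, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

        /* ... compute on data that does not depend on the exchange ... */
        do_local_work(work, N);

        /* ... then complete the communication. */
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

        free(sendbuf); free(recvbuf); free(work);
        MPI_Finalize();
        return 0;
    }

With true offload and independent progress, the wait at the end completes almost immediately because the message moved while the compute loop ran; without it, most of the transfer time is paid inside MPI_Waitall.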

  9. The Hero Runs • The early adopters • Science runs • POP, SEAM, CTH • Application verification • CTH, ITS, Sage, Partisn, Salinas, Alegra, Presto, Calore • Application scaling • POP, SEAM, CTH, Salinas, Sage, ITS • Benchmarking • UMT2K, sPPM, HPL, HPCC • The first “production” users • Bert Still – “redstorm-s seems to be well ahead of other MPP machines at a comparable point in their development cycle; other systems i've used are continuing to have lustre problems well into their 2nd and 3rd years of service.  ultimately, the people make the system work, and the staff at SNL are outstanding” • Tim Jones – “Analysis codes run fast on red storm”, “I/O improvements has helped my analysis time tremendously; jobs are ~3-4 time faster” • Ref: Daly talk

  10. The Village • Champions for the computer facility • The networking guys • Viz and data services • Security team • User account processes and programs • Accounting system • Help desk • Machine oversight and work prioritization committee

  11. Topic 4: Current Status - Exceptional scaling results • Charts: SEAM Benchmark on Red Storm and Blue Gene; Sage Results

  12. Numerous Production Successes • LANL classified work • 5000 nodes • Running since January • “Scaling is nearly linear” • Mean Time Between Interrupts (MTBI) was 17 hours during last 3 months • CTH • 2000, 5000 node jobs • Ran for 5 days • Salinas • 400 nodes – multiple jobs • Each simulation ran approximately 5 days • Scaling study up to 2048 nodes • Fuego • 1024 nodes – multiple jobs • Running since February • Presto • 512, 1024 nodes • “achieved a computational rate of 7.5msec/day (simulation time over clock time). This is the fastest computational rate that we have seen for a full body B61 model (1.5 million elements)” • pF3d • 128, 256 nodes (due to unavailability of more nodes) • Running since Fall ’05 • “There are now significant changes in the NIF target designs, which will vastly improve margin and robustness…with the information gained from the Redstorm runs, we can streamline a lot of our work by reducing parameter space."

  13. Current & Future Red Storm Efforts • Ramping up to support hardware upgrade: 5th row, dual-cores, SeaStar 2.1 NIC—all due this summer • Added support for dual-core AMD Opterons to the compute node light weight kernel OS • Implemented as two virtual nodes • Master processor does network I/O for both • Adding support for network protocol offload engine in the Seastar network interface chip • reduces zero-byte latency from ~7msec to ~4msec • Combining above two efforts to support 4-way AMD Opterons • Formed a team to improve Lustre performance • Initial efforts are aimed at assessing current state • Developing a methodology for analyzing progress • Stratifying the relevant components and measuring performance at each point in the pipe line

  14. Summary • This talk focused on Sandia contributions and how the Sandia research program was critical to the success of the efforts. • While many challenges remain, Red Storm has evolved into a high-performing, scalable platform for production use. • The XT3 product line, based on Red Storm, has helped other scientific communities accomplish their goals (PSC, ORNL, ERDC, CSCS, AWE, …).

  15. Selected References for Each Research Area • Light Weight Kernel OS (Operating System) • Brightwell, Ron, Rolf Riesen, Keith Underwood, Trammell B. Hudson, Patrick Bridges, Arthur B. Maccabe, "A Performance Comparison of Linux and a Lightweight Kernel," Conference Paper, IEEE International Conference on Cluster Computing, December 2003. • Maccabe, Arthur B., Patrick G. Bridges, Ron B. Brightwell, Rolf E. Riesen, Trammell B. Hudson, "Highly Configurable Operating Systems for Ultrascale Systems," Workshop Paper, First International Workshop on Operating Systems, Programming Environments and Management Tools for High-Performance Computing on Clusters, June 2004. • Kelly, Suzanne M., Ron B. Brightwell, John P. VanDyke, "Catamount Software Architecture with Dual Core Extensions," Conference Paper, Cray User Group, May 2006. • Network Interconnects • Pedretti, Kevin and Ron Brightwell, "A NIC-Offload Implementation of Portals for Quadrics QsNet," Proceedings of the Fifth LCI International Conference on Linux Clusters, May 2004. • Brightwell, Ron B., Douglas Doerfler, Keith D. Underwood, "A Comparison of 4X InfiniBand and Quadrics Elan-4 Technologies," Conference Paper, 2004 International Conference on Cluster Computing (Cluster 2004), September 2004. • Brightwell, Ron, Trammell Hudson, Kevin Pedretti, Keith D. Underwood, Rolf Riesen, "Implementation and Performance of Portals 3.3 on the Cray XT3," Conference Paper, IEEE International Conference on Cluster Computing, September 2005. • Brightwell, Ron, Trammell Hudson, Kevin Pedretti, Keith D. Underwood, "Cray's SeaStar Interconnect: Balanced Bandwidth for Scalable Performance," Journal Article, IEEE Micro, Accepted/Published June 2006.

  16. Selected References for Each Research Area (cont) • MPI • Brightwell, Ron, Keith D. Underwood, "Evaluation of an Eager Protocol Optimization for MPI," Conference Paper, Tenth European PVM/MPI User Group Conference, September 2003. • Brightwell, Ron, Sue Goudy, Arun Rodrigues, Keith D. Underwood, "Implications of Application Usage Characteristics for Collective Communication Offload," Journal Article, International Journal of High-Performance Computing and Networking, Special Issue: Design and Performance Evaluation of Group Communication in Parallel and Distributed Systems, Vol. 4, No. 2, Accepted/Published February 2006. • Brightwell, Ron B., Rolf Riesen, Keith D. Underwood, "Analyzing the Impact of Overlap, Offload, and Independent Progress for MPI," Journal Article, International Journal of High Performance Computing Applications, Vol. 19, No. 2, pp. 103–117, Accepted/Published August 2005. • I/O • Coloma, Kenin, Alok N. Choudhary, Wei-keng Liao, Lee Ward, Eric Russell, Neil Pundit, "Scalable High-level Caching for Parallel I/O," IPDPS 2004. • Oldfield, Ron A., David F. Kotz, "Improving Data Access for Computational Grid Applications," Journal Article, Cluster Computing: The Journal of Networks, Software Tools and Applications, Accepted/Published June 2005. • Coloma, Kenin, Alok N. Choudhary, Avery Ching, Wei-keng Liao, Seung Woo Son, Mahmut T. Kandemir, Lee Ward, "Power and Performance in I/O for Scientific Applications," IPDPS 2005.

  17. Selected References for Each Research Area (cont) • Reliability, Availability, and Serviceability (RAS) • Laros, James H., III, Lee Ward, Nathan W. Dauchy, Ron B. Brightwell, Trammell B. Hudson, Ruth A. Klundt, "An Extensible, Portable, Scalable Cluster Management Software Architecture," Conference Paper, IEEE International Conference on Cluster Computing, September 2002. • Laros, James H., III, Lee H. Ward, Nathan W. Dauchy, James Vasak, Ruth A. Klundt, Glenn A. Laguna, Marcus R. Epperson, Jon R. Stearley, "The Cluster Integration Toolkit," Conference Paper, Cluster World Conference and Expo, June 2003. • Kelly, Suzanne M., "A Use Case Model for RAS in an MPP Environment," Conference Paper, Cray User Group, May 2004. • Stearley, Jon R., "Defining and Measuring Supercomputer Reliability, Availability, and Serviceability (RAS)," Conference Paper, Linux Clusters Institute (LCI05), April 2005. • Laros, James H., III, "A Software and Hardware Architecture for a Modular, Portable, Extensible Reliability Availability and Serviceability System," Conference Paper, 2nd Workshop on High Performance Computing Reliability Issues, in conjunction with the 12th International Symposium on High Performance Computer Architecture, February 2006.
