
Extreme Scalability Working Group (XS-WG): Status Update


Presentation Transcript


  1. Extreme Scalability Working Group (XS-WG): Status Update • Nick Nystrom, Director, Strategic Applications, Pittsburgh Supercomputing Center • October 22, 2009

  2. Extreme Scalability Working Group (XS-WG): Purpose • Meet the challenges and opportunities of deploying extreme-scale resources into the TeraGrid, maximizing both scientific output and user productivity. • Aggregate, develop, and share wisdom • Identify and address needs that are common to multiple sites and projects • May require assembling teams and obtaining support for sustained effort • XS-WG benefits from active involvement of all Track 2 sites, Blue Waters, tool developers, and users. • The XS-WG leverages and combines RPs’ interests to deliver greater value to the computational science community.

  3. XS-WG Participants • Nick Nystrom PSC, XS-WG lead • Jay Alameda NCSA • Martin Berzins Univ. of Utah (U) • Paul Brown IU • Shawn Brown PSC • Lonnie Crosby NICS, IO/Workflows lead • Tim Dudek GIG EOT • Victor Eijkhout TACC • Jeff Gardner U. Washington (U) • Chris Hempel TACC • Ken Jansen RPI (U) • Shantenu Jha LONI • Nick Karonis NIU (G) • Dan Katz LONI • Ricky Kendall ORNL • Byoung-Do Kim TACC • Scott Lathrop GIG, EOT AD • Vickie Lynch ORNL • Amit Majumdar SDSC, TG AUS AD • Mahin Mahmoodi PSC, Tools lead • Allen Malony Univ. of Oregon (P) • David O’Neal PSC • Dmitry Pekurovsky SDSC • Wayne Pfeiffer SDSC • Raghu Reddy PSC, Scalability lead • Sergiu Sanielevici PSC • Sameer Shende Univ. of Oregon (P) • Ray Sheppard IU • Alan Snavely SDSC • Henry Tufo NCAR • George Turner IU • John Urbanic PSC • Joel Welling PSC • Nick Wright SDSC (P) • S. Levent Yilmaz* CSM, U. Pittsburgh (P) • U: user; P: performance tool developer; G: grid infrastructure developer; *: joined XS-WG since last TG-ARCH update

  4. Technical Challenge Area #1: Scalability and Architecture • Algorithms, numerics, multicore, etc. • Robust, scalable infrastructure (libraries, frameworks, languages) for supporting applications that scale to O(10^4–10^6) cores • Numerical stability and convergence issues that emerge at scale • Exploiting systems’ architectural strengths • Fault tolerance and resilience • Contributors • POC: Raghu Reddy (PSC) • Members: Reddy, Majumdar, Urbanic, Kim, Lynch, Jha, Nystrom • Current activities • Understanding performance tradeoffs in hierarchical architectures • e.g., partitioning between MPI and OpenMP for different node architectures, interconnects, and software stacks • candidate codes for benchmarking: HOMB, WRF, perhaps others • Characterizing bandwidth-intensive communication performance

  5. Investigating the Effectiveness of Hybrid Programming (MPI+OpenMP) • Begun in XS-WG, extended through the AUS effort in collaboration with Amit Majumdar • Examples of applications with hybrid implementations: WRF, POP, ENZO • To exploit more memory per task, threading offers clear benefits. • But what about performance? • Prior results are mixed; pure MPI often seems at least as good. • Historically, systems had fewer cores per socket and fewer cores per node than we have today, and far fewer than they will have in the future. • Have OpenMP versions been as carefully optimized? • Reasons to look into hybrid implementations now • Current Track 2 systems have 8–16 cores per node. • Are we at the tipping point where threading offers a win? If not, is there such a point, at what core count, and for which kinds of algorithms? • What is the potential for performance improvement?
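
To make the MPI-vs.-hybrid tradeoff concrete, here is a minimal sketch (added for illustration; it is not taken from the slides or from any of the applications named above) of the usual hybrid pattern: a few MPI ranks per node, each spreading its local work across OpenMP threads, with MPI calls funneled through the main thread.

```c
/* Minimal hybrid MPI+OpenMP sketch: illustrative only, not from the slides.
 * One MPI rank per node or socket; OpenMP threads fill the remaining cores. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;

    /* MPI_THREAD_FUNNELED: only the main thread makes MPI calls, matching
       the common compute-with-threads / communicate-with-MPI structure. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = 0.0;

    /* Computation section: threads share this rank's portion of the work. */
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < 10000000; i++)
        local += 1.0 / (1.0 + i);

    /* Communication section: the main thread alone calls MPI. */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("%d threads/rank, result = %f\n", omp_get_max_threads(), global);

    MPI_Finalize();
    return 0;
}
```

Whether this arrangement outperforms a pure-MPI run at the same core count is the question the study above asks; the answer depends on the threaded fraction, the communication pattern, and the cores-per-node count.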

  6. Hybrid OpenMP-MPI Benchmark (HOMB) • Developed by Jordan Soyke while a student intern at PSC; subsequently enhanced by Raghu Reddy • Simple benchmark code • Permits systematic evaluation by • Varying the computation-communication ratio • Varying message sizes • Varying the MPI vs. OpenMP balance • Allows characterization of performance bounds • Characterizing the potential hybrid performance of an actual application is possible with adequate understanding of its algorithms and their implementations.

  7. Characteristics of the Benchmark • Perfectly parallel with both MPI and OpenMP • Perfectly load balanced • Distinct computation and communication sections • Only nearest-neighbor communication • Currently no reduction operations • No overlap of computation and communication • Can easily vary the computation/communication ratio • Current tests are with large messages
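
A structural sketch of a benchmark with the characteristics listed above (written for this transcript; it is not the actual HOMB source): a non-overlapped, nearest-neighbor halo exchange followed by a perfectly load-balanced, OpenMP-threaded update, with the illustrative parameters NROW and NCOL controlling the computation/communication ratio and the message sizes.

```c
/* Illustrative sketch only (not the actual HOMB code): distinct, non-overlapped
 * communication and computation sections, nearest-neighbor exchange only,
 * no reductions, and a perfectly load-balanced threaded update. */
#include <mpi.h>
#include <string.h>

#define NROW 1024   /* local rows per rank; raise to increase computation */
#define NCOL 1024   /* row length; each halo message is NCOL doubles      */

static double grid[NROW + 2][NCOL], next[NROW + 2][NCOL];

int main(int argc, char **argv)
{
    int provided, rank, size;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int up   = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int down = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    for (int iter = 0; iter < 100; iter++) {
        /* Communication section: nearest-neighbor halo exchange only. */
        MPI_Sendrecv(grid[1],        NCOL, MPI_DOUBLE, up,   0,
                     grid[NROW + 1], NCOL, MPI_DOUBLE, down, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(grid[NROW],     NCOL, MPI_DOUBLE, down, 1,
                     grid[0],        NCOL, MPI_DOUBLE, up,   1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* Computation section: identical work on every rank and thread. */
        #pragma omp parallel for
        for (int i = 1; i <= NROW; i++)
            for (int j = 1; j < NCOL - 1; j++)
                next[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                     grid[i][j-1] + grid[i][j+1]);
        memcpy(grid, next, sizeof(grid));
    }

    MPI_Finalize();
    return 0;
}
```

Timing the two sections separately while sweeping NROW, NCOL, and the MPI-rank/OpenMP-thread split gives the kind of systematic compute/communicate sweep described on the previous slide.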

  8. Preliminary Results on Kraken: MPI vs. MPI+OpenMP, 12 threads/node • The hybrid approach provides an increasing performance advantage as the communication fraction increases • … for the current core count per node. • Non-threaded sections of an actual application would have an Amdahl’s Law effect; these results constitute a best-case limit. • Hybrid could be beneficial for other reasons: • The application has limited scalability because of its decomposition • The application needs more memory • The application has dynamic load imbalance
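
To make the "best-case limit" point quantitative (this bound is standard and is added here for context, not taken from the slides): if a fraction f of a rank's work is threaded and t OpenMP threads run per rank, the per-rank benefit of threading is bounded by

```latex
S(t) \;\le\; \frac{1}{(1 - f) + f/t}
```

For example, with t = 12 threads per Kraken node and f = 0.9, the bound is roughly 5.7x rather than 12x. HOMB's compute section is fully threaded (f ≈ 1), which is why its measurements bound what a real, partially threaded application can expect.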

  9. Technical Challenge Area #2: Tools • Performance tools, debuggers, compilers, etc. • Evaluate strengths and interactions; ensure adequate installations • Analyze/address gaps in programming environment infrastructure • Provide advanced guidance to RP consultants • Contributors • POC: Mahin Mahmoodi (PSC) • Members: Mahmoodi, Wright, Alameda, Shende, Sheppard, Brown, Nystrom • Current activities • Focus on testing debuggers and performance tools at large core counts • Ongoing, excellent collaboration between SDCI tool projects, plus consideration of complementary tools • Submission of a joint POINT/IPM tools tutorial to TG09 • Installing and evaluating strengths of tools as they apply to complex production applications

  10. Collaborative Performance Engineering Tutorials • TG09: Using Tools to Understand Performance Issues on TeraGrid Machines: IPM and the POINT Project (June 22, 2009) • Karl Fuerlinger (UC Berkeley), David Skinner (NERSC/LBNL), Nick Wright (then SDSC), Rui Liu (NCSA), Allen Malony (Univ. of Oregon), Haihang You (UTK), Nick Nystrom (PSC) • Analysis and optimization of applications on the TeraGrid, focusing on Ranger and Kraken. • SC09: Productive Performance Engineering of Petascale Applications with POINT and VI-HPS (Nov. 16, 2009) • Allen Malony and Sameer Shende (Univ. of Oregon), Rick Kufrin (NCSA), Brian Wylie and Felix Wolf (JSC), Andreas Knuepfer and Wolfgang Nagel (TU Dresden), Shirley Moore (UTK), Nick Nystrom (PSC). • Addresses performance engineering of petascale scientific applications with TAU, PerfSuite, Scalasca, and Vampir. • Includes hands-on exercises using a Live-DVD containing all of the tools, helping to prepare participants to apply modern methods for locating and diagnosing typical performance bottlenecks in real-world parallel programs at scale.

  11. Technical Challenge Area #3: Workflow, data transport, analysis, visualization, and storage • Coordinating massive simulations, analysis, and visualization • Data movement between RPs involved in complex simulation workflows; staging data from HSM systems across the TeraGrid • Technologies and techniques for in situ visualization and analysis • Contributors • POC: Lonnie Crosby (NICS) • Members: Crosby, Welling, Nystrom • Current activities • Focus on I/O profiling and determining platform-specific recommendations for obtaining good performance for common parallel I/O scenarios
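
As one concrete example of the "common parallel I/O scenarios" mentioned above (a sketch written for this transcript; the file name and sizes are illustrative, and the group's actual recommendations are not reproduced here): every rank writes a disjoint, contiguous block of a single shared file with collective MPI-IO, the case whose behavior typically depends most on platform-specific settings such as file-system striping and collective buffering.

```c
/* Illustrative sketch: each MPI rank writes its own contiguous block of one
 * shared file using collective MPI-IO. File name and block size are made up. */
#include <mpi.h>
#include <stdlib.h>

#define NDOUBLES 1048576   /* 8 MB of doubles per rank, illustrative */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *buf = malloc(NDOUBLES * sizeof(double));
    for (int i = 0; i < NDOUBLES; i++)
        buf[i] = rank + 1.0e-6 * i;

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "checkpoint.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Collective write: each rank targets its own offset, and the MPI-IO
       layer is free to aggregate requests (collective buffering). */
    MPI_Offset offset = (MPI_Offset)rank * NDOUBLES * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, NDOUBLES, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
```

Profiling how long such a collective write takes at different core counts and stripe settings is the kind of measurement from which platform-specific recommendations can be drawn.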

  12. Co-organized a Workshop on Enabling Data-Intensive Computing: from Systems to Applications • July 30–31, 2009, University of Pittsburgh, http://www.cs.pitt.edu/~mhh/workshop09/index.html • 2 days: presentations, breakout discussions • architectures • software frameworks and middleware • algorithms and applications • Speakers • John Abowd - Cornell University • David Andersen - Carnegie Mellon University • Magda Balazinska - The University of Washington • Roger Barga - Microsoft Research • Scott Brandt - The University of California at Santa Cruz • Mootaz Elnozahy - International Business Machines • Ian Foster - Argonne National Laboratory • Geoffrey Fox - Indiana University • Dave O'Hallaron - Intel Research • Michael Wood-Vasey - University of Pittsburgh • Mazin Yousif - The University of Arizona • Taieb Znati - The National Science Foundation • From R. Kouzes et al., The Changing Paradigm of Data-Intensive Computing, IEEE Computer, January 2009

  13. Next TeraGrid/Blue Waters Extreme-Scale Computing Workshop • To focus on parallel I/O for petascale applications, addressing: • multiple levels of applications, middleware (HDF, MPI-IO, etc.), and systems • requirements for data transfers to/from archives and remote processing and management facilities. • Tentatively scheduled for the week of March 22, 2010, in Austin • Builds on preceding Petascale Application Workshops • December 2007, Tempe: general issues of petascale applications • June 2008, Las Vegas: more general issues of petascale applications • March 2009, Albuquerque: fault tolerance and resilience; included significant participation from NNSA, DOE, and DoD
