
Extreme Scalability Working Group (XS-WG): Status Update



  1. Extreme Scalability Working Group (XS-WG): Status Update. Nick Nystrom, Director, Strategic Applications, Pittsburgh Supercomputing Center. May 20, 2010

  2. Extreme Scalability Working Group (XS-WG): Purpose • Meet the challenges and opportunities of deploying extreme-scale resources into the TeraGrid, maximizing both scientific output and user productivity. • Aggregate, develop, and share wisdom • Identify and address needs that are common to multiple sites and projects • May require assembling teams and obtaining support for sustained effort • XS-WG benefits from active involvement of all Track 2 sites, Blue Waters, tool developers, and users. • The XS-WG leverages and combines RPs’ interests to deliver greater value to the computational science community.

  3. XS-WG Participants • Amit Majumdar SDSC, TG AUS AD • Mahin Mahmoodi PSC, Tools lead • Allen Malony Univ. of Oregon (P) • David O’Neal PSC • Dmitry Pekurovsky SDSC • Wayne Pfeiffer SDSC • Raghu Reddy PSC, Scalability lead • Sergiu Sanielevici PSC • Sameer Shende Univ. of Oregon (P) • Ray Sheppard IU • Alan Snavely SDSC • Henry Tufo NCAR • George Turner IU • John Urbanic PSC • Joel Welling PSC • Nick Wright NERSC (P) • S. Levent Yilmaz CSM, U. Pittsburgh (P) • Nick Nystrom PSC, XS-WG lead • Jay Alameda NCSA • Martin Berzins Univ. of Utah (U) • Paul Brown IU • Lonnie Crosby NICS, IO/Workflows lead • Tim Dudek GIG EOT • Victor Eijkhout TACC • Jeff Gardner U. Washington (U) • Chris Hempel TACC • Ken Jansen RPI (U) • Shantenu Jha LONI • Nick Karonis NIU (G) • Dan Katz U. of Chicago • Ricky Kendall ORNL • Byoung-Do Kim TACC • Scott Lathrop GIG, EOT AD • Vickie Lynch ORNL. Key: U: user; P: performance tool developer; G: grid infrastructure developer; *: joined XS-WG since last TG-ARCH update

  4. Technical Challenge Area #1: Scalability and Architecture • Algorithms, numerical methods, multicore performance, etc. • Robust, scalable infrastructure (libraries, frameworks, languages) for supporting applications that scale to O(10^4–10^6) cores • Numerical stability and convergence issues that emerge at scale • Exploiting systems’ architectural strengths • Fault tolerance and resilience • Contributors • POC: Raghu Reddy (PSC) • Recent and ongoing activities: hybrid performance • Raghu submitted a technical paper to TG10 with Annick Pouquet • Synergy with AUS; work by Wayne Pfeiffer and Dmitry Pekurovsky • Emphasis on documenting & disseminating guidance • Raghu’s work on the HOMB benchmark, Pfeiffer, Pekurovsky, others
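The scaling guidance above rests on a standard piece of analysis: computing speedup and parallel efficiency from measured wall-clock times. The sketch below is a toy illustration of that arithmetic only; the core counts and timings are invented example numbers, not HOMB results.

```python
# Toy strong-scaling analysis (illustrative only; numbers are made up).
# Speedup is relative to the smallest run; efficiency normalizes speedup
# by the increase in core count.

def scaling_metrics(core_counts, times):
    """Return (speedup, efficiency) lists relative to the first entry."""
    base_cores, base_time = core_counts[0], times[0]
    speedup = [base_time / t for t in times]
    efficiency = [s * base_cores / c for s, c in zip(speedup, core_counts)]
    return speedup, efficiency

cores = [1024, 2048, 4096, 8192]    # hypothetical core counts
times = [100.0, 52.0, 28.0, 16.0]   # hypothetical seconds per timestep

speedup, eff = scaling_metrics(cores, times)
for c, s, e in zip(cores, speedup, eff):
    print(f"{c:6d} cores: speedup {s:5.2f}x, efficiency {e:6.1%}")
```

Falling efficiency at the largest core counts is exactly the kind of symptom that motivates the hybrid MPI+OpenMP work mentioned above.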

  5. Technical Challenge Area #2: Tools • Performance tools, debuggers, compilers, etc. • Evaluate strengths and interactions; ensure adequate installations • Analyze/address gaps in programming environment infrastructure • Provide advanced guidance to RP consultants • Contributors • POC: Mahin Mahmoodi (PSC) • Recent and ongoing activities: reliable tool installations • Nick and Mahin visited NICS in December to give a seminar on performance engineering and tool use • Mahin and NICS staff developed efficient, sustainable procedures and policies for keeping tool installations up to date and functional • Ongoing application of performance tools at scale to complex applications to ensure their correct functionality; identify & remove problems • Nick, Sameer, Rui Liu, and Dave Cronk co-presented a performance engineering tutorial at LCI10 (March 8, 2010, Pittsburgh)
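The workflow these tools support is profile-then-inspect: run the application under a profiler, then rank functions by time to locate hotspots. As a small, language-agnostic sketch of that workflow (Python's built-in cProfile standing in for HPC tools such as TAU or Scalasca; the workload functions are invented):

```python
# Minimal profile-then-inspect sketch. cProfile is only a stand-in here
# for the parallel tools (TAU, Scalasca, etc.) discussed in the slide.
import cProfile
import io
import pstats

def hotspot(n):
    # deliberately expensive: O(n^2) pair sum
    return sum(i * j for i in range(n) for j in range(n))

def cheap(n):
    return sum(range(n))

def workload():
    hotspot(300)
    cheap(300)

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Rank by cumulative time; the hotspot should appear near the top.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
print(buf.getvalue())
```

Parallel tools add per-rank and per-thread detail on top of this basic picture, which is why correct installations at scale (the slide's focus) matter.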

  6. Collaborative Performance Engineering Tutorials • SC09: Productive Performance Engineering of Petascale Applications with POINT and VI-HPS (November 16, 2009) • Allen Malony and Sameer Shende (Univ. of Oregon), Rick Kufrin (NCSA), Brian Wylie and Felix Wolf (JSC), Andreas Knuepfer and Wolfgang Nagel (TU Dresden), Shirley Moore (UTK), Nick Nystrom (PSC) • Addresses performance engineering of petascale, scientific applications with TAU, PerfSuite, Scalasca, and Vampir • Includes hands-on exercises using a Live-DVD containing all of the tools, helping to prepare participants to apply modern methods for locating and diagnosing typical performance bottlenecks in real-world parallel programs at scale • LCI10: Using POINT Performance Tools: TAU, PerfSuite, PAPI, Scalasca, and Vampir (March 8, 2010) • Sameer Shende (Univ. of Oregon), David Cronk (Univ. of Tennessee at Knoxville), Nick Nystrom (PSC), and Rui Liu (NCSA) • Targeted multicore performance issues

  7. Technical Challenge Area #3: Workflow, data transport, analysis, visualization, and storage • Coordinating massive simulations, analysis, and visualization • Data movement between RPs involved in complex simulation workflows; staging data from HSM systems across the TeraGrid • Technologies and techniques for in situ visualization and analysis • Contributors • POC: Lonnie Crosby (NICS) • Current activities • Extreme Scale I/O and Data Analysis Workshop

  8. Extreme Scale I/O and Data Analysis Workshop • March 22-24, 2010, Austin • http://www.tacc.utexas.edu/petascale-workshop/ • Sponsored by the Blue Waters Project, TeraGrid, and TACC • Builds on preceding Petascale Application Workshops • December 2007, Tempe and June 2008, Las Vegas: petascale applications • March 2009, Albuquerque: fault tolerance and resilience; included significant participation from NNSA, DOE, and DoD • 48 participants from 30 institutions • 2 days: presentations + lively discussion • application requirements; filesystems; I/O libraries and middleware; large-scale data management

  9. Extreme Scale I/O and Data Analysis Workshop:Some Observations & Findings • Users are doing parallel I/O using a variety of means • Rolling their own, HDF, netCDF, MPI-IO, ADIOS, …: no one size fits all • Data volumes can exceed the capability of analysis resources • E.g. ~0.5-1.0 TB per wall clock day for certain climate simulations • The greatest complaint was large variability in I/O performance • 2-10× slowdown cited as common; 300× observed • The causes are well understood. How to avoid them is not. • Potential research direction: Extensions to schedulers to support file information from jobs being submitted plus detailed knowledge of parallel filesystem characteristics might enable I/O quality of service and allow effective workload optimization.
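One common mitigation for the small-write patterns behind such variability is aggregation: accumulate many small records in memory and issue a few large writes. The stdlib-only sketch below shows the serial idea; real codes get the same effect via MPI-IO collective buffering or libraries like ADIOS, and the record sizes here are arbitrary example values.

```python
# Serial sketch of write aggregation (a common mitigation for I/O
# variability). Record and chunk sizes are arbitrary illustrations.
import os
import tempfile

def write_unbuffered(path, records):
    # One tiny write per record: the pattern that stresses parallel filesystems.
    with open(path, "wb", buffering=0) as f:
        for rec in records:
            f.write(rec)

def write_aggregated(path, records, chunk_bytes=1 << 20):
    # Accumulate records in memory and flush in ~1 MiB chunks.
    buf = bytearray()
    with open(path, "wb") as f:
        for rec in records:
            buf += rec
            if len(buf) >= chunk_bytes:
                f.write(buf)
                buf.clear()
        if buf:
            f.write(buf)

records = [b"x" * 64 for _ in range(10_000)]
with tempfile.TemporaryDirectory() as d:
    p1, p2 = os.path.join(d, "small"), os.path.join(d, "big")
    write_unbuffered(p1, records)
    write_aggregated(p2, records)
    # Both produce identical files; the aggregated path issues far fewer writes.
    assert os.path.getsize(p1) == os.path.getsize(p2) == 64 * 10_000
```

The scheduler-based QoS direction in the last bullet would complement, not replace, this kind of application-side aggregation.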

  10. Questions?
