HPC University Training Meeting Welcome!! March 26-27, 2008 http://www.teragridforum.org/mediawiki/index.php?title=Training_Implementation
Goals, Objectives and Outcomes • Understand community needs in relation to available resources • Identify competencies and gaps in offerings • Establish mechanisms to disseminate and promote quality resources • Expand breadth and depth of training resources to address community needs • Foster continued information sharing and collaborations • Other?
Perspectives from the Field • Defining HPC University - Scott • HPC RAT recommendations - Laura • Computational Science Competencies – Steve • On-line instruction methodologies - Sandie • Collaborative efforts - Leslie • Petascale training gaps - Shawn • Community engagement strategies - Kathy • Dissemination and Quality assurance – Joiner • Survey feedback - Julia
Defining HPC University • Establishing competencies for skilled HPC educators, researchers, and practitioners • Defining roadmap for acquiring these competencies - from K-12 to researchers • Providing access to high-quality resources • Broadly disseminating information about events, activities, and resources • Cross-cutting among all disciplines • Requires collaboration among multiple agencies and organizations for broad impact • Certificate and degree-granting opportunities
Proposed Discussion Topics • On-line instruction methodologies • Quality assurance – VV&A • Promotion, scaling and dissemination • Petascale training gaps • HPC Roadmap • Collaboration and coordination strategies
Survey Summary • Responses from 12 sites • Audience: current and potential users, undergrads, grads, postdocs, industry, senior researchers, non-traditional communities, professionals, sysadmins
Short-term Goals • Train users in advanced parallel programming (MPI, OpenMP and hybrid) through hands-on workshops, in-depth consulting, and knowledgeable online content to move them into tera- and petascale computing • Help users learn performance tuning, optimization and scaling to petascale systems • Educate users to best use HPC facilities and services • Beginning to advanced courses in parallel programming • Facilitate effective use of high performance computing resources • Disseminate knowledge of tools and application software • Familiarize users with introductory grid computing strategies
Long-term Goals • On-line self-paced training; record all live training sessions • Enhance all training events • Develop next generation of top HPC researchers • Prepare users for effective use of future resources • Prepare people to be effective grid users • Broaden participation in supercomputing amongst a variety of scientific disciplines and user communities • Provide more proactive personalized help, supplemented with online resources and infrastructure more capable of responding quickly to user needs
Selecting Training Topics • Provide both ‘getting started’ courses and advanced courses on scaling and optimization • Help users to effectively use facilities: • access machines, batch systems, program to best use the available hardware, transfer data, use mass storage, performance and debugging tools, compile for performance, etc. • To select topics, we use our ticketing system, suggestions from training surveys and email, suggestions from researchers, and topics our HPC support staff are interested in • Interacting with consulting and applications support staff to identify what users need based on their interactions. Depends on subject matter experts being available to develop content • Feedback from workshop evaluations, new architectures and tools, and their fit with Ralph Regula School of Computational Science curricula and competencies
Selecting Topics (cont.) • Topic selection and level customized for individual • Job management, data management, security, workflow management systems, storage resource managers. • Site admin training. • Perceived need and user requests and of course instructor availability. • We always have an abundance of introductory courses because we continually have new students on campus. Our clientele is mostly graduate students so the need is there.
Evaluating Impact • Workshop feedback forms • Annual User Survey • Suggestion period at the end of each full or multi-day event • Post-event evaluation forms for live events • Optional online form for online tutorials • Number of projects ported to the OSG grid, number of jobs run, number of papers published in which our infrastructure was used to produce results, number of new students/faculty joining our efforts, number of grid computing courses introduced at different institutions as a result of our training • Assessment responses about participants’ new knowledge & skills that they can apply to their research after the training class
Live vs. Asynch • Formal training has been done as f2f events • Local presentations are provided as WebEx meetings and teleconferences • We have done a few remote-only events (Access Grid), but they were poorly attended • We try to make all presentation and lab materials available online for reference • Present intermediate to advanced topics at live events and cover introductory topics and how-to programming topics on-line • We hold small-group meetings with discipline-specific groups to gain a greater understanding of their computational and scientific needs • Developing simulation-based modules for use as curriculum in the classroom • Propose to capture as many workshops, seminars, and presentations as possible and deliver them asynchronously
New Development • Additional sources of info available via the Web • Asynch training via web and NCAST videos of current training • New topics as our users become more sophisticated • Multi-core capabilities • Revising User Information website • Tutorials associated with conferences and workshops put online • A textbook lab training text (with exercises) • More asynchronous through web technology • Meeting with members of non-traditional HPC disciplines to identify requirements to bring them to the resources available, looking for ways to help them transform their science • “Introduction to Parallel Programming and MPI” and “Scientific Visualization” using ParaView • Asynchronous webcasts of training classes to broaden participation • Synchronous training class via videoconference (AccessGrid, Polycom)
Major Gaps • Specialized training for computational scientists running on machines vs. computer scientists • Debugging, performance measurement, IO strategies, memory management, project management • Taking a new user from the introductory training sessions to someone who can actually parallelize their thinking and thus their code • Multi-core parallelization • Getting word of HPC libraries to users • Starting a new project, including training on what tools/code/methods are easily available, what resource providers are accessible and how to pick one versus another, scaling • Competencies • Application-specific guidance, i.e., help available for applications in biology, chemistry, mathematics, etc. • Basic and advanced parallel programming and software design • Discipline-specific parallel programming • Real coursework at the university level
Gaps (cont.) • Workshops are too few and far between. Workshop content is not delivered with a synchronous remote capability for interested participants who cannot physically attend. Workshop content is not captured for post-workshop asynchronous delivery • Inconsistency from one system to the next (compiler commands, for example). Each system should be pre-installed with sample code guaranteed to run on that system as well as supplemental training resources specific to that system • Online tutorials lack specificity to a particular system; sample code does not run on most systems • Lack of proactive personalized support • Scaling up code from tens/hundreds of cores to thousands • Scaling up code to petascale levels, cores >> 2048 • Methodology: synchronous and asynchronous training • Quality Assurance: accurate, verified, and validated training via synchronous and asynchronous methods • Coherent and guided set of online training tutorials/modules
Gaps (cont.) • C programming • Fortran 90/95/2003 programming • Unix and Linux as applied in HPC • Parallel computing/programming • Distributed & grid • Data analysis & visualization
What do you want to learn? • Ideas for training programs and roadmaps • What training topics are offered? Who coordinates and presents? Audiences? Effective methods? Is effective remote training possible and, if so, what technologies are used? Are there opportunities for collaborative training events? • Ideas to improve our training so that users are best served • Interest in joint development of online tutorials and in collaborating on live training • Petascale computing techniques • To identify new and better ways to share materials or develop materials • New and different ways of making training available • Understand the training priorities for other sites, and be sensitive to any political issues
RAT Report Focus • Mentoring, training the trainers, becoming sources and editors for CSERD • Provide more details on the training map and the gaps, e.g., identify multiple training paths • Training the trainers • Petascale computing aspects • Identification of a good parallel computing course • VV&A of collected training materials • Targeting underrepresented populations • Capture expertise for asynchronous delivery
Whew! • The scope and need are much broader than anything that can ever be accomplished via the limited funding for training • Setting priorities • Fostering collaboration • Avoid duplication of effort • Share best practices, resources, materials, etc.