SAB meeting Jan 8-9 2009 User Support: Current Levels and Methods

Ralph Roskies, Scientific Director, PSC. January 9, 2009.

Presentation Transcript


  1. SAB meeting Jan 8-9 2009. User Support: Current Levels and Methods. Ralph Roskies, Scientific Director, PSC. January 9, 2009

  2. User Survey Results “I consider the user support people to be the most valuable aspect of the TeraGrid because the infrastructure is only as good as the people who run and support it. The experiences we have had with people running and supporting TeraGrid systems has been generally very good.” – Martin Berzins, University of Utah. 2008 user survey satisfaction ratings: helpfulness of TeraGrid user support staff, 83.75%; promptness of ticket resolution, 82.25%; effectiveness of user support in solving problems, 79.5%.

  3. User Support Overview • For Q4 2008, there were 1,118 PIs and 1,413 users charging SUs. • In PY4, TG has ~70 FTE involved with user support. • Managed in concert by the GIG ADs for Operations, User Support Coordination, Advanced Support, User Facing Presence, EOT, and Science Gateways, with guidance from the Science Director. Note: this does not include substantial training efforts in HPC University, or work on Common User Environments.

  4. Frontline User Support • Ticket Resolution and User Engagement Provide efficient and effective resolution of trouble tickets by TeraGrid-wide sharing of technical information and best practices. Refer issues that require >1 FTE-month to Advanced Support. Provide technical content for online information based on recent problems and user feedback. Provide ongoing personal contacts via the User Champions, Campus Champions, and Pathways programs. Organize the 2008 and 2009 user satisfaction surveys. PY4: 32.3 FTE. User comments: “Entire NCCU physics computational group is very grateful for your prompt action and in helping us to resolve for us this very significant problem.” (regarding rapid setup of an account) – Branislav Vlahovic. “Hey Rick, …you are awesome man, thanks for all the help.” (setting up priority queue) – Abhijit Ramachandran, Dept. of Bioengineering, U. Texas at Arlington. “…The support staff is extremely patient and helpful.” – Keshav Pingali, Cornell

  5. Advanced User Support • Advanced Support for TeraGrid Applications Provide targeted support, requiring >1 FTE-month, for users’ application development and optimization efforts. Responsible for many of the TG Highlights. Can be requested as Startup and Supplemental via POPS. Sometimes results in co-authorship, co-PIs in proposals, … PY4: 15.25 FTE for ~25 ASTA collaborations. User comment: “Happy New Year! I just wanted to say that Roberto has really been working above and beyond the call of duty to get this all going -- we've been getting mail from him on his way to bed and on waking in the morning with his kids... our runs are all going in this morning, and with any luck at all we'll have a nice set of results to discuss at the AAS in San Diego next week. Thanks for your help!” – Mordecai-Mark Mac Low, American Museum of Natural History (Jan 3)

  6. Advanced User Support User comments: “A million thanks to you, and to all the folks … for your help with this BIG job. Without it, it could not have been done.” – Jacobo Bielak, CMU. “Our (consultant) sometimes comes up with his own suggestions before we even have a problem!” – Steve Gottlieb (Indiana) • Advanced Support for Projects Identify, deploy, harden, optimize and benchmark tools and application packages that benefit large numbers of users in a particular domain or across multiple domains. Examples include molecular dynamics codes (NAMD, AMBER, GROMACS, CHARMM, LAMMPS and DESMOND) and materials codes (CPMD, VASP, SIESTA, ABINIT), all heavily used in TG. PY4: 8.25 FTE for at least 3 cross-TG application infrastructure projects

  7. Advanced User Support - Gateways • Subset of Advanced User Support program • Same request process, just looking for different expertise • Perhaps Grid computing and workflows rather than optimization and scaling • Some may request support in multiple expertise areas • Targeted support was a hallmark of the Gateway program early on • As the program was being formed, all gateway developers were guinea pigs; many received advanced support • Today, moving toward a more sustainable production environment. PY4: 5.7 FTE for at least 10 SGW projects

  8. Online User Support • User Information Presentation Develop and maintain methods to provide users with current, accurate information from across the TeraGrid in a dynamic environment of resources, software and services. PY4: 3.5 FTE • Information Production and Organization Maintain and update documentation content, including a knowledge base of brief answers, with follow-up references, to frequent user questions. PY4: 2.25 FTE for 250 new documents. User comment: “(particular TG site) has exemplary organization for its user guides that simplifies migration to a new machine. All the right information is available for machine parameters, compilation and batch script development. It really reduces the barrier to starting out on a new computer.” – Steve Gottlieb, Indiana University

  9. Support for EOT • Advanced Support for Education, Outreach and Training Prepare and deliver advanced HPC/CI content for the HPC University, as well as for education and outreach activities. In the first three quarters of 2008, new content has included: • Intro to Multi-Core Programming, • TeraGrid New User Training, • Hybrid Programming for Shared-Memory and Clustered SMP Systems, • Introduction to Data Transfer and File Management on the TeraGrid, • Introduction to Parallel Programming on Ranger, Clouds and Web 2.0. PY4: 4.25 FTE
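The PY4 FTE figures quoted on slides 4 through 9 can be added up and compared with the "~70 FTE" stated in the overview. The sketch below does only that arithmetic. The category labels are shortened from the slide text, and treating these seven figures as the full breakdown of the overview number is an assumption, not something the slides state.

```python
from decimal import Decimal

# PY4 FTE figures as quoted on slides 4-9 (labels shortened for illustration).
py4_fte = {
    "Frontline user support": Decimal("32.3"),
    "Advanced support for TeraGrid applications": Decimal("15.25"),
    "Advanced support for projects": Decimal("8.25"),
    "Advanced support for gateways": Decimal("5.7"),
    "User information presentation": Decimal("3.5"),
    "Information production and organization": Decimal("2.25"),
    "Advanced support for EOT": Decimal("4.25"),
}

# Decimal avoids binary floating-point rounding noise in the sum.
total_fte = sum(py4_fte.values(), Decimal("0"))

for area, fte in py4_fte.items():
    share = fte / total_fte * 100
    print(f"{area:45s} {fte:>6} FTE  {share:5.1f}%")
print(f"{'Total':45s} {total_fte:>6} FTE")
```

The listed figures sum to 71.5 FTE, which is in line with the rounded "~70 FTE" overview figure.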

  10. Questions?

  11. Backup Slides

  12. Training via HPC University Program • Support for TeraGrid Training Provide a broad range of live, synchronous and asynchronous training opportunities. Work with external organizations to identify and promote all HPC training resources and opportunities for participation. Over the first three quarters of 2008, TeraGrid has provided training for 5,306 people through 75 training events and through access to 22 on-line tutorials. www.hpcuniv.org

  13. Support for EOT • Support for Education, Outreach and Training Prepare current and future generations of STEM practitioners, significantly larger and more diverse, to actively contribute to advancing scientific discovery. Over the first three quarters of 2008, EOT has engaged 8,421 people in 190 EOT events, plus use of 22 on-line tutorials, over 80 tours of facilities, and TG’08. AUS staff have contributed significantly in support of these activities.

  14. Current SGW Collaborations • GIG: GEON and Navajo Technical College; PolarGrid; Computational Infrastructure for Geodynamics; Social Informatics DataGrid; Allegheny General Hospital; TeraDRE; HUB gateways; Asteroseismology • RP: Community Climate System Model (CCSM); Neutron Science Portal; Earth System Grid

  15. Membership-governed organization • 40 institutional members, 9 foreign affiliates • Supports and promotes Earth science by developing and maintaining software for computational geophysics

  16. How does CIG use the TeraGrid? • Seismograms allow scientists to understand the ground motion • Computationally intensive simulations run on TeraGrid, using an assortment of 3D and 1D earth models, produce synthetic seismograms • Necessary input datasets are provided via the portal • A daemon (Python, Pyre) constantly polls the web site looking for work to do • The daemon uses GSI-OpenSSH and MyProxy credentials to submit jobs, monitor jobs, and transfer output back to the portal • Status updates are sent to the web site using HTTP POST • Users can download results in ASCII and Seismic Analysis Code (SAC) format • Visualizations include "beachball" graphics depicting the earthquake's source mechanism, and maps showing the locations of the earthquake and the seismic stations using GMT (http://gmt.soest.hawaii.edu/) • Researchers quickly receive results and can concentrate on the scientific aspects of the output rather than on the details of running the analysis on a supercomputer • Future Directions • Parameter explorations • Custom earth models for users
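The poll, submit, report cycle described on this slide can be sketched roughly as follows. This is a minimal illustration, not the CIG daemon: it does not use Pyre, GSI-OpenSSH, or MyProxy, and every name in it (`poll_once`, `run_daemon`, `fetch_pending`, `submit`, `report`, the job IDs) is hypothetical. The portal and the cluster are replaced by in-memory stand-ins so the control flow can be run anywhere.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Job = Dict[str, str]


def poll_once(
    fetch_pending: Callable[[], Iterable[Job]],
    submit: Callable[[Job], None],
    report: Callable[[str, str], None],
) -> int:
    """Handle every job currently waiting; return how many were seen."""
    handled = 0
    for job in fetch_pending():
        report(job["id"], "submitted")
        try:
            submit(job)  # stand-in for remote submission and output transfer
        except Exception as exc:
            report(job["id"], f"failed: {exc}")
        else:
            report(job["id"], "done")
        handled += 1
    return handled


def run_daemon(
    fetch_pending: Callable[[], Iterable[Job]],
    submit: Callable[[Job], None],
    report: Callable[[str, str], None],
    interval_s: float = 30.0,
    max_cycles: Optional[int] = None,
) -> None:
    """Poll repeatedly, sleeping only when a cycle found no work."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        if poll_once(fetch_pending, submit, report) == 0:
            time.sleep(interval_s)
        cycles += 1


# In-memory stand-ins (assumptions) so the sketch runs without a portal.
queue: List[Job] = [{"id": "run-1"}, {"id": "run-2"}]
status_log: List[Tuple[str, str]] = []


def fetch_pending() -> List[Job]:
    jobs = list(queue)
    queue.clear()
    return jobs


def submit(job: Job) -> None:
    if job["id"] == "run-2":
        raise RuntimeError("simulated submission error")


def report(job_id: str, status: str) -> None:
    status_log.append((job_id, status))  # a real daemon would HTTP POST this


run_daemon(fetch_pending, submit, report, interval_s=0.0, max_cycles=2)
print(status_log)
```

Passing the three operations in as callables keeps the loop independent of how the portal is reached, so the HTTP and credential handling could be swapped in without touching the control flow.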

  17. Social Informatics Data Grid • Heavy use of “multimodal” data. • Subject might be viewing a video, while a researcher collects heart rate and eye movement data. • Events must be synchronized for analysis; large datasets result. • Extensive analysis capabilities are not something that each researcher should have to create for themselves. http://www.ci.uchicago.edu/research/files/sidgrid.mov

  18. How does SIDGrid use the TeraGrid? • Computationally intensive tasks • Speech, gesture, facial expression, and physiological measurements • Media transcoding for pitch analysis of audio tracks • Once stored in raw form, data streams are converted to formats compatible with software for annotation, coding, integration, analysis • fMRI image analysis • Workflows for massive job submissions and data transfers using Virtual Data System (VDS) • Workflows converted to concrete execution plan via Pegasus Grid planner • TeraGrid information service (MDS) • Replica location service (RLS) • DAGMAN and Condor-G/GRAM
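The step from an abstract workflow to a concrete execution order can be illustrated with a dependency graph and a topological sort. The sketch below is only that illustration: it does not call VDS, Pegasus, or DAGMan, and the task names are hypothetical labels loosely based on the processing steps listed on the slide. It uses `graphlib` from the Python standard library (3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each task maps to the set of tasks that must finish before it can start.
# Task names are illustrative assumptions, not SIDGrid identifiers.
workflow: Dict[str, Set[str]] = {
    "transcode_media": {"store_raw_streams"},
    "pitch_analysis": {"transcode_media"},
    "convert_for_annotation": {"store_raw_streams"},
    "integrate_results": {"pitch_analysis", "convert_for_annotation"},
}


def execution_plan(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return one valid run order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(dependencies).static_order())


plan = execution_plan(workflow)
for step, task in enumerate(plan, start=1):
    print(step, task)
```

Several orders can be valid for the same graph (the two tasks that depend only on `store_raw_streams` may run in either order or in parallel), which is the freedom a planner can use when mapping tasks onto resources.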

  19. Purdue ASTA Activity – TG-MCA05S015 P. A. Cheeseman (aai@purdue.edu). TeraGrid Allocations: TG-MCA05S015, TG-MCA05T015

  20. Purdue ASTA Activity ... Milestones • 2006/02 – Adaptation of parameter sweep to Condor began. • 2006/05 – Condor adaptation plan reviewed: reduce job times to avoid preemption; improve fault tolerance; incorporate internal, adaptable time limits; incorporate script-level steps within the program (self-checkpoint, seed iteration, etc.). • 2006/08 – Program adaptation complete and adapted code in production (see Slide 4). • 2007/06 – Presentation at TG07 (http://www.teragrid.org/events/teragrid07/archive/presentations/wednesday/TG07.PD.12)
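The adaptation plan on this slide (short jobs, an internal time limit, self-checkpointing, seed iteration inside the program) might look roughly like the sketch below. It is an assumption-laden illustration rather than the Purdue code: `simulate`, `run_seeds`, the JSON checkpoint format, and the file name are all hypothetical, and the retry loop at the end stands in for successive short batch jobs.

```python
import json
import os
import random
import tempfile
import time


def simulate(params, seed):
    """Hypothetical stand-in for one seeded run of the real program."""
    rng = random.Random(seed)
    return params["scale"] * rng.randint(0, 1000)


def run_seeds(params, n_seeds, checkpoint_path, time_limit_s, clock=time.monotonic):
    """Run seeds 0..n_seeds-1, resuming from a checkpoint if one exists.

    Returns (finished, results). finished is False when the internal time
    limit was reached first; the checkpoint lets a later job resume.
    """
    state = {"next_seed": 0, "results": []}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            state = json.load(fh)

    start = clock()
    while state["next_seed"] < n_seeds:
        if clock() - start >= time_limit_s:
            return False, state["results"]  # stop voluntarily, before preemption
        state["results"].append(simulate(params, state["next_seed"]))
        state["next_seed"] += 1
        # Write to a temp file and rename it, so a kill mid-write
        # cannot leave a corrupt checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump(state, fh)
        os.replace(tmp_path, checkpoint_path)
    return True, state["results"]


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "sweep.ckpt")
    finished, attempts = False, 0
    while not finished:  # each pass stands in for one short batch job
        finished, results = run_seeds({"scale": 2}, 100, path, time_limit_s=0.05)
        attempts += 1
    print(f"finished {len(results)} seeds in {attempts} attempt(s)")
```

Checkpointing after every seed keeps the work lost to a preemption to at most one seed, at the cost of one small file write per seed; a real code would likely checkpoint less often if a single seed is cheap.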

  21. Purdue ASTA Activity ... Milestones (cont.) • 2007/12/24 – Initial computations complete. ~6M jobs completed. 4M hours delivered. Average delivery rate of 240+ hours/hour. Peak rates of 2000+ hours/hour. 3,168,459 parameter sets processed (100 seeds per set). • 2008/01 – Refinement computations began. Minor code adaptations necessary. Less CPU intensive. • 2008/11 – Refinement computations complete. 8M+ inputs processed. Results presently being reviewed by Profs. Deem and Earl.
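A few derived quantities follow from the figures on this slide. The sketch below computes them, treating the rounded values ("~6M jobs", "4M hours") as exact, which is an assumption, and assuming each seed corresponds to one run of the program. The results are therefore rough estimates only.

```python
# Figures quoted on the slide; the rounded ones are taken at face value.
parameter_sets = 3_168_459
seeds_per_set = 100
jobs_completed = 6_000_000  # "~6M jobs"
cpu_hours = 4_000_000       # "4M hours"

# Total seeded runs, assuming one run per (parameter set, seed) pair.
seed_runs = parameter_sets * seeds_per_set

# Average job length in minutes, and average seeded runs packed into a job.
avg_minutes_per_job = cpu_hours * 60 / jobs_completed
seed_runs_per_job = seed_runs / jobs_completed

print(f"seeded runs:           {seed_runs:,}")
print(f"avg minutes per job:   {avg_minutes_per_job:.0f}")
print(f"avg seed runs per job: {seed_runs_per_job:.1f}")
```

Under these assumptions the sweep amounts to 316,845,900 seeded runs, with an average job lasting about 40 minutes and covering roughly 53 seeds, which fits the stated goal of keeping jobs short to avoid preemption.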

  22. Purdue ASTA Activity ...
