
TeraGrid Annual Review: Science Gateways



Presentation Transcript


  1. TeraGrid Annual Review: Science Gateways. Nancy Wilkins-Diehr, TeraGrid Area Director for Science Gateways, wilkinsn@sdsc.edu. April, 2009

  2. Today’s brief presentation
     • Program background
     • Gateways using TeraGrid most heavily in 2008
     • 2008 activities
     • 2009 plans

  3. Gateways: A natural result of the impact of the internet on worldwide communication and information retrieval
     • Implications for the conduct of science are still evolving
     • 1980s: Early gateways; the National Center for Biotechnology Information BLAST server sent search results by email and is still a working portal today
     • 1992: Mosaic web browser developed
     • 1995: “International Protein Data Bank Enhanced by Computer Browser”
     • 2004: TeraGrid project director Rick Stevens recognized growth in scientific portal development and proposed the Science Gateway Program
     • Simultaneous explosion of digital information
     • Growing analysis needs in many, many scientific areas
     • Sensors, telescopes, satellites, digital images and video
     • #1 machine on Top500 today is 1000x more powerful than all combined entries on the first list in 1993
     Only 17 years since the release of Mosaic!

  4. Gateways democratize access to high end resources
     • Through gateways, almost anyone can investigate scientific questions using the TeraGrid
     • Not just those in the research groups of principal investigators who request allocations
     • Gateways foster new ideas, cross-disciplinary approaches
     • Gateways encourage students to experiment
     • But gateways enhance productivity of sophisticated scientists too
     • Significant number of papers resulting from gateways such as GridChem, nanoHUB, Robetta
     • Scientists can focus on challenging science problems rather than challenging infrastructure problems

  5. Easy Gateway True and False Test (Answers Provided)
     • TeraGrid selects all gateways (F)
     • TeraGrid designs all gateways (F)
     • TeraGrid limits the number of gateways (F)
     • All gateways need TeraGrid funding to exist (F)
     • Any PI can request an allocation and use it to develop a gateway (T)
     • Gateway design is community-developed and that is the core strength of the program (T)
     • TeraGrid staff are alerted to gateway work when a proposal is reviewed or when a community account is requested (T)
     • Limited TeraGrid support can be provided for targeted assistance to integrate an existing gateway with TeraGrid (T)

  6. Today, there are approximately 35 gateways using the TeraGrid

  7. Gateway CPU use up fivefold in 2008, but the number of gateway users using the TeraGrid is down
     • 0.5M hours used on community accounts in 2007
     • 2.5M hours used on community accounts in 2008
     • Gateways using CPU most heavily
     • SCEC tera3d
     • Over 1M hours for hazard map calculations
     • GridChem
     • Computational chemistry
     • Robetta
     • Protein structure prediction using David Baker’s award-winning Rosetta code
     • Up and coming groups with large awards
     • SIDGrid, 1M hours
     • Number of end gateway users is down though
     • Manual process, some under-counting
     • GridShib will solve this in 2009
     • End of ITR funding for many gateways
     • Reduced class use by LEAD

  8. SCEC Gateway used to produce realistic hazard map
     • Probabilistic Seismic Hazard Analysis (PSHA) map for California
     • Created from Earthquake Rupture Forecasts (ERF)
     • ~7000 ruptures can have 415,000 variations
     • Warm colors indicate regions with a high probability of experiencing strong ground motion in the next 50 years
     • Ground motion calculated using full 3-D waveform modeling for improved accuracy
     • Results in significant CPU use

  9. SCEC: Why a gateway?
     • Calculations need to be done for each of the hundreds of thousands of rupture variations
     • SCEC has developed the “CyberShake computational platform”
     • Hardware, software and people that combine to produce a useful scientific result
     • For each site of interest: two large-scale MPI calculations and hundreds of thousands of independent post-processing jobs with significant data generation
     • Jobs aggregated to appear as a single job to the TeraGrid
     • Workflow throughput optimizations and use of SCEC’s gateway “platform” reduced time to solution by a factor of three
     • Computationally intensive tasks, plus the priority placed on reduced time to solution, make TeraGrid a good fit
     Source: S. Callahan et al. “Reducing Time-to-Solution Using Distributed High-Throughput Mega-Workflows – Experiences from SCEC CyberShake”.
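The bullet about aggregating jobs so that they appear as a single job can be illustrated with a small sketch. This is not the CyberShake implementation, which the slide does not describe; it only shows the general idea of running many small independent tasks inside one scheduler-visible job. The names `post_process` and `run_as_single_job` are hypothetical, and the task body is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def post_process(variation_id: int) -> int:
    """Hypothetical stand-in for one independent post-processing task."""
    return variation_id * 2


def run_as_single_job(variation_ids, workers: int = 4) -> list:
    """Run many small tasks inside one process, so a batch scheduler
    would see a single job instead of one job per task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(post_process, variation_ids))


# Assumed workload size, chosen only for illustration.
results = run_as_single_job(range(1000))
```

The design point is that the resource provider schedules one allocation while the fan-out to individual tasks happens inside it.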

  10. GridChem
     • Understanding molecular structure and function increasingly important in many fields
     • Materials for electronics, biotechnology, medical devices, pharmaceutical design
     • GridChem provides reliable infrastructure for computational chemists
     • NSF Middleware Initiative (NMI) project
     • Requested and received advanced support from TeraGrid
     • Addressing issues which benefit all gateways, support team led by IU
     • Common user environments for domain software access
     • Standardized licensing
     • Application performance characteristics
     • Incorporation of additional data handling tools and data resources
     • Fault tolerant workflows
     • Scheduling policies for community users
     • Remote visualization

  11. GridChem: Why a gateway?
     • Integrates high end resources in a desktop environment
     • Client-server approach allows work to continue while disconnected (plane flights)
     • Ability to monitor jobs across sites
     • Access to individual allocations
     • In the future, linkage of multi-scale packages
     • Focus on chemistry research without learning the intricacies of each system
     • Time limits, nodes, processors, memory, disk space, etc.

  12. Robetta Gateway: Protein structure prediction with an award-winning code
     • Protein structure prediction is among many important problems in bioinformatics.
     • The Rosetta code, from the David Baker laboratory, has performed very well at CASP (Critical Assessment of Techniques for Protein Structure Prediction) competitions
     • Available for use by any academic scientist via the Robetta server
     • Robetta developers able to use TeraGrid’s existing gateway infrastructure, including community accounts and Globus
     • This very successful group needed no additional TeraGrid assistance to incorporate TeraGrid resources into the Robetta gateway
     • Google Scholar reports 601 references to the Robetta gateway, including many PubMed publications

  13. Robetta: Why a gateway?
     • Bioinformatics has a long history of web-based services
     • NCBI BLAST server from the 1990s
     • Easy input from the web
     • Access to top modeling code for all researchers

  14. Social Informatics Data Grid: Collaborative access to large, complex datasets
     • SIDGrid is unique among social science data archive projects
     • Focused on streaming data which change over time
     • Voice, video, images (e.g. fMRI), text, numerical (e.g. heart rate, eye movement)
     • Provides the ability to investigate multiple datasets, collected at different time scales, simultaneously
     • Large datasets result
     • Sophisticated analysis tools
     http://www.ci.uchicago.edu/research/files/sidgrid.mov

  15. SIDGrid: Why a gateway?
     • Social scientists have traditionally worked in isolated labs without the capability to share data or insights with others.
     • Data that is expensive to collect can now be shared with others
     • Geographically distant researchers can collaborate
     • Complex analysis tools and workflows available for all
     • Researchers have access to high performance computational resources
     • TeraGrid used for computationally intensive tasks such as media transcoding algorithms for pitch analysis of audio tracks and fMRI image analysis
     Source: Dr. Steven Boker, Notre Dame

  16. 2008 was a productive year
     • Gateway Security Summit – January
     • Annual report preparation – Feb-March
     • Major IPP preparation – Spring
     • Gateway-debug – Spring
     • Pathways – Summer
     • Gateway white paper – September
     • GridShib – all year
     • Major advances
     • All gateways to be using attribute-based authentication by September, 2009
     • Documentation – all year
     • Professional rewrite in 2008
     • Gateway outreach – all year
     • AAAS, TG08, SC08, UK e-Science, GlobusWorld, e-Science 2008, Geoinformatics 2008, Chinese Academy of Sciences

  17. Gateway Security Summit
     • 3-day meeting at San Diego Supercomputer Center, January 2008
     • Co-led by Nancy Wilkins-Diehr, Abe Singer, SDSC; Jim Marsteller, PSC
     • 30 attendees
     • Security representatives from 8 TeraGrid RP sites, Ohio Supercomputer Center, Open Science Grid
     • Gateway developers representing 11 different projects
     • Information on how community accounts are used within a gateway provided in advance of the summit
     • TeraGrid staff from accounting, documentation and attribute-based authentication projects
     • Best practices workshop
     • Presentations from gateways using community accounts
     • Q&A with security staff
     • Presentations from security staff on how different sites are looking at securing community accounts
     • Discussion of impact on gateways
     • Goals: standardized treatment of community accounts across sites, security, ease of use for gateways
     • Gateway vulnerability risk assessment conducted in 2008

  18. Project Management Activities: Annual Report, Integrated Project Plan, Work Breakdown Structure
     • First TG-wide annual report
     • ~2 months work, mid-Jan to mid-March, 2008
     • 400+ page report produced
     • First TG-wide IPP started April 2008
     • Needed level of detail between ~10 page text in annual report and 700 line Work Breakdown Structure
     • For each objective, an area director provides
     • Overview, motivation and plans
     • Background
     • Benefits to the TeraGrid project and community
     • Project success criteria
     • How TG-wide collaboration will be facilitated
     • High-level project deliverables
     • Deliverable, sites involved, key success factor/metric/target for each
     • Assumptions, risks, constraints
     • Budget and staffing
     • 70 page IPP results

  19. Intensive TeraGrid use by LEAD: Motivates closer look at reliability under load
     • A successful gateway program may result in bursty loads on the TeraGrid
     • We must be able to handle this successfully
     • 6-month intensive debugging effort
     • Weekly telecons with sys admins, grid software developers, gateway developers
     • Having the right people who could solve problems on the spot was key
     • Clear wiki documentation of problems
     • Rather than emails and telephone calls that were difficult to track
     • Testbed for debugging
     • Inca tests that simulated gateway behavior
     • Tests run more frequently on the testbed, results reviewed weekly
     • Code improvements made to both LEAD gateway and Globus
     • Work transitioned to ops-wg for routine monitoring
     • Track 2 awards for testbeds and interactive access can help
     Overloaded gridftp servers

  20. TeraGrid Pathways Activities
     • The mission of TeraGrid Pathways is to “scope, build, document, and evaluate extensible, scalable, and comprehensive pathways to Terascale/Petascale research, scholarship, education, and innovation that will assist faculty and students associated with under-served institutions to effectively utilize the TeraGrid resources and services to support their research, scholarship, and learning.”
     • 2 Gateway components
     • Adapt gateways for educational use by underrepresented communities
     • GEON – SDSC, Navajo Tech
     • Teach participants from underrepresented communities how to build gateways
     • PolarGrid – IU, ECSU

  21. Gateway White Paper: Persistence required to impact scientists’ behavior
     • Gateways can be used for the most challenging problems, but
     • Scientists won’t rely on something that they are not confident will be around for the duration
     • We see this with software, but even more so with gateways
     • Characteristics of 5-year or less cycles
     • Build exciting prototypes with input from scientists
     • Work with early adopters to extend capabilities
     • Tools are publicized, more scientists interested
     • Funding ends
     • Scientists who invested their time to use new tools are disillusioned
     • Less likely to try something new again
     • Start again on new short-term project
     • Need to break this cycle

  22. Persistent gateways increase scientific productivity
     • A sustained gateway program can
     • Allow researchers to focus on science
     • Reduce duplication of effort
     • Sporadic development with many small programs
     • Increase diversity of end users
     • Increase skill set diversity of developers
     • Bring together teams to address the toughest problems
     • Program might begin with user-driven workshops to identify fundamental capabilities across domains
     • What are communities calling for?
     • Curated data collections
     • Which collections?
     • Simulation, visualization and analysis
     • Collaboration tools or workspaces
     • Generation of complex workflows
     • Access to instruments, sensor or radar data that have limited exposure today
     • What is the next PDB? nanoHUB? Earth System Grid?
     • What would cause a fundamental shift in the way science was conducted if it existed persistently?
     • Merit review and assessment will be critical to a long-term program

  23. Attribute-based Authentication: GridShib
     • High-level gateway objective, major focus for 2009
     • We will be able to routinely count end gateway users, who will total a significant portion of total TeraGrid users
     • How to achieve this?
     • A unique identifier for each end gateway user per community account must exist in TGCDB
     • Gateways will need to transmit and TGCDB will need to receive this additional identifier through any job submission mechanism
     • Attribute-based authentication in production and easy to use
     • This year’s goal
     • All gateways sending attributes with jobs by September, 2009
     • We are well on the way to meeting this goal
     • Thanks to Tom Scavo, Jim Basney, NCSA
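The counting goal on this slide can be sketched in a few lines: if every job submitted through a community account carries a per-user identifier, distinct end users can be tallied per account. The record layout below (`community_account`, `gateway_user`) and the sample values are assumptions made for illustration; they are not the TGCDB schema or any GridShib interface.

```python
from collections import defaultdict


def count_end_users(job_records):
    """Count distinct end-user identifiers seen per community account."""
    users_by_account = defaultdict(set)
    for record in job_records:
        # A set keeps each identifier once, however many jobs the user runs.
        users_by_account[record["community_account"]].add(record["gateway_user"])
    return {account: len(ids) for account, ids in users_by_account.items()}


# Hypothetical job records; field names and values are illustrative only.
jobs = [
    {"community_account": "gridchem", "gateway_user": "alice@example.org"},
    {"community_account": "gridchem", "gateway_user": "bob@example.org"},
    {"community_account": "gridchem", "gateway_user": "alice@example.org"},
    {"community_account": "robetta", "gateway_user": "carol@example.org"},
]
counts = count_end_users(jobs)
```

Without the extra identifier, all four jobs above would be attributed to just two community accounts, which is the under-counting the manual process suffers from.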

  24. Documentation gets a makeover
     • www.teragrid.org/gateways
     • Clear display of complex information
     • PI and developer pages
     • Should I build a gateway? How? What should it include? Should I use TeraGrid?
     • Gateway-in-a-day
     • Success stories
     • Recommended software area
     • Can be updated by anyone

  25. What are we working toward? Previously stated PY4-5 goals
     • TeraGrid integration will be straightforward for new and existing gateway developers
     • Great documentation in 2008
     • Current focus – streamlined community accounts
     • There will be a set of easy to discover general services provided by and for Gateways
     • In progress
     • The targeted support program will adapt to changing needs
     • Done
     • We will be able to routinely count end gateway users
     • In progress, done by September 2009
     • There will be avenues for sustained Gateway funding

  26. Exciting developments planned for 2009
     • Add a gateway component to your social networking page
     • Increase gateway software visibility to end users
     • AMP gateway ingests data from recent Kepler satellite launch
     • Use of TeraGrid for overflow processing by other grids through virtual machine technology
     • Attribute-based authentication in production
     • Standardized community account treatment
     • New groups interested in developing gateways
     • Center for Analytical Ultracentrifugation of Macromolecular Assemblies
     • J. Craig Venter Institute
     • Roche Pharmaceuticals
     • Support from Vice President of engineering of the 454 Life Sciences division

  27. Gateways and iGoogle
     • OpenSocial
     • Common API for social applications
     • iGoogle, MySpace, LinkedIn, Friendster
     • OpenSocial gadgets added to the Open Life Sciences Gateway
     • Gateway components can be staged on social networking pages
     • Greatly increases the flexibility of a gateway
     • Attractive to the next generation of scientists and builds on existing social networking collaboration support
     • To add an OLSG gadget to your iGoogle page:
     • Go to www.google.com/ig/sandbox and sign in
     • Go to http://lsgw.uc.teragrid.org:8080/gridsphere/
     • Click the +Google icons at the bottom of the page to add gadgets to your iGoogle page
     • Click the “Personalize the gadget” link
     • Use account/password test/testit to log in

  28. Thank you for your attention. Questions welcome
     • www.teragrid.org/gateways
     • Nancy Wilkins-Diehr, wilkinsn@sdsc.edu

  29. Using the TeraGrid to Understand the Stars and Search for Planets
     • Problem: The oscillation modes of stars give clues to the stellar interior. Accurate modeling of these oscillations can tell us the size, composition and age of stars, and in turn help interpret the data returned by NASA’s Kepler Mission in the hunt for Earth-like extrasolar planets.
     • Solution: NCAR is working with the international science community to develop a TeraGrid science gateway to fit oscillation data with stellar models to determine these parameters automatically for large numbers of stars.
     • Key Elements: international science impact, gateway technology.
     Image caption: The turbulent solar atmosphere captured with NASA’s TRACE satellite is overlaid with a model of one of the millions of oscillation modes that allow scientists to deduce the Sun’s hidden internal structure and dynamics. Similar measurements are now being made for other solar-type stars.
     Source: Dr. Rich Loft, NCAR

  30. New Pyrosequencing Technology: The future of individualized medicine
     • Allegheny General Hospital
     • Early adopter of DNA sequencing platform from 454 Life Sciences/Roche
     • Significant upgrade to this sequencing platform
     • Massively parallel, clone-free DNA pyrosequencing technology well suited for a variety of applications
     • Improved sequencing in line with NIH goal of making individual DNA sequencing available for $1k
     http://www.roche-applied-science.com/publications/multimedia/genome_sequencer/flx_presentation/wbt.htm

  31. Purchase and maintenance of home cluster prohibitive for smaller sites
     • "We wholeheartedly support your idea of using TeraGrid resources to process data generated by our system. Please let us know what we can do to help you achieve your goal - we have permissive licensing arrangements which will not stand in the way of your project. In addition, if the [image processing and signal processing] software needs some customization (though I am quite certain that will not be required), we are prepared to provide assistance in this respect."

  32. Analytical Ultracentrifugation Data
     • The Center for Analytical Ultracentrifugation of Macromolecular Assemblies, UT Health Science Center
     • Solution-state characterization of biological macromolecules and macromolecular assemblies by means of analytical ultracentrifugation
     • Services provided to both academia and industry
     • Integrated data editing and analysis environment
     • Portable graphical user interface
     • Beowulf module for Monte Carlo analysis
     • MySQL database backend
     • 32 active institutions
     • International collaboration
     • Technical University of Munich
     • Juelich Supercomputing Center

  33. J. Craig Venter Institute: 2 independent DAC queries
     • Portal to NIAID Bioinformatics Resource Centers
     • Startup allocation awarded
     • Computation services such as Annotation, Homology Search, etc. to the 4 bioinformatics resource centers, or to members of their research community
     • Developed infrastructure that should easily port to TeraGrid for large scale parallel execution
     • Automated Proteogenomic Annotation for Prokaryotic Genomes
     • Re-annotation of all publicly available prokaryotic MS/MS datasets
