TeraGrid Science Gateways


Presentation Transcript


  1. TeraGrid Science Gateways Nancy Wilkins-Diehr TeraGrid Area Director for Science Gateways wilkinsn@sdsc.edu NSF Program Officers, September 10, 2008

  2. Today I hope to answer • What are gateways? • Why are gateways worth the effort? • What do they allow scientists to do that they couldn't without gateways? What are some specific examples of this? Why are these examples important? • Impact on education and workforce development • Why sustainable gateways are important • We’ll demonstrate these with individual examples

  3. May 2007 Gateway presentation at the NSF: How many of you were here? • 4-hour recap in two slides • Web developments, explosion of digital data are leading to the increased importance of gateways • 16 years after the availability of Mosaic, full impact on science yet to be felt • Many studies point to the impact of the internet on science • Public perception of the value of science increases with their use of science-based websites • Web usage model resonates with scientists • But, need persistence if the Web is to have a profound impact on science

  4. NSF has a long history in combining science and technology • PACI, ITR, STCs • Leadership continues today • 5 great presentations • Gerhard Klimeck, Purdue, nanoHUB • Dennis Gannon, Indiana University, LEAD • Sudhakar Pamidighantam, UIUC, GridChem • John McGee, RENCI, TeraGrid Bioportal • Shaowen Wang, UIUC, GISolve

  5. Today, there are approximately 29 gateways using the TeraGrid

  6. Does a gateway have to use TeraGrid to be a gateway? • No, I just talk about those that do because of my funding • But my position exposes me to a variety of gateways, many • Using high end resources is more work and is not recommended unless it serves a demonstrated need • Gateways are an excellent way to extend the impact of high-end resources • Are they all funded by TeraGrid? • Can TeraGrid claim success for all gateways? • No, we don’t make the gateways you use, we make the gateways you use better

  7. Tremendous Opportunities Using the Largest Shared Resources - Challenges too! • What’s different when the resource doesn’t belong just to me? • Resource discovery • Accounting • Security • Proposal-based requests for resources (peer-reviewed access) • Code scaling and performance numbers • Justification of resources • Gateway citations • Tremendous benefits at the high end, but even more work for the developers • Potential impact on science is huge • Small number of developers can impact thousands of scientists • But need a way to train and fund those developers and provide them with appropriate tools

  8. Why are gateways worth the effort? • Increasing range of expertise needed to tackle the most challenging scientific problems • How many details do you want each individual scientist to need to know? • PBS, RSL, Condor • Coupling multi-scale codes • Assembling data from multiple sources • Collaboration frameworks

     Condor-G submit file shown on the slide:

         # Full path to executable
         executable=/users/wilkinsn/tutorial/bin/mcell
         # Working directory, where Condor-G will write
         # its output and error files on the local machine.
         initialdir=/users/wilkinsn/tutorial/exercise_3
         # To set the working directory of the remote job, we
         # specify it in this globus RSL, which will be appended
         # to the RSL that Condor-G generates
         globusrsl=(directory='/users/wilkinsn/tutorial/exercise_3')
         # Arguments to pass to executable.
         arguments=nmj_recon.main.mdl
         # Condor-G can stage the executable
         transfer_executable=false
         # Specify the globus resource to execute the job
         globusscheduler=tg-login1.sdsc.teragrid.org/jobmanager-pbs
         # Condor has multiple universes, but Condor-G always uses globus
         universe=globus
         # Files to receive stdout and stderr.
         output=condor.out
         error=condor.err
         # Specify the number of copies of the job to submit to the condor queue.
         queue 1

     PBS batch script shown on the slide:

         #! /bin/sh
         #PBS -q dque
         #PBS -l nodes=1:ppn=2
         #PBS -l walltime=00:02:00
         #PBS -o pbs.out
         #PBS -e pbs.err
         #PBS -V
         cd /users/wilkinsn/tutorial/exercise_3
         ../bin/mcell nmj_recon.main.mdl

     Globus RSL shown on the slide:

         +(
           &(resourceManagerContact="tg-login1.sdsc.teragrid.org/jobmanager-pbs")
            (executable="/users/birnbaum/tutorial/bin/mcell")
            (arguments=nmj_recon.main.mdl)
            (count=128)
            (hostCount=10)
            (maxtime=2)
            (directory="/users/birnbaum/tutorial/exercise_3")
            (stdout="/users/birnbaum/tutorial/exercise_3/globus.out")
            (stderr="/users/birnbaum/tutorial/exercise_3/globus.err")
         )
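To illustrate the point of slide 8, here is a minimal sketch of how a gateway back end might hide scheduler details from a scientist. It is not taken from any TeraGrid gateway: the `JobRequest` class, the `render_pbs_script` function, and their defaults are hypothetical names chosen for this example. The paths, queue name, and directives simply mirror the PBS script on the slide.

```python
from dataclasses import dataclass


@dataclass
class JobRequest:
    """What a gateway user might supply; all field names are hypothetical."""
    input_file: str
    nodes: int = 1
    procs_per_node: int = 2
    walltime: str = "00:02:00"


def render_pbs_script(job: JobRequest,
                      workdir: str = "/users/wilkinsn/tutorial/exercise_3",
                      executable: str = "../bin/mcell",
                      queue: str = "dque") -> str:
    """Build a PBS batch script equivalent to the one shown on slide 8."""
    lines = [
        "#! /bin/sh",
        f"#PBS -q {queue}",
        f"#PBS -l nodes={job.nodes}:ppn={job.procs_per_node}",
        f"#PBS -l walltime={job.walltime}",
        "#PBS -o pbs.out",   # file that receives stdout
        "#PBS -e pbs.err",   # file that receives stderr
        "#PBS -V",           # export the submitting environment to the job
        f"cd {workdir}",
        f"{executable} {job.input_file}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The scientist only names an input file; the gateway fills in the rest.
    print(render_pbs_script(JobRequest(input_file="nmj_recon.main.mdl")))
```

In this sketch the scientist chooses an input file and, optionally, a job size. The queue, working directory, and output handling stay on the gateway side, which is the kind of detail the slide asks whether every individual scientist should need to know.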

  9. Not just ease of use: What can scientists do that they couldn’t do previously? • LEAD - access to radar data • NVO – access to sky surveys • OOI – access to sensor data • PolarGrid – access to polar ice sheet data • SIDGrid – analysis tools • GridChem – developing multiscale coupling • How would this have been done before gateways?

  10. Gateways can further investments in other projects • Increase access • To instruments, we’ll see an example today • Increase capabilities • To analyze data, we’ll see an example today • Improve workforce development • For underserved populations, we’ll see an example today • Increase outreach • Increase public awareness • Public sees value in investments in large facilities • Slice bread • Pack the kids’ lunch, etc.

  11. Gateways in the marketplace: Kids control telescopes and share images • “In seconds my computer screen was transformed into a live telescopic view” • “Slooh's users include newbies and professional astronomers in 70 countries” • Observatories in the Canary Islands and Chile; Australia coming soon • 5000 images/month since 2003 • Increases public support for investment in these facilities

  12. Gateways Greatly Expand Access • Almost anyone can investigate scientific questions using high end resources • Not just those in the research groups of those who request allocations • Gateways allow anyone with a web browser to explore • Opportunities can be uncovered via Google • My 11-year-old son discovered nanoHUB.org himself while his class was studying buckyballs • Fosters new ideas, cross-disciplinary approaches • Encourages students to experiment • But used in production too • Significant number of papers resulting from gateways including GridChem, nanoHUB • Scientists can focus on challenging science problems rather than challenging infrastructure problems

  13. TeraGrid Pathways Activities • 2 Gateway components • Adapt gateways for educational use by underrepresented communities • GEON – SDSC, Navajo Tech • Teach participants from underrepresented communities how to build gateways • PolarGrid – IU, ECSU

  14. Navajo Technical College and gateways • Incorporating the use of gateways in their curricula • GEON, GISolve areas of initial interest

  15. PolarGrid • Cyberinfrastructure Center for Polar Science (CICPS) • Experts in polar science, remote sensing and cyberinfrastructure • Indiana, ECSU, CReSIS • Satellite observations show disintegration of ice shelves in West Antarctica and speed-up of several glaciers in southern Greenland • Most existing ice sheet models, including those used by IPCC, cannot explain the rapid changes http://www.polargrid.org/polargrid/images/4/42/C0050-polargrid-big.m4v Source: Geoffrey Fox

  16. Components of PolarGrid • Expedition grid consisting of ruggedized laptops in a field grid linked to a low power multi-core base camp cluster • Prototype and two production expedition grids feed into a 17 Teraflops "lower 48" system at Indiana University and Elizabeth City State University (ECSU), split between research, education and training • Gives ECSU a top-ranked 5 Teraflop MSI high performance computing system • Access to expensive data • High-end resources for analysis • MSI student involvement Source: Geoffrey Fox

  17. Recent Gateways using TeraGrid Significantly • SCEC • SIDGrid • CIG

  18. SCEC using gateway to produce hazard map • PSHA hazard map for California, based on the newly released Earthquake Rupture Forecast (UCERF2.0), calculated using the SCEC Science Gateway • Warm colors indicate regions with a high probability of experiencing strong ground motion in the next 50 years • High resolution map, significant CPU use

  19. Social Informatics Data Grid • Heavy use of “multimodal” data. • Subject might be viewing a video, while a researcher collects heart rate and eye movement data. • Events must be synchronized for analysis, large datasets result • Extensive analysis capabilities are not something that each researcher should have to create for themselves. http://www.ci.uchicago.edu/research/files/sidgrid.mov

  20. Social scientists have traditionally worked in isolated labs without the capability to share data or insights with others. • SIDGrid enables a number of capabilities. • Data that is expensive to collect can now be shared with others, increasing the potential for scientific impact. • Geographically distant researchers can collaborate on the analysis of the same data set. • Complex analysis tools and workflows are now available for all to use, rather than having each lab duplicate efforts. • All researchers now have access to the highest quality computational resources • SIDGrid uses TeraGrid resources for computationally-intensive tasks such as media transcoding, algorithms for pitch analysis of audio tracks, and fMRI image analysis • SIDGrid is unique among social science data archive projects • Focused on streaming data which change over time • Provides the ability to investigate multiple datasets, collected at different time scales, simultaneously • Active users of the SIDGrid system include a human neuroscience group and linguistic research groups from the University of Chicago and the University of Nottingham, UK
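Slides 19 and 20 note that multimodal streams collected at different time scales must be synchronized before analysis. The sketch below shows one simple way to do that: for each event in a fast stream, look up the most recent sample from a slower stream. It is an illustration only and does not describe SIDGrid's actual implementation. The function name `align_to_events` and the sample data are assumptions made up for this example.

```python
from bisect import bisect_right


def align_to_events(event_times, sample_times, sample_values):
    """For each event time, return the most recent sample at or before it.

    sample_times must be sorted in ascending order. Events that occur
    before the first sample are paired with None.
    """
    aligned = []
    for t in event_times:
        # bisect_right gives the index of the first sample strictly after t,
        # so the sample just before that index is the latest one at or before t.
        i = bisect_right(sample_times, t)
        aligned.append(sample_values[i - 1] if i > 0 else None)
    return aligned


if __name__ == "__main__":
    # Hypothetical data: heart rate sampled once per second,
    # eye-movement events arriving at irregular times.
    heart_rate_times = [0.0, 1.0, 2.0, 3.0]
    heart_rate_bpm = [62, 64, 71, 69]
    gaze_event_times = [0.25, 0.9, 1.0, 2.6]
    print(align_to_events(gaze_event_times, heart_rate_times, heart_rate_bpm))
    # prints [62, 62, 64, 71]
```

A shared service can run this kind of alignment once, close to the data, so that each lab does not need to write its own version.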

  21. 40 institutional members • 9 foreign affiliates • Researchers request synthetic seismograms for any given earthquake • Allows scientists to understand the ground motion associated with any given earthquake • Requested and received advanced support from TeraGrid

  22. Advanced support for OCI resources: Including gateway integration • Same peer review process used to request resources • 30,000 CPUs • + 6 months of Nancy • Reviews based on appropriate use of resources, science is not reviewed if already funded • Petascale • Multisite workflows • Gateways • Domain expertise • Or someone really talented

  23. Support is Very Targeted • Start with well-defined objectives • Focus on efficient or novel use of OCI resources • Minimum .25 FTE for months to a year • Enough investment to really understand and help solve complex problems • Must have commitment from PIs • Want to make sure work is incorporated into production codes and gateways • Good candidates for targeted support include: • Large, high impact projects • Ability to influence new communities • Happy for feedback from directorates on important projects • Lessons learned move into training and documentation

  24. Gateway white paper recommends sustained funding • Gateways can be used for the most challenging problems, but • Scientists won’t rely on something that they are not confident will be around for the duration • We see this with software, but even more so with gateway infrastructure • A sustained gateway program can • Reduce duplication of effort • Sporadic development with many small programs • Increase diversity of end users • Increase skill set diversity of developers • Bring together teams to address the toughest problems

  25. Recommend 10-year program with interim reviews • Characteristics of cycles of 5 years or less • Build exciting prototypes with input from scientists • Work with early adopters to extend capabilities • Tools are publicized, more scientists interested • Funding ends • Scientists who invested their time to use new tools are disillusioned • Less likely to try something new again • Start again on new short-term project • Need to break this cycle

  26. Begin with user-driven workshops • What are the most fundamental capabilities in each directorate? • What is the next PDB? nanoHUB? Earth System Grid? • What is the community calling for? • Curated data collections • Which collections? • Simulation, visualization and analysis • Collaboration tools or workspaces • Generation of complex workflows • Access to instruments, sensor or radar data that have limited exposure today • Merit review and assessment will be critical to a long-term program

  27. When might a gateway be appropriate? • Researchers using defined sets of tools in different ways • Same executables, different input • GridChem, CHARMM • Creating multi-scale or complex workflows • Datasets • Common data formats • National Virtual Observatory • Earth System Grid • Some groups have invested significant efforts here • caBIG, extensive discussions to develop common terminology and formats • BIRN, extensive data sharing agreements • Difficult to access data/advanced workflows • Sensor/radar input • LEAD, GEON
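The "same executables, different input" case on slide 27 maps naturally onto a parameter sweep. The following is a minimal sketch, assuming a gateway that turns a list of user-supplied input files into one command line per job. The function name `build_commands` and the file names are hypothetical, and the executable path is reused from slide 8 purely for illustration.

```python
import shlex


def build_commands(executable, input_files, extra_args=()):
    """Return one shell-safe command line per input file.

    shlex.join quotes each token, so input names containing spaces or
    shell metacharacters cannot alter the command that gets run.
    """
    return [
        shlex.join([executable, *extra_args, path])
        for path in input_files
    ]


if __name__ == "__main__":
    # Hypothetical inputs uploaded through a gateway web form.
    inputs = ["run_a.mdl", "run_b.mdl", "my run.mdl"]
    for command in build_commands("../bin/mcell", inputs):
        print(command)
```

Quoting matters here because a gateway accepts input from many users through a web form, which is one aspect of the security concern raised on slide 7.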

  28. Tremendous Potential for Gateways • In only 16 years, the Web has fundamentally changed human communication • Science Gateways can leverage this amazingly powerful tool to: • Transform the way scientists collaborate • Streamline conduct of science • Influence the public’s perception of science • Reliability, trust, continuity are fundamental to truly change the conduct of science through the use of gateways • High end resources can have a profound impact • The future is very exciting!

  29. Thank you for your attention • For more information • www.teragrid.org • wilkinsn@sdsc.edu • Live demonstration of the Neutron Science Gateway • Vickie Lynch, Oak Ridge National Laboratory

  30. Afternoon Agenda • 2:00 pm Break • (aka recover from this talk, ask questions) • 2:15 pm Track 2 Resources • 2:15-2:35 pm Ranger • Jay Boisseau, Texas Advanced Computing Center • 2:35-2:55 pm Kraken • Bruce Loftis, National Institute for Computational Sciences • 2:55-3:15 pm Track 2c • Nick Nystrom, Pittsburgh Supercomputing Center • 3:15 pm Blue Waters • John Towns, National Center for Supercomputing Applications • 3:30 pm Open discussion with all presenters
