
Building Science Gateways

Building Science Gateways. Marlon Pierce Community Grids Laboratory Indiana University. Tutorial Overview. There’s More. Slides and Demo Site. Tutorial slides are available from http://www.collab-ogce.org/ogce/index.php/Tutorials


Presentation Transcript


  1. Building Science Gateways Marlon Pierce Community Grids Laboratory Indiana University

  2. Tutorial Overview

  3. There’s More

  4. Slides and Demo Site • Tutorial slides are available from http://www.collab-ogce.org/ogce/index.php/Tutorials • We run a permanent demo portal at https://community.ucs.indiana.edu:8443/gridsphere/ • Also aliased as https://ogceportal.iu.teragrid.org:8443/gridsphere • Portal accounts train01-train30 have been created for the workshop. Password is the same as the account name. • Also train31-train49 from TG08 workshop. • We also have TeraGrid training accounts with names train01-train30 that can be used to retrieve TG proxy credentials. • These should be active all week. • You can also log into the TeraGrid User Portal with this account and the secret password.

  5. Concept #1: Web Portal • Web container that aggregates content from multiple sources into a single display. • “Start Pages” • Typically consume RSS/Atom news feeds. • More powerful versions these days support Flickr, calendars, games, etc. • Gadgets, widgets • Examples: iGoogle, Netvibes, My Yahoo!
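The aggregation idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not how iGoogle or Netvibes work internally: the two feeds are hand-written RSS 2.0 strings standing in for remote sources, and the names `parse_feed` and `render_start_page` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two tiny hand-written RSS 2.0 documents stand in for remote feeds
# (hypothetical content; a real portal would fetch these over HTTP).
FEED_A = """<rss version="2.0"><channel><title>Grid News</title>
<item><title>New cluster online</title><link>http://example.org/a1</link></item>
<item><title>Maintenance window</title><link>http://example.org/a2</link></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Chemistry Updates</title>
<item><title>Workflow release</title><link>http://example.org/b1</link></item>
</channel></rss>"""


def parse_feed(xml_text):
    """Return (channel title, [(item title, link), ...]) for one RSS 2.0 document."""
    channel = ET.fromstring(xml_text).find("channel")
    items = [(i.findtext("title"), i.findtext("link")) for i in channel.findall("item")]
    return channel.findtext("title"), items


def render_start_page(feeds):
    """Aggregate several feeds into one plain-text 'start page'."""
    lines = []
    for xml_text in feeds:
        title, items = parse_feed(xml_text)
        lines.append(f"[{title}]")
        lines.extend(f"  - {t} <{l}>" for t, l in items)
    return "\n".join(lines)


page = render_start_page([FEED_A, FEED_B])
print(page)
```

Each feed plays the role of one gadget: the container only needs to know the feed format, not where the content comes from.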

  6. Gadget RSS Feeds

  7. Concept #2: Grid Computing • Grid computing software is designed to integrate large supercomputing facilities. • TeraGrid, Open Science Grid, EGEE, etc. • This is done via network services • Software providers in the US include Globus and Condor • Key Service Components (and example services) • Authentication and authorization framework (MyProxy) • Remote process access and control (GRAM, Condor) • Remote file, I/O access (GridFTP, SRB, RFT) • Additional Services • Information services, replica management, database federation, storage management, schedulers, etc. • Example Grid Software Stacks: CTSS and VDT • For TeraGrid and Open Science Grid, respectively • Being pushed by Cloud Computing (Amazon, Google, Microsoft, others)

  8. Science Portals and Gateways • Science Gateways adapt Web portal technology to build user interfaces to the Grid. • Science portals resemble standard portals, but must also • Support access to computing and storage resources. • Allow users remote, direct access to these resources. • You often want to run applications and access data that you own directly. • Provide access to science applications and data sets. • And we must provide value added services as well as user interfaces.

  9. Example Science Gateways • Many listed here: • http://www.teragrid.org/programs/sci_gateways/ • Cover many different scientific fields: • Atmospheric science, geophysics, computational chemistry, bioinformatics, etc • See also GCE08 workshop at SC08 and earlier proceedings • http://www.collab-ogce.org/gce08/index.php/Main_Page • GCE05-07 also linked.

  10. TeraGrid Science Gateways Program Slides courtesy of Nancy Wilkins-Diehr TeraGrid Area Director for Science Gateways wilkinsn@sdsc.edu

  11. Today, there are approximately 29 gateways using the TeraGrid

  12. Does a gateway have to use TeraGrid to be a gateway? • No, but the TeraGrid does fund the development and support of these gateways • Using high end resources is more work and is not recommended unless it serves a demonstrated need • Gateways are an excellent way to extend the impact of high-end resources • Are they all funded by TeraGrid? • Can TeraGrid claim success for all gateways? • No, we don’t make the gateways you use, we make the gateways you use better • TeraGrid does fund a small number of developers to provide advanced support. • More later.

  13. Why are gateways worth the effort? • Increasing range of expertise needed to tackle the most challenging scientific problems • How many details do you want each individual scientist to need to know? • PBS, RSL, Condor • Coupling multi-scale codes • Assembling data from multiple sources • Collaboration frameworks

      Condor-G submit file:

        # Full path to executable
        executable=/users/wilkinsn/tutorial/bin/mcell
        # Working directory, where Condor-G will write
        # its output and error files on the local machine.
        initialdir=/users/wilkinsn/tutorial/exercise_3
        # To set the working directory of the remote job, we
        # specify it in this globus RSL, which will be appended
        # to the RSL that Condor-G generates
        globusrsl=(directory='/users/wilkinsn/tutorial/exercise_3')
        # Arguments to pass to executable.
        arguments=nmj_recon.main.mdl
        # Condor-G can stage the executable
        transfer_executable=false
        # Specify the globus resource to execute the job
        globusscheduler=tg-login1.sdsc.teragrid.org/jobmanager-pbs
        # Condor has multiple universes, but Condor-G always uses globus
        universe=globus
        # Files to receive stdout and stderr.
        output=condor.out
        error=condor.err
        # Specify the number of copies of the job to submit to the condor queue.
        queue 1

      PBS batch script:

        #! /bin/sh
        #PBS -q dque
        #PBS -l nodes=1:ppn=2
        #PBS -l walltime=00:02:00
        #PBS -o pbs.out
        #PBS -e pbs.err
        #PBS -V
        cd /users/wilkinsn/tutorial/exercise_3
        ../bin/mcell nmj_recon.main.mdl

      Globus RSL:

        +( &(resourceManagerContact="tg-login1.sdsc.teragrid.org/jobmanager-pbs")
           (executable="/users/birnbaum/tutorial/bin/mcell")
           (arguments=nmj_recon.main.mdl)
           (count=128)
           (hostCount=10)
           (maxtime=2)
           (directory="/users/birnbaum/tutorial/exercise_3")
           (stdout="/users/birnbaum/tutorial/exercise_3/globus.out")
           (stderr="/users/birnbaum/tutorial/exercise_3/globus.err")
        )
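The job descriptions on this slide encode the same handful of facts (executable, arguments, working directory, resource request) in different syntaxes. A gateway can hide that from the scientist by collecting the facts once and generating the scheduler-specific text. The sketch below is a minimal illustration modeled on the PBS and Condor-G fragments above; the `JobRequest` class, its field names, and the two rendering functions are hypothetical and not part of any TeraGrid or OGCE API.

```python
from dataclasses import dataclass


@dataclass
class JobRequest:
    """The few facts a scientist actually cares about (hypothetical field names)."""
    executable: str
    arguments: str
    workdir: str
    nodes: int = 1
    procs_per_node: int = 2
    walltime: str = "00:02:00"


def to_pbs(job):
    """Render a PBS batch script modeled on the one shown on the slide."""
    return "\n".join([
        "#! /bin/sh",
        "#PBS -q dque",
        f"#PBS -l nodes={job.nodes}:ppn={job.procs_per_node}",
        f"#PBS -l walltime={job.walltime}",
        "#PBS -o pbs.out",
        "#PBS -e pbs.err",
        "#PBS -V",
        f"cd {job.workdir}",
        f"{job.executable} {job.arguments}",
    ])


def to_condor_g(job, scheduler):
    """Render a Condor-G submit description modeled on the one shown on the slide."""
    return "\n".join([
        f"executable={job.executable}",
        f"initialdir={job.workdir}",
        f"globusrsl=(directory='{job.workdir}')",
        f"arguments={job.arguments}",
        "transfer_executable=false",
        f"globusscheduler={scheduler}",
        "universe=globus",
        "output=condor.out",
        "error=condor.err",
        "queue 1",
    ])


job = JobRequest(
    executable="/users/wilkinsn/tutorial/bin/mcell",
    arguments="nmj_recon.main.mdl",
    workdir="/users/wilkinsn/tutorial/exercise_3",
)
pbs_script = to_pbs(job)
condor_submit = to_condor_g(job, "tg-login1.sdsc.teragrid.org/jobmanager-pbs")
print(pbs_script)
print(condor_submit)
```

A web form that fills in `JobRequest` is all the user sees; the choice of PBS versus Condor-G becomes a deployment detail of the gateway.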

  14. Not just ease of use: What can scientists do that they couldn’t do previously? • LEAD – access to radar data • NVO – access to sky surveys • OOI – access to sensor data • PolarGrid – access to polar ice sheet data • SIDGrid – analysis tools • GridChem – developing multiscale coupling • How would this have been done before gateways?

  15. Gateways Greatly Expand Access • Almost anyone can investigate scientific questions using high end resources • Not just those in the research groups of those who request allocations • Gateways allow anyone with a web browser to explore • Opportunities can be uncovered via Google • Nancy’s 11-year-old son discovered nanoHUB.org himself while his class was studying buckyballs • Fosters new ideas, cross-disciplinary approaches • Encourages students to experiment • But used in production too • Significant number of papers resulting from gateways including GridChem, nanoHUB • Scientists can focus on challenging science problems rather than challenging infrastructure problems

  16. TeraGrid Pathways Activities • Program funding to involve MSI communities • 2 Gateway components • Adapt gateways for educational use by underrepresented communities • GEON – SDSC, Navajo Tech • Teach participants from underrepresented communities how to build gateways • PolarGrid – IU, ECSU

  17. Navajo Technical College and gateways • Incorporating the use of gateways in their curricula • GEON, GISolve areas of initial interest

  18. PolarGrid • Cyberinfrastructure Center for Polar Science (CICPS) • Experts in polar science, remote sensing and cyberinfrastructure • Indiana, ECSU, CReSIS • Satellite observations show disintegration of ice shelves in West Antarctica and speed-up of several glaciers in southern Greenland • Most existing ice sheet models, including those used by IPCC, cannot explain the rapid changes • http://www.polargrid.org/polargrid/images/4/42/C0050-polargrid-big.m4v • Source: Geoffrey Fox

  19. Components of PolarGrid • Expedition grid consisting of ruggedized laptops in a field grid linked to a low power multi-core base camp cluster • Prototype and two production expedition grids feed into a 17 Teraflops "lower 48" system at Indiana University and Elizabeth City State (ECSU) split between research, education and training. • Gives ECSU a top-ranked 5 Teraflop MSI high performance computing system • Access to expensive data • High-end resources for analysis • MSI student involvement • Source: Geoffrey Fox

  20. Recent Gateways using TeraGrid Significantly • SCEC • SIDGrid • CIG

  21. SCEC using gateway to produce hazard map • PSHA hazard map for California using newly released Earthquake Rupture Forecast (UCERF2.0) calculated using SCEC Science Gateway • Warm colors indicate regions with a high probability of experiencing strong ground motion in the next 50 years. • High resolution map, significant CPU use

  22. Social Informatics Data Grid • Heavy use of “multimodal” data. • Subject might be viewing a video, while a researcher collects heart rate and eye movement data. • Events must be synchronized for analysis, large datasets result • Extensive analysis capabilities are not something that each researcher should have to create for themselves. http://www.ci.uchicago.edu/research/files/sidgrid.mov

  23. Social scientists have traditionally worked in isolated labs without the capability to share data or insights with others. • SIDGrid enables a number of capabilities. • Data that is expensive to collect can now be shared with others, increasing the potential for scientific impact. • Geographically distant researchers can collaborate on the analysis of the same data set. • Complex analysis tools and workflows are now available for all to use, rather than having each lab duplicate efforts. • All researchers now have access to the highest quality computational resources • SIDGrid uses TeraGrid resources for computationally-intensive tasks such as media transcoding, algorithms for pitch analysis of audio tracks, and fMRI image analysis • SIDGrid is unique among social science data archive projects • Focused on streaming data which change over time • Provides the ability to investigate multiple datasets, collected at different time scales, simultaneously • Active users of the SIDGrid system include a human neuroscience group and linguistic research groups from the University of Chicago and the University of Nottingham, UK

  24. 40 institutional members • 9 foreign affiliates • Researchers request synthetic seismograms for any given earthquake • Allows scientists to understand the ground motion associated with any given earthquake • Requested and received advanced support from TeraGrid

  25. Talks at E-Science • See the PSE Workshop: http://escience2008.iu.edu/workshops/innovative/index.shtml • Friday, 10:00 am-4:30 pm • Nancy Wilkins-Diehr will have more to say about some of these gateways. • See also Rich Wolski’s keynote on cloud computing. Next generation gateways will (need to) support cloud computing and virtual machine-based backends. • Purdue’s NanoHUB and HUB0 software have done this for some time.

  26. Getting Started Building a Gateway Should you? And how can you get help?

  27. When might a gateway be appropriate? • Researchers using defined sets of tools in different ways • Same executables, different input • GridChem, CHARMM • Creating multi-scale or complex workflows • Datasets • Common data formats • National Virtual Observatory • Earth System Grid • Some groups have invested significant efforts here • caBIG, extensive discussions to develop common terminology and formats • BIRN, extensive data sharing agreements • Difficult to access data/advanced workflows • Sensor/radar input • LEAD, GEON

  28. Advanced support for OCI resources, including gateway integration • Same peer review process used to request resources • 30,000 CPUs • + 6 months of Nancy • Reviews based on appropriate use of resources, science is not reviewed if already funded • Petascale • Multisite workflows • Gateways • Domain expertise • Or someone really talented

  29. Support is Very Targeted • Start with well-defined objectives • Focus on efficient or novel use of OCI resources • Access to minimum 0.25 FTE for months to a year • Enough investment to really understand and help solve complex problems • Must have commitment from PIs • Want to make sure work is incorporated into production codes and gateways • Good candidates for targeted support include: • Large, high impact projects • Ability to influence new communities • Lessons learned move into training and documentation

  30. GATEWAYS UNDER THE HOOD

  31. My 2002 “octopus” SOA diagram, from the archives. [Diagram, tiers from top to bottom: Browser Interface; HTTP(S); Portlets + Client Stubs; SOAP/HTTP; WSDL service interfaces; DB Service, Job Sub/Mon and File Services, Visualization Service; JDBC, DB, Operating and Queuing Systems; Host 1, Host 2, Host 3]

  32. Terminology • Portlet: this is a standard Java component that generates HTML and can also act as a client to a remote service. • Lives in a portal container. • I will also use this term generically. • Web Service: a remotely invocable function on the Internet. • SOAP: the XML message envelope for carrying commands over HTTP. • WSDL: describes the service’s API in XML. • REST: A variation of this approach. • Lots more info: http://grids.ucs.indiana.edu/ptliupages/presentations/I590WebService.ppt
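To make the "XML message envelope" concrete, the sketch below builds and reads back a SOAP 1.1 request using only the Python standard library. The envelope namespace is the standard SOAP 1.1 one; the service namespace `urn:example:jobservice`, the operation name `submitJob`, and its parameters are assumptions invented for illustration and do not describe any real service.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:jobservice"                   # hypothetical service namespace


def build_request(operation, params):
    """Wrap one remote call (operation name plus parameters) in a SOAP envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def read_request(xml_text):
    """Server-side view: recover the operation name and parameters from the envelope."""
    body = ET.fromstring(xml_text).find(f"{{{SOAP_NS}}}Body")
    call = body[0]
    operation = call.tag.split("}", 1)[1]
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params


message = build_request("submitJob", {"executable": "mcell", "count": 2})
print(message)
print(read_request(message))
```

In practice a WSDL document would tell a client toolkit which operations and parameter names are valid, so the stubs are generated rather than written by hand.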

  33. But Why? • Three-tiered Service Oriented Architecture is the network equivalent of the famous Model-View-Controller design pattern. • View: the user interface components. • Controller: Web service middleware • Model: the backend resources. • Independence of tiers gives flexibility • Services can be reused with alternative user interfaces • Workflow composers like Taverna, Xbaya, Kepler • User interfaces can work with different service implementations. • Drawback: reliability and robustness are issues.
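The independence of tiers can be sketched in a single process. In the hedged example below, all class and function names are hypothetical: an in-memory queue stands in for the backend resource (Model), a small service class stands in for the Web service middleware (Controller), and two interchangeable renderers stand in for user interfaces (View). A real gateway would put network calls between the tiers, which is where the reliability concerns come from.

```python
class InMemoryQueue:
    """Model tier: stands in for a backend queuing system (hypothetical)."""

    def __init__(self):
        self._jobs = {}

    def submit(self, name):
        job_id = len(self._jobs) + 1
        self._jobs[job_id] = {"name": name, "state": "QUEUED"}
        return job_id

    def status(self, job_id):
        return dict(self._jobs[job_id])


class JobService:
    """Controller tier: the service interface that every user interface calls."""

    def __init__(self, backend):
        self._backend = backend  # any object with submit() and status()

    def run(self, name):
        job_id = self._backend.submit(name)
        return {"id": job_id, **self._backend.status(job_id)}


def html_view(job):
    """View tier, variant 1: a portlet-style HTML fragment."""
    return f"<li>Job {job['id']} ({job['name']}): {job['state']}</li>"


def text_view(job):
    """View tier, variant 2: a command-line style rendering of the same result."""
    return f"{job['id']}\t{job['name']}\t{job['state']}"


service = JobService(InMemoryQueue())
job = service.run("mcell")
print(html_view(job))
print(text_view(job))
```

Swapping `InMemoryQueue` for another backend, or adding a third view, requires no change to `JobService`, which is the flexibility the slide refers to.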

  34. Two Approaches to the Middle Tier [Diagram comparing two stacks. Fat Client: Portal Component with embedded Grid Client; Grid Protocol (SOAP); Grid Service; Backend Resource. Thin Client: Portal Component; HTTP + SOAP; Web Service with Grid Client; Grid Protocol (SOAP); Grid Service; Backend Resource.]

  35. Managing Scientific Workflows A Preview for Suresh’s Talks and Demos

  36. Scientific Workflows • Portal interfaces encode scientific use cases. • If you have a rich set of services, it is a lot of work to make portlets for all possible use cases. • And power users will always want something more. • Example: our CICC project has dozens of chemical informatics Web services. • http://www.chembiogrid.org.wiki • Workflow composers can simplify this. • Allow users to encode and execute their own use cases.

  37. Web Services and Workflows • Perform a similarity search on the NIH DTP Human Tumor data. • Filter the results based on Pharmacokinetic properties (FILTER) • Convert to 3D (OMEGA) • Docking into a pre-defined protein (FRED) • Visualize (JMOL). Taverna workflow connects remote services.
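The chain of steps on this slide is a pipeline in which each service consumes the previous one's output. The sketch below shows only that composition pattern. Every function is a local placeholder with made-up data; none of them calls FILTER, OMEGA, FRED, JMOL, or Taverna, and the record fields (`similarity`, `logp`, `score`) are assumptions chosen for illustration. The final visualization step is omitted.

```python
from functools import reduce


def similarity_search(query):
    """Placeholder for the similarity search service; returns made-up hits."""
    return [
        {"id": "cmpd-1", "similarity": 0.91, "logp": 2.1},
        {"id": "cmpd-2", "similarity": 0.85, "logp": 6.3},
        {"id": "cmpd-3", "similarity": 0.80, "logp": 3.4},
    ]


def filter_properties(hits, max_logp=5.0):
    """Placeholder for the property filter: drop hits above a logP cutoff."""
    return [h for h in hits if h["logp"] <= max_logp]


def convert_to_3d(hits):
    """Placeholder for 3D conversion: attach a conformer label to each hit."""
    return [{**h, "conformer": f"{h['id']}-3d"} for h in hits]


def dock(hits):
    """Placeholder for docking: invent a score and sort best (lowest) first."""
    scored = [{**h, "score": round(-10 * h["similarity"], 1)} for h in hits]
    return sorted(scored, key=lambda h: h["score"])


def run_workflow(query, steps):
    """Feed the output of each step into the next, as a workflow engine would."""
    return reduce(lambda data, step: step(data), steps, query)


results = run_workflow(
    "query-structure",
    [similarity_search, filter_properties, convert_to_3d, dock],
)
for hit in results:
    print(hit["id"], hit["conformer"], hit["score"])
```

Because the workflow is just an ordered list of steps, a user can reorder, drop, or add services without anyone writing a new portlet.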

  38. OGCE’s XBaya Workflow Composer

  39. Updating the Octopus [Diagram, tiers from top to bottom: Browser Interface; HTTP(S); Social Gadgets + AJAX; RSS, JSON/HTTP; REST service interfaces (one WSDL interface remaining); DB Service, Job Sub/Mon and File Services, Visualization Service; JDBC, DB, Operating and Queuing Systems; Host 1, Host 2, Host 3]

  40. Sample Grid Gadgets in iGoogle

  41. Microformats, KML, and GeoRSS feeds used to deliver SAR data to multiple clients.
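Serving one record in several formats is what lets multiple clients consume the same data. The sketch below renders a single made-up record both as an Atom entry fragment carrying a GeoRSS-Simple point and as JSON. The record contents and function names are hypothetical, and the entry is a fragment only (a complete Atom entry would also need `id` and `updated` elements).

```python
import json
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
GEORSS_NS = "http://www.georss.org/georss"

# Hypothetical record; real SAR metadata would carry many more fields.
scene = {"title": "SAR scene 001", "lat": 34.05, "lon": -118.25}


def to_georss_entry(record):
    """Render the record as an Atom entry fragment with a GeoRSS-Simple point."""
    ET.register_namespace("atom", ATOM_NS)
    ET.register_namespace("georss", GEORSS_NS)
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = record["title"]
    # GeoRSS-Simple encodes a point as "latitude longitude", space separated.
    point = ET.SubElement(entry, f"{{{GEORSS_NS}}}point")
    point.text = f"{record['lat']} {record['lon']}"
    return ET.tostring(entry, encoding="unicode")


def to_json(record):
    """Render the same record for AJAX or gadget clients."""
    return json.dumps(record, sort_keys=True)


georss_xml = to_georss_entry(scene)
scene_json = to_json(scene)
print(georss_xml)
print(scene_json)
```

A map client can read the GeoRSS form while a browser gadget reads the JSON form, with both generated from the same underlying record.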

  42. More Information • Contact me: mpierce@cs.indiana.edu • See what I’m up to: http://communitygrids.blogspot.com/ • OGCE software: http://collab-ogce.org/ • Lots of people worked on all of these.

  43. Tremendous Opportunities Using the Largest Shared Resources - Challenges too! • What’s different when the resource doesn’t belong just to me? • Resource discovery • Accounting • Security • Proposal-based requests for resources (peer-reviewed access) • Code scaling and performance numbers • Justification of resources • Gateway citations • Tremendous benefits at the high end, but even more work for the developers • Potential impact on science is huge • Small number of developers can impact thousands of scientists • But need a way to train and fund those developers and provide them with appropriate tools

  44. Gateways can further investments in other projects • Increase access • To instruments • Increase capabilities • To analyze data • Improve workforce development • For underserved populations • Increase outreach • Increase public awareness • Public sees value in investments in large facilities
