Science Gateways and their tremendous potential for science and engineering

Science Gateways and their tremendous potential for science and engineering. CI Days Las Vegas, New Mexico March 10-11, 2008 Nancy Wilkins-Diehr TeraGrid Area Director for Science Gateways wilkinsn@sdsc.edu. Thank you for sharing your beautiful town.

Presentation Transcript


  1. Science Gateways and their tremendous potential for science and engineering. CI Days, Las Vegas, New Mexico, March 10-11, 2008. Nancy Wilkins-Diehr, TeraGrid Area Director for Science Gateways, wilkinsn@sdsc.edu

  2. Thank you for sharing your beautiful town CI Days, March 10-11, 2008

  3. Phenomenal Impact of the Internet on Worldwide Communication and Information Retrieval • Implications on the conduct of science are still evolving • 1980s: early gateways; National Center for Biotechnology Information BLAST server, search results sent by email, still a working portal today • 1992 Mosaic web browser developed • 1995 “International Protein Data Bank Enhanced by Computer Browser” • 2004 TeraGrid project director Rick Stevens recognized growth in scientific portal development and proposed the Science Gateway Program • Simultaneous explosion of digital information • Analysis needs in a variety of scientific areas • Sensors, telescopes, satellites, digital images and video • #1 machine on Top500 today is 300x more powerful than all combined entries on the first list in 1993 • Only 16 years since the release of Mosaic!

  4. 1998 Workshop Highlights Early Impact of Internet on Science • Shared access to geographically dispersed resources • Assembling the best minds to tackle the toughest problems regardless of location • Tackling the same problems differently, but also tackling different problems • Not only the scope, but the process of scientific investigation is changed • “As the chemical applications and capabilities provided by collaboratories become more familiar, researchers will move significantly beyond current practice to exciting new paradigms for scientific work” • Requirements for future success include: • - Development of interdisciplinary partnerships of chemists and computer scientists • - Flexible and extensible frameworks for collaboratories • - Means to deploy, support, and evaluate collaboratories in the field

  5. Rapid Advances in Web Usability • Source: Screen Porch White Paper, The University of Western Ontario (1996) • First generation • Static Web pages • Second generation • Dynamic, database interfaces, CGI • Lacked the ease of use of desktop applications • Third generation • True networked and internetworked applications that enable dynamic two-way, even multi-way, communication and collaboration on the Web. • Remarkable new uses of the Web in the organizational workplace and on the Internet

  6. What’s Next? “Prediction is hard. Especially about the future.” Yogi Berra • Scientists of tomorrow are familiar with media we don’t even know about • Not using full power of the internet by any means today • Data and knowledge are handled differently • Linking publications and data referenced in those publications • Annotation, data provenance • Inability to create discourse around a piece of data • Ability to keep up with knowledge generation • 16,000 papers a week into PubMed • 50,000 papers a week in biology • Right now have choice between reading abstract or paper, might add 10-minute author clip • How can science motivate in the way YouTube can? • Streaming video to view simulations, using visual and sound media • iPods everywhere, but not exploited for science • Web 2.0 • www.scivee.tv • Science was an earlier internet adopter, now overtaken by business • Now a big difference between commercial and scientific sites • Noticeable efforts to keep users on commercial sites Source: 5/14/07 interview with Dr. Philip Bourne, Protein Data Bank

  7. The convenience of getting scientific material on the web opens doors to better attitudes and understanding of science. November 20, 2006 John B. Horrigan, Associate Director http://www.pewinternet.org/pdfs/PIP_Exploratorium_Science.pdf

  8. NSF (my sponsor) has long recognized the importance of science and technology interactions • Interdisciplinary programs did much to facilitate application-technology integration and develop standard tools • 1997 PACI Program • Marriage of technologists and application scientists • A few groups served as pathfinders and benefited tremendously • NPACI neuroscience thrust in 1997 leads to Telescience portal and BIRN in 2001 • Information Technology Research (ITR) • NSF Middleware Initiative (NMI) • Plug and play tools so more groups can benefit

  9. NSF Continues Its Leadership Today: What Will Lead to Transformative Science? • “Virtual environments have the potential to enhance collaboration, education, and experimentation in ways that we are just beginning to explore.” • “In every discipline, we need new techniques that can help scientists and engineers uncover fresh knowledge from vast amounts of data generated by sensors, telescopes, satellites, or even the media and the Internet.” • Gateways are a terrific example of interfaces that can support transformative science

  10. Flagship $52M CDI Program Launched in 2008 • Cyber-enabled Discovery and Innovation (CDI) is • “NSF’s bold five-year initiative to create revolutionary science and engineering research outcomes made possible by innovations and advances in computational thinking.” • Program announced October 1 • Bold multidisciplinary activities that, through computational thinking, promise radical, paradigm-changing research findings • Far-reaching, high-risk science and engineering research and education agendas that capitalize on innovations in, and/or innovative use of, computational thinking • Partnerships to involve investigators from academe, industry and may include international entities • Growth to $250M recommended by 2012 • Funded across NSF directorates

  11. Three Thematic Areas Offer Diversity • From Data to Knowledge • Enhancing human cognition and generating new knowledge from a wealth of heterogeneous digital data • Data mining, visualization, petascale computational power, etc. to help scientists and engineers extract the most important information from the almost infinite amounts of data from sensors, telescopes, satellites, the media, the Internet, surveys, etc. • Understanding Complexity in Natural, Built, and Social Systems • Deriving fundamental insights on systems comprising multiple interacting elements • Simulate and predict complex stochastic or chaotic systems • Explore and model nature’s interactions, connections, complex relations, and interdependencies, scaling from sub-particles to galactic, from subcellular to biosphere, and from the individual to the societal • Building Virtual Organizations • Facilitate creative, cyber-enabled boundary-crossing collaborations, including those with industry and international dimensions • Advance the frontiers of science and engineering and broaden participation in science, technology, engineering and math fields

  12. Science Gateways are a Natural Extension of Internet Developments • 3 common types of gateway • Web portal with users in front and services in back • Client-server model in which application programs running on users' machines (i.e. workstations and desktops) access services • Bridges across multiple grids, allowing communities to utilize both community-developed grids and shared grids • Continued rapid changes ahead; gateways must be adaptable and can provide some nimbleness

  13. Variety of Gateways Available Today

  14. Gateway Idea Resonates with Scientists • Capabilities provided by the Web are easy to envision because we use them in everyday life • Researchers can imagine scientific capabilities provided through a familiar interface • Groups resonate with the fact that gateways are designed by communities and provide interfaces understood by those communities • But also provide access to greater capabilities on the back end without the user needing to understand the details of those capabilities • Scientists know they can undertake more complex analyses and that’s all they want to focus on • But this seamless access doesn’t come for free. It all hinges on very capable developers

  15. Tremendous Opportunities Using the Largest Shared Resources - Challenges too! • What’s different when the resource doesn’t belong just to me? • Resource discovery • Accounting • Security • Proposal-based requests for resources (peer-reviewed access) • Code scaling and performance numbers • Justification of resources • Gateway citations • Tremendous benefits at the high end, but even more work for the developers • Potential impact on science is huge • Small number of developers can impact thousands of scientists • But need a way to train and fund those developers and provide them with appropriate tools

  16. What is the TeraGrid? A unique combination of fundamental CI components • Dedicated high-speed, cross-country network • Staff & Advanced Support • 20 Petabytes Storage • 2 PetaFLOPS Computation • Visualization

  17. Opportunities and Challenges as a Virtual Organization (TeraGrid) • Full vision of cyberinfrastructure • Data, compute, visualization, workflows • But need to do a better job of representing the capabilities to researchers • Creating prototypes for others to follow • Never underestimate the value in keeping things SIMPLE • Work with top notch people regardless of location • Better for end users • Single request process for all types of resources • Single place for documentation • But must work harder • To sustain momentum in projects • Set a few high-level goals • Clear management structure • Individual responsibility • Project accountability • To provide clarity for users

  18. TeraGrid Resources Available for all Domain Scientists, At no cost to them! • Integrated, persistent, pioneering resources • Significantly improve the ability and capacity to gain new insights into the most challenging research questions and societal problems • Peer-reviewed, proposal-based access • Targeted support available as well • Dedicated staff investment to really make a difference on complex problems • Transformational science • Must have PI commitment • Make lessons learned available for all

  19. TeraGrid Usage [Chart: Compute Cycles Delivered, in Normalized Units (millions), split into Specific Allocations and Roaming Allocations; ~50% Annual Growth] TeraGrid currently delivers an average of 420,000 cpu-hours per day -> ~21,000 DC every hour Source: Dave Hart (dhart@sdsc.edu)

  20. TeraGrid User Community [Chart label: Gateways Growth Target] Source: Dave Hart (dhart@sdsc.edu)

  21. Easy TeraGrid Gateway True and False Test, Answers Provided • TeraGrid selects all gateways (F) • TeraGrid designs all gateways (F) • TeraGrid limits the number of gateways (F) • All gateways need TeraGrid funding to exist (F) • Any PI can request an allocation and use it to develop a gateway (T) • Gateway design is community-developed and that is the core strength of the program (T) • TeraGrid staff are alerted to gateway work when a proposal is reviewed or when a community account is requested (T) • Limited TeraGrid support can be provided for targeted assistance to integrate an existing gateway with TeraGrid (T)

  22. Common Gateway Needs • Web Services • GT4 deployment, identification of remaining capabilities • Information services, WebMDS • Auditing • Need to retrieve job usage info on production resources • GRAM audit deployed in test mode in September, inclusion in CTSSv4 • Community Accounts • Policy finalized, security approaches being tested by RPs • Attribute-based authentication testing • Allocations • Changes in allocation procedures, the mechanisms used to evaluate science impact, and models for identity management, authentication and authorization that are more tuned to virtual organizations. • Scheduling • Metascheduling RAT • On-demand via SPRUCE framework • Outreach • Talks, Schools/workshops (NVO, GISolve), major project demonstrations (LEAD) • SURA, HASTAC, GEON, CI-Channel, SC, Grace Hopper, MSI-CI2, Lariat, Science Workflows and On Demand Computing for Geosciences Workshop • Primer • Living document in wiki, provides up-to-date overview and instructions for new gateway developers (“how to make your portal a TeraGrid science gateway”)
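
The auditing and community-account items above can be illustrated with a small sketch. It assumes a gateway that submits every job under one shared community account and therefore has to keep its own record of which gateway user each job belongs to. The class, field names, and sample values are hypothetical; this is not the GRAM audit interface, only an illustration of the bookkeeping idea.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRecord:
    """One job run under the shared community account (hypothetical fields)."""
    gateway_user: str   # person logged in to the gateway
    resource: str       # resource the job ran on
    cpu_hours: float    # usage charged to the community allocation


class CommunityAccountLedger:
    """Attributes usage of one community account to individual gateway users."""

    def __init__(self, community_account: str) -> None:
        self.community_account = community_account
        self._records: list[JobRecord] = []

    def record_job(self, gateway_user: str, resource: str, cpu_hours: float) -> None:
        if cpu_hours < 0:
            raise ValueError("cpu_hours must be non-negative")
        self._records.append(JobRecord(gateway_user, resource, cpu_hours))

    def usage_by_user(self) -> dict[str, float]:
        """Total cpu-hours per gateway user."""
        totals: dict[str, float] = defaultdict(float)
        for record in self._records:
            totals[record.gateway_user] += record.cpu_hours
        return dict(totals)

    def total_usage(self) -> float:
        """Total cpu-hours charged to the community account."""
        return sum(record.cpu_hours for record in self._records)


# Usage with made-up users and resources.
ledger = CommunityAccountLedger("gateway_community")
ledger.record_job("alice", "cluster-a", 10.0)
ledger.record_job("bob", "cluster-b", 4.0)
ledger.record_job("alice", "cluster-b", 2.5)
print(ledger.usage_by_user())
```

Keeping the per-user attribution on the gateway side is one way to answer "who used what" when the resource itself only sees the community account.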

  23. Selected Gateway Highlights • nanoHUB • Linked Environments for Atmospheric Discovery (LEAD) • GridChem • Biomedical Informatics Research Network (BIRN) • Center for Remote Sensing of Polar Icesheets (CReSIS)

  24. Highlights: nanoHUB Explosive User Growth • In past 12 months • 60,276 users • 46% from U.S. • 20,738 course downloads • 7,503 podcast downloads • 349 online meetings • Full featured gateway • Simulation tools, curricula, multimedia, user contributions, collaborations

  25. Linked Environments for Atmospheric Discovery • Providing tools that are needed to make accurate predictions of tornados and hurricanes • Meteorological data • Forecast models • Analysis and visualization tools • Data exploration and Grid workflow

  26. Highlights: LEAD Inspires Students, Advanced capabilities regardless of location • A student gets excited about what he was able to do with LEAD • “Dr. Sikora: Attached is a display of 2-m T and wind depicting the WRF's interpretation of the coastal front on 14 February 2007. It's interesting that I found an example using IDV that parallels our discussion of mesoscale boundaries in class. It illustrates very nicely the transition to a coastal low and the strong baroclinic zone with a location very similar to Markowski's depiction. I created this image in IDV after running a 5-km WRF run (initialized with NAM output) via the LEAD Portal. This simple 1-level plot is just a precursor of the many capabilities IDV will eventually offer to visualize high-res WRF output. Enjoy! • Eric” (email, March 2007)

  27. Highlights: GridChem’s Client-Server Approach Provides Power and a Rich Feature Set Source: Sudhakar Pamidighantam, NCSA

  28. Biomedical Informatics Research Network (BIRN) BIRN is a National Center for Research Resources (NCRR) initiative aimed at creating a testbed to address biomedical researchers Source: Anthony Kolasny, Johns Hopkins

  29. Shape Analysis - A Morphometry BIRN Project [Workflow diagram with steps numbered 1-5; labels: Data Donor Sites; De-identification and upload; Storage; MGH Segmentation; JHU CIS-KKI Shape Analysis of Segmented Structures; TeraGrid Supercomputing; BWH Visualization] Goal: comparison and quantification of structures’ shape and volumetric differences across patient populations Source: Anthony Kolasny, Johns Hopkins

  30. BIRN uses SSHFS to mount TeraGrid filesystems locally CIS has 87TB of local storage. /cis/net lists network drives. 220TB through CIS portal using autofs, samba, smbwebclient. Source: Anthony Kolasny, Johns Hopkins University
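
As a rough illustration of the SSHFS approach on this slide, the sketch below only assembles an `sshfs` command line in the standard `sshfs [user@]host:[dir] mountpoint [options]` form; it does not run it. The user name, host name, remote directory, and the mount point under /cis/net are made-up placeholders, and the slide does not say how the CIS mounts are actually configured.

```python
import shlex


def sshfs_command(user, host, remote_dir, mount_point, options=("reconnect",)):
    """Build (but do not execute) an sshfs command as an argument list.

    Each entry in `options` is passed to sshfs as `-o <option>`.
    """
    command = ["sshfs", f"{user}@{host}:{remote_dir}", mount_point]
    for option in options:
        command += ["-o", option]
    return command


# Hypothetical values for illustration only.
cmd = sshfs_command("alice", "login.example.org", "/scratch/alice", "/cis/net/teragrid")
print(shlex.join(cmd))
```

Building the argument list separately makes it easy to inspect or log the command before handing it to something like `subprocess.run`.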

  31. National Virtual Observatory: Facilitating Scientific Discovery • Access to telescope images from around the world • NVO provides access to combined sky surveys • Different views of the same cosmological phenomenon can reveal new insights • New science enabled by enhancing access to data and computing resources • Data correlation • Understanding of physical processes • Identification of new phenomena • NVO is a set of tools used to exploit the data avalanche

  32. CReSIS (Center for Remote Sensing of Ice Sheets) • Awarded CI-TEAM funding to build a Polar Gateway • International Polar Year 2007-2008 • Led by Geoffrey Fox, IU and Linda Hayden, Elizabeth City State • CReSISGrid • Build a TeraGrid Science Gateway • Provide broad-based educational and training activity in Cyberinfrastructure for remote sensing and ice sheet dynamics • Lessons learned in remote data gathering can be applied to fields

  33. Computing in Humanities, Arts, and Social Science (CHASS) • Large volumes of image scans of historical documents • Automatic cropping of about 37 TB of images corresponding to Abe Lincoln’s writings • Optical handwritten character recognition of image scans to extract information from Lincoln papers • Georeferencing of airborne imagery with historical and contemporary maps • Automated georeferencing of historical maps from the 18th century with contemporary maps to enable temporal and geospatial browsing of Lincoln papers • Calibrating, ortho-rectifying and georeferencing airborne imagery (~1TB) with historical maps to support better understanding of land use and land cover preservation, sustainability of natural resources and to support land restoration efforts • Gateway interface to on-demand computations related to document comparisons • Scientists and educators might upload multiple historical and contemporary documents including images and text • Cluster documents according to their image and textual characteristics and • Compare them with selected transcribed historical documents already hosted on our site

  34. When is a gateway appropriate? • Researchers using defined sets of tools in different ways • Same executables, different input • GridChem, CHARMM • Creating multi-scale workflows • Datasets • Common data formats • National Virtual Observatory • Earth System Grid • Some groups have invested significant efforts here • caBIG, extensive discussions to develop common terminology and formats • BIRN, extensive data sharing agreements • Difficult to access data/advanced workflows • Sensor/radar input • LEAD, GEON
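
The "same executables, different input" pattern above can be sketched as a parameter sweep: one fixed executable, one command line per combination of inputs. The executable name `simulate` and the parameter names are hypothetical; a real gateway would hand each command line to its own job submission layer.

```python
from itertools import product


def build_sweep(executable, parameter_grid):
    """Return one argument list per combination of parameter values.

    `parameter_grid` maps a parameter name to the list of values to try.
    Parameter names are sorted so the output order is deterministic.
    """
    names = sorted(parameter_grid)
    jobs = []
    for values in product(*(parameter_grid[name] for name in names)):
        args = [executable]
        for name, value in zip(names, values):
            args += [f"--{name}", str(value)]
        jobs.append(args)
    return jobs


# Hypothetical executable and inputs.
jobs = build_sweep("simulate", {"temperature": [300, 350], "basis": ["sto-3g"]})
for job in jobs:
    print(" ".join(job))
```

Because only the inputs vary, a gateway can expose the grid of values as a web form and keep the executable and its installation hidden from the researcher.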

  35. How to get started? • Conduct a needs assessment • Should I build a gateway? • Can I use an existing gateway? • What problems am I trying to solve? • SimpleGrid • Building blocks for science gateways • Decide on a software approach • OGCE • TeraGrid staff assistance • Targeted support • Pathways to Broadening Participation in TeraGrid initiative

  36. SimpleGrid Objectives • Developed for a “build a gateway in a day” hands-on tutorial • Downloadable code to completely build a simple gateway and understand the underlying moving parts • Use TeraGrid to support domain-specific scientific computing • Develop Grid-enabled applications to access TeraGrid capabilities • Create a GridSphere-based portal as a TeraGrid science gateway interface • Develop JSR-168 compliant portlets to build gateway components • Understand a GISolve-based workflow to steer analyses on TeraGrid • http://www.cigi.uiuc.edu/doku.php/projects/simplegrid

  37. Overview

  38. Learning Curve • Access TeraGrid resources • Accounts, computing and data storage • Develop Java programs for TeraGrid access • JGlobus CoG programming • A simple visualization module • Build a simple science gateway Grid portal • JSR-168 Portlet API • GridSphere-based portlet development

  39. Open Grid Computing Environment (OGCE) • Portal development framework developed through NSF’s Middleware Initiative (NMI) program • http://www.collab-ogce.org/ogce • Actively supported • Used by several TeraGrid Gateways • Interested in assisting new communities

  40. OGCE Goals • To provide easily installable, well-tested software for building Web client and service components that constitute a Grid Computing Environment. • Science Web Portal --> GCE --> Science Gateway • To support developing groups through training, outreach, and divine intervention. • Gateways have many needs that can’t be solved by downloadable software alone. Source: Marlon Pierce, Indiana University

  41. What Is a Web Portal? • Aggregate content from multiple sources into a single display. • Typically consume RSS/Atom news feeds. • More powerful versions these days support Flickr, calendars, games, etc. • Gadgets, widgets • Examples: iGoogle, Netvibes, My Yahoo! Source: Marlon Pierce, Indiana University
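
The aggregation idea on this slide can be sketched with the Python standard library: parse an RSS 2.0 feed and an Atom feed and merge their entries into one list for a single display. The feed contents, source names, and URLs below are invented samples, and a production portal would fetch feeds over HTTP and handle many more feed variations than this minimal sketch does.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace prefix

# Invented sample feeds, kept inline so the sketch is self-contained.
RSS_SAMPLE = (
    '<rss version="2.0"><channel><title>Gateway news</title>'
    "<item><title>New tool released</title>"
    "<link>http://example.org/news/1</link></item>"
    "</channel></rss>"
)
ATOM_SAMPLE = (
    '<feed xmlns="http://www.w3.org/2005/Atom"><title>Seminars</title>'
    "<entry><title>Workflow seminar</title>"
    '<link href="http://example.org/seminars/2"/></entry>'
    "</feed>"
)


def parse_feed(xml_text, source):
    """Extract title/link pairs from an RSS 2.0 or Atom document."""
    root = ET.fromstring(xml_text)
    items = []
    if root.tag == "rss":
        for item in root.iter("item"):
            items.append({
                "source": source,
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
            })
    elif root.tag == ATOM + "feed":
        for entry in root.iter(ATOM + "entry"):
            link = entry.find(ATOM + "link")
            items.append({
                "source": source,
                "title": entry.findtext(ATOM + "title", default=""),
                "link": link.get("href", "") if link is not None else "",
            })
    else:
        raise ValueError(f"unrecognized feed root element: {root.tag}")
    return items


def aggregate(feeds):
    """Merge entries from several feeds (name -> XML text) into one list."""
    merged = []
    for source, xml_text in feeds.items():
        merged.extend(parse_feed(xml_text, source))
    return merged


items = aggregate({"news": RSS_SAMPLE, "seminars": ATOM_SAMPLE})
for item in items:
    print(f'[{item["source"]}] {item["title"]} {item["link"]}')
```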

  42. A Comprehensive Gateway Architecture [Architecture diagram; component labels: User’s Browser; Grid Portal Server; User’s Desktop; Workflow Composer; Security Services; Gateway Services; Workflow/Application Execution Engine; User Data & Metadata Catalogs; Application Resource Catalogs; Data Services; Information Services; Job MGMT, Resource Broker And Scheduling Services; Security Services; Globus-TeraGrid “OGSA-Like” Services] Source: Marlon Pierce, Indiana University

  43. TeraGrid User Portal Source: Marlon Pierce, Indiana University

  44. North Carolina Bioportal • Principal collaborators: John McGee and Lavanya Ramakrishnan • Features • access to common bioinformatics tools • extensible toolkit and infrastructure • OGCE and National Middleware Initiative (NMI) • leverages emerging international standards • remotely accessible or locally deployable • packaged and distributed with documentation • National reach and community • TeraGrid deployment • Portals hosted at RENCI and NCSA • Education and training Source: Marlon Pierce, Indiana University

  45. Source: Marlon Pierce, Indiana University

  46. TeraGrid Support • Advanced Support for TeraGrid Applications (ASTA) • Dedicated support from TeraGrid available through peer-reviewed proposal process • Support can range from .25 to 1 FTE from several months up to one year • Carefully detail collaboration so that gateway can be maintained when support concludes • Pathways to Broadening Participation in TeraGrid • One year grant that includes • Evaluation of the readiness of gateways for use in education • Work with 2 new gateway projects that involve underrepresented communities • Short-term faculty and student mentoring and fellowship programs • TeraGrid Campus Champions

  47. Tremendous Potential for Gateways • In only 16 years, the Web has fundamentally changed human communication • Science Gateways can leverage this amazingly powerful tool to: • Transform the way scientists collaborate • Streamline conduct of science • Influence the public’s perception of science • Like e-commerce, Science Gateways need to build trust in the infrastructure, tools, and methods that they use • Unlike the public or commercial arena, scientists will be vested in these gateways • Science Gateways will need to build trust in the organization behind them. Gateways need to have continuity • High end resources can have a profound impact • The future is very exciting!

  48. Thank you for your attention • We look forward to talking with you over the next two days • wilkinsn@sdsc.edu • www.teragrid.org
