Science Gateways on the TeraGrid Nancy Wilkins-Diehr Area Director for Science Gateways San Diego Supercomputer Center firstname.lastname@example.org
Today’s Outline • What are Gateways? • Why TeraGrid and Gateways? • Initial Strategy • Implementation Details • Issues to address when using TG • Future growth – Gateways and still more gateways
The TeraGrid Strategy
• Building a distributed system of unprecedented scale
  • 40+ teraflops compute
  • 1+ petabyte storage
  • 10-40 Gb/s networking
• Creating a unified user environment across heterogeneous resources
  • User software environment, user support resources
  • Created an initial community of over 500 users, 80 PIs
• Integrating new partners to introduce new capabilities
  • Additional computing, visualization capabilities
  • New types of resources: data collections, instruments
• Make it extensible!
TeraGrid Objectives • DEEP Science: Enabling Terascale Science • Make Science More Productive through an integrated set of very-high capability resources. • WIDE Impact: Empowering Communities • Bring TeraGrid capabilities to the broad science community. • OPEN Infrastructure, OPEN Partnership • Provide a coordinated, general purpose, reliable set of services and resources.
Science Gateways: A new initiative for the TeraGrid [Figure: Workflow Composer] • Increasing investment by communities in their own cyberinfrastructure, but heterogeneous: • Resources • Users – from expert to K-12 • Software stacks, policies • Science Gateways • Provide “TeraGrid Inside” capabilities • Leverage community investment • Three common forms: • Web-based portals • Application programs running on users' machines but accessing services in TeraGrid • Coordinated access points enabling users to move seamlessly between TeraGrid and other grids.
Science Gateway Examples • As well as additional gateway projects that have joined us or are planning to join, including… University of Buffalo, BIRN, NEES, GEON, several NCAR projects, Cornell (large data collections), LSU (coastal modeling), IU Hydra Portal
National Virtual Observatory: Facilitating Scientific Discovery • Astronomy is increasingly a data-rich science • New science enabled by enhancing access to data and computing resources • Ease of use in locating, retrieving, and analyzing data from archives and catalogs worldwide • NVO is a set of tools used to exploit the data avalanche
nanoHUB [Architecture diagram; labels: Science Gateway • Workspaces • Research apps • Middleware infrastructure • Virtual backends • VM • Virtual Cluster with VIOLIN • nanoHUB VO • Grid Middleware • Capability Computing • Capacity Computing • Campus Grids (Purdue, GLOW)]
The RENCI Bioportal • Supports • distributed collaboration • multi-site data access • computational tools for local or remote execution • Grid and cluster interoperability • Will provide access to • common sequence and protein structure databases • over 140 software packages • Tutorial with John McGee Friday afternoon!
Linked Environments for Atmospheric Discovery (LEAD) • Providing tools that are needed to make accurate predictions of tornados and hurricanes • Data exploration and Grid workflow
NCAR Earth Systems Grid • ESG was originally a distributed data management/access system, but it has evolved into more. • User registration, authorization controls, and metrics tracking • CCSM model source, initialization datasets, post-processing codes, and analysis and visualization tools. • Prototypes of model-submission environments, eventually real-time tracking of model status along with references to available output datasets. • "Science gateway" for climate research. • Expect to see more model runs at higher resolution and with greater component scope.
So how will we meet all these needs? • With RATS! (Requirements Analysis Teams) • Collection, analysis and consolidation of requirements to jump start the work • Interviews with 10 Gateways • Common user models, accounting needs, scheduling needs • Summarized requirements for each TeraGrid working group • Accounting, Security, Web Services, Software • Areas for more study identified • Primer outline for new Gateways in progress • And milestones
Implications for TeraGrid working groups
• Accounting
  • Support for accounts with differing capabilities
  • Ability to associate a compute job with an individual portal user
  • Scheme for portal registration and usage tracking
  • Support for OSG’s Grid User Management System (GUMS)
  • Dynamic accounts
• Security
  • Community account privileges
  • Need to identify the human responsible for a job for incident response
  • Acceptance of other grid certificates
  • TG-hosted web servers, cgi-bin code
• Web Services
  • Initial analysis completed 12/05
  • Some Gateways (LEAD, Open Life Sciences) have immediate needs
  • Many will build on capabilities offered by GT4, but interoperability could be an issue
  • Web Service security
  • Interfaces to scheduling and account management are common requirements
• Software
  • Interoperability of software stacks between TG and peer grids
  • Software installations for gateways across all TG sites
  • Community software areas
  • Management (pacman, other options)
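The accounting requirement above (tying each compute job run under a shared community account back to an individual portal user, and tracking usage per user) can be sketched as a small ledger. This is only an illustration under stated assumptions, not TeraGrid's accounting system: the `UsageLedger` class, its fields, and the sample account, site, and user names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRecord:
    portal_user: str   # identity known only to the gateway (hypothetical)
    resource: str      # hypothetical resource name
    job_id: str        # job ID as reported by the resource
    cpu_hours: float


class UsageLedger:
    """Tracks usage per portal user for jobs run under one community account."""

    def __init__(self, community_account):
        self.community_account = community_account
        self._records = []

    def record(self, portal_user, resource, job_id, cpu_hours):
        """Store one finished job together with the portal user who ran it."""
        self._records.append(JobRecord(portal_user, resource, job_id, cpu_hours))

    def totals_by_user(self):
        """Sum CPU hours per portal user across all recorded jobs."""
        totals = defaultdict(float)
        for rec in self._records:
            totals[rec.portal_user] += rec.cpu_hours
        return dict(totals)


ledger = UsageLedger("community_acct")
ledger.record("alice", "site-a", "1001", 2.5)
ledger.record("bob", "site-b", "77", 1.0)
ledger.record("alice", "site-b", "78", 0.5)
print(ledger.totals_by_user())
```

The resource provider sees only the community account; the per-user breakdown exists solely on the gateway side, which is why the gateway has to keep this mapping itself.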
Current areas of development • GT4 audit capabilities • Web services • Portal technology evaluation • Gateways primer and enhanced documentation • Process for working with new gateways
GT4 Auditing (Proposal)
Stuart Martin, ANL
[Diagram: WS GRAM auditing at a TeraGrid Resource Provider (RP); recoverable labels and notes follow]
• Client / Gateway: create job, get EPR, control job with EPR
• GT4 Java Container: MJFS, MEJS, Delegation, RFT
• Audit tables: Core Audit Table, Core Deleg Audit Table, RFT Audit Table, GRAM Audit Table
• sudo, RM adapter, SEG, Resource Manager, RM log, User Job(s)
• OGSA DAI
  • Query using Grid JID
  • Reply with accounting record
  • DAI provides abstraction and link to both audit and accounting
• RM Accounting, Local AMIE Accounting, AMIE upload, Central TG Accounting DB
  • No changes required to AMIE
• ** Locally convert EPR to Grid JID
GT4 Auditing: Pre-WS GRAM
[Diagram: pre-WS GRAM auditing at a TeraGrid Resource Provider (RP); recoverable labels and notes follow]
• Each GRAM audit file is a comma-separated list of fields, for easy DB uploading
• Dir/permissions can be handled the same as gx-map files
• The same DAI interface that provides audit and accounting info for WS GRAM can be used for pre-WS GRAM
• Client / Gateway: create and control job
• Gatekeeper, Job Manager, RM adapter, Resource Manager, User Job(s)
• GRAM Audit Files, DB upload (cron job), GRAM Audit Table
• GT4 Java Container, OGSA DAI
  • Query using Grid JID (job contact)
  • Reply with accounting record
• RM Accounting, Local AMIE Accounting, AMIE upload, Central TG Accounting DB
• ** No conversion needed: job contact == Grid JID
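The pre-WS GRAM flow above (comma-separated audit files uploaded to a database, then queried by Grid JID to return an accounting record) can be sketched with Python's standard `csv` and `sqlite3` modules. Everything specific here is an assumption made for illustration: the slide does not give the audit-file field layout, so the four columns, the `rm_accounting` table, and the sample job contacts are hypothetical, and SQLite stands in for whatever database and OGSA-DAI service a resource provider would actually use.

```python
import csv
import io
import sqlite3

# Hypothetical audit-file contents: one comma-separated record per job.
# Assumed columns: job contact (Grid JID), local job ID, subject name, local user.
AUDIT_FILE = io.StringIO(
    "https://gk.example.org:2119/1234/5678/,9001,/O=Grid/CN=Community User,community\n"
    "https://gk.example.org:2119/1235/5679/,9002,/O=Grid/CN=Community User,community\n"
)


def upload_audit_file(conn, audit_file):
    """Load every record of a comma-separated audit file into the audit table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gram_audit ("
        "job_grid_id TEXT PRIMARY KEY, local_job_id TEXT, "
        "subject_name TEXT, local_user TEXT)"
    )
    rows = [tuple(row) for row in csv.reader(audit_file) if row]
    conn.executemany("INSERT OR REPLACE INTO gram_audit VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def accounting_for(conn, job_grid_id):
    """Query by Grid JID; reply with (grid JID, subject name, CPU hours) or None."""
    cur = conn.execute(
        "SELECT a.job_grid_id, a.subject_name, r.cpu_hours "
        "FROM gram_audit AS a "
        "JOIN rm_accounting AS r ON r.local_job_id = a.local_job_id "
        "WHERE a.job_grid_id = ?",
        (job_grid_id,),
    )
    return cur.fetchone()


conn = sqlite3.connect(":memory:")
# Stand-in for the resource manager's own accounting records (hypothetical).
conn.execute("CREATE TABLE rm_accounting (local_job_id TEXT PRIMARY KEY, cpu_hours REAL)")
conn.executemany("INSERT INTO rm_accounting VALUES (?, ?)", [("9001", 3.5), ("9002", 0.25)])

uploaded = upload_audit_file(conn, AUDIT_FILE)
record = accounting_for(conn, "https://gk.example.org:2119/1234/5678/")
print(uploaded, record)
```

In a deployment, the upload step would run periodically (the slide mentions a cron job) rather than once in-process as it does here.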
Gateway Web Services Needs
Ivan Judson, ANL
• Interfaces provided by the TeraGrid. The list of services that have been identified by the gateways developers includes:
  • Resource Status Service (both polling and pub/sub)
  • Job Submission Interface
    • The gateways expect this to be provided by WS-GRAM
  • Job Tracking Interface (both polling and pub/sub)
  • File/Data Staging Interface
  • Retrieve Usage Information
  • Retrieve Inca Info
  • Advanced Reservation Interface
  • Cross-site Run Interface
  • Pushing DN to an RP Interface
• Interfaces provided by the Gateways. The list of services that have been identified by the gateways developers and the TeraGrid Security group includes:
  • Retrieve user information for a job
  • Retrieve accounting information/statistics
• Together these provide the necessary means to track down problem job submissions, identify malicious users, and tabulate accounting and logging information for reporting needs by the RPs.
• It is expected that the information provided for the first interface is simply the (resource, job id) that is known by both parties at job submission time. This interface provides sufficient user information for the RPs to deal with the situation at hand, and possibly identifies another interface that should be provided by the gateways:
  • Don't submit jobs from the user who submitted job (resource, job id) until we say it's OK.
• The accounting interface requires no information, but returns sufficient accounting information and statistics to report to funding agencies, program managers, etc.
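The gateway-side interfaces described above can be sketched as a small in-memory registry keyed by the (resource, job id) pair that both parties know at submission time. This is a hedged illustration only: the slide names the interfaces but not their signatures, so the `GatewayJobRegistry` class, its method names, and the sample values are all hypothetical, and a real gateway would expose them as web services backed by persistent storage.

```python
class GatewayJobRegistry:
    """Maps (resource, job id) to the portal user and supports submission holds."""

    def __init__(self):
        self._job_owner = {}      # (resource, job_id) -> portal user
        self._held_users = set()  # users the RPs asked the gateway to hold

    def may_submit(self, portal_user):
        """Return True unless the user is currently on hold."""
        return portal_user not in self._held_users

    def register_job(self, resource, job_id, portal_user):
        """Record who submitted a job; refuse if the user is on hold."""
        if not self.may_submit(portal_user):
            raise PermissionError(f"submissions from {portal_user} are on hold")
        self._job_owner[(resource, job_id)] = portal_user

    def user_for_job(self, resource, job_id):
        """'Retrieve user information for a job': None if the job is unknown."""
        return self._job_owner.get((resource, job_id))

    def hold_user_of_job(self, resource, job_id):
        """'Don't submit jobs from the user who submitted this job.'"""
        user = self.user_for_job(resource, job_id)
        if user is not None:
            self._held_users.add(user)
        return user

    def release_user(self, portal_user):
        """'...until we say it's OK': lift the hold."""
        self._held_users.discard(portal_user)


registry = GatewayJobRegistry()
registry.register_job("site-a", "1001", "alice")
registry.register_job("site-a", "1002", "bob")

held = registry.hold_user_of_job("site-a", "1001")
print(held, registry.may_submit("alice"), registry.may_submit("bob"))
```

Holding by job rather than by user name reflects the slide's point that the RP only ever needs the (resource, job id) pair; the portal identity never has to leave the gateway unless incident response requires it.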
Portal Technology RAT (Jens Schwidder, ORNL) • Summarize portal technologies used by Gateways: • Clarens, InVIGO, OGCE • Strengths and weaknesses, user registration, auditing mechanisms • Issues encountered when utilizing TG resources • Provide recommendations for future Gateways • Make sure planned technologies can be supported by existing TG infrastructure
Gateways Primer Outline
1. Introduction
2. Science Gateway in Context
  a. Science Gateway (SGW) Definition(s)
  b. Science Gateway user modes
  c. Distinction between SGW and other TeraGrid user modes
3. Components of a Science Gateway
  a. User Model
  b. Gateway targeted community
  c. Gateway Services
  d. Integration with TeraGrid external resources (data collections, services, …)
  e. Organizational and administrative structure
4. TeraGrid services and policies available for Science Gateways
  a. Portal middleware tools (user portal and other portal tools)
  b. Account Management (user models, community accounts)
  c. Security environment (security models)
  d. Web Services
  e. Scheduling services (and meta-scheduling)
  f. Community accounts and allocations
  g. Community Software Areas
  h. All traditional TeraGrid services and resources
  i. Ability to propose additional services and how that would interact with TeraGrid operations
5. Responsibilities and Requirements for Science Gateways
  a. Interaction with and compatibility with TeraGrid communities
  b. Control procedures
    i. Community user identification and tracking (map TeraGrid usage to Portal user)
    ii. Use monitoring and reporting
    iii. Security and trust
    iv. Appropriate use
6. How to get started
  a. Existing resources
    i. Publication references
    ii. Web areas with more details
    iii. Online tutorials
    iv. Upcoming presentations and tutorials
  b. Who to contact for initial discussions
  c. How to propose a new Gateway
  d. How to integrate with TeraGrid Gateways efforts
  e. How to obtain a resource allocation
Want to be involved? • email@example.com mailing list • Email firstname.lastname@example.org • <subscribe gateways> in body • Biweekly telecons to get advice from others. Current focus: • Auditing strategy • Mini-tutorial at April Lariat workshop, “Accelerating Research Through Grid Computing” • Hands-on tutorial at June conference • Overview of Gateways • In-depth presentations by LEAD, nanoHUB, RENCI, GIScience • Transition to GT4 • Scheduling requirements • As original gateways move into production, we will be able to provide short-term support to new projects that would benefit • www.teragrid.org • Nancy Wilkins-Diehr, email@example.com