The GridPort Toolkit: a System for Building Grid Portals
M. Thomas, S. Mock, M. Dahan, K. Mueller, D. Sutton
San Diego Supercomputer Center, UCSD
and J. Boisseau
Texas Advanced Computing Center, Univ. of Texas at Austin
Presented at the 10th IEEE International Symposium on High Performance Distributed Computing
7-9 August 2001, San Francisco, CA
Outline
• Intro/Background/Motivation
• The GridPort Toolkit
• GridPort-based Application Portals
• Web Services Experiments
• Future Work/Conclusions
• For Ed's iPAQ (wherever it is): https://hotpage.npaci.edu/pda
Motivation
• Computational science environment is complex:
  • Users now have access to a variety of distributed resources
  • Interfaces to these resources vary and change often
  • Policies, accounts, etc. differ across sites/orgs
• Computational scientists are not computer scientists:
  • Provide universal, easy access to resources
  • Focus on keeping the GCE simple, easy to use, easily accessible
  • Users of GridPort-based portals require no software downloads or configuration changes on the client side, and run on common web browsers
• Driving philosophy:
  • Focus on Grid users and developers that will benefit from simple portals and portal technologies
  • Reduce workload on Grid users and Grid application developers
  • Support users and smaller projects
Evolution of GridPort/HotPage
• 1997-1998 (the intern years):
  • NPACI HotPage project started; informational services
• 1999:
  • Informational HotPage installed at other sites
  • Globus Toolkit interactive services (beta, GRAM/GSI)
  • Formed GridPort Toolkit project; technology transfer
• 2000:
  • Developed GridPort v1.0 to support application portals (LAPK, GAMESS)
  • User portal collaboration: GSI across NPACI, Alliance, NASA/IPG
  • PDA version
• 2001:
  • Released GridPort Toolkit v2.0 for public use:
    • Session state, single login; SRB integration coupled to GSI
  • Web services experiments
  • HotPage: updated interface; supporting PACI/NPACI; personalization
  • Creating Perl package version of GridPort
  • Created Globus Perl CoG module
GridPort Design Requirements
• Universal access:
  • Portals will be web-based; must support 'old' browsers
  • Portals must run anywhere, anytime, and leave no data behind
  • Require no downloads, plug-ins, or applications
• Technology transfer:
  • GridPort is a Grid 'jump-start' kit
  • Leverage infrastructure provided by the World Wide Web
• Use common Grid technologies and standards:
  • Minimize impact on already burdened resource administrators
  • GridPort Toolkit should not require that additional services be run on the HPC systems
• Provide a scalable and flexible infrastructure:
  • Facilitate adding/removing Grid resources, services, jobs, and users
GridPort Design Requirements (cont.)
• Security:
  • Support HTTPS/SSL encryption at all layers and provide access control; based on GSI
• Single login:
  • Required for easy access/navigation across Grid resources
• Client applications and portal services should be able to run on separate webservers:
  • Enables scientists to build their own application portals and use existing portals for common infrastructure services
  • Any site should be able to host a portal
  • Any user should be able to create their own portal if they have accounts and a certificate
• Adopt Global Grid Forum standards:
  • Actively collaborate in and promote Global Grid Forum activities
GridPort Layers
• Clients:
  • Web browsers, including PDA versions
  • Plan to expand to other wireless devices
• Application Portals:
  • Currently all exist on the same physical machine and share a domain (for cookies)
  • Served to clients by separate virtual webservers:
    • hotpage.npaci.edu or gridport.npaci.edu
  • All use the same instance of the GridPort libraries
  • Share data, libraries, filespace, and other services on the webserver machine
  • Single-login environment
GridPort Layers (cont.)
• Portal Services:
  • For portals and users
  • Managing session state, portal accounts, file collections
  • Monitoring the information services
  • Services that are portal-specific:
    • Not typically addressed by Grid or web developers
• Grid Services:
  • Standard middle and backend tiers of the Grid
  • Globus, Legion, SRB, NWS, AppLeS, and (someday) metaschedulers
• Resources:
  • Compute
  • Archival
GridPort Layers (cont.)
• [Schematic of the layered architecture: clients, application portals, portal services, Grid services, resources]
Commercial Technologies Employed
• Server:
  • Netscape or Apache servers
  • HTTPS, SSL, HTML/JavaScript, SSH
  • Perl 5.0/CGI
  • Database: flat text configuration files
    • Migrating to SQL/Oracle in limited cases (reliability)
    • Will use the DB to generate the text files
  • OS: Unix/Solaris, Linux
• Client:
  • Netscape Communicator, IE (4.0 or greater)
  • PC, Mac, Sun/Solaris, SGI
  • HTTPS, SSL, HTML/JavaScript (limited use)
Grid Technologies
• Globus/GRAM gatekeeper:
  • Used to run interactive jobs and tasks, and to submit batch jobs on remote resources
• Grid Security Infrastructure (GSI)/MyProxy:
  • Used for security and authentication
• Grid Information Systems/Grid Resource Information Service (GIS/GRIS):
  • Used for information services where available
• SDSC Storage Resource Broker (SRB):
  • Used for distributed file collection and management
• Key problem:
  • Not all partners install and maintain all software
Services Supported
• Portal user accounts:
  • On-line account/certificate creation → unique portal ID
  • Associate portal ID with DN
  • Associate DN with user IDs in mapfiles → authenticate (example below)
  • Track sessions, user preferences, distributed filespace
  • Portal users must have a valid PKI/GSI certificate:
    • Accepted certs: NPACI, Alliance, NASA/IPG, Cactus, Globus
  • This is a complex process, so it does not scale yet
• Authentication (2 ways):
  • Authentication against certificate data stored in the SDSC certificate repository
  • MyProxy server
• We save the proxy file for the duration of the session
• Sessions expire after a timeout period or when the user logs out
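For reference, the mapfile mentioned above follows the standard Globus grid-mapfile convention: one quoted certificate DN per line, followed by the local account it maps to. The DN and username below are made-up examples, not real NPACI entries:

    "/C=US/O=NPACI/OU=SDSC/CN=Jane User" juser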
Services Supported (cont.)
• Jobs:
  • Executed via the Globus/GRAM gatekeeper
  • Simple Unix-type commands:
    • mkdir, ls, rmdir, cd, and pwd (part of the API)
  • Compiling and running programs
  • Job and batch-script submission and deletion, and viewing of job status and history
• Files:
  • Access to compute, archival, and portal file space
  • File transfer:
    • Between the local workstation and the HPC resources
    • Between any 2 resources (via SRB)
  • Perform common file-management operations on remote files:
    • tar/untar, gzip/gunzip, and movement to archival storage
Resources Supported
• Compute:
  • IBM (Blue Horizon, SP)
  • Compaq (TCS1)
  • CRAY (T3E, T90)
  • Sun (E10K)
  • SGI (O2K)
  • Hewlett-Packard (V2500)
  • Workstations and clusters
  • Others
• Archival: HPSS, DMF, MASS
• Any system running Globus can be added
• Multiple sites, centers, and orgs:
  • PACI Grid: NPACI, Alliance, PSC, hopefully DTF
  • NASA/IPG
• Multiple sites/locations:
  • SDSC
  • NCSA
  • Pittsburgh Supercomputing Center
  • Universities: UT Austin, Univ. of Kentucky, Boston Univ.
Security Implementation
• Security between the client → web server → grid:
  • SSL/RC4-40 128-bit key / SSL RSA X.509 certificate
• GSI authentication used for all portal services:
  • Transparent access to the grid via GSI infrastructure
• Authentication tracked with cookies (see the sketch below):
  • Coupled to server DB/session tracking to maintain session state
  • Cookie is assigned a random value by the webserver at login
  • The random value in the cookie corresponds to a session file
  • The session file contains a timestamp
• Single-login environment:
  • Provides access to all NPACI resources where GSI is available
  • With full account-access privileges for the specific host
  • Within the same domain, because of cookies
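A minimal Perl/CGI sketch of the cookie-to-session-file check described above. The cookie name, repository path, timeout, and session-file layout (file named by the cookie value, first line holding the login timestamp) are all hypothetical stand-ins; the slides do not show GridPort's actual on-disk format:

    use CGI;

    my $TIMEOUT = 2 * 60 * 60;    # hypothetical two-hour session lifetime
    my $q       = CGI->new;
    my $token   = $q->cookie('GRIDPORT_SESSION') or deny('no session cookie');

    # The random cookie value names a session file kept outside the web tree.
    my $file = "/usr/local/gridport/sessions/$token";    # hypothetical path
    open(my $fh, '<', $file) or deny('unknown session');
    chomp(my $started = <$fh>);    # timestamp written at login
    close($fh);

    deny('session expired') if time() - $started > $TIMEOUT;
    # ...session is valid: act on the user's behalf via their GSI proxy...

    sub deny {
        my ($why) = @_;
        print "Content-type: text/plain\r\n\r\nLogin required: $why\n";
        exit;
    }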
Security Implementation (cont.)
• User authentication via valid proxy files:
  • Proxy generated from the key/cert pair or retrieved from the MyProxy server
  • Sensitive data (proxies) stored in a restricted-access portal repository:
    • Repository located outside the webserver filespace
    • Has user and group permissions control
• Portal acts as a proxy:
  • Executing requests on behalf of the user
  • Only what the user is authorized to access:
    • Based on credentials presented when the portal account was created
  • Users have the same level of access to a resource as if logged on directly
• Globus used for client requests on resources
• GSI used at all layers → forward the session proxy file
Applications Running on GridPort
• 2 approaches for portals:
  • Those developed by the NPACI team
  • Those developed by the application team/developer
• Application portals in production:
  • PACI HotPage, https://hotpage.paci.org
  • NPACI HotPage, https://hotpage.npaci.edu
  • Pharmacokinetic Modeling, https://gridport.npaci.edu/LAPK
  • General Atomic and Molecular Electronic Structure System, https://gridport.npaci.edu/GAMESS
• Portals developed by project application developers:
  • Basin, Bays to Estuaries (BBE) Project, http://bbe.npaci.edu
  • Protein Database/CE Portal, https://gridport.npaci.edu/CE
  • Telescience (9/30/01), https://gridport.npaci.edu/Telescience
Using GridPort
• Install the Perl libraries and GridPort code on the webserver
• The application portal developer incorporates the GridPort libraries directly into their code:
  • Can modify or add subroutines
• General pattern (for our dev team):
  • Uses between 3 and 6 lines of Perl code to access functions:
    • Jobs, files, auth, etc.
  • Each of the CGI scripts for application portals developed with GridPort follows this pattern
• An example: HotPage batch job submission (sketched below)
  • Contains three lines of code that reference GridPort
  • The other ~750 lines of code are HotPage-specific
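The slides do not show the actual calls, so the sketch below is illustrative only: the subroutine names, install path, and arguments are hypothetical stand-ins for the real GridPort API, meant only to show the "few lines of Perl inside a much larger CGI script" pattern:

    #!/usr/bin/perl
    # Hypothetical sketch: these names are NOT the published GridPort API.
    use lib '/usr/local/gridport/lib';    # hypothetical install path
    require 'gridport.pl';                # hypothetical library entry point

    my $host   = 'horizon.npaci.edu';     # hypothetical target resource
    my $script = 'job.sh';

    # The other ~750 lines of a portal CGI script build forms and render
    # HTML; only these three calls would touch GridPort.
    gridport_authenticate($ENV{HTTP_COOKIE}) or die "not logged in\n";
    gridport_file_put($script, "$host:~/$script");
    my $jobid = gridport_job_submit($host, "~/$script");

    print "Content-type: text/plain\r\n\r\nsubmitted job $jobid\n";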
HotPage View: Job Submission
• [Screenshot of the HotPage batch job submission page]
Laboratory for Applied Pharmacokinetics (LAPK)
• Community model portal:
  • Users are doctors, so it needs an extremely simple interface
  • Must be portable: run from many countries/labs
  • Needs to hide details such as:
    • Resources, files, batch scripts, compilation, the UNIX environment
• Uses gridport.npaci.edu portal services/capabilities:
  • File upload/download between local host/portal/HPC systems
  • Jobs:
    • Job submit (builds batch script, moves files, submits jobs)
    • Job tracking: moves results to user filespace when complete
    • Job cancel/delete
    • Job history: maintains relevant job information
• Impact:
  • LAPK users can now run multiple jobs at one time using the portal
GCE Web Services
• A new architecture for GCEs is emerging:
  • Workshop held at SDSC (May '01) to discuss this
  • Grid Portals Markup Language/XML
  • Constructing a GCE testbed
• Based on the 'web services' model that is currently evolving in the commercial world:
  • Sun Jxta, IBM WebSphere, Microsoft .NET
  • XML/SOAP/UDDI/WSDL
  • CCA (see Gannon's talk)
• In this experiment, our 'port' is a URL
• Key advantage: the client may be a web page/portal, another application, or a Grid service
  • Allows separation of the function of hosting the client from the service or application being used
A Web Services Experiment: GridPort Client Toolkit (GCT)
• Focus on medium/small applications and researchers
• Chose a simple protocol (HTTP/CGI/Perl)
• The client/application can be located on any server or system
• Connection to portal services is through the GCT (see the sketch below):
  • https://portals.npaci.edu/client/tools/FUNCTIONS
  • Inherits all existing portal services running on the portal:
    • Including authentication/single login
• It's easy:
  • Took 1 week to develop the GCT
• Key project goal:
  • Allow scientists to write local portals/apps/etc. and use these services
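A minimal sketch of what a GCT call over HTTPS might look like from a client portal using Perl's LWP. The function name (submit_job) and form parameters are hypothetical; the slides give only the .../client/tools/FUNCTIONS URL pattern:

    use LWP::UserAgent;    # HTTPS URLs also require LWP's SSL support

    my $ua  = LWP::UserAgent->new;
    my $res = $ua->post(
        'https://portals.npaci.edu/client/tools/submit_job',    # hypothetical function
        {
            session => 'abc123',               # placeholder token from GCT login
            host    => 'horizon.npaci.edu',    # hypothetical target resource
            script  => 'job.sh',
        },
    );
    die 'GCT call failed: ' . $res->status_line . "\n" unless $res->is_success;
    print $res->decoded_content;               # e.g. the queued job id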
Web Services Experiment: GridPort Client Toolkit (cont.)
• Ease of use:
  • Clients do not have to install complex code to get started:
    • No webservers, no Globus, no SSH, no SSL, no PKI, etc.
  • Do not have to write complex interface scripts to access these services (we've done that already)
  • Do not have to fund advanced web development teams
  • The client has local control over the project, including filespace, etc.
• Integration with existing portals has been done:
  • Basin, Bays to Estuaries project
Services Implemented in the GCT
• Authentication:
  • Login
  • Logout
  • Check authentication state
• Jobs:
  • Submit jobs to queues
  • Cancel jobs
  • Execute commands (command-line-like interface)
• Files:
  • Upload from the local host
  • Download to the local host
  • FTP: move files
  • View portal filespace
• Commands:
  • pwd
  • cd
  • whoami
  • etc.
Basin, Bays to Estuaries (BBE) Portal
• Community model: a scientific portal for conducting multi-model Earth System Science (ESS):
  • Simulations are run to forecast the transport of sediments within the San Diego Bay area during a storm
• Technology developed for the BBE project:
  • Website located on the BBE webserver/machine:
    • http://bbe.npaci.edu
  • Uses SRB for file management (GSI)
  • Perl/CGI-based portal
• Minimal effort required to modify the code:
  • Uses the GCT for all interactive functions
  • Hardest part was installing the Perl LWP module on the local system
  • Roughly 14 tests needed to integrate the GCT into the portal
  • 4 new Perl scripts required
Conclusions
• Remember your client:
  • Developer != User
• Robust portals can be built with simple technologies
• GridPort is a good 'jump-start' toolkit:
  • Promotes rapid deployment
• We need:
  • Grid accounts, so we don't have to update 10 billion mapfiles:
    • Common/shared security
  • Grid metaschedulers, so our users can run on the best available system
  • Grid-aware compilers
  • Grid information services that are fast
  • Grid web services
Future Work
• GridPort v3.0:
  • In the process of evaluating new technologies:
    • Java technologies to support CCA efforts
    • Considering a move away from Perl (XML incompatibilities are one reason)
  • Data portal technologies: SRB and GSI-FTP
  • Support personalization at the account level
  • Web services used by production application portals
• HotPage v3.0:
  • Expand to DTF and non-NSF PACI systems
  • Expand personalization → MyHotPage
  • Implement use of the NPACI Machines database
  • Update to accommodate Virtual Organization concepts
  • Automatic SRB collections for all users
New Directions
• Continue web services architecture research:
  • Collaboration with the GGF/GCE Research Area and working groups
  • GCE testbed plan underway:
    • USA: PACI, Alliance, NASA, Jefferson Lab, PNNL, others
    • Europe: Daresbury, Cactus, others?
• Collaboration with Sun & the Cal(IT)2 project
GridPort "Team"
• SDSC staff:
  • Mary Thomas
  • Steve Mock
  • Kurt Mueller
  • Maytal Dahan
  • Cathie Mills
  • Student interns: Ray Regno, Akhil Seth
• A collective effort, supported by SDSC services:
  • Server systems (Josh Polterock)
  • HPC systems (Victor Hazelwood)
  • Databases (Dave Archibal)
  • Distr. computing (Keith Thompson, Bill Link)
Acknowledgements
• San Diego Supercomputer Center and the NSF-funded PACI programs for their support (both with resources and staff)
• Grants:
  • NPACI, NSF-ACI-975249
  • NPACI-NSF-NASA IPG Project Supplement
  • Pharmacokinetic Modeling, NCRR Grant No. RR11526
  • NBCR, NIH/NCRR P41 RR08605-07
• Collaborators:
  • PACI partners
  • The User Portal Collaboration members: NASA/IPG, LBL, PNNL
  • The Globus team for providing valuable input and ideas on this project: Gregor von Laszewski, Carl Kesselman, and others
  • The Global Grid Forum GCE working group
References
• GridPort Toolkit website:
  • https://gridport.npaci.edu
• NPACI HotPage user portal:
  • HotPage: https://hotpage.npaci.edu
  • Accounts: http://hotpage.npaci.edu/accounts
• Downloads:
  • http://gridport.npaci.edu/downloads
    • GridPort Toolkit
    • NPACI HotPage
    • GCT Portal (frames-based)
• Contact:
  • Mary Thomas (mthomas@sdsc.edu)