
Grid Computing Using Modern Technologies

A 3-Part Tutorial presented by:

Mary Thomas

Dennis Gannon

Geoffrey Fox

Tutorial Outline
  • Part I: Mary Thomas (TACC/UT)
    • Understanding Web and Grid portal technologies
    • Building application portals with GridPort
    • Grid Web services
  • Part II: Dennis Gannon (Indiana)
    • Distributed Software Components
    • Grid Web services and science portals for scientific applications
  • Part III: Geoffrey Fox (Indiana)
    • Integrating Peer-to-Peer networks and web services with the Grid
    • Grid Web services and Gateway

Introduction to Developing Web-Based Grid Computing Portals

Mary Thomas

Texas Advanced Computing Center

The University of Texas at Austin



Presented at GGF4, Toronto, Canada, Sunday, 2/15/02

Goals of Part 1
  • Introduce basic portal technologies and concepts
  • Provide enough knowledge to go out and begin process of evaluating/understanding technologies
  • Provide enough knowledge to build a computing portal based on GridPort/Perl/CGI
  • Not going to teach you how to install Grid software: assume you can do this, or you have someone who takes care of it
  • Audience: application developers or scientists
  • Introduce the concept of a portal for computational Grids and science applications
  • Use GridPort Toolkit to demonstrate the basic concepts needed to understand how to use the web and grid technologies
  • Show examples of how to program a portal
  • Defining Portals and Web Technologies
  • The GridPort Portal Architecture
    • GridPort-Based Portals
    • Application Portals
    • Programming Example
  • Portal Services for Remote Clients
    • Client Portal Services
    • Grid Portal Web Services
Web Server Technologies
  • Web Servers:
    • Run on a machine, and clients access the process
    • Common Versions:
      • Netscape
      • Apache (open source)
  • OS’s: Windows, Unix, Macintosh, Linux, etc.
  • Web Programming Languages
    • Server: Java, Javascript, Python, PHP, Perl
    • Client: HTML, Javascript
    • Protocols: HTTP/CGI/Servlets/Applets
  • Security
    • HTTPS, SSL, Encryption
    • Cookies
    • Certificates
Web Clients
  • Multiple display devices:
    • Desktop workstations, PC’s, PDA’s, cell phones, pagers, other wireless devices, televisions
  • Various viewing tools:
    • Browsers: Internet Explorer, Netscape Navigator, Opera
    • Visually Impaired tools
  • OS’s: Windows/WinCE, Unix, Mac, Linux, Palm, etc.
  • Web Programming Languages
    • HTML, Javascript
    • Perl, Java for ‘scrapers’
  • Security
    • HTTPS, SSL, Encryption, Cookies
    • Certificates
What is a Portal?
  • Web sites that provide centralized access to a set of resources
  • Characterized by
    • Personalization
    • Security/authentication/authorization
    • What you see often changes based on what you are looking for (e.g., ads)
    • Navigation/choices
  • Gateway for Web access to distributed resources/information
    • Hub from which users can locate all the Web content that they commonly need.
Classes of portals
  • Horizontal or “mega-portals”:
    • information from search engines and the ISP's (yahoo)
    • everybody comes in, sees the same thing
    • allow personalization to some degree
  • Vertical
    • portals that are customized by the system.
    • the system recognizes who you are, and gives you a different view of the university or company whose portal you are using.
    • More specialized (amazon, travelocity, etc.)
  • Intranet:
    • portals inside a company that give particular people the information they need
Scientific Web Portals
  • Differ from Commercial Portals (yahoo, amazon)
  • Types of Science Portals:
    • User Portals:
      • simplify user’s ability to interact with and utilize a complex, often distributed environment
      • direct access to resources (compute, data, archival, instruments, and information)
    • Application Interfaces
      • Enables scientists to conduct simulations on multiple resources
    • EOT Portals
      • Educates public (future scientists?) about science using software simulations, visualizations, etc
      • Learning tools
    • Individual Portals
      • Users can roll out their own portals by writing web pages using standard HTML or Perl/CGI
Why Use Portals for Computational Science?
  • Computational science environment is complex:
    • Users have access to a variety of distributed resources (compute, storage, etc.).
    • Interfaces, OS’s, Grid tools to these resources vary and change often
    • Environment changes:
      • Relocation/upgrade of binaries
    • Policies at sites sometimes differ, allocations change
    • Using multiple resources can be cumbersome
  • Grid adds complexity for programmers
Portals Provide Simple Interfaces
  • Portals are web based and that has advantages -
    • Users know & understand the web
  • Can serve as a layer in the middle-tier infrastructure of the Grid
    • Integrate various Grid services and resources
  • Users can be isolated from resource specific details
  • Single web interface isolates system changes/differences
  • Not an end-all solution - several issues/challenges here
    • Performance, scalability
Virtual Organizations
  • GGF Model, based on Foster paper
    • “Anatomy of the Grid”
  • Hierarchical tree based
  • Each ‘node’ represents collections of:
    • Compute resources
    • Projects,
    • Centers
  • HotPage portals represent VO’s
    • Would like to build HiPCAT as a VO to run as a DTF node
Virtual Organizations (HotPage)
Portal Toolkits
  • Commercial:
    • Sun: Java Servlets, iPlanet
    • IBM: WebSphere
    • MSFT: .NET
  • Special interest groups:
    • uPortal, Javaspeed
  • R&D within Grid community:
    • GridPort Toolkit
    • GPDK
    • GridSphere (refer to ACM site?)
    • Gateway (Fox)
    • CCA (Gannon)
GridPort Toolkit Design Concepts
  • Key design goals:
    • Any site should be able to host a user portal
    • Any user should be able to create their own user portal if they have accounts and a certificate
  • Key Requirements:
    • Base software design on infrastructure provided by World Wide Web:
      • use commodity technologies wherever possible
      • avoid shell programs/applications/applets
    • GridPort Toolkit should not require that additional services be run on the HPC Systems
      • reduce complexity -- there are enough of these already
      • so, leverage existing grid research & development
    • GSI certificate (PKI)
GridPort Designed for Ease of Use
  • WWW interface is common, well understood, and pervasive
    • User Portals can be accessed by anyone who has access to a web browser, regardless of location
  • Users can construct customized application web pages:
    • only basic knowledge of HTML and Perl/CGI
  • Application programmers can extend the set of basic functions provided by the Toolkit
  • Portal services hosts can modify support services by adding, removing, or modifying broker or grid interface code
Commodity Web Technologies
  • Use of commodity web technologies -> Portability
    • contribute to a ‘plug-n-play’ grid
  • Requirements:
    • Any Browser: Communicator, IE
    • HTTP, HTTPS, SSL, HTML/JavaScript, Perl/CGI, SSH, FTP
    • Netscape or Apache servers
  • Based on simple technology, this software is easily ported to, and used by other sites.
    • easy to modify and adapt to local site policies and requirements
  • Goal is to design a toolkit that is simple to implement, support, port, and develop
Grid Technologies
  • Security:
    • Globus Grid Security Infrastructure (GSI), SSH
    • Myproxy for remote proxies
  • Job Execution:
    • Globus GRAM
  • Information:
    • Globus Grid Information Services (GIS)
  • File Management:
    • SDSC Storage Resource Broker (SRB)
    • GridFTP
Information Services
  • Designed to provide a user-oriented interface to NPACI Resources and Services
  • Consists of on-line documentation, static informational pages, and links to events within NPACI, including basic user information such as:
    • Documentation, training, news, consulting
    • Simple tools:
      • application search
      • systems information
      • generation of batch scripts for all compute resources
      • Network Weather Service
  • No user authentication is required to view this information.
Information Services: Dynamic
  • Dynamic information provided by automatic data collection scripts that provide real-time information for each machine (or summaries) such as:
    • Status Bar: displays live updates showing operational status and utilization of all NPACI resources
    • Machine Usage: displays summary of machine status, load, and batch queues
    • Batch Queues: displays currently executing and queued jobs
    • Node Maps: displays graphical map of how running applications are mapped to nodes
    • Network Weather Service: provides connectivity information between a user’s local host and grid resources
  • Pulled from 3 possible sources:
    • MDS, web services, local cron jobs
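
The three data sources listed above suggest a simple fallback chain. The sketch below is only an illustration, written in Python for brevity rather than the Perl used by GridPort. The function names, the order in which sources are tried, and the status dictionary are all assumptions; the fetchers are stubs standing in for real MDS, web service, and cron-file lookups.

```python
# Hypothetical fetchers: each returns a status dict, or None if the
# source is unavailable. Real versions would query MDS, call a web
# service, or read a file written by a local cron job.
def from_mds(host):
    return None  # stub: pretend the MDS query returned nothing


def from_web_service(host):
    return None  # stub: pretend the web service is down


def from_cron_cache(host):
    return {"host": host, "status": "up", "source": "cron"}


# Assumed preference order; the slides do not state one.
SOURCES = [from_mds, from_web_service, from_cron_cache]


def machine_status(host, sources=SOURCES):
    """Try each source in order and return the first answer found."""
    for fetch in sources:
        try:
            result = fetch(host)
        except Exception:
            result = None  # a failing source must not break the status page
        if result is not None:
            return result
    return {"host": host, "status": "unknown", "source": None}
```

Keeping the sources in a plain list means a site could reorder or drop them without touching the lookup logic.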
Interactive Sessions
  • How do they work?
  • What do they do:
    • Job submission
    • File management
    • Authentication
  • What do we use to do them?
    • List of grid technologies
Interactive Sessions: Login/Logout
  • Login:
    • client browser connects to an HTTPS server
    • user enters valid NPACI Portal account ID
    • log in to the Portal using CA authentication (Globus):
      • Globus infrastructure manages user accounts on remote hosts
      • username used to map passphrase to valid key and certificate (local repository), or Myproxy (remote)
      • passphrase used to create proxy cert. using globus-proxy-init
      • if proxy-init successful, session key stored on client browser
      • data passed through web server over SSL channel
      • Session info stored in secure, local repository
Interactive Sessions: Login/Logout
  • Logout:
    • user automatically logged out
      • if logout is selected
      • if the session times out
    • on logout:
      • active session data files cleared
      • relevant user information archived for future sessions
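
The login, timeout, and logout behavior described on these two slides can be sketched as a small server-side session store. This is a minimal illustration in Python, not GridPort's implementation: the class name, the in-memory dictionary, and the 30-minute default timeout are assumptions, and proxy creation and archiving are left out.

```python
import secrets
import time


class SessionStore:
    """In-memory stand-in for the portal's server-side session repository."""

    def __init__(self, timeout_seconds=1800, clock=time.time):
        self.timeout = timeout_seconds
        self.clock = clock      # injectable so the timeout can be tested
        self._sessions = {}     # session key -> (username, last activity)

    def login(self, username):
        """Call only after the proxy certificate was created successfully."""
        key = secrets.token_hex(16)  # random key to store in the browser cookie
        self._sessions[key] = (username, self.clock())
        return key

    def check(self, key):
        """Return the username for a live session, or None if missing or expired."""
        entry = self._sessions.get(key)
        if entry is None:
            return None
        username, last_seen = entry
        if self.clock() - last_seen > self.timeout:
            self.logout(key)    # timed out: same cleanup as an explicit logout
            return None
        self._sessions[key] = (username, self.clock())  # refresh activity time
        return username

    def logout(self, key):
        """Clear active session data (archiving of user info is omitted here)."""
        self._sessions.pop(key, None)
```

Only the random key travels to the browser; the username and activity time stay on the server, which matches the slides' description of a session key on the client coupled to a secure local repository.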
Grid Security at all Layers
  • GSI authentication for all portal services
    • transparent access to the grid via GSI infrastructure
    • Security between the client -> web server -> grid:
      • SSL/RC4-40 128 bit key/ SSL RSA X509 certificate
    • authentication tracked with cookies coupled to server data base/session tracking
  • Single login environment (another key goal)
    • provide access to all NPACI Resources where GSI available.
    • with full account access privileges for specific host
    • use client browser cookies to track state
Portal Accounts
  • Portal accounts are not the same as resource accounts.
    • valid Grid user on resource, need allocations
    • processes run under own account with same access and privileges as if they had logged onto resource
  • Portal users must have a digital certificate signed by a known Certificate Authority (CA)
    • And must get DN into mapfile
  • Accounts for NPACI users obtained via an on-line web form:
    • Can generate a certificate - certificate and key are placed in a secure repository
Interactive Sessions: Job Execution
  • Web server transactions:
    • confirm/authenticate user login status
    • parse command/request (CGI vars)
    • establish user environment
    • assemble remote command (Globus/SSH)
    • verify proxy (if Globus) or recreate
    • send command (e.g., Globus daemon on remote host)
    • parse, format, and return results to the web browser on the user’s workstation or store data (e.g., FTP).
  • While the user login is in the active state:
    • check for timeout
    • track current state
    • record information about job requests and user data for use in subsequent transactions or sessions.
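
The "parse command/request (CGI vars)" step is where user input first enters the portal, so it is a natural place to validate it before any remote command is assembled. The sketch below is a hedged illustration in Python. The field names follow the job-submit form shown later in this tutorial, while the defaults, the validation rules, and the function name are assumptions.

```python
from urllib.parse import parse_qs

# Queue names taken from the example job-submit form in this tutorial.
ALLOWED_QUEUES = {"low", "normal", "high", "express"}


def parse_job_request(body):
    """Parse a form-encoded POST body and validate it before building a command."""
    fields = {name: values[0] for name, values in parse_qs(body).items()}

    for required in ("mach", "exe"):
        if required not in fields:
            raise ValueError("missing field: " + required)

    queue = fields.get("queue", "normal")       # assumed default
    if queue not in ALLOWED_QUEUES:
        raise ValueError("unknown queue: " + queue)

    cpus = int(fields.get("cpus", "1"))         # int() rejects non-numeric input
    max_time = int(fields.get("max_time", "10"))
    if cpus < 1 or max_time < 1:
        raise ValueError("cpus and max_time must be positive")

    return {
        "mach": fields["mach"],
        "exe": fields["exe"],
        "args": fields.get("args", ""),
        "queue": queue,
        "cpus": cpus,
        "max_time": max_time,
    }
```

Rejecting bad values here keeps malformed or hostile input out of the later "assemble remote command" step.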
GridPort File System
  • Without SRB capabilities, files are distributed
  • Adds to complexities when migrating and managing data
GridPort + SRB Architecture
  • With SRB capabilities, file access is direct
  • Single SRB account access allows for more flexible data management
Variety of GridPort Applications
  • Current applications in production:
    • NPACI/PACI HotPages (also @PACI/NCSA )
    • LAPK Portal: Pharmacokinetic Modeling (live demo of Pharmacokinetic Modeling Portal)
    • GAMESS (General Atomic and Molecular Electronic Structure System)
    • Telescience (Ellisman)
    • Protein Data Bank CE Portal (Phil Bourne)
Programming Example: Job Submit
  • Client:
    • Example of Client HTML page
    • HTML Code
  • Server:
    • Perl/CGI parser script running on server
    • GridPort Toolkit function code
JobSubmit HTML Code

<FORM action="" method="post" enctype="application/x-www-form-urlencoded" name="job_submit">
  Arguments: <INPUT TYPE="text" NAME="args">
  Select Queue: <SELECT NAME="queue">
    <OPTION VALUE="low">low
    <OPTION VALUE="normal">normal
    <OPTION VALUE="high">high
    <OPTION VALUE="express">express
  </SELECT>
  Number of CPUs: <INPUT TYPE="text" NAME="cpus">
  Max Time (min): <INPUT TYPE="text" NAME="max_time">
  <INPUT TYPE="hidden" NAME="mach" VALUE="SSPHN">
  <INPUT TYPE="hidden" NAME="exe" VALUE="/rmount/paci/sdsc/mthomas/mpi_pi">
  <INPUT TYPE="submit">
</FORM>

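JobSubmit: Server Perl/CGI Parser
  • Grabs HTTP/CGI data, sends it to the GridPort subroutine, and waits for results

use CGI qw(:all);
my $query = CGI->new;

# locate the portal root from the current working directory
$MY_LOCATION = "tools/cgi-bin";
chomp($CURRENT_DIR = `pwd`);    # strip the trailing newline from pwd
($PORTAL_ROOT, $rest) = split(/\Q$MY_LOCATION\E/, $CURRENT_DIR);
$GLOBAL_VARS_CONFIG = $PORTAL_ROOT . "cgi-bin/global_vars.cgi";
require $GLOBAL_VARS_CONFIG;    # assumed: defines $PORTAL_HOME_DIR and $GRIDPORT_HOME_DIR

# load the portal authentication routines
require "$PORTAL_HOME_DIR/cgi-bin/hotpage_authen.cgi";
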
JobSubmit: Server Perl/CGI Code (cont.)

# load in code to do job submission through Globus
require "$GRIDPORT_HOME_DIR/services/globus/cgi-bin/gridport_globus_job.cgi";

# subroutines to get/set user directories (home, work, current) and do job handling
require "$PORTAL_HOME_DIR/tools/cgi-bin/user_dirs.cgi";
require "$PORTAL_HOME_DIR/tools/cgi-bin/user_jobs.cgi";

# read the form fields (names are quoted; barewords fail under "use strict")
my $args     = $query->param('args');
my $queue    = $query->param('queue');
my $cpus     = $query->param('cpus');
my $max_time = $query->param('max_time');
$mach        = $query->param('mach');
my $exe      = $query->param('exe');
$exe = $exe . " $args";    # append the user's arguments to the executable path

# run the command through Globus (60-second timeout), trap output, return to caller process
@output = gridport_globus_job_submit($mach, $cpus, 60, $exe, $max_time, $queue);

gridport_globus_job_submit

sub gridport_globus_job_submit {
    my @job  = ();
    my $user = &get_username();

    ### get the input and set up globus
    my ($mach, $cpus, $timeout, $exe, $max_cpu_time, $queue) = @_;
    &globus_config($user);    # verify data
    &mach_config($mach);      # verify data

    # build the globus command
    my $globus_submit = "$globus_job_submit{$machines{$mach}{gv}} ";
    $globus_submit .= "$machines{$mach}{name}{job} -np $cpus -queue $queue ";
    $globus_submit .= "-maxtime $max_cpu_time $exe";

    @job = run_command_timeout($globus_submit, $timeout);    # run job
    return @job;
}
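
The slides call run_command_timeout but do not show its body. The following Python sketch is a guess at what such a helper could look like, not the GridPort code: the return convention (output lines, or an empty list on timeout) is an assumption. It also takes the command as an argument list instead of one interpolated string, so form values are never re-parsed by a shell.

```python
import subprocess
import sys


def run_command_timeout(argv, timeout_seconds):
    """Run argv (a list, so no shell parsing) and return its output lines.

    Returns an empty list if the command exceeds the timeout.
    """
    try:
        done = subprocess.run(
            argv,
            capture_output=True,   # trap stdout/stderr instead of printing them
            text=True,             # decode bytes to str
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return []                  # the child is killed before this is raised
    return done.stdout.splitlines()


if __name__ == "__main__":
    # Stand-in command; a portal would pass the assembled submit command here.
    print(run_command_timeout([sys.executable, "-c", "print('job queued')"], 30))
```

Passing a list matters because the Perl example builds the command from user-supplied fields such as the arguments and queue name.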

Laboratory for Applied Pharmacokinetics
  • (LAPK) Portal:
    • Users are doctors, so they need an extremely simple interface
  • Must be portable – run from many countries
  • Need to hide details such as
    • Type of resources (T3E), file storage, batch script details, compilation, UNIX command line
  • Major Success:
    • LAPK users can now run multiple jobs at one time using portal.
    • Not possible before because developers had to keep codes & scripts simple enough for doctors to use on T3E
Laboratory for Applied Pharmacokinetics
  • Uses portal services/capabilities:
    • File upload/download between local host/portal/HPC systems
    • Job Submit:
      • submission (builds batch script, moves files to resource, submit jobs)
      • Job tracking: in the background portal tracks jobs on system and moves results back over to portal storage when done
      • Job cancel/delete
    • Job History: maintains relevant job information
Portal Services
  • How does one convert/modify existing applications?
    • Can develop your own version of GridPort
    • Can install GridPort or other toolkits
  • Remote clients are typically browsers accessing local portal (HotPage)
  • Application website located on ANY server:
    • either on local filespace/system where webserver and GridPort toolkit installed
    • Or, running on a remote machine
  • Want to allow remote users to have control over access, display, interactions, etc:
    • Need for a new service model
Remote Portal Services
  • Must be Grid based:
    • Critical to support GSI
  • 2 current solutions (GridPort)
    • GridPort Client Toolkit
      • Allows client to use simple HTML to build remote web pages; limited to HTML/FORMS, CGI/JSP model
    • Grid Portal Web Services:
      • supports variety of clients
        • Can be an application
        • can be a portal server program
        • Can be another web service
GridPort Client Toolkit
  • Focus on medium/small applications and researchers
    • Not all application scientists can afford to hire a web team
  • Based on simple protocols (HTTPS/CGI/Perl)
    • Could use applets or JSP
  • Connection to portal services is through the GCT:
    • GridPort Client Toolkit
    • Inherits all existing portal services running on portal
    • Limited job functions, but concept works and is needed
  • An Experiment in progress
    • not production yet
GridPort Client Toolkit
  • Ease of use:
    • Do not have to install complex code to get started:
      • webservers, no Globus, no SSH, no SSL, no PKI, etc.
    • Do not have to write complex interface scripts to access these services (we’ve done that already)
    • Do not have to fund advanced web development teams
  • Client has local control over project, including filespace, etc.
  • Integration to existing portals can be done:
    • Bays to Estuaries project
How Does GCT Work?
[Figure: GCT workflow diagram. Recoverable labels: FORM/CGI action; SDSC Repository; Job Fail]
Services Implemented in GCT
  • Check authentication state
  • Submit jobs to queues
  • Cancel jobs
  • Execute commands (command-line interface)
  • Upload from local host
  • Download to local host
  • FTP: move file
  • View portal filespace (?)
Basin, Bays to Estuaries (BBE) Portal
  • Community model: scientific portal for conducting multi-model Earth System Science (ESS):
    • Simulations are run to forecast the transport of sediments within the San Diego Bay area during a storm.
  • Technology developed for the BBE project:
    • Website located on BBE webserver/machine
    • Uses SRB for file management (GSI)
    • Perl/CGI
  • Uses GCT for all interactive functions:
    • minimal effort required to modify code
    • roughly 14 tests needed to integrate GCT
    • four new perl scripts required 
Grid Portals: the Problem
  • Example: portal or applications need to perform grid tasks for any arbitrary user, on any arbitrary resource, and span all ‘layers’ of the grid
    • portals must be ‘aware’ of resources (use GIS)
    • What grid services are running on that resource:
      • Globus/Legion/VegaGrid/SSH, etc
      • GIS
      • GSI/Kerberos, MyProxy
    • Request syntax differs for each resource:
      • GRAM/Legion/SSH/MAUI/PBS/Others
    • Portal must have permission to use/access for user (GSI, MyProxy)
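
Because request syntax differs per resource, a portal typically ends up with one command builder per backend and a lookup that picks between them. The Python sketch below illustrates that dispatch pattern only. The job dictionary, the builder names, and the registry are hypothetical; the Globus flags simply mirror the gridport_globus_job_submit example earlier in this tutorial and are not checked against any particular Globus release.

```python
def gram_command(job):
    # Flags copied from the tutorial's Perl example (-np, -queue, -maxtime).
    return ["globus-job-submit", job["host"],
            "-np", str(job["cpus"]),
            "-queue", job["queue"],
            "-maxtime", str(job["max_time"]),
            job["exe"]]


def ssh_command(job):
    # Plain remote execution: no queue or CPU count can be expressed here.
    return ["ssh", job["host"], job["exe"]]


# Hypothetical registry: one entry per kind of service a resource may run.
BUILDERS = {
    "gram": gram_command,
    "ssh": ssh_command,
}


def build_command(service, job):
    """Pick the builder for the service a resource runs and build its command."""
    try:
        builder = BUILDERS[service]
    except KeyError:
        raise ValueError("no command builder for service: " + service)
    return builder(job)
```

Every portal that hard-codes a table like BUILDERS has to maintain it by hand, which is the duplication problem the next slides describe.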
Grid Portals: Complexity Grows
  • Growth of Grid presents a huge complexity problem for developers and systems that does not scale
  • Portals interact with/integrate all layers:
    • GUI/client interface
    • Uses all middleware services (Globus, SRB, GSI-FTP, etc.)
    • Each portal in the world must store and configure same data
    • Repeated data, open to errors, variations
    • Multiple programmers repeating same tasks and implementations
    • Much portal software is “hard-coded” and not dynamic
  • The Grid is international, need for scalable, interoperable services
  • Too much ‘hard-coding’ needed at this time (big issue for Portals)
Web Services: a Proposed Solution
  • Web services architecture provides mechanisms for
    • dynamic service discovery (UDDI)
    • Separation of implementation from function (WSDL)
    • Known protocol (SOAP/HTTP, SOAP/RPC)
  • Service provider encapsulates implementation details
  • Client does not need to know details, just where to send the request
  • Challenge will be discovery: also a problem with Jini/CORBA
  • Commercial world developing web services technologies in P2P world:
    • Implies funding/support
    • rapid development/technology advancement
    • Caution: this does NOT imply cohesiveness or standards
  • Note: in some ways, Globus/GRAM is a web service
  • Advantage: language independent, so can run on any system
    • We are pursuing Perl, Python, Java, C++ at this time
Proposed Web Service Architecture
  • Adopt W3C standards for:
    • WSDL 1.1: current standard (note: uses old XML version)
    • SOAP 1.1: over HTTP/HTTPS/GSI for authentication
      • Java 1.3 or greater
      • Python 2.0 or greater
    • XML schemas for language/description
    • Explore UDDI 2.0/WSIL
  • Require:
    • GSI
  • Adopt “Anatomy of the Grid” model
    • virtual organizations
    • Portals to be built as services in addition to applications
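
To make the SOAP 1.1 choice concrete, here is a minimal Python sketch of a job-submit request wrapped in a SOAP envelope. Only the envelope namespace is standard. The service namespace, the jobSubmit operation, and its child element names are hypothetical stand-ins, since the slides do not give the actual WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
JOB_NS = "urn:example:gridport-job"                     # hypothetical service namespace


def build_job_submit_envelope(mach, exe, cpus, queue):
    """Return a SOAP 1.1 envelope (as text) carrying one jobSubmit call."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    call = ET.SubElement(body, "{%s}jobSubmit" % JOB_NS)
    # Parameter names reuse the form fields from the job-submit example.
    for name, value in (("mach", mach), ("exe", exe),
                        ("cpus", cpus), ("queue", queue)):
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_job_submit_envelope("SSPHN", "/bin/date", 4, "normal"))
```

The client only needs this message format and an endpoint address; how the provider turns it into a Globus or SSH request stays hidden behind the service.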
Web Services Example: Job Submit
  • Change in workflow model: before, each step represents code residing on local portal.

[Figure: layered Grid web services architecture diagram. Recoverable labels: Web Browser; Problem Solving Environments (AVS, SciRun, Cactus); Discipline (e.g. SDSC TeleScience); http, https; JSPs format to HTML; Python, Java, Perl, etc.; CoG Kits implementing Web Services in servlets, servers, etc.; Apache Tomcat & WebSphere & Cold Fusion = JVM + servlet instantiation + routing; Apache SOAP, .NET, etc.; Grid Web Service Description (WSDL) & Discovery (UDDI); XML / SOAP over Grid Security Infrastructure; Grid Web; Job Submission; File Transfer; Data Management; Replica Catalog; other services (visualization, interface builders, collaboration tools, numerical grid generators, etc.); Secure, Reliable Group Comm.; Grid X.509 Certification Authority; Grid Protocols and Grid Security Infrastructure; Grid Services: Collective and Resource Access]

GridPort Team
  • GridPort Project represents collaboration efforts spanning TACC, SDSC, NPACI:
    • Mary Thomas, Rich Toscano (TACC)
    • Steve Mock, Maytal Dahan, Cathie Mills, Kurt Mueller (SDSC)
  • And input from other Institutions:
    • Argonne/ISI: Globus development team
    • NCSA/Alliance
    • NASA/IPG
    • GGF/GCE Interoperable Web Services Testbed
  • GridPort Toolkit: Contact: Mary Thomas
    • HotPage User Portals
    • Downloads
      • GridPort Toolkit, NPACI HotPage, GCT Portal (frames based)
  • GGF/GCE website: