Grid Computing Using Modern Technologies

A 3-Part Tutorial presented by:

Mary Thomas

Dennis Gannon

Geoffrey Fox

Tutorial Outline

  • Part I: Mary Thomas (TACC/UT)

    • Understanding Web and Grid portal technologies

    • Building application portals with GridPort

    • Grid Web services

  • Part II: Dennis Gannon (Indiana)

    • Distributed Software Components

    • Grid Web services and science portals for scientific applications

  • Part III: Geoffrey Fox (Indiana)

    • Integrating Peer-to-Peer networks and web services with the Grid

    • Grid Web services and Gateway

Introduction to Developing Web-Based Grid Computing Portals

Mary Thomas

Texas Advanced Computing Center

The University of Texas at Austin



Presented at GGF4, Toronto, Canada, Sunday, 2/15/02

Goals of Part 1

  • Introduce basic portal technologies and concepts

  • Provide enough knowledge to go out and begin the process of evaluating/understanding technologies

  • Provide enough knowledge to build a computing portal based on GridPort/Perl/CGI

  • Not going to teach you how to install Grid software: we assume you can do this or have someone who takes care of it

  • Audience: application developers or scientists

Approach

  • Introduce the concept of a portal for computational Grids and science applications

  • Use GridPort Toolkit to demonstrate the basic concepts needed to understand how to use the web and grid technologies

  • Show examples of how to program a portal

Outline

  • Defining Portals and Web Technologies

  • The GridPort Portal Architecture

    • GridPort-Based Portals

    • Application Portals

    • Programming Example

  • Portal Services for Remote Clients

    • Client Portal Services

    • Grid Portal Web Services

Web Server Technologies

  • Web Servers:

    • Run on a machine, and clients access the process

    • Common Versions:

      • Netscape

      • Apache (open source)

  • OS’s: Windows, Unix, Macintosh, Linux, etc.

  • Web Programming Languages

    • Server: Java, Javascript, Python, PHP, Perl

    • Client: HTML, Javascript

    • Protocols: HTTP/CGI/Servlets/Applets

  • Security

    • HTTPS, SSL, Encryption

    • Cookies

    • Certificates

Web Clients

  • Multiple display devices:

    • Desktop workstations, PC’s, PDA’s, cell phones, pagers, other wireless devices, televisions

  • Various viewing tools:

    • Browsers: Internet Explorer, Netscape Navigator, Opera

    • Visually Impaired tools

  • OS’s: Windows/WinCE, Unix, Mac, Linux, Palm, etc.

  • Web Programming Languages

    • HTML, Javascript

    • Perl, Java for ‘scrapers’

  • Security

    • HTTPS, SSL, Encryption, Cookies

    • Certificates

What is a Portal?

  • Web sites that provide centralized access to a set of resources

  • Characterized by

    • Personalization

    • Security/authentication/authorization

    • What you see often changes based on what you are looking for (e.g., ads)

    • Navigation/choices

  • Gateway for Web access to distributed resources/information

    • Hub from which users can locate all the Web content that they commonly need.

Classes of Portals

  • Horizontal or “mega-portals”:

    • information from search engines and the ISPs (Yahoo)

    • everybody comes in, sees the same thing

    • allow personalization to some degree

  • Vertical

    • portals that are customized by the system.

    • the system recognizes who you are, and gives you a different view of the university or the company that you're going to build.

    • More specialized (Amazon, Travelocity, etc.)

  • Intranet:

    • portals inside a company that give particular people the information that they need

Scientific Web Portals

  • Differ from Commercial Portals (Yahoo, Amazon)

  • Types of Science Portals:

    • User Portals:

      • simplify user’s ability to interact with and utilize a complex, often distributed environment

      • direct access to resources (compute, data, archival, instruments, and information)

    • Application Interfaces

      • Enables scientists to conduct simulations on multiple resources

    • EOT Portals

      • Educates public (future scientists?) about science using software simulations, visualizations, etc

      • Learning tools

    • Individual Portals

      • Users can roll out their own portals by writing web pages using standard HTML or Perl/CGI

Why Use Portals for Computational Science?

  • Computational science environment is complex:

    • Users have access to a variety of distributed resources (compute, storage, etc.).

    • Interfaces, OS’s, Grid tools to these resources vary and change often

    • Environment changes:

      • Relocation/upgrade of binaries

    • Policies at sites sometimes differ, allocations change

    • Using multiple resources can be cumbersome

  • Grid adds complexity for programmers

Portals Provide Simple Interfaces

  • Portals are web-based, which has advantages:

    • Users know & understand the web

  • Can serve as a layer in the middle-tier infrastructure of the Grid

    • Integrate various Grid services and resources

  • Users can be isolated from resource specific details

  • Single web interface isolates system changes/differences

  • Not an end-all solution: several issues/challenges remain

    • Performance, scalability

Virtual Organizations

  • GGF Model, based on Foster paper

    • “Anatomy of the Grid”

  • Hierarchical tree based

  • Each ‘node’ represents collections of:

    • Compute resources

    • Projects,

    • Centers

  • HotPage portals represent VO’s

    • Would like to build HiPCAT as a VO to run as a DTF node

Virtual Organizations (HotPage)











Portal Toolkits

  • Commercial:

    • Sun: Java Servlets, iPlanet

    • IBM: WebSphere

    • MSFT: .NET

  • Special interest groups:

    • uPortal, Javaspeed

  • R&D within Grid community:

    • GridPort Toolkit

    • GPDK

    • GridSphere (refer to ACM site???)

    • Gateway (Fox)

    • CCA (Gannon)

    GridPort Toolkit Design Concepts

    • Key design goals:

      • Any site should be able to host a user portal

      • Any user should be able to create their own user portal if they have accounts and a certificate

    • Key Requirements:

      • Base software design on infrastructure provided by World Wide Web:

        • use commodity technologies wherever possible

        • avoid shell programs/applications/applets

      • GridPort Toolkit should not require that additional services be run on the HPC Systems

        • reduce complexity -- there are enough of these already

        • so, leverage existing grid research & development

      • GSI certificate (PKI)

    GridPort Designed for Ease of Use

    • WWW interface is common, well understood, and pervasive

      • User Portals can be accessed by anyone who has access to a web browser, regardless of location

    • Users can construct customized application web pages:

      • only basic knowledge of HTML and Perl/CGI

    • Application programmers can extend the set of basic functions provided by the Toolkit

    • Portal services hosts can modify support services by adding/removing/modifying broker or grid interface codes

    Commodity Web Technologies

    • Use of commodity web technologies -> Portability

      • contribute to a ‘plug-n-play’ grid

    • Requirements:

      • Any Browser: Communicator, IE

      • HTTP, HTTPS, SSL, HTML/JavaScript, Perl/CGI, SSH, FTP

      • Netscape or Apache servers

    • Based on simple technology, this software is easily ported to, and used by other sites.

      • easy to modify and adapt to local site policies and requirements

    • Goal is to design a toolkit that is simple to implement, support, port, and develop

    Grid Technologies

    • Security:

      • Globus Grid Security Infrastructure (GSI), SSH

      • Myproxy for remote proxies

    • Job Execution:

      • Globus GRAM

    • Information:

      • Globus Grid Information Services (GIS)

    • File Management:

      • SDSC Storage Resource Broker (SRB)

      • GridFTP

    Information Services

    • Designed to provide a user-oriented interface to NPACI Resources and Services

    • Consists of on-line documentation, static informational pages, and links to events within NPACI, including basic user information such as:

      • Documentation, training, news, consulting

      • Simple tools:

        • application search

        • systems information

        • generation of batch scripts for all compute resources

        • Network Weather Service

    • No user authentication is required to view this information.

    Information Services: Dynamic

    • Dynamic information provided by automatic data collection scripts that provide real-time information for each machine (or summaries) such as:

      • Status Bar: displays live updates showing operational status and utilization of all NPACI resources

      • Machine Usage: displays summary of machine status, load, and batch queues

      • Batch Queues: displays currently executing and queued jobs

      • Node Maps: displays graphical map of how running applications are mapped to nodes

      • Network Weather Service: provides connectivity information between a user’s local host and grid resources

    • Pulled from 3 possible sources:

      • MDS, web services, local cron jobs

    Interactive Sessions

    • How do they work?

    • What do they do:

      • Job submission

      • File management

      • Authentication

    • What do we use to do them?

      • List of grid technologies

    Interactive Sessions: Login/Logout

    • Login:

      • client browser connects to an HTTPS server

      • user enters valid NPACI Portal account ID

      • log in to Portal using CA authentication (Globus):

        • Globus infrastructure manages user accounts on remote hosts

        • username used to map passphrase to valid key and certificate (local repository), or Myproxy (remote)

        • passphrase used to create proxy cert. using globus-proxy-init

        • if proxy-init successful, session key stored on client browser

        • data passed through web server over SSL channel

        • Session info stored in secure, local repository
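The last steps of the login sequence (a session key handed to the browser, session info kept on the server) can be sketched roughly as follows. This is an illustrative Python sketch, not GridPort code: Python is used only because the tutorial lists it among its web-service languages, and the `SESSIONS` dictionary, the `login` function, and the cookie name `session` are all assumptions. Proxy creation itself is represented by a plain boolean.

```python
import secrets
import time
from http.cookies import SimpleCookie

# Stand-in for the "secure, local repository" of session info (assumption).
SESSIONS = {}

def login(username, proxy_ok, now=None):
    """Return a cookie string for the browser on success, None on failure.

    proxy_ok stands for "proxy creation succeeded"; the real check is
    outside the scope of this sketch.
    """
    if not proxy_ok:
        return None
    key = secrets.token_hex(16)  # random session key
    SESSIONS[key] = {
        "user": username,
        "last_seen": time.time() if now is None else now,
    }
    cookie = SimpleCookie()
    cookie["session"] = key
    cookie["session"]["secure"] = True    # only send over HTTPS
    cookie["session"]["httponly"] = True  # hide from client-side scripts
    return cookie["session"].OutputString()

header = login("mthomas", proxy_ok=True, now=1000.0)
print(header)
```

Only the random key travels to the browser; everything else stays in the server-side store, which matches the split described on the slide.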

    Interactive Sessions: Login/Logout

    • Logout:

      • user is logged out when:

        • logout is selected

        • session times out

      • on logout:

        • active session data files cleared

        • relevant user information archived for future sessions

    Grid Security at all Layers

    • GSI authentication for all portal services

      • transparent access to the grid via GSI infrastructure

      • Security between the client -> web server -> grid:

        • SSL/RC4-40 128 bit key/ SSL RSA X509 certificate

      • authentication tracked with cookies coupled to server database/session tracking

    • Single login environment (another key goal)

      • provide access to all NPACI Resources where GSI available.

      • with full account access privileges for specific host

      • use client browser cookies to track state

    Portal Accounts

    • Portal accounts are not the same as resource accounts.

      • valid Grid user on resource, need allocations

      • processes run under own account with same access and privileges as if they had logged onto resource

    • Portal users must have a digital certificate signed by a known Certificate Authority (CA)

      • And must get DN into mapfile

    • Accounts for NPACI users obtained via an on-line web form:

      • Can generate a certificate - certificate and key are placed in a secure repository

    Interactive Sessions: Job Execution

    • Web server transactions:

      • confirm/authenticate user login status

      • parse command/request (CGI vars)

      • establish user environment

      • assemble remote command (Globus/SSH)

      • verify proxy (if Globus) or recreate

      • send command (e.g., Globus daemon on remote host)

      • parse, format, and return results to the web browser on the user’s workstation or store data (e.g., FTP).

    • While the user login is in the active state:

      • check for timeout

      • track current state

      • record information about job requests and user data for use in subsequent transactions or sessions.
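The bookkeeping done while a login is active (check for timeout, track state, record job requests) might look something like the sketch below. It is a minimal Python illustration under stated assumptions: the 30-minute `SESSION_TIMEOUT`, the session dictionary layout, and the function names `check_session` and `record_job` are hypothetical and do not come from GridPort.

```python
import time

SESSION_TIMEOUT = 30 * 60  # seconds; an assumed value, not a GridPort setting

def check_session(sessions, key, now=None):
    """Return the live session for key, or None if unknown or timed out."""
    now = time.time() if now is None else now
    session = sessions.get(key)
    if session is None:
        return None
    if now - session["last_seen"] > SESSION_TIMEOUT:
        del sessions[key]        # timed out: clear the active session data
        return None
    session["last_seen"] = now   # track current state
    return session

def record_job(session, request):
    """Remember a job request for use in later transactions or sessions."""
    session.setdefault("history", []).append(request)

sessions = {"abc": {"user": "mthomas", "last_seen": 0.0}}
live = check_session(sessions, "abc", now=60.0)
record_job(live, {"mach": "SSPHN", "queue": "normal", "cpus": 4})
expired = check_session(sessions, "abc", now=60.0 + SESSION_TIMEOUT + 1)
print(live["history"], expired)
```

Passing `now` explicitly keeps the timeout logic easy to exercise without waiting on the clock.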

    GridPort File System

    • Without SRB capabilities, files are distributed

    • Adds to complexities when migrating and managing data

    GridPort + SRB Architecture

    • With SRB capabilities, file access is direct

    • Single SRB account access allows for more flexible data management

    Variety of GridPort Applications

    • Current applications in production:

      • NPACI/PACI HotPages (also at PACI/NCSA)


      • LAPK Portal: Pharmacokinetic Modeling (live demo of Pharmacokinetic Modeling Portal)


      • GAMESS (General Atomic and Molecular Electronic Structure System)


      • Telescience (Ellisman)


      • Protein Data Bank CE Portal (Phil Bourne)


    Programming Example: Job Submit

    • Client:

      • Example of Client HTML page

      • HTML Code

    • Server:

      • Perl/CGI parser script running on server

      • GridPort Toolkit function code

    JobSubmit HTML Code

    <FORM action="" method="post" enctype="application/x-www-form-urlencoded" name="job_submit">

    Arguments: <INPUT TYPE="text" NAME="args">

    Select Queue: <SELECT NAME="queue">
    <OPTION VALUE="low">low</OPTION>
    <OPTION VALUE="normal">normal</OPTION>
    <OPTION VALUE="high">high</OPTION>
    <OPTION VALUE="express">express</OPTION>
    </SELECT>

    Number of CPUs: <INPUT TYPE="text" NAME="cpus">

    Max Time (min): <INPUT TYPE="text" NAME="max_time">

    <!-- target machine and executable are fixed by the page author -->
    <INPUT TYPE="hidden" NAME="mach" VALUE="SSPHN">
    <INPUT TYPE="hidden" NAME="exe" VALUE="/rmount/paci/sdsc/mthomas/mpi_pi">

    <INPUT TYPE="submit">

    </FORM>
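To make concrete what the server receives when this form is posted, the sketch below decodes an `application/x-www-form-urlencoded` body with Python's standard `urllib.parse`. It is an illustration only: Python is used because the tutorial lists it among its web-service languages, and the helper name `parse_job_form` and the sample body are assumptions, not part of GridPort.

```python
from urllib.parse import parse_qs

# Field names taken from the JobSubmit form.
FORM_FIELDS = ("args", "queue", "cpus", "max_time", "mach", "exe")

def parse_job_form(body):
    """Decode a urlencoded POST body into a flat dict of the form's fields."""
    decoded = parse_qs(body, keep_blank_values=True)
    # parse_qs returns a list per key; the form sends each field once.
    return {name: decoded.get(name, [""])[0] for name in FORM_FIELDS}

# Hypothetical body, encoded the way a browser would encode the form fields.
sample = ("args=1000&queue=normal&cpus=4&max_time=10"
          "&mach=SSPHN&exe=%2Frmount%2Fpaci%2Fsdsc%2Fmthomas%2Fmpi_pi")
job = parse_job_form(sample)
print(job["queue"], job["cpus"], job["exe"])
```

Missing fields come back as empty strings, so the caller can validate them in one place before building a job request.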


    JobSubmit: Server Perl/CGI Parser

    • Grabs the HTTP/CGI data, sends it to the GridPort subroutine, and waits for the results

      use CGI qw(:all);

      my $query = CGI->new;

      # work out the portal root from the directory this script runs in
      $MY_LOCATION = "tools/cgi-bin";

      chomp($CURRENT_DIR = `pwd`);   # strip the trailing newline from pwd

      ($PORTAL_ROOT, $rest) = split(/\Q$MY_LOCATION\E/, $CURRENT_DIR);

      # load the global settings, then the HotPage authentication code
      $GLOBAL_VARS_CONFIG = $PORTAL_ROOT . "cgi-bin/global_vars.cgi";

      require "$GLOBAL_VARS_CONFIG";

      require "$PORTAL_HOME_DIR/cgi-bin/hotpage_authen.cgi";


    JobSubmit: Server Perl/CGI code (cont.)

    # load in code to do job submission through globus
    require "$GRIDPORT_HOME_DIR/services/globus/cgi-bin/gridport_globus_job.cgi";

    # subroutines to get/set user directories (home, work, current) and do job handling
    require "$PORTAL_HOME_DIR/tools/cgi-bin/user_dirs.cgi";
    require "$PORTAL_HOME_DIR/tools/cgi-bin/user_jobs.cgi";

    # read the form fields (parameter names quoted rather than left as barewords)
    my $args     = $query->param('args');
    my $queue    = $query->param('queue');
    my $cpus     = $query->param('cpus');
    my $max_time = $query->param('max_time');
    $mach        = $query->param('mach');
    my $exe      = $query->param('exe');

    # append the user-supplied arguments to the executable path
    # (note: $args is raw user input and is not sanitized here)
    $exe = $exe . " $args";

    # run the command through Globus, trap output, return to caller process
    # (the third argument, 60, is the timeout handed to the job-submit routine)
    @output = gridport_globus_job_submit($mach,$cpus,60,$exe,$max_time,$queue);

    GridPort Globus Job Submit

    sub gridport_globus_job_submit {
        my @job  = ();
        my $user = &get_username();

        ### get the input and set up globus
        my ($mach, $cpus, $timeout, $exe, $max_cpu_time, $queue) = @_;
        &globus_config($user);   # verify data
        &mach_config($mach);     # verify data

        # build the globus command
        my $globus_submit = "$globus_job_submit{$machines{$mach}{gv}} ";
        $globus_submit .= "$machines{$mach}{name}{job} -np $cpus -queue $queue ";
        $globus_submit .= "-maxtime $max_cpu_time $exe";

        @job = run_command_timeout($globus_submit, $timeout);   # run job
        return @job;
    }
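The same build-then-run pattern can be sketched in Python, one of the languages the tutorial lists for its web-service work. This is an illustration under assumptions, not a port of GridPort: the function `build_submit_command`, the example command name, and the `host.example.org/jobmanager` contact string are placeholders, and only the `-np`, `-queue`, and `-maxtime` flags are taken from the Perl slide. Unlike the Perl version, the command is assembled as an argument list so that user input never passes through a shell.

```python
import shlex
import subprocess
import sys

def build_submit_command(submit_cmd, contact, cpus, queue, max_time, exe, args=""):
    """Assemble the submit command as an argument list (no shell involved)."""
    cmd = [submit_cmd, contact,
           "-np", str(int(cpus)),          # int() rejects non-numeric input
           "-queue", queue,
           "-maxtime", str(int(max_time)),
           exe]
    cmd += shlex.split(args)               # user arguments stay separate tokens
    return cmd

def run_command_timeout(cmd, timeout):
    """Run cmd and return its output lines; an empty list if it times out."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return []
    return done.stdout.splitlines()

# Placeholder command and contact string, for illustration only.
cmd = build_submit_command("globus-job-submit", "host.example.org/jobmanager",
                           cpus="4", queue="normal", max_time="10",
                           exe="/rmount/paci/sdsc/mthomas/mpi_pi", args="1000")
print(cmd)

# Exercise the timeout wrapper with a harmless local command.
print(run_command_timeout([sys.executable, "-c", "print('ok')"], timeout=30))
```

Returning an empty list on timeout is one possible convention; a portal would more likely report the failure back to the user.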

    Laboratory for Applied Pharmacokinetics

    • (LAPK) Portal:

      • Users are doctors, so they need an extremely simple interface

    • Must be portable – run from many countries

    • Need to hide details such as

      • Type of resources (T3E), file storage, batch script details, compilation, UNIX command line

    • Major Success:

      • LAPK users can now run multiple jobs at one time using portal.

      • Not possible before because developers had to keep codes & scripts simple enough for doctors to use on T3E

    Laboratory for Applied Pharmacokinetics

    • Uses portal services/capabilities:

      • File upload/download between local host/portal/HPC systems

      • Job Submit:

        • submission (builds batch script, moves files to resource, submit jobs)

        • Job tracking: in the background portal tracks jobs on system and moves results back over to portal storage when done

        • Job cancel/delete

      • Job History: maintains relevant job information

    Portal Services

    • How does one convert/modify existing applications?

      • Can develop your own version of GridPort

      • Can install GridPort or other toolkits

    • Remote clients are typically browsers accessing local portal (HotPage)

    • Application website located on ANY server:

      • either on local filespace/system where webserver and GridPort toolkit installed

      • Or, running on a remote machine

    • Want to allow remote users to have control over access, display, interactions, etc:

      • Need for a new service model

    Remote Portal Services

    • Must be Grid based:

      • Critical to support GSI

    • 2 current solutions (GridPort)

      • GridPort Client Toolkit

        • Allows client to use simple HTML to build remote web pages; limited to HTML/FORMS, CGI/JSP model

      • Grid Portal Web Services:

        • supports variety of clients

          • Can be an application

          • can be a portal server program

          • Can be another web service

    GridPort Client Toolkit

    • Focus on medium/small applications and researchers

      • Not all application scientists can afford to hire a web team

    • Base on simple protocols (HTTPS/CGI/Perl)

      • Could use applets or JSP

    • Connection to portal services is through the GCT:

      • GridPort Client Toolkit


      • Inherits all existing portal services running on portal

      • Limited job functions, but concept works and is needed

    • An Experiment in progress

      • not production yet

    GridPort Client Toolkit

    • Ease of use:

      • Do not have to install complex code to get started:

        • webservers, no Globus, no SSH, no SSL, no PKI, etc.

      • Do not have to write complex interface scripts to access these services (we’ve done that already)

      • Do not have to fund advanced web development teams

    • Client has local control over project, including filespace, etc.

    • Integration to existing portals can be done:

      • Bays to Estuaries project

    How Does GCT Work?

    [Figure: GCT workflow diagram; surviving labels: FORM/CGI action, SDSC Repository, Job Fail]


    Services Implemented in GCT

    • Check authentication state

    • Submit jobs to queues

    • Cancel jobs

    • Execute commands (command-line interface)

    • Upload from local host

    • Download to local host

    • FTP: move files

    • View portal filespace (?)

    Basin, Bays to Estuaries (BBE) Portal

    • Community model: scientific portal for conducting multi-model Earth System Science (ESS):

      • Simulations are run to forecast the transport of sediments within the San Diego Bay area during a storm.

    • Technology developed for the BBE project:

      • Website located on BBE webserver/machine

      • Uses SRB for file management (GSI)

      • Perl/CGI

    • Uses GCT for all interactive functions:

      • minimal effort required to modify code

      • roughly 14 tests needed to integrate GCT

      • four new perl scripts required 

    Grid Portals: the Problem

    • Example: portal or applications need to perform grid tasks for any arbitrary user, on any arbitrary resource, and span all ‘layers’ of the grid

      • portals must be ‘aware’ of resources (use GIS)

      • What grid services are running on that resource:

        • Globus/Legion/VegaGrid/SSH, etc

        • GIS

        • GSI/Kerberos, MyProxy

      • Request syntax differs for each resource:

        • GRAM/Legion/SSH/MAUI/PBS/Others

      • Portal must have permission to use/access for user (GSI, MyProxy)

    Grid Portals: Complexity Grows

    • Growth of Grid presents a huge complexity problem for developers and systems that does not scale

    • Portals interact with/integrate all layers:

      • GUI/Client interface

      • Uses all middleware services (Globus, SRB, GSI-FTP, etc.)

      • Each portal in the world must store and configure same data

      • Repeated data, open to errors, variations

      • Multiple programmers repeating same tasks and implementations

      • Much portal software is “hard-coded” and not dynamic

    • The Grid is international, so there is a need for scalable, interoperable services

    • Too much ‘hard-coding’ needed at this time (big issue for Portals)

    Web Services: a Proposed Solution

    • Web services architecture provides mechanisms for

      • dynamic service discovery (UDDI)

      • Separation of implementation from function (WSDL)

      • Known protocols (SOAP/HTTP, SOAP/RPC)

    • Service provider encapsulates implementation details

    • Client does not need to know details, just where to send the request

    • Challenge will be discovery (also a problem with Jini/CORBA)

    • Commercial world developing web services technologies in P2P world:

      • Implies funding/support

      • rapid development/technology advancement

      • Caution: this does NOT imply cohesiveness or standards

    • Note: in some ways, Globus/GRAM is a web service

    • Advantage: language independent, so can run on any system

      • We are pursuing Perl, Python, Java, C++ at this time

    Proposed Web Service Architecture

    • Adopt W3C standards for:

      • WSDL 1.1: current standard (note: uses old XML version)

      • SOAP 1.1: over HTTP/HTTPS/GSI for authentication

        • Java 1.3 or greater

        • Python 2.0 or greater

      • XML schemas for language/description

      • Explore UDDI 2.0/WSIL

    • Require:

      • GSI

    • Adopt “anatomy of the Grid model”

      • virtual organizations

      • Portals should be built as services in addition to applications
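As a rough picture of what a job-submit request could look like under this architecture, the sketch below builds a SOAP 1.1 envelope with Python's standard `xml.etree.ElementTree`. The envelope namespace is the standard SOAP 1.1 one; everything else is an assumption for illustration: the `urn:example:gridport-jobsubmit` service namespace, the `jobSubmit` operation name, and the parameter element names (borrowed from the HTML form fields) are hypothetical and do not describe an actual GridPort WSDL. Transport over HTTPS and GSI authentication are not shown.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
JOB_NS = "urn:example:gridport-jobsubmit"               # hypothetical service namespace

def build_job_submit_envelope(mach, cpus, queue, max_time, exe):
    """Return a SOAP 1.1 envelope (as a string) wrapping one jobSubmit call."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{JOB_NS}}}jobSubmit")
    params = (("mach", mach), ("cpus", cpus), ("queue", queue),
              ("max_time", max_time), ("exe", exe))
    for name, value in params:
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")

request_xml = build_job_submit_envelope(
    mach="SSPHN", cpus=4, queue="normal", max_time=10,
    exe="/rmount/paci/sdsc/mthomas/mpi_pi")
print(request_xml)
```

The client only needs the operation name, the parameters, and where to send the request; how the service turns them into a Globus or batch command stays hidden behind the interface, which is the separation the slide attributes to WSDL.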

    Web Services Example: Job Submit

    [Figure: job-submit workflow diagram; surviving label: (IBM SP)]

    Change in workflow model: before, each step represents code residing on local portal.

    [Figure (slide 68): layered Grid Web services architecture diagram; the layout did not survive extraction. Recoverable labels:]

    • Clients: Web Browser; Problem Solving Environments (AVS, SciRun, Cactus); Discipline / [truncated] (e.g. SDSC TeleScience)

    • Client access: http, https; JSPs format to HTML; Python, Java, Perl, etc.

    • Grid Web [truncated]: CoG Kits implementing Web Services in servlets, servers, etc.; Grid Web Service Description (WSDL) & Discovery (UDDI)

    • Hosting: Apache Tomcat & WebSphere & Cold Fusion = JVM + servlet instantiation + routing; Apache SOAP, .NET, etc.

    • Transport: XML / SOAP over Grid Security Infrastructure

    • Grid Services (Collective and Resource Access): Job Submission, File Transfer, Data Management, Replica Catalog, Secure Reliable Group Comm., Grid X.509 Certification Authority

    • Underlying layer: Grid Protocols and Grid Security Infrastructure

    • other services: visualization, interface builders, collaboration tools, numerical grid generators, etc.


    GridPort Team

    • GridPort Project represents collaboration efforts spanning TACC, SDSC, NPACI:

      • Mary Thomas, Rich Toscano (TACC)

      • Steve Mock, Maytal Dahan, Cathie Mills, Kurt Mueller (SDSC)

    • And input from other Institutions:

      • Argonne/ISI: Globus development team

      • NCSA/Alliance

      • NASA/IPG

      • GGF/GCE Interoperable Web Services Testbed

    References

    • GridPort Toolkit: Contact: Mary Thomas

      • HotPage User Portals

      • Downloads

        • GridPort Toolkit, NPACI HotPage, GCT Portal (frames based)

    • GGF/GCE website