goals
Goals
  • Provide a comprehensive survey of the state of the art in Grids & Grid technologies
  • Present concepts that can help categorize diverse Grid technologies and projects
  • Review major Grid projects & contributors
  • Explain relationships with major industrial technologies
  • Introduce the activities of the Global Grid Forum
tutorial overview
Tutorial Overview
  • The opportunity
  • Requirements
  • The systems problem
  • The programming problem
  • Major Grid R&D projects
  • Grids and industry
  • Organization
  • Recap and conclusions
why grids
Why Grids?
  • A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour
  • 1,000 physicists worldwide pool resources for petaop analyses of petabytes of data
  • Civil engineers collaborate to design, execute, & analyze shake table experiments
  • Climate scientists visualize, annotate, & analyze terabyte simulation datasets
  • An emergency response team couples real time data, weather model, population data
why grids contd
Why Grids? (contd)
  • A multidisciplinary analysis in aerospace couples code and data in four companies
  • A home user invokes architectural design functions at an application service provider
  • An application service provider purchases cycles from compute cycle providers
  • Scientists working for a multinational soap company design a new product
  • A community group pools members’ PCs to analyze alternative designs for a local road
the fundamental concept
The Fundamental Concept

Enable communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals—in the absence of central control, omniscience, trust relationships

why now
Why Now?
  • Moore’s law improvements in computing produce highly functional end systems
  • The Internet and burgeoning wired and wireless networks provide universal connectivity
  • Changing modes of working and problem solving emphasize teamwork, computation
  • Network exponentials produce dramatic changes in geometry and geography
network exponentials
Network Exponentials
  • Network vs. computer performance
    • Computer speed doubles every 18 months
    • Network speed doubles every 9 months
    • Difference = order of magnitude per 5 years (see the check below)
  • 1986 to 2000
    • Computers: x 500
    • Networks: x 340,000
  • 2001 to 2010
    • Computers: x 60
    • Networks: x 4000
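
A quick sanity check of the “order of magnitude per 5 years” claim, assuming only the doubling times quoted above (18 months for computers, 9 months for networks):

\[
\frac{\text{network speedup}}{\text{computer speedup}}
= \frac{2^{60/9}}{2^{60/18}}
= 2^{60/18}
\approx 2^{3.3}
\approx 10
\quad \text{over 60 months.}
\]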

Moore’s Law vs. storage improvements vs. optical improvements. Graph from Scientific American (Jan-2001) by Cleo Vilett, source Vinod Khosla, Kleiner Perkins Caufield & Byers.

one view of requirements
One View of Requirements
  • Identity & authentication
  • Authorization & policy
  • Resource discovery
  • Resource characterization
  • Resource allocation
  • (Co-)reservation, workflow
  • Distributed algorithms
  • Remote data access
  • High-speed data transfer
  • Performance guarantees
  • Monitoring
  • Adaptation
  • Intrusion detection
  • Resource management
  • Accounting & payment
  • Fault management
  • System evolution
  • Etc.
another view three obstacles to making grid computing routine


Another View: “Three Obstacles to Making Grid Computing Routine”
  • New approaches to problem solving
    • Data Grids, distributed computing, peer-to-peer, collaboration grids, …
  • Structuring and writing programs
    • Abstractions, tools
  • Enabling resource sharing across distinct institutions
    • Resource discovery, access, reservation, allocation; authentication, authorization, policy; communication; fault detection and notification; …
resource
Resource
  • An entity that is to be shared
    • E.g., computers, storage, data, software
  • Does not have to be a physical entity
    • E.g., Condor pool, distributed file system, …
  • Defined in terms of interfaces, not devices
    • E.g., schedulers such as LSF and PBS define a compute resource
    • Open/close/read/write define access to a distributed file system, e.g. NFS, AFS, DFS
network protocol
Network Protocol
  • A formal description of message formats and a set of rules for message exchange
    • Rules may define sequence of message exchanges
    • Protocol may define state-change in endpoint, e.g., file system state change
  • Good protocols designed to do one thing
    • Protocols can be layered
  • Examples of protocols
    • IP, TCP, TLS (was SSL), HTTP, Kerberos
network enabled service

[Diagram: two protocol stacks for network-enabled services: an FTP server whose stack is FTP over Telnet over TLS over TCP over IP, and a Web server whose stack is HTTP over TCP over IP.]

Network Enabled Service
  • A protocol implementation defining a set of capabilities
    • Protocol defines interaction with service
    • All services require protocols
    • Not all protocols are used to provide services (e.g. IP, TLS)
  • Examples: FTP and Web servers
application programmer interface
Application Programmer Interface
  • A specification for a set of routines to facilitate application development
    • Refers to definition, not implementation
    • E.g., there are many MPI implementations
  • Spec often language-specific (or IDL)
    • Routine name, number, order and type of arguments; mapping to language constructs
    • Behavior or function of routine
  • Examples
    • GSS API (security), MPI (message passing)
software development kit
Software Development Kit
  • A particular instantiation of an API
  • SDK consists of libraries and tools
    • Provides implementation of API specification
  • Can have multiple SDKs for an API
  • Examples of SDKs
    • MPICH, Motif Widgets
syntax
Syntax
  • Rules for encoding information, e.g.
    • XML, Condor ClassAds, Globus RSL
    • X.509 certificate format (RFC 2459)
    • Cryptographic Message Syntax (RFC 2630)
  • Distinct from protocols
    • One syntax may be used by many protocols (e.g., XML); & useful for other purposes
  • Syntaxes may be layered
    • E.g., Condor ClassAds -> XML -> ASCII
    • Important to understand layerings when comparing or evaluating syntaxes
a protocol can have multiple apis e g tcp ip
A Protocol can have Multiple APIs, e.g., TCP/IP
  • TCP/IP APIs include BSD sockets, Winsock, System V streams, …
  • The protocol provides interoperability: programs using different APIs can exchange information
    • I don’t need to know the remote user’s API

[Diagram: two applications, one written to the WinSock API and one to the Berkeley Sockets API, interoperating over the TCP/IP protocol (reliable byte streams); a sketch of the sockets side follows.]
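
To make the point concrete, here is a minimal sketch (not from the original slides) of the Berkeley Sockets side of the picture: a client that opens a TCP connection and sends a line of text. The address and port are illustrative. A peer written against WinSock, or any other TCP/IP API, can receive the bytes unchanged, because interoperability comes from the shared protocol, not from the API.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    /* Create a TCP socket (the API is BSD sockets; the protocol is TCP/IP). */
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    /* Address of a hypothetical peer; 127.0.0.1:9000 is just for illustration. */
    struct sockaddr_in peer;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(9000);
    inet_pton(AF_INET, "127.0.0.1", &peer.sin_addr);

    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
        perror("connect");
        return 1;
    }

    /* The bytes on the wire are defined by TCP/IP, so the receiving program
       may use WinSock, System V streams, or any other API. */
    const char *msg = "hello from a BSD-sockets client\n";
    write(fd, msg, strlen(msg));
    close(fd);
    return 0;
}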

an api can have multiple protocols e g message passing interface

[Diagram: two applications, both written to the MPI API, one linked against the LAM SDK and one against the MPICH-P4 SDK; each SDK speaks its own protocol (different message formats, exchange sequences, etc.) over TCP/IP, so the two cannot interoperate.]

An API can have Multiple Protocols, e.g., Message Passing Interface
  • MPI provides portability: any correct program compiles & runs on any platform
  • Does not provide interoperability: all processes must link against the same SDK
    • E.g., MPICH and LAM versions of MPI (a minimal sketch follows)
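
A minimal sketch (ours, not from the slides) illustrating the portability point: this source compiles and runs against MPICH, LAM, or any other MPI SDK without change, but all of the communicating processes must be launched with the same SDK.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, dest;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        int value = 42;
        /* Send one integer to each other rank; the wire format is
           SDK-specific, which is why MPI gives portability across
           platforms but not interoperability across SDKs. */
        for (dest = 1; dest < size; dest++)
            MPI_Send(&value, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
    } else {
        int value;
        MPI_Status status;
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("rank %d of %d received %d\n", rank, size, value);
    }

    MPI_Finalize();
    return 0;
}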
the systems problem resource sharing mechanisms that
The Systems Problem: Resource Sharing Mechanisms That …
  • Address security and policy concerns of resource owners and users
  • Are flexible enough to deal with many resource types and sharing modalities
  • Scale to large number of resources, many participants, many program components
  • Operate efficiently when dealing with large amounts of data & computation
aspects of the systems problem
Aspects of the Systems Problem
  • Need for interoperability when different groups want to share resources
    • Diverse components, policies, mechanisms
    • E.g., standard notions of identity, means of communication, resource descriptions
  • Need for shared infrastructure services to avoid repeated development, installation
    • E.g., one port/service/protocol for remote access to computing, not one per tool/application
    • E.g., Certificate Authorities: expensive to run
  • A common need for protocols & services
hence a protocol oriented view of grid architecture that emphasises
Hence, a Protocol-Oriented View of Grid Architecture that Emphasises …
  • Development of Grid protocols & services
    • Protocol-mediated access to remote resources
    • New services: e.g., resource brokering
    • “On the Grid” = speak Intergrid protocols
    • Mostly (extensions to) existing protocols
  • Development of Grid APIs & SDKs
    • Facilitate application development by supplying higher-level abstractions
  • The (hugely successful) model is the Internet
layered grid architecture by analogy to internet architecture

Layered Grid Architecture (By Analogy to Internet Architecture)
[Diagram: the Grid protocol layers, with the corresponding Internet protocol architecture layers alongside.]
  • Application (as in the Internet architecture)
  • Collective: “Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services
  • Resource: “Sharing single resources”: negotiating access, controlling use
  • Connectivity: “Talking to things”: communication (Internet protocols) & security (cf. the Internet Transport and Internet layers)
  • Fabric: “Controlling things locally”: access to, & control of, resources (cf. the Internet Link layer)
grid architecture review
Grid Architecture Review
  • We now illustrate this architecture by describing a representative set of protocols
  • Several provided by Globus Toolkit, which
    • Defines, and provides quality reference implementations of key Grid protocols
    • Has been adopted as infrastructure by majority of major Grid projects
grid services architecture fabric layer protocols services
Grid Services Architecture: Fabric Layer Protocols & Services
  • Just what you would expect: the diverse mix of resources that may be shared
    • Individual computers, Condor pools, file systems, archives, metadata catalogs, networks, sensors, etc., etc.
  • Few constraints on low-level technology: connectivity and resource level protocols form the “neck in the hourglass”
  • Defined by interfaces not physical characteristics
grid services architecture connectivity layer protocols services
Grid Services Architecture: Connectivity Layer Protocols & Services
  • Communication
    • Internet protocols: IP, DNS, routing, etc.
  • Security: Grid Security Infrastructure (GSI)
    • Uniform authentication & authorization mechanisms in multi-institutional setting
    • Single sign-on, delegation, identity mapping
    • Public key technology, SSL, X.509, GSS-API
    • Supporting infrastructure: Certificate Authorities, key management, etc.

GSI: www.globus.org

why grid security is hard
Why Grid Security is Hard
  • Resources being used may be extremely valuable & the problems being solved extremely sensitive
  • Resources are often located in distinct administrative domains
    • Each resource may have own policies & procedures
  • The set of resources used by a single computation may be large, dynamic, and/or unpredictable
    • Not just client/server
  • It must be broadly available & applicable
    • Standard, well-tested, well-understood protocols
    • Integration with wide variety of tools
grid security requirements
Grid Security Requirements

User View
  1) Easy to use
  2) Single sign-on
  3) Run applications: ftp, ssh, MPI, Condor, Web, …
  4) User-based trust model
  5) Proxies/agents (delegation)

Resource Owner View
  1) Specify local access control
  2) Auditing, accounting, etc.
  3) Integration with local systems: Kerberos, AFS, license managers
  4) Protection from compromised resources

Developer View
  API/SDK with authentication, flexible message protection, flexible communication, delegation, …
  Direct calls to various security functions (e.g., GSS-API), or security integrated into higher-level SDKs, e.g., GlobusIO, Condor-G, MPICH-G2, HDF5, etc.

grid security infrastructure gsi
Grid Security Infrastructure (GSI)
  • Extensions to existing standard protocols & APIs
    • Standards: SSL/TLS, X.509 & CA, GSS-API
    • Extensions for single sign-on and delegation
  • Globus Toolkit reference implementation of GSI
    • SSLeay/OpenSSL + GSS-API + delegation
    • Tools and services to interface to local security
      • Simple ACLs; SSLK5 & PKINIT for access to K5, AFS, etc.
    • Tools for credential management
      • Login, logout, etc.
      • Smartcards
      • MyProxy: Web portal login and delegation
      • K5cert: Automatic X.509 certificate creation
gsi in action create processes at a and b that communicate access files at c

GSI in Action: “Create Processes at A and B that Communicate & Access Files at C”
[Diagram: single sign-on via “grid-id” generates a proxy credential for the user (or retrieves one from an online repository). The user proxy issues remote process-creation requests, with mutual authentication, to GSI-enabled GRAM servers at Site A (Kerberos) and Site B (Unix); each server authorizes the request, maps it to a local id, creates the process, and generates credentials (at Site A via a Kerberos ticket). The two processes communicate, again with mutual authentication, and use restricted proxies to issue a remote file-access request to a GSI-enabled FTP server at Site C (Kerberos), which authorizes, maps to a local id, and accesses the file on the storage system.]

gsi working group documents
GSI Working Group Documents
  • Grid Security Infrastructure (GSI) Roadmap
    • Informational draft overview of working group activities and documents
  • Grid Security Protocols & Syntax
    • X.509 Proxy Certificates
    • X.509 Proxy Delegation Protocol
    • The GSI GSS-API Mechanism
  • Grid Security APIs
    • GSS-API Extensions for the Grid
    • GSI Shell API
gsi futures
GSI Futures
  • Scalability in numbers of users & resources
    • Credential management
    • Online credential repositories (“MyProxy”)
    • Account management
  • Authorization
    • Policy languages
    • Community authorization
  • Protection against compromised resources
    • Restricted delegation, smartcards
community authorization prototype shown august 2001

Community Authorization (Prototype shown August 2001)
[Diagram: 1) the user sends a CAS request, naming resources and operations, to the CAS server, which consults user/group membership and collective policy information to decide whether the collective authorizes this request for this user; 2) the CAS reply carries a capability plus resource CA and membership information; 3) the user sends a resource request, authenticated with the capability, to the resource, which checks its local policy information: is the request authorized by the capability, and is it authorized for the CAS?; 4) the resource replies.]
Laura Pearlman, Steve Tuecke, Von Welch, others

grid services architecture resource layer protocols services
Grid Services Architecture: Resource Layer Protocols & Services
  • Grid Resource Access and Mgmt (GRAM)
    • Remote allocation, reservation, monitoring, control of compute resources
  • GridFTP protocol (FTP extensions)
    • High-performance data access & transport
  • Grid Resource Information Service (GRIS)
    • Access to structure & state information
  • Network reservation, monitoring, control
  • All integrated with GSI: authentication, authorization, policy, delegation
resource management problem
Resource Management Problem
  • Enabling secure, controlled remote access to computational resources and management of remote computation
    • Authentication and authorization
    • Resource discovery & characterization
    • Reservation and allocation
    • Computation monitoring and control
  • Addressed by new protocols & services
    • GRAM protocol as a basic building block
    • Resource brokering & co-allocation services
    • GSI for security, MDS for discovery
gram protocol
GRAM Protocol
  • Simple HTTP-based RPC
    • Job request
      • Returns a “job contact”: Opaque string that can be passed between clients, for access to job
    • Job cancel
    • Job status
    • Job signal
    • Event notification (callbacks) for state changes
      • Pending, active, done, failed, suspended
  • Moving to SOAP (more later…)
gridftp basic approach
GridFTP: Basic Approach
  • FTP is defined by several IETF RFCs
  • Start with most commonly used subset
    • Standard FTP: get/put etc., 3rd-party transfer
  • Implement standard but often unused features
    • GSS binding, extended directory listing, simple restart
  • Extend in various ways, while preserving interoperability with existing servers
    • Parameter set/negotiate, parallel transfers (multiple TCP streams), striped transfers (multiple hosts), partial file transfers, automatic & manual TCP buffer setting, progress monitoring, extended restart (via plug-ins)
grid information service discovery monitoring
Grid Information Service: Discovery & Monitoring
  • Provide access to static and dynamic information regarding system components
    • Large numbers of “sensors”, in resources, services, applications, etc.
  • A basis for configuration and adaptation in heterogeneous, dynamic environments
  • Requirements and characteristics
    • Uniform, flexible access to information
    • Scalable, efficient access to dynamic data
    • Access to multiple information sources
    • Decentralized maintenance
grid services architecture collective layer protocols services
Grid Services Architecture: Collective Layer Protocols & Services
  • Index servers aka metadirectory services
    • Custom views on dynamic resource collections assembled by a community
  • Resource brokers (e.g., Condor Matchmaker)
    • Resource discovery and allocation
  • Replica catalogs
  • Co-reservation and co-allocation services
  • Etc., etc.
the programming problem1
The Programming Problem
  • How the @#$@ do I develop robust, secure, long-lived applications for dynamic, heterogeneous Grids?
  • I need, presumably:
    • Abstractions and models to add to speed/robustness/etc. of development
    • Tools to ease application development and diagnose common problems
    • Code/tool sharing to allow reuse of code components developed by others
grid programming technologies
Grid Programming Technologies
  • “Grid applications” are incredibly diverse (data, collaboration, computing, sensors, …)
    • Seems unlikely there is one solution
  • Most applications have been written “from scratch,” with or without Grid services
  • Application-specific libraries have been shown to provide significant benefits
  • No new language, programming model, etc., has yet emerged that transforms things
    • But certainly still quite possible
examples of grid programming technologies
Examples of GridProgramming Technologies
  • MPICH-G2: Grid-enabled message passing
  • CoG Kits, GridPort: Portal construction, based on N-tier architectures
  • GDMP, Data Grid Tools, SRB: replica management, collection management
  • Condor-G: simple workflow management
  • Legion: object models for Grid computing
  • Cactus: Grid-aware numerical solver framework
    • Note tremendous variety, application focus
mpich g2 a grid enabled mpi
MPICH-G2: A Grid-Enabled MPI
  • A complete implementation of the Message Passing Interface (MPI) for heterogeneous, wide area environments
    • Based on the Argonne MPICH implementation of MPI (Gropp and Lusk)
  • Globus services for authentication, resource allocation, executable staging, output, etc.
  • Programs run in wide area without change
  • See also: MetaMPI, PACX, STAMPI, MAGPIE

www.globus.org/mpi

grid based computation challenges
Grid-based Computation: Challenges
  • Locate “suitable” computers
  • Authenticate with appropriate sites
  • Allocate resources on those computers
  • Initiate computation on those computers
  • Configure those computations
  • Select “appropriate” communication methods
  • Compute with “suitable” algorithms
  • Access data files, return output
  • Respond “appropriately” to resource changes
mpich g2 use of grid services

MPICH-G2 Use of Grid Services

% grid-proxy-init
% mpirun -np 256 myprog

[Diagram: mpirun generates a resource specification and passes it to globusrun, which submits multiple jobs. MDS locates hosts, GASS stages executables, and DUROC coordinates startup. GRAM authenticates, initiates each job (via fork, LSF, LoadLeveler, …), detects termination, and monitors/controls the processes (P1, P2, …), which communicate via vendor MPI and TCP/IP (globus-io).]

cactus allen dramlitsch seidel shalf radke
Cactus (Allen, Dramlitsch, Seidel, Shalf, Radke)
  • Modular, portable framework for parallel, multidimensional simulations
  • Construct codes by linking
    • Small core (flesh): mgmt services
    • Selected modules (thorns): Numerical methods, grids & domain decomps, visualization and steering, etc.
  • Custom linking/configuration tools
  • Developed for astrophysics, but not astrophysics-specific

[Diagram: thorns plugged into the Cactus “flesh”.]

www.cactuscode.org

cactus an application framework for dynamic grid computing
Cactus: An ApplicationFramework for Dynamic Grid Computing
  • Cactus thorns for active management of application behavior and resource use
  • Heterogeneous resources, e.g.:
    • Irregular decompositions
    • Variable halo for managing message size
    • Msg compression (comp/comm tradeoff)
    • Comms scheduling for comp/comm overlap
  • Dynamic resource behaviors/demands, e.g.:
    • Perf monitoring, contract violation detection
    • Dynamic resource discovery & migration
    • User notification and steering
cactus example terascale computing

Cactus Example: Terascale Computing
[Diagram: SDSC IBM SP (1,024 processors, of which 5x12x17 = 1,020 used) and NCSA Origin Array (256+128+128 processors, of which 5x12x(4+2+2) = 480 used), each with internal Gig-E at 100 MB/s, connected by an OC-12 line that delivers only about 2.5 MB/s.]
  • Solved Einstein equations (EEs) for gravitational waves (real code)
    • Tightly coupled, communications required through derivatives
    • Must communicate 30MB/step between machines
    • Each time step takes 1.6 sec
  • Used 10 ghost zones along direction of machines: communicate every 10 steps
  • Compression/decompression on all data passed in this direction
  • Achieved 70-80% scaling, ~200GF (only 14% scaling without tricks); see the estimate below
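
A rough back-of-the-envelope estimate (ours, not from the original slides), using the ~2.5 MB/s wide-area throughput noted in the diagram above:

\[
t_{\text{comm per step}} \approx \frac{30\ \text{MB}}{2.5\ \text{MB/s}} = 12\ \text{s}
\quad \gg \quad t_{\text{step}} \approx 1.6\ \text{s},
\]

so a naive per-step exchange over the OC-12 link would be communication-bound by roughly an order of magnitude; hence the ghost-zone, compression, and overlap tricks above were needed to reach 70-80% scaling.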
cactus example 2 migration in action

Cactus Example (2): Migration in Action
[Diagram: a job running at UC experiences an applied load and three successive contract violations; resource discovery & migration move it to UIUC, where it continues running (migration time not to scale).]
high throughput computing and condor
High-Throughput Computing and Condor
  • High-throughput computing
    • CPU cycles/day (week, month, year?) under non-ideal circumstances
    • “How many times can I run simulation X in a month using all available machines?”
  • Condor converts collections of distributively owned workstations and dedicated clusters into a distributed high-throughput computing facility
  • Emphasis on policy management and reliability
defining a dag

[Diagram: a diamond-shaped DAG: Job A is the parent of Jobs B and C, which are both parents of Job D.]

Defining a DAG
  • A DAG is defined by a .dag file, listing each of its nodes and their dependencies:

# diamond.dag
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
Parent A Child B C
Parent B C Child D

  • Each node runs the Condor job specified by its accompanying Condor submit file
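
For completeness, a minimal sketch of what one of these submit files (say a.sub) might contain; the executable and file names below are hypothetical, not taken from the original tutorial:

# a.sub -- hypothetical Condor submit file for node A
universe   = vanilla
executable = generate_input
output     = a.out
error      = a.err
log        = diamond.log
queue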
example physics computation
Example: Physics Computation

[Diagram: 1) a master Condor job runs on a Caltech workstation; 2) it launches a secondary Condor job on the Wisconsin pool, with input files delivered via Globus GASS; 3) 100 Monte Carlo jobs run on the Wisconsin Condor pool; 4) 100 data files, ~1 GB each, are transferred via GridFTP; 5) the secondary job reports completion to the master; 6) the master starts reconstruction jobs via the Globus jobmanager on the NCSA Linux cluster; 7) GridFTP fetches data from NCSA UniTree, a GridFTP-enabled FTP server; 8) the processed Objectivity database is stored back to UniTree; 9) the reconstruction job reports completion to the master.]

trace of a condor g physics run

Trace of a Condor-G Physics Run
[Chart: timeline of pre-processing, simulation, and post-processing jobs on the UW Condor pool and of ooHits and ooDigis jobs at NCSA, including a delay due to a script error.]
object based approaches
Object-Based Approaches
  • Grid-enabled CORBA
    • NASA Lewis, Rutgers, ANL, others
    • CORBA wrappers for Grid protocols
    • Some initial successes
  • Legion
    • U.Virginia
    • Object models for Grid components (e.g., “vault”=storage, “host”=computer)
    • Some initial successes
portals
Portals
  • N-tier architectures enabling thin clients, with middle tiers using Grid functions
    • Thin clients = web browsers
    • Middle tier = e.g. Java Server Pages, with Java CoG Kit, GPDK, GridPort utilities
    • Bottom tier = various Grid resources
  • Numerous applications and projects, e.g.
    • Unicore, Gateway, Discover, Mississippi Computational Web Portal, NPACI Grid Port, Lattice Portal, Nimrod-G, Cactus, NASA IPG Launchpad, Grid Resource Broker, …
selected major grid projects1


Selected Major Grid Projects


selected major grid projects3


Selected Major Grid Projects


Also many technology R&D projects: e.g., Condor, NetSolve, Ninf, NWS

See also www.gridforum.org

a categorization
A Categorization
  • Applications
    • Apply Grid concepts within the context of a specific application discipline
  • Technologies
    • R&D developing generic Grid technologies
  • Infrastructure
    • Deployment of Grid services to support a community of users

Note that many projects share elements of all three categories

1 example application projects
(1) Example Application Projects
  • AstroGrid: astronomy, etc. (UK)
  • Earth Systems Grid: environment (US DOE)
  • EU DataGrid: physics, environment, etc. (EU)
  • EuroGrid: various (EU)
  • Fusion Collaboratory (US DOE)
  • GridLab: astrophysics, etc. (EU)
  • Grid Physics Network (US NSF)
  • MetaNEOS: numerical optimization (US NSF)
  • NEESgrid: civil engineering (US NSF)
  • Particle Physics Data Grid (US DOE)
earth system grid anl lbnl llnl ncar isi ornl
Earth System Grid (ANL, LBNL, LLNL, NCAR, ISI, ORNL)
  • Enable a distributed community [of thousands] to perform computationally intensive analyses on large climate datasets
  • Via
    • Creation of Data Grid supporting secure, high-performance remote access
    • “Smart data servers” supporting reduction and analyses
    • Integration with environmental data analysis systems, protocols, and thin clients

www.earthsystemgrid.org (soon)

earth system grid architecture
Earth System Grid Architecture

[Diagram: an application issues an attribute specification to the Metadata Catalog, which returns a logical collection and logical file names; the Replica Catalog maps these to multiple locations; replica selection uses performance information & predictions from MDS and NWS to pick a selected replica; GridFTP commands then move the data from Replica Locations 1-3 (disk caches, a disk array, and a tape library).]

grid communities applications data grids for high energy physics

Grid Communities & Applications: Data Grids for High Energy Physics
[Diagram: the tiered LHC computing model. The online system receives ~100 MB/s from the detector (~PB/s raw; a “bunch crossing” every 25 ns, ~100 “triggers” per second, each triggered event ~1 MB) and feeds ~100 MB/s to Tier 0, the CERN Computer Centre, with an offline processor farm of ~20 TIPS (1 TIPS is approximately 25,000 SpecInt95 equivalents). Links of ~622 Mb/s (or air freight, deprecated) connect Tier 0 to Tier 1 regional centres (FermiLab ~4 TIPS; France, Germany, and Italy regional centres), each with HPSS storage; further ~622 Mb/s links connect to Tier 2 centres (~1 TIPS each, e.g., Caltech). Institutes (~0.25 TIPS) hold physics data caches fed at ~1 MB/s, and Tier 4 is physicists’ workstations. Physicists work on analysis “channels”; each institute will have ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server.]
Image courtesy Harvey Newman, Caltech

grid physics network griphyn
Grid Physics Network (GriPhyN)

Enabling R&D for advanced data grid systems, focusing in particular on Virtual Data concept

Experiments: ATLAS, CMS, LIGO, SDSS

www.griphyn.org; see also www.ppdg.net, www.eu-datagrid.org

the virtual data concept
The Virtual Data Concept

“[a virtual data grid enables] the definition and delivery of a potentially unlimited virtual space of data products derived from other data. In this virtual space, requests can be satisfied via direct retrieval of materialized products and/or computation, with local and global resource management, policy, and security constraints determining the strategy used.”

virtual data in action
Virtual Data in Action
  • Data request may
    • Access local data
    • Compute locally
    • Compute remotely
    • Access remote data
  • Scheduling subject to local & global policies
  • Local autonomy
griphyn ppdg data grid architecture

GriPhyN/PPDG Data Grid Architecture
[Diagram: an application submits DAGs to a Planner, which draws on Catalog Services (MCAT; GriPhyN catalogs), Monitoring, Info Services (MDS), and Replica Management (GDMP), and hands DAGs to an Executor (DAGMan, Kangaroo); Policy/Security (GSI, CAS) spans the layers; the Executor uses a Reliable Transfer Service and Compute (Globus GRAM) and Storage (GridFTP; GRAM; SRM) Resources. A legend marks the components for which an initial solution is operational.]
Ewa Deelman, Mike Wilde

grid communities and applications mathematicians solve nug30
Grid Communities and Applications: Mathematicians Solve NUG30
  • Community=an informal collaboration of mathematicians and computer scientists
  • Condor-G delivers 3.46E8 CPU seconds in 7 days (peak 1009 processors) in U.S. and Italy (8 sites)
  • Solves NUG30 quadratic assignment problem
  • Solution: 14,5,28,24,1,3,16,15,10,9,21,2,4,29,25,22,13,26,17,30,6,20,19,8,18,7,27,12,11,23

www.mcs.anl.gov/metaneos: Argonne, Iowa, NWU, Wisconsin

grid communities and applications network for earthquake eng simulation
Grid Communities and Applications: Network for Earthquake Eng. Simulation
  • NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other
  • On-demand access to experiments, data streams, computing, archives, collaboration

www.neesgrid.org: Argonne, Michigan, NCSA, UIUC, USC

2 example technology r d projects
(2) Example Technology R&D Projects
  • Access Grid, CAVERNsoft collaboration tech
  • Condor
  • Globus Toolkit
  • Grid Application Dev. Software project
  • Legion
  • Network Weather Service
  • Portal Toolkits
  • Storage Resource Broker

And many many more …

access grid

[Diagram: an Access Grid node with a presenter mic, a presenter camera, a tabletop ambient mic, and an audience camera.]

Access Grid
  • High-end group work and collaboration technology
  • Grid services being used for discovery, configuration, authentication
  • O(50) systems deployed worldwide
  • Basis for SC’2001 SC Global event in Nov 2001
    • www.scglobal.org

www.mcs.anl.gov/fl/Accessgrid

condor
Condor
  • High-throughput computing platform for mapping many tasks to idle computers
  • Three major components
    • Scheduler manages pool(s) of computers
    • DAGman manages user task pools
    • Matchmaker schedules tasks to computers
  • Widely used for parameter studies, data analysis
  • Condor-G extensions support wide area execution in Grid environment

www.cs.wisc.edu/condor

slide81

[Diagram: a personal Condor on your workstation manages 600 Condor jobs, running them on a local Condor pool and a friendly Condor pool, and sending glide-in jobs through the Globus Grid to resources managed by LSF, PBS, and Condor.]

slide82

Condor-G
[Diagram: the user manages a task pool (a DAG) through a task-submission API (add/delete task, define dependency, set costs). The Condor-G agent manages the task pool, caches credentials, locates and selects resources, manages the computation, detects and handles failures, negotiates cost, and notifies the user of completion. Through GRAM it authenticates and authorizes via GSI, stages executables and redirects stderr/stdout and results via GASS, and monitors, controls, and reports errors for jobs on remote supercomputers, clusters, workstations, and cycle vendors. A GRIS monitors and publishes the state of each resource; Condor daemons advertise resource characteristics, stage the user executable, checkpoint, and redirect system calls.]

www.cs.wisc.edu/condor

globus toolkit
Globus Toolkit
  • Globus Toolkit is the source of many of the protocols described in “Grid architecture”
  • Adopted by almost all major Grid projects worldwide as a source of infrastructure
  • Open source, open architecture framework encourages community development
  • Active R&D program continues to move technology forward
  • Developers at ANL, USC/ISI, NCSA, LBNL, and other institutions

www.globus.org

globus toolkit components include
Globus Toolkit Components Include …
  • Core protocols and services
    • Grid Security Infrastructure
    • Grid Resource Access & Management
    • MDS information & monitoring
    • GridFTP data access & transfer
  • Other services
    • Community Authorization Service
    • DUROC co-allocation service
  • Other Data Grid technologies
    • Replica catalog, replica management service
globus applications and deployments
Globus Applications and Deployments
  • Application projects include
    • GriPhyN, PPDG, NEES, EU DataGrid, ESG, Fusion Collaboratory, etc., etc.
  • Infrastructure deployments include
    • DISCOM, NASA IPG, NSF TeraGrid, DOE Science Grid, EU DataGrid, etc., etc.
    • UK Grid Center, U.S. GRIDS Center
  • Technology projects include
    • Data Grids, Access Grid, Portals, CORBA, MPICH-G2, Condor-G, GrADS, etc., etc.
globus futures
Globus Futures
  • Numerous large projects are pushing hard on production deployment & application
    • Much will be learned in next 2 years!
  • Active R&D program, focused for example on
    • Security & policy for resource sharing
    • Flexible, high-perf., scalable data sharing
    • Integration with Web Services etc.
    • Programming models and tools
  • Community code development producing a true Open Grid Architecture
legion the grid as a single virtual machine
Legion: The Grid as a Single Virtual Machine
  • Traditional OS Services on grid, e.g., security, file system, process management
  • High-level Grid Services, e.g., scheduling, accounting, p-space studies, specialized application portals …
  • Resource Abstractions, e.g., queuing systems, special devices
  • Programming Model - objects, graphs, events

www.cs.virginia.edu/legion

3 infrastructure deployments
(3) Infrastructure Deployments
  • Institutional Grid deployments: deploying services and network infrastructure
    • DISCOM, IPG, TeraGrid, DOE Science Grid, DOD Grid, NEESgrid, ASCI (Netherlands)
  • International deployments: supporting international experiments and science
    • iVDGL, StarLight
  • Support centers
    • U.K. Grid Center
    • U.S. GRIDS Center
ipg milestone 3 large computing node completed 12 2000

IPG Milestone 3: Large Computing Node Completed 12/2000
[Diagram: OVERFLOW running on the IPG, using Globus and MPICH-G2 for intra-problem, wide-area communication across systems (Whitcomb, Sharp, and Lomax, a 512-node SGI Origin 2000) at NASA Glenn (Cleveland, OH), Ames (Moffett Field, CA), and Langley (Hampton, VA), applied to a high-lift subsonic wind tunnel model. Application POC: Mohammad J. Djomehri.]
Slide courtesy Bill Johnston, LBNL & NASA

targeted starlight optical network connections

Targeted StarLight Optical Network Connections
[Diagram: the StarLight hub at Northwestern University (Chicago) cross-connects the Illinois I-WIRE sites (UIC, Illinois Institute of Technology, ANL, Univ of Chicago, NCSA/UIUC) and links to Indianapolis (Abilene NOC), the St Louis GigaPoP, AMPATH, and Atlanta, as well as to the DTF 40Gb network (Chicago, Los Angeles, San Diego/SDSC), PSC, IU, U Wisconsin, NYC, Seattle, Portland, Vancouver, San Francisco, and NTON, and internationally to CERN, the Asia-Pacific, SURFnet, and CA*net4.]

www.startap.net

proposed 13 6 tf linux teragrid computing at 40 gb s
Proposed 13.6 TF Linux TeraGrid: Computing at 40 Gb/s
[Diagram: four sites, each with site resources and external network connections, interconnected at 40 Gb/s: NCSA/PACI (8 TF, 240 TB), SDSC (4.1 TF, 225 TB), Caltech, and Argonne, with HPSS and UniTree archival storage among the site resources.]

ivdgl
iVDGL
  • International Virtual-Data Grid Laboratory
    • A place to conduct Data Grid tests at scale
    • Concrete manifestation of world-wide grid activity
    • Continuing activity that will drive Grid awareness
    • A basis for further funding
  • Scale of effort
    • For national, intl scale Data Grid tests, operations
    • Computationally and data intensive computing
    • Fast networks
  • Who
    • Initially US-UK-EU; Japan, Australia
    • Other world regions later; discussions with Russia, China, Pakistan, India, South America

www.ivdgl.org (soon)

ivdgl map circa 2003 2004

iVDGL Map Circa 2003-2004
[Map: planned iVDGL Tier0/1, Tier2, and Tier3 facilities, connected by 10 Gbps, 2.5 Gbps, 622 Mbps, and other links.]
ivdgl as a laboratory
iVDGL as a Laboratory
  • Grid Exercises
    • “Easy” intra-experiment (10-20%, national, transatlantic) – first
    • “Harder” wide-scale (50-100% of all resources)
  • Local control of resources vitally important
    • Experiments, politics demand it
  • Strong interest from other disciplines
    • Virtual Observatory community in Europe/US
    • Gravity wave community in Europe/US/(Japan?)
    • Earthquake engineering, bioinformatics
    • Computer scientists (wide scale tests)
u s grids center
U.S. GRIDS Center
  • GRIDS = Grid Research, Integration, Deployment, & Support
  • NSF-funded center to provide
    • State-of-the-art middleware infrastructure to support national-scale collaborative science and engineering
    • Integration platform for experimental middleware technologies
  • ISI, NCSA, SDSC, UC, UW + commercial partners

www.grids-center.org

grids and industry1
Grids and Industry
  • Grid concepts (flexible, controlled sharing) are directly relevant to industrial concerns, e.g.
    • Application service providers (computing on demand, share computing + data)
    • Internet/distributed computing: pool CPUs across Intranet or Internet
    • Peer-to-Peer: controlling what resources are used for
    • Distributed computing for resource sharing within or across organizations
examples of self styled grid companies
Examples of [Self-styled] Grid Companies
  • Avaki
    • Legion technology
  • Entropia
    • Harness idle commodity desktop systems
  • Insors
    • Access Grid technology
  • Platform
    • LSF: a distributed scheduler
  • Sun
    • Sun Grid Engine: a distributed scheduler
grid communities and applications home computers evaluate aids drugs
Grid Communities and Applications: Home Computers Evaluate AIDS Drugs
  • Community =
    • 1000s of home computer users
    • Philanthropic computing vendor (Entropia)
    • Research group (Scripps)
  • Common goal= advance AIDS research
relationships
Relationships
  • Grid technologies are complementary to other distributed computing technologies
    • Additive, not competitive
  • To date, have addressed primarily systems issues of interoperability and sharing
    • Need to integrate with tools that address programming, workflow, modeling issues
    • Ideally, also integrate with other “systems” technologies
  • Integration with other technologies critical
future direction grid services
Future Direction: Grid Services?
  • Core Grid protocols predate Web Services
    • Various substrates: HTTP, LDAP, FTP
  • Possible future: retarget to Web Services, as part of ongoing redesign
    • SOAP for all protocols, WSDL, UDDI
    • “Grid Services” layer provides soft-state, secure management of remote context
    • Other Grid protocols build on this substrate
  • Grid Services enhance Web Services with key Grid (“peer-to-peer”) capabilities
grid events
Grid Events
  • Global Grid Forum: working meeting
    • Meets 3 times/year, alternates U.S.-Europe, with July meeting as major event
  • HPDC: major academic conference
    • HPDC’10 in San Francisco, Aug 6-9, 2001
    • HPDC’11 in Scotland with GGF’8, July 2002
  • Other meetings include
    • CCGrid, EuroGlobus, Globus Retreat (next Aug 9-10, 2001), Grid’2001 (with SC’2001)

www.gridforum.org, www.hpdc.org

global grid forum history
“Global Grid Forum”: History

[Timeline, 1998-2001: Grid Forum BOF (Orlando); GF1 (San Jose, NASA Ames); GF2 (Chicago, iCAIR); eGrid and GF BOFs (Portland); GF3 (San Diego, SDSC); eGrid1 (Poznan, PSNC); GF4 (Redmond, Microsoft); Asia-Pacific GF planning (Yokohama); eGrid2 (Munich, at Europar); GF5 (Boston, Sun); Global GF BOF (Dallas); Global GF 1 / GGF-1 (Amsterdam).]

global grid forum 1 march 2000 u s and european gfs merge
Global Grid Forum 1 (March 2001): U.S. and European GFs Merge
  • 325 Participants
    • Capped by facilities
    • 60 late-registrants turned away
  • 192 Organizations
    • Previous high: 110 at GF-5
  • 28 Countries
    • Previous high: 11 at GF-5
  • 85 Document Authors

Next meeting: GGF2, Washington DC, July 2001: www.gridforum.org

a little history u s perspective
A Little History (U.S. Perspective)
  • Early 90s
    • Gigabit testbeds, metacomputing
  • Mid to late 90s
    • Early experiments (e.g., I-WAY), software projects (e.g., Globus), application experiments
  • 2001
    • Major application communities emerging
    • Major infrastructure deployments are underway
    • Rich technology base has been constructed
    • Global Grid Forum: >1000 people on mailing lists, 192 orgs at last meeting, 28 countries
major application communities are emerging
“Major Application Communities are Emerging”
  • Intellectual buy-in, commitment
    • Earthquake engineering: NEESgrid
    • Exp. physics, etc.: GriPhyN, PPDG, EU Data Grid
    • Simulation: Earth System Grid, Astrophysical Sim. Collaboratory
    • Collaboration: Access Grid
  • Emerging, e.g.
    • Bioinformatics Grids
    • National Virtual Observatory
major infrastructure deployments are underway
“Major Infrastructure Deployments are Underway”
  • Projects well under way
    • NSF “National Technology Grid”
    • NASA “Information Power Grid”
    • DOE ASCI DISCOM Grid
  • On the drawing board
    • DOE Science Grid
    • NSF Distributed Terascale Facility (“TeraGrid”)
    • DOD MOD Grid
a rich technology base has been constructed
“A Rich Technology Base has been Constructed”
  • 6+ years of R&D have produced a substantial code base based on open architecture principles: esp. the Globus Toolkit, including
    • Grid Security Infrastructure
    • Resource directory and discovery services
    • Secure remote resource access
    • Data Grid protocols, services, and tools
  • Essentially all U.S. projects have adopted this as a common suite of protocols & services
  • Enabling wide range of higher-level services
the future all software is network centric
The Future: All Software is Network-Centric
  • We don’t build or buy computers anymore, we borrow or lease them (when I walk into a room, need to solve a problem, need to communicate)
  • A “computer” is a dynamically, often collaboratively constructed collection of processors, data sources, sensors, networks (similar observations apply for software!)
and thus
And Thus …
  • Reduced barriers to access mean that we do much more computing, and more interesting computing, than today => Many more components (& services); massive parallelism
  • All resources are owned by others => Sharing (for fun or profit) is fundamental; trust, policy, negotiation, payment
  • All computing is performed on unfamiliar systems => Dynamic behaviors, discovery, adaptivity, failure
in closing challenges to consider
In Closing: Challenges to Consider
  • Creating a new class of infrastructure
    • Stimulate rapid uptake across many disciplines
    • Programming tools as well as infrastructure
    • Where are the skilled people to build the Grid?
  • Involving industry
    • Impact amplified greatly if industry is engaged
    • Balancing proprietary vs. open arch concerns
  • Managing expectations
    • The Internet took 20 years from research lab to global phenomenon
summary
Summary
  • Grid computing concepts are real: they underlie a variety of major projects in physics, other sciences, and industry
  • The promise is that, by enabling coordinated sharing, Grids enable quantitative & qualitative changes in problem-solving techniques
  • Clear picture of Grid architecture emerging
  • Major opportunities to create a new class of common infrastructure
  • Many challenges lie ahead!!
acknowledgments
Acknowledgments
  • Some material based on talks & tutorials previously developed with Carl Kesselman and Steve Tuecke
  • Thanks also to Paul Avery, Charlie Catlett, Dennis Gannon, Marty Humphrey, Bill Johnston, and others for material
for more information
For More Information
  • Book (Morgan Kaufman)
    • www.mkp.com/grids
  • Perspective on Grids
    • “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”, IJSA, 2001
    • www.globus.org/research/papers/anatomy.pdf