
Grid Computing

CSCE 590 Special Topics in Linux

The Computational Grid
  • Emerging computational and networking infrastructure
    • pervasive, uniform, and reliable access to remote data, computational, sensor, and human resources
  • Enable entirely new approaches to applications and problem solving
    • remote resources the rule, not the exception
  • Wide-area distributed computing
    • national and international
What Can You Do with One?
  • Combine dozens of supercomputers to solve a single problem
  • Link realtime satellite data feeds with distributed computational and display systems
  • Enable schools across the country to participate in interactive simulations and data analysis
  • Interactively combine the output of many independent servers to analyze a new genome
  • Build a network of immersive virtual reality sites to collaboratively design a new vehicle
Example: Aeronautic Design

[Diagram: collaboration, simulation, instrumentation, and design data linked across sites]

Why Now?
  • The Internet as infrastructure
    • Increasing bandwidth, advanced services
  • Advances in storage capacity
    • Terabytes, petabytes per site
  • Increased availability of compute resources
    • clusters, supercomputers, etc.
  • Advanced applications
    • simulation based design, advanced scientific instruments, ...
Today’s Information Infrastructure

O(10^6) nodes

  • Network-centric: simple, fixed end systems; few embedded capabilities; few services; no user-level quality of service
Tomorrow’s Infrastructure: Not Just “Faster and More Reliable”

O(10^9) nodes

  • Application-centric: heterogeneous, mobile end-systems; many embedded capabilities; rich services; user-level quality of service

[Diagram labels: caching, resource discovery, QoS]

Grid Services Architecture

[Layered diagram, top to bottom]
  • Applications: high-energy physics data analysis, collaborative engineering, on-line instrumentation, regional climate studies, parameter studies, …
  • Application Toolkit Layer: distributed computing, collaborative design, remote control, data-intensive, remote visualization, …
  • Grid Services Layer: information, resource management, security, data access, fault detection, …
  • Grid Fabric Layer: transport, multicast, instrumentation, control interfaces, QoS mechanisms, …

Grid Services (“Middleware”)
  • Standard services that
    • Provide uniform, high-level access to a wide range of resources (including networks)
    • Address interdomain issues of security, policy, etc.
    • Permit application-level management and monitoring of end-to-end performance
  • Middleware-level and higher-level APIs and tools targeted at application programmers
    • Map between application and Grid
Emerging Production Grids

  • NASA Information Power Grid
  • PACI Grid

Additional Information
  • text------
  • Other sites:
    • Globus web site:
      • http://www.globus.org
    • Grid forum web site:
      • http://www.gridforum.org
The Grid Problem
  • Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources

From “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”

  • Enable communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals -- assuming the absence of…
    • central location,
    • central control,
    • omniscience,
    • existing trust relationships.
Elements of the Problem
  • Resource sharing
    • Computers, storage, sensors, networks, …
    • Sharing always conditional: issues of trust, policy, negotiation, payment, …
  • Coordinated problem solving
    • Beyond client-server: distributed data analysis, computation, collaboration, …
  • Dynamic, multi-institutional virtual orgs
    • Community overlays on classic org structures
    • Large or small, static or dynamic
Online Access to Scientific Instruments

[Diagram: Advanced Photon Source → real-time collection → tomographic reconstruction → wide-area dissemination → desktop & VR clients with shared controls; archival storage]

DOE X-ray grand challenge: ANL, USC/ISI, NIST, U.Chicago

Data Grids for High Energy Physics

[Diagram: tiered LHC data distribution]
  • Online System: ~PBytes/sec from the detector; there is a “bunch crossing” every 25 nsecs, ~100 “triggers” per second, and each triggered event is ~1 MByte in size
  • Tier 0 — CERN Computer Centre: ~100 MBytes/sec in; Offline Processor Farm, ~20 TIPS; HPSS mass storage
  • Tier 1 — Regional Centres (France, Germany, Italy; FermiLab, ~4 TIPS): ~622 Mbits/sec or air freight (deprecated) from CERN; HPSS at each
  • Tier 2 — Tier2 Centres, ~1 TIPS each (e.g., Caltech): ~622 Mbits/sec links
  • Institutes, ~0.25 TIPS each: physics data cache; ~1 MBytes/sec
  • Tier 4 — Physicist workstations (Pentium II 300 MHz)

1 TIPS is approximately 25,000 SpecInt95 equivalents. Physicists work on analysis “channels”; each institute will have ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server.

Image courtesy Harvey Newman, Caltech

Mathematicians Solve NUG30
  • Looking for the solution to the NUG30 quadratic assignment problem
  • An informal collaboration of mathematicians and computer scientists
  • Condor-G delivered 3.46E8 CPU seconds in 7 days (peak 1009 processors) in U.S. and Italy (8 sites)
  • 14,5,28,24,1,3,16,15,10,9,21,2,4,29,25,22,13,26,17,30,6,20,19,8,18,7,27,12,11,23

MetaNEOS: Argonne, Iowa, Northwestern, Wisconsin

Network for Earthquake Engineering Simulation
  • NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other
  • On-demand access to experiments, data streams, computing, archives, collaboration

NEESgrid: Argonne, Michigan, NCSA, UIUC, USC

Home Computers Evaluate AIDS Drugs
  • Community =
    • 1000s of home computer users
    • Philanthropic computing vendor (Entropia)
    • Research group (Scripps)
  • Common goal = advance AIDS research
Broader Context
  • “Grid Computing” has much in common with major industrial thrusts
    • Business-to-business, Peer-to-peer, Application Service Providers, Storage Service Providers, Distributed Computing, Internet Computing…
  • Sharing issues not adequately addressed by existing technologies
    • Complicated requirements: “run program X at site Y subject to community policy P, providing access to data at Z according to policy Q”
    • High performance: unique demands of advanced & high-performance systems
Why Now?
  • Moore’s law improvements in computing produce highly functional endsystems
  • The Internet and burgeoning wired and wireless networks provide universal connectivity
  • Changing modes of working and problem solving emphasize teamwork, computation
  • Network exponentials produce dramatic changes in geometry and geography
Network Exponentials
  • Network vs. computer performance
    • Computer speed doubles every 18 months
    • Network speed doubles every 9 months
    • Difference = order of magnitude per 5 years
  • 1986 to 2000
    • Computers: x 500
    • Networks: x 340,000
  • 2001 to 2010
    • Computers: x 60
    • Networks: x 4000

Moore’s Law vs. storage improvements vs. optical improvements. Graph from Scientific American (Jan-2001) by Cleo Vilett; source: Vinod Khosla, Kleiner Perkins Caufield & Byers.
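The doubling times quoted above can be sanity-checked with a few lines of arithmetic. This sketch (plain Python; the numbers come straight from the slide) shows why halving the doubling period yields roughly one order of magnitude of extra gain per 5 years.

```python
# Sanity-check of the doubling times quoted above:
# computers double every 18 months, networks every 9.
def growth(months: float, doubling_period: float) -> float:
    """Multiplicative improvement after `months` of steady doubling."""
    return 2 ** (months / doubling_period)

months = 5 * 12
computer_gain = growth(months, 18)   # ~10x over 5 years
network_gain = growth(months, 9)     # ~100x over 5 years
# The gap between them grows by about one order of magnitude per 5 years:
print(round(network_gain / computer_gain))  # 10
```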

The Globus Project™: Making Grid Computing a Reality
  • Close collaboration with real Grid projects in science and industry
  • Development and promotion of standard Grid protocols to enable interoperability and shared infrastructure
  • Development and promotion of standard Grid software APIs and SDKs to enable portability and code sharing
  • The Globus Toolkit™: Open source, reference software base for building grid infrastructure and applications
  • Global Grid Forum: Development of standard protocols and APIs for Grid computing
Selected Major Grid Projects

[Table of Grid projects; entries not recoverable from the transcript]

Selected Major Grid Projects (cont.)

[Table of Grid projects; entries not recoverable from the transcript]

Also many technology R&D projects: e.g., Condor, NetSolve, Ninf, NWS

See also www.gridforum.org

The 13.6 TF TeraGrid: Computing at 40 Gb/s

[Diagram: NCSA/PACI (8 TF, 240 TB), SDSC (4.1 TF, 225 TB), Caltech, and Argonne, each with site resources (archival storage: HPSS, UniTree) and external network connections]

TeraGrid/DTF: NCSA, SDSC, Caltech, Argonne www.teragrid.org

iVDGL: International Virtual Data Grid Laboratory

[Map: Tier0/1, Tier2, and Tier3 facilities connected by 10 Gbps, 2.5 Gbps, 622 Mbps, and other links]

U.S. PIs: Avery, Foster, Gardner, Newman, Szalay www.ivdgl.org

Some Important Definitions
  • Resource
  • Network protocol
  • Network enabled service
  • Application Programmer Interface (API)
  • Software Development Kit (SDK)
  • Syntax
  • Not discussed, but important: policies
Resource
  • An entity that is to be shared
    • E.g., computers, storage, data, software
  • Does not have to be a physical entity
    • E.g., Condor pool, distributed file system, …
  • Defined in terms of interfaces, not devices
    • E.g., a scheduler defines a compute resource
    • Open/close/read/write define access to a distributed file system, e.g. NFS, AFS, DFS
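To make “defined in terms of interfaces, not devices” concrete, here is a minimal sketch; the names (`ComputeResource`, `submit`, `status`) are invented for illustration and come from no real Grid toolkit.

```python
# Hypothetical sketch: a resource is whatever implements the interface,
# whether one machine, a Condor pool, or something else entirely.
from abc import ABC, abstractmethod

class ComputeResource(ABC):
    """Anything that can accept and track jobs counts as a compute resource."""

    @abstractmethod
    def submit(self, executable: str, args: list[str]) -> str:
        """Submit a job; return an opaque job id."""

    @abstractmethod
    def status(self, job_id: str) -> str:
        """Report a job's state."""

class LocalHost(ComputeResource):
    """Trivial implementation: 'submit' just records the request."""
    def __init__(self):
        self.jobs = {}
    def submit(self, executable, args):
        job_id = f"job-{len(self.jobs)}"
        self.jobs[job_id] = (executable, args)
        return job_id
    def status(self, job_id):
        return "done" if job_id in self.jobs else "unknown"

r = LocalHost()
jid = r.submit("/bin/analyze", ["input.dat"])
print(r.status(jid))  # done
```

Any scheduler that implements the same interface is, to its clients, the same kind of resource.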
Network Protocol
  • A formal description of message formats and a set of rules for message exchange
    • Rules may define sequence of message exchanges
    • Protocol may define state-change in endpoint, e.g., file system state change
  • Good protocols designed to do one thing
    • Protocols can be layered
  • Examples of protocols
    • IP, TCP, TLS (was SSL), HTTP, Kerberos
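As a toy illustration of “message formats plus rules for exchange”, the sketch below frames each message with a 4-byte big-endian length prefix. The format is invented, but any two endpoints that agree on it (and on the request-then-reply sequence) could interoperate.

```python
# Toy wire protocol: every message is a 4-byte big-endian length
# prefix followed by the payload bytes.
import struct

def encode(msg: bytes) -> bytes:
    return struct.pack("!I", len(msg)) + msg

def decode(buf: bytes) -> tuple[bytes, bytes]:
    """Return (first message, remaining bytes)."""
    (n,) = struct.unpack("!I", buf[:4])
    return buf[4:4 + n], buf[4 + n:]

wire = encode(b"GET /status") + encode(b"OK")   # request, then reply
request, rest = decode(wire)
reply, _ = decode(rest)
print(request, reply)   # b'GET /status' b'OK'
```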
[Diagram: two protocol stacks — an FTP server speaking the FTP and Telnet protocols over TLS, TCP, and IP, and a Web server speaking HTTP over TCP and IP]

Network Enabled Services
  • Implementation of a protocol that defines a set of capabilities
    • Protocol defines interaction with service
    • All services require protocols
    • Not all protocols are used to provide services (e.g. IP, TLS)
  • Examples: FTP and Web servers
Application Programming Interface
  • A specification for a set of routines to facilitate application development
    • Refers to definition, not implementation
    • E.g., there are many implementations of MPI
  • Spec often language-specific (or IDL)
    • Routine name, number, order and type of arguments; mapping to language constructs
    • Behavior or function of routine
  • Examples
    • GSS API (security), MPI (message passing)
Software Development Kit
  • A particular instantiation of an API
  • SDK consists of libraries and tools
    • Provides implementation of API specification
  • Can have multiple SDKs for an API
  • Examples of SDKs
    • MPICH, Motif Widgets
Syntax
  • Rules for encoding information, e.g.
    • XML, Condor ClassAds, Globus RSL
    • X.509 certificate format (RFC 2459)
    • Cryptographic Message Syntax (RFC 2630)
  • Distinct from protocols
    • One syntax may be used by many protocols (e.g., XML); & useful for other purposes
  • Syntaxes may be layered
    • E.g., Condor ClassAds -> XML -> ASCII
    • Important to understand layerings when comparing or evaluating syntaxes
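The same information can be encoded in more than one syntax, as the slide notes. A small sketch follows; the attribute names are invented, and the second encoding mimics Condor ClassAd style rather than quoting a real ad.

```python
# One piece of resource information, two syntaxes.
import xml.etree.ElementTree as ET

# XML encoding
machine = ET.Element("machine", name="node17")
ET.SubElement(machine, "memory_mb").text = "2048"
xml_text = ET.tostring(machine, encoding="unicode")

# A ClassAd-style encoding of the same information (layered on ASCII)
classad_text = '[ Name = "node17"; MemoryMB = 2048 ]'

print(xml_text)
print(classad_text)
```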
A Protocol can have Multiple APIs
  • TCP/IP APIs include BSD sockets, Winsock, System V streams, …
  • The protocol provides interoperability: programs using different APIs can exchange information
  • I don’t need to know remote user’s API

[Diagram: two applications, one using the WinSock API and the other the Berkeley Sockets API, interoperating over the TCP/IP protocol (reliable byte streams)]
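The point of the diagram can be demonstrated in a few lines: interoperability comes from the shared TCP protocol, not from a shared API. A loopback sketch in Python — either endpoint could equally be a C program using Berkeley sockets or WinSock.

```python
# Loopback demo: client and server need only share the TCP protocol.
import socket
import threading

def server(listener):
    conn, _ = listener.accept()
    data = conn.recv(1024)       # bytes arrive regardless of the peer's API
    conn.sendall(data.upper())
    conn.close()

srv = socket.socket()
srv.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
srv.listen(1)
port = srv.getsockname()[1]
t = threading.Thread(target=server, args=(srv,))
t.start()

cli = socket.create_connection(("127.0.0.1", port))
cli.sendall(b"grid")
reply = cli.recv(1024)
t.join()
cli.close()
srv.close()
print(reply)   # b'GRID'
```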

[Diagram: two applications, both written to the MPI API, one linked against the LAM SDK (LAM protocol) and the other against the MPICH-P4 SDK (MPICH-P4 protocol) — different message formats, exchange sequences, etc. — each running over TCP/IP]

An API can have Multiple Protocols
  • MPI provides portability: any correct program compiles & runs on any platform
  • Does not provide interoperability: all processes must link against same SDK
    • E.g., MPICH and LAM versions of MPI
APIs and Protocols are Both Important
  • Standard APIs/SDKs are important
    • They enable application portability
    • But w/o standard protocols, interoperability is hard (every SDK speaks every protocol?)
  • Standard protocols are important
    • Enable cross-site interoperability
    • Enable shared infrastructure
    • But w/o standard APIs/SDKs, application portability is hard (different platforms access protocols in different ways)
Why Discuss Architecture?
  • Descriptive
    • Provide a common vocabulary for use when describing Grid systems
  • Guidance
    • Identify key areas in which services are required
  • Prescriptive
    • Define standard “Intergrid” protocols and APIs to facilitate creation of interoperable Grid systems and portable applications
One View of Requirements
  • Adaptation
  • Intrusion detection
  • Resource management
  • Accounting & payment
  • Fault management
  • System evolution
  • Etc.
  • Identity & authentication
  • Authorization & policy
  • Resource discovery
  • Resource characterization
  • Resource allocation
  • (Co-)reservation, workflow
  • Distributed algorithms
  • Remote data access
  • High-speed data transfer
  • Performance guarantees
  • Monitoring
Another View: “Three Obstacles to Making Grid Computing Routine”
  • New approaches to problem solving
    • Data Grids, distributed computing, peer-to-peer, collaboration grids, …
  • Structuring and writing programs
    • Abstractions, tools
  • Enabling resource sharing across distinct institutions
    • Resource discovery, access, reservation, allocation; authentication, authorization, policy; communication; fault detection and notification; …
The Systems Problem: Resource Sharing Mechanisms That …
  • Address security and policy concerns of resource owners and users
  • Are flexible enough to deal with many resource types and sharing modalities
  • Scale to large number of resources, many participants, many program components
  • Operate efficiently when dealing with large amounts of data & computation
Aspects of the Systems Problem
  • Need for interoperability when different groups want to share resources
    • Diverse components, policies, mechanisms
    • E.g., standard notions of identity, means of communication, resource descriptions
  • Need for shared infrastructure services to avoid repeated development, installation
    • E.g., one port/service/protocol for remote access to computing, not one per tool/application
    • E.g., Certificate Authorities: expensive to run
  • A common need for protocols & services
Hence, a Protocol-Oriented View of Grid Architecture, that Emphasizes …
  • Development of Grid protocols & services
    • Protocol-mediated access to remote resources
    • New services: e.g., resource brokering
    • “On the Grid” = speak Intergrid protocols
    • Mostly (extensions to) existing protocols
  • Development of Grid APIs & SDKs
    • Interfaces to Grid protocols & services
    • Facilitate application development by supplying higher-level abstractions
  • The (hugely successful) model is the Internet
Layered Grid Architecture (By Analogy to Internet Architecture)

[Diagram: Grid layers beside the Internet protocol architecture]
  • Application — the Application layer in the Internet architecture
  • Collective — “Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services
  • Resource — “Sharing single resources”: negotiating access, controlling use
  • Connectivity — “Talking to things”: communication (Internet protocols) & security; maps to the Transport and Internet layers
  • Fabric — “Controlling things locally”: access to, & control of, resources; maps to the Link layer
Protocols, Services, and APIs Occur at Each Level

[Diagram, top to bottom]
  • Applications (Languages/Frameworks)
  • Collective Services — Collective Service Protocols — Collective Service APIs and SDKs
  • Resource Services — Resource Service Protocols — Resource APIs and SDKs
  • Connectivity Protocols — Connectivity APIs
  • Fabric Layer — Local Access APIs and Protocols

Important Points
  • Built on Internet protocols & services
    • Communication, routing, name resolution, etc.
  • “Layering” here is conceptual, does not imply constraints on who can call what
    • Protocols/services/APIs/SDKs will, ideally, be largely self-contained
    • Some things are fundamental: e.g., communication and security
    • But, advantageous for higher-level functions to use common lower-level functions
The Hourglass Model
  • Focus on architecture issues
    • Propose set of core services as basic infrastructure
    • Use to construct high-level, domain-specific solutions
  • Design principles
    • Keep participation cost low
    • Enable local control
    • Support for adaptation
    • “IP hourglass” model

[Diagram: hourglass — applications and diverse global services above, core services at the neck, local OS below]

Where Are We With Architecture?
  • No “official” standards exist
  • But:
    • Globus Toolkit™ has emerged as the de facto standard for several important Connectivity, Resource, and Collective protocols
    • Technical specifications are being developed for architecture elements: e.g., security, data, resource management, information
    • Internet drafts submitted in security area
Fabric Layer Protocols & Services
  • Just what you would expect: the diverse mix of resources that may be shared
    • Individual computers, Condor pools, file systems, archives, metadata catalogs, networks, sensors, etc., etc.
  • Few constraints on low-level technology: connectivity and resource level protocols form the “neck in the hourglass”
  • Defined by interfaces not physical characteristics
Connectivity Layer Protocols & Services
  • Communication
    • Internet protocols: IP, DNS, routing, etc.
  • Security: Grid Security Infrastructure (GSI)
    • Uniform authentication, authorization, and message protection mechanisms in multi-institutional setting
    • Single sign-on, delegation, identity mapping
    • Public key technology, SSL, X.509, GSS-API
    • Supporting infrastructure: Certificate Authorities, certificate & key management, …

GSI: www.gridforum.org/security/gsi
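As a toy model of GSI-style single sign-on via delegation (not the real X.509 proxy-certificate mechanism), the sketch below issues a short-lived, signed “proxy” credential that services can verify. HMAC stands in for a public-key signature, and all names are invented.

```python
# Toy delegation sketch: the user mints a short-lived proxy credential.
import hashlib
import hmac
import time

USER_KEY = b"users-long-term-secret"   # hypothetical long-term credential

def issue_proxy(identity: str, lifetime_s: int) -> dict:
    expiry = int(time.time()) + lifetime_s
    payload = f"{identity}|{expiry}".encode()
    sig = hmac.new(USER_KEY, payload, hashlib.sha256).hexdigest()
    return {"identity": identity, "expiry": expiry, "sig": sig}

def verify_proxy(proxy: dict) -> bool:
    payload = f"{proxy['identity']}|{proxy['expiry']}".encode()
    expected = hmac.new(USER_KEY, payload, hashlib.sha256).hexdigest()
    unexpired = proxy["expiry"] > time.time()
    return hmac.compare_digest(expected, proxy["sig"]) and unexpired

p = issue_proxy("C=US/O=Grid/CN=Alice", lifetime_s=3600)
print(verify_proxy(p))   # True
```

The short lifetime is the point: a stolen proxy expires quickly, and the long-term credential never leaves the user's machine.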

Resource Layer Protocols & Services
  • Grid Resource Allocation Management (GRAM)
    • Remote allocation, reservation, monitoring, control of compute resources
  • GridFTP protocol (FTP extensions)
    • High-performance data access & transport
  • Grid Resource Information Service (GRIS)
    • Access to structure & state information
  • Others emerging: Catalog access, code repository access, accounting, etc.
  • All built on connectivity layer: GSI & IP

GRAM, GridFTP, GRIS: www.globus.org

Collective Layer Protocols & Services
  • Index servers aka metadirectory services
    • Custom views on dynamic resource collections assembled by a community
  • Resource brokers (e.g., Condor Matchmaker)
    • Resource discovery and allocation
  • Replica catalogs
  • Replication services
  • Co-reservation and co-allocation services
  • Workflow management services
  • Etc.

Condor: www.cs.wisc.edu/condor
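Resource brokering in the Condor Matchmaker style can be sketched as matching job requirements against machine advertisements. The attribute names below are illustrative, not real ClassAd attributes.

```python
# Matchmaker-style brokering sketch: pair a job with an advertised machine.
def matches(job: dict, machine: dict) -> bool:
    return (machine["arch"] == job["arch"]
            and machine["memory_mb"] >= job["min_memory_mb"]
            and machine["idle"])

machines = [
    {"name": "m1", "arch": "x86", "memory_mb": 512,  "idle": True},
    {"name": "m2", "arch": "x86", "memory_mb": 2048, "idle": True},
]
job = {"arch": "x86", "min_memory_mb": 1024}

chosen = next((m for m in machines if matches(job, m)), None)
print(chosen["name"] if chosen else "no match")   # m2
```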

Example: High-Throughput Computing System

[Layered diagram]
  • App: High Throughput Computing System
  • Collective (App): dynamic checkpoint, job management, failover, staging — a checkpoint repository (C-point protocol, API/SDK) and a compute resource (access protocol, API/SDK)
  • Collective (Generic): brokering, certificate authorities
  • Resource: access to data, access to computers, access to network performance data
  • Connect: communication, service discovery (DNS), authentication, authorization, delegation
  • Fabric: storage systems, schedulers

Example: Data Grid Architecture

[Layered diagram]
  • App: Discipline-Specific Data Grid Application
  • Collective (App): coherency control, replica selection, task management, virtual data catalog, virtual data code catalog, …
  • Collective (Generic): replica catalog, replica management, co-allocation, certificate authorities, metadata catalogs, …
  • Resource: access to data, access to computers, access to network performance data, …
  • Connect: communication, service discovery (DNS), authentication, authorization, delegation
  • Fabric: storage systems, clusters, networks, network caches, …

The Programming Problem
  • But how do I develop robust, secure, long-lived, well-performing applications for dynamic, heterogeneous Grids?
  • I need, presumably:
    • Abstractions and models to add to speed/robustness/etc. of development
    • Tools to ease application development and diagnose common problems
    • Code/tool sharing to allow reuse of code components developed by others
Grid Programming Technologies
  • “Grid applications” are incredibly diverse (data, collaboration, computing, sensors, …)
    • Seems unlikely there is one solution
  • Most applications have been written “from scratch,” with or without Grid services
  • Application-specific libraries have been shown to provide significant benefits
  • No new language, programming model, etc., has yet emerged that transforms things
    • But certainly still quite possible
Examples of Grid Programming Technologies
  • MPICH-G2: Grid-enabled message passing
  • CoG Kits, GridPort: Portal construction, based on N-tier architectures
  • GDMP, Data Grid Tools, SRB: replica management, collection management
  • Condor-G: workflow management
  • Legion: object models for Grid computing
  • Cactus: Grid-aware numerical solver framework
    • Note tremendous variety, application focus
MPICH-G2: A Grid-Enabled MPI
  • A complete implementation of the Message Passing Interface (MPI) for heterogeneous, wide area environments
    • Based on the Argonne MPICH implementation of MPI (Gropp and Lusk)
  • Requires services for authentication, resource allocation, executable staging, output, etc.
  • Programs run in wide area without change
    • Modulo accommodating heterogeneous communication performance
  • See also: MetaMPI, PACX, STAMPI, MAGPIE

www.globus.org/mpi

A Grid Application Scenario
  • A distributed simulation involving 10 supercomputers at 10 different locations
    • How do you know where they are?
    • How do you identify yourself to each?
    • How do you get permission to use them?
    • How do you submit remote jobs?
    • How do you get access to resources on all the machines simultaneously?
    • What happens if a machine fails?
    • How are input/output files managed?
Basic Grid Services
  • Security
    • Authentication: both client and server
    • Authorization: what privileges does the client have?
    • Access control: Sites want local control of operations that remote users are allowed to perform
    • Confidential data transfer using encryption
Basic Grid Services (cont.)
  • Resource management
    • Mechanism for submitting jobs to remote locations
    • Local policies for use, management, resource configuration
    • Scheduling of important resources
      • Coordinating scarce, expensive resources (e.g., cooperating supercomputers)
    • Advanced reservations to guarantee:
      • Quality of service
      • Completion of operations (e.g., reserve disk space for a large data transfer)
Basic Grid Services (cont.)
  • Information Services
    • Register and query information about grid resources
      • Where are all the Cray T3E’s in the grid?
      • Where is a storage system with 250 gigabytes of free space that transfers data at 1 gigabit/sec?
    • Centerpiece for many Grid components
    • Performance measurement services
      • What is the current bandwidth of the link from jupiter.isi.edu to apogee.sdsc.edu?
    • Dynamic environment: assume the information service contains old information
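An information-service query like the examples above amounts to filtering resource descriptions against criteria. A toy registry sketch follows; the field names are invented, and a real deployment would sit behind a protocol such as LDAP.

```python
# Toy information service: resource descriptions queried by attribute.
resources = [
    {"type": "storage", "free_gb": 400, "bandwidth_gbps": 1.2, "host": "a"},
    {"type": "storage", "free_gb": 120, "bandwidth_gbps": 0.6, "host": "b"},
    {"type": "compute", "model": "Cray T3E", "host": "c"},
]

def query(**criteria):
    """Numeric criteria are minimums; other criteria must match exactly."""
    def ok(r):
        for key, want in criteria.items():
            if key not in r:
                return False
            if isinstance(want, (int, float)):
                if r[key] < want:
                    return False
            elif r[key] != want:
                return False
        return True
    return [r for r in resources if ok(r)]

# "Where is a storage system with >= 250 GB free and >= 1 Gbit/s?"
print([r["host"] for r in query(type="storage", free_gb=250,
                                bandwidth_gbps=1.0)])   # ['a']
```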
Basic Grid Services (cont.)
  • Efficient Data Transfers
    • Secure (authentication, encryption)
    • Parallel transfers
    • Partial file transfers
    • Third-party transfers
    • Reliable transfers
  • Replica Management Service
    • Large (petabyte-scale) datasets
    • Multiple stored and cached copies
    • Select the “best” copy with best performance
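Selecting the “best” copy can be as simple as ranking replicas by estimated transfer time. A sketch with invented sites and numbers:

```python
# Rank replicas of one dataset by estimated transfer time
# (size / measured bandwidth); lowest estimate wins.
replicas = [
    {"site": "cern",    "bandwidth_mb_s": 12.0},
    {"site": "fnal",    "bandwidth_mb_s": 55.0},
    {"site": "caltech", "bandwidth_mb_s": 30.0},
]
size_mb = 5000

best = min(replicas, key=lambda r: size_mb / r["bandwidth_mb_s"])
print(best["site"])   # fnal
```

Real selectors would also weigh staleness of the measurements, load, and cost, but the ranking step looks the same.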
Basic Grid Services (cont.)
  • Fault detection
  • Detect and report “failure” of component of a computation
    • Limited by ability to distinguish between network partition and system failure
  • Goal: make low-level operations reliable
  • No libraries for checkpoint and restart
    • Can’t “checkpoint” a socket
    • Only application knows how to checkpoint and restart
    • Likewise, storage system must do logging
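Fault detection is typically heartbeat-based, and, as the slide notes, a missed heartbeat cannot distinguish a crashed node from a network partition; a monitor can only report *suspected* failures after a timeout. A minimal sketch:

```python
# Heartbeat-based failure suspicion: nodes report in periodically, and
# anyone silent for longer than the timeout is suspected (not proven)
# to have failed.
import time

class HeartbeatMonitor:
    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_seen = {}

    def beat(self, node: str):
        self.last_seen[node] = time.monotonic()

    def suspected_failures(self):
        now = time.monotonic()
        return [n for n, t in self.last_seen.items()
                if now - t > self.timeout_s]

mon = HeartbeatMonitor(timeout_s=0.05)
mon.beat("node-a")
mon.beat("node-b")
time.sleep(0.06)
mon.beat("node-b")               # node-a has gone silent
print(mon.suspected_failures())  # ['node-a']
```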
Major Grid ComputingInfrastructure Projects
  • The Globus Project
    • “Bag of services” model for grid computing
    • USC Information Sciences Institute and Argonne National Laboratory (Chicago)
    • We will use Globus for most of the examples in this class
  • The Legion Project
    • Object-oriented approach to grid computing
  • The Condor Project
    • Schedule computations on pool of resources
The Globus Approach
  • The Globus toolkit provides a range of basic Grid services
    • Security, information, fault detection, communication, resource management, ...
  • These services are simple and orthogonal
    • Can be used independently, mix and match
    • Programming model independent
  • For each there are well-defined APIs
  • Standards are used extensively
    • E.g., LDAP, GSS-API, X.509, ...
Hardware:What is different in a grid?
  • Heterogeneous hardware environment
    • computing platforms
    • network connections
    • storage systems and caches
  • Wide-area distribution
    • Wide-area network latency and bandwidth
    • Resources in different administration domains
  • Dynamic environment
    • Resources enter and leave grid
Software: Issues in Distributed Operating Systems
  • Communication models
    • Client-Server Model
    • Remote procedure call
    • Group communication
  • In a grid:
    • Algorithms must tolerate wide-area latency for message transfers
    • Avoid large numbers of messages
    • Typically perform larger transfers, initiate remote jobs rather than procedure calls
Software: Issues in Distributed Operating Systems
  • Synchronization
    • Clock synchronization
    • Election algorithms: determine a coordinator
    • Atomic transactions
  • In a grid:
    • With wide-area latencies, typically perform synchronization on larger grain
    • Can implement atomic operations
Software: Issues inDistributed Operating Systems
  • Processes and Processors
    • Threads
    • Allocating Processors
    • Scheduling and co-scheduling resources
    • Fault tolerance
  • In a grid: scheduling, allocation, & fault tolerance issues get more complicated in the wide area environment
Software: Issues in a Distributed Operating System
  • Distributed file systems
    • File service that reads and writes file, controls access
    • Creating, deleting & managing directories
    • Naming
    • Sharing
    • Caching and consistency
    • Replication and updates
  • In a grid, same issues complicated by wide area distribution, different administrative domains, enormous data sets
Software: Issues for a Distributed Operating System
  • Distributed Shared Memory
    • Generally applies to machines in a LAN
    • Each processor contains memory corresponding to part of the shared memory address space
    • Each processor caches data from other processors
    • Many consistency algorithms
  • In a grid: EASIER! Globus does not support a shared address space
    • Legion has a single shared object space
Summary: Heterogeneity makes things harder in a grid
  • Heterogeneous software and hardware
  • Different administrative domains
  • Different policies for use and management of local resources
    • Must do coordinated scheduling
  • Different security policies
  • Dynamic environment
    • Must discover resources
    • Robust in the presence of network, resource failures
Where do computational grids get their names?
  • “A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities.”
  • Name (and definition) imply an analogy to the electric power grid
    • Power inexpensive, universally available
    • Enabled new devices and industries
An Infrastructure Analogy:The Electric Power Grid
  • Revolutionary development: transmission and distribution of electricity
  • Before: power accessible in crude forms
      • human work
      • horses
      • water power
      • steam engines
  • Today: cheap, reliable power universally available
Electric Power Grid (cont.)
  • Power to billions of devices
  • Efficient
  • Low-cost
  • Reliable
  • North America: 10,000 generators linked to billions of outlets
  • Heterogeneous components, distributed ownership
  • Interconnections between regions: share reserve capacity, trade excess power
electric power grid cont1
Electric Power Grid (cont.)
  • Required more than just technology
    • Regulatory, political and institutional development
    • Infrastructure for monitoring and management
  • Huge social impact
    • Fundamentally changed work and home life
  • Huge environmental impact
    • Consume resources, generate pollution, global warming, …
another infrastructure analogy railroads and the rise of chicago
Another Infrastructure Analogy: Railroads and the Rise of Chicago
  • Early 1800s: Chicago was “a small field of onions on a very large lake”
  • Impact of railroad infrastructure:
    • Trains used for shipment of goods
    • Chicago a “cache” for agricultural products
    • New financial institutions, technologies and industries
      • Board of Trade
      • Stockyards, refrigerated rail cars
    • Midwest’s native ecosystems destroyed
      • Bison replaced by cattle
      • Prairie replaced by wheat and corn fields
new infrastructure has serious social consequences
New Infrastructure Has Serious Social Consequences
  • More examples: highway system, telephone network, banking system
  • What changes will the Grid infrastructure bring about?
    • Enable unimagined applications
    • Likely to have positive and negative effects
  • Are we ready to deal with the rate of change?
    • Processing power, bandwidth, storage all growing exponentially
based on infrastructure analogies desired characteristics of grids
Based on Infrastructure Analogies: Desired Characteristics of Grids
  • Pooling of resources
    • Compute cycles, data, people, sensors
  • Dependable service
    • Predictable
    • Sustained performance
    • Often high-performance
grid characteristics cont
Grid Characteristics (cont.)
  • Consistent service
    • Standard services available
    • Via standard interfaces
    • Enable application development
  • Pervasive
    • Services always available
  • Inexpensive
    • Otherwise not widely accepted and used
application examples
Application Examples
  • Online instrumentation
  • Distributed supercomputing
  • Collaborative engineering
  • High-throughput computing
  • Remote job submission, meta-queueing
grid applications distributed supercomputing
Grid Applications: Distributed Supercomputing
  • Solve problems that cannot be solved using a single system
  • Example application: distributed, interactive simulation involving 100,000s of entities
  • Difficult issues:
    • “Co-scheduling” of scarce, expensive resources
    • Algorithms that scale to many nodes and tolerate latency
    • Achieving and sustaining high performance across heterogeneous systems
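The "co-scheduling" bullet above amounts to an all-or-nothing reservation: a multi-site job may start only if every required resource grants the same time slot, and any tentative holds are released otherwise. A minimal sketch under that assumption (the `Site` class is invented, not a Globus interface):

```python
# Sketch of co-allocation: reserve a time slot on every site, or on none.

class Site:
    def __init__(self, name):
        self.name = name
        self.reserved = set()            # time slots already taken

    def try_reserve(self, slot):
        if slot in self.reserved:
            return False
        self.reserved.add(slot)
        return True

    def release(self, slot):
        self.reserved.discard(slot)

def co_allocate(sites, slot):
    """Reserve `slot` on every site, or roll back and reserve on none."""
    granted = []
    for site in sites:
        if site.try_reserve(slot):
            granted.append(site)
        else:                            # one site refused: roll back all
            for s in granted:
                s.release(slot)
            return False
    return True

ncsa, caltech = Site("NCSA"), Site("Caltech")
caltech.try_reserve(9)                   # Caltech is busy at slot 9
assert co_allocate([ncsa, caltech], 9) is False
assert 9 not in ncsa.reserved            # NCSA's tentative hold was released
assert co_allocate([ncsa, caltech], 10) is True
```

The hard part in practice is that each site's local scheduler, under its own administrative domain, must agree to honor the reservation.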
globus example distributed supercomputing
Globus Example: Distributed Supercomputing
  • SF-Express distributed, interactive simulation
    • 100K vehicles (2002 goal) using 13 computers, 1,386 nodes, 9 sites
    • Largest DIS ever done
  • Globus mechanisms for:
    • Resource allocation
    • Distributed startup
    • I/O and configuration
    • Fault detection
  • [Diagram: participating sites and machines, including NCSA Origin, Caltech Exemplar, CEWES SP, and Maui SP]
  • P. Messina et al., Caltech

grid applications high throughput computing
Grid Applications: High-Throughput Computing
  • Schedule large numbers of loosely-coupled or independent tasks
  • Tie together idle workstations
    • Put unused cycles to work
  • Example applications: chip design, solving cryptographic problems
  • Systems:
    • Condor: manages pool of hundreds of workstations around the world
    • Entropia: startup company
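The pattern above, draining a pool of independent tasks with whatever workstations are idle, can be mimicked in miniature with a shared queue. This is only a Condor-style sketch, not Condor's actual API:

```python
# Minimal high-throughput sketch: worker threads stand in for idle
# workstations and drain a queue of independent tasks.
import queue
import threading

def worker(tasks, results):
    while True:
        try:
            n = tasks.get_nowait()       # claim the next independent task
        except queue.Empty:
            return                       # pool drained: workstation goes idle
        results.put((n, n * n))          # stand-in for real work
        tasks.task_done()

tasks, results = queue.Queue(), queue.Queue()
for n in range(100):
    tasks.put(n)

workers = [threading.Thread(target=worker, args=(tasks, results))
           for _ in range(8)]            # eight "idle workstations"
for w in workers:
    w.start()
for w in workers:
    w.join()

done = dict(results.queue)
assert len(done) == 100 and done[7] == 49
```

Because the tasks share no state, adding workstations scales throughput almost linearly, which is exactly what makes this workload a good fit for scavenged idle cycles.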
grid applications data intensive computing
Grid Applications: Data-Intensive Computing
  • Geographically distributed data repositories, digital libraries and databases
  • Up to petabytes of data
  • Example applications: High-energy physics experiments, climate modeling, human genome project databases
  • Challenging Issues:
    • High-performance data transfers in wide-area environments
    • Management of caching and replication
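Replica management ultimately reduces to a selection problem: given several copies of a dataset, fetch from the one with the lowest estimated transfer time. A crude sketch, with invented bandwidth and latency figures:

```python
# Hedged sketch of replica selection; the cost model and the site
# figures below are placeholders, not measured values.

def transfer_time(size_gb, replica):
    """Seconds to fetch `size_gb` from this replica (crude estimate)."""
    return replica["latency_s"] + size_gb * 8 / replica["bandwidth_gbps"]

def best_replica(size_gb, replicas):
    return min(replicas, key=lambda r: transfer_time(size_gb, r))

replicas = [
    {"site": "siteA", "bandwidth_gbps": 10.0, "latency_s": 0.15},
    {"site": "siteB", "bandwidth_gbps": 1.0,  "latency_s": 0.03},
]
# For a 100 GB dataset, raw bandwidth dominates latency:
assert best_replica(100, replicas)["site"] == "siteA"
```

Real systems must also decide when to create new replicas and how to keep them consistent as the underlying archives change.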
globus data intensive computing

Globus Data-Intensive Computing
  • Example question: “How do midwest flood frequencies under 2xCO2 scenario compare with historical data?”
  • Expressed as a request: “Access datasets A, B; run A->meso->hydro; compare result with B”
  • [Diagram: a query manager, cache manager, and resource manager coordinate caches in front of two historical data archives and a simulation data archive; an analysis engine runs the meso, hydro, and compare steps]

grid applications collaborative computing
Grid Applications: Collaborative Computing
  • Enabling and enhancing human interactions
    • Virtual shared spaces
    • Shared resources: data archives, simulations
  • Example applications: collaborative design or collaborative exploration of data sets
  • Challenges:
    • Real-time requirements of human perception
    • Rich interactions
globus example collaborative engineering
Globus Example: Collaborative Engineering
  • Manipulate shared virtual space:
    • Simulation components
    • Multiple flows: control, text, video, audio, database, simulation, tracking, haptics, rendering
  • Uses Globus communication
  • CAVERNsoft: UIC Electronic Visualization Laboratory

grids changing science
Grids Changing Science
  • NSF National Earthquake Engineering Center
    • Integrated instrumentation, collaboration, simulation environment
  • National Environmental
  • High-energy Physics Grid (GriPhyn)
  • CERN Data Grid
current and future applications
Current and Future Applications
  • Interesting applications exist today
  • More sophisticated applications will follow
  • Characteristics
    • Appetite for resources (CPU, memory, storage)
    • Synchronization
    • Only satisfied by multiple systems
    • Need high availability of resources
who will use grids
Who will use grids?

1. Governments

  • Disaster response, national defense, national collaboratory, strategic computing reserve

2. Private grids for institutions

  • Relatively low-cost, small-scale
  • Central management
  • Example: hospitals and medical personnel
who will use grids cont
Who will use grids? (cont.)

3. “Virtual Grid”: Multi-institution collaboration

    • Large, fluid, highly-distributed community
    • Hundreds of researchers and students around the world
    • Share instruments, data archives, software, computers
  • Public Grid
    • Enormous community
    • Consumers, service providers, resource providers, network providers
summary
Summary
  • Grids will change the way we do science and engineering
  • Transition of services and applications to production use
  • Future will see increased sophistication and scope of services, tools, and applications