Grid Computing - A Primer

Sridhara Dasu, Department of Physics, U. Wisconsin

  • Grid Computing

    • What is the buzz all about?

    • What is the promise?

  • My Perspective

    • What is in it for me?

    • How is it working for us?

      • In UW-Madison

      • And, beyond …

  • Conclusion

    • Why should you be interested?

    • What are the consequences for you?

Acknowledgements: Condor Team, Globus Team, I. Foster/Argonne, M. Livny/Wisconsin, D. Bradley/Wisconsin

Sridhara Dasu

Grid Computing is in the News …

The Opportunity (or Challenge): Computational Cornucopia

  • Abundant computation, data, bandwidth

    • In many fields, too much data—not too little

    • Simulations of unprecedented accuracy

    • Ubiquitous internet → distance not a barrier

  • But as a consequence

    • Rate of change accelerates

    • Complex problems → multidisciplinary distributed teams & sharing of resources & expertise

    • Without infrastructure, you can’t compete

Why Distributed Teams Are Important

  • Increasingly challenging & complex problems

    • Particle physics, Global change, Cosmology, Life sciences

    • Manufacturing, Mineral exploration

    • Film production, Game development, …

  • Required expertise & resources also distributed

    • People

    • Computational capability

    • Data

    • Sensors

The Grid

“Resource sharing & coordinated problem solving in dynamic … virtual organizations”

  • Enable integration of distributed service & resources

  • Using general-purpose protocols & infrastructure

  • To achieve useful qualities of service

“The Anatomy of the Grid”, Foster, Kesselman, Tuecke, 2001

What is a Grid?

  • The key criteria:

    • Coordinates distributed resources …

    • Uses standard, open, general-purpose protocols and interfaces …

    • Delivers non-trivial qualities of service.

  • What is not a Grid?

    • A cluster, a network attached storage device, a scientific instrument, a network, etc.

    • Each is an important component of a Grid, but by itself does not constitute a Grid

Why Should You Care?

1) Grid is a promising technology [Vision]

  • It ushers in a virtualized, collaborative, distributed world

2) Grids are being commissioned now [Reality]

  • Grids are built (not bought), but are delivering real benefits in academic and commercial settings

3) An open Grid is to your advantage [Future]

  • Standards are being defined now that will determine the future of this technology

The Power Grid: On-Demand Access to Electricity

Decouple production & consumption, enabling

  • On-demand access

  • Economies of scale

  • Consumer flexibility

  • New devices

Quality, economies of scale


But Computing Isn’t Really Like Electricity!

  • How about “access computing resources like we access Web content”?

    • We have no idea where a website is, or on what computer or operating system it runs

  • Two interrelated opportunities

    1) Enhance economy, flexibility, access by virtualizing computing resources

    2) Deliver entirely new capabilities by integrating distributed resources



Application Virtualization

Infrastructure Virtualization

  • Dynamic & intelligent provisioning

  • Automatic failover

Source: The Grid: Blueprint for a New Computing Infrastructure (2nd Edition), 2004

Local Clusters to Global Grids

Cluster Grid → Enterprise Grid → Global Grid

Grid Deployment Trends



[Figure: chart with axis labels “Mission Criticality” and “Department / Enterprise / Collaboration / Internet”]

Transparent Service

Webster says: Autonomic = acting or occurring involuntarily <autonomic reflexes>

Multidisciplinary Teams: Problem Solving in the 21st Century

  • Teams organized around common goals

    • Communities: “Virtual organizations”

  • With diverse membership & capabilities

    • Heterogeneity is a strength not a weakness

  • And geographic and political distribution

    • No location/organization possesses all required skills and resources

  • Must adapt as a function of the situation

    • Adjust membership, reallocate responsibilities, renegotiate resources

Challenging Technical Requirements

  • Dynamic formation and management of virtual organizations

  • Discovery & online negotiation of access to services: who, what, why, when, how

  • Configuration of applications and systems able to deliver multiple qualities of service

  • Autonomic management of distributed infrastructures, services, and applications

  • Management of distributed state

  • Open, extensible, evolvable infrastructure

The Globus Project™: Making Grid Computing a Reality (since 1996)

  • Close collaboration with real Grid projects in science and industry

  • The Globus Toolkit®: Open source software base for building Grid infrastructure and applications

  • Development and promotion of standard Grid protocols to enable interoperability and shared infrastructure

  • Development and promotion of standard Grid software APIs to enable portability and code sharing

  • Global Grid Forum: We co-founded GGF to foster Grid standardization and community

Globus Toolkit 2: Key Protocols

  • The Globus Toolkit v2 (GT2) centers around four key protocols

    • Connectivity layer:

      • Security: Grid Security Infrastructure (GSI)

    • Resource layer:

      • Resource Management: Grid Resource Allocation Management (GRAM)

      • Information Services: Grid Resource Information Protocol (GRIP)

      • Data Transfer: Grid File Transfer Protocol (GridFTP)

  • Also key collective layer protocols

    • Info Services, Replica Management, etc.


Est. 1986


High Throughput Computing


Resource Management

UW Condor Project - Miron Livny’s group (

  • Predates Globus

  • High throughput computing on commodity resources

  • Successful enterprise level deployment

    • UW Computer Science Condor pool

    • UW Condor pools in other departments

    • INFN/Italy pools

    • Inter-pool flocking

    • Also, some industrial users

The Layers of Condor

[Figure: Condor layer diagram; recoverable labels: Application Agent, Customer Agent, Owner Agent, Remote Execution Agent, Local Resource Manager]

Complete solution for resource management

A Grid Job

  • Must be able to run in the background: no interactive input, windows, GUI, etc.

  • Can still use STDIN, STDOUT, and STDERR (the keyboard and the screen), but files are used for these instead of the actual devices

  • Organize data files, input/output
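The requirement that a grid job read and write files instead of the keyboard and screen can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name run_batch, the file names job.in, job.out, and job.err, and the toy one-line job are all hypothetical and are not part of Condor or Globus.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_batch(command, stdin_path, stdout_path, stderr_path):
    """Run a command non-interactively with all three standard streams bound to files."""
    with open(stdin_path, "rb") as fin, \
         open(stdout_path, "wb") as fout, \
         open(stderr_path, "wb") as ferr:
        completed = subprocess.run(command, stdin=fin, stdout=fout, stderr=ferr)
    return completed.returncode


# Hypothetical stand-in for a real job: upper-cases whatever arrives on STDIN.
JOB = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]

workdir = Path(tempfile.mkdtemp())
(workdir / "job.in").write_text("hello grid\n")

exit_code = run_batch(JOB, workdir / "job.in", workdir / "job.out", workdir / "job.err")
result = (workdir / "job.out").read_text()
errors = (workdir / "job.err").read_text()
```

A job that behaves correctly when launched this way, with no terminal attached, is a reasonable candidate for background execution on a remote resource.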

Condor Universes

  • The Standard Universe

    • Checkpoints executable state

    • Job migration to other resources to continue execution

    • Transparent IO redirection to user submit machines

    • Robust against resource preemption for higher priority tasks + resource failures

    • Limitations on applications (e.g., shared libraries, multithreading)

  • The Vanilla Universe

    • Traditional batch jobs with no limitations

    • External solutions for IO redirection

    • Not robust against preemption or resource failures

  • The Globus Universe (new)

    • Adapted to emerging Grid standards

    • Part of Globus Toolkit

Condor-G: Globus + Condor

  • Globus

    • Middleware deployed across entire Grid

    • Remote access to computational resources

    • Dependable, robust data transfer

  • Condor

    • Job scheduling across multiple resources

    • Strong fault tolerance with checkpointing and migration

    • Layered over Globus as “personal batch system” for the Grid

Condor-G

[Figure: layer diagram; recoverable labels: Globus Toolkit; Fabric (processing, storage, communication)]

Creating a Submit Description File

  • A plain ASCII text file

  • Tells Condor-G about your job:

    • Which executable, grid site, input, output and error files to use, command-line arguments, environment variables, etc.

  • Can describe many jobs at once (a “cluster”) each with different input, arguments, output, etc.
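As a rough sketch of how one submit description can cover a whole cluster of jobs, the Python helper below writes out such a file. The helper name make_submit_description, the file-name pattern job.N.in, and the gatekeeper address are hypothetical placeholders; the $(Process) macro and the "Queue N" command are standard condor_submit features that give each job in the cluster its own index, starting at 0.

```python
def make_submit_description(executable, grid_site, n_jobs):
    """Return submit-description text for a cluster of n_jobs Condor-G jobs.

    $(Process) is expanded by condor_submit to 0, 1, ..., n_jobs - 1,
    so every job gets its own argument and its own input/output/error files.
    """
    lines = [
        "# Generated submit description (illustrative sketch)",
        "Universe        = globus",
        f"GlobusScheduler = {grid_site}",
        f"Executable      = {executable}",
        "Arguments       = $(Process)",
        "Input           = job.$(Process).in",
        "Output          = job.$(Process).out",
        "Error           = job.$(Process).err",
        "Log             = job.log",
        f"Queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


# The site address below is a made-up example, not a real gatekeeper.
submit_text = make_submit_description("my_job", "gatekeeper.example.edu/jobmanager", 3)
print(submit_text)
```

Generating the file from a script is a design choice for illustration only; a hand-written plain-text file with the same lines works equally well.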

Simple Submit Description File

# Simple condor_submit input file

# (Lines beginning with # are comments)

# NOTE: the words on the left side are not

# case sensitive, but filenames are!

Universe = globus

GlobusScheduler =

Executable = my_job

Queue


Running condor_submit

  • You give condor_submit the name of the submit file you have created

  • condor_submit parses the file, checks for errors, and creates a “ClassAd” that describes your job(s)

  • Sends your job’s ClassAd(s) and executable to the Condor-G schedd, which stores the job in its queue

    • Atomic operation, two-phase commit

  • View the queue with condor_q

condor_submit sequence

[Figure: submit sequence diagram; recoverable labels: Globus Resource, Gate Keeper, Local Job]

Running condor_submit

% condor_submit my_job.submit-file

Submitting job(s).

1 job(s) submitted to cluster 1.

% condor_q

-- Submitter: : <> :


1.0 frieda 6/16 06:52 0+00:00:00 I 0 0.0 my_job

1 jobs; 1 idle, 0 running, 0 held


DAGMan

  • Directed Acyclic Graph Manager

  • DAGMan allows you to specify the dependencies between your Condor-G jobs, so it can manage them automatically for you.

  • (e.g., “Don’t run job “B” until job “A” has completed successfully.”)

What is a DAG?

[Figure: DAG with nodes Job A, Job B, Job C, and Job D]

  • A DAG is the data structure used by DAGMan to represent these dependencies.

  • Each job is a “node” in the DAG.

  • Each node can have any number of “parent” or “children” nodes – as long as there are no loops!

Defining a DAG

[Figure: DAG with nodes Job A, Job B, Job C, and Job D]

  • A DAG is defined by a .dag file, listing each of its nodes and their dependencies:

    # diamond.dag

    Job A a.sub

    Job B b.sub

    Job C c.sub

    Job D d.sub

    Parent A Child B C

    Parent B C Child D

  • Each node runs the Condor-G job specified by its accompanying Condor submit file
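To make the dependency rules concrete, here is a small Python sketch that reads the Job and Parent/Child lines of diamond.dag and works out which nodes may run together. It is not DAGMan's implementation: parse_dag and run_waves are hypothetical helper names, only the two keywords shown on the slide are handled, and the ordering comes from the standard-library graphlib module (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter


def parse_dag(text):
    """Return {node: set_of_parents} from 'Job' and 'Parent ... Child ...' lines."""
    parents = {}
    for raw in text.splitlines():
        words = raw.split()
        if not words or words[0].startswith("#"):
            continue  # skip blank lines and comments
        keyword = words[0].lower()
        if keyword == "job":
            parents.setdefault(words[1], set())
        elif keyword == "parent":
            split = [w.lower() for w in words].index("child")
            for child in words[split + 1:]:
                parents.setdefault(child, set()).update(words[1:split])
    return parents


def run_waves(parents):
    """Group nodes into waves; every node in a wave has all of its parents finished."""
    sorter = TopologicalSorter(parents)
    sorter.prepare()  # raises CycleError if the graph contains a loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # pretend every job in this wave succeeded
    return waves


DIAMOND = """
# diamond.dag
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
Parent A Child B C
Parent B C Child D
"""

waves = run_waves(parse_dag(DIAMOND))
print(waves)
```

For the diamond, A runs first, B and C become ready together once A finishes, and D waits for both. A graph with a loop is rejected by prepare(), which mirrors the "no loops" rule.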

What about Data?

Data Placement* (DaP) must be an integral part of the end-to-end solution

Stork (Another UW-Computer Science Product)

  • Schedules, runs, monitors, and manages Data Placement (DaP) jobs in a heterogeneous Grid environment & ensures that they complete.

  • Stork is to DaP jobs what Condor(-G) is to computational jobs.

  • Just submit a bunch of DaP jobs and then relax.

  • Interoperates with various storage services

* Space management and Data transfer

Full Condor-G Capabilities

UW “Enterprise Level” Grid

  • Condor pool at CS

    • 1000 ~1 GHz Intel CPUs

  • Condor pools at various departments

    • 100 ~2.4 GHz Intel CPUs at Physics, etc.

    • New: Grid Laboratory of Wisconsin

  • Condor jobs flock from various departments to CS Pool as needed

  • Excellent utilization

    • Especially when the Condor Standard Universe is used

      • Preemption, Checkpointing, Job Migration

Grid Laboratory of Wisconsin

2003 Initiative funded by NSF/UW

Six GLOW Sites

  • Computational Genomics, Chemistry

  • AMANDA, IceCube, Physics/Space Science

  • High Energy Physics/CMS, Physics

  • Materials by Design, Chemical Engineering

  • Radiation Therapy, Medical Physics

  • Computer Science

Phase-1 already has ~300 Xeon CPUs

Expect to grow to about 700 CPUs + 100 TB disk

Condor/GLOW Ideas

  • Exploit commodity hardware for high throughput computing

    • The base hardware is the same at all sites

    • Local configuration optimization as needed

      • e.g., Number of CPU elements vs storage elements

    • Must meet global requirements

      • It turns out that our initial assessment calls for almost identical configuration at all sites

  • Managed locally at 6 sites

    • Shared globally across all sites

    • Higher priority for local jobs

The Large Hadron Collider

Building and commissioning the accelerator and detectors, and extracting interesting physics from this massive data sample, are big challenges.

Event Filtering Before Archival

Output: 1 MB/event @ 100 Hz → ~1 petabyte per year
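The petabyte-per-year figure can be checked with a short calculation. The event size and rate come from the slide; the number of data-taking seconds per year (about 10^7) is an assumption added here as a common rule of thumb, not a figure from the talk.

```python
EVENT_SIZE_BYTES = 1e6        # 1 MB per event (from the slide)
EVENT_RATE_HZ = 100           # events written per second (from the slide)
LIVE_SECONDS_PER_YEAR = 1e7   # ASSUMPTION: rough data-taking time per year

bytes_per_second = EVENT_SIZE_BYTES * EVENT_RATE_HZ
petabytes_per_year = bytes_per_second * LIVE_SECONDS_PER_YEAR / 1e15

print(f"{bytes_per_second / 1e6:.0f} MB/s, about {petabytes_per_year:.1f} PB per year")
```

With a full calendar year of roughly 3.15 x 10^7 seconds the total would be about three times larger, so the slide's figure is best read as an order-of-magnitude estimate.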

Analysis Teams + Resources

Input: ~10^9 events (petabyte databases)

Complex algorithms developed by collaborating physicists

Output: Publications with ~100s of selected events

Simulation: Early Grid Deployment

  • Detailed simulations necessary

    • Large numbers of background events need to be simulated

      • Dominated by fluctuations of tails

  • Computation scale

    • Background events occur on every crossing - 40 MHz

      • Up to 10 minutes on a 1 GHz CPU to simulate full event

      • 2 × 10^9 s CPU time to simulate 1 s of LHC operation

      • Requires 1000 CPUs running for 1 month

    • CMS has a large number of detector channels, 10^8

      • Each event requires 1-10 MB storage space

      • 32-320 TB needed for 1 s of LHC operation

    • Optimizing CPU and data storage

      • Simulate in bins and reuse some data

  • Pleasantly parallel application

    • Ideal Grid testbed candidate

      • Used UW “enterprise level” classic Condor grid successfully

      • With Grid2003, used a nationwide Globus/Condor-G-based true grid
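The computation-scale numbers above can be cross-checked against each other. The sketch below uses only the slide's totals (2 × 10^9 CPU seconds, 1000 CPUs, 32-320 TB at 1-10 MB per event). The derived event count and the average CPU time per event are inferences from those totals, not figures stated in the talk.

```python
CPU_SECONDS_TOTAL = 2e9                  # CPU time for 1 s of LHC operation (slide)
N_CPUS = 1000                            # CPUs running in parallel (slide)
STORAGE_BYTES = (32e12, 320e12)          # 32-320 TB (slide)
EVENT_BYTES = (1e6, 10e6)                # 1-10 MB per event (slide)
SECONDS_PER_DAY = 86400

# Both ends of the storage range imply the same number of simulated events.
events_low = STORAGE_BYTES[0] / EVENT_BYTES[0]
events_high = STORAGE_BYTES[1] / EVENT_BYTES[1]

# INFERRED: average CPU time per event implied by the slide's totals.
cpu_seconds_per_event = CPU_SECONDS_TOTAL / events_low

# Wall-clock time if the work is spread evenly over all CPUs.
wall_days = CPU_SECONDS_TOTAL / N_CPUS / SECONDS_PER_DAY

print(f"{events_low:.1e} events, {cpu_seconds_per_event:.1f} s/event, {wall_days:.1f} days")
```

The totals imply about 3.2 × 10^7 events at roughly a minute of CPU time each on average, well under the 10-minute upper bound, and about 23 days of wall-clock time on 1000 CPUs, in line with the "1 month" estimate.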

Tapping UW “Enterprise Level” Grid

We tapped resources on the UW campus opportunistically

We produced more events in 2003 than most other CMS collaborators, thanks to our UW enterprise-level grid and the Condor Standard Universe!

2004 numbers are through March, and we were also running our new C++ simulation code, which is a factor of 2 slower. We have typically used less than 50% of available resources and ran for about 30% of the year.

Cost Savings from Grids

  • The size of cost savings from grids will come in two waves:

    • First from the adoption of clusters

    • Then from the adoption of Enterprise Grids

  • Firms using Clusters estimate that cost savings will be small at first, but will grow to 15% to 30% savings in IT Costs in 2005-2008.

  • Firms planning to use Enterprise Grids estimate that they will experience a second wave of benefits. Savings will grow to 15% to 30% by 2007-2010.

Source: Robert Cohen, “Grid Computing: Projected Impact on North Carolina’s Economy & Broadband Use through 2010,” Rural Internet Access Authority, September 2003.

Grid drawbacks being addressed now

  • Low utilization of enterprise resources

  • High cost of provisioning for peak demand

  • Inadequate resources prevent use of advanced applications

  • Lack of information integration

Cyberinfrastructure & VOs Relevance Far Beyond Science

1) Virtualization of information technology

  • From vertical silos to on-demand access

  • Improve efficiency of delivery, increase flexibility of use

  • E.g., financial services, e-commerce

2) New applications, products, & services enabled by much computation & data

  • Media, life sciences, manufacturing, seismic exploration, online gaming, etc., etc., etc.

The Value of Grid Computing: IBM Perspective



Higher Quality of Service

Increased Productivity


Reduced Complexity & Cost

Improved Resiliency

Grids: HP Perspective

computing utility or GRID

virtual data center


programmable data center

grid-enabled systems


Tru64, HP-UX, Linux


Open VMS clusters, TruCluster, MC ServiceGuard


shared, traded resources

Grid Vision, Marketing, and Reality

  • Vision

    • Computing resources can be shared like content on the Web

  • Marketing

    • Have we got a Grid for you!

      • [Data, compute, knowledge, information, desktop, PC, enterprise, cluster, …]

  • Reality

    • Commercial products mostly non-interoperable

    • Open source tools offer de facto standards, but are also far from a complete solution

Standards Matter!

  • Open, standard protocols

    • Enable interoperability

    • Avoid product/vendor lock-in

    • Enable innovation/competition on end points

    • Enable ubiquity

  • In Grid space, must address how we

    • Describe, discover, & access resources

    • Monitor, manage, & coordinate resources

    • Account & charge for resources

      For many different types of resource

Developing Grid Standards

[Figure: progression toward increased functionality; recoverable labels: Globus Toolkit (de facto standard, single implementation); Open Grid Services Arch (real standards, multiple implementations); Web services, etc.; managed shared virtual systems]

Open Grid Services Architecture

Adopt service-oriented architecture

  • Key to virtualization, discovery, composition, local-remote transparency

    + Standard service description & access

  • Leverage industry standard Web services

    + Distributed service management protocols

  • A “component model for Web services”

    = A framework for creating, managing, & delivering interoperable services

“The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration”, Foster, Kesselman, Nick, Tuecke, 2002

Grid & Web Services

  • Grid and Web Services are merging

    • Grid is an aggressive use case of Web Services

  • Web Services standards landscape is in flux

    • OGSI/A will need to evolve with it

    • Uncertain status of security & policy standards continues to be a big source of concern

  • Grid services standards landscape heating up

  • W3C, OASIS, GGF are key standards orgs

  • Open source software important for adoption

OGSA Status: Implementations

  • Globus Toolkit v3: Linux for the Grid

    • Open source middleware, commercial support

    • A range of computation & data management, registry, and security functions

  • Some nice announced OGSI-based products

    • IBM, Avaki, Platform, Sun, NEC, HP, UD, Entropia, DataSynapse, Insors, Oracle, etc.

    • Read the fine print: “Intent is to use OGSI-based products,” “OGSA-compliant software,” “Embraces fundamental OGSA concepts”

Why Should You Care?

1) Grid is a promising technology [Vision]

  • It ushers in a virtualized, collaborative, distributed world

2) Grids are being commissioned now [Reality]

  • Grids are built (not bought), but are delivering real benefits in academic and commercial settings

3) An open Grid is to your advantage [Future]

  • Standards are being defined now that will determine the future of this technology

Consequences for Network

  • Increased bandwidth use

    • Data transport for distributed collaborative applications is likely to be larger than for web browsing

  • Demands on Quality of Service

    • As Grid computing becomes an integral part of user needs, QoS requirements go up

  • Access, Authentication, Auditing

    • Issues are being addressed adequately as Grids are being commissioned

      • However, there are always malicious hackers

      • And there are also naïve users who unintentionally cause lockups by demanding excessive resources on the grid

Pop Quiz: The Grid Is …

  • A collaboration & resource sharing infrastructure for scientific applications

  • A distributed service integration and management technology

  • A disruptive technology that enables a virtualized, collaborative, distributed world

  • An open source technology & community

  • A marketing slogan

  • All of the above

To Learn More

The Grid: Blueprint for a New Computing Infrastructure, 2nd Edition
