Grid Computing - A Primer

Sridhara Dasu, Department of Physics, U. Wisconsin

  • Grid Computing

    • What is the buzz all about?

    • What is the promise?

  • My Perspective

    • What is in it for me?

    • How is it working for us?

      • In UW-Madison

      • And, beyond …

  • Conclusion

    • Why should you be interested?

    • What are the consequences for you?

Acknowledgements: Condor Team, Globus Team, I. Foster/Argonne, M. Livny/Wisconsin, D. Bradley/Wisconsin

Grid Computing is in the News …

The Opportunity (or Challenge): Computational Cornucopia

  • Abundant computation, data, bandwidth

    • In many fields, too much data—not too little

    • Simulations of unprecedented accuracy

    • Ubiquitous internet → distance not a barrier

  • But as a consequence

    • Rate of change accelerates

    • Complex problems → multidisciplinary distributed teams & sharing of resources & expertise

    • Without infrastructure, you can’t compete

Why Distributed Teams Are Important

  • Increasingly challenging & complex problems

    • Particle physics, Global change, Cosmology, Life sciences

    • Manufacturing, Mineral exploration

    • Film production, Game development, …

  • Required expertise & resources also distributed

    • People

    • Computational capability

    • Data

    • Sensors

The Grid

“Resource sharing & coordinated problem solving in dynamic … virtual organizations”

http://www.mkp.com/mk/default.asp?isbn=1558609334

  • Enable integration of distributed service & resources

  • Using general-purpose protocols & infrastructure

  • To achieve useful qualities of service

“The Anatomy of the Grid”, Foster, Kesselman, Tuecke, 2001

What is a Grid?

  • The key criteria:

    • Coordinates distributed resources …

    • Uses standard, open, general-purpose protocols and interfaces …

    • Delivers non-trivial qualities of service

  • What is not a Grid?

    • A cluster, a network attached storage device, a scientific instrument, a network, etc.

    • Each is an important component of a Grid, but by itself does not constitute a Grid

Why Should You Care?

1) Grid is a promising technology [Vision]

  • It ushers in a virtualized, collaborative, distributed world

2) Grids are being commissioned now [Reality]

  • Grids are built (not bought), but are delivering real benefits in academic and commercial settings

3) An open Grid is to your advantage [Future]

  • Standards are being defined now that will determine the future of this technology

The Power Grid: On-Demand Access to Electricity

Decouple production & consumption, enabling

  • On-demand access

  • Economies of scale

  • Consumer flexibility

  • New devices

[Figure: quality and economies of scale plotted against time]

But Computing Isn’t Really Like Electricity!

  • How about “access computing resources like we access Web content”?

    • We have no idea where a website is, or on what computer or operating system it runs

  • Two interrelated opportunities

    1) Enhance economy, flexibility, access by virtualizing computing resources

    2) Deliver entirely new capabilities by integrating distributed resources

Virtualization

  • Application virtualization

  • Infrastructure virtualization

  • Dynamic & intelligent provisioning

  • Automatic failover

[Figure: virtualization layers. Applications: delivery; application services: distribution; servers: execution]

Source: The Grid: Blueprint for a New Computing Infrastructure (2nd Edition), 2004

Local Clusters to Global Grids

Cluster Grid → Enterprise Grid → Global Grid

Grid Deployment Trends

[Figure: mission criticality, from scientific to corporate use, plotted against deployment scope: Department, Enterprise, Collaboration, Internet]

Transparent Service

[Figure: the Grid shown alongside Utility Computing, Autonomic Computing, and Service-Oriented Architecture]

Webster says: Autonomic = acting or occurring involuntarily <autonomic reflexes>

Multidisciplinary Teams: Problem Solving in the 21st Century

  • Teams organized around common goals

    • Communities: “Virtual organizations”

  • With diverse membership & capabilities

    • Heterogeneity is a strength not a weakness

  • And geographic and political distribution

    • No location/organization possesses all required skills and resources

  • Must adapt as a function of the situation

    • Adjust membership, reallocate responsibilities, renegotiate resources

Challenging Technical Requirements

  • Dynamic formation and management of virtual organizations

  • Discovery & online negotiation of access to services: who, what, why, when, how

  • Configuration of applications and systems able to deliver multiple qualities of service

  • Autonomic management of distributed infrastructures, services, and applications

  • Management of distributed state

  • Open, extensible, evolvable infrastructure

The Globus Project™: Making Grid Computing a Reality (since 1996)

  • Close collaboration with real Grid projects in science and industry

  • The Globus Toolkit®: Open source software base for building Grid infrastructure and applications

  • Development and promotion of standard Grid protocols to enable interoperability and shared infrastructure

  • Development and promotion of standard Grid software APIs to enable portability and code sharing

  • Global Grid Forum: We co-founded GGF to foster Grid standardization and community

Globus Toolkit 2: Key Protocols

  • The Globus Toolkit v2 (GT2) centers around four key protocols

    • Connectivity layer:

      • Security: Grid Security Infrastructure (GSI)

    • Resource layer:

      • Resource Management: Grid Resource Allocation Management (GRAM)

      • Information Services: Grid Resource Information Protocol (GRIP)

      • Data Transfer: Grid File Transfer Protocol (GridFTP)

  • Also key collective layer protocols

    • Info Services, Replica Management, etc.


Resource Management

Condor: High Throughput Computing (est. 1986)

UW Condor Project - Miron Livny’s group (http://www.cs.wisc.edu/condor)

  • Predates Globus

  • High throughput computing on commodity resources

  • Successful enterprise level deployment

    • UW Computer Science Condor pool

    • UW Condor pools in other departments

    • INFN/Italy pools

    • Inter-pool flocking

    • Also, some industrial users


The Layers of Condor

[Figure: layered architecture. Submit (client) side: Application, Application Agent, Customer Agent. Matchmaker in between. Execute (service) side: Owner Agent, Remote Execution Agent, Local Resource Manager, Resource.]

Complete solution for resource management

A Grid Job

  • Must be able to run in the background: no interactive input, windows, GUI, etc.

  • Can still use STDIN, STDOUT, and STDERR (the keyboard and the screen), but files are used for these instead of the actual devices

  • Organize data files, input/output
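The batch-friendly behaviour described above can be tried locally before submitting anything. The following is a minimal sketch, not part of Condor: it runs a command with STDIN, STDOUT, and STDERR attached to files, roughly the way a batch system would. The helper name `run_detached` and the file names `job.in`, `job.out`, and `job.err` are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_detached(command, workdir, stdin_text=""):
    """Run `command` with stdin/stdout/stderr bound to files in `workdir`.

    Hypothetical helper: it mimics how a batch system replaces the
    keyboard and the screen with files. Returns the process exit code.
    """
    workdir = Path(workdir)
    (workdir / "job.in").write_text(stdin_text)
    with open(workdir / "job.in") as fin, \
         open(workdir / "job.out", "w") as fout, \
         open(workdir / "job.err", "w") as ferr:
        result = subprocess.run(command, stdin=fin, stdout=fout, stderr=ferr)
    return result.returncode


if __name__ == "__main__":
    # Stand-in "job": reads whitespace-separated integers and prints their sum.
    job = [sys.executable, "-c",
           "import sys; print(sum(int(x) for x in sys.stdin.read().split()))"]
    with tempfile.TemporaryDirectory() as tmp:
        code = run_detached(job, tmp, stdin_text="1 2 3\n")
        print("exit code:", code)
        print("stdout file:", (Path(tmp) / "job.out").read_text().strip())
```

If a program works unattended under this kind of harness, it is usually a reasonable candidate for a grid job.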

Condor Universes

  • The Standard Universe

    • Checkpoints executable state

    • Job migration to other resources to continue execution

    • Transparent IO redirection to user submit machines

    • Robust against resource preemption for higher priority tasks + resource failures

    • Limitations on applications (e.g., no shared libraries, no multithreading)

  • The Vanilla Universe

    • Traditional batch jobs with no limitations

    • External solutions for IO redirection

    • Not robust against preemption or resource failures

  • The Globus Universe (new)

    • Adapted to emerging Grid standards

    • Part of Globus Toolkit
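The checkpoint-and-migrate idea behind the Standard Universe can be illustrated at application level. The sketch below is not how Condor does it (Condor takes the checkpoint on the job's behalf, without the program saving its own state); it is only a toy showing why a preempted job that has saved its state can continue elsewhere instead of starting over. The function name and the checkpoint file name are hypothetical.

```python
import json
import os
import tempfile


def run_with_checkpoint(ckpt_path, total_steps, max_steps_this_run):
    """Sum 0..total_steps-1, saving progress to `ckpt_path` after every step.

    `max_steps_this_run` simulates preemption: the function returns early
    once that many steps have been done in this invocation.
    Returns (finished, running_sum).
    """
    state = {"next_step": 0, "running_sum": 0}
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)  # resume from the last checkpoint

    done_now = 0
    while state["next_step"] < total_steps:
        if done_now == max_steps_this_run:
            return False, state["running_sum"]  # "preempted"
        state["running_sum"] += state["next_step"]
        state["next_step"] += 1
        done_now += 1
        # Write to a temporary file and rename it, so an interruption
        # never leaves a half-written checkpoint behind.
        tmp_path = ckpt_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, ckpt_path)
    return True, state["running_sum"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ckpt = os.path.join(tmp, "job.ckpt")
        print(run_with_checkpoint(ckpt, 10, 4))  # (False, 6)
        print(run_with_checkpoint(ckpt, 10, 4))  # (False, 28)
        print(run_with_checkpoint(ckpt, 10, 4))  # (True, 45)
```

Each call stands in for the job landing on a different machine; only the checkpoint file travels with it.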


Condor-G: Globus + Condor

Globus

  • middleware deployed across entire Grid

  • remote access to computational resources

  • dependable, robust data transfer

Condor

  • job scheduling across multiple resources

  • strong fault tolerance with checkpointing and migration

  • layered over Globus as “personal batch system” for the Grid

Condor-G

[Figure: layered diagram with labels User/Application; Grid (Condor-G, Condor, Globus Toolkit); Fabric (processing, storage, communication)]

Creating a Submit Description File

  • A plain ASCII text file

  • Tells Condor-G about your job:

    • Which executable, grid site, input, output and error files to use, command-line arguments, environment variables, etc.

  • Can describe many jobs at once (a “cluster”) each with different input, arguments, output, etc.
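A submit description that queues many jobs can be generated rather than typed. The sketch below writes one using the commands from the simple example in this talk (Universe, GlobusScheduler, Executable, Queue) and adds per-job Arguments, Input, Output, and Error lines. The host name, the executable, the `--seed` argument, and the file-name pattern are placeholders, and the commands accepted may differ between Condor versions, so treat this as an illustration and check the condor_submit manual.

```python
def make_submit_description(executable, scheduler, n_jobs):
    """Return submit-description text that queues `n_jobs` jobs in one cluster.

    Each job gets its own arguments and input/output/error files.
    All names are placeholders chosen for illustration.
    """
    lines = [
        "# Generated submit description (illustrative sketch)",
        "Universe        = globus",
        f"GlobusScheduler = {scheduler}",
        f"Executable      = {executable}",
        "",
    ]
    for i in range(n_jobs):
        lines += [
            f"Arguments = --seed {i}",  # hypothetical command-line option
            f"Input     = job_{i}.in",
            f"Output    = job_{i}.out",
            f"Error     = job_{i}.err",
            "Queue",
            "",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(make_submit_description("my_job", "host.domain.edu/jobmanager", 3))
```

Every `Queue` line adds one job to the cluster, using the settings in effect at that point in the file.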

Simple Submit Description File

    # Simple condor_submit input file
    # (Lines beginning with # are comments)
    # NOTE: the words on the left side are not
    # case sensitive, but filenames are!
    Universe        = globus
    GlobusScheduler = host.domain.edu/jobmanager
    Executable      = my_job
    Queue

Running condor_submit

  • You give condor_submit the name of the submit file you have created

  • condor_submit parses the file, checks for errors, and creates a “ClassAd” that describes your job(s)

  • Sends your job’s ClassAd(s) and executable to the Condor-G schedd, which stores the job in its queue

    • Atomic operation, two-phase commit

  • View the queue with condor_q


condor_submit sequence

[Figure: condor_submit and condor_q on the submit side; Condor-G; the Gate Keeper and Local Job Scheduler at the Globus resource]

Running condor_submit

    % condor_submit my_job.submit-file
    Submitting job(s).
    1 job(s) submitted to cluster 1.
    % condor_q
    -- Submitter: perdita.cs.wisc.edu : <128.105.165.34:1027> :
     ID      OWNER      SUBMITTED     RUN_TIME ST PRI SIZE CMD
       1.0   frieda    6/16 06:52   0+00:00:00 I  0   0.0  my_job
    1 jobs; 1 idle, 0 running, 0 held
    %

DAGMan

  • Directed Acyclic Graph Manager

  • DAGMan allows you to specify the dependencies between your Condor-G jobs, so it can manage them automatically for you.

  • (e.g., “Don’t run job ‘B’ until job ‘A’ has completed successfully.”)


What is a DAG?

[Figure: DAG with four nodes: Job A, Job B, Job C, Job D]

  • A DAG is the data structure used by DAGMan to represent these dependencies.

  • Each job is a “node” in the DAG.

  • Each node can have any number of “parent” or “children” nodes – as long as there are no loops!


Defining a DAG

[Figure: DAG with four nodes: Job A, Job B, Job C, Job D]

  • A DAG is defined by a .dag file, listing each of its nodes and their dependencies:

    # diamond.dag
    Job A a.sub
    Job B b.sub
    Job C c.sub
    Job D d.sub
    Parent A Child B C
    Parent B C Child D

  • Each node runs the Condor-G job specified by its accompanying Condor submit file
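To see what the dependency lines imply, the sketch below parses the diamond.dag text and computes the order in which the nodes could run. It is not DAGMan: it handles only the `Job` and `Parent ... Child` lines shown here, the function names are hypothetical, and it relies on `graphlib` from the Python standard library (3.9 or later).

```python
from graphlib import TopologicalSorter

DIAMOND_DAG = """\
# diamond.dag
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
Parent A Child B C
Parent B C Child D
"""


def parse_dag(text):
    """Return (submit_files, parents_of) from Job and Parent...Child lines."""
    submit_files = {}
    parents_of = {}
    for raw in text.splitlines():
        words = raw.split()
        if not words or words[0].startswith("#"):
            continue  # skip blank lines and comments
        keyword = words[0].lower()
        if keyword == "job":
            name, submit_file = words[1], words[2]
            submit_files[name] = submit_file
            parents_of.setdefault(name, set())
        elif keyword == "parent":
            split_at = [w.lower() for w in words].index("child")
            parents, children = words[1:split_at], words[split_at + 1:]
            for child in children:
                parents_of.setdefault(child, set()).update(parents)
    return submit_files, parents_of


def run_waves(parents_of):
    """Group nodes into waves; every node in a wave has all its parents done."""
    sorter = TopologicalSorter(parents_of)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    files, parents = parse_dag(DIAMOND_DAG)
    print(run_waves(parents))  # [['A'], ['B', 'C'], ['D']]
```

Jobs B and C land in the same wave because neither depends on the other, so they can run at the same time once A has finished.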

What about Data?

Data Placement* (DaP) must be an integral part of the end-to-end solution

Stork (Another UW-Computer Science Product)

  • Schedules, runs, monitors, and manages Data Placement (DaP) jobs in a heterogeneous Grid environment & ensures that they complete.

  • Stork is to DaP jobs what Condor(-G) is to computational jobs.

  • Just submit a bunch of DaP jobs and then relax.

  • Interoperates with various storage services

* Space management and Data transfer


Full Condor-G Capabilities

[Figure: Planner(s) on top of DAGMan; DAGMan drives Condor-G (compute) and Stork (DaP); other labels: Gate Keeper, StartD, RFT, GridFTP, SRM, SRB, NeST]

UW “Enterprise Level” Grid

  • Condor pool at CS

    • 1000 ~1 GHz Intel CPUs

  • Condor pools at various departments

    • 100 ~2.4 GHz Intel CPUs at Physics, etc.

    • New: Grid Laboratory of Wisconsin

  • Condor jobs flock from various departments to CS Pool as needed

  • Excellent utilization

    • Especially when the Condor Standard Universe is used

      • Preemption, Checkpointing, Job Migration

Grid Laboratory of Wisconsin

2003 Initiative funded by NSF/UW

Six GLOW Sites

  • Computational Genomics, Chemistry

  • AMANDA, IceCube, Physics/Space Science

  • High Energy Physics/CMS, Physics

  • Materials by Design, Chemical Engineering

  • Radiation Therapy, Medical Physics

  • Computer Science

Phase-1 already has ~300 Xeon CPUs

Expect to grow to about 700 CPUs + 100 TB disk

Condor/GLOW Ideas

  • Exploit commodity hardware for high throughput computing

    • The base hardware is the same at all sites

    • Local configuration optimization as needed

      • e.g., Number of CPU elements vs storage elements

    • Must meet global requirements

      • It turns out that our initial assessment calls for almost identical configuration at all sites

  • Managed locally at 6 sites

    • Shared globally across all sites

    • Higher priority for local jobs

The Large Hadron Collider

Building and commissioning the accelerator and detectors, and extracting interesting physics out of this massive data sample is a big challenge.

Event Filtering Before Archival

Output: 1 MB/event @ 100 Hz → petabyte per year
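The petabyte figure follows from simple arithmetic if one assumes roughly 10^7 seconds of data taking per year. That live time is a common rule of thumb and an assumption here, not a number from the slide.

```python
EVENT_SIZE_BYTES = 1e6        # 1 MB per event (from the slide)
EVENT_RATE_HZ = 100           # events written per second (from the slide)
LIVE_SECONDS_PER_YEAR = 1e7   # ASSUMPTION: rule-of-thumb accelerator live time


def archived_bytes_per_year(event_size, rate_hz, live_seconds):
    """Bytes written to the archive in one year of running."""
    return event_size * rate_hz * live_seconds


if __name__ == "__main__":
    total = archived_bytes_per_year(
        EVENT_SIZE_BYTES, EVENT_RATE_HZ, LIVE_SECONDS_PER_YEAR
    )
    print(f"{total / 1e15:.1f} PB per year")  # 1.0 PB per year
```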

Analysis Teams + Resources

Input: ~10^9 events (petabyte databases)

Complex algorithms developed by collaborating physicists

Output: Publications with ~100s of selected events

Simulation: Early Grid Deployment

  • Detailed simulations necessary

    • Large numbers of background events need to be simulated

      • Dominated by fluctuations of tails

  • Computation scale

    • Background events occur on every crossing - 40 MHz

      • Up to 10 minutes on a 1 GHz CPU to simulate full event

      • 2 × 10^9 s CPU time to simulate 1 s of LHC operation

      • Requires 1000 CPUs running for 1 month

    • CMS has a large number of detector channels, 10^8

      • Each event requires 1-10 MB storage space

      • 32-320 TB needed for 1 s of LHC operation

    • Optimizing CPU and data storage

      • Simulate in bins and reuse some data

  • Pleasantly parallel application

    • Ideal Grid testbed candidate

      • Used UW “enterprise level” classic Condor grid successfully

      • With Grid2003, used the nationwide Globus/Condor-G based true grid
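The computation-scale numbers above can be cross-checked with a few lines of arithmetic. The sketch uses only the slide's figures; the event count and per-event average it prints are implied values derived from those figures, not numbers stated in the talk, and a 30-day month is assumed.

```python
CPU_SECONDS_PER_LHC_SECOND = 2e9     # from the slide
N_CPUS = 1000                        # from the slide
STORAGE_TB_RANGE = (32, 320)         # from the slide, for 1 s of LHC operation
EVENT_SIZE_MB_RANGE = (1, 10)        # from the slide
SECONDS_PER_MONTH = 30 * 24 * 3600   # ASSUMPTION: 30-day month


def months_needed(cpu_seconds, n_cpus):
    """Wall-clock months to burn `cpu_seconds` on `n_cpus` busy CPUs."""
    return cpu_seconds / n_cpus / SECONDS_PER_MONTH


def implied_events(storage_tb, event_size_mb):
    """Number of events that fill `storage_tb` (decimal units, 1 TB = 1e6 MB)."""
    return storage_tb * 1e6 / event_size_mb


if __name__ == "__main__":
    months = months_needed(CPU_SECONDS_PER_LHC_SECOND, N_CPUS)
    events = implied_events(STORAGE_TB_RANGE[0], EVENT_SIZE_MB_RANGE[0])
    print(f"wall time on {N_CPUS} CPUs: {months:.2f} months")
    print(f"implied events per LHC second: {events:.1e}")
    print(f"implied average CPU time per event: "
          f"{CPU_SECONDS_PER_LHC_SECOND / events:.1f} s")
```

Both ends of the storage range imply the same event count, and the implied per-event average stays well under the quoted 10-minute maximum, so the slide's figures are mutually consistent at the order-of-magnitude level.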

Tapping UW “Enterprise Level” Grid

We tapped resources on the UW campus opportunistically

We produced more events in 2003 than most other CMS collaborators, thanks to our UW enterprise level grid and the Condor Standard Universe!

2004 numbers are through March; we were also running our new C++ simulation code, which is a factor of 2 slower. We have typically used less than 50% of available resources and ran for about 30% of the year.

Cost Savings from Grids

  • Cost savings from grids will come in two waves:

    • First from the adoption of clusters

    • Then from the adoption of Enterprise Grids

  • Firms using Clusters estimate that cost savings will be small at first, but will grow to 15% to 30% savings in IT Costs in 2005-2008.

  • Firms planning to use Enterprise Grids estimate that they will experience a second wave of benefits. Savings will grow to 15% to 30% by 2007-2010.

Source: Robert Cohen, “Grid Computing: Projected Impact on North Carolina’s Economy & Broadband Use through 2010,” Rural Internet Access Authority, September 2003. http://www.e-nc.org

Grid drawbacks being addressed now

  • Low utilization of enterprise resources

  • High cost of provisioning for peak demand

  • Inadequate resources prevent use of advanced applications

  • Lack of information integration

Cyberinfrastructure & VOs: Relevance Far Beyond Science

1) Virtualization of information technology

  • From vertical silos to on-demand access

  • Improve efficiency of delivery, increase flexibility of use

  • E.g., financial services, e-commerce

2) New applications, products, & services enabled by much computation & data

  • Media, life sciences, manufacturing, seismic exploration, online gaming, etc.

The Value of Grid Computing: IBM Perspective

  • Increased Efficiency

  • Higher Quality of Service

  • Increased Productivity & ROI

  • Reduced Complexity & Cost

  • Improved Resiliency


Grids: HP Perspective

[Figure: value increasing from today onward through clusters (OpenVMS clusters, TruCluster, MC ServiceGuard), grid-enabled systems (Tru64, HP-UX, Linux), programmable data center (UDC), virtual data center, and computing utility or GRID; shared, traded resources: switch fabric, compute, storage]

Grid Vision, Marketing, and Reality

  • Vision

    • Computing resources can be shared like content on the Web

  • Marketing

    • Have we got a Grid for you!

      • [Data, compute, knowledge, information, desktop, PC, enterprise, cluster, …]

  • Reality

    • Commercial products mostly non-interoperable

    • Open source tools offer de facto standards, but are also far from a complete solution

Standards Matter!

  • Open, standard protocols

    • Enable interoperability

    • Avoid product/vendor lock-in

    • Enable innovation/competition on end points

    • Enable ubiquity

  • In Grid space, must address how we

    • Describe, discover, & access resources

    • Monitor, manage, & coordinate resources

    • Account & charge for resources

      For many different types of resource


Developing Grid Standards

[Figure: increased functionality and standardization plotted against time, 1990 to 2010. Labels: custom solutions; Globus Toolkit (de facto standard, single implementation) with Internet standards; Open Grid Services Architecture (real standards, multiple implementations) with Web services, etc.; managed shared virtual systems; research]

Open Grid Services Architecture

Adopt service-oriented architecture

  • Key to virtualization, discovery, composition, local-remote transparency

    + Standard service description & access

  • Leverage industry standard Web services

    + Distributed service management protocols

  • A “component model for Web services”

    = A framework for creating, managing, & delivering interoperable services

“The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration”, Foster, Kesselman, Nick, Tuecke, 2002

Grid & Web Services

  • Grid and Web Services are merging

    • Grid is an aggressive use case of Web Services

  • Web Services standards landscape is in flux

    • OGSI/A will need to evolve with it

    • Uncertain status of security & policy standards continues to be a big source of concern

  • Grid services standards landscape heating up

  • W3C, OASIS, GGF are key standards orgs

  • Open source software important for adoption

OGSA Status: Implementations

  • Globus Toolkit v3: Linux for the Grid

    • Open source middleware, commercial support

    • A range of computation & data management, registry, and security functions

  • Some nice announced OGSI-based products

    • IBM, Avaki, Platform, Sun, NEC, HP, UD, Entropia, DataSynapse, Insors, Oracle, etc.

    • Read the fine print: “Intent is to use OGSI-based products,” “OGSA-compliant software,” “Embraces fundamental OGSA concepts”

Why Should You Care?

1) Grid is a promising technology [Vision]

  • It ushers in a virtualized, collaborative, distributed world

2) Grids are being commissioned now [Reality]

  • Grids are built (not bought), but are delivering real benefits in academic and commercial settings

3) An open Grid is to your advantage [Future]

  • Standards are being defined now that will determine the future of this technology

Consequences for Network

  • Increased bandwidth use

    • Data transport for distributed collaborative applications is likely to be larger than that for web browsing

  • Demands on Quality of Service

    • As Grid computing becomes an integral part of user needs, QoS requirements go up

  • Access, Authentication, Auditing

    • Issues are being addressed adequately as Grids are being commissioned

      • However, there are always malicious hackers

      • And there are also naïve users who unintentionally cause lockups by demanding excessive resources on the grid

Pop Quiz: The Grid Is …

  • A collaboration & resource sharing infrastructure for scientific applications

  • A distributed service integration and management technology

  • A disruptive technology that enables a virtualized, collaborative, distributed world

  • An open source technology & community

  • A marketing slogan

  • All of the above

To Learn More

2nd Edition

www.mkp.com/grid2
