Grid computing from old traces to new applications
This presentation is the property of its rightful owner.
Sponsored Links
1 / 28

Grid Computing: From Old Traces to New Applications PowerPoint PPT Presentation


  • 42 Views
  • Uploaded on
  • Presentation posted in: General

The Failure Trace Archive. Grid Computing: From Old Traces to New Applications. Alexandru Iosup , Ozan Sonmez, Nezih Yigitbasi, Hashim Mohamed, Catalin Dumitrescu, Mathieu Jan, Dick Epema. Parallel and Distributed Systems Group, TU Delft.

Download Presentation

Grid Computing: From Old Traces to New Applications

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Grid computing from old traces to new applications

The FailureTraceArchive

Grid Computing:

From Old Traces to New Applications

Alexandru Iosup, Ozan Sonmez, Nezih Yigitbasi, Hashim Mohamed, Catalin Dumitrescu, Mathieu Jan, Dick Epema

Parallel and Distributed Systems Group, TU Delft

Big thanks to our collaborators: U Wisc./Madison, U Chicago, U Dortmund, U Innsbruck, LRI/INRIA Paris, INRIA Grenoble, U Leiden, Politehnica University of Bucharest, Technion, …

DGSim

Fribourg, Switzerland


About the speaker

About the Speaker

  • Systems

    • The Koala grid scheduler

    • The Tribler BitTorrent-compatible P2P file-sharing

    • The POGGI and CAMEO gaming platforms

  • Performance

    • The Grid Workloads Archive (Nov 2006)

    • The Failure Trace Archive (Nov 2009)

    • The Peer-to-Peer Trace Archive (Apr 2010)

    • Tools: DGSim trace-based grid simulator, GrenchMark workload-based grid benchmarking

  • Team of 15+ active collaborators in NL, AT, RO, US

  • Happy to be in Berkeley until September


The grid

The Grid

An ubiquitous, always-on computational and data storage platform on which users can seamlessly run their (large-scale) applications

Shared capacity & costs, economies of scale


The dutch grid das system and extensions

The Dutch Grid: DAS System and Extensions

UvA/MultimediaN (46)

VU (85 nodes)

DAS-3: a 5-cluster grid

  • 272 AMD Opteron nodes 792 cores, 1TB memory

  • Heterogeneous:

  • 2.2-2.6 GHz single/dual core nodes

  • Myrinet-10G (excl. Delft)

  • Gigabit Ethernet

SURFnet6

UvA/VL-e (41)

Clouds

10 Gb/s lambdas

  • Amazon EC2+S3, Mosso, …

DAS-4 (upcoming)

  • Multi-cores: general purpose, GPU, Cell, …

Leiden (32)

TU Delft (68)


Many grids built

Many Grids Built

DAS, Grid’5000, OSG, NGS, CERN, …

Why grids and not The Grid?


Agenda

Agenda

  • Introduction

  • Was it the System?

  • Was it the Workload?

  • Was it the System Designer?

  • New Application Types

  • Suggestions for Collaboration

  • Conclusion


The failure trace archive failure and recovery events

The Failure Trace ArchiveFailure and Recovery Events

http://fta.inria.fr

20+ traces online

D. Kondo, B. Javadi, A. Iosup, D. Epema, The Failure Trace Archive: Enabling Comparative Analysis of Failures in Diverse Distributed Systems, CCGrid 2010 (Best Paper Award)


Was it the system

Was it the System?

  • No

    • System can grow fast

    • Good data and models to support system designers

  • Yes

    • Grid middleware unscalable [CCGrid06,HPDC09]

    • Grid middleware failure-prone [CCGrid07,Grid07]

    • Grid resources unavailable [CCGrid10]

    • Inability to load balance well [SC|07]

    • Poor online information about resource availability


Agenda1

Agenda

  • Introduction

  • Was it the System?

  • Was it the Workload?

  • Was it the System Designer?

  • New Application Types

  • Suggestions for Collaboration

  • Conclusion


The grid workloads archive per job arrival start stop structure etc

The Grid Workloads ArchivePer-Job Arrival, Start, Stop, Structure, etc.

http://gwa.ewi.tudelft.nl

6 traces online

1.5 yrs

>750K

>250

A. Iosup, H. Li, M. Jan, S. Anoep, C. Dumitrescu, L. Wolters, D. Epema, The Grid Workloads Archive, FGCS 24, 672—686, 2008.


How are real grids used data analysis and modeling

How Are Real Grids Used? Data Analysis and Modeling

  • Grids vs. parallel production environments such as clusters and (small) supercomputers

  • Bags of single-processor tasks vs. single parallel jobs

  • Bigger bursts of job arrivals

  • More jobs

Grid Systems

Parallel production environments


Grid workloads analysis grid workload components

Grid WorkloadsAnalysis: Grid Workload Components

Workflows (WFs)

Bags-of-Tasks (BoTs)

Time [units]

  • BoT size = 2-70 tasks, most 5-20

  • Task runtime highly variable, from minutes to tens of hours

  • WF size = 2-1k tasks, most 30-40

  • Task runtime of minutes


Was it the workload

Was it the Workload?

  • No

    • Similar workload characteristics across grids

    • High utilization possible due to single-node jobs

    • High load imbalance

    • Good data and models to support system designers[Grid06,EuroPar08,HPDC08-10,FGCS08]

  • Yes

    • Too many tasks (system limitation)

    • Poor online information about job characteristics +High variability of job resource requirements

    • How to schedule BoTs, WFs, mixtures in grids?


Agenda2

Agenda

  • Introduction

  • Was it the System?

  • Was it the Workload?

  • Was it the System Designer?

  • New Application Types

  • Suggestions for Collaboration

  • Conclusion


Problems in grid scheduling and resource management the system

Problems in Grid Scheduling and Resource Management The System

Grid schedulers do not own resources themselves

They have to negotiate with autonomous local schedulers

Authentication/multi-organizational issues

Grid schedulers interface to local schedulers

Some may have support for reservations, others are queuing-based

Grid resources are heterogeneous and dynamic

Hardware (processor architecture, disk space, network)

Basic software (OS, libraries)

Grid software (middleware)

Resources may fail

Lack of complete and accurate resource information


Problems in grid scheduling and resource management the workloads

Problems in Grid Scheduling and Resource Management The Workloads

Workloads are heterogeneous and dynamic

Grid schedulers may not have control over the full workload (multiple submission points)

Jobs may have performance requirements

Lack of complete and accurate job information

Application structure is heterogeneous

Single sequential job

Bags of Tasks; parameter sweeps (Monte Carlo), pilot jobs

Workflows, pipelines, chains-of-tasks

Parallel jobs (MPI); malleable, coallocated


The koala grid scheduler

The Koala Grid Scheduler

Developed in the DAS system

Has been deployed on the DAS-2 in September 2005

Ported to DAS-3 in April 2007

Independent from grid middlewares such as Globus

Runs on top of local schedulers

Objectives:

Data and processor co-allocation in grids

Supporting different application types

Specialized application-oriented scheduling policies

Koala homepage: http://www.st.ewi.tudelft.nl/koala/


Koala in a nutshell

Koala in a Nutshell

A bridge between theory and practice

  • Parallel Applications

    • MPI, Ibis,…

    • Co-Allocation

    • Malleability

  • Parameter Sweep Applications

    • Cycle Scavenging

    • Run as low-priority jobs

  • Workflows


Inter operating grids through delegated matchmaking inter operation architectures

Inter-Operating Grids Through Delegated MatchMakingInter-Operation Architectures

Delegated MatchMaking

Independent

Centralized

Hybrid hierarchical/ decentralized

Hierarchical

Decentralized


Inter operating grids through delegated matchmaking the delegated matchmaking mechanism

Resource request

Local load too high

Bind remote resource

Delegate

Resource usage rights

Inter-Operating Grids Through Delegated MatchMaking The Delegated MatchMaking Mechanism

  • Deal with local load locally (if possible)

  • When local load is too high, temporarily bind resources from remote sites to the local environment.

    • May build delegation chains.

    • Delegate resource usage rights, do not migrate jobs.

  • Deal with delegations each delegation cycle (delegated matchmaking)

The Delegated MatchMaking Mechanism=Delegate Resource Usage Rights, Do Not Delegate Jobs


What is the potential gain of grid inter operation delegated matchmaking vs alternatives

DMM

Decentralized

Centralized

Independent

What is the Potential Gain of Grid Inter-Operation?Delegated MatchMaking vs. Alternatives

(Higher is better)

  • DMM

    • High goodput

    • Low wait time

    • Finishes all jobs

  • Even better for load imbalance between grids

  • Reasonable overhead

  • [see thesis]

Grid Inter-Operation (through DMM)delivers good performance


4 2 studies on grid scheduling 5 5 scheduling under cycle stealing

4.2. Studies on Grid Scheduling [5/5]Scheduling under Cycle Stealing

monitors/informs idle/demanded resources

  • CS Policies:

  • Equi-All:

  • grid-wide basis

  • Equi-PerSite:

  • per cluster

Head Node

Scheduler

KCM

Node

  • Requirements

  • Unobtrusiveness

    Minimal delay for (higher priority) local and grid jobs

  • Fairness

    3. Dynamic Resource Allocation

    4. Efficiency

    5. Robustness and Fault Tolerance

grow/shrink

messages

submits

launchers

registers

Launcher

submits

PSA(s)

CS-Runner

Launcher

deploys, monitors, and preempts tasks

JDF

Clusters

  • Application Level Scheduling:

  • Pull-based approach

  • Shrinkage policy

Deployed as Koala Runner

O. Sonmez, B. Grundeken, H. Mohamed, A. Iosup, D. Epema: Scheduling Strategies for Cycle Scavenging in Multicluster Grid Systems. CCGRID 2009: 12-19


Was it the system designer

Was it the System Designer?

  • No

    • Mechanisms to inter-operate grids: DMM [SC|07], …

    • Mechanisms to run many grid application types: WFs, BoTs, parameter sweeps, cycle scavenging, …

    • Scheduling algorithms with inaccurate information [HPDC ‘08, ‘09, ‘10]

    • Tools for empirical and trace-based experimentation

  • Yes

    • Still too many tasks

    • What about new application types?


Agenda3

Agenda

  • Introduction

  • Was it the System?

  • Was it the Workload?

  • Was it the System Designer?

  • New Application Types

  • Suggestions for Collaboration

  • Conclusion


Msgs are a popular growing market

Sources: MMOGChart, own research.

Sources: ESA, MPAA, RIAA.

MSGs are a Popular, Growing Market

  • 25,000,000 subscribed players (from 150,000,000+ active)

  • Over 10,000 MSGs in operation

  • Market size 7,500,000,000$/year


Massively social gaming as new grid cloud application

Romeo and Juliet

Massively Social Gaming as New Grid/Cloud Application

Massively Social Gaming

(online) games with massive numbers of players (100K+), for which social interaction helps the gaming experience

[SC|08, TPDS’10]

  • Virtual worldExplore, do, learn, socialize, compete+

  • ContentGraphics, maps, puzzles, quests, culture+

  • Game analyticsPlayer stats and relationships

[EuroPar09 BPAward, CPE10]

[ROIA09]


Grid computing from old traces to new applications

Suggestions for Collaboration

  • Scheduling mixtures of grid/HPC/cloud workloads

    • Scheduling and resource management in practice

    • Modeling aspects of cloud infrastructure and workloads

  • Condor on top of Mesos

  • Massively Social Gaming and Mesos

    • Step 1: Game analytics and social network analysis in Mesos

  • The Grid Research Toolbox

    • Using and sharing traces: The Grid Workloads Archive and The Failure Trace Archive

    • GrenchMark: testing large-scale distributed systems

    • DGSim: simulating multi-cluster grids


Thank you questions observations

Thank you! Questions? Observations?

Alex Iosup, Ozan Sonmez, Nezih Yigitbasi, Hashim Mohamed, Dick Epema

email: [email protected]

  • More Information:

    • The Koala Grid Scheduler: www.st.ewi.tudelft.nl/koala

    • The Grid Workloads Archive: gwa.ewi.tudelft.nl

    • The Failure Trace Archive: fta.inria.fr

    • The DGSim simulator: www.pds.ewi.tudelft.nl/~iosup/dgsim.php

    • The GrenchMark perf. eval. tool: grenchmark.st.ewi.tudelft.nl

    • Cloud research: www.st.ewi.tudelft.nl/~iosup/research_cloud.html

    • Gaming research: www.st.ewi.tudelft.nl/~iosup/research_gaming.html

    • see PDS publication database at: www.pds.twi.tudelft.nl/

DGSim

Big thanks to our collaborators: U. Wisc.-Madison, U Chicago, U Dortmund, U Innsbruck, LRI/INRIA Paris, INRIA Grenoble, U Leiden, Politehnica University of Bucharest, Technion, …


  • Login