
The Failure Trace Archive

Grid Computing:

From Old Traces to New Applications

Alexandru Iosup, Ozan Sonmez, Nezih Yigitbasi, Hashim Mohamed, Catalin Dumitrescu, Mathieu Jan, Dick Epema

Parallel and Distributed Systems Group, TU Delft

Big thanks to our collaborators: U Wisc./Madison, U Chicago, U Dortmund, U Innsbruck, LRI/INRIA Paris, INRIA Grenoble, U Leiden, Politehnica University of Bucharest, Technion, …


Fribourg, Switzerland

About the Speaker
  • Systems
    • The Koala grid scheduler
    • The Tribler BitTorrent-compatible P2P file-sharing system
    • The POGGI and CAMEO gaming platforms
  • Performance
    • The Grid Workloads Archive (Nov 2006)
    • The Failure Trace Archive (Nov 2009)
    • The Peer-to-Peer Trace Archive (Apr 2010)
    • Tools: DGSim trace-based grid simulator, GrenchMark workload-based grid benchmarking
  • Team of 15+ active collaborators in NL, AT, RO, US
  • Happy to be in Berkeley until September
The Grid

A ubiquitous, always-on computational and data-storage platform on which users can seamlessly run their (large-scale) applications

Shared capacity & costs, economies of scale

The Dutch Grid: DAS System and Extensions

DAS-3: a 5-cluster grid

  • Clusters: VU (85 nodes), UvA/MultimediaN (46), UvA/VL-e (41), Leiden (32), TU Delft (68)
  • Interconnected by SURFnet6 with 10 Gb/s lambdas
  • 272 AMD Opteron nodes, 792 cores, 1 TB memory
  • Heterogeneous: 2.2-2.6 GHz single/dual-core nodes
  • Myrinet-10G (excl. Delft)
  • Gigabit Ethernet

Clouds

  • Amazon EC2+S3, Mosso, …

DAS-4 (upcoming)

  • Multi-cores: general purpose, GPU, Cell, …

Many Grids Built

DAS, Grid’5000, OSG, NGS, CERN, …

Why grids and not The Grid?

Agenda
  • Introduction
  • Was it the System?
  • Was it the Workload?
  • Was it the System Designer?
  • New Application Types
  • Suggestions for Collaboration
  • Conclusion
The Failure Trace Archive: Failure and Recovery Events

http://fta.inria.fr

20+ traces online

D. Kondo, B. Javadi, A. Iosup, D. Epema, The Failure Trace Archive: Enabling Comparative Analysis of Failures in Diverse Distributed Systems, CCGrid 2010 (Best Paper Award)

Was it the System?
  • No
    • System can grow fast
    • Good data and models to support system designers
  • Yes
    • Grid middleware unscalable [CCGrid06,HPDC09]
    • Grid middleware failure-prone [CCGrid07,Grid07]
    • Grid resources unavailable [CCGrid10]
    • Inability to load balance well [SC|07]
    • Poor online information about resource availability
Agenda
  • Introduction
  • Was it the System?
  • Was it the Workload?
  • Was it the System Designer?
  • New Application Types
  • Suggestions for Collaboration
  • Conclusion
The Grid Workloads Archive: Per-Job Arrival, Start, Stop, Structure, etc.

http://gwa.ewi.tudelft.nl

6 traces online; a single trace covers up to 1.5 yrs, >750K jobs, >250 users

A. Iosup, H. Li, M. Jan, S. Anoep, C. Dumitrescu, L. Wolters, D. Epema, The Grid Workloads Archive, FGCS 24, 672-686, 2008.

How Are Real Grids Used? Data Analysis and Modeling
  • Grids vs. parallel production environments such as clusters and (small) supercomputers
  • Bags of single-processor tasks vs. single parallel jobs
  • Bigger bursts of job arrivals
  • More jobs


Grid Workloads Analysis: Grid Workload Components

Workflows (WFs)

Bags-of-Tasks (BoTs)


  • BoT size = 2-70 tasks, most 5-20
  • Task runtime highly variable, from minutes to tens of hours
  • WF size = 2-1k tasks, most 30-40
  • Task runtime of minutes
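The measured ranges above lend themselves to a quick synthetic-workload sketch. This is only an illustration: the triangular and log-uniform distributions, and every name below, are assumptions, not the workload models from the GWA papers.

```python
import random

def synthetic_bot(rng):
    """One illustrative Bag-of-Tasks: 2-70 tasks (most 5-20),
    task runtimes from about a minute to tens of hours (in seconds)."""
    # Assumed: triangular size distribution peaking inside the 5-20 range.
    size = int(rng.triangular(2, 70, 12))
    # Assumed: log-uniform runtimes between 1 minute and ~30 hours.
    return [60 * 10 ** rng.uniform(0, 3.25) for _ in range(size)]

rng = random.Random(42)
bots = [synthetic_bot(rng) for _ in range(1000)]
sizes = [len(b) for b in bots]
print(min(sizes), max(sizes))  # stays within the 2-70 range
```

A generator like this can stand in for a real trace when experimenting with scheduling policies in simulation.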
Was it the Workload?
  • No
    • Similar workload characteristics across grids
    • High utilization possible due to single-node jobs
    • High load imbalance
    • Good data and models to support system designers [Grid06, EuroPar08, HPDC08-10, FGCS08]
  • Yes
    • Too many tasks (system limitation)
    • Poor online information about job characteristics + high variability of job resource requirements
    • How to schedule BoTs, WFs, mixtures in grids?
Agenda
  • Introduction
  • Was it the System?
  • Was it the Workload?
  • Was it the System Designer?
  • New Application Types
  • Suggestions for Collaboration
  • Conclusion
Problems in Grid Scheduling and Resource Management: The System

  • Grid schedulers do not own resources themselves
    • They have to negotiate with autonomous local schedulers
    • Authentication/multi-organizational issues
  • Grid schedulers interface to local schedulers
    • Some may have support for reservations, others are queuing-based
  • Grid resources are heterogeneous and dynamic
    • Hardware (processor architecture, disk space, network)
    • Basic software (OS, libraries)
    • Grid software (middleware)
  • Resources may fail
  • Lack of complete and accurate resource information

Problems in Grid Scheduling and Resource Management: The Workloads

  • Workloads are heterogeneous and dynamic
  • Grid schedulers may not have control over the full workload (multiple submission points)
  • Jobs may have performance requirements
  • Lack of complete and accurate job information
  • Application structure is heterogeneous:
    • Single sequential jobs
    • Bags-of-Tasks: parameter sweeps (Monte Carlo), pilot jobs
    • Workflows, pipelines, chains-of-tasks
    • Parallel jobs (MPI); malleable, co-allocated
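The structures listed above differ mainly in their task dependencies. As a minimal sketch (hypothetical names, not Koala code): a workflow can be held as a task DAG, and a scheduler repeatedly picks the dependency-free frontier; a Bag-of-Tasks is then just the degenerate case with no dependencies at all.

```python
def ready_tasks(deps, done):
    """Return the tasks whose dependencies have all finished."""
    return sorted(t for t, reqs in deps.items()
                  if t not in done and all(r in done for r in reqs))

# A tiny diamond-shaped workflow: a -> (b, c) -> d.
wf = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
print(ready_tasks(wf, set()))            # ['a']
print(ready_tasks(wf, {"a"}))            # ['b', 'c']
print(ready_tasks(wf, {"a", "b", "c"}))  # ['d']
```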

The Koala Grid Scheduler

Developed in the DAS system

Has been deployed on the DAS-2 in September 2005

Ported to DAS-3 in April 2007

Independent from grid middlewares such as Globus

Runs on top of local schedulers

Objectives:

Data and processor co-allocation in grids

Supporting different application types

Specialized application-oriented scheduling policies

Koala homepage: http://www.st.ewi.tudelft.nl/koala/

Koala in a Nutshell

A bridge between theory and practice

  • Parallel Applications
    • MPI, Ibis,…
    • Co-Allocation
    • Malleability
  • Parameter Sweep Applications
    • Cycle Scavenging
    • Run as low-priority jobs
  • Workflows
Inter-Operating Grids Through Delegated MatchMaking: Inter-Operation Architectures

Inter-operation architectures: independent, centralized, hierarchical, decentralized, hybrid hierarchical/decentralized, and Delegated MatchMaking.


Inter-Operating Grids Through Delegated MatchMaking The Delegated MatchMaking Mechanism
  • Deal with local load locally (if possible)
  • When local load is too high, temporarily bind resources from remote sites to the local environment.
    • May build delegation chains.
    • Delegate resource usage rights, do not migrate jobs.
  • Deal with delegations each delegation cycle (delegated matchmaking)

The Delegated MatchMaking mechanism = delegate resource usage rights, do not delegate jobs
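As a toy rendering of the bullets above (not the actual DMM protocol from the SC|07 work; the `Site` class, the load threshold, and the neighbor order are all assumptions), one delegation cycle binds usage rights for idle remote nodes to the overloaded site, while jobs stay where they were submitted:

```python
from dataclasses import dataclass, field

@dataclass
class Site:
    name: str
    nodes: int
    queued_jobs: int
    neighbors: list = field(default_factory=list)
    borrowed: int = 0  # usage rights delegated to us from remote sites

    def load(self):
        return self.queued_jobs / (self.nodes + self.borrowed)

def delegation_cycle(site, threshold=1.0):
    """One toy delegation cycle: if local load is too high, bind usage
    rights for spare remote nodes locally; jobs themselves never migrate."""
    if site.load() <= threshold:
        return  # deal with local load locally
    for remote in site.neighbors:
        if site.load() <= threshold:
            break
        if remote.load() < threshold:  # remote has spare capacity
            spare = max(remote.nodes - remote.queued_jobs, 0)
            take = min(spare, site.queued_jobs - site.nodes - site.borrowed)
            if take > 0:
                remote.nodes -= take   # rights leave the remote site...
                site.borrowed += take  # ...and are bound locally

a = Site("delft", nodes=4, queued_jobs=10)
b = Site("leiden", nodes=8, queued_jobs=2)
a.neighbors.append(b)
delegation_cycle(a)
print(a.borrowed)  # → 6 usage rights bound from leiden
```

Repeating the cycle across sites lets rights travel through intermediate sites, which is the delegation-chain effect mentioned above.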

What is the Potential Gain of Grid Inter-Operation?Delegated MatchMaking vs. Alternatives

[Figure: goodput/wait-time comparison of DMM vs. centralized, decentralized, and independent operation; higher is better]

  • DMM
    • High goodput
    • Low wait time
    • Finishes all jobs
  • Even better for load imbalance between grids
  • Reasonable overhead
  • [see thesis]

Grid inter-operation (through DMM) delivers good performance

Studies on Grid Scheduling: Scheduling under Cycle Stealing

  • CS policies:
    • Equi-All: divide idle resources on a grid-wide basis
    • Equi-PerSite: divide idle resources per cluster
  • Requirements:
    • 1. Unobtrusiveness: minimal delay for (higher-priority) local and grid jobs
    • 2. Fairness
    • 3. Dynamic resource allocation
    • 4. Efficiency
    • 5. Robustness and fault tolerance
  • Application-level scheduling:
    • Pull-based approach
    • Shrinkage policy
  • Deployed as a Koala runner

[Architecture figure: on each head node, the KCM monitors and informs the scheduler of idle/demanded resources; the CS-Runner takes a JDF, registers launchers, submits PSA(s), and deploys, monitors, and preempts tasks on the clusters via grow/shrink messages.]

O. Sonmez, B. Grundeken, H. Mohamed, A. Iosup, D. Epema: Scheduling Strategies for Cycle Scavenging in Multicluster Grid Systems. CCGRID 2009: 12-19
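The two CS policies can be contrasted in a few lines. This is an illustrative reading of "grid-wide basis" vs. "per cluster" (divide idle nodes equally among the cycle-scavenging applications), with made-up names, not the CS-Runner implementation from the CCGrid 2009 paper.

```python
def equi_all(idle_per_cluster, n_apps):
    """Equi-All: pool idle nodes grid-wide, then split equally among apps."""
    total = sum(idle_per_cluster.values())
    return {app: total // n_apps for app in range(n_apps)}

def equi_per_site(idle_per_cluster, n_apps):
    """Equi-PerSite: split each cluster's idle nodes equally among apps.
    Remainders are simply dropped here; a real policy must redistribute them."""
    shares = {app: 0 for app in range(n_apps)}
    for idle in idle_per_cluster.values():
        for app in range(n_apps):
            shares[app] += idle // n_apps
    return shares

idle = {"delft": 10, "leiden": 7, "vu": 5}
print(equi_all(idle, 2))       # {0: 11, 1: 11}
print(equi_per_site(idle, 2))  # {0: 10, 1: 10}
```

Even in this toy version, the granularity of partitioning matters: per-site splitting loses each cluster's remainder, while grid-wide pooling does not.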

Was it the System Designer?
  • No
    • Mechanisms to inter-operate grids: DMM [SC|07], …
    • Mechanisms to run many grid application types: WFs, BoTs, parameter sweeps, cycle scavenging, …
    • Scheduling algorithms with inaccurate information [HPDC ‘08, ‘09, ‘10]
    • Tools for empirical and trace-based experimentation
  • Yes
    • Still too many tasks
    • What about new application types?
Agenda
  • Introduction
  • Was it the System?
  • Was it the Workload?
  • Was it the System Designer?
  • New Application Types
  • Suggestions for Collaboration
  • Conclusion
MSGs are a Popular, Growing Market

Sources: MMOGChart, own research; ESA, MPAA, RIAA.
  • 25,000,000 subscribed players (from 150,000,000+ active)
  • Over 10,000 MSGs in operation
  • Market size: $7,500,000,000/year
Massively Social Gaming as New Grid/Cloud Application

Massively Social Gaming

(online) games with massive numbers of players (100K+), for which social interaction helps the gaming experience

[SC|08, TPDS’10]

  • Virtual world: explore, do, learn, socialize, compete+
  • Content: graphics, maps, puzzles, quests, culture+
  • Game analytics: player stats and relationships

[EuroPar09 BPAward, CPE10]

[ROIA09]


Suggestions for Collaboration

  • Scheduling mixtures of grid/HPC/cloud workloads
    • Scheduling and resource management in practice
    • Modeling aspects of cloud infrastructure and workloads
  • Condor on top of Mesos
  • Massively Social Gaming and Mesos
    • Step 1: Game analytics and social network analysis in Mesos
  • The Grid Research Toolbox
    • Using and sharing traces: The Grid Workloads Archive and The Failure Trace Archive
    • GrenchMark: testing large-scale distributed systems
    • DGSim: simulating multi-cluster grids
Thank you! Questions? Observations?

Alex Iosup, Ozan Sonmez, Nezih Yigitbasi, Hashim Mohamed, Dick Epema

email: [email protected]

  • More Information:
    • The Koala Grid Scheduler: www.st.ewi.tudelft.nl/koala
    • The Grid Workloads Archive: gwa.ewi.tudelft.nl
    • The Failure Trace Archive: fta.inria.fr
    • The DGSim simulator: www.pds.ewi.tudelft.nl/~iosup/dgsim.php
    • The GrenchMark perf. eval. tool: grenchmark.st.ewi.tudelft.nl
    • Cloud research: www.st.ewi.tudelft.nl/~iosup/research_cloud.html
    • Gaming research: www.st.ewi.tudelft.nl/~iosup/research_gaming.html
    • see PDS publication database at: www.pds.twi.tudelft.nl/


Big thanks to our collaborators: U. Wisc.-Madison, U Chicago, U Dortmund, U Innsbruck, LRI/INRIA Paris, INRIA Grenoble, U Leiden, Politehnica University of Bucharest, Technion, …
