The Performance of Bags-Of-Tasks in Large-Scale Distributed Computing Systems


The Performance of Bags-Of-Tasks in Large-Scale Distributed Computing Systems. Alexandru Iosup , Ozan Sonmez, Shanny Anoep, and Dick Epema. Parallel and Distributed Systems Group, TU Delft. ACM/IEEE Int’l. Symposium on High Performance Distributed Computing.

The Performance of Bags-Of-Tasks in Large-Scale Distributed Computing Systems

Alexandru Iosup, Ozan Sonmez, Shanny Anoep, and Dick Epema

Parallel and Distributed Systems Group, TU Delft

ACM/IEEE Int’l. Symposium on High Performance Distributed Computing

The VL-e project

Natural gas price → $$ for grid computing

  • A grid project in the Netherlands (2004-)
  • Natural gas money: 45 MEuro for VL-e out of an 800 MEuro total research package
  • Overall aim:

… to design and build a virtual lab for (digitally) enhanced science (e-science) experiments (no in-vivo or in-vitro, but in-silico experiments).

  • Goals:
    • create prototypes of application-specific e-science environments
    • design and develop re-usable ICT/grid components
    • validate with real-life applications in testbeds
The VL-e project: application areas

[Figure: layered diagram, shown in three builds. Top layer, application areas: Data Intensive Science, Medical Diagnosis & Imaging, Bio-Diversity, Bio-Informatics, Food Informatics, Dutch Telescience; company labels: Philips, IBM, Unilever. Middle layer: Virtual Laboratory (VL), Application Oriented Services. Bottom layer: Grid Services, which harness multi-domain distributed resources; side label: Management of comm. & computing. The second and third builds add a "Bags-of-Tasks" highlight.]


The Challenge
  • Complete scientific work better, …
    • User-oriented performance metrics (time is a critical performance component)
    • Bags-of-tasks for ease of use
  • … in real systems
    • Workloads (now that real traces are available)
    • Information unavailability
  • What to do?
    • Hint: the next 10% improvement won’t cut it!
The Challenge (cont’d.)
  • System model: What is a good model for the study of large-scale distributed computing systems that run bags-of-tasks?
  • Input model: What is a good model for bag-of-tasks workloads in large-scale distributed computing systems?
  • What is the best setup for such a system and input?
    • How to find the best?
    • If a best is found, can there be another?
The Performance of Bags-of-Tasks in Large-Scale Distributed Computing Systems
  • Introduction and Motivation
  • Context: System Model
  • Workload Model
  • Design Space Exploration
  • Conclusion
Context: System Model [1/4] Overview
  • System Model
    • Clusters execute jobs
    • Resource managers coordinate job execution
    • Resource management architectures route jobs among resource managers
    • Task selection policies create the eligible set
    • Task scheduling policies schedule the eligible set
Context: System Model [2/4] Resource Management Architectures route jobs among resource managers

[Figure: three resource management architectures]
  • Separated Clusters (sep-c)
  • Centralized (csp)
  • Decentralized (fcondor)
Context: System Model [3/4] Task Selection Policies create the eligible set
  • Age-based:
    • S-T: Select Tasks in the order of their arrival.
    • S-BoT: Select BoTs in the order of their arrival.
  • User priority based:
    • S-U-Prio: Select the tasks of the User with the highest Priority.
  • Based on fairness in resource consumption:
    • S-U-T: Select the Tasks of the User with the lowest resource consumption.
    • S-U-BoT: Select the BoTs of the User with the lowest resource consumption.
    • S-U-GRR: Select the User Round-Robin, then all tasks of this user.
    • S-U-RR: Select the User Round-Robin, then one task of this user.
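
As a minimal sketch of how such policies could build the eligible set, the Python below implements three of them over a toy queue. The `Task` record, its field names, and the function names are hypothetical, and S-BoT is read here as "all tasks of the BoT that arrived first"; the article gives the precise definitions.

```python
from collections import namedtuple

# Hypothetical task record; field names are illustrative, not from the slides.
Task = namedtuple("Task", "task_id user bot_id arrival")


def select_s_t(queue):
    """S-T: tasks in the order of their arrival."""
    return sorted(queue, key=lambda t: t.arrival)


def select_s_bot(queue):
    """S-BoT (as assumed here): all tasks of the BoT that arrived first."""
    if not queue:
        return []
    bot_arrival = {}
    for t in queue:
        # A BoT's arrival is taken as the arrival of its earliest task.
        bot_arrival[t.bot_id] = min(bot_arrival.get(t.bot_id, t.arrival), t.arrival)
    first_bot = min(bot_arrival, key=bot_arrival.get)
    return [t for t in queue if t.bot_id == first_bot]


def select_s_u_rr(queue, user_order, turn):
    """S-U-RR: pick the next user round-robin, then one task of that user."""
    for step in range(len(user_order)):
        user = user_order[(turn + step) % len(user_order)]
        tasks = [t for t in queue if t.user == user]
        if tasks:
            return [min(tasks, key=lambda t: t.arrival)]
    return []


queue = [
    Task("a1", "alice", "A", 0.0),
    Task("b1", "bob", "B", 1.0),
    Task("a2", "alice", "A", 2.0),
]
```

With this queue, S-T yields a1, b1, a2, while S-BoT yields only the tasks of bag A. The scheduling policy then places whatever the selection policy made eligible.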
Context: System Model [4/4] Task Scheduling Policies schedule the eligible set

[Figure: matrix of task scheduling policies by task information (K, H, U) and resource information (K, H, U); policies shown: ECT, FPLT, ECT-P, FPF, DFPLT, MQD, RR, WQR, STFR.]

  • Information availability:
    • Known (K)
    • Unknown (U)
    • Historical records (H)
  • Sample policies:
    • Earliest Completion Time (with Prediction of Runtimes) (ECT(-P))
    • Fastest Processor First (FPF)
    • (Dynamic) Fastest Processor Largest Task ((D)FPLT)
    • Shortest Task First w/ Replication (STFR)
    • Work Queue w/ Replication (WQR)
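
To illustrate how information availability shapes a policy, here is a rough Python sketch of two of the sample policies. It assumes that FPF needs only processor speeds, while ECT needs both the amount of work per task and the processor speeds. The function names, the `work` and `speeds` parameters, and the tie-breaking rule are illustrative assumptions, not the article's implementation.

```python
def schedule_fpf(num_tasks, speeds):
    """FPF sketch: hand tasks, in order, to the fastest idle processors.

    Uses resource information only (relative processor speeds).
    Returns the processor index chosen for each task that can start now.
    """
    by_speed = sorted(range(len(speeds)), key=lambda p: -speeds[p])
    return by_speed[:num_tasks]


def schedule_ect(work, speeds):
    """ECT sketch: each task goes to the processor where it completes earliest.

    Uses task information (work per task) and resource information (speeds).
    Returns (processor index per task, time at which the last task finishes).
    """
    free_at = [0.0] * len(speeds)  # time at which each processor becomes idle
    assignment = []
    for w in work:
        finish = [free_at[p] + w / speeds[p] for p in range(len(speeds))]
        best = min(range(len(speeds)), key=lambda p: finish[p])  # ties: lowest index
        free_at[best] = finish[best]
        assignment.append(best)
    return assignment, max(free_at)
```

For example, `schedule_ect([10.0, 10.0, 10.0], [1.0, 2.0])` sends the first and third tasks to the faster processor and the second to the slower one, so all work finishes at time 10.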
Workload Modeling 101: What Matters
  • Job arrival process & job service time:
    • Self-similarity (burstiness) vs. Poisson [Leland & Ott ToN’94]
    • Job grouping: bags-of-tasks are the dominant application type in multi-cluster grids and cycle-scavenging systems (the e-Science infrastructure) [IosupJSE EuroPar’07]
  • Job size: almost always 1 CPU [IosupDELW Grid’06]

[Figure: number of packets per time unit vs. time units, at two time scales (time unit = 100 s and time unit = 0.01 s); annotation: longer queues.]
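
One common way to quantify burstiness across time scales is the index of dispersion for counts: the variance of the number of arrivals per time unit divided by its mean. For a Poisson process it stays near 1 at every time scale, while for self-similar traffic it grows as the time unit grows. The sketch below is a generic illustration with hypothetical function names, not the analysis used in the cited work.

```python
from statistics import mean, pvariance


def counts_per_unit(arrivals, time_unit, horizon):
    """Count arrivals in consecutive bins of width time_unit over [0, horizon)."""
    bins = [0] * int(horizon / time_unit)
    for t in arrivals:
        idx = int(t / time_unit)
        if 0 <= idx < len(bins):
            bins[idx] += 1
    return bins


def index_of_dispersion(counts):
    """Variance-to-mean ratio of the per-bin counts (0.0 if there are no arrivals)."""
    m = mean(counts)
    return pvariance(counts) / m if m > 0 else 0.0


# Evenly spaced arrivals: every bin holds the same count, so the index is 0.
smooth = [i + 0.5 for i in range(100)]
# Two bursts of 50 arrivals each: the same total load, but very uneven bins.
bursty = [0.5] * 50 + [90.5] * 50
```

Computing the index for the same trace at several values of `time_unit` shows whether burstiness persists when the trace is aggregated.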

A Bag-of-Tasks Workload Model
  • Model:
    • Users, Bags-of-Tasks, Tasks
    • Heavy-tailed distributions for inter-arrival time and job service time → can model self-similar workloads
    • More details (e.g., parameter values): see article
  • Validation data: the Grid Workloads Archive
    • 7 long-term grid traces
    • >5 million tasks
    • >2500 users
    • >40k CPUs
    • Domains: HEP, graphics, AI, math, biomed, climate, finance, aero…

http://gwa.ewi.tudelft.nl/
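
A synthetic workload following the Users / Bags-of-Tasks / Tasks structure could be generated along the lines below. This is only a sketch: the choice of Weibull, Pareto, and lognormal distributions and every parameter value are placeholders picked for illustration (the slides defer the actual distributions and parameter values to the article), and `generate_bots` is a hypothetical name.

```python
import random


def generate_bots(num_bots, users, seed=0):
    """Generate a toy BoT workload with heavy-tailed inter-arrival and service times."""
    rng = random.Random(seed)  # private generator, so runs are reproducible
    now = 0.0
    bots = []
    for bot_id in range(num_bots):
        # Weibull with shape < 1 is heavy-tailed (placeholder scale and shape).
        now += rng.weibullvariate(60.0, 0.5)
        user = rng.choice(users)
        # Pareto-distributed BoT size, always at least 1 task (placeholder shape).
        size = int(rng.paretovariate(1.5))
        # Lognormal task runtimes in seconds (placeholder parameters).
        runtimes = [rng.lognormvariate(5.0, 1.5) for _ in range(size)]
        bots.append(
            {"bot_id": bot_id, "user": user, "arrival": now, "runtimes": runtimes}
        )
    return bots
```

Fitting the distribution parameters to traces such as those in the Grid Workloads Archive is what turns a generator like this into a validated workload model.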

Design Space Exploration [1/5] Overview
  • Design space exploration: time to understand how our solutions fit into the complete system.
  • Study the impact of:
    • The Task Scheduling Policy (s policies)
    • The Workload Characteristics (P characteristics)
    • The Dynamic System Information (I levels)
    • The Task Selection Policy (S policies)
    • The Resource Management Architecture (A policies)

s × 7P × I × S × A × (environment) → >2M design points
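
The design space is the Cartesian product of the axes listed above, which is why it grows so quickly. The sketch below enumerates a deliberately tiny product; the axis names and values are a small illustrative subset chosen here, not the study's actual axes or counts.

```python
from itertools import product


def design_points(axes):
    """Yield one dict per combination of axis values (Cartesian product)."""
    names = list(axes)
    for combo in product(*(axes[name] for name in names)):
        yield dict(zip(names, combo))


# Illustrative subset only; the study explores far more values per axis.
axes = {
    "scheduling": ["ECT", "FPF", "WQR"],
    "selection": ["S-T", "S-BoT"],
    "architecture": ["sep-c", "csp", "fcondor"],
    "load": [0.2, 0.5, 0.95],
}

points = list(design_points(axes))
```

Even this subset produces 3 × 2 × 3 × 3 = 54 design points, each of which would need its own simulation run.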

Design Space Exploration [2/5] Experimental Setup
  • Simulator:
    • DGSim [IosupETFL SC’07, IosupSE EuroPar’08]
  • System:
    • DAS + Grid’5000 [Cappello & Bal CCGrid’07]
    • >3,000 CPUs with relative performance 1-1.75
  • Metrics:
    • Makespan (MS)
    • Normalized Schedule Length (NSL) ~ speed-up
  • Workloads:
    • Real: DAS + Grid’5000
    • Realistic: system load 20-95% (from workload model)
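
The two metrics can be computed per bag-of-tasks roughly as follows. Makespan is taken as the time from BoT submission to the completion of its last task. The NSL formula used here, makespan divided by the sum of the task runtimes on a reference processor, is an assumption made for this sketch; the article gives the exact definition.

```python
def makespan(submit_time, finish_times):
    """Time from BoT submission until its last task finishes."""
    return max(finish_times) - submit_time


def normalized_schedule_length(submit_time, finish_times, reference_runtimes):
    """Assumed NSL: makespan over total task runtime on a reference processor."""
    return makespan(submit_time, finish_times) / sum(reference_runtimes)
```

Under this assumed definition, a BoT submitted at time 10 whose tasks finish at 40, 70, and 55 has a makespan of 60; with reference runtimes of 30, 60, and 30 its NSL is 0.5, so lower values are better and the metric behaves like an inverse speed-up.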
Design Space Exploration [3/5] Selected Results A: Design Guidelines for Scheduling Policies
  • Influence of the information type:
    • (K,K): best balance between makespan (MS) and normalized schedule length (NSL)
    • (*,U), (U,*): from surprisingly good (FPF) to surprisingly poor (WQR4x)
    • (*,H), (H,*): poor. Simple runtime predictors don’t work (see article)
  • Where to invest time?
    • K → H, K → U: adapt for information type with lowest variation

[Figure: results plot; highlighted policies: WQR4x and FPF.]

Design Space Exploration [4/5] Selected Results B: Task Selection Only for Busy Systems
  • Not much difference until the system load exceeds 50%.
    • For DAS + Grid’5000: no need to change the task selection policy.

[Figure: results plot comparing S-BoT and S-T; annotation: same performance.]

Design Space Exploration [5/5] Selected Results C: Resource Management Architecture
  • Centralized, separated, or distributed?
    • Centralized is best [Note: job overhead not considered.]
    • Distributed: good for system load below 50%; over 50% it does not finish all tasks.
Conclusion
  • System Model = Resource Management Architecture + Task Selection Policy + Task Scheduling Policy
  • Information availability framework
  • BoT workload model
  • Design space exploration: the performance of bags-of-tasks

Future Work
  • Better predictors
  • (H,H) task scheduling policies
Thank you! Questions? Remarks? Observations?
  • Contact: [email protected] [google “Iosup“]
  • Web sites:
    • http://www.vl-e.nl : VL-e project
    • http://www.pds.ewi.tudelft.nl : PDS group articles & software

Help build the Grid Workloads Archive: http://gwa.ewi.tudelft.nl