Presentation Transcript

Overprovisioning for Performance Consistency in Grids

Nezih Yigitbasi and Dick Epema

Parallel and Distributed Systems Group

Delft University of Technology

http://guardg.st.ewi.tudelft.nl/


The Problem: Performance inconsistency in grids

  • Inconsistent performance common in grids

    • bursty workloads

    • variable background loads

    • high rate of failures

    • highly dynamic & heterogeneous environment

How can we provide consistent performance in grids?

[Figure: a Bag-of-Tasks with 128 tasks submitted every 15 minutes shows a performance variation of roughly 70x]


Our goals

GOAL-1

Realistic performance evaluation of static and dynamic overprovisioning strategies

(system’s perspective)

GOAL-2

Dynamically determine the overprovisioning factor (Κ) for user-specified performance requirements

(user’s perspective)


Outline

Overprovisioning Strategies

Experimental Setup

Results

Dynamically Determining Κ

Conclusions


Overprovisioning (I)

  • Increasing the system capacity to provide better, and in particular, consistent performance even under variable workloads and unexpected demands

    Pros

    • simple

    • obviates the need for complex algorithms

    • easy to deploy & maintain

      Cons

    • cost-ineffective

    • workloads may evolve (e.g., increasing user base)

    • low system utilization


Overprovisioning (II)

  • High overprovisioning factors (Κ) are common in modern systems

    • Google: 450,000 (2005)

    • Microsoft: 218,000 (mid-2008)

    • Facebook: 10,000+ (2009)

  • Preferred way of providing performance guarantees

    • typical data center utilization is only 15-50%

    • telecommunication systems run at ~30% utilization on average

L. A. Barroso and U. Hölzle, The Case for Energy-Proportional Computing, IEEE Computer, December 2007.


Overprovisioning strategies

1. Static
  • Largest
  • All
  • Number
  • Where should we deploy the resources?
  • Does it make any difference?

2. Dynamic
  • Dynamic overprovisioning
    • a.k.a. auto-scaling
    • low/high thresholds for acquiring/releasing resources (a sketch follows below)

• Given Κ, it is straightforward to determine the number of processors for a strategy

[Figure: provisioned capacity vs. demand over time for the static and dynamic strategies; capacity above the demand curve is waste]
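To make the dynamic strategy concrete, here is a minimal sketch of threshold-based auto-scaling, interpreting the low/high thresholds as utilization thresholds. The 0.60/0.70 values come from the methodology slide later in this deck; all names, the step size, and the capacity bounds are hypothetical.

```python
# Minimal sketch of the dynamic (auto-scaling) strategy: acquire resources when
# utilization crosses the high threshold, release them below the low threshold.
# The 0.60/0.70 thresholds are the values from the methodology slide; the step
# size and capacity bounds are hypothetical placeholders.

def autoscale(capacity, demand, low=0.60, high=0.70, step=16,
              min_capacity=64, max_capacity=4096):
    """Return the provisioned capacity (processors) after one control step."""
    utilization = demand / capacity if capacity > 0 else 1.0
    if utilization > high and capacity < max_capacity:
        # Demand is close to capacity: acquire more processors.
        capacity = min(capacity + step, max_capacity)
    elif utilization < low and capacity > min_capacity:
        # System is underutilized: release idle processors.
        capacity = max(capacity - step, min_capacity)
    return capacity


if __name__ == "__main__":
    cap = 256
    for demand in [100, 150, 200, 240, 180, 120]:  # hypothetical demand trace
        cap = autoscale(cap, demand)
        print(f"demand={demand:4d}  capacity={cap:4d}  K={cap / demand:.2f}")
```

The ratio of provisioned capacity to demand printed at the end is the overprovisioning factor K, which is why K varies over time under the dynamic strategy.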


    Outline

    Overprovisioning Strategies

    Experimental Setup

    Results

    Dynamically Determining Κ

    Conclusions


    System model

    • DAS-3 multi-cluster grid
      • Global Resource Managers (GRM) interacting with Local Resource Managers (LRM); a sketch of this structure follows below

    [Figure: the GRM holds the global queue and dispatches global jobs to the local queues of the LRMs; local jobs arrive at the LRMs directly]
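A minimal sketch of the two-level structure above, with hypothetical class and method names: the GRM keeps the global queue and forwards globally submitted jobs to the local queues of the LRMs, which also receive local jobs directly.

```python
from collections import deque

class LRM:
    """Local Resource Manager of one cluster: owns that cluster's local queue."""
    def __init__(self, name, processors):
        self.name = name
        self.processors = processors
        self.local_queue = deque()

    def submit_local(self, job):
        # Local jobs enter the cluster's local queue directly.
        self.local_queue.append(job)

class GRM:
    """Global Resource Manager: owns the global queue and dispatches to LRMs."""
    def __init__(self, lrms):
        self.lrms = lrms
        self.global_queue = deque()

    def submit_global(self, job):
        self.global_queue.append(job)

    def dispatch(self):
        # Forward each global job to the cluster with the shortest local queue;
        # this is just one possible policy (see the scheduling model below).
        while self.global_queue:
            job = self.global_queue.popleft()
            target = min(self.lrms, key=lambda lrm: len(lrm.local_queue))
            target.submit_local(job)
```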


    Workload

    • Realistic workloads consisting of Bag-of-Tasks (BoT)

    • Simulations using 10 workloads with 80% load

      • each workload has ~1650 BoTs and ~10K tasks

      • the duration of each workload is between 1 day and 1 week

    • Real background load trace

      • DAS-3 trace of June’08 (http://gwa.ewi.tudelft.nl/)

    (Distribution parameters are determined after base-two log transformation)
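The fitted distribution parameters are not reproduced in this transcript; purely as an illustration, the sketch below generates a synthetic BoT workload from lognormal distributions (consistent with fitting after a base-two log transformation), with entirely hypothetical parameter values.

```python
import random

def generate_bot_workload(num_bots=1650, seed=42):
    """Generate a synthetic Bag-of-Tasks workload (illustration only).

    The lognormal parameters below are hypothetical placeholders, not the
    values fitted from the traces used in the talk.
    """
    rng = random.Random(seed)
    workload, t = [], 0.0
    for bot_id in range(num_bots):
        t += rng.lognormvariate(3.0, 1.0)                 # inter-arrival time [s]
        size = max(1, int(rng.lognormvariate(1.5, 0.8)))  # tasks per BoT
        runtimes = [rng.lognormvariate(5.0, 1.2) for _ in range(size)]  # task runtimes [s]
        workload.append({"id": bot_id, "submit": t, "runtimes": runtimes})
    return workload

if __name__ == "__main__":
    w = generate_bot_workload()
    print(len(w), "BoTs,", sum(len(b["runtimes"]) for b in w), "tasks")
```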


    Scheduling model

    • We consider the following BoT scheduling policies

      • Static Scheduling

        • statically partitions tasks across clusters

      • Dynamic Scheduling

        • takes cluster load into account

        • Dynamic Per Task Scheduling

        • Dynamic Per BoT Scheduling

      • Prediction-based Scheduling

        • average of the last two runtimes for prediction

        • sends the task to the cluster which is predicted to lead to the earliest completion time (ECT)
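A minimal sketch of the prediction-based (ECT) policy, with hypothetical names and a simplified completion-time model: the predicted runtime is the average of the task's last two runtimes, and the task is sent to the cluster with the earliest predicted completion time.

```python
def predict_runtime(history, default=600.0):
    """Predict a task's runtime as the average of its last two observed runtimes.

    The fallback value for tasks without history is a hypothetical placeholder.
    """
    if not history:
        return default
    recent = history[-2:]
    return sum(recent) / len(recent)

def ect_schedule(task_history, clusters):
    """Pick the cluster with the earliest predicted completion time (ECT).

    clusters maps a cluster name to (queued work in seconds, processor count);
    both the data layout and the completion-time model are simplifications.
    """
    runtime = predict_runtime(task_history)
    best, best_ect = None, float("inf")
    for name, (queued_work, procs) in clusters.items():
        ect = queued_work / procs + runtime  # drain the queue, then run the task
        if ect < best_ect:
            best, best_ect = name, ect
    return best

# Example: the task ran for 300 s and 360 s before; three hypothetical clusters.
print(ect_schedule([300.0, 360.0],
                   {"cluster-a": (4000.0, 85),
                    "cluster-b": (1200.0, 32),
                    "cluster-c": (9000.0, 64)}))
```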


    Methodology

    • Compare the overprovisioned system with the initial, non-overprovisioned system (NO)

    • For Dynamic
      • minimum/maximum acquisition times of 69/129 s and release times of 18/23 s
      • 60%/70% utilization for the low/high thresholds
      • Κ varies over time, so for a fair comparison we keep it within a ±10% range


    Traditional performance metrics

    Makespan of a BoT

    Difference between the earliest time of submission of any of its tasks, and the latest time of completion of any of its tasks

    Normalized Schedule Length (NSL) of a BoT

    Ratio of its makespan to the sum of the runtimes of its tasks on a reference processor (slowdown)

    [Figure: the makespan of a BoT spans from the submission of its first task to the completion of its last task]
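Writing the two definitions out as formulas, where S_t, C_t, and R_t denote the submission time, completion time, and reference-processor runtime of task t in BoT B (this notation is mine, not the slide's):

```latex
\mathrm{Makespan}(B) = \max_{t \in B} C_t - \min_{t \in B} S_t,
\qquad
\mathrm{NSL}(B) = \frac{\mathrm{Makespan}(B)}{\sum_{t \in B} R_t}
```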


    Consistency metrics

    • We define two metrics to capture the notion of consistency across two dimensions

    • The system becomes more consistent as Cd gets closer to 1 and Cs gets closer to 0

    • A tighter range of the NSL is a sign of better consistency
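The formal definitions of Cd and Cs do not survive in this transcript, so they are not reproduced here; as a simple stand-in that matches the last bullet, the sketch below summarizes how tight the NSL distribution is across BoTs.

```python
import statistics

def nsl_spread(nsl_values):
    """Summarize the spread of NSL across BoTs.

    This is only a proxy for the Cd/Cs consistency metrics of the talk, whose
    exact definitions are not reproduced here: a tighter NSL range is a sign
    of better performance consistency.
    """
    nsl_sorted = sorted(nsl_values)
    n = len(nsl_sorted)
    p05 = nsl_sorted[int(0.05 * (n - 1))]
    p95 = nsl_sorted[int(0.95 * (n - 1))]
    return {
        "median": statistics.median(nsl_sorted),
        "p05_p95_range": p95 - p05,
        "stdev": statistics.pstdev(nsl_sorted),
    }

# Example: the second system is more consistent (much tighter NSL range).
print(nsl_spread([1.1, 1.3, 2.0, 5.4, 9.8, 1.2, 3.3]))
print(nsl_spread([1.1, 1.2, 1.3, 1.2, 1.4, 1.3, 1.2]))
```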


    Outline

    Overprovisioning Strategies

    Experimental Setup

    Results

    Dynamically Determining Κ

    Conclusions


    Performance of scheduling policies

    • Dynamic Per Task is the best; ECT is the worst


    Performance of different strategies

    [Figures: results for the different strategies and for different overprovisioning factors (Κ)]

    • Consistency obtained with overprovisioning is much better than the initial system (NO)

    • Static strategies provide similar performance (only K matters)

      • All and Largest are viable alternatives to Number as Number increases the administration, installation, and maintenance costs

    • Dynamic strategy has better performance compared to static strategies

    • K = 2.5 is the critical value


    Cost of different strategies

    • Use CPU-Hours as the cost metric
      • the time a processor is used [h]
      • partial instance-hours are rounded up to a full hour, similar to the Amazon EC2 on-demand instance pricing model (see the sketch below)

    • Significant reduction in cost, as high as ~40%
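A minimal sketch of this cost accounting, with hypothetical names: each processor-usage interval is rounded up to a full hour, as in the Amazon EC2 on-demand pricing model mentioned above.

```python
import math

def cpu_hours(usage_intervals):
    """Total cost in CPU-hours, rounding each partial hour up to a full hour.

    usage_intervals: (start, end) times in seconds during which one processor
    was in use; the data layout is a hypothetical simplification.
    """
    total = 0
    for start, end in usage_intervals:
        total += math.ceil((end - start) / 3600.0)  # partial hours charged in full
    return total

# Example: 30 minutes + 90 minutes of processor time is charged as 1 + 2 = 3 CPU-hours.
print(cpu_hours([(0, 1800), (5000, 10400)]))
```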


    Outline

    Overprovisioning Strategies

    Experimental Setup

    Results

    Dynamically Determining Κ

    Conclusions


    Determining Κ dynamically

    • So far we have taken the system's perspective; now we take the user's perspective

    • How can we dynamically determine Κ given the user performance requirements?

    • We use a simple feedback-control approach to deploy additional resources dynamically to meet user performance requirements
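A minimal sketch of such a feedback loop, with hypothetical names, gains, and bounds: after each control interval the observed makespans are compared against the user-specified target range, and K is raised or lowered accordingly.

```python
def adjust_k(k, observed_makespans, target_low, target_high,
             step=0.25, k_min=1.0, k_max=10.0):
    """One step of a simple feedback controller for the overprovisioning factor K.

    observed_makespans: makespans (minutes) of recently completed BoTs.
    target_low/target_high: the user-specified makespan range, e.g. 250-300 min.
    The step size and the bounds on K are hypothetical, not the talk's values.
    """
    if not observed_makespans:
        return k
    avg = sum(observed_makespans) / len(observed_makespans)
    if avg > target_high:
        k = min(k + step, k_max)   # too slow: provision more resources
    elif avg < target_low:
        k = max(k - step, k_min)   # faster than required: release resources
    return k

# Example: tight requirement [250, 300] minutes; K grows until makespans settle in range.
k = 1.0
for window in [[420, 390, 510], [360, 340, 330], [310, 290, 280], [270, 260, 255]]:
    k = adjust_k(k, window, 250, 300)
    print(f"K = {k:.2f}")
```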


    Evaluation

    • Simulated DAS-3 without background load

    • ~1.5 month workload consisting of ~33K BoTs

      • Empirically show that the controller stabilizes

    • Average makespan for the workload in the initial system (without the controller) is ~3120 minutes

    • Three scenarios, from tight to loose makespan requirements (ranges in minutes)

      • [250m-300m]

      • [700m-750m]

      • [1000m-1250m]


    Results (I)

    • Significant improvement, as high as ~65%, in the number of BoTs meeting the performance requirements when the requirements are tight

    • ~40%-50% improvement for loose performance requirements


    Results (II)

    [Figures: results for the three scenarios: 250m-300m, 700m-750m, and 1000m-1250m]


    Conclusions

    GOAL-1: Realistic Performance Evaluation of Different Strategies

    • Overprovisioning improves performance consistency significantly

    • Static strategies provide similar performance (only K matters)

    • Dynamic strategy performs better than the static strategies

    • Need to determine the critical value of K to maximize the benefit of overprovisioning

    GOAL-2: Dynamically Determining Κ for Given User Performance Requirements

    • Feedback-controlled system tuning K dynamically using historical performance data and specified performance requirements

    • The number of BoTs meeting the performance requirements increases significantly, as high as 65%, compared to the initial system


    Thank you! Questions? Comments?

    [email protected]

    http://www.st.ewi.tudelft.nl/~nezih/

    • More Information:

      • Guard-g Project: http://guardg.st.ewi.tudelft.nl/

      • PDS publication database: http://www.pds.twi.tudelft.nl

