
Job Scheduling: Market-based and Proportional Sharing

Richard Dutton

CSci 780 – P2P and Grid Systems

November 22, 2004


Importance

  • Computing and storage resources are being shared among many users

    • Grids, utility computing, “cluster utilities”

  • Sharing can improve resource efficiency and flexibility, and give users access to more computing power

  • The difficulty lies in controlling resource usage


Outline

  • Market-based framework

    • Motivation/Problem

    • Characteristics of approach

  • Interposed Proportional Sharing

    • Goals

    • Methodology

    • Experimentation/Results

  • Conclusions

  • Discussion



Motivation

  • Grids enable sharing of resources

    • Users have access to more resources

    • Resource usage must be arbitrated for competing demands

  • Both parties (resource provider and consumer) must benefit

    • Provider – effective global use of resources

    • Consumer – fairness, predictable behavior, control over relative priority of jobs


Motivation(2)

  • Why market-based approach?

    • Large scale of resources and participants in Grid

    • Varying supply and demand

  • Market-based approach is good because

    • Provides decentralized resource management

    • Self-interested consumers collectively advance the global goal

      • User gets jobs done quickly, providers efficiently delegate resources

      • Laissez faire


Ideas from batch systems

  • Batch systems incorporate:

    • User priority

    • Weighted proportional sharing

    • SLAs for setting bounds on available resources

  • Additionally, market-based approach will use relative urgency and cost of jobs


Value-based system

  • Value-based (sometimes called user-centric) scheduling allows users to assign a value (yield or utility) to each job

  • System is trying to maximize total value of jobs completed rather than simply meeting deadlines or reaching a certain throughput

  • Users bid for resources and pay with the value (value → currency)

  • System sells to highest “bidder” to maximize profits


Risk vs. Reward

  • Focus here is scheduling in grid service sites

  • Since price is derived from completion time, the scheduler must weigh a task’s length against its value and opportunity cost

  • What this means: scheduler must balance the risk of deferring a task with the reward of scheduling the task


Example: Market-Based Task Service

  • Tasks are batch computation jobs

    • Self-contained units of work

    • Execute anywhere

    • Consume known resources

  • Tasks give some value upon completion

  • Each task is associated with a value function, which gives its value as a function of completion time


Example: Market-Based Task Service

  • Characteristics of a market-based task service

    • Negotiation between customers and providers

      • Value → price; quality of service → completion time

    • Form contracts for task execution

      • Not meeting terms of the contract implies a penalty

    • Consumers look for the best deal and each site attempts to maximize its profits


Market Framework

[Diagram: the customer sends Bid (value, service demand) to several task service sites; each site replies Accept (completion time, price) or Reject; the customer then sends Accept (contract) to the chosen site]


Goals

  • Develop policy choices for the task service sites to maximize profits

    • Acceptance – admission control

    • Scheduling

  • Use value metric to balance risk and reward in bidding and scheduling

  • Not concerned with currency supply, pricing systems, incentive mechanisms, payment…


Value functions

  • Negotiation between site and bidder establishes agreement on price and QoS

  • Value function maps service quality (completion time) to value

    • Want the formulation to be “simple, rich, and tractable”

    • Generalization of the linear-decay value functions from Millennium

    • Expresses how a task’s value degrades with time – decay_i


Value function

[Figure: linear-decay value function – value stays at its maximum through the task’s runtime, then decays linearly with time, dropping below zero to a bounded penalty]
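The linear-decay shape above can be written out directly. A minimal sketch in Python, with illustrative parameter names (max_value, decay, runtime, penalty are assumptions, not the paper's notation):

```python
def linear_value(completion_time, max_value, decay, runtime, penalty):
    """Linear-decay value function: full value if the task completes
    within its runtime; past that, value falls off linearly at rate
    `decay`, bounded below by -penalty."""
    if completion_time <= runtime:
        return max_value
    return max(max_value - decay * (completion_time - runtime), -penalty)
```

For example, a task worth 100 that decays at 2 per time unit past a runtime of 10 is worth 80 if it completes at time 20, and never less than -50 with a penalty bound of 50.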


Decisions

  • Which tasks to admit?

  • When to run an admitted task?

    • Wish to maximize profit

  • How much should a task be charged?

    • Based on value functions

  • Must find highest priced tasks and reject those which do not meet some minimum levels


Experimental Methodology

  • Simulator that supports bidding and scheduling according to a task-service economy with linear value functions

  • Use synthetic traces that are representative of real batch workloads

  • Compare against FirstPrice from Millennium

  • Concerned with relative performance and sensitivity analysis of using value and decay


Risk/Reward Heuristics

  • Discounting Future Gains

    • Leans toward shorter tasks – less likely to be preempted

    • Realizes gains more quickly with short tasks – risk-averse scheduler

  • Opportunity Cost – takes into account the slope of decay

    • Leans toward more urgent tasks

    • If all tasks must be completed, it is best to complete most urgent tasks first


Discounting Future Gains

  • Based on Present Value from finance

    • PV_i = yield_i / (1 + (discount_rate × RPT_i))

    • PV_i can be thought of as an investment value

    • Interest is earned at discount_rate over the Remaining Processing Time (RPT)

    • A high discount_rate causes the system to be more risk-averse

  • The heuristic, called PV, selects jobs in descending order of discounted gain PV_i / RPT_i
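A small sketch of the PV ordering under the definitions above (the task tuple layout is an assumption for illustration):

```python
def present_value(task_yield, rpt, discount_rate):
    # PV_i = yield_i / (1 + discount_rate * RPT_i)
    return task_yield / (1 + discount_rate * rpt)

def pv_schedule(tasks, discount_rate):
    """Order tasks by discounted gain PV_i / RPT_i, highest first.
    tasks: list of (name, yield, remaining_processing_time)."""
    return sorted(
        tasks,
        key=lambda t: present_value(t[1], t[2], discount_rate) / t[2],
        reverse=True,
    )
```

With a nonzero discount_rate the ordering leans toward shorter tasks: a short task with modest yield can outrank a longer task whose raw yield per unit time is higher, which is the risk-averse behavior described above.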



Opportunity Cost

  • Extended heuristic to account for losses from opportunity cost

    • Loss in revenue from choosing some task i before task j

  • Opportunity cost to start i is given by aggregate loss of all other competing tasks

  • Bounded penalties require O(n²) time

  • Unbounded penalties computed in O(log n)


Balancing Gains and Opportunity Cost

  • It is risky to defer gains from high-value task based solely on opportunity cost

  • Solution: FirstReward

    • reward_i = (α × PV_i − (1 − α) × cost_i) / RPT_i

  • The α parameter controls how much the system weights expected gains

    • α = 1 and discount_rate = 0 reduce FirstReward to PV
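The FirstReward score can be sketched directly from this formula (argument names are illustrative):

```python
def first_reward(pv, cost, rpt, alpha):
    """FirstReward score for one task:
    reward_i = (alpha * PV_i - (1 - alpha) * cost_i) / RPT_i
    alpha = 1 ignores opportunity cost; alpha = 0 ignores expected gains."""
    return (alpha * pv - (1 - alpha) * cost) / rpt
```

With α = 1 (and discount_rate = 0, so PV_i is just yield_i) this reduces to the PV metric PV_i / RPT_i.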


    Bounded Penalties

    • Shows it is more important to consider costs than gains → low α

    • Most effective around α ≈ 0.3


    Unbounded Penalties

    • Shows it is ONLY important to consider costs, not gains

    • Magnitude of improvements much greater


    Negotiation

    • Client submits task bids

    • Site accepts/rejects bid

    • If site accepts, it negotiates to set a price and completion time


    Admission Control

    • Steps for proposed tasks

      • Integrate task into candidate schedule according to FirstReward

      • Determine yield for the task if accepted

      • Apply acceptance heuristic to determine acceptability

      • If accepting, issue a bid to the client

      • If client accepts the contract, place task into schedule to execute

  • Acceptance heuristic is based on the amount of additional delay the task can tolerate before its value falls below some yield threshold
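Assuming the linear value functions used throughout, one way to sketch the acceptance heuristic is to compute the slack: the extra delay a task tolerates before its value drops below the yield threshold (all names here are hypothetical):

```python
def delay_slack(max_value, decay, runtime, yield_threshold, est_completion):
    """Latest completion time at which value still meets yield_threshold
    (linear decay assumed), minus the completion time estimated from the
    candidate schedule."""
    latest = runtime + (max_value - yield_threshold) / decay
    return latest - est_completion

def accept_task(max_value, decay, runtime, yield_threshold,
                est_completion, min_slack=0.0):
    # Accept only if the task can still absorb at least min_slack delay.
    return delay_slack(max_value, decay, runtime,
                       yield_threshold, est_completion) >= min_slack
```

For instance, a task with maximum value 100, decay 2, and runtime 10 stays above a yield threshold of 60 until time 30; if the candidate schedule finishes it at time 25, it has 5 units of slack.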


    Summary of Market-based Services

    • Develops heuristics for scheduling and admission control in market-based grid task service

    • Value-based scheduling allows user to specify the value and urgency of the job

    • Maximizing user value in turn maximizes yield globally

    • Approach based on computational economy



    Overview

    • This paper deals with share-based scheduling algorithms for differentiated service in network services, in particular storage service utilities

    • Allows a server to be shared among many request flows, each with a probabilistic assurance of receiving some minimum share of resources


    Situation

    • Sharing of resources must be fair

    • SLAs often define contractual obligations between client and service


    Goals of This Research

    • Performance isolation

      • A surge from one flow should not degrade the performance of another flow

  • Differentiated application service quality

    • Performance should be predictable and configurable

  • Non-intrusive resource control

    • Designed to work without changes to existing services, like commercial storage servers

    • Views server as a black box

    • Control server resources externally


    Idea in Words

    • As the name suggests, the idea is to interpose a request scheduler between the client and server

    • The scheduler will intercept requests to the server

    • Depending on the request and state of previous requests, it will delay, reorder, or simply dispatch the request



    Interposed Request Scheduling

    • Scheduler intercepts requests

    • Dispatches according to some policies seeking to fairly share resources among all flows

    • A parameter D limits the maximum number of outstanding requests

    • Each flow has separate queue

    • Scheduler dispatches from each queue on FIFO basis


    Related Approaches

    • Façade proposes an interposed request scheduler that uses Earliest Deadline First (EDF)

      • Drawback: unfair – cannot provide performance isolation

      • Uses priority scheduling to achieve isolation

    • SLEDS – per-client network adapter

      • Uses leaky bucket filter to shape and throttle I/O flows

      • Not work-conserving


    Proportional Sharing

    • Proposes 3 proportional sharing algorithms

      • SFQ(D) – Start-time Fair Queuing

      • FSFQ(D) – Four-tag Start-time Fair Queuing

      • RW(D) – Request Windows

    • These are general, configurable solutions that provide

      • Performance isolation

      • Fairness

      • Work-conservation


    Fair Queuing

    • Each flow is assigned a weight Φf

    • Resources allocated among active flows in proportion to weight

    • A flow is active if it has at least 1 outstanding request

    • Fair: Proven property bounding difference in work done for any pair of active flows (lag)

    • Work-conserving: surplus resources consumed by active flows without penalty


    Start-time Fair Queuing

    • Start-time Fair Queuing (SFQ) is the basis for the scheduling algorithms because of its fairness properties

    • SFQ assigns a tag to each request upon arrival and dispatches the requests in ascending order of tags

    • Fairness stems from method of computing and assigning tags


    SFQ

    • Assigns a start tag and a finish tag to each request

      • Start tag: S(p) = max{ v(A(p)), F(p′) }, where A(p) is the request’s arrival time and p′ is the flow’s previous request

      • Finish tag: F(p) = S(p) + c(p) / Φ_f, where c(p) is the request’s cost and Φ_f the flow’s weight

    • Defines a system notion of virtual time v(t) that advances as active flows progress

    • For example, v(t) advances quickly with less competition in order to use surplus resources
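The tagging rules can be sketched as a serial, one-request-at-a-time scheduler (class and method names are illustrative; v(t) here is simply the start tag of the request most recently dispatched):

```python
import heapq

class SFQ:
    """Start-time Fair Queuing sketch: tag requests on arrival and
    dispatch them in ascending start-tag order."""

    def __init__(self, weights):
        self.weights = weights                   # flow -> weight
        self.finish = {f: 0.0 for f in weights}  # last finish tag per flow
        self.v = 0.0                             # virtual time v(t)
        self.heap = []                           # (start_tag, seq, flow)
        self.seq = 0                             # FIFO tie-breaker

    def submit(self, flow, cost):
        # Start tag: later of v(t) and the flow's previous finish tag
        start = max(self.v, self.finish[flow])
        # Finish tag: start tag plus cost scaled by the flow's weight
        self.finish[flow] = start + cost / self.weights[flow]
        heapq.heappush(self.heap, (start, self.seq, flow))
        self.seq += 1

    def dispatch(self):
        start, _, flow = heapq.heappop(self.heap)
        self.v = start                           # advance v(t)
        return flow
```

With weights {a: 2, b: 1} and equal-cost requests, roughly two a requests are dispatched for every b request, matching the proportional-sharing guarantee.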


    SFQ

    • Start tag of a flow’s most recent request acts as the flow’s virtual clock

      • A flow with a small tag value is behind and will receive priority

      • A flow with a large tag value is ahead and may be held back

    • However, newly active flows will have their tag values set by v(t) so that there is fair competition between all active flows

    • Drawback: traditional SFQ [specifically v(t) ] does not work well in the face of concurrency


    Interposed Proportional Sharing

    • Goal: use a variant of SFQ for interposed scheduling which handles up to D requests concurrently

      • Ideal goal: the interposed scheduler can dispatch enough jobs concurrently to completely use resources

  • Scheduler wants to always have D concurrent outstanding requests

  • This value D represents a tradeoff between server resource utilization and scheduler fairness

    • Example: large D allows server to always have jobs waiting, but also increases the wait time for incoming requests


    Min-SFQ

    • Adaptation of SFQ which defines v(t) as the minimum start-tag of any outstanding request

    • Issue:

      • v(t) advances according to slowest active flow f

      • Sudden burst from f will penalize aggressive flows which are using surplus from f’s idle resources

      • If v(t) lags behind, the scheduler degrades to the Virtual Clock algorithm

      • If v(t) gets too far ahead, it becomes FIFO

        • Both are known to be unfair


    SFQ(D)

    • Goal is to advance v(t) fairly

    • Solution: derive v(t) from active flows, not lagging flows

    • v(t) is defined as the start tag of the queued request with the lowest start tag at the last dispatch

    • Still uses the initial rules for determining the tags
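A sketch of the depth-controlled variant, extending plain SFQ tagging with a bound of D outstanding requests (names are illustrative, and complete() must be called when the server finishes a request):

```python
import heapq

class SFQD:
    """SFQ(D) sketch: SFQ tagging plus a depth bound D on requests
    outstanding at the server; v(t) is set at each dispatch to the
    lowest start tag then queued (i.e. the dispatched request's)."""

    def __init__(self, weights, depth):
        self.weights = weights
        self.depth = depth
        self.finish = {f: 0.0 for f in weights}
        self.v = 0.0
        self.heap = []
        self.seq = 0
        self.outstanding = 0

    def submit(self, flow, cost):
        start = max(self.v, self.finish[flow])
        self.finish[flow] = start + cost / self.weights[flow]
        heapq.heappush(self.heap, (start, self.seq, flow))
        self.seq += 1

    def dispatch(self):
        # Dispatch only while fewer than D requests are outstanding.
        if self.outstanding >= self.depth or not self.heap:
            return None
        start, _, flow = heapq.heappop(self.heap)
        self.v = start
        self.outstanding += 1
        return flow

    def complete(self):
        # Server finished one request; a dispatch slot frees up.
        self.outstanding -= 1
```

The depth parameter makes the utilization/fairness tradeoff explicit: a larger D keeps the server busier, while a smaller D keeps dispatch order closer to ideal SFQ.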


    SFQ(D)

    • Since the algorithm for dispatching is strictly SFQ, earlier properties still hold

      • Fairness

      • Bound on lag for different flows

    • Authors prove that SFQ(D) has fairness and lag bounds for requests completed

      • Client’s view of fairness


    SFQ(D)

    • Problems

      • v(t) advances monotonically on request dispatch events, but may not advance on every dispatch

      • Therefore, bursts of requests may get the same start tag regardless of being behind or ahead

      • It is most fair for the scheduler to be biased in these situations against flows that have been using surplus resources

    • Realization: MinSFQ doesn’t suffer from this


    FSFQ(D)

    • Refinement of SFQ(D) that favors slow flows over ones that are ahead

    • Four-tag Start-time Fair Queuing

      • Combines fairness policies of SFQ(D) and MinSFQ(D)

      • Adds two new “adjusted” tags derived from MinSFQ(D)

      • The new tags are used to break ties in favor of lagging flows


    Problems

    • SFQ(D) and FSFQ(D) require a central point of interposition to intercept and schedule all requests

      • Made for network switch or router

      • Introduces single point of failure and complexity

    • Scheduling overhead grows at best logarithmically – limits scalability

    • Authors propose simple decentralized approach called Request Windows (RW)


    Request Windows

    • Credit-based server access scheme

    • Interposed at the client

    • Each flow i is given a number of credits ni

    • Each request from i uses a portion of i’s credit allocation

    • For a given flow i, its portion of the total window D is n_i = D × Φ_i / Σ_j Φ_j



    Request Windows

    • Pros

      • Under light load, a flow will encounter little congestion and complete quickly

      • Similar to self-clocking nature of TCP

    • Cons

      • RW is not fully work-conserving

      • Yields tight fairness bound, but may limit concurrency and ability to use surplus resources

    • As with SFQ(D), able to prove bound on lag between active flows


    Experiments

    • Implemented prototype interposed request scheduler by extending an NFS proxy

      • Implemented SFQ(D)/FSFQ(D), RW(D) and EDF

    • Built a disk array simulator, which produced most of the results

    • Used fstress load generator for workload dominated by random reads on large files

    • Simulated fstress workloads with different D, arrival rate, and weighted shares

    • Goal: evaluate performance isolation




    FSFQ(D) and RW(D): Same Experiment


    Varying Shares

    • Still provides isolation from heavy users

    • Response time insensitive to weight in light load, but sensitive during heavy load

    • FSFQ(D) provides slightly better isolation than SFQ(D)

    • RW(D) provides better isolation than FSFQ(D) but less utilization of idle resources


    Varying D

    • As hypothesized, increase in D weakens fairness in both FSFQ and RW

    • Low values of D give tight fairness but lower levels of concurrency


    Summary

    • Interposed request scheduling can provide a non-intrusive form of fairly sharing resources (performance isolation)

    • FSFQ(D) provides slightly better isolation

    • RW(D) provides tightest fairness bounds but at the expense of under-utilizing the resources

    • It may be appropriate to use some hybrid between FSFQ and RW


    Discussion

    • Is the market-based scheduling reasonable?

    • There are many assumptions that must be made (e.g., known costs, pricing, payment, issuance of currency) that the papers gloss over. Are these papers just concepts (like a Grid), or could they ever actually be used?


    References

    • Balancing Risk and Reward in Market-Based Task Scheduling by David Irwin, Jeff Chase, and Laura Grit. In the Thirteenth International Symposium on High Performance Distributed Computing (HPDC-13), June 2004.

    • Interposed Proportional Sharing for a Storage Service Utility by Wei Jin, Jeff Chase, and Jasleen Kaur. In the Joint International Conference on Measurement and Modeling of Computer Systems (ACM SIGMETRICS / Performance), June 2004.

    • Christopher Lumb, Arif Merchant, and Guillermo A. Alvarez. Facade: Virtual storage devices with performance guarantees. In Proceedings of the 2nd USENIX Conference on File and Storage Technologies, San Francisco, CA, March 2003.

    • David D. Chambliss, Guillermo A. Alvarez, Prashant Pandey, Divyesh Jadav, Jian Xu, Ram Menon, and Tzongyu P. Lee. Performance virtualization for large-scale storage systems. In 22nd International Symposium on Reliable Distributed Systems (SRDS '03), October 2003.

    • Pawan Goyal, Harrick M. Vin, and Haichen Cheng. Start-time fair queuing: A scheduling algorithm for integrated services packet switching networks. IEEE/ACM Transactions on Networking, 5(5):690–704, October 1997.

    • B. N. Chun and D. E. Culler. User-centric performance analysis of market-based cluster batch schedulers. In 2nd IEEE International Symposium on Cluster Computing and the Grid, May 2002.

    • Duke Internet Storage and Systems Group – ISSG – http://issg.cs.duke.edu/index.html

