Job Scheduling: Market-based and Proportional Sharing

Richard Dutton

CSci 780 – P2P and Grid Systems

November 22, 2004

Importance

  • Computing and storage resources are being shared among many users
    • Grids, utility computing, “cluster utilities”
  • Sharing can improve resource efficiency and flexibility, and provide access to more computing power
  • Difficulty comes in controlling resource usage
Outline

  • Market-based framework
    • Motivation/Problem
    • Characteristics of approach
  • Interposed Proportional Sharing
    • Goals
    • Methodology
    • Experimentation/Results
  • Conclusions
  • Discussion
Motivation

  • Grids enable sharing of resources
    • Users have access to more resources
    • Resource usage must be arbitrated for competing demands
  • Both parties (resource provider and consumer) must benefit
    • Provider – effective global use of resources
    • Consumer – fairness, predictable behavior, control over relative priority of jobs
Motivation
  • Why market-based approach?
    • Large scale of resources and participants in Grid
    • Varying supply and demand
  • Market-based approach is good because
    • Provides decentralized resource management
    • Selfish consumers lead to global goal
      • User gets jobs done quickly, providers efficiently delegate resources
      • Laissez faire
Ideas from batch systems
  • Batch systems incorporate:
    • User priority
    • Weighted proportional sharing
    • SLAs for setting bounds on available resources
  • Additionally, market-based approach will use relative urgency and cost of jobs
Value-based system
  • Value-based (sometimes called user-centric) scheduling allows users to define a value (yield or utility) to each job
  • System is trying to maximize total value of jobs completed rather than simply meeting deadlines or reaching a certain throughput
  • Users bid for resources and pay with the value (value → currency)
  • System sells to highest “bidder” to maximize profits
Risk vs. Reward
  • Focus here is scheduling in grid service sites
  • Since price is derived from completion time, scheduler must take into consideration length of a task with its value and opportunity cost
  • What this means: scheduler must balance the risk of deferring a task with the reward of scheduling the task
Example: Market-Based Task Service
  • Tasks are batch computation jobs
    • Self-contained units of work
    • Execute anywhere
    • Consume known resources
  • Tasks give some value upon completion
  • Tasks associated with value function – gives value as function of completion time
Example: Market-Based Task Service
  • Characteristics of a market-based task service
    • Negotiation between customers and providers
      • Value → price, and quality of service → completion time
    • Form contracts for task execution
      • Not meeting terms of the contract implies a penalty
    • Consumers look for the best deal and each site attempts to maximize its profits

Market Framework

[Diagram: clients submit Bid (value, service demand) to task service sites; a site responds with Accept (completion time, price); the client confirms with Accept (contract).]
Task Service Sites

  • Develop policy choices for the task service sites to maximize profits
    • Acceptance – admission control
    • Scheduling
  • Use value metric to balance risk and reward in bidding and scheduling
  • Not concerned with currency supply, pricing systems, incentive mechanisms, payment…
Value functions
  • Negotiation between site and bidder establishes agreement on price and QoS
  • Value function maps service quality (completion time) to value
    • Want the formulation to be “simple, rich, and tractable”
    • Generalization of linear decay value functions from Millennium
    • Expresses how value of a task degrades with time – decay_i
Value function

[Figure: a linear decay value function – value stays at Maximum Value for early completion times and decreases linearly thereafter.]
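The linear-decay shape above can be sketched in a few lines. This is an illustrative sketch only; the function name and its parameters (max_value, decay, min_completion_time) are assumed, not taken from the paper.

```python
def linear_decay_value(completion_time, max_value, decay, min_completion_time):
    """Linear-decay value function: a task finishing by its minimum possible
    completion time earns max_value; after that, value decays linearly at
    rate `decay` per unit of delay, never dropping below zero."""
    delay = max(0.0, completion_time - min_completion_time)
    return max(0.0, max_value - decay * delay)
```

For example, a task worth 100 that loses 5 per time unit of delay is worth 70 if it finishes 6 units late, and eventually worth nothing.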
  • Which tasks to admit?
  • When to run an admitted task?
    • Wish to maximize profit
  • How much should a task be charged?
    • Based on value functions
  • Must find highest priced tasks and reject those which do not meet some minimum levels
Experimental Methodology
  • Simulator that allows bidding and scheduling according to a task service economy with linear value functions
  • Use synthetic traces that are representative of real batch workloads
  • Compare against FirstPrice from Millennium
  • Concerned with relative performance and sensitivity analysis of using value and decay
Risk/Reward Heuristics
  • Discounting Future Gains
    • Leans toward shorter tasks – less likely to be preempted
    • Realizes gains more quickly with short tasks – risk-averse scheduler
  • Opportunity Cost – takes into account the slope of decay
    • Leans toward more urgent tasks
    • If all tasks must be completed, it is best to complete most urgent tasks first
Discounting Future Gains
  • Based on Present Value from finance
    • PV_i = yield_i / (1 + (discount_rate × RPT_i))
    • PV_i can be thought of as investment value
    • Interest is earned at discount_rate for the Remaining Processing Time (RPT_i)
    • High discount_rate causes the system to be more risk-averse
  • A heuristic called PV selects jobs in order of discounted gain PV_i / RPT_i
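The two formulas above can be sketched as follows; the task representation (a dict with yield and RPT fields) is assumed for illustration.

```python
def present_value(yield_i, rpt_i, discount_rate):
    # PV_i = yield_i / (1 + discount_rate * RPT_i)
    return yield_i / (1.0 + discount_rate * rpt_i)

def pv_order(tasks, discount_rate):
    # The PV heuristic: pick jobs in descending order of discounted
    # gain per unit of remaining work, PV_i / RPT_i.
    return sorted(
        tasks,
        key=lambda t: present_value(t["yield"], t["rpt"], discount_rate) / t["rpt"],
        reverse=True,
    )
```

With a positive discount_rate, a short task with a modest yield can outrank a long task with a larger total yield, which is exactly the risk-averse bias described above.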
Opportunity Cost
  • Extended heuristic to account for losses from opportunity cost
    • Loss in revenue from choosing some task i before task j
  • Opportunity cost to start i is given by aggregate loss of all other competing tasks
  • Bounded penalties require O(n²) time
  • Unbounded penalties can be computed in O(log n) time
Balancing Gains and Opportunity Cost
  • It is risky to defer gains from high-value task based solely on opportunity cost
  • Solution: FirstReward
      • reward_i = (α · PV_i − (1 − α) · cost_i) / RPT_i
  • The α parameter controls how much the system considers expected gains
    • α = 1 and discount_rate = 0 reduces FirstReward to PV
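FirstReward's scoring rule is then a one-liner (a sketch; argument names are illustrative):

```python
def first_reward(pv, cost, rpt, alpha):
    # reward_i = (alpha * PV_i - (1 - alpha) * cost_i) / RPT_i
    # alpha = 1 ignores opportunity cost; alpha = 0 ignores expected gains.
    return (alpha * pv - (1.0 - alpha) * cost) / rpt
```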
Bounded Penalties
  • Shows it is more important to consider costs than gains → low α
  • Most effective around α = 0.3
Unbounded Penalties
  • Shows it is ONLY important to consider costs, not gains
  • Magnitude of improvements much greater
  • Client submits task bids
  • Site accepts/rejects bid
  • If site accepts, it negotiates to set a price and completion time
Admission Control
  • Steps for proposed tasks
      • Integrate task into candidate schedule according to FirstReward
      • Determine yield for the task if accepted
      • Apply acceptance heuristic to determine acceptability
      • If accepting, issue a bid to the client
      • If client accepts the contract, place task into schedule to execute
  • Acceptance heuristic based on amount of additional delay the task can allow before its value falls below some yield threshold
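The admission steps above might look like the following sketch. Everything here is illustrative: simulate_yield stands in for evaluating the candidate schedule under the task's value function, and the bid/contract exchange is reduced to a boolean.

```python
def admit(task, schedule, yield_threshold, simulate_yield):
    """Admission control sketch: tentatively schedule the task, estimate
    its yield, and accept only if the yield clears the threshold."""
    candidate = schedule + [task]        # integrate into candidate schedule
    y = simulate_yield(task, candidate)  # yield for the task if accepted
    if y >= yield_threshold:             # acceptance heuristic
        schedule.append(task)            # place into schedule to execute
        return True, y                   # a real site would now issue the bid
    return False, y
```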
Summary of Market-based Services
  • Develops heuristics for scheduling and admission control in market-based grid task service
  • Value-based scheduling allows user to specify the value and urgency of the job
  • Maximizing user value in turn maximizes yield globally
  • Approach based on computational economy
Interposed Proportional Sharing

  • This paper deals with share-based scheduling algorithms for differentiated service in network services, in particular storage service utilities
  • Allows a server to be shared among many request flows with some probabilistic assurance of receiving some minimum share of resources
  • Sharing of resources must be fair
  • SLAs often define contractual obligations between client and service
Goals of This Research
  • Performance isolation
      • A surge from one flow should not degrade the performance of another flow
  • Differentiated application service quality
      • Performance should be predictable and configurable
  • Non-intrusive resource control
      • Designed to work without changes to existing services, like commercial storage servers
      • Views server as a black box
      • Control server resources externally
Idea in Words
  • As the name suggests, the idea is to interpose a request scheduler between the client and server
  • The scheduler will intercept requests to the server
  • Depending on the request and state of previous requests, it will delay, reorder, or simply dispatch the request
Interposed Request Scheduling
  • Scheduler intercepts requests
  • Dispatches according to some policies seeking to fairly share resources among all flows
  • Parameter D limits maximum number of outstanding requests
  • Each flow has separate queue
  • Scheduler dispatches from each queue on FIFO basis
Related Approaches
  • Façade proposes an interposed request scheduler that uses Earliest Deadline First (EDF)
    • Drawback: EDF is unfair – it cannot provide performance isolation
    • Uses priority scheduling to achieve isolation instead
  • SLEDS – per-client network adapter
    • Uses leaky bucket filter to shape and throttle I/O flows
    • Not work-conserving
Proportional Sharing
  • Proposes 3 proportional sharing algorithms
    • SFQ(D) – Start-time Fair Queuing
    • FSFQ(D) – Four-tag Start-time Fair Queuing
    • RW(D) – Request Windows
  • These are general and configurable solutions that provide
      • Performance isolation
      • Fairness
      • Work-conservation
Fair Queuing
  • Each flow f is assigned a weight φ_f
  • Resources allocated among active flows in proportion to weight
  • A flow is active if it has at least 1 outstanding request
  • Fair: Proven property bounding difference in work done for any pair of active flows (lag)
  • Work-conserving: surplus resources consumed by active flows without penalty
Start-time Fair Queuing
  • Start-time Fair Queuing (SFQ) is the basis for the scheduling algorithms because of its fairness properties
  • SFQ assigns a tag to each request upon arrival and dispatches the requests in ascending order of tags
  • Fairness stems from method of computing and assigning tags
  • Assigns a start tag and finish tag for each request
    • Start tag: S(p_f^j) = max { v(A(p_f^j)), F(p_f^(j−1)) }, where A(p_f^j) is the arrival time of flow f's j-th request
    • Finish tag: F(p_f^j) = S(p_f^j) + c_f^j / φ_f, where c_f^j is the request's cost and φ_f the flow's weight
  • Defines a system notion of virtual time v(t) that advances as active flows progress
  • For example, v(t) advances quickly with less competition in order to use surplus resources
  • Start tag of a flow’s most recent request acts as the flow’s virtual clock
    • Flow with small tag value is behind and will receive priority
    • Flows with large tag values are ahead and may be held back
  • However, newly active flows will have their tag values set by v(t) so that there is fair competition between all active flows
  • Drawback: traditional SFQ [specifically v(t) ] does not work well in the face of concurrency
Interposed Proportional Sharing
  • Goal: use a variant of SFQ for interposed scheduling which handles up to D requests concurrently
      • Ideal goal: the interposed scheduler can dispatch enough jobs concurrently to completely use resources
  • Scheduler wants to always have D concurrent outstanding requests
  • This value D represents a tradeoff between server resource utilization and scheduler fairness
      • Example: large D allows server to always have jobs waiting, but also increases the wait time for incoming requests
MinSFQ
  • Adaptation of SFQ which defines v(t) as the minimum start-tag of any outstanding request
  • Issue:
    • v(t) advances according to slowest active flow f
    • Sudden burst from f will penalize aggressive flows which are using surplus from f’s idle resources
    • If v(t) lags behind, MinSFQ degrades to the Virtual Clock algorithm
    • If v(t) gets too far ahead, it becomes FIFO
      • Both known to be unfair
SFQ(D)
  • Goal is to advance v(t) fairly
  • Solution: derive v(t) from active flows, not lagging flows
  • v(t) is defined as the start tag of the queued request with the lowest start tag at the last dispatch
  • Still uses the initial rules for determining the tags
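Putting the tag rules and this v(t) definition together, SFQ(D) can be sketched roughly as below. This is a simplified reading of the algorithm, not the authors' code; the class shape and names are assumed.

```python
import heapq

class SFQD:
    """Sketch of SFQ(D): each request of flow f gets a start tag
    S = max(v, F_prev(f)) and a finish tag F = S + cost / phi_f; requests
    dispatch in ascending start-tag order with at most D outstanding, and
    v(t) is the lowest start tag among queued requests at dispatch time."""

    def __init__(self, depth, weights):
        self.depth = depth                       # D
        self.phi = weights                       # flow -> weight phi_f
        self.v = 0.0                             # virtual time v(t)
        self.finish = {f: 0.0 for f in weights}  # previous finish tag per flow
        self.queue = []                          # heap of (start_tag, flow, cost)
        self.outstanding = 0

    def submit(self, flow, cost):
        start = max(self.v, self.finish[flow])
        self.finish[flow] = start + cost / self.phi[flow]
        heapq.heappush(self.queue, (start, flow, cost))

    def dispatch(self):
        sent = []
        while self.queue and self.outstanding < self.depth:
            self.v = self.queue[0][0]  # advance v(t) to lowest queued start tag
            sent.append(heapq.heappop(self.queue))
            self.outstanding += 1
        return sent

    def complete(self):
        self.outstanding -= 1
```

A flow with a larger phi accumulates finish tags more slowly, so its requests keep receiving small start tags and win dispatch more often, which is the proportional share.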
SFQ(D)
  • Since the algorithm for dispatching is strictly SFQ, earlier properties still hold
    • Fairness
    • Bound on lag for different flows
  • Authors prove that SFQ(D) has fairness and lag bounds for requests completed
      • Client’s view of fairness
SFQ(D)
  • Problems
    • v(t) advances monotonically on request dispatch events, but may not advance on every dispatch
    • Therefore, bursts of requests may get the same start tag regardless of being behind or ahead
    • It is most fair for the scheduler to be biased in these situations against flows that have been using surplus resources
  • Realization: MinSFQ doesn’t suffer from this
FSFQ(D)
  • Refinement of SFQ(D) that favors slow flows over ones that are ahead
  • Four-tag Start-time Fair Queuing
    • Combines fairness policies of SFQ(D) and MinSFQ(D)
    • Adds two new "adjusted" tags derived from MinSFQ(D)
    • The new tags are used to break ties in favor of lagging flows
  • SFQ(D) and FSFQ(D) require a central point of interposition to intercept and schedule all requests
    • Made for network switch or router
    • Introduces single point of failure and complexity
  • Scheduling overhead grows at best logarithmically – limits scalability
  • Authors propose simple decentralized approach called Request Windows (RW)
Request Windows
  • Credit-based server access scheme
  • Interposed at the client
  • Each flow i is given a number of credits ni
  • Each request from i uses a portion of i’s credit allocation
  • For a given flow i, its share of the total window D is n_i = D · φ_i / Σ_j φ_j (credits in proportion to the flow's weight)
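The weight-proportional credit split can be sketched directly; the function name and the use of per-flow weights phi are assumptions consistent with the fair-queuing sections above.

```python
def credit_allocation(weights, depth):
    # Flow i receives n_i = depth * phi_i / sum(phi): credits in
    # proportion to its weight, summing to the total window depth.
    total = float(sum(weights.values()))
    return {flow: depth * w / total for flow, w in weights.items()}
```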
Request Windows
  • Pros
    • Under light load, a flow will encounter little congestion and complete quickly
    • Similar to self-clocking nature of TCP
  • Cons
    • RW is not fully work-conserving
    • Yields tight fairness bound, but may limit concurrency and ability to use surplus resources
  • As with SFQ(D), able to prove bound on lag between active flows
  • Implemented prototype interposed request scheduler by extending an NFS proxy
    • Implemented SFQ(D)/FSFQ(D), RW(D) and EDF
  • Created a disk array simulator to provide most results
  • Used fstress load generator for workload dominated by random reads on large files
  • Simulated fstress workloads with different D, arrival rate, and weighted shares
  • Goal: evaluate performance isolation
Varying Shares
  • Still provides isolation from heavy users
  • Response time insensitive to weight in light load, but sensitive during heavy load
  • FSFQ(D) provides slightly better isolation than SFQ(D)
  • RW(D) provides better isolation than FSFQ(D) but less utilization of idle resources
Varying D
  • As hypothesized, increase in D weakens fairness in both FSFQ and RW
  • Low values of D have great fairness but lower levels of concurrency
Conclusions

  • Interposed request scheduling can provide a non-intrusive form of fairly sharing resources (performance isolation)
  • FSFQ(D) provides slightly better isolation
  • RW(D) provides tightest fairness bounds but at the expense of under-utilizing the resources
  • It may be appropriate to use some hybrid between FSFQ and RW
Discussion

  • Is market-based scheduling reasonable?
  • There are many assumptions that must be made (e.g., known costs, pricing, payment, issuance of currency) that the papers gloss over. Are these papers just concepts (like a Grid), or could they ever actually be used?
References

  • Balancing Risk and Reward in Market-Based Task Scheduling by David Irwin, Jeff Chase, and Laura Grit. In the Thirteenth International Symposium on High Performance Distributed Computing (HPDC-13), June 2004.
  • Interposed Proportional Sharing for a Storage Service Utility by Wei Jin, Jeff Chase, and Jasleen Kaur. In the Joint International Conference on Measurement and Modeling of Computer Systems (ACM SIGMETRICS / Performance), June 2004.
  • Christopher Lumb, Arif Merchant, and Guillermo A. Alvarez. Facade: Virtual storage devices with performance guarantees. In Proceedings of the 2nd USENIX Conference on File and Storage Technologies, San Francisco, CA, March 2003.
  • David D. Chambliss, Guillermo A. Alvarez, Prashant Pandey, Divyesh Jadav, Jian Xu, Ram Menon, and Tzongyu P. Lee. Performance virtualization for large-scale storage systems. In 22nd International Symposium on Reliable Distributed Systems (SRDS '03), October 2003.
  • Pawan Goyal, Harrick M. Vin, and Haichen Chen. Start-time fair queuing: A scheduling algorithm for integrated services packet switching networks. IEEE/ACM Transactions on Networking, 5(5):690–704, October 1997.
  • B. N. Chun and D. E. Culler. User-centric performance analysis of market-based cluster batch schedulers. In 2nd IEEE International Symposium on Cluster Computing and the Grid, May 2002.
  • Duke Internet Storage and Systems Group – ISSG –