Fair share scheduling

Fair Share Scheduling

Ethan Bolker

Mathematics & Computer Science UMass Boston

[email protected]

www.cs.umb.edu/~eb

Queen’s University

March 23, 2001


Acknowledgements

  • Yiping Ding

  • Jeff Buzen

  • Dan Keefe

  • Oliver Chen

  • Chris Thornley

  • Aaron Ball

  • Tom Larard

  • Anatoliy Rikun

  • Liying Song

References

  • www.bmc.com/patrol/fairshare

  • www.cs.umb.edu/~eb/goalmode


Coming Attractions

  • Queueing theory primer

  • Fair share semantics

  • Priority scheduling; conservation laws

  • Predicting response times from shares

    • analytic formula

    • experimental validation

    • applet simulation

  • Implementation geometry


Transaction Workload

  • Stream of jobs visiting a server (ATM, time shared CPU, printer, …)

  • Jobs queue when server is busy

  • Input:

    • Arrival rate: λ jobs/sec

    • Service demand: s sec/job

  • Performance metrics:

    • server utilization: u = s (must be  1)

    • response time: r = ??? sec/job (average)

    • degradation: d = r/s


Response time computations

  • r, d measure queueing delay

    r  s (d  1), unless parallel processing possible

  • Randomness really matters

    r = s (d = 1) if arrivals scheduled (best case, no waiting)

    r >> s for bulk arrivals (worst case, maximum delays)

  • Theorem. If arrivals are Poisson and service is exponentially distributed (M/M/1) then

    d = 1/(1-u) and r = s/(1-u)

  • Think: virtual server with speed 1-u (numeric check below)
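
A quick numeric check of the theorem, as a sketch in Python (function name ours, not from the talk):

    # M/M/1 sanity check: u = lambda * s, d = 1/(1-u), r = s/(1-u)
    def mm1_metrics(arrival_rate, service_demand):
        u = arrival_rate * service_demand   # utilization, must be < 1
        assert u < 1, "server saturated"
        d = 1 / (1 - u)                     # degradation
        r = service_demand / (1 - u)        # average response time (sec/job)
        return u, d, r

    # 0.95 jobs/sec at 1 sec/job: u = 0.95, d = 20, r = 20 sec
    print(mm1_metrics(0.95, 1.0))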


M/M/1

  • Essential nonlinearity often counterintuitive

    • at u = 95% average degradation is 1/(1-0.95) = 20,

    • but 1 customer in 20 has no wait at all (5% idle time)

  • A useful guide even when hypotheses fail

    • accurate enough (±30%) for real computer systems

    • d depends only on u: many small jobs have same impact as few large jobs

    • faster system → smaller s → smaller u → smaller r = s/(1-u): double win, less service and less wait

    • waiting costly, server cheap (telephones): want u → 0

    • server costly (doctors): want u → 1, but scheduled


Scheduling for Performance

  • Customers want good response times

  • Decreasing u is expensive

  • High end Unix offerings from HP, IBM, Sun offer fair share scheduling packages that allow an administrator to allocate scarce resources (CPU, processes, bandwidth) among workloads

  • How do these packages behave?

  • Model as a black box, independent of internals

  • Limit study to CPU shares on a uniprocessor


Multiple Job Streams

  • Multiple workloads, utilizations u1, u2, …

  • U = Σi ui < 1

  • If there is no workload prioritization then all degradations are equal: di = 1/(1-U)

  • Share allocations are de facto prioritizations

  • Study degradation vector V = (d1, d2, …)


Share Semantics

  • Suppose workload w has CPU share fw

  • Normalize shares so that Σw fw = 1

  • w gets fraction fw of CPU time slices when at least one of its jobs is ready for service

  • Can it use more if competing workloads idle?

    No: think share = cap

    Yes : think share = guarantee


Shares As Caps

  • Good for accounting (sell fraction of web server)

  • Available now from IBM, HP, soon from Sun

  • Straightforward (boring) - workloads are isolated

  • Each runs on a virtual processor with speed *= f

  • utilization: u on a dedicated system becomes u/f under share f (need f > u!)

  • response time: r on a dedicated system becomes r(1-u)/(f-u) under share f, as sketched below
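
A minimal sketch of the caps arithmetic above (names ours):

    # A workload capped at share f runs on a virtual processor of speed f:
    # s -> s/f and u -> u/f, so r = s/(1-u) becomes s/(f-u) = r*(1-u)/(f-u).
    def capped_response(r_dedicated, u, f):
        assert f > u, "need f > u, or the capped workload saturates"
        return r_dedicated * (1 - u) / (f - u)

    # Example: r = 2 sec at u = 0.5 on a dedicated box; capped at f = 0.8 -> ~3.33 sec
    print(capped_response(2.0, 0.5, 0.8))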


Shares As Guarantees

  • Good for performance + economy (use otherwise idle resources)

  • Shares make a difference only when there are multiple workloads

  • Large share resembles high priority: share may be less than utilization

  • Workload interaction is subtle, often unintuitive, hard to explain


Modeling

[Diagram: the OS, running complex scheduling software, measures workload response times and reports them against frequently updated performance goals; the Model answers queries with analytic algorithms and fast computation.]


Modeling

  • Real system

    • Complex, dynamic, frequent state changes

    • Hard to tease out cause and effect

  • Model

    • Static snapshot, deals in averages and probabilities

    • Fast enlightening answers to “what if ” questions

  • Abstraction helps you understand real system

  • Start with a study of priority scheduling


Priority Scheduling

  • Priority state: order workloads by priority (ties OK)

    • two workloads, 3 states: 12, 21, [12]

    • three workloads, 13 states:

      • 123 (6 = 3! of these ordered states),

      • [12]3 (3 of these),

      • 1[23] (3 of these),

      • [123] (1 state with no priorities)

    • n wkls, f(n) states, n! ordered (simplex lock combos; see the counting sketch after this list)

  • p(s) = prob( state = s ) = fraction of time in state s

  • V(s) = degradation vector when state = s (measure this, or compute it using queueing theory)

  • V = s p(s)V(s) (time avg is convex combination)

  • Achievable region is convex hull of vectors V(s)
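
As an aside, the state count f(n) is the ordered Bell (Fubini) number; a small sketch (our code, not from the talk) reproduces the counts in the bullets above:

    from functools import lru_cache
    from math import comb, factorial

    @lru_cache(maxsize=None)
    def fubini(n):
        """Number of priority states of n workloads, ties allowed."""
        if n == 0:
            return 1
        # pick the k workloads tied at top priority, then order the rest
        return sum(comb(n, k) * fubini(n - k) for k in range(1, n + 1))

    print(fubini(2), fubini(3), factorial(3))   # 3, 13, 6: matches the counts above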


Two workloads

[Plot: degradation vectors in the (d1, d2) plane, with the line d1 = d2 shown; the vertices V(12) (wkl 1 high prio) and V(21) bound the achievable region, with V([12]) (no priorities) between them.]


Two workloads

[Plot: the same picture, noting that 0.5 V(12) + 0.5 V(21) ≠ V([12]) in general.]


Two workloads

[Plot: the same picture, noting that u1 < u2 makes workload 2's effect on workload 1 large.]


Conservation

  • No Free Lunch Theorem. Weighted average degradation is constant, independent of priority scheduling scheme:

    Σi (ui/U) di = 1/(1-U) (checked numerically after this list)

  • Provable from some hypotheses

  • Observable in some real systems

  • Sometimes false: shortest job first minimizes average response time (printer queues, supermarket express checkout lines)
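
A numeric check for two workloads (ours, not from the slides), using the preemptive-priority M/M/1 degradations quoted later in the talk: the high-priority workload sees d = 1/(1-u_high), the low-priority one d = 1/((1-u_high)(1-U)):

    # Weighted average degradation is the same for both priority orders.
    def weighted_degradation(u_high, u_low):
        U = u_high + u_low
        d_high = 1 / (1 - u_high)               # high-priority wkl sees only itself
        d_low = 1 / ((1 - u_high) * (1 - U))    # low-priority wkl sees both
        return (u_high / U) * d_high + (u_low / U) * d_low

    # Both orders give 1/(1-U) = 3.333... for u1 = 0.3, u2 = 0.4
    print(weighted_degradation(0.3, 0.4), weighted_degradation(0.4, 0.3), 1 / (1 - 0.7))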


Conservation

  • For any proper set A of workloads

    Imagine giving those workloads top priority.

    Then can pretend other wkls don’t exist. In that case

    i  A (ui /U(A)) di= 1/(1-U(A))

    When wkls in A have lower priorities they have

    higher degradations, so in general

    i  A (ui /U(A)) di 1/(1-U(A))

  • These 2^n - 2 linear inequalities determine the convex achievable region R

  • R is a permutahedron: only n! vertices


Two Workloads

[Plot: axes d1 (workload 1 degradation) and d2 (workload 2 degradation); the conservation law says (d1, d2) lies on the line u1 d1 + u2 d2 = 1/(1-U).]


Two Workloads

[Plot: the constraint resulting from workload 1 is the half-plane d1 ≥ 1/(1-u1).]


Two Workloads

[Plot: with workload 1 at high priority, the system sits at the vertex V(1,2) = (1/(1-u1), 1/((1-u1)(1-U))), where the conservation line meets the constraint d1 ≥ 1/(1-u1).]


Two Workloads

[Plot: symmetrically, the constraint d2 ≥ 1/(1-u2) picks out the vertex V(2,1) on the line u1 d1 + u2 d2 = 1/(1-U).]


Two Workloads

[Plot: the achievable region R is the segment of the line u1 d1 + u2 d2 = 1/(1-U) between V(1,2) = (1/(1-u1), 1/((1-u1)(1-U))) and V(2,1), cut out by the constraints d1 ≥ 1/(1-u1) and d2 ≥ 1/(1-u2).]


Three Workloads

  • Degradation vector (d1, d2, d3) lies on the plane u1 d1 + u2 d2 + u3 d3 = C

  • We know a constraint for each workload w: uw dw ≥ Cw

  • Conservation applies to each pair of wkls as well: u1 d1 + u2 d2 ≥ C12

  • Achievable region has one vertex for each priority ordering of workloads: 3! = 6 in all

  • Hence its name: the permutahedron


Three Workload Permutahedron

[Figure: the permutahedron in (d1, d2, d3) space, lying in the plane u1 d1 + u2 d2 + u3 d3 = C, with 3! = 6 vertices (priority orders) and 2^3 - 2 = 6 edges (conservation constraints); the vertices V(1,2,3) and V(2,1,3) are labeled.]


Experimental evidence


Four workload permutahedron

4! = 24 vertices (ordered states)

2^4 - 2 = 14 facets (proper subsets)

(conservation constraints)

74 faces (states)

Simplicial geometry and transportation polytopes,

Trans. Amer. Math. Soc. 217 (1976) 138.


Map shares to degradations - two workloads -

  • Suppose f1, f2 > 0 and f1 + f2 = 1

  • Model: System operates in state

    • 12 with probability f1

    • 21 with probability f2

      (independent of who is on queue)

  • Average degradation vector:

    V = f1 V(12) + f2 V(21)


Predict Degradations From Shares (Two Workloads)

  • Reasonable modeling assumption: f1 = 1, f2 = 0 means workload 1 runs at high priority

  • For arbitrary shares: workload priority order is

    (1,2) with probability f1

    (2,1) with probability f2 (probability = fraction of time)

  • Compute average workload degradation: d1 = f1 × (wkl 1 degradation at high priority) + f2 × (wkl 1 degradation at low priority), as sketched below
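
A sketch of that computation, assuming the V(1,2) and V(2,1) formulas from the priority slides (names ours):

    # Average degradation vector: V = f1 * V(12) + f2 * V(21)
    def predicted_degradations(u1, u2, f1):
        U, f2 = u1 + u2, 1 - f1
        V12 = (1 / (1 - u1), 1 / ((1 - u1) * (1 - U)))   # wkl 1 high priority
        V21 = (1 / ((1 - u2) * (1 - U)), 1 / (1 - u2))   # wkl 2 high priority
        return tuple(f1 * hi + f2 * lo for hi, lo in zip(V12, V21))

    # Example: u1 = 0.3, u2 = 0.4, workload 1 holds share 0.75
    print(predicted_degradations(0.3, 0.4, 0.75))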



Model validation

[Charts comparing predicted and measured values; images not preserved in the transcript.]


Map shares to degradations - three (n) workloads -

prob(123) = f1 f2 f3 / [(f1 + f2 + f3)(f2 + f3)(f3)]

  • Theorem: These n! probabilities sum to 1

    • interesting identity generalizing adding fractions

    • prove by induction, or by coupon collecting

  • V = ordered states s prob(s) V(s)

  • O(n!), (n!), good enough for n  9 (12)


Model validation

[Further validation charts; images not preserved in the transcript.]


The Fair Share Applet

  • Screen captures on next slides are from www.bmc.com/patrol/fairshare

  • Experiment with “what if” fair share modeling

  • Watch a simulation

  • Random virtual job generator for the simulation is the same one used to generate random real jobs for our benchmark studies


Three Transaction Workloads

[Screenshot: the applet with three workloads labeled 1, 2, 3; their per-workload response times are still shown as ???.]

  • Three workloads, each with utilization 0.32 jobs/second × 1.0 seconds/job = 0.32 = 32%

  • CPU 96% busy, so average (conserved) response time is 1.0/(1 - 0.96) = 25 seconds (arithmetic sketched below)

  • Individual workload average response times depend on shares
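
The arithmetic behind those numbers, as a quick check:

    lam, s = 0.32, 1.0           # jobs/sec and sec/job for each workload
    u = lam * s                  # 0.32 = 32% per workload
    U = 3 * u                    # 0.96: CPU 96% busy
    print(u, U, s / (1 - U))     # conserved average response time: 25 sec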


Three Transaction Workloads

[Screenshot: normalized workload shares: workload 1 has 32.0, workload 2 has 48.0, workload 3 has 20.0; workloads 1 and 2 together sum to 80.0.]

  • Normalized f3 = 0.20 means 20% of the time workload 3 (development) would be dispatched at highest priority

  • During that time, workload priority order is (3,1,2) for 32/80 of the time, (3,2,1) for 48/80

  • Probability(priority order is 312) = 0.20 × (32/80) = 0.08


Three Transaction Workloads

  • Formulas on previous slide

  • Average predicted response time, weighted by throughput, is 25 seconds (as expected)

  • Hard to understand intuitively

  • Software helps


Three Transaction Workloads

[Screenshot: the applet with updated utilizations; note the change from 32%.]

Simulation

[Screenshot: the simulation view, showing the jobs currently on the run queue.]


When the Model Fails

  • Real CPU uses round robin scheduling to deliver time slices

  • Short jobs never wait for long jobs to complete

  • That resembles shortest job first, so response time conservation law fails

  • At high utilization, simulation shows smaller response times than predicted by model

  • Response time conservation law yields conservative predictions


Scaling Degradation Predictions

  • V = ordered states s prob(s) V(s)

  • Each s is a permutation of (1,2, … , n)

  • Think of it as a vector in n-space

  • Those n! vectors lie on a sphere

  • For n large they are pretty densely packed

  • Think of prob(s) as a discrete approximation to a probability distribution on the sphere

  • V is an integral


Monte Carlo

  • loop sampleSize times

    choose a permutation s at random from the distribution determined by the shares

    compute the degradation vector V(s)

    accumulate V += V(s); divide by sampleSize at the end (s is already drawn with probability prob(s), so no extra weighting is needed; a runnable version follows)

  • sampleSize = 40000 works well independent of n!
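
A minimal runnable sketch of this loop in Python (our names; V_of stands for whatever routine computes a priority order's degradation vector, which the talk measures or computes from queueing theory):

    import random

    def sample_order(f):
        """Draw one priority order: repeatedly pick a workload with prob ~ its share."""
        pool, order = dict(f), []
        while pool:
            w = random.choices(list(pool), weights=list(pool.values()))[0]
            order.append(w)
            del pool[w]
        return tuple(order)

    def monte_carlo_V(f, V_of, sample_size=40000):
        total = None
        for _ in range(sample_size):
            v = V_of(sample_order(f))                      # degradation vector V(s)
            total = list(v) if total is None else [t + x for t, x in zip(total, v)]
        return [t / sample_size for t in total]            # plain average over samples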


Map shares to degradations (geometry)

  • Interpret shares as barycentric coordinates in the (n-1)-simplex

  • Study the geometry of the map from the simplex to the (n-1)-dimensional permutahedron

  • Easy when n=2: each is a line segment and map is linear


Mapping a triangle to a hexagon

[Figure: the map M from the share triangle (sides labeled f1 = 0, f1 = 1, f3 = 0, f3 = 1) onto the hexagonal permutahedron with vertices 123, 132, 312, 321, 213, 231; vertices 123 and 132 have workload 1 at high priority, 231 and 321 at low priority.]


Mapping a triangle to a hexagon

[Figure: the same map, highlighting the triangle edge from f1 = 0 to f1 = 1 and the region labeled {23}.]




What This Means

  • Add a strong statement that summarizes how you feel or think about this topic

  • Summarize key points you want your audience to remember

