Fair share scheduling

Fair Share Scheduling

Ethan Bolker

Mathematics & Computer Science UMass Boston

[email protected]

www.cs.umb.edu/~eb

Queen’s University

March 23, 2001


Acknowledgements

  • Yiping Ding

  • Jeff Buzen

  • Dan Keefe

  • Oliver Chen

  • Chris Thornley

  • Aaron Ball

  • Tom Larard

  • Anatoliy Rikun

  • Liying Song

References

  • www.bmc.com/patrol/fairshare

  • www.cs.umb.edu/~eb/goalmode


Coming Attractions

  • Queueing theory primer

  • Fair share semantics

  • Priority scheduling; conservation laws

  • Predicting response times from shares

    • analytic formula

    • experimental validation

    • applet simulation

  • Implementation geometry


Transaction Workload

  • Stream of jobs visiting a server (ATM, time shared CPU, printer, …)

  • Jobs queue when server is busy

  • Input:

    • Arrival rate: λ jobs/sec

    • Service demand: s sec/job

  • Performance metrics:

    • server utilization: u = λs (must be ≤ 1)

    • response time: r = ??? sec/job (average)

    • degradation: d = r/s


Response time computations

  • r, d measure queueing delay

    r ≥ s (d ≥ 1), unless parallel processing possible

  • Randomness really matters

    r = s (d = 1) if arrivals scheduled (best case, no waiting)

    r >> s for bulk arrivals (worst case, maximum delays)

  • Theorem. If arrivals are Poisson and service is exponentially distributed (M/M/1) then

    d = 1/(1-u),  r = s/(1-u)

  • Think: virtual server with speed 1-u
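
A minimal sketch of the M/M/1 formulas above (Python; the arrival rate and service demand are made-up illustrative values):

    # M/M/1: u = lambda*s, d = 1/(1-u), r = s/(1-u)
    def mm1_metrics(arrival_rate, service_demand):
        """Return (utilization, degradation, response time) for one M/M/1 workload."""
        u = arrival_rate * service_demand      # server utilization, must be < 1
        d = 1.0 / (1.0 - u)                    # degradation d = r/s
        r = service_demand / (1.0 - u)         # average response time, sec/job
        return u, d, r

    print(mm1_metrics(0.95, 1.0))   # (0.95, 20.0, 20.0) -- the 95% example on the next slide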


M/M/1

  • Essential nonlinearity often counterintuitive

    • at u = 95% average degradation is 1/(1-0.95) = 20,

    • but 1 customer in 20 has no wait at all (5% idle time)

  • A useful guide even when hypotheses fail

    • accurate enough (± 30%) for real computer systems

    • d depends only on u: many small jobs have same impact as few large jobs

    • faster system ⇒ smaller s ⇒ smaller u; since r = s/(1-u), a double win: less service, less wait

    • waiting costly, server cheap (telephones): want u ≈ 0

    • server costly (doctors): want u ≈ 1 but scheduled


Scheduling for Performance

  • Customers want good response times

  • Decreasing u is expensive

  • High end Unix offerings from HP, IBM, Sun offer fair share scheduling packages that allow an administrator to allocate scarce resources (CPU, processes, bandwidth) among workloads

  • How do these packages behave?

  • Model as a black box, independent of internals

  • Limit study to CPU shares on a uniprocessor


Multiple Job Streams

  • Multiple workloads, utilizations u1, u2, …

  • U = Σ ui < 1

  • If no workload prioritization, then all degradations are equal: di = 1/(1-U)

  • Share allocations are de facto prioritizations

  • Study degradation vector V = (d1, d2, …)


Share Semantics

  • Suppose workload w has CPU share fw

  • Normalize shares so that Σw fw = 1

  • w gets fraction fw of CPU time slices when at least one of its jobs is ready for service

  • Can it use more if competing workloads idle?

    No: think share = cap

    Yes: think share = guarantee


Shares As Caps

  • Good for accounting (sell fraction of web server)

  • Available now from IBM, HP, soon from Sun

  • Straightforward (boring) - workloads are isolated

  • Each runs on a virtual processor with speed multiplied by f

                    dedicated system     share f
    utilization     u                    u/f   (need f > u !)
    response time   r                    r(1-u)/(f-u)
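
A small sketch of the cap arithmetic in the table above (assumes the virtual-processor model of this slide; the numbers are illustrative):

    # Shares as caps: the workload sees a processor of speed f, so its service
    # demand becomes s/f, its utilization u/f, and its M/M/1 response time
    # (s/f)/(1 - u/f) = r*(1-u)/(f-u), defined only when f > u.
    def capped_response_time(s, u, f):
        if f <= u:
            raise ValueError("cap f must exceed utilization u")
        return (s / f) / (1.0 - u / f)

    print(capped_response_time(s=1.0, u=0.3, f=0.5))   # 5.0 sec/job, vs. ~1.43 dedicated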


Shares As Guarantees

  • Good for performance + economy (use otherwise idle resources)

  • Shares make a difference only when there are multiple workloads

  • Large share resembles high priority: share may be less than utilization

  • Workload interaction is subtle, often unintuitive, hard to explain


Modeling

[Diagram: the OS (complex scheduling software) frequently measures and reports workload response times; performance goals and workload data update the Model (analytic algorithms), which answers queries with fast computation.]


Modeling

  • Real system

    • Complex, dynamic, frequent state changes

    • Hard to tease out cause and effect

  • Model

    • Static snapshot, deals in averages and probabilities

    • Fast enlightening answers to “what if ” questions

  • Abstraction helps you understand real system

  • Start with a study of priority scheduling


Priority Scheduling

  • Priority state: order workloads by priority (ties OK)

    • two workloads, 3 states: 12, 21, [12]

    • three workloads, 13 states:

      • 123 (6 = 3! of these ordered states),

      • [12]3 (3 of these),

      • 1[23] (3 of these),

      • [123] (1 state with no priorities)

    • n wkls, f(n) states, n! ordered (simplex lock combos)

  • p(s) = prob( state = s ) = fraction of time in state s

  • V(s) = degradation vector when state = s (measure this, or compute it using queueing theory)

  • V = Σs p(s) V(s) (time avg is convex combination)

  • Achievable region is convex hull of vectors V(s)
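
The slide above counts priority states: 3 for two workloads, 13 for three. A quick sketch of that count via the recurrence "pick the top tier of tied workloads, then order the rest" (a hypothetical helper, not from the talk):

    from math import comb
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def num_priority_states(n):
        """Number of priority states of n workloads, ties allowed."""
        if n == 0:
            return 1
        # choose the k workloads that tie at the top, then arrange the remaining n-k
        return sum(comb(n, k) * num_priority_states(n - k) for k in range(1, n + 1))

    print([num_priority_states(n) for n in (2, 3)])   # [3, 13], matching the slide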


Two workloads

[Figure: the achievable region in the (d1, d2) plane — the segment joining V(12) (wkl 1 high prio) and V(21), with the no-priority point V([12]) on the line d1 = d2.]


Two workloads

[Figure: same picture; the equal-time mixture 0.5 V(12) + 0.5 V(21) ≠ V([12]), the no-priority point on the line d1 = d2.]


Two workloads

[Figure: same picture; note: u1 < u2 ⇒ wkl 2 effect on wkl 1 large.]


Conservation

  • No Free Lunch Theorem. Weighted average degradation is constant, independent of priority scheduling scheme:

    Σi (ui/U) di = 1/(1-U)

  • Provable from some hypotheses

  • Observable in some real systems

  • Sometimes false: shortest job first minimizes average response time (printer queues, supermarket express checkout lines)
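
A numerical check of the conservation law for two M/M/1 workloads, using the priority degradation vectors V(1,2) and V(2,1) that appear on the slides below (the utilizations are made up):

    # No Free Lunch: sum_i (u_i/U) d_i = 1/(1-U) for every priority ordering.
    u1, u2 = 0.3, 0.4
    U = u1 + u2
    V12 = (1/(1 - u1), 1/((1 - u1)*(1 - U)))   # workload 1 at high priority
    V21 = (1/((1 - u2)*(1 - U)), 1/(1 - u2))   # workload 2 at high priority
    for d1, d2 in (V12, V21):
        print((u1/U)*d1 + (u2/U)*d2, 1/(1 - U))   # both lines print the same value, ~3.333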


Conservation

  • For any proper set A of workloads

    Imagine giving those workloads top priority.

    Then can pretend other wkls don’t exist. In that case

    Σi∈A (ui/U(A)) di = 1/(1-U(A))

    When wkls in A have lower priorities they have

    higher degradations, so in general

    Σi∈A (ui/U(A)) di ≥ 1/(1-U(A))

  • These 2^n - 2 linear inequalities determine the convex achievable region R

  • R is a permutahedron: only n! vertices


Two Workloads

[Figure: conservation law — (d1, d2) lies on the line u1 d1 + u2 d2 = 1/(1-U); axes are d1 (workload 1 degradation) and d2 (workload 2 degradation).]


Two Workloads

[Figure: the constraint resulting from workload 1 — d1 ≥ 1/(1-u1).]


Two Workloads

[Figure: workload 1 at high priority gives the vertex V(1,2) = (1/(1-u1), 1/((1-u1)(1-U))), which lies on the constraint boundary d1 = 1/(1-u1).]


Two Workloads

[Figure: the conservation line u1 d1 + u2 d2 = 1/(1-U), the constraint d2 ≥ 1/(1-u2), and the vertex V(2,1).]


Two Workloads

[Figure: the achievable region R — the segment of the conservation line u1 d1 + u2 d2 = 1/(1-U) between V(1,2) = (1/(1-u1), 1/((1-u1)(1-U))) and V(2,1), bounded by d1 ≥ 1/(1-u1) and d2 ≥ 1/(1-u2).]


Three Workloads

  • Degradation vector (d1, d2, d3) lies on the plane u1 d1 + u2 d2 + u3 d3 = C

  • We know a constraint for each workload w: uw dw ≥ Cw

  • Conservation applies to each pair of wkls as well: u1 d1 + u2 d2 ≥ C12

  • Achievable region has one vertex for each priority ordering of workloads: 3! = 6 in all

  • Hence its name: the permutahedron


Three Workload Permutahedron

3! = 6 vertices (priority orders)

2^3 - 2 = 6 edges (conservation constraints)

u1 d1 + u2 d2 + u3 d3 = C

[Figure: the hexagonal permutahedron in (d1, d2, d3) space, with vertices V(1,2,3) and V(2,1,3) labeled.]


Experimental evidence


Four workload permutahedron

4! = 24 vertices (ordered states)

2^4 - 2 = 14 facets (proper subsets)

(conservation constraints)

74 faces (states)

Simplicial geometry and transportation polytopes,

Trans. Amer. Math. Soc. 217 (1976) 138.


Map shares to degradations (two workloads)

  • Suppose f1, f2 > 0 and f1 + f2 = 1

  • Model: System operates in state

    • 12 with probability f1

    • 21 with probability f2

      (independent of who is on queue)

  • Average degradation vector:

    V = f1 V(12) + f2 V(21)


Predict Degradations From Shares (Two Workloads)

  • Reasonable modeling assumption: f1 = 1, f2 = 0 means workload 1 runs at high priority

  • For arbitrary shares: workload priority order is

    (1,2) with probability f1

    (2,1) with probability f2 (probability = fraction of time)

  • Compute average workload degradation: d1 = f1 · (wkl 1 degradation at high priority) + f2 · (wkl 1 degradation at low priority)
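
A sketch of this prediction in Python, plugging in the vertex degradation vectors from the earlier two-workload slides; the utilizations and shares are illustrative:

    def predict_degradations(u1, u2, f1, f2):
        """d = f1*V(1,2) + f2*V(2,1): time-average over the two priority orders."""
        U = u1 + u2
        V12 = (1/(1 - u1), 1/((1 - u1)*(1 - U)))   # workload 1 at high priority
        V21 = (1/((1 - u2)*(1 - U)), 1/(1 - u2))   # workload 2 at high priority
        d1 = f1*V12[0] + f2*V21[0]
        d2 = f1*V12[1] + f2*V21[1]
        return d1, d2

    print(predict_degradations(u1=0.3, u2=0.4, f1=0.75, f2=0.25))

Setting f1 = 1 reproduces the high-priority vertex V(1,2), matching the modeling assumption in the first bullet.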



Model validation

[Figures: model validation results.]


Map shares to degradations (three (n) workloads)

    prob(123) = f1 f2 f3 / [ (f1 + f2 + f3) (f2 + f3) (f3) ]

  • Theorem: These n! probabilities sum to 1

    • interesting identity generalizing adding fractions

    • prove by induction, or by coupon collecting

  • V = ordered states s prob(s) V(s)

  • O(n!), Θ(n!): good enough for n ≤ 9 (12)
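
A brute-force sketch of this map for small n, enumerating all n! ordered states (the share values are illustrative):

    from itertools import permutations

    def state_probabilities(shares):
        """prob(order): at each step, pick the next workload with probability
        share / (sum of shares of workloads not yet placed)."""
        probs = {}
        for order in permutations(range(len(shares))):
            p, remaining = 1.0, sum(shares)
            for w in order:
                p *= shares[w] / remaining
                remaining -= shares[w]
            probs[order] = p
        return probs

    probs = state_probabilities([0.5, 0.3, 0.2])
    print(sum(probs.values()))   # 1.0 (up to float rounding) -- the n! probabilities sum to 1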


Model validation

[Figures: model validation results.]


The Fair Share Applet

  • Screen captures on next slides are from www.bmc.com/patrol/fairshare

  • Experiment with “what if” fair share modeling

  • Watch a simulation

  • Random virtual job generator for the simulation is the same one used to generate random real jobs for our benchmark studies


Three Transaction Workloads

[Applet screenshot: workloads 1, 2, 3 with response times shown as "???".]

  • Three workloads, each with utilization 0.32 jobs/second × 1.0 seconds/job = 0.32 = 32%

  • CPU 96% busy, so average (conserved) response time is 1.0/(1-0.96) = 25 seconds

  • Individual workload average response times depend on shares


Three Transaction Workloads

[Applet screenshot: shares 32.0, 48.0, and 20.0 for workloads 1, 2, and 3; sum 80.0 shown.]

  • Normalized f3 = 0.20 means 20% of the time workload 3 (development) would be dispatched at highest priority

  • During that time, workload priority order is (3,1,2) for 32/80 of the time, (3,2,1) for 48/80

  • Probability( priority order is 3,1,2 ) = 0.20 × (32/80) = 0.08
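
The same probability, computed directly from the share-to-ordering formula given earlier (a one-line check):

    f1, f2, f3 = 0.32, 0.48, 0.20              # normalized shares
    print(f3/(f1 + f2 + f3) * f1/(f1 + f2))    # prob(3,1,2) = 0.20 * 32/80 = 0.08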


Three Transaction Workloads

  • Formulas on previous slide

  • Average predicted response time, weighted by throughput, is 25 seconds (as expected)

  • Hard to understand intuitively

  • Software helps


Three Transaction Workloads

[Applet screenshot — note the change from 32%.]


Simulation

[Applet screenshot of the simulation, showing the jobs currently on the run queue.]


When the Model Fails

  • Real CPU uses round robin scheduling to deliver time slices

  • Short jobs never wait for long jobs to complete

  • That resembles shortest job first, so response time conservation law fails

  • At high utilization, simulation shows smaller response times than predicted by model

  • Response time conservation law yields conservative predictions


Scaling Degradation Predictions

  • V = ordered states s prob(s) V(s)

  • Each s is a permutation of (1,2, … , n)

  • Think of it as a vector in n-space

  • Those n! vectors lie on a sphere

  • For n large they are pretty densely packed

  • Think of prob(s) as a discrete approximation to a probability distribution on the sphere

  • V is an integral


Monte Carlo

  • loop sampleSize times

    choose a permutation s at random from the distribution determined by the shares

    compute degradation vector V(s)

    accumulate V += V(s); after the loop, V /= sampleSize (each sampled s already occurs with probability prob(s))

  • sampleSize = 40000 works well independent of n!
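
A sketch of this loop in Python. The per-state degradation model (degradation_vector below) is a caller-supplied stand-in, not something defined in the talk; since each order s is drawn with probability prob(s), the estimate is the plain sample average of V(s):

    import random

    def sample_order(shares):
        """Draw a priority order: repeatedly pick one of the remaining workloads
        with probability proportional to its share."""
        remaining = list(range(len(shares)))
        order = []
        while remaining:
            w = random.choices(remaining, weights=[shares[i] for i in remaining])[0]
            order.append(w)
            remaining.remove(w)
        return tuple(order)

    def monte_carlo_V(shares, degradation_vector, sample_size=40000):
        """Estimate V = sum_s prob(s) V(s) by averaging V(s) over sampled orders s."""
        totals = [0.0] * len(shares)
        for _ in range(sample_size):
            for i, v in enumerate(degradation_vector(sample_order(shares))):
                totals[i] += v
        return [t / sample_size for t in totals]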


Map shares to degradations (geometry)

  • Interpret shares as barycentric coordinates in the (n-1)-simplex

  • Study the geometry of the map from the simplex to the (n-1)-dimensional permutahedron

  • Easy when n=2: each is a line segment and map is linear


Mapping a triangle to a hexagon

[Figure: the share triangle (vertices f1 = 1, f3 = 1; edges f1 = 0, f3 = 0) maps under M to the hexagon with vertices 123, 132, 312, 321, 213, 231; the sides where wkl 1 has high priority and where it has low priority are marked.]


Mapping a triangle to a hexagon

[Figure: the images of f1 = 0 and f1 = 1, and the edge labeled {23}.]


Mapping a triangle to a hexagon


What This Means

  • Add a strong statement that summarizes how you feel or think about this topic

  • Summarize key points you want your audience to remember

