
Scheduling for Performance



Presentation Transcript


  1. Scheduling for Performance UMass-Boston Ethan Bolker April 21, 1999

  2. Acknowledgements • Joint work with Jeff Buzen (BMC Software) • BMC Software • Dan Keefe • Yefim Somin • Chen Xiaogang (oliver@cs.umb.edu)

  3. Outline • Impossibly much to cover • Performance metrics for workloads • Beyond priorities • Modeling • Degradation as a performance metric • Conservation laws and the permutahedron • Specifying response times (IBM goal mode) • Specifying CPU shares (Sun Fair Share) • Priority distributions • Work in progress

  4. Workload Performance Metrics • Transaction (open) workload: jobs arrive at random from an external source • web or database server, eris with many interactive users • inputs: job arrival rate (throughput), service time • performance metric: response time • Batch (closed) workload: jobs always waiting (latent demand) • weather prediction, data mining • input: job service time • performance metrics: response time, throughput

  5. Beyond priorities • User wants performance assurance response time (open wkls), throughput (closed wkls) • Single workload: performance depends on resources available (CPU, IO, network) • Multiple workloads: prioritize resource access • Nice isn’t nice - hard to predict performance from priorities • Better: set performance goals, system tunes itself • Examples: IBM Goal Mode, Sun Fair Share, Eclipse, SMART, ...

  6. Tuning by Tinkering [Figure: feedback loop — the administrator adjusts priority assignments in response to observed workload performance (response time)]

  7. Scheduling software [Figure: feedback loop — the administrator sets performance goals (which rarely change); the scheduling software measures workload performance (response time) frequently and adjusts priority assignments itself]

  8. Modeling • System is dynamic, state changes frequently • Model is a static snapshot, deals in averages and probabilities • Can ask “what if?” inexpensively • Modeler’s measure of performance: degradation = (elapsed time)/(service time) • deg ≥ 1, deg = 1 when no contention (deg < 1 if parallel computation possible) • deg = n for n closed workloads (no priorities)

  9. Modeling One Open Workload • arrival rate λ (job/sec) (Poisson) • service time s (sec/job) (exponential dist’n) • utilization u = λs, 0 ≤ u < 1 • Theorem: deg = 1/(1 − u) • Often a useful guide even when hypotheses fail • depends only on u: many small jobs == few large jobs • faster system ⇒ smaller s ⇒ smaller u ⇒ smaller deg • want u small when waiting is costly (telephones) • want u near 1 when system is costly (supercomputers)
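The theorem deg = 1/(1 − u) is simple enough to sketch directly; a minimal Python sketch, with made-up arrival rates and service times for illustration:

```python
# Degradation of one open workload: deg = 1/(1 - u), where u = lambda * s.
# Minimal sketch; the rates below are illustrative, not values from the talk.

def degradation(arrival_rate: float, service_time: float) -> float:
    """Mean degradation (elapsed time / service time) of an open workload."""
    u = arrival_rate * service_time   # utilization, must satisfy 0 <= u < 1
    if not 0.0 <= u < 1.0:
        raise ValueError("saturated: utilization must satisfy 0 <= u < 1")
    return 1.0 / (1.0 - u)

# "many small jobs == few large jobs": deg depends only on u = lambda * s
print(degradation(1.0, 0.5))    # u = 0.5 -> deg = 2.0
print(degradation(10.0, 0.05))  # also u = 0.5 -> same degradation
```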

  10. Multiple (open) workloads • Priority state: order workloads by priority (ties OK) • two workloads, 3 states: 12, 21, [12] • three workloads, 13 states: 123 (3! = 6 ordered states), [12]3 (3 of these), 1[23] (3 of these), [123] • n wkls, f(n) states (simplex lock combos), n! ordered • At each time instant, system runs in some state s, V(s) = vector of workload degradations • Measure or model V(s) (operational analysis) • p(s) = prob( state = s ) = fraction of time in state s • V = Σs p(s)V(s) (time average, convex combination)
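The state counts 3 and 13 (the "simplex lock" counts) are the ordered Bell numbers; a short sketch computing f(n) by choosing the top-priority tier first:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_priority_states(n: int) -> int:
    """f(n) = number of priority states of n workloads: ordered set
    partitions, a.k.a. ordered Bell / 'simplex lock' numbers."""
    if n == 0:
        return 1
    # choose the k workloads that tie for top priority, then the rest recurse
    return sum(comb(n, k) * num_priority_states(n - k) for k in range(1, n + 1))

print([num_priority_states(n) for n in (2, 3, 4)])  # [3, 13, 75]
```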

  11. Two workloads (general case) [Figure: achievable region in the (wkl 1 degradation, wkl 2 degradation) plane — the segment joining V(12) (wkl 1 high prio) and V(21), with V([12]) (no priorities) and the midpoint 0.5 V(12) + 0.5 V(21) marked; note: u1 < u2]

  12. Two workloads (conservation) [Figure: the achievable region lies on the conservation line (u1 d1 + u2 d2)/(u1 + u2) = constant avg degradation; V([12]) (no priorities, equal degradation) sits on the line d1 = d2, with the midpoint 0.5 V(12) + 0.5 V(21) marked]

  13. Conservation • Theorem: For any priority assignments (1/util) Σw util(w) deg(w) = constant avg deg • Provable from some hypotheses, observable (false for printer queues) • For any set A of workloads • imagine giving those workloads top priority • discover (measure or model) avg degradation deg(A) • (1/util(A)) Σw∈A util(w) deg(w) ≥ deg(A) • These linear inequalities determine the convex achievable region
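A numeric check of the two-workload case. The per-state degradation formulas below assume M/M/1 workloads under preemptive priority with equal mean service times (an assumption for illustration, not something the slide states); with them, Σ util(w)·deg(w) comes out the same whichever workload gets top priority:

```python
# Conservation check for two open workloads under preemptive priority.
# Assumed formulas (M/M/1, equal mean service times):
#   high-priority workload: deg = 1/(1 - u_hi)
#   low-priority workload:  deg = 1/((1 - u_hi)(1 - u))   with u = u_hi + u_lo

def priority_degradations(u_hi: float, u_lo: float):
    """(deg of high-prio wkl, deg of low-prio wkl) given their utilizations."""
    u = u_hi + u_lo
    return 1.0 / (1.0 - u_hi), 1.0 / ((1.0 - u_hi) * (1.0 - u))

u1, u2 = 0.3, 0.4
d1, d2 = priority_degradations(u1, u2)   # state 12: wkl 1 on top
e2, e1 = priority_degradations(u2, u1)   # state 21: wkl 2 on top

lhs_12 = u1 * d1 + u2 * d2
lhs_21 = u1 * e1 + u2 * e2
print(lhs_12, lhs_21)   # equal, and both equal u/(1 - u) with u = u1 + u2
```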

  14. Two workloads (conservation) [Figure: achievable region in the (d1, d2) plane — the segment from V(12) to V(21) on the line (u1 d1 + u2 d2)/(u1 + u2) = constant avg degradation, bounded by the constraints d1 ≥ 1/(1 − u1) and d2 ≥ 1/(1 − u2); V([12]) marked]

  15. Three workloads [Figure: achievable region in (d1, d2, d3) space, lying on the plane (u1 d1 + u2 d2 + u3 d3)/(u1 + u2 + u3) = avg degradation; vertices V(123), V(213), … marked]

  16. Three workload permutahedron [Figure: hexagonal achievable region with the 6 ordered states 123, 132, 312, 321, 231, 213 at the vertices, the tied states [12]3, 1[23], [13]2, 3[12], [23]1, 2[13] on the edges, and [123] in the interior; the lines d1 = d2 and d2 = d3 are marked]

  17. Four workload permutahedron • 4! = 24 vertices (ordered states) • 2⁴ − 2 = 14 facets (proper subsets) (conservation constraints) • 74 faces (states) • Simplicial geometry and transportation polytopes, Trans. Amer. Math. Soc. 217 (1976) 138.

  18. Scheduling for Performance • Administrator specifies goals - e.g. degradations • Software determines priorities, trying to meet goals • Model maps goals to achievable degradations [Figure: workload performance goals mapped into the achievable region]

  19. IBM OS390 Goal Mode • Administrator specifies workload degradation goals [Figure: achievable region in the (wkl 1 degradation, wkl 2 degradation) plane, with example goal points labeled "too generous" and "too ambitious" relative to the region]

  20. Modeling Goal Mode • Find right point in permutahedron for given V • Linear programming solution (Coffman & Mitrani) • Algorithm modeling problem more closely:
      for each subset A of workloads
          scale(A) = factor to force conservation true for A
      for each workload w
          scale(w) = min { scale(A) | scale(A) < 1 && w ∈ A }
          V(w) *= scale(w)
      // inequalities now OK, scale back to p’hedron if necessary
  • O(2ⁿ), fast enough, conjecture Ω(2ⁿ) • Refinements for workload importance

  21. SUN SRM (Solaris Resource Manager) • Administrator specifies workload CPU shares • Share f (0 < f < 1) means wkl guaranteed fraction f of CPU when it’s on run queue, can get more if no competition • Share = utilization only for closed workloads • Model: f1 = 1, f2 = f3 = … = 0 means wkl 1 has preemptive highest priority • Two wkls: V = f1 V(12) + f2 V(21)

  22. Map Shares to Degradations • Three (n) workloads: weight(123) = f1 f2 f3 / ((f1 + f2 + f3)(f2 + f3)(f3)) • V = Σ over ordered states s of weight(s) V(s) • Theorem: weights sum to 1 • interesting identity generalizing adding fractions • prove by induction, or by coupon collecting • O(n!), Ω(n!), fast enough for n < 9 (12)
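The weight shown for state 123 generalizes to any ordered state: at each position, divide that workload's share by the total share of the workloads not yet placed. A sketch that evaluates it over all n! orders and checks the "weights sum to 1" theorem (the share values are made-up examples):

```python
from itertools import permutations

def weight(order, f):
    """Weight of an ordered state, per the slide's formula:
    product over positions i of f[order[i]] / sum(f over the suffix order[i:])."""
    w = 1.0
    for i in range(len(order)):
        tail = sum(f[j] for j in order[i:])
        w *= f[order[i]] / tail
    return w

shares = [0.5, 0.3, 0.2]   # example CPU shares f1, f2, f3 (made up)
total = sum(weight(p, shares) for p in permutations(range(len(shares))))
print(total)               # 1.0 up to rounding: the weights sum to 1
```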

  23. Three workload example

  24. Map Shares to Degradations • Normalize: f1 + f2 + f3 = 1 (barycentric coordinates) [Figure: the share triangle, with vertex f1 = 1 and opposite edge f1 = 0, mapped onto the achievable region]

  25. Experimental results for 3 workloads

  26. Mapping a triangle to a hexagon [Figure: the share triangle (vertices f1 = 1, f2 = 1, …; edges f1 = 0, f2 = 0) mapped onto the permutahedron hexagon with vertices 123, 132, 312, 321, 231, 213 and edge states [13]2, 3[12], 1[23], [23]1, [12]3, 2[13]; f1 = 1 lands where wkl 1 has high priority, f1 = 0 where wkl 1 has low priority]

  27. [Figure]

  28. Map Goals to Shares • For open workloads, specifying shares is as unintuitive as specifying priorities • Specify degradation goals • Map to achievable region • Reverse map from achievable region to shares:
      do
          guess shares          // bisection argument
          compute degradations
      until error is acceptably small
  • 10 * O(n!) is good to 1%
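For two workloads the reverse map can be sketched directly: V = f1·V(12) + f2·V(21) with f2 = 1 − f1, so bisect on f1 until workload 1's degradation hits its goal. The per-state degradations below come from M/M/1 preemptive-priority formulas, an assumption for illustration; in practice V(12) and V(21) would be measured or modeled:

```python
# Sketch of "guess shares / compute degradations" by bisection, two workloads.
# Assumed per-state degradations of workload 1 (M/M/1 preemptive priority,
# equal mean service times) -- stand-ins for measured/modeled V(12), V(21).

def d1_of_share(f1: float, u1: float, u2: float) -> float:
    u = u1 + u2
    d1_top = 1.0 / (1.0 - u1)                  # wkl 1 degradation in state 12
    d1_bot = 1.0 / ((1.0 - u2) * (1.0 - u))    # wkl 1 degradation in state 21
    return f1 * d1_top + (1.0 - f1) * d1_bot   # V = f1 V(12) + f2 V(21)

def share_for_goal(goal: float, u1: float, u2: float, tol: float = 1e-6) -> float:
    lo, hi = 0.0, 1.0                  # d1 decreases as f1 grows
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if d1_of_share(mid, u1, u2) > goal:
            lo = mid                   # degradation still too big: raise share
        else:
            hi = mid
    return 0.5 * (lo + hi)

u1, u2 = 0.3, 0.4
f1 = share_for_goal(2.0, u1, u2)       # share needed for degradation goal 2.0
print(f1, d1_of_share(f1, u1, u2))
```

A goal below 1/(1 − u1) or above 1/((1 − u2)(1 − u1 − u2)) is outside the achievable segment, so in a real tool the goal would first be projected onto the achievable region, as the slide says.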

  29. Map degradations to priorities • Real system works with priorities • pdist(w,p) = prob( wkl w at prio p ) = time fraction [Figure: pdist space (dim n(n−1)) maps onto the achievable region (dim n−1)]

  30. Pdists to degradations and back [Figure: the hexagon cut by the lines d1 = d2 and d2 = d3 into 6 pieces, each combinatorially a square; states 1[23], [123], 123, [12]3 labeled]

  31. Pdists to degradations and back • Example pdist matrices (rows = workloads, columns = priorities):
      123:          [12]3:         1[23]:         [123]:
      1  0  0       .5 .5  0       1  0  0        .33 .33 .33
      0  1  0       .5 .5  0       0 .5 .5        .33 .33 .33
      0  0  1        0  0  1       0 .5 .5        .33 .33 .33

  32. Work in progress • Model mixed open and closed workloads • Prove algorithms correct • Solaris benchmark studies (under way) • OS390 validation - does data exist? • Write the paper ... • Build a product for IBM/Sun/BMC customers
