
Designing Parallel Operating Systems using Modern Interconnects


Presentation Transcript


  1. Designing Parallel Operating Systems using Modern Interconnects
  Pitfalls in Parallel Job Scheduling Evaluation
  Eitan Frachtenberg and Dror Feitelson
  Computer and Computational Sciences Division, Los Alamos National Laboratory
  "Ideas that change the world"

  2. Scope
  • Numerous methodological issues arise in the evaluation of parallel job schedulers:
    • Experiment theory and design
    • Workloads and applications
    • Implementation issues and assumptions
    • Metrics and statistics
  • The paper covers 32 recurring pitfalls, organized into topics and sorted by severity
  • This talk describes a real case study and the heroic attempts to avoid most such pitfalls ...as well as the less-heroic oversight of several others

  3. Evaluation Paths
  • Theoretical analysis (queuing theory):
    • Reproducible, rigorous, and resource-friendly
    • Hard for time slicing due to unknown parameters, application structure, and feedbacks
  • Simulation:
    • Relatively simple and flexible
    • Many assumptions, not all known/reported; hard to reproduce; rarely factors in application characteristics
  • Experiments with real sites and workloads:
    • Most representative (at least locally)
    • Largely impractical and irreproducible
  • Emulation

  4. Emulation Environment
  • Experimental platform consisting of three clusters with a high-end network
  • Software: several job scheduling algorithms implemented on top of STORM:
    • Batch / space sharing, with optional EASY backfilling
    • Gang Scheduling, Implicit Coscheduling (SB), Flexible Coscheduling
  • Results described in [JSSPP'03] and [TPDS'05]

  5. Step One: Choosing Workload
  • Static vs. dynamic
  • Size of workload
  • How many different workloads are needed?
  • Use trace data?
    • Different sites have different workload characteristics
    • Inconvenient sizes may require imprecise scaling
    • "Polluted" data, flurries
  • Use model-generated data? (see the sketch below)
    • Several models exist, with different strengths
    • By trying to capture everything, a model may capture nothing
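As a minimal illustration of what model-generated data looks like, the sketch below builds a workload from a toy model with Poisson arrivals, log-uniform run times, and power-of-two job sizes. This is not Lublin's model, and every parameter value here is a made-up placeholder.

    # Toy workload generator (Python). NOT Lublin's model: exponential
    # inter-arrivals, log-uniform run times, and power-of-two sizes are
    # simplifying assumptions made only for this sketch.
    import random

    def generate_workload(num_jobs=1000, mean_interarrival=60.0,
                          min_runtime=10.0, max_runtime=4 * 3600.0,
                          max_size=256, seed=0):
        rng = random.Random(seed)
        jobs, t = [], 0.0
        for jid in range(num_jobs):
            t += rng.expovariate(1.0 / mean_interarrival)   # Poisson arrival process
            # Log-uniform run time between min_runtime and max_runtime
            runtime = min_runtime * (max_runtime / min_runtime) ** rng.random()
            # Power-of-two job size up to max_size
            size = 2 ** rng.randint(0, max_size.bit_length() - 1)
            jobs.append({"id": jid, "arrival": t, "runtime": runtime, "size": size})
        return jobs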

  6. Static Workloads
  • We start with a synthetic application and static workloads
    • Simple enough to model, debug, and calibrate
  • Bulk-synchronous application (sketched below)
    • Can control granularity, variability, and communication pattern
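A minimal sketch of such a bulk-synchronous application, assuming mpi4py is available; the granularity and variability knobs are illustrative, and this is not the actual test program used in the study.

    # Bulk-synchronous synthetic application (sketch, assuming mpi4py).
    # Each iteration performs a controllable amount of "compute" followed by a
    # collective, so granularity, variability, and communication pattern can be
    # varied independently.
    import random
    import time
    from mpi4py import MPI

    def bsp_app(iterations=100, granularity=0.01, variability=0.2):
        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()
        rng = random.Random(rank)
        for _ in range(iterations):
            # Compute phase: busy-wait for granularity +/- variability seconds
            target = granularity * (1.0 + variability * (2.0 * rng.random() - 1.0))
            start = time.perf_counter()
            while time.perf_counter() - start < target:
                pass
            # Communication phase: a global reduction stands in for the
            # application's communication pattern
            comm.allreduce(rank, op=MPI.SUM)

    if __name__ == "__main__":
        bsp_app()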

  7. Synthetic Scenarios: Balanced, Complementing, Imbalanced, Mixed

  8. Example: Turnaround Time

  9. Dynamic Workloads
  • We chose Lublin's model [JPDC'03]
  • 1000 jobs per workload
  • Multiply run times AND arrival times by a constant to "shrink" the run time to 2-4 hours (see the sketch below)
    • Shrinking too much is problematic (system constants)
  • Multiply arrival times by a range of factors to modify the load
    • Unrepresentative, since it deviates from the "real" correlations with run times and job sizes
    • A better solution is to use different workloads
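The two scalings above can be sketched as follows, assuming each job is a dictionary with arrival and runtime fields (the field names are illustrative):

    # Sketch of the two workload scalings described above (Python).
    def shrink_workload(jobs, factor):
        # Multiply BOTH run times and arrival times by factor (< 1) so the
        # experiment fits in a few hours without changing the offered load.
        return [dict(job, arrival=job["arrival"] * factor,
                     runtime=job["runtime"] * factor) for job in jobs]

    def rescale_load(jobs, factor):
        # Multiply ONLY arrival times (factor < 1 compresses arrivals and
        # raises the load). Pitfall: this distorts the real correlations
        # between inter-arrival times, run times, and job sizes.
        return [dict(job, arrival=job["arrival"] * factor) for job in jobs]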

  10. Step Two: Choosing Applications
  • Synthetic applications are easy to control, but:
    • Some characteristics are ignored (e.g., I/O, memory)
    • Others may not be representative, in particular communication, which is a salient characteristic of parallel applications
      • Granularity, pattern, network performance
      • If not sure, conduct a sensitivity analysis
    • Might be assumed malleable, moldable, or with linear speedup, which many MPI applications are not
  • Real applications have no hidden assumptions
    • But may also have limited generality

  11. Example: Sensitivity Analysis

  12. Application Choices
  • Synthetic applications on the first set
    • Allow control over more parameters
    • Allow testing unrealistic but interesting conditions (e.g., a high multiprogramming level)
  • LANL applications on the second set (Sweep3D, Sage)
    • Real memory and communication use (MPL=2)
    • Important applications for LANL's evaluations
    • But probably only for LANL...
  • Runtime estimate: f-model on batch, MPL on others

  13. Step Three: Choosing Parameters
  • What are reasonable input parameters to use in the evaluation? (an illustrative grid is sketched below)
    • Maximum multiprogramming level (MPL)
    • Timeslice quantum
    • Input load
    • Backfilling method and its effect on multiprogramming
    • Run-time estimate factor (not tested)
    • Algorithm constants, tuning, etc.
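One way to keep these choices explicit is to enumerate them as a grid and sweep it; the scheduler names and parameter values below are examples only, not those used in this study.

    # Illustrative evaluation parameter grid (values are examples only).
    from itertools import product

    parameter_grid = {
        "scheduler":   ["batch", "batch+easy", "gang", "implicit", "flexible"],
        "mpl":         [1, 2, 4, 6],        # maximum multiprogramming level
        "timeslice_s": [0.05, 0.5, 5.0],    # timeslice quantum in seconds
        "load":        [0.5, 0.7, 0.9],     # offered load, fraction of capacity
    }

    def experiments(grid):
        # Yield one configuration dictionary per point of the cross product.
        keys = list(grid)
        for values in product(*(grid[key] for key in keys)):
            yield dict(zip(keys, values))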

  14. Example 1: MPL
  • Verified with different offered loads

  15. Example 2: Timeslice
  • Dividing the jobs into quantiles allows analysis of the effect on different job types (see the sketch below)
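A sketch of that quantile split, assuming each job record carries the ordering key (e.g. run time) and the metric to average (e.g. slowdown); both field names are illustrative.

    # Split jobs into quantile buckets by one field and average another,
    # e.g. per_quantile_mean(jobs, key="runtime", metric="slowdown").
    def per_quantile_mean(jobs, key, metric, buckets=4):
        ordered = sorted(jobs, key=lambda job: job[key])
        size = len(ordered) // buckets or 1
        means = []
        for b in range(buckets):
            chunk = ordered[b * size:] if b == buckets - 1 else ordered[b * size:(b + 1) * size]
            if chunk:
                means.append(sum(job[metric] for job in chunk) / len(chunk))
        return means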

  16. Considerations for Parameters
  • Realistic MPLs
  • Scaling traces to different machine sizes
  • Scaling offered load
  • Artificial user estimates and multiprogramming estimates

  17. Step Four: Choosing Metrics
  • Not all metrics are easily comparable:
    • Absolute times, slowdown with time slicing, etc.
  • Metrics may need to be limited to a relevant context
  • Use multiple metrics to understand characteristics
  • Pitfall: measuring utilization for an open model
    • It is a direct measure of the offered load up to saturation
    • The same goes for throughput and makespan
    • Better metrics: slowdown, response time, and wait time (computed in the sketch below)
  • Pitfall: using the mean with asymmetric distributions
  • Pitfall: inferring scalability from O(1) nodes
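A sketch of the per-job metric computations; the 10-second bound on slowdown is a commonly used threshold, not necessarily the one used in this evaluation.

    # Per-job response time, wait time, slowdown, and bounded slowdown (sketch).
    def job_metrics(arrival, start, end, bound=10.0):
        response = end - arrival                  # turnaround / response time
        wait = start - arrival                    # time spent queued
        run = end - start                         # time spent running; under time
                                                  # slicing this includes preempted time
        slowdown = response / run if run > 0 else float("inf")
        bounded_slowdown = max(1.0, response / max(run, bound))
        return {"response": response, "wait": wait,
                "slowdown": slowdown, "bounded_slowdown": bounded_slowdown}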

  18. Example: Bounded Slowdown

  19. Example (continued)

  20. Response Time

  21. Bounded Slowdown

  22. Step Five: Measurement
  • Never measure saturated workloads
    • When the arrival rate is higher than the service rate, queues grow to infinity and all metrics become meaningless
    • ...but finding the saturation point can be tricky
  • Discard warm-up and cool-down results (see the sketch below)
  • May need to measure subgroups separately (long/short, day/night, weekday/weekend, ...)
  • Measurements should still have enough data points for statistical meaning, especially regarding workload length
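A sketch of discarding the warm-up and cool-down portions before computing statistics; the 10% fractions are illustrative, and the right cut depends on the workload.

    # Keep only the steady-state middle of a workload ordered by arrival time.
    def trim_transients(jobs, warmup_frac=0.1, cooldown_frac=0.1):
        n = len(jobs)
        lo = int(n * warmup_frac)
        hi = n - int(n * cooldown_frac)
        return jobs[lo:hi]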

  23. Example: Saturation Point

  24. Example: Shortest jobs CDF

  25. Example: Longest jobs CDF

  26. Conclusion
  • Parallel job scheduling evaluation is complex
    • ...but we can avoid past mistakes
  • The paper can be used as a checklist when designing and executing evaluations
  • Additional information in the paper:
    • Pitfalls, examples, and scenarios
    • Suggestions on how to avoid pitfalls
    • Open research questions (for the next JSSPP?)
    • Many references to positive examples
  • Be cognizant when choosing your compromises

  27. References
  • Workload archive: http://www.cs.huji.ac.il/~feit/worklad (contains several workload traces and models)
  • Dror's publication page: http://www.cs.huji.ac.il/~feit/pub.html
  • Eitan's publication page: http://www.cs.huji.ac.il/~etcs/pubs
  • Email: eitanf@lanl.gov
