
Condor and Multi-core Scheduling

Dan Bradley (dan@hep.wisc.edu), University of Wisconsin. With generous support from the National Science Foundation and productive collaboration with Red Hat.


Presentation Transcript


  1. Condor and Multi-core Scheduling
     Dan Bradley (dan@hep.wisc.edu), University of Wisconsin
     With generous support from the National Science Foundation and productive collaboration with Red Hat

  2. What Exists Today
     - Parallel scheduling
       - Designed for MPI-type jobs
       - No way to limit matchmaking to slots on the same machine
     - Custom batch slots
       - One “slot” need not equal one core
       - Slot policy can depend on the type of job

  3. Example: Custom Batch Slots
     - Slots 1, 2, and 3 only accept 1-core jobs
     - Slot 4
       - When claimed by a normal 1-core job, it behaves normally
       - When claimed by a 4-core job, it
         - stops slots 1, 2, and 3 from accepting jobs
         - suspends the 4-core job until slots 1, 2, and 3 drain
     - Related Condor How-to: http://nmi.cs.wisc.edu/node/1482
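The slot 4 policy above can be illustrated with a small simulation. This is only a sketch of the behavior described on the slide, not Condor configuration and not how Condor implements it; the `Job` and `Machine` classes and their method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Job:
    name: str
    cores: int
    state: str = "idle"  # one of: idle, running, suspended, done


@dataclass
class Machine:
    # Hypothetical 4-core machine: slots 1-3 take 1-core jobs only,
    # slot 4 takes either a 1-core job or a 4-core job.
    slots: dict = field(default_factory=lambda: {1: None, 2: None, 3: None, 4: None})

    def whole_machine_job(self) -> Optional[Job]:
        job = self.slots[4]
        return job if job is not None and job.cores == 4 else None

    def can_start(self, slot: int, job: Job) -> bool:
        if self.slots[slot] is not None:
            return False
        if slot in (1, 2, 3):
            # Small slots refuse new work while a 4-core job holds slot 4.
            return job.cores == 1 and self.whole_machine_job() is None
        return job.cores in (1, 4)

    def start(self, slot: int, job: Job) -> bool:
        if not self.can_start(slot, job):
            return False
        self.slots[slot] = job
        job.state = "running"
        self._update_whole_machine_job()
        return True

    def finish(self, slot: int) -> None:
        job = self.slots[slot]
        if job is not None:
            job.state = "done"
            self.slots[slot] = None
        self._update_whole_machine_job()

    def _update_whole_machine_job(self) -> None:
        # The 4-core job stays suspended until slots 1-3 have drained.
        big = self.whole_machine_job()
        if big is None:
            return
        drained = all(self.slots[s] is None for s in (1, 2, 3))
        big.state = "running" if drained else "suspended"
```

For example, if a 1-core job is running in slot 1 when a 4-core job claims slot 4, the 4-core job is suspended, slots 2 and 3 reject new jobs, and the 4-core job resumes once slot 1 finishes.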

  4. Shortcomings
     - Awkward to extend to N-core jobs, but doable
     - Very awkward to extend to N-core jobs and also support dynamic partitioning of memory, etc.
     - Requires custom configuration by admins; no standard JDL for submitting grid jobs to sites
     - Accounting does not charge a multi-core job at a higher rate than a 1-core job
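To make the accounting point concrete, one simple way to charge a multi-core job at a higher rate is to weight wall-clock time by the number of cores claimed. This is an illustrative assumption, not a description of Condor's accounting; the function name `charged_usage` is hypothetical.

```python
def charged_usage(wall_seconds: float, cores: int) -> float:
    """Return usage in core-seconds (assumed weighting: time x cores)."""
    if cores < 1:
        raise ValueError("a job must claim at least one core")
    return wall_seconds * cores
```

Under this weighting, a 4-core job that runs for one hour is charged the same as four 1-core jobs that each run for one hour.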

  5. In Development: Dynamic Slots
     - Start with one “batch slot” representing the whole machine
     - Trim the slot to what the job requires (cores, memory, disk, network, etc.)
     - Leftovers are assigned to a new slot
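The trim-and-leftover idea can be sketched as follows. This is a minimal illustration assuming a slot is described by cores, memory, and disk only (network is omitted for brevity); the `Slot` class and `carve` function are hypothetical and are not Condor's implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Slot:
    cores: int
    memory_mb: int
    disk_mb: int


def carve(slot: Slot, request: Slot) -> Optional[Tuple[Slot, Optional[Slot]]]:
    """Trim `slot` down to `request`.

    Returns (trimmed_slot, leftover_slot), where leftover_slot is None if no
    cores remain, or None if the request does not fit in the slot at all.
    """
    if (request.cores > slot.cores
            or request.memory_mb > slot.memory_mb
            or request.disk_mb > slot.disk_mb):
        return None
    leftover = Slot(
        cores=slot.cores - request.cores,
        memory_mb=slot.memory_mb - request.memory_mb,
        disk_mb=slot.disk_mb - request.disk_mb,
    )
    # A leftover with no cores cannot run anything, so do not create a slot for it.
    if leftover.cores == 0:
        return request, None
    return request, leftover
```

Starting from a hypothetical whole-machine slot of 8 cores, 16384 MB of memory, and 100000 MB of disk, a job requesting 2 cores, 4096 MB, and 10000 MB gets a slot of exactly that size, and a new slot with 6 cores, 12288 MB, and 90000 MB is left for later jobs.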

  6. TODO
     - Improve accounting and any other monitoring to support multi-core slots
     - Support standard JDL for requesting multiple cores
     - Deal with preemption and/or a mechanism for preventing starvation of multi-core jobs
     - Speed up dynamic slot creation

  7. Questions
     - Is it desirable to have the capability to lock a job to a CPU?
     - Does Globus provide RSL attributes with well-defined semantics for multi-core jobs?
       - Example: (count=N)(jobtype=single)
       - In my experience, this is not well-defined: it can mean N cores x 1 job or N x job
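The ambiguity in the RSL example can be made explicit by spelling out both readings. The sketch below assumes that "N x job" means N separate single-core instances of the job; the labels `one_job_n_cores` and `n_jobs_one_core` are hypothetical names for the two readings, not Globus terminology.

```python
from typing import List, Tuple


def interpret_count(n: int, meaning: str) -> List[Tuple[int, int]]:
    """Return the launches implied by (count=n) as (instance_index, cores) pairs."""
    if n < 1:
        raise ValueError("count must be at least 1")
    if meaning == "one_job_n_cores":
        # Reading 1: a single job instance that is given n cores.
        return [(0, n)]
    if meaning == "n_jobs_one_core":
        # Reading 2 (assumed): n instances of the job, one core each.
        return [(i, 1) for i in range(n)]
    raise ValueError(f"unknown reading: {meaning}")
```

Both readings consume the same number of cores, but they differ in how many processes start, which is why a site and a submitter need to agree on the semantics.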
