
Multiprocessor Scheduling




Presentation Transcript


  1. Multiprocessor Scheduling
  Module 3.1
  For a good summary of multiprocessor and real-time scheduling, visit: http://www.cs.uah.edu/~weisskop/osnotes_html/M8.html

  2. Classifications of Multiprocessor Systems
  • Loosely coupled multiprocessors, or clusters
    • Each processor has its own memory and I/O channels
  • Functionally specialized processors
    • Such as I/O processors or Nvidia GPGPUs
    • Controlled by a master processor
  • Tightly coupled multiprocessors
    • MCMP
    • Processors share main memory
    • Controlled by the operating system
    • More economical than clusters

  3. Types of Parallelism
  • Bit-level parallelism
  • Instruction-level parallelism
  • Data parallelism
  • Task parallelism -- our focus (data and task parallelism are contrasted in the sketch below)
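  To make the last two categories concrete, here is a minimal C/OpenMP sketch; the array, its size, and the operations are illustrative assumptions, not from the slides. The first loop is data parallelism: one operation split across slices of one array. The sections are task parallelism: two unrelated computations running concurrently.

    /* Data vs. task parallelism -- compile with: gcc -fopenmp demo.c */
    #include <stdio.h>

    #define N 1000000
    static double a[N];

    int main(void) {
        /* Data parallelism: the same operation on disjoint slices of
           one array, divided among threads. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = i * 0.5;

        /* Task parallelism: two independent computations, one per section. */
        double sum = 0.0, max = 0.0;
        #pragma omp parallel sections
        {
            #pragma omp section
            { for (int i = 0; i < N; i++) sum += a[i]; }
            #pragma omp section
            { for (int i = 0; i < N; i++) if (a[i] > max) max = a[i]; }
        }
        printf("sum=%f max=%f\n", sum, max);
        return 0;
    }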

  4. Synchronization Granularity
  • Refers to the frequency of synchronization among the parallel processes in the system
  • SI, the synchronization interval, is measured in instructions
  • Five classes exist:
    • Independent (SI is not applicable)
    • Very coarse (2000 < SI < 1M)
    • Coarse (200 < SI < 2000)
    • Medium (20 < SI < 200)
    • Fine (SI < 20)

  5. Wikipedia on Fine-Grained, Coarse-Grained, and Embarrassing Parallelism
  • Applications are often classified according to how often their subtasks need to synchronize or communicate with each other.
  • An application exhibits fine-grained parallelism if its subtasks must communicate many times per second;
  • it exhibits coarse-grained parallelism if they do not communicate many times per second;
  • and it is embarrassingly parallel if they rarely or never have to communicate.
  • Embarrassingly parallel applications are considered the easiest to parallelize.

  6. Independent Parallelism
  • Multiple unrelated processes
  • Each is a separate application or job, e.g. a spreadsheet or a word processor
  • No synchronization between them
  • When more than one processor is available, average response time to users drops

  7. Coarse and Very Coarse-Grained Parallelism
  • Very coarse: distributed processing across network nodes to form a single computing environment
  • Coarse: synchronization among processes at a very gross level
  • Good for concurrent processes running on a multiprogrammed uniprocessor
  • Can be supported on a multiprocessor with little change

  8. Medium-Grained Parallelism
  • Parallel processing or multitasking within a single application
  • The single application is a collection of threads
  • Threads usually interact frequently, leading to medium-grained synchronization

  9. Fine-Grained Parallelism
  • Highly parallel applications
  • Synchronization every few instructions (on very short events)
  • Fills the gap between instruction-level parallelism (ILP) and medium-grained parallelism
  • Can be found in small inner loops (see the sketch below)
  • Programmed using interfaces such as MPI and OpenMP
  • The OS should not intervene; synchronization is usually done in hardware
  • In practice, this is a very specialized and fragmented area
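  As a hedged illustration of how small the synchronization interval gets, here is a C/OpenMP sketch of a Jacobi-style relaxation; the stencil and grid size are illustrative assumptions. The parallelized inner loops end in an implicit barrier, so threads synchronize once per sweep over a loop body of only a handful of instructions.

    /* Fine-grained parallelism sketch -- compile with: gcc -fopenmp jacobi.c */
    #define N 512

    static double a[N][N], b[N][N];

    /* One relaxation sweep: each interior point becomes the average of
       its four neighbors, reading src and writing dst. */
    static void sweep(double src[N][N], double dst[N][N]) {
        #pragma omp parallel for
        for (int i = 1; i < N - 1; i++)
            for (int j = 1; j < N - 1; j++)
                dst[i][j] = 0.25 * (src[i-1][j] + src[i+1][j] +
                                    src[i][j-1] + src[i][j+1]);
        /* implicit barrier: every thread finishes before the next sweep */
    }

    int main(void) {
        for (int s = 0; s < 100; s++) {  /* sweeps alternate a->b, b->a */
            sweep(a, b);
            sweep(b, a);
        }
        return 0;
    }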

  10. Scheduling
  • Scheduling on a multiprocessor involves three interrelated design issues:
    • Assignment of processes to processors
    • Use of multiprogramming on individual processors
      • Makes sense for processes (coarse-grained)
      • May not be good for threads (medium-grained)
    • Actual dispatching of a process
      • Which scheduling policy to use: FCFS, RR, etc.? Sometimes a very sophisticated policy becomes counterproductive.

  11. Assignment of Processes to Processors
  • Treat processors as a pooled resource and assign processes to processors on demand, or
  • Permanently assign each process to a processor
    • Dedicate a short-term queue to each processor (sketched below)
    • Less overhead: each processor does its own scheduling on its own queue
    • Disadvantage: a processor can be idle (empty queue) while another processor has a backlog
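  A minimal C sketch of the dedicated-queue scheme; the names (task_t, assign, pick_next) are illustrative rather than any real kernel API, and locking is omitted for brevity.

    #include <stddef.h>

    #define NCPUS 4

    typedef struct task { struct task *next; /* saved context, etc. */ } task_t;

    static task_t *queue[NCPUS];      /* one ready queue per processor */

    /* A task is enqueued once, on its assigned CPU, and stays there. */
    void assign(task_t *t, int cpu) {
        t->next = queue[cpu];
        queue[cpu] = t;               /* push for brevity; FCFS would append */
    }

    task_t *pick_next(int cpu) {
        task_t *t = queue[cpu];       /* look ONLY at our own queue */
        if (t) queue[cpu] = t->next;
        return t;                     /* NULL: this CPU idles, even if
                                         another CPU's queue has a backlog */
    }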

  12. Assignment of Processes to Processors
  • Global queue
    • Schedule onto any available processor
    • During its lifetime, a process may run on different processors at different times
    • In an SMP architecture, context switching can be done at small cost
  • Master/slave architecture
    • Key kernel functions always run on a particular processor
    • The master is responsible for scheduling
    • A slave sends service requests to the master
    • Synchronization is simplified
    • Disadvantages:
      • Failure of the master brings down the whole system
      • The master can become a performance bottleneck

  13. Assignment of Processes to Processors
  • Peer architecture
    • The operating system can execute on any processor
    • Each processor does self-scheduling from a pool of available processes
    • Complicates the operating system:
      • It must ensure two processors do not choose the same process
      • It needs lots of synchronization

  14. Process Scheduling in Today's SMP
  • Modeled as an M/M/m/K queueing system
  • A single queue for all processes; multiple queues are used for priorities
  • All queues feed the common pool of processors
  • The specific scheduling discipline is less important with more than one processor
  • A simple FCFS discipline with a static priority may suffice for a multiprocessor system
  • Illustrated with the graph on p. 460
  • In conclusion, the specific scheduling discipline is much less important with SMP than on a uniprocessor (UP)

  15. Threads
  • A thread executes separately from the rest of its process
  • An application can be a set of threads that cooperate and execute concurrently in the same address space
  • Threads running on separate processors can yield a dramatic gain in performance

  16. Multiprocessor Thread Scheduling -- Four General Approaches (1/2)
  • Load sharing
    • Processes are not assigned to a particular processor
  • Gang scheduling
    • A set of related threads is scheduled to run on a set of processors at the same time

  17. Multiprocessor Thread Scheduling -- Four General Approaches (2/2)
  • Dedicated processor assignment
    • Each thread is assigned to a specific processor for its lifetime
    • When the program terminates, its processors are returned to the available processor pool
  • Dynamic scheduling
    • The number of threads can be altered during the course of execution

  18. Load Sharing
  • Load is distributed evenly across the processors
  • No centralized scheduler is required
  • The OS runs on every processor to select the next thread
  • Uses a global queue of ready threads (see the sketch below)
  • Usually an FCFS policy
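  A minimal C/pthreads sketch of load sharing with one global FCFS run queue; the names (rq_enqueue, thread_t) are illustrative, not a real kernel interface. The single lock around the queue is exactly the bottleneck discussed on the next slide.

    #include <pthread.h>
    #include <stddef.h>

    typedef struct thread { struct thread *next; /* context, etc. */ } thread_t;

    static thread_t *rq_head, *rq_tail;   /* the one global ready queue */
    static pthread_mutex_t rq_lock = PTHREAD_MUTEX_INITIALIZER;

    void rq_enqueue(thread_t *t) {        /* FCFS: append at the tail */
        pthread_mutex_lock(&rq_lock);
        t->next = NULL;
        if (rq_tail) rq_tail->next = t; else rq_head = t;
        rq_tail = t;
        pthread_mutex_unlock(&rq_lock);
    }

    thread_t *rq_dequeue(void) {          /* called by ANY idle processor */
        pthread_mutex_lock(&rq_lock);     /* every CPU contends here */
        thread_t *t = rq_head;
        if (t && !(rq_head = t->next))
            rq_tail = NULL;
        pthread_mutex_unlock(&rq_lock);
        return t;                         /* NULL means: nothing to run */
    }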

  19. Disadvantages of Load Sharing
  • The central queue needs mutual exclusion
    • It may become a bottleneck when more than one processor looks for work at the same time
    • A noticeable problem when there are many processors
  • Preempted threads are unlikely to resume execution on the same processor
    • Cache use is less efficient
  • If all threads are in the global queue, all the threads of a program will not gain access to processors at the same time
    • Performance is compromised if coordination among the threads is high

  20. Gang Scheduling
  • Simultaneous scheduling of the threads that make up a single process
  • Useful for applications where performance severely degrades when any part of the application is not running
  • Threads often need to synchronize with each other

  21. Advantages of Gang Scheduling
  • Closely related threads execute in parallel
  • Synchronization blocking is reduced (illustrated in the sketch below)
  • Less context switching
  • Scheduling overhead is reduced, since a single synchronization action (e.g., one signal()) may affect many threads
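  A hedged pthreads sketch of why co-scheduling the gang matters; the per-step workload is a stand-in. The workers synchronize at a barrier after every step, so whenever any gang member lacks a processor, the members that are running simply block at the barrier.

    /* compile with: gcc gang.c -lpthread */
    #include <pthread.h>

    #define NWORKERS 8
    #define NSTEPS   1000

    static pthread_barrier_t step_barrier;
    static double acc[NWORKERS];          /* per-thread stand-in workload */

    static void do_step(long id, int step) { acc[id] += id * 0.5 + step; }

    static void *worker(void *arg) {
        long id = (long)arg;
        for (int step = 0; step < NSTEPS; step++) {
            do_step(id, step);
            /* no thread proceeds until ALL gang members arrive: progress
               stalls whenever any member is not running */
            pthread_barrier_wait(&step_barrier);
        }
        return NULL;
    }

    int main(void) {
        pthread_t tid[NWORKERS];
        pthread_barrier_init(&step_barrier, NULL, NWORKERS);
        for (long i = 0; i < NWORKERS; i++)
            pthread_create(&tid[i], NULL, worker, (void *)i);
        for (long i = 0; i < NWORKERS; i++)
            pthread_join(tid[i], NULL);
        pthread_barrier_destroy(&step_barrier);
        return 0;
    }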

  22. Scheduling Groups -- Two Time-Slicing Divisions
  [Figure slide; image not reproduced in this transcript]

  23. Dedicated Processor Assignment -- Affinitization (1/2)
  • When an application is scheduled, each of its threads is assigned a processor that remains dedicated (affinitized) to that thread until the application runs to completion
  • No multiprogramming of processors, i.e. one processor per specific thread, and no other thread
  • Some processors may be idle while their threads block
  • Eliminates context switches; with certain types of applications this can save enough time to compensate for the possible idle-time penalty
  • In a highly parallel environment with hundreds of processors, utilization of processors is not the major issue, but performance is; the total avoidance of switching results in substantial speedup (see the affinity sketch below)
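  A hedged, Linux-specific sketch of affinitization at user level, using the GNU extension pthread_setaffinity_np() (not portable POSIX; assumes at least NWORKERS cores are online):

    /* compile with: gcc affinity.c -lpthread */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    #define NWORKERS 4

    static void *worker(void *arg) {
        long core = (long)arg;
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET((int)core, &set);     /* allow exactly one core */
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
        /* ... this thread now never migrates; its cache stays warm ... */
        printf("worker pinned to core %ld\n", core);
        return NULL;
    }

    int main(void) {
        pthread_t tid[NWORKERS];
        for (long c = 0; c < NWORKERS; c++)
            pthread_create(&tid[c], NULL, worker, (void *)c);
        for (long c = 0; c < NWORKERS; c++)
            pthread_join(tid[c], NULL);
        return 0;
    }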

  24. Dedicated Processor Assignment -- Affinitization (2/2)
  • With dedicated assignment, when the number of threads exceeds the number of available processors, efficiency drops
    • Due to thread preemption, context switching, suspension of other threads, and cache pollution
  • Notice that both gang scheduling and dedicated assignment are more concerned with allocation issues than scheduling issues
    • The important question becomes "How many processors should a process be assigned?" rather than "How shall I choose the next process?"

  25. Dynamic Scheduling
  • The number of threads within a process can vary during execution
  • The work is shared between the OS and the application (a user-level analogue is sketched below):
    • When a job originates, it requests a certain number of processors; the OS grants some or all of the request, based on the number of processors currently available
    • The application itself then decides which threads run when and on which processors; this requires language or library support, as provided by thread libraries
    • When processors become free due to the termination of threads or processes, the OS can allocate them as needed to satisfy pending requests
  • Simulations have shown that this approach is superior to both gang scheduling and dedicated assignment
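  A hedged user-level analogue of the request/grant split; a real implementation would need OS and runtime support, and sysconf() here merely stands in for the "grant". The job asks for WANTED processors, accepts whatever is available, and maps its own work chunks onto that many threads itself.

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    #define WANTED 16                 /* processors the job requests */

    static void *work(void *arg) {
        long chunk = (long)arg;       /* mapping chosen by the application */
        printf("running chunk %ld\n", chunk);
        return NULL;
    }

    int main(void) {
        /* the "grant": some or all of the request, based on availability */
        long online = sysconf(_SC_NPROCESSORS_ONLN);
        long granted = online < WANTED ? online : WANTED;

        pthread_t tid[WANTED];
        for (long i = 0; i < granted; i++)
            pthread_create(&tid[i], NULL, work, (void *)i);
        for (long i = 0; i < granted; i++)
            pthread_join(tid[i], NULL);
        return 0;
    }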
