Multiprocessor Scheduling

Module 3.1

For a good summary of multiprocessor and real-time scheduling, visit:


Classifications of Multiprocessor Systems

  • Loosely coupled multiprocessor, or clusters

    • Each processor has its own memory and I/O channels.

  • Functionally specialized processors

    • Such as an I/O processor

      • e.g., an Nvidia GPGPU

    • Controlled by a master processor

  • Tightly coupled multiprocessing

    • MCMP

    • Processors share main memory

    • Controlled by operating system

    • More economical than clusters

Types of Parallelism

  • Bit-level parallelism

  • Instruction-level parallelism

  • Data parallelism

  • Task parallelism our focus

Synchronization Granularity

  • Refers to how frequently processes in the system synchronize, i.e., the grain size of the parallelism

  • Five classes exist

    • Independent (SI is not applicable)

    • Very coarse (2000 < SI < 1M)

    • Coarse (200 < SI < 2000)

    • Medium (20 < SI < 200)

    • Fine (SI < 20)

SI is the synchronization interval, measured in instructions.

Wikipedia on Fine-Grained, Coarse-Grained, and Embarrassing Parallelism

  • Applications are often classified according to how often their subtasks need to synchronize or communicate with each other.

  • An application exhibits fine-grained parallelism if its subtasks must communicate many times per second;

  • it exhibits coarse-grained parallelism if they do not communicate many times per second,

  • and it is embarrassingly parallel if they rarely or never have to communicate. Embarrassingly parallel applications are considered the easiest to parallelize.

Independent Parallelism

  • Multiple unrelated processes

  • Separate application or job, e.g. spreadsheet, word processor, etc.

  • No synchronization

  • When more than one processor is available

    • Average response time to users decreases

Coarse and Very Coarse-Grained Parallelism

  • Very coarse: distributed processing across network nodes to form a single computing environment

  • Coarse: Synchronization among processes at a very gross level

  • Good for concurrent processes running on a multiprogrammed uniprocessor

    • Can be supported on a multiprocessor with little change

Medium-Grained Parallelism

  • Parallel processing or multitasking within a single application

  • Single application is a collection of threads

  • Threads usually interact frequently, leading to medium-grained synchronization

Fine-Grained Parallelism

  • Highly parallel applications

  • Synchronization every few instructions (on very short events).

  • Fills the gap between ILP (instruction-level parallelism) and medium-grained parallelism

  • Can be found in small inner loops

    • Typically expressed with parallel programming APIs such as MPI and OpenMP (see the sketch after this list)

  • The OS should not intervene; synchronization at this granularity is usually handled in hardware

  • In practice, this is a very specialized and fragmented area
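
As a concrete illustration of loop-level (fine-grained) parallelism, here is a minimal OpenMP sketch in C; the array names and size are arbitrary choices for this example. The parallel-for directive splits a small inner loop across threads, and the implicit barrier at the end of the loop is the synchronization point. Compile with OpenMP enabled, e.g. gcc -fopenmp.

```c
#include <stdio.h>

#define N 1000000

int main(void) {
    /* static: keeps the large arrays off the stack */
    static double a[N], b[N], c[N];

    for (int i = 0; i < N; i++) {
        a[i] = i;
        b[i] = 2.0 * i;
    }

    /* Fine-grained parallelism: loop iterations are divided among
       threads, and the implicit barrier at the end of the loop is
       the synchronization point (only a few instructions of work
       per iteration). Without -fopenmp the pragma is ignored and
       the loop runs sequentially. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[N-1] = %f\n", c[N - 1]);
    return 0;
}
```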

Scheduling

  • Scheduling on a multiprocessor involves 3 interrelated design issues:

    • Assignment of processes to processors

    • Use of multiprogramming on individual processors

      • Makes sense for processes (coarse-grained)

      • May not be good for threads (medium-grained)

    • Actual dispatching of a process

      • What scheduling policy should we use: FCFS, RR, etc.? Sometimes a very sophisticated policy becomes counterproductive.

Assignment of Processes to Processors

  • Treat processors as a pooled resource and assign processes to processors on demand

  • Permanently assign each process to a processor

    • Dedicate a short-term queue to each processor

    • Less overhead: each processor does its own scheduling on its own queue

    • Disadvantage: one processor could sit idle (with an empty queue) while another has a backlog

Assignment of Processes to Processors

  • Global queue

    • Schedule to any available processor

    • During the lifetime of the process, process may run on different processors at different times.

    • In an SMP architecture, context switching can be done at low cost.

  • Master/slave architecture

    • Key kernel functions always run on a particular processor

    • Master is responsible for scheduling

    • Slaves send service requests to the master (see the sketch after this list)

    • Synchronization is simplified

    • Disadvantages

      • Failure of master brings down whole system

      • Master can become a performance bottleneck
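
A minimal pthreads sketch of the master/slave idea, under stated assumptions: a single mutex-protected request list, and hypothetical request fields op and done. Slaves post requests and block until the master, the only processor that runs kernel functions, has served them; this is why synchronization is simple and also why the master is a bottleneck.

```c
#include <pthread.h>
#include <stdio.h>

/* A kernel service request posted by a slave processor. */
typedef struct request {
    int op;                    /* which kernel function to run */
    int done;                  /* set by the master when finished */
    struct request *next;
} request_t;

static request_t *req_head;
static pthread_mutex_t m  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;

/* Slave side: post a request, then wait until the master serves it. */
void slave_call(request_t *r) {
    pthread_mutex_lock(&m);
    r->done = 0;
    r->next = req_head;
    req_head = r;
    pthread_cond_broadcast(&cv);      /* wake the master */
    while (!r->done)
        pthread_cond_wait(&cv, &m);
    pthread_mutex_unlock(&m);
}

/* Master side: only this processor executes kernel functions, so no
   further synchronization is needed inside them. Never returns. */
void *master_loop(void *unused) {
    (void)unused;
    pthread_mutex_lock(&m);
    for (;;) {
        while (!req_head)
            pthread_cond_wait(&cv, &m);
        request_t *r = req_head;
        req_head = r->next;
        printf("master serving op %d\n", r->op);  /* kernel work here */
        r->done = 1;
        pthread_cond_broadcast(&cv);  /* wake the waiting slave */
    }
}
```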

Assignment of Processes to Processors

  • Peer architecture

    • Operating system can execute on any processor

    • Each processor does self-scheduling from a pool of available processes

    • Complicates the operating system

      • Must ensure that two processors do not choose the same process

      • Needs lots of synchronization

Process Scheduling in Today's SMP

  • M/M/m/K queueing system

    • Single queue for all processes

    • Multiple queues are used for priorities

    • All queues feed to the common pool of processors

  • The specific scheduling discipline is less important when there is more than one processor

    • A simple FCFS discipline with a static priority may suffice for a multiprocessor system (a sketch follows below)

    • Illustrated by the graph on p. 460 of the text

    • In conclusion, the specific scheduling discipline matters much less with SMP than with a uniprocessor
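
A minimal sketch, in C, of the kind of simple discipline meant here: FCFS within static priority levels, with the dispatcher scanning from the highest level. The names (task_t, NPRIO) are hypothetical, and the locking an SMP would need around these queues is omitted for brevity.

```c
#include <stddef.h>

#define NPRIO 4          /* number of static priority levels (hypothetical) */

typedef struct task {
    int id;
    struct task *next;   /* FIFO link within a priority level */
} task_t;

/* One FIFO queue per priority; level 0 is the highest priority. */
static task_t *head[NPRIO], *tail[NPRIO];

/* Enqueue at the tail of the task's priority level (FCFS order). */
void enqueue(task_t *t, int prio) {
    t->next = NULL;
    if (tail[prio]) tail[prio]->next = t; else head[prio] = t;
    tail[prio] = t;
}

/* Dispatch: take the oldest task from the highest non-empty level. */
task_t *dispatch(void) {
    for (int p = 0; p < NPRIO; p++) {
        if (head[p]) {
            task_t *t = head[p];
            head[p] = t->next;
            if (!head[p]) tail[p] = NULL;
            return t;
        }
    }
    return NULL;         /* no ready task */
}
```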

Threads

  • A thread executes separately from the rest of its process

  • An application can be a set of threads that cooperate and execute concurrently in the same address space

  • Running those threads on separate processors can yield a dramatic gain in performance

Multiprocessor Thread Scheduling -- Four General Approaches (1/2)

  • Load sharing

    • Processes are not assigned to a particular processor

  • Gang scheduling

    • A set of related threads is scheduled to run on a set of processors at the same time

Multiprocessor Thread Scheduling -- Four General Approaches (2/2)

  • Dedicated processor assignment

    • Each thread is assigned to a specific processor for the life of the application

    • When the program terminates, its processors are returned to the pool of available processors

  • Dynamic scheduling

    • Number of threads can be altered during course of execution

Load Sharing

  • Load is distributed evenly across the processors

  • No centralized scheduler required

    • OS runs on every processor to select the next thread.

  • Use a global queue for ready threads

    • Usually an FCFS policy (a minimal sketch follows below)
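
A minimal sketch of the global ready queue, assuming a hypothetical thread_desc_t descriptor type: any processor can enqueue a ready thread, and any idle processor dequeues the oldest one (FCFS). The single lock around the queue is precisely the mutual-exclusion bottleneck described on the next slide.

```c
#include <pthread.h>
#include <stddef.h>

typedef struct thread_desc {
    int id;
    struct thread_desc *next;
} thread_desc_t;

/* Single global FCFS ready queue shared by all processors. */
static thread_desc_t *rq_head, *rq_tail;
static pthread_mutex_t rq_lock = PTHREAD_MUTEX_INITIALIZER;

/* Any processor adds a ready thread to the tail of the global queue. */
void rq_put(thread_desc_t *t) {
    pthread_mutex_lock(&rq_lock);
    t->next = NULL;
    if (rq_tail) rq_tail->next = t; else rq_head = t;
    rq_tail = t;
    pthread_mutex_unlock(&rq_lock);
}

/* An idle processor takes the oldest ready thread (FCFS). This lock
   is the mutual exclusion that can become a bottleneck as the number
   of processors contending for it grows. */
thread_desc_t *rq_get(void) {
    pthread_mutex_lock(&rq_lock);
    thread_desc_t *t = rq_head;
    if (t) {
        rq_head = t->next;
        if (!rq_head) rq_tail = NULL;
    }
    pthread_mutex_unlock(&rq_lock);
    return t;
}
```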

Disadvantages of Load Sharing

  • Central queue needs mutual exclusion

    • May be a bottleneck when more than one processor looks for work at the same time

    • A noticeable problem when there are many processors.

  • Preempted threads are unlikely to resume execution on the same processor

    • Cache use is less efficient

  • Because all threads share one global queue, the threads of a single program will rarely gain access to processors at the same time.

    • Performance is compromised if coordination among threads is high.

Gang Scheduling

  • Simultaneous scheduling of threads that make up a single process

  • Useful for applications where performance severely degrades when any part of the application is not running

  • Threads often need to synchronize with each other

Advantages of Gang Scheduling

  • Closely-related threads execute in parallel

  • Synchronization blocking is reduced

  • Less context switching

  • Scheduling overhead is reduced, since a single scheduling action (e.g., one signal) may affect many threads at once (a toy illustration follows below)
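
A toy illustration (not an OS implementation) of gang time-slicing: each time slice hands all processors to one gang, round-robin over gangs, so related threads always run together. It also shows the processor time wasted when a gang is smaller than the machine.

```c
#include <stdio.h>

#define NCPU 4

/* A gang is a set of related threads scheduled together. */
typedef struct {
    const char *name;
    int nthreads;        /* threads in the gang (<= NCPU here) */
} gang_t;

/* One time slice per gang: all of a gang's threads run at once,
   so no thread waits on a peer that is not currently scheduled. */
void gang_schedule(const gang_t *gangs, int ngangs, int nslices) {
    for (int s = 0; s < nslices; s++) {
        const gang_t *g = &gangs[s % ngangs];   /* round-robin over gangs */
        printf("slice %d: %s on CPUs 0..%d", s, g->name, g->nthreads - 1);
        if (g->nthreads < NCPU)
            printf("  (%d CPUs idle)", NCPU - g->nthreads);
        printf("\n");
    }
}

int main(void) {
    gang_t gangs[] = { { "gang A", 4 }, { "gang B", 2 } };
    gang_schedule(gangs, 2, 4);
    return 0;
}
```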

Dedicated Processor Assignment – Affinitization (1/2)

  • When application is scheduled, each of its threads is assigned a processor that remains dedicated/affinitized to that thread until the application runs to completion

  • No multiprogramming of processors, i.e., one processor serves one specific thread and no other

  • Some processors may be idle as threads may block

  • Eliminates context switches; with certain types of applications this can save enough time to compensate for the possible idle time penalty.

  • However, in a highly parallel environment with hundreds of processors, processor utilization is not the major issue; performance is. The total avoidance of switching results in substantial speedup. (A Linux affinity sketch follows below.)
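
A sketch of affinitization using the Linux/glibc pthread affinity calls (the _np suffix marks them as non-portable): thread i is pinned to CPU i before it first runs. Note that pinning dedicates the thread to the CPU, not the CPU to the thread; truly exclusive use of a processor needs additional OS configuration.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define NTHREADS 4

static void *worker(void *arg) {
    /* The thread only ever runs on the CPU it was pinned to. */
    printf("thread %ld on CPU %d\n", (long)arg, sched_getcpu());
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];

    for (long i = 0; i < NTHREADS; i++) {
        pthread_attr_t attr;
        cpu_set_t set;

        /* Dedicate CPU i to thread i before the thread first runs. */
        pthread_attr_init(&attr);
        CPU_ZERO(&set);
        CPU_SET((int)i, &set);
        pthread_attr_setaffinity_np(&attr, sizeof(set), &set);

        pthread_create(&tid[i], &attr, worker, (void *)i);
        pthread_attr_destroy(&attr);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}
```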

Dedicated Processor Assignment – Affinitization (2/2)

  • In Dedicated Assignment, when the number of threads exceeds the number of available processors, efficiency drops.

    • Due to thread preemption, context switching, suspending of other threads, and cache pollution

  • Notice that both gang scheduling and dedicated assignment are more concerned with allocation issues than scheduling issues.

  • The important question becomes "How many processors should a process be assigned?" rather than "How shall I choose the next process?"

Dynamic Scheduling

  • The number of threads within a process varies during execution

  • Scheduling work is shared between the OS and the application

  • When a job originates, it requests a certain number of processors. The OS grants some or all of the request, based on the number of processors currently available.

  • Then the application itself can decide which threads run when on which processors. This requires language support as would be provided with thread libraries.

  • When processors become free due to the termination of threads or processes, the OS can allocate them as needed to satisfy pending requests.

  • Simulations have shown that this approach is superior to gang scheduling or dedicated processor assignment. (A toy sketch of the request/grant protocol follows below.)
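
A toy sketch of the request/grant protocol described above, with a hypothetical global pool counter: the OS grants min(requested, available) processors, and the application then sizes its thread pool to whatever was granted.

```c
#include <stdio.h>

static int cpus_available = 8;   /* hypothetical pool size */

/* OS side: grant some or all of the requested processors. */
int os_grant(int requested) {
    int granted = requested < cpus_available ? requested : cpus_available;
    cpus_available -= granted;
    return granted;
}

/* OS side: processors freed by finished threads return to the pool,
   where they can satisfy pending requests. */
void os_release(int n) {
    cpus_available += n;
}

int main(void) {
    /* Application side: ask for 6 CPUs, then decide for itself which
       threads run on the processors it was actually given. */
    int granted = os_grant(6);
    printf("requested 6, granted %d; app creates %d worker threads\n",
           granted, granted);
    os_release(granted);
    return 0;
}
```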