Multiprocessor Scheduling

Module 3.1

For a good summary on multiprocessor and real-time scheduling, visit:


Classifications of Multiprocessor Systems
  • Loosely coupled multiprocessor, or clusters
    • Each processor has its own memory and I/O channels.
  • Functionally specialized processors
    • Such as an I/O processor
      • e.g., an Nvidia GPGPU
    • Controlled by a master processor
  • Tightly coupled multiprocessing
    • MCMP
    • Processors share main memory
    • Controlled by the operating system
    • More economical than clusters
Types of parallelism
  • Bit-level parallelism
  • Instruction-level parallelism
  • Data parallelism
  • Task parallelism our focus
Synchronization Granularity
  • Refers to the frequency of synchronization, and hence the granularity of parallelism, among processes in the system
  • Five classes exist
    • Independent (SI is not applicable)
    • Very coarse (2000 < SI < 1M)
    • Coarse (200 < SI < 2000)
    • Medium (20 < SI < 200)
    • Fine (SI < 20)

SI is the synchronization interval, measured in instructions.

Wikipedia on Fine-grained, coarse-grained, and embarrassing parallelism
  • Applications are often classified according to how often their subtasks need to synchronize or communicate with each other.
  • An application exhibits fine-grained parallelism if its subtasks must communicate many times per second;
  • it exhibits coarse-grained parallelism if they do not communicate many times per second,
  • and it is embarrassingly parallel if they rarely or never have to communicate. Embarrassingly parallel applications are considered the easiest to parallelize.
Independent Parallelism
  • Multiple unrelated processes
  • Each is a separate application or job, e.g. a spreadsheet or word processor
  • No synchronization between processes
  • When more than one processor is available
    • Average response time to users decreases
Coarse and Very Coarse-Grained Parallelism
  • Very coarse: distributed processing across network nodes to form a single computing environment
  • Coarse: Synchronization among processes at a very gross level
  • Good for concurrent processes running on a multiprogrammed uniprocessor
    • Can be supported on a multiprocessor with little change
Medium-Grained Parallelism
  • Parallel processing or multitasking within a single application
  • Single application is a collection of threads
  • Threads usually interact frequently, leading to medium-grained synchronization.
Fine-Grained Parallelism
  • Highly parallel applications
  • Synchronization every few instructions (on very short events).
  • Fills the gap between ILP (instruction-level parallelism) and medium-grained parallelism
  • Can be found in small inner loops
    • Use of programming models such as MPI and OpenMP
  • The OS should not intervene; synchronization at this level is usually handled in hardware
  • In practice, this is a very specialized and fragmented area
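
As a minimal sketch of what fine-grained, loop-level parallelism looks like in practice (assuming OpenMP; the matrix dimensions and the per-row reduction are invented for this example), parallelising only a small inner loop forces the threads to meet at an implicit barrier on every outer iteration, i.e. a synchronization interval of only a few thousand instructions:

    #include <stdio.h>
    #include <omp.h>

    #define ROWS 1000
    #define COLS 1000

    int main(void) {
        static double m[ROWS][COLS];
        double row_sum = 0.0;

        for (int i = 0; i < ROWS; i++) {   /* outer loop stays serial */
            row_sum = 0.0;
            /* Only the short inner loop is parallel; the reduction and the
               implicit barrier at its end synchronize all threads once per row. */
            #pragma omp parallel for reduction(+:row_sum)
            for (int j = 0; j < COLS; j++) {
                m[i][j] = i + j;
                row_sum += m[i][j];
            }
        }
        printf("last row sum: %f\n", row_sum);
        return 0;
    }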
Design Issues
  • Scheduling on a multiprocessor involves three interrelated design issues:
    • Assignment of processes to processors
    • Use of multiprogramming on individual processors
      • Makes sense for processes (coarse-grained)
      • May not be good for threads (medium-grained)
    • Actual dispatching of a process
      • What scheduling policy should we use: FCFS, RR, etc.? Sometimes a very sophisticated policy becomes counterproductive.
Assignment of Processes to Processors
  • Treat processors as a pooled resource and assign processes to processors on demand
  • Permanently assign each process to a processor
    • Dedicate short-term queue for each processor
    • Less overhead. Each does its own scheduling on its queue.
    • Disadvantage: Processor could be idle (has an empty queue) while another processor has a backlog.
Assignment of Processes to Processors
  • Global queue
    • Schedule to any available processor
    • During the lifetime of the process, process may run on different processors at different times.
    • In an SMP architecture, context switching can be done at low cost.
  • Master/slave architecture
    • Key kernel functions always run on a particular processor
    • Master is responsible for scheduling
    • Slave sends service request to the master
    • Synchronization is simplified
    • Disadvantages
      • Failure of master brings down whole system
      • Master can become a performance bottleneck
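
To make the master/slave idea above concrete, here is a hedged sketch (in C with pthreads, not taken from any real kernel) in which slave threads deposit service requests into a single slot and only the master thread, standing in for the processor that runs the key kernel functions, services them:

    #include <pthread.h>
    #include <stdio.h>

    #define NSLAVES   3
    #define NREQUESTS 9

    /* Toy master/slave arrangement: slaves place service requests in a single
       slot guarded by a mutex and condition variables; only the master handles
       them, since only the master runs scheduling/kernel functions here.      */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  slot_free = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t  slot_full = PTHREAD_COND_INITIALIZER;
    static int request = -1;          /* -1 means the slot is empty */
    static int handled = 0;

    static void *slave(void *arg) {
        long id = (long)arg;
        for (int i = 0; i < NREQUESTS / NSLAVES; i++) {
            pthread_mutex_lock(&lock);
            while (request != -1)              /* wait until the slot is free */
                pthread_cond_wait(&slot_free, &lock);
            request = (int)id * 100 + i;       /* send a service request */
            pthread_cond_signal(&slot_full);
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t s[NSLAVES];
        for (long i = 0; i < NSLAVES; i++)
            pthread_create(&s[i], NULL, slave, (void *)i);

        /* The main thread acts as the master: it alone services requests. */
        while (handled < NREQUESTS) {
            pthread_mutex_lock(&lock);
            while (request == -1)
                pthread_cond_wait(&slot_full, &lock);
            printf("master handles request %d\n", request);
            request = -1;
            handled++;
            pthread_cond_signal(&slot_free);
            pthread_mutex_unlock(&lock);
        }
        for (int i = 0; i < NSLAVES; i++)
            pthread_join(s[i], NULL);
        return 0;
    }

The single master is easy to reason about, but it is also the single point of failure and potential bottleneck noted above.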
Assignment of Processes to Processors
  • Peer architecture
    • Operating system can execute on any processor
    • Each processor does self-scheduling from a pool of available processes
    • Complicates the operating system
      • Make sure two processors do not choose the same process
      • Needs lots of synchronization
Process Scheduling in Today’s SMP
  • M/M/M/K Queueing system
    • Single queue for all processes
    • Multiple queues are used for priorities
    • All queues feed to the common pool of processors
  • The specific scheduling discipline is less important with more than one processor
    • A simple FCFS discipline with a static priority may suffice for a multiprocessor system.
    • Illustrated by the graph on p. 460
    • In conclusion, the specific scheduling discipline is much less important with SMP than on a uniprocessor (UP)
Thread Scheduling
  • A thread executes separately from the rest of its process
  • An application can be a set of threads that cooperate and execute concurrently in the same address space
  • Threads running on separate processors can yield a dramatic gain in performance
Multiprocessor Thread Scheduling – Four General Approaches (1/2)
  • Load sharing
    • Processes are not assigned to a particular processor
  • Gang scheduling
    • A set of related threads is scheduled to run on a set of processors at the same time
Multiprocessor Thread Scheduling – Four General Approaches (2/2)
  • Dedicated processor assignment
    • Each thread is assigned to a specific processor (one processor per thread)
    • When the program terminates, its processors are returned to the pool of available processors
  • Dynamic scheduling
    • The number of threads can be altered during the course of execution
Load Sharing
  • Load is distributed evenly across the processors
  • No centralized scheduler required
    • OS runs on every processor to select the next thread.
  • Use global queues for ready threads
    • Usually FCFS policy
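
A minimal sketch of this idea (in C with pthreads; the work items and queue layout are invented, and real schedulers are far more elaborate): worker threads stand in for processors and pull the next ready item FCFS from one global queue protected by a mutex. That same mutex is the bottleneck discussed on the next slide.

    #include <pthread.h>
    #include <stdio.h>

    #define NWORK 16
    #define NPROC 4

    /* A toy global ready "queue": an array of work items and a shared index.
       Each worker thread plays the role of a processor taking the next item FCFS. */
    static int next_item = 0;
    static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

    static void *processor(void *arg) {
        long id = (long)arg;
        for (;;) {
            pthread_mutex_lock(&queue_lock);   /* the central queue is a critical section */
            int item = next_item < NWORK ? next_item++ : -1;
            pthread_mutex_unlock(&queue_lock);
            if (item < 0) break;               /* queue empty: this "processor" idles */
            printf("processor %ld runs item %d\n", id, item);
        }
        return NULL;
    }

    int main(void) {
        pthread_t p[NPROC];
        for (long i = 0; i < NPROC; i++)
            pthread_create(&p[i], NULL, processor, (void *)i);
        for (long i = 0; i < NPROC; i++)
            pthread_join(p[i], NULL);
        return 0;
    }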
Disadvantages of Load Sharing
  • Central queue needs mutual exclusion
    • May be a bottleneck when more than one processor looks for work at the same time
    • A noticeable problem when there are many processors.
  • Preempted threads are unlikely to resume execution on the same processor
    • Cache use is less efficient
  • Since all threads share one global queue, the threads of a program are unlikely to gain access to the processors at the same time
    • Performance is compromised if coordination among threads is high.
Gang Scheduling
  • Simultaneous scheduling of threads that make up a single process
  • Useful for applications where performance severely degrades when any part of the application is not running
  • Threads often need to synchronize with each other
  • Closely-related threads execute in parallel
  • Synchronization blocking is reduced
  • Less context switching
  • Scheduling overhead is reduced, as a single synchronization action (e.g., one signal()) may affect many threads at once
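
The sketch below shows the kind of workload gang scheduling targets (a hedged illustration in C with POSIX barriers; the thread and iteration counts are arbitrary): every thread waits for the whole gang at a barrier each step, so if the scheduler runs only part of the gang, the running threads simply block there. Scheduling the whole gang together is what keeps this blocking small.

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define NITER    5

    static pthread_barrier_t barrier;

    /* Each thread does a small piece of work, then waits for the rest of the
       gang at a barrier. If only some of these threads are running, the
       running ones block here, which is why co-scheduling the gang reduces
       synchronization blocking. */
    static void *member(void *arg) {
        long id = (long)arg;
        for (int i = 0; i < NITER; i++) {
            printf("thread %ld: step %d\n", id, i);
            pthread_barrier_wait(&barrier);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        pthread_barrier_init(&barrier, NULL, NTHREADS);
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, member, (void *)i);
        for (long i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        pthread_barrier_destroy(&barrier);
        return 0;
    }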
Dedicated Processor Assignment – Affinitization (1/2)
  • When application is scheduled, each of its threads is assigned a processor that remains dedicated/affinitized to that thread until the application runs to completion
  • No multiprogramming of processors, i.e. one processor serves a specific thread and no other
  • Some processors may be idle as threads may block
  • Eliminates context switches; with certain types of applications this can save enough time to compensate for the possible idle time penalty.
  • However, in a highly parallel environment with hundreds of processors, processor utilization is not the major issue; performance is. The total avoidance of context switching results in a substantial speedup.
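
On Linux, affinitizing a thread to a single processor can be sketched with the non-portable pthread_setaffinity_np call (a minimal, Linux-specific sketch; error handling is omitted and the choice of CPU 0 is arbitrary):

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    /* Pin the calling thread to a single CPU, approximating dedicated
       processor assignment: this thread will not be migrated elsewhere. */
    static void pin_to_cpu(int cpu) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    static void *worker(void *arg) {
        int cpu = *(int *)arg;
        pin_to_cpu(cpu);
        printf("worker pinned to CPU %d (running on CPU %d)\n", cpu, sched_getcpu());
        return NULL;
    }

    int main(void) {
        pthread_t t;
        int cpu = 0;                  /* dedicate this worker to CPU 0 */
        pthread_create(&t, NULL, worker, &cpu);
        pthread_join(t, NULL);
        return 0;
    }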
Dedicated Processor Assignment – Affinitization (2/2)
  • In Dedicated Assignment, when the number of threads exceeds the number of available processors, efficiency drops.
    • Due to thread preemption, context switching, suspending of other threads, and cache pollution
  • Notice that both gang scheduling and dedicated assignment are more concerned with allocation issues than scheduling issues.
  • The important question becomes "How many processors should a process be assigned?" rather than "How shall I choose the next process?"
Dynamic Scheduling
  • The number of threads in a process can vary during execution
  • Scheduling work is shared between the OS and the application
  • When a job originates, it requests a certain number of processors. The OS grants some or all of the request, based on the number of processors currently available.
  • Then the application itself can decide which threads run when on which processors. This requires language support as would be provided with thread libraries.
  • When processors become free due to the termination of threads or processes, the OS can allocate them as needed to satisfy pending requests.
  • Simulations have shown that this approach is superior to gang scheduling or dedicated processor assignment.
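
The application side of this division of labour can be sketched as follows (a hedged example in C; the requested processor count is invented, and a real dynamic scheduler would renegotiate as jobs arrive and finish): the program asks how many processors are online and sizes its worker pool to whatever it is effectively granted.

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    #define REQUESTED 8   /* processors the application would like */

    static void *worker(void *arg) {
        long id = (long)arg;
        printf("worker %ld running\n", id);
        return NULL;
    }

    int main(void) {
        /* Ask the system how many processors are currently online; the
           application then adapts its number of threads to what is available,
           rather than assuming its full request can be granted. */
        long online = sysconf(_SC_NPROCESSORS_ONLN);
        long granted = online < REQUESTED ? online : REQUESTED;
        printf("requested %d processors, using %ld\n", REQUESTED, granted);

        pthread_t t[REQUESTED];
        for (long i = 0; i < granted; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (long i = 0; i < granted; i++)
            pthread_join(t[i], NULL);
        return 0;
    }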