Parallel programming
PowerPoint Slideshow about 'Parallel Programming' - mark-olsen



Introduction

  • Idea has been around since the 1960s

    • pseudo-parallel systems on multiprogrammable computers

  • True parallelism

    • Many processors connected to run in concert

      • Multiprocessor system

      • Distributed system

        • stand-alone systems connected

        • More complex with high-speed networks

Programming Languages

  • Used to express algorithms to solve problems presented by parallel processing systems

  • Used to write OSs that implement these solutions

  • Used to harness capabilities of multiple processors efficiently

  • Used to implement and express communication across networks

Two kinds of parallelism

  • Existing in underlying hardware

  • As expressed in programming language

    • May not result in actual parallel processing

    • Could be implemented with pseudo parallelism

    • Concurrent programming – expresses only potential for parallelism

Some Basics

  • Process

    • An instance of a program or program part that has been scheduled for independent execution

  • Heavy-weight process

    • a full-fledged independent entity with all the memory and other resources ordinarily allocated by the OS

  • Light-weight process or thread

    • shares resources with the program it came from
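As a hedged illustration in Java (the language used later in these slides; the class and field names here are my own): a thread is light-weight precisely because it shares memory with the program that created it, so a write made by the spawned thread is visible to the code that started it.

```java
public class SharedCounter {
    // Shared between the main program and the worker thread:
    // a light-weight process (thread) runs in the same address
    // space as the program it came from, so no copying occurs.
    static int counter = 0;

    public static int runWorker() throws InterruptedException {
        Thread worker = new Thread(() -> counter = 42);
        worker.start();
        worker.join();   // wait for the worker; also makes its write visible
        return counter;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runWorker());   // prints 42
    }
}
```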

Primary requirements for organization

  • Must be a way for processors to synchronize their activities

    • 1st processor inputs and sorts data

    • 2nd processor waits to perform computations on sorted data

  • Must be a way for processors to communicate data among themselves

    • 2nd processor needs the sorted data from the 1st
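A minimal Java sketch of both requirements (the names are mine, not from the slides): the 2nd worker must both wait for (synchronize with) and read (communicate) the 1st worker's sorted data.

```java
import java.util.Arrays;

public class SortThenCompute {
    static int[] data = { 3, 1, 2 };   // shared by both workers
    static int smallest;

    public static int pipeline() throws InterruptedException {
        // 1st "processor": inputs and sorts the data
        Thread sorter = new Thread(() -> Arrays.sort(data));
        // 2nd "processor": computes on the sorted data
        Thread computer = new Thread(() -> smallest = data[0]);

        sorter.start();
        sorter.join();      // synchronization: do not compute too early
        computer.start();   // communication: reads the shared array
        computer.join();
        return smallest;    // 1, the minimum, only because data was sorted
    }
}
```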


  • SIMD (single-instruction, multiple-data)

    • One processor is controller

    • All processors execute same instructions on respective registers or data sets

    • Multiprocessing

    • Synchronous (all processors operate at same speed)

    • Implicit solution to synchronization problem

  • MIMD (multiple-instruction, multiple-data)

    • All processors act independently

    • Multiprocessor or distributed processor systems

    • Asynchronous (synchronization critical problem)

OS Requirements for Parallelism

  • Means of creating and destroying processes

  • Means of managing the number of processors used by processes

  • Mechanism for ensuring mutual exclusion on shared-memory systems

  • Mechanism for creating and maintaining communication channels between processors on distributed-memory systems

Language requirements

  • Machine independence

  • Adhere to language design principles

  • Some languages use shared-memory model and provide facilities for mutual exclusion through a library

  • Some assume distributed-memory model and provide communication facilities

  • A few include both

Common mechanisms

  • Threads

  • Semaphores

  • Monitors

  • Message passing

2 common sample problems

  • Bounded buffer problem

    • similar to producer-consumer problem

  • Parallel matrix multiplication

    • N³ algorithm

    • Assign a process to compute each element; with each process on a separate processor, the multiplication takes only N steps
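The slide's scheme can be sketched in Java (my own example; one thread per element of C mirrors the one-process-per-element idea, though thread-per-element would be wasteful in practice):

```java
public class ParallelMatMul {
    // One worker per element of C: given N*N processors, every element's
    // row-by-column inner product (N multiply-adds) runs at the same
    // time, so the whole multiplication finishes in about N steps.
    public static int[][] multiply(int[][] a, int[][] b)
            throws InterruptedException {
        int n = a.length;
        int[][] c = new int[n][n];
        Thread[] workers = new Thread[n * n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                final int row = i, col = j;
                workers[row * n + col] = new Thread(() -> {
                    int sum = 0;
                    for (int k = 0; k < n; k++)
                        sum += a[row][k] * b[k][col];
                    c[row][col] = sum;   // each worker owns one element
                });
                workers[row * n + col].start();
            }
        for (Thread t : workers) t.join();   // wait for every element
        return c;
    }
}
```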

Without explicit language facilities

  • One approach is not to be explicit

    • Possible in some functional, logical, and OO languages

    • Certain inherent parallelism implicit

  • Language translators use optimization techniques to automatically invoke OS utilities that assign different processors to different parts of a program

  • Suboptimal

Another alternative without explicit language facilities

  • Translator offers compiler options that let the programmer explicitly indicate areas where parallelism is called for.

  • Most effective in nested loops

  • Example: Fortran

Parallel programming

  • m_set_procs – sets the number of processes

  • share – variables accessible by all processes

  • local – variables local to each process

  • The C$doacross compiler directive parallelizes the loop and synchronizes the processes: all processes wait for the entire loop to finish; one process continues after the loop

      integer a(100, 100), b(100, 100), c(100, 100)
      integer i, j, k, numprocs, err
      numprocs = 10
C     code to read in a and b goes here
      err = m_set_procs (numprocs)
C$doacross share (a, b, c), local (j, k)
      do 10 i = 1, 100
        do 10 j = 1, 100
          c(i, j) = 0
          do 10 k = 1, 100
            c(i, j) = c(i, j) + a(i, k) * b(k, j)
10    continue
      call m_kill_procs
C     code to write out c goes here


3rd way with explicit constructs

  • Provide a library of functions

  • This passes facilities provided by OS directly to programmer

  • (This is the same as providing it in language)

  • Example: C with library parallel.h

Parallel programming

  • m_set_procs sets the number of processes; m_fork then creates the 10 processes, all instances of multiply

#include <parallel.h>

#define SIZE 100
#define NUMPROCS 10

shared int a[SIZE][SIZE], b[SIZE][SIZE], c[SIZE][SIZE];

void multiply (void)
{ int i, j, k;
  /* each process takes every NUMPROCSth row, starting at its own id */
  for (i = m_get_myid(); i < SIZE; i += NUMPROCS)
    for (j = 0; j < SIZE; j++)
      for (k = 0; k < SIZE; k++)
        c[i][j] += a[i][k] * b[k][j];
}

int main (void)
{ /* code to read in a and b goes here */
  m_set_procs (NUMPROCS);
  m_fork (multiply);
  m_kill_procs ();
  /* code to write out c goes here */
  return 0;
}
4th and final alternative

  • Simply rely on OS

  • Example:

    • pipes in Unix OS

      ls | grep "java"

    • runs ls and grep in parallel

    • output of ls is piped to grep

Language with explicit mechanism

  • 2 basic ways to create new processes

    • SPMD (single program multiple data)

      • split the current process into 2 or more processes that execute copies of the same program

    • MPMD (multiple program multiple data)

      • a segment of code associated with each new process

      • typical case fork-join model, in which a process creates several child processes, each with its own code (a fork), and then waits for the children to complete their execution (a join)

      • the last example is similar, but m_kill_procs takes the place of the join
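A minimal fork-join sketch in Java (my own example, using plain threads rather than any particular library): the parent forks two children, each with its own segment of code, then joins.

```java
public class ForkJoin {
    public static String run() throws InterruptedException {
        StringBuffer log = new StringBuffer();  // StringBuffer is thread-safe
        // MPMD style: each child gets its own code to execute
        Thread child1 = new Thread(() -> log.append("A"));
        Thread child2 = new Thread(() -> log.append("B"));
        child1.start();   // the fork: children run alongside the parent
        child2.start();
        child1.join();    // the join: parent waits for both to complete
        child2.join();
        return log.toString();   // "AB" or "BA"; the order is not determined
    }
}
```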


  • Size of code assignable to separate processes

    • fine-grained: statement-level parallelism

    • medium-grained: procedure-level parallelism

    • large-grained: program-level parallelism

  • Can be an issue in program efficiency

    • fine-grained: overhead of creating and managing many processes

    • large-grained: may not exploit all opportunities for parallelism


  • Threads provide fine-grained or medium-grained parallelism without the overhead of full-blown process creation


  • Does parent suspend execution while child processes are executing, or does it continue to execute alongside them?

  • What memory, if any, does a parent share with its children or the children share among themselves?

Answers in Last example

  • the parent process suspended execution until the child processes finished

  • global variables shared by all processes had to be indicated explicitly (shared)

Process Termination

  • Simplest case

    • a process executes its code to completion then ceases to exist

  • Complex case

    • process may need to continue executing until a certain condition is met and then terminate

Statement-Level Parallelism (Ada)






Statement-Level Parallelism (Fortran95)

FORALL (I = 1:100, J = 1:100)
  C(I,J) = 0
  DO 10 K = 1, 100
    C(I,J) = C(I,J) + A(I,K) * B(K,J)
10 CONTINUE
END FORALL

Procedure-Level Parallelism (Ada)

x = newprocess(p);


  • where p is a declared procedure and x is a process designator

  • similar to tasks in Ada

Program-Level Parallelism (Unix)

  • fork creates a process that is an exact copy of the calling process

    if (fork () == 0)
    { /* ..child executes this part */ }
    else
    { /* ..parent executes this part */ }

  • a return value of 0 indicates the process is the child

Java threads

  • built into Java

  • Thread class part of java.lang package

  • reserved word synchronized

    • establishes mutual exclusion

  • create an instance of a Thread object

  • define its run method, which executes when the thread starts

Java threads

  • 2 ways (I'll show you the second, more versatile way)

  • Define a class that implements Runnable interface (define run method)

  • Then pass an object of this class to the Thread constructor

  • Note: Every Java program is already executing inside a thread whose run method is main.
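For completeness, a sketch of the first way (my own example, not from the slides): subclass Thread and override run. It is less versatile because Java allows only single inheritance, so the subclass can no longer extend anything else.

```java
public class Greeter extends Thread {
    static volatile String message = "";

    // First way: override run directly in a Thread subclass
    @Override
    public void run() { message = "hello from the thread"; }

    public static String demo() throws InterruptedException {
        Greeter t = new Greeter();
        t.start();   // start() schedules the thread, which calls run()
        t.join();
        return message;
    }
}
```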

Java Thread Example

class MyRunner implements Runnable
{ public void run ()
  { … }
}

MyRunner m = new MyRunner ();

Thread t = new Thread (m);

t.start (); // t will now execute the run method

Destroying threads

  • let each thread run to completion

  • wait for other threads to finish

    t.start ();

    //do some other work

    t.join (); // wait for t to finish

  • interrupt it

    t.start ();

    //do some other work

    t.interrupt (); // interrupt t so it stops waiting

    t.join () //wait for t to finish
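A runnable version of the interrupt idiom (a sketch with my own names; the 60-second sleep stands in for a long wait):

```java
public class InterruptDemo {
    public static boolean demo() throws InterruptedException {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(60_000);  // a long wait we do not want to sit through
            } catch (InterruptedException e) {
                // interrupt() lands here; t can clean up and terminate
            }
        });
        t.start();
        t.interrupt();   // tell t to stop waiting
        t.join();        // t exits promptly, so the join returns quickly
        return !t.isAlive();
    }
}
```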

Mutual exclusion

class Queue
{ …

  synchronized public Object dequeue ()
  { if (empty()) throw …
    …
  }

  synchronized public Object enqueue (Object obj)
  { …
  }
}

Mutual exclusion

class Remover implements Runnable
{ public Remover (Queue q) { … }

  public void run () { … q.dequeue() … }
}

class Insert implements Runnable
{ public Insert (Queue q) { … }

  public void run () { … q.enqueue (…) … }
}

Mutual exclusion

Queue myqueue = new Queue (..);

Remover r = new Remover (myqueue);

Insert i = new Insert (myqueue);

Thread t1 = new Thread (r);

Thread t2 = new Thread (i);

t1.start ();

t2.start ();

Manually stalling a thread and then reawakening it

class Queue
{ …

  synchronized public Object dequeue ()
  { try
    { while (empty()) wait();
      …
    }
    catch (InterruptedException e) //reset interrupt
    { … }
  }

synchronized public Object enqueue (Object obj)

{ …
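Completing the pattern as a runnable sketch (an assumption on my part: the truncated enqueue calls notifyAll(), the standard counterpart of wait(), to do the reawakening):

```java
import java.util.LinkedList;

public class WaitNotifyQueue {
    private final LinkedList<Object> items = new LinkedList<>();

    synchronized public Object dequeue() throws InterruptedException {
        while (items.isEmpty())
            wait();              // stall: release the lock and sleep
        return items.removeFirst();
    }

    synchronized public void enqueue(Object obj) {
        items.addLast(obj);
        notifyAll();             // reawaken any threads stalled in wait()
    }

    public static Object demo() throws InterruptedException {
        WaitNotifyQueue q = new WaitNotifyQueue();
        final Object[] got = new Object[1];
        Thread consumer = new Thread(() -> {
            try { got[0] = q.dequeue(); }   // stalls: queue starts empty
            catch (InterruptedException e) { }
        });
        consumer.start();
        q.enqueue("item");       // reawakens the consumer
        consumer.join();
        return got[0];
    }
}
```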