Parallel Programming




Introduction

  • Idea has been around since the 1960s
    • pseudo-parallel systems on multiprogrammable computers

  • True parallelism

    • Many processors connected to run in concert

      • Multiprocessor system

      • Distributed system

        • stand-alone systems connected by a network

        • More complex with high-speed networks


Programming Languages

  • Used to express algorithms to solve problems presented by parallel processing systems

  • Used to write OSs that implement these solutions

  • Used to harness capabilities of multiple processors efficiently

  • Used to implement and express communication across networks


Two kinds of parallelism

  • Parallelism existing in the underlying hardware
  • Parallelism as expressed in a programming language

    • May not result in actual parallel processing

    • Could be implemented with pseudo parallelism

    • Concurrent programming – expresses only potential for parallelism


Some Basics

  • Process

    • An instance of a program or program part that has been scheduled for independent execution

  • Heavy-weight process

    • a full-fledged independent entity, with all the memory and other resources ordinarily allocated by the OS

  • Light-weight process or thread

    • shares resources with the program it came from


Primary requirements for organization

  • Must be a way for processors to synchronize their activities (see the sketch below)
    • 1st processor inputs and sorts data
    • 2nd processor waits to perform computations on the sorted data
  • Must be a way for processors to communicate data among themselves
    • 2nd processor needs the sorted data
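For illustration, a minimal Java sketch of the synchronization requirement (the data and class name are hypothetical): a sorting thread stands in for the 1st processor, and join() makes the main thread wait for the sorted data before computing.

import java.util.Arrays;

public class SortThenCompute {
    public static void main(String[] args) throws InterruptedException {
        int[] data = {5, 3, 8, 1, 9, 2, 7};

        // "1st processor": inputs and sorts the data
        Thread sorter = new Thread(() -> Arrays.sort(data));
        sorter.start();

        // "2nd processor": must wait for the sorted data
        sorter.join();  // the synchronization point
        System.out.println("median = " + data[data.length / 2]);
    }
}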


Architectures

  • SIMD (single-instruction, multiple-data)
    • One processor acts as the controller
    • All processors execute the same instructions on their respective registers or data sets
    • Multiprocessing
    • Synchronous (all processors operate at the same speed)
    • Implicit solution to the synchronization problem
  • MIMD (multiple-instruction, multiple-data)
    • All processors act independently
    • Multiprocessor or distributed-processor systems
    • Asynchronous (synchronization is a critical problem)


OS Requirements for Parallelism

  • Means of creating and destroying processes

  • Means of managing the number of processors used by processes

  • Mechanism for ensuring mutual exclusion on shared-memory systems

  • Mechanism for creating and maintaining communication channels between processors on distributed-memory systems


Language Requirements

  • Machine independence

  • Adhere to language design principles

  • Some languages use the shared-memory model and provide facilities for mutual exclusion through a library
  • Some assume the distributed-memory model and provide communication facilities

  • A few include both


Common mechanisms

  • Threads

  • Semaphores

  • Monitors

  • Message passing
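Threads and monitors are developed in detail later in this presentation. For the semaphore mechanism, here is a minimal Java sketch using java.util.concurrent.Semaphore (the two-permit resource is an illustrative assumption):

import java.util.concurrent.Semaphore;

public class SemaphoreDemo {
    // counting semaphore: at most 2 threads may hold a permit at once
    static final Semaphore permits = new Semaphore(2);

    public static void main(String[] args) {
        for (int n = 0; n < 5; n++) {
            new Thread(() -> {
                try {
                    permits.acquire();        // P operation: wait for a permit
                    try {
                        System.out.println(Thread.currentThread().getName() + " in critical region");
                    } finally {
                        permits.release();    // V operation: return the permit
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }).start();
        }
    }
}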


Two common sample problems

  • Bounded buffer problem
    • similar to the producer-consumer problem
  • Parallel matrix multiplication
    • an N³ algorithm when done sequentially
    • assign a process to compute each element; with each process on a separate processor, the computation takes only N steps (see the sketch below)
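A minimal Java sketch of the matrix-multiplication problem (a medium-grained variant: each of 10 threads computes a block of rows, rather than one processor per element; all names are illustrative):

public class ParallelMatrix {
    static final int SIZE = 100, NUMTHREADS = 10;
    static final int[][] a = new int[SIZE][SIZE],
                         b = new int[SIZE][SIZE],
                         c = new int[SIZE][SIZE];

    public static void main(String[] args) throws InterruptedException {
        // code to fill a and b would go here
        Thread[] workers = new Thread[NUMTHREADS];
        for (int t = 0; t < NUMTHREADS; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                // each thread computes rows id, id + NUMTHREADS, ...
                for (int i = id; i < SIZE; i += NUMTHREADS)
                    for (int j = 0; j < SIZE; j++)
                        for (int k = 0; k < SIZE; k++)
                            c[i][j] += a[i][k] * b[k][j];
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();  // wait for all rows to be done
        // code to use c would go here
    }
}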


Without explicit language facilities

  • One approach is not to be explicit

    • Possible in some functional, logical, and OO languages

    • such languages have a certain amount of inherent, implicit parallelism

  • Language translators use optimization techniques to automatically make use of OS utilities that assign different processors to different parts of the program

  • Suboptimal


Another alternative without explicit language facilities

  • The translator offers compiler options that let the programmer explicitly indicate areas where parallelism is called for

  • Most effective in nested loops

  • Example: Fortran


Fortran example (compiler directives)

  • m_set_procs – sets the number of processes
  • share – variables accessible by all processes
  • local – variables local to each process
  • The C$doacross compiler directive synchronizes the processes: all processes wait for the entire loop to finish, and one process continues after the loop

      integer a(100,100), b(100,100), c(100,100)
      integer i, j, k, numprocs, err
      numprocs = 10
C     code to read in a and b goes here
      err = m_set_procs(numprocs)
C$doacross share(a, b, c), local(j, k)
      do 10 i = 1, 100
        do 10 j = 1, 100
          c(i,j) = 0
          do 10 k = 1, 100
            c(i,j) = c(i,j) + a(i,k) * b(k,j)
10    continue
      call m_kill_procs
C     code to write out c goes here
      end


3rd way with explicit constructs

  • Provide a library of functions
  • This passes facilities provided by the OS directly to the programmer
  • (In effect, this is the same as providing them in the language)
  • Example: C with the library parallel.h


C example (parallel.h)

  • m_set_procs sets the number of processes; m_fork then creates the 10 processes, all instances of multiply

#include <parallel.h>

#define SIZE 100
#define NUMPROCS 10

shared int a[SIZE][SIZE], b[SIZE][SIZE], c[SIZE][SIZE];

void multiply (void)
{ int i, j, k;
  /* each process takes rows i = myid, myid + NUMPROCS, ... */
  for (i = m_get_myid(); i < SIZE; i += NUMPROCS)
    for (j = 0; j < SIZE; j++)
      for (k = 0; k < SIZE; k++)
        c[i][j] += a[i][k] * b[k][j];
}

int main (void)
{ /* code to read in a and b goes here */
  m_set_procs (NUMPROCS);
  m_fork (multiply);
  m_kill_procs ();
  /* code to write out c goes here */
  return 0;
}


4th and final alternative

  • Simply rely on the OS
  • Example: pipes in Unix

      ls | grep "java"

    • runs ls and grep in parallel
    • output of ls is piped to grep
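For comparison, the same pipeline can be built from a Java program with ProcessBuilder.startPipeline (Java 9 and later; assumes a Unix-like system with ls and grep on the PATH):

import java.io.IOException;
import java.util.List;

public class PipeDemo {
    public static void main(String[] args) throws IOException, InterruptedException {
        // equivalent of: ls | grep "java"
        List<Process> pipeline = ProcessBuilder.startPipeline(List.of(
                new ProcessBuilder("ls"),
                new ProcessBuilder("grep", "java")
                        .redirectOutput(ProcessBuilder.Redirect.INHERIT)));
        pipeline.get(pipeline.size() - 1).waitFor();  // wait for grep to finish
    }
}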


Language with explicit mechanism

  • 2 basic ways to create new processes

    • SPMD (single program multiple data)

      • split the current process into 2 or more processes that execute copies of the same program

    • MPMD (multiple program multiple data)

      • a segment of code is associated with each new process
      • the typical case is the fork-join model, in which a process creates several child processes, each with its own code (a fork), and then waits for the children to complete their execution (a join); a minimal sketch follows
      • the last example was similar, but m_kill_procs took the place of the join
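A minimal Java sketch of the fork-join model (the printed messages are placeholders for real work): the parent forks two children, each with its own code, and then joins them.

public class ForkJoinSketch {
    public static void main(String[] args) throws InterruptedException {
        // the fork: each child has its own code (MPMD style)
        Thread child1 = new Thread(() -> System.out.println("child 1 working"));
        Thread child2 = new Thread(() -> System.out.println("child 2 working"));
        child1.start();
        child2.start();

        // the join: the parent waits for the children to complete
        child1.join();
        child2.join();
        System.out.println("parent continues");
    }
}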


Granularity

  • Size of code assignable to separate processes

    • fine-grained: statement-level parallelism

    • medium-grained: procedure-level parallelism

    • large-grained: program-level parallelism

  • Can be an issue in program efficiency

    • fine-grained: process creation and communication overhead may dominate

    • large-grained: may not exploit all opportunities for parallelism


Thread

  • provides fine-grained or medium-grained parallelism without the overhead of full-blown process creation


Issues

  • Does parent suspend execution while child processes are executing, or does it continue to execute alongside them?

  • What memory, if any, does a parent share with its children or the children share among themselves?


Answers in the Last Example

  • the parent process suspends execution while the child processes execute
  • global variables shared by all processes are indicated explicitly (the shared declaration)


Process Termination

  • Simplest case

    • a process executes its code to completion then ceases to exist

  • Complex case

    • a process may need to continue executing until a certain condition is met and then terminate (see the sketch below)
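The complex case might look like the following Java sketch (the done flag and method names are illustrative): the worker keeps running until the condition is signaled and only then terminates.

public class ConditionalWorker implements Runnable {
    // set by another thread; volatile guarantees the worker sees the change
    private volatile boolean done = false;

    public void signalTermination() { done = true; }

    public void run() {
        while (!done) {
            // perform one unit of work per iteration
        }
        // condition met: the thread terminates by returning
    }
}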


Statement-Level Parallelism (parbegin/parend)

parbegin
  S1;
  S2;
  ...
  Sn;
parend;
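Java has no parbegin/parend construct, but the same effect can be sketched by starting one thread per statement and joining them all (the helper name parbegin is hypothetical):

public class ParBlock {
    // run the given statements in parallel, then wait for all of them
    // (the join loop plays the role of parend)
    static void parbegin(Runnable... statements) throws InterruptedException {
        Thread[] threads = new Thread[statements.length];
        for (int i = 0; i < statements.length; i++) {
            threads[i] = new Thread(statements[i]);
            threads[i].start();
        }
        for (Thread t : threads) t.join();
    }

    public static void main(String[] args) throws InterruptedException {
        parbegin(
                () -> System.out.println("S1"),
                () -> System.out.println("S2"),
                () -> System.out.println("Sn"));
    }
}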


Statement-Level Parallelism (Fortran 95)

FORALL (I = 1:100, J = 1:100)
  C(I,J) = 0
  DO 10 K = 1, 100
    C(I,J) = C(I,J) + A(I,K) * B(K,J)
10 CONTINUE
END FORALL


Procedure-Level Parallelism

x = newprocess(p);
killprocess(x);

  • where p is a declared procedure and x is a process designator
  • similar to tasks in Ada


Program-Level Parallelism (Unix)

  • fork creates a process that is an exact copy of the calling process

    if (fork() == 0)
    { /* ..child executes this part */ }
    else
    { /* ..parent executes this part */ }

  • a return value of 0 indicates that the process is the child


Java Threads

  • built into Java
  • the Thread class is part of the java.lang package
  • reserved word synchronized
    • establishes mutual exclusion
  • create an instance of the Thread class
  • define its run method, which will execute when the thread starts


Java Threads (continued)

  • 2 ways (the second, more versatile way is shown on the next slide)
  • Define a class that implements the Runnable interface (that is, define its run method)
  • Then pass an object of this class to the Thread constructor
  • Note: every Java program is already executing inside a thread whose run method is main
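For completeness, a sketch of the first way: subclass Thread directly and override run. It is less versatile because a Java class can extend only one superclass, so a Thread subclass cannot extend anything else.

class MyThread extends Thread {
    // first way: override run directly in a Thread subclass
    public void run() {
        System.out.println("running in " + getName());
    }
}

public class FirstWayDemo {
    public static void main(String[] args) {
        new MyThread().start();  // start() runs run() in a new thread
    }
}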


Java Thread Example

class MyRunner implements Runnable
{ public void run()
  { ... }
}

MyRunner m = new MyRunner();
Thread t = new Thread(m);
t.start();  // t will now execute the run method


Destroying threads

  • let each thread run to completion
  • wait for other threads to finish

    t.start();
    // do some other work
    t.join();       // wait for t to finish

  • interrupt it

    t.start();
    // do some other work
    t.interrupt();  // tell t we are waiting
    t.join();       // wait for t to finish
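A runnable version of the interrupt pattern (the empty loop body stands in for real work): the thread polls its interrupted status and exits when interrupt() is called.

public class InterruptSketch {
    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(() -> {
            // run until told to stop
            while (!Thread.currentThread().isInterrupted()) {
                // do a unit of work
            }
            System.out.println("t exiting after interrupt");
        });
        t.start();
        // do some other work
        t.interrupt();  // tell t we are waiting
        t.join();       // wait for t to finish
    }
}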


Mutual exclusion

class Queue
{ ...
  synchronized public Object dequeue()
  { if (empty()) throw ...
  }
  synchronized public Object enqueue(Object obj)
  { ...
  }
}


Mutual exclusion (continued)

class Remover implements Runnable
{ public Remover(Queue q) { ... }
  public void run() { ... q.dequeue() ... }
}

class Inserter implements Runnable
{ public Inserter(Queue q) { ... }
  public void run() { ... q.enqueue(...) ... }
}


Mutual exclusion (continued)

Queue myqueue = new Queue(...);
Remover r = new Remover(myqueue);
Inserter i = new Inserter(myqueue);
Thread t1 = new Thread(r);
Thread t2 = new Thread(i);
t1.start();
t2.start();


Manually stalling a thread and then reawakening it

class Queue
{ ...
  synchronized public Object dequeue()
  { try
    { while (empty()) wait();
    }
    catch (InterruptedException e)  // reset interrupt
    { ... }
    ...
  }
  synchronized public Object enqueue(Object obj)
  { ...
    notifyAll();
  }
}
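Putting the pieces together, a complete, compilable version of the stalling queue. The linked-list representation and the class name SyncQueue (chosen to avoid clashing with java.util.Queue) are assumptions beyond the slides.

import java.util.LinkedList;

class SyncQueue {
    private final LinkedList<Object> items = new LinkedList<>();

    private boolean empty() { return items.isEmpty(); }

    synchronized public Object dequeue() {
        try {
            while (empty()) wait();              // stall until an item arrives
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // reset interrupt
            return null;
        }
        return items.removeFirst();
    }

    synchronized public void enqueue(Object obj) {
        items.addLast(obj);
        notifyAll();                             // reawaken stalled consumers
    }
}

public class QueueDemo {
    public static void main(String[] args) throws InterruptedException {
        SyncQueue q = new SyncQueue();
        Thread remover = new Thread(() -> System.out.println(q.dequeue()));
        Thread inserter = new Thread(() -> q.enqueue("hello"));
        remover.start();  // may stall inside dequeue until inserter runs
        inserter.start();
        remover.join();
        inserter.join();
    }
}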

