
Parallel Programming



Presentation Transcript


  1. Parallel Programming Aaron Bloomfield CS 415 Fall 2005

  2. Why Parallel Programming?
  • Predict weather
  • Predict spread of SARS
  • Predict path of hurricanes
  • Predict oil slick propagation
  • Model growth of bio-plankton/fisheries
  • Structural simulations
  • Predict path of forest fires
  • Model formation of galaxies
  • Simulate nuclear explosions

  3. Code that can be parallelized
  do i = 1 to max
      a[i] = b[i] + c[i] * d[i]
  end do
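  For illustration, a minimal C sketch of the same loop parallelized with an OpenMP directive (OpenMP is introduced on slide 8); the function name and array types here are assumptions, not part of the slides:

      /* Illustrative C/OpenMP version of the loop above. Each iteration
         touches only element i, so iterations are independent and can be
         run in parallel. Compile with OpenMP enabled (e.g. -fopenmp). */
      void add_mul(int max, double a[], const double b[],
                   const double c[], const double d[])
      {
          #pragma omp parallel for      /* split iterations across threads */
          for (int i = 0; i < max; i++)
              a[i] = b[i] + c[i] * d[i];
      }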

  4. Parallel Computers
  • Programming model types
      • Shared memory
      • Message passing

  5. Distributed Memory Architecture
  [Diagram: processors P0 ... Pn, each with its own local memory, connected by a communication interconnect]
  • Each processor has direct access only to its local memory
  • Processors are connected via high-speed interconnect
  • Data structures must be distributed
  • Data exchange is done via explicit processor-to-processor communication: send/receive messages
  • Programming models
      • Widely used standard: MPI
      • Others: PVM, Express, P4, Chameleon, PARMACS, ...
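  A minimal C sketch of explicit send/receive message passing with standard MPI calls; the value sent and the tag are arbitrary, and it assumes the program is run with at least two processes:

      /* Minimal MPI point-to-point example: rank 0 sends one integer to
         rank 1. Build with an MPI wrapper (e.g. mpicc), run with mpirun. */
      #include <mpi.h>
      #include <stdio.h>

      int main(int argc, char **argv)
      {
          int rank, value = 0;
          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);

          if (rank == 0) {
              value = 42;                                 /* data to ship */
              MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
          } else if (rank == 1) {
              MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                       MPI_STATUS_IGNORE);
              printf("rank 1 received %d\n", value);
          }
          MPI_Finalize();
          return 0;
      }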

  6. Message Passing Interface
  MPI provides:
  • Point-to-point communication
  • Collective operations
      • Barrier synchronization
      • Gather/scatter operations
      • Broadcast, reductions
  • Different communication modes
      • Synchronous/asynchronous
      • Blocking/non-blocking
      • Buffered/unbuffered
  • Predefined and derived datatypes
  • Virtual topologies
  • Parallel I/O (MPI 2)
  • C/C++ and Fortran bindings
  • http://www.mpi-forum.org
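  A small C sketch of two of the collective operations listed above, a broadcast followed by a sum reduction; the quantities being broadcast and summed are arbitrary placeholders:

      /* Collectives: every rank receives "n" via broadcast, computes a
         partial result, and rank 0 collects the sum via MPI_Reduce. */
      #include <mpi.h>
      #include <stdio.h>

      int main(int argc, char **argv)
      {
          int rank, size, n = 0;
          double local, total = 0.0;

          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          if (rank == 0) n = 1000;                       /* root chooses n */
          MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* everyone gets n */

          local = (double)n / size;                      /* fake partial work */
          MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0,
                     MPI_COMM_WORLD);

          if (rank == 0) printf("total = %f\n", total);
          MPI_Finalize();
          return 0;
      }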

  7. Shared Memory Architecture
  [Diagram: processors P0 ... Pn, each with its own cache, connected by a shared bus to a global shared memory]
  • Processors have direct access to global memory and I/O through bus or fast switching network
  • Cache coherency protocol guarantees consistency of memory and I/O accesses
  • Each processor also has its own memory (cache)
  • Data structures are shared in global address space
  • Concurrent access to shared memory must be coordinated
  • Programming models
      • Multithreading (thread libraries)
      • OpenMP
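  To illustrate coordinating concurrent access to shared memory, a minimal C sketch using a thread library (POSIX threads) with a mutex around a shared accumulator; the thread count and the values added are arbitrary:

      /* Shared-memory coordination: several threads add into one shared sum.
         Without the mutex the updates could interleave and lose increments.
         Compile with -pthread. */
      #include <pthread.h>
      #include <stdint.h>
      #include <stdio.h>

      #define NTHREADS 4

      static long shared_sum = 0;
      static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

      static void *worker(void *arg)
      {
          long my_part = (long)(intptr_t)arg;  /* this thread's contribution */
          pthread_mutex_lock(&lock);           /* coordinate the shared update */
          shared_sum += my_part;
          pthread_mutex_unlock(&lock);
          return NULL;
      }

      int main(void)
      {
          pthread_t t[NTHREADS];
          for (long i = 0; i < NTHREADS; i++)
              pthread_create(&t[i], NULL, worker, (void *)(intptr_t)(i + 1));
          for (int i = 0; i < NTHREADS; i++)
              pthread_join(t[i], NULL);
          printf("sum = %ld\n", shared_sum);   /* 1+2+3+4 = 10 */
          return 0;
      }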

  8. OpenMP
  • OpenMP: portable shared memory parallelism
  • Higher-level API for writing portable multithreaded applications
  • Provides a set of compiler directives and library routines for parallel application programmers
  • API bindings for Fortran, C, and C++
  • http://www.OpenMP.org
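  A minimal C sketch showing both parts of the API: a compiler directive that opens a parallel region, and two runtime library routines that query the thread team; the printed message is only illustrative:

      /* OpenMP: a directive creates a team of threads; library routines
         report which thread is running and how many threads exist.
         Compile with OpenMP enabled (e.g. -fopenmp). */
      #include <omp.h>
      #include <stdio.h>

      int main(void)
      {
          #pragma omp parallel           /* each thread runs this block */
          {
              int id = omp_get_thread_num();        /* library routine */
              int nthreads = omp_get_num_threads(); /* library routine */
              printf("thread %d of %d\n", id, nthreads);
          }
          return 0;
      }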

  9. Approaches
  • Parallel Algorithms
  • Parallel Languages
  • Message passing (low-level)
  • Parallelizing compilers

  10. Parallel Languages
  • CSP - Hoare’s notation for parallelism as a network of sequential processes exchanging messages.
  • Occam - A real language based on CSP, used for the transputer in Europe.

  11. Fortran for parallelism
  • Fortran 90 - Array language. Triplet notation for array sections. Operations and intrinsic functions possible on array sections.
  • High Performance Fortran (HPF) - Similar to Fortran 90, but includes data layout specifications to help the compiler generate efficient code.

  12. More parallel languages
  • ZPL - Array-based language at UW. Compiles into C code (highly portable).
  • C* - C extended for parallelism.

  13. Object-Oriented
  • Concurrent Smalltalk
  • Threads in Java, Ada, and thread libraries for use in C/C++
      • This approach uses a library of parallel routines

  14. Functional
  • NESL, Multilisp
  • Id & Sisal (more dataflow)

  15. Parallelizing Compilers
  Automatically transform a sequential program into a parallel program.
  • Identify loops whose iterations can be executed in parallel.
  • Often done in stages.
  Q: Which loops can be run in parallel?
  Q: How should we distribute the work/data?

  16. Data Dependences
  • Flow dependence - RAW (Read-After-Write). A "true" dependence: read a value after it has been written into a variable.
  • Anti-dependence - WAR (Write-After-Read). Write a new value into a variable after the old value has been read.
  • Output dependence - WAW (Write-After-Write). Write a new value into a variable and then later on write another value into the same variable.

  17. Example
  1: A = 90;
  2: B = A;
  3: C = A + D;
  4: A = 5;
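  Applying the slide-16 definitions to this example, a C rendering with the dependences on A noted as comments (the declarations and return are added only so the sketch compiles as a self-contained function):

      /* The four statements above written as C; D gets a dummy value. */
      int example(void)
      {
          int A, B, C, D = 1;
          A = 90;       /* S1: writes A                                     */
          B = A;        /* S2: reads A  -> flow (RAW) dependence S1 -> S2   */
          C = A + D;    /* S3: reads A  -> flow (RAW) dependence S1 -> S3   */
          A = 5;        /* S4: writes A -> anti (WAR) dependences S2 -> S4  */
                        /*     and S3 -> S4; output (WAW) dependence S1->S4 */
          return A + B + C;
      }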

  18. Dependencies
  A parallelizing compiler must identify loops that do not have dependences BETWEEN ITERATIONS of the loop.
  Example:
  do I = 1, 1000
      A(I) = B(I) + C(I)
      D(I) = A(I)
  end do

  19. Example
  Fork one thread for each processor. Each thread executes the loop:
  do I = my_lo, my_hi
      A(I) = B(I) + C(I)
      D(I) = A(I)
  end do
  Wait for all threads to finish before proceeding.
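  A sketch of this fork/join scheme in C with POSIX threads: the 1000 iterations are split into contiguous blocks, one thread runs each block, and the main thread joins them all before proceeding (the 0-based indexing and thread count are illustrative choices, not from the slides):

      /* Fork/join parallelization of the slide-18 loop: each thread handles
         a contiguous block [lo, hi) of the iteration space. */
      #include <pthread.h>

      #define N        1000
      #define NTHREADS 4

      static double A[N], B[N], C[N], D[N];

      struct range { int lo, hi; };            /* one thread's block */

      static void *body(void *arg)
      {
          struct range *r = arg;
          for (int i = r->lo; i < r->hi; i++) {
              A[i] = B[i] + C[i];
              D[i] = A[i];
          }
          return NULL;
      }

      void parallel_loop(void)
      {
          pthread_t t[NTHREADS];
          struct range r[NTHREADS];
          int chunk = N / NTHREADS;            /* base block size */

          for (int p = 0; p < NTHREADS; p++) {
              r[p].lo = p * chunk;
              r[p].hi = (p == NTHREADS - 1) ? N : (p + 1) * chunk;
              pthread_create(&t[p], NULL, body, &r[p]);
          }
          for (int p = 0; p < NTHREADS; p++)   /* wait for all threads */
              pthread_join(t[p], NULL);
      }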

  20. Another Example
  do I = 1, 1000
      A(I) = B(I) + C(I)
      D(I) = A(I+1)
  end do
  Here iteration I reads A(I+1), which iteration I+1 overwrites, so there is a loop-carried (anti) dependence and the iterations cannot safely be run in parallel as written.

  21. Yet Another Example
  do I = 1, 1000
      A( X(I) ) = B(I) + C(I)
      D(I) = A( X(I) )
  end do
  Whether the iterations are independent depends on the values in X: if X(I) repeats an index, two iterations touch the same element of A, so the compiler cannot prove this loop parallel at compile time.

  22. Parallel Compilers
  • Two concerns:
      • Parallelizing code
          • Compiler will move code around to uncover parallel operations
      • Data locality
          • If a parallel operation has to get data from another processor’s memory, that’s bad

  23. Distributed computing
  • Take a big task that has natural parallelism
  • Split it up among many different computers across a network
  • Examples: SETI@Home, prime number searches, Google Compute, etc.
  • Distributed computing is a form of parallel computing
