Parallel programming
Download
1 / 22

Parallel Programming - PowerPoint PPT Presentation


  • 98 Views
  • Uploaded on

Parallel Programming. Yang Xianchun Department of Computer Science and Technology Nanjing University. Introduction. Agenda. About the Course Evolution of Computing Technology Grand Challenge Problem Examples Motivations for Parallelism Parallel Computation A Road Map of Topics

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Parallel Programming' - sanjiv


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Parallel programming

Parallel Programming

Yang Xianchun

Department of Computer Science and Technology

Nanjing University

Introduction


Agenda
Agenda

  • About the Course

  • Evolution of Computing Technology

  • Grand Challenge Problem Examples

  • Motivations for Parallelism

  • Parallel Computation

  • A Road Map of Topics

  • Parallel Computing Terminology

  • An Illustrative Example

    • Control-Parallel Approach

    • Data-Parallel Approach

    • Performance Analysis


About the course
About the course

  • Description

    • An introduction course to parallel programming concepts and techniques

    • No prior parallel computation background is necessary

  • Prerequisites

    • Working knowledge of computer systems

    • Adequate programming skill in C or C++

  • Textbook

    • Harry Jordan and Gita Alaghband, “Fundamentals of Parallel Processing,” Prentice Hall, 2003.


About the course cont
About the course(cont.)

  • Topics

    • Introduction

    • Parallel architectures

    • Data parallel computing

      • Fortran 90 / 95

    • Shared memory computing

      • Pthreads

      • OpenMP

    • Message passing programming

      • PVM

      • MPI

    • Applications

      • Sorting

      • Numerical algorithms

      • Graph algorithms

    • Interconnection network

    • Performance

    • Models


About the course cont1
About the course(cont.)

  • Organization

    • Lectures

    • Programming projects

      • using parallel programming tools PVM and MPI on Linux / Unix machines

    • Homeworks

    • Class Tests and Final Exam (Open book)


Evolution of computing technology
Evolution of Computing Technology

  • Hardware

    • Vacuum tubes, relay memory

    • Discrete transistors, core memory

    • Integrated circuits, pipelined CPU

    • VLSI microprocessors, solid state memory

  • Languages and Software

    • Machine / Assembly languages

    • Algol / Fortran with compilers, batch processing OS

    • C, multiprocessing, timesharing OS

    • C++ / Java, parallelizing compilers, distributed OS


Evolution of computing technology cont
Evolution of Computing Technology(cont.)

  • The Driving Force Behind the Technology Advances

    • The ever-increasing demands on computing power

      • Scientific computing (e.g. Large-scale simulations)

      • Commercial computing (e.g. Databases)

      • 3D graphics and realistic animation

      • Multimedia internet applications


Grand challenge problem examples
Grand Challenge Problem Examples

  • Simulations of the earth’s climate

    • Resolution: 10 kilometers

    • Period: 1 year

    • Ocean and biosphere models: simple

  • Total requirements: 1016 floating-point operations per second

  • With a supercomputer capable of 10 Giga FLOPS, it will take 10 days to execute

  • Real-time processing of 3D graphics

    • Number of data elements: 109 (1024 in each dimension)

    • Number of operations per element : 200

    • Update rate: 30 times per second

  • Total requirements: 6.4 x 1012 operations per second

  • With processor capable of 10 Giga IOPS, we need 640 of them


  • Motivations for parallelism
    Motivations for Parallelism

    • Conventional computers and sequential

      • a single CPU

      • a single stream of instructions

      • executing one instruction at a time (not completely true)

    • Single-CPU processor has a performance limit

      • Moore’s Law can’t go on forever

  • How to increase computing power?

    • Better processor design

      • More transistors, larger caches, advanced architectures

    • Better system design

      • Faster / larger memory, faster buses, better OS

    • Scale up the computer (parallelism)

      • Replicate hardware at component or whole computer levels

  • Parallel processor’s power is virtually unlimited

    • 10 processor @ 500 Mega FLOPS each = 5 Giga FLOPS

    • 100 processor @ 500 Mega FLOPS each = 50 Giga FLOPS

    • 1,000 processor @ 500 Mega FLOPS each = 500 Giga FLOPS

    • ...


  • Motivations for parallelism cont
    Motivations for Parallelism(cont.)

    • Additional Motivations

      • Solving bigger problems

      • Lowering cost


    Parallel computation
    Parallel Computation

    • Parallel computation means

      • multiple CPUs

      • single or multiple streams of instructions

      • executing multiple instructions at a time

    • Typical process

      • Breaking a problem into pieces and arranging for all pieces to be solved simultaneously on a multi-CPU computer system

    • Requirements

      • Parallel algorithms

        • only parallelizable applications can benefit from parallel implementation

      • Parallel languages and/or constructs

        • expressing parallelism

      • Parallel architectures

        • provide hardware support


    A road map of topics
    A Road Map of Topics

    Machine Models

    • PRAM

    • LogP

    Parallel Architectures

    • Vector / SIMD / MIMD

    • design issues

    Programming Models

    • data parallel

    • shared memory

    • message passing

    Parallel Algorithms

    • master-slave

    • divide-conquer

    • control vs. Data

    • complexity analysis

    Programming Languages

    • Fortran 90 / 95

    • Pthreads, OpenMP

    • PVM, MPI

    Applications

    • Scientific computations

    • data-intensive problems

    • performance measurement


    Parallel computing terminology 1
    Parallel Computing Terminology (1)

    • Hardware

      • Multicomputers

        • tightly networked, multiple uniform computers

      • Multiprocessors

        • tightly networked, multiple uniform processors with additional memory units

      • Supercomputers

        • general purpose and high-performance, nowadays almost always parallel

      • Clusters

        • Loosely networked commodity computers


    Parallel computing terminology 2
    Parallel Computing Terminology (2)

    • Programming

      • Pipelining

        • divide computation into stages (segments)

        • assign separate functional units to each stage

      • Data Parallelism

        • multiple (uniform) functional units

        • apply same operation simultaneously to different elements of data set

      • Control Parallelism

        • multiple (specialized) functional units

        • apply distinct operations to data elements concurrently


    Parallel computing terminology 3
    Parallel Computing Terminology (3)

    • Performance

      • Throughput

        • number of results per unit time

      • Speedup

        Time needed for the most efficient sequential algorithm

        S= ———————————————————————————

        Time needed on a pipelined / parallel machine

      • Scalability

        • An algorithm is scalable if the available parallelism increases at least linearly with problem size

        • An architecture is scalable if it gives same performance per processor, as the number of processors and the size of the problem are both increased

        • Data-parallel algorithms tend to be more scalable than control-parallel algorithms


    An illustrative example
    An Illustrative Example

    • Problem

      • Find all primes less than or equal to some positive integer n

    • Method (the sieve algorithm)

      • Write down all th integers from 1 to n

      • Cross out from the list all multiples of 2, 3, 5, 7, … up to sqrt (n)


    An Illustrative Example (cont.)

    • sequential Implementation

      • Boolean array representing the integers from 1 to n

      • Buffer for holding current prime

      • Index for loop iterating through the array


    An illustrative example cont
    An Illustrative Example (cont.)

    • Control-Parallel Approach

      • Different processors strike out multiples of different primes

      • The boolean array and the current prime is shared; each processor has its own private copy of loop index


    An illustrative example cont1
    An Illustrative Example (cont.)

    • Control-Parallel Approach (cont.)

      • Potential Problem — Race Conditions

        • Race 1: More than one processor may sieve multiples of the same prime

          • a processor reads the current prime, p, and goes off to sieve multiples of p ; later it finds a new prime and updates the current prime buffer

          • before the current prime is updated, another processor comes and reads p and goes off to do the same thing

        • Race 2: A processor may sieve multiples of a composite number

          • processor A start marking multiples of 2

          • before it can mark any cells, processor B finds an unmarked cell, 3, and starts marking multiples of 3

          • then processor C comes in and finds the next unmarked cell, 4, and starts marking multiples of 4

        • These two race conditions would not cause incorrect result, but they will cause inefficiency


    An illustrative example cont2
    An Illustrative Example (cont.)

    • Data-Parallel Approach

      • Each processor responsible for a unique range of the integers, it does all the striking in that range

      • Processor 1 is responsible for broadcasting its findings to other processors

      • Potential Problem

        • If [n/p] < sqrt(n), more than one processor need to broadcast their findings


    An illustrative example cont3
    An Illustrative Example (cont.)

    • Performance Analysis

      • Sequential Algorithm

        • Cost of sieving multiples of 2: [(n-3)/2]

        • Cost of sieving multiples of 3: [(n-8)/3]

        • Cost of sieving multiples of 5: [(n-24)/5]

        • ...

        • For n=1,000, T=1,411

      • Control-Parallel Algorithm

        • For p=2, n=1,000, T=706

        • For p=3, n=1,000, T=499

        • For p=4, n=1,000, T=499


    An illustrative example cont4
    An Illustrative Example (cont.)

    • Performance Analysis (cont.)

      • Data-Parallel Algorithm

        • Cost of broadcasting: k(P-1)l

        • Cost of striking:

          ([(n/p)/2]+ [(n/p)/3]+ … + [(n/p)/pk])χ

        • For p=2, n=1,000, T≈781

        • For p=3, n=1,000, T≈ 471

        • For p=4, n=1,000, T≈ 337


    ad