
### Parallel Programming

Yang Xianchun

Department of Computer Science and Technology

Nanjing University

Introduction

Agenda

- About the Course
- Evolution of Computing Technology
- Grand Challenge Problem Examples
- Motivations for Parallelism
- Parallel Computation
- A Road Map of Topics
- Parallel Computing Terminology
- An Illustrative Example
- Control-Parallel Approach
- Data-Parallel Approach
- Performance Analysis

About the Course

- Description
- An introductory course on parallel programming concepts and techniques
- No prior parallel computation background is necessary
- Prerequisites
- Working knowledge of computer systems
- Adequate programming skill in C or C++
- Textbook
- Harry Jordan and Gita Alaghband, “Fundamentals of Parallel Processing,” Prentice Hall, 2003.

About the Course (cont.)

- Topics
- Introduction
- Parallel architectures
- Data parallel computing
- Fortran 90 / 95
- Shared memory computing
- Pthreads
- OpenMP
- Message passing programming
- PVM
- MPI
- Applications
- Sorting
- Numerical algorithms
- Graph algorithms
- Interconnection network
- Performance
- Models

About the Course (cont.)

- Organization
- Lectures
- Programming projects
- using the parallel programming tools PVM and MPI on Linux / Unix machines
- Homework assignments
- Class Tests and Final Exam (Open book)

Evolution of Computing Technology

- Hardware
- Vacuum tubes, relay memory
- Discrete transistors, core memory
- Integrated circuits, pipelined CPU
- VLSI microprocessors, solid state memory
- Languages and Software
- Machine / Assembly languages
- Algol / Fortran with compilers, batch processing OS
- C, multiprocessing, timesharing OS
- C++ / Java, parallelizing compilers, distributed OS

Evolution of Computing Technology(cont.)

- The Driving Force Behind the Technology Advances
- The ever-increasing demands on computing power
- Scientific computing (e.g. Large-scale simulations)
- Commercial computing (e.g. Databases)
- 3D graphics and realistic animation
- Multimedia internet applications

Grand Challenge Problem Examples

- Simulations of the earth’s climate
- Resolution: 10 kilometers
- Period: 1 year
- Ocean and biosphere models: simple
- Total requirement: 10^16 floating-point operations
- With a supercomputer capable of 10 Giga FLOPS, it would take about 10 days to execute
- Real-time processing of 3D graphics
- Number of data elements: 10^9 (1024 in each dimension)
- Number of operations per element: 200
- Update rate: 30 times per second
- Total requirement: 1024^3 × 200 × 30 ≈ 6.4 × 10^12 operations per second
- With processors capable of 10 Giga IOPS each, we need 640 of them

Motivations for Parallelism

- Conventional computers are sequential
- a single CPU
- a single stream of instructions
- executing one instruction at a time (not completely true: pipelined CPUs already overlap instructions)
- Single-CPU processor has a performance limit
- Moore’s Law can’t go on forever
- How to increase computing power?
- Better processor design
- More transistors, larger caches, advanced architectures
- Better system design
- Faster / larger memory, faster buses, better OS
- Scale up the computer (parallelism)
- Replicate hardware at component or whole computer levels
- Parallel processor’s power is virtually unlimited
- 10 processors @ 500 Mega FLOPS each = 5 Giga FLOPS
- 100 processors @ 500 Mega FLOPS each = 50 Giga FLOPS
- 1,000 processors @ 500 Mega FLOPS each = 500 Giga FLOPS
- ...

Motivations for Parallelism(cont.)

- Additional Motivations
- Solving bigger problems
- Lowering cost

Parallel Computation

- Parallel computation means
- multiple CPUs
- single or multiple streams of instructions
- executing multiple instructions at a time
- Typical process
- Breaking a problem into pieces and arranging for all pieces to be solved simultaneously on a multi-CPU computer system
- Requirements
- Parallel algorithms
- only parallelizable applications can benefit from parallel implementation
- Parallel languages and/or constructs
- expressing parallelism
- Parallel architectures
- provide hardware support

A Road Map of Topics

Machine Models

- PRAM
- LogP

Parallel Architectures

- Vector / SIMD / MIMD
- design issues

Programming Models

- data parallel
- shared memory
- message passing

Parallel Algorithms

- master-slave
- divide-conquer
- control vs. data parallelism
- complexity analysis

Programming Languages

- Fortran 90 / 95
- Pthreads, OpenMP
- PVM, MPI

Applications

- Scientific computations
- data-intensive problems
- performance measurement

Parallel Computing Terminology (1)

- Hardware
- Multicomputers
- tightly networked, multiple uniform computers
- Multiprocessors
- tightly networked, multiple uniform processors with additional memory units
- Supercomputers
- general purpose and high-performance, nowadays almost always parallel
- Clusters
- loosely networked commodity computers

Parallel Computing Terminology (2)

- Programming
- Pipelining
- divide computation into stages (segments)
- assign separate functional units to each stage
- Data Parallelism
- multiple (uniform) functional units
- apply same operation simultaneously to different elements of data set
- Control Parallelism
- multiple (specialized) functional units
- apply distinct operations to data elements concurrently
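A small C/OpenMP fragment (OpenMP is covered later in the course) may make the last two terms concrete: the parallel loop applies the same operation to different elements (data parallelism), while the sections run two distinct operations concurrently (control parallelism). This is purely illustrative; all names are invented for the example.

```c
#include <omp.h>
#include <stdio.h>

#define N 8

int main(void)
{
    double a[N], sum = 0.0, prod = 1.0;

    for (int i = 0; i < N; i++)
        a[i] = i + 1.0;

    /* Data parallelism: the same operation (doubling) is applied
     * simultaneously to different elements of the data set. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * a[i];

    /* Control parallelism: two distinct operations (a sum and a
     * product) run concurrently over the same data. */
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            double s = 0.0;
            for (int i = 0; i < N; i++) s += a[i];
            sum = s;
        }
        #pragma omp section
        {
            double p = 1.0;
            for (int i = 0; i < N; i++) p *= a[i];
            prod = p;
        }
    }

    printf("sum = %g, product = %g\n", sum, prod);
    return 0;
}
```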

Parallel Computing Terminology (3)

- Performance
- Throughput
- number of results per unit time
- Speedup

S = (time needed by the most efficient sequential algorithm) / (time needed on a pipelined or parallel machine)

- Scalability
- An algorithm is scalable if the available parallelism increases at least linearly with problem size
- An architecture is scalable if it gives same performance per processor, as the number of processors and the size of the problem are both increased
- Data-parallel algorithms tend to be more scalable than control-parallel algorithms

An Illustrative Example

- Problem
- Find all primes less than or equal to some positive integer n
- Method (the sieve of Eratosthenes)
- Write down all the integers from 1 to n
- Cross out from the list all multiples of the primes 2, 3, 5, 7, …, up to sqrt(n)

An Illustrative Example (cont.)

- Sequential Implementation (a C sketch follows this list)
- Boolean array representing the integers from 1 to n
- Buffer for holding current prime
- Index for loop iterating through the array
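The three data structures above map directly onto a short C program. A minimal sketch, with illustrative names and n fixed at 1,000 for concreteness:

```c
#include <stdio.h>
#include <stdlib.h>

/* Sequential sieve: find all primes <= n. */
int main(void)
{
    const int n = 1000;
    char *marked = calloc(n + 1, 1);  /* Boolean array: marked[i] means i is crossed out */
    int prime = 2;                    /* buffer holding the current prime */
    int index;                        /* loop index for striking multiples */

    while (prime * prime <= n) {
        /* strike every multiple of the current prime, starting at its square */
        for (index = prime * prime; index <= n; index += prime)
            marked[index] = 1;
        /* advance to the next unmarked cell: the next prime */
        do { prime++; } while (marked[prime]);
    }

    for (index = 2; index <= n; index++)
        if (!marked[index])
            printf("%d\n", index);
    free(marked);
    return 0;
}
```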

An Illustrative Example (cont.)

- Control-Parallel Approach
- Different processors strike out multiples of different primes
- The Boolean array and the current prime are shared; each processor has its own private copy of the loop index

An Illustrative Example (cont.)

- Control-Parallel Approach (cont.)
- Potential Problem: Race Conditions
- Race 1: More than one processor may sieve multiples of the same prime
- a processor reads the current prime, p, and goes off to sieve multiples of p; later it finds a new prime and updates the current prime buffer
- before the current prime is updated, another processor also reads p and goes off to do the same thing
- Race 2: A processor may sieve multiples of a composite number
- processor A starts marking multiples of 2
- before it can mark any cells, processor B finds an unmarked cell, 3, and starts marking multiples of 3
- then processor C comes in, finds the next unmarked cell, 4, and starts marking multiples of 4
- Neither race causes an incorrect result, since cells are only ever marked, but both waste work and cause inefficiency (see the Pthreads sketch below)
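A hedged Pthreads sketch of this control-parallel scheme (Pthreads is covered later in the course). The mutex serializes access to the shared current-prime buffer, which prevents Race 1; Race 2 can still occur and, as noted above, only wastes work. The names and the choice of four threads are illustrative, not the textbook's code.

```c
#include <pthread.h>
#include <stdio.h>

/* Shared state, as on the slide: the Boolean array and the current
 * prime are shared; each thread keeps private copies of p and i. */
#define N 1000
static char marked[N + 1];
static int current = 2;  /* current prime buffer (shared) */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *sieve_worker(void *arg)
{
    (void)arg;
    for (;;) {
        int p, i;  /* private loop variables */

        /* Serialize reading and advancing the current prime so no two
         * threads grab the same value (avoids Race 1).  Race 2 remains:
         * the candidate taken here may be an as-yet-unmarked composite. */
        pthread_mutex_lock(&lock);
        while (current * current <= N && marked[current])
            current++;
        p = current;
        if (p * p <= N)
            current++;  /* next thread starts searching past p */
        pthread_mutex_unlock(&lock);

        if (p * p > N)
            break;
        /* Unsynchronized marking is benign here: cells are only ever
         * set to 1, so concurrent writers cannot disagree. */
        for (i = p * p; i <= N; i += p)
            marked[i] = 1;
    }
    return NULL;
}

int main(void)
{
    pthread_t t[4];
    int i;
    for (i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, sieve_worker, NULL);
    for (i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    for (i = 2; i <= N; i++)
        if (!marked[i])
            printf("%d\n", i);
    return 0;
}
```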

An Illustrative Example (cont.)

- Data-Parallel Approach (an MPI sketch follows this list)
- Each processor is responsible for a unique range of the integers and does all the striking in that range
- Processor 1 is responsible for broadcasting its findings to the other processors
- Potential Problem
- If ⌊n/p⌋ < sqrt(n), more than one processor needs to broadcast its findings
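A sketch of the data-parallel decomposition in C with MPI (also covered later in the course), assuming ⌊n/p⌋ ≥ sqrt(n) so that the first process's block contains every sieving prime and it alone broadcasts, as on the slide. The block-partition arithmetic and all names are illustrative assumptions, not the textbook's exact code.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    const int n = 1000;
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each process owns a contiguous block of the integers 2..n. */
    int low  = 2 + rank * (n - 1) / size;
    int high = 1 + (rank + 1) * (n - 1) / size;
    int len  = high - low + 1;
    char *marked = calloc(len, 1);

    int prime = 2;
    while (prime * prime <= n) {
        /* first multiple of prime inside [low, high] */
        int first = (prime * prime > low)
                  ? prime * prime
                  : low + (prime - low % prime) % prime;
        for (int i = first; i <= high; i += prime)
            marked[i - low] = 1;

        /* Process 0 finds the next prime, valid while its block covers
         * sqrt(n), and broadcasts it to everyone else. */
        if (rank == 0) {
            do { prime++; } while (marked[prime - low]);
        }
        MPI_Bcast(&prime, 1, MPI_INT, 0, MPI_COMM_WORLD);
    }

    /* Count local primes and combine the totals on process 0. */
    int local = 0, global;
    for (int i = 0; i < len; i++)
        if (!marked[i]) local++;
    MPI_Reduce(&local, &global, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("%d primes up to %d\n", global, n);

    free(marked);
    MPI_Finalize();
    return 0;
}
```

Under the stated assumption, the only communication per sieving prime is the single MPI_Bcast, which corresponds to the k(p−1)λ broadcast term in the performance analysis below.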

An Illustrative Example (cont.)

- Performance Analysis
- Sequential Algorithm
- Cost of sieving multiples of 2: ⌊(n−3)/2⌋
- Cost of sieving multiples of 3: ⌊(n−8)/3⌋
- Cost of sieving multiples of 5: ⌊(n−24)/5⌋
- ...
- For n=1,000, T=1,411
- Control-Parallel Algorithm
- For p=2, n=1,000, T=706
- For p=3, n=1,000, T=499
- For p=4, n=1,000, T=499
- Adding a fourth processor does not help: striking the multiples of 2 alone takes ⌊(n−3)/2⌋ = 498 time units, and that single processor's workload bounds the parallel time no matter how large p gets

An Illustrative Example (cont.)

- Performance Analysis (cont.)
- Data-Parallel Algorithm
- Cost of broadcasting: k(p−1)λ, where k is the number of sieving primes (those up to sqrt(n)) and λ is the time to communicate one prime
- Cost of striking: (⌊(n/p)/2⌋ + ⌊(n/p)/3⌋ + … + ⌊(n/p)/p_k⌋)χ, where χ is the time to strike out one cell and p_k is the k-th prime

- For p=2, n=1,000, T ≈ 781
- For p=3, n=1,000, T ≈ 471
- For p=4, n=1,000, T ≈ 337
- Here T keeps falling as p grows, unlike in the control-parallel case, illustrating why data-parallel algorithms tend to be more scalable
