CPE 458 – Parallel Programming, Spring 2009
Lecture 1 – Parallel Programming Primer

Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License: http://creativecommons.org/licenses/by/2.5

Outline
  • Introduction to Parallel Computing
  • Parallel Architectures
  • Parallel Algorithms
  • Common Parallel Programming Models
    • Message-Passing
    • Shared Address Space

Introduction to Parallel Computing
  • Moore’s Law
    • The number of transistors that can be placed inexpensively on an integrated circuit will double approximately every 18 months.
    • Self-fulfilling prophecy
      • Computer architect goal
      • Software developer assumption
Introduction to Parallel Computing
  • Impediments to Moore’s Law
    • Theoretical Limit
    • What to do with all that die space?
    • Design complexity
    • How do you meet the expected performance increase?
Introduction to Parallel Computing
  • Parallelism
    • Continue to increase performance via parallelism.
Introduction to Parallel Computing
  • From a software point-of-view, need to solve demanding problems
    • Engineering Simulations
    • Scientific Applications
    • Commercial Applications
  • Need the performance, resource gains afforded by parallelism
Introduction to Parallel Computing
  • Engineering Simulations
    • Aerodynamics
    • Engine efficiency
Introduction to Parallel Computing
  • Scientific Applications
    • Bioinformatics
    • Thermonuclear processes
    • Weather modeling
Introduction to Parallel Computing
  • Commercial Applications
    • Financial transaction processing
    • Data mining
    • Web Indexing
Introduction to Parallel Computing
  • Unfortunately, greatly increases coding complexity
    • Coordinating concurrent tasks
    • Parallelizing algorithms
    • Lack of standard environments and support
Introduction to Parallel Computing
  • The challenge
    • Provide the abstractions, programming paradigms, and algorithms needed to effectively design, implement, and maintain applications that exploit the parallelism provided by the underlying hardware in order to solve modern problems.
Parallel Architectures
  • Standard sequential architecture

[Diagram: a single CPU connected to RAM over a bus, with the bottlenecks highlighted]

Parallel Architectures
  • Use multiple
    • Datapaths
    • Memory units
    • Processing units
Parallel Architectures
  • SIMD
    • Single instruction stream, multiple data stream

[Diagram: one Control Unit driving multiple Processing Units over an interconnect]

Parallel Architectures
  • SIMD
    • Advantages
      • Performs vector/matrix operations well
        • Ex: Intel’s MMX instruction-set extension
    • Disadvantages
      • Too dependent on type of computation
        • EX: Graphics
      • Performance/resource utilization suffers if computations aren’t “embarrassingly parallel” (see the sketch below).
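A minimal sketch of the SIMD idea on a commodity CPU. It uses x86 SSE intrinsics rather than the older MMX extensions mentioned above (the intrinsics and header are standard x86, but the example itself is an illustrative assumption, not from the slides): one instruction applies the same operation to four floats at once.

    /* SIMD sketch: one SSE instruction adds four floats at a time. */
    #include <stdio.h>
    #include <xmmintrin.h>                       /* SSE intrinsics */

    int main(void) {
        float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
        float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
        float c[4];

        __m128 va = _mm_loadu_ps(a);             /* load 4 floats into one register */
        __m128 vb = _mm_loadu_ps(b);
        __m128 vc = _mm_add_ps(va, vb);          /* single instruction, 4 additions */
        _mm_storeu_ps(c, vc);

        printf("%.1f %.1f %.1f %.1f\n", c[0], c[1], c[2], c[3]);
        return 0;
    }

The same control flow drives all four lanes, which is why SIMD shines on regular, data-parallel work and suffers when the computation branches per element.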
Parallel Architectures
  • MIMD
    • Multiple instruction stream, multiple data stream

[Diagram: multiple Processing/Control Units connected by an interconnect]

Parallel Architectures
  • MIMD
    • Advantages
      • Can be built with off-the-shelf components
      • Better suited to irregular data access patterns
    • Disadvantages
      • Requires more hardware (control units are not shared)
      • Store program/OS at each processor
  • Ex: Typical commodity SMP machines we see today.
Parallel Architectures
  • Task Communication
    • Shared address space
      • Use common memory to exchange data
      • Communication and replication are implicit
    • Message passing
      • Use send()/receive() primitives to exchange data
      • Communication and replication are explicit
Parallel Architectures
  • Shared address space
    • Uniform memory access (UMA)
      • Access to a memory location is independent of which processing unit makes the request.
    • Non-uniform memory access (NUMA)
      • Access to a memory location depends on the location of the processing unit relative to the memory accessed.
Parallel Architectures
  • Message passing
    • Each processing unit has its own private memory
    • Exchange of messages used to pass data
    • APIs
      • Message Passing Interface (MPI)
      • Parallel Virtual Machine (PVM)
Parallel Algorithms
  • Algorithm
    • A finite sequence of instructions, often used for calculation and data processing.
  • Parallel Algorithm
    • An algorithm that can be executed a piece at a time on many different processing devices, with the partial results combined at the end to produce the correct overall result.
Parallel Algorithms
  • Challenges
    • Identifying work that can be done concurrently.
    • Mapping work to processing units.
    • Distributing the work
    • Managing access to shared data
    • Synchronizing various stages of execution.
Parallel Algorithms
  • Models
    • A way to structure a parallel algorithm by selecting decomposition and mapping techniques in a manner that minimizes interactions.
Parallel Algorithms
  • Models
    • Data-parallel
    • Task graph
    • Work pool
    • Master-slave
    • Pipeline
    • Hybrid
Parallel Algorithms
  • Data-parallel
    • Mapping of Work
      • Static
      • Tasks -> Processes
    • Mapping of Data
      • Independent data items assigned to processes (Data Parallelism)
Parallel Algorithms
  • Data-parallel
    • Computation
      • Tasks process data, synchronize to get new data or exchange results, continue until all data processed
    • Load Balancing
      • Uniform partitioning of data
    • Synchronization
      • Minimal; at most a barrier is needed at the end of a phase
    • Ex: Ray tracing (see the sketch below)
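A minimal sketch of the data-parallel model using POSIX threads (the array contents, thread count, and function names are illustrative assumptions, not from the slides): the data is statically partitioned, each thread processes its own chunk independently, and the only synchronization is the implicit barrier when the main thread joins the workers at the end of the phase.

    /* Data-parallel sketch: static partitioning, independent chunks, sync at phase end. */
    #include <pthread.h>
    #include <stdio.h>

    #define N 1000000
    #define NUM_THREADS 4

    static double data[N];
    static double partial[NUM_THREADS];

    static void *sum_chunk(void *arg) {
        long id = (long)arg;
        long lo = id * (N / NUM_THREADS);
        long hi = (id == NUM_THREADS - 1) ? N : lo + (N / NUM_THREADS);
        double s = 0.0;
        for (long i = lo; i < hi; i++)
            s += data[i];
        partial[id] = s;              /* no sharing: each thread writes its own slot */
        return NULL;
    }

    int main(void) {
        pthread_t t[NUM_THREADS];
        for (long i = 0; i < N; i++) data[i] = 1.0;

        for (long i = 0; i < NUM_THREADS; i++)
            pthread_create(&t[i], NULL, sum_chunk, (void *)i);

        double total = 0.0;
        for (int i = 0; i < NUM_THREADS; i++) {
            pthread_join(t[i], NULL); /* synchronization only at the end of the phase */
            total += partial[i];
        }
        printf("total = %f\n", total);
        return 0;
    }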
Parallel Algorithms
  • Data-parallel

[Diagram: independent data items (D) fed to processes (P), each producing an output (O)]

Parallel Algorithms
  • Task graph
    • Mapping of Work
      • Static
      • Tasks are mapped to nodes in a task dependency graph (task parallelism)
    • Mapping of Data
      • Data moves through graph (Source to Sink)
Parallel Algorithms
  • Task graph
    • Computation
      • Each node processes input from the previous node(s) and sends output to the next node(s) in the graph
    • Load Balancing
      • Assign more processes to a given task
      • Eliminate graph bottlenecks
    • Synchronization
      • Node data exchange
    • Ex: Parallel quicksort and other divide-and-conquer approaches (see the sketch below)
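A rough sketch of a divide-and-conquer task graph using POSIX threads and a parallel quicksort (the depth cutoff and helper names are illustrative assumptions): each partition step spawns a child thread for one half of the array, forming a new node in the task dependency graph, and joins it before returning.

    /* Task-graph sketch: parallel quicksort, one spawned thread per split (down to a depth cutoff). */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct job { int *a; int lo; int hi; int depth; };

    static void quicksort(int *a, int lo, int hi, int depth);

    static void *qsort_task(void *arg) {
        struct job *j = arg;
        quicksort(j->a, j->lo, j->hi, j->depth);
        free(j);
        return NULL;
    }

    static void quicksort(int *a, int lo, int hi, int depth) {
        if (lo >= hi) return;
        int pivot = a[hi], i = lo;
        for (int k = lo; k < hi; k++)                 /* partition around the pivot */
            if (a[k] < pivot) { int t = a[k]; a[k] = a[i]; a[i] = t; i++; }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;

        if (depth > 0) {                              /* spawn a graph node for one half */
            pthread_t tid;
            struct job *j = malloc(sizeof *j);
            j->a = a; j->lo = lo; j->hi = i - 1; j->depth = depth - 1;
            pthread_create(&tid, NULL, qsort_task, j);
            quicksort(a, i + 1, hi, depth - 1);       /* this thread takes the other half */
            pthread_join(tid, NULL);                  /* synchronize: node data exchange */
        } else {                                      /* sequential below the cutoff */
            quicksort(a, lo, i - 1, 0);
            quicksort(a, i + 1, hi, 0);
        }
    }

    int main(void) {
        int a[16] = {9, 3, 7, 1, 8, 2, 6, 4, 5, 0, 15, 11, 13, 10, 14, 12};
        quicksort(a, 0, 15, 2);                       /* depth 2 -> up to 4 concurrent sort tasks */
        for (int i = 0; i < 16; i++) printf("%d ", a[i]);
        printf("\n");
        return 0;
    }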
Parallel Algorithms
  • Task graph

[Diagram: a task dependency graph of processes (P) with data (D) flowing from source to sink and outputs (O) produced along the way]

Parallel Algorithms
  • Work pool
    • Mapping of Work/Data
      • No desired pre-mapping
      • Any task performed by any process
    • Computation
      • Processes work as data becomes available (or requests arrive)
Parallel Algorithms
  • Work pool
    • Load Balancing
      • Dynamic mapping of tasks to processes
    • Synchronization
      • Adding/removing work from input queue
    • Ex: Web server (see the sketch below)
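A minimal sketch of the work-pool model with POSIX threads (queue contents, thread count, and names are illustrative assumptions): a shared input queue is protected by a mutex, and whichever worker is free next claims the next item.

    /* Work-pool sketch: any idle worker takes the next item from a shared queue. */
    #include <pthread.h>
    #include <stdio.h>

    #define NUM_ITEMS   20
    #define NUM_WORKERS 4

    static int queue[NUM_ITEMS];
    static int next_item = 0;                    /* index of the next unclaimed item */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        long id = (long)arg;
        for (;;) {
            pthread_mutex_lock(&lock);           /* sync point: removing work from the queue */
            if (next_item >= NUM_ITEMS) {        /* pool drained, worker exits */
                pthread_mutex_unlock(&lock);
                return NULL;
            }
            int item = queue[next_item++];
            pthread_mutex_unlock(&lock);

            printf("worker %ld processing item %d\n", id, item);
        }
    }

    int main(void) {
        pthread_t t[NUM_WORKERS];
        for (int i = 0; i < NUM_ITEMS; i++) queue[i] = i;   /* fill the pool */

        for (long i = 0; i < NUM_WORKERS; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (int i = 0; i < NUM_WORKERS; i++)
            pthread_join(t[i], NULL);
        return 0;
    }

A production pool would block on a condition variable when the queue is empty instead of exiting, and would typically also feed an output queue.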
Parallel Algorithms
  • Work pool

[Diagram: a work pool of processes (P) pulling tasks from a shared input queue and placing results on an output queue]

Parallel Algorithms
  • Master-slave
    • A modification of the work pool model
      • One or more Master processes generate and assign work to worker processes
    • Load Balancing
      • A Master process can better distribute load to worker processes
Parallel Algorithms
  • Pipeline
    • Mapping of work
      • Processes are assigned tasks that correspond to stages in the pipeline
      • Static
    • Mapping of Data
      • Data processed in FIFO order
        • Stream parallelism
Parallel Algorithms
  • Pipeline
    • Computation
      • Data is passed through a succession of processes, each of which performs some task on it
    • Load Balancing
      • Ensure all stages of the pipeline are balanced (contain the same amount of work)
    • Synchronization
      • Producer/Consumer buffers between stages
    • Ex: Processor pipeline, graphics pipeline (see the sketch below)
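A small sketch of a two-stage pipeline with a one-slot producer/consumer buffer between the stages, built from POSIX semaphores (the stage functions, item count, and the squaring "work" are illustrative assumptions): stage 1 generates items and stage 2 consumes them in FIFO order.

    /* Pipeline sketch: two stages connected by a one-slot producer/consumer buffer. */
    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>

    #define NUM_ITEMS 8

    static int slot;                 /* the buffer between the two stages */
    static sem_t empty, full;        /* empty = space available, full = item ready */

    static void *stage1(void *arg) {              /* producer stage */
        (void)arg;
        for (int i = 0; i < NUM_ITEMS; i++) {
            sem_wait(&empty);                     /* wait for buffer space */
            slot = i * i;                         /* "work" for this stage */
            sem_post(&full);                      /* hand the item to the next stage */
        }
        return NULL;
    }

    static void *stage2(void *arg) {              /* consumer stage */
        (void)arg;
        for (int i = 0; i < NUM_ITEMS; i++) {
            sem_wait(&full);                      /* wait for an item */
            printf("stage 2 got %d\n", slot);     /* "work" for this stage */
            sem_post(&empty);                     /* free the buffer slot */
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        sem_init(&empty, 0, 1);                   /* one free slot initially */
        sem_init(&full, 0, 0);                    /* no items initially */
        pthread_create(&t1, NULL, stage1, NULL);
        pthread_create(&t2, NULL, stage2, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }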
Parallel Algorithms
  • Pipeline

[Diagram: three pipeline stages (P) connected by buffers, with an input queue feeding the first stage and an output queue after the last]

Common Parallel Programming Models
  • Message-Passing
  • Shared Address Space
Common Parallel Programming Models
  • Message-Passing
    • Most widely used for programming parallel computers (clusters of workstations)
    • Key attributes:
      • Partitioned address space
      • Explicit parallelization
    • Process interactions
      • Send and receive data
Common Parallel Programming Models
  • Message-Passing
    • Communications
      • Sending and receiving messages
      • Primitives
        • send(buff, size, destination)
        • receive(buff, size, source)
        • Blocking vs non-blocking
        • Buffered vs non-buffered
      • Message Passing Interface (MPI)
        • Popular message passing library
        • ~125 functions (see the example below)
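A minimal MPI counterpart to the send()/receive() primitives above (the buffer size and ranks are arbitrary choices): rank 0 sends 1024 integers to rank 1 with a blocking send, and rank 1 receives them. Compile with mpicc and run with something like mpirun -np 2 ./a.out.

    /* Message-passing sketch: blocking point-to-point send/receive with MPI. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank;
        int buff[1024];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            for (int i = 0; i < 1024; i++) buff[i] = i;
            /* blocking send: send(buff, size, destination) */
            MPI_Send(buff, 1024, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* blocking receive: receive(buff, size, source) */
            MPI_Recv(buff, 1024, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received 1024 ints, last = %d\n", buff[1023]);
        }

        MPI_Finalize();
        return 0;
    }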
Common Parallel Programming Models
  • Message-Passing

[Diagram: four workstations running processes P1–P4, each with private data; P1 calls send(buff1, 1024, p3) and P3 calls receive(buff3, 1024, p1)]

Common Parallel Programming Models
  • Shared Address Space
    • Mostly used for programming SMP machines (multicore chips)
    • Key attributes
      • Shared address space
        • Threads
        • shmget()/shmat() UNIX operations
      • Implicit parallelization
    • Process/Thread communication
      • Memory reads/stores
Common Parallel Programming Models
  • Shared Address Space
    • Communication
      • Read/write memory
        • Ex: x++; (see the sketch below)
    • POSIX threads (Pthreads) API
      • Popular thread API
      • Operations
        • Creation/deletion of threads
        • Synchronization (mutexes, semaphores)
        • Thread management
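A minimal Pthreads sketch of shared-address-space communication (the thread count and iteration count are arbitrary): every thread reads and writes the same variable x, so the x++ above must be protected by a mutex to stay correct.

    /* Shared-address-space sketch: threads communicate through a shared variable. */
    #include <pthread.h>
    #include <stdio.h>

    #define NUM_THREADS 4
    #define INCS_PER_THREAD 100000

    static long x = 0;                               /* shared state */
    static pthread_mutex_t x_lock = PTHREAD_MUTEX_INITIALIZER;

    static void *incrementer(void *arg) {
        (void)arg;
        for (int i = 0; i < INCS_PER_THREAD; i++) {
            pthread_mutex_lock(&x_lock);             /* synchronization */
            x++;                                     /* communication via memory */
            pthread_mutex_unlock(&x_lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t[NUM_THREADS];
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_create(&t[i], NULL, incrementer, NULL);
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_join(t[i], NULL);
        printf("x = %ld (expected %d)\n", x, NUM_THREADS * INCS_PER_THREAD);
        return 0;
    }

Without the mutex, the concurrent increments race and the final value is usually wrong, which is exactly the implicit-communication pitfall of the shared-address-space model.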
Common Parallel Programming Models
  • Shared Address Space

[Diagram: an SMP workstation where threads T1–T4 share data in a single RAM]

Parallel Programming Pitfalls
  • Synchronization
    • Deadlock (see the sketch below)
    • Livelock
    • Fairness
  • Efficiency
    • Maximize parallelism
  • Reliability
    • Correctness
    • Debugging
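A sketch of the classic deadlock pitfall with Pthreads (the lock and thread names are illustrative): two threads acquire the same two mutexes in opposite order, so each can end up waiting forever for the lock the other holds.

    /* Deadlock sketch: opposite lock ordering in two threads. May never terminate. */
    #include <pthread.h>

    static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

    static void *thread1(void *arg) {
        (void)arg;
        pthread_mutex_lock(&lock_a);     /* holds A ... */
        pthread_mutex_lock(&lock_b);     /* ... waits for B */
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
        return NULL;
    }

    static void *thread2(void *arg) {
        (void)arg;
        pthread_mutex_lock(&lock_b);     /* holds B ... */
        pthread_mutex_lock(&lock_a);     /* ... waits for A: potential deadlock */
        pthread_mutex_unlock(&lock_a);
        pthread_mutex_unlock(&lock_b);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, thread1, NULL);
        pthread_create(&t2, NULL, thread2, NULL);
        pthread_join(t1, NULL);          /* may never return */
        pthread_join(t2, NULL);
        return 0;
    }

The standard fix is a global lock ordering: every thread acquires lock_a before lock_b.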
Prelude to MapReduce

  • MapReduce is a paradigm designed by Google for making a subset (albeit a large one) of distributed problems easier to code
  • Automates data distribution and result aggregation
  • Restricts the ways data can interact to eliminate locks (no shared state = no locks!)

Prelude to MapReduce

  • Next time…
    • MapReduce parallel programming paradigm

References
  • Introduction to Parallel Computing, Grama et al., Pearson Education, 2003
  • Distributed Systems: http://code.google.com/edu/parallel/index.html
  • Distributed Computing: Principles and Applications, M. L. Liu, Pearson Education, 2004