  1. EECS 570 • Notes on Chapter 2 – Parallel Programs (Fall 2003, rev1)

  2. Terminology
  • Task:
    • programmer-defined sequential piece of work
    • concurrency exists only across tasks
    • the amount of work per task may be small (fine-grained) or large (coarse-grained)
  • Process (thread):
    • abstract entity that performs tasks; corresponds to the OS concepts of process and thread
    • must communicate and synchronize with other processes
    • executes on a processor, typically with a one-to-one mapping of processes to processors
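To make the terminology concrete, here is a minimal sketch using POSIX threads in C (NPROCS, do_task, and process_main are illustrative names, not from the slides): each process (thread) is created, performs its task sequentially, and is joined at the end.

```c
#include <pthread.h>
#include <stdio.h>

#define NPROCS 4   /* number of processes (threads); one-to-one mapping to processors assumed */

/* One task: a programmer-defined sequential piece of work (trivial here). */
static void do_task(int task_id) {
    printf("task %d done\n", task_id);
}

/* A process executes its tasks sequentially; concurrency is only across tasks. */
static void *process_main(void *arg) {
    do_task((int)(long)arg);   /* a real program would run many tasks per process */
    return NULL;
}

int main(void) {
    pthread_t t[NPROCS];
    for (long i = 0; i < NPROCS; i++)
        pthread_create(&t[i], NULL, process_main, (void *)i);
    for (int i = 0; i < NPROCS; i++)
        pthread_join(t[i], NULL);
    return 0;
}
```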

  3. Steps in Creating a Parallel Program
  • Four steps, covered in turn on the following slides: decomposition, assignment, orchestration, and mapping

  4. Decomposition
  • Break up the computation into tasks to be divided among processes
    • can be static, quasi-static, or dynamic
  • i.e., identify the concurrency and decide the level at which to exploit it
  • Goal: enough tasks to keep processes busy... but not too many (see the sketch below)
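A sketch of a static decomposition in C, assuming a simple vector sum (N, CHUNK, and sum_chunk are illustrative): CHUNK sets the task granularity, which is the knob behind the "enough tasks, but not too many" goal.

```c
#include <stdio.h>
#include <stdlib.h>

#define N     1000000
#define CHUNK 10000    /* task size: the level at which concurrency is exploited */

/* One task: sum one contiguous chunk of the array. */
static double sum_chunk(const double *a, long lo, long hi) {
    double s = 0.0;
    for (long i = lo; i < hi; i++) s += a[i];
    return s;
}

int main(void) {
    double *a = malloc(N * sizeof *a);
    for (long i = 0; i < N; i++) a[i] = 1.0;

    /* Static decomposition into N/CHUNK independent tasks. They run
       sequentially here; in a parallel program the next step (assignment)
       would hand them to different processes. */
    double total = 0.0;
    for (long lo = 0; lo < N; lo += CHUNK) {
        long hi = (lo + CHUNK > N) ? N : lo + CHUNK;
        total += sum_chunk(a, lo, hi);
    }

    printf("sum = %.0f over %ld tasks\n", total, (long)((N + CHUNK - 1) / CHUNK));
    free(a);
    return 0;
}
```

A smaller CHUNK yields more tasks and better load balance; a larger CHUNK amortizes per-task overhead.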

  5. Amdahl's Law
  • Assume a fraction s of sequential execution time is inherently serial
    • the remainder (1 - s) can be perfectly parallelized
  • Speedup with p processors:
      Speedup(p) = 1 / (s + (1 - s) / p)
  • Limit as p → ∞: Speedup → 1/s
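A quick numeric check of the formula in C (the 10% serial fraction is illustrative):

```c
#include <stdio.h>

/* Amdahl's Law: serial fraction s, p processors. */
static double speedup(double s, int p) {
    return 1.0 / (s + (1.0 - s) / p);
}

int main(void) {
    double s = 0.1;   /* assume 10% of the execution is inherently serial */
    for (int p = 1; p <= 1024; p *= 4)
        printf("p = %4d  speedup = %6.2f\n", p, speedup(s, p));
    /* Output approaches the 1/s = 10x ceiling no matter how large p gets. */
    return 0;
}
```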

  6. Aside on Cost-Effective Computing
  • Isn't Speedup(P) < P inefficient?
    • if only throughput matters, why not use P separate computers instead?
  • But much of a computer's cost is NOT in the processor [Wood & Hill, IEEE Computer 2/95]
  • Let Costup(P) = Cost(P) / Cost(1)
  • Parallel computing is cost-effective when: Speedup(P) > Costup(P)
  • E.g., for an SGI PowerChallenge with 500MB of memory: Costup(32) = 8.6, so any speedup above 8.6 on 32 processors already pays off
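A sketch of why Costup grows slowly, assuming a simple additive cost model (the constants are made up, chosen only so Costup(32) lands near the slide's 8.6, and are not the PowerChallenge's actual prices): memory, chassis, and I/O are shared by all processors, so each extra processor adds only a small marginal cost.

```c
#include <stdio.h>

/* Assumed cost model: a large fixed cost shared by all processors plus a
   small per-processor increment. Constants are illustrative only. */
static double cost(int p) {
    const double fixed    = 60000.0;  /* memory, chassis, interconnect, I/O */
    const double per_proc = 20000.0;  /* each additional CPU board */
    return fixed + p * per_proc;
}

static double costup(int p) { return cost(p) / cost(1); }

int main(void) {
    /* Cost-effective whenever Speedup(P) > Costup(P). */
    for (int p = 1; p <= 32; p *= 2)
        printf("P = %2d  Costup = %.2f\n", p, costup(p));  /* Costup(32) = 8.75 */
    return 0;
}
```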

  7. Assignment
  • Assign tasks to processes
  • Again, can be static, dynamic, or in between (a dynamic scheme is sketched below)
  • Goals:
    • balance the workload
    • reduce communication
    • minimize management overhead
  • Decomposition + Assignment = Partitioning
  • Mostly independent of architecture/programming model
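A common dynamic-assignment idiom is a shared task counter: each process claims the next unassigned task under a lock. A minimal sketch using POSIX threads (do_task is a placeholder for real work):

```c
#include <pthread.h>
#include <stdio.h>

#define NTASKS 100
#define NPROCS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int next_task = 0;   /* shared: index of the next unassigned task */

static void do_task(int t) { (void)t; /* placeholder for real work */ }

/* Dynamic assignment: each process repeatedly claims the next task, so load
   balances automatically, at the cost of synchronizing on the counter. */
static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        int t = next_task++;
        pthread_mutex_unlock(&lock);
        if (t >= NTASKS) break;
        do_task(t);
    }
    return NULL;
}

int main(void) {
    pthread_t th[NPROCS];
    for (int i = 0; i < NPROCS; i++) pthread_create(&th[i], NULL, worker, NULL);
    for (int i = 0; i < NPROCS; i++) pthread_join(th[i], NULL);
    printf("all %d tasks done\n", NTASKS);
    return 0;
}
```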

  8. Orchestration
  • How do we achieve task communication, synchronization, and scheduling, given the programming model?
    • data structures (naming)
    • task scheduling
    • communication: messages, shared-data accesses
    • synchronization: locks, semaphores, barriers, etc.
  • Goals:
    • reduce the cost of communication and synchronization
    • preserve data locality (reduce communication, enhance caching)
    • schedule tasks so that dependencies are satisfied early
    • reduce the overhead of parallelism management
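A sketch of the two synchronization primitives used most often in a shared-address-space model, with POSIX threads (pthread barriers are a POSIX option and missing on some platforms; the phase bodies are placeholders): a lock serializes updates to shared data, and a barrier keeps any process from starting the next phase early.

```c
#include <pthread.h>
#include <stdio.h>

#define NPROCS 4

static pthread_mutex_t   sum_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t phase_barrier;
static double global_sum = 0.0;   /* shared data: all writes go through sum_lock */

static void *worker(void *arg) {
    int id = (int)(long)arg;

    /* Phase 1: each process updates shared data under the lock. */
    double local = id + 1.0;          /* placeholder for real per-process work */
    pthread_mutex_lock(&sum_lock);
    global_sum += local;
    pthread_mutex_unlock(&sum_lock);

    /* Barrier: no process enters phase 2 until all finish phase 1, so
       every process then sees the complete global_sum. */
    pthread_barrier_wait(&phase_barrier);

    /* Phase 2: reading global_sum is now safe without the lock. */
    if (id == 0) printf("sum after phase 1 = %.1f\n", global_sum);
    return NULL;
}

int main(void) {
    pthread_t th[NPROCS];
    pthread_barrier_init(&phase_barrier, NULL, NPROCS);
    for (long i = 0; i < NPROCS; i++) pthread_create(&th[i], NULL, worker, (void *)i);
    for (int i = 0; i < NPROCS; i++) pthread_join(th[i], NULL);
    pthread_barrier_destroy(&phase_barrier);
    return 0;
}
```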

  9. Mapping
  • Assign processes to processors
  • Usually up to the OS, perhaps guided by user hints/preferences (one such hint is sketched below)
  • Usually assumed to be one-to-one and static
  • Terminology:
    • space sharing
    • gang scheduling
    • processor affinity
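As an example of a user-supplied mapping hint, a Linux-specific sketch of processor affinity (pthread_setaffinity_np is a GNU extension, not portable POSIX):

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Processor-affinity hint: ask the OS to keep the calling thread on one CPU,
   approximating the one-to-one, static mapping assumed above. */
static void pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (pthread_setaffinity_np(pthread_self(), sizeof set, &set) != 0)
        fprintf(stderr, "affinity hint for cpu %d rejected\n", cpu);
}

int main(void) {
    pin_to_cpu(0);   /* pin the main thread to CPU 0 */
    printf("pinned to cpu 0\n");
    return 0;
}
```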

  10. Parallelizing Computation vs. Data
  • The view above is centered on computation
    • computation is decomposed and assigned (partitioned)
  • Partitioning the data is often a natural view too
    • computation follows data: "owner computes"
    • examples: the grid computation, data mining, High Performance Fortran (HPF)
  • But this view is not always sufficient
    • the distinction between computation and data is stronger in many applications
    • e.g., Barnes-Hut, Raytrace
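A sketch of the owner-computes rule for the grid example (the names and the nearest-neighbor update are illustrative): rows are partitioned among processes, and each process updates only the grid points it owns.

```c
#include <stdio.h>

#define N 1024   /* grid dimension */

/* Owner computes: the data (rows) are partitioned, and the computation
   follows the data -- process `pid` sweeps only the rows it owns. */
static void sweep(double grid[N][N], int pid, int nprocs) {
    int rows = N / nprocs;                /* block data partition */
    int lo = pid * rows, hi = lo + rows;  /* rows owned by this process */
    if (lo == 0) lo = 1;                  /* boundary rows stay fixed */
    if (hi == N) hi = N - 1;
    for (int i = lo; i < hi; i++)
        for (int j = 1; j < N - 1; j++)
            grid[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                 grid[i][j-1] + grid[i][j+1]);
}

int main(void) {
    static double grid[N][N];   /* zero-initialized */
    sweep(grid, 0, 4);          /* process 0 of 4 updates its block of rows */
    printf("swept rows owned by process 0\n");
    return 0;
}
```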

  11. Assignment
  • Static assignments (given a decomposition into rows, n rows over p processes):
    • block: row i is assigned to process floor(i / (n/p)), i.e., contiguous blocks of n/p rows
    • cyclic: process i is assigned rows i, i+p, i+2p, and so on
  • Dynamic:
    • get a row index, work on the row, get a new row, and so on
  • Static assignment reduces concurrency (from n to p)
  • Block assignment reduces communication by keeping adjacent rows together
  • (both static schemes are sketched below)
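The two static schemes written as ownership functions in C (the example sizes are illustrative):

```c
#include <stdio.h>

/* Block: contiguous runs of n/p rows per process (assumes p divides n). */
static int block_owner(int i, int n, int p)  { return i / (n / p); }

/* Cyclic: rows dealt out round-robin, so process i gets rows i, i+p, i+2p, ... */
static int cyclic_owner(int i, int p)        { return i % p; }

int main(void) {
    int n = 12, p = 4;
    printf("row  block  cyclic\n");
    for (int i = 0; i < n; i++)
        printf("%3d  %5d  %6d\n", i, block_owner(i, n, p), cyclic_owner(i, p));
    return 0;
}
```

Block favors locality (adjacent rows, and hence nearest-neighbor communication, stay on one process); cyclic favors load balance when the work per row varies.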
