Implicitly Parallel Programming Models

Implicitly Parallel Programming Models For Thousand-Core Microprocessors Hwu, Ryoo, Ueng, Kelm, Gelado,Stone, Kidd, Barghsorkhi, Mahesri,Tsao, Navarro, Lumetta, Frank, Patel University of Illinois, Urbana-Champaign Universtat Politecnica de Catalunya

The Next Software Challenge • Today, TLP models potentially make more effective use of area and power than large ILP CPU’s • Multi-core, Many-core, Cell BE™, GPGPU, PPU, etc. • Scaling from 4-core to 1000-core chips could happen in the next 15 years • All semiconductor market domains converging to concurrent system platforms • PCs, game consoles, mobile handsets, servers, supercomputers, networking, etc. We need to make these systems effectively execute valuable, demanding apps.

One Plausible View • One might be able to stretch sequential programming models to small scale multi-core systems • As for large scale, many-core systems, explicitly parallel programming models must be used I would like to convince you that it is the other way around.

Implicitly Parallel Programming Models • Programmers choose or create algorithms that have desired amount of parallelism • The parallel algorithms are expressed in traditional sequential programming languages • With assertions of design properties and assumptions • The tools, compilers, runtime, and HW jointly reconstruct parallelism and arrange for parallel execution of the program

One Important Disclaimer Implicitly Parallel Programming Model ≠ Legacy sequential code • Code change needed whether we use implicitly or explicitly parallel programming models • Legacy code was not developed to give tools deep knowledge about the computation being performed. • E.g., source code does not express high-level properties and assumptions

Another Important Disclaimer Implicitly Parallel Programming Model ≠ Sequential Algorithms • When there is no parallelism in the algorithm,all is lost • The question is how to go from parallel algorithmsto parallel machine code

Human vs. Machine-level Programming Models Pentium Itanium NVida GPU

Cost of Tuning ExplicitlyParallel Programming • Developing an explicitly parallel app is an expensive proposition • Understanding and performing all optimizations needed is hard; transactional memory might help • Optimized parallelization will likely involve finessing libraries and infrastructure software(lack of composability) • The larger the number of cores, the less intuitive the optimization process will be

Cost of Explicitly Parallel Program Verification and Support • Verification and support of a hand parallelized program is even more expensive • Application-level testing needs to cover a much larger state space • Should contain the verification complexity through guarantees at the interface level • Reproducing bugs can be very challenging, deterministic execution semantics help Some of the points in "Software and the Concurrency Revolution." Sutter and Larus, ACM Queue 3(7), Sep. 2005.

Cost of Scaling Explicitly Parallel Programs • Programs developed in explicitly parallel form are unlikely to scale withMoore’s Law • Performance comes from optimizations that craft the computation and data accesses to best get around limitations • Every new generation or new platform will likely require app developers to redo their application

Value and Complexity Magnified

Implicitly Parallel Programming Flow Stylized C/C++ w/ assertions Deep analysis w/ feedback assistance Human Concurrency discovery For increased composability Visualizable concurrent form Systematic search for best/correct code gen Code-gen space exploration For increased scalability Visualizable sequential assembly code with parallel annotations parallel execution w/ sequential semantics Parallel HW w/sequential state gen For increased supportability Debugger

Application Scaling • Traditional GP compilers cover legacy apps (pit) based on sequential algorithms • Explicitly parallel programming models, OpenMP, and MPI cover simpler, earlier parallel apps (peach skin) • Implicitly parallel programming models extend the coverage into more sophisticated parallel apps (peach flesh) • “application scaling”

Different algorithms may expose different levels of parallelism while achieving desired result In motion estimation, can use previous vectors (either from space or time) as guess vectors Parallelism in Algorithms(H.263 motion estimation example)

Concurrency Discovery • Reconstruction of concurrency in a parallel algorithm expressed in a traditional sequential language • Program analysis techniques have advanced sufficiently in the past decade to do this job • Programmers need to avoid crypticprogramming styles • Programmer annotations on high-level design properties and assumptions helpful • This needs to be done when composing an explicitly parallel program from third-party components anyway

MPEG-4 H.263 EncoderParallelism Rediscovery (b) (c) (d) (e) (a)

0 1 2 3 time 1 2 3 4 1 2 3 4 1 2 3 4 Operations Performed On 16 x 16 Macroblocks Motion Estimation Motion Compensation , ( a ) Loop Partitioning Frame Subtraction DCT & Quantization 1 1 2 2 Dequantization , IDCT , Frame Addition 3 3 4 4 Main Memory Access ( b ) Loop Fusion + Memory Privatization Code Gen Space Exploration

Conclusion • Implicitly Parallel Programming Models • Increase code re-use through parallelism rediscovery • Reduce tedious programming chores through code-gen space exploration • Control software support cost through sequential verification and debugging interfaces • Use of appropriate application algorithms, coding style and compiler analysis techniques is the key to success

Implicitly Parallel Programming Models

Implicitly Parallel Programming Models

Presentation Transcript

Parallel Programming Models and Paradigms

PARALLEL programming

Parallel Programming Models

Parallel Programming Models

Parallel Programming Models, Languages and Compilers

Parallel Programming Models and Paradigms

Parallel Programming

Parallel Programming models I

Parallel Programming Models

Parallel Programming

Lecture 2: Part I Parallel Programming Models

Aspects of practical parallel programming Parallel programming models Data parallel

Parallel and Distributed Programming Models and Languages

A Case for Language Support for Implicitly Parallel Programming

Introduction to Parallel Architectures and Programming Models

Parallel Programming Models

Parallel Programming Models and Paradigms

Parallel Programming Models and Paradigms