CSCI 4125 Programming for Performance. Andrew Rau-Chaplin arc@cs.dal.ca www.cs.dal.ca/~arc. Course Objectives. Explore techniques for designing, implementing and evaluating efficient programs for Sequential computers, Shared-Memory Multiprocessors, and Distributed Memory Multicomputers
CSCI 4125 Programming for Performance Andrew Rau-Chaplin arc@cs.dal.ca www.cs.dal.ca/~arc
Course Objectives • Explore techniques for designing, implementing and evaluating efficient programs for • Sequential computers, • Shared-Memory Multiprocessors, and • Distributed Memory Multicomputers • Make it go fast!
Performance oriented dev cycle • techniques and tools for a performance oriented development cycle • Algorithm design • Implementation • Benchmarking/evaluation • Performance Tuning
Quantifying performance • Themes include: • evaluation of performance • design of test data sets • issues of stability/reliability • scalability • common performance enhancing techniques • parallel algorithm design techniques • identification and elimination of dependencies
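As a small illustration of the last theme, identifying and eliminating dependencies, here is a minimal C sketch. The function names (`fill_dependent`, `fill_independent`, `self_check`) are hypothetical and not taken from the course material. The first loop has a loop-carried dependency because iteration i reads what iteration i-1 wrote; the second computes each element from the index alone, so its iterations are independent and could in principle be run in any order or in parallel.

```c
#include <assert.h>
#include <stddef.h>

/* Loop-carried dependency: a[i] needs a[i-1], so iterations must run in order. */
void fill_dependent(long *a, size_t n, long start, long step) {
    if (n == 0) return;
    a[0] = start;
    for (size_t i = 1; i < n; i++)
        a[i] = a[i - 1] + step;
}

/* Dependency eliminated: a[i] is a closed-form function of i only. */
void fill_independent(long *a, size_t n, long start, long step) {
    for (size_t i = 0; i < n; i++)
        a[i] = start + (long)i * step;
}

/* Hypothetical sanity check: both versions must produce identical arrays. */
void self_check(void) {
    long x[16], y[16];
    fill_dependent(x, 16, 5, 3);
    fill_independent(y, 16, 5, 3);
    for (size_t i = 0; i < 16; i++)
        assert(x[i] == y[i]);
}
```

Not every dependency can be rewritten away like this; the sketch only covers the simple case where the recurrence has a closed form.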
Skills Development • how to design experiments/benchmarks • how to use statistics in performance evaluation • how to instrument code to obtain reliable timings • how to use compiler switches • how to use a profiler and performance tuning tools • how to use a debugger/tracing tools • how to plot performance results
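One possible way to instrument code for more reliable timings is sketched below in C. It is an assumption-laden sketch, not the course's prescribed method: it assumes a POSIX system providing `clock_gettime` with `CLOCK_MONOTONIC`, and the names `now_seconds`, `median_time` and `sum_work` are hypothetical. The idea is to run the workload once untimed as a warm-up, time several trials, and report the median, which is less sensitive to outliers than a single measurement.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Monotonic wall-clock time in seconds (POSIX clock_gettime). */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: time `work` over `trials` runs, return the median
 * elapsed time in seconds, or -1.0 if memory cannot be allocated. */
double median_time(void (*work)(void), int trials) {
    assert(work != NULL && trials > 0);
    double *t = malloc((size_t)trials * sizeof *t);
    if (t == NULL) return -1.0;
    work(); /* warm-up run, not timed */
    for (int i = 0; i < trials; i++) {
        double start = now_seconds();
        work();
        t[i] = now_seconds() - start;
    }
    qsort(t, (size_t)trials, sizeof *t, cmp_double);
    double med = (trials % 2) ? t[trials / 2]
                              : 0.5 * (t[trials / 2 - 1] + t[trials / 2]);
    free(t);
    return med;
}

/* Example workload; the volatile sink keeps the loop from being optimized away. */
static volatile double sink;
void sum_work(void) {
    double s = 0.0;
    for (int i = 0; i < 1000000; i++) s += (double)i;
    sink = s;
}
```

A call such as `median_time(sum_work, 9)` would return the median of nine timed runs. Whether the median, the minimum, or a mean with confidence interval is the right summary depends on the experiment.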
Topics • Introduction to Parallelism • Parallel Programming • Parallel Architectures • Parallel Algorithms • Parallel Applications • Other Parallel Architectures & Algorithms
Official Outline • This course explores the design, implementation, and evaluation of computer programs for applications in which performance is a central issue. • In the sequential and multi-core settings, it explores topics such as profiling, cache effects, I/O performance, floating-point issues, multi-threading, and performance tuning techniques. • It introduces techniques for the design, implementation and evaluation of programs for Multicore processors, Shared-Memory Multiprocessors (SMPs) and Distributed Memory Multicomputers (Clusters).
Resources • Course web page: • www.cs.dal.ca/~arc/teaching/CSc4125 • All notes, readings, assignments • Parallel Machines • Your laptop! • CGM6 & CGM7 • Hugh
Readings • Sorry, no textbook! • Readings will be assigned
Books • Introduction to High Performance Computing for Scientists and Engineers, by Georg Hager and Gerhard Wellein • Parallel Programming, by Peter Pacheco, Morgan Kaufmann • Structured Parallel Programming, by Michael McCool, Arch D. Robison, and James Reinders • Parallel Programming in C with MPI and OpenMP, by Quinn • Parallel Programming with Intel Parallel Studio XE, by S. Blair-Chappell and A. Stokes • Using OpenMP: Portable Shared Memory Parallel Programming, by Barbara Chapman, Gabriele Jost and Ruud van der Pas • Parallel Programming in OpenMP, by Rohit Chandra, Dave Kohr, Jeff McDonald, Morgan Kaufmann
Prerequisites • Knowledge of C • CSCI 3120: Operating Systems • Good to have • CSCI 3110: Analysis of Algorithms
Course Evaluation • Assignments 50% • Midterm 25% • Final Project 20% • Participation 5% • See course web page for assignment copies and due dates
Assignments • Selected From • Sequential Optimization • OpenMP • Cilk • Threading Building Blocks • MPI • Hadoop • CUDA/OpenCL • Best 4 out of 5 count towards final grade!
“Midterm” • About two-thirds of the way through the course • Tests conceptual knowledge gained from classes and readings • If you have not done the readings, you will not pass the midterm
Final Project • Select your own topic • Either • Optimize an existing codebase, or • Design and implement an efficient new code • Components: Literature/Code review, some research or programming work, final paper, presentation • Main Deliverable: Conference-style paper plus short in-class talk
Questions • Why are you taking this course? • Which performance-oriented technologies are you interested in? • How will you know if the course has been a success for you?