
Understanding Parallel Computing: Hardware, Programming Models, and Performance Analysis

Gain an in-depth understanding of when and how parallel computing is useful, different parallel computing hardware options, programming models and tools, important parallel applications and algorithms, managing parallelism, tradeoffs, performance analysis and tuning.


Presentation Transcript


  1. What you should get out of CS240A In-depth understanding of: • When is parallel computing useful? • Parallel computing hardware options • Multi-cores, clusters, shared memory, cache/memory, I/O • Programming models and tools • Some important parallel applications and their algorithms • Where is the parallelism? How to manage it? • Tradeoffs with memory latency, communication, I/O • Performance analysis (how to evaluate) and tuning • Exposure to various open research questions

  2. Summary: Memory Hierarchy • Details of the machine are important for performance • Processor and memory system (not just parallelism) • Before you parallelize, make sure you're getting good serial performance (Megaflops) • Locality is at least as important as computation • Temporal: re-use of data recently used • Spatial: use of data near data that was recently used • Machines have memory hierarchies • 100s of cycles to read from DRAM (main memory) • Caches are fast (small) memories that optimize the average case • Code and data can be rearranged to improve locality, as sketched below
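A minimal sketch of the locality point above, assuming a row-major C array of an arbitrary size N: the first loop nest walks consecutive addresses and uses every word of each cache line fetched from DRAM, while the second strides through memory and misses in cache on nearly every access.

#include <stddef.h>

#define N 4096   /* assumed array dimension */

/* Row-major traversal: the inner loop touches consecutive addresses,
   so every word of each fetched cache line is used (good spatial locality). */
double sum_row_major(double a[N][N]) {
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

/* Column-major traversal of the same row-major array: the inner loop
   strides by N doubles, so almost every access misses in cache. */
double sum_col_major(double a[N][N]) {
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += a[i][j];
    return s;
}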

  3. Questions You Should Be Able to Answer • What is the key to understanding algorithm efficiency in our simple memory model? • What is tiling? • Why does blocked matrix multiply reduce the number of memory references? • What are the BLAS? • Why does loop unrolling improve uniprocessor performance?
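To make the tiling/blocking questions concrete, here is a hedged sketch of a blocked matrix multiply in C; the block size BLOCK is an assumed tuning parameter (chosen so roughly three BLOCK x BLOCK tiles fit in cache), and the point is that each tile of C is updated while the corresponding tiles of A and B stay cache-resident, cutting the number of memory references relative to the naive triple loop.

#define BLOCK 64   /* assumed block size; tune so three BLOCK x BLOCK tiles fit in cache */
#define MIN(x, y) ((x) < (y) ? (x) : (y))

/* Blocked (tiled) multiply: C += A * B, all n x n in row-major order.
   Each (ii, jj, kk) step works on BLOCK x BLOCK tiles that stay cache-resident,
   so each element of A and B is reloaded from main memory far fewer times. */
void matmul_blocked(const double *A, const double *B, double *C, int n) {
    for (int ii = 0; ii < n; ii += BLOCK)
        for (int jj = 0; jj < n; jj += BLOCK)
            for (int kk = 0; kk < n; kk += BLOCK)
                for (int i = ii; i < MIN(ii + BLOCK, n); i++)
                    for (int k = kk; k < MIN(kk + BLOCK, n); k++) {
                        double aik = A[i * n + k];           /* temporal reuse in a register */
                        for (int j = jj; j < MIN(jj + BLOCK, n); j++)
                            C[i * n + j] += aik * B[k * n + j];
                    }
}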

  4. CS267 Lecture 3 Hardware and Programming Models • Three basic conceptual models • Shared memory • Distributed memory • Data parallel, and hybrids of these • Characteristics • Shared memory: impact of caches/consistency, synchronization • Distributed memory: synchronization, how to communicate
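A minimal sketch of distributed-memory communication with MPI (the ranks, tag, and transferred value are illustrative): each process owns a private address space, so data moves only through explicit messages, and the matching send/receive pair is also the synchronization point between the two processes.

#include <mpi.h>
#include <stdio.h>

/* Distributed memory: processes share no address space, so rank 0 must
   explicitly send its value to rank 1, which blocks until it arrives. */
int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}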

  5. CS267 Lecture 3 Programming and Parallelism Management • Threads • Thread management • Synchronization • Locks, semaphores, condition variables, barriers • Correctness • MPI • Coordination, communication primitives • OpenMP • How to parallelize loops/regions (see the sketch below) • MapReduce • Map, reduce, combine • Basic parameters
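As a sketch of OpenMP loop parallelization (the loop bound n and the summed expression are placeholders): the parallel for directive splits iterations across threads, and the reduction clause gives each thread a private partial sum, so no explicit lock is needed on the shared accumulator.

#include <omp.h>
#include <stdio.h>

int main(void) {
    const int n = 1000000;          /* placeholder problem size */
    double sum = 0.0;

    /* Iterations are divided among threads; each thread keeps a private
       partial sum, and the partials are combined when the loop finishes,
       so there is no data race on the shared variable. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += 1.0 / (i + 1);

    printf("sum = %f using up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}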

  6. CS267 Lecture 3 Program Parallelization / Parallel Execution • Program/data mapping • Program partitioning • Dependence analysis • Code/data distribution • Scheduling of execution • Load balancing • SPMD code • Owner computes rule • Loop transformations • Blocking, unrolling, skewing • Loop interchange
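A hedged SPMD sketch of the owner computes rule with a block data distribution (the global size N and the per-element update are placeholders): every rank runs the same program, but each allocates and updates only the contiguous block of the global array that it owns.

#include <mpi.h>
#include <stdlib.h>

#define N 1024   /* assumed global problem size */

/* SPMD + owner computes rule: every rank runs the same code, but each rank
   allocates and updates only its own contiguous block of the global array. */
int main(int argc, char **argv) {
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int chunk = (N + nprocs - 1) / nprocs;       /* block size per rank */
    int lo = rank * chunk;
    int hi = (lo + chunk < N) ? lo + chunk : N;  /* last rank may own fewer */

    double *local = malloc((hi > lo ? hi - lo : 0) * sizeof(double));
    for (int i = lo; i < hi; i++)                /* owner updates its own elements */
        local[i - lo] = (double)i * i;           /* placeholder computation */

    free(local);
    MPI_Finalize();
    return 0;
}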

  7. CS267 Lecture 3 Parallelism in Scientific Computing • Matrix multiplication • HW1. Partitioning vs. parallelism • Numerical methods for ODEs/PDEs • High-level view • Approximation with linear equations • Iterative methods • Particle methods • Where is the parallelism? • How to manage parallelism? How to partition?
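To illustrate where the parallelism is in an iterative method, here is a minimal Jacobi sweep for a 1-D model problem (the grid size N, right-hand side, and scaling are assumptions): each new value depends only on old neighbor values, so all interior updates within one sweep are independent and the loop parallelizes directly.

#include <omp.h>

#define N 10000   /* assumed grid size */

/* One Jacobi sweep for a 1-D Poisson-style problem (-u'' = f): every new
   value depends only on old neighbor values, so all interior updates are
   independent and can run in parallel; convergence is tested between sweeps. */
void jacobi_sweep(const double *uold, double *unew, const double *f, double h2) {
    #pragma omp parallel for
    for (int i = 1; i < N - 1; i++)
        unew[i] = 0.5 * (uold[i - 1] + uold[i + 1] + h2 * f[i]);
}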

  8. CS267 Lecture 3 Parallelism in Data-Intensive Computing • Log analysis. HW2 • Parallel Boosted Regression Trees for Web Search Ranking. WWW 2011. • Where is the parallelism? • What is the scheduling model? • Optimizing Parallel Algorithms for All Pairs Similarity Search. WSDM 2013. • Essentially a matrix multiplication problem • Where is the parallelism? How to exploit parallelism for better performance?

  9. CS267 Lecture 3 MapReduce Optimization • Incoop: MapReduce for Incremental Computations. ACM SoCC 2011. • Strategies for incremental computing • Cache results, build tree dependences • Adaptive data partitioning (splits) • A Platform for Scalable One-pass Analytics using MapReduce. SIGMOD 2011. • What is the cost of MapReduce execution? • What parameters are adjusted? • How to speed up MapReduce communication?

  10. CS267 Lecture 3 Graph Computation & Shared Memory Programming • PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. OSDI 2012. • Programming model • How to partition graph computation? • An Analysis of Linux Scalability to Many Cores. OSDI 2010. • Understand characteristics of shared-memory architectures: true/false sharing (see the sketch below) • Contention removal in locks & reference counters
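A small sketch of false sharing on a shared-memory multicore (a 64-byte cache line and the OpenMP setup are assumptions): per-thread counters packed into one cache line ping-pong between cores even though no data is logically shared; padding each counter to its own line removes that contention.

#include <omp.h>

#define NTHREADS 8
#define ITERS 10000000L

/* Without padding, counters for different threads share a cache line and every
   increment invalidates that line in the other cores' caches (false sharing).
   The pad forces each counter onto its own 64-byte line (assumed line size). */
struct padded_counter {
    long val;
    char pad[64 - sizeof(long)];
};

static struct padded_counter counters[NTHREADS];

void count_in_parallel(void) {
    #pragma omp parallel num_threads(NTHREADS)
    {
        int t = omp_get_thread_num();
        for (long i = 0; i < ITERS; i++)
            counters[t].val++;   /* each thread touches only its own cache line */
    }
}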

  11. Parallelism in Internet Services: Ask.com Search Engine Example • [Architecture diagram showing: client queries, traffic load balancer, replicated frontends, page info, hierarchical cache, clustering middleware, caches, replicated ranking and classification services, web page index, document abstracts/descriptions, structured DB] • Synchronization • Fault tolerance
